
# JSON Dictionary

*Last modified: 27/12/18*

JSON, or JavaScript Object Notation, is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and a web application, as an alternative to XML.

### Keys and Values
The two primary parts that make up JSON are keys and values. Together they make a key/value pair.

**Key:** A key is always a string enclosed in double quotation marks.

**Value:** A value can be a string, number, boolean, `null`, array, or object.

**Key/Value Pair:** A key/value pair follows a specific syntax: the key, then a colon, then the value. Key/value pairs are separated by commas.

A sample JSON object whose `baz` value is an array:

```
{
  "foo": {
    "bar": "Hello",
    "baz": ["quuz", "norf"]
  }
}
```
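
As a rough illustration of how these pairs look from Python, the sketch below parses the sample with the standard `json` module. The sample is written on one line as a complete JSON document (with outer braces), and the variable names `text` and `parsed` are illustrative only.

```python
import json

# The sample written as a complete JSON document on one line
text = '{"foo": {"bar": "Hello", "baz": ["quuz", "norf"]}}'

parsed = json.loads(text)        # JSON object -> Python dict

print(parsed["foo"]["bar"])      # value stored under the key "bar"
print(parsed["foo"]["baz"][1])   # second element of the array under "baz"

# Serializing back to JSON text emits double-quoted keys and strings
print(json.dumps(parsed["foo"]))
```

Nested objects become nested dicts, so each level is reached with one more key lookup.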

### Types of Values
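* **Array:** An ordered list of values, written in square brackets.
* **Boolean:** `true` or `false`.
* **Null:** The literal `null`, representing an empty value.
* **Number:** An integer or a floating-point number.
* **Object:** An unordered collection of key/value pairs, written in curly braces. A value can itself be *an object*.
* **String:** A sequence of zero or more characters enclosed in double quotation marks.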
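
To see how these value types arrive in Python, here is a minimal sketch using the standard `json` module. The key names in `text` are made up for illustration; only the type of each value matters.

```python
import json

# One value of each JSON type; the key names are illustrative only
text = (
    '{"array": [1, 2], "boolean": true, "null": null, "integer": 7, '
    '"float": 2.5, "object": {"k": "v"}, "string": "hi"}'
)

values = json.loads(text)

# Print the Python type that each JSON value was converted to
for name, value in values.items():
    print(name, "->", type(value).__name__)
```

Arrays become lists, objects become dicts, `true`/`false` become `True`/`False`, and `null` becomes `None`.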

### Tutorials & Resources

[TDS Tutorial on Python Dictionaries & JSON format](https://towardsdatascience.com/master-python-through-building-real-world-applications-part-1-b040b2b7faad)

[JSON Tutorial](https://developers.squarespace.com/what-is-json/)

In [0]:
import json
!wget https://raw.githubusercontent.com/kovacova/datasets/master/dictionary.json -q

In [40]:
# Open the file in a with-block so it is closed once the data is loaded
with open('dictionary.json') as f:
  data = json.load(f)
type(data)

# print(data) 📌 Causes "Rate Exceeded" error and one beautiful day I will learn how to fix it

dict
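
The commented-out `print(data)` above tries to dump the entire dictionary at once, which is presumably what triggers the error. A lighter way to inspect the data, sketched here on a small stand-in dict (the name `sample_data` and its entries are made up for illustration), is to check its size and preview only a few entries:

```python
from itertools import islice

# Hypothetical stand-in for the dict loaded from dictionary.json
sample_data = {
    "rain": ["Water falling from clouds in drops."],
    "snow": ["Frozen precipitation in flakes."],
    "hail": ["Balls of ice falling from clouds."],
    "fog": ["A thick cloud near the ground."],
}

print(len(sample_data))  # number of entries

# islice stops after two items, so the full dict is never printed
preview = dict(islice(sample_data.items(), 2))
print(preview)
```

The same two calls should work on `data` itself, since it is an ordinary dict.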

## Creating a Function & Checking for Non-Existing Words with If-Else Statement

In [0]:
def retrieve_definition(word):
  # First we remove the case sensitivity with .lower() 
  # For example, "Rain" and "rain" will give the same output
  word = word.lower()
  
  # Check for non-existing words
  if word in data:
    return data[word]
  # 1st elif: To make sure the program returns the definition of words that start with a capital letter (e.g. Delhi, Texas)
  elif word.title() in data:
    return data[word.title()]
  # 2nd elif: To make sure the program returns the definition of acronyms (e.g. USA, NATO)
  elif word.upper() in data:
    return data[word.upper()]
  else:
    return "The word doesn't exist, please double check it"

In [42]:
# Input from User
word_user = input("Enter a word: ")

print(retrieve_definition(word_user))

Enter a word: Tina
['An addictive psychoactive drug of formula C₁₀H₁₅N.']


## Closest Word Matching 

If the user makes a typo while entering a word, we might want to suggest the closest word and ask whether they want its meaning instead. We can do that with Python's standard library module **difflib**.

We will test 2 ways of doing so.

In [0]:
import difflib

from difflib import SequenceMatcher
from difflib import get_close_matches

#### Method 1: Sequence Match

`SequenceMatcher` takes three parameters. The first, `isjunk`, is an optional function that marks elements to ignore, such as white space or blank lines; passing `None` means nothing is treated as junk. The second and third are the two sequences (here, words) to compare. `ratio()` returns a number between 0 and 1 that measures how similar the two sequences are, with 1 meaning identical.

In [44]:
method_1 = SequenceMatcher(None, "rainn", "rain").ratio()
print(method_1)

# Note to my future self - if I forget the brackets in the .ratio, I will get this ugly printout:
# <bound method SequenceMatcher.ratio of <difflib.SequenceMatcher object at 0x7f1006d7a630>>

0.8888888888888888
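
The value 0.888... can be reproduced by hand. The `difflib` documentation defines the ratio as 2*M/T, where M is the number of matching elements and T is the total number of elements in both sequences. For "rainn" and "rain", M = 4 and T = 5 + 4 = 9, giving 8/9. A small sketch that checks this (the variable names are illustrative):

```python
from difflib import SequenceMatcher

a, b = "rainn", "rain"
matcher = SequenceMatcher(None, a, b)

# Sum the sizes of all matching blocks; the final block is a
# zero-length sentinel, so it adds nothing to the total
matched = sum(block.size for block in matcher.get_matching_blocks())
total = len(a) + len(b)

manual_ratio = 2 * matched / total
print(matched, total)
print(manual_ratio, matcher.ratio())
```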


#### Method 2: Get Close Matches

`get_close_matches` returns the words from a list that most resemble a given word, best match first.

The basic template of this function is as follows:

**get_close_matches(word, possibilities, n=3, cutoff=0.6)**
* The first parameter is the word for which you want to find close matches
* The second is a list of sequences against which to match the word
* [optional] The third is the maximum number of close matches to return (default 3)
* [optional] The fourth is a similarity threshold between 0 and 1 (default 0.6); candidates scoring below it are ignored, so higher values demand closer matches

In [45]:
method_2 = get_close_matches("rain", ["help", "mate", "rainy"], n=1, cutoff=0.75)
print(method_2)

['rainy']
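
To get a feel for the two optional parameters, the sketch below runs the same query three times with different settings. The misspelled word and the candidate list are borrowed from the example in the `difflib` documentation; the `words` variable name is illustrative.

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# Defaults (n=3, cutoff=0.6): every candidate scoring at least 0.6, best first
default_matches = get_close_matches("appel", words)
print(default_matches)

# n=1 keeps only the single best match
best_match = get_close_matches("appel", words, n=1)
print(best_match)

# A stricter cutoff can reject every candidate and return an empty list
strict_matches = get_close_matches("appel", words, cutoff=0.9)
print(strict_matches)
```

Because the result can be empty, code that indexes into it with `[0]` needs to check that a match exists first.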


#### A Helper "Did You Mean This Instead?" Function

The function first checks that at least one close match was found, because a suggestion can only be printed when one exists. `get_close_matches` takes the word the user entered as its first parameter and the keys of our whole data set as the list to match against. Here, the keys are the words in our data and the values are their definitions, as we saw earlier. The `[0]` in the return statement picks the first, and therefore closest, of the matches.

In [0]:
def retrieve_definition(word):
  # First we remove the case sensitivity with .lower() 
  # For example, "Rain" and "rain" will give the same output
  word = word.lower()
  
  # Check for non-existing words
  if word in data:
    return data[word]
  # 1st elif: To make sure the program returns the definition of words that start with a capital letter (e.g. Delhi, Texas)
  elif word.title() in data:
    return data[word.title()]
  # 2nd elif: To make sure the program returns the definition of acronyms (e.g. USA, NATO)
  elif word.upper() in data:
    return data[word.upper()]
  # Compute the close matches once; scanning every key twice is wasted work
  matches = get_close_matches(word, data.keys())
  if len(matches) > 0:
    return "Did you mean %s instead?" % matches[0]
  else:
    return "The word doesn't exist, please double check it"

In [49]:
word_user = input("Enter a word: ")

print(retrieve_definition(word_user))

Enter a word: horrible
Did you mean horribly instead?
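
A compact variant of the lookup, sketched on a small stand-in dictionary so it runs without the download. The names `lookup` and `sample_data`, and the sample definitions, are hypothetical. Unlike `retrieve_definition`, this version returns the definitions and the suggestion separately, so the caller can decide how to ask the user for confirmation.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the dict loaded from dictionary.json
sample_data = {
    "rain": ["Water falling from clouds in drops."],
    "Texas": ["A state in the southern United States."],
    "USA": ["The United States of America."],
}

def lookup(word, data):
    """Return (definitions, suggestion); at most one of them is not None."""
    # Try the lowercase, Title Case and UPPERCASE spellings in turn
    for candidate in (word.lower(), word.title(), word.upper()):
        if candidate in data:
            return data[candidate], None
    # No exact hit: fall back to the single closest key, if any
    matches = get_close_matches(word.lower(), data.keys(), n=1)
    return None, (matches[0] if matches else None)

print(lookup("RAIN", sample_data))    # exact hit after lowercasing
print(lookup("texas", sample_data))   # hit via Title Case
print(lookup("rainn", sample_data))   # typo: suggestion only
print(lookup("xyzzy", sample_data))   # nothing close enough
```

A caller could then prompt with `input()` when only a suggestion comes back, and call `lookup` again with the suggested word if the user agrees.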
