## JSON

# JSON - JavaScript Object Notation
#### Specified and popularized by Douglas Crockford in the early 2000s.

* Goal: human-readable, machine-parsable

* Specification: https://www.json.org/

JSON, short for JavaScript Object Notation, is a text format for sharing data.

JSON is derived from the JavaScript programming language.

It can be read and written from many languages, including Python.

When stored in a file, JSON usually gets the .json extension.



In [2]:
# Sample JSON below from https://json.org/example.html
# Question: why does syntax highlighting work properly here? :)

In [1]:
{"widget": {
    "debug": "on",
    "window": {
        "title": "Sample Konfabulator Widget",
        "name": "main_window",
        "width": 500,
        "height": 500
    },
    "image": { 
        "src": "Images/Sun.png",
        "name": "sun1",
        "hOffset": 250,
        "vOffset": 250,
        "alignment": "center"
    },
    "text": {
        "data": "Click Here",
        "size": 36,
        "style": "bold",
        "name": "text1",
        "hOffset": 250,
        "vOffset": 100,
        "alignment": "center",
        "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
    }
}}    


{'widget': {'debug': 'on',
  'window': {'title': 'Sample Konfabulator Widget',
   'name': 'main_window',
   'width': 500,
   'height': 500},
  'image': {'src': 'Images/Sun.png',
   'name': 'sun1',
   'hOffset': 250,
   'vOffset': 250,
   'alignment': 'center'},
  'text': {'data': 'Click Here',
   'size': 36,
   'style': 'bold',
   'name': 'text1',
   'hOffset': 250,
   'vOffset': 100,
   'alignment': 'center',
   'onMouseUp': 'sun1.opacity = (sun1.opacity / 100) * 90;'}}}

In [2]:
# if this were a string starting with { it would be our JSON; as written it is a Python dict literal
mydata = {
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": ["running", "sky diving", "dancing"],
    "age": 43,
    "children": [
        {
            "firstName": "Alice",
            "age": 7
        },
        {
            "firstName": "Bob",
            "age": 13
        }
    ]
}

In [3]:
type(mydata)

dict

In [5]:
print(mydata)

{'firstName': 'Jane', 'lastName': 'Doe', 'hobbies': ['running', 'sky diving', 'dancing'], 'age': 43, 'children': [{'firstName': 'Alice', 'age': 7}, {'firstName': 'Bob', 'age': 13}]}


In [6]:
mylist = list(range(10))
print(mylist)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


The process of encoding JSON is usually called serialization. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. You may also hear the term marshaling, but that’s a whole other discussion. Naturally, deserialization is the reciprocal process of decoding data that has been stored or delivered in the JSON standard.

All we’re talking about here is reading and writing. Think of it like this: encoding is for writing data to disk, while decoding is for reading data into memory.
(Source: https://realpython.com/python-json/)
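As a minimal sketch of that round trip (the `record` name and its contents are made up for illustration), serialization produces text, which can then be encoded to bytes for storage or transport and decoded back:

```python
import json

# hypothetical sample data, not taken from the notebook
record = {"name": "Jane", "hobbies": ["running", "dancing"], "age": 43}

text = json.dumps(record)        # serialize: Python object -> JSON text (str)
payload = text.encode("utf-8")   # the bytes that would go to disk or over a network

# deserialize: bytes -> JSON text -> Python object
restored = json.loads(payload.decode("utf-8"))

print(type(text).__name__, type(payload).__name__, restored == record)
```

The restored object compares equal to the original here because it only contains types that JSON can represent directly.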

In [7]:
import json

In [8]:
with open("data_file.json", mode="w") as write_file:
    json.dump(mydata, write_file)

In [9]:
with open("numbers.json", mode="w") as write_file:
    json.dump(mylist, write_file)
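To see what `json.dump` actually puts on disk, here is a small sketch that writes a list to a throwaway file and reads it back. The temporary directory and the `numbers` name are assumptions made so the example cleans up after itself; the notebook itself writes into the working directory.

```python
import json
import os
import tempfile

numbers = list(range(10))

# write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.json")

    with open(path, mode="w", encoding="utf-8") as write_file:
        json.dump(numbers, write_file)

    with open(path, mode="r", encoding="utf-8") as read_file:
        raw_text = read_file.read()    # the text stored in the file
        read_file.seek(0)              # rewind before parsing the same file
        loaded = json.load(read_file)  # JSON text -> Python list

print(raw_text)
print(loaded == numbers)
```

Passing `encoding="utf-8"` explicitly is a design choice here: it keeps the file contents independent of the platform's default encoding.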

In [12]:
# serialize to a JSON string that we can use inside our program
json_string = json.dumps(mydata)
print(json_string)

{"firstName": "Jane", "lastName": "Doe", "hobbies": ["running", "sky diving", "dancing"], "age": 43, "children": [{"firstName": "Alice", "age": 7}, {"firstName": "Bob", "age": 13}]}


In [11]:
print(mydata)

{'firstName': 'Jane', 'lastName': 'Doe', 'hobbies': ['running', 'sky diving', 'dancing'], 'age': 43, 'children': [{'firstName': 'Alice', 'age': 7}, {'firstName': 'Bob', 'age': 13}]}


In [13]:
# convert json_string back into a Python object
my_obj = json.loads(json_string)
my_obj

{'firstName': 'Jane',
 'lastName': 'Doe',
 'hobbies': ['running', 'sky diving', 'dancing'],
 'age': 43,
 'children': [{'firstName': 'Alice', 'age': 7},
  {'firstName': 'Bob', 'age': 13}]}

In [14]:
newlist = json.loads('[1,3,5,"Valdis"]')
newlist

[1, 3, 5, 'Valdis']

In [18]:
badlist = json.loads('[1,3,5,"Vald"]')
badlist

[1, 3, 5, 'Vald']

In [19]:
type(json_string)

str

In [None]:
# In the example above, JSON and the Python object have almost the same syntax, but there are some differences
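A few of those differences can be checked directly. The sketch below feeds some hand-written strings (all made up for illustration) to `json.loads` and records which ones the parser rejects:

```python
import json

candidates = [
    "{'a': 1}",     # single quotes: fine in Python, not allowed in JSON
    '{"a": 1,}',    # trailing comma: fine in Python, not allowed in JSON
    '{"a": True}',  # Python literal True instead of JSON true
    '{"a": true}',  # valid JSON
]

results = {}
for text in candidates:
    try:
        json.loads(text)
        results[text] = "ok"
    except json.JSONDecodeError as err:
        # err.msg holds the parser's explanation; its wording varies by Python version
        results[text] = "error: " + err.msg

for text, outcome in results.items():
    print(text, "->", outcome)
```

Only the last string parses; the other three look like valid Python but are not valid JSON.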

![object](https://www.json.org/object.gif)

![Array](https://www.json.org/array.gif)

![Value](https://www.json.org/value.gif)

Simple Python objects are translated to JSON according to a fairly intuitive conversion.

| Python | JSON |
|---|---|
| dict | object |
| list, tuple | array |
| str | string |
| int, float (plus long in Python 2) | number |
| True | true |
| False | false |
| None | null |

In [23]:
newlist = json.loads('[true,2,null, false, 555.333]')
newlist

[True, 2, None, False, 555.333]
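The conversion is not always reversible. As a small sketch (the `original` dict is made up for illustration), a tuple comes back as a list and a non-string dictionary key comes back as a string:

```python
import json

# hypothetical data using a tuple value and an integer key
original = {1: ("a", "b"), "flag": None}

as_json = json.dumps(original)
restored = json.loads(as_json)

print(as_json)    # {"1": ["a", "b"], "flag": null}
print(restored)   # {'1': ['a', 'b'], 'flag': None}
print(restored == original)  # False
```

This matters when a program relies on integer keys or on tuples after loading data back from JSON.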

In [24]:
# The first option most people want to change is whitespace.
# The indent keyword argument sets the indentation size for nested structures.
# Compare the output for mydata, which we defined above, with and without indent:

json.dumps(mydata)


'{"firstName": "Jane", "lastName": "Doe", "hobbies": ["running", "sky diving", "dancing"], "age": 43, "children": [{"firstName": "Alice", "age": 7}, {"firstName": "Bob", "age": 13}]}'

In [25]:
# very useful for visibility!
print(json.dumps(mydata, indent=4))

{
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": [
        "running",
        "sky diving",
        "dancing"
    ],
    "age": 43,
    "children": [
        {
            "firstName": "Alice",
            "age": 7
        },
        {
            "firstName": "Bob",
            "age": 13
        }
    ]
}
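Besides `indent`, `json.dumps` accepts `sort_keys` and `separators`. A short sketch on a made-up `sample` dict shows what each one changes:

```python
import json

# hypothetical sample, keys deliberately out of alphabetical order
sample = {"b": 1, "a": [1, 2]}

default_text = json.dumps(sample)
ordered_text = json.dumps(sample, sort_keys=True)            # keys in sorted order
compact_text = json.dumps(sample, separators=(",", ":"))     # no spaces after , and :

print(default_text)   # {"b": 1, "a": [1, 2]}
print(ordered_text)   # {"a": [1, 2], "b": 1}
print(compact_text)   # {"b":1,"a":[1,2]}
```

Sorted keys make two dumps easier to compare, while compact separators save space when the output is meant for machines rather than people.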


In [26]:
with open("data_file.json", "w") as write_file:
    json.dump(mydata, write_file, indent=4)

In [33]:
with open("data_file.json", "r") as read_file:
    data = json.load(read_file)
data

[{'firstName': 'Jane',
  'lastName': 'Doe',
  'hobbies': ['running', 'sky diving', 'dancing'],
  'age': 43,
  'children': [{'firstName': 'Alice', 'age': 7},
   {'firstName': 'Bob', 'age': 13}]},
 555]

In [34]:
type(data)

list

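In [31]:
# note: this cell ran before In [33]; for the two-element list shown above, len(data) would return 2
len(data)

1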

In [36]:
type(data[0]), type(data[1])

(dict, int)

Keep in mind that the result of this method could return any of the allowed data types from the conversion table. This is only important if you’re loading in data you haven’t seen before. In most cases, the root object will be a dict or a list.
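A quick way to see this is to parse a few tiny documents and look at the type of each result. The strings below are made up for illustration:

```python
import json

# each string is a complete, valid JSON document
documents = ['{"a": 1}', '[1, 2]', '"text"', '42', '3.5', 'true', 'null']

root_types = [type(json.loads(doc)).__name__ for doc in documents]
print(root_types)
# ['dict', 'list', 'str', 'int', 'float', 'bool', 'NoneType']

# when the shape is unknown, check the root type before indexing into it
parsed = json.loads(documents[1])
if isinstance(parsed, list):
    print("list with", len(parsed), "items")
```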

If you've gotten JSON data in from another program or have otherwise obtained a string of JSON formatted data in Python, you can easily deserialize that with loads(), which naturally loads from a string:

In [37]:
json_string = """
{
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
"""
data = json.loads(json_string)
data

{'researcher': {'name': 'Ford Prefect',
  'species': 'Betelgeusian',
  'relatives': [{'name': 'Zaphod Beeblebrox', 'species': 'Betelgeusian'}]}}

In [39]:
# the 'researcher' object
data['researcher']

{'name': 'Ford Prefect',
 'species': 'Betelgeusian',
 'relatives': [{'name': 'Zaphod Beeblebrox', 'species': 'Betelgeusian'}]}

In [40]:
# the researcher's list of relatives
data['researcher']['relatives']

[{'name': 'Zaphod Beeblebrox', 'species': 'Betelgeusian'}]

In [43]:
# the first relative in the list
data['researcher']['relatives'][0]

{'name': 'Zaphod Beeblebrox', 'species': 'Betelgeusian'}

In [44]:
# get value of relative's name
data['researcher']['relatives'][0]['name']

'Zaphod Beeblebrox'

In [46]:
data['researcher']['relatives'][0]['name'].split()[0]

'Zaphod'

In [47]:
data['researcher']['relatives'][0]['name'].split()[0][:4]

'Zaph'
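Chained indexing like the cells above raises `KeyError` or `IndexError` as soon as one level is missing. One possible way to guard against that is a small helper; `dig` is a hypothetical name introduced here, not part of the `json` module:

```python
import json

def dig(obj, *path, default=None):
    """Follow keys and indexes in `path`, returning `default` if any step is missing."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current

data = json.loads("""
{"researcher": {"name": "Ford Prefect",
                "relatives": [{"name": "Zaphod Beeblebrox"}]}}
""")

first_relative = dig(data, "researcher", "relatives", 0, "name")
missing_relative = dig(data, "researcher", "relatives", 5, "name", default="unknown")

print(first_relative)    # Zaphod Beeblebrox
print(missing_relative)  # unknown
```

Whether a silent default or a loud exception is better depends on the data: for an API you do not control, a default is often the more forgiving choice.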

In [15]:
type(data)

dict

In [48]:
import json
import requests

In [4]:
# Let's get some data from https://jsonplaceholder.typicode.com/

In [55]:
response = requests.get("https://jsonplaceholder.typicode.com/todos")
if response.status_code != 200:
    print("Bad Response: ", response.status_code)
print(response.status_code)
# parse the response body as JSON; response.json() would do the same in one call
todos = json.loads(response.text)


200


You can open https://jsonplaceholder.typicode.com/todos in a regular browser too.

In [56]:
type(todos)

list

In [57]:
len(todos)

200

In [58]:
todos[:10]

[{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False},
 {'userId': 1,
  'id': 2,
  'title': 'quis ut nam facilis et officia qui',
  'completed': False},
 {'userId': 1, 'id': 3, 'title': 'fugiat veniam minus', 'completed': False},
 {'userId': 1, 'id': 4, 'title': 'et porro tempora', 'completed': True},
 {'userId': 1,
  'id': 5,
  'title': 'laboriosam mollitia et enim quasi adipisci quia provident illum',
  'completed': False},
 {'userId': 1,
  'id': 6,
  'title': 'qui ullam ratione quibusdam voluptatem quia omnis',
  'completed': False},
 {'userId': 1,
  'id': 7,
  'title': 'illo expedita consequatur quia in',
  'completed': False},
 {'userId': 1,
  'id': 8,
  'title': 'quo adipisci enim quam ut ab',
  'completed': True},
 {'userId': 1,
  'id': 9,
  'title': 'molestiae perspiciatis ipsa',
  'completed': False},
 {'userId': 1,
  'id': 10,
  'title': 'illo est ratione doloremque quia maiores aut',
  'completed': True}]

In [23]:
myl = [('Valdis', 40), ('Alice',35), ('Bob', 23),('Carol',70)]

In [None]:
# Lambda = anonymous function

In [None]:
def myfun(el):
    return el[1]
# same as myfun = lambda el: el[1]

In [28]:
sorted(myl, key = lambda el: el[1], reverse=True)

[('Carol', 70), ('Valdis', 40), ('Alice', 35), ('Bob', 23)]
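The same sort can be written without a lambda by using `operator.itemgetter` from the standard library. This is only an alternative spelling, sketched on the same kind of data (the `people` name is introduced here):

```python
from operator import itemgetter

people = [('Valdis', 40), ('Alice', 35), ('Bob', 23), ('Carol', 70)]

# itemgetter(1) behaves like lambda el: el[1]
by_age_desc = sorted(people, key=itemgetter(1), reverse=True)
print(by_age_desc)
# [('Carol', 70), ('Valdis', 40), ('Alice', 35), ('Bob', 23)]

# a tuple key sorts by several fields: here by name length, then by name
by_name_length = sorted(people, key=lambda el: (len(el[0]), el[0]))
print(by_name_length)
# [('Bob', 23), ('Alice', 35), ('Carol', 70), ('Valdis', 40)]
```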

In [None]:
# Exercise: find the top 3 users with the most completed tasks!

# TIPS
# we need some sort of structure to store the per-user results before picking the top 3
# there are at least two good data structure choices here :)
# the simplest one might actually be the best if we consider the userId values


In [59]:
todos[0]

{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}

In [60]:
todos[0]['userId']

1

In [61]:
todos[0]['completed']

False

In [62]:
# Here we create a new dictionary and count the completed tasks per userId
newdict = {}
for todo in todos:
    if todo['completed']:
        if todo['userId'] in newdict:
            newdict[todo['userId']] += 1
        else:
            newdict[todo['userId']] = 1

In [63]:
newdict

{1: 11, 2: 8, 3: 7, 4: 6, 5: 12, 6: 6, 7: 9, 8: 11, 9: 8, 10: 12}

In [64]:
sorted(newdict.items())

[(1, 11),
 (2, 8),
 (3, 7),
 (4, 6),
 (5, 12),
 (6, 6),
 (7, 9),
 (8, 11),
 (9, 8),
 (10, 12)]

In [68]:
bestworkers = sorted(newdict.items(), key=lambda el: el[1], reverse=True)
bestworkers[:3]

[(5, 12), (10, 12), (1, 11)]

In [29]:
users = [ el['userId'] for el in todos]
len(users),users[:15]

(200, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [30]:
uniqusers = set(users)
uniqusers

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [32]:
# dictionary comprehension, though we could live without one
users = { el['userId'] : 0 for el in todos} 

In [33]:
users

{1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0}

In [35]:
users.keys()

dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

In [None]:
users.values()

In [None]:
#{'completed': True,
# 'id': 8,
#  'title': 'quo adipisci enim quam ut ab',
#  'userId': 1}

In [36]:
#idiomatic
for el in todos:
    # bool is a subclass of int: False adds 0 and True adds 1 (compact, though perhaps less readable)
    users[el['userId']] += el['completed']

In [35]:
# alternative to the loop above (run one or the other, not both, or the counts double)
# could be useful in more complicated cases
for el in todos:
    if el['completed']:
        users[el['userId']] += 1

In [None]:
# there could also be a one-liner, or a solution using collections.Counter
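One possible `Counter` version is sketched below. It runs on a tiny hand-made `sample_todos` list (an assumption, so the example works offline) rather than on the real API response, but the same two lines would apply to `todos`:

```python
from collections import Counter

# hypothetical stand-in for the todos list, same keys as the API data
sample_todos = [
    {"userId": 1, "id": 1, "completed": True},
    {"userId": 1, "id": 2, "completed": True},
    {"userId": 2, "id": 3, "completed": True},
    {"userId": 2, "id": 4, "completed": True},
    {"userId": 2, "id": 5, "completed": True},
    {"userId": 3, "id": 6, "completed": True},
    {"userId": 4, "id": 7, "completed": False},
]

# count one occurrence of userId for every completed task
completed = Counter(todo["userId"] for todo in sample_todos if todo["completed"])
top_two = completed.most_common(2)

print(top_two)  # [(2, 3), (1, 2)]
```

`most_common(n)` returns `(key, count)` pairs sorted by count, so no separate `sorted` call is needed.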

In [36]:
users.items()

dict_items([(1, 11), (2, 8), (3, 7), (4, 6), (5, 12), (6, 6), (7, 9), (8, 11), (9, 8), (10, 12)])

In [39]:
list(users.items())

[(1, 11),
 (2, 8),
 (3, 7),
 (4, 6),
 (5, 12),
 (6, 6),
 (7, 9),
 (8, 11),
 (9, 8),
 (10, 12)]

In [37]:
userlist=list(users.items())

In [38]:
type(userlist[0])

tuple

In [40]:
# we pass an anonymous (lambda) function as the key
sorted(userlist, key=lambda el: el[1], reverse=True)[:3]

[(5, 12), (10, 12), (1, 11)]

In [46]:
# let's try a simple way

In [48]:
# one counter per userId; index 0 stays unused because userId values run from 1 to 10
mylist = [0] * 11

In [49]:
for el in todos:
    if el['completed']:
        mylist[el['userId']] += 1

In [50]:
mylist

[0, 11, 8, 7, 6, 12, 6, 9, 11, 8, 12]

In [51]:
mylist.index(max(mylist))

5

In [None]:
# getting more than the single top value this way is harder and takes some trickery
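One way to get several top values out of a plain list is to sort the indexes by the counts stored at them. The sketch below copies the counts from the output above into a `counts` list (a name introduced here):

```python
# counts[userId] = completed tasks; index 0 is unused
counts = [0, 11, 8, 7, 6, 12, 6, 9, 11, 8, 12]

# sort the userIds 1..10 by their count, highest first, and keep three
top_ids = sorted(range(1, len(counts)), key=lambda uid: counts[uid], reverse=True)[:3]
top_pairs = [(uid, counts[uid]) for uid in top_ids]

print(top_ids)    # [5, 10, 1]
print(top_pairs)  # [(5, 12), (10, 12), (1, 11)]
```

Because `sorted` is stable, users with equal counts keep their original order, which is why user 5 comes before user 10.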

# How about pandas and JSON?

In [69]:
import pandas as pd

In [70]:
df = pd.read_json('https://jsonplaceholder.typicode.com/todos')

In [71]:
df

| | completed | id | title | userId |
|---|---|---|---|---|
| 0 | False | 1 | delectus aut autem | 1 |
| 1 | False | 2 | quis ut nam facilis et officia qui | 1 |
| 2 | False | 3 | fugiat veniam minus | 1 |
| 3 | True | 4 | et porro tempora | 1 |
| 4 | False | 5 | laboriosam mollitia et enim quasi adipisci qui... | 1 |
| 5 | False | 6 | qui ullam ratione quibusdam voluptatem quia omnis | 1 |
| 6 | False | 7 | illo expedita consequatur quia in | 1 |
| 7 | True | 8 | quo adipisci enim quam ut ab | 1 |
| 8 | False | 9 | molestiae perspiciatis ipsa | 1 |
| 9 | True | 10 | illo est ratione doloremque quia maiores aut | 1 |


In [13]:
df.groupby(['userId'])['completed'].sum()

userId
1     11.0
2      8.0
3      7.0
4      6.0
5     12.0
6      6.0
7      9.0
8     11.0
9      8.0
10    12.0
Name: completed, dtype: float64

In [15]:
df.groupby(['userId'])['completed'].sum().sort_values()

userId
4      6.0
6      6.0
3      7.0
2      8.0
9      8.0
7      9.0
1     11.0
8     11.0
5     12.0
10    12.0
Name: completed, dtype: float64

In [16]:
df.groupby(['userId'])['completed'].sum().sort_values(ascending=False)

userId
10    12.0
5     12.0
8     11.0
1     11.0
7      9.0
9      8.0
2      8.0
3      7.0
6      6.0
4      6.0
Name: completed, dtype: float64
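The top users can also be picked directly with `Series.nlargest` instead of sorting the whole result. This sketch assumes pandas is installed and uses a small hand-made `sample` list in place of the API data so it runs offline:

```python
import pandas as pd

# hypothetical records with the same columns as the todos data
sample = [
    {"userId": 1, "id": 1, "title": "a", "completed": True},
    {"userId": 1, "id": 2, "title": "b", "completed": False},
    {"userId": 2, "id": 3, "title": "c", "completed": True},
    {"userId": 2, "id": 4, "title": "d", "completed": True},
    {"userId": 3, "id": 5, "title": "e", "completed": False},
]

# a list of dicts (i.e. parsed JSON) converts straight into a DataFrame
df_sample = pd.DataFrame(sample)

# summing booleans counts the True values per user
completed_per_user = df_sample.groupby("userId")["completed"].sum()
top_users = completed_per_user.nlargest(2)

print(top_users)
```

Here user 2 comes first with two completed tasks, followed by user 1 with one.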

# Exercise: find a public JSON API, get data from it and convert it into a pandas DataFrame

## Many possible sources

https://github.com/toddmotto/public-apis
    
### You want the ones without authorization and WITH CORS, unless you are feeling adventurous and want to try one with auth



In [None]:
# For authorization you generally need some sort of token (key)
# One example for the Zendesk API: https://develop.zendesk.com/hc/en-us/community/posts/360001652447-API-auth-in-python

# For an API token, append '/token' to your username and use the token as the password.
# This will not work without a Zendesk access token.

url = 'https://your_subdomain.zendesk.com/api/v2/users/123.json'
r = requests.get(url, auth=('user@example.com/token', 'your_token'))

# For an OAuth token, set an Authorization header:
access_token = 'your_access_token'  # placeholder, replace with a real OAuth token
bearer_token = 'Bearer ' + access_token
header = {'Authorization': bearer_token}
url = 'https://your_subdomain.zendesk.com/api/v2/users/123.json'
r = requests.get(url, headers=header)
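Tokens are usually better kept out of the notebook itself. As a minimal sketch, the header can be built from an environment variable; `EXAMPLE_API_TOKEN` and `build_auth_headers` are hypothetical names chosen for this example, not something any API requires:

```python
import os

def build_auth_headers(token):
    """Return request headers carrying an OAuth bearer token."""
    return {"Authorization": "Bearer " + token}

# EXAMPLE_API_TOKEN is a made-up variable name; the fallback is only a placeholder
token = os.environ.get("EXAMPLE_API_TOKEN", "your_access_token")
headers = build_auth_headers(token)

# only show the scheme so the token itself never ends up in the output
print(headers["Authorization"].split()[0])  # Bearer
```

The resulting `headers` dict can be passed as `headers=` to `requests.get`, in the same way as `header` in the cell above.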