# JSON

## Lesson Objectives

1. Understand what JSON is.
2. Understand the structure of JSON.
3. Use JSON in Python.
4. Use JSON as a data interchange format from APIs.
5. Recognize where JSON is used in digital discourse.

## Brief Introduction

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of Python and also of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, and many others. These properties make JSON an ideal data-interchange language.

## JSON in Digital Discourse

JSON is used in many places in digital discourse: in APIs to send and receive data from platforms such as Twitter, Facebook, and Google; in web development to exchange data between the client and server; in databases for data storage and retrieval; and in configuration files for applications.

Many popular social media and communication platforms such as Twitter, Reddit, Mastodon, Telegram, and others provide APIs that allow developers to interact with their platforms programmatically. These APIs return data in JSON format. This data can be used to build applications that interact with these platforms.

Sadly, many of these platforms have shifted to providing only high-cost access to their APIs. This has made it difficult for students and researchers to access data from these platforms. However, there are still some platforms that provide free access to their APIs.

## JSON - JavaScript Object Notation - What is it?

### Specified and popularized by Douglas Crockford in the early 2000s.

* Goal - Human Readable, Machine Parsable

* Specification: https://www.json.org/

JSON, short for JavaScript Object Notation, is a text format for sharing data.

JSON is derived from the JavaScript programming language.

It is available for use in many languages, including Python.

When stored in a file, JSON usually has the .json file extension.



In [1]:
# Sample JSON below from https://json.org/example.html
# Question: why is syntax highlighting working properly here? :)

In [2]:
# what data structure in Python does this look like?
{"widget": {
    "debug": "on",
    "window": {
        "title": "Sample Konfabulator Widget",
        "name": "main_window",
        "width": 500,
        "height": 500
    },
    "image": {
        "src": "Images/Sun.png",
        "name": "sun1",
        "hOffset": 250,
        "vOffset": 250,
        "alignment": "center"
    },
    "text": {
        "data": "Click Here",
        "size": 36,
        "style": "bold",
        "name": "text1",
        "hOffset": 250,
        "vOffset": 100,
        "alignment": "center",
        "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
    }
}}


{'widget': {'debug': 'on',
  'window': {'title': 'Sample Konfabulator Widget',
   'name': 'main_window',
   'width': 500,
   'height': 500},
  'image': {'src': 'Images/Sun.png',
   'name': 'sun1',
   'hOffset': 250,
   'vOffset': 250,
   'alignment': 'center'},
  'text': {'data': 'Click Here',
   'size': 36,
   'style': 'bold',
   'name': 'text1',
   'hOffset': 250,
   'vOffset': 100,
   'alignment': 'center',
   'onMouseUp': 'sun1.opacity = (sun1.opacity / 100) * 90;'}}}

In [3]:
# if this were a string starting with { it would be JSON text; as written it is a Python dictionary literal
mydata = {
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": ["running", "sky diving", "dancing"],
    "age": 43,
    "children": [
        {
            "firstName": "Alice",
            "age": 7
        },
        {
            "firstName": "Bob",
            "age": 13
        }
    ]
}

In [4]:
# mydata is just a Python dictionary, it has nothing to do with JSON
# the literal above only looks like JSON; Python read it as dictionary syntax, no JSON text was parsed
type(mydata)

dict
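
To make the distinction between JSON text and a Python dictionary concrete, here is a minimal sketch. The names `json_text` and `parsed` are illustrative and not part of the lesson's data; the `json` module it uses belongs to the standard library.

```python
import json

# JSON is text: here it is held in an ordinary Python string
json_text = '{"firstName": "Jane", "age": 43, "hobbies": ["running", "dancing"]}'
print(type(json_text))        # <class 'str'>

# json.loads parses the text and builds Python objects from it
parsed = json.loads(json_text)
print(type(parsed))           # <class 'dict'>
print(parsed["hobbies"][0])   # running
```

Only after parsing can we use keys and indexes; on the raw string, `json_text["age"]` would fail because strings are indexed by position, not by key.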

In [None]:
# reminder: a Python dictionary is a collection of key-value pairs (insertion order is preserved since Python 3.7)

In [5]:
print(mydata) # this is not JSON anymore this is just a Python dictionary(could have come from another source not only JSON)

{'firstName': 'Jane', 'lastName': 'Doe', 'hobbies': ['running', 'sky diving', 'dancing'], 'age': 43, 'children': [{'firstName': 'Alice', 'age': 7}, {'firstName': 'Bob', 'age': 13}]}


In [None]:
# so mydata is just a Python dictionary nothing to do with JSON anymore
# mydata contains keys and values
# some of the values are strings, some are integers, some are lists, some are dictionaries
# and some lists contain dictionaries and some dictionaries contain lists
# it is very common for data obtained from JSON to be nested - hierarchically structured

In [6]:
mydata['children'] # we use key to get value in this case we get a list

[{'firstName': 'Alice', 'age': 7}, {'firstName': 'Bob', 'age': 13}]

In [7]:
type(mydata['children']) # when you see square brackets in json expect a list in Python

list

In [8]:
# we can get the first child from the list
mydata['children'][0] # remember indexing starts with 0

{'firstName': 'Alice', 'age': 7}

In [9]:
type(mydata['children'][0]) # so first item in the list is another dictionary

dict

In [10]:
mydata['children'][0]['age'] # we can get the age of the first child by using key

7

In [None]:
# we could save this data into a variable and so on
alices_age = mydata['children'][0]['age']
print(f"Alice is {alices_age} years old")

Alice is 7 years old


In [12]:
mydata.keys()  # outer dictionary has 5 keys

dict_keys(['firstName', 'lastName', 'hobbies', 'age', 'children'])

In [13]:
mydata['children'][0].keys() # inner dictionary has 2 keys

dict_keys(['firstName', 'age'])

In [14]:
mydata['children'][0]['firstName']
# so how do we know there is a firstName key? we had to investigate the structure first


'Alice'

In [None]:
mydata['children'][0].keys()

dict_keys(['firstName', 'age'])

In [15]:
# so for any dictionary we could print everything, here for the 2nd child
for key, value in mydata['children'][1].items(): # 1 because indexing starts with 0
    print(key, "::", value)

firstName :: Bob
age :: 13


In [None]:
for key, value in mydata.items():
    print(key, "::", value)

firstName :: Jane
lastName :: Doe
hobbies :: ['running', 'sky diving', 'dancing']
age :: 43
children :: [{'firstName': 'Alice', 'age': 7}, {'firstName': 'Bob', 'age': 13}]


In [None]:
# so one of the issues with dealing with data extracted from deeply nested JSON is that
# you have to know the structure of the data before you can access it
# even then you can't flatten it out into a table without some concessions
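
One possible way to flatten nested data is sketched below. The helper `flatten` and the sample `person` dictionary are hypothetical, written only for this illustration. The concession it makes is that nesting is encoded into dotted key names, and empty lists or dictionaries disappear from the result.

```python
def flatten(data, prefix=""):
    """Flatten nested dicts and lists into a single-level dict with dotted keys."""
    flat = {}
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)   # list positions become part of the key
    else:
        return {prefix: data}     # a plain value: nothing left to flatten
    for key, value in items:
        new_key = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, new_key))
    return flat

person = {
    "firstName": "Jane",
    "hobbies": ["running", "dancing"],
    "children": [{"firstName": "Alice", "age": 7}],
}
for key, value in flatten(person).items():
    print(key, "::", value)
# firstName :: Jane
# hobbies.0 :: running
# hobbies.1 :: dancing
# children.0.firstName :: Alice
# children.0.age :: 7
```

Each flattened key could serve as a column name, but rows with different numbers of children or hobbies would end up with different columns.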

In [None]:
mydata['children'][-1]['age']

13

In [18]:
# remember for dictionaries get method will not throw an error if key is not found
mydata.get('hobbies') # get has the default value None if the key is not found

['running', 'sky diving', 'dancing']

In [14]:
try:
    print(mydata['car'])
except KeyError as e:
    print("KeyError:", e)

KeyError: 'car'


In [20]:
mydata.get("car") # gives us None if not found
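
The `get` method also accepts a second argument that replaces the default `None`. A minimal sketch, using a small hypothetical `person` dictionary rather than the lesson's data, shows how this helps with nested structures:

```python
person = {"firstName": "Jane", "address": {"city": "Riga"}}

# a second argument to get replaces the default None
print(person.get("car", "no car"))              # no car

# chaining get: an empty dict as the default keeps the next get working
print(person.get("address", {}).get("city"))    # Riga
print(person.get("employer", {}).get("name"))   # None
```

With plain square brackets the last line would raise a `KeyError` at the missing `employer` key.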

In [None]:
# JSON does not require any specific order of keys
# no specific structure of data - convenient but not always easy to work with

In [21]:
mydata.get('hobbies')[-1],mydata['hobbies'][-1],mydata['hobbies'][2]  # so 3 ways to access the same item

('dancing', 'dancing', 'dancing')

In [None]:
# lists have no get method: [1,2,3].get(2) raises AttributeError, get is dictionary specific!

In [None]:
# until now we have been dealing with data stored in a dictionary
# now we are going to write data to a file in JSON format

## Serialization - encoding Python data to JSON


The process of encoding JSON is usually called serialization. This term refers to the transformation of data into a series of bytes (hence serial) to be stored or transmitted across a network. You may also hear the term marshaling, but that’s a whole other discussion. Naturally, deserialization is the reciprocal process of decoding data that has been stored or delivered in the JSON standard.

All we’re talking about here is reading and writing. Think of it like this: encoding is for writing data to disk, while decoding is for reading data into memory.
Source: https://realpython.com/python-json/
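
As a rough illustration of the "series of bytes" idea, the sketch below takes a small hypothetical `record` from Python object to JSON text to bytes and back again. The explicit UTF-8 encoding step is an assumption about how the text would be stored or sent.

```python
import json

record = {"name": "Jane", "age": 43}

text = json.dumps(record)         # Python object -> JSON text (str)
payload = text.encode("utf-8")    # JSON text -> bytes, ready for disk or network
print(type(payload))              # <class 'bytes'>

restored = json.loads(payload.decode("utf-8"))  # bytes -> text -> Python object
print(restored == record)         # True
```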

In [15]:
mydata # simply a Python dictionary with some lists inside etc
# copying and pasting this output into a text file would NOT give valid JSON
# there are small differences: Python shows single quotes, JSON requires double quotes
# also it is not very practical especially for bigger data amounts

{'firstName': 'Jane',
 'lastName': 'Doe',
 'hobbies': ['running', 'sky diving', 'dancing'],
 'age': 43,
 'children': [{'firstName': 'Alice', 'age': 7},
  {'firstName': 'Bob', 'age': 13}]}
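
The small differences mentioned above can be seen by comparing Python's own text representation of a dictionary with what `json.dumps` produces. The `sample` dictionary is made up for this sketch.

```python
import json

sample = {"name": "Jane", "married": True, "car": None}

print(repr(sample))        # {'name': 'Jane', 'married': True, 'car': None}
print(json.dumps(sample))  # {"name": "Jane", "married": true, "car": null}

# the repr text is not valid JSON, so parsing it fails
try:
    json.loads(repr(sample))
except json.JSONDecodeError as e:
    print("Not valid JSON:", e)
```

Quotes, booleans, and the missing-value marker all differ, which is why the `json` module should do the conversion.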

In [16]:
# we need a library for decoding and encoding json
# json is built into any standard Python installation
import json # part of Standard Library
# need to only import once per script / notebook

In [17]:
# let's create a json string out of mydata
json_data = json.dumps(mydata) # dumps is short for dump string
json_data

'{"firstName": "Jane", "lastName": "Doe", "hobbies": ["running", "sky diving", "dancing"], "age": 43, "children": [{"firstName": "Alice", "age": 7}, {"firstName": "Bob", "age": 13}]}'

In [18]:
# so again we have mydata which is a Python dictionary
# and json_data which is a string in JSON format - matches the JSON format specification
print(f"Type of mydata: {type(mydata)}")
print(f"Type of json_data: {type(json_data)}")

Type of mydata: <class 'dict'>
Type of json_data: <class 'str'>


In [19]:
# first we are going to dump our data into a text file
# we will use with context manager to open and close the file for safety
# instead of "data_file.json" any normal file name would do
# it is not required that you use .json extension but it would certainly make the most sense here
# technically it is going to be just a text file
with open("data_file.json", mode="w") as write_file: # w means write
    json.dump(mydata, write_file) # so mydata is going to be written here
    # mydata will typically be dictionary or list - which can have more inside
    # but technically it could be a string  (then you would get a normal text file)
    # or  single number but that would be very tiny json file :)
# remember that stream is closed here and file is written by now

### Saving indented JSON to a file



In [20]:
# this will be nicer for humans to read
with open("data_file_indented.json", mode="w") as write_file:
    json.dump(mydata, write_file, indent=4) # mydata could be ANY JSON-serializable Python data structure

In [21]:
# again .json files are just text files
# so if we read the file we will get a string
with open("data_file_indented.json") as f:
    raw_txt = f.read()
raw_txt[:150] # so again raw JSON is just text

'{\n    "firstName": "Jane",\n    "lastName": "Doe",\n    "hobbies": [\n        "running",\n        "sky diving",\n        "dancing"\n    ],\n    "age": 43,\n  '

In [22]:
type(raw_txt)

str

In [23]:
print(raw_txt)

{
    "firstName": "Jane",
    "lastName": "Doe",
    "hobbies": [
        "running",
        "sky diving",
        "dancing"
    ],
    "age": 43,
    "children": [
        {
            "firstName": "Alice",
            "age": 7
        },
        {
            "firstName": "Bob",
            "age": 13
        }
    ]
}


In [None]:
# deserialization - converting JSON text into Python(or other programming language) data structure
# why would you want to do that?
# because you want to work with the data in Python - and working with strings is not very convenient
# we want the structure of the data to be preserved

## Deserializing, decoding raw JSON text into Python data format

In [25]:
# parse, deserialize, decode from json string into Python Data structure
# this is if you have a string that meets JSON format specification
my_data_decoded = json.loads(raw_txt) # raw_txt is a string
print(f"The type of my_data_decoded is {type(my_data_decoded)}")
# typically we will be using json.load for reading json from files not strings

The type of my_data_decoded is <class 'dict'>


In [27]:
my_data_decoded.keys()

dict_keys(['firstName', 'lastName', 'hobbies', 'age', 'children'])

In [28]:
my_data_decoded['children']

[{'firstName': 'Alice', 'age': 7}, {'firstName': 'Bob', 'age': 13}]

In [33]:
charlie = {'firstName': 'Čarlijs','age': 14 }  # new dictionary with 2 key-value pairs
charlie  # this dictionary is not in JSON format, it is not related to JSON at all
# the same applies to any language that uses characters outside ASCII (code points above 127)

{'firstName': 'Čarlijs', 'age': 14}

In [34]:
# now my_data_decoded has a key children which is a list of dictionaries
print("Type of my_data_decoded['children']:", type(my_data_decoded['children']))

Type of my_data_decoded['children']: <class 'list'>


In [35]:
# let me only keep the first two children in the list
my_data_decoded['children'] = my_data_decoded['children'][:2] # we do not want the old charlie in the list
# we can use good old append method to add a new dictionary to the list
my_data_decoded['children'].append(charlie)

In [36]:
my_data_decoded

{'firstName': 'Jane',
 'lastName': 'Doe',
 'hobbies': ['running', 'sky diving', 'dancing'],
 'age': 43,
 'children': [{'firstName': 'Alice', 'age': 7},
  {'firstName': 'Bob', 'age': 13},
  {'firstName': 'Čarlijs', 'age': 14}]}

In [37]:
my_data_decoded['hobbies'].append('šahs')  #šahs - chess in Latvian
my_data_decoded['hobbies'].append('šūpošanās') # šūpošanās - swinging in Latvian

In [38]:
# I am adding a new key-value pair to the dictionary
my_data_decoded["car"] = "Ņiva" # i am trying to make a point about Unicode
my_data_decoded

{'firstName': 'Jane',
 'lastName': 'Doe',
 'hobbies': ['running', 'sky diving', 'dancing', 'šahs', 'šūpošanās'],
 'age': 43,
 'children': [{'firstName': 'Alice', 'age': 7},
  {'firstName': 'Bob', 'age': 13},
  {'firstName': 'Čarlijs', 'age': 14}],
 'car': 'Ņiva'}

In [39]:
# so I will save the freshly updated data to a file
# not everything is in English! by default json.dump escapes non-ASCII characters as \uXXXX sequences
with open("data_file_indented_ascii.json", mode="w") as f_stream:
    json.dump(my_data_decoded, f_stream, indent=4)

### Saving indented JSON with UTF-8 encoding to a file

In [40]:
# so if we want to save data that contains Unicode
# in UTF-8 encoding we need to specify it
# this is the recipe to use if you want to save in Unicode
with open("data_file_indented_unicode.json", mode="w", encoding="UTF-8") as f_stream:
    json.dump(my_data_decoded, f_stream, indent=4, ensure_ascii=False) # I want to see Unicode
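
The effect of `ensure_ascii` is easy to see on a string instead of a file. This is a minimal sketch with a one-key dictionary made up for the purpose:

```python
import json

word = {"car": "Ņiva"}

escaped = json.dumps(word)                       # default is ensure_ascii=True
readable = json.dumps(word, ensure_ascii=False)  # keep the original characters

print(escaped)    # {"car": "\u0145iva"}
print(readable)   # {"car": "Ņiva"}

# both strings decode to the same Python data
print(json.loads(escaped) == json.loads(readable))  # True
```

Both forms are valid JSON; the difference only matters to humans reading the file.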

In [None]:
# there are no strict restrictions on what we append to our lists
# in the previous example I could have added a number or another list, not necessarily a string


## Loading data from JSON file immediately into memory

In [41]:
# more often we will load json immediately
with open("data_file_indented_unicode.json", encoding="utf-8") as file_stream:
    my_data_2 = json.load(file_stream) # if json is malformed then you will get some sort of error
print(f"Type of my_data_2: {type(my_data_2)}")

Type of my_data_2: <class 'dict'>
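
The error raised for malformed JSON is `json.JSONDecodeError`. A minimal sketch of catching it is shown below, using a deliberately broken string; the variable names are illustrative.

```python
import json

bad_text = '{"firstName": "Jane", "age": 43,}'  # trailing comma is not allowed in JSON

try:
    data = json.loads(bad_text)
except json.JSONDecodeError as e:
    data = None
    # the exception records what went wrong and where
    print(f"Could not parse JSON: {e.msg} (line {e.lineno}, column {e.colno})")
```

The same exception is raised by `json.load` when reading a malformed file.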


In [42]:
my_data_2

{'firstName': 'Jane',
 'lastName': 'Doe',
 'hobbies': ['running', 'sky diving', 'dancing', 'šahs', 'šūpošanās'],
 'age': 43,
 'children': [{'firstName': 'Alice', 'age': 7},
  {'firstName': 'Bob', 'age': 13},
  {'firstName': 'Čarlijs', 'age': 14}],
 'car': 'Ņiva'}

In [45]:
# the two variables hold equal contents but they are two different objects
# equal is not the same as identical
# like two shopping bags with the same contents, same milk, same bread, same eggs
print("Contents of my_data_decoded and my_data_2 are same?", my_data_decoded == my_data_2)
# so == checks that the contents are the same
print("Objects holding data are same in memory?", my_data_decoded is my_data_2) 
# is checks that they are the same object in memory

Contents of my_data_decoded and my_data_2 are same? True
Objects holding data are same in memory? False


## Mapping JSON to Python Data Types and vice versa

Let's show a table of mapping between JSON and Python data types

| JSON | Python |
|------|--------|
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (real) | float |
| true | True |
| false | False |
| null | None |

#### Now Python to JSON

| Python | JSON |
|--------|------|
| dict | object |
| list, tuple | array |
| str | string |
| int, float | number |
| True | true |
| False | false |
| None | null |


In [None]:
# there is a mapping of Python data types to JSON data types
# Python data type  JSON data type
# dict              object
# list, tuple       array
# str               string
# int, float        number
# True              true
# False             false
# None              null

# notice that JSON is more limited than Python
# not everything can be converted to JSON without loss of information
# for example tuples are converted to lists and you can't tell the difference

# another limitation is that JSON does not allow for comments!
# there have been recommendations to use JSON5 but it is not widely adopted
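
The loss of information, and one way to handle types JSON does not know at all, can be sketched as follows. The helper `to_jsonable` and the sample data are hypothetical; turning sets into sorted lists and dates into ISO strings is just one possible choice.

```python
import json
from datetime import date

point = {"coords": (56.95, 24.11)}
round_trip = json.loads(json.dumps(point))
print(round_trip)   # {'coords': [56.95, 24.11]}  the tuple came back as a list

event = {"tags": {"json"}, "day": date(2024, 1, 15)}

def to_jsonable(obj):
    """Fallback converter for types the json module cannot serialize itself."""
    if isinstance(obj, set):
        return sorted(obj)
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

# default= is called only for objects json cannot handle on its own
print(json.dumps(event, default=to_jsonable))
# {"tags": ["json"], "day": "2024-01-15"}
```

Without the `default` argument, `json.dumps(event)` would raise a `TypeError`.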

### Additions to JSON

There have been some proposals that add comments and other features to JSON. These are not part of the JSON standard. However, there are some libraries that allow for these features.

In [None]:
# uses of JSON

# JSON is used for data interchange between web applications
# also used for data interchange between web applications and mobile applications
# also used for settings files, configuration files, data files
# there are databases that use JSON as a data storage format
# there are databases that use JSON as a query language

# here we are interested in using JSON as a data interchange format
# to obtain data from a web application / page

## API  in general - JSON APIs

In [None]:
# API - Application Programming Interface, a set of functions that allow you to interact with an application
# JSON API - a set of functions that allow you to interact with an application using JSON

# extremely popular way to interact with web applications
# why ? because it is simple and easy to use

# there are more difficult ways to interact with web applications

In [46]:
try:
    import requests
    print("requests library is installed and is version:", requests.__version__)
except ImportError as e:
    print("requests library is not installed")
    print("pip install requests from a terminal")
    print("on Jupyter you can use !pip install requests")
    print("https://pypi.org/project/requests/")
    print("https://requests.readthedocs.io/en/master/")

# this library is not included with Python but is very popular and comes with Anaconda
# requests is built on top of urllib3 (a third-party library, not part of the standard library) and adds convenience functions
# requests is also included with Google Colab
# pip install requests otherwise in a terminal
# above would be the command on a local machine
# more on requests: https://pypi.org/project/requests/


requests library is installed and is version: 2.31.0


### Making a web get request using requests library

In [47]:
url = "https://jsonplaceholder.typicode.com/users" # this is a public API
# url - uniform resource locator
# uri - uniform resource identifier
# url vs uri - https://stackoverflow.com/questions/176264/what-is-the-difference-between-a-uri-a-url-and-a-uri
print("URL:", url)

URL: https://jsonplaceholder.typicode.com/users


### Basic Requests Workflow

* Import the requests library
* Use the requests.get() method to make a GET request to the API
* Check the status code of the response to ensure the request was successful
* Use the .json() method to parse the JSON response into a Python data structure
* Use that data structure (usually a dictionary or list) to access the data returned by the API

In [48]:
# basic requests workflow

response = requests.get(url) # so this is a GET request similar to what happens when web browser loads a page
# so get method returns a response object, it made a network request to the server at that url
print(response.status_code) # Response Code 200 is good
# in your notebooks you might want to include something like assert or if statement
assert response.status_code == 200 # this would give you error if response is not 200
# idea is to stop your notebook if you are running all code cells from top to bottom
# assert does nothing if the evaluation is True

200


In [None]:
# list of HTTP response codes
# https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
# 404 - not found
# 500 - internal server error
# 200 - OK
# 4xx - client error - it is our fault
# 5xx - server error - it is the server's fault
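
The first digit of a status code tells us its family. The hypothetical helper `describe_status` below is a small sketch of that idea and makes no network request.

```python
def describe_status(code):
    """Return a rough description of an HTTP status code based on its first digit."""
    families = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return families.get(code // 100, "unknown")

for code in (200, 404, 500):
    print(code, "->", describe_status(code))
# 200 -> success
# 404 -> client error
# 500 -> server error
```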

In [49]:
# let's print first 200 characters of the response
print(f"First 200 characters of the response: {response.text[:200]}")

First 200 characters of the response: [
  {
    "id": 1,
    "name": "Leanne Graham",
    "username": "Bret",
    "email": "Sincere@april.biz",
    "address": {
      "street": "Kulas Light",
      "suite": "Apt. 556",
      "city": "Gwen


In [50]:
# it would be much easier to work with this data if it was in a Python data structure
# so requests provides a method to convert the response text into a Python data structure
# use this if you know that the response is in JSON format
users = response.json() # this will work if we requested a JSON resource (NOT HTML!!)
print(f"Type of users: {type(users)}")
print(f"Got {len(users)} users")

# you can get an idea beforehand by peeking at the first character of the JSON text: { -> dict, [ -> list

Type of users: <class 'list'>
Got 10 users


In [51]:
# now we just do regular Python things with the data
# the web server has sent us a list of dictionaries after converting it to Python data structure
users[-1] # last user

{'id': 10,
 'name': 'Clementina DuBuque',
 'username': 'Moriah.Stanton',
 'email': 'Rey.Padberg@karina.biz',
 'address': {'street': 'Kattie Turnpike',
  'suite': 'Suite 198',
  'city': 'Lebsackbury',
  'zipcode': '31428-2261',
  'geo': {'lat': '-38.2386', 'lng': '57.2232'}},
 'phone': '024-648-3804',
 'website': 'ambrose.net',
 'company': {'name': 'Hoeger LLC',
  'catchPhrase': 'Centralized empowering task-force',
  'bs': 'target end-to-end models'}}

In [51]:
type(users[-1]) # will be a dictionary

dict

### List of dictionaries

Lists of dictionaries are often used to present data that was originally in tables.
Each row is an entry in the list, and each entry is a dictionary:
each column name is a key, and the column's value for that row is the value.
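
The row-to-dictionary idea can be sketched with a tiny made-up table; the column names and values here are invented for illustration.

```python
import json

columns = ["name", "city", "age"]
rows = [
    ["Alice", "Riga", 30],
    ["Bob", "Liepaja", 25],
]

# zip pairs each column name with the value in the same position of the row
records = [dict(zip(columns, row)) for row in rows]
print(records[0])            # {'name': 'Alice', 'city': 'Riga', 'age': 30}

# the list of dictionaries serializes to a JSON array of objects
print(json.dumps(records))
```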

In [52]:
users[-1]['name']  # a string

'Clementina DuBuque'

In [None]:
# how could we get first and last?

In [53]:
last_user_names = users[-1]['name']
last_user_names

'Clementina DuBuque'

In [54]:
# now I could create a first and last name list
name_list = last_user_names.split() # just a regular Python split method nothing to do with JSON
name_list

['Clementina', 'DuBuque']

In [55]:
first_name = name_list[0]
last_name = name_list[-1]
first_name, last_name

('Clementina', 'DuBuque')

In [56]:
# so here latitude and longitude are strings
# they are deeply nested in the data structure
# list then dict then dict and one last dict
# latitude for the last user
lat = users[-1]['address']['geo']['lat'] # so we have 3 levels of dictionaries here!!
# it would be safer to use get method if you are not sure that the key exists
lat # it is a string

'-38.2386'

In [60]:
lat = float(lat) # another common issue is that often JSON numbers are actually strings
# so check whether number is in quotes as in here
lat # now it is a floating point number (with a decimal point)

-38.2386

In [None]:
# not all APIs are free
# many require API keys
# those can be paid or free
# you obtain keys from the service provider
# in general
# API keys should not be shown in public notebooks!
# especially the paid ones
# if you publicly share your Amazon AWS key expect a VERY LARGE BILL SOON - or a very long talk with your provider
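
One common way to keep a key out of a shared notebook is to read it from an environment variable. In this minimal sketch the variable name `API_KEY` is an assumption; a real service may document a different name or a different mechanism.

```python
import os

# API_KEY is a hypothetical variable name; set it in your shell or notebook environment
api_key = os.environ.get("API_KEY")

if api_key is None:
    print("No API key found; set the API_KEY environment variable first")
else:
    # report only that a key was loaded, never print the key itself
    print(f"Loaded an API key of length {len(api_key)}")
```

The notebook can then be published without the key ever appearing in a cell or in its output.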


In [None]:
# mockaroo.com is a free service that provides fake data similar to what you would get from a real API
# you can use it to test your code
# but to use it, it is recommended to register and get an API key

In [58]:
# we make an HTTP request to a url and print the response code
# url stands for uniform resource locator
url = "https://my.api.mockaroo.com/ageincluded.json?key=58227cb0"
# careful with publishing your API keys - they can be used to make requests on your behalf
# in this case this key was registered by me and is not a secret, long time ago I used it for a course
# there are many horror stories about people publishing their API keys and then getting charged for requests
# so you should get your own API key and not use mine in general
print("URL:", url)
# response = requests.get(url)
# print(response.status_code) # Response Code 200 is good!

URL: https://my.api.mockaroo.com/ageincluded.json?key=58227cb0


### List of HTTP codes
* 200 is OK - everything is as expected, we asked for resource and we got it
* the most famous is 404 code for not found, most likely from typos or resource missing
* https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

You can open https://jsonplaceholder.typicode.com/todos in a regular browser too.

### Passing URL parameters using requests

In [None]:
response = requests.get("https://cat-fact.herokuapp.com/facts/random",
                        params={"animal_type":"cat", "amount":20}) 
# of course the particular API has to support these parameters
# you would have to look at the documentation
if response.status_code != 200:
    print("Bad Response: ", response.status_code)
print("Response code", response.status_code)
# cats = json.loads(response.text)
# simpler response.json()
cats = response.json()
cats[:3]

Response code 200


[{'status': {'verified': None, 'sentCount': 0},
  '_id': '641ec20d8023fe33f6a13bd6',
  'user': '640603b3ebdaaca99be7edd5',
  'text': 'Лплорпдоп.',
  'type': 'cat',
  'deleted': False,
  'createdAt': '2023-03-25T09:42:37.023Z',
  'updatedAt': '2023-03-25T09:42:37.023Z',
  '__v': 0},
 {'status': {'verified': None, 'sentCount': 0},
  '_id': '618a432d717216c55a0464e6',
  'user': '618a4300717216c55a0464ae',
  'text': 'Postman.',
  'type': 'cat',
  'deleted': False,
  'createdAt': '2021-11-09T09:45:17.638Z',
  'updatedAt': '2021-11-09T09:45:17.638Z',
  '__v': 0},
 {'status': {'verified': True, 'sentCount': 1},
  '_id': '591f98703b90f7150a19c15d',
  '__v': 0,
  'text': 'Most cats killed on the road are un-neutered toms, as they are more likely to roam further afield and cross busy roads.',
  'source': 'api',
  'updatedAt': '2020-08-23T20:20:01.611Z',
  'type': 'cat',
  'createdAt': '2018-01-04T01:10:54.673Z',
  'deleted': False,
  'used': False,
  'user': '5a9ac18c7478810ea6c06381'}]
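
Parameters passed this way end up in the query string of the URL. The sketch below uses the standard library's `urlencode` to show roughly what such a URL looks like; it makes no network request, and the exact encoding done inside requests may differ in edge cases.

```python
from urllib.parse import urlencode

base_url = "https://cat-fact.herokuapp.com/facts/random"
params = {"animal_type": "cat", "amount": 20}

query = urlencode(params)            # animal_type=cat&amount=20
full_url = f"{base_url}?{query}"
print(full_url)
# https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=20
```

Passing a dictionary is less error-prone than building the string by hand, since special characters are escaped automatically.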

In [None]:
## For authorization you generally need some sort of token(key)
# One example for zendesk API  https://develop.zendesk.com/hc/en-us/community/posts/360001652447-API-auth-in-python


# For an API token, append '/token' to your username and use the token as the password:
## This will not work for those without zendesk access token

# url = 'https://your_subdomain.zendesk.com/api/v2/users/123.json'
# r = requests.get(url, auth=('user@example.com/token', 'your_token'))
# # For an OAuth token, set an Authorization header:

# bearer_token = 'Bearer ' + access_token
# header = {'Authorization': bearer_token}
# url = 'https://your_subdomain.zendesk.com/api/v2/users/123.json'
# r = requests.get(url, headers=header)

### Function to read JSON from url

In [60]:
def read_JSON(url):
    """Simple wrapper function around requests
    Argument: url - what you want to access
    Returns: Python data structure parsed from JSON response
    Returns NONE if non 200 response
    """
    response = requests.get(url)
    if response.status_code != 200:
        print("Bad Response: ", response.status_code)
        return None
    print("Status CODE", response.status_code)
    return response.json() # easier than json.loads(response.text)


In [62]:
rawdrinks = read_JSON("https://www.thecocktaildb.com/api/json/v1/1/search.php?s=margarita")
print("Type of rawdrinks:", type(rawdrinks))

Status CODE 200
Type of rawdrinks: <class 'dict'>


In [63]:
rawdrinks.keys() # turns out we have an outer dictionary with a single key

dict_keys(['drinks'])

In [64]:
# let's remove this outer wrapper and get to the list of drinks
raw_drinks_list = rawdrinks['drinks']
# so we have our desired list of dictionaries
print("container is of type", type(raw_drinks_list))
print("Items are of type", type(raw_drinks_list[0]))

container is of type <class 'list'>
Items are of type <class 'dict'>


In [None]:
# how many drinks do we have?
print("Length of list", len(raw_drinks_list))

Length of list 6


In [None]:
# when making many requests (say 100), a suggestion is to use time.sleep(0.2) between them
# it is good manners to sleep a little so that we do not flood the API server with requests
import time
time.sleep(0.5) # half a second delay

## Tip - Use Delay (Sleep) in API Calls

Add a small amount of sleep between multiple API calls to avoid getting blocked by the server.

Most API services have rate limits to prevent abuse.

The limit could be 1 request per second, 10 requests per minute, etc.; you would need to check the API documentation.
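
A minimal sketch of pausing between calls is shown below. The helper `fetch_all` and the stand-in `fake_fetch` are hypothetical; in real use `fetch` would be a function that performs the network request, and the delay should follow the limits in the API documentation.

```python
import time

def fetch_all(urls, fetch, delay=0.5):
    """Call fetch(url) for every url, pausing `delay` seconds between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)   # pause before every call except the first
        results.append(fetch(url))
    return results

# fake_fetch stands in for a real network call so the sketch runs offline
def fake_fetch(url):
    return {"url": url}

pages = fetch_all(["page1", "page2", "page3"], fake_fetch, delay=0.1)
print(len(pages))   # 3
```

Passing the fetching function as an argument keeps the delay logic separate from the request itself, so the loop can be tried out without touching a real server.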

In [66]:
url = "https://data.gov.lv/dati/lv/api/3/action/datastore_search?resource_id=27fcc5ec-c63b-4bfd-bb08-01f073a52d04&limit=5"
r = requests.get(url)
r.status_code

200

In [90]:
r.text

'{"help": "https://data.gov.lv/dati/lv/api/3/action/help_show?name=datastore_search", "success": true, "result": {"include_total": true, "resource_id": "27fcc5ec-c63b-4bfd-bb08-01f073a52d04", "fields": [{"type": "int", "id": "_id"}, {"type": "numeric", "id": "id"}, {"type": "numeric", "id": "file_id"}, {"type": "numeric", "id": "legal_entity_registration_number"}, {"type": "text", "id": "source_schema"}, {"type": "text", "id": "source_type"}, {"type": "numeric", "id": "year"}, {"type": "timestamp", "id": "year_started_on"}, {"type": "timestamp", "id": "year_ended_on"}, {"type": "numeric", "id": "employees"}, {"type": "text", "id": "rounded_to_nearest"}, {"type": "text", "id": "currency"}, {"type": "timestamp", "id": "created_at"}], "records_format": "objects", "records": [{"_id":1,"id":709390,"file_id":16544390,"legal_entity_registration_number":40103504912,"source_schema":"DokGPUIENv1","source_type":"UGP","year":2016,"year_started_on":"2016-01-01T00:00:00","year_ended_on":"2016-12-31T

In [None]:
with open("ur_yearly.json", "w", encoding="utf-8") as f:
    f.write(r.text)

In [None]:
# how about Twitter JSON API ?

# https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html
# https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/overview

# over the years Twitter has been changing their API
# so you need to check the documentation for the version you are using
# also you would need to register for a developer account and get a token - think of it as a key
# Academic research access used to be free on application; check the current terms, since access is now mostly paid
# site to apply for academic research https://developer.twitter.com/en/apply-for-access

In [None]:
# there are Python libraries for Twitter API which make it easier to use
# you could use tweepy or twitter libraries
# but for this example we will use requests library


# in any case with any key you are going to be limited in the number of requests you can make per day
# so do not expect to obtain a lot of data

## JSON APIs for Digital Discourse

### Social Networks and Communication Platforms

* Twitter API - https://developer.twitter.com/en/docs - paid only :(
* Reddit API - https://www.reddit.com/dev/api/ - free but very limited, paid for more
* Mastodon API - https://docs.joinmastodon.org/api/ - free, but authorization needed, a bit complex
* Telegram API - https://core.telegram.org/bots/api - free, but authorization needed, a bit complex
* Instagram API - https://www.instagram.com/developer/ - paid only after December 4th, 2024:(

### News and Media Platforms

* New York Times API - https://developer.nytimes.com/apis - free, but authorization needed, might be rate limited
* The Guardian API - https://open-platform.theguardian.com/documentation/ - free, but authorization needed, might be rate limited. Sign up for a key here: https://open-platform.theguardian.com/access/

### Public API directories

* Awesome API Github list - https://github.com/burningtree/awesome-json - many APIs listed here
* Public APIs - https://publicapis.io - many APIs listed here

### APIs used for Large Language Models

Most of the large language models such as GPT-4, Claude, Gemini, BERT, etc. are available as APIs. These APIs can be used to generate text, summarize text, translate text, etc. These APIs are paid and require authorization.

### API aggregators

For LLMs, there are services that offer to aggregate multiple APIs into one. These services provide a single API that can be used to access multiple APIs. These services are paid and require authorization.

Personal recommendation: https://openrouter.ai/docs/quick-start - same price as buying directly from OpenAI, but you get access to multiple APIs.