# JSON and JSON APIs

## JSON - JavaScript Object Notation

JSON is one of the most widely used formats for data interchange on the web. It is a lightweight text format that is easy for humans to read and write and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

### Origin - specified by Douglas Crockford

* Goal - To provide a simple data exchange format.
* Human readable - JSON is easy for people to read and write.
* Machine parsable - JSON is easy for computers to parse and generate.
* JSON is language independent.
* JSON is "self-describing" and easy to understand.
* JSON is built on two structures:
    * A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
    * An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.
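
As a small illustration of those two structures, the sketch below parses a hand-written JSON text with Python's standard `json` module. The sample text and the variable names are made up for this example.

```python
import json

# a JSON object (name/value pairs) that contains a JSON array (ordered list)
json_text = '{"name": "pizza", "toppings": ["cheese", "basil"]}'

parsed = json.loads(json_text)

# the JSON object becomes a Python dict, the JSON array becomes a Python list
print(type(parsed))              # <class 'dict'>
print(type(parsed["toppings"]))  # <class 'list'>
print(parsed["toppings"][0])     # cheese
```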


### JSON Official Page

[JSON Official Page](https://www.json.org/json-en.html)

## JSON Syntax

The main thing to remember is that JSON is plain text. It is completely language independent, but its syntax will look familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others.

In [1]:
# let's make a simple JSON string
import json # standard library # no pip install needed

# let's make a simple list of dictionaries that describe my favorite foods
foods = [{"name": "pizza", "type": "Italian", "price": 10.99},
         {"name": "sushi", "type": "Japanese", "price": 24.99},
         {"name": "cheeseburger", "type": "American", "price": 8.99},
         {"name": "tacos", "type": "Mexican", "price": 12.99},
         ]

# in Python the values could be any object (and the keys any hashable object)

# print data type of foods
print("Data type of foods:", type(foods))
# print data type of first element in foods
print("Data type of first element in foods:", type(foods[0]))

# at the moment foods has no relationship to JSON it is just data in Python

Data type of foods: <class 'list'>
Data type of first element in foods: <class 'dict'>


In [2]:
# let's print them line by line
for food in foods:
    print(food)

{'name': 'pizza', 'type': 'Italian', 'price': 10.99}
{'name': 'sushi', 'type': 'Japanese', 'price': 24.99}
{'name': 'cheeseburger', 'type': 'American', 'price': 8.99}
{'name': 'tacos', 'type': 'Mexican', 'price': 12.99}


In [3]:
# now we will convert this list of dictionaries to a JSON string
# in programming this is called serialization
# why? Python's in-memory data cannot be read directly by other programming languages
# a JSON string, on the other hand, can be read by other languages and by humans
foods_json = json.dumps(foods) # dumps converts JSON-compatible Python data (dict, list, str, int, float, bool, None) to a JSON string
print("Data type of foods_json:", type(foods_json))

Data type of foods_json: <class 'str'>


In [4]:
# print the JSON string
print("foods_json:\n", foods_json)

foods_json:
 [{"name": "pizza", "type": "Italian", "price": 10.99}, {"name": "sushi", "type": "Japanese", "price": 24.99}, {"name": "cheeseburger", "type": "American", "price": 8.99}, {"name": "tacos", "type": "Mexican", "price": 12.99}]


In [5]:
# let's get back the data from the JSON string
# it is called deserialization in a programming context
# we are converting a JSON string to a Python data type
foods_from_json = json.loads(foods_json) # loads takes a JSON string and converts it to a Python data type
print("Data type of foods_from_json:", type(foods_from_json))

Data type of foods_from_json: <class 'list'>


In [6]:
# print all data
print("foods_from_json:")
# again let's use a loop since foods_from_json is a list of dictionaries
for food in foods_from_json:
    print(food)

foods_from_json:
{'name': 'pizza', 'type': 'Italian', 'price': 10.99}
{'name': 'sushi', 'type': 'Japanese', 'price': 24.99}
{'name': 'cheeseburger', 'type': 'American', 'price': 8.99}
{'name': 'tacos', 'type': 'Mexican', 'price': 12.99}
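
A quick way to confirm that nothing was lost is to compare the round-tripped data with the original. This is a minimal sketch that uses a shortened copy of the `foods` list; the name `round_tripped` is only for illustration.

```python
import json

foods = [
    {"name": "pizza", "type": "Italian", "price": 10.99},
    {"name": "sushi", "type": "Japanese", "price": 24.99},
]

# serialize to a JSON string, then deserialize back to Python data
round_tripped = json.loads(json.dumps(foods))

# equal in value, but a separate object in memory
print(round_tripped == foods)  # True
print(round_tripped is foods)  # False
```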


## Saving data to JSON file

In [7]:
# first let's add a Latvian food item to our list
# foods is just a Python list so I can use the append method
foods.append({"name": "kartupeļi", "type": "Latvian", "price": 3.99})
print("foods:", foods)

foods: [{'name': 'pizza', 'type': 'Italian', 'price': 10.99}, {'name': 'sushi', 'type': 'Japanese', 'price': 24.99}, {'name': 'cheeseburger', 'type': 'American', 'price': 8.99}, {'name': 'tacos', 'type': 'Mexican', 'price': 12.99}, {'name': 'kartupeļi', 'type': 'Latvian', 'price': 3.99}]


In [8]:
# let's save our data to a file
with open("foods.json", mode="w") as file: # file is a file stream
    json.dump(foods, file) # dump takes JSON-compatible Python data and saves it to a file in JSON format
    # note dump, not dumps: dump also takes a file stream as its second argument

In [10]:
# let's add indent to make it more readable
with open("foods_pretty.json", "w") as file:
    json.dump(foods, file, indent=2) # indent makes it more readable

In [12]:
# let's open the file with encoding utf-8 and keep non-ASCII characters readable
with open("foods_utf8.json", mode="w", encoding="utf-8") as file:
    json.dump(foods, file, indent=4, ensure_ascii=False) # ensure_ascii=False writes characters such as ļ as-is instead of \uXXXX escapes
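
To see what `ensure_ascii` actually changes, the sketch below serializes the same small dictionary both ways with `json.dumps`. The variable names `escaped` and `readable` are made up for this example.

```python
import json

item = {"name": "kartupeļi"}

escaped = json.dumps(item)                       # default: ensure_ascii=True
readable = json.dumps(item, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)   # the ļ is written as a \uXXXX escape sequence
print(readable)  # the ļ is written as-is

# both strings decode to the same Python data
print(json.loads(escaped) == json.loads(readable))  # True
```

Both forms are valid JSON. The escaped form is safe for any encoding, while the readable form is nicer for humans but needs a file opened with a Unicode encoding such as UTF-8.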

## Reading JSON files



In [15]:
# we will use load to read the data from the file
# we will open the utf-8 file
with open("foods_utf8.json", "r", encoding="utf-8") as file:
    foods_from_file = json.load(file) # load takes a file stream and reads the JSON data from it
# file is automatically closed after the with block is done

# here foods_from_file is a Python data type - not a string!
print("Data type of foods_from_file:", type(foods_from_file))
# it is a list of dictionaries
# since we know we are loading a list of dictionaries we can print it line by line
# important: this will not work if the structure is different!
# print what we loaded line by line
for food in foods_from_file:
    print(food)

Data type of foods_from_file: <class 'list'>
{'name': 'pizza', 'type': 'Italian', 'price': 10.99}
{'name': 'sushi', 'type': 'Japanese', 'price': 24.99}
{'name': 'cheeseburger', 'type': 'American', 'price': 8.99}
{'name': 'tacos', 'type': 'Mexican', 'price': 12.99}
{'name': 'kartupeļi', 'type': 'Latvian', 'price': 3.99, 'description': 'Latvian potato dish, often served with sour cream or butter.', 'commentfield': 'This is a comment field for the Latvian dish.', 'rating': 4.5}
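
Files edited by hand can end up missing or broken, so it may be worth guarding the load. The helper below is a sketch, not a standard function: `load_json_or_default` and both file names are hypothetical.

```python
import json
from pathlib import Path


def load_json_or_default(path, default=None):
    """Return the parsed content of a JSON file, or `default` on failure."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"{path} does not exist")
    except json.JSONDecodeError as error:
        # raised when the file content is not valid JSON
        print(f"{path} is not valid JSON: {error}")
    return default


# hypothetical file names, used only for this demonstration
Path("broken_example.json").write_text('{"name": "pizza",', encoding="utf-8")

broken = load_json_or_default("broken_example.json", default=[])
missing = load_json_or_default("missing_example.json", default=[])
print(broken, missing)  # [] []
```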


## JSON vs Python data types

Not all of Python's data types can be represented in JSON. JSON supports the following data types:
* Number (integer or floating point; JSON has a single number type)
* String (double-quoted Unicode with backslash escaping)
* Boolean (true and false)
* Null (null)
* Array (an ordered sequence of values, comma-separated and enclosed in square brackets)
* Object (collection of key/value pairs, comma-separated and enclosed in curly braces)

JSON does not support:
* NaN, Infinity, -Infinity
* Date, datetime, time, timedelta
* Set, frozenset
* Bytes, bytearray
* complex
* comments
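
When unsupported types do show up, `json.dumps` accepts a `default` function that converts them to something JSON can hold. The converter below is one possible sketch; `to_jsonable` and the sample `record` are made up for this example.

```python
import json
from datetime import date


def to_jsonable(obj):
    """Fallback for objects the json module cannot serialize by itself."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)      # a set becomes a sorted list
    if isinstance(obj, date):
        return obj.isoformat()  # a date becomes an ISO 8601 string
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


record = {"tags": {"b", "a"}, "created": date(2024, 1, 31)}

try:
    json.dumps(record)
except TypeError as error:
    print("without default:", error)

print(json.dumps(record, default=to_jsonable))
# {"tags": ["a", "b"], "created": "2024-01-31"}

# Python writes NaN by default, which strict JSON parsers reject;
# allow_nan=False raises ValueError instead
try:
    json.dumps({"value": float("nan")}, allow_nan=False)
except ValueError as error:
    print("NaN rejected:", error)
```

The conversion is one-way: loading this JSON gives back a list and a string, not a set and a date.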

In [16]:
# let's see the loss of some types in action

# let's make a dictionary with some data types
data = {"int": 1, 
        "float": 1.1, 
        "str": "hello", 
        "list": [1, 2, 3], 
        "dict": {"a": 1, "b": 2},
        "bool": True,
        "bool2": False,
        "none": None,
        "tuple": (1, 2, 3),
        # "set": {1, 2, 3,3,3} # sets are not JSON serializable
        
        }

# print data one by one
for key, value in data.items():
    print(f"{key}: {value}")

int: 1
float: 1.1
str: hello
list: [1, 2, 3]
dict: {'a': 1, 'b': 2}
bool: True
bool2: False
none: None
tuple: (1, 2, 3)


In [17]:
# let's convert it to an indented JSON string
data_json = json.dumps(data, indent=4)
# data type of data_json is string
print("Data type of data_json:", type(data_json))
# print the JSON string - this will be multi-line
print(data_json)

Data type of data_json: <class 'str'>
{
    "int": 1,
    "float": 1.1,
    "str": "hello",
    "list": [
        1,
        2,
        3
    ],
    "dict": {
        "a": 1,
        "b": 2
    },
    "bool": true,
    "bool2": false,
    "none": null,
    "tuple": [
        1,
        2,
        3
    ]
}


In [18]:
# therefore when we convert from JSON string back to Python data type
# instead of original tuple we will get a list
data_from_json = json.loads(data_json)
# print data one by one
for key, value in data_from_json.items():
    print(f"{key}: {value}")

# note that there is no more tuple but a list - because JSON does not have a tuple data type

int: 1
float: 1.1
str: hello
list: [1, 2, 3]
dict: {'a': 1, 'b': 2}
bool: True
bool2: False
none: None
tuple: [1, 2, 3]
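
Tuples are not the only thing that changes on a round trip. JSON object names are always strings, so non-string dictionary keys are converted as well. A small sketch, with a made-up `menu` dictionary:

```python
import json

menu = {1: "pizza", 2: "sushi"}  # integer keys

restored = json.loads(json.dumps(menu))

print(restored)          # {'1': 'pizza', '2': 'sushi'}
print(restored == menu)  # False, the keys are now strings
```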


## Conclusion on JSON basics

JSON is a flexible text format that converts easily to and from Python data types. We can save data to JSON files and read it back from them.

There are some limitations on the data types that can be represented in JSON but for most of the use cases, JSON is a very good format to use.

JSON itself does not support comments. Newer variants such as JSON5 do support them, but they are not as widely used as JSON.

## Pickle vs JSON

Python's standard library offers two main serialization formats: Pickle and JSON. Both are used to serialize and deserialize Python objects, but they have some key differences:
* Pickle is a binary serialization format, while JSON is a text-based format.
* Pickle is a Python-specific serialization format, while JSON is a language-independent format.
* Pickle can serialize a wider range of Python objects, including tuples, sets and instances of custom classes, while JSON can only serialize basic data types (strings, numbers, booleans, null, lists, dictionaries).

In [None]:
# let's pickle some data
import pickle # standard library # no pip install needed
# pickle is a Python-specific serialization format
# it is binary, so it is not human readable
# it is meant for Python only, not for other programming languages

# let's save our data to a file using pickle
# common extensions for pickle files are .pickle and .pkl
with open("foods.pickle", mode="wb") as file: # wb means write binary
    pickle.dump(foods, file) # dump takes a Python object and saves it to a file in pickle format

In [21]:
# now let's save the data dictionary (the one with the tuple) to a pickle file
with open("data.pickle", mode="wb") as file: # wb means write binary
    pickle.dump(data, file) # dump takes a Python data type and saves it to a file in pickle format

In [22]:
# now we should be able to unpickle the data without losing any data types
with open("data.pickle", mode="rb") as file: # rb means read binary
    unpickled_data = pickle.load(file) # load takes a file stream and reads the pickle data from it

# unlike JSON we should have a tuple back
# let's print the data one by one
for key, value in unpickled_data.items():
    print(f"{key}: {value}")

int: 1
float: 1.1
str: hello
list: [1, 2, 3]
dict: {'a': 1, 'b': 2}
bool: True
bool2: False
none: None
tuple: (1, 2, 3)
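
The same comparison can be made in memory, without files, using `pickle.dumps` and `pickle.loads`. The sketch below also includes a set, which JSON refuses to serialize. The `original` dictionary is made up for this example.

```python
import json
import pickle

original = {"tuple": (1, 2, 3), "set": {1, 2, 3}}

# pickle round trip in memory: dumps/loads work on bytes instead of files
restored = pickle.loads(pickle.dumps(original))
print(restored == original)     # True
print(type(restored["tuple"]))  # <class 'tuple'>
print(type(restored["set"]))    # <class 'set'>

# the same data cannot be written as JSON because of the set
try:
    json.dumps(original)
except TypeError as error:
    print("JSON failed:", error)

# caution: only unpickle data you trust,
# loading a pickle can execute arbitrary code
```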


## JSON APIs

JSON data is often used in APIs. An API (application programming interface) is a set of rules and protocols that allows one software application to communicate with another.

Often an internet API will return data in JSON format. This data can be used in your Python program.

### Sources of free APIs for testing

#### Testing APIs

* [JSONPlaceholder](https://jsonplaceholder.typicode.com/) - Fake Online REST API for Testing and Prototyping


In [25]:
# we could use standard library tools to get data from internet
# instead we will use requests library - which is not part of standard library
# requests is a third party library - very popular for making HTTP requests
# docs: https://docs.python-requests.org/en/master/

# try to import requests
try:
    import requests
    print("requests library is installed")
    # print version
    print(f"requests library version: {requests.__version__}")
except ImportError:
    print("requests library is not installed")
    print("please install it with: pip install requests")

# Google Colab has requests library installed by default and many other libraries

requests library is installed
requests library version: 2.32.3


In [None]:
# !pip install requests
# ! is just like using terminal command
# again we only need to install it once then we can comment it out



In [26]:
# to make a request we first need url - uniform resource locator
url = "https://jsonplaceholder.typicode.com/todos"
print("url:", url)

url: https://jsonplaceholder.typicode.com/todos


### Base recipe for using requests and JSON


In [28]:
# now I will make an HTTP GET request to the url
# this is similar to what a web browser does when you type in a url

response = requests.get(url) # this connects to the url and gets the data
# print response code - could check for 200 which means success
print("response code:", response.status_code)
# wikipedia has a list of response codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

response code: 200


In [29]:
# what data type is response
print("Data type of response:", type(response))

Data type of response: <class 'requests.models.Response'>


In [30]:
# now since response is 200 we are good to go
# we could read the text from the response
text = response.text
# print first 100 characters of the text
print("text:", text[:100])

text: [
  {
    "userId": 1,
    "id": 1,
    "title": "delectus aut autem",
    "completed": false
  },
 


In [31]:
# looks like a JSON string, so we can convert it to a Python data type
data = json.loads(text) # again note loads not load since we are converting a string to a Python data type
# print data type of data
# we should have normal Python data type - list of dictionaries
print("data type of data:", type(data))
# print first element of data
print("first element of data:", data[0])

data type of data: <class 'list'>
first element of data: {'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}


In [32]:
# usually if we know our response is JSON we can use response.json() method immediately
data = response.json() # very convenient
# above is same as data = json.loads(response.text) # hope you agree above is more convenient
# print data type of data
print("data type of data:", type(data))
# print first element of data
print("first element of data:", data[0])

data type of data: <class 'list'>
first element of data: {'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}
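
The recipe above can be wrapped into one small helper that also sets a timeout and stops on HTTP errors. This is a sketch under assumptions: `fetch_json` is a hypothetical name, and `session` is expected to be a `requests.Session` (any object with a compatible `get` method works).

```python
def fetch_json(session, url, params=None, timeout=10):
    """GET `url` and return the decoded JSON body.

    `session` is expected to be a requests.Session.
    """
    # without a timeout a request can wait forever on a stalled server
    response = session.get(url, params=params, timeout=timeout)
    # raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    # raises an exception if the body is not valid JSON
    return response.json()


# usage, assuming the requests library is installed and the network is up:
# import requests
# with requests.Session() as session:
#     todos = fetch_json(session, "https://jsonplaceholder.typicode.com/todos")
#     print(todos[0])
```

Passing the session in as an argument keeps the helper easy to try out with a stand-in object when no network is available.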


### Using data from JSON APIs

Now that we have the data in Python data format, we can do whatever needs to be done with it. We can use the data for analysis, visualization, etc.

We can adjust data as needed and save it to a file or database.
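
As one possible example of such processing, the sketch below filters and counts records shaped like the `/todos` response and saves the result. To stay runnable offline it uses a tiny hand-written sample: the first record matches the output shown earlier, the other two records and the file name `completed_todos.json` are invented for illustration.

```python
import json
from collections import Counter

# sample records in the same shape as the /todos response
# (the second and third records are invented for this illustration)
todos = [
    {"userId": 1, "id": 1, "title": "delectus aut autem", "completed": False},
    {"userId": 1, "id": 2, "title": "sample task two", "completed": True},
    {"userId": 2, "id": 3, "title": "sample task three", "completed": True},
]

# keep only the completed tasks
completed = [todo for todo in todos if todo["completed"]]

# count completed tasks per user
completed_per_user = Counter(todo["userId"] for todo in completed)
print(completed_per_user)  # Counter({1: 1, 2: 1})

# save the filtered data for later use
with open("completed_todos.json", "w", encoding="utf-8") as file:
    json.dump(completed, file, indent=2)
```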

### Awesome JSON APIs

* [Public API Lists](https://github.com/public-api-lists/public-api-lists)



In [30]:
# let's get some cat facts from the Cat Facts API

cat_url = "https://cat-fact.herokuapp.com/facts"
print("cat_url:", cat_url)
cat_facts = requests.get(cat_url).json() # we assume the request succeeds and the body is JSON
# raw facts
print("cat_facts:", cat_facts)


cat_url: https://cat-fact.herokuapp.com/facts
cat_facts: [{'status': {'verified': True, 'sentCount': 1}, '_id': '58e008780aac31001185ed05', 'user': '58e007480aac31001185ecef', 'text': 'Owning a cat can reduce the risk of stroke and heart attack by a third.', '__v': 0, 'source': 'user', 'updatedAt': '2020-08-23T20:20:01.611Z', 'type': 'cat', 'createdAt': '2018-03-29T20:20:03.844Z', 'deleted': False, 'used': False}, {'status': {'verified': True, 'sentCount': 1}, '_id': '58e009390aac31001185ed10', 'user': '58e007480aac31001185ecef', 'text': "Most cats are lactose intolerant, and milk can cause painful stomach cramps and diarrhea. It's best to forego the milk and just give your cat the standard: clean, cool drinking water.", '__v': 0, 'source': 'user', 'updatedAt': '2020-08-23T20:20:01.611Z', 'type': 'cat', 'createdAt': '2018-03-04T21:20:02.979Z', 'deleted': False, 'used': False}, {'status': {'verified': True, 'sentCount': 1}, '_id': '588e746706ac2b00110e59ff', 'user': '588e6e8806ac2b00110

In [31]:
# let's get only facts from the cat facts
# cat_facts is a list of dictionaries, and each fact is stored under the "text" key
cat_facts_text = []
for fact in cat_facts:
    cat_facts_text.append(fact.get("text", "no fact")) # the rest of keys we do not need
    # we expect that text key exists but just in case we use get method

# print cat facts one by one
for fact in cat_facts_text:
    print(fact)

Owning a cat can reduce the risk of stroke and heart attack by a third.
Most cats are lactose intolerant, and milk can cause painful stomach cramps and diarrhea. It's best to forego the milk and just give your cat the standard: clean, cool drinking water.
Domestic cats spend about 70 percent of the day sleeping and 15 percent of the day grooming.
The frequency of a domestic cat's purr is the same at which muscles and bones repair themselves.
Cats are the most popular pet in the United States: There are 88 million pet cats and 74 million dogs.


In [33]:
# we could get more cat facts by adding specific parameters to the url
# to find out which parameters are available we need to read the API documentation (if there is any) or experiment
cat_url_page2 = "https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=50"
print("cat_url_page2:", cat_url_page2)

cat_url_page2: https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=50
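
Instead of typing the query string by hand, the parameters can be kept in a dictionary. The sketch below uses `urlencode` from the standard library to show what the dictionary turns into; the names `base_url` and `params` are chosen for this example.

```python
from urllib.parse import urlencode

base_url = "https://cat-fact.herokuapp.com/facts/random"
params = {"animal_type": "cat", "amount": 50}

# urlencode turns the dictionary into a query string
full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
# https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=50

# with requests the same dictionary can be passed directly:
# response = requests.get(base_url, params=params)
```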



## Some tips for using requests for JSON APIs

In [34]:
# some APIs require headers in the request
# headers are metadata about the request

# let's get some cat facts from the Cat Facts API
# we will add headers to our request

response = requests.get(cat_url_page2, headers={"Accept": "application/json"})
# so headers is a dictionary where keys are header names and values are header values
# keys could be "Accept", "User-Agent", "Content-Type", "Authorization" etc.
# we are saying that we want JSON data
# print code
print("response code:", response.status_code)

response code: 200


In [None]:
# now some APIs require authentication and authorization

# authentication is proving who you are
# authorization is checking that you have permission to do something

# let's say you have an API_KEY for some service

# then you could use headers to send the API_KEY to the service

# requests also offers a shorter syntax for query parameters:
# we can pass them as a dictionary via the params argument of the get method

# more information on authentication and authorization in requests library
# https://docs.python-requests.org/en/master/user/authentication/
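
A sketch of what sending a key in the headers might look like. Everything service-specific here is an assumption: the environment variable name `API_KEY`, the fallback value `demo-key`, the `Bearer` scheme and the placeholder URL. Real services document their own header names and schemes.

```python
import os

# API_KEY is a hypothetical environment variable name;
# keeping keys out of the source code avoids leaking them
api_key = os.environ.get("API_KEY", "demo-key")

# the "Bearer" scheme is only one common convention, check your API's docs
headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {api_key}",
}
print(sorted(headers))  # ['Accept', 'Authorization']

# the request itself would then look like this (URL is a placeholder):
# response = requests.get("https://api.example.com/items", headers=headers, timeout=10)
```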
