# JSON in Python

This notebook demonstrates reading JSON data from files into lists, dictionaries, and dataframes in Python.

Notice the Python version running on the cluster.

In [3]:
%python
import sys
sys.version

## Contents
1. Setup
2. List example
3. Dictionary example
4. Mixed examples
5. More realistic examples

## 1. Setup

Load the `pandas` Python library.

In [7]:
%python
import pandas as pd

This notebook reads data from some of the JSON files listed by the following command.

In [9]:
%sh ls /dbfs/mnt/datalab-datasets/file-samples/*.json

The contents of individual files can be displayed with the `head` shell command. 

For example, the following command displays the first three lines of the `one_list.json` file. Notice the parameter `-n 3`.

The `%sh` magic at the beginning of the code cell indicates that the cell contains shell commands.

In [11]:
%sh head -n 3 /dbfs/mnt/datalab-datasets/file-samples/one_list.json

The contents of files are often read into Python using the `open` function. (An example appears in a code cell further below.)
- The first argument to this function is the full path of the file.
- The second argument is the mode in which to open the file. The mode `'r'` opens the file for reading.
- The function returns a file object that can be used, in this case, to read from the indicated file.
- See the documentation https://docs.python.org/3.5/library/functions.html#open for details.

The `with` statement (used in the code cell further below) works with a _context manager_.

In that cell (and in many later cells), the `with` statement opens the file and stores the _file object_ returned by the `open` function in the `infile` variable. All indented commands that follow the `with` line can use this variable to read the contents of the file. When the indentation ends, the file is closed automatically and can no longer be read through `infile`.

The basic reason to use `with` and `open` together like this is to close the file automatically after the indented commands have run, even if one of them raises an error.
Context managers have more advanced uses, but these matter little here as long as you __always__ open files for reading or writing this way.
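
The automatic closing described above can be checked directly. The following is a minimal sketch, not one of the notebook's original cells: it writes a small temporary file (the file name and its contents are made up for illustration) and then inspects the `closed` attribute of the file object inside and after the `with` block.

```python
import json
import os
import tempfile

# Create a small JSON file to read back (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "sample_list.json")
with open(path, "w") as outfile:
    outfile.write('["a", "b", 3]')

with open(path, "r") as infile:
    # Inside the indented block the file is open and can be read.
    open_inside = not infile.closed
    data = json.load(infile)

# Once the indented block ends, the file has been closed automatically.
closed_after = infile.closed

print(open_inside, closed_after, data)
```

Trying to read from `infile` after the block would raise a `ValueError`, because the file is already closed.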

The following code cell:
1. opens the `simple_list.json` file for reading
2. loads the contents of the file into a list (using the `load` function from the `json` module)
3. stores this list in `list_from_file` 
4. displays the list (as it is the last command of the cell)

In [15]:
%python
import json  # provides json.load, used below

with open('/dbfs/mnt/datalab-datasets/file-samples/simple_list.json', 'r') as infile:
  list_from_file = json.load(infile)

list_from_file

Notice that the resulting list (above) matches the text stored in the file (below).

In [17]:
%sh head -n 3 /dbfs/mnt/datalab-datasets/file-samples/simple_list.json

The remaining sections of this notebook demonstrate the reading of JSON text (from files) into Python lists, dictionaries, and tables.

## 2. List example

The `json` module, part of the Python standard library, provides functions to read and write JSON text.

The `loads` function:
- expects its argument to be a string of JSON data and 
- returns the Python object (usually a list or dictionary) that corresponds to the JSON data

In [21]:
%python
import json
my_json = json.loads('["hello","goodbye",777,null]')
my_json

The `null` JSON value is often used to indicate missing data. Notice that this value corresponds to the `None` value in Python.
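
The same correspondence holds for the other JSON value types. As a small illustration (the JSON string here is invented for this sketch and does not come from the sample files), the following code parses one value of each basic JSON type and reports the Python type that `json.loads` produces for it.

```python
import json

# A hypothetical JSON object containing each basic JSON value type.
text = (
    '{"name": "Ada", "age": 36, "score": 9.5, '
    '"active": true, "tags": ["x", "y"], "nickname": null}'
)
record = json.loads(text)

# Map each key to the name of the Python type of its parsed value.
python_types = {key: type(value).__name__ for key, value in record.items()}

for key, type_name in python_types.items():
    print(key, "->", type_name)
```

JSON objects become dictionaries, arrays become lists, `true`/`false` become `True`/`False`, and `null` becomes `None`.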

The JSON text for the next example comes from the file `simple_list.json`, which is displayed below. 

This is the same example as above in the Setup section. The following command displays the contents of this file (which only contains one line).

In [24]:
%sh head -n 3 /dbfs/mnt/datalab-datasets/file-samples/simple_list.json

The following code reads the contents of the file `simple_list.json` into a Python list.

In [26]:
%python
import json
with open("/dbfs/mnt/datalab-datasets/file-samples/simple_list.json", 'r') as infile:
  list_from_file = json.load(infile)
  
list_from_file

The following code demonstrates that the `load` function does in fact return a list.

In [28]:
%python 
type(list_from_file), list_from_file[-1]

## 2. Dictionary example

The JSON text for this example comes from the file `simple_dict.json`, which is displayed below.

In [31]:
%sh cat /dbfs/mnt/datalab-datasets/file-samples/simple_dict.json

The following code reads the contents of the file `simple_dict.json` into a Python dictionary.

In [33]:
%python
with open("/dbfs/mnt/datalab-datasets/file-samples/simple_dict.json", 'r') as infile:
  dict_from_file = json.load(infile)

dict_from_file

In [34]:
%python
dict_from_file.get('b')
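
The `get` method used above differs from square-bracket indexing when a key is missing. The sketch below uses a made-up dictionary (the actual contents of `simple_dict.json` may differ) to show the difference.

```python
import json

# Hypothetical dictionary; the real contents of simple_dict.json may differ.
sample = json.loads('{"a": 1, "b": 2}')

present = sample.get("b")      # key exists: returns its value
missing = sample.get("z")      # key absent: returns None
fallback = sample.get("z", 0)  # key absent: returns the supplied default

try:
    sample["z"]                # square brackets raise KeyError for a missing key
    raised = False
except KeyError:
    raised = True

print(present, missing, fallback, raised)
```

Using `get` is convenient for JSON data in which some keys may be absent from some records.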

## 3. Mixed examples

This section contains two examples, each in its own subsection:
1. Dictionary of lists
1. List of dictionaries

### 3.1 Dictionary of lists

In [38]:
%sh cat /dbfs/mnt/datalab-datasets/file-samples/dict_of_lists.json

The following code reads the contents of the file `dict_of_lists.json` into a Python dictionary.

In [40]:
%python
with open("/dbfs/mnt/datalab-datasets/file-samples/dict_of_lists.json", 'r') as infile:
  dict_from_file = json.load(infile)

dict_from_file

In [41]:
%python 
type(dict_from_file)

In [42]:
%python 
dict_from_file.get('b')

In [43]:
%python 
dict_from_file.get('b')[2]

### 3.2 List of dictionaries

In [45]:
%sh cat /dbfs/mnt/datalab-datasets/file-samples/list_of_dicts.json

The following code reads the contents of the file `list_of_dicts.json` into a Python list with items that are dictionaries.

In [47]:
%python
# The file holds a JSON list, so the result is stored in a list variable.
with open("/dbfs/mnt/datalab-datasets/file-samples/list_of_dicts.json", 'r') as infile:
  list_from_file = json.load(infile)

list_from_file

In [48]:
%python
type(list_from_file), list_from_file[1]

In [49]:
%python
list_from_file[1].get('d')
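
A list of dictionaries is usually processed by looping over its items. The following sketch uses invented data (not the contents of `list_of_dicts.json`) to collect the value of one key from every dictionary and to keep only the dictionaries that contain that key.

```python
import json

# Hypothetical list of dictionaries; the third item has no "d" key.
records = json.loads('[{"a": 1, "d": "x"}, {"a": 2, "d": "y"}, {"a": 3}]')

# Value of "d" for every record (None where the key is missing).
d_values = [record.get("d") for record in records]

# Only the records that actually contain the key "d".
with_d = [record for record in records if "d" in record]

print(d_values)
print(with_d)
```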

## 4. More realistic examples

The three subsections below demonstrate three ways in which tables are stored in JSON-formatted files.
The three ways differ in the structure of the JSON data in the file:
1. A list of dictionaries
1. A dictionary that contains a list of dictionaries
1. One dictionary per line of the file

In each of the three cases above, the dictionaries correspond to records of the table stored in the file.
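
To make the three layouts concrete, the sketch below stores the same two-record table in each of them, using JSON strings instead of files. The records are invented for illustration. The key name `data` in the second layout matches the key used later in this notebook, and all three layouts parse to the same list of records.

```python
import json

# The same hypothetical table, stored in the three layouts described above.
as_list = '[{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]'
as_dict = '{"data": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]}'
as_lines = '{"id": 1, "name": "Ann"}\n{"id": 2, "name": "Bob"}\n'

# 1. A list of dictionaries: parse the whole text at once.
records_1 = json.loads(as_list)

# 2. A dictionary containing a list of dictionaries: parse, then pick the key.
records_2 = json.loads(as_dict)["data"]

# 3. One dictionary per line: parse each non-empty line separately.
records_3 = [json.loads(line) for line in as_lines.splitlines() if line.strip()]

print(records_1 == records_2 == records_3)
```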

### 4.1 Dataset from a single list of JSON records

In [53]:
%sh head -n 30 /dbfs/mnt/datalab-datasets/file-samples/one_list.json

The following code reads the contents of the file `one_list.json` into a Python list of dictionaries.

In [55]:
%python
import json
# The file holds a JSON list of records, so the result is stored in a list variable.
with open('/dbfs/mnt/datalab-datasets/file-samples/one_list.json', 'r') as infile:
  list_from_file = json.load(infile)
  
list_from_file

The pandas `DataFrame` constructor can take as input a list of dictionaries. 

The keys of the dictionaries are interpreted as column names.

In [57]:
%python 
import pandas as pd 
pd.DataFrame(data=list_from_file)
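
When the dictionaries do not all have the same keys, the resulting dataframe has one column per key seen in any record, and missing values are filled with `NaN`. The following sketch uses two invented records to show this; it assumes `pandas` is installed, as on the cluster used in this notebook.

```python
import pandas as pd

# Hypothetical records; the second one has no "age" key.
records = [{"name": "Ann", "age": 31}, {"name": "Bob"}]

df = pd.DataFrame(data=records)

# One column per key; the missing "age" value becomes NaN.
print(list(df.columns))
print(df["age"].isna().tolist())
```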

### 4.2 Dataset from single dictionary

In [59]:
%sh head -n 9 /dbfs/mnt/datalab-datasets/file-samples/one_dictionary.json

The following code reads the contents of the file `one_dictionary.json` into a Python dictionary.

Notice that the last command converts the list of dictionaries stored under the `data` key, `my_dict_from_file['data']`, into a pandas dataframe.

In [61]:
%python
with open('/dbfs/mnt/datalab-datasets/file-samples/one_dictionary.json', 'r') as infile:
  my_dict_from_file = json.load(infile)
my_dict_from_file

pd.DataFrame(data=my_dict_from_file['data'])

### 4.3 Dataset from one JSON record per line (JSONL format)

This is a very brief introduction to the _line delimited_ JSON format. For details see
- http://ndjson.org/
- http://jsonlines.org/
- [Wikipedia](https://en.wikipedia.org/wiki/JSON_streaming)

The names JSONL, NDJSON, and LDJSON all refer to the same format: each line of the file contains one complete JSON value, usually a dictionary that describes one record of data.

In [64]:
%sh head -n 7 /dbfs/mnt/datalab-datasets/file-samples/each_line.json

This next example introduces the third-party `jsonlines` Python package.

The `open` function in this package opens a JSONL file and returns a reader object. Iterating over the reader (here by passing it to `list`) yields one dictionary per line of the file.

In [66]:
%python
import pandas as pd
import jsonlines
# Open the file with a context manager so that it is closed after reading.
with jsonlines.open("/dbfs/mnt/datalab-datasets/file-samples/each_line.json") as reader:
  list_of_dictionaries = list(reader)
list_of_dictionaries
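
If the `jsonlines` package is not available, a JSONL file can also be read with the standard `json` module alone. The following is a minimal sketch under that assumption: it writes a small temporary file with invented records (not the contents of `each_line.json`) and parses it one line at a time.

```python
import json
import os
import tempfile

# Write a small line-delimited JSON file (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), "each_line_sample.json")
with open(path, "w") as outfile:
    outfile.write('{"id": 1, "name": "Ann"}\n{"id": 2, "name": "Bob"}\n')

# Parse each non-empty line as its own JSON document.
records = []
with open(path, "r") as infile:
    for line in infile:
        line = line.strip()
        if line:
            records.append(json.loads(line))

print(records)
```

The resulting list of dictionaries can be passed to the `DataFrame` constructor in the same way as the list produced with `jsonlines`.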

In [67]:
%python
pdf_from_list_of_dictionaries = pd.DataFrame(data=list_of_dictionaries)
pdf_from_list_of_dictionaries

In [68]:
%python
pdf_from_list_of_dictionaries.info()

__The End__