# JSON

## Introduction

In this lesson, you'll continue investigating new formats for data. Specifically, you'll investigate one of the most popular data formats for the web: JSON files.

## Objectives
You will be able to:

* Describe features of the JSON format and the Python `json` module
* Use Python to load and parse JSON documents


## JSON Format

JSON stands for JavaScript Object Notation. Similar to CSV, JSON is a **plain text** data format. However, the structure of JSON, which is based on the syntax of JavaScript, is more complex.

Here's a brief preview of a JSON file:  

<img src="images/json_preview.png" width="850">

As you can see, JSON is not a tabular format with one set of rows and one set of columns. JSON files are often nested in a hierarchical structure and contain data structures analogous to Python dictionaries and lists. Here are all of the built-in data types supported by JSON and their counterparts in Python:

<img src="images/json_python_datatypes.png" width="500">
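The mapping above can be checked with a small sketch. It uses Python's built-in `json` module, which this lesson introduces in the next section, and `json.loads`, which parses a string rather than a file. The JSON text and its field names are made up for illustration and are not taken from the campaign finance file.

```python
import json

# A small, made-up JSON document (not from the lesson's dataset).
sample_text = '''
{
    "name": "example",
    "count": 3,
    "ratio": 0.5,
    "active": true,
    "missing": null,
    "tags": ["a", "b"],
    "nested": {"key": "value"}
}
'''

parsed = json.loads(sample_text)

# Show which Python type each JSON value was converted to.
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")
```

Running this should show a JSON object becoming a `dict`, an array becoming a `list`, `true` becoming `True`, and `null` becoming `None`.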


## `json` Module

In theory, we could read the file as one long string and write our own custom code to split it on `{`, `"`, `:`, etc., parsing the contents into the appropriate Python data structures.

Instead, we'll go ahead and use a pre-built Python module designed for this purpose. It will give us a powerful starting point for accessing and manipulating the data in JSON files. This module is called `json`.

You can find full documentation for this module [here](https://docs.python.org/3/library/json.html).

To use the `json` module, start by importing it:

> In the cell below, use the `import` keyword to import the `json` module. To do this, type the following code:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">import json
    </code></pre>
</details>

In [1]:
# replace this comment with the code to import the json library.

### `json.load`

To load data from a JSON file, you first open the file using Python's built-in `open` function. Then you pass the file object to the `json.load` function, which returns a Python object representing the contents of the file.

In the cell below, we open the campaign finance JSON file previewed above:

In [None]:
# run this cell with no changes to load the json file
with open('nyc_2001_campaign_finance.json') as f:
    data = json.load(f)
print(type(data))

---
#### Expected Output
<pre><code>&lt;class 'dict'&gt;
</code></pre>

---

As you can see, this loaded the data as a dictionary. You can begin to investigate the contents of a JSON file using the same Python techniques you already use for dictionaries and lists.
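If you want to try the same open-then-load pattern without the lesson's data file, here is a minimal sketch. It writes a tiny stand-in document to a temporary file and reads it back. The `sample` contents and the file name `sample.json` are assumptions for illustration, chosen only to mirror the `meta` and `data` top-level keys explored below.

```python
import json
import os
import tempfile

# Stand-in content with the same top-level keys as the lesson's file.
sample = {"meta": {"view": {"name": "demo"}}, "data": [[1, "a"], [2, "b"]]}

# Write the stand-in document to a temporary file so there is something to open.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)

    # json.load reads from an open file object; json.loads reads from a string.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(type(loaded))
print(loaded == sample)
```

Using `with` for both steps ensures each file is closed as soon as its block ends, even if parsing raises an error.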

### Parsing a JSON File

Since we have a dictionary, check its keys:

> In the cell below, use the `keys()` method to display the keys of `data`, the dictionary we just loaded from the JSON file:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">data.keys()
    </code></pre>
</details>

In [None]:
# replace this comment with the code to display the keys of the json file, data, that we just created

---
#### Expected Output
<pre><code>dict_keys(['meta', 'data'])
</code></pre>

---

Investigate what data types are stored within the values associated with those keys:

> In the cell below, type the code to loop over the values of `data` and print the data type of each one using the `type()` function. You can use either a for loop or a list comprehension.

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python"># Option 1: a for loop
for v in data.values():
    print(type(v))

# Option 2: a list comprehension
print([type(v) for v in data.values()])
    </code></pre>
</details>

In [None]:
# replace this comment with the code to loop over the values and print the data type of each

---
#### Expected Output
<pre><code>with a for loop:
&lt;class 'dict'&gt;
&lt;class 'list'&gt;
or
with list comprehension
[&lt;class 'dict'&gt;, &lt;class 'list'&gt;]
</code></pre>

---

#### Parsing Metadata

Then we can dig a level deeper. What are the keys of the nested dictionary?
> In the cell below, use the `keys()` method to get the keys of the nested dictionary `meta` within the json file `data`. To do this type the following code:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">data['meta'].keys()
    </code></pre>
</details>

In [None]:
# replace this comment with the code to display the keys of the nested dictionary meta within the data json

---
#### Expected Output
<pre><code>dict_keys(['view'])
</code></pre>

---

And what is the type of the value associated with that key?
> In the cell below, use the `type()` function to display the data type of `view` in the nested dictionary `meta`. To do this type the following code:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">type(data['meta']['view'])
    </code></pre>
</details>

In [None]:
# replace this comment with the code to display the data type of view in the nested dictionary meta.

---
#### Expected Output
<pre><code>dict
</code></pre>

---


Again, what are the keys of that twice-nested dictionary?
> In the cell below, use the `keys()` method to display the keys of the twice-nested dictionary `view`.  To do this, type the following code:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">data['meta']['view'].keys()
    </code></pre>
</details>

In [None]:
# replace this comment with the code to display the keys of the twice-nested dictionary view

---
#### Expected Output
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <pre><code language="python">dict_keys(['id', 'name', 'attribution', 'averageRating', 'category', 'createdAt',
            'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson',
            'indexUpdatedAt', 'newBackend', 'numberOfComments', 'oid', 'provenance',
            'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage',
            'rowClass', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated',
            'viewCount', 'viewLastModified', 'viewType', 'columns', 'grants', 'metadata', 'owner',
            'query', 'rights', 'tableAuthor', 'tags', 'flags'])
    </code></pre>
</details>

---

That is a lot of keys! One way we might try to view all of that information is using the `pandas` package to make a table.

> In the cell below, type the following code to create a dataframe using `data['meta']['view'].keys()` as the index, and `data['meta']['view'].values()` as the data:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">import pandas as pd
pd.set_option("display.max_colwidth", 120)
pd.DataFrame(
    data=data['meta']['view'].values(),
    index=data['meta']['view'].keys(),
    columns=["value"]
)
    </code></pre>
</details>

In [None]:
# replace this comment with the code to create a dataframe from data

---
#### Expected Output
<pre><code>a pandas DataFrame with 37 rows and 1 column (plus the index)
</code></pre>
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <img src="images/dataframe.png">
</details>

---

So, it looks like the information under the `meta` key is essentially all of the metadata about the dataset, including the category, attribution, tags, etc.
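The level-by-level exploration above (check the keys, check the types, go one level deeper) can also be sketched as a small recursive helper. The function name `describe`, its `max_depth` parameter, and the stand-in `sample` dictionary are hypothetical and are not part of the `json` module or the lesson's dataset.

```python
def describe(obj, name="root", depth=0, max_depth=2):
    """Return lines summarizing the type (and size) of each nested value."""
    indent = "  " * depth
    if isinstance(obj, dict):
        lines = [f"{indent}{name}: dict with {len(obj)} keys"]
        # Only descend into dictionaries until max_depth is reached.
        if depth < max_depth:
            for key, value in obj.items():
                lines.extend(describe(value, key, depth + 1, max_depth))
        return lines
    if isinstance(obj, list):
        return [f"{indent}{name}: list with {len(obj)} items"]
    return [f"{indent}{name}: {type(obj).__name__}"]


# Stand-in structure shaped like the lesson's file (values are made up).
sample = {
    "meta": {"view": {"id": "abcd-1234", "tags": ["x", "y"]}},
    "data": [[1], [2], [3]],
}

summary = describe(sample)
print("\n".join(summary))
```

Capping the depth keeps the summary short for a file like this one, where a single nested dictionary can hold dozens of keys.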

Now let's look at the main data.

#### Parsing Data

This time, let's look at the value associated with the `data` key. Recall that we previously identified that this had a `list` data type, so let's look at the length:
> In the cell below, use the `len()` function to display the length of the list stored under the `data` key. To do this, type the following code:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">len(data['data'])
    </code></pre>
</details>

In [None]:
# replace this comment with the code to display the length of the contents of the data key.

---
#### Expected Output
<pre><code>285
</code></pre>

---

Now let's look at a couple different values:
> In the cell below, type the code to display the item at index `0` of the nested list stored under the `data` key:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">data['data'][0]
    </code></pre>
</details>

In [None]:
# replace this comment with the code to display the value of the item with the index of 0

---
#### Expected Output
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <pre><code language="python">[1,
 'E3E9CC9F-7443-43F6-94AF-B5A0F802DBA1',
 1,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n  "invalidCells" : {\n    "1519001" : "TOTALPAY",\n    "1518998" : "PRIMARYPAY",\n    "1519000" : "RUNOFFPAY",\n    "1518999" : "GENERALPAY",\n    "1518994" : "OFFICECD",\n    "1518996" : "OFFICEDIST",\n    "1518991" : "ELECTION"\n  }\n}',
 None,
 'CANDID',
 'CANDNAME',
 None,
 'OFFICEBORO',
 None,
 'CANCLASS',
 None,
 None,
 None,
 None]
    </code></pre>
</details>

---

Let's take a look at the next item in this nested list, at index `1`. Use the same code as in the previous example, replacing the index value with `1`.

In [None]:
# replace this comment with the code to display the value of the item at the index of 1

---
#### Expected Output
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <pre><code language="python">[2,
 '9D257416-581A-4C42-85CC-B6EAD9DED97F',
 2,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n}',
 '2001',
 'B4',
 'Aboulafia, Sandy',
 '5',
 None,
 '44',
 'P',
 '45410.00',
 '0',
 '0',
 '45410.00']
    </code></pre>
</details>

---

Let's take a look at the next item in this nested list, at index `2`. Use the same code as in the previous example, replacing the index value with `2`.

In [None]:
# replace this comment with the code to display the value of the item at the index of 2

---
#### Expected Output
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <pre><code language="python">[3,
 'B80D7891-93CF-49E8-86E8-182B618E68F2',
 3,
 1315925633,
 '392904',
 1315925633,
 '392904',
 '{\n}',
 '2001',
 '445',
 'Adams, Jackie R',
 '5',
 None,
 '7',
 'P',
 '11073.00',
 '0',
 '0',
 '11073.00']
    </code></pre>
</details>

---

This looks more like some kind of tabular data, where the first (`0`-th) row is some kind of header. Again, let's use pandas to make this into a more-readable table format:
> In the cell below, construct a dataframe with pandas from the `data` key. To do this type the following code:

<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>CLICK to Reveal Code</u></b>
    </summary>
    <pre><code language="python">pd.DataFrame(data['data'])
    </code></pre>
</details>

In [None]:
# replace this comment with the code to create a dataframe from the data key

---
#### Expected Output
<pre><code>a pandas dataframe with 285 rows and 19 columns
</code></pre>
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <img src="images/data_df.png">
</details>

---

We still have some work to do to understand what all of the columns are supposed to mean, but now we have a general sense of what the data looks like.
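One way to start making sense of the columns is to pair each record with the labels from the header-like first row. The sketch below works on that assumption, which the exploration above suggests but does not confirm. The `rows` list is a shortened stand-in modeled on the output shown earlier, not the full 19-column records.

```python
# Stand-in rows shaped like the lesson's output: a header-like first row
# (with None where no label is available) followed by records.
rows = [
    [1, None, "CANDID", "CANDNAME", None],
    [2, "2001", "B4", "Aboulafia, Sandy", "5"],
    [3, "2001", "445", "Adams, Jackie R", "5"],
]

header, records = rows[0], rows[1:]

# Keep only the positions where the header row actually provides a text label.
labeled = [
    {label: value for label, value in zip(header, record) if isinstance(label, str)}
    for record in records
]

for item in labeled:
    print(item)
```

Positions without a text label are dropped here, so this only gives a partial view. The remaining columns would still need names from another source.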

### Extracting a Value from a JSON File

Now, let's say that our task is:

> Extract the description of the dataset

We know from our initial exploration that this JSON file contains `meta` and `data`, and that `meta` has this kind of high-level information whereas `data` has the actual records relating to campaign finance.

Let's look at the keys of the twice-nested dictionary `view` again:

> In the cell below, type the following code to display the keys for the nested dictionary `view`.

```
data['meta']['view'].keys()
```

In [None]:
# replace this comment with the code to display the keys for the nested dictionary view.

---
#### Expected Output
<pre><code>a list of the keys in data > meta > view
</code></pre>
<details>
    <summary style="cursor: pointer; display: inline">
        <b><u>Click to Expand Complete Output</u></b>
    </summary>
    <pre><code language="python">dict_keys(['id', 'name', 'attribution', 'averageRating', 'category', 'createdAt', 'description', 'displayType', 'downloadCount', 'hideFromCatalog', 'hideFromDataJson', 'indexUpdatedAt', 'newBackend', 'numberOfComments', 'oid', 'provenance', 'publicationAppendEnabled', 'publicationDate', 'publicationGroup', 'publicationStage', 'rowClass', 'rowsUpdatedAt', 'rowsUpdatedBy', 'tableId', 'totalTimesRated', 'viewCount', 'viewLastModified', 'viewType', 'columns', 'grants', 'metadata', 'owner', 'query', 'rights', 'tableAuthor', 'tags', 'flags'])
    </code></pre>
</details>

---

Ok, `description` is the 7th one! Let's pull the value associated with the `description` key:

> In the cell below, type the following code to display the value associated with the `description` key.

```
data['meta']['view']['description']
```

In [None]:
# replace this comment with the code to display the value associated with the description key

---
#### Expected Output
<pre><code>'A listing of public funds payments for candidates for City office during the 2001 election cycle'
</code></pre>

---

Great! This is the general process you will use when extracting information from a JSON file.
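When you are not sure every key along a path exists, chained indexing raises an error at the first missing key. A minimal sketch of a more forgiving lookup follows. The helper name `get_nested` is hypothetical (it is not part of the `json` module), and `sample` is a small stand-in rather than the real file.

```python
def get_nested(obj, path, default=None):
    """Follow a sequence of keys/indexes into nested dicts and lists."""
    current = obj
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # The path does not exist in this document.
            return default
    return current


# Stand-in document shaped like the lesson's file (values are made up).
sample = {
    "meta": {"view": {"description": "An example description"}},
    "data": [[1, "first"], [2, "second"]],
}

print(get_nested(sample, ["meta", "view", "description"]))
print(get_nested(sample, ["data", 1, 1]))
print(get_nested(sample, ["meta", "view", "missing_key"], default="not found"))
```

Returning a default instead of raising is a design choice. It is convenient for exploration, but it can hide typos in key names, so plain indexing may be preferable once the structure is known.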

## Summary
As you can see, there's a lot going on here with the deeply nested structure of JSON data files. In the upcoming lab, you'll get a chance to practice loading files and continuing to parse and extract the data as you did here.