# Loading data from `JSON` files using `pandas`

## What is a `JSON` file?

`JSON` is a text-based file format that encodes data structures, such as lists and key-value pairs, as strings that a computer can parse. It stands for **J**ava**S**cript **O**bject **N**otation and is often used to store data that is *unstructured*, i.e. data that is not organized in a pre-defined manner such as a fixed set of columns.
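
As a small illustration of that idea, the sketch below parses a short `JSON` string with Python's built-in `json` module. The record contents are made up for illustration and are not taken from the dataset used in this notebook.

```python
import json

# A made-up JSON document: an array holding two objects (hypothetical values).
json_text = (
    '[{"company": "Acme", "rank": 1}, '
    '{"company": "Globex", "rank": 2, "city": "Springfield"}]'
)

# json.loads() parses JSON text into Python objects:
# a JSON array becomes a list and a JSON object becomes a dict.
records = json.loads(json_text)

print(type(records).__name__)     # list
print(type(records[0]).__name__)  # dict

# Records need not share the same keys; here only the second one has "city".
print(sorted(records[0].keys()))
print(sorted(records[1].keys()))

# json.dumps() goes the other way and encodes Python objects as a JSON string.
round_trip = json.dumps(records)
```

Note that the two records do not have identical keys, which is what "not organized in a pre-defined manner" looks like in practice.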

In this notebook, we are going to read a `JSON` file into `Python` using `pandas`. The optional sections that follow show how to read and explore the same file with the built-in `json` library, so you can compare the two approaches.

### Table of Contents

1. [Reading a `JSON` file with `pandas`](#json_pandas)
1. [Reading a `JSON` file with `json` (optional)](#file_read)
1. [Exploring the file contents with `json` (optional)](#file_explore)

----

### About the Data 

**Located:** `/dsa/data/all_datasets/inc5000_2016.json`

This file contains data on the fastest-growing private companies in the United States. It was obtained through a data-sharing website called [`Data.World`](https://data.world), and the file can be found [here](http://www.inc.com/inc5000list/json/inc5000_2016.json).

Attribute | Description
----------|------------
`city` | company's city
`company` | company name
`growth` | growth of the company
`id` | company file id
`industry` | type of company
`metro` | metro area that the company belongs to
`rank` | rank in terms of growth
`revenue` | revenue of the company
`state_l` | state abbreviation
`state_s` | state name
`url` | company website
`workers` | number of workers
`yrs_on_list` | how long it has been on this list

<a id='json_pandas'></a>
## Reading a `JSON` file with `pandas`

Python's built-in `json` library (covered in the optional sections below) isn't the only library capable of handling the `JSON` format. 
`pandas` can also handle `JSON`, and it reads the data into the easy-to-read data frame object. 
Not only is a data frame more visually appealing for humans, but it also provides flexible data manipulation capabilities, the ability to store records of differing data types (e.g. `string`, `int`), and intuitive indexing. 
Also, reading in the file is as simple as calling the `read_json()` function and passing the file path as the argument.

In [None]:
import pandas as pd

df = pd.read_json('/dsa/data/all_datasets/inc5000_2016.json')

df.head()

And now we have a data frame where the column headers are the keys to the data and each record becomes a row in the frame.

Everything you have learned, or will learn, about `pandas` data frames can now be applied to this data.
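
To make this concrete without depending on the course file, here is a minimal sketch that builds a data frame from a few made-up records and applies ordinary `pandas` operations to it. The company names and numbers are hypothetical; only the column names `company`, `rank`, `growth`, and `industry` come from the attribute table above. The name `small_df` is used so that the `df` created earlier is left untouched.

```python
import io

import pandas as pd

# Made-up records that reuse a few column names from the dataset (values are hypothetical).
json_text = """
[
  {"company": "Acme",    "rank": 1, "growth": 950.5, "industry": "Software"},
  {"company": "Globex",  "rank": 2, "growth": 720.0, "industry": "Health"},
  {"company": "Initech", "rank": 3, "growth": 410.2, "industry": "Software"}
]
"""

# read_json() accepts a path or a file-like object; StringIO wraps the string as a file.
small_df = pd.read_json(io.StringIO(json_text))

# Each key became a column and each record became a row.
print(small_df.shape)
print(list(small_df.columns))

# Ordinary data frame operations now apply, for example filtering and sorting.
software = small_df[small_df["industry"] == "Software"].sort_values(
    "growth", ascending=False
)
print(software[["company", "growth"]])
```

The same filtering and sorting calls should work on the full data frame, since they only rely on column names.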

<a id='file_read'></a>
## Reading a `JSON` file with `json` (optional)

`Python` has a built-in library for reading `JSON` files; it is called **`json`**. 
The `json` library provides a function called `json.load()`, which we will use to read in our `inc5000_2016.json` file. 

In [None]:
import json

with open('/dsa/data/all_datasets/inc5000_2016.json') as f:
    file = json.load(f)
    print(file)

Notice how we used the `with open(...) as` syntax. 
What this does is:
  1. opens the file, 
  2. processes its contents, and then 
  3. closes it once the indented block finishes. 

In this case, it opens **'/dsa/data/all_datasets/inc5000_2016.json'**, parses it with `json.load()`, assigns the result to the variable `file`, and finally prints out the contents of `file`.
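
The open, process, close sequence can be checked directly. The sketch below writes a tiny made-up `JSON` file to a temporary directory (so it does not rely on the course dataset), reads it back with `with open(...) as`, and then confirms that the file handle was closed when the block ended. The file name `example.json` and its contents are hypothetical.

```python
import json
import os
import tempfile

# Made-up records standing in for the real dataset.
sample = [{"company": "Acme", "rank": 1}, {"company": "Globex", "rank": 2}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.json")  # hypothetical file name

    # Write the sample out so there is a file to read.
    with open(path, "w") as out_file:
        json.dump(sample, out_file)

    # 1. open the file, 2. parse its contents, 3. close it when the block ends.
    with open(path) as in_file:
        loaded = json.load(in_file)
        print(in_file.closed)  # False: still inside the block

    print(in_file.closed)      # True: the with statement closed the file

print(loaded == sample)        # True: the data survived the round trip
```

Because the `with` statement closes the file even if an error occurs inside the block, it is generally preferred over calling `open()` and `close()` by hand.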

At the moment, it looks rather messy, but let's take a closer look at its contents.

<a id='file_explore'></a>
## Exploring the file contents with `json` (optional)

In the previous code block, we read in the contents of the **inc5000_2016.json** file and saved them to a variable called `file`. 
We can now access different components of the `file` variable, but first we should better understand what we are working with. 
Let's see what type of object `file` is, as well as the type of the items within it.

In [None]:
file_type = type(file).__name__  # find the object type of 'file'
item_type = type(file[0]).__name__  # find the object type of the items in 'file'

print("the object, 'file', is a {} and its items are {}s".format(
    file_type, item_type))

Now we see that we are working with a list of dictionaries, which means that we can access a record by specifying an index into the object `file`, i.e. the record's position in the list (counting from 0). 
Let's take a look at the first three records...

In [None]:
file[0:3]

In the output above, each record is enclosed in curly brackets **`{  }`** and is therefore a dictionary (`dict`).
We can see that in each record, data are stored in key-value pairs, where the key is the attribute name and the value is that record's value for the attribute. 
Keep in mind that we can access an attribute's value by referencing its key in that particular record. 
Imagine that we wanted to grab the value of the "metro" attribute of the first record.

In [None]:
file[0]['metro']

In the above example, `file[0]` refers to the dictionary that holds the first record in the file. 
By adding `['metro']`, we look up the key `'metro'` in the `file[0]` dictionary, which returns the "metro" value for the first record. 
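
The same indexing pattern extends to looping over every record. The following sketch uses a short made-up list named `records` in place of `file`, so it runs on its own; the values are hypothetical. It also shows `dict.get()`, which returns a default value instead of raising a `KeyError` when a key is absent.

```python
# A made-up list of dictionaries with the same shape as the parsed JSON file.
records = [
    {"company": "Acme", "metro": "St. Louis", "rank": 1},
    {"company": "Globex", "metro": "Chicago", "rank": 2},
    {"company": "Initech", "rank": 3},  # no "metro" key in this record
]

# Index the list to pick a record, then index the dict by key.
first_metro = records[0]["metro"]
print(first_metro)  # St. Louis

# .get() avoids a KeyError when a record lacks the key.
third_metro = records[2].get("metro", "unknown")
print(third_metro)  # unknown

# A list comprehension collects one attribute across all records.
companies = [record["company"] for record in records]
print(companies)  # ['Acme', 'Globex', 'Initech']
```

Whether records in the real file ever lack a key is not something this sketch establishes; `.get()` is simply a safe habit when the structure is not guaranteed.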

# Save your notebook, then `File > Close and Halt`