# Exploring and Transforming JSON Schemas

## Introduction

In this lesson, you'll formalize how to explore a JSON file whose structure and schema are unknown to you. This often happens in practice when you are handed a file, or stumble upon one, with little documentation.

## Objectives
You will be able to:
* Use the `json` module to load and parse JSON documents
* Load and explore unknown JSON schemas
* Convert JSON to a pandas DataFrame

## Loading the JSON file

Load the data from the file `disease_data.json`.

In [2]:
import json
import pandas as pd


# Use a context manager so the file handle is closed after parsing
with open('disease_data.json') as f:
    data = json.load(f)

## Explore the first and second levels of the schema hierarchy

In [5]:
data.keys()

dict_keys(['meta', 'data'])

In [7]:
data['meta']['view']

{'id': 'g4ie-h725',
 'name': 'U.S. Chronic Disease Indicators (CDI)',
 'attribution': 'Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Population Health',
 'attributionLink': 'http://www.cdc.gov/nccdphp/dph/',
 'averageRating': 0,
 'category': 'Chronic Disease Indicators',
 'createdAt': 1463517008,
 'description': "CDC's Division of Population Health provides cross-cutting set of 124 indicators that were developed by consensus and that allows states and territories and large metropolitan areas to uniformly define, collect, and report chronic disease data that are important to public health practice and available for states, territories and large metropolitan areas. In addition to providing access to state-specific indicator data, the CDI web site serves as a gateway to additional information and data resources.",
 'displayType': 'table',
 'downloadCount': 80068,
 'hideFromCatalog': False,
 'hideFromDataJson': 

In [9]:
len(data['data'])

60266
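Checking keys and lengths one cell at a time works, but the same exploration can be sketched as a small recursive helper. The functions `describe` and `summarize` and the `sample` payload below are hypothetical, written only for illustration; they are not part of the `json` module or of the dataset.

```python
import json


def describe(value):
    """Return a short description of a parsed JSON value."""
    if isinstance(value, dict):
        return f"dict with {len(value)} keys"
    if isinstance(value, list):
        return f"list of {len(value)} items"
    return type(value).__name__


def summarize(node, max_depth=2, depth=0):
    """List the keys of nested dictionaries down to max_depth levels."""
    lines = []
    if depth >= max_depth or not isinstance(node, dict):
        return lines
    for key, value in node.items():
        lines.append("  " * depth + f"{key}: {describe(value)}")
        # Recurse only into dictionaries; lists are reported by length
        lines.extend(summarize(value, max_depth, depth + 1))
    return lines


# A made-up payload that mimics the meta/data layout seen above
sample = json.loads(
    '{"meta": {"view": {"id": "abc", "columns": [{"name": "sid"}]}},'
    ' "data": [[1, "x"], [2, "y"]]}'
)
summary = summarize(sample, max_depth=3)
print("\n".join(summary))
```

Calling `summarize(data, max_depth=3)` on the loaded file should give a similar outline, which makes it easier to spot where the column definitions and the rows live.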

## Convert to a DataFrame

Create a DataFrame from the JSON file. Be sure to retrieve the column names for the DataFrame. (Search within the 'meta' key of the top-level dictionary.) The DataFrame should include all 42 columns.

In [20]:
df = pd.DataFrame(data['data'])

df.columns = [col_dict['name'].lower() for col_dict in data['meta']['view']['columns']]
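The pattern above pairs each row in `data['data']` with the column definitions stored under `data['meta']['view']['columns']`. As a minimal sketch, the same idea is shown here on a made-up three-column `payload` (its names and values are assumptions, not taken from the real file), with a width check before the DataFrame is built.

```python
import pandas as pd

# Hypothetical payload that mirrors the meta/data layout of the real file
payload = {
    "meta": {
        "view": {
            "columns": [{"name": "sid"}, {"name": "YearStart"}, {"name": "Topic"}]
        }
    },
    "data": [[1, 2016, "Asthma"], [2, 2016, "Diabetes"]],
}

# Column names come from the metadata, lowercased as in the cell above
columns = [col["name"].lower() for col in payload["meta"]["view"]["columns"]]
rows = payload["data"]

# Every row should have exactly one value per column definition
if any(len(row) != len(columns) for row in rows):
    raise ValueError("row width does not match the number of column names")

small_df = pd.DataFrame(rows, columns=columns)
print(small_df.shape)
```

Passing `columns=` to the constructor and assigning `df.columns` afterwards give the same result; the check simply fails early if the metadata and the rows disagree.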

## Level-Up: Create a bar graph of states with the highest asthma rates for adults age 18+

In [26]:
df[df.topic == 'Asthma']

Unnamed: 0,sid,id,position,created_at,created_meta,updated_at,updated_meta,meta,yearstart,yearend,...,locationid,topicid,questionid,datavaluetypeid,stratificationcategoryid1,stratificationid1,stratificationcategoryid2,stratificationid2,stratificationcategoryid3,stratificationid3
4725,4726,786EA689-97C8-45C7-B733-9CF01D8AEB62,4726,1527194522,959778,1527194522,959778,,2016,2016,...,17,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
5529,5530,AC33E8A2-F507-48D5-B02C-9179EDC425E3,5530,1527194522,959778,1527194522,959778,,2016,2016,...,18,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
5632,5633,1E855D58-2A98-44E2-A062-AE1E8A2F7DB6,5633,1527194522,959778,1527194522,959778,,2016,2016,...,19,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
6777,6778,D300D76F-6293-4C41-B47F-AB8A93426EE2,6778,1527194522,959778,1527194522,959778,,2016,2016,...,20,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
7034,7035,5868F7F7-82F1-4D72-A144-767DFA87D581,7035,1527194522,959778,1527194522,959778,,2016,2016,...,21,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
7102,10245,6A49F874-7FD9-42E6-B7F0-50BA588CA63F,10245,1527194523,959778,1527194523,959778,,2016,2016,...,27,AST,AST1_2,CRDPREV,OVERALL,OVR,,,,
7337,7337,0B99F1EA-3837-48D6-B5CD-70F6BD38AD8D,7337,1527194523,959778,1527194523,959778,,2016,2016,...,22,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
7428,7428,99E47B23-41EC-4FC6-8215-C15BDD4025B5,7428,1527194523,959778,1527194523,959778,,2016,2016,...,23,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
7499,7499,A698381D-30F8-4F11-8A2F-036B228BEBE0,7499,1527194523,959778,1527194523,959778,,2016,2016,...,24,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
7966,7966,838ED10F-2F25-45FE-8FA0-80BF7F0722FC,7966,1527194523,959778,1527194523,959778,,2016,2016,...,50,AST,AST1_1,CRDPREV,GENDER,GENM,,,,
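The filter above only selects the asthma rows. One possible way to get from there to a bar graph is sketched below on a toy DataFrame. The column names `locationdesc` and `datavalue` are assumptions (they fall in the columns elided by `...` in the output above), and the values are invented, so check both against `df.columns` before adapting this.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; real values may arrive as strings because they come from JSON
toy = pd.DataFrame({
    "topic": ["Asthma", "Asthma", "Asthma", "Diabetes"],
    "locationdesc": ["State A", "State B", "State C", "State A"],
    "datavalue": ["9.1", "10.4", None, "8.0"],
})

asthma = toy[toy["topic"] == "Asthma"].copy()
# Convert to numbers; anything unparseable becomes NaN and is dropped
asthma["datavalue"] = pd.to_numeric(asthma["datavalue"], errors="coerce")
top = (
    asthma.dropna(subset=["datavalue"])
    .sort_values("datavalue", ascending=False)
    .head(10)
)

ax = top.plot.bar(x="locationdesc", y="datavalue", legend=False)
ax.set_xlabel("State")
ax.set_ylabel("Asthma value")
plt.tight_layout()
```

On the real data you would likely also need to filter on the question and stratification columns so that only the adults 18+ measure is plotted; which values to filter on depends on the file's contents.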


## Summary

Well done! In this lab you got some extended practice exploring the structure of JSON files, converting JSON data to a pandas DataFrame, and visualizing data!