<a href="https://codeimmersives.com"><img src = "https://www.codeimmersives.com/wp-content/uploads/2019/09/CodeImmersives_Logo_RGB_NYC_BW.png" width = 400> </a>


<h1 align=center><font size = 5>Agenda</font></h1>

### 
<div class="alert alert-block alert-info" style="margin-top: 20px">

1.  [Review -json](#0)<br>
2.  [json normalize](#2)<br>    
3.  [Exercise](#3)<br> 
</div>
<hr>

<h2>Review</h2>
json

A useful tool for dealing with JSON files is [here](https://www.freeformatter.com/json-escape.html#ad-output).
<code>
https://www.freeformatter.com/json-escape.html#ad-output
</code>
You can transform your json formatted files if you're having trouble parsing them.

A better tool for dealing with JSON files is [here](https://jsonformatter.org/json-pretty-print).
<code>
https://jsonformatter.org/json-pretty-print
</code>
You can transform your json formatted files if you're having trouble parsing them.

<h2>More complex json file formats</h2>
You will have to parse more difficult json file formats when dealing<br>
with output from a graphQL query. To tackle this issue we have to <br>
revert to using pandas and its json_normalize method.<br>
For example you are getting movie data from one of the movie<br>
data apis (ie. www.imdb.com, www.tmdb.com).  The file format <br>
might look like the following:<br>
    <code>
    {
  "data": {
    "human": {
      "name": "Luke Skywalker",
      "height": 1.72
    }
  }
}
    </code>

In [None]:
import json

movie_info = """{
  "data": {
    "human": {
      "name": "Luke Skywalker",
      "height": 1.72
    }
  }
}"""
print(movie_info)
print(type(movie_info))
res = json.loads(movie_info)
print(res)
print(type(res))   # <--- Returns a dictionary
print('*'*35)

<h2>Exercise</h2>
How would we extract the name and height values from this<br>
dictionary?<br>
The only way is to parse into the layers of the data structure<br>

In [None]:
# Place your code here



<h2>pandas to the rescue!</h2>
We can use the <b>json_normalize</b> method to flatten the json data<br>
so that we can get to the information we want.

In [None]:
import pandas as pd

res = json.loads(movie_info)
df = pd.json_normalize(res)
print(df)

To change the column heading to an underscore instead of a period<br>
we use the: sep = '_' optional kwarg

In [None]:
import pandas as pd

res = json.loads(movie_info)
df = pd.json_normalize(res,sep="_")
print(df)

<h2>Exercise</h2>
Flatten the following json data and replace the periods with <br>
underscores for the column names:<br>
<code>
data = [{'id': 1,
        'name': "Ralph Reed",'fitness': {'height': 70, 'weight': 200}},
        {'name': "Ayn Rand",'fitness': {'height': 66, 'weight': 140}},
    {'id': 2, 'name': 'Rachel Baker','fitness': 
    {'height': 62, 'weight': 120}}]
</code>

<br>
<br>
<br>





<b>The key to the solution was replacing the single quotes with double quotes!!</b>

In [None]:
import json
import pandas as pd

data = """[{'id': 1,
        'name': "Ralph Reed",'fitness': {'height': 70, 'weight': 200}},
        {'name': "Ayn Rand",'fitness': {'height': 66, 'weight': 140}},
    {'id': 2, 'name': 'Rachel Baker','fitness': 
    {'height': 62, 'weight': 120}}]"""



We can use the max_level kwarg to flatten the data to a <br>
certain level.  The additional levels, if any, remain in dictionary format

In [None]:
import json
import pandas as pd

data = """[{'id': 1,
        'name': "Ralph Reed",'fitness': {'height': 70, 'weight': 200}},
        {'name': "Ayn Rand",'fitness': {'height': 66, 'weight': 140}},
    {'id': 2, 'name': 'Rachel Baker','fitness': 
    {'height': 62, 'weight': 120}}]"""



If the data has imbeded lists that contain other dictionaries<br>
we have to use another approach to extract the data.<br>
In this case the data did not change after max_level = 1

In [None]:
import json
import pandas as pd

data = """[{'state': 'Florida',
    'shortname': 'FL',
    'info': {'governor': 'Rick Scott'},
    'counties': [
        {'name': 'Dade', 'population': 12345},
        {'name': 'Broward', 'population': 40000},
        {'name': 'Palm Beach', 'population': 60000}]},
    {'state': 'Ohio',
    'shortname': 'OH',
    'info': {'governor': 'John Kasich'},
    'counties': [
        {'name': 'Summit', 'population': 1234},
        {'name': 'Cuyahoga', 'population': 1337}]},
    {'state': 'New York',
    'shortname': 'NY',
    'info': {'governor': 'Andrew Cuuomo'},
    'counties': [
        {'name': 'Kings', 'population': 3200},
        {'name': 'New York', 'population': 2700}]}       
       ]"""
data = data.replace("'",'"')    # <--- Remember double quotes only
res = json.loads(data)
df = pd.json_normalize(res, sep = "_", max_level = 0)  # Flatten to level 0
print(df)
print('='*35)
df = pd.json_normalize(res, sep = "_", max_level = 1)  # Flatten to level 1
print(df)
print('='*35)
df = pd.json_normalize(res, sep = "_", max_level = 4)  # Flatten to level 4
print(df)

Let's examine the solution below<br>
1 - The first level keys are state,shortname, counties<br>
2 - The 2nd level has a dictionary key of 'info' and a key of 'governor'<br>
3 - The counties have a list with embeded dictionary values<br>
NOTE: we wrapped the column names inside a list to extract the values<br>

In [None]:
import json
import pandas as pd

data = """[
    {'state': 'Florida',
    'shortname': 'FL',
        'info': {'governor': 'Rick Scott'},
        'counties': [
            {'name': 'Dade', 'population': 12345},
            {'name': 'Broward', 'population': 40000},
            {'name': 'Palm Beach', 'population': 60000}]},
    {'state': 'Ohio',
    'shortname': 'OH',
        'info': {'governor': 'John Kasich'},
        'counties': [
            {'name': 'Summit', 'population': 1234},
            {'name': 'Cuyahoga', 'population': 1337}]},
    {'state': 'New York',
    'shortname': 'NY',
        'info': {'governor': 'Andrew Cuuomo'},
        'counties': [
            {'name': 'Kings', 'population': 3200},
            {'name': 'New York', 'population': 2700}]}       
       ]"""
data = data.replace("'",'"')    # <--- Remember double quotes only
res = json.loads(data)
df = pd.json_normalize(res, 'counties',\
                       ['state', 'shortname',['info', 'governor']])  # Flatten to level 0
print(df)
print('='*35)

# print(res)
df = pd.json_normalize(res, 'counties',['state', 'shortname'])  # Flatten to level 0
print(df)
print('='*35)

In [None]:
import json
import pandas as pd

data = """{
  "data": {
    "Comparison": {
      "name": "Luke Skywalker",
      "appearsIn": [
        "NEWHOPE",
        "EMPIRE",
        "JEDI"
      ],
      "friends": [
        {
          "name": "Han Solo"
        },
        {
          "name": "Leia Organa"
        },
        {
          "name": "C-3PO"
        },
        {
          "name": "R2-D2"
        }
      ]
    },
    "Comparison1": {
      "name": "R2-D2",
      "appearsIn": [
        "NEWHOPE",
        "EMPIRE",
        "JEDI"
      ],
      "friends": [
        {
          "name": "Luke Skywalker"
        },
        {
          "name": "Han Solo"
        },
        {
          "name": "Leia Organa"
        }
      ]
    }
  }
}"""

res = json.loads(data)
print(res['data'])
df = pd.json_normalize(res['data'])  # Flatten to level 0
print(df)
print('='*35)
df = pd.json_normalize(res['data'],max_level = 1, sep = '_')  # Flatten to level 0
print(df)
print('='*70)
print(type(res['data']['Comparison'].values()))
df = pd.json_normalize(res['data']['Comparison'],'appearsIn',['name'],sep = '_')  
print(df)
print('='*70)

<h2>Exercise</h2>
Use the file nyc_phil.txt<br>
Create a dataframe with useful aspects of the json file<br>
step 1 - go to https://jsonformatter.org/json-pretty-print and paste the data<br>
click the 'make pretty' button<br>
step 2 - Click on the 'tree view' on the right hand side of the window.<br>
step 3 - Explore the levels<br>
step 4 - for paramters use the kwargs - record_path = 'path'<br>
In our case explore individually - 'works' and  'concerts'<br>
step 5 - To include the flat data before the records use the kwargs - 'meta = [col1,col2,..]'<br>
<code>
meta=['id', 'orchestra','programID', 'season']
</code>


In [6]:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
import pandas as pd

# flask app to set up the db 
app = Flask(__name__)
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
# specifiying the database engine and filename
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///log.db'
# creating database instance
db = SQLAlchemy(app)


# class/table model for storing data to database
class Day(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    Date = db.Column(db.String(20))
    Symbol = db.Column(db.String(20))
    Open = db.Column(db.Float)
    High = db.Column(db.Float)
    Low = db.Column(db.Float)
    Close = db.Column(db.Float)
    Volume = db.Column(db.Float)
    Adj_Close = db.Column(db.Float)

    def __repr__(self):
        return f"{self.Date}"

# create database and its tables
db.create_all()

# retrieve csv data
crap = pd.read_csv('nyc_phil.txt')
soloist_data = pd.json_normalize(data=crap['programs'], record_path=['works', 'soloists'], meta=['id'])
soloist_data.head(3)
print(days)
# rename the column with whitespace in the name
# days.rename(columns={'Adj Close':'Adj_Close'}, inplace=True)
# make the indices dates for viewability,
#  Transpose the table 
# days = days.set_index('Date').T.to_dict()

# for day in days:
#     days[day]["Date"] = day
#     new_day = Day(**days[day])
#     db.session.add(new_day)
#     db.session.commit()

Empty DataFrame
Columns: [{"programs": [{"season": "1842-43",  "orchestra": "New York Philharmonic",  "concerts": [{"Date": "1842-12-07T05:00:00Z",  "eventType": "Subscription Season",  "Venue": "Apollo Rooms",  "Location": "Manhattan,  NY",  "Time": "8:00PM"}],  "programID": "3853",  "works": [{"workTitle": "SYMPHONY NO. 5 IN C MINOR,  OP.67",  "conductorName": "Hill,  Ureli Corelli",  "ID": "52446*",  "soloists": [],  "composerName": "Beethoven,   Ludwig  van"},  {"workTitle": "OBERON",  "composerName": "Weber,   Carl  Maria Von",  "conductorName": "Timm,  Henry C.",  "ID": "8834*4",  "soloists": [{"soloistName": "Otto,  Antoinette",  "soloistRoles": "S",  "soloistInstrument": "Soprano"}],  "movement": "\"Ozean,  du Ungeheuer\" (Ocean,  thou mighty monster),  Reiza (Scene and Aria),  Act II"},  {"workTitle": "QUINTET,  PIANO,  D MINOR,  OP. 74",  "ID": "3642*",  "soloists": [{"soloistName": "Scharfenberg,  William",  "soloistRoles": "A",  "soloistInstrument": "Piano"},  {"soloistName

The code block labelled 'concert level data' is done a different way below

This notebook is part of a course at www.codeimmersives.com called **Python Flask and Django**. If you accessed this notebook outside the course, 
you can get more information about this course online by clicking [here](https://www.codeimmersives.com/programs/python-aws/).

<hr>

Copyright &copy; 2021