# Data Import - Working with Web APIs and JSON (Movies Dataset)

## Importing Data from JSON files 

1. __Import__ the json files __blockbusters.json__, __blockbusters2.json__, __blockbusters3.json__ and load the datasets into Pandas DataFrames.


In [None]:
#JSON is a standard format for transferring data (e.g. through Web APIs)
#JSON data is not necessarily tabular
#it often contains complex/nested data structures


In [None]:
import pandas as pd
import json
import requests
pd.options.display.max_columns=30

In [None]:
with open("blockbusters.json") as f:
    data=json.load(f)

In [None]:
data

In [None]:
type(data)

In [None]:
len(data)

In [None]:
data[0]

In [None]:
df=pd.DataFrame(data) #converts a list of dictionaries (one per movie) into a pandas DataFrame
df
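
To make the step from parsed JSON to a DataFrame concrete, here is a minimal in-memory sketch. The two movie records are made up for illustration and are not taken from blockbusters.json; only the shape (a list with one object per movie) is assumed to match.

```python
import json
import pandas as pd

# Hypothetical record-oriented JSON text: a list with one object per movie.
raw = '''
[
  {"title": "Movie A", "id": 1, "genres": [{"id": 12, "name": "Adventure"}]},
  {"title": "Movie B", "id": 2, "genres": [{"id": 18, "name": "Drama"}]}
]
'''

records = json.loads(raw)        # parse the JSON text into Python objects
print(type(records))             # <class 'list'>
print(type(records[0]))          # <class 'dict'>

df_demo = pd.DataFrame(records)  # one row per dict, one column per key
print(df_demo.shape)             # (2, 3)

# Nested values are not expanded; they stay as Python lists inside the cell.
print(type(df_demo.loc[0, "genres"]))  # <class 'list'>
```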

In [None]:
#directly load JSON files into pandas
df=pd.read_json("blockbusters.json")
df

In [None]:
df.info()

In [None]:
df["genres"]

In [None]:
df["genres"][0]

In [None]:
df["belongs_to_collection"]

In [None]:
df["belongs_to_collection"][0]

In [None]:
#flatten (normalize) nested JSON into a flat table when loading into pandas
pd.json_normalize(data=data,sep="_")
#the belongs_to_collection column is flattened into four columns
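
The effect of `sep="_"` is easier to see on a tiny example. The record below is invented for this sketch (its nested collection has only two keys, not four); it only mimics a nested dict like belongs_to_collection and a nested list like genres.

```python
import pandas as pd

# Made-up record with a nested dict and a nested list.
records = [
    {
        "title": "Movie A",
        "belongs_to_collection": {"id": 10, "name": "Collection A"},
        "genres": [{"id": 12, "name": "Adventure"}],
    }
]

flat = pd.json_normalize(data=records, sep="_")

# The nested dict is expanded into one column per key, joined with "_".
# The nested list is left untouched in a single "genres" column.
print(sorted(flat.columns))
# ['belongs_to_collection_id', 'belongs_to_collection_name', 'genres', 'title']
```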

In [None]:
pd.json_normalize(data=data,sep="_").genres[0] #the genres column has not changed

In [None]:
pd.json_normalize(data=data,record_path="genres") #to normalize genres column

In [None]:
pd.json_normalize(data=data,record_path="genres",meta="title")

In [None]:
pd.json_normalize(data=data,record_path="genres",meta=["title","id"],record_prefix="genre_")
#record_prefix distinguishes the id of each genre (genre_id) from the id of the movie (id)
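
A small sketch of how `record_path`, `meta` and `record_prefix` interact, using invented records rather than the blockbusters data:

```python
import pandas as pd

# Made-up records: each movie has an id and a list of genres that also have ids.
records = [
    {"title": "Movie A", "id": 1,
     "genres": [{"id": 12, "name": "Adventure"}, {"id": 28, "name": "Action"}]},
    {"title": "Movie B", "id": 2,
     "genres": [{"id": 18, "name": "Drama"}]},
]

genres = pd.json_normalize(
    data=records,
    record_path="genres",     # one output row per element of the genres list
    meta=["title", "id"],     # movie-level fields repeated on every genre row
    record_prefix="genre_",   # prefix for the columns coming from the genre dicts
)

print(list(genres.columns))  # ['genre_id', 'genre_name', 'title', 'id']
print(len(genres))           # 3 (two genres for Movie A, one for Movie B)
```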

In [None]:
#three JSON orientations: records, columns, split
#blockbusters.json --> records orientation: a list in which each element (a dictionary) is one row
#blockbusters2.json --> columns orientation
#blockbusters3.json --> split orientation: first the column labels, then the row labels (index), then the data
#records orientation is the most convenient one for loading into pandas
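
To compare the three orientations side by side, the sketch below writes one small, made-up DataFrame in each layout with `DataFrame.to_json` and parses the text back with `json.loads`. The names `df_demo` and `layouts` are illustrative only.

```python
import json
import pandas as pd

df_demo = pd.DataFrame({"title": ["Movie A", "Movie B"], "id": [1, 2]})

# Serialize the same table in each orientation and parse it back to Python objects.
layouts = {
    orient: json.loads(df_demo.to_json(orient=orient))
    for orient in ("records", "columns", "split")
}

print(layouts["records"])
# [{'title': 'Movie A', 'id': 1}, {'title': 'Movie B', 'id': 2}]
print(layouts["columns"])
# {'title': {'0': 'Movie A', '1': 'Movie B'}, 'id': {'0': 1, '1': 2}}
print(layouts["split"])
# {'columns': ['title', 'id'], 'index': [0, 1], 'data': [['Movie A', 1], ['Movie B', 2]]}
```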

## Importing blockbusters2.json

In [None]:
with open("blockbusters2.json") as f:
    data2=json.load(f)

In [None]:
data2

In [None]:
type(data2)

In [None]:
list(data2)

In [None]:
len(data2)

In [None]:
df2=pd.DataFrame(data2)
df2

In [None]:
df2=pd.read_json("blockbusters2.json")
df2

In [None]:
df2.info()

In [None]:
df2["genres"]

In [None]:
df2["belongs_to_collection"]

In [None]:
#pd.json_normalize does not work directly on column-oriented data
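
One possible workaround, sketched on invented data: build a DataFrame from the column-oriented dict first, convert it to a list of records with `to_dict(orient="records")`, and normalize that list. This is an assumption about how one might proceed, not a step from the original notebook.

```python
import pandas as pd

# Made-up column-oriented data: {column -> {row label -> value}}.
columns_oriented = {
    "title": {"0": "Movie A", "1": "Movie B"},
    "belongs_to_collection": {
        "0": {"id": 10, "name": "Collection A"},
        "1": {"id": 20, "name": "Collection B"},
    },
}

# Step 1: let pandas interpret the column layout.
# Step 2: turn the rows back into a list of dicts (records orientation).
records = pd.DataFrame(columns_oriented).to_dict(orient="records")

# Step 3: the record list can now be flattened as before.
flat = pd.json_normalize(records, sep="_")
print(flat.shape)  # (2, 3)
```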

## Importing blockbusters3.json

In [None]:
with open("blockbusters3.json") as f:
    data3=json.load(f)

In [None]:
data3

In [None]:
type(data3)

In [None]:
len(data3)

In [None]:
#df3=pd.DataFrame(data3)
#this won't produce the intended table because the dataset is split-oriented
df3=pd.read_json("blockbusters3.json",orient="split")
df3
#in this case, pd.json_normalize won't work directly on data3
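
If the split-oriented data is already in memory as a dict, the three parts can be passed to the DataFrame constructor explicitly. The sketch below uses a made-up dict with the same three keys; it is an alternative to `pd.read_json(..., orient="split")`, not a replacement for it.

```python
import pandas as pd

# Made-up split-oriented data: column labels, row labels, then the values.
split_oriented = {
    "columns": ["title", "id"],
    "index": [0, 1],
    "data": [["Movie A", 1], ["Movie B", 2]],
}

df_demo = pd.DataFrame(
    data=split_oriented["data"],
    index=split_oriented["index"],
    columns=split_oriented["columns"],
)
print(df_demo.shape)  # (2, 2)

# From here a record list (usable with pd.json_normalize) is one call away.
records = df_demo.to_dict(orient="records")
print(records[0])     # {'title': 'Movie A', 'id': 1}
```

Because the keys happen to match the constructor's parameter names, `pd.DataFrame(**split_oriented)` would build the same table.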

In [None]:
df3.info()

## Working with APIs and JSON (Part 1)

2. __Create an account__ on https://www.themoviedb.org/

3. Get your personal __API Key__

4. __API-Request__ (movie module): Load all available information for the movie with __movie id = 140607__ into a Pandas DataFrame. <br> See https://developers.themoviedb.org/3/movies/get-movie-details for more information

In [None]:
api_key="                      "
#api_key="api_key=insert api_key here"

In [None]:
movie_id=140607

In [None]:
movie_api="https://api.themoviedb.org/3/movie/{}?"
movie_api

In [None]:
url=movie_api.format(movie_id)+api_key
url
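
The URL above is assembled by string concatenation. As an alternative sketch, the query string can be built from a dict with the standard library's `urlencode`, which also takes care of escaping. The key value `YOUR_API_KEY` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

movie_id = 140607
params = {"api_key": "YOUR_API_KEY"}  # placeholder value, replace with your own key

# urlencode turns the dict into "key=value&key=value" with proper escaping.
url = "https://api.themoviedb.org/3/movie/{}?{}".format(movie_id, urlencode(params))
print(url)
# https://api.themoviedb.org/3/movie/140607?api_key=YOUR_API_KEY
```

`requests.get` also accepts such a dict through its `params` argument, in which case the base URL is passed without the trailing `?`.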

In [None]:
r=requests.get(url) #status code 200 means the request succeeded and the data was returned
r

In [None]:
data4=r.json()  #parses the JSON-encoded content of the response into Python objects
data4

In [None]:
type(data4)

In [None]:
pd.Series(data4)

In [None]:
df4=pd.Series(data4).to_frame().T #.T transposes the one-column frame into a one-row DataFrame
df4
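
The Series, `to_frame()` and `.T` steps can be followed on a small made-up dict. The sketch also shows `pd.DataFrame([movie])` as another way to get a one-row frame; the variable names are illustrative.

```python
import pandas as pd

# Made-up single-movie dict standing in for the API response.
movie = {"title": "Movie A", "id": 1, "genres": [{"id": 12, "name": "Adventure"}]}

as_series = pd.Series(movie)      # the dict keys become the index
as_column = as_series.to_frame()  # one column, one row per key
as_row = as_column.T              # transposed: one row, one column per key

print(as_column.shape)  # (3, 1)
print(as_row.shape)     # (1, 3)

# Wrapping the dict in a list gives a one-row frame directly.
alt = pd.DataFrame([movie])
print(alt.shape)        # (1, 3)
```

One difference worth noting: the transposed frame holds every column as `object` dtype, while `pd.DataFrame([movie])` infers a dtype per column.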

In [None]:
pd.json_normalize(data4,sep="_") #flatten/normalize some columns

In [None]:
pd.json_normalize(data=data4,record_path="genres",meta="title")

In [None]:
pd.json_normalize(data=data4,record_path="production_companies",meta="title")

5. __API-Request__ (discover module): Load all movies with a release date between 2020-01-01 and 2020-02-29 into a Pandas DataFrame. <br> See https://www.themoviedb.org/documentation/api/discover and https://developers.themoviedb.org/3/discover/movie-discover for more information.

In [None]:
discover_api="https://api.themoviedb.org/3/discover/movie?"

In [None]:
query="&primary_release_date.gte=2020-01-01&primary_release_date.lte=2020-02-29"

In [None]:
url=discover_api+api_key+query
url

In [None]:
data5=requests.get(url)
data5

In [None]:
data5=requests.get(url).json()

In [None]:
data5

In [None]:
pd.DataFrame(data5)

In [None]:
pd.DataFrame(data5["results"])

In [None]:
#second page of the results

In [None]:
query="&primary_release_date.gte=2020-01-01&primary_release_date.lte=2020-02-29&page=2"

In [None]:
url=discover_api+api_key+query

In [None]:
data6=requests.get(url).json()

In [None]:
data6

In [None]:
pd.DataFrame(data6)

In [None]:
pd.DataFrame(data6["results"])
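
The two requests above fetch page 1 and page 2 by hand. A loop can collect several pages into one DataFrame. The sketch below is hedged: `collect_pages` and `fake_fetch` are hypothetical helpers, and the stub returns made-up data so the example runs without network access. It assumes each response dict carries `results` and `total_pages` keys, as the discover responses do.

```python
import pandas as pd

def collect_pages(fetch_page, max_pages=5):
    """Collect the "results" lists of consecutive pages into one DataFrame.

    fetch_page is a hypothetical callable: page number -> parsed JSON dict
    with "results" and "total_pages" keys.
    """
    rows = []
    page = 1
    while page <= max_pages:
        payload = fetch_page(page)
        rows.extend(payload["results"])
        if page >= payload["total_pages"]:  # stop after the last available page
            break
        page += 1
    return pd.DataFrame(rows)

# Stub standing in for requests.get(url).json(); the data is made up.
def fake_fetch(page):
    return {
        "page": page,
        "total_pages": 2,
        "results": [{"id": page * 10 + i, "title": f"Movie {page}-{i}"} for i in range(3)],
    }

df_pages = collect_pages(fake_fetch)
print(df_pages.shape)  # (6, 2)
```

With the real API, `fetch_page` could wrap a `requests.get(...).json()` call whose URL ends in `&page={}` filled with the page number.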

## Importing and storing the movies database

6. __API-Request__ (movie module): Load all available information for the movies with movie id = [__299534, 19995, 140607, 299536, 597, 135397, 420818, 24428, 168259, 99861, 284054, 12445, 181808, 330457, 351286, 109445, 321612, 260513__] into a Pandas DataFrame and __save the dataset in a local json file__.

In [None]:
movie_id=[0,299534, 19995, 140607, 299536, 597, 135397, 420818, 24428, 168259, 99861, 284054, 12445, 181808, 330457, 351286, 109445, 321612, 260513]
# 18 valid ids and one invalid id, 0 (which doesn't exist)

In [None]:
basic_url="https://api.themoviedb.org/3/movie/{}?{}"
#two placeholders: one for the movie id and one for the api key

In [None]:
#for every movie in movie_id, request the JSON data and store it in json_list
#the two placeholders are replaced by the respective movie id and the api key
#r is the response object; status code 200 means the data is available
#ids that do not return 200 (such as the invalid id 0) are skipped
#finally, the complete json_list is converted into a DataFrame
json_list=[]
for movie in movie_id:
    url=basic_url.format(movie,api_key)
    r=requests.get(url)
    if r.status_code!=200:
        continue
    data=r.json()
    json_list.append(data)
df_movies=pd.DataFrame(json_list)
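
The loop above silently drops ids that fail. A possible variation, sketched here with stand-ins so it runs offline, also records which ids were skipped and why. `FakeResponse`, `fake_get`, `fetch_movies` and the example.invalid URL are all hypothetical; with the real API, `requests.get` and `basic_url` would take their place.

```python
import pandas as pd

class FakeResponse:
    """Minimal stand-in for a response object (made up for this sketch)."""
    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload

def fake_get(url):
    # Pretend that id 0 is unknown and every other id returns a small record.
    movie = int(url.rsplit("/", 1)[1].split("?")[0])
    if movie == 0:
        return FakeResponse(404)
    return FakeResponse(200, {"id": movie, "title": f"Movie {movie}"})

def fetch_movies(movie_ids, get,
                 url_template="https://example.invalid/movie/{}?{}",
                 key="api_key=PLACEHOLDER"):
    """Return (DataFrame of successful responses, list of (id, status) skipped)."""
    rows, skipped = [], []
    for movie in movie_ids:
        r = get(url_template.format(movie, key))
        if r.status_code == 200:
            rows.append(r.json())
        else:
            skipped.append((movie, r.status_code))
    return pd.DataFrame(rows), skipped

df_demo, skipped = fetch_movies([0, 597, 19995], fake_get)
print(len(df_demo), skipped)  # 2 [(0, 404)]
```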

In [None]:
requests.get(basic_url.format(0,api_key)).status_code #status code returned for the invalid id 0

In [None]:
df_movies

In [None]:
df_movies=df_movies.loc[:,["title","id","revenue","genres","belongs_to_collection","runtime"]].sort_values(by="revenue",ascending=False)
#keep six columns and sort by revenue from high to low

In [None]:
df_movies

In [None]:
#save the dataset to a JSON file
df_movies.to_json("movies.json",orient="records")

In [None]:
with open("movies.json") as f:
    data7=json.load(f)

In [None]:
pd.json_normalize(data7) #flatten the data

In [None]:
pd.json_normalize(data7,"genres","title")  #flatten genres column

## Importing and storing the movies database [Real world Scenario]

In [None]:
df_movies

In [None]:
#save it to a CSV file
df_movies.to_csv("movies_raw.csv",index=False)

In [None]:
df_movies_raw=pd.read_csv("movies_raw.csv")
df_movies_raw

In [None]:
#nested column genres
df_movies_raw["genres"]

In [None]:
df_movies_raw["genres"][0]
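
A CSV file stores every cell as text, so a nested column such as genres comes back from `pd.read_csv` as a string rather than a list of dicts. The sketch below reproduces this on made-up data with an in-memory buffer and shows one possible way to restore the structure with `ast.literal_eval`. That step is an assumption of this sketch, not something the notebook itself does.

```python
import ast
import io
import pandas as pd

# Made-up frame with a nested column, written to and read from an in-memory CSV.
df_demo = pd.DataFrame({
    "title": ["Movie A"],
    "genres": [[{"id": 12, "name": "Adventure"}]],
})

buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
buffer.seek(0)
df_back = pd.read_csv(buffer)

print(type(df_back.loc[0, "genres"]))  # <class 'str'>

# literal_eval safely parses the Python-literal text back into a list of dicts.
df_back["genres"] = df_back["genres"].apply(ast.literal_eval)
print(type(df_back.loc[0, "genres"]))  # <class 'list'>
```

Cells that were empty come back as NaN instead of text, so a real dataset would need those rows filtered or handled before applying `ast.literal_eval`.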

In [None]:
print("The End")