# Project 2: Data Import - Working with Web APIs and JSON (Movies Dataset)

# Project Brief for Self-Coders

Here you'll have the opportunity to code major parts of Project 2 on your own. If you need help or inspiration, have a look at the videos or the Jupyter Notebook with the full code. <br> <br>
Keep in mind that it's all about __getting the right results/conclusions__, not about reproducing identical code. Things can be coded in many different ways, so even if you reach the same conclusions, your code will very likely differ from ours. 

## Importing Data from JSON files 

1. __Import__ the json files __blockbusters.json__, __blockbusters2.json__, __blockbusters3.json__ and load the datasets into Pandas DataFrames.


## Working with APIs and JSON (Part 1)

2. __Create an account__ on https://www.themoviedb.org/

3. Get your personal __API Key__

4. __API-Request__ (movie module): Load all available information for the movie with __movie id = 140607__ into a Pandas DataFrame. <br> See https://developers.themoviedb.org/3/movies/get-movie-details for more information

## Working with APIs and JSON (Part 2)

5. __API-Request__ (discover module): Load all movies with __release date between 2020-01-01 and 2020-02-29__ into a Pandas DataFrame. <br>
See https://www.themoviedb.org/documentation/api/discover and https://developers.themoviedb.org/3/discover/movie-discover for more information.

## Importing and Saving the Movies Dataset (Best Practice)

6. __API-Request__ (movie module): Load all available information for the movies with movie id = [__299534, 19995, 140607, 299536, 597, 135397, 420818, 24428, 168259, 99861, 284054, 12445, 181808, 330457, 351286, 109445, 321612, 260513__] into a Pandas DataFrame and __save the dataset in a local json file__.

# +++++++++ See some Hints below +++++++++++++

# ++++++++++++++ Hints +++++++++++++++++++++

__Hints for 1.__ <br>
To load json files you can use 

In [None]:
import json

with open("filename.json") as f:
    data = json.load(f)

and 

In [None]:
pd.DataFrame(data), pd.read_json("filename.json"), pd.json_normalize(data)

The JSON files have the following orientations (important when using pd.read_json(), which takes them through its orient parameter):
- blockbusters.json -> records
- blockbusters2.json -> columns
- blockbusters3.json -> split 
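To see why the orientation matters, here is a minimal sketch that writes one small table in all three layouts and reads each back with the matching orient value. The table and the demo_*.json file names are made up for illustration; they are not the project files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small stand-in table (not the real blockbusters data).
demo = pd.DataFrame(
    {"title": ["Avatar", "Titanic"], "id": [19995, 597], "runtime": [162, 194]}
)

loaded = {}
with tempfile.TemporaryDirectory() as tmp:
    for orient in ("records", "columns", "split"):
        path = Path(tmp) / f"demo_{orient}.json"  # hypothetical file name
        demo.to_json(path, orient=orient)
        # The same orient value tells read_json how the file is laid out.
        loaded[orient] = pd.read_json(path, orient=orient)

for orient, frame in loaded.items():
    print(orient, frame.shape, list(frame.columns))
```

For the project files the same idea applies, e.g. passing orient="split" when reading blockbusters3.json.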

In [3]:
import pandas as pd

In [120]:
df1 = pd.read_json('blockbusters.json')
df1.head()

Unnamed: 0,title,id,revenue,genres,belongs_to_collection,runtime
0,Avengers: Endgame,299534,2797800564,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...","{'id': 86311, 'name': 'The Avengers Collection...",181
1,Avatar,19995,2787965087,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 87096, 'name': 'Avatar Collection', 'po...",162
2,Star Wars: The Force Awakens,140607,2068223624,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 10, 'name': 'Star Wars Collection', 'po...",136
3,Avengers: Infinity War,299536,2046239637,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...","{'id': 86311, 'name': 'The Avengers Collection...",149
4,Titanic,597,1845034188,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",,194


In [117]:
def extract_genre(row):
    # Join the ids and names of all genres of a movie into '|'-separated strings.
    genre_df = pd.DataFrame(row['genres']).astype(str)
    row['genre_id'] = genre_df['id'].str.cat(sep='|')
    row['genre_name'] = genre_df['name'].str.cat(sep='|')
    return row

def extract_belongs_to_collection(row):
    # Copy the collection fields into btc_* columns; movies without a collection stay unchanged.
    belongs_to_collection = row['belongs_to_collection']
    if isinstance(belongs_to_collection, dict):
        row['btc_id'] = belongs_to_collection['id']
        row['btc_name'] = belongs_to_collection['name']
        row['btc_poster_path'] = belongs_to_collection['poster_path']
        row['btc_backdrop_path'] = belongs_to_collection['backdrop_path']
    return row

In [121]:
df1 = df1.apply(extract_genre, axis=1)
df1 = df1.apply(extract_belongs_to_collection, axis=1)
df1 = df1.drop(columns=['genres', 'belongs_to_collection'])
df1.head()

Unnamed: 0,btc_backdrop_path,btc_id,btc_name,btc_poster_path,genre_id,genre_name,id,revenue,runtime,title
0,/zuW6fOiusv4X9nnW3paHGfXcSll.jpg,86311.0,The Avengers Collection,/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg,12|878|28,Adventure|Science Fiction|Action,299534,2797800564,181,Avengers: Endgame
1,/8nCr9W7sKus2q9PLbYsnT7iCkuT.jpg,87096.0,Avatar Collection,/nslJVsO58Etqkk17oXMuVK4gNOF.jpg,28|12|14|878,Action|Adventure|Fantasy|Science Fiction,19995,2787965087,162,Avatar
2,/d8duYyyC9J5T825Hg7grmaabfxQ.jpg,10.0,Star Wars Collection,/iTQHKziZy9pAAY4hHEDCGPaOvFC.jpg,28|12|878|14,Action|Adventure|Science Fiction|Fantasy,140607,2068223624,136,Star Wars: The Force Awakens
3,/zuW6fOiusv4X9nnW3paHGfXcSll.jpg,86311.0,The Avengers Collection,/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg,12|28|878,Adventure|Action|Science Fiction,299536,2046239637,149,Avengers: Infinity War
4,,,,,18|10749|53,Drama|Romance|Thriller,597,1845034188,194,Titanic
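As an alternative to the row-wise apply above, the nested collection can also be flattened with pd.json_normalize on the raw records. This is only a sketch: the two records are shortened copies of entries from blockbusters.json, and the resulting column names (belongs_to_collection_id etc.) differ from the btc_* names used above.

```python
import pandas as pd

# Shortened stand-ins for two entries of blockbusters.json.
records = [
    {
        "title": "Avatar",
        "id": 19995,
        "genres": [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}],
        "belongs_to_collection": {"id": 87096, "name": "Avatar Collection"},
    },
    {
        "title": "Titanic",
        "id": 597,
        "genres": [{"id": 18, "name": "Drama"}, {"id": 10749, "name": "Romance"}],
        "belongs_to_collection": None,  # no collection
    },
]

# Replace a missing collection by an empty dict so that json_normalize
# does not keep an extra, all-empty 'belongs_to_collection' column.
cleaned = [
    {**rec, "belongs_to_collection": rec["belongs_to_collection"] or {}}
    for rec in records
]

# Nested dicts become columns such as 'belongs_to_collection_id'.
flat = pd.json_normalize(cleaned, sep="_")

# Lists of genre dicts are not flattened automatically; join them by hand.
flat["genre_id"] = flat["genres"].apply(lambda gs: "|".join(str(g["id"]) for g in gs))
flat["genre_name"] = flat["genres"].apply(lambda gs: "|".join(g["name"] for g in gs))
flat = flat.drop(columns=["genres"])

print(flat)
```

Which approach is preferable depends on the data: json_normalize is shorter for nested dicts, while the apply-based functions give full control over column names.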


In [40]:
df2 = pd.read_json('blockbusters2.json')
df2.head()

Unnamed: 0,title,id,revenue,genres,belongs_to_collection,runtime
0,Avengers: Endgame,299534,2797800564,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...","{'id': 86311, 'name': 'The Avengers Collection...",181
1,Avatar,19995,2787965087,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 87096, 'name': 'Avatar Collection', 'po...",162
2,Star Wars: The Force Awakens,140607,2068223624,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 10, 'name': 'Star Wars Collection', 'po...",136
3,Avengers: Infinity War,299536,2046239637,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...","{'id': 86311, 'name': 'The Avengers Collection...",149
4,Titanic,597,1845034188,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",,194


In [50]:
import json
with open("blockbusters3.json") as f:
    data = json.load(f)

# The 'split' orientation stores the values, column labels and index separately.
df = pd.DataFrame(data['data'], columns=data['columns'], index=data['index'])
df.head()

Unnamed: 0,title,id,revenue,genres,belongs_to_collection,runtime
0,Avengers: Endgame,299534,2797800564,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...","{'id': 86311, 'name': 'The Avengers Collection...",181
1,Avatar,19995,2787965087,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 87096, 'name': 'Avatar Collection', 'po...",162
2,Star Wars: The Force Awakens,140607,2068223624,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 10, 'name': 'Star Wars Collection', 'po...",136
3,Avengers: Infinity War,299536,2046239637,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...","{'id': 86311, 'name': 'The Avengers Collection...",149
4,Titanic,597,1845034188,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",,194


In [53]:

with open("blockbusters.json") as f:
    data = json.load(f)

In [57]:
data

[{'title': 'Avengers: Endgame',
  'id': 299534,
  'revenue': 2797800564,
  'genres': [{'id': 12, 'name': 'Adventure'},
   {'id': 878, 'name': 'Science Fiction'},
   {'id': 28, 'name': 'Action'}],
  'belongs_to_collection': {'id': 86311,
   'name': 'The Avengers Collection',
   'poster_path': '/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg',
   'backdrop_path': '/zuW6fOiusv4X9nnW3paHGfXcSll.jpg'},
  'runtime': 181},
 {'title': 'Avatar',
  'id': 19995,
  'revenue': 2787965087,
  'genres': [{'id': 28, 'name': 'Action'},
   {'id': 12, 'name': 'Adventure'},
   {'id': 14, 'name': 'Fantasy'},
   {'id': 878, 'name': 'Science Fiction'}],
  'belongs_to_collection': {'id': 87096,
   'name': 'Avatar Collection',
   'poster_path': '/nslJVsO58Etqkk17oXMuVK4gNOF.jpg',
   'backdrop_path': '/8nCr9W7sKus2q9PLbYsnT7iCkuT.jpg'},
  'runtime': 162},
 {'title': 'Star Wars: The Force Awakens',
  'id': 140607,
  'revenue': 2068223624,
  'genres': [{'id': 28, 'name': 'Action'},
   {'id': 12, 'name': 'Adventure'},
   {'id': 8

__Hints for 4., 5., 6.__<br>
Make API GET requests with the requests library (import requests):

In [None]:
data = requests.get(url).json()
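The one-liner above silently returns the error body if the request fails (for example with a wrong API key). A slightly more defensive variant could look like the sketch below; fetch_json and its get parameter are hypothetical helper names introduced here, not part of requests.

```python
import requests


def fetch_json(url, get=requests.get, timeout=10):
    """Return the decoded JSON body of a GET request.

    `get` defaults to requests.get and can be swapped for a stand-in
    when no network or API key is available.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise instead of returning an error payload
    return response.json()
```

With the default arguments, data = fetch_json(url) behaves like the one-liner but raises an exception on HTTP errors instead of handing back an error message as data.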

__Hints for 4. and 6.__ <br> URL structure for the movie module:

"https://api.themoviedb.org/3/movie/insert_movie_id?api_key=insert_api_key" (replace "insert_movie_id" with the movie id and "insert_api_key" with your personal API key)
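One possible way to turn this URL structure into code for tasks 4 and 6 is sketched below. The helper names (build_movie_url, movies_to_frame, fake_fetch) and the file name movies_demo.json are assumptions made for this example. The sketch runs offline with a stand-in for the API; with a real key you would pass something like lambda url: requests.get(url).json() as fetch.

```python
import tempfile
from pathlib import Path

import pandas as pd

MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"


def build_movie_url(movie_id, api_key):
    """Fill movie id and API key into the movie-module URL."""
    return MOVIE_URL.format(movie_id=movie_id, api_key=api_key)


def movies_to_frame(movie_ids, api_key, fetch):
    """Request every movie once and stack the responses into one DataFrame.

    `fetch` is any callable that maps a URL to a decoded JSON dict.
    """
    rows = [fetch(build_movie_url(movie_id, api_key)) for movie_id in movie_ids]
    return pd.DataFrame(rows)


def fake_fetch(url):
    # Offline stand-in for the API: echoes the movie id found in the URL.
    movie_id = int(url.split("/movie/")[1].split("?")[0])
    return {"id": movie_id, "title": f"movie {movie_id}"}


df = movies_to_frame([140607, 597], "insert_api_key", fake_fetch)

# Save the dataset locally and read it back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "movies_demo.json"  # hypothetical file name
    df.to_json(path, orient="records")
    restored = pd.read_json(path, orient="records")

print(restored)
```

For a single movie (task 4), pd.DataFrame([data]) or pd.json_normalize(data) on the returned dict gives a one-row DataFrame.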

__Hints for 5.__<br>
URL structure for the discover module:

"https://api.themoviedb.org/3/discover/movie?api_key=insert_api_key&query1&query2..." (replace "insert_api_key" with your personal API key and append the appropriate query parameters)
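The query part can be assembled with urllib.parse.urlencode instead of string concatenation. The sketch below assumes that the discover endpoint filters by the parameters primary_release_date.gte and primary_release_date.lte and returns paged responses with the fields results and total_pages; please check these names against the documentation linked in task 5. The helpers build_discover_url, discover_movies and fake_fetch are hypothetical, and the stand-in fetch keeps the example runnable without a key.

```python
from urllib.parse import parse_qs, urlencode, urlparse

import pandas as pd

DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"


def build_discover_url(api_key, filters, page=1):
    """Append API key, page number and filter parameters as a query string."""
    params = {"api_key": api_key, "page": page, **filters}
    return f"{DISCOVER_URL}?{urlencode(params)}"


def discover_movies(api_key, filters, fetch, max_pages=5):
    """Collect the 'results' list of each page into one DataFrame.

    `fetch` maps a URL to a decoded JSON dict; `max_pages` caps the
    number of requests.
    """
    rows, page, total_pages = [], 1, 1
    while page <= min(total_pages, max_pages):
        data = fetch(build_discover_url(api_key, filters, page=page))
        rows.extend(data["results"])
        total_pages = data.get("total_pages", 1)
        page += 1
    return pd.DataFrame(rows)


def fake_fetch(url):
    # Offline stand-in that mimics a two-page response.
    page = int(parse_qs(urlparse(url).query)["page"][0])
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return {"page": page, "total_pages": 2, "results": pages[page]}


filters = {
    "primary_release_date.gte": "2020-01-01",  # assumed parameter name
    "primary_release_date.lte": "2020-02-29",  # assumed parameter name
}
df = discover_movies("insert_api_key", filters, fake_fetch)
print(build_discover_url("insert_api_key", filters))
print(df)
```

Passing the fetch function as an argument keeps the paging logic testable; with a real key it can be replaced by a function that calls requests.get(url).json().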