**Learning Pandas with movie data**

**Goals:**

Retrieve movie data from the TMDB API (themoviedb.org), map genre IDs to their genre names, and recommend movies by genre.

**Steps:**

0. Importing libraries

1. Example 1 (Discover)

    1.1. Request the data

    1.2. Adjust the dataframe

2. Example 2 (Genre)

    2.1. Request the data

    2.2. Manipulate the data

3. Combine Example 1 and Example 2

4. Recommend movies



---



# **0. Importing libraries**

In [32]:
import requests
import pprint
import pandas as pd
import json

# **1. Example 1 (Discover)**

## **1. Request the data**

In [33]:
# request the data
url = 'https://api.themoviedb.org/3/discover/movie?include_adult=false&include_video=false&language=en-US&page=10&sort_by=popularity.desc'

import os

# read the API token from the environment instead of hardcoding it in the notebook
# (TMDB_TOKEN is a hypothetical variable name; store your own API read access token there)
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {os.environ['TMDB_TOKEN']}"
}

response = requests.get(url, headers=headers)
pprint.pprint(response.text)

('{"page":10,"results":[{"adult":false,"backdrop_path":"/iaMua3MLv6KJse0MEjGy6NVbqIa.jpg","genre_ids":[16,10751,35],"id":870359,"original_language":"en","original_title":"Urkel '
 'Saves Santa: The Movie!","overview":"The holiday season has arrived, and '
 'brilliant but accident-prone Steve Urkel has already ruined his local '
 'celebration by publicly humiliating a shopping mall Santa. In his attempt to '
 'make things right and score some nice points with the big guy in the North '
 'Pole, Steve creates an invention that only makes things worse. Using his big '
 'brain and even bigger heart, Steve must find the real Santa to see if '
 'together they can help the city rediscover the holiday '
 'spirit.","popularity":144.091,"poster_path":"/hSBIk1EQrt2DJHQHjYqQCHQeHRt.jpg","release_date":"2023-11-21","title":"Urkel '
 'Saves Santa: The '
 'Movie!","video":false,"vote_average":5.7,"vote_count":11},{"adult":false,"backdrop_path":"/rRcNmiH55Tz0ugUsDUGmj8Bsa4V.jpg","genre_ids":[35,10749],

In [34]:
# check if the request was successful (it will show status code 200)
if response.status_code == 200:
    # parse the json response
    data = response.json()

    # extract the list of movies from the response
    movies = data.get('results', [])

    # create a dataframe
    df = pd.DataFrame(movies)
    print('It works.')
else:
    print('Error fetching data. Status code:', response.status_code)

It works.
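
The parsing step above can be tried without a network call. The sketch below assumes a hand-written payload shaped like the Discover response (a `results` list of movie dictionaries); `results_to_frame` and `sample_payload` are hypothetical names used only for illustration, not part of the original notebook.

```python
import pandas as pd

def results_to_frame(payload):
    """Build a DataFrame from a parsed Discover-style payload (hypothetical helper)."""
    # a missing 'results' key yields an empty DataFrame instead of a KeyError
    return pd.DataFrame(payload.get('results', []))

# hand-written sample shaped like the response above (not real API output)
sample_payload = {
    'page': 10,
    'results': [
        {'original_title': 'Titanic', 'genre_ids': [18, 10749], 'vote_average': 7.9},
        {'original_title': 'Garfield', 'genre_ids': [16, 35, 10751], 'vote_average': 5.7},
    ],
}

sample_df = results_to_frame(sample_payload)
empty_df = results_to_frame({})

print(sample_df.shape)
print(empty_df.empty)
```

Using `.get('results', [])` means an unexpected payload produces an empty dataframe that later cells can check with `.empty`, rather than failing at the parsing step.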


## **2. Adjust the dataframe**

In [35]:
# check the dataframe
df.head()

|   | adult | backdrop_path | genre_ids | id | original_language | original_title | overview | popularity | poster_path | release_date | title | video | vote_average | vote_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | False | /iaMua3MLv6KJse0MEjGy6NVbqIa.jpg | [16, 10751, 35] | 870359 | en | Urkel Saves Santa: The Movie! | The holiday season has arrived, and brilliant ... | 144.091 | /hSBIk1EQrt2DJHQHjYqQCHQeHRt.jpg | 2023-11-21 | Urkel Saves Santa: The Movie! | False | 5.7 | 11 |
| 1 | False | /rRcNmiH55Tz0ugUsDUGmj8Bsa4V.jpg | [35, 10749] | 884605 | en | No Hard Feelings | On the brink of losing her childhood home, Mad... | 149.885 | /gD72DhJ7NbfxvtxGiAzLaa0xaoj.jpg | 2023-06-15 | No Hard Feelings | False | 7.1 | 1825 |
| 2 | False | /5bJG7HaFogEqPEjRbOs8S0Szb4x.jpg | [16, 35, 10751] | 8920 | en | Garfield | Garfield, the fat, lazy, lasagna lover, has ev... | 153.701 | /vqwTSWNLyH55g8kBT61s2DgNYEp.jpg | 2004-06-10 | Garfield | False | 5.7 | 3682 |
| 3 | False | /rzdPqYx7Um4FUZeD8wpXqjAUcEm.jpg | [18, 10749] | 597 | en | Titanic | 101-year-old Rose DeWitt Bukater tells the sto... | 138.095 | /9xjZS2rlVxm8SFx8kPC3aIGCOYQ.jpg | 1997-11-18 | Titanic | False | 7.9 | 23901 |
| 4 | False | /xVMtv55caCEvBaV83DofmuZybmI.jpg | [53, 28] | 724209 | en | Heart of Stone | An intelligence operative for a shadowy global... | 154.951 | /vB8o2p4ETnrfiWEgVxHmHWP9yRl.jpg | 2023-08-09 | Heart of Stone | False | 6.9 | 1349 |


*We will drop the following columns because we will not use them: backdrop_path, id, title, video, adult and original_language.*

---



In [36]:
# drop some columns at the same time
df.drop(columns=['backdrop_path', 'id', 'title', 'video', 'adult',  'original_language'], inplace=True)
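
By default, `DataFrame.drop` raises a `KeyError` when a listed column is missing, so running the cell above a second time would fail because the columns are already gone. A minimal sketch of a rerunnable variant follows, assuming a small hand-written dataframe (`sample_df` is illustrative, not the API result):

```python
import pandas as pd

# hand-written sample with a few of the columns shown above (not real API output)
sample_df = pd.DataFrame({
    'id': [597, 8920],
    'title': ['Titanic', 'Garfield'],
    'original_title': ['Titanic', 'Garfield'],
    'video': [False, False],
})

unused = ['backdrop_path', 'id', 'title', 'video', 'adult', 'original_language']

# errors='ignore' skips labels that are not present, so the cell can be rerun safely;
# without inplace=True, drop returns a new dataframe and leaves sample_df untouched
trimmed = sample_df.drop(columns=unused, errors='ignore')
trimmed_again = trimmed.drop(columns=unused, errors='ignore')

print(list(trimmed.columns))
```

Returning a new dataframe instead of using `inplace=True` keeps the original available for comparison, which can be convenient while exploring.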

In [37]:
# check the dataframe again
df

|   | genre_ids | original_title | overview | popularity | poster_path | release_date | vote_average | vote_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [16, 10751, 35] | Urkel Saves Santa: The Movie! | The holiday season has arrived, and brilliant ... | 144.091 | /hSBIk1EQrt2DJHQHjYqQCHQeHRt.jpg | 2023-11-21 | 5.7 | 11 |
| 1 | [35, 10749] | No Hard Feelings | On the brink of losing her childhood home, Mad... | 149.885 | /gD72DhJ7NbfxvtxGiAzLaa0xaoj.jpg | 2023-06-15 | 7.1 | 1825 |
| 2 | [16, 35, 10751] | Garfield | Garfield, the fat, lazy, lasagna lover, has ev... | 153.701 | /vqwTSWNLyH55g8kBT61s2DgNYEp.jpg | 2004-06-10 | 5.7 | 3682 |
| 3 | [18, 10749] | Titanic | 101-year-old Rose DeWitt Bukater tells the sto... | 138.095 | /9xjZS2rlVxm8SFx8kPC3aIGCOYQ.jpg | 1997-11-18 | 7.9 | 23901 |
| 4 | [53, 28] | Heart of Stone | An intelligence operative for a shadowy global... | 154.951 | /vB8o2p4ETnrfiWEgVxHmHWP9yRl.jpg | 2023-08-09 | 6.9 | 1349 |
| 5 | [12, 35, 10751, 14] | Charlie and the Chocolate Factory | A young boy wins a tour through the most magni... | 147.973 | /wfGfxtBkhBzQfOZw4S8IQZgrH0a.jpg | 2005-07-13 | 7.0 | 14134 |
| 6 | [14, 28, 12] | Knights of the Zodiac | When a headstrong street orphan, Seiya, in sea... | 154.26 | /qW4crfED8mpNDadSmMdi7ZDzhXF.jpg | 2023-04-27 | 6.6 | 901 |
| 7 | [53, 28, 80] | Wrath of Man | A cold and mysterious new security guard for a... | 131.527 | /M7SUK85sKjaStg4TKhlAVyGlz3.jpg | 2021-04-22 | 7.7 | 4739 |
| 8 | [16, 18, 10402] | BLUE GIANT | Dai Miyamoto's life is turned upside down the ... | 134.264 | /auQ5ExCcuuudw0tQwG03obX5Zrh.jpg | 2023-02-17 | 9.0 | 6 |
| 9 | [14, 12] | Harry Potter and the Deathly Hallows: Part 2 | Harry, Ron and Hermione continue their quest t... | 147.761 | /c54HpQmuwXjHq2C9wmoACjxoom3.jpg | 2011-07-12 | 8.1 | 19397 |


# **2. Example 2 (Genre)**

## **1. Request the data**

In [38]:
# request the data
url = "https://api.themoviedb.org/3/genre/movie/list?language=en"

import os

# read the API token from the environment instead of hardcoding it in the notebook
# (TMDB_TOKEN is a hypothetical variable name; store your own API read access token there)
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {os.environ['TMDB_TOKEN']}"
}

response = requests.get(url, headers=headers)

pprint.pprint(response.text)

('{"genres":[{"id":28,"name":"Action"},{"id":12,"name":"Adventure"},{"id":16,"name":"Animation"},{"id":35,"name":"Comedy"},{"id":80,"name":"Crime"},{"id":99,"name":"Documentary"},{"id":18,"name":"Drama"},{"id":10751,"name":"Family"},{"id":14,"name":"Fantasy"},{"id":36,"name":"History"},{"id":27,"name":"Horror"},{"id":10402,"name":"Music"},{"id":9648,"name":"Mystery"},{"id":10749,"name":"Romance"},{"id":878,"name":"Science '
 'Fiction"},{"id":10770,"name":"TV '
 'Movie"},{"id":53,"name":"Thriller"},{"id":10752,"name":"War"},{"id":37,"name":"Western"}]}')


## **2. Manipulate the data**

In [39]:
# parse the json response
genre_data = response.json()
genre_data

{'genres': [{'id': 28, 'name': 'Action'},
  {'id': 12, 'name': 'Adventure'},
  {'id': 16, 'name': 'Animation'},
  {'id': 35, 'name': 'Comedy'},
  {'id': 80, 'name': 'Crime'},
  {'id': 99, 'name': 'Documentary'},
  {'id': 18, 'name': 'Drama'},
  {'id': 10751, 'name': 'Family'},
  {'id': 14, 'name': 'Fantasy'},
  {'id': 36, 'name': 'History'},
  {'id': 27, 'name': 'Horror'},
  {'id': 10402, 'name': 'Music'},
  {'id': 9648, 'name': 'Mystery'},
  {'id': 10749, 'name': 'Romance'},
  {'id': 878, 'name': 'Science Fiction'},
  {'id': 10770, 'name': 'TV Movie'},
  {'id': 53, 'name': 'Thriller'},
  {'id': 10752, 'name': 'War'},
  {'id': 37, 'name': 'Western'}]}

In [40]:
# map id to its name
genre_map = {
    i['id']: i['name']
    for i in genre_data.get('genres', [])
}
genre_map

{28: 'Action',
 12: 'Adventure',
 16: 'Animation',
 35: 'Comedy',
 80: 'Crime',
 99: 'Documentary',
 18: 'Drama',
 10751: 'Family',
 14: 'Fantasy',
 36: 'History',
 27: 'Horror',
 10402: 'Music',
 9648: 'Mystery',
 10749: 'Romance',
 878: 'Science Fiction',
 10770: 'TV Movie',
 53: 'Thriller',
 10752: 'War',
 37: 'Western'}
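
Since this notebook is about pandas, the same ID-to-name mapping can also be built through a dataframe. This is only a sketch of an alternative to the dict comprehension above, using a shortened copy of the genre list; `genre_df` is a name introduced here for illustration.

```python
import pandas as pd

# shortened copy of the genre list shown above
genre_data = {'genres': [
    {'id': 28, 'name': 'Action'},
    {'id': 12, 'name': 'Adventure'},
    {'id': 16, 'name': 'Animation'},
]}

genre_df = pd.DataFrame(genre_data.get('genres', []))

# index by id, keep the name column, and convert the resulting Series to a dict
genre_map = genre_df.set_index('id')['name'].to_dict()

print(genre_map)
```

The dict comprehension is shorter for a list this small; the dataframe route may be handier if the genre table is also needed for merges or lookups later.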

# **3. Combine Example 1 and Example 2**

In [41]:
# replace genre_id with genre name
df['genre_names'] = df['genre_ids'].apply(lambda x: [genre_map[i] for i in x])
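
The lookup `genre_map[i]` raises a `KeyError` if a movie carries a genre ID that is absent from the genre list. A tolerant variant could use `dict.get` with a fallback label, sketched below on hand-written data (the ID `99999`, the second movie title, and the `'Unknown'` label are made up for illustration):

```python
import pandas as pd

genre_map = {18: 'Drama', 10749: 'Romance'}

sample_df = pd.DataFrame({
    'original_title': ['Titanic', 'Hypothetical Movie'],
    'genre_ids': [[18, 10749], [18, 99999]],  # 99999 is a made-up ID
})

# .get returns the fallback instead of raising when an ID is not in the map
sample_df['genre_names'] = sample_df['genre_ids'].apply(
    lambda ids: [genre_map.get(i, 'Unknown') for i in ids]
)

print(sample_df['genre_names'].tolist())
```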

In [42]:
# check the dataframe
df

|   | genre_ids | original_title | overview | popularity | poster_path | release_date | vote_average | vote_count | genre_names |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [16, 10751, 35] | Urkel Saves Santa: The Movie! | The holiday season has arrived, and brilliant ... | 144.091 | /hSBIk1EQrt2DJHQHjYqQCHQeHRt.jpg | 2023-11-21 | 5.7 | 11 | [Animation, Family, Comedy] |
| 1 | [35, 10749] | No Hard Feelings | On the brink of losing her childhood home, Mad... | 149.885 | /gD72DhJ7NbfxvtxGiAzLaa0xaoj.jpg | 2023-06-15 | 7.1 | 1825 | [Comedy, Romance] |
| 2 | [16, 35, 10751] | Garfield | Garfield, the fat, lazy, lasagna lover, has ev... | 153.701 | /vqwTSWNLyH55g8kBT61s2DgNYEp.jpg | 2004-06-10 | 5.7 | 3682 | [Animation, Comedy, Family] |
| 3 | [18, 10749] | Titanic | 101-year-old Rose DeWitt Bukater tells the sto... | 138.095 | /9xjZS2rlVxm8SFx8kPC3aIGCOYQ.jpg | 1997-11-18 | 7.9 | 23901 | [Drama, Romance] |
| 4 | [53, 28] | Heart of Stone | An intelligence operative for a shadowy global... | 154.951 | /vB8o2p4ETnrfiWEgVxHmHWP9yRl.jpg | 2023-08-09 | 6.9 | 1349 | [Thriller, Action] |
| 5 | [12, 35, 10751, 14] | Charlie and the Chocolate Factory | A young boy wins a tour through the most magni... | 147.973 | /wfGfxtBkhBzQfOZw4S8IQZgrH0a.jpg | 2005-07-13 | 7.0 | 14134 | [Adventure, Comedy, Family, Fantasy] |
| 6 | [14, 28, 12] | Knights of the Zodiac | When a headstrong street orphan, Seiya, in sea... | 154.26 | /qW4crfED8mpNDadSmMdi7ZDzhXF.jpg | 2023-04-27 | 6.6 | 901 | [Fantasy, Action, Adventure] |
| 7 | [53, 28, 80] | Wrath of Man | A cold and mysterious new security guard for a... | 131.527 | /M7SUK85sKjaStg4TKhlAVyGlz3.jpg | 2021-04-22 | 7.7 | 4739 | [Thriller, Action, Crime] |
| 8 | [16, 18, 10402] | BLUE GIANT | Dai Miyamoto's life is turned upside down the ... | 134.264 | /auQ5ExCcuuudw0tQwG03obX5Zrh.jpg | 2023-02-17 | 9.0 | 6 | [Animation, Drama, Music] |
| 9 | [14, 12] | Harry Potter and the Deathly Hallows: Part 2 | Harry, Ron and Hermione continue their quest t... | 147.761 | /c54HpQmuwXjHq2C9wmoACjxoom3.jpg | 2011-07-12 | 8.1 | 19397 | [Fantasy, Adventure] |


*Since we now have a genre_names column, we will drop the genre_ids column.*

---



In [43]:
# drop genre_ids column
df.drop(columns=['genre_ids'], inplace=True)

In [44]:
# show the dataframe
df

|   | original_title | overview | popularity | poster_path | release_date | vote_average | vote_count | genre_names |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Urkel Saves Santa: The Movie! | The holiday season has arrived, and brilliant ... | 144.091 | /hSBIk1EQrt2DJHQHjYqQCHQeHRt.jpg | 2023-11-21 | 5.7 | 11 | [Animation, Family, Comedy] |
| 1 | No Hard Feelings | On the brink of losing her childhood home, Mad... | 149.885 | /gD72DhJ7NbfxvtxGiAzLaa0xaoj.jpg | 2023-06-15 | 7.1 | 1825 | [Comedy, Romance] |
| 2 | Garfield | Garfield, the fat, lazy, lasagna lover, has ev... | 153.701 | /vqwTSWNLyH55g8kBT61s2DgNYEp.jpg | 2004-06-10 | 5.7 | 3682 | [Animation, Comedy, Family] |
| 3 | Titanic | 101-year-old Rose DeWitt Bukater tells the sto... | 138.095 | /9xjZS2rlVxm8SFx8kPC3aIGCOYQ.jpg | 1997-11-18 | 7.9 | 23901 | [Drama, Romance] |
| 4 | Heart of Stone | An intelligence operative for a shadowy global... | 154.951 | /vB8o2p4ETnrfiWEgVxHmHWP9yRl.jpg | 2023-08-09 | 6.9 | 1349 | [Thriller, Action] |
| 5 | Charlie and the Chocolate Factory | A young boy wins a tour through the most magni... | 147.973 | /wfGfxtBkhBzQfOZw4S8IQZgrH0a.jpg | 2005-07-13 | 7.0 | 14134 | [Adventure, Comedy, Family, Fantasy] |
| 6 | Knights of the Zodiac | When a headstrong street orphan, Seiya, in sea... | 154.26 | /qW4crfED8mpNDadSmMdi7ZDzhXF.jpg | 2023-04-27 | 6.6 | 901 | [Fantasy, Action, Adventure] |
| 7 | Wrath of Man | A cold and mysterious new security guard for a... | 131.527 | /M7SUK85sKjaStg4TKhlAVyGlz3.jpg | 2021-04-22 | 7.7 | 4739 | [Thriller, Action, Crime] |
| 8 | BLUE GIANT | Dai Miyamoto's life is turned upside down the ... | 134.264 | /auQ5ExCcuuudw0tQwG03obX5Zrh.jpg | 2023-02-17 | 9.0 | 6 | [Animation, Drama, Music] |
| 9 | Harry Potter and the Deathly Hallows: Part 2 | Harry, Ron and Hermione continue their quest t... | 147.761 | /c54HpQmuwXjHq2C9wmoACjxoom3.jpg | 2011-07-12 | 8.1 | 19397 | [Fantasy, Adventure] |


# **4. Recommend movies**

In [49]:
# create a function that recommends movies
def movie_list(df, genre):
    # include only the rows where the 'genre_names' column contains the specified genre
    filtered_list = df[df['genre_names'].apply(lambda x: genre in x)]

    if not filtered_list.empty:
        print(f'Recommended {genre} movies ({len(filtered_list)}), ordered by vote average:')
        # sort from high to low by vote average
        sorted_df = filtered_list.sort_values(by='vote_average', ascending=False)
        recommendations = [f"{row['original_title']} --- {row['vote_average']} --- {row['release_date']}" for index, row in sorted_df.iterrows()]
        return recommendations

    else:
        print(f'No movies of genre {genre} found in the list, but I can recommend something else.')
        sorted_df = df.sort_values(by='vote_average', ascending=False)
        return sorted_df[['original_title', 'vote_average', 'release_date']]
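
Note that `movie_list` returns a list of strings when the genre is found and a dataframe otherwise, so callers have to handle two result types. One possible variant, sketched below, returns a dataframe in both cases. `recommend` is a hypothetical name, and the sample rows are copied by hand from the tables above.

```python
import pandas as pd

def recommend(df, genre):
    """Return matching movies sorted by vote_average, or all movies if none match."""
    columns = ['original_title', 'vote_average', 'release_date']
    # boolean mask: True where the row's genre list contains the requested genre
    mask = df['genre_names'].apply(lambda names: genre in names)
    # fall back to the full table when nothing matches
    candidates = df[mask] if mask.any() else df
    return candidates.sort_values(by='vote_average', ascending=False)[columns]

sample_df = pd.DataFrame({
    'original_title': ['Titanic', 'Garfield', 'Wrath of Man'],
    'vote_average': [7.9, 5.7, 7.7],
    'release_date': ['1997-11-18', '2004-06-10', '2021-04-22'],
    'genre_names': [
        ['Drama', 'Romance'],
        ['Animation', 'Comedy', 'Family'],
        ['Thriller', 'Action', 'Crime'],
    ],
})

print(recommend(sample_df, 'Drama'))
print(recommend(sample_df, 'Horror'))
```

With a single return type, the result can be passed straight to further pandas operations such as `.head(5)`, whichever branch was taken.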

In [53]:
# call function
movie_list(df, 'Adventure')

Recommended Adventure movies (8), ordered by vote average:


['Harry Potter and the Deathly Hallows: Part 2 --- 8.1 --- 2011-07-12',
 'The Bad Guys --- 7.6 --- 2022-03-17',
 'Avatar --- 7.6 --- 2009-12-15',
 'Charlie and the Chocolate Factory --- 7.0 --- 2005-07-13',
 'Iron Man 2 --- 6.8 --- 2010-04-28',
 'The Polar Express --- 6.7 --- 2004-11-10',
 'Knights of the Zodiac --- 6.6 --- 2023-04-27',
 'Die Schule der magischen Tiere 2 --- 6.2 --- 2022-09-29']

In [54]:
# call function
movie_list(df, 'Horror')

No movies of genre Horror found in the list, but I can recommend something else.


|   | original_title | vote_average | release_date |
| --- | --- | --- | --- |
| 8 | BLUE GIANT | 9.0 | 2023-02-17 |
| 9 | Harry Potter and the Deathly Hallows: Part 2 | 8.1 | 2011-07-12 |
| 3 | Titanic | 7.9 | 1997-11-18 |
| 14 | The Batman | 7.7 | 2022-03-01 |
| 7 | Wrath of Man | 7.7 | 2021-04-22 |
| 17 | Avatar | 7.6 | 2009-12-15 |
| 16 | The Bad Guys | 7.6 | 2022-03-17 |
| 11 | 流浪地球2 | 7.2 | 2023-01-22 |
| 1 | No Hard Feelings | 7.1 | 2023-06-15 |
| 15 | Silent Night | 7.1 | 2023-11-30 |
