<h1 align='center'> Movie Genre Predictor</h1>


## Overview

Welcome to the Movie Genre Predictor project! This project aims to predict movie genres from data obtained from The Movie Database (TMDB) API. The goal is to develop a machine learning model that accurately predicts a movie's genres given its title and overview.

## Business Problem

In movie production and recommendation systems, accurate genre prediction supports applications such as content recommendation, marketing strategy, and audience targeting. This project addresses the challenge of predicting movie genres effectively using machine learning techniques.

In [141]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns

### Data Acquisition

In [142]:
from credientials import api_key
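
The key is imported from a local `credientials` module that is not shown here. As a minimal sketch of one alternative, assuming the key is exported as an environment variable (the name `TMDB_API_KEY` and the helper `load_api_key` are hypothetical), it can be read without keeping a credentials file next to the notebook:

```python
import os


def load_api_key(var_name: str = "TMDB_API_KEY") -> str:
    """Read the TMDB API key from an environment variable.

    TMDB_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early rather than sending requests with a missing key
        raise RuntimeError(f"Set the {var_name} environment variable to your TMDB API key")
    return key
```

With the variable set, `api_key = load_api_key()` would replace the import above.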

In [143]:
import requests
response = requests.get(f"https://api.themoviedb.org/3/movie/top_rated?api_key={api_key}&language=en-US&page=1")
response.status_code

200
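
The query string above is assembled by hand in an f-string. A sketch of building the same URL from a parameter dictionary with the standard library's `urllib.parse.urlencode`, which also percent-encodes the values; `requests.get` accepts an equivalent dictionary through its `params` argument. The helper name `top_rated_url` is hypothetical:

```python
from urllib.parse import urlencode

# Endpoint used by the request above
TOP_RATED_ENDPOINT = "https://api.themoviedb.org/3/movie/top_rated"


def top_rated_url(api_key: str, page: int = 1, language: str = "en-US") -> str:
    """Build the top-rated URL for one page (hypothetical helper)."""
    # urlencode keeps the dict's insertion order and percent-encodes each value
    query = urlencode({"api_key": api_key, "language": language, "page": page})
    return f"{TOP_RATED_ENDPOINT}?{query}"


print(top_rated_url("KEY"))
# prints https://api.themoviedb.org/3/movie/top_rated?api_key=KEY&language=en-US&page=1
```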

In [144]:
response.json()

{'page': 1,
 'results': [{'adult': False,
   'backdrop_path': '/kXfqcdQKsToO0OUXHcrrNCHDBzO.jpg',
   'genre_ids': [18, 80],
   'id': 278,
   'original_language': 'en',
   'original_title': 'The Shawshank Redemption',
   'overview': 'Framed in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.',
   'popularity': 126.841,
   'poster_path': '/9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg',
   'release_date': '1994-09-23',
   'title': 'The Shawshank Redemption',
   'video': False,
   'vote_average': 8.704,
   'vote_count': 25817},
  {'adult': False,
   'backdrop_path': '/tmU7GeKVybMWFButWEGl2M4GeiP.jpg',
   'genre_ids': [18, 80],
   'id': 238,
   'original_language': 'en',
   'orig

In [145]:
response.json()["results"]

[{'adult': False,
  'backdrop_path': '/kXfqcdQKsToO0OUXHcrrNCHDBzO.jpg',
  'genre_ids': [18, 80],
  'id': 278,
  'original_language': 'en',
  'original_title': 'The Shawshank Redemption',
  'overview': 'Framed in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.',
  'popularity': 126.841,
  'poster_path': '/9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg',
  'release_date': '1994-09-23',
  'title': 'The Shawshank Redemption',
  'video': False,
  'vote_average': 8.704,
  'vote_count': 25817},
 {'adult': False,
  'backdrop_path': '/tmU7GeKVybMWFButWEGl2M4GeiP.jpg',
  'genre_ids': [18, 80],
  'id': 238,
  'original_language': 'en',
  'original_title': 'The Godfather',
  'overview':

In [146]:
movies = pd.DataFrame(response.json()["results"])[['genre_ids','title','overview']]
movies.sample(10)

|  | genre_ids | title | overview |
|---:|:---|:---|:---|
| 8 | [35, 53, 18] | Parasite | All unemployed, Ki-taek's family takes peculia... |
| 12 | [12, 14, 28] | The Lord of the Rings: The Return of the King | Aragorn is revealed as the heir to the ancient... |
| 3 | [18, 36, 10752] | Schindler's List | The true story of how businessman Oskar Schind... |
| 6 | [35, 18, 10749] | Dilwale Dulhania Le Jayenge | Raj is a rich, carefree, happy-go-lucky second... |
| 14 | [37] | The Good, the Bad and the Ugly | While the Civil War rages on between the Union... |
| 11 | [53, 80] | Pulp Fiction | A burger-loving hit man, his philosophical par... |
| 15 | [18, 80] | GoodFellas | The true story of Henry Hill, a half-Irish, ha... |
| 2 | [18, 80] | The Godfather Part II | In the continuing saga of the Corleone crime f... |
| 9 | [14, 18, 80] | The Green Mile | A supernatural tale set on death row in a Sout... |
| 19 | [28, 18] | Seven Samurai | A samurai answers a village's request for prot... |


In [147]:
movies['overview'].values

array(['Framed in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.',
       'Spanning the years 1945 to 1955, a chronicle of the fictional Italian-American Corleone crime family. When organized crime family patriarch, Vito Corleone barely survives an attempt on his life, his youngest son, Michael steps in to take care of the would-be killers, launching a campaign of bloody revenge.',
       'In the continuing saga of the Corleone crime family, a young Vito Corleone grows up in Sicily and in 1910s New York. In the 1950s, Michael Corleone attempts to expand the family business into Las Vegas, Hollywood and Cuba.',
       'The true story of how businessman Oskar Sc

In [148]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

In [149]:
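frames = []
for page_no in range(1, 464):
    response = requests.get(f"https://api.themoviedb.org/3/movie/top_rated?api_key={api_key}&language=en-US&page={page_no}")
    response.raise_for_status()  # stop on HTTP errors instead of failing later on a missing 'results' key
    tmp_df = pd.DataFrame(response.json()["results"])[['genre_ids', 'title', 'overview']]
    frames.append(tmp_df)
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; concatenate all pages once instead
movies = pd.concat(frames, ignore_index=True)
movies.head()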

|  | genre_ids | title | overview |
|---:|:---|:---|:---|
| 0 | [18, 80] | The Shawshank Redemption | Framed in the 1940s for the double murder of h... |
| 1 | [18, 80] | The Godfather | Spanning the years 1945 to 1955, a chronicle o... |
| 2 | [18, 80] | The Godfather Part II | In the continuing saga of the Corleone crime f... |
| 3 | [18, 36, 10752] | Schindler's List | The true story of how businessman Oskar Schind... |
| 4 | [16, 10751, 14] | Spirited Away | A young girl, Chihiro, becomes trapped in a st... |
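
The loop above hardcodes 463 pages. A sketch that instead reads the page count from the first response, assuming each payload carries a `total_pages` field next to `page` and `results` (check a live response before relying on it). The page fetcher is passed in as a function so the logic can run without network access; `fetch_all_results` and `fake_fetch` are hypothetical names:

```python
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]


def fetch_all_results(
    fetch_page: Callable[[int], Payload], max_pages: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Collect the 'results' of every page, using total_pages from page 1."""
    first = fetch_page(1)
    # Assumption: every payload reports total_pages alongside page and results
    total_pages = int(first.get("total_pages", 1))
    if max_pages is not None:
        total_pages = min(total_pages, max_pages)  # optional caller-imposed cap
    results = list(first["results"])
    for page_no in range(2, total_pages + 1):
        results.extend(fetch_page(page_no)["results"])
    return results


# Offline stand-in for the TMDB request: three pages of two dummy movies each
def fake_fetch(page_no: int) -> Payload:
    return {
        "page": page_no,
        "total_pages": 3,
        "results": [{"title": f"Movie {page_no}-{i}"} for i in range(2)],
    }


all_results = fetch_all_results(fake_fetch)
print(len(all_results))  # 6
```

Against the live API, `fetch_page` would wrap the `requests.get(...).json()` call from the loop above with `page_no` as its argument, and `pd.DataFrame(all_results)[['genre_ids', 'title', 'overview']]` would rebuild the same table.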


In [150]:
movies.shape

(9251, 3)

In [151]:
response = requests.get(f"https://api.themoviedb.org/3/genre/movie/list?api_key={api_key}&language=en-US")

In [152]:
response.json()

{'genres': [{'id': 28, 'name': 'Action'},
  {'id': 12, 'name': 'Adventure'},
  {'id': 16, 'name': 'Animation'},
  {'id': 35, 'name': 'Comedy'},
  {'id': 80, 'name': 'Crime'},
  {'id': 99, 'name': 'Documentary'},
  {'id': 18, 'name': 'Drama'},
  {'id': 10751, 'name': 'Family'},
  {'id': 14, 'name': 'Fantasy'},
  {'id': 36, 'name': 'History'},
  {'id': 27, 'name': 'Horror'},
  {'id': 10402, 'name': 'Music'},
  {'id': 9648, 'name': 'Mystery'},
  {'id': 10749, 'name': 'Romance'},
  {'id': 878, 'name': 'Science Fiction'},
  {'id': 10770, 'name': 'TV Movie'},
  {'id': 53, 'name': 'Thriller'},
  {'id': 10752, 'name': 'War'},
  {'id': 37, 'name': 'Western'}]}

In [153]:
response.json()['genres']

[{'id': 28, 'name': 'Action'},
 {'id': 12, 'name': 'Adventure'},
 {'id': 16, 'name': 'Animation'},
 {'id': 35, 'name': 'Comedy'},
 {'id': 80, 'name': 'Crime'},
 {'id': 99, 'name': 'Documentary'},
 {'id': 18, 'name': 'Drama'},
 {'id': 10751, 'name': 'Family'},
 {'id': 14, 'name': 'Fantasy'},
 {'id': 36, 'name': 'History'},
 {'id': 27, 'name': 'Horror'},
 {'id': 10402, 'name': 'Music'},
 {'id': 9648, 'name': 'Mystery'},
 {'id': 10749, 'name': 'Romance'},
 {'id': 878, 'name': 'Science Fiction'},
 {'id': 10770, 'name': 'TV Movie'},
 {'id': 53, 'name': 'Thriller'},
 {'id': 10752, 'name': 'War'},
 {'id': 37, 'name': 'Western'}]

In [154]:
genres = pd.DataFrame(response.json()['genres'])
genres

|  | id | name |
|---:|---:|:---|
| 0 | 28 | Action |
| 1 | 12 | Adventure |
| 2 | 16 | Animation |
| 3 | 35 | Comedy |
| 4 | 80 | Crime |
| 5 | 99 | Documentary |
| 6 | 18 | Drama |
| 7 | 10751 | Family |
| 8 | 14 | Fantasy |
| 9 | 36 | History |


In [162]:
genres.to_csv('Datasets/Genres.csv')

In [155]:
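def merge_genres(genre_ids):
    # Look up the name of each TMDB genre id in the genres table, keeping the original order
    genre_names = []
    for genre_id in genre_ids:
        genre_name = genres[genres['id'] == genre_id]['name'].values[0]
        genre_names.append(genre_name)
    # Return a single comma-separated string, e.g. "Drama,Crime"
    return ','.join(genre_names)

# Add a human-readable genres column alongside the numeric ids
movies['genres'] = movies['genre_ids'].apply(merge_genres)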
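
`merge_genres` filters the whole `genres` table once per id. Because the genre list is a small, fixed id-to-name mapping, a dictionary built once gives the same comma-joined string with a constant-time lookup per id. A sketch using a subset of the `/genre/movie/list` payload shown earlier (`genre_lookup` and `ids_to_names` are hypothetical names):

```python
from typing import Dict, Iterable, List

# Subset of the /genre/movie/list payload (ids and names from the output above)
genre_payload: List[dict] = [
    {"id": 28, "name": "Action"},
    {"id": 16, "name": "Animation"},
    {"id": 80, "name": "Crime"},
    {"id": 18, "name": "Drama"},
    {"id": 10751, "name": "Family"},
    {"id": 14, "name": "Fantasy"},
]

# Build the id -> name mapping once
genre_lookup: Dict[int, str] = {g["id"]: g["name"] for g in genre_payload}


def ids_to_names(genre_ids: Iterable[int]) -> str:
    """Join genre names in the order the ids appear, like merge_genres."""
    # .get with a placeholder avoids the IndexError merge_genres raises on an unknown id
    return ",".join(genre_lookup.get(gid, f"Unknown({gid})") for gid in genre_ids)


print(ids_to_names([16, 10751, 14]))  # Animation,Family,Fantasy
```

With pandas, the same mapping can be built from the table as `dict(zip(genres['id'], genres['name']))` and applied with `movies['genre_ids'].apply(ids_to_names)`.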

In [156]:
movies.head()

|  | genre_ids | title | overview | genres |
|---:|:---|:---|:---|:---|
| 0 | [18, 80] | The Shawshank Redemption | Framed in the 1940s for the double murder of h... | Drama,Crime |
| 1 | [18, 80] | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | Drama,Crime |
| 2 | [18, 80] | The Godfather Part II | In the continuing saga of the Corleone crime f... | Drama,Crime |
| 3 | [18, 36, 10752] | Schindler's List | The true story of how businessman Oskar Schind... | Drama,History,War |
| 4 | [16, 10751, 14] | Spirited Away | A young girl, Chihiro, becomes trapped in a st... | Animation,Family,Fantasy |


In [157]:
movies.drop(columns=['genre_ids'], inplace=True)
movies.head()

|  | title | overview | genres |
|---:|:---|:---|:---|
| 0 | The Shawshank Redemption | Framed in the 1940s for the double murder of h... | Drama,Crime |
| 1 | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | Drama,Crime |
| 2 | The Godfather Part II | In the continuing saga of the Corleone crime f... | Drama,Crime |
| 3 | Schindler's List | The true story of how businessman Oskar Schind... | Drama,History,War |
| 4 | Spirited Away | A young girl, Chihiro, becomes trapped in a st... | Animation,Family,Fantasy |


In [158]:
movies.to_csv('Datasets/TMDB_Movies.csv')

In [160]:
movies = pd.read_csv('Datasets/TMDB_Movies.csv')[['title', 'overview','genres']]
movies.shape

(9251, 3)
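
Selecting `['title', 'overview', 'genres']` after `read_csv` is needed because `to_csv` writes the DataFrame index by default, and it is read back as an extra column named `Unnamed: 0`. A small sketch of the round trip on a two-row stand-in frame, showing that `index=False` on write, or `index_col=0` on read, avoids the extra column:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "title": ["The Godfather", "Spirited Away"],
    "genres": ["Drama,Crime", "Animation,Family,Fantasy"],
})

# Default to_csv writes the index as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'title', 'genres']

# Writing without the index round-trips cleanly
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(no_index.columns))  # ['title', 'genres']

# Alternatively, read the first column back as the index
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(list(restored.columns))  # ['title', 'genres']
```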

In [161]:
movies.head()

|  | title | overview | genres |
|---:|:---|:---|:---|
| 0 | The Shawshank Redemption | Framed in the 1940s for the double murder of h... | Drama,Crime |
| 1 | The Godfather | Spanning the years 1945 to 1955, a chronicle o... | Drama,Crime |
| 2 | The Godfather Part II | In the continuing saga of the Corleone crime f... | Drama,Crime |
| 3 | Schindler's List | The true story of how businessman Oskar Schind... | Drama,History,War |
| 4 | Spirited Away | A young girl, Chihiro, becomes trapped in a st... | Animation,Family,Fantasy |
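
Because a movie can carry several genres, the comma-joined `genres` column describes a multi-label target. One common way to prepare such a target for a classifier (an assumption about the next step, not something shown above) is a multi-hot matrix with one 0/1 column per genre. A standard-library sketch on the first rows above (`multi_hot` is a hypothetical helper); on a pandas Series, `movies['genres'].str.get_dummies(sep=',')` produces an equivalent 0/1 table:

```python
from typing import List, Sequence, Tuple


def multi_hot(genre_strings: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Turn 'Drama,Crime'-style strings into a sorted vocabulary and 0/1 rows."""
    # Split each string on commas, dropping empty pieces
    parsed = [[g.strip() for g in s.split(",") if g.strip()] for s in genre_strings]
    # Sorted list of distinct genre names defines the column order
    vocab = sorted({g for genres in parsed for g in genres})
    position = {g: i for i, g in enumerate(vocab)}
    rows = []
    for genres in parsed:
        row = [0] * len(vocab)
        for g in genres:
            row[position[g]] = 1
        rows.append(row)
    return vocab, rows


vocab, rows = multi_hot(["Drama,Crime", "Drama,History,War", "Animation,Family,Fantasy"])
print(vocab)    # ['Animation', 'Crime', 'Drama', 'Family', 'Fantasy', 'History', 'War']
print(rows[1])  # [0, 0, 1, 0, 0, 1, 1]
```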
