# INTRODUCTION

My final project is a data analysis of popular movies and their ratings. With it, I hope to shed some light on how well different genres of movies do at the box office and how they are rated by moviegoers, so that others can make an informed decision about whether or not they want to see a movie. As the entertainment industry continues to grow, the choice of content becomes ever wider. I hope that my analysis will help narrow down that choice when deciding how to spend your free time and your money. Many people do not have much money to spare, so when they do save enough to enjoy some entertainment, it would be helpful to know which kinds of movies tend to be better received than others, so they don’t waste their money on something boring or bad.

I plan to gather my data with the TMDB (The Movie Database) API. The Movie Database is a popular website where anyone can make an account and post a review of a movie they have seen. It also collects data on the cast and crew of each movie. I will use the API to request information about the highest rated movies as well as the top grossing ones. I can then clean the data and plot it, so that those who see my project can draw their own conclusions about how best to choose an entertaining movie.


In [1]:
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import requests

- **re**: This will be used to clean the data involving strings
- **numpy**: This will be used for cleaning the data and indexing
- **pandas**: This will be used for creating dataframes and doing operations on them
- **matplotlib**: This will be used largely as a basis for other modules and maybe some minor plotting
- **seaborn**: This will be used for generating visualizations and plots with the data
- **requests**: This will be used for making requests to the API used for collecting the data

In [2]:
import os
# Read credentials from the environment instead of hard-coding them (the variable names are assumptions)
api_key = os.environ.get('TMDB_API_KEY')
token_maybe = os.environ.get('TMDB_READ_TOKEN')
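
The credentials above are meant for requests to the TMDB API. As a minimal sketch of what such a request might look like, the block below prepares a call for one page of top-rated movies. It assumes the TMDB v3 endpoint `/movie/top_rated`, which takes the key as an `api_key` query parameter and returns its movies under a `results` key. The function names `build_top_rated_request` and `fetch_top_rated` are hypothetical and not part of this notebook.

```python
import requests

# Base URL of the TMDB v3 API (assumed here; check the TMDB docs before relying on it).
TMDB_BASE_URL = "https://api.themoviedb.org/3"


def build_top_rated_request(api_key, page=1):
    """Prepare (but do not send) a GET request for one page of top-rated movies."""
    request = requests.Request(
        "GET",
        f"{TMDB_BASE_URL}/movie/top_rated",
        params={"api_key": api_key, "page": page},
    )
    return request.prepare()


def fetch_top_rated(api_key, page=1, timeout=10):
    """Send the prepared request and return the list stored under 'results'."""
    prepared = build_top_rated_request(api_key, page)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of failing silently.
    response.raise_for_status()
    return response.json().get("results", [])
```

Splitting the request into a build step and a send step makes it possible to inspect the URL without touching the network. A call such as `pd.DataFrame(fetch_top_rated(api_key))` would then turn one page of results into a dataframe.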

In [3]:
imdb = pd.read_csv('imdb_top_1000.csv')
imdb.head()

| | Poster_Link | Series_Title | Released_Year | Certificate | Runtime | Genre | IMDB_Rating | Overview | Meta_score | Director | Star1 | Star2 | Star3 | Star4 | No_of_Votes | Gross |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://m.media-amazon.com/images/M/MV5BMDFkYT... | The Shawshank Redemption | 1994 | A | 142 min | Drama | 9.3 | Two imprisoned men bond over a number of years... | 80.0 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 2343110 | 28341469 |
| 1 | https://m.media-amazon.com/images/M/MV5BM2MyNj... | The Godfather | 1972 | A | 175 min | Crime, Drama | 9.2 | An organized crime dynasty's aging patriarch t... | 100.0 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 1620367 | 134966411 |
| 2 | https://m.media-amazon.com/images/M/MV5BMTMxNT... | The Dark Knight | 2008 | UA | 152 min | Action, Crime, Drama | 9.0 | When the menace known as the Joker wreaks havo... | 84.0 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 2303232 | 534858444 |
| 3 | https://m.media-amazon.com/images/M/MV5BMWMwMG... | The Godfather: Part II | 1974 | A | 202 min | Crime, Drama | 9.0 | The early life and career of Vito Corleone in ... | 90.0 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 1129952 | 57300000 |
| 4 | https://m.media-amazon.com/images/M/MV5BMWU4N2... | 12 Angry Men | 1957 | U | 96 min | Crime, Drama | 9.0 | A jury holdout attempts to prevent a miscarria... | 96.0 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 689845 | 4360000 |


In [4]:
imdb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Poster_Link    1000 non-null   object 
 1   Series_Title   1000 non-null   object 
 2   Released_Year  1000 non-null   object 
 3   Certificate    899 non-null    object 
 4   Runtime        1000 non-null   object 
 5   Genre          1000 non-null   object 
 6   IMDB_Rating    1000 non-null   float64
 7   Overview       1000 non-null   object 
 8   Meta_score     843 non-null    float64
 9   Director       1000 non-null   object 
 10  Star1          1000 non-null   object 
 11  Star2          1000 non-null   object 
 12  Star3          1000 non-null   object 
 13  Star4          1000 non-null   object 
 14  No_of_Votes    1000 non-null   int64  
 15  Gross          831 non-null    object 
dtypes: float64(2), int64(1), object(13)
memory usage: 125.1+ KB
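
The summary above shows `Released_Year`, `Runtime`, and `Gross` stored as `object` rather than as numbers, and `Gross` has missing values. A rough sketch of how those columns could be converted before plotting is shown below. The helper name `clean_numeric_columns` and the sample rows are hypothetical. The sketch also assumes that `Gross` may contain thousands separators and that some years may be malformed, neither of which is confirmed by the output above.

```python
import pandas as pd


def clean_numeric_columns(df):
    """Return a copy of df with year, runtime, and gross converted to numbers."""
    cleaned = df.copy()
    # Values that cannot be parsed as a year become NaN instead of raising.
    cleaned["Released_Year"] = pd.to_numeric(cleaned["Released_Year"], errors="coerce")
    # Pull the digits out of strings such as "142 min".
    cleaned["Runtime"] = pd.to_numeric(
        cleaned["Runtime"].str.extract(r"(\d+)", expand=False), errors="coerce"
    )
    # Drop thousands separators (assumed), then convert; missing values stay NaN.
    cleaned["Gross"] = pd.to_numeric(
        cleaned["Gross"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )
    return cleaned


# Tiny hypothetical sample, including one bad year and one missing gross.
sample = pd.DataFrame({
    "Released_Year": ["1994", "unknown"],
    "Runtime": ["142 min", "175 min"],
    "Gross": ["28,341,469", None],
})
cleaned_sample = clean_numeric_columns(sample)
print(cleaned_sample.dtypes)
```

Using `errors="coerce"` keeps the conversion from failing on a single bad row. The resulting NaN values can then be counted with `isna().sum()` and dropped or filled as the analysis requires.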


In [7]:
imdb.drop(columns=['Overview', 'Poster_Link', 'No_of_Votes'], inplace=True)
imdb.head()

| | Series_Title | Released_Year | Certificate | Runtime | Genre | IMDB_Rating | Meta_score | Director | Star1 | Star2 | Star3 | Star4 | Gross |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | 1994 | A | 142 min | Drama | 9.3 | 80.0 | Frank Darabont | Tim Robbins | Morgan Freeman | Bob Gunton | William Sadler | 28341469 |
| 1 | The Godfather | 1972 | A | 175 min | Crime, Drama | 9.2 | 100.0 | Francis Ford Coppola | Marlon Brando | Al Pacino | James Caan | Diane Keaton | 134966411 |
| 2 | The Dark Knight | 2008 | UA | 152 min | Action, Crime, Drama | 9.0 | 84.0 | Christopher Nolan | Christian Bale | Heath Ledger | Aaron Eckhart | Michael Caine | 534858444 |
| 3 | The Godfather: Part II | 1974 | A | 202 min | Crime, Drama | 9.0 | 90.0 | Francis Ford Coppola | Al Pacino | Robert De Niro | Robert Duvall | Diane Keaton | 57300000 |
| 4 | 12 Angry Men | 1957 | U | 96 min | Crime, Drama | 9.0 | 96.0 | Sidney Lumet | Henry Fonda | Lee J. Cobb | Martin Balsam | John Fiedler | 4360000 |


In [8]:
imdb.columns

Index(['Series_Title', 'Released_Year', 'Certificate', 'Runtime', 'Genre',
       'IMDB_Rating', 'Meta_score', 'Director', 'Star1', 'Star2', 'Star3',
       'Star4', 'Gross'],
      dtype='object')
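
Because the `Genre` column holds several comma-separated genres per movie (for example "Action, Crime, Drama"), comparing genres requires one row per movie and genre. The sketch below shows one possible way to do that with `str.split` and `explode`, followed by a mean rating per genre. The helper name `mean_rating_by_genre` is hypothetical, and the sample reuses three rows from the preview above.

```python
import pandas as pd


def mean_rating_by_genre(df):
    """Average IMDB_Rating per genre, counting a movie once for each of its genres."""
    # Turn "Crime, Drama" into ["Crime", " Drama"], then give each genre its own row.
    exploded = df.assign(Genre=df["Genre"].str.split(",")).explode("Genre")
    # Remove the whitespace left over from splitting on the comma.
    exploded["Genre"] = exploded["Genre"].str.strip()
    return (
        exploded.groupby("Genre")["IMDB_Rating"]
        .mean()
        .sort_values(ascending=False)
    )


# Three rows taken from the preview of the dataset.
sample = pd.DataFrame({
    "Series_Title": ["The Shawshank Redemption", "The Godfather", "The Dark Knight"],
    "Genre": ["Drama", "Crime, Drama", "Action, Crime, Drama"],
    "IMDB_Rating": [9.3, 9.2, 9.0],
})
genre_means = mean_rating_by_genre(sample)
print(genre_means)
```

Note that a movie with three genres contributes to three averages, so the per-genre counts add up to more than the number of movies. The same pattern would work for `Gross` once that column has been converted to a numeric type.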