# Capstone Breakdown Group 1A

**Computing Vision (a made-up company for the purposes of this project) sees all the big companies creating original video content and they want to get in on the fun. They have decided to create a new movie studio, but they don’t have much background in creating movies. You are charged with exploring what types of films are currently doing the best at the box office using different samples of available data. You then will translate those findings into actionable insights that the head of Computing Vision's new movie studio can use to help decide what type of films to create.**

## EDA process

Our project is divided into the following two umbrella categories:

PROFIT:

- Budget vs. Revenue
- Genre vs. Revenue
- Popularity vs. Revenue
- Foreign/Domestic Results vs. Revenue

POPULARITY:

- Director vs. Popularity/Voter Avg.
- Genre vs. Popularity/Voter Avg.
- Domestic/International vs. Popularity/Voter Avg.

## Specific insight

We are performing EDA to answer questions about the following specific points:

- Revenue vs. rating of the film (critics and audience)
- IP and foreign/domestic revenue
- Original language vs. revenue
- Market: domestic/global
- Writers and directors vs. revenue

## Import Packages

In [1]:
# Import packages

import numpy as np
import pandas as pd
import sqlite3
import seaborn as sns
import matplotlib.pyplot as plt 
import itertools
%matplotlib inline

## Read In Data

In [2]:
# Read data sets

rtDF = pd.read_csv("Data/rt.movie_info.tsv", sep="\t") #Rotten Tomatoes Movies
rtDF_reviews = pd.read_csv("Data/rt.reviews.tsv", sep="\t", encoding = "latin_1") #Rotten Tomatoes Reviews
bomDF = pd.read_csv("Data/bom.movie_gross.csv") #Box Office Mojo Database
tmdbDF = pd.read_csv("Data/tmdb.movies.csv",index_col=0) #The MovieDB
tnmDF = pd.read_csv("Data/tn.movie_budgets.csv") #The Numbers

conn = sqlite3.connect('Data/im.db')
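
The cell above opens a connection to `Data/im.db` but does not inspect it. A minimal sketch of how the available tables could be listed, assuming a standard SQLite file. `list_tables` is a hypothetical helper, and the demo table names are invented for illustration and may not match the real database.

```python
import sqlite3
import pandas as pd

def list_tables(conn):
    """Return the names of all tables in a SQLite database as a sorted list."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    return pd.read_sql(query, conn)["name"].tolist()

# Demonstration on a throwaway in-memory database (table names are made up)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE movie_basics (movie_id TEXT, primary_title TEXT)")
demo_conn.execute("CREATE TABLE movie_ratings (movie_id TEXT, averagerating REAL)")
tables = list_tables(demo_conn)
print(tables)
```

In the notebook, the same helper could be called as `list_tables(conn)` to decide which tables are worth loading.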

## Preview Data

In [3]:
#Visualize rotten tomatoes
print(rtDF.info())
rtDF.head()


#Visualize rotten tomatoes reviews
print(rtDF_reviews.info())
rtDF_reviews.head()


#Visualize Box office mojo
print(bomDF.info())
bomDF.head()


#Visualize the movieDB
print(tmdbDF.info())
tmdbDF.head()


#Visualize the numbers
print(tnmDF.info())
tnmDF.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1560 entries, 0 to 1559
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            1560 non-null   int64 
 1   synopsis      1498 non-null   object
 2   rating        1557 non-null   object
 3   genre         1552 non-null   object
 4   director      1361 non-null   object
 5   writer        1111 non-null   object
 6   theater_date  1201 non-null   object
 7   dvd_date      1201 non-null   object
 8   currency      340 non-null    object
 9   box_office    340 non-null    object
 10  runtime       1530 non-null   object
 11  studio        494 non-null    object
dtypes: int64(1), object(11)
memory usage: 146.4+ KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54432 entries, 0 to 54431
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          54432 non-null  int64 
 1   review      48869 non-null  object
...

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"


## Cleaning Up Data

### Rotten Tomatoes DF Cleaning

In [4]:
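# Drop sparse columns (currency, box_office, studio) and columns not used in this analysis (synopsis, dvd_date)
rtDF.drop(columns=['currency','box_office','studio','synopsis','dvd_date'],inplace=True)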

In [5]:
rtDF.head()

Unnamed: 0,id,rating,genre,director,writer,theater_date,runtime
0,1,R,Action and Adventure|Classics|Drama,William Friedkin,Ernest Tidyman,"Oct 9, 1971",104 minutes
1,3,R,Drama|Science Fiction and Fantasy,David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012",108 minutes
2,5,R,Drama|Musical and Performing Arts,Allison Anders,Allison Anders,"Sep 13, 1996",116 minutes
3,6,R,Drama|Mystery and Suspense,Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994",128 minutes
4,7,NR,Drama|Romance,Rodney Bennett,Giles Cooper,,200 minutes


In [6]:
def splits(string):
    """Split a pipe-delimited genre string into a list; missing values become an empty list."""
    if pd.isna(string):
        return []
    return str(string).split("|")
rtDF["genre"] = rtDF["genre"].apply(splits)
rtDF

Unnamed: 0,id,rating,genre,director,writer,theater_date,runtime
0,1,R,"[Action and Adventure, Classics, Drama]",William Friedkin,Ernest Tidyman,"Oct 9, 1971",104 minutes
1,3,R,"[Drama, Science Fiction and Fantasy]",David Cronenberg,David Cronenberg|Don DeLillo,"Aug 17, 2012",108 minutes
2,5,R,"[Drama, Musical and Performing Arts]",Allison Anders,Allison Anders,"Sep 13, 1996",116 minutes
3,6,R,"[Drama, Mystery and Suspense]",Barry Levinson,Paul Attanasio|Michael Crichton,"Dec 9, 1994",128 minutes
4,7,NR,"[Drama, Romance]",Rodney Bennett,Giles Cooper,,200 minutes
...,...,...,...,...,...,...,...
1555,1996,R,"[Action and Adventure, Horror, Mystery and Sus...",,,"Aug 18, 2006",106 minutes
1556,1997,PG,"[Comedy, Science Fiction and Fantasy]",Steve Barron,Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner,"Jul 23, 1993",88 minutes
1557,1998,G,"[Classics, Comedy, Drama, Musical and Performi...",Gordon Douglas,,"Jan 1, 1962",111 minutes
1558,1999,PG,"[Comedy, Drama, Kids and Family, Sports and Fi...",David Mickey Evans,David Mickey Evans|Robert Gunter,"Apr 1, 1993",101 minutes
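
Once genres are lists, counting or grouping by individual genre needs one row per (movie, genre) pair. A minimal sketch of that reshaping on a toy frame that mirrors the `id` and `genre` columns shown above; the third row with a missing genre is an assumption added to show how nulls behave.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "id": [1, 3, 7],
    "genre": [
        "Action and Adventure|Classics|Drama",
        "Drama|Science Fiction and Fantasy",
        np.nan,  # assumed missing value, for illustration
    ],
})

# .str.split keeps missing values as NaN instead of producing ["nan"]
demo["genre"] = demo["genre"].str.split("|")

# One row per (movie, genre) pair; drop movies that have no genre at all
genre_long = demo.explode("genre").dropna(subset=["genre"])
genre_counts = genre_long["genre"].value_counts()
print(genre_counts)
```

The long format also makes later joins and group-bys on genre straightforward, at the cost of repeating the other columns for each genre.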


### Rotten Tomatoes Review DF Cleaning

In [7]:
rtDF_reviews.info()
rtDF_reviews.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 54432 entries, 0 to 54431
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          54432 non-null  int64 
 1   review      48869 non-null  object
 2   rating      40915 non-null  object
 3   fresh       54432 non-null  object
 4   critic      51710 non-null  object
 5   top_critic  54432 non-null  int64 
 6   publisher   54123 non-null  object
 7   date        54432 non-null  object
dtypes: int64(2), object(6)
memory usage: 3.3+ MB


Unnamed: 0,id,review,rating,fresh,critic,top_critic,publisher,date
0,3,A distinctly gallows take on contemporary fina...,3/5,fresh,PJ Nabarro,0,Patrick Nabarro,"November 10, 2018"
1,3,It's an allegory in search of a meaning that n...,,rotten,Annalee Newitz,0,io9.com,"May 23, 2018"
2,3,... life lived in a bubble in financial dealin...,,fresh,Sean Axmaker,0,Stream on Demand,"January 4, 2018"
3,3,Continuing along a line introduced in last yea...,,fresh,Daniel Kasman,0,MUBI,"November 16, 2017"
4,3,... a perverse twist on neorealism...,,fresh,,0,Cinema Scope,"October 12, 2017"
5,3,... Cronenberg's Cosmopolis expresses somethin...,,fresh,Michelle Orange,0,Capital New York,"September 11, 2017"
6,3,"Quickly grows repetitive and tiresome, meander...",C,rotten,Eric D. Snider,0,EricDSnider.com,"July 17, 2013"
7,3,Cronenberg is not a director to be daunted by ...,2/5,rotten,Matt Kelemen,0,Las Vegas CityLife,"April 21, 2013"
8,3,"Cronenberg's cold, exacting precision and emot...",,fresh,Sean Axmaker,0,Parallax View,"March 24, 2013"
9,3,Over and above its topical urgency or the bit ...,,fresh,Kong Rithdee,0,Bangkok Post,"March 4, 2013"


In [8]:
print(sum(rtDF_reviews['rating'].isna()))
# Dropping top_critic, publisher, and date columns because the information provided is not relevant to the scope of this study
rtDF_reviews.drop(columns=["top_critic","publisher","date"],inplace=True)

13517


In [9]:
# Encode as a binary value: 1 = fresh, 0 = rotten
rtDF_reviews['fresh'] = rtDF_reviews['fresh'].map({'fresh': 1, 'rotten': 0})

In [10]:
rtDF_reviews.groupby(['id'])['fresh'].sum()

id
3       103
5        18
6        32
8        56
10       50
       ... 
1996     96
1997     10
1998      2
1999     27
2000     18
Name: fresh, Length: 1135, dtype: int64

In [11]:
rtDF_reviews.groupby(['id', 'fresh']).size()

id    fresh
3     0         60
      1        103
5     0          5
      1         18
6     0         25
              ... 
1998  1          2
1999  0         19
      1         27
2000  0         20
      1         18
Length: 2070, dtype: int64

In [12]:
# Group the reviews by movie id
rtDF_grouped = rtDF_reviews.groupby(['id'])
# Add a column holding the number of fresh reviews for each movie id
rtDF_reviews['sum_fresh'] = rtDF_grouped['fresh'].transform('sum')
# Add a column holding the total number of reviews for each movie id
rtDF_reviews['count_fresh'] = rtDF_grouped['fresh'].transform('count')

In [13]:
# Fraction of fresh reviews per movie (0 to 1): number of fresh reviews divided by total number of reviews
rtDF_reviews['percentage'] = rtDF_reviews['sum_fresh'] / rtDF_reviews['count_fresh']
rtDF_reviews

Unnamed: 0,id,review,rating,fresh,critic,sum_fresh,count_fresh,percentage
0,3,A distinctly gallows take on contemporary fina...,3/5,1,PJ Nabarro,103,163,0.631902
1,3,It's an allegory in search of a meaning that n...,,0,Annalee Newitz,103,163,0.631902
2,3,... life lived in a bubble in financial dealin...,,1,Sean Axmaker,103,163,0.631902
3,3,Continuing along a line introduced in last yea...,,1,Daniel Kasman,103,163,0.631902
4,3,... a perverse twist on neorealism...,,1,,103,163,0.631902
...,...,...,...,...,...,...,...,...
54427,2000,The real charm of this trifle is the deadpan c...,,1,Laura Sinagra,18,38,0.473684
54428,2000,,1/5,0,Michael Szymanski,18,38,0.473684
54429,2000,,2/5,0,Emanuel Levy,18,38,0.473684
54430,2000,,2.5/5,0,Christopher Null,18,38,0.473684
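
Because `fresh` is coded 0/1, the share of fresh reviews is simply the per-movie mean. A minimal sketch of a one-row-per-movie summary, using a small invented sample rather than the real reviews table:

```python
import pandas as pd

# Toy reviews: movie 3 has two fresh and one rotten review, movie 5 has two fresh
reviews = pd.DataFrame({
    "id":    [3, 3, 3, 5, 5],
    "fresh": ["fresh", "rotten", "fresh", "fresh", "fresh"],
})
reviews["fresh"] = reviews["fresh"].map({"fresh": 1, "rotten": 0})

# The mean of a 0/1 column is the fraction of ones
fresh_share = reviews.groupby("id")["fresh"].mean().rename("percentage")
print(fresh_share)
```

A per-movie Series like this can be merged onto the movie-level table by `id` without carrying the repeated review rows along.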


In [14]:
# rtDF_counts_fresh = rtDF_reviews.groupby(['id', 'fresh']).size().to_frame('size')
# rtDF_counts_fresh

In [15]:
# rtDF_counts_fresh['size']

In [16]:
# rtDF_counts_fresh.reset_index(level=0, inplace=True)

In [17]:
# rtDF_counts_fresh['id'][1]

In [18]:
rtDF_reviews['rating'].unique()

array(['3/5', nan, 'C', '2/5', 'B-', '2/4', 'B', '3/4', '4/5', '4/4',
       '6/10', '1/4', '8', '2.5/4', '4/10', '2.0/5', '3/10', '7/10', 'A-',
       '5/5', 'F', '3.5/4', 'D+', '1.5/4', '3.5/5', '8/10', 'B+', '9/10',
       '2.5/5', '7.5/10', '5.5/10', 'C-', '1.5/5', '1/5', '5/10', 'C+',
       '0/5', '6', '0.5/4', 'D', '3.1/5', '3/6', '4.5/5', '0/4', '2/10',
       'D-', '7', '1/10', '3', 'A+', 'A', '4.0/4', '9.5/10', '2.5',
       '2.1/2', '6.5/10', '3.7/5', '8.4/10', '9', '1', '7.2/10', '2.2/5',
       '0.5/10', '5', '0', '2', '4.5', '7.7', '5.0/5', '8.5/10', '3.0/5',
       '0.5/5', '1.5/10', '3.0/4', '2.3/10', '4.5/10', '4/6', '3.5',
       '8.6/10', '6/8', '2.0/4', '2.7', '4.2/10', '5.8', '4', '7.1/10',
       '5/4', 'N', '3.5/10', '5.8/10', 'R', '4.0/5', '0/10', '5.0/10',
       '5.9/10', '2.4/5', '1.9/5', '4.9', '7.4/10', '1.5', '2.3/4',
       '8.8/10', '4.0/10', '2.2', '3.8/10', '6.8/10', '7.3', '7.0/10',
       '3.2', '4.2', '8.4', '5.5/5', '6.3/10', '7.6/10', '8.1/10',
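
The `rating` column mixes fractions (`'3/5'`), letter grades (`'B-'`), bare numbers (`'8'`) and codes (`'N'`, `'R'`). A rough sketch of one way to put them on a common 0 to 1 scale. The evenly spaced letter-grade mapping is an assumption, `normalize_rating` is a hypothetical helper, and bare numbers are left as NaN because their scale is unknown.

```python
import numpy as np
import pandas as pd

# Assumed letter scale: F maps to 0.0 and A+ to 1.0, evenly spaced
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]
LETTER_SCORES = {grade: i / (len(LETTERS) - 1) for i, grade in enumerate(LETTERS)}

def normalize_rating(raw):
    """Map a raw rating string to a 0-1 score, or NaN when it cannot be interpreted."""
    if pd.isna(raw):
        return np.nan
    raw = str(raw).strip()
    if raw in LETTER_SCORES:
        return LETTER_SCORES[raw]
    if "/" in raw:
        num, _, den = raw.partition("/")
        try:
            num, den = float(num), float(den)
        except ValueError:
            return np.nan
        # Reject impossible scores such as '5/4' or '2.1/2'
        if den <= 0 or num > den:
            return np.nan
        return num / den
    # Bare numbers ('8') and codes ('N', 'R') carry no known scale
    return np.nan

sample = pd.Series(["3/5", "C", "A+", "8", "5/4", np.nan])
scores = sample.apply(normalize_rating)
print(scores)
```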
  

### Box Office Mojo DF Cleaning

In [19]:
print(bomDF.info())
bomDF.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3387 entries, 0 to 3386
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           3387 non-null   object 
 1   studio          3382 non-null   object 
 2   domestic_gross  3359 non-null   float64
 3   foreign_gross   2037 non-null   object 
 4   year            3387 non-null   int64  
dtypes: float64(1), int64(1), object(3)
memory usage: 132.4+ KB
None


Unnamed: 0,title,studio,domestic_gross,foreign_gross,year
0,Toy Story 3,BV,415000000.0,652000000,2010
1,Alice in Wonderland (2010),BV,334200000.0,691300000,2010
2,Harry Potter and the Deathly Hallows Part 1,WB,296000000.0,664300000,2010
3,Inception,WB,292600000.0,535700000,2010
4,Shrek Forever After,P/DW,238700000.0,513900000,2010
5,The Twilight Saga: Eclipse,Sum.,300500000.0,398000000,2010
6,Iron Man 2,Par.,312400000.0,311500000,2010
7,Tangled,BV,200800000.0,391000000,2010
8,Despicable Me,Uni.,251500000.0,291600000,2010
9,How to Train Your Dragon,P/DW,217600000.0,277300000,2010


In [20]:
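# Drop rows where domestic gross is NaN; .copy() avoids SettingWithCopyWarning on the assignments below
bomDF = bomDF[bomDF['domestic_gross'].notna()].copy()
# Strip thousands separators from foreign_gross, then convert it to a numeric dtype
bomDF['foreign_gross'] = bomDF['foreign_gross'].replace(',','', regex=True)
bomDF["foreign_gross"] = pd.to_numeric(bomDF["foreign_gross"])
# The studio column is not used in this analysis
bomDF.drop(columns=['studio'],inplace=True)
bomDF.info()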

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3359 entries, 0 to 3386
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           3359 non-null   object 
 1   domestic_gross  3359 non-null   float64
 2   foreign_gross   2009 non-null   float64
 3   year            3359 non-null   int64  
dtypes: float64(2), int64(1), object(1)
memory usage: 131.2+ KB
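
After cleaning, 1,350 of the 3,359 rows still have no `foreign_gross`. One possible way to handle them, sketched on a toy frame: keep a flag for the missing values and treat them as zero only when computing a total. Treating missing as zero is an assumption that understates revenue for films with unreported foreign earnings. The two hypothetical films and their values are invented.

```python
import pandas as pd

bom = pd.DataFrame({
    "title": ["Toy Story 3", "Hypothetical Film A", "Hypothetical Film B"],
    "domestic_gross": [415000000.0, 2000000.0, 1000000.0],
    "foreign_gross": ["652000000", "1,131.0", None],  # last two values are invented
})

# Remove thousands separators, then convert to float (None becomes NaN)
bom["foreign_gross"] = pd.to_numeric(
    bom["foreign_gross"].str.replace(",", "", regex=False)
)

# Record which rows were missing before filling anything in
bom["foreign_missing"] = bom["foreign_gross"].isna()

# Assumption: a missing foreign gross counts as 0 in the total
bom["total_gross"] = bom["domestic_gross"] + bom["foreign_gross"].fillna(0)
print(bom)
```

Keeping `foreign_missing` lets later analysis exclude or separately examine the filled rows.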


### The MovieDB DF Cleaning

In [21]:
print(tmdbDF.info())
tmdbDF.head(5)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26517 entries, 0 to 26516
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   genre_ids          26517 non-null  object 
 1   id                 26517 non-null  int64  
 2   original_language  26517 non-null  object 
 3   original_title     26517 non-null  object 
 4   popularity         26517 non-null  float64
 5   release_date       26517 non-null  object 
 6   title              26517 non-null  object 
 7   vote_average       26517 non-null  float64
 8   vote_count         26517 non-null  int64  
dtypes: float64(2), int64(2), object(5)
memory usage: 2.0+ MB
None


Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,"[28, 878, 12]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


In [22]:
GenreDict = {28:"Action", 12:"Adventure", 16:"Animation", 35:"Comedy", 80:"Crime",
             99:"Documentary",18:"Drama",10751:"Family",14:"Fantasy",36:"History",
             27:"Horror",10402:"Music",9648:"Mystery",10749:"Romance",
             878:"Science Fiction", 10770:"TV Movie",53:"Thriller",10752:"War",37:"Western"}

In [53]:
tmdbDF.head()

Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,"[Adventure, Fantasy, Family]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,"[Fantasy, Adventure, Animation, Family]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,"[Adventure, Action, Science Fiction]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,"[Animation, Comedy, Family]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,"[Action, Science Fiction, Adventure]",27205,en,Inception,27.92,2010-07-16,Inception,8.3,22186


In [60]:
# Attempt: isin() compares whole cell values, so it cannot test membership inside list-valued cells
tmdbDF['genre_ids'].isin(["[Action]"])

TypeError: only list-like objects are allowed to be passed to isin(), you passed a [str]

In [66]:
tmdbDF['genre_ids'].str.find(sub = 'Drama')

TypeError: expected a string object, not list
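
Both attempts above fail because each cell of `genre_ids` holds a Python list, not a string. A minimal sketch of one way to test membership instead, using `apply` with the `in` operator on a toy frame built from the rows shown earlier:

```python
import pandas as pd

tmdb = pd.DataFrame({
    "title": ["Harry Potter and the Deathly Hallows: Part 1", "Iron Man 2",
              "Toy Story", "Inception"],
    "genre_ids": [
        ["Adventure", "Fantasy", "Family"],
        ["Adventure", "Action", "Science Fiction"],
        ["Animation", "Comedy", "Family"],
        ["Action", "Science Fiction", "Adventure"],
    ],
})

# Boolean mask: True where the genre list contains "Action"
is_action = tmdb["genre_ids"].apply(lambda genres: "Action" in genres)
action_titles = tmdb.loc[is_action, "title"].tolist()
print(action_titles)
```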

In [23]:
# def clean_alt_list(list_):
#           lst = list_.replace(', ', '","')
#           lst = lst.replace('[', '["')
#           lst = lst.replace(']', '"]')
#           return list_
# tmdbDF['genre_ids'] = tmdbDF['genre_ids'].apply(clean_alt_list)
# tmdbDF['genre_ids'] = tmdbDF['genre_ids'].apply(eval)

In [24]:
# def clean_alt_list(list_):
#        lst = list_.replace(', ', '","')
#        lst = lst.replace('[', '["')
#        lst = lst.replace(']', '"]')
#        return lst
# tmdbDF['genre_ids'] = tmdbDF['genre_ids'].apply(clean_alt_list)
# tmdbDF['genre_ids'] = tmdbDF['genre_ids'].apply(eval)

In [25]:
import ast

# Parse each genre_ids string (e.g. "[12, 14, 10751]") into a Python list;
# ast.literal_eval only accepts literals, so it is safer than eval
tmdbDF['genre_ids'] = tmdbDF['genre_ids'].apply(ast.literal_eval)

# Replace each genre id in the list with its name from GenreDict
tmdbDF['genre_ids'] = tmdbDF['genre_ids'].apply(lambda x: [GenreDict[i] for i in x])

In [26]:
for i in tmdbDF['genre_ids']:
    for j in i:
        print(j)


Adventure
Fantasy
Family
Fantasy
Adventure
Animation
Family
Adventure
Action
Science Fiction
Animation
Comedy
Family
Action
Science Fiction
Adventure
...

In [29]:

tmdbDF['genre_ids'].value_counts()

TypeError: unhashable type: 'list'

Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas\_libs\hashtable_class_helper.pxi", line 1709, in pandas._libs.hashtable.PyObjectHashTable.map_locations
TypeError: unhashable type: 'list'


[Documentary]                                                 3700
[]                                                            2479
[Drama]                                                       2268
[Comedy]                                                      1660
[Horror]                                                      1145
                                                              ... 
[Comedy, Drama, Family, Fantasy, Romance, Science Fiction]       1
[Thriller, Crime, Drama, TV Movie]                               1
[Fantasy, Comedy, Science Fiction, Family]                       1
[Science Fiction, Action, Animation, Adventure]                  1
[Crime, Mystery, Thriller, Drama]                                1
Name: genre_ids, Length: 2477, dtype: int64

In [32]:
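def to_1D(series):
    """Flatten a Series of lists into a single Series with one element per row."""
    return pd.Series([x for _list in series for x in _list])
# Count how often each individual genre appears across all movies
to_1D(tmdbDF["genre_ids"]).value_counts()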

Drama              8303
Comedy             5652
Documentary        4965
Thriller           4207
Horror             3683
Action             2612
Romance            2321
Science Fiction    1762
Family             1565
Crime              1515
Animation          1486
Adventure          1400
Music              1267
Mystery            1237
Fantasy            1139
TV Movie           1084
History             622
War                 330
Western             205
dtype: int64

In [50]:
tmdbDF[["genre_ids", 'vote_average']]

Unnamed: 0,genre_ids,vote_average
0,"[Adventure, Fantasy, Family]",7.7
1,"[Fantasy, Adventure, Animation, Family]",7.7
2,"[Adventure, Action, Science Fiction]",6.8
3,"[Animation, Comedy, Family]",7.9
4,"[Action, Science Fiction, Adventure]",8.3
...,...,...
26512,"[Horror, Drama]",0.0
26513,"[Drama, Thriller]",0.0
26514,"[Fantasy, Action, Adventure]",0.0
26515,"[Family, Adventure, Action]",0.0


In [41]:
[(k, g) for k, g in itertools.groupby(tmdbDF["genre_ids"])]

[(['Adventure', 'Fantasy', 'Family'], <itertools._grouper at 0x215c5007430>),
 (['Fantasy', 'Adventure', 'Animation', 'Family'],
  <itertools._grouper at 0x215c4fcd970>),
 (['Adventure', 'Action', 'Science Fiction'],
  <itertools._grouper at 0x215c4fcdc40>),
 (['Animation', 'Comedy', 'Family'], <itertools._grouper at 0x215c4fcdf40>),
 (['Action', 'Science Fiction', 'Adventure'],
  <itertools._grouper at 0x215c4fcd520>),
 (['Adventure', 'Fantasy', 'Family'], <itertools._grouper at 0x215c4fcdc70>),
 (['Action', 'Adventure', 'Fantasy', 'Science Fiction'],
  <itertools._grouper at 0x215c4fcdd60>),
 (['Animation', 'Family', 'Comedy'], <itertools._grouper at 0x215c4fcdb50>),
 (['Animation', 'Action', 'Comedy', 'Family', 'Science Fiction'],
  <itertools._grouper at 0x215c4fcd5e0>),
 (['Animation', 'Comedy', 'Family'], <itertools._grouper at 0x215c4fcd670>),
 (['Family', 'Fantasy', 'Adventure'], <itertools._grouper at 0x215c7e600a0>),
 (['Thriller', 'Adventure', 'Action'], <itertools._grouper 

### The Numbers DF Cleaning

In [None]:
print(tnmDF.info())
tnmDF.head()
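
The preview of `tnmDF` shows dollar amounts stored as strings such as `"$425,000,000"` and dates such as `"Dec 18, 2009"`. A minimal sketch of how those columns could be converted, shown on two rows copied from that preview. The `profit` column is an illustrative addition, not something the notebook defines.

```python
import pandas as pd

tnm = pd.DataFrame({
    "release_date": ["Dec 18, 2009", "Jun 7, 2019"],
    "movie": ["Avatar", "Dark Phoenix"],
    "production_budget": ["$425,000,000", "$350,000,000"],
    "domestic_gross": ["$760,507,625", "$42,762,350"],
    "worldwide_gross": ["$2,776,345,279", "$149,762,350"],
})

# Strip "$" and "," and convert to 64-bit integers (values exceed the int32 range)
money_cols = ["production_budget", "domestic_gross", "worldwide_gross"]
for col in money_cols:
    tnm[col] = tnm[col].str.replace(r"[$,]", "", regex=True).astype("int64")

# Parse dates like "Dec 18, 2009" into datetime64 values
tnm["release_date"] = pd.to_datetime(tnm["release_date"], format="%b %d, %Y")

# Illustrative derived column: worldwide gross minus production budget
tnm["profit"] = tnm["worldwide_gross"] - tnm["production_budget"]
print(tnm[["movie", "release_date", "profit"]])
```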


## To-do
- Plug genres into the movie tables
- Split genres in rtDF
- Fill in nulls in foreign gross
- Match up movie and genre IDs (explore relationships between dataframes)

## Joining Tables
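
A minimal sketch of how two of the frames might be joined, assuming the movie title is usable as a key. Titles are not guaranteed to match across sources, so `indicator=True` is used to show which rows found a partner. The toy rows are copied from the previews above.

```python
import pandas as pd

bom = pd.DataFrame({
    "title": ["Toy Story 3", "Inception", "Iron Man 2"],
    "domestic_gross": [415000000.0, 292600000.0, 312400000.0],
})
tmdb = pd.DataFrame({
    "title": ["Toy Story", "Inception", "Iron Man 2"],
    "popularity": [28.005, 27.92, 28.515],
    "vote_average": [7.9, 8.3, 6.8],
})

# Left join keeps every Box Office Mojo row; _merge reports whether a match was found
merged = bom.merge(tmdb, on="title", how="left", indicator=True)
print(merged)

# Rows without a TMDB match (here "Toy Story 3", which is a different film from "Toy Story")
unmatched = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
```

Checking the unmatched titles before analysis shows how much data an exact-title join loses. Adding the release year to the key is one option for avoiding false matches between remakes.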

## Exploratory Data Analysis 
Begin looking into relationships between variables to uncover information that will inform our recommendations to Computing Vision (the client).

## Linear Model
Test out a linear model with added variables.
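
As a starting point, a single-variable fit of revenue on budget could look like the sketch below. The data are synthetic, generated with an arbitrary slope and noise level, so the numbers say nothing about real films. `np.polyfit` stands in for whatever modelling library the project ends up using.

```python
import numpy as np

# Synthetic data only: budgets in $ millions and revenue from an arbitrary linear rule plus noise
rng = np.random.default_rng(0)
budget = rng.uniform(1, 300, size=200)
revenue = 2.5 * budget + 10 + rng.normal(0, 20, size=200)

# Ordinary least squares fit of a degree-1 polynomial: revenue ~ slope * budget + intercept
slope, intercept = np.polyfit(budget, revenue, deg=1)

# Coefficient of determination (R^2) for the fitted line
predicted = slope * budget + intercept
ss_res = np.sum((revenue - predicted) ** 2)
ss_tot = np.sum((revenue - revenue.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

On the real data, additional variables such as genre or popularity would need a multiple-regression tool rather than `np.polyfit`.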