![example](images/director_shot.jpeg)

# Project Title

**Authors:** Student 1, Student 2, Student 3
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve it.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [1]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# Here you run your code to explore the data
tomato_reviews = pd.read_csv('data/zippedData/rotten_tomatoes_critic_reviews.csv')
tomato_reviews.head(5)

Unnamed: 0,rotten_tomatoes_link,critic_name,top_critic,publisher_name,review_type,review_score,review_date,review_content
0,m/0814255,Andrew L. Urban,False,Urban Cinefile,Fresh,,2010-02-06,A fantasy adventure that fuses Greek mythology...
1,m/0814255,Louise Keller,False,Urban Cinefile,Fresh,,2010-02-06,"Uma Thurman as Medusa, the gorgon with a coiff..."
2,m/0814255,,False,FILMINK (Australia),Fresh,,2010-02-09,With a top-notch cast and dazzling special eff...
3,m/0814255,Ben McEachen,False,Sunday Mail (Australia),Fresh,3.5/5,2010-02-09,Whether audiences will get behind The Lightnin...
4,m/0814255,Ethan Alter,True,Hollywood Reporter,Rotten,,2010-02-10,What's really lacking in The Lightning Thief i...


In [3]:
workers_info = pd.read_csv('data/zippedData/imdb.name.basics.csv')
workers_info.head(5)


Unnamed: 0,nconst,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,nm0061671,Mary Ellen Bauder,,,"miscellaneous,production_manager,producer","tt0837562,tt2398241,tt0844471,tt0118553"
1,nm0061865,Joseph Bauer,,,"composer,music_department,sound_department","tt0896534,tt6791238,tt0287072,tt1682940"
2,nm0062070,Bruce Baum,,,"miscellaneous,actor,writer","tt1470654,tt0363631,tt0104030,tt0102898"
3,nm0062195,Axel Baumann,,,"camera_department,cinematographer,art_department","tt0114371,tt2004304,tt1618448,tt1224387"
4,nm0062798,Pete Baxter,,,"production_designer,art_department,set_decorator","tt0452644,tt0452692,tt3458030,tt2178256"


In [4]:
# Split the comma-separated professions into a list per person
workers_info['primary_profession'] = workers_info['primary_profession'].str.split(',')
workers_info.head(5)


Unnamed: 0,nconst,primary_name,birth_year,death_year,primary_profession,known_for_titles
0,nm0061671,Mary Ellen Bauder,,,"[miscellaneous, production_manager, producer]","tt0837562,tt2398241,tt0844471,tt0118553"
1,nm0061865,Joseph Bauer,,,"[composer, music_department, sound_department]","tt0896534,tt6791238,tt0287072,tt1682940"
2,nm0062070,Bruce Baum,,,"[miscellaneous, actor, writer]","tt1470654,tt0363631,tt0104030,tt0102898"
3,nm0062195,Axel Baumann,,,"[camera_department, cinematographer, art_depar...","tt0114371,tt2004304,tt1618448,tt1224387"
4,nm0062798,Pete Baxter,,,"[production_designer, art_department, set_deco...","tt0452644,tt0452692,tt3458030,tt2178256"


In [5]:
def is_director(in_list):
    # True when 'director' is one of the listed professions
    return 'director' in in_list

In [6]:
workers_info = workers_info.dropna(subset = ['primary_profession'])
workers_info.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 555308 entries, 0 to 606647
Data columns (total 6 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   nconst              555308 non-null  object 
 1   primary_name        555308 non-null  object 
 2   birth_year          80414 non-null   float64
 3   death_year          5991 non-null    float64
 4   primary_profession  555308 non-null  object 
 5   known_for_titles    535137 non-null  object 
dtypes: float64(2), object(4)
memory usage: 29.7+ MB


In [7]:
workers_info['is_director'] = workers_info['primary_profession'].apply(is_director)

In [8]:
# Keep only directors; .copy() avoids SettingWithCopyWarning on later column assignments
workers_info = workers_info.loc[workers_info['is_director']].copy()

In [9]:
# Split the comma-separated title IDs into lists; rows with no titles become 0
workers_info['known_for_titles'] = workers_info['known_for_titles'].str.split(',')
workers_info['known_for_titles'] = workers_info['known_for_titles'].fillna(0)



In [11]:
title_translation = pd.read_csv('data/zippedData/imdb.title.akas.csv')
# Keep only the rows flagged as the original title
title_translation = title_translation.loc[title_translation['is_original_title'] == 1]
title_translation.head()

Unnamed: 0,title_id,ordering,title,region,language,types,attributes,is_original_title
38,tt0369610,45,Jurassic World,,,original,,1.0
80,tt0401729,7,John Carter,,,original,,1.0
83,tt10010134,1,Versailles Rediscovered - The Sun King's Vanis...,,,original,,1.0
86,tt10027708,1,Miguelito - Canto a Borinquen,,,original,,1.0
90,tt10050722,1,Thing I Don't Get,,,original,,1.0


In [21]:
def locate_translation(list_id, id_set):
    # Look up the original title(s) for each title ID; 0 marks a missing ID list
    if list_id == 0:
        return 0
    output = []
    for i in list_id:
        name = list(id_set.loc[id_set['title_id'] == i, 'title'])
        output.append(name)
    return output

test = locate_translation(['tt0369610','tt0401729'],title_translation)
print(test)

[['Jurassic World'], ['John Carter']]
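
Calling `locate_translation` for every director scans the whole `title_translation` frame once per title ID. A minimal sketch of one alternative, assuming each `title_id` should map to a single original title, builds a dictionary once and looks IDs up in it. The toy data and the helper name `lookup_titles` are hypothetical stand-ins, and unlike `locate_translation` this version returns a flat list of titles rather than a list of lists.

```python
import pandas as pd

# Toy stand-in for title_translation (two rows copied from the output above)
title_translation = pd.DataFrame({
    'title_id': ['tt0369610', 'tt0401729'],
    'title': ['Jurassic World', 'John Carter'],
})

# Build the ID -> title mapping once; assumes one original title per ID
# (if an ID appears more than once, the last row wins)
id_to_title = dict(zip(title_translation['title_id'], title_translation['title']))

def lookup_titles(list_id, mapping):
    # Hypothetical helper: 0 marks a missing ID list, as in the cells above
    if not isinstance(list_id, list):
        return 0
    # IDs that are absent from the mapping are skipped
    return [mapping[i] for i in list_id if i in mapping]

# Toy stand-in for the known_for_titles column
known_for = pd.Series([['tt0369610', 'tt0401729'], 0, ['tt0000000']])
titles = known_for.apply(lambda ids: lookup_titles(ids, id_to_title))
print(titles.tolist())
```

Whether to skip unknown IDs or keep a placeholder for them is a design choice that depends on how the titles are used later.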


## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***
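
One concrete missing-value case in the data loaded above is `review_score`, which holds fractions such as `3.5/5` alongside missing entries. A minimal sketch, assuming scores are either `numerator/denominator` strings or missing, converts them to a single numeric ratio and leaves anything else as `NaN`. The helper name `score_to_fraction` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def score_to_fraction(score):
    # Hypothetical helper: convert 'a/b' strings to the ratio a / b
    if not isinstance(score, str) or '/' not in score:
        return np.nan
    numerator, denominator = score.split('/', 1)
    try:
        return float(numerator) / float(denominator)
    except (ValueError, ZeroDivisionError):
        # Unparseable or zero-denominator scores are treated as missing
        return np.nan

# Toy values: one fraction from the output above, a missing entry,
# another fraction, and a non-fraction string
scores = pd.Series(['3.5/5', np.nan, '8/10', 'not a score'])
normalized = scores.apply(score_to_fraction)
print(normalized.tolist())
```

Rows that stay `NaN` can then be dropped or kept depending on the analysis question.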

In [None]:
# Here you run your code to clean the data

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# Here you run your code to model the data


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***