# COGS 118A - Project Checkpoint

# Names

Hopefully your team is at least this good. Obviously you should replace these with your names.

- Banso Nguyen
- Kirsten Nino
- Rufeng Chen
- Shan He

# Abstract 
Our goal is to find out the factors that influence movie ratings and use those factors to try to predict the performance of upcoming movies. The dataset we used contains various variables such as genre, release date, production companies and other related information about 45,000 movies released around the world. We will be performing two separate linear regressions once we have cleaned our dataset. In order to measure the performance of the movies, we have decided to use the following variables: revenue and vote average (audience score). 

# Background

There are many factors that can influence how a movie is rated, like its budget, director, actors/actresses, its genre, and etc.. How these factors influence the movie ratings has been an area of interest for researchers.

The budget of a movie is generally thought to be the main determinant of its rating. Higher budgets often lead to higher production quality, top-notch actors, and more extensive advertisements. These could potentially draw larger audiences and better reviews. A 2013 study by Eliashberg, Elberse, and Leenders found that higher budgets often leads to more promising film projects.<a name="Eliashberg"></a>[<sup>[1]</sup>](#Eliashberg)

People often anticipate movies by their favorite directors, and directors who've produced great movies in the past often receive higher ratings for their new works. Wallace, Seigerman, and Holbrook found that directors with previous successes are more likely to have higher box-office sales as well as better reviews.<a name="Wallace"></a>[<sup>[2]</sup>](#Wallace)

Moreover, genre is another big contributing factor to movie ratings. Some genres just seem to resonate more with audiences and critics. De Vany and Walls' study showed that action flicks and dramas usually get higher ratings than comedies or horror movies.<a name="De Vany"></a>[<sup>[3]</sup>](#De) 
	
There are also many more factors that can alter the performance of a movie, such as the movie's casts, release date, and its marketing strategies. We will find out more about what factors influence movie ratings and predict the movie’s ratings based on those factors.

# Problem Statement

In the rapidly expanding world of film and television, discerning which movies are set to be hits or misses is a challenge for average audiences. Typically, we wait for critic reviews or box office results, but what if we could predict the outcome of a movie? Our goal is to try building a way of predicting a movie's success before it's even hit the screens. We will be creating a model that takes into account a range of information, such as budget, production company, genre and use these variables to predict a movie's success. All these factors can be represented by numbers, some variables can be represented by encoding them. The measurable aspect of this problem relates to the metrics used to determine a movie's success: its revenue and audience score (vote average). The model's performance can be measured by comparing the predicted success against the actual performance of the film’s post-release, which offers an objective method for evaluating our model's accuracy and reliability. The model we're building can be replicated for all movies. By using a large dataset of 45,000 movies from around the world, our model's learning and predictions can be reproduced and improved upon over time. To solve this problem, we'll experiment with multiple different machine learning models - specifically linear regression models. We'll compare them and pick the most accurate. The selected model would then be used to predict the performance of upcoming movies, providing audiences with a guide to potential movie success. 

# Data

While cleaning the data from our original dataset, we realized that eliminating rows with null values resulted in a much too small dataset. We went from 119k samples to only several thousand. Thus we changed our dataset to similar one, the Movies Dataset:


The Movies Dataset
https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset?select=movies_metadata.csv


The dataset used for this project is obtained from the FullMovieLens Dataset, a community-built movie database. The dataset contains information on more than 45,000 released internationally. The dataset is in CSV format and is pipe-delimited.


Size of the Dataset:
The dataset consists of approximately 45,000 records and has 24 features (variables) that provide comprehensive information about each movie or TV show.


Observation:
Each observation in the dataset represents a movie. It contains various details about the production, release, cast, crew, and other relevant information associated with the title.


Critical Variables:
Some critical variables in the dataset include:


|Variable|Description|
|---|---|
|adult | If this movie is rated for adult audiences. |
|belongs_to_collection | Indicates whether the movie belongs to a collection, with the collection specified if it exists. |
|budget | The budget of the movie. |
|genres | The genres that movie falls into. |
|homepage | Link to homepage of movie website. |
|original_language| Specifies the original language of the movie. |
|original_title| Specifies the original title of the movie. |
|overview | Provides a summary or synopsis of the movie. |
|popularity | Represents the popularity index of the movie. |
|production_companies | Lists the companies involved in producing the movie. |
|production_countries | Indicates the country where the movie is produced. |
|release_date | Represents the release date of the movie. |
|revenue | Represents the revenue generated by the movie. If missing, it is represented by 0. |
|runtime | Denotes the duration of the movie in minutes. |
|spoken_languages | The different languages spoken in the movie. |
|status | Indicates whether the movie is released or not. |
|tagline | Provides the tagline associated with the movie. |
|title | Specifies the English alias title of the movie. |
|vote_average | Represents the average vote rating given by viewers. |
|overview | Represents the synopsis of the movie. |


Handling, Transformations, Cleaning:
* *Handling Missing Values: Check for missing values in the budget, popularity, and income variables. We will use imputation (replace missing values with mean, median or mode) or remove observations with missing values.*
* *Handling Categorical Variables: Since the dataset contains categorical variables such as original_language or production_companies, we need to encode them using one-hot encoding technique to convert them into a suitable numerical representation.*
* *Data type conversion: Ensure budget, popularity, and revenue variables are in numeric format. Convert them from string or object data types to numeric data types (such as integers or floating-point numbers) to perform mathematical operations and analysis on them.*
* *Scaling or normalization: since we may be using regression models or clustering algorithms which are sensitive to the extent of the data, we need to normalize the budget, popularity and income variables using z-score normalization or min-max scaling , making them comparable in size.*
* *Remove irrelevant features: Identify any irrelevant or redundant features that do not contribute significantly to the analysis or prediction. Removing these features simplifies the dataset and reduces noise.*
* *Data Splitting: Since we plan to build a predictive model of revenue or popularity, we need to split the dataset into training and testing sets to evaluate the performance of the model. This allows us to train the model on partial data and evaluate its accuracy on unseen data.*


# Proposed Solution

To predict a movie's success, we're focusing on two key indicators: revenue and audience score (also known as vote average). We propose employing the following machine learning models to accomplish this - Linear Regression and Logistic Regression.

We will be performing two separate linear regressions. Since we're dealing with continuous target variables, which are revenue and vote average, linear Regression would be perfect for helping us understand the relationship between factors like budget, director, genre, and release date and our target variables. This model works by fitting a line through our data in a way that best predicts the performance of upcoming movies. We will also use regularization techniques (Lasso) to squish the least important parameters to zero to leave the most prevalent ones. To implement this, we will use the Scikit-learn library's Linear Regression model to fit the data. The model's performance will be evaluated using a loss function, with gradient descent employed to minimize this loss and optimize the model.

Another solution is to perform a logistic regression which classifies a movie's success based on a certain threshold value for revenue and audience score. With Logistic Regression, we can measure how far a movie's predicted score is from a decision boundary and use that to determine the likelihood of its success. For instance, a decision boundary will be made if the vote average is above or below a rating of 7 and the farther a point is away from the decision boundary, the higher the probability the movie is successful/unsuccessful (depending on which side the point is on). We're planning on using the Scikit-learn library to implement our Logistic Regression model. We'll measure the model's performance using the soft log loss function and adjust the weights using gradient descent.

We may plan to exclude certain features, such as actors (since one-hot encoding would be too numerous) or overview (since movie plots are too unique to compare). Also, as mentioned earlier in the Data section, we will make sure missing values are handled with imputation, one-hot encode the according non-numerical features, and split the dataset with k-fold (value to be determined). With the remaining variables we can run the above mentioned methods with the Numpy, Scikit-learn libraries. A potential benchmark model is k-NN, and we can make two, one for predicting revenue and the other voter average. We can then compare these two with the corresponding linear and logistic regressions. Comparing our proposed models to this benchmark will help us assess the relative effectiveness of our chosen methods.

# Evaluation Metrics

One evaluation metric that we could use to quantify the performance of the linear regression model in predicting whether or not the movie will be successful or not is with mean squared error. This will measure how accurate our model is. 

Another possible metric, while less effective but still applicable, would be positive predictive value (PPV) or precision. Our project could be interpreted by predicting a binary result of successful/ not successful, however this metric is much simpler and provides much less insight. We could also make a ROC-AUC to evaluate how good our model is. 

The following metrics are not complex, and easily work with linear and logistic regression respectively. MSE is able to evaluate the prediction from a linear regression model, and the binary classification result from logistic regression can be evaluated with PPV.


# Preliminary results


In [69]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import zipfile
import ast
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, accuracy_score
from ast import literal_eval
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [70]:
movie_df = pd.read_csv('movies_metadata.csv')

  movie_df = pd.read_csv('movies_metadata.csv')


## Data Cleaning

In [71]:
selected_columns = ['budget', 'genres', 'original_language', 'popularity', 'production_companies', 'production_countries', 'revenue', 'runtime', 'spoken_languages', 'vote_average', 'vote_count']
movie_df = movie_df[selected_columns]
movie_df = movie_df.dropna()

In [72]:
#remove any unwanted data
df = movie_df[(movie_df != 0 ).all(1)]
df = df[(df != '0').all(1)]
df = df[(df != 0.0).all(1)]
df = df[(df != '0.0').all(1)]
df = df[(df != '[]').all(1)]

In [73]:
#extract parts wanted from each cell
df['genres'] = df['genres'].apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
df['production_companies'] = df['production_companies'].apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
df['production_countries'] = df['production_countries'].apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
df['spoken_languages'] = df['spoken_languages'].apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])

In [74]:
df_1 = df.drop('production_companies', axis=1)

In [75]:
def encode_and_bind(original_dataframe, feature_to_encode):
    dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
    res = pd.concat([original_dataframe, dummies], axis=1)
    res = res.drop([feature_to_encode], axis=1)
    return(res) 

In [76]:
#one hot encoding df
df = encode_and_bind(df, 'genres')
df = encode_and_bind(df, 'original_language')
df = encode_and_bind(df, 'production_companies')
df = encode_and_bind(df, 'production_countries')
df = encode_and_bind(df, 'spoken_languages')

  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)


In [77]:
df_1 = encode_and_bind(df_1, 'genres')
df_1 = encode_and_bind(df_1, 'original_language')
df_1 = encode_and_bind(df_1, 'production_countries')
df_1 = encode_and_bind(df_1, 'spoken_languages')

  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)
  dummies = pd.get_dummies(original_dataframe[feature_to_encode].apply(pd.Series).stack()).sum(level=0)


In [80]:
df = df.apply(pd.to_numeric, errors='coerce').dropna()
df_1 = df_1.apply(pd.to_numeric, errors='coerce').dropna()

In [106]:
df

Unnamed: 0,budget,popularity,revenue,runtime,vote_average,vote_count,Action,Adventure,Animation,Comedy,...,বাংলা,ਪੰਜਾਬੀ,தமிழ்,తెలుగు,ภาษาไทย,ქართული,广州话 / 廣州話,日本語,普通话,한국어/조선말
0,30000000,21.946943,373554033.0,81.0,7.7,5415.0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
1,65000000,17.015539,262797249.0,104.0,6.9,2413.0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,16000000,3.859495,81452156.0,127.0,6.1,34.0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
5,60000000,17.924927,187436818.0,170.0,7.7,1886.0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,35000000,5.231580,64350171.0,106.0,5.5,174.0,1,1,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45014,60000000,50.903593,71000000.0,95.0,5.7,688.0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
45139,50000000,33.694599,66913939.0,86.0,5.8,327.0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
45167,11000000,40.796775,184770205.0,111.0,7.4,181.0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
45250,12000000,1.323587,19000000.0,185.0,6.9,25.0,1,0,0,1,...,0,0,1,1,0,0,0,0,0,0


In [107]:
df_1

Unnamed: 0,budget,popularity,revenue,runtime,vote_average,vote_count,Action,Adventure,Animation,Comedy,...,தமிழ்,తెలుగు,ภาษาไทย,ქართული,广州话 / 廣州話,日本語,普通话,한국어/조선말,success_vote,success_revenue
0,30000000,21.946943,373554033.0,81.0,7.7,5415.0,0,0,1,1,...,0,0,0,0,0,0,0,0,1,1
1,65000000,17.015539,262797249.0,104.0,6.9,2413.0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,1
3,16000000,3.859495,81452156.0,127.0,6.1,34.0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
5,60000000,17.924927,187436818.0,170.0,7.7,1886.0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,1
8,35000000,5.231580,64350171.0,106.0,5.5,174.0,1,1,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45014,60000000,50.903593,71000000.0,95.0,5.7,688.0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
45139,50000000,33.694599,66913939.0,86.0,5.8,327.0,0,0,1,1,...,0,0,0,0,0,0,0,0,0,0
45167,11000000,40.796775,184770205.0,111.0,7.4,181.0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,1
45250,12000000,1.323587,19000000.0,185.0,6.9,25.0,1,0,0,1,...,1,1,0,0,0,0,0,0,0,0


## Linear Regression

In [81]:
# Target and Features
# use df_1: without production company
X = df_1.drop(['revenue', 'vote_average'], axis=1)
y_revenue = df_1['revenue']
y_vote = df_1['vote_average']

In [82]:
X_train, X_test, y_train_revenue, y_test_revenue = train_test_split(X, y_revenue, test_size=0.2, random_state=42)
_, _, y_train_vote, y_test_vote = train_test_split(X, y_vote, test_size=0.2, random_state=42)

In [83]:
# Linear Regression for revenue prediction
lin_reg_revenue = LinearRegression()
lin_reg_revenue.fit(X_train, y_train_revenue)
y_pred_revenue = lin_reg_revenue.predict(X_test)
print(f"RMSE for revenue prediction: {np.sqrt(mean_squared_error(y_test_revenue, y_pred_revenue))}")

RMSE for revenue prediction: 84412942.31129745


In [84]:
# Linear Regression for vote average prediction
lin_reg_vote = LinearRegression()
lin_reg_vote.fit(X_train, y_train_vote)
y_pred_vote = lin_reg_vote.predict(X_test)
print(f"RMSE for vote average prediction: {np.sqrt(mean_squared_error(y_test_vote, y_pred_vote))}")

RMSE for vote average prediction: 0.7056352715124519


## Logistic Regression

In [108]:
df_1['success_vote'] = df_1['vote_average'].apply(lambda x: 1 if x > 7 else 0)
df_1['success_revenue'] = df_1['revenue'].apply(lambda x: 1 if x > 130000000 else 0)

In [109]:
y_success_vote = df_1['success_vote']
y_success_revenue = df_1['success_revenue']

In [110]:
X_train, X_test, y_train_success_vote, y_test_success_vote = train_test_split(X, y_success_vote, test_size=0.2, random_state=42)
_, _, y_train_success_revenue, y_test_success_revenue = train_test_split(X, y_success_revenue, test_size=0.2, random_state=42)

In [111]:
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

In [112]:
log_reg = LogisticRegression(random_state=42)

In [113]:
# Hyperparameters for Grid Search
param_grid = {
              'penalty': ['l1', 'l2'],
              'C' : np.logspace(-4, 4, 20),
              'solver': ['liblinear']
              }

In [114]:
# Grid Search Cross Validation
clf = GridSearchCV(log_reg, param_grid = param_grid, cv = 5)

In [115]:
# Fit the model
best_clf_vote = clf.fit(X_train_std, y_train_success_vote)

# show Best Parameters
print("Best Parameters: ", best_clf_vote.best_params_)

Best Parameters:  {'C': 0.08858667904100823, 'penalty': 'l1', 'solver': 'liblinear'}


In [116]:
# Fit the model
best_clf_revenue = clf.fit(X_train_std, y_train_success_revenue)

# show Best Parameters
print("Best Parameters: ", best_clf_revenue.best_params_)

Best Parameters:  {'C': 0.03359818286283781, 'penalty': 'l1', 'solver': 'liblinear'}


In [117]:
y_pred_success_vote = best_clf_vote.predict(X_test_std)
y_pred_success_revenue = best_clf_revenue.predict(X_test_std)

In [146]:
print("Logistic Regression Accuracy(Vote): ", accuracy_score(y_test_success_vote, y_pred_success_vote))

Logistic Regression Accuracy(Vote):  0.7746615087040619


In [147]:
print("Logistic Regression Accuracy(Revenue): ", accuracy_score(y_test_success_revenue, y_pred_success_revenue))

Logistic Regression Accuracy(Revenue):  0.9003868471953579


## Random Forest Classifier

In [120]:
rf = RandomForestClassifier(n_estimators=100, random_state=42)

**-Dataset with production company**

In [128]:
df['success_vote'] = df['vote_average'].apply(lambda x: 1 if x > 7 else 0)
df['success_revenue'] = df['revenue'].apply(lambda x: 1 if x > 130000000 else 0)

In [129]:
y_success_vote_1 = df['success_vote']
y_success_revenue_1 = df['success_revenue']

In [130]:
X_train, X_test, y_train_success_vote_1, y_test_success_vote_1 = train_test_split(X, y_success_vote_1, test_size=0.2, random_state=42)
_, _, y_train_success_revenue_1, y_test_success_revenue_1 = train_test_split(X, y_success_revenue_1, test_size=0.2, random_state=42)

In [131]:
# Define a parameter grid
param_grid_rf = {
    'n_estimators': [100, 200, 500],
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth' : [4,5,6,7,8],
    'criterion' :['gini', 'entropy']
}

# Grid Search Cross Validation
clf_rf = GridSearchCV(estimator=rf, param_grid=param_grid_rf, cv= 5)


**- With production company**

In [136]:
# Fit the model
best_clf_vote_rf_1 = clf_rf.fit(X_train, y_train_success_vote_1)

# Print Best Parameters
print("Best Parameters: ", best_clf_vote_rf_1.best_params_)

Best Parameters:  {'criterion': 'gini', 'max_depth': 8, 'max_features': 'auto', 'n_estimators': 100}


In [137]:
# Fit the model
best_clf_revenue_rf_1 = clf_rf.fit(X_train, y_train_success_revenue_1)

# Print Best Parameters
print("Best Parameters: ", best_clf_revenue_rf_1.best_params_)

Best Parameters:  {'criterion': 'gini', 'max_depth': 8, 'max_features': 'auto', 'n_estimators': 100}


In [138]:
y_predvote_rf_1 = best_clf_vote_rf_1.predict(X_test)
y_predrevenue_rf_1 = best_clf_revenue_rf_1.predict(X_test)

In [139]:
# Evaluate the model
print("Random Forest Classifier Accuracy(Vote): ", accuracy_score(y_test_success_vote_1, y_predvote_rf_1))

Random Forest Classifier Accuracy(Vote):  0.7833655705996132


In [140]:
# Evaluate the model
print("Random Forest Classifier Accuracy(Revenue): ", accuracy_score(y_test_success_revenue_1, y_predrevenue_rf_1))

Random Forest Classifier Accuracy(Revenue):  0.8878143133462283


**- Without production company**

In [141]:
# Fit the model
best_clf_vote_rf = clf_rf.fit(X_train, y_train_success_vote)

# Print Best Parameters
print("Best Parameters: ", best_clf_rf.best_params_)

Best Parameters:  {'criterion': 'gini', 'max_depth': 4, 'max_features': 'auto', 'n_estimators': 100}


In [142]:
# Fit the model
best_clf_revenue_rf = clf_rf.fit(X_train, y_train_success_revenue)

# Print Best Parameters
print("Best Parameters: ", best_clf_revenue_rf.best_params_)

Best Parameters:  {'criterion': 'gini', 'max_depth': 8, 'max_features': 'auto', 'n_estimators': 100}


In [143]:
y_predvote_rf = best_clf_vote_rf.predict(X_test)
y_predrevenue_rf = best_clf_revenue_rf.predict(X_test)

In [144]:
# Evaluate the model
print("Random Forest Classifier Accuracy(Vote): ", accuracy_score(y_test_success_vote, y_pred_rf))

Random Forest Classifier Accuracy(Vote):  0.8249516441005803


In [145]:
# Evaluate the model
print("Random Forest Classifier Accuracy(Revenue): ", accuracy_score(y_test_success_revenue, y_predrevenue_rf))

Random Forest Classifier Accuracy(Revenue):  0.8878143133462283


# Ethics & Privacy

Personal Information Privacy: The dataset contains the names of people involved in the film industry, such as actors, directors, or production staff. We will anonymize the personal information in the data, i.e. remove relevant variables.

Fairness in revenue forecasting: This dataset will be used for forecasting, so inadvertent unfairness may arise. For example, certain genre, production company, or country of origin data may be favored in revenue projections, leading to potential discrepancies or unequal opportunities.

Unintended analysis and discrimination: When analyzing datasets, there is a risk of unintended analysis and discrimination. Models or algorithms trained on data may inadvertently learn to bias certain groups based on factors such as language, genre, or country of production.

# Team Expectations 

* *Communicate through Discord and regularly schedule weekly remote meetings*
* *Major project decisions, review of work done during zoom or discord meetings2*
* *Work is to be divided equally, individually completed before next meeting*

# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 5/17  |  4:30 |  Each part of the project proposal  | Finalize proposal and push to GitHub | 
| 5/23  |  6:15 |  Read the proposals | Complete peer review, discuss parts of the checkpoint and assign | 
| 5/30  | 6:15  | Assigned checkpoint parts  | Finalize the checkpoint and push to GitHub. Talk about what is left for final project and assign parts |
| 6/8  | 6:15  | Make good progress/finish final project parts | Discuss what is completed and make edits. |
| 6/13  | 6:15  | Finish assigned final parts | Finalize and push to GitHub |

# Footnotes
<a name="Eliashberg"></a>1.[^](#Eliashberg):Eliashberg, J., Elberse, A., & Leenders, M. A. (2013). The Motion Picture Industry: Critical Issues in Practice, Current Research, and New Research Directions:https://repository.upenn.edu/cgi/viewcontent.cgi?article=1179&context=oid_papers<br> 

<a name="Wallace"></a>2.[^](#Wallace):Wallace, W. T., Seigerman, A., & Holbrook, M. B. (1993). The Role of Actors and Actresses in the Success of Films: How Much Is a Movie Star Worth?https://link.springer.com/article/10.1007/BF00820765<br>

<a name="De Vany"></a>3.[^](#De):De Vany, A., & Walls, W. D. (1996). Bose–Einstein dynamics and adaptive contracting in the motion picture industry.https://econpapers.repec.org/article/ecjeconjl/v_3a106_3ay_3a1996_3ai_3a439_3ap_3a1493-1514.htm<br>
