![example](images/director_shot.jpeg)

# Project Title

**Authors:** Student 1, Student 2, Student 3
***

## Overview

A one-paragraph overview of the project, including the business problem, data, methods, results and recommendations.

## Business Problem

Summary of the business problem you are trying to solve, and the data questions that you plan to answer to solve it.

***
Questions to consider:
* What are the business's pain points related to this project?
* How did you pick the data analysis question(s) that you did?
* Why are these questions important from a business perspective?
***

## Data Understanding

Describe the data being used for this project.
***
Questions to consider:
* Where did the data come from, and how do they relate to the data analysis questions?
* What do the data represent? Who is in the sample and what variables are included?
* What is the target variable?
* What are the properties of the variables you intend to use?
***

In [1]:
# Import standard packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# Here you run your code to explore the data
!ls data/zippedData

bom.movie_gross.csv.gz
imdb.name.basics.csv.gz
imdb.title.akas.csv.gz
imdb.title.basics.csv.gz
imdb.title.crew.csv.gz
imdb.title.principals.csv.gz
imdb.title.ratings.csv.gz
rt.movie_info.tsv.gz
rt.reviews.tsv.gz
tmdb.movies.csv.gz
tn.movie_budgets.csv.gz


## Import All Data

In [3]:
#Box Office Mojo Data (bom)
bom_moviegross_df = pd.read_csv('data/zippedData/bom.movie_gross.csv.gz')

#IMDB Data (imdb)
imdb_name_basics = pd.read_csv('data/zippedData/imdb.name.basics.csv.gz') 
imdb_title_akas_df = pd.read_csv('data/zippedData/imdb.title.akas.csv.gz')
imdb_title_basics_df = pd.read_csv('data/zippedData/imdb.title.basics.csv.gz')
imdb_title_crew_df = pd.read_csv('data/zippedData/imdb.title.crew.csv.gz')
imdb_title_principals_df = pd.read_csv('data/zippedData/imdb.title.principals.csv.gz')
imdb_title_ratings_df = pd.read_csv('data/zippedData/imdb.title.ratings.csv.gz')

#Rotten Tomatoes Data (rt). These are tab-separated files, so the separator must be '\t' (not '/t').
#rt_movie_info_df = pd.read_csv('data/zippedData/rt.movie_info.tsv.gz', sep = '\t', encoding = 'windows-1254')
#rt_reviews_df =  pd.read_csv('data/zippedData/rt.reviews.tsv.gz', sep = '\t', encoding = 'windows-1254')

#The Movie Database (tmdb)
tmdb_movies_df = pd.read_csv('data/zippedData/tmdb.movies.csv.gz')

#The Numbers (tn)
tn_movie_budgets_df = pd.read_csv('data/zippedData/tn.movie_budgets.csv.gz')
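
The two Rotten Tomatoes files are `.tsv` (tab-separated), so `read_csv` needs the tab escape `'\t'` as its separator. Here is a minimal sketch on a made-up in-memory sample; the column names and rows are assumptions for illustration only and are not taken from the real files.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for a tab-separated file
sample_tsv = "id\trating\tgenre\n1\tR\tAction and Adventure\n3\tNR\tDrama\n"

# '\t' (backslash) is the tab character; '/t' would be the literal two-character string "/t"
rt_sample_df = pd.read_csv(io.StringIO(sample_tsv), sep='\t')
print(rt_sample_df.shape)
```

The same `sep='\t'` argument should apply when reading the compressed files from disk, since pandas infers gzip compression from the `.gz` extension by default.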

## Preview DataFrames

In [4]:
bom_moviegross_df.head()

| | title | studio | domestic_gross | foreign_gross | year |
|---|---|---|---|---|---|
| 0 | Toy Story 3 | BV | 415000000.0 | 652000000 | 2010 |
| 1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691300000 | 2010 |
| 2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300000 | 2010 |
| 3 | Inception | WB | 292600000.0 | 535700000 | 2010 |
| 4 | Shrek Forever After | P/DW | 238700000.0 | 513900000 | 2010 |


In [5]:
#imdb_name_basics.head()

In [6]:
# imdb_title_akas_df.head()

In [7]:
# imdb_title_basics_df.head()

In [8]:
# imdb_title_crew_df.head()

In [9]:
# imdb_title_principals_df.head()

In [10]:
# imdb_title_ratings_df.head()

In [11]:
# rt_movie_info_df.head()

In [12]:
#rt_reviews_df.head()

In [13]:
#tmdb_movies_df.head()

In [14]:
# tn_movie_budgets_df.head()

## Check for missing values

In [15]:
bom_moviegross_df.isna().sum()

title                0
studio               5
domestic_gross      28
foreign_gross     1350
year                 0
dtype: int64

In [16]:
#imdb_name_basics.isna().sum()

In [17]:
#imdb_title_akas_df.isna().sum()

In [18]:
#imdb_title_basics_df.isna().sum()

In [19]:
#imdb_title_crew_df.isna().sum()

In [20]:
#imdb_title_principals_df.isna().sum()

In [21]:
#imdb_title_ratings_df.isna().sum()

## Data Preparation

Describe and justify the process for preparing the data for analysis.

***
Questions to consider:
* Were there variables you dropped or created?
* How did you address missing values or outliers?
* Why are these choices appropriate given the data and the business problem?
***

In [22]:
# Here you run your code to clean the data

## Box Office Mojo Data

In [23]:
bom_moviegross_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3387 entries, 0 to 3386
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   title           3387 non-null   object 
 1   studio          3382 non-null   object 
 2   domestic_gross  3359 non-null   float64
 3   foreign_gross   2037 non-null   object 
 4   year            3387 non-null   int64  
dtypes: float64(1), int64(1), object(3)
memory usage: 132.4+ KB


In [24]:
bom_moviegross_df['foreign_gross'].describe()

count        2037
unique       1204
top       1200000
freq           23
Name: foreign_gross, dtype: object

In [25]:
bom_moviegross_df.isna().sum()

title                0
studio               5
domestic_gross      28
foreign_gross     1350
year                 0
dtype: int64

In [26]:
bom_moviegross_df['foreign_gross'].shape

(3387,)

Let's find the percentage of data that is null!

In [27]:
num_missing_bom_foreigngross = bom_moviegross_df.isna().sum()['foreign_gross']
total_moviegross_entries = len(bom_moviegross_df['foreign_gross'])
percentage_missing_foreign = num_missing_bom_foreigngross / total_moviegross_entries
print(percentage_missing_foreign)

0.3985828166519043


So about 40% of the 'foreign_gross' column is missing (NaN).
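
The same ratio can be computed for every column at once. This is a minimal sketch on a toy frame: the column names mirror `bom_moviegross_df`, but the values are made up.

```python
import numpy as np
import pandas as pd

# Toy frame; values are invented for illustration
toy_df = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'domestic_gross': [1.0, np.nan, 3.0, 4.0],
    'foreign_gross': ['5', np.nan, np.nan, '8'],
})

# isna() gives a boolean frame; the mean of booleans is the fraction of True values
missing_pct = toy_df.isna().mean() * 100
print(missing_pct.round(1))
```

Multiplying by 100 turns the fraction into an actual percentage, which avoids confusion when a variable is named "percentage".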

Let's dig a little deeper.

In [28]:
len(list(bom_moviegross_df['foreign_gross'].unique()))

1205

There are 1205 unique values in this series (1204 distinct non-null values plus NaN). What type of data are they?

In [29]:
non_standard = []
for x in bom_moviegross_df['foreign_gross'].unique():
    # NaN is a float, so it is the only unique value skipped here
    if type(x) != int and type(x) != float:
        non_standard.append(x)

In [30]:
len(non_standard)

1204

In [31]:
[type(x) for x in non_standard if type(x) != str] #All the unique values are strings

[]

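Let's summarize what we have so far:
    
    3387 total entries
    1350 null values
    2037 non-null entries (3387 - 1350)
    1204 unique non-null values, all of type str

This means:
    
    every non-null entry is stored as a str (the column dtype is object); the only float is NaN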
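
A count of unique values is not a count of entries. To see how many entries of each Python type a column holds, one option is to map `type` over every entry and tally the result. A small sketch on a made-up series:

```python
import numpy as np
import pandas as pd

# Made-up object column mixing strings and NaN, like a numeric column read as text
toy_series = pd.Series(['100', '100', '2500', np.nan, np.nan], dtype=object)

# Count the Python type of every entry (not just of each unique value)
type_counts = toy_series.map(lambda x: type(x).__name__).value_counts()
print(type_counts)

# Number of distinct non-null values, for comparison
n_unique_strings = toy_series.dropna().nunique()
print(n_unique_strings)
```

Here three entries are strings but only two of them are distinct, which is the difference between the entry count and the unique count.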

Let's convert every item in this series to float in order to remove the 'str' types

In [33]:
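# Convert the str values to float; strings that cannot be parsed become NaN (errors='coerce').
# Note: downcast='float' stores the result as float32, which is why large values display rounded below
# (e.g. 691300000 appears as 691299968.0). Omit the argument to keep float64 precision.
foreign_gross_numeric = pd.to_numeric(bom_moviegross_df['foreign_gross'], errors = 'coerce', downcast = 'float')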

In [34]:
foreign_gross_numeric.describe()

count         2032.0
mean      75057048.0
std      137529360.0
min            600.0
25%        3775000.0
50%       18900000.0
75%       75050000.0
max      960499968.0
Name: foreign_gross, dtype: float64

In [35]:
bom_moviegross_df['foreign_gross'] = foreign_gross_numeric #Assigned new series in place of original df series

In [36]:
bom_moviegross_df.isna().sum()

title                0
studio               5
domestic_gross      28
foreign_gross     1355
year                 0
dtype: int64
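
The null count for `foreign_gross` rose from 1350 to 1355, so five strings could not be parsed and were coerced to NaN. One way to find such entries is to compare the column before and after conversion. This is a sketch on made-up data; `'1,131.6'` is a hypothetical example of an unparseable string, not a value confirmed from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up raw column; '1,131.6' is a hypothetical string that to_numeric cannot parse
raw = pd.Series(['652000000', '1,131.6', np.nan, '600'], dtype=object)
converted = pd.to_numeric(raw, errors='coerce')

# Entries that were present before conversion but are NaN afterwards failed to parse
failed_mask = raw.notna() & converted.isna()
print(raw[failed_mask].tolist())
```

Inspecting the failed strings before overwriting the original column makes it possible to decide whether to clean them or drop them.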

In [37]:
bom_moviegross_df['foreign_gross'].describe()

count         2032.0
mean      75057048.0
std      137529360.0
min            600.0
25%        3775000.0
50%       18900000.0
75%       75050000.0
max      960499968.0
Name: foreign_gross, dtype: float64

In [38]:
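# Fill missing values with the column mean (about 75,057,048) instead of a hard-coded number
mean_adjusted = bom_moviegross_df['foreign_gross'].fillna(bom_moviegross_df['foreign_gross'].mean())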

In [39]:
mean_adjusted.describe()

count         3387.0
mean      75057040.0
std      106514048.0
min            600.0
25%       11750000.0
50%       75057048.0
75%       75057048.0
max      960499968.0
Name: foreign_gross, dtype: float64
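
Filling with the mean keeps the mean unchanged but pulls the other statistics toward it: above, the standard deviation drops from about 137.5M to about 106.5M and the median becomes the fill value itself. A toy sketch of the same effect, using invented numbers:

```python
import numpy as np
import pandas as pd

# Invented values with two missing entries
toy = pd.Series([10.0, 20.0, np.nan, np.nan, 90.0])

# Use the computed mean rather than a hard-coded number
mean_filled = toy.fillna(toy.mean())

# The mean is preserved, but the spread shrinks because the filled values sit exactly at the mean
print(toy.mean(), mean_filled.mean())
print(toy.std(), mean_filled.std())
```

With roughly 40% of the column missing, this distortion is large, which is one reason to consider dropping those rows instead of imputing them.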

In [40]:
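# Keep films with more than $1M in both domestic and foreign gross (this filters the original frame, not mean_adjusted)
mean_adjusted_removed = bom_moviegross_df[(bom_moviegross_df['foreign_gross'] > 1000000) & (bom_moviegross_df['domestic_gross'] > 1000000)]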

In [41]:
mean_adjusted_removed.shape

(1358, 5)

In [42]:
mean_adjusted_removed.isna().sum()

title             0
studio            0
domestic_gross    0
foreign_gross     0
year              0
dtype: int64

The filter above only keeps rows where both gross columns exceed 1,000,000, and comparisons against NaN are False, so no missing values remain (see the counts above). The `dropna` below is just a safeguard.

In [43]:
mean_adjusted_removed = mean_adjusted_removed.dropna(axis=0, how='any') # reassign instead of inplace=True to avoid SettingWithCopyWarning on the filtered slice
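
Modifying a boolean-filtered DataFrame in place can trigger pandas' `SettingWithCopyWarning`. A common way to avoid it is to take an explicit `.copy()` of the filtered result and reassign instead of using `inplace=True`. A minimal sketch on a toy frame (values are made up):

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'domestic_gross': [2_000_000.0, 500_000.0, np.nan],
    'year': [2010, 2011, 2012],
})

# .copy() makes the filtered result an independent DataFrame
filtered = toy_df[toy_df['domestic_gross'] > 1_000_000].copy()

# Reassigning (instead of inplace=True) keeps each step explicit and leaves toy_df untouched
filtered = filtered.drop(columns='year')
print(filtered)
```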


In [44]:
mean_adjusted_removed

| | title | studio | domestic_gross | foreign_gross | year |
|---|---|---|---|---|---|
| 0 | Toy Story 3 | BV | 415000000.0 | 652000000.0 | 2010 |
| 1 | Alice in Wonderland (2010) | BV | 334200000.0 | 691299968.0 | 2010 |
| 2 | Harry Potter and the Deathly Hallows Part 1 | WB | 296000000.0 | 664300032.0 | 2010 |
| 3 | Inception | WB | 292600000.0 | 535700000.0 | 2010 |
| 4 | Shrek Forever After | P/DW | 238700000.0 | 513900000.0 | 2010 |
| ... | ... | ... | ... | ... | ... |
| 3252 | Papillon (2018) | BST | 2300000.0 | 2200000.0 | 2018 |
| 3257 | Don't Worry He Won't Get Far on Foot | Amazon | 1400000.0 | 2500000.0 | 2018 |
| 3258 | A Private War | Aviron | 1600000.0 | 2200000.0 | 2018 |
| 3263 | The Front Runner | Sony | 2000000.0 | 1200000.0 | 2018 |


In [45]:
mean_adjusted_removed.isna().sum()

title             0
studio            0
domestic_gross    0
foreign_gross     0
year              0
dtype: int64

Let's drop the columns that we won't be using.

In [46]:
mean_adjusted_removed = mean_adjusted_removed.drop('year', axis=1) # reassign rather than inplace=True to avoid SettingWithCopyWarning

In [47]:
mean_adjusted_removed = mean_adjusted_removed.drop('studio', axis=1)

In [48]:
mean_adjusted_removed

| | title | domestic_gross | foreign_gross |
|---|---|---|---|
| 0 | Toy Story 3 | 415000000.0 | 652000000.0 |
| 1 | Alice in Wonderland (2010) | 334200000.0 | 691299968.0 |
| 2 | Harry Potter and the Deathly Hallows Part 1 | 296000000.0 | 664300032.0 |
| 3 | Inception | 292600000.0 | 535700000.0 |
| 4 | Shrek Forever After | 238700000.0 | 513900000.0 |
| ... | ... | ... | ... |
| 3252 | Papillon (2018) | 2300000.0 | 2200000.0 |
| 3257 | Don't Worry He Won't Get Far on Foot | 1400000.0 | 2500000.0 |
| 3258 | A Private War | 1600000.0 | 2200000.0 |
| 3263 | The Front Runner | 2000000.0 | 1200000.0 |


In [49]:
final_moviegross_df = mean_adjusted_removed.set_index('title')

In [50]:
final_moviegross_df['total_gross'] = final_moviegross_df['domestic_gross'] + final_moviegross_df['foreign_gross']

In [51]:
final_moviegross_df

| title | domestic_gross | foreign_gross | total_gross |
|---|---|---|---|
| Toy Story 3 | 415000000.0 | 652000000.0 | 1.067000e+09 |
| Alice in Wonderland (2010) | 334200000.0 | 691299968.0 | 1.025500e+09 |
| Harry Potter and the Deathly Hallows Part 1 | 296000000.0 | 664300032.0 | 9.603000e+08 |
| Inception | 292600000.0 | 535700000.0 | 8.283000e+08 |
| Shrek Forever After | 238700000.0 | 513900000.0 | 7.526000e+08 |
| ... | ... | ... | ... |
| Papillon (2018) | 2300000.0 | 2200000.0 | 4.500000e+06 |
| Don't Worry He Won't Get Far on Foot | 1400000.0 | 2500000.0 | 3.900000e+06 |
| A Private War | 1600000.0 | 2200000.0 | 3.800000e+06 |
| The Front Runner | 2000000.0 | 1200000.0 | 3.200000e+06 |


## IMDB Title_basics data

Let's clean the imdb_title_basics_df

In [52]:
imdb_title_basics_df

| | tconst | primary_title | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|---|
| 0 | tt0063540 | Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| 1 | tt0066787 | One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| 2 | tt0069049 | The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| 3 | tt0069204 | Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| 4 | tt0100275 | The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
| ... | ... | ... | ... | ... | ... | ... |
| 146139 | tt9916538 | Kuambil Lagi Hatiku | Kuambil Lagi Hatiku | 2019 | 123.0 | Drama |
| 146140 | tt9916622 | Rodolpho Teóphilo - O Legado de um Pioneiro | Rodolpho Teóphilo - O Legado de um Pioneiro | 2015 | NaN | Documentary |
| 146141 | tt9916706 | Dankyavar Danka | Dankyavar Danka | 2013 | NaN | Comedy |
| 146142 | tt9916730 | 6 Gunn | 6 Gunn | 2017 | 116.0 | NaN |


In [53]:
imdb_title_index = imdb_title_basics_df.set_index('primary_title')

In [54]:
imdb_title_index

| primary_title | tconst | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|---|
| Sunghursh | tt0063540 | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| One Day Before the Rainy Season | tt0066787 | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| The Other Side of the Wind | tt0069049 | The Other Side of the Wind | 2018 | 122.0 | Drama |
| Sabse Bada Sukh | tt0069204 | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| The Wandering Soap Opera | tt0100275 | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
| ... | ... | ... | ... | ... | ... |
| Kuambil Lagi Hatiku | tt9916538 | Kuambil Lagi Hatiku | 2019 | 123.0 | Drama |
| Rodolpho Teóphilo - O Legado de um Pioneiro | tt9916622 | Rodolpho Teóphilo - O Legado de um Pioneiro | 2015 | NaN | Documentary |
| Dankyavar Danka | tt9916706 | Dankyavar Danka | 2013 | NaN | Comedy |
| 6 Gunn | tt9916730 | 6 Gunn | 2017 | 116.0 | NaN |


In [55]:
imdb_title_basics_df.shape

(146144, 5)

In [56]:
imdb_title_index_dropped = imdb_title_index.drop('tconst', axis=1)

In [57]:
imdb_title_index_dropped

| primary_title | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|
| Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| Sabse Bada Sukh | Sabse Bada Sukh | 2018 | NaN | Comedy,Drama |
| The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
| ... | ... | ... | ... | ... |
| Kuambil Lagi Hatiku | Kuambil Lagi Hatiku | 2019 | 123.0 | Drama |
| Rodolpho Teóphilo - O Legado de um Pioneiro | Rodolpho Teóphilo - O Legado de um Pioneiro | 2015 | NaN | Documentary |
| Dankyavar Danka | Dankyavar Danka | 2013 | NaN | Comedy |
| 6 Gunn | 6 Gunn | 2017 | 116.0 | NaN |


In [59]:
imdb_title_index_removedna = imdb_title_index_dropped.dropna(axis=0, how = 'any')

In [62]:
imdb_title_index_removedna

| primary_title | original_title | start_year | runtime_minutes | genres |
|---|---|---|---|---|
| Sunghursh | Sunghursh | 2013 | 175.0 | Action,Crime,Drama |
| One Day Before the Rainy Season | Ashad Ka Ek Din | 2019 | 114.0 | Biography,Drama |
| The Other Side of the Wind | The Other Side of the Wind | 2018 | 122.0 | Drama |
| The Wandering Soap Opera | La Telenovela Errante | 2017 | 80.0 | Comedy,Drama,Fantasy |
| A Thin Life | A Thin Life | 2018 | 75.0 | Comedy |
| ... | ... | ... | ... | ... |
| Drømmeland | Drømmeland | 2019 | 72.0 | Documentary |
| The Rehearsal | O Ensaio | 2019 | 51.0 | Drama |
| Illenau - die Geschichte einer ehemaligen Heil- und Pflegeanstalt | Illenau - die Geschichte einer ehemaligen Heil... | 2017 | 84.0 | Documentary |
| Safeguard | Safeguard | 2019 | 90.0 | Drama,Thriller |
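
`dropna(how='any')` removes a row if any column is missing, so it helps to know how many rows that costs and to name the columns that actually matter. A minimal sketch on a toy frame, assuming only `runtime_minutes` and `genres` are needed downstream (the values are made up):

```python
import numpy as np
import pandas as pd

toy_titles = pd.DataFrame({
    'primary_title': ['A', 'B', 'C', 'D'],
    'runtime_minutes': [90.0, np.nan, 120.0, 80.0],
    'genres': ['Drama', 'Comedy', None, 'Documentary'],
})

rows_before = len(toy_titles)

# Only rows missing one of the listed columns are removed
cleaned = toy_titles.dropna(subset=['runtime_minutes', 'genres'])

rows_dropped = rows_before - len(cleaned)
print(f"dropped {rows_dropped} of {rows_before} rows")
```

Using `subset` keeps the intent explicit and avoids losing rows because of a column that is about to be dropped anyway.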


In [63]:
dropped_imdb_title = imdb_title_index_removedna.drop(['original_title', 'start_year'], axis=1)

In [64]:
dropped_imdb_title

| primary_title | runtime_minutes | genres |
|---|---|---|
| Sunghursh | 175.0 | Action,Crime,Drama |
| One Day Before the Rainy Season | 114.0 | Biography,Drama |
| The Other Side of the Wind | 122.0 | Drama |
| The Wandering Soap Opera | 80.0 | Comedy,Drama,Fantasy |
| A Thin Life | 75.0 | Comedy |
| ... | ... | ... |
| Drømmeland | 72.0 | Documentary |
| The Rehearsal | 51.0 | Drama |
| Illenau - die Geschichte einer ehemaligen Heil- und Pflegeanstalt | 84.0 | Documentary |
| Safeguard | 90.0 | Drama,Thriller |


In [65]:
final_imdb_title_df = dropped_imdb_title

In [66]:
final_imdb_title_df

| primary_title | runtime_minutes | genres |
|---|---|---|
| Sunghursh | 175.0 | Action,Crime,Drama |
| One Day Before the Rainy Season | 114.0 | Biography,Drama |
| The Other Side of the Wind | 122.0 | Drama |
| The Wandering Soap Opera | 80.0 | Comedy,Drama,Fantasy |
| A Thin Life | 75.0 | Comedy |
| ... | ... | ... |
| Drømmeland | 72.0 | Documentary |
| The Rehearsal | 51.0 | Drama |
| Illenau - die Geschichte einer ehemaligen Heil- und Pflegeanstalt | 84.0 | Documentary |
| Safeguard | 90.0 | Drama,Thriller |
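
The `genres` column stores several genres in one comma-separated string. If a per-genre analysis is wanted later, one possible approach is to split the string and give each genre its own row. This sketch reuses the first three rows shown above; the `genre` column name is an assumption introduced here.

```python
import pandas as pd

toy_genres = pd.DataFrame(
    {
        'runtime_minutes': [175.0, 114.0, 122.0],
        'genres': ['Action,Crime,Drama', 'Biography,Drama', 'Drama'],
    },
    index=pd.Index(
        ['Sunghursh', 'One Day Before the Rainy Season', 'The Other Side of the Wind'],
        name='primary_title',
    ),
)

# Split on commas into lists, then explode so each genre gets its own row
genre_rows = toy_genres.assign(genre=toy_genres['genres'].str.split(',')).explode('genre')
genre_counts = genre_rows['genre'].value_counts()
print(genre_counts)
```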


Here are the two DataFrames we have so far:

In [67]:
final_imdb_title_df

| primary_title | runtime_minutes | genres |
|---|---|---|
| Sunghursh | 175.0 | Action,Crime,Drama |
| One Day Before the Rainy Season | 114.0 | Biography,Drama |
| The Other Side of the Wind | 122.0 | Drama |
| The Wandering Soap Opera | 80.0 | Comedy,Drama,Fantasy |
| A Thin Life | 75.0 | Comedy |
| ... | ... | ... |
| Drømmeland | 72.0 | Documentary |
| The Rehearsal | 51.0 | Drama |
| Illenau - die Geschichte einer ehemaligen Heil- und Pflegeanstalt | 84.0 | Documentary |
| Safeguard | 90.0 | Drama,Thriller |


In [68]:
final_moviegross_df

| title | domestic_gross | foreign_gross | total_gross |
|---|---|---|---|
| Toy Story 3 | 415000000.0 | 652000000.0 | 1.067000e+09 |
| Alice in Wonderland (2010) | 334200000.0 | 691299968.0 | 1.025500e+09 |
| Harry Potter and the Deathly Hallows Part 1 | 296000000.0 | 664300032.0 | 9.603000e+08 |
| Inception | 292600000.0 | 535700000.0 | 8.283000e+08 |
| Shrek Forever After | 238700000.0 | 513900000.0 | 7.526000e+08 |
| ... | ... | ... | ... |
| Papillon (2018) | 2300000.0 | 2200000.0 | 4.500000e+06 |
| Don't Worry He Won't Get Far on Foot | 1400000.0 | 2500000.0 | 3.900000e+06 |
| A Private War | 1600000.0 | 2200000.0 | 3.800000e+06 |
| The Front Runner | 2000000.0 | 1200000.0 | 3.200000e+06 |
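
Both DataFrames are indexed by movie title, so one possible next step is an index-on-index join. This is only a sketch with placeholder film names and invented values. Note that titles must match exactly, so entries such as "Alice in Wonderland (2010)" would need their year suffix cleaned before they could match.

```python
import pandas as pd

# Placeholder data; film names and numbers are invented for illustration
gross = pd.DataFrame(
    {'total_gross': [5.0e6, 8.0e6]},
    index=pd.Index(['Film A', 'Film B'], name='title'),
)
titles = pd.DataFrame(
    {'runtime_minutes': [100.0, 95.0], 'genres': ['Drama', 'Comedy']},
    index=pd.Index(['Film B', 'Film C'], name='primary_title'),
)

# how='inner' keeps only titles present in both frames
combined = gross.join(titles, how='inner')
print(combined)
```

Comparing `len(combined)` against the size of each input is a quick check on how many titles failed to match.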


## The Numbers Movie Budgets Data

In [73]:
tn_movie_budgets_df

| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
|---|---|---|---|---|---|---|
| 0 | 1 | Dec 18, 2009 | Avatar | $425,000,000 | $760,507,625 | $2,776,345,279 |
| 1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | $410,600,000 | $241,063,875 | $1,045,663,875 |
| 2 | 3 | Jun 7, 2019 | Dark Phoenix | $350,000,000 | $42,762,350 | $149,762,350 |
| 3 | 4 | May 1, 2015 | Avengers: Age of Ultron | $330,600,000 | $459,005,868 | $1,403,013,963 |
| 4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | $317,000,000 | $620,181,382 | $1,316,721,747 |
| ... | ... | ... | ... | ... | ... | ... |
| 5777 | 78 | Dec 31, 2018 | Red 11 | $7,000 | $0 | $0 |
| 5778 | 79 | Apr 2, 1999 | Following | $6,000 | $48,482 | $240,495 |
| 5779 | 80 | Jul 13, 2005 | Return to the Land of Wonders | $5,000 | $1,338 | $1,338 |
| 5780 | 81 | Sep 29, 2015 | A Plague So Pleasant | $1,400 | $0 | $0 |


In [84]:
# Strip the '$' prefix and the thousands separators; the values are still strings afterwards
tn_movie_budgets_df['production_budget'] = tn_movie_budgets_df['production_budget'].str.replace('[$,]', '', regex=True)

In [86]:
tn_movie_budgets_df['domestic_gross'] = tn_movie_budgets_df['domestic_gross'].str.replace('[$,]', '', regex=True)

In [87]:
tn_movie_budgets_df['worldwide_gross'] = tn_movie_budgets_df['worldwide_gross'].str.replace('[$,]', '', regex=True)

In [88]:
tn_movie_budgets_df

| | id | release_date | movie | production_budget | domestic_gross | worldwide_gross |
|---|---|---|---|---|---|---|
| 0 | 1 | Dec 18, 2009 | Avatar | 425000000 | 760507625 | 2776345279 |
| 1 | 2 | May 20, 2011 | Pirates of the Caribbean: On Stranger Tides | 410600000 | 241063875 | 1045663875 |
| 2 | 3 | Jun 7, 2019 | Dark Phoenix | 350000000 | 42762350 | 149762350 |
| 3 | 4 | May 1, 2015 | Avengers: Age of Ultron | 330600000 | 459005868 | 1403013963 |
| 4 | 5 | Dec 15, 2017 | Star Wars Ep. VIII: The Last Jedi | 317000000 | 620181382 | 1316721747 |
| ... | ... | ... | ... | ... | ... | ... |
| 5777 | 78 | Dec 31, 2018 | Red 11 | 7000 | 0 | 0 |
| 5778 | 79 | Apr 2, 1999 | Following | 6000 | 48482 | 240495 |
| 5779 | 80 | Jul 13, 2005 | Return to the Land of Wonders | 5000 | 1338 | 1338 |
| 5780 | 81 | Sep 29, 2015 | A Plague So Pleasant | 1400 | 0 | 0 |


In [89]:
tn_movie_budgets_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5782 entries, 0 to 5781
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   id                 5782 non-null   int64 
 1   release_date       5782 non-null   object
 2   movie              5782 non-null   object
 3   production_budget  5782 non-null   object
 4   domestic_gross     5782 non-null   object
 5   worldwide_gross    5782 non-null   object
dtypes: int64(1), object(5)
memory usage: 271.2+ KB


In [90]:
tn_movie_budgets_df.isna().sum()

id                   0
release_date         0
movie                0
production_budget    0
domestic_gross       0
worldwide_gross      0
dtype: int64
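
The money columns contain no nulls, but `info()` still reports them as `object`, meaning they hold strings rather than numbers. A minimal sketch of converting currency strings to integers so they can be compared numerically; the toy frame and film names are made up, and only the column names mirror `tn_movie_budgets_df`.

```python
import pandas as pd

toy_budgets = pd.DataFrame({
    'movie': ['Film A', 'Film B'],
    'production_budget': ['$425,000,000', '$7,000'],
    'worldwide_gross': ['$2,776,345,279', '$0'],
})

money_cols = ['production_budget', 'worldwide_gross']
for col in money_cols:
    # Remove '$' and ',' and then cast the remaining digits to 64-bit integers
    toy_budgets[col] = (
        toy_budgets[col].str.replace('[$,]', '', regex=True).astype('int64')
    )

# Numeric comparison now works
big_budget = toy_budgets[toy_budgets['production_budget'] > 1_000_000]
print(big_budget)
```

`int64` is used because worldwide grosses above roughly 2.1 billion would overflow a 32-bit integer.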

In [70]:
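# The money columns are strings (dtype object), so comparing them with an int raises a TypeError.
# Strip any '$' or ',' and cast to integers first; 'domestic_gross' must also be quoted.
money_cols = ['production_budget', 'domestic_gross', 'worldwide_gross']
tn_movie_budgets_df[money_cols] = tn_movie_budgets_df[money_cols].replace('[$,]', '', regex=True).astype('int64')
tn_movie_budgets_df[(tn_movie_budgets_df['production_budget'] > 1000000) 
                    & (tn_movie_budgets_df['domestic_gross'] > 1000000) 
                    & (tn_movie_budgets_df['worldwide_gross'] > 1000000)]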

## Data Modeling
Describe and justify the process for analyzing or modeling the data.

***
Questions to consider:
* How did you analyze or model the data?
* How did you iterate on your initial approach to make it better?
* Why are these choices appropriate given the data and the business problem?
***

In [None]:
# Here you run your code to model the data


## Evaluation
Evaluate how well your work solves the stated business problem.

***
Questions to consider:
* How do you interpret the results?
* How well does your model fit your data? How much better is this than your baseline model?
* How confident are you that your results would generalize beyond the data you have?
* How confident are you that this model would benefit the business if put into use?
***

## Conclusions
Provide your conclusions about the work you've done, including any limitations or next steps.

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***