# Data Analysis Workflow for Microsoft's Movie Industry Venture


## Overview

I have been tasked with assisting Microsoft in their venture into the movie industry. The goal is to identify which types of films currently perform best at the box office and to present those findings to the executives of Microsoft's new movie studio.


### Variables needed to identify the films Microsoft should produce

- Time of release
- Budget
- Actors
- Genre
- Rating
- Type



## Importing Data

- Gathered relevant data on movie budgets, box office gross, genres, release dates, and crew members.
- Utilized both internal and external sources to compile a comprehensive dataset for analysis.

In [1]:
# importing libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import datetime

In [2]:
df1 = pd.read_csv(r"C:\Users\pc\Videos\projects\Phase_One_Proect\data\MovieData.csv")
df1

Unnamed: 0,movie_name,production_year,movie_odid,production_budget,domestic_box_office,international_box_office,rating,creative_type,source,production_method,genre,sequel,running_time
0,Madea's Family Reunion,2006,8220100,10000000,63257940,62581,PG-13,Contemporary Fiction,Based on Play,Live Action,Comedy,1.0,
1,Krrish,2006,58540100,10000000,1430721,31000000,Not Rated,Science Fiction,Original Screenplay,Live Action,Action,1.0,
2,End of the Spear,2006,34620100,10000000,11748661,175380,PG-13,Historical Fiction,Original Screenplay,Live Action,Drama,0.0,
3,A Prairie Home Companion,2006,24910100,10000000,20342852,6373339,PG-13,Contemporary Fiction,Original Screenplay,Live Action,Comedy,0.0,105.0
4,Saw III,2006,5840100,10000000,80238724,83638091,R,Contemporary Fiction,Original Screenplay,Live Action,Horror,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1931,The Nutcracker and the Four Realms,2018,298170100,132900000,54858851,115435048,PG,Fantasy,Based on Folk Tale/Legend/Fairytale,Live Action,Adventure,0.0,99.0
1932,Aquaman,2018,213100100,160000000,333804251,805605026,PG-13,Super Hero,Based on Comic/Graphic Novel,Live Action,Action,0.0,143.0
1933,Ralph Breaks The Internet,2018,263730100,175000000,200236625,319167373,PG,Kids Fiction,Original Screenplay,Digital Animation,Adventure,1.0,112.0
1934,Mission: Impossible—Fallout,2018,248680100,178000000,220159104,567297448,PG-13,Contemporary Fiction,Based on TV,Live Action,Action,1.0,147.0


In [3]:
# Renaming international_box_office as internationalBoxOffice, domestic_box_office as domesticBoxoffice, and production_budget as productionBudget

df1 = df1.rename(columns = {'international_box_office':'internationalBoxOffice', 'domestic_box_office':'domesticBoxoffice', 'production_budget':'productionBudget'})
df1

Unnamed: 0,movie_name,production_year,movie_odid,productionBudget,domesticBoxoffice,internationalBoxOffice,rating,creative_type,source,production_method,genre,sequel,running_time
0,Madea's Family Reunion,2006,8220100,10000000,63257940,62581,PG-13,Contemporary Fiction,Based on Play,Live Action,Comedy,1.0,
1,Krrish,2006,58540100,10000000,1430721,31000000,Not Rated,Science Fiction,Original Screenplay,Live Action,Action,1.0,
2,End of the Spear,2006,34620100,10000000,11748661,175380,PG-13,Historical Fiction,Original Screenplay,Live Action,Drama,0.0,
3,A Prairie Home Companion,2006,24910100,10000000,20342852,6373339,PG-13,Contemporary Fiction,Original Screenplay,Live Action,Comedy,0.0,105.0
4,Saw III,2006,5840100,10000000,80238724,83638091,R,Contemporary Fiction,Original Screenplay,Live Action,Horror,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1931,The Nutcracker and the Four Realms,2018,298170100,132900000,54858851,115435048,PG,Fantasy,Based on Folk Tale/Legend/Fairytale,Live Action,Adventure,0.0,99.0
1932,Aquaman,2018,213100100,160000000,333804251,805605026,PG-13,Super Hero,Based on Comic/Graphic Novel,Live Action,Action,0.0,143.0
1933,Ralph Breaks The Internet,2018,263730100,175000000,200236625,319167373,PG,Kids Fiction,Original Screenplay,Digital Animation,Adventure,1.0,112.0
1934,Mission: Impossible—Fallout,2018,248680100,178000000,220159104,567297448,PG-13,Contemporary Fiction,Based on TV,Live Action,Action,1.0,147.0


In [11]:
# changing the column movie_name to movie

df1 = df1.rename(columns = {'movie_name':'movie'})
df1

Unnamed: 0,movie,production_year,movie_odid,productionBudget,domesticBoxoffice,internationalBoxOffice,rating,creative_type,source,production_method,genre,sequel,running_time
0,Madea's Family Reunion,2006,8220100,10000000,63257940,62581,PG-13,Contemporary Fiction,Based on Play,Live Action,Comedy,1.0,
1,Krrish,2006,58540100,10000000,1430721,31000000,Not Rated,Science Fiction,Original Screenplay,Live Action,Action,1.0,
2,End of the Spear,2006,34620100,10000000,11748661,175380,PG-13,Historical Fiction,Original Screenplay,Live Action,Drama,0.0,
3,A Prairie Home Companion,2006,24910100,10000000,20342852,6373339,PG-13,Contemporary Fiction,Original Screenplay,Live Action,Comedy,0.0,105.0
4,Saw III,2006,5840100,10000000,80238724,83638091,R,Contemporary Fiction,Original Screenplay,Live Action,Horror,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1931,The Nutcracker and the Four Realms,2018,298170100,132900000,54858851,115435048,PG,Fantasy,Based on Folk Tale/Legend/Fairytale,Live Action,Adventure,0.0,99.0
1932,Aquaman,2018,213100100,160000000,333804251,805605026,PG-13,Super Hero,Based on Comic/Graphic Novel,Live Action,Action,0.0,143.0
1933,Ralph Breaks The Internet,2018,263730100,175000000,200236625,319167373,PG,Kids Fiction,Original Screenplay,Digital Animation,Adventure,1.0,112.0
1934,Mission: Impossible—Fallout,2018,248680100,178000000,220159104,567297448,PG-13,Contemporary Fiction,Based on TV,Live Action,Action,1.0,147.0


In [4]:
# checking nan values in the running_time column

df1['running_time'].isnull().sum()

114

In [5]:
#importing tmdb.movies.csv file

df2= pd.read_csv(r"C:\Users\pc\Videos\projects\Phase_One_Proect\data\tmdb.movies.csv")
df2

Unnamed: 0.1,Unnamed: 0,genre_ids,id,original_language,original_title,popularity,release_date,title,vote_average,vote_count
0,0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,4,"[28, 878, 12]",27205,en,Inception,27.920,2010-07-16,Inception,8.3,22186
...,...,...,...,...,...,...,...,...,...,...
26512,26512,"[27, 18]",488143,en,Laboratory Conditions,0.600,2018-10-13,Laboratory Conditions,0.0,1
26513,26513,"[18, 53]",485975,en,_EXHIBIT_84xxx_,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1
26514,26514,"[14, 28, 12]",381231,en,The Last One,0.600,2018-10-01,The Last One,0.0,1
26515,26515,"[10751, 12, 28]",366854,en,Trailer Made,0.600,2018-06-22,Trailer Made,0.0,1


In [12]:
# changing the column original_title to movie

df2 = df2.rename(columns = {'original_title':'movie'})
df2

Unnamed: 0.1,Unnamed: 0,genre_ids,id,original_language,movie,popularity,releaseDate,title,vote_average,vote_count
0,0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,4,"[28, 878, 12]",27205,en,Inception,27.920,2010-07-16,Inception,8.3,22186
...,...,...,...,...,...,...,...,...,...,...
26512,26512,"[27, 18]",488143,en,Laboratory Conditions,0.600,2018-10-13,Laboratory Conditions,0.0,1
26513,26513,"[18, 53]",485975,en,_EXHIBIT_84xxx_,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1
26514,26514,"[14, 28, 12]",381231,en,The Last One,0.600,2018-10-01,The Last One,0.0,1
26515,26515,"[10751, 12, 28]",366854,en,Trailer Made,0.600,2018-06-22,Trailer Made,0.0,1


In [6]:
# rename the tmdb.movies release_date column to releaseDate
# (rename with inplace = True returns None, so assign the result back to df2 instead)

df2 = df2.rename(columns = {'release_date':'releaseDate'})


In [13]:
df2

Unnamed: 0.1,Unnamed: 0,genre_ids,id,original_language,movie,popularity,releaseDate,title,vote_average,vote_count
0,0,"[12, 14, 10751]",12444,en,Harry Potter and the Deathly Hallows: Part 1,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788
1,1,"[14, 12, 16, 10751]",10191,en,How to Train Your Dragon,28.734,2010-03-26,How to Train Your Dragon,7.7,7610
2,2,"[12, 28, 878]",10138,en,Iron Man 2,28.515,2010-05-07,Iron Man 2,6.8,12368
3,3,"[16, 35, 10751]",862,en,Toy Story,28.005,1995-11-22,Toy Story,7.9,10174
4,4,"[28, 878, 12]",27205,en,Inception,27.920,2010-07-16,Inception,8.3,22186
...,...,...,...,...,...,...,...,...,...,...
26512,26512,"[27, 18]",488143,en,Laboratory Conditions,0.600,2018-10-13,Laboratory Conditions,0.0,1
26513,26513,"[18, 53]",485975,en,_EXHIBIT_84xxx_,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1
26514,26514,"[14, 28, 12]",381231,en,The Last One,0.600,2018-10-01,The Last One,0.0,1
26515,26515,"[10751, 12, 28]",366854,en,Trailer Made,0.600,2018-06-22,Trailer Made,0.0,1


In [8]:
#import tn.movie_budgets.csv file

df3= pd.read_csv(r"C:\Users\pc\Videos\projects\Phase_One_Proect\data\tn.movie_budgets.csv")
df3

Unnamed: 0,id,release_date,movie,production_budget,domestic_gross,worldwide_gross
0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"
...,...,...,...,...,...,...
5777,78,"Dec 31, 2018",Red 11,"$7,000",$0,$0
5778,79,"Apr 2, 1999",Following,"$6,000","$48,482","$240,495"
5779,80,"Jul 13, 2005",Return to the Land of Wonders,"$5,000","$1,338","$1,338"
5780,81,"Sep 29, 2015",A Plague So Pleasant,"$1,400",$0,$0


In [9]:
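# Renaming worldwide_gross as internationalBoxOffice, domestic_gross as domesticBoxoffice, and production_budget as productionBudget
# Note: the new name ' internationalBoxOffice' starts with a space, and worldwide_gross is the
# worldwide total (domestic plus international), not the international-only figure held in df1

df3 = df3.rename(columns = {
    'worldwide_gross':' internationalBoxOffice',
    'domestic_gross':'domesticBoxoffice',
    'production_budget':'productionBudget',
})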

In [10]:
df3

Unnamed: 0,id,release_date,movie,productionBudget,domesticBoxoffice,internationalBoxOffice
0,1,"Dec 18, 2009",Avatar,"$425,000,000","$760,507,625","$2,776,345,279"
1,2,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$410,600,000","$241,063,875","$1,045,663,875"
2,3,"Jun 7, 2019",Dark Phoenix,"$350,000,000","$42,762,350","$149,762,350"
3,4,"May 1, 2015",Avengers: Age of Ultron,"$330,600,000","$459,005,868","$1,403,013,963"
4,5,"Dec 15, 2017",Star Wars Ep. VIII: The Last Jedi,"$317,000,000","$620,181,382","$1,316,721,747"
...,...,...,...,...,...,...
5777,78,"Dec 31, 2018",Red 11,"$7,000",$0,$0
5778,79,"Apr 2, 1999",Following,"$6,000","$48,482","$240,495"
5779,80,"Jul 13, 2005",Return to the Land of Wonders,"$5,000","$1,338","$1,338"
5780,81,"Sep 29, 2015",A Plague So Pleasant,"$1,400",$0,$0


In [16]:
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()


In [17]:
df1['movie']  # Check the case in df1
df2['movie']  # Check the case in df2
df3['movie']

0                                            Avatar
1       Pirates of the Caribbean: On Stranger Tides
2                                      Dark Phoenix
3                           Avengers: Age of Ultron
4                 Star Wars Ep. VIII: The Last Jedi
                           ...                     
5777                                         Red 11
5778                                      Following
5779                  Return to the Land of Wonders
5780                           A Plague So Pleasant
5781                              My Date With Drew
Name: movie, Length: 5782, dtype: object

In [18]:
# we need to merge the three dataframes df1,df2 and df3 using the column movie

df = pd.merge(df1, df2, on ='movie', how = 'right')
df = pd.merge(df, df3, on ='movie', how = 'left')
df

Unnamed: 0,movie,production_year,movie_odid,productionBudget_x,domesticBoxoffice_x,internationalBoxOffice,rating,creative_type,source,production_method,...,popularity,releaseDate,title,vote_average,vote_count,id_y,release_date,productionBudget_y,domesticBoxoffice_y,internationalBoxOffice.1
0,Harry Potter and the Deathly Hallows: Part 1,,,,,,,,,,...,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788,,,,,
1,How to Train Your Dragon,2010.0,116630100.0,165000000.0,217581232.0,277289760.0,PG,Fantasy,Based on Fiction Book/Short Story,Digital Animation,...,28.734,2010-03-26,How to Train Your Dragon,7.7,7610,30.0,"Mar 26, 2010","$165,000,000","$217,581,232","$494,870,992"
2,Iron Man 2,2010.0,117940100.0,170000000.0,312433331.0,308723058.0,PG-13,Super Hero,Based on Comic/Graphic Novel,Live Action,...,28.515,2010-05-07,Iron Man 2,6.8,12368,15.0,"May 7, 2010","$170,000,000","$312,433,331","$621,156,389"
3,Toy Story,,,,,,,,,,...,28.005,1995-11-22,Toy Story,7.9,10174,37.0,"Nov 22, 1995","$30,000,000","$191,796,233","$364,545,516"
4,Toy Story,,,,,,,,,,...,28.005,1995-11-22,Toy Story,7.9,10174,37.0,"Nov 22, 1995","$30,000,000","$191,796,233","$364,545,516"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26621,Laboratory Conditions,,,,,,,,,,...,0.600,2018-10-13,Laboratory Conditions,0.0,1,,,,,
26622,_EXHIBIT_84xxx_,,,,,,,,,,...,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1,,,,,
26623,The Last One,,,,,,,,,,...,0.600,2018-10-01,The Last One,0.0,1,,,,,
26624,Trailer Made,,,,,,,,,,...,0.600,2018-06-22,Trailer Made,0.0,1,,,,,
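
The right merge above keeps every TMDB row, including titles with no counterpart in the other tables (the rows of empty values), and it repeats a row when a title occurs more than once (Toy Story appears twice). A minimal sketch of how the match rate could be inspected is shown here. It uses small stand-in frames rather than `df1` and `df2`; the `title_key` helper and the frame names `left` and `right` are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-ins for df1 (box-office data) and df2 (TMDB data).
left = pd.DataFrame({
    "movie": ["Iron Man 2", "Inception", "Avatar"],
    "productionBudget": [170_000_000, 160_000_000, 425_000_000],
})
right = pd.DataFrame({
    "movie": ["iron man 2 ", "Inception", "Toy Story", "Toy Story"],
    "popularity": [28.515, 27.920, 28.005, 28.005],
})


def title_key(titles: pd.Series) -> pd.Series:
    """Normalize titles so case and stray whitespace do not block a match."""
    return titles.str.strip().str.lower()


left["title_key"] = title_key(left["movie"])
right["title_key"] = title_key(right["movie"])

# Drop repeated titles before merging so rows are not multiplied.
right_unique = right.drop_duplicates(subset="title_key", keep="first")

# indicator=True adds a _merge column recording where each row came from.
merged = pd.merge(
    left, right_unique, on="title_key", how="right",
    suffixes=("_box", "_tmdb"), indicator=True,
)
match_counts = merged["_merge"].value_counts()
print(match_counts)
```

Counting the `_merge` values before calling `dropna()` shows how many films would be lost for lack of a match, which is worth knowing when the final table is much smaller than the inputs.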


In [24]:
# we need to remove duplicate movie rows and the columns we do not need.

df = df.drop_duplicates(subset = ['movie'], keep = 'first')
# remove the production_year, movie_odid and source columns

df = df.drop(columns = ['production_year', 'movie_odid', 'source'])
df

Unnamed: 0,movie,productionBudget_x,domesticBoxoffice_x,internationalBoxOffice,rating,creative_type,production_method,genre,sequel,running_time,...,popularity,releaseDate,title,vote_average,vote_count,id_y,release_date,productionBudget_y,domesticBoxoffice_y,internationalBoxOffice.1
0,Harry Potter and the Deathly Hallows: Part 1,,,,,,,,,,...,33.533,2010-11-19,Harry Potter and the Deathly Hallows: Part 1,7.7,10788,,,,,
1,How to Train Your Dragon,165000000.0,217581232.0,277289760.0,PG,Fantasy,Digital Animation,Adventure,0.0,91.0,...,28.734,2010-03-26,How to Train Your Dragon,7.7,7610,30.0,"Mar 26, 2010","$165,000,000","$217,581,232","$494,870,992"
2,Iron Man 2,170000000.0,312433331.0,308723058.0,PG-13,Super Hero,Live Action,Action,1.0,125.0,...,28.515,2010-05-07,Iron Man 2,6.8,12368,15.0,"May 7, 2010","$170,000,000","$312,433,331","$621,156,389"
3,Toy Story,,,,,,,,,,...,28.005,1995-11-22,Toy Story,7.9,10174,37.0,"Nov 22, 1995","$30,000,000","$191,796,233","$364,545,516"
5,Inception,160000000.0,292576195.0,539825887.0,PG-13,Science Fiction,Animation/Live Action,Thriller/Suspense,0.0,147.0,...,27.920,2010-07-16,Inception,8.3,22186,38.0,"Jul 16, 2010","$160,000,000","$292,576,195","$835,524,642"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26621,Laboratory Conditions,,,,,,,,,,...,0.600,2018-10-13,Laboratory Conditions,0.0,1,,,,,
26622,_EXHIBIT_84xxx_,,,,,,,,,,...,0.600,2018-05-01,_EXHIBIT_84xxx_,0.0,1,,,,,
26623,The Last One,,,,,,,,,,...,0.600,2018-10-01,The Last One,0.0,1,,,,,
26624,Trailer Made,,,,,,,,,,...,0.600,2018-06-22,Trailer Made,0.0,1,,,,,


In [25]:
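# drop duplicate movie rows again (already done in the previous cell, so nothing changes),
# then drop every row that still has a missing value

df = df.drop_duplicates(subset = ['movie'], keep = 'first')
df.isnull().sum()  # not displayed: a cell only shows its last expression
df = df.dropna()
df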

Unnamed: 0,movie,productionBudget_x,domesticBoxoffice_x,internationalBoxOffice,rating,creative_type,production_method,genre,sequel,running_time,...,popularity,releaseDate,title,vote_average,vote_count,id_y,release_date,productionBudget_y,domesticBoxoffice_y,internationalBoxOffice.1
1,How to Train Your Dragon,165000000.0,217581232.0,2.772898e+08,PG,Fantasy,Digital Animation,Adventure,0.0,91.0,...,28.734,2010-03-26,How to Train Your Dragon,7.7,7610,30.0,"Mar 26, 2010","$165,000,000","$217,581,232","$494,870,992"
2,Iron Man 2,170000000.0,312433331.0,3.087231e+08,PG-13,Super Hero,Live Action,Action,1.0,125.0,...,28.515,2010-05-07,Iron Man 2,6.8,12368,15.0,"May 7, 2010","$170,000,000","$312,433,331","$621,156,389"
5,Inception,160000000.0,292576195.0,5.398259e+08,PG-13,Science Fiction,Animation/Live Action,Thriller/Suspense,0.0,147.0,...,27.920,2010-07-16,Inception,8.3,22186,38.0,"Jul 16, 2010","$160,000,000","$292,576,195","$835,524,642"
6,Percy Jackson & the Olympians: The Lightning T...,95000000.0,88768303.0,1.342826e+08,PG,Fantasy,Live Action,Adventure,0.0,119.0,...,26.691,2010-02-11,Percy Jackson & the Olympians: The Lightning T...,6.1,4229,17.0,"Feb 12, 2010","$95,000,000","$88,768,303","$223,050,874"
7,Avatar,425000000.0,760507625.0,2.015838e+09,PG-13,Science Fiction,Animation/Live Action,Action,0.0,162.0,...,26.526,2009-12-18,Avatar,7.4,18676,1.0,"Dec 18, 2009","$425,000,000","$760,507,625","$2,776,345,279"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24538,Gotti,10000000.0,4286367.0,1.802733e+06,R,Dramatization,Live Action,Drama,0.0,110.0,...,10.034,2018-06-15,Gotti,5.2,231,64.0,"Jun 15, 2018","$10,000,000","$4,286,367","$6,089,100"
24575,Proud Mary,30000000.0,20868638.0,8.409010e+05,R,Contemporary Fiction,Live Action,Action,0.0,88.0,...,9.371,2018-01-12,Proud Mary,5.5,259,50.0,"Jan 12, 2018","$30,000,000","$20,868,638","$21,709,539"
24597,Renegades,77500000.0,0.0,1.521672e+06,PG-13,Contemporary Fiction,Live Action,Action,0.0,105.0,...,9.022,2018-12-21,Renegades,5.8,156,20.0,"Jan 22, 2019","$77,500,000",$0,"$1,521,672"
25388,Bilal: A New Breed of Hero,30000000.0,490973.0,1.576260e+05,PG-13,Dramatization,Digital Animation,Adventure,0.0,103.0,...,2.707,2018-02-02,Bilal: A New Breed of Hero,6.8,54,100.0,"Feb 2, 2018","$30,000,000","$490,973","$648,599"


In [26]:
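# check if the domesticBoxoffice_y has same values as the column domesticBoxoffice_x.
# domesticBoxoffice_y holds strings such as "$217,581,232", so strip "$" and "," first;
# comparing the raw strings with the float column would be False for every row

domestic_y = df['domesticBoxoffice_y'].str.replace(r'[$,]', '', regex = True).astype(float)
same_domestic = (domestic_y == df['domesticBoxoffice_x']).all()  # inspect before dropping
df = df.drop(columns = ['domesticBoxoffice_y'])
df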

Unnamed: 0,movie,productionBudget_x,domesticBoxoffice_x,internationalBoxOffice,rating,creative_type,production_method,genre,sequel,running_time,...,original_language,popularity,releaseDate,title,vote_average,vote_count,id_y,release_date,productionBudget_y,internationalBoxOffice.1
1,How to Train Your Dragon,165000000.0,217581232.0,2.772898e+08,PG,Fantasy,Digital Animation,Adventure,0.0,91.0,...,en,28.734,2010-03-26,How to Train Your Dragon,7.7,7610,30.0,"Mar 26, 2010","$165,000,000","$494,870,992"
2,Iron Man 2,170000000.0,312433331.0,3.087231e+08,PG-13,Super Hero,Live Action,Action,1.0,125.0,...,en,28.515,2010-05-07,Iron Man 2,6.8,12368,15.0,"May 7, 2010","$170,000,000","$621,156,389"
5,Inception,160000000.0,292576195.0,5.398259e+08,PG-13,Science Fiction,Animation/Live Action,Thriller/Suspense,0.0,147.0,...,en,27.920,2010-07-16,Inception,8.3,22186,38.0,"Jul 16, 2010","$160,000,000","$835,524,642"
6,Percy Jackson & the Olympians: The Lightning T...,95000000.0,88768303.0,1.342826e+08,PG,Fantasy,Live Action,Adventure,0.0,119.0,...,en,26.691,2010-02-11,Percy Jackson & the Olympians: The Lightning T...,6.1,4229,17.0,"Feb 12, 2010","$95,000,000","$223,050,874"
7,Avatar,425000000.0,760507625.0,2.015838e+09,PG-13,Science Fiction,Animation/Live Action,Action,0.0,162.0,...,en,26.526,2009-12-18,Avatar,7.4,18676,1.0,"Dec 18, 2009","$425,000,000","$2,776,345,279"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24538,Gotti,10000000.0,4286367.0,1.802733e+06,R,Dramatization,Live Action,Drama,0.0,110.0,...,en,10.034,2018-06-15,Gotti,5.2,231,64.0,"Jun 15, 2018","$10,000,000","$6,089,100"
24575,Proud Mary,30000000.0,20868638.0,8.409010e+05,R,Contemporary Fiction,Live Action,Action,0.0,88.0,...,en,9.371,2018-01-12,Proud Mary,5.5,259,50.0,"Jan 12, 2018","$30,000,000","$21,709,539"
24597,Renegades,77500000.0,0.0,1.521672e+06,PG-13,Contemporary Fiction,Live Action,Action,0.0,105.0,...,fr,9.022,2018-12-21,Renegades,5.8,156,20.0,"Jan 22, 2019","$77,500,000","$1,521,672"
25388,Bilal: A New Breed of Hero,30000000.0,490973.0,1.576260e+05,PG-13,Dramatization,Digital Animation,Adventure,0.0,103.0,...,en,2.707,2018-02-02,Bilal: A New Breed of Hero,6.8,54,100.0,"Feb 2, 2018","$30,000,000","$648,599"




## Data Cleaning

- Checked for missing or erroneous data points.
- Standardized and cleaned the dataset to ensure accuracy in subsequent analyses.
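
The budget file stores money as strings such as `"$425,000,000"` and dates as `"Dec 18, 2009"`, so both need converting before any arithmetic. The following is a minimal sketch of that standardization on three rows copied from `tn.movie_budgets.csv`; the frame name `budgets`, the `release_month` column, and the choice to treat a `$0` gross as missing are assumptions, not steps taken in the notebook.

```python
import pandas as pd

# Three rows in the same shape as tn.movie_budgets.csv.
budgets = pd.DataFrame({
    "release_date": ["Dec 18, 2009", "May 20, 2011", "Dec 31, 2018"],
    "movie": ["Avatar", "Pirates of the Caribbean: On Stranger Tides", "Red 11"],
    "production_budget": ["$425,000,000", "$410,600,000", "$7,000"],
    "worldwide_gross": ["$2,776,345,279", "$1,045,663,875", "$0"],
})

money_columns = ["production_budget", "worldwide_gross"]
for column in money_columns:
    # Remove "$" and "," so the strings can be read as integers.
    budgets[column] = (
        budgets[column].str.replace(r"[$,]", "", regex=True).astype("int64")
    )

# Parse dates such as "Dec 18, 2009" into datetime values.
budgets["release_date"] = pd.to_datetime(budgets["release_date"], format="%b %d, %Y")
budgets["release_month"] = budgets["release_date"].dt.month

# Assumption: a $0 gross means "not reported", so mark it as missing.
budgets["worldwide_gross"] = budgets["worldwide_gross"].mask(
    budgets["worldwide_gross"] == 0
)
print(budgets.dtypes)
```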

## Exploratory Data Analysis (EDA)

- Employed descriptive statistics to gain insights into the dataset.
- Utilized visualizations such as histograms, scatter plots, and box plots to understand the distribution of key variables.
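
As a rough illustration of the descriptive-statistics step, the sketch below summarizes eight production budgets that appear in the tables above and applies the 1.5 × IQR rule that a box plot uses to flag outliers. The variable names are assumptions, and the sample is far too small to stand in for the full analysis.

```python
import pandas as pd

# Production budgets (in dollars) for eight films shown in the merged table.
budget = pd.Series(
    [10_000_000, 30_000_000, 77_500_000, 95_000_000,
     160_000_000, 165_000_000, 170_000_000, 425_000_000],
    name="productionBudget",
)

# count, mean, std, min, quartiles and max in one call.
summary = budget.describe()

# Interquartile range: the height of the box in a box plot.
q1, q3 = budget.quantile([0.25, 0.75])
iqr = q3 - q1

# Box-plot convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = budget[(budget < lower) | (budget > upper)]

print(summary)
print(outliers)
```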

## Correlation Analysis

- Investigated the correlation between movie budget and worldwide box office gross.
- Identified patterns and trends to inform strategic decisions for Microsoft's movie productions.
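
One way the budget-to-gross relationship might be measured is sketched below, using the first five rows of `tn.movie_budgets.csv` as shown earlier. Pearson correlation captures a linear relationship; the rank-based (Spearman) version is less sensitive to a single blockbuster. The frame and variable names are assumptions, and five of the most expensive films ever made are not a representative sample.

```python
import pandas as pd

# First five rows of tn.movie_budgets.csv, with the currency already parsed.
films = pd.DataFrame({
    "movie": [
        "Avatar",
        "Pirates of the Caribbean: On Stranger Tides",
        "Dark Phoenix",
        "Avengers: Age of Ultron",
        "Star Wars Ep. VIII: The Last Jedi",
    ],
    "budget": [425_000_000, 410_600_000, 350_000_000, 330_600_000, 317_000_000],
    "worldwide_gross": [
        2_776_345_279, 1_045_663_875, 149_762_350, 1_403_013_963, 1_316_721_747,
    ],
})

# Pearson correlation: strength of the linear relationship.
pearson = films["budget"].corr(films["worldwide_gross"])

# Spearman correlation, computed as the Pearson correlation of the ranks.
spearman = films["budget"].rank().corr(films["worldwide_gross"].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```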

## Genre and Release Date Analysis

- Analyzed the performance of different genres at the box office.
- Explored the impact of release dates on movie success.
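
The genre and release-date comparison can be sketched with two `groupby` calls. The rows below are copied from the merged table; return on investment, defined here as (worldwide gross minus budget) divided by budget, is an assumed success metric, and the column names `release_month` and `roi` are illustrative.

```python
import pandas as pd

# Six films from the merged table, with the worldwide gross already numeric.
films = pd.DataFrame({
    "movie": ["How to Train Your Dragon", "Iron Man 2", "Inception",
              "Avatar", "Gotti", "Proud Mary"],
    "genre": ["Adventure", "Action", "Thriller/Suspense",
              "Action", "Drama", "Action"],
    "releaseDate": ["2010-03-26", "2010-05-07", "2010-07-16",
                    "2009-12-18", "2018-06-15", "2018-01-12"],
    "budget": [165_000_000, 170_000_000, 160_000_000,
               425_000_000, 10_000_000, 30_000_000],
    "worldwide_gross": [494_870_992, 621_156_389, 835_524_642,
                        2_776_345_279, 6_089_100, 21_709_539],
})

films["releaseDate"] = pd.to_datetime(films["releaseDate"])
films["release_month"] = films["releaseDate"].dt.month

# Assumed metric: profit relative to the production budget.
films["roi"] = (films["worldwide_gross"] - films["budget"]) / films["budget"]

# Median ROI per genre, with the film count so thin groups are visible.
by_genre = (
    films.groupby("genre")["roi"]
    .agg(["count", "median"])
    .sort_values("median", ascending=False)
)

# Median worldwide gross per release month.
by_month = films.groupby("release_month")["worldwide_gross"].median()

print(by_genre)
print(by_month)
```

Reporting the count next to the median matters because a genre or month represented by one or two films can top the ranking by chance.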

## Budget Allocation Recommendations

- Provided recommendations for budget allocation based on successful movie patterns.
- Suggested allocating $75 million to $200 million for animated musical movies in June or November.
- Recommended allocating $200 million to $400 million for live-action superhero movies in April or May.
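
The budget ranges above can be checked against the data by binning films into budget bands and comparing the profit in each band. This is a sketch under stated assumptions: the seven films are taken from the tables earlier in this report, the band edges mirror the $75 million, $200 million and $400 million thresholds in the recommendations, and `profit` (worldwide gross minus budget) is an assumed measure that ignores marketing costs.

```python
import pandas as pd

# Budgets and worldwide grosses for seven films shown earlier in the report.
films = pd.DataFrame({
    "movie": ["Gotti", "Proud Mary",
              "Percy Jackson & the Olympians: The Lightning Thief",
              "How to Train Your Dragon", "Iron Man 2",
              "Avengers: Age of Ultron", "Avatar"],
    "budget": [10_000_000, 30_000_000, 95_000_000, 165_000_000,
               170_000_000, 330_600_000, 425_000_000],
    "worldwide_gross": [6_089_100, 21_709_539, 223_050_874, 494_870_992,
                        621_156_389, 1_403_013_963, 2_776_345_279],
})
films["profit"] = films["worldwide_gross"] - films["budget"]

# Band edges follow the thresholds named in the recommendations.
edges = [0, 75_000_000, 200_000_000, 400_000_000, float("inf")]
labels = ["under $75M", "$75M-$200M", "$200M-$400M", "over $400M"]
films["budget_band"] = pd.cut(films["budget"], bins=edges, labels=labels)

# Film count and median profit per band.
band_summary = (
    films.groupby("budget_band", observed=True)["profit"]
    .agg(["count", "median"])
)
print(band_summary)
```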

## Crew Recommendations

- Provided recommendations for key crew members for animated musical and superhero movies.
- Identified recommended composers for animated musicals and directors for superhero movies.

## Conclusion

- Summarized key findings and recommendations.
- Emphasized the potential for Microsoft to maximize revenue by aligning with identified success patterns.

## Next Steps

- Outlined actionable steps for Microsoft to implement the recommendations.
- Proposed strategies for effective execution and monitoring of movie production endeavors.

This report aims to guide Microsoft in targeting their production budget, genre, creative type, production method, release time, and crew members to optimize revenue in their upcoming movie ventures.
