## Final Project Submission

Please fill out:
* Student name: Amos Kibet
* Student pace: part time 
* Scheduled project review date/time: 
* Instructor name: 
* Blog post URL:


# Scraping Movie Data: Dates, Rating, Runtime, Genre, Release Date, Actors, Directors, Budget, and Gross Income.
The websites that I will be scraping in this notebook are:
1. https://www.imdb.com/search/title/?title_type=feature&num_votes=5000,&languages=en&sort=boxoffice_gross_us,desc&start=1&explore=genres&ref_=adv_nxt
2. https://www.the-numbers.com/movie/budgets/all

I then build four clean tables: a base movie table with and without the finances included, an actors table, and a directors table.

In [272]:
#Import necessary packages
import pandas as pd
import numpy as np
import requests
import re
import bleach
from time import sleep
from random import randint
from bs4 import BeautifulSoup
%matplotlib inline

First, we scrape the IMDb website for all English-language feature films with at least 5,000 votes. For the first table we collect the movie title, release year, IMDb rating, content rating (certificate), runtime, and genre; movies without a certificate are skipped. We then combine these data points into a DataFrame called IMDb_movie_df.

In [274]:
# Create a list for each of the data points being scraped.
names = []
years = []
imdb_ratings = []
ratings = []
runtimes = []
genres = []
#Develop a for-loop that will iterate through each of the pages.
for i in range(1, 5000, 50):
    url = 'https://www.imdb.com/search/title/?title_type=feature&num_votes=5000,&languages=en&sort=boxoffice_gross_us,desc&start={}&explore=genres&ref_=adv_nxt'.format(i)
    response = requests.get(url)
    sleep(randint(8,15))
    soup = BeautifulSoup(response.text, 'html.parser')
    movie_containers = soup.find_all('div', class_='lister-item mode-advanced')
    # Extract data from individual movie container
    for container in movie_containers:
        # If the movie has Rating, then extract:
        if container.find('span', class_ = 'certificate') is not None:
            # The Movie Title
            name = container.h3.a.text
            names.append(name)
            # The Release Date
            year = container.h3.find('span', class_ = 'lister-item-year').text
            years.append(year)
            # The IMDB rating
            imdb = float(container.strong.text)
            imdb_ratings.append(imdb)
            # The Rating
            rating = container.find('span', class_ = 'certificate').text
            ratings.append(rating)
            # The Movie Runtime
            runtime = container.find('span', class_ = 'runtime').text
            runtimes.append(runtime)
            # The Movie Genres
            genre = container.find('span', class_ = 'genre').text
            genres.append(genre)
            
    

In [275]:
# Create a DataFrame from the newly acquired data.
IMDb_movie_df = pd.DataFrame({'Movie': names,
'Year': years,
'IMDb': imdb_ratings,
'Rating': ratings,
'Runtime': runtimes,
'Genre': genres
})

In [276]:
# Clean the Year, Runtime, and Genre data.
IMDb_movie_df['Year'] = IMDb_movie_df['Year'].str[-5:-1].astype(int)
IMDb_movie_df['Runtime'] = IMDb_movie_df['Runtime'].str[:-4].astype(int)
IMDb_movie_df['Genre'] = IMDb_movie_df['Genre'].map(lambda x: x.strip())
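
The slices above depend on the exact character positions of the year and runtime strings. As an alternative sketch, a regular expression can pull the digits out regardless of surrounding text; the sample values below are made up to resemble the scraped strings, including a year with a hypothetical disambiguation prefix.

```python
import pandas as pd

# Made-up raw values in the same shape as the scraped strings.
raw = pd.DataFrame({
    'Year': ['(2015)', '(I) (2019)', '(2009)'],
    'Runtime': ['138 min', '181 min', '162 min'],
})

clean = pd.DataFrame({
    # Take the first run of four digits as the year.
    'Year': raw['Year'].str.extract(r'(\d{4})', expand=False).astype(int),
    # Take the leading run of digits as the runtime in minutes.
    'Runtime': raw['Runtime'].str.extract(r'(\d+)', expand=False).astype(int),
})

print(clean)
```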

In [277]:
print(IMDb_movie_df.info()) #Checking the data type
IMDb_movie_df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3802 entries, 0 to 3801
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Movie    3802 non-null   object 
 1   Year     3802 non-null   int32  
 2   IMDb     3802 non-null   float64
 3   Rating   3802 non-null   object 
 4   Runtime  3802 non-null   int32  
 5   Genre    3802 non-null   object 
dtypes: float64(1), int32(2), object(3)
memory usage: 148.6+ KB
None


Unnamed: 0,Movie,Year,IMDb,Rating,Runtime,Genre
0,Star Wars: Episode VII - The Force Awakens,2015,7.8,PG-13,138,"Action, Adventure, Sci-Fi"
1,Avengers: Endgame,2019,8.4,PG-13,181,"Action, Adventure, Drama"
2,Spider-Man: No Way Home,2021,8.3,PG-13,148,"Action, Adventure, Fantasy"
3,Avatar,2009,7.8,PG-13,162,"Action, Adventure, Fantasy"
4,Black Panther,2018,7.3,PG-13,134,"Action, Adventure, Sci-Fi"
...,...,...,...,...,...,...
3797,Krippendorf's Tribe,1998,5.0,PG-13,94,Comedy
3798,Barney's Version,2010,7.3,R,134,"Comedy, Drama"
3799,I Know Who Killed Me,2007,3.6,R,106,"Horror, Mystery, Thriller"
3800,The Curse of the Jade Scorpion,2001,6.7,PG-13,103,"Comedy, Crime, Mystery"


## Movie Titles, Release Dates and Actors

In [278]:
# Create a list for each of the data points being scraped.
a_names = []
a_release = []
actors = []
directors = []
#Develop a for-loop that will iterate through each of the pages.
for i in range(1, 1000, 50):
    url = 'https://www.imdb.com/search/title/?title_type=feature&num_votes=5000,&languages=en&sort=boxoffice_gross_us,desc&start={}&explore=genres&ref_=adv_nxt'.format(i)
    response = requests.get(url)
    sleep(randint(8,15))
    html_soup = BeautifulSoup(response.text, 'lxml')
    movie_containers = html_soup.find_all('div', class_='lister-item mode-advanced')
    # Extract data from each movie container.
    for container in movie_containers:
        # Movie title
        a_name = container.h3.a.text
        a_names.append(a_name)
        # Release year
        year = container.h3.find('span', class_ = 'lister-item-year').text
        a_release.append(year)
        # Actors and directors: the last four links are treated as the
        # actors and any links before them as the directors (in page order).
        imdb_names_cont = container.find('p', class_ = '')
        b = imdb_names_cont.find_all('a')
        actors.append(b[-4:])
        directors.append(b[:-4])
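
The slicing above assumes every movie lists exactly four actors, so a movie with fewer would push a director into the actors list. The sketch below splits on a separator element instead. It assumes the credits paragraph separates directors from stars with a `<span class="ghost">` element; that markup is an assumption about the page layout, not something confirmed in this notebook, and `split_directors_and_stars` is a hypothetical helper.

```python
from bs4 import BeautifulSoup, Tag

def split_directors_and_stars(paragraph):
    """Split the links in a credits paragraph at the assumed 'ghost' separator."""
    directors, stars = [], []
    target = directors
    for node in paragraph.children:
        if not isinstance(node, Tag):
            continue  # skip plain text such as 'Director:' or commas
        if node.name == 'span' and 'ghost' in (node.get('class') or []):
            target = stars  # everything after the separator is a star
        elif node.name == 'a':
            target.append(node.get_text(strip=True))
    return directors, stars

# Hypothetical markup for a single credits paragraph.
sample_html = """
<p class="">
  Director: <a href="/name/nm1/">Director One</a>
  <span class="ghost">|</span>
  Stars: <a href="/name/nm2/">Actor One</a>, <a href="/name/nm3/">Actor Two</a>
</p>
"""

paragraph = BeautifulSoup(sample_html, 'html.parser').find('p')
directors_found, stars_found = split_directors_and_stars(paragraph)
print(directors_found, stars_found)
```

If a paragraph has no separator at all, this helper files every name under directors, so that case would still need a separate check.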

In [279]:
# Create a DataFrame containing: Movie Title, Release Date, and Actors
actors_df = pd.DataFrame({
'Movie': a_names,
'Year': a_release,
'Actors': actors})

In [280]:
# Separate each Actor from an element in a list to its own row.
actors_df1 = actors_df['Actors'].apply(pd.Series)
actors_df2 = pd.merge(actors_df, actors_df1, right_index = True, left_index = True)
actors_df2 = actors_df2.drop(['Actors'], axis = 1)
final_actors_df = actors_df2.melt(id_vars = ['Movie', 'Year'], var_name = 'Actors')
final_actors_df = final_actors_df.drop('Actors', axis=1)
final_actors_df = final_actors_df.drop_duplicates()
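
The same reshaping, one row per movie and actor, can also be sketched with `DataFrame.explode`, which expands a column of lists directly. The data below is made up, and the actor names are plain strings rather than the scraped tag objects.

```python
import pandas as pd

# Made-up input: one row per movie with a list of actor names.
movies = pd.DataFrame({
    'Movie': ['Movie A', 'Movie B'],
    'Year': [2015, 2019],
    'Actors': [['Actor One', 'Actor Two'], ['Actor Three']],
})

long_df = (
    movies.explode('Actors')                   # one row per list element
          .rename(columns={'Actors': 'value'})  # match the column name used above
          .drop_duplicates()
          .reset_index(drop=True)
)

print(long_df)
```

Unlike the `apply(pd.Series)` and `melt` route, this keeps each movie's actors on adjacent rows and does not create missing values for movies with fewer names.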

In [281]:
# Clean the Year value.
final_actors_df['Year'] = final_actors_df['Year'].str[-5:-1].astype(int)
# Clean the new 'value' field to remove tags.
final_actors_df['value'] = final_actors_df['value'].apply(lambda x: re.sub('<[^<]+?>', '', str(x)))

In [282]:
print(final_actors_df.info())
final_actors_df

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4000 entries, 0 to 3999
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Movie   4000 non-null   object
 1   Year    4000 non-null   int32 
 2   value   4000 non-null   object
dtypes: int32(1), object(2)
memory usage: 109.4+ KB
None


Unnamed: 0,Movie,Year,value
0,Star Wars: Episode VII - The Force Awakens,2015,Daisy Ridley
1,Avengers: Endgame,2019,Robert Downey Jr.
2,Spider-Man: No Way Home,2021,Tom Holland
3,Avatar,2009,Sam Worthington
4,Black Panther,2018,Chadwick Boseman
...,...,...,...
3995,The Wedding Singer,1998,Allen Covert
3996,Saw III,2006,Bahar Soomekh
3997,Disturbia,2007,Sarah Roemer
3998,Nacho Libre,2006,Darius Rose


### Scraping Data from The Numbers

In [306]:
# From The Numbers website, scrape every movie's title, release date, production budget, domestic gross, and worldwide gross.
final_budget_container = []
#Develop a for-loop that will iterate through each of the pages.
for x in range(1, 5002, 100):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'}
    url2 = 'https://www.the-numbers.com/movie/budgets/all/{}'.format(x)
    response2 = requests.get(url2,headers=headers )
    sleep(randint(8,15))
    soup = BeautifulSoup(response2.text, 'html.parser')
    budget_containers = soup.find_all('td')
    
    # Store the text of every table cell; each movie occupies six consecutive cells.
    for containers in budget_containers:
        final_budget_container.append(containers.get_text())
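
The request in the loop above never checks whether a page actually loaded, so an error page would be parsed as if it held data. A small sketch of one way to separate URL generation from fetching follows. The helper names `page_urls` and `fetch` and the 30-second timeout are assumptions for illustration, and the fetch function is defined but not called here.

```python
import requests

BASE_URL = 'https://www.the-numbers.com/movie/budgets/all/{}'

def page_urls(first=1, last=5001, step=100):
    """Build the list of page URLs, one per block of 100 movies."""
    return [BASE_URL.format(start) for start in range(first, last + 1, step)]

def fetch(url, headers=None, timeout=30):
    """Download one page and raise an exception on an HTTP error status."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text

urls = page_urls()
print(len(urls), urls[0], urls[-1])
```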

In [307]:
final_budget_container[0:6]

['1',
 'Apr 23, 2019',
 'Avengers: Endgame',
 '\xa0$400,000,000',
 '\xa0$858,373,000',
 '\xa0$2,797,732,053']

In [308]:
def get_grouped_movie_gross():
    N = 6
    group_movie_gross1 = [final_budget_container[n:n+N] for n in range(0, len(final_budget_container), N)]
    return group_movie_gross1 
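
Grouping by a fixed width silently misaligns every later row if a page returns a cell count that is not a multiple of six. A minimal sketch that makes this assumption explicit is shown below; `chunk_rows` is a hypothetical helper, and the sample cells are taken from the output shown in this notebook.

```python
def chunk_rows(cells, width=6):
    """Split a flat list of table cells into rows of `width` cells each."""
    if len(cells) % width != 0:
        raise ValueError(
            f'{len(cells)} cells cannot be split evenly into rows of {width}'
        )
    return [cells[i:i + width] for i in range(0, len(cells), width)]

cells = [
    '1', 'Apr 23, 2019', 'Avengers: Endgame',
    '\xa0$400,000,000', '\xa0$858,373,000', '\xa0$2,797,732,053',
    '2', 'May 20, 2011', 'Pirates of the Caribbean: On Stranger Tides',
    '\xa0$379,000,000', '\xa0$241,071,802', '\xa0$1,045,713,802',
]

rows = chunk_rows(cells)
print(rows[1][2])
```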

In [309]:
for item in get_grouped_movie_gross():
    print(item[1])

Apr 23, 2019
May 20, 2011
Apr 22, 2015
Dec 16, 2015
Apr 25, 2018
...

In [310]:
def Date():
    return [item[1] for item in get_grouped_movie_gross()]
def movie_name():
    return [item[2] for item in get_grouped_movie_gross()]
def production_cost():
    return [item[3] for item in get_grouped_movie_gross()]
def domestic_gross():
    return [item[4] for item in get_grouped_movie_gross()]
def worldwide_gross():
    return [item[5] for item in get_grouped_movie_gross()]

    

In [311]:
# Create a DataFrame containing: Movie Title, Release Date, Production Budget, Domestic Gross, and Worldwide Gross.
budget_df = pd.DataFrame({'Release Date': Date(),
'Movie': movie_name(),
'Production Budget': production_cost(),
'Domestic Gross': domestic_gross(),
'Worldwide Gross': worldwide_gross(),
})
print(budget_df.info())
budget_df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5100 entries, 0 to 5099
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Release Date       5100 non-null   object
 1   Movie              5100 non-null   object
 2   Production Budget  5100 non-null   object
 3   Domestic Gross     5100 non-null   object
 4   Worldwide Gross    5100 non-null   object
dtypes: object(5)
memory usage: 199.3+ KB
None


Unnamed: 0,Release Date,Movie,Production Budget,Domestic Gross,Worldwide Gross
0,"Apr 23, 2019",Avengers: Endgame,"$400,000,000","$858,373,000","$2,797,732,053"
1,"May 20, 2011",Pirates of the Caribbean: On Stranger Tides,"$379,000,000","$241,071,802","$1,045,713,802"
2,"Apr 22, 2015",Avengers: Age of Ultron,"$365,000,000","$459,005,868","$1,395,316,979"
3,"Dec 16, 2015",Star Wars Ep. VII: The Force Awakens,"$306,000,000","$936,662,225","$2,064,615,817"
4,"Apr 25, 2018",Avengers: Infinity War,"$300,000,000","$678,815,482","$2,048,359,754"
...,...,...,...,...,...
5095,"Jul 11, 2008",August,"$3,400,000","$12,636","$12,636"
5096,"Jan 2, 2015",Babysitting,"$3,400,000",$0,"$24,564,100"
5097,"Mar 3, 2015",To Write Love On Her Arms,"$3,400,000",$0,$0
5098,Unknown,Vilaine,"$3,400,000",$0,$0


In [312]:
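# Strip the thousands separators and the dollar sign, then convert to integers.
# regex=False treats '$' as a literal character rather than a regex anchor.
budget_df['Production Budget'] = budget_df['Production Budget'].str.replace(',', '', regex=False)
budget_df['Production Budget'] = budget_df['Production Budget'].str.replace('$', '', regex=False)
budget_df['Production Budget'] = budget_df['Production Budget'].astype(int)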
budget_df['Production Budget']


0       400000000
1       379000000
2       365000000
3       306000000
4       300000000
          ...    
5095      3400000
5096      3400000
5097      3400000
5098      3400000
5099      3380000
Name: Production Budget, Length: 5100, dtype: int32

In [313]:
# Same cleaning for the domestic gross.
budget_df['Domestic Gross'] = budget_df['Domestic Gross'].str.replace(',', '', regex=False)
budget_df['Domestic Gross'] = budget_df['Domestic Gross'].str.replace('$', '', regex=False)
budget_df['Domestic Gross'] = budget_df['Domestic Gross'].astype(int)
budget_df['Domestic Gross']


0       858373000
1       241071802
2       459005868
3       936662225
4       678815482
          ...    
5095        12636
5096            0
5097            0
5098            0
5099            0
Name: Domestic Gross, Length: 5100, dtype: int32

In [314]:
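# Same cleaning for the worldwide gross. The largest totals exceed the int32
# range (the default integer size in this environment), so use float here.
budget_df['Worldwide Gross'] = budget_df['Worldwide Gross'].str.replace(',', '', regex=False)
budget_df['Worldwide Gross'] = budget_df['Worldwide Gross'].str.replace('$', '', regex=False)
budget_df['Worldwide Gross'] = budget_df['Worldwide Gross'].astype(float)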
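
The three cleaning cells above repeat the same steps, so they could be folded into one helper. The sketch below is one possible version, not the approach used in this notebook: `money_to_number` is a hypothetical name, it removes every non-digit character (including the leading non-breaking space), and it returns 64-bit integers so that the worldwide totals fit without falling back to float.

```python
import pandas as pd

def money_to_number(series):
    """Convert strings such as '\xa0$400,000,000' to 64-bit integers."""
    digits_only = series.str.replace(r'[^\d]', '', regex=True)
    return pd.to_numeric(digits_only).astype('int64')

# Sample values in the same shape as the scraped cells.
sample = pd.Series(['\xa0$400,000,000', '\xa0$2,797,732,053', '$0'])
values = money_to_number(sample)

print(values)
```

Applied to the table, this would be called once per money column, for example in a loop over `['Production Budget', 'Domestic Gross', 'Worldwide Gross']`.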


In [316]:
# Add Year column to budget_df table.
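# Drop rows whose release date is 'Unknown', then take the year from the last
# four characters. The .copy() avoids pandas' SettingWithCopyWarning.
budget_df = budget_df[budget_df['Release Date'] != 'Unknown'].copy()
budget_df['Year'] = budget_df['Release Date'].str[-4:].astype(int)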
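
The year extraction above relies on the last four characters of the date string. As an alternative sketch, `pd.to_datetime` can parse the whole date and turn unparseable entries such as `'Unknown'` into missing values. The format string `'%b %d, %Y'` is an assumption based on the dates printed earlier.

```python
import pandas as pd

# Sample values in the same shape as the 'Release Date' column.
dates = pd.Series(['Apr 23, 2019', 'Oct 6, 2015', 'Unknown'])

# errors='coerce' turns anything that does not match the format into NaT.
parsed = pd.to_datetime(dates, format='%b %d, %Y', errors='coerce')

# The nullable Int64 dtype keeps the years as integers despite the missing value.
years = parsed.dt.year.astype('Int64')

print(years)
```

Keeping the parsed dates also makes it possible to sort by release date or compute month-level statistics later.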


In [317]:
# Merge IMDb_movie_df and budget_df on their shared columns (inner join).
IMDb_budget = pd.merge(IMDb_movie_df, budget_df, on=['Movie', 'Year'])
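
An inner merge on exact titles silently drops movies whose names are spelled differently on the two sites. The tables above show one such case: IMDb lists "Star Wars: Episode VII - The Force Awakens" while The Numbers lists "Star Wars Ep. VII: The Force Awakens". A short sketch of how to surface these mismatches with the `indicator` option of `pd.merge`, using two-row stand-ins for the real tables:

```python
import pandas as pd

# Two-row stand-ins for IMDb_movie_df and budget_df, using titles from the tables above.
imdb = pd.DataFrame({
    'Movie': ['Avengers: Endgame', 'Star Wars: Episode VII - The Force Awakens'],
    'Year': [2019, 2015],
})
budget = pd.DataFrame({
    'Movie': ['Avengers: Endgame', 'Star Wars Ep. VII: The Force Awakens'],
    'Year': [2019, 2015],
    'Production Budget': [400000000, 306000000],
})

# A left join with indicator=True adds a '_merge' column flagging unmatched rows.
check = pd.merge(imdb, budget, on=['Movie', 'Year'], how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'Movie'].tolist()

print(unmatched)
```

Reviewing the unmatched titles shows how many rows the join loses and whether normalising the titles first would be worth the effort.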

In [318]:
IMDb_budget.nunique()

Movie                2680
Year                   53
IMDb                   69
Rating                  6
Runtime               117
Genre                 257
Release Date         1706
Production Budget     242
Domestic Gross       2700
Worldwide Gross      2700
dtype: int64

In [319]:
# Remove Duplicates
IMDb_budget = IMDb_budget.drop_duplicates()
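
`drop_duplicates()` only removes rows that are identical in every column. The counts above show 2,680 distinct titles, fewer than the 2,700 rows that remain, so some titles still appear more than once. These may be different films sharing a name, or one film matched to several budget rows. The sketch below, on made-up data, lists rows that repeat the same title and year so they can be inspected.

```python
import pandas as pd

# Made-up merged table in which one title and year pair appears twice.
merged = pd.DataFrame({
    'Movie': ['Movie A', 'Movie A', 'Movie B'],
    'Year': [2016, 2016, 2010],
    'Production Budget': [10, 20, 30],
})

# keep=False marks every member of a duplicated group, not just the later ones.
repeats = merged[merged.duplicated(subset=['Movie', 'Year'], keep=False)]

print(repeats)
```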

In [320]:
print(IMDb_budget.info())
IMDb_budget

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2700 entries, 0 to 2699
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Movie              2700 non-null   object 
 1   Year               2700 non-null   int32  
 2   IMDb               2700 non-null   float64
 3   Rating             2700 non-null   object 
 4   Runtime            2700 non-null   int32  
 5   Genre              2700 non-null   object 
 6   Release Date       2700 non-null   object 
 7   Production Budget  2700 non-null   int32  
 8   Domestic Gross     2700 non-null   int32  
 9   Worldwide Gross    2700 non-null   float64
dtypes: float64(2), int32(4), object(4)
memory usage: 189.8+ KB
None


Unnamed: 0,Movie,Year,IMDb,Rating,Runtime,Genre,Release Date,Production Budget,Domestic Gross,Worldwide Gross
0,Avengers: Endgame,2019,8.4,PG-13,181,"Action, Adventure, Drama","Apr 23, 2019",400000000,858373000,2.797732e+09
1,Spider-Man: No Way Home,2021,8.3,PG-13,148,"Action, Adventure, Fantasy","Dec 14, 2021",200000000,814108407,1.910042e+09
2,Avatar,2009,7.8,PG-13,162,"Action, Adventure, Fantasy","Dec 17, 2009",237000000,785221649,2.910284e+09
3,Black Panther,2018,7.3,PG-13,134,"Action, Adventure, Sci-Fi","Feb 13, 2018",200000000,700059566,1.336494e+09
4,Avengers: Infinity War,2018,8.4,PG-13,149,"Action, Adventure, Sci-Fi","Apr 25, 2018",300000000,678815482,2.048360e+09
...,...,...,...,...,...,...,...,...,...,...
2695,Beautiful Boy,2018,7.3,R,120,"Biography, Drama","Oct 12, 2018",25000000,7634767,1.331456e+07
2696,Mortdecai,2015,5.5,R,107,"Action, Adventure, Comedy","Jan 21, 2015",60000000,7696134,3.039613e+07
2697,Over Her Dead Body,2008,5.2,PG-13,95,"Comedy, Fantasy, Romance","Feb 1, 2008",10000000,7570127,2.159607e+07
2698,Proof,2005,6.7,PG-13,100,"Drama, Mystery","Sep 16, 2005",20000000,7535331,8.284331e+06


In [321]:
# Merge final_actors_df and budget_df on their shared columns (inner join).
actors_finance = pd.merge(final_actors_df, budget_df, on=['Movie', 'Year'])

In [322]:
# Remove duplicates.
actors_finance = actors_finance.drop_duplicates()

In [323]:
print(actors_finance.info())
actors_finance

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3184 entries, 0 to 3183
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Movie              3184 non-null   object 
 1   Year               3184 non-null   int32  
 2   value              3184 non-null   object 
 3   Release Date       3184 non-null   object 
 4   Production Budget  3184 non-null   int32  
 5   Domestic Gross     3184 non-null   int32  
 6   Worldwide Gross    3184 non-null   float64
dtypes: float64(1), int32(3), object(3)
memory usage: 161.7+ KB
None


Unnamed: 0,Movie,Year,value,Release Date,Production Budget,Domestic Gross,Worldwide Gross
0,Avengers: Endgame,2019,Robert Downey Jr.,"Apr 23, 2019",400000000,858373000,2.797732e+09
1,Avengers: Endgame,2019,Chris Evans,"Apr 23, 2019",400000000,858373000,2.797732e+09
2,Avengers: Endgame,2019,Mark Ruffalo,"Apr 23, 2019",400000000,858373000,2.797732e+09
3,Avengers: Endgame,2019,Chris Hemsworth,"Apr 23, 2019",400000000,858373000,2.797732e+09
4,Spider-Man: No Way Home,2021,Tom Holland,"Dec 14, 2021",200000000,814108407,1.910042e+09
...,...,...,...,...,...,...,...
3179,Nacho Libre,2006,Darius Rose,"Jun 16, 2006",32000000,80197993,9.929646e+07
3180,Jumper,2008,Hayden Christensen,"Feb 14, 2008",82500000,80172128,2.226408e+08
3181,Jumper,2008,Samuel L. Jackson,"Feb 14, 2008",82500000,80172128,2.226408e+08
3182,Jumper,2008,Jamie Bell,"Feb 14, 2008",82500000,80172128,2.226408e+08


In [325]:
IMDb_budget.to_csv('IMDb_budgets.csv',  index=False)

In [326]:
IMDb_movie_df.to_csv('IMDb__new_base.csv', index=False)

In [327]:
actors_finance.to_csv('Actors_Table.csv', index=False)