# Project: Scraping Top Grossing Movies Data

Task: Scrape the 200 highest-grossing movies, along with each movie's lifetime gross and release year, and save the results to a CSV file. The source is Box Office Mojo's worldwide lifetime-gross chart at https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW, which charts the top 200 highest-grossing movies.

### Step 1: Importing Libraries and Modules

In [1]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

### Step 2: Accessing Driver & Website

In [2]:
# Start a Chrome browser session and load the chart page
driver = webdriver.Chrome()
driver.get('https://www.boxofficemojo.com/chart/top_lifetime_gross/?area=XWW')

### Step 3: Scraping Movie Names

In [3]:
# Example of a title link matched by the XPath below:
# <a class="a-link-normal" href="/title/tt0499549/?ref_=bo_cso_table_1">Avatar</a>
movies_names = driver.find_elements(By.XPATH,'//td[@class="a-text-left mojo-field-type-title"]/a[@class="a-link-normal"]')
# Collect the visible text of every matched title link
movie_name_list = [movie.text for movie in movies_names]
# Prints all the movie names as a list
print(movie_name_list)

['Avatar', 'Avengers: Endgame', 'Avatar: The Way of Water', 'Titanic', 'Star Wars: Episode VII - The Force Awakens', 'Avengers: Infinity War', 'Spider-Man: No Way Home', 'Jurassic World', 'The Lion King', 'Avengers: Age of Ultron', 'Furious 7', 'The Avengers', 'Top Gun: Maverick', 'Frozen II', 'Barbie', 'Frozen', 'The Super Mario Bros. Movie', 'Harry Potter and the Deathly Hallows: Part 2', 'Black Panther', 'Star Wars: Episode VIII - The Last Jedi', 'Jurassic World: Fallen Kingdom', 'Beauty and the Beast', 'Incredibles 2', 'The Fate of the Furious', 'Iron Man 3', 'The Lord of the Rings: The Return of the King', 'Minions', 'Aquaman', 'Captain America: Civil War', 'Skyfall', 'Spider-Man: Far from Home', 'Captain Marvel', 'Transformers: Dark of the Moon', 'Jurassic Park', 'The Dark Knight Rises', 'The Lord of the Rings: The Two Towers', 'Transformers: Age of Extinction', 'Joker', 'Star Wars: Episode IX - The Rise of Skywalker', 'Toy Story 4', 'Toy Story 3', "Pirates of the Caribbean: Dead
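
To see what the XPath in this step selects without launching a browser, the sketch below runs an equivalent query against a small hand-written table row using Python's built-in `xml.etree.ElementTree`. The row is an assumption modeled on the link shown in the code comment; the real page markup may differ, and the `href="#"` on the year link is a placeholder. ElementTree supports only a subset of XPath, so the query starts with `.//` instead of `//`.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment (assumption): one chart row with a title cell and a year cell.
SAMPLE_ROW = """
<table>
  <tr>
    <td class="a-text-left mojo-field-type-title">
      <a class="a-link-normal" href="/title/tt0499549/?ref_=bo_cso_table_1">Avatar</a>
    </td>
    <td class="a-text-left mojo-field-type-year">
      <a class="a-link-normal" href="#">2009</a>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE_ROW)

# Same predicates as the Selenium XPath: each class attribute must equal the full string.
title_links = root.findall(
    ".//td[@class='a-text-left mojo-field-type-title']/a[@class='a-link-normal']"
)
sample_names = [link.text for link in title_links]
print(sample_names)
```

Because `@class="..."` compares the entire attribute value, the query stops matching if the site adds or reorders class names on these cells. An empty result list is usually the first sign of that.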

### Step 4: Scraping Movie Release Years

In [4]:
release_year = driver.find_elements(By.XPATH,'//td[@class="a-text-left mojo-field-type-year"]/a[@class="a-link-normal"]')
# Collect the text of every matched year link
release_year_list = [year.text for year in release_year]
# Prints all the release years as a list
print(release_year_list)

['2009', '2019', '2022', '1997', '2015', '2018', '2021', '2015', '2019', '2015', '2015', '2012', '2022', '2019', '2023', '2013', '2023', '2011', '2018', '2017', '2018', '2017', '2018', '2017', '2013', '2003', '2015', '2018', '2016', '2012', '2019', '2019', '2011', '1993', '2012', '2002', '2014', '2019', '2019', '2019', '2010', '2006', '2016', '2019', '2001', '2011', '2017', '2016', '2008', '1999', '2016', '2010', '2012', '2022', '2005', '2017', '2010', '2013', '1994', '2003', '2016', '2007', '2014', '2013', '2022', '2023', '2001', '2022', '2007', '2009', '2004', '2002', '2018', '2021', '2005', '2007', '2016', '2009', '2015', '2017', '2012', '2016', '2017', '2013', '2017', '2000', '2022', '2015', '2018', '2017', '2002', '2012', '2023', '2010', '2009', '2017', '2021', '1996', '2016', '2017', '2007', '2004', '2019', '2017', '2009', '1982', '2018', '2008', '2004', '2013', '2021', '2018', '2016', '1999', '1977', '2014', '2013', '2022', '2019', '2000', '2014', '2022', '2006', '2014', '2014',

### Step 5: Scraping Movie Lifetime Gross

In [5]:
lifetime_gross = driver.find_elements(By.XPATH,'//td[@class="a-text-right mojo-field-type-money"]')
# Collect the text of every matched lifetime-gross cell
lifetime_gross_list = [gross.text for gross in lifetime_gross]
# Prints all the lifetime gross values as a list
print(lifetime_gross_list)

['$2,923,706,026', '$2,799,439,100', '$2,320,250,281', '$2,264,750,694', '$2,071,310,218', '$2,052,415,039', '$1,921,847,111', '$1,671,537,444', '$1,663,079,059', '$1,542,317,294', '$1,527,251,659', '$1,520,538,536', '$1,495,696,292', '$1,453,683,476', '$1,445,638,421', '$1,397,045,694', '$1,361,952,674', '$1,356,961,410', '$1,349,926,083', '$1,334,407,706', '$1,310,469,037', '$1,266,115,964', '$1,243,225,667', '$1,236,009,236', '$1,215,577,205', '$1,183,481,430', '$1,161,000,230', '$1,157,347,433', '$1,155,046,416', '$1,142,504,419', '$1,131,927,996', '$1,131,416,446', '$1,123,794,079', '$1,114,456,337', '$1,114,125,189', '$1,107,943,412', '$1,104,054,072', '$1,078,958,629', '$1,077,022,372', '$1,073,841,394', '$1,067,316,101', '$1,066,179,747', '$1,058,684,742', '$1,054,304,000', '$1,049,966,517', '$1,046,721,266', '$1,034,800,131', '$1,029,526,569', '$1,029,315,930', '$1,029,275,933', '$1,029,266,989', '$1,025,468,216', '$1,019,136,646', '$1,009,490,506', '$1,007,590,096', '$995,340
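
The scraped values are display strings such as `'$2,923,706,026'`, which sort and sum as text, not as numbers. A minimal sketch of converting them to integers before analysis follows. `parse_gross` is a hypothetical helper name, and it assumes every value is a dollar amount with comma separators, as in the output above.

```python
def parse_gross(text: str) -> int:
    """Convert a display string like '$2,923,706,026' to an integer number of dollars."""
    # Strip surrounding whitespace, the currency symbol, and thousands separators.
    return int(text.strip().replace("$", "").replace(",", ""))


# First three values from the scraped output.
sample = ['$2,923,706,026', '$2,799,439,100', '$2,320,250,281']
gross_values = [parse_gross(value) for value in sample]
print(gross_values)
```

Applied to the full list, this would be `[parse_gross(g) for g in lifetime_gross_list]`.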

### Step 6: Storing Data In a CSV

In [6]:
# zip pairs the three lists element by element and stops at the shortest one
data = list(zip(movie_name_list, release_year_list, lifetime_gross_list))
df = pd.DataFrame(data, columns=['Movie Name', 'Release Year', 'Lifetime Earnings'])
df.to_csv('top_200_movies_with_lifetime_gross.csv',index=False)
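
Since `zip` silently stops at the shortest input, an XPath that matches fewer elements than the others would drop or misalign rows without raising an error. A minimal sketch of a guard that could run before building the DataFrame is shown below. `build_rows` is a hypothetical helper, not part of pandas or Selenium.

```python
def build_rows(names, years, grosses):
    """Return (name, year, gross) tuples, or raise if the lists differ in length."""
    lengths = {len(names), len(years), len(grosses)}
    if len(lengths) != 1:
        raise ValueError(
            f"Column lengths differ: {len(names)} names, "
            f"{len(years)} years, {len(grosses)} grosses"
        )
    return list(zip(names, years, grosses))


# Small example using the first two rows of the scraped output.
rows = build_rows(
    ['Avatar', 'Avengers: Endgame'],
    ['2009', '2019'],
    ['$2,923,706,026', '$2,799,439,100'],
)
print(rows)
```

In the notebook, `build_rows(movie_name_list, release_year_list, lifetime_gross_list)` could stand in for the plain `zip` call.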