# Box Office Mojo Web Scraping Project

### Overview
**This web scraping project extracts data from Box Office Mojo, a comprehensive database of box office statistics for movies. The goal is to collect each movie's worldwide, domestic, and international gross income, along with the domestic and international shares of the worldwide total.**

### Target Website
#### Website: Box Office Mojo - Yearly Worldwide Box Office

**Data to Scrape**

- **Movie Title:** Name of the movie.
- **Worldwide Gross Income:** Total box office earnings globally.
- **Domestic Gross Income:** Box office earnings within the movie's home country.
- **International Gross Income:** Box office earnings outside the movie's home country.
- **Domestic Gross Percentage:** Percentage of total gross income from the movie's home country.
- **International Gross Percentage:** Percentage of total gross income from international markets.

#### Import Libraries
***The cell below imports the two libraries this project relies on. `requests` sends the HTTP request to the website and returns the page's HTML. `BeautifulSoup` (from the `bs4` package) parses that HTML so the desired table cells can be located and extracted.***

In [7]:
import requests
from bs4 import BeautifulSoup

**The function below takes a dictionary (`data`) and writes its contents to the CSV file named by `fname`. It writes the dictionary keys as the header row, then writes one row per index across the value lists, so every list is expected to have the same length. It returns the confirmation string "Done" once the file has been written.
Wrapping this logic in a function makes it reusable: any dictionary of columns can be saved as a CSV file with a single call instead of repeating the conversion code each time.**

In [8]:
import csv

def parse_csv(data, fname='default.txt'):
    """
    Write a dictionary of equal-length lists to a CSV file.
    """
    # newline='' lets the csv module control the line endings
    with open(fname, 'w', newline='', encoding='utf-8') as csv_file:
        # csv.writer quotes values that contain commas (e.g. some movie titles)
        writer = csv.writer(csv_file)

        # header row: the dictionary keys
        writer.writerow(data.keys())

        # data rows: the i-th value from every column
        num_rows = len(next(iter(data.values())))
        for i in range(num_rows):
            writer.writerow(data[key][i] for key in data)

    return "Done"
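
To illustrate the input shape such a function expects, the sketch below writes a small dictionary of equal-length lists to a temporary file and reads it back. It relies on the standard `csv` module so that a value containing a comma stays in a single column. The helper name `dict_to_csv` and the sample titles are hypothetical and are not part of the notebook.

```python
import csv
import os
import tempfile


def dict_to_csv(data, fname):
    """Hypothetical helper: write a dict of equal-length lists as CSV."""
    with open(fname, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(data.keys())            # header row from the keys
        writer.writerows(zip(*data.values()))   # one row per list position
    return "Done"


# Made-up sample data; note the comma inside the second title.
sample = {
    "rank": ["1", "2"],
    "Movie Title": ["Movie A", "Movie B, Part 2"],
}

path = os.path.join(tempfile.mkdtemp(), "sample.csv")
dict_to_csv(sample, path)

# Read the file back to confirm the rows survived the round trip.
with open(path, newline="", encoding="utf-8") as csv_file:
    rows = list(csv.reader(csv_file))

print(rows)
```

Reading the file back is a cheap way to confirm that the header and the rows line up before the function is pointed at real scraped data.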

In [9]:
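# Request the page and check that the connection succeeded
url = 'https://www.boxofficemojo.com/year/world/?sort=rank&sortDir=asc&ref_=bo_ydw__resort#table'
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    res = response.content
except requests.RequestException as err:
    print("Unsuccessful connection:", err)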
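
One detail worth noting about the connection test: `requests.get` does not raise an exception for an HTTP error status such as 404 unless `raise_for_status()` is called on the response. The following is a minimal sketch of a fetch helper that handles both cases. The name `fetch_html` and the 10-second timeout are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page body as bytes, or None on failure."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as err:
        print(f"Unsuccessful connection: {err}")
        return None
    return response.content


# A URL without a scheme is rejected before any network traffic is sent,
# which makes it a convenient offline check of the error path.
result = fetch_html("not-a-valid-url")
```

Returning `None` on failure lets the calling code decide whether to stop or retry, instead of continuing with an undefined variable.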

In [10]:
# movie dictionary: each key is a column name and each value is the list of scraped values for that column
movie = {
    'rank' : [],
    'Movie Title':[],
    'Worldwide':[],
    'Domestic' : [],
    'dom %':[],
    'Foreign':[],
    'For %':[]
}

In [11]:
#create an object of BeautifulSoup
soup = BeautifulSoup(res,'html.parser')

In [12]:
# Extract every table-data (td) cell in the rank column from the parsed HTML (soup)

ranks = soup.find_all('td', class_='a-text-right mojo-header-column mojo-truncate mojo-field-type-rank mojo-sort-column')

In [205]:
# Append the text of each rank cell to the 'rank' list in the movie dictionary
for rank in ranks:
    movie['rank'].append(rank.get_text())
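
When `class_` is given a string with several class names, Beautiful Soup compares it against the exact value of the `class` attribute, so a long class string stops matching if the site adds, removes, or reorders a class. A possibly more robust variant is to match on the single class that identifies the column. The sketch below shows this on made-up markup that only imitates the page. The real HTML may differ, and the `_demo` names are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates two table rows; the real page may differ.
html = """
<table>
  <tr>
    <td class="a-text-right mojo-header-column mojo-field-type-rank">1</td>
    <td class="a-text-left mojo-field-type-release_group">Movie A</td>
  </tr>
  <tr>
    <td class="a-text-right mojo-header-column mojo-field-type-rank">2</td>
    <td class="a-text-left mojo-field-type-release_group">Movie B</td>
  </tr>
</table>
"""

soup_demo = BeautifulSoup(html, "html.parser")

# Matching one class name finds the cell regardless of its other classes.
rank_cells = soup_demo.find_all("td", class_="mojo-field-type-rank")
ranks_demo = [cell.get_text(strip=True) for cell in rank_cells]

title_cells = soup_demo.find_all("td", class_="mojo-field-type-release_group")
titles_demo = [cell.get_text(strip=True) for cell in title_cells]

print(ranks_demo, titles_demo)
```

Passing `strip=True` to `get_text` also removes any surrounding whitespace from the cell text.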

In [207]:
# Extract every td cell in the Release Group (movie title) column

Movie_title = soup.find_all('td', class_="a-text-left mojo-field-type-release_group")

In [208]:
# Append the text of each Release Group cell to the 'Movie Title' list

for title in Movie_title:
    movie['Movie Title'].append(title.get_text())

In [210]:
# Extract every td cell that holds a money value; each table row contributes
# three of them, in the order worldwide, domestic, foreign

World_wide = soup.find_all('td', class_="a-text-right mojo-field-type-money")

In [211]:
# Every third money cell starting at index 0 is a worldwide gross;
# remove the thousands separators and append it to the 'Worldwide' list

for world in World_wide[0::3]:
    movie['Worldwide'].append(world.get_text().replace(',',''))
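
The `[0::3]` slice works because the money cells come back as one flat list in page order. The sketch below uses made-up values to show how stride slicing separates such a list into columns, and how the same list can be viewed row by row. It assumes three money cells per table row, as the slicing in this notebook implies.

```python
# Made-up values standing in for the text of the money cells, in page order:
# worldwide, domestic, foreign for row 1, then the same three for row 2.
money_cells = ["$900", "$300", "$600", "$500", "$200", "$300"]

worldwide = money_cells[0::3]  # positions 0, 3, ...
domestic = money_cells[1::3]   # positions 1, 4, ...
foreign = money_cells[2::3]    # positions 2, 5, ...

# Equivalent row-wise view: group the flat list into triples.
rows = [tuple(money_cells[i:i + 3]) for i in range(0, len(money_cells), 3)]

print(worldwide, domestic, foreign)
print(rows)
```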

In [212]:
# Extract the money cells again (the same selection as for the worldwide column)

domestic = soup.find_all('td', class_="a-text-right mojo-field-type-money")

In [213]:
# Every third money cell starting at index 1 is a domestic gross;
# remove the thousands separators and append it to the 'Domestic' list

for dom in domestic[1::3]:
    movie['Domestic'].append(dom.get_text().replace(',',''))

In [214]:
# Extract the money cells again (the same selection as for the worldwide column)

foreign = soup.find_all('td', class_="a-text-right mojo-field-type-money")

In [215]:
# Every third money cell starting at index 2 is a foreign gross;
# remove the thousands separators and append it to the 'Foreign' list

for foreign_cell in foreign[2::3]:
    movie['Foreign'].append(foreign_cell.get_text().replace(',',''))
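
The loops above remove the thousands separators but keep any currency symbol, so the values are stored as text. If numeric values are wanted for later analysis, a small conversion helper is one option. This is a sketch that assumes the cell text looks like `$1,234,567` and that anything else (for example a dash used as a placeholder) should be treated as missing. The name `parse_money` is hypothetical.

```python
def parse_money(text):
    """Hypothetical helper: '$1,234,567' -> 1234567, anything else -> None."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    if not cleaned.isdigit():
        return None  # e.g. a dash used as a placeholder for a missing value
    return int(cleaned)


print(parse_money("$1,234,567"))
print(parse_money("-"))
```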

In [217]:
# Extract every td cell that holds a percentage; each table row contributes
# two of them, in the order domestic %, foreign %

dom_percent = soup.find_all('td', class_="a-text-right mojo-field-type-percent")

In [218]:
# Every second percent cell starting at index 0 is a domestic share;
# append it to the 'dom %' list

for dp in dom_percent[0::2]:
    movie['dom %'].append(dp.get_text())

In [220]:
# Extract the percent cells again (the same selection as for the domestic share)

for_percent = soup.find_all('td', class_="a-text-right mojo-field-type-percent")

In [221]:
# Every second percent cell starting at index 1 is a foreign share;
# append it to the 'For %' list

for fp in for_percent[1::2]:
    movie['For %'].append(fp.get_text())
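
An alternative to selecting each column separately is to walk the table row by row and read the cells in order, which keeps the values of one movie together and avoids the stride slicing. The sketch below demonstrates the idea on made-up markup. The column order shown is an assumption based on the dictionary keys used in this notebook, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup with the same column order as the dictionary keys;
# the real page's structure is an assumption here.
html = """
<table>
  <tr><th>Rank</th><th>Release Group</th><th>Worldwide</th><th>Domestic</th>
      <th>%</th><th>Foreign</th><th>%</th></tr>
  <tr><td>1</td><td>Movie A</td><td>$900</td><td>$300</td>
      <td>33.3%</td><td>$600</td><td>66.7%</td></tr>
  <tr><td>2</td><td>Movie B</td><td>$500</td><td>$200</td>
      <td>40%</td><td>$300</td><td>60%</td></tr>
</table>
"""

keys = ["rank", "Movie Title", "Worldwide", "Domestic", "dom %", "Foreign", "For %"]
table = {key: [] for key in keys}

for row in BeautifulSoup(html, "html.parser").find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) != len(keys):
        continue  # skip the header row and any malformed rows
    for key, value in zip(keys, cells):
        table[key].append(value)

print(table["Movie Title"], table["dom %"], table["For %"])
```

Because each row is processed as a unit, a row with a missing cell is skipped as a whole instead of shifting every later value into the wrong column.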

In [224]:
parse_csv(movie,'movies.csv')

'Done'
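
`parse_csv` takes the number of rows from the first column, so it depends on every list in the dictionary having the same length. Lists can drift apart if, for example, an append cell is re-run in the notebook. A quick check before writing can catch that. The helper names below are hypothetical and the data is made up.

```python
def column_lengths(data):
    """Hypothetical helper: map each column name to its number of values."""
    return {key: len(values) for key, values in data.items()}


def is_rectangular(data):
    """True when every column holds the same number of values."""
    return len(set(column_lengths(data).values())) <= 1


good = {"rank": ["1", "2"], "Movie Title": ["Movie A", "Movie B"]}
bad = {"rank": ["1", "2"], "Movie Title": ["Movie A"]}

print(is_rectangular(good), column_lengths(good))
print(is_rectangular(bad), column_lengths(bad))
```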