The goal is to extract specific data from a given URL (https://www.rottentomatoes.com/browse/movies_at_home/): movie names, Tomatometer ratings, and Audience Scores. Once this information has been scraped, it is organized into a table for further analysis.

In [1]:
!pip install beautifulsoup4 requests lxml pandas tabulate jinja2



In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from tabulate import tabulate

<h2>Step 1: Data Extraction</h2>

In [3]:
url = "https://www.rottentomatoes.com/browse/movies_at_home/"
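response = requests.get(url, timeout=30)

if response.status_code == 200:
    # Parse the page HTML into a BeautifulSoup object
    soup = BeautifulSoup(response.content, "lxml")
else:
    # Stop here so later cells do not fail on an undefined `soup`
    raise RuntimeError(f"Failed to retrieve the webpage (status code {response.status_code}).")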

<h2>Step 2: Data Transformation</h2>

In [4]:
# Selecting HTML elements representing movie containers or links to movies on Rotten Tomatoes
movie_elements = soup.select('div.js-tile-link, a.js-tile-link')

# Initializing empty lists to hold the movie data
movie_title_list = []
critic_rating_list = []
audience_score_list = []

# Looping through each movie container to extract data
for element in movie_elements:
    title_tag = element.find('span', class_='p--small')
    score_tag = element.find('score-pairs')

    # Skipping tiles that lack a title or score element instead of raising AttributeError
    if title_tag is None or score_tag is None:
        continue

    # Extracting the movie name and stripping surrounding whitespace
    movie_title_list.append(title_tag.text.strip())

    # Extracting the Tomatometer rating (may be empty when the movie has no rating)
    critic_rating_list.append(score_tag.get('criticsscore'))

    # Extracting the audience score (may be empty when the movie has no score)
    audience_score_list.append(score_tag.get('audiencescore'))

# Creating a dictionary to hold the extracted movie data
movie_data = {
    'Movie Name': movie_title_list,
    'Tomatometer Rating': critic_rating_list,
    'Audience Score': audience_score_list
}

# Convert the dictionary to a Pandas DataFrame
movie_dataframe = pd.DataFrame(movie_data)
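
The extraction loop above reads the `criticsscore` and `audiencescore` attributes from each `score-pairs` element. The following is a minimal sketch of that attribute lookup run against an inline HTML snippet, so the logic can be checked without requesting the live page. It swaps in the standard library's `html.parser` in place of BeautifulSoup, and both the `sample_html` markup and the `ScorePairsParser` class are hypothetical: the markup is modeled on the selectors used in this notebook and may not match the real page.

```python
from html.parser import HTMLParser


class ScorePairsParser(HTMLParser):
    """Collect (criticsscore, audiencescore) pairs from <score-pairs> tags."""

    def __init__(self):
        super().__init__()
        self.scores = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook
        if tag == "score-pairs":
            attr_map = dict(attrs)
            self.scores.append(
                (attr_map.get("criticsscore"), attr_map.get("audiencescore"))
            )


# Assumed markup for illustration only; not copied from the live page
sample_html = """
<a class="js-tile-link"><score-pairs criticsscore="54" audiencescore="70"></score-pairs></a>
<a class="js-tile-link"><score-pairs criticsscore="" audiencescore="95"></score-pairs></a>
"""

parser = ScorePairsParser()
parser.feed(sample_html)
print(parser.scores)
```

An empty attribute comes back as an empty string rather than `None`, which is one way a movie without a Tomatometer rating could still produce a row.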


<h2>Step 3: Data Presentation</h2>

In [5]:
# Helper function to style the table for display
def style_movie_dataframe(df):
    styled_df = df.style \
        .set_properties(**{'text-align': 'center'}) \
        .set_table_styles([
            {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'center')]},
            {'selector': '', 'props': [('border', '1px solid black')]}
        ])

    return styled_df

styled_movie_dataframe = style_movie_dataframe(movie_dataframe)

# Display the styled DataFrame
styled_movie_dataframe

   Movie Name                                                    Tomatometer Rating  Audience Score
0  Strays                                                        54.0                70.0
1  Shane Gillis: Beautiful Dogs                                  NaN                 95.0
2  Insidious Inferno                                             NaN                 NaN
3  Thirst                                                        NaN                 NaN
4  Sirona                                                        NaN                 100.0
5  The Hanged Girl                                               NaN                 NaN
6  Beyond the Grave                                              NaN                 NaN
7  Portrait of the Queen                                         NaN                 NaN
8  Scout's Honor: The Secret Files of the Boy Scouts of America  90.0                75.0
9  What If                                                       NaN                 NaN
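
Several rows in the output above have no Tomatometer or Audience Score value, and the scores are scraped as attribute strings rather than numbers. Below is a minimal sketch of one way to normalize them before analysis, assuming a missing score arrives as an empty string or `None`. The helper name `parse_score` and the sample list are hypothetical and are not part of the notebook.

```python
from typing import Optional


def parse_score(raw: Optional[str]) -> Optional[float]:
    """Convert a raw score string such as '54' to a float; return None when missing."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        # Anything that is not a plain number is treated as missing
        return None


# Illustrative values only, shaped like the scraped attributes
raw_critic_scores = ["54", "", None, "90"]
clean_scores = [parse_score(value) for value in raw_critic_scores]
print(clean_scores)
```

Applying a helper like this to `critic_rating_list` and `audience_score_list` before building the DataFrame would make the two score columns numeric, so they can be sorted or averaged directly.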
