Brandon Connors  
10/1/23  
Web Scraping with Jupyter Notebooks

In [7]:
# These are the libraries I used for this project
#pip install urllib3
#pip install certifi
#pip install pandas
#pip install Jinja2

In [8]:
import urllib3
import certifi

# This is the sample code the professor posted on the discussion boards. I liked it because it uses an f-string to insert the variables into the link.
# My go-to would have been concatenation, but this felt more elegant, so I used it.
year = 2015
page = 1
link = f"https://www.metacritic.com/browse/movie/all/all/{year}/metascore/?page={page}"
http = urllib3.PoolManager(ca_certs=certifi.where())
response = http.request('GET', link, headers={'User-Agent': 'Mozilla/5.0'})
datastring = str(response.data, "utf-8")
print(f"Fetched {len(datastring)} characters from {link}: {response.status}")

Fetched 322075 characters from https://www.metacritic.com/browse/movie/all/all/2015/metascore/?page=1: 200
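
To illustrate the f-string versus concatenation point from the cell above, here is a minimal sketch. `build_link` and `build_link_concat` are hypothetical helper names that are not part of the original notebook, and looping over pages 1 to 3 is only an assumption about how the `page` variable might be varied. Nothing is fetched here; the code only builds the URLs.

```python
BASE = "https://www.metacritic.com/browse/movie/all/all"

def build_link(year, page):
    """Build the browse URL with an f-string, as in the cell above."""
    return f"{BASE}/{year}/metascore/?page={page}"

def build_link_concat(year, page):
    """Build the same URL with concatenation; numbers need str() first."""
    return BASE + "/" + str(year) + "/metascore/?page=" + str(page)

# Hypothetical usage: the first three result pages for 2015.
links = [build_link(2015, page) for page in range(1, 4)]
for url in links:
    print(url)
```

Both helpers return identical strings. The f-string version avoids the explicit `str()` calls that concatenation needs for the integer `year` and `page`.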


In [9]:
import re

# Here I search the page with the regex patterns I created in the first assignment and store the matches in lists
# Raw strings (r'...') keep backslashes such as \s intact for the regex engine
movieTitles = re.findall(r'<div data-title="(.*?)"', datastring)
releaseDates = re.findall(r'<span class="u-text-uppercase">\n\s+(.*?)\n\s+<', datastring)
movieDescriptions = re.findall(r'description"><span>(.*?)</span', datastring)
metacriticScores = re.findall(r'<span data-v-4cdca868>(.*?)<', datastring)
rawThumbnailURLs = re.findall(r'<img src="(.*?)" height', datastring)

# Fixing the URLs so they're actually clickable (the HTML source escapes '&' as '&amp;')
# A single list comprehension handles every URL, so no outer loop is needed
thumbnailURLs = [sub.replace('&amp;', '&') for sub in rawThumbnailURLs]

# Sanity check here to make sure I caught everything
print('Titles:', len(movieTitles), 'Dates: ', len(releaseDates), 'Descriptions: ', len(movieDescriptions), 'Scores: ', len(metacriticScores), 'URLs: ', len(thumbnailURLs))

# print(movieTitles)
# print(releaseDates)
# print(movieDescriptions)
# print(metacriticScores)
# print(thumbnailURLs)

Titles: 24 Dates:  24 Descriptions:  24 Scores:  24 URLs:  24
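
The patterns above rely on the non-greedy group `(.*?)`. Here is a minimal sketch of why that matters, run on a hypothetical HTML snippet that only mimics the shape of the markup (the `example.com` URLs and the `class="card"` attribute are made up for illustration). It also shows `html.unescape` from the standard library as one possible alternative to the string replacement used for the thumbnail URLs.

```python
import html
import re

# Hypothetical snippet; it only mimics the structure matched in the cell above.
sample = (
    '<div data-title="Carol" class="card">'
    '<img src="https://example.com/a.jpg?auto=webp&amp;fit=cover&amp;height=132" height="132">'
    '</div>'
    '<div data-title="Sherpa" class="card">'
    '<img src="https://example.com/b.jpg?auto=webp&amp;fit=cover&amp;height=132" height="132">'
    '</div>'
)

# Non-greedy: each match stops at the first closing quote, giving one match per movie.
titles = re.findall(r'<div data-title="(.*?)"', sample)

# Greedy: the group runs on to the last quote in the text, giving a single oversized match.
greedy = re.findall(r'<div data-title="(.*)"', sample)

# Extract the escaped URLs, then decode HTML entities such as &amp;.
raw_urls = re.findall(r'<img src="(.*?)" height', sample)
urls = [html.unescape(u) for u in raw_urls]

print(titles)
print(len(greedy))
print(urls[0])
```

With the non-greedy group the snippet yields one title per movie, while the greedy version collapses everything into one match.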


In [10]:
# Initializing the main database list
database = []
# Iterating through the lists above by index, collecting each movie's fields into one list, then adding that movie's list to the database list
for i in range(len(movieTitles)):
    # One row per movie, in the same order as the dataframe columns
    newAddition = [movieTitles[i], releaseDates[i], metacriticScores[i], thumbnailURLs[i], movieDescriptions[i]]
    database.append(newAddition)
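
The index-based loop above assumes every list has the same length. Here is a minimal sketch of the same row-building step using `zip`, with an explicit length check. `build_rows` is a hypothetical helper that is not in the original notebook, and the sample values are copied from the first two rows of the output table.

```python
def build_rows(*columns):
    """Combine parallel lists into one list per row, refusing mismatched lengths."""
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError(f"Column lengths differ: {sorted(lengths)}")
    return [list(row) for row in zip(*columns)]

# Sample values taken from the first two rows of the scraped table.
titles = ["45 Years", "Carol"]
dates = ["Dec 23, 2015", "Nov 20, 2015"]
scores = ["94", "94"]

rows = build_rows(titles, dates, scores)
print(rows)
```

The length check turns a silently misaligned table into an immediate error, which is useful if one of the regex patterns misses an entry.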

In [15]:
import pandas as pd

# This stops pandas from truncating long cell values, so the full URLs and descriptions are visible
pd.set_option('display.max_colwidth', None)

# This creates the dataframe and formats the resulting table so that everything is left aligned, making it easier to read.
# Alignment code I got from: https://www.geeksforgeeks.org/align-columns-to-left-in-pandas-python/
df = pd.DataFrame(database, columns=['Movie Title', 'Release Date', 'Metacritic Score', 'Thumbnail URL', 'Movie Description'])
left_aligned_df = df.style.set_properties(**{'text-align': 'left'})
left_aligned_df = left_aligned_df.set_table_styles([dict(selector = 'th', props=[('text-align', 'left')])])

display(left_aligned_df)

Unnamed: 0,Movie Title,Release Date,Metacritic Score,Thumbnail URL,Movie Description
0,45 Years,"Dec 23, 2015",94,https://www.metacritic.com/a/img/resize/6a5bd5bbb7974005d66c3fe569c21e3786ea3145/catalog/provider/2/2/2-7e75f9d82b9daa3efdb74ef6d7d5505f.jpg?auto=webp&fit=cover&height=132&width=88,"There is just one week until Kate Mercer's (Charlotte Rampling) 45th wedding anniversary and the planning for the party is going well. But then a letter arrives for her husband (Tom Courtenay). The body of his first love has been discovered, frozen and preserved in the icy glaciers of the Swiss Alps. By the time the party is upon them, five days later, there may not be a marriage left to celebrate. [IFC Films]"
1,Carol,"Nov 20, 2015",94,https://www.metacritic.com/a/img/resize/7b3783def087f9b94e7a0391870fc81d08cc8449/catalog/provider/2/2/2-f7316f7e14637d049239318771fdd1a0.jpg?auto=webp&fit=cover&height=132&width=88,"Set in 1950s New York, two women from very different backgrounds find themselves in the throes of love. A young woman in her 20s, Therese Belivet (Rooney Mara), is a clerk working in a Manhattan department store and dreaming of a more fulfilling life when she meets Carol (Cate Blanchett), an alluring woman trapped in a loveless, convenient marriage. As an immediate connection sparks between them, the innocence of their first encounter dims and their connection deepens. While Carol breaks free from the confines of marriage, her husband (Kyle Chandler) threatens her competence as a mother when Carol’s involvement with Therese and close relationship with her best friend Abby (Sarah Paulson) comes to light. As Carol leaves the comfort of home to travel with Therese, an internal journey of self-discovery coincides with her new sense of space. [TWC]"
2,Inside Out,"Jun 19, 2015",94,https://www.metacritic.com/a/img/resize/54c04aa6b209fdf9bf3aface7457b9bfdfd036fc/catalog/provider/2/2/2-ec41dc3dbeb08defaa20a5ff93a1fc64.jpg?auto=webp&fit=cover&height=132&width=88,"Growing up can be a bumpy road, and it’s no exception for Riley, who is uprooted from her Midwest life when her father starts a new job in San Francisco. Like all of us, Riley is guided by her emotions: Joy (Amy Poehler), Fear (Bill Hader), Anger (Lewis Black), Disgust (Mindy Kaling) and Sadness (Phyllis Smith). The emotions live in Headquarters, the control center inside Riley’s mind, where they help advise her through everyday life. As Riley and her emotions struggle to adjust to a new life in San Francisco, turmoil ensues in Headquarters. Although Joy, Riley’s main and most important emotion, tries to keep things positive, the emotions conflict on how best to navigate a new city, house, and school. [Pixar]"
3,Spotlight,"Nov 6, 2015",93,https://www.metacritic.com/a/img/resize/35758d6a35b5992b36081c441e9dd3869b461a25/catalog/provider/2/2/2-337ed9394803aa3e504a12ae7abf5eaf.jpg?auto=webp&fit=cover&height=132&width=88,"Spotlight tells the riveting true story of the Pulitzer Prize-winning Boston Globe investigation that would rock the city and cause a crisis in one of the world’s oldest and most trusted institutions. When the newspaper’s tenacious “Spotlight” team of reporters delve into allegations of abuse in the Catholic Church, their year-long investigation uncovers a decades-long cover-up at the highest levels of Boston's religious, legal, and government establishment, touching off a wave of revelations around the world. [Open Road Films]"
4,Sherpa,"Oct 2, 2015",93,https://www.metacritic.com/a/img/resize/ffb78b3e2560281c977d60448ad5b62afc566d14/catalog/provider/2/2/2-83789aa521d7a7a10473bb311623ec41.jpg?auto=webp&fit=cover&height=132&width=88,"A fight on Everest? It seemed incredible. But in 2013 news channels around the world reported an ugly brawl at 21,000ft as European climbers fled a mob of angry Sherpas. In 1953, New Zealander Edmund Hillary and Sherpa Tenzing Norgay had reached the summit in a spirit of co-operation and brave optimism. Now climbers and Sherpas were trading insults - even blows. What had happened to the happy, smiling Sherpas and their dedication in getting foreigners to the top of the mountain they hold so sacred? Determined to explore what was going on, the filmmakers set out to make a film of the 2014 Everest climbing season, from the Sherpas' point of view. Instead, they captured a tragedy that would change Everest forever. At 6.45am on 18th April, 2014, a 14 million ton block of ice crashed down onto the climbing route through the Khumbu Icefall, killing 16 Sherpas. It was the worst tragedy in the history of Everest. The disaster provoked a drastic reappraisal about the role of the Sherpas in the Everest industry."
5,Jafar Panahi's Taxi,"Oct 2, 2015",91,https://www.metacritic.com/a/img/resize/15326c4d0f712ca6832ec247faddb0e1258ce29e/catalog/provider/2/2/2-f5310d7eeaa81f567744c9f9fd177786.jpg?auto=webp&fit=cover&height=132&width=88,"Director Jafar Panahi drives a yellow cab through the vibrant streets of Tehran, picking up a diverse group of passengers in a single day. Each man, woman, and child candidly expresses his or her own view of the world, while being interviewed by the curious and gracious driver/director. His camera, placed on the dashboard of his mobile film studio, captures a spirited slice of Iranian society while also brilliantly redefining the borders of comedy, drama and cinema. [Kino Lorber]"
6,Son of Saul,"Dec 18, 2015",91,https://www.metacritic.com/a/img/resize/d9d1fecde0221a12e51a1411fb4674a6bc08e2b9/catalog/provider/2/2/2-183a271796b8b76982b89bb23729b828.jpg?auto=webp&fit=cover&height=132&width=88,"October 1944, Auschwitz-Birkenau. Saul Ausländer is a Hungarian member of the Sonderkommando, the group of Jewish prisoners isolated from the camp and forced to assist the Nazis in the machinery of large-scale extermination. While working in one of the crematoriums, Saul discovers the corpse of a boy he takes for his son. As the Sonderkommando plans a rebellion, Saul decides to carry out an impossible task: save the child’s body from the flames, find a rabbi to recite the mourner’s Kaddish and offer the boy a proper burial."
7,The Fits,"Jun 3, 2016",90,https://www.metacritic.com/a/img/resize/a477607fbcbc44288a2efb27dbcd2eba34469caf/catalog/provider/2/2/2-6467658e5615807af6b270e0a7c4dfea.jpg?auto=webp&fit=cover&height=132&width=88,"Toni trains as a boxer with her brother at a community center in Cincinnati’s West End, but becomes fascinated by the dance team that also practices there. Enamored by their strength and confidence, Toni eventually joins the group, eagerly absorbing routines, mastering drills, and even piercing her own ears to fit in. As she discovers the joys of dance and of female camaraderie, she grapples with her individual identity amid her newly defined social sphere. Shortly after Toni joins the team, the captain faints during practice. By the end of the week, most of the girls on the team suffer from episodes of fainting, swooning, moaning, and shaking in a seemingly uncontrollable catharsis. Soon, however, the girls on the team embrace these mysterious spasms, transforming them into a rite of passage. Toni fears “the fits” but is equally afraid of losing her place just as she’s found her footing. Caught between her need for control and her desire for acceptance, Toni must decide how far she will go to embody her new ideals. [Oscilloscope]"
8,Mad Max: Fury Road,"May 15, 2015",90,https://www.metacritic.com/a/img/resize/c85a3e6a9f5ba94e5d5e534e33e0aa753a69c994/catalog/provider/2/2/2-94c61ec444cb2ab7e05970eb05bed74f.jpg?auto=webp&fit=cover&height=132&width=88,"In the furthest reaches of our planet, in a stark desert landscape where humanity is broken, and everyone is fighting for the necessities of life, there are two rebels who just might be able to restore order—Max (Tom Hardy), a man of action and few words, who seeks peace of mind following the loss of his wife and child in the aftermath of the chaos, and Furiosa (Charlize Theron), a woman of action who believes her path to survival may be achieved if she can make it across the desert back to her childhood homeland."
9,Paths of the Soul,"May 13, 2016",90,https://www.metacritic.com/a/img/resize/ce2103b2a600881db9fa7085dd1a5329c2cb9530/catalog/provider/2/2/2-6e14e1ffeeb8713b51c8d45b2f14fded.jpg?auto=webp&fit=cover&height=132&width=88,"Filmmaker Zhang Yang (Shower, Getting Home) blurs the border between documentary and fiction to follow a group of Tibetan villagers who leave their families and homes in the small village of Nyima to make a Buddhist ""bowing pilgrimage""-laying their bodies flat on the ground after every few steps-along the 1,200 mile road to Lhasa, the holy capital of Tibet. Though united in their remarkable devotion, each of the travelers embarks on this near impossible journey for very personal reasons. One traveler needs to expunge bad family karma, a butcher wants to cleanse animal bloodstains from his soul, another nearing his life's end, hopes that the prostrations will break the chain of cause and effect determined by his life's actions. [Icarus Films]"
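
Every value in the table above is still a string, because `re.findall` returns text. Here is a minimal sketch of converting the date and score fields into typed values, assuming the `Mon D, YYYY` date format seen in the output and an English locale for month abbreviations. `parse_release_date` is a hypothetical helper that is not part of the original notebook.

```python
from datetime import datetime

def parse_release_date(text):
    """Parse a date such as 'Dec 23, 2015' into a datetime.date."""
    return datetime.strptime(text, "%b %d, %Y").date()

# Sample values copied from the scraped table.
rows = [
    ("45 Years", "Dec 23, 2015", "94"),
    ("The Fits", "Jun 3, 2016", "90"),
]

# Convert the date text to a date object and the score text to an integer.
typed = [(title, parse_release_date(released), int(score))
         for title, released, score in rows]

for title, released, score in typed:
    print(title, released.isoformat(), score)
```

Typed values make it possible to sort or filter by date and score numerically instead of alphabetically.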
