##Project of the Week: Dive Deeper into Web Scraping and Databases


---
###**Case Study:** Building a Database for your favorite Movies
**Objective:** The goal of this challenge is to walk you through a project that lets you dive deeper into web scraping and learn some related concepts (e.g. API scraping). The challenge is divided into 2 parts:

* **Part A:** You will scrape the website https://www.digitaltrends.com/movies/best-movies-on-netflix/ using BeautifulSoup and extract the required information about the listed movies. Once the data is cleaned, you will save it in a SQL database and run some queries over it.

* **Part B:** You will use the OMDB API to get information about specific movies or series and save it in a MongoDB instance. Then you will run some queries over the database.

##Part A: Scraping the best movies on Netflix from the digitaltrends website

In this part you will get the necessary information about the movies on the website and save it in a SQL database. Before doing that, you need to familiarize yourself with the sqlite3 library.

####1. Familiarize yourself with sqlite3.

The goal of this section is just for you to practice using Python's sqlite3 library.
Start by importing the library and creating a new database.

In [1]:
#Test your Zaka
import sqlite3

Create a table in this database. This table should contain a few columns. You can add whatever columns you want. The goal is just to practice.

In [2]:
#Test your Zaka
conn = sqlite3.connect('Demo.db')
c = conn.cursor()
c.execute('''CREATE TABLE movies
             (name TEXT, rating REAL, release_date TEXT, director TEXT, genre TEXT)''')

<sqlite3.Cursor at 0x7f9757ea2940>

Add data to the table you created

In [3]:
#Test your Zaka
c.execute("INSERT INTO movies VALUES ('The Shawshank Redemption', 9.3, '1994-09-23', 'Frank Darabont', 'Drama')")
c.execute("INSERT INTO movies VALUES ('The Godfather', 9.2, '1972-03-24', 'Francis Ford Coppola', 'Crime, Drama')")
c.execute("INSERT INTO movies VALUES ('The Dark Knight', 9.0, '2008-07-18', 'Christopher Nolan', 'Action, Crime, Drama')")
c.execute("INSERT INTO movies VALUES ('Forrest Gump', 8.8, '1994-07-06', 'Robert Zemeckis', 'Drama, Romance')")

<sqlite3.Cursor at 0x7f9757ea2940>
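The inserts above write the values directly into the SQL string. As a minimal sketch of the same step, the block below uses `?` placeholders with `executemany`, so that sqlite3 takes care of quoting (handy for titles containing an apostrophe). It runs against an in-memory database with a shortened, hypothetical two-column table, so `Demo.db` is not touched; the names `demo_conn`, `demo_cur` and `inserted` are illustrative only.

```python
import sqlite3

# In-memory database so the sketch leaves Demo.db untouched
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE movies (name TEXT, rating REAL)")

rows = [
    ("The Shawshank Redemption", 9.3),
    ("We're the Millers", 7.0),  # the apostrophe needs no manual escaping
]

# One placeholder per column; sqlite3 binds each tuple in turn
demo_cur.executemany("INSERT INTO movies VALUES (?, ?)", rows)
demo_conn.commit()

demo_cur.execute("SELECT COUNT(*) FROM movies")
inserted = demo_cur.fetchone()[0]
print(inserted)
```

Placeholders also avoid building SQL by string concatenation, which matters once the values come from a scraped page rather than from hand-typed literals.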

Perform a few basic queries (2-3) to familiarize yourself with the usage (The queries can be anything like selecting all rows in the table or selecting a specific column according to a condition, etc.)

In [4]:
#Test your Zaka
# Query 1: view all rows in the table
c.execute("SELECT * FROM movies;")
result = c.fetchall()
for row in result:
  # Columns: name (0), rating (1), release_date (2), director (3), genre (4)
  print("Movie name:",row[0],"with rating",row[1],"released on",row[2],"is of genre",row[4])

Movie name: The Shawshank Redemption with rating 9.3 released on 1994-09-23 is of genre Drama
Movie name: The Godfather with rating 9.2 released on 1972-03-24 is of genre Crime, Drama
Movie name: The Dark Knight with rating 9.0 released on 2008-07-18 is of genre Action, Crime, Drama
Movie name: Forrest Gump with rating 8.8 released on 1994-07-06 is of genre Drama, Romance


In [5]:
#Test your Zaka
# Query 2: The oldest movie
c.execute("SELECT name, release_date FROM movies ORDER BY release_date ASC LIMIT 1;")
result = c.fetchone()
print("The oldest movie is:", result[0], "released on", result[1])

The oldest movie is: The Godfather released on 1972-03-24


In [6]:
#Test your Zaka
# Query 3: The highest rating movie
c.execute("SELECT name, rating FROM movies ORDER BY rating DESC LIMIT 1;")
result = c.fetchone()
print("The highest rating movie is:", result[0], "with a rating of", result[1])

The highest rating movie is: The Shawshank Redemption with a rating of 9.3


In [7]:
conn.commit()
conn.close()
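Committing and closing by hand works, but it is easy to forget the commit. As a hedged sketch, a sqlite3 connection can also be used as a context manager: the block commits when it finishes normally and rolls back when an exception escapes. The in-memory database and the names `conn_demo` and `row_count` are assumptions made for this illustration.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE movies (name TEXT)")

# Successful block: the INSERT is committed when the block exits
with conn_demo:
    conn_demo.execute("INSERT INTO movies VALUES (?)", ("The Godfather",))

# Failing block: the INSERT is rolled back because an exception escapes
try:
    with conn_demo:
        conn_demo.execute("INSERT INTO movies VALUES (?)", ("Forrest Gump",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

row_count = conn_demo.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(row_count)

# The with-block manages the transaction only; closing is still explicit
conn_demo.close()
```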

####2. Scraping the website via BeautifulSoup.

The website to scrape is the following https://www.digitaltrends.com/movies/best-movies-on-netflix/. Start by sending a request to the website and make sure you obtain the right response.

In [8]:
#Test your Zaka
import pandas as pd
import requests
from bs4 import BeautifulSoup

link = 'https://www.digitaltrends.com/movies/best-movies-on-netflix/'
response = requests.get(link)
soup = BeautifulSoup(response.content, 'html.parser')
response.status_code

200

In [9]:
#Creating variable containing info about all the movies
movie_divs = soup.find_all('div', class_=lambda x: x and x.startswith('b-media h-'))
movie_divs[0]

<div class="b-media h-648caab86cfc4">
<div class="b-media__wrapper">
<div class="b-media__title">
<a id="dt-media-extraction-2"></a>
			Extraction 2 (2023)							<span class="b-media__new">
					new				</span>
</div>
<div class="b-media__inner">
<div class="b-media__poster">

</div>
<div class="b-media__info">
<div class="b-media__header">
<div class="b-media__rating">
<div class="b-media__rating-item">
<div class="b-media__rating-img">
<img alt="" class="dt-lazy-load dt-lazy-pending" data-dt-lazy-src="https://www.digitaltrends.com/wp-content/themes/dt-stardust/assets/images/svg/icon-metacritic.svg" onerror="dti_load_error(this)" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAA

For each movie, extract the following:
* Title
* Year
* Poster
* Duration
* IMDB Rating
* Genre
* Stars (Actors)
* Director

**Tip:** You can start by extracting each element on its own to make sure everything is functioning as expected, and then wrap the whole thing in a function when you need to save the results into a database.

In [10]:
#title and year
Title = []
Year = []

for movie_div in movie_divs:
    title_element = movie_div.find('div', class_='b-media__title').text.strip()
    title = title_element.split('(')[0].strip()
    Title.append(title)
    year = int(title_element.split('(')[1].split(')')[0])
    Year.append(year)

print(len(Title))
print(len(Year))

50
50
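Splitting on `(` works for the 50 titles on this page, but it would break on a title that itself contains parentheses. The following is a minimal sketch of a regex-based alternative; the helper name `parse_title_year` and the pattern `TITLE_YEAR` are hypothetical, and the sample input mimics the whitespace and the "new" badge seen in the first card.

```python
import re

# Matches "<title> (<4-digit year>)" and ignores anything after the year,
# such as the "new" badge that follows the first title
TITLE_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)")

def parse_title_year(raw_text):
    """Return (title, year) from the text of a title element."""
    text = " ".join(raw_text.split())  # collapse tabs and newlines
    match = TITLE_YEAR.match(text)
    if match is None:
        return text, None  # no year found: keep the text, flag the year
    return match.group("title"), int(match.group("year"))

print(parse_title_year("\n\t\t\tExtraction 2 (2023)\t\t\t\t\n\t\tnew\t\t"))
```

Returning `None` for a missing year keeps the `Title` and `Year` lists the same length even if one card is malformed.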


In [11]:
#poster
URL = []

all_img = soup.find_all('img')
all_img = all_img[2:-7]

for img in all_img:
    img_url = img.get('data-dt-lazy-src')
    if img_url and img_url.endswith('.jpg'):
        URL.append(img_url)
print(len(URL))

# We notice that the Shudder icon (a picture of the letter "S") appears 8 times.
# dict.fromkeys removes the duplicates while keeping the page order; a set would
# shuffle the posters, so they could no longer be matched to the movies by position.
new_list = list(dict.fromkeys(URL))

print(len(new_list))

# 51 entries remain, so the icon itself still has to be removed
S = 'https://www.digitaltrends.com/wp-content/uploads/2021/12/shudder-icon.jpg?p=1#038;p=1.jpg'
new_list.remove(S)

print(len(new_list))

# Reuse the cleaned list in the following cells
# (assumption: posters appear on the page in the same order as the movies)
URL = new_list

58
51
50


In [12]:
#duration
Duration = []

for movie_div in movie_divs:
    duration_div = movie_div.find('div', class_='b-media__duration')
    duration_spans = duration_div.find_all('span')

    if len(duration_spans) > 1: #case where we have "r" or other mark next to duration
        duration = duration_spans[1].text.strip()
    elif len(duration_spans) == 1: #case where we only have duration
        duration = duration_spans[0].text.strip()
    else:
        duration = '' # case where no duration is found

    Duration.append(duration)

print(len(Duration))

50


In [13]:
#rating
Rating = []

for movie_div in movie_divs:
    rating_div = movie_div.find('div', class_='b-media__rating')
    if rating_div is None: #in case there was no rating for the movie
        rating = 'Nan'
    else:
        rating_scores = rating_div.find_all('div', class_='b-media__rating-score')
        if len(rating_scores) > 1: #case where we have both Metacritic and IMDb scores
            rating = rating_scores[1].text.strip()
        elif len(rating_scores) == 1: #case where we have the IMDb rating only
            rating = rating_scores[0].text.strip()
        else:
            rating = 'Nan'
    Rating.append(rating)
print(len(Rating))

50
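Durations and ratings are kept as strings such as `'123m'` and `'7.9/10'`. If numeric comparisons are needed later, one option is to convert them before saving. The sketch below assumes exactly those two formats (as seen in the scraped output); the helper names `duration_minutes` and `rating_value` are hypothetical, and anything that does not match, including the `'Nan'` placeholder, becomes `None`.

```python
import re

def duration_minutes(raw):
    """'123m' -> 123; None when the text is not '<digits>m'."""
    match = re.fullmatch(r"(\d+)\s*m", raw.strip())
    return int(match.group(1)) if match else None

def rating_value(raw):
    """'7.9/10' -> 7.9; None for placeholders such as 'Nan'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)/10", raw.strip())
    return float(match.group(1)) if match else None

print(duration_minutes("123m"), rating_value("7.9/10"), rating_value("Nan"))
```

Stored in a SQL column, `None` becomes `NULL`, which is skipped by comparisons such as `rating > 8.5` without any extra filter.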


In [14]:
#genre
Genre = []
for movie_div in movie_divs:
  genre = movie_div.find('div',class_='b-media__info').find("span",class_='dt-clamp dt-clamp-2').text.replace("\n","").replace("\t","")
  Genre.append(genre)
print(len(Genre))

50


In [15]:
#stars
Stars = []
for movie_div in movie_divs:
    stars = movie_div.find('div',class_='b-media__info').find_all("span",class_='dt-clamp dt-clamp-2')
    if len(stars) > 1: # the stars are in the 2nd span (index 1), so extract it when it exists
        stars = stars[1].text.replace("\n","").replace("\t","")
    else: # no info about the stars
        stars = 'Nan'
    Stars.append(stars)
print(len(Stars))

50


In [16]:
#director
Director = []
for movie_div in movie_divs:
  director = movie_div.find('div',class_='b-media__info').find("span",class_='dt-clamp dt-clamp-1').text.replace("\n","").replace("\t","")
  Director.append(director)
print(len(Director))

50


In [17]:
#Test your Zaka
def movie(title, year, poster, durations, ratings, genres, stars, directors):
    movies = []
    for i in range(len(title)):#since we have all same length
        movie_info = {
            'Title': title[i],
            'Year': year[i],
            'Poster': poster[i],
            'Duration': durations[i],
            'Rating': ratings[i],
            'Genre': genres[i],
            'Stars': stars[i],
            'Director': directors[i]
        }
        movies.append(movie_info)
    return movies

movies = movie(Title, Year, URL, Duration, Rating, Genre, Stars, Director)

#Print the first entry to check that the function works as expected
print(movies[0])

{'Title': 'Extraction 2', 'Year': 2023, 'Poster': 'https://www.digitaltrends.com/wp-content/uploads/2023/06/7gKI9hpEMcZUQpNgKrkDzJpbnNS.jpg?p=1#038;p=1.jpg', 'Duration': '123m', 'Rating': '7.9/10', 'Genre': 'Action, Thriller', 'Stars': 'Chris Hemsworth, Rudhraksh Jaiswal, Golshifteh Farahani', 'Director': 'Sam Hargrave'}
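The function above indexes every list with the same `i`, which silently assumes that all eight lists have the same length. A possible variant, sketched here with `zip`, checks that assumption first and fails loudly otherwise. `FIELDS` and `build_movies` are hypothetical names, and the poster values in the sample are placeholders rather than real URLs.

```python
FIELDS = ("Title", "Year", "Poster", "Duration", "Rating", "Genre", "Stars", "Director")

def build_movies(*columns):
    """Zip equally long columns into a list of dicts keyed by FIELDS."""
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError(f"columns differ in length: {sorted(lengths)}")
    # zip(*columns) yields one tuple of values per movie
    return [dict(zip(FIELDS, values)) for values in zip(*columns)]

sample = build_movies(
    ["Extraction 2", "Dunkirk"],
    [2023, 2017],
    ["poster-1.jpg", "poster-2.jpg"],  # placeholder poster names
    ["123m", "107m"],
    ["7.9/10", "7.8/10"],
    ["Action, Thriller", "War, Action, Drama"],
    ["Chris Hemsworth", "Tom Hardy"],
    ["Sam Hargrave", "Christopher Nolan"],
)
print(sample[1]["Director"])
```

The length check is what would flag a poster list that still contains duplicate or unrelated images.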


In [18]:
#For practice only, transforming into dataframe
df = pd.DataFrame(movies)
df =  df.set_index("Title")
df.head(5)

Title,Year,Poster,Duration,Rating,Genre,Stars,Director
Extraction 2,2023,https://www.digitaltrends.com/wp-content/uploa...,123m,7.9/10,"Action, Thriller","Chris Hemsworth, Rudhraksh Jaiswal, Golshifteh...",Sam Hargrave
Dunkirk,2017,https://www.digitaltrends.com/wp-content/uploa...,107m,7.8/10,"War, Action, Drama","Fionn Whitehead, Tom Hardy, Mark Rylance",Christopher Nolan
The Boss Baby,2017,https://www.digitaltrends.com/wp-content/uploa...,97m,6.3/10,"Animation, Comedy, Family","Alec Baldwin, Steve Buscemi, Miles Bakshi",Tom McGrath
We're the Millers,2013,https://www.digitaltrends.com/wp-content/uploa...,110m,7.0/10,"Comedy, Crime","Jennifer Aniston, Jason Sudeikis, Emma Roberts",Rawson Marshall Thurber
Blood & Gold,2023,https://www.digitaltrends.com/wp-content/uploa...,100m,6.5/10,"Action, Drama, War","Robert Maaser, Jördis Triebel, Marie Hacke",Peter Thorwarth


####*Save the result in a SQL database*

In [19]:
#Test your Zaka
conn = sqlite3.connect('Top50NetflixMovies.db')
c = conn.cursor()

# Start from an empty table so that re-running this cell does not insert every movie again
c.execute("DROP TABLE IF EXISTS Top50movies")
c.execute('''CREATE TABLE Top50movies
             (Title TEXT, Year INTEGER, Poster TEXT, Duration TEXT, Rating TEXT, Genre TEXT, Stars TEXT, Director TEXT)''')

# The loop variable is named m so that it does not shadow the movie() function
for m in movies:
    c.execute("INSERT INTO Top50movies VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (m['Title'], m['Year'], m['Poster'], m['Duration'], m['Rating'], m['Genre'], m['Stars'], m['Director']))

# Persist the inserted rows
conn.commit()

c.execute("SELECT * FROM Top50movies LIMIT 1;")
row = c.fetchone()

# Print the first row to make sure the data was saved correctly
if row:
    print('Title:', row[0])
    print('Year:', row[1])
    print('Poster:', row[2])
    print('Duration:', row[3])
    print('Rating:', row[4])
    print('Genre:', row[5])
    print('Stars:', row[6])
    print('Director:', row[7])

Title: Extraction 2
Year: 2023
Poster: https://www.digitaltrends.com/wp-content/uploads/2023/06/7gKI9hpEMcZUQpNgKrkDzJpbnNS.jpg?p=1#038;p=1.jpg
Duration: 123m
Rating: 7.9/10
Genre: Action, Thriller
Stars: Chris Hemsworth, Rudhraksh Jaiswal, Golshifteh Farahani
Director: Sam Hargrave


####3. Running some queries against the database we created

Write a query that returns all movies with all their specifications

In [20]:
#Test your Zaka
c.execute("SELECT * FROM Top50movies")

# Fetch all rows and print them out
rows = c.fetchall()
for row in rows:
    print(row)

('Extraction 2', 2023, 'https://www.digitaltrends.com/wp-content/uploads/2023/06/7gKI9hpEMcZUQpNgKrkDzJpbnNS.jpg?p=1#038;p=1.jpg', '123m', '7.9/10', 'Action, Thriller', 'Chris Hemsworth, Rudhraksh Jaiswal, Golshifteh Farahani', 'Sam Hargrave')
('Dunkirk', 2017, 'https://www.digitaltrends.com/wp-content/uploads/2021/10/ebsnoddg9lbsmiawg2uabjn7to5.jpg?p=1#038;p=1.jpg', '107m', '7.8/10', 'War, Action, Drama', 'Fionn Whitehead, Tom Hardy, Mark Rylance', 'Christopher Nolan')
('The Boss Baby', 2017, 'https://www.digitaltrends.com/wp-content/uploads/2023/05/unPB1iyEeTBcKiLg8W083rlViFH.jpg?p=1#038;p=1.jpg', '97m', '6.3/10', 'Animation, Comedy, Family', 'Alec Baldwin, Steve Buscemi, Miles Bakshi', 'Tom McGrath')
("We're the Millers", 2013, 'https://www.digitaltrends.com/wp-content/uploads/2023/06/qF2LJ0jwWrtXSuT4AFD5OS2IqaT.jpg?p=1#038;p=1.jpg', '110m', '7.0/10', 'Comedy, Crime', 'Jennifer Aniston, Jason Sudeikis, Emma Roberts', 'Rawson Marshall Thurber')
('Blood & Gold', 2023, 'https://www.dig

Write a query that selects movies having an IMDB rating higher than 8.5

In [21]:
#Test your Zaka
c.execute("SELECT DISTINCT Title, Rating FROM Top50movies WHERE CAST(Rating AS REAL) > 8.5;")
row = c.fetchone()
#Rating is stored as text such as '8.8/10', so it is cast to a number before comparing;
#'Nan' has no leading number, casts to 0.0, and is therefore excluded
#Only 1 movie has an IMDb rating greater than 8.5/10
print("The movie",row[0],"with rating",row[1])
#DISTINCT hides duplicate rows, which show up if the insert cell has been run more than once

The movie Inception with rating 8.8/10
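Because `Rating` is stored as TEXT (for example `'7.9/10'`), comparing it with a number goes through SQLite's type-conversion rules. The sketch below contrasts a plain comparison with `CAST(... AS REAL)` on a small in-memory table. The table name `ratings` and the `'Unrated example'` row are made up for illustration; the other two rows reuse ratings shown in the outputs above.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratings (Title TEXT, Rating TEXT)")
demo.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [
        ("Extraction 2", "7.9/10"),
        ("Inception", "8.8/10"),
        ("Unrated example", "Nan"),  # hypothetical row without a rating
    ],
)

# TEXT column vs numeric literal: the literal is converted to the text '8.5',
# so the comparison is alphabetical and 'Nan' sorts after '8.5'
as_text = demo.execute(
    "SELECT Title FROM ratings WHERE Rating > 8.5 ORDER BY Title"
).fetchall()

# CAST reads the leading number of the string; text without one becomes 0.0
as_number = demo.execute(
    "SELECT Title FROM ratings WHERE CAST(Rating AS REAL) > 8.5 ORDER BY Title"
).fetchall()

print(as_text)
print(as_number)
demo.close()
```

With the cast, the unrated row drops out on its own, so no separate `Rating != 'Nan'` condition is needed.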


Write a query that selects only comedy movies

In [22]:
#Test your Zaka
#Return movies that have Comedy as their only genre
c.execute("SELECT DISTINCT Title, Genre FROM Top50movies WHERE Genre = 'Comedy'")
rows = c.fetchall()
for row in rows :
  print("The movie",row[0],"is",row[1],"only")

The movie The Best Man Holiday is Comedy only
The movie Easy A is Comedy only


In [23]:
#In case you wanted comedy among other genres :

#c.execute("SELECT DISTINCT(Title),Genre FROM Top50movies WHERE Genre LIKE '%Comedy%'")
#rows = c.fetchall()
#for row in rows :
#  print("The movie",row[0],"include these genres",row[1])

Write a query that selects comedy or action movies shorter than 120 minutes.

In [24]:
#Test your Zaka
c.execute("SELECT DISTINCT(Title),Genre, Duration  FROM Top50movies WHERE (Genre LIKE'%Comedy%' OR Genre LIKE '%action%') AND CAST(Duration AS INTEGER) < 120;")
rows = c.fetchall()
for row in rows :
  print("The movie",row[0],"include these genres",row[1],"and it's duration is ",row[2])

The movie Dunkirk include these genres War, Action, Drama and it's duration is  107m
The movie The Boss Baby include these genres Animation, Comedy, Family and it's duration is  97m
The movie We're the Millers include these genres Comedy, Crime and it's duration is  110m
The movie Blood & Gold include these genres Action, Drama, War and it's duration is  100m
The movie Ted include these genres Comedy, Fantasy and it's duration is  107m
The movie The Mother include these genres Action, Thriller and it's duration is  115m
The movie Pitch Perfect include these genres Comedy, Music, Romance and it's duration is  112m
The movie Shrek Forever After include these genres Comedy, Adventure, Fantasy, Animation, Family and it's duration is  93m
The movie Minions: The Rise of Gru include these genres Animation, Comedy, Family and it's duration is  87m
The movie Pitch Black include these genres Thriller, Science Fiction, Action and it's duration is  108m
The movie World War Z include these genres A

##Part B: Scraping Movies and series using the OMDB API

An alternative to web scraping via BeautifulSoup or Selenium is to collect the data through an API. APIs give you regulated access to the information you want, so when an API is available it is usually better to use it than to scrape pages with libraries. In our example, we will use the OMDB API: http://www.omdbapi.com/

####1. Familiarize yourself with pymongo:
Before starting with the scraping part, familiarize yourself with the pymongo library that will help you create a MongoDB instance and add collections of documents to it.

Start by setting up the requirements to be able to use the pymongo library.

In [25]:
#Test your Zaka
!pip install pymongo
import pymongo

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pymongo
  Downloading pymongo-4.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (492 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.3.0-py3-none-any.whl (283 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.3.0 pymongo-4.3.3


Create a database named testdb

In [26]:
#Test your Zaka
from pymongo import MongoClient

# MongoClient connects lazily, so this line succeeds even if no server is listening yet
client = MongoClient("mongodb://localhost:27017/")

# I was not able to connect to a local MongoDB server because of an installation problem,
# so the following cells are written assuming the connection succeeded

# create a new database called "testdb"
db = client["testdb"]

Create a collection in this database

In [27]:
#Test your Zaka
movies_collection = db["movies"]


Insert 2 records in this database. You can choose whatever fields you want to define. The goal is just to practice.

In [None]:
#Test your Zaka
movie1 = {
    "title": "The Shawshank Redemption",
    "year": 1994,
    "director": "Frank Darabont",
    "actors": ["Tim Robbins", "Morgan Freeman"],
    "genre": ["Drama", "Crime"],
    "rating": 9.3
}

movie2 = {
    "title": "The Godfather",
    "year": 1972,
    "director": "Francis Ford Coppola",
    "actors": ["Marlon Brando", "Al Pacino", "James Caan"],
    "genre": ["Drama", "Crime"],
    "rating": 9.2
}

result = movies_collection.insert_many([movie1, movie2])
print(result.inserted_ids)

Write a query that selects all documents in your collection.

In [None]:
#Test your Zaka
cursor = movies_collection.find()

# print the selected documents
for document in cursor:
    print(document)

####2. Use the API to get info about 5 movies/series.
Some suggestions:
Friends, How I Met Your Mother, Prison Break, La Casa De Papel, and Blindspot.

While scraping, make sure to scrape some of the movies via their title, and some of them via their IMDB ID.
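The two lookup styles differ only in one query parameter: `t` for a title and `i` for an IMDB ID. As a minimal sketch, the helper below builds the request URL for either case and URL-encodes the values, so titles with spaces are handled safely. The function name `omdb_url`, the constant `OMDB_URL` and the `YOUR_API_KEY` placeholder are assumptions for illustration; the parameter names `apikey`, `t`, `i` and `type` are the ones used in the cells that follow.

```python
from urllib.parse import urlencode

OMDB_URL = "http://www.omdbapi.com/"

def omdb_url(api_key, title=None, imdb_id=None, kind="series"):
    """Build an OMDB query URL from either a title or an IMDB ID."""
    if (title is None) == (imdb_id is None):
        raise ValueError("pass exactly one of title or imdb_id")
    params = {"apikey": api_key, "type": kind}
    if title is not None:
        params["t"] = title      # lookup by title
    else:
        params["i"] = imdb_id    # lookup by IMDB ID
    # urlencode escapes spaces and other special characters
    return f"{OMDB_URL}?{urlencode(params)}"

print(omdb_url("YOUR_API_KEY", title="Prison Break"))
print(omdb_url("YOUR_API_KEY", imdb_id="tt0460649"))
```

The resulting string can be passed to `requests.get` exactly like the hand-written URLs, which would reduce the five nearly identical cells to five calls.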

In [None]:
import requests
import json

In [None]:
#Test your Zaka

#friends using title

api_key = "e1ee935a"

title = "Friends"

# the API query URL (built from api_key instead of repeating the key)
url = f"http://www.omdbapi.com/?apikey={api_key}&t={title}&type=series"


response = requests.get(url)

data = json.loads(response.text)

# Check for errors
if 'Error' in data:
    print(f"API error: {data['Error']}")
else:
    # Print the series information
    print(f"Title: {data['Title']}")
    print(f"Year of release: {data['Year']}")
    print(f"Genre: {data['Genre']}")
    print(f"Plot: {data['Plot']}")
    print(f"Actors: {data['Actors']}")
    print(f"IMDB rating: {data['imdbRating']}")

In [None]:
#Test your Zaka

# How I Met Your Mother using its IMDB ID

api_key = "e1ee935a"

# IMDB ID for "How I Met Your Mother"
imdb_id = "tt0460649"


url = f"http://www.omdbapi.com/?apikey={api_key}&i={imdb_id}&type=series"


response = requests.get(url)


data = json.loads(response.text)

# Check for errors
if 'Error' in data:
    print(f"API error: {data['Error']}")
else:
    # Print the series information
    print(f"Title: {data['Title']}")
    print(f"Year of release: {data['Year']}")
    print(f"Genre: {data['Genre']}")
    print(f"Plot: {data['Plot']}")
    print(f"Actors: {data['Actors']}")
    print(f"IMDB rating: {data['imdbRating']}")

In [None]:
#Test your Zaka

#prison break using title

api_key = "e1ee935a"


title = "Prison break"


url = f"http://www.omdbapi.com/?apikey={api_key}&t={title}&type=series"


response = requests.get(url)


data = json.loads(response.text)


if 'Error' in data:
    print(f"API error: {data['Error']}")
else:
    # Print the series information
    print(f"Title: {data['Title']}")
    print(f"Year of release: {data['Year']}")
    print(f"Genre: {data['Genre']}")
    print(f"Plot: {data['Plot']}")
    print(f"Actors: {data['Actors']}")
    print(f"IMDB rating: {data['imdbRating']}")

In [None]:
#Test your Zaka

#La Casa De Papel (Money Heist) using its English title

api_key = "e1ee935a"

title = "Money heist"

url = f"http://www.omdbapi.com/?apikey={api_key}&t={title}&type=series"

response = requests.get(url)

data = json.loads(response.text)

if 'Error' in data:
    print(f"API error: {data['Error']}")
else:
    # Print the series information
    print(f"Title: {data['Title']}")
    print(f"Year of release: {data['Year']}")
    print(f"Genre: {data['Genre']}")
    print(f"Plot: {data['Plot']}")
    print(f"Actors: {data['Actors']}")
    print(f"IMDB rating: {data['imdbRating']}")


In [None]:
#Test your Zaka

#Blindspot using title

api_key = "e1ee935a"

title = "Blindspot"

url = f"http://www.omdbapi.com/?apikey={api_key}&t={title}&type=series"

response = requests.get(url)

data = json.loads(response.text)

if 'Error' in data:
    print(f"API error: {data['Error']}")
else:
    # Print the series information
    print(f"Title: {data['Title']}")
    print(f"Year of release: {data['Year']}")
    print(f"Genre: {data['Genre']}")
    print(f"Plot: {data['Plot']}")
    print(f"Actors: {data['Actors']}")
    print(f"IMDB rating: {data['imdbRating']}")


####3. Save the results you got into a MongoDB instance.
You will use the pymongo library you have imported.


In [None]:
#Test your Zaka
import requests
import json
from pymongo import MongoClient

# API key
api_key = "e1ee935a"

# Connect to MongoDB instance
client = MongoClient("mongodb://localhost:27017/")

# Get the 'movies' database
db = client["movies"]

# Get the 'series' collection
collection = db["series"]

# Search for TV series and insert the results into MongoDB
series = [
    {"title": "Friends", "type": "series"},
    {"imdb_id": "tt0460649", "type": "series"},
    {"title": "Prison Break", "type": "series"},
    {"title": "Money heist", "type": "series"},
    {"imdb_id": "tt4474344", "type": "series"}
]

for s in series:
    # Construct the API query URL
    if "title" in s:
        url = f"http://www.omdbapi.com/?apikey={api_key}&t={s['title']}&type={s['type']}"
    elif "imdb_id" in s:
        url = f"http://www.omdbapi.com/?apikey={api_key}&i={s['imdb_id']}&type={s['type']}"

    # Send the API query using a GET request
    response = requests.get(url)

    # Parse the JSON response
    data = json.loads(response.text)

    # Check for errors
    if 'Error' in data:
        print(f"API error: {data['Error']}")
    else:
        # Insert the series information into the MongoDB collection
        collection.insert_one(data)
        print(f"Data for {data['Title']} inserted successfully.")

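# The client is left open on purpose: the queries in the next section reuse `collection`,
# and PyMongo 4 does not allow operations on a client that has been closed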


#### 4. Querying the database

Write a query that selects all movies/series you scraped

In [None]:
#Test your Zaka

all_series = collection.find()

# Print the documents
for s in all_series:
    print(s)

Write a query that selects a specific movie/series based on its title

In [None]:
#Test your Zaka
title = "Friends"

# Find the document with the specified title
result = collection.find_one({"Title": title})

# Print the document
print(result)

Write a query that returns all movies/series that were released after 2006.

In [None]:
#Test your Zaka

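# OMDB stores Year as a string (possibly a range of years), so the comparison is alphabetical.
# "$gte": "2007" keeps titles that started in 2007 or later and excludes those that started in 2006.
result = collection.find({"Year": {"$gte": "2007"}})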

# Print the documents
for s in result:
    print(s)
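Comparing years as strings only works while every value starts with a four-digit year. A hedged alternative is to extract the first year as an integer before inserting each document, so that a numeric filter can be used. The helper name `start_year` and the field name `StartYear` are hypothetical, and the sample strings assume the range format OMDB uses for series (an en dash between years).

```python
import re

def start_year(year_field):
    """Return the first 4-digit year of an OMDB 'Year' string, or None."""
    match = re.match(r"\d{4}", year_field.strip())
    return int(match.group()) if match else None

for sample_year in ["1994", "2005–2014", "2017–", "N/A"]:
    print(repr(sample_year), "->", start_year(sample_year))

# Hypothetical usage before insertion:
#   data["StartYear"] = start_year(data["Year"])
# after which the query becomes a numeric comparison:
#   collection.find({"StartYear": {"$gt": 2006}})
```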

Write a query that can find comedy series/movies only.

In [None]:
#Test your Zaka

result = collection.find({"Genre": {"$regex": "Comedy"}})

# Print the documents
for s in result:
    print(s)
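The regex above matches any document whose `Genre` string contains "Comedy", including mixed genres. If "comedy only" is meant strictly, the comma-separated string has to be split first. The following sketch shows one way to do that in plain Python; `split_genres` and `is_comedy_only` are hypothetical helpers, and the sample strings are illustrative values in the comma-separated format OMDB returns.

```python
def split_genres(genre_field):
    """'Comedy, Romance' -> ['Comedy', 'Romance']"""
    return [g.strip() for g in genre_field.split(",") if g.strip()]

def is_comedy_only(genre_field):
    """True when Comedy is the single genre listed."""
    return split_genres(genre_field) == ["Comedy"]

print(split_genres("Comedy, Romance"))
print(is_comedy_only("Comedy, Romance"), is_comedy_only("Comedy"))

# Hypothetical usage before insertion:
#   data["Genre"] = split_genres(data["Genre"])
# MongoDB equality on an array field matches any element, so
#   collection.find({"Genre": "Comedy"})
# would then return every document that lists Comedy among its genres.
```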

Use subplots to show for each series/movie you have in your database, its poster and above it the title and the year (or years) it was aired.

In [None]:
#Test your Zaka
from io import BytesIO

import matplotlib.pyplot as plt
import requests
from PIL import Image

# Cursor.count() was removed in PyMongo 4, so load the documents into a list first
docs = list(collection.find())

# squeeze=False keeps axs two-dimensional even when there is a single document
fig, axs = plt.subplots(nrows=1, ncols=len(docs), figsize=(20, 6), squeeze=False)

# Iterate over the documents and display each poster with its title and year(s) above it
for ax, s in zip(axs[0], docs):
    # Download the poster from the URL stored in the OMDB 'Poster' field
    poster = requests.get(s['Poster'])
    img = Image.open(BytesIO(poster.content))

    ax.imshow(img)
    ax.set_title(f"{s['Title']} ({s['Year']})", fontsize=12)
    ax.set_axis_off()

# Show the subplots
plt.tight_layout()
plt.show()