# Workshop No. 3

This notebook implements an expert system that helps a teacher choose a movie through filters. It does not use an API; instead, the movie information is scraped from IMDb, a popular website that offers information and opinions about movies. In addition to the code, there is a PDF with a graphical representation of the expert system.

In [None]:
# necessary libraries
!pip install --upgrade pip
!pip install requests
!pip install beautifulsoup4
!pip install pandas

## 1. Get Movies Data



In [2]:
# import libraries
import requests
from bs4 import BeautifulSoup

def get_web_html(url: str) -> BeautifulSoup:
    """
    This method gets the HTML from a website using scraping.

    Args:
        url: URL to scrape.

    Returns:
        A BeautifulSoup object with the HTML.
    """
    # headers to avoid a 403 error, because IMDb blocks requests from bots
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    }

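    # get the HTML, fail early on HTTP errors (e.g. 403), and parse it
    # (the 10-second timeout is an arbitrary choice to avoid hanging forever)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")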




In [3]:
# import libraries
from bs4 import BeautifulSoup


def get_title_year(movie_data: BeautifulSoup) -> tuple:
    """
    This method gets the title, year, and link of a movie using scraping.

    Args:
        movie_data: BeautifulSoup object with movie data.

    Returns:
        A tuple with title, year, and link.
    """
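    # process the main title of the subdivision, expected as "Title (Year)"
    base_url = "https://www.imdb.com"
    ref_data = movie_data.find("a")
    parts = ref_data.text.split("(")
    title = parts[0].strip()
    # fall back to an empty year when the link text has no "(Year)" part
    year = parts[1].replace(")", "").strip() if len(parts) > 1 else ""
    link = base_url + ref_data["href"]
    return title, year, link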


def get_genre_actors(movie_data: BeautifulSoup) -> list:
    """
    This method gets the genre and actors of a movie using scraping.

    Args:
        movie_data: BeautifulSoup object with movie data.

    Returns:
        A list with genre and actors.
    """
    # process ul tags to get genre and actors
    data = []
    ul_tags = movie_data.find_all("ul")
    for ul_tag in ul_tags:
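        temp = []
        for li_tag in ul_tag.find_all("li"):
            # skip list items without a span instead of failing on None
            span_tag = li_tag.find("span")
            if span_tag is not None:
                temp.append(span_tag.get_text())
        data.append(", ".join(temp))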
    return data
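
The title and year split in `get_title_year` relies on the link text having the form `Title (Year)`. As a minimal sketch of a stricter alternative, the hypothetical helper below (`split_title_year`, not part of the original notebook) uses a regular expression that only treats a trailing four-digit group in parentheses as the year, so titles that contain their own parentheses are kept whole. It assumes the year always has four digits.

```python
import re

# Hypothetical helper, not part of the original notebook.
# Matches text that ends with a four-digit year in parentheses.
TITLE_YEAR_PATTERN = re.compile(r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$")


def split_title_year(text: str) -> tuple:
    """Return (title, year); year is "" when there is no "(YYYY)" suffix."""
    cleaned = text.strip()
    match = TITLE_YEAR_PATTERN.match(cleaned)
    if match is None:
        # no recognizable year: keep the whole text as the title
        return cleaned, ""
    return match.group("title"), match.group("year")


print(split_title_year("Furiosa: A Mad Max Saga (2024)"))
print(split_title_year("Untitled Project"))
```

Whether this is preferable to the plain `split("(")` depends on how regular the scraped link text turns out to be.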

In [4]:
# import libraries
from bs4 import BeautifulSoup
import pandas as pd


def generate_dataframe(url: str) -> pd.DataFrame:
    """
    This method generates a DataFrame with movie data from IMDb.

    Args:
        url: URL to scrape.

    Returns:
        A DataFrame with movie data.
    """
    # movies data structure definition
    imdb = get_web_html(url)
    movies = []
    movies_metadata = ["Title", "Year", "Genre", "Actors"]

    # process the HTML using scraping, visiting each div with the class ipc-metadata-list-summary-item__tc
    movies_html = imdb.find_all("div", class_="ipc-metadata-list-summary-item__tc")
    for movie in movies_html:
        # get each movie data into a clean html structure
        movie_data = BeautifulSoup(str(movie), "html.parser")

        # get movie data
        title, year, link = get_title_year(movie_data)
        data = get_genre_actors(movie_data)

        # create a dictionary with a clean movie data structure
        # (genre and actors default to "" when the page does not list them)
        movie_clean_data = {
            "Title": title,
            "Year": year,
            "Genre": data[0] if data else "",
            "Actors": data[1] if len(data) > 1 else "",
        }
        # create a list of dictionaries to create a DataFrame
        movies.append(movie_clean_data)

    # create movies dataframe
    return pd.DataFrame(movies, columns=movies_metadata)

# ================================ MAIN =================================== #
# url to scrape
url = "https://www.imdb.com/calendar/?ref_=rlm&region=US&type=MOVIE"
movies_df = generate_dataframe(url)
print(movies_df.head(3))


                     Title  Year                         Genre  \
0  Furiosa: A Mad Max Saga  2024     Action, Adventure, Sci-Fi   
1       The Garfield Movie  2024  Animation, Adventure, Comedy   
2                    Sight  2023     Biography, Drama, History   

                                              Actors  
0  Anya Taylor-Joy, Chris Hemsworth, Tom Burke, A...  
1  Chris Pratt, Samuel L. Jackson, Hannah Wadding...  
2  Terry Chen, Greg Kinnear, Natasha Mumba, Fionn...  


## 2. Time to build a Decision Tree

To create an expert system, you must define a decision tree, that is, a sequence of conditionals that leads to a result. It is like building a flowchart made up of many conditionals.
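
A decision tree of this kind can be written directly as nested or sequential `if` statements. The sketch below is only an illustration: the function name `suggest_filter` and its three yes/no questions are hypothetical, chosen so that each leaf maps to one of the DataFrame columns (`Title`, `Actors`, `Genre`, `Year`).

```python
def suggest_filter(knows_title: bool, has_favorite_actor: bool, has_preferred_genre: bool) -> str:
    """Walk a small hand-written decision tree and return the column to filter by."""
    # Question 1: does the user already know (part of) the title?
    if knows_title:
        return "Title"
    # Question 2: is the user looking for a specific actor?
    if has_favorite_actor:
        return "Actors"
    # Question 3: does the user have a genre in mind?
    if has_preferred_genre:
        return "Genre"
    # Leaf reached when every answer was "no": fall back to the release year
    return "Year"


print(suggest_filter(knows_title=False, has_favorite_actor=True, has_preferred_genre=True))
```

The order of the questions matters: earlier questions take priority, which is the same choice you make when ordering the diamonds in a flowchart.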

In this part, go to any site such as _draw.io_ and create a __flowchart__ of what you think is the best decision process: _what questions do you want to ask?_

Remember, asking the right questions is a very important step in any task you want to carry out.

The following is the algorithm used in the ___expert system___ (replace the image named _expert\_system.png_):




### 2.1 Your proposal

For my analysis, I chose to ask the user which filter they want. The idea is to take the string the user enters, compare it with the data, check that it is valid, and show an answer.
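
One possible reading of the validation step is sketched below, assuming the user types the name of the filter they want. The names `VALID_FILTERS`, `normalize_filter`, and `answer_for` are hypothetical and only illustrate the compare, validate, answer sequence.

```python
# Hypothetical filter names, mirroring the DataFrame columns in lowercase
VALID_FILTERS = ("title", "year", "genre", "actors")


def normalize_filter(raw: str):
    """Return the cleaned filter name, or None when the input is not valid."""
    cleaned = raw.strip().lower()
    return cleaned if cleaned in VALID_FILTERS else None


def answer_for(raw: str) -> str:
    """Build the message shown to the user for a given input string."""
    name = normalize_filter(raw)
    if name is None:
        options = ", ".join(VALID_FILTERS)
        return f"'{raw}' is not a valid filter. Choose one of: {options}."
    return f"Filtering movies by {name}."


print(answer_for("  Genre "))
print(answer_for("director"))
```

Normalizing case and whitespace before comparing keeps inputs such as `Genre` or ` genre ` from being rejected.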

## 3. Put your expert system at work



In [14]:
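import pandas as pd


def filtered_movies_df(movies_df, year=None, title=None, actors=None, genre=None):
    """
    This method filters the movies DataFrame by the given criteria.

    Args:
        movies_df: DataFrame with the columns Title, Year, Genre, and Actors.
        year, title, actors, genre: optional filter values.

    Returns:
        A string with the filtered movies.
    """
    filtered = movies_df.copy()

    # each filter narrows the previous result, so several filters can be combined
    if year:
        # Year is scraped as text, so compare as strings
        filtered = filtered[filtered["Year"].astype(str) == str(year)]

    if title:
        filtered = filtered[filtered["Title"].str.contains(title, case=False, na=False)]

    if actors:
        filtered = filtered[filtered["Actors"].str.contains(actors, case=False, na=False)]

    if genre:
        filtered = filtered[filtered["Genre"].str.contains(genre, case=False, na=False)]

    return filtered.to_string()


# call the function with a filter; printing the bare function name only shows the function object
print(filtered_movies_df(movies_df, genre="Comedy"))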
