##### Develop a basic movie recommendation system that scrapes data from the IMDb website. The program will retrieve information from the Most Popular Movies page on IMDb. The data to be scraped is

1) Movie Titles <br>
2) Runtime <br>
3) Rating <br>
4) Age Restriction <br>
5) Genre <br>
6) Writer(s) <br>
7) Director(s) <br>
8) Fun Movie Trivia <br> 

The data will be stored in a Pandas dataframe.

Subsequently, the program will prompt the user to specify their preferences for:

1) Genre <br>
2) Minimum Rating<br>
3) Maximum Runtime<br>
4) Age Restriction<br>

It will then filter the dataframe accordingly and select a movie for recommendation. The recommendation will display the movie title, writer, director, and trivia. If there are no movies that match the user's preferences, the program will recommend any movie from the original dataframe.

In [54]:
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.firefox.service import Service as FirefoxService
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementExceptio

# Set up the WebDriver
driver = webdriver.Firefox(service=FirefoxService(GeckoDriverManager().install()))

# Navigate to the website
url = 'https://www.imdb.com/chart/moviemeter'
driver.get(url)

time.sleep(10)

In [88]:
movie_info = []

parent_div = driver.find_element(By.CLASS_NAME, 'ipc-metadata-list--base')

movie_cards = parent_div.find_elements(By.CLASS_NAME, 'cli-post-element')

for movie_card in range(len(movie_cards)):
    # Scroll the movie card into view
    driver.execute_script("arguments[0].scrollIntoView();", movie_cards[movie_card])

    # Wait for the movie card to be clickable
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'cli-post-element')))

    # Click on the movie card to view details
    movie_cards[movie_card].click()
    time.sleep(5)

    try:
        # Extract individual movie details from the current movie card
        movie_Title = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[2]/div/div/div[1]/div[2]/div[1]/a/h3').text
        movie_Runtime = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[2]/div/div/div[1]/div[2]/ul[1]/li[2]').text
        movie_Rating = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[2]/div/div/div[1]/div[2]/div[2]/span').text
        movie_Genre = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[2]/div/div/div[1]/div[2]/ul[2]').text
        movie_Directors = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[2]/div/div/div[3]/div[1]/ul/li').text
        Age_Restriction = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[2]/div/div/div[1]/div[2]/ul[1]/li[3]').text
    except NoSuchElementException:
        # Handle the case when age restriction is not provided
        movie_Title = "Not Available"
        movie_Runtime = "Not Available"
        movie_Rating = "Not Available"
        movie_Genre = "Not Available"
        movie_Directors = "Not Available"
        Age_Restriction = "Not Available"
        Age_Restriction = "Not Available"

    # Store movie details in a dictionary
    movie_info.append({
        'movie Title': movie_Title,
        'movie Runtime': movie_Runtime,
        'movie Rating': movie_Rating,
        'movie Genre': movie_Genre,
        'movie Directors': movie_Directors,
        'Age Restriction': Age_Restriction
    })

    time.sleep(2)
    
    # Close the movie details view
    close_button = driver.find_element(By.XPATH, '/html/body/div[4]/div[2]/div/div[1]/button')
    close_button.click()


In [89]:
# Create a DataFrame from the extracted information
df_movies = pd.DataFrame(movie_info)

# Print the DataFrame in table format
df_movies

Unnamed: 0,movie Title,movie Runtime,movie Rating,movie Genre,movie Directors,Age Restriction
0,The Killer,1h 58min,6.8\n/10,ActionAdventureCrime,David Fincher,R
1,The Marvels,1h 45min,6.0\n/10,ActionAdventureFantasy,Nia DaCosta,PG-13
2,The Hunger Games: The Ballad of Songbirds & Sn...,2h 37min,7.2\n/10,ActionAdventureDrama,Francis Lawrence,PG-13
3,Not Available,Not Available,Not Available,Not Available,Not Available,Not Available
4,Not Available,Not Available,Not Available,Not Available,Not Available,Not Available
...,...,...,...,...,...,...
95,Dungeons & Dragons: Honor Among Thieves,2h 14min,7.3\n/10,ActionAdventureComedy,John Francis Daley,PG-13
96,The Breakfast Club,1h 37min,7.8\n/10,ComedyDrama,John Hughes,R
97,The Exorcist: Believer,1h 51min,4.9\n/10,Horror,David Gordon Green,R
98,Gone Girl,2h 29min,8.1\n/10,DramaMysteryThriller,David Fincher,R


In [90]:
# Assuming df is your DataFrame
df_movies = df_movies.replace('\n', '', regex=True)


In [91]:
import re
# Add spaces between genres in 'movie Genre' column
df_movies['movie Genre'] = df_movies['movie Genre'].apply(lambda x: ' '.join(re.findall('[A-Z][^A-Z]*', x)))

# Display the modified DataFrame
df_movies


Unnamed: 0,movie Title,movie Runtime,movie Rating,movie Genre,movie Directors,Age Restriction
0,The Killer,1h 58min,6.8/10,Action Adventure Crime,David Fincher,R
1,The Marvels,1h 45min,6.0/10,Action Adventure Fantasy,Nia DaCosta,PG-13
2,The Hunger Games: The Ballad of Songbirds & Sn...,2h 37min,7.2/10,Action Adventure Drama,Francis Lawrence,PG-13
3,Not Available,Not Available,Not Available,Not Available,Not Available,Not Available
4,Not Available,Not Available,Not Available,Not Available,Not Available,Not Available
...,...,...,...,...,...,...
95,Dungeons & Dragons: Honor Among Thieves,2h 14min,7.3/10,Action Adventure Comedy,John Francis Daley,PG-13
96,The Breakfast Club,1h 37min,7.8/10,Comedy Drama,John Hughes,R
97,The Exorcist: Believer,1h 51min,4.9/10,Horror,David Gordon Green,R
98,Gone Girl,2h 29min,8.1/10,Drama Mystery Thriller,David Fincher,R


In [92]:
# Function to convert runtime to minutes
def convert_runtime(runtime):
    if not runtime or 'h' not in runtime or 'min' not in runtime:
        return None
    
    hours = int(re.search(r'\d+', runtime.split('h')[0]).group())
    minutes = int(re.search(r'\d+', runtime.split('min')[0]).group())
    
    return  hours * 60 + minutes


In [93]:
# Apply the function to 'movie Runtime'
df_movies['movie Runtime'] = df_movies['movie Runtime'].apply(convert_runtime)

df_movies

Unnamed: 0,movie Title,movie Runtime,movie Rating,movie Genre,movie Directors,Age Restriction
0,The Killer,61.0,6.8/10,Action Adventure Crime,David Fincher,R
1,The Marvels,61.0,6.0/10,Action Adventure Fantasy,Nia DaCosta,PG-13
2,The Hunger Games: The Ballad of Songbirds & Sn...,122.0,7.2/10,Action Adventure Drama,Francis Lawrence,PG-13
3,Not Available,,Not Available,Not Available,Not Available,Not Available
4,Not Available,,Not Available,Not Available,Not Available,Not Available
...,...,...,...,...,...,...
95,Dungeons & Dragons: Honor Among Thieves,122.0,7.3/10,Action Adventure Comedy,John Francis Daley,PG-13
96,The Breakfast Club,61.0,7.8/10,Comedy Drama,John Hughes,R
97,The Exorcist: Believer,61.0,4.9/10,Horror,David Gordon Green,R
98,Gone Girl,122.0,8.1/10,Drama Mystery Thriller,David Fincher,R


In [94]:
df_movies['movie Rating'] = df_movies['movie Rating'].str.extract('(\d+\.\d+)').astype(float)

In [95]:
df_movies

Unnamed: 0,movie Title,movie Runtime,movie Rating,movie Genre,movie Directors,Age Restriction
0,The Killer,61.0,6.8,Action Adventure Crime,David Fincher,R
1,The Marvels,61.0,6.0,Action Adventure Fantasy,Nia DaCosta,PG-13
2,The Hunger Games: The Ballad of Songbirds & Sn...,122.0,7.2,Action Adventure Drama,Francis Lawrence,PG-13
3,Not Available,,,Not Available,Not Available,Not Available
4,Not Available,,,Not Available,Not Available,Not Available
...,...,...,...,...,...,...
95,Dungeons & Dragons: Honor Among Thieves,122.0,7.3,Action Adventure Comedy,John Francis Daley,PG-13
96,The Breakfast Club,61.0,7.8,Comedy Drama,John Hughes,R
97,The Exorcist: Believer,61.0,4.9,Horror,David Gordon Green,R
98,Gone Girl,122.0,8.1,Drama Mystery Thriller,David Fincher,R


In [96]:
import pandas as pd

# Assuming your DataFrame is named df

# Replace 'YourDataFrame' with your actual DataFrame name
df_movies = df_movies.replace('Not Available', pd.NA)
df_movies = df_movies.dropna(how='any')

# Now, df contains rows without "Not Available" values or NaN values in any column
df_movies

Unnamed: 0,movie Title,movie Runtime,movie Rating,movie Genre,movie Directors,Age Restriction
0,The Killer,61.0,6.8,Action Adventure Crime,David Fincher,R
1,The Marvels,61.0,6.0,Action Adventure Fantasy,Nia DaCosta,PG-13
2,The Hunger Games: The Ballad of Songbirds & Sn...,122.0,7.2,Action Adventure Drama,Francis Lawrence,PG-13
5,Five Nights at Freddy's,61.0,5.5,Horror Mystery Thriller,Emma Tammi,PG-13
7,The Creator,122.0,7.0,Action Adventure Drama,Gareth Edwards,PG-13
...,...,...,...,...,...,...
95,Dungeons & Dragons: Honor Among Thieves,122.0,7.3,Action Adventure Comedy,John Francis Daley,PG-13
96,The Breakfast Club,61.0,7.8,Comedy Drama,John Hughes,R
97,The Exorcist: Believer,61.0,4.9,Horror,David Gordon Green,R
98,Gone Girl,122.0,8.1,Drama Mystery Thriller,David Fincher,R


In [97]:
# Get user preferences
genre_preference = input("Enter your preferred genre: ").lower()
min_rating = float(input("Enter the minimum rating you prefer: "))
max_runtime = int(input("Enter the maximum runtime you prefer (in minutes): "))
age_restriction_preference = input("Enter your preferred age restriction: ").lower()

# Filter DataFrame based on user preferences
filtered_df = df_movies[
    (df_movies['movie Genre'].str.contains(genre_preference, case=False)) &
    (df_movies['movie Rating'] >= min_rating) &
    (df_movies['movie Runtime'].astype(float) <= max_runtime) &
    (df_movies['Age Restriction'].str.contains(age_restriction_preference, case=False))
]

# If no movies match user preferences, recommend any movie from the original DataFrame
if filtered_df.empty:
    print("No movies match your preferences. Recommending any movie from the original DataFrame.")
    recommendation = df_movies.sample(1)
else:
    # Select a movie for recommendation from the filtered DataFrame
    recommendation = filtered_df.sample(1)

# Display the recommended movie
print("\nRecommended Movie:")
recommendation[['movie Title', 'movie Rating', 'movie Genre','movie Runtime', 'movie Directors', 'Age Restriction']]

Enter your preferred genre: crime
Enter the minimum rating you prefer: 7
Enter the maximum runtime you prefer (in minutes): 90
Enter your preferred age restriction: none
No movies match your preferences. Recommending any movie from the original DataFrame.

Recommended Movie:


Unnamed: 0,movie Title,movie Rating,movie Genre,movie Runtime,movie Directors,Age Restriction
24,Priscilla,7.1,Biography Drama Music,61.0,Sofia Coppola,R
