# Movie Recommendations Project

Created By: Logan Laszewski

## Description

### Load Datasets

**-The data comes from MovieLens, which anonymously tracks user ratings (over 32 million records!).**

### Merge Datasets

-Combine the ratings and movies data so that each record includes the user ID, movie title, and rating.

-The dataset contains 84,239 unique movies across 1,783 unique genre combinations (MovieLens stores each movie’s genres as one pipe-separated string).
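
As a rough illustration of the merge step, the sketch below joins two tiny hand-made tables on `movieId`. The column names follow the MovieLens CSV layout; the rows themselves are invented for the example and are not taken from the real data.

```python
import pandas as pd

# Toy stand-ins for ratings.csv and movies.csv (values are made up).
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [4.0, 3.5, 5.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Comedy", "Drama|Romance", "Action"],
})

# An inner join keeps only movies that have at least one rating,
# so "Movie C" (never rated) drops out of the merged table.
merged = pd.merge(ratings, movies, on="movieId", how="inner")
print(merged[["userId", "title", "rating"]])
```

Each row of `merged` now carries the user ID, the title, and the rating together, which is the shape the later steps rely on.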

### Enter Target User Information

**-The user inputs 8–20 movies they’ve watched and rated on a 1–5 star scale.**

-These ratings capture the user’s preferences and are the basis for every recommendation that follows.

### Identify Similar Users

-Select all users who have rated any of the movies in the target user’s list.

-Group these users’ ratings together into neighbor_groups, aligning their reviews side by side.

**-Only consider users who have rated at least 65% of the movies the target user entered (and never fewer than 3), ensuring enough overlap for meaningful comparison.**
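
The overlap rule can be sketched on a small example. The helper name `min_overlap_for`, the toy ratings table, and the target user’s ratings are all hypothetical; only the 65% fraction and the floor of 3 come from the notebook’s own code.

```python
import math
import pandas as pd

def min_overlap_for(num_rated, fraction=0.65, floor=3):
    """Smallest number of shared movies a candidate must have rated."""
    return max(floor, math.ceil(num_rated * fraction))

# Hypothetical target user: movieId -> rating.
target_ratings = {10: 5.0, 20: 3.0, 30: 4.0, 40: 2.0}

# Toy ratings table (made-up values).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 99, 10, 20, 30, 40],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.0, 5.0, 3.5, 4.0, 1.0],
})

# Keep only ratings of movies the target user has also rated.
candidates = ratings[ratings["movieId"].isin(list(target_ratings))]

# Count shared movies per user and apply the threshold.
overlap_counts = candidates.groupby("userId")["movieId"].nunique()
threshold = min_overlap_for(len(target_ratings))
eligible = overlap_counts[overlap_counts >= threshold].index.tolist()
print(threshold, eligible)
```

Here user 2 shares only one movie with the target user and is dropped, while users 1 and 3 pass the threshold.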

### Compute Similarity Scores

-Use the target user’s ratings and compare them to each candidate’s ratings.

-Calculate Pearson correlation coefficients to measure similarity:

+1.0 → perfect positive correlation (they rate movies the same way relative to each other).

0.0 → no correlation (random relationship).

-1.0 → perfect negative correlation (they always disagree).

NaN → undefined, because one of the two users gave every overlapping movie the same score, so there is no variance to correlate. The code treats this case as a similarity of 0.

-Keep only the top 10 most similar users (“neighbors”) to the target user, dropping anyone whose similarity is zero or negative.
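
The similarity cases above can be checked on a few made-up rating vectors. This sketch uses `numpy.corrcoef` rather than the `scipy.stats.pearsonr` call the notebook uses, purely to keep the example dependency-light; the helper names `safe_pearson` and `top_neighbors` and the candidate labels are assumptions for illustration.

```python
import numpy as np

def safe_pearson(a, b):
    """Pearson correlation, returning 0.0 when either vector has no variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # correlation is undefined here; treat as "no signal"
    return float(np.corrcoef(a, b)[0, 1])

def top_neighbors(similarities, k=10):
    """Keep positively correlated users, most similar first, at most k of them."""
    positive = [(uid, sim) for uid, sim in similarities.items() if sim > 0]
    return sorted(positive, key=lambda pair: pair[1], reverse=True)[:k]

# Target user's ratings on four shared movies, and three hypothetical candidates.
target = [5.0, 3.0, 4.0, 1.0]
candidates = {
    "same_taste": [4.5, 2.5, 3.5, 0.5],  # always half a star lower
    "opposite":   [1.0, 3.0, 2.0, 5.0],  # mirrored ratings
    "flat":       [3.0, 3.0, 3.0, 3.0],  # no variance
}

sims = {name: safe_pearson(target, scores) for name, scores in candidates.items()}
neighbors = top_neighbors(sims)
print(sims)
print(neighbors)
```

Note that `same_taste` correlates perfectly even though every score is lower than the target’s: Pearson compares how ratings move relative to each other, not their absolute level.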

### Recommend New Movies

-Collect the movies that the neighbors have rated; titles they rated highly (e.g., 4–5 stars) rise to the top of the weighted ranking.

-Exclude any that the target user has already rated.

**-Only include movies rated by at least 3 neighbors to ensure reliability.**

**-Compute a weighted average rating for each movie, where weights are the neighbors’ similarity scores.**

**-Finally, recommend up to 10 movies with the highest weighted scores.**
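
To make the weighting concrete, here is a small sketch of the scoring step: each neighbor’s rating is multiplied by that neighbor’s similarity, and the sum is divided by the total similarity. The similarity values, titles, and ratings are invented for the example.

```python
import pandas as pd

# Hypothetical neighbors: userId -> similarity to the target user.
sims = {1: 0.9, 2: 0.6, 3: 0.3}

# Toy neighbor ratings for movies the target user has not rated.
neighbor_ratings = pd.DataFrame({
    "userId": [1, 2, 3, 1, 2],
    "title":  ["Movie X", "Movie X", "Movie X", "Movie Y", "Movie Y"],
    "rating": [5.0, 4.0, 2.0, 5.0, 5.0],
})

# Attach each neighbor's similarity as the weight of their ratings.
df = neighbor_ratings.assign(weight=neighbor_ratings["userId"].map(sims))
df = df.assign(weighted=df["rating"] * df["weight"])

scores = df.groupby("title").agg(
    weighted_sum=("weighted", "sum"),
    weight_sum=("weight", "sum"),
    num_neighbors=("userId", "nunique"),
)
scores["weighted_avg"] = scores["weighted_sum"] / scores["weight_sum"]

# Require at least 3 neighbors, then take the 10 best-scoring movies.
recommended = (
    scores[scores["num_neighbors"] >= 3]
    .sort_values("weighted_avg", ascending=False)
    .head(10)
)
print(recommended[["weighted_avg", "num_neighbors"]])
```

"Movie Y" has a perfect 5.0 from both of its raters but is excluded because only two neighbors rated it. "Movie X" scores (5·0.9 + 4·0.6 + 2·0.3) / 1.8 ≈ 4.17, pulled toward the opinion of the most similar neighbor.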



In [None]:
!pip install rapidfuzz

In [4]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
from rapidfuzz import process, fuzz

# --- Load data ---
moviesID = pd.read_csv("movies.csv")
movieratings = pd.read_csv("ratings.csv")
ratingsandmovies = pd.merge(movieratings, moviesID, on="movieId", how="inner")

# --- Prepare normalized titles for fuzzy matching ---
def normalize(title):
    return title.lower().strip()

normalized_titles = [normalize(t) for t in moviesID['title']]

# --- Get user input ---
target_ratings = {}
min_movies = 8
max_movies = 20

print(f"Please enter between {min_movies} and {max_movies} movies you have seen and rate them 1–5.")

movie_number = 1
while movie_number <= max_movies:
    user_input = input(f"\nMovie #{movie_number} (or press Enter to finish): ").strip()

    if not user_input:
        # Count movies actually rated; skipped slots do not count toward the minimum.
        if len(target_ratings) < min_movies:
            print(f"You must enter at least {min_movies} movies. Please continue.")
            continue
        else:
            print("Finished entering movies.")
            break

    # --- Fuzzy match for movie ---
    matches = process.extract(normalize(user_input), normalized_titles, scorer=fuzz.WRatio, limit=2)
    print("\nTop matches:")
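    for idx, (_, score, title_index) in enumerate(matches, 1):
        # Show the original title rather than the lowercased string used for matching.
        print(f"{idx}. {moviesID.iloc[title_index]['title']} (Score: {score:.1f})")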
    print("0. None / try again / skip")

    # --- Confirm choice ---
    try:
        choice = int(input("Select the correct match (1, 2, or 0): "))
    except ValueError:
        print("Invalid input. Try again.")
        continue

    if choice == 0:
        retry = input("Retry this movie? (y/n): ").lower().strip()
        if retry == "y":
            continue  # stay on same movie slot
        else:
            movie_number += 1
            continue  # move to next movie slot

    if choice not in [1, 2]:
        print("Invalid choice.")
        continue

    selected_index = matches[choice - 1][2]
    movie_id = moviesID.iloc[selected_index]['movieId']
    selected_title = moviesID.iloc[selected_index]['title']

    # --- Get rating ---
    while True:
        try:
            rating = float(input(f"Your rating for '{selected_title}' (1–5): "))
            if 1 <= rating <= 5:
                break
        except ValueError:
            pass
        print("Invalid rating. Enter a number from 1 to 5.")

    target_ratings[movie_id] = rating
    movie_number += 1


# --- Summary ---
if target_ratings:
    print("\033[1m" + "\nYour input has been recorded:" + "\033[0m")
    for mid, r in target_ratings.items():
        title = moviesID.loc[moviesID['movieId'] == mid, 'title'].values[0]
        print("\033[1m" + f"{title}: {r}" + "\033[0m")
    print("\033[1m" + "\nCalculating recommendations… please wait ⏳\n" + "\033[0m")
else:
    print("No movies entered.")
    raise SystemExit()

# --- Find similar users ---
watched_movies = list(target_ratings.keys())
candidates = ratingsandmovies[ratingsandmovies['movieId'].isin(watched_movies)]
neighbor_groups = candidates.groupby('userId')


# --- Dynamic overlap threshold ---
num_rated = len(target_ratings)
min_overlap = max(3, int(np.ceil(num_rated * 0.65)))  # Require 65% overlap, min 3
print("\033[1m" + f"\nUsing dynamic overlap threshold: {min_overlap} of {num_rated} rated movies must overlap." + "\033[0m")

similarities = []
for neighbor_id, group in neighbor_groups:
    overlap = [m for m in group['movieId'] if m in target_ratings]
    if len(overlap) >= min_overlap:  # enforce the dynamic overlap threshold computed above
        neighbor_scores = group.set_index('movieId').loc[overlap]['rating'].values
        target_scores = np.array([target_ratings[m] for m in overlap])

        # Safe Pearson computation
        if np.std(neighbor_scores) == 0 or np.std(target_scores) == 0:
            sim = 0
        else:
            sim, _ = pearsonr(target_scores, neighbor_scores)
            if np.isnan(sim):
                sim = 0

        similarities.append((neighbor_id, sim))

print(f"\033[1m\nFound {len(similarities)} potential neighbors before filtering.\033[0m")

# --- Filter + sort ---
valid_sims = [(uid, sim) for uid, sim in similarities if sim > 0]
top_neighbors = sorted(valid_sims, key=lambda x: x[1], reverse=True)[:10]

print("\nTop neighbors:")
for uid, sim in top_neighbors:
    print(f"User {uid} → similarity {sim:.3f}")

# --- Recommendation generation ---
neighbor_ids = [uid for uid, _ in top_neighbors]
neighbor_ratings = ratingsandmovies[ratingsandmovies['userId'].isin(neighbor_ids)]
candidate_ratings = neighbor_ratings[~neighbor_ratings['movieId'].isin(watched_movies)]

sim_dict = dict(top_neighbors)
candidate_ratings = candidate_ratings.assign(
    weight=candidate_ratings['userId'].map(sim_dict)
)
candidate_ratings = candidate_ratings.assign(
    weighted_score=candidate_ratings['rating'] * candidate_ratings['weight']
)

movie_scores = (
    candidate_ratings.groupby(['movieId', 'title'])
    .agg(
        total_weighted_score=('weighted_score', 'sum'),
        total_weight=('weight', 'sum'),
        num_neighbors=('userId', 'nunique')
    )
    .reset_index()
)

movie_scores['avg_weighted_rating'] = movie_scores['total_weighted_score'] / movie_scores['total_weight']

# Require at least 3 neighbors rating it
movie_scores = movie_scores[movie_scores['num_neighbors'] >= 3]
recommendations = movie_scores.sort_values('avg_weighted_rating', ascending=False).head(10)

# Rename for clarity
recommendations = recommendations.rename(columns={
    'avg_weighted_rating': 'Weighted Avg Rating',
    'num_neighbors': '# of Similar Users that Watched'
})

print("\033[1m" + f"\nTop {len(recommendations)} Recommended Movies:" + "\033[0m")
print(recommendations[['title', 'Weighted Avg Rating', '# of Similar Users that Watched']].to_string(index=False))
