# Portfolio Project 2: Movie Recommender System

Welcome to your second portfolio project! We will build a simple **Movie Recommender System**. Recommender systems are one of the most popular applications of data science and are used by companies like Netflix, Amazon, and YouTube to personalize user experiences. 🎬

**Business Problem:** A movie streaming service wants to increase user engagement. They believe that if they can recommend movies that a user is likely to enjoy, the user will spend more time on the platform.

**Our Goal:** Build a recommender system that, given a movie a user likes, suggests other movies that users tended to rate similarly.

**Methodology: Collaborative Filtering**
We will use a technique called **item-based collaborative filtering**. The core idea is simple:
> "Users who liked this item also liked..."

The process involves:
1.  Finding the correlation between the ratings of each pair of movies.
2.  For a given movie, finding the movies with the highest correlation.
3.  Returning those highly correlated movies as recommendations.
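
To make the idea concrete, here is a minimal sketch of step 1 for a single pair of movies. It assumes the Pearson correlation (the pandas default) and reuses the 'Toy Story' and 'Heat' ratings from the sample `ratings.csv` introduced in the Dataset Setup section; the variable names are illustrative only.

```python
import pandas as pd

# Ratings keyed by userId, copied from the sample ratings.csv
# (movieId 1 = Toy Story, movieId 6 = Heat).
toy_story = pd.Series({1: 4.0, 2: 5.0, 3: 2.0, 4: 5.0, 5: 4.0}, name="Toy Story (1995)")
heat = pd.Series({1: 4.0, 3: 3.0, 4: 5.0}, name="Heat (1995)")

# Series.corr aligns the two Series on their index (userId), so only
# users who rated both movies contribute to the correlation.
shared_raters = toy_story.index.intersection(heat.index)
similarity = toy_story.corr(heat)

print(f"{len(shared_raters)} shared raters, Pearson r = {similarity:.3f}")
```

The closer the value is to 1, the more consistently the shared raters scored the two movies in the same direction. With only a handful of shared raters the number is fragile, which is why the notebook later filters on rating counts.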

### Dataset Setup

We'll use a small version of the famous **MovieLens dataset**. It's split into two files.

➡️ **Action 1:** Inside the `06_Portfolio_Projects/Project_02_Movie_Recommender/data/` folder, create a new file named `movies.csv` and paste the following content into it:

```csv
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
10,GoldenEye (1995),Action|Adventure|Thriller
```

➡️ **Action 2:** In the same folder, create another file named `ratings.csv` and paste this content:

```csv
userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
1,6,4.0,964982224
2,1,5.0,847117132
2,10,3.0,847117132
3,1,2.0,847117132
3,6,3.0,847117132
4,1,5.0,964982703
4,2,4.0,964981247
4,6,5.0,964982224
5,1,4.0,847117132
5,3,3.0,847117132
5,7,5.0,847117132
6,2,3.0,847117132
6,5,5.0,847117132
```

In [None]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

## 1. Data Loading and Merging

In [None]:
movies = pd.read_csv('data/movies.csv')
ratings = pd.read_csv('data/ratings.csv')

# Merge the two dataframes on 'movieId'
df = pd.merge(ratings, movies, on='movieId')
df.head()
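
`pd.merge` defaults to an inner join, so a rating whose `movieId` is missing from `movies` would be dropped without any message. One possible sanity check, sketched here on a few inline rows instead of the CSV files, is to merge with `how='left'` and `validate='many_to_one'` and then look for missing titles. The `*_demo` frames and the `movieId` 99 row are made up for illustration.

```python
import pandas as pd

# Illustrative stand-ins for movies.csv and ratings.csv.
# movieId 99 is a made-up id with no matching row in movies_demo.
movies_demo = pd.DataFrame({
    "movieId": [1, 2, 6],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 4, 4],
    "movieId": [1, 6, 2, 99],
    "rating": [4.0, 4.0, 4.0, 3.0],
})

# how="left" keeps every rating; validate raises MergeError if a movieId
# appears more than once in movies_demo.
merged_demo = pd.merge(
    ratings_demo, movies_demo, on="movieId", how="left", validate="many_to_one"
)

# Ratings that found no title would have vanished under the default inner join.
unmatched = merged_demo[merged_demo["title"].isna()]
print(f"{len(merged_demo)} ratings, {len(unmatched)} without a matching title")
```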

## 2. Exploratory Data Analysis (EDA)

Let's create a new dataframe that shows the average rating and the number of ratings for each movie.

In [None]:
ratings_summary = df.groupby('title')['rating'].agg(['mean', 'count'])
ratings_summary.rename(columns={'mean': 'avg_rating', 'count': 'num_ratings'}, inplace=True)
ratings_summary.sort_values(by='num_ratings', ascending=False).head()

## 3. Creating the User-Item Matrix

**Theory:** To find correlations, we need our data in a specific format: a matrix where the rows are users, the columns are movies, and the values are the ratings. Most values in this matrix will be `NaN`, because each user has rated only a few of the movies.

In [None]:
movie_matrix = df.pivot_table(index='userId', columns='title', values='rating')
movie_matrix.head()
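
To see how sparse the matrix really is, one option is to count the filled cells. The sketch below rebuilds the matrix from the sample ratings inline (keyed by `movieId` rather than title, to stay short) so that it runs on its own; in the notebook the same calculation could be applied to `movie_matrix`. The names `ratings_demo` and `matrix_demo` are illustrative.

```python
import pandas as pd

# (userId, movieId, rating) rows copied from the sample ratings.csv
ratings_demo = pd.DataFrame(
    [(1, 1, 4.0), (1, 3, 4.0), (1, 6, 4.0), (2, 1, 5.0), (2, 10, 3.0),
     (3, 1, 2.0), (3, 6, 3.0), (4, 1, 5.0), (4, 2, 4.0), (4, 6, 5.0),
     (5, 1, 4.0), (5, 3, 3.0), (5, 7, 5.0), (6, 2, 3.0), (6, 5, 5.0)],
    columns=["userId", "movieId", "rating"],
)

# Rows are users, columns are movies; unrated combinations become NaN.
matrix_demo = ratings_demo.pivot_table(index="userId", columns="movieId", values="rating")

n_users, n_movies = matrix_demo.shape
filled = int(matrix_demo.notna().sum().sum())
sparsity = 1 - filled / (n_users * n_movies)

print(f"{n_users} users x {n_movies} rated movies, {filled} ratings, {sparsity:.1%} empty")
```

Movies that nobody rated never appear as columns, so the matrix only covers the rated subset of `movies.csv`.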

## 4. Building the Recommender

Now we can implement the logic.
1.  Choose a movie to get recommendations for (e.g., 'Toy Story (1995)').
2.  Grab the ratings for that movie from our matrix.
3.  Use the `.corrwith()` method to compute the pairwise correlation between that movie's ratings and all other movies' ratings.
4.  Clean up the result and join it with the number of ratings to filter out movies with too few reviews.

In [None]:
# Step 1 & 2: Get ratings for a specific movie
movie_user_ratings = movie_matrix['Toy Story (1995)']

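# Step 3: Compute correlations (NaN where a movie shares fewer than two raters
# with the target, or where either movie's shared ratings are all identical)
similar_to_movie = movie_matrix.corrwith(movie_user_ratings)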

# Step 4: Clean up and join
corr_movie = similar_to_movie.to_frame('Correlation')
corr_movie = corr_movie.dropna()
corr_movie = corr_movie.join(ratings_summary['num_ratings'])

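# Exclude the query movie itself, drop movies with fewer than 2 ratings, and sort by correlation
corr_movie = corr_movie.drop(index='Toy Story (1995)', errors='ignore')
recommendations = corr_movie[corr_movie['num_ratings'] > 1].sort_values('Correlation', ascending=False)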

recommendations.head()
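
The steps above could be wrapped in a reusable function. The following is a sketch under stated assumptions, not the notebook's own method: `recommend_similar` and its `min_shared` parameter are hypothetical names, and it filters on the number of users who rated *both* movies, which is a stricter threshold than the `num_ratings` filter used above. The small matrix is built inline from the sample ratings so the block runs on its own.

```python
import pandas as pd

def recommend_similar(matrix, target, min_shared=2):
    """Hypothetical helper: rank movies by correlation with `target`.

    `matrix` is a user-item DataFrame (rows = users, columns = titles).
    """
    target_ratings = matrix[target]
    corr = matrix.corrwith(target_ratings)
    # For each movie, count the users who rated both it and the target.
    shared = matrix[target_ratings.notna()].notna().sum()
    result = pd.DataFrame({"Correlation": corr, "shared_raters": shared})
    # Remove the target itself and movies whose correlation is undefined.
    result = result.drop(index=target).dropna(subset=["Correlation"])
    result = result[result["shared_raters"] >= min_shared]
    return result.sort_values("Correlation", ascending=False)

# Subset of the sample data: title -> {userId: rating}
matrix_demo = pd.DataFrame({
    "Toy Story (1995)": {1: 4.0, 2: 5.0, 3: 2.0, 4: 5.0, 5: 4.0},
    "Jumanji (1995)": {4: 4.0, 6: 3.0},
    "Grumpier Old Men (1995)": {1: 4.0, 5: 3.0},
    "Heat (1995)": {1: 4.0, 3: 3.0, 4: 5.0},
})

result = recommend_similar(matrix_demo, "Toy Story (1995)")
print(result)
```

Counting shared raters guards against a movie with many ratings overall but almost no overlap with the target, a case the total-count filter would let through.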

## 5. Conclusion

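Success! We've built a simple movie recommender. On this tiny sample, 'Heat' is the only other movie whose correlation with 'Toy Story' can be computed (roughly 0.98 across three shared raters), so it is the only real recommendation. 'Grumpier Old Men' drops out as `NaN` because the two users who rated both movies gave 'Toy Story' the same score, and the remaining movies share at most one rater with it. A larger ratings file, such as the full MovieLens data, should give many more movie pairs enough shared raters for the correlations to be meaningful.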

This project demonstrates how to manipulate data into a user-item matrix and apply correlation to find similar items, which is the foundation of many powerful recommender systems.

**Congratulations on completing the projects module! You have successfully built two end-to-end data science applications.**