### 1. Business Understanding

Business Problem: "Provide personalized movie recommendations to users based on their ratings of previously watched films."

### 2. Objectives

1. Recommendation Accuracy

Achieve a Mean Absolute Error (MAE) of less than 0.5 within the first three months of implementation.

This objective directly measures how accurately the system predicts user ratings, which is critical for user satisfaction and trust in the recommendations provided.

2. User Engagement

Achieve a minimum of 80% user engagement rate within the first month of deployment, measured by the percentage of users who input their ratings after visiting the platform.

High user engagement indicates that users find value in the system. Engaged users are more likely to return and utilize the recommendations, enhancing overall platform success.

3. Cold Start Solution

Implement a cold start solution that allows users with fewer than 5 ratings to receive recommendations based on popular movies or content-based filtering methods, achieving a satisfaction rating of 70% from users in this category.

Addressing the cold start problem is essential for retaining new users who may not have extensive rating histories. This ensures that all users, regardless of experience level, can receive relevant recommendations.
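
The fallback described in objective 3 can be sketched as follows. This is a minimal illustration, assuming ratings are available as `(userId, movieId, rating)` tuples. The function names, the `min_votes` cutoff, and the choice to rank by mean rating are hypothetical and not part of the project code.

```python
from collections import defaultdict

COLD_START_THRESHOLD = 5  # users with fewer ratings than this are cold-start (objective 3)


def popular_movies(ratings, n=10, min_votes=2):
    """Rank movies by mean rating, ignoring movies with fewer than min_votes ratings.

    ratings: iterable of (user_id, movie_id, rating) tuples.
    min_votes is a hypothetical cutoff so a single 5-star rating cannot top the list.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, movie_id, rating in ratings:
        totals[movie_id] += rating
        counts[movie_id] += 1
    eligible = [m for m in counts if counts[m] >= min_votes]
    # Sort by mean rating, then by number of ratings, both descending
    eligible.sort(key=lambda m: (totals[m] / counts[m], counts[m]), reverse=True)
    return eligible[:n]


def is_cold_start(ratings, user_id, threshold=COLD_START_THRESHOLD):
    """Return True if the user has fewer than `threshold` ratings."""
    n_rated = sum(1 for u, _, _ in ratings if u == user_id)
    return n_rated < threshold


def recommend(ratings, user_id, personalised, n=10):
    """Use the popularity fallback for cold-start users, else the personalised recommender.

    personalised: hypothetical callable (user_id, n) -> list of movie ids.
    """
    ratings = list(ratings)
    if is_cold_start(ratings, user_id):
        seen = {m for u, m, _ in ratings if u == user_id}
        # Ask for extra candidates so that filtering out seen movies can still leave n
        candidates = popular_movies(ratings, n=n + len(seen))
        return [m for m in candidates if m not in seen][:n]
    return personalised(user_id, n)
```

A content-based variant would replace `popular_movies` with a ranking over genre similarity. The routing on the rating count would stay the same.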

Focusing on these three objectives will help us build a robust and user-friendly recommendation system while ensuring high accuracy and user satisfaction from the outset.

### 2. Data Understanding

In [12]:
# import relevant libraries
import pandas as pd
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split


In [2]:
# load data
#movies data
movies_df=pd.read_csv(r"C:\Users\David\Documents\PHASE 4 PROJECT\movies.csv")
movies_df.head()

   movieId                               title                                       genres
0        1                    Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2                      Jumanji (1995)                   Adventure|Children|Fantasy
2        3             Grumpier Old Men (1995)                               Comedy|Romance
3        4            Waiting to Exhale (1995)                         Comedy|Drama|Romance
4        5  Father of the Bride Part II (1995)                                       Comedy


In [3]:
#ratings data
ratings_df=pd.read_csv(r"C:\Users\David\Documents\PHASE 4 PROJECT\ratings.csv")
ratings_df.head()

   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931


In [4]:
#links data
links_df=pd.read_csv(r"C:\Users\David\Documents\PHASE 4 PROJECT\links.csv")
links_df.head()

   movieId  imdbId   tmdbId
0        1  114709    862.0
1        2  113497   8844.0
2        3  113228  15602.0
3        4  114885  31357.0
4        5  113041  11862.0


### 2. Data Preprocessing

In [5]:
#Merging DataFrames to create a comprehensive dataset
merged_df = pd.merge(ratings_df, movies_df, on='movieId')  # Merge ratings with movies
merged_df = pd.merge(merged_df, links_df, on='movieId')    # Merge the resulting DataFrame with links
print(merged_df.head())
print(merged_df.columns)


   userId  movieId  rating   timestamp             title  \
0       1        1     4.0   964982703  Toy Story (1995)   
1       5        1     4.0   847434962  Toy Story (1995)   
2       7        1     4.5  1106635946  Toy Story (1995)   
3      15        1     2.5  1510577970  Toy Story (1995)   
4      17        1     4.5  1305696483  Toy Story (1995)   

                                        genres  imdbId  tmdbId  
0  Adventure|Animation|Children|Comedy|Fantasy  114709   862.0  
1  Adventure|Animation|Children|Comedy|Fantasy  114709   862.0  
2  Adventure|Animation|Children|Comedy|Fantasy  114709   862.0  
3  Adventure|Animation|Children|Comedy|Fantasy  114709   862.0  
4  Adventure|Animation|Children|Comedy|Fantasy  114709   862.0  
Index(['userId', 'movieId', 'rating', 'timestamp', 'title', 'genres', 'imdbId',
       'tmdbId'],
      dtype='object')


In [13]:
# select relevant columns
df = merged_df[['userId', 'movieId', 'rating']]


In [14]:
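# Create the Surprise dataset. The rating scale is read from the data
# because the ratings include half-star values (e.g. 2.5 and 4.5 above).
reader = Reader(rating_scale=(df['rating'].min(), df['rating'].max()))
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], reader)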


In [15]:
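# Split the data: 80% for training, 20% for testing.
# No random_state is set, so the split (and the scores below) vary between runs.
trainset, testset = train_test_split(data, test_size=0.2)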



### 3. Model Selection

I will use SVD (Singular Value Decomposition) due to its effectiveness in collaborative filtering.
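
For reference, Surprise's `SVD` is the matrix-factorization model popularized by Simon Funk during the Netflix Prize, not a literal singular value decomposition of the rating matrix. As described in the Surprise documentation, a rating is predicted as

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u$$

where $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, and $p_u$ and $q_i$ are the latent factor vectors of user $u$ and item $i$. The parameters are fitted by stochastic gradient descent on the regularized squared error over the training ratings:

$$\sum_{r_{ui} \in R_{\text{train}}} \left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda\left(b_i^2 + b_u^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2\right)$$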

In [16]:
from surprise import SVD
# Library defaults: n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02
model = SVD()


### 4. Model Training

In [17]:
model.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x20242bb9fa0>

### 5. Make Predictions

I will use the trained model to predict the ratings in the test set.

In [18]:
predictions = model.test(testset)
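
Test-set predictions measure accuracy, but the business problem calls for a ranked list of unseen movies per user. The sketch below shows one simple way to build such a list. `top_n_for_user` is a hypothetical helper, not part of Surprise. It only assumes that `predict(user_id, item_id)` returns an object with an `est` attribute, which is the shape of Surprise's `model.predict`. The stand-in predictor and its scores are made up so the example runs on its own.

```python
from collections import namedtuple


def top_n_for_user(predict, user_id, candidate_items, rated_items, n=10):
    """Return the n unrated items with the highest estimated rating.

    predict: callable (user_id, item_id) -> object with an `est` attribute.
             With Surprise, `model.predict` fits this shape.
    """
    rated = set(rated_items)
    scored = [
        (item, predict(user_id, item).est)
        for item in candidate_items
        if item not in rated
    ]
    # Highest estimated rating first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Stand-in predictor so the sketch runs without a trained model (illustrative values only)
FakePrediction = namedtuple("FakePrediction", ["est"])
fake_scores = {10: 4.2, 20: 3.1, 30: 4.8, 40: 2.5}


def fake_predict(user_id, item_id):
    return FakePrediction(est=fake_scores[item_id])


top = top_n_for_user(fake_predict, user_id=1,
                     candidate_items=[10, 20, 30, 40], rated_items=[10], n=2)
```

With the objects from this notebook, a call might look like `top_n_for_user(model.predict, 1, df['movieId'].unique(), df.loc[df['userId'] == 1, 'movieId'])`. Scoring every movie for a user this way is simple rather than efficient.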


### 6. Evaluation

I will use RMSE (root mean squared error) on the test set to evaluate performance.

In [19]:
from surprise import accuracy
# accuracy.rmse prints the score itself by default; verbose=False avoids a duplicate line
rmse = accuracy.rmse(predictions, verbose=False)
print(f'RMSE: {rmse}')


RMSE: 0.8762544158199204


An RMSE of approximately 0.88 means the root-mean-square prediction error is a little under one star on the rating scale, which is a reasonable starting point for an untuned SVD model.
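
The accuracy objective is stated as MAE, while the evaluation cell reports RMSE, so it may help to compute both from the same predictions. Surprise provides `accuracy.mae(predictions)` alongside `accuracy.rmse`. The sketch below spells out the two formulas in plain Python, assuming each prediction carries the true rating `r_ui` and the estimate `est` (the field names Surprise uses). The sample values are made up for illustration and are not results from this notebook.

```python
import math
from collections import namedtuple

# Mirrors the two fields of Surprise's Prediction tuple that the metrics need
Prediction = namedtuple("Prediction", ["r_ui", "est"])


def mae(predictions):
    """Mean absolute error: average of |true - estimated|."""
    return sum(abs(p.r_ui - p.est) for p in predictions) / len(predictions)


def rmse(predictions):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((p.r_ui - p.est) ** 2 for p in predictions) / len(predictions))


# Illustrative values only
sample = [Prediction(4.0, 3.5), Prediction(5.0, 4.0), Prediction(2.5, 3.0), Prediction(3.0, 3.0)]
sample_mae = mae(sample)
sample_rmse = rmse(sample)
```

On any set of predictions MAE is never larger than RMSE, so an RMSE of 0.88 does not by itself show whether the 0.5 MAE target is met. The MAE has to be measured directly.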

I will explore hyperparameter tuning to enhance prediction accuracy further.

### 7. Hyperparameter Optimization