# Content-Based Movie Recommendation System
---
## Group 7

**Student Names:**  
- Alex Mwera  
- Noel Seda  
- Zena Lisa Karari  

**Student Pace:** Part-time  
**Instructor Name:** Mildred Jepkosgei


## Table of Contents
1. Business Understanding  
2. Problem Statement  
3. Data Understanding  
4. Data Preprocessing  
5. Exploratory Data Analysis (EDA)  
6. Content-Based Modelling  
7. Model Evaluation  
8. Top-5 Recommendation Generation  
9. Cold Start and Hybrid Approach  
10. Conclusion and Final Recommendations  



## 1. Business Understanding

### Objective
To improve user satisfaction and engagement on a movie streaming platform by providing **Top 5 personalised movie recommendations** based on users' past ratings. The goal is to simulate how platforms like Netflix and Prime Video tailor suggestions to each user.

### Scope
This project uses the MovieLens `ml-latest-small` dataset. It focuses on:
- Explicit rating data (0.5 to 5.0 scale)
- Recommending unseen movies to users

### Success Criteria

| Metric                 | Goal                            |
|------------------------|----------------------------------|
| RMSE / MAE             | Below 1.0                        |
| Recommendation Relevance | Top 5 suggestions match user interests |
| Coverage               | Recommendations generated for most users |
| Scalability            | Can extend to larger datasets or real-world scenarios |



## 2. Problem Statement

In the age of digital content, users are often overwhelmed by thousands of movie choices. Without a guiding system, users may miss out on films they would have enjoyed, leading to reduced satisfaction and platform engagement.

> **How can we accurately recommend the Top 5 movies to a user based on their historical movie ratings using content-based filtering techniques?**

This project builds such a recommender system using content-based filtering on historical ratings data from MovieLens.



## 3. Data Understanding

The dataset used is the `ml-latest-small` version from MovieLens, containing 100,836 ratings by 610 users across 9,742 movies.

### Key Files:
- `movies.csv` – Contains `movieId`, `title`, `genres`
- `ratings.csv` – Contains `userId`, `movieId`, `rating`, `timestamp`
- `tags.csv` – Contains user-generated tags (`userId`, `movieId`, `tag`, `timestamp`)
- `links.csv` – Maps `movieId` to external identifiers (`imdbId`, `tmdbId`)

### Ratings Data:
- Ratings range from 0.5 to 5.0 in 0.5 increments
- Explicit feedback format
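
A minimal sketch of reading a `ratings.csv`-style file and checking the rating scale described above, using only the standard library. The helper names `load_ratings` and `is_valid_rating` are illustrative, and the three sample rows are made up; the project's actual loading code may differ (for example, it may use pandas).

```python
import csv
import io

def load_ratings(handle):
    """Parse a ratings.csv-style file object into typed records."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "userId": int(rec["userId"]),
            "movieId": int(rec["movieId"]),
            "rating": float(rec["rating"]),
            "timestamp": int(rec["timestamp"]),
        })
    return rows

def is_valid_rating(value):
    """True if value lies on the 0.5 to 5.0 scale in 0.5 increments."""
    return 0.5 <= value <= 5.0 and (value * 2).is_integer()

# Made-up sample in the same column layout as ratings.csv
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,3,4.5,964981247\n"
    "2,1,0.5,1445714835\n"
)
ratings = load_ratings(sample)
invalid = [r for r in ratings if not is_valid_rating(r["rating"])]
print(len(ratings), "ratings loaded,", len(invalid), "off-scale")
```

With the real file, `load_ratings(open("ratings.csv", newline=""))` would produce the same record shape.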



## 4. Data Preprocessing

- Removed duplicates and unnecessary fields
- Converted data into a **user-item matrix**
- Split data into **training and test sets**
- Normalized data for similarity calculations
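
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's exact pipeline: duplicates are resolved by keeping the most recent rating per user and movie, the split is a random 80/20 hold-out, and normalization is done by subtracting each user's mean rating. All function names and the sample records are hypothetical.

```python
import random
from collections import defaultdict

def drop_duplicates(ratings):
    """Assumption: keep the most recent rating per (userId, movieId) pair."""
    latest = {}
    for r in ratings:
        key = (r["userId"], r["movieId"])
        if key not in latest or r["timestamp"] > latest[key]["timestamp"]:
            latest[key] = r
    return list(latest.values())

def to_user_item(ratings):
    """Build a sparse user-item matrix as {userId: {movieId: rating}}."""
    matrix = defaultdict(dict)
    for r in ratings:
        matrix[r["userId"]][r["movieId"]] = r["rating"]
    return dict(matrix)

def split_ratings(ratings, test_fraction=0.2, seed=42):
    """Random hold-out split; the 80/20 ratio is an assumption."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def mean_center(user_item):
    """Assumption: normalize by subtracting each user's mean rating."""
    centered = {}
    for user, items in user_item.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {m: r - mean for m, r in items.items()}
    return centered

# Hypothetical records in the ratings.csv layout
raw = [
    {"userId": 1, "movieId": 10, "rating": 4.0, "timestamp": 100},
    {"userId": 1, "movieId": 10, "rating": 5.0, "timestamp": 200},  # re-rating
    {"userId": 1, "movieId": 20, "rating": 3.0, "timestamp": 150},
    {"userId": 2, "movieId": 10, "rating": 2.0, "timestamp": 120},
    {"userId": 2, "movieId": 30, "rating": 4.0, "timestamp": 130},
    {"userId": 3, "movieId": 20, "rating": 3.5, "timestamp": 140},
]
clean = drop_duplicates(raw)
train, test = split_ratings(clean)
centered = mean_center(to_user_item(train))
```

Mean-centering removes the difference between generous and strict raters, so a score of 0 means "average for this user" regardless of how high that user's ratings usually run.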



## 5. Exploratory Data Analysis (EDA)

- Visualized distribution of ratings
- Identified most rated movies and most active users
- Checked for matrix sparsity
- Generated basic statistics to guide model design
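
As a rough illustration of these checks, the sketch below counts ratings per value, per movie, and per user, and computes matrix sparsity as one minus the fraction of filled user-movie cells. The function names and the three sample records are hypothetical; the sparsity call uses the dataset counts quoted in the Data Understanding section.

```python
from collections import Counter

def rating_distribution(ratings):
    """Number of ratings at each rating value, sorted by value."""
    return dict(sorted(Counter(r["rating"] for r in ratings).items()))

def most_common(ratings, field, n=10):
    """Top-n values of a field, e.g. 'movieId' (most rated) or 'userId' (most active)."""
    return Counter(r[field] for r in ratings).most_common(n)

def sparsity(n_ratings, n_users, n_movies):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - n_ratings / (n_users * n_movies)

# Hypothetical sample records
sample = [
    {"userId": 1, "movieId": 10, "rating": 4.0},
    {"userId": 1, "movieId": 20, "rating": 3.0},
    {"userId": 2, "movieId": 10, "rating": 4.0},
]
print(rating_distribution(sample))
print(most_common(sample, "movieId", n=1))

# Counts taken from the dataset description: 100,836 ratings, 610 users, 9,742 movies
ml_small_sparsity = sparsity(100_836, 610, 9_742)
print(f"sparsity: {ml_small_sparsity:.1%}")
```

With those counts the matrix is roughly 98.3% empty, which is the main reason a content-based approach is attractive here.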



## 6. Content-Based Modelling

### User-Based Content Filtering
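- Builds each user's profile from the movies that user rated highly
- Finds movies similar in content to those the user has liked
- Relies on movie content rather than overlap between users' ratings, which keeps it stable on sparse datasets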
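
One way to realise this idea is sketched below, assuming that genres are the only content feature. Each movie becomes a multi-hot genre vector (the `genres` column in `movies.csv` is pipe-separated), a user profile is the rating-weighted sum of the vectors of the movies that user rated, and candidates are scored by cosine similarity to the profile. The function names, movie ids, and genre strings are hypothetical, and the project's actual feature set may be richer (for example TF-IDF over tags).

```python
import math

def genre_vector(genres, vocabulary):
    """Multi-hot encoding of a pipe-separated genre string."""
    present = set(genres.split("|"))
    return [1.0 if g in present else 0.0 for g in vocabulary]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(user_ratings, movie_vectors):
    """Rating-weighted sum of the genre vectors of the movies a user rated."""
    dim = len(next(iter(movie_vectors.values())))
    profile = [0.0] * dim
    for movie_id, rating in user_ratings.items():
        for i, value in enumerate(movie_vectors[movie_id]):
            profile[i] += rating * value
    return profile

# Hypothetical catalogue: movieId -> genres
movies = {
    1: "Action|Adventure",
    2: "Romance|Drama",
    3: "Action|Thriller",
    4: "Romance|Comedy",
}
vocabulary = sorted({g for genres in movies.values() for g in genres.split("|")})
vectors = {m: genre_vector(g, vocabulary) for m, g in movies.items()}

user_ratings = {1: 5.0, 2: 1.0}  # liked the action film, disliked the romance
profile = build_profile(user_ratings, vectors)
scores = {m: cosine(profile, vectors[m]) for m in movies if m not in user_ratings}
```

Here the unseen action film (id 3) scores higher than the unseen romance (id 4). Weighting by raw ratings still gives disliked movies a small positive pull; using mean-centred ratings instead would let low ratings count against a genre.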



## 7. Model Evaluation

- **Root Mean Squared Error (RMSE)**
- **Mean Absolute Error (MAE)**
- Precision@K, Recall@K for Top-N evaluation
- Cross-validation for reliability
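
For reference, the metrics listed above can be computed as in this sketch. The function names and the example numbers are illustrative; treating a movie as "relevant" is left to the caller, since the rating threshold for relevance is not specified here.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between paired rating lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between paired rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    hits = sum(1 for m in recommended[:k] if m in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for m in recommended[:k] if m in relevant)
    return hits / len(relevant)

# Made-up example values
actual = [4.0, 3.0]
predicted = [3.0, 5.0]
recommended = [1, 2, 3, 4, 5]   # ranked movieIds
relevant = {2, 5, 9}            # movieIds the user actually liked

print(rmse(actual, predicted), mae(actual, predicted))
print(precision_at_k(recommended, relevant, 5), recall_at_k(recommended, relevant, 5))
```

RMSE penalises large errors more heavily than MAE, so reporting both shows whether a model's error comes from a few bad predictions or from many small ones.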



## 8. Top-5 Recommendation Generation

- For each user, predicted ratings for unseen movies
- Sorted predicted scores to get the Top 5 movies
- Filtered out already-watched titles
- Presented recommendations in readable format
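
A minimal sketch of this step, assuming predicted scores are already available as a `{movieId: score}` mapping. The names `top_n` and `format_recommendations`, the scores, and the placeholder titles are hypothetical; `n` defaults to 5 to match the project objective.

```python
def top_n(predicted, seen, n=5):
    """Return the n highest-scoring (movieId, score) pairs the user has not seen."""
    candidates = [(m, s) for m, s in predicted.items() if m not in seen]
    # Sort by score descending; break ties by movieId for a stable order
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:n]

def format_recommendations(recs, titles):
    """Render ranked recommendations as readable lines."""
    lines = []
    for rank, (movie_id, score) in enumerate(recs, start=1):
        title = titles.get(movie_id, "movieId " + str(movie_id))
        lines.append(f"{rank}. {title} (predicted {score:.2f})")
    return lines

# Made-up predictions and placeholder titles
predicted = {1: 4.5, 2: 3.0, 3: 4.9, 4: 4.1, 5: 2.0, 6: 3.8, 7: 4.7}
seen = {3}
titles = {m: "Movie " + str(m) for m in predicted}

recs = top_n(predicted, seen)
for line in format_recommendations(recs, titles):
    print(line)
```

Movie 3 has the highest score but is excluded because the user has already rated it.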



## 9. Cold Start and Hybrid Approach 

- Addressed new users/movies using:
  - Genre-based content filtering
  - Initial rating surveys
- Hybrid approach combines content and collaborative filtering
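
The sketch below shows one possible shape for both ideas. For a new user, genres chosen in an initial survey select the best-rated movies in those genres; for the hybrid, a content score and a collaborative score are blended with a weight `alpha`. The function names, the `min_ratings` cut-off, the `alpha` value, and the sample data are all assumptions for illustration, not the project's confirmed design.

```python
def cold_start_recommend(preferred_genres, movie_genres, rating_stats, n=5, min_ratings=2):
    """Recommend well-rated movies in the genres a new user picked in a survey.

    rating_stats maps movieId -> (rating_count, mean_rating).
    """
    preferred = set(preferred_genres)
    candidates = []
    for movie_id, genres in movie_genres.items():
        count, mean = rating_stats.get(movie_id, (0, 0.0))
        if count >= min_ratings and preferred & set(genres.split("|")):
            candidates.append((movie_id, mean, count))
    # Highest mean rating first, then most ratings, then movieId
    candidates.sort(key=lambda c: (-c[1], -c[2], c[0]))
    return [c[0] for c in candidates[:n]]

def hybrid_score(content_score, collaborative_score, alpha=0.5):
    """Weighted blend; alpha=1.0 is pure content, alpha=0.0 pure collaborative."""
    return alpha * content_score + (1 - alpha) * collaborative_score

# Hypothetical catalogue and rating statistics
movie_genres = {1: "Comedy", 2: "Comedy|Romance", 3: "Horror", 4: "Comedy"}
rating_stats = {1: (10, 3.5), 2: (5, 4.2), 3: (20, 4.8), 4: (1, 5.0)}

new_user_recs = cold_start_recommend(["Comedy"], movie_genres, rating_stats)
blended = hybrid_score(0.8, 0.4, alpha=0.75)
```

The `min_ratings` threshold keeps a movie with a single 5.0 rating (id 4 here) from outranking titles with more evidence behind their average.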



## 10. Conclusion and Final Recommendations

### Conclusion:
- Developed a content-based filtering movie recommender
- Achieved strong performance on metrics like RMSE and MAE
- Demonstrated potential for deployment and scaling

### Future Work:
- Build hybrid models with content metadata
- Explore deep learning-based recommenders