## 1. Understand the Basics of Recommender Systems

Before jumping into code, understand the three main types of recommendation systems:

✅ Collaborative Filtering – Based on user-item interactions.

✅ Content-Based Filtering – Based on item features.

✅ Hybrid Systems – A combination of both.


## 2. Learn the Core Mathematical Concepts

Understanding these will help in implementing models:

🔹 Linear Algebra – Matrices and vector operations.

🔹 Probability & Statistics – Bayes' theorem, normal distribution.

🔹 Machine Learning Basics – Supervised vs Unsupervised Learning.
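As a small illustration of the linear algebra involved, the sketch below computes the cosine similarity between two rating vectors in plain Python. The function name `cosine_sim` and the sample vectors are made up for this example.

```python
import math

def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat it as 0
    return dot / (norm_u * norm_v)

# Hypothetical ratings given by two users to the same three movies
user_a = [5, 3, 4]
user_b = [4, 2, 5]
print(cosine_sim(user_a, user_b))
```

A value close to 1 means the two vectors point in nearly the same direction, and 0 means they share nothing.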


In [2]:
# Install the necessary libraries.
# Surprise is published on PyPI as scikit-surprise.
!pip install pandas numpy scikit-learn scikit-surprise scipy


## 3. Get Hands-on with Python Libraries

In [3]:
import pandas as pd

# Load the dataset
movies = pd.read_csv("https://files.grouplens.org/datasets/movielens/ml-100k/u.item", sep="|", encoding="latin-1", header=None)

# Display first 5 rows
movies.head()


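| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |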
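Because the file is read with `header=None`, the columns are just numbered 0 to 23. The sketch below builds a mapping from those numbers to readable names, following the field order described in the MovieLens 100k README. The snake_case spellings of the first five names and the variable names are choices made for this example.

```python
# Field order as described in the MovieLens 100k README for u.item:
# five descriptive fields followed by 19 genre flags.
U_ITEM_COLUMNS = [
    "movie_id", "title", "release_date", "video_release_date", "imdb_url",
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]

# Map the default integer column labels (0..23) to the names above
column_mapping = dict(enumerate(U_ITEM_COLUMNS))

# Usage with the DataFrame loaded above:
# movies = movies.rename(columns=column_mapping)
print(column_mapping[1], column_mapping[8])
```

With named columns, the genre flags (for example `Animation` and `Children's` for Toy Story) become much easier to read.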


## 4. Build a Simple Popularity-Based Recommendation System

In [16]:
ratings = pd.read_csv("https://files.grouplens.org/datasets/movielens/ml-100k/u.data", sep="\t", names=["user_id", "item_id", "rating", "timestamp"])
print(ratings.head())

# Get the average rating for each movie
movie_avg = ratings.groupby("item_id")["rating"].mean().reset_index()

# Recommend the top 5 movies by average rating.
# Caveat: a plain average can put movies with only a handful of ratings at the top.
top_movies = movie_avg.sort_values(by="rating", ascending=False).head(5)
print(top_movies)


   user_id  item_id  rating  timestamp
0      196      242       3  881250949
1      186      302       3  891717742
2       22      377       1  878887116
3      244       51       2  880606923
4      166      346       1  886397596
      item_id  rating
813       814     5.0
1598     1599     5.0
1200     1201     5.0
1121     1122     5.0
1652     1653     5.0
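All five movies above average exactly 5.0, which is what tends to happen when a movie has very few ratings. One common remedy is to require a minimum number of ratings before a movie can be ranked. The sketch below shows this on a toy DataFrame; the function name `top_popular`, the default threshold of 50, and the toy data are assumptions for illustration, not values taken from the dataset.

```python
import pandas as pd

def top_popular(ratings: pd.DataFrame, min_ratings: int = 50, n: int = 5) -> pd.DataFrame:
    """Rank movies by mean rating, keeping only those with enough ratings."""
    stats = (
        ratings.groupby("item_id")["rating"]
        .agg(mean_rating="mean", num_ratings="count")
        .reset_index()
    )
    popular = stats[stats["num_ratings"] >= min_ratings]
    return popular.sort_values("mean_rating", ascending=False).head(n)

# Toy data: item 2 has a perfect score but only a single rating
toy = pd.DataFrame({
    "item_id": [1, 1, 1, 2, 3, 3],
    "rating": [4, 5, 4, 5, 3, 4],
})
result = top_popular(toy, min_ratings=2, n=2)
print(result)
```

On the real data, the same function could be called as `top_popular(ratings)`, and the threshold tuned to taste.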


## 5. Implement User-Based Collaborative Filtering

In [19]:
import os
import urllib.request
from surprise import Dataset, Reader
from surprise import KNNBasic
from surprise.model_selection import train_test_split
from surprise import accuracy

# Step 1: Download the dataset and save it locally
url = "https://files.grouplens.org/datasets/movielens/ml-100k/u.data"
local_filename = "u.data"

# Check if the file already exists locally; if not, download it
if not os.path.exists(local_filename):
    print(f"Downloading {url} to {local_filename}...")
    urllib.request.urlretrieve(url, local_filename)
else:
    print(f"{local_filename} already exists. Skipping download.")

# Step 2: Load the data using the local file path
reader = Reader(line_format='user item rating timestamp', sep='\t')
data = Dataset.load_from_file(local_filename, reader=reader)

# Step 3: Split the data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Step 4: Build the user-based collaborative filtering model
sim_options = {'name': 'cosine', 'user_based': True}
model = KNNBasic(sim_options=sim_options)
model.fit(trainset)

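# Step 5: Make predictions and calculate RMSE
# Note: accuracy.rmse() prints the rounded RMSE itself and also returns the
# value, so wrapping it in print() shows the score twice in the output.
predictions = model.test(testset)
print("RMSE:", accuracy.rmse(predictions))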

Downloading https://files.grouplens.org/datasets/movielens/ml-100k/u.data to u.data...
Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 1.0168
RMSE: 1.0168189225636801
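To make the idea behind user-based filtering concrete, here is a simplified plain-Python sketch: a user's rating for an unseen item is estimated as the similarity-weighted average of the ratings given by the most similar users. This is an illustration only, not Surprise's actual implementation; the function names and the toy ratings are invented for the example.

```python
import math

def cosine_on_common(a: dict, b: dict) -> float:
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings: dict, user: str, item: str, k: int = 2):
    """Weighted average of the k most similar users' ratings for `item`."""
    neighbours = [
        (cosine_on_common(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    top_k = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total_sim = sum(sim for sim, _ in top_k)
    if total_sim == 0:
        return None  # no usable neighbours, so no prediction
    return sum(sim * rating for sim, rating in top_k) / total_sim

# Hypothetical ratings: user -> {item: rating}
toy_ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 2},
}
print(predict_user_based(toy_ratings, "alice", "C", k=2))
```

Here `bob` rates exactly like `alice` on the shared items, so his rating of 4 for item `C` dominates the estimate.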


## 6. Implement Item-Based Collaborative Filtering

In [20]:
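# Reuses trainset and testset from the previous cell.
# user_based=False makes the model compare items instead of users.
sim_options = {'name': 'cosine', 'user_based': False}
model = KNNBasic(sim_options=sim_options)
model.fit(trainset)

# accuracy.rmse() prints the score itself, so it appears twice in the output
predictions = model.test(testset)
print("RMSE:", accuracy.rmse(predictions))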


Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 1.0278
RMSE: 1.027811826515474
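Both models are compared by RMSE (root mean squared error), where lower is better. As a reminder of what that number measures, here is a minimal plain-Python version; the function name `rmse_from_pairs` and the sample values are made up for this sketch.

```python
import math

def rmse_from_pairs(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings
sample = [(4, 3.5), (2, 3.0), (5, 4.5)]
print(rmse_from_pairs(sample))
```

Each Surprise prediction carries the true rating as `r_ui` and the estimate as `est`, so the same function could be fed with `[(p.r_ui, p.est) for p in predictions]`.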


## 7. Content-Based Filtering

In [21]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sample Movie Data
data = {"Movie": ["Iron Man", "Avengers", "Thor", "Spider-Man"],
        "Description": ["Superhero in iron suit", "Superheroes fight villains", "God of thunder", "Teen superhero in New York"]}

df = pd.DataFrame(data)

# Convert text into vectors
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df["Description"])

# Compute cosine similarity
similarity = cosine_similarity(tfidf_matrix)
print(similarity)


[[1.         0.         0.         0.33512282]
 [0.         1.         0.         0.        ]
 [0.         0.         1.         0.        ]
 [0.33512282 0.         0.         1.        ]]
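The similarity matrix becomes a recommender once you look up, for a given movie, which other movies score highest. The sketch below does that in plain Python using the values printed above; the helper name `most_similar` is an assumption for this example.

```python
titles = ["Iron Man", "Avengers", "Thor", "Spider-Man"]

# Values copied from the printed similarity matrix above
similarity = [
    [1.0, 0.0, 0.0, 0.33512282],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.33512282, 0.0, 0.0, 1.0],
]

def most_similar(title, titles, similarity, n=1):
    """Return the n titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    scored = [
        (other, similarity[idx][j])
        for j, other in enumerate(titles)
        if j != idx
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

print(most_similar("Iron Man", titles, similarity))
```

The same indexing works on the NumPy array returned by `cosine_similarity`. Note that a score of 0 means the two descriptions share no terms, so with descriptions this short most pairs have nothing in common.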


## 8. Matrix Factorization (SVD – Singular Value Decomposition)

In [22]:
from surprise import SVD

model = SVD()
model.fit(trainset)
predictions = model.test(testset)
print("RMSE:", accuracy.rmse(predictions))


RMSE: 0.9342
RMSE: 0.9341725093409238
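Surprise's `SVD` is a matrix factorization model in the style popularized by Simon Funk: it predicts a rating as the global mean plus a user bias, an item bias, and the dot product of user and item factor vectors, all learned by stochastic gradient descent. The plain-Python sketch below shows one prediction and one update step under those assumptions. The function names and starting values are invented, and the learning rate and regularization are meant to mirror the library's documented defaults (0.005 and 0.02).

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors):
    """Estimated rating: mean + biases + dot product of the latent factors."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    return global_mean + user_bias + item_bias + dot

def sgd_step(rating, global_mean, user_bias, item_bias, user_factors,
             item_factors, lr=0.005, reg=0.02):
    """One regularized gradient step on a single observed rating."""
    err = rating - predict_rating(global_mean, user_bias, item_bias,
                                  user_factors, item_factors)
    new_user_bias = user_bias + lr * (err - reg * user_bias)
    new_item_bias = item_bias + lr * (err - reg * item_bias)
    # Both factor updates use the old values of the other vector
    new_user_factors = [p + lr * (err * q - reg * p)
                        for p, q in zip(user_factors, item_factors)]
    new_item_factors = [q + lr * (err * p - reg * q)
                        for p, q in zip(user_factors, item_factors)]
    return new_user_bias, new_item_bias, new_user_factors, new_item_factors

# Hypothetical starting point for one (user, item) pair with true rating 5
mean, bu, bi = 3.5, 0.0, 0.0
pu, qi = [0.1, -0.2], [0.3, 0.1]

before = predict_rating(mean, bu, bi, pu, qi)
bu, bi, pu, qi = sgd_step(5, mean, bu, bi, pu, qi)
after = predict_rating(mean, bu, bi, pu, qi)
print(before, after)
```

Each step nudges the estimate toward the observed rating; training repeats this over all ratings for several epochs.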


## 9. Deploy Your Model

In [44]:
!pip install streamlit scikit-surprise pandas
!pip install --upgrade plotly

Requirement already up-to-date: plotly in c:\users\user\anaconda3\envs\learn-env\lib\site-packages (6.0.0)


In [50]:
# Imported but not used in this cell
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

# Minimal Streamlit skeleton, commented out so the cell runs inside the notebook
# import streamlit as st
# st.title("Movie Recommender")
# user_id = st.number_input("Enter User ID", min_value=1, max_value=943)
# st.write(f"Recommendations for User {user_id} will be displayed here!")


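Or, as a complete Streamlit app. Save the code below to a file (for example `app.py`, a hypothetical name) and start it from a terminal with `streamlit run app.py` rather than running it in a notebook cell: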

In [43]:
import streamlit as st
import pandas as pd
from surprise import Dataset, KNNBasic

# Step 1: Load the MovieLens dataset
def load_data():
    # Load the built-in MovieLens 100k dataset (Surprise downloads it on first use)
    data = Dataset.load_builtin('ml-100k')
    return data

# Step 2: Train the recommendation model
def train_model(data):
    # Use KNNBasic for collaborative filtering
    trainset = data.build_full_trainset()
    sim_options = {'name': 'cosine', 'user_based': True}
    model = KNNBasic(sim_options=sim_options)
    model.fit(trainset)
    return model, trainset

# Step 3: Get top-N recommendations for a user
def get_top_n_recommendations(model, trainset, user_id, n=10):
    # Get all items the user has not rated
    user_inner_id = trainset.to_inner_uid(str(user_id))
    items_rated_by_user = set([j for (j, _) in trainset.ur[user_inner_id]])
    all_items = set(trainset.all_items())
    items_not_rated = all_items - items_rated_by_user

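    # Predict ratings for unrated items.
    # Surprise stores raw IDs from this dataset as strings, so convert user_id
    # before predicting; an int would be treated as an unknown user.
    predictions = [
        model.predict(str(user_id), trainset.to_raw_iid(item)) for item in items_not_rated
    ]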

    # Sort predictions by estimated rating
    top_n = sorted(predictions, key=lambda x: x.est, reverse=True)[:n]

    # Extract movie IDs and estimated ratings
    top_n_movies = [(pred.iid, pred.est) for pred in top_n]
    return top_n_movies

# Step 4: Load movie names for better display
def load_movie_names():
    movies = pd.read_csv(
        "https://raw.githubusercontent.com/niklasdonges/movielens/master/ml-100k/u.item",
        sep="|",
        header=None,
        encoding="latin-1"
    )
    # Copy the slice so renaming its columns does not touch the original frame
    movie_names = movies[[0, 1]].copy()  # Columns: movie_id, movie_title
    movie_names.columns = ["movie_id", "title"]
    return movie_names

# Main function for the Streamlit app
def main():
    st.title("Movie Recommender")

    # Load data and train the model
    data = load_data()
    model, trainset = train_model(data)
    movie_names = load_movie_names()

    # User input
    user_id = st.number_input("Enter User ID", min_value=1, max_value=943, value=1)

    if st.button("Get Recommendations"):
        # Get recommendations for the user
        top_n = get_top_n_recommendations(model, trainset, user_id, n=10)

        # Map movie IDs to movie titles
        recommended_movies = []
        for movie_id, rating in top_n:
            title = movie_names[movie_names["movie_id"] == int(movie_id)]["title"].values[0]
            recommended_movies.append(f"{title} (Predicted Rating: {rating:.2f})")

        # Display recommendations
        st.subheader(f"Top 10 Movie Recommendations for User {user_id}:")
        for i, movie in enumerate(recommended_movies, start=1):
            st.write(f"{i}. {movie}")

if __name__ == "__main__":
    main()

AttributeError: module 'plotly.graph_objs.layout.template.data' has no attribute 'Icicle'