
# **📝Session Flow📝**:

* **Learning Objective**
  - Introduction
  - Theme
  - Primary Goals

* **Learning Material**
  * Introduction
  * Understanding recommendation systems: Collaborative filtering, content-based filtering, and hybrid models
  * Understanding Recommendation Systems: Implementation of Collaborative Filtering
  * Content Based Filtering
  * Activity 1: Fill In The Blanks
  * Implementation of Content Based Filtering
  * Evaluation metrics for recommendation systems: Precision, recall, F1-score, and others
  * Real-world applications of recommendation systems: E-commerce, social media, and others
  * Activity 2: True or False

* **Summary**
  - What did you learn?
  - Best Practices and Tips for Recommender Systems
  - Shortcomings to Keep in Mind for Recommender Systems

* **Enhance your Knowledge**
  - Additional Reference Paper
  - Mnemonic

* **Try it Yourself**
  - Take Home Assignment
  - Social Engagement

# 👨🏻‍🎓 **Learning Objective** 👨🏻‍🎓

### **Introduction:**

📢 Attention Students! 📢

Welcome to this class on **Recommender Systems**. Recommender systems play a crucial role in various domains, including e-commerce, social media, and more. In this course, we will explore the fascinating field of recommendation systems and learn about the techniques used to provide personalized recommendations to users.

Throughout this course, you will delve into the following topics:

🔍 **Understanding recommendation systems:** You will learn about the different types of recommendation systems, including collaborative filtering, content-based filtering, and hybrid models. Understanding these approaches is essential for building effective recommendation systems.

📏 **Evaluation metrics for recommendation systems:** You will learn about evaluation metrics used to assess the performance of recommendation systems. Metrics such as precision, recall, F1-score, and others will help you measure the accuracy and effectiveness of your recommendations.

🌍 **Real-world applications of recommendation systems:** You will discover how recommendation systems are applied in various domains, such as e-commerce and social media. Understanding these applications will give you insights into the practical impact of recommendation systems.

### **Theme:**

Recommender systems play a crucial role in various industries by providing personalized recommendations to users, enhancing user experience, and driving business growth. Data professionals can use machine learning techniques to build recommender systems that cater to the preferences and needs of individual users. These systems process user data, product information, and historical interactions to generate relevant recommendations. E-commerce platforms can employ recommender systems to suggest products based on user behavior, increasing sales and customer satisfaction.

Streaming services can use these systems to recommend movies, shows, or music based on users' viewing or listening history, improving user retention and engagement. Social media platforms can utilize recommender systems to suggest relevant content and connections, fostering user engagement and community building. Additionally, news websites can implement these systems to personalize article recommendations, enhancing user engagement and promoting content discovery.

By learning how to build and evaluate recommender systems, data professionals can unlock the full potential of their datasets, delivering personalized experiences and driving success in their respective industries. 🎯💡🛍️🎬📰

### **Primary Goals:**

🎯 In this lesson, our primary goals are to:

🔍 Understand the different types of recommendation systems, including collaborative filtering, content-based filtering, and hybrid models

📏 Familiarize yourself with evaluation metrics, such as precision, recall, F1-score, and others, to assess the performance of recommendation systems

🌍 Explore real-world applications of recommendation systems in domains like e-commerce and social media

💡 By the end of this lesson, you will have a solid understanding of the fundamentals of recommendation systems and the techniques used to create effective personalized recommendations. You will be equipped with the knowledge to build recommendation systems for real-world applications and improve their performance. So, get ready to embark on an exciting journey into the world of recommender systems! 🚀

# **📖 Learning Material 📖**


## <u><b> Introduction to Recommender Systems </b></u>


The objective of a recommender system is to recommend relevant items to users based on their preferences. Preference and relevance are subjective, and they are generally inferred from the items users have consumed previously.

<b> The recommender system approaches most widely used in industry are: </b>

### <u><b> Collaborative Filtering </b></u>

This method makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on a set of items, A is more likely to have B's opinion for a given item than that of a randomly chosen person.   


### <u><b> Content-Based Filtering </b></u>

This method uses only the descriptions and attributes of the items a user has previously consumed to model that user's preferences. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended.


### <u><b> Hybrid Approach </b></u>
Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering, can be more effective than either pure approach in some cases. Hybrid methods can also help overcome common problems in recommender systems, such as cold start and data sparsity.
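
One simple way to combine the two signals is a weighted blend. The sketch below is illustrative only: the function name `hybrid_score`, the mixing weight `alpha`, and the fallback rule for missing collaborative scores are assumptions made for this example, not a method prescribed by this lesson.

```python
from typing import Optional


def hybrid_score(cf_score: Optional[float], content_score: float, alpha: float = 0.5) -> float:
    """Blend a collaborative-filtering score with a content-based score.

    alpha is a hypothetical mixing weight in [0, 1]. When no collaborative
    score exists (for example, a cold-start item nobody has rated yet),
    the content-based score is used on its own.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if cf_score is None:
        return content_score
    return alpha * cf_score + (1.0 - alpha) * content_score


# An item with both signals, and a cold-start item with content only
print(hybrid_score(4.0, 3.0, alpha=0.5))
print(hybrid_score(None, 3.0))
```

In practice the weight would be tuned on held-out data rather than fixed by hand.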

## <b>Collaborative Filtering Model: </b>

Collaborative Filtering (CF) has two main implementation strategies:

### <u>Memory-based</u>

* This approach uses the memory of previous user interactions to compute similarities between users based on the items they have interacted with (user-based approach), or similarities between items based on the users who have interacted with them (item-based approach).

* A typical example of this approach is user neighbourhood-based CF, in which the top-N most similar users (usually computed using Pearson correlation) are selected for a user and used to recommend items that those similar users liked but the current user has not interacted with yet. This approach is very simple to implement, but it usually does not scale well to many users.
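
The neighbour selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the user names and ratings are made up, and `pearson_similarity` and `top_n_neighbours` are hypothetical helper names. Correlation is computed only over the items both users have rated.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    a = [ratings_a[item] for item in common]
    b = [ratings_b[item] for item in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a user who gives constant ratings has no defined correlation
    return cov / sqrt(var_a * var_b)


def top_n_neighbours(target, all_ratings, n=2):
    """Return the n users most similar to `target`, most similar first."""
    sims = [(other, pearson_similarity(all_ratings[target], ratings))
            for other, ratings in all_ratings.items() if other != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:n]


# Hypothetical ratings: bob follows alice's pattern, carol does the opposite
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(top_n_neighbours("alice", ratings, n=2))
```

Every pair of users is compared here, which is one reason the memory-based approach becomes expensive as the number of users grows.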

### <u> Model-based </u>
* In this approach, models are developed using different machine learning algorithms to recommend items to users. There are many model-based CF algorithms, like Neural Networks, Bayesian Networks, Clustering Techniques, and Latent Factor Models such as Singular Value Decomposition (SVD) and Probabilistic Latent Semantic Analysis.
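
To make the idea of a latent factor model concrete, here is a small sketch of matrix factorization trained with stochastic gradient descent. It is a simplified stand-in for the SVD-style methods named above, not an exact SVD. The function names, the hyperparameters (`lr`, `reg`, `epochs`, `n_factors`), and the toy rating triples are all assumptions chosen for illustration.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating = dot product of user factors and item factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Toy (user, item, rating) triples with 0-based indices
ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 4), (1, 0, 5),
           (1, 3, 4), (1, 4, 2), (2, 1, 5), (2, 4, 5)]

P0, Q0 = factorize(ratings, n_users=3, n_items=5, epochs=0)  # untrained baseline
P, Q = factorize(ratings, n_users=3, n_items=5)

print("RMSE before training:", round(rmse(P0, Q0, ratings), 3))
print("RMSE after training: ", round(rmse(P, Q, ratings), 3))
print("Predicted rating for user 1 on unseen item 1:", round(predict(P, Q, 1, 1), 2))
```

Only observed ratings enter the loss, so missing entries are not treated as zeros. The learned factors can then score any user-item pair, including unseen ones.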

### **Problem Statement:**

**Background:**

Recommendation systems are crucial for enhancing user experience and engagement on online platforms, particularly in the context of movie streaming services. User-based collaborative filtering is a widely used technique for providing personalized movie recommendations by leveraging the similarity between users.

**Objective:**

The objective of this problem is to develop a user-based collaborative filtering recommender system that provides movie recommendations to users based on their similarity to other users. The system aims to enhance user engagement and satisfaction by suggesting movies that are likely to be of interest to a target user, drawing on the preferences of similar users.

**Data Description:**

The problem utilizes a synthetic user-movie interaction dataset. This dataset contains user IDs, movie IDs, and user ratings for a set of movies. It is used to build a user-item rating matrix, where rows represent users, columns represent movies, and the matrix cells contain user ratings.

**Implementing a User Based Collaborative Filtering Model**

In [None]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Load the synthetic ratings DataFrame
ratings_df = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 2, 3, 3],
    'movie_id': [1, 2, 3, 1, 4, 5, 2, 5],
    'rating': [5, 3, 4, 5, 4, 2, 5, 5]
})

ratings_df

| | user_id | movie_id | rating |
|---|---|---|---|
| 0 | 1 | 1 | 5 |
| 1 | 1 | 2 | 3 |
| 2 | 1 | 3 | 4 |
| 3 | 2 | 1 | 5 |
| 4 | 2 | 4 | 4 |
| 5 | 2 | 5 | 2 |
| 6 | 3 | 2 | 5 |
| 7 | 3 | 5 | 5 |


In [None]:
# Create a user-item rating matrix
user_item_matrix = pd.pivot_table(ratings_df, values='rating', index='user_id', columns='movie_id', fill_value=0)

In [None]:
user_item_matrix

| user_id \ movie_id | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 5 | 3 | 4 | 0 | 0 |
| 2 | 5 | 0 | 0 | 4 | 2 |
| 3 | 0 | 5 | 0 | 0 | 5 |


In [None]:
# Calculate the cosine similarity between users
user_similarity = cosine_similarity(user_item_matrix)

In [None]:
user_similarity

array([[1.        , 0.52704628, 0.3       ],
       [0.52704628, 1.        , 0.21081851],
       [0.3       , 0.21081851, 1.        ]])
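
To see where these numbers come from, the same similarities can be computed by hand. The sketch below uses the three rows of the user-item matrix shown above. The helper name `cosine` is just for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


# Rows of the user-item matrix (unrated movies are filled with 0)
user_1 = [5, 3, 4, 0, 0]
user_2 = [5, 0, 0, 4, 2]
user_3 = [0, 5, 0, 0, 5]

print(round(cosine(user_1, user_2), 8))  # 0.52704628
print(round(cosine(user_1, user_3), 8))  # 0.3
print(round(cosine(user_2, user_3), 8))  # 0.21081851
```

Note that because unrated movies are filled with 0, a missing rating is treated the same as a rating of zero when the vectors are compared.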

**Let's try building a simple model that approximates user-based collaborative filtering**

In [None]:
# Define the user ID and number of top recommendations
user_id = 2
top_n = 5

# Adjust for 0-based indexing to get similarity scores for the target user
target_user_similarity = user_similarity[user_id - 1]
print("Target User Similarity Scores:", target_user_similarity)

# Get the user's ratings
user_ratings = user_item_matrix.loc[user_id]
print("User Ratings:", user_ratings)

# Initialize an empty list to store recommendations
recommendations = []

# Iterate over each movie
for movie_id in user_item_matrix.columns:
    # Skip movies the user has already rated
    if user_ratings[movie_id] > 0:
        continue

    # Sum similarity * rating over all users, then divide by the total number of users.
    # Users who have not rated the movie (including the target user) contribute 0 to the sum.
    weighted_sum = sum(target_user_similarity[other_user_id - 1] * user_item_matrix.loc[other_user_id, movie_id]
                       for other_user_id in user_item_matrix.index)/len(target_user_similarity)

    # Average similarity between the target user and the users who rated this movie:
    # sum their similarity scores and divide by how many of them there are.
    rated_users_similarity_sum = sum(target_user_similarity[other_user_id - 1]
                                     for other_user_id in user_item_matrix.index
                                     if user_item_matrix.loc[other_user_id, movie_id] > 0)
    average_similarity = rated_users_similarity_sum / len([other_user_id for other_user_id in user_item_matrix.index
                                                            if user_item_matrix.loc[other_user_id, movie_id] > 0])

    # Predicted score for this movie. Algebraically this is the similarity-weighted mean rating
    # scaled by (number of raters / number of users), so movies with few raters score lower.
    predicted_rating = weighted_sum / average_similarity if average_similarity > 0 else 0
    print("Movie ID:", movie_id)
    print("Weighted Sum:", weighted_sum)
    print("Rated Similarity Sum:", rated_users_similarity_sum)
    print("Average Similarity:", average_similarity)
    print("Predicted Rating:", predicted_rating)

    # Add the movie ID and predicted rating to the recommendations list
    recommendations.append((movie_id, predicted_rating))


Target User Similarity Scores: [0.52704628 1.         0.21081851]
User Ratings: movie_id
1    5
2    0
3    0
4    4
5    2
Name: 2, dtype: int64
Movie ID: 2
Weighted Sum: 0.8784104611578831
Rated Similarity Sum: 0.7378647873726217
Average Similarity: 0.36893239368631087
Predicted Rating: 2.3809523809523814
Movie ID: 3
Weighted Sum: 0.7027283689263064
Rated Similarity Sum: 0.5270462766947298
Average Similarity: 0.5270462766947298
Predicted Rating: 1.3333333333333333
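
For comparison, a common textbook formulation divides the similarity-weighted sum by the total similarity of only those users who rated the movie. The sketch below applies that variant to user 2, reusing the similarity scores printed above. The function name `predict_rating` and the dictionary layout are assumptions for this example.

```python
def predict_rating(similarities, neighbour_ratings):
    """Similarity-weighted mean over the neighbours who actually rated the movie.

    similarities: neighbour id -> similarity to the target user
    neighbour_ratings: neighbour id -> that neighbour's rating for the movie
    """
    numerator = sum(similarities[user] * rating for user, rating in neighbour_ratings.items())
    denominator = sum(abs(similarities[user]) for user in neighbour_ratings)
    return numerator / denominator if denominator > 0 else 0.0


# Similarity of user 2 to users 1 and 3, taken from the printed scores above
sims_user_2 = {1: 0.52704628, 3: 0.21081851}

movie_2 = predict_rating(sims_user_2, {1: 3, 3: 5})  # users 1 and 3 rated movie 2
movie_3 = predict_rating(sims_user_2, {1: 4})        # only user 1 rated movie 3

print(round(movie_2, 4))  # 3.5714
print(round(movie_3, 4))  # 4.0
```

With this variant, a movie rated 4 by the only neighbour who saw it is predicted at 4, whereas the loop above yields about 1.33 for the same movie because it also divides by the total number of users.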


**Let's wrap this logic in a reusable function**

In [None]:
# Define a function to get movie recommendations for a user
def get_recommendations(user_id, top_n=5):
    # Get the similarity scores for the target user
    target_user_similarity = user_similarity[user_id - 1]  # Adjust for 0-based indexing

    # Get the user's ratings
    user_ratings = user_item_matrix.loc[user_id]

    # Initialize an empty list to store recommendations
    recommendations = []

    # Iterate over each movie
    for movie_id in user_item_matrix.columns:
        # Skip movies the user has already rated
        if user_ratings[movie_id] > 0:
            continue

        # Calculate the weighted sum of ratings for this movie based on user similarity
        weighted_sum = sum(target_user_similarity[other_user_id - 1] * user_item_matrix.loc[other_user_id, movie_id]
                           for other_user_id in user_item_matrix.index)/len(target_user_similarity)

        # Calculate the average similarity score for this movie
        rated_users_similarity_sum = sum(target_user_similarity[other_user_id - 1]
                                         for other_user_id in user_item_matrix.index
                                         if user_item_matrix.loc[other_user_id, movie_id] > 0)
        average_similarity = rated_users_similarity_sum / len([other_user_id for other_user_id in user_item_matrix.index
                                                                if user_item_matrix.loc[other_user_id, movie_id] > 0])

        # Calculate the predicted rating for this movie
        predicted_rating = weighted_sum / average_similarity if average_similarity > 0 else 0

        # Add the movie ID and predicted rating to the recommendations list
        recommendations.append((movie_id, predicted_rating))

    # Sort recommendations by predicted rating in descending order
    recommendations.sort(key=lambda x: x[1], reverse=True)

    # Get the top N recommended movie IDs
    top_recommendations = [movie_id for movie_id, _ in recommendations[:top_n]]
    return top_recommendations


In [None]:
# Example: Get movie recommendations
get_recommendations(user_id=3, top_n=1)

[1]

## <u><b> Content-Based Filtering </b></u>

Content-based filtering approaches leverage the descriptions or attributes of items the user has interacted with to recommend similar items. Because they depend only on the user's own previous choices and on item attributes, they are robust to the item *cold-start* problem: a newly added item can be recommended as soon as its attributes are known, even before anyone has rated it. For textual items, such as articles, news, and books, it is simple to use the raw text to build item profiles and user profiles.
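
A minimal sketch of item profiles and a user profile is shown below. It assumes each item is described by a small binary genre vector and that the user profile is simply the average of the vectors of liked items. The item names, the genre vocabulary, and the helper names are hypothetical.

```python
from math import sqrt

GENRES = ["action", "crime", "drama"]  # hypothetical attribute vocabulary


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_user_profile(liked_items, item_vectors):
    """Average the attribute vectors of the items the user liked."""
    liked = list(liked_items)
    if not liked:
        raise ValueError("a user profile needs at least one liked item")
    return [sum(item_vectors[item][d] for item in liked) / len(liked)
            for d in range(len(GENRES))]


def recommend(liked_items, item_vectors, top_n=2):
    """Rank unseen items by similarity between their profile and the user profile."""
    profile = build_user_profile(liked_items, item_vectors)
    candidates = [(item, cosine(profile, vector))
                  for item, vector in item_vectors.items() if item not in liked_items]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_n]


# Item profiles: one flag per genre in GENRES (made-up catalogue)
items = {
    "heist_film": [1, 1, 0],
    "mob_saga": [0, 1, 1],
    "courtroom_drama": [0, 0, 1],
    "car_chase": [1, 0, 0],
}

print(recommend({"heist_film"}, items, top_n=2))
```

No other user's ratings are needed here, which is why a new item can be scored as soon as its genre vector exists.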

### **Activity 1: Fill in the Blanks**



1. In collaborative filtering, recommendations are made based on the preferences and behaviors of ____________ users.


2. Content-based filtering recommends items based on the ____________ of the items themselves.


3. Hybrid models combine ____________ and content-based filtering approaches to provide more accurate and diverse recommendations.




#### **Activity 1 Answers:**



1. similar

2. characteristics

3. collaborative filtering



### **Problem Statement:**

**Background:**

The entertainment industry, particularly the film industry, witnesses a vast array of movies being released every year. With such a plethora of choices available to audiences, it can often be overwhelming for individuals to decide which movies to watch. In this era of digital streaming platforms and online movie databases, personalized movie recommendation systems have become increasingly popular. These systems help users discover new movies tailored to their preferences, enhancing their overall viewing experience.

**Objective:**

The objective of this project is to develop a movie recommendation system based on content similarity. Instead of relying on user-item interactions, this system suggests movies similar to a given movie based on their textual features such as title, genre, and description. By leveraging natural language processing (NLP) techniques and machine learning algorithms, the recommendation system aims to provide personalized movie recommendations to users, thereby improving user engagement and satisfaction.

**Data Description:**

The dataset used for this project consists of information about several movies, including their unique identifiers (movie_id), titles (title), genres (genre), and descriptions (description). Each movie is represented by a combination of these features, which are concatenated into a single column named features. The descriptions provide textual summaries of the movies, capturing key themes, plot points, and characterizations.


**How do we think about solving it?**

The dataset serves as the foundation for building the recommendation system. It is preprocessed to create a TF-IDF (Term Frequency-Inverse Document Frequency) matrix, which represents the textual features of each movie. This matrix is then used to compute a similarity matrix using the cosine similarity metric. Finally, a function is defined to generate recommendations based on a given movie title, utilizing the computed similarity scores.

Overall, the dataset and the subsequent processing pipeline enable the creation of a content-based movie recommendation system that suggests similar movies to users based on textual similarities in their descriptions and other features.
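
The pipeline described above (TF-IDF weighting followed by cosine similarity) can be sketched without any library to show what happens under the hood. This is a simplified illustration: the tokenizer is a plain regular expression, the documents are made up, and the IDF formula is the smoothed variant ln((1 + N) / (1 + df)) + 1, which to my understanding is the default documented for scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dictionary per document."""
    term_counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for counts in term_counts for term in counts)
    vectors = []
    for counts in term_counts:
        weights = {term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
                   for term, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


def cosine(a, b):
    """Dot product of two normalized sparse vectors, which equals their cosine."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


# Hypothetical mini-corpus
docs = ["crime family drama", "crime heist thriller", "space adventure"]
vecs = tfidf_vectors(docs)

print(round(cosine(vecs[0], vecs[1]), 3))  # share the term "crime"
print(round(cosine(vecs[0], vecs[2]), 3))  # share no terms
```

Documents that share no terms always get a similarity of exactly 0, so very short descriptions can leave the recommender with little to rank on.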

**Implementing a Content Based Filtering Model:**

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Sample dataset
data = {
    'movie_id': [1, 2, 3, 4, 5],
    'title': ['The Shawshank Redemption', 'The Godfather', 'The Dark Knight', 'Pulp Fiction', 'Forrest Gump'],
    'genre': ['Drama', 'Crime', 'Action', 'Crime', 'Drama'],
    'description': ['Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.',
                    "An organized crime dynasty's aging patriarch transfers control of his clandestine empire to his reluctant son.",
                    'When the menace known as The Joker, emerges from his mysterious past, he wreaks havoc and chaos on the people of Gotham.',
                    'The lives of two mob hitmen, a boxer, a gangster and his wife, and a pair of diner bandits intertwine in four tales of violence and redemption.',
                    'The presidencies of Kennedy and Johnson, the Vietnam War, the Watergate scandal and other historical events unfold from the perspective of an Alabama man with an IQ of 75, whose only desire is to be reunited with his childhood sweetheart.']
}

movies_df = pd.DataFrame(data)

# Concatenate text features
movies_df['features'] = movies_df['title'] + ' ' + movies_df['genre'] + ' ' + movies_df['description']

# TF-IDF Vectorization
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(movies_df['features'])

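# Compute the similarity matrix. TfidfVectorizer L2-normalizes each row by default,
# so the plain dot product from linear_kernel equals the cosine similarity.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Function to get recommendations
def get_recommendations(title, cosine_sim=cosine_sim):
    # Row index of the requested movie (raises IndexError if the title is not found)
    idx = movies_df.loc[movies_df['title'] == title].index[0]
    # Pair every movie index with its similarity to the requested movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip position 0 (the movie itself, similarity 1.0) and keep up to 5 others
    sim_scores = sim_scores[1:6]
    movie_indices = [i[0] for i in sim_scores]
    return movies_df['title'].iloc[movie_indices]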

# Example: Get recommendations for a movie
movie_title = 'The Dark Knight'
recommended_movies = get_recommendations(movie_title)
print(f"Recommended movies for '{movie_title}':")
print(recommended_movies)


Recommended movies for 'The Dark Knight':
0    The Shawshank Redemption
1               The Godfather
3                Pulp Fiction
4                Forrest Gump
Name: title, dtype: object
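
The four titles above come back in plain index order, which suggests that their similarity scores are tied. A quick check of shared vocabulary supports this. The sketch below rebuilds the concatenated text of each movie and looks for terms that 'The Dark Knight' has in common with the others. The stop-word set is a small illustrative assumption (scikit-learn's built-in English list is much longer), and `content_tokens` is a hypothetical helper.

```python
import re

# Small illustrative stop-word set (an assumption, not scikit-learn's full list)
STOP_WORDS = {"the", "of", "and", "his", "from", "an", "to", "in", "on",
              "as", "he", "when", "with", "is", "be"}


def content_tokens(text):
    """Lowercase word tokens of two or more characters, minus stop words."""
    return {t for t in re.findall(r"\b\w\w+\b", text.lower()) if t not in STOP_WORDS}


# title + genre + description, as concatenated into the 'features' column
features = {
    "The Shawshank Redemption": "The Shawshank Redemption Drama Two imprisoned men bond over a number "
                                "of years, finding solace and eventual redemption through acts of common decency.",
    "The Godfather": "The Godfather Crime An organized crime dynasty's aging patriarch transfers control "
                     "of his clandestine empire to his reluctant son.",
    "The Dark Knight": "The Dark Knight Action When the menace known as The Joker, emerges from his "
                       "mysterious past, he wreaks havoc and chaos on the people of Gotham.",
    "Pulp Fiction": "Pulp Fiction Crime The lives of two mob hitmen, a boxer, a gangster and his wife, and "
                    "a pair of diner bandits intertwine in four tales of violence and redemption.",
    "Forrest Gump": "Forrest Gump Drama The presidencies of Kennedy and Johnson, the Vietnam War, the "
                    "Watergate scandal and other historical events unfold from the perspective of an Alabama "
                    "man with an IQ of 75, whose only desire is to be reunited with his childhood sweetheart.",
}

target = "The Dark Knight"
target_tokens = content_tokens(features[target])
overlaps = {title: sorted(target_tokens & content_tokens(text))
            for title, text in features.items() if title != target}

for title, shared in overlaps.items():
    print(f"{title}: shared terms = {shared}")
```

Every list is empty, so in this tiny dataset the ranking for 'The Dark Knight' carries no real signal. Richer descriptions or additional attributes would be needed before the order of the recommendations means anything.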


In [None]:
import pandas as pd
import numpy as np

# Create the ratings DataFrame
ratings_df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3],
    'movie_id': [1, 2, 3, 4, 5],
    'rating': [5, 3, 4, 2, 5]
})

# Create a movies DataFrame with movie_id and genre
movies_df = pd.DataFrame({
    'movie_id': [1, 2, 3, 4, 5],
    'genre': ['Action', 'Comedy', 'Drama', 'Fantasy', 'Horror']
})

# Merge ratings and movies data to get genre information for each movie
merged_df = ratings_df.merge(movies_df, on='movie_id')

# Calculate the average rating for each genre
genre_avg_ratings = merged_df.groupby('genre')['rating'].mean().reset_index()

# Define a function to recommend top-rated movies of a given genre
def recommend_movies_by_genre(user_id, genre, top_n=5):
    # Filter movies of the specified genre
    genre_movies = merged_df[merged_df['genre'] == genre]

    # Exclude movies the user has already rated
    user_rated_movies = ratings_df[ratings_df['user_id'] == user_id]['movie_id']
    genre_movies = genre_movies[~genre_movies['movie_id'].isin(user_rated_movies)]

    # Sort the remaining movies by their rating in descending order
    genre_movies = genre_movies.sort_values(by='rating', ascending=False)

    # Get the top N recommended movie IDs
    top_recommendations = genre_movies['movie_id'].head(top_n).tolist()
    return top_recommendations

# Example: Get top 5 movie recommendations in the 'Action' genre for User 2
user_id = 2
genre = 'Action'
recommendations = recommend_movies_by_genre(user_id, genre)
print(f"Top 5 '{genre}' Movie Recommendations for User {user_id}: {recommendations}")

Top 5 'Action' Movie Recommendations for User 2: [1]


### **Evaluation metrics for recommendation systems: Precision, recall, F1-score, and others**

🔍 Evaluation metrics play a crucial role in assessing the performance of recommendation systems. These metrics help measure the effectiveness of the system in providing relevant and personalized recommendations to users. Some commonly used evaluation metrics for recommendation systems include precision, recall, F1-score, and others.

🎯 **Precision** is a metric that measures the proportion of correctly recommended items out of the total recommended items. It focuses on the relevancy of the recommendations and is calculated as the ratio of true positives (correctly recommended items) to the sum of true positives and false positives (incorrectly recommended items).

🔎 **Recall** is a metric that measures the proportion of correctly recommended items out of the total relevant items. It focuses on the coverage of the recommendations and is calculated as the ratio of true positives to the sum of true positives and false negatives (relevant items not recommended).

⚖️ **F1-score** is a metric that combines precision and recall into a single measure. It provides a balanced assessment of the recommendation system's performance by considering both relevancy and coverage. The F1-score is the harmonic mean of precision and recall, giving equal weight to both metrics.
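
Written as formulas, with $TP$ for true positives, $FP$ for false positives, and $FN$ for false negatives, the three definitions above are:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

As a hypothetical worked example, suppose a system recommends 10 items, 6 of which are relevant, while 12 relevant items exist in total. Then $TP = 6$, $FP = 4$, and $FN = 6$, which gives:

$$
\text{Precision} = \frac{6}{10} = 0.6, \qquad
\text{Recall} = \frac{6}{12} = 0.5, \qquad
F_1 = \frac{2 \cdot 0.6 \cdot 0.5}{0.6 + 0.5} \approx 0.545
$$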


📈 When evaluating recommendation systems, it is essential to consider the specific goals and requirements of the system. Different metrics may be more suitable depending on whether the focus is on precision, recall, or a balance between the two. It is also crucial to consider the nature of the recommendation task, such as top-N recommendations or personalized rankings, as different metrics may be appropriate for different scenarios.

📊 Various programming libraries provide implementations of these evaluation metrics. In Python, scikit-learn offers functions to calculate precision, recall, and F1-score, and deep learning ecosystems offer comparable metrics (for example `tf.keras.metrics` in TensorFlow, or the separate TorchMetrics package for PyTorch). These libraries enable practitioners to assess the performance of their recommendation systems and make data-driven improvements based on the evaluation results.

📈 Here's an example of calculating precision, recall, and F1-score for a recommendation system using scikit-learn in Python:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Assuming true labels and predicted labels are available
true_labels = [1, 0, 1, 1, 0, 0, 1]
predicted_labels = [1, 0, 0, 1, 1, 0, 1]

precision = precision_score(true_labels, predicted_labels)
recall = recall_score(true_labels, predicted_labels)
f1 = f1_score(true_labels, predicted_labels)

print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
```

In this example, we have a list of true labels and predicted labels for a set of recommendations. We use the precision_score, recall_score, and f1_score functions from scikit-learn to calculate the respective evaluation metrics. The results are then printed to assess the recommendation system's performance.
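
For top-N recommendation lists, the same metrics are often computed on only the first k recommended items. The sketch below shows one way to do that in plain Python. The function name `precision_recall_at_k` and the sample item IDs are assumptions for this example.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision, recall, and F1 computed on the first k recommended items.

    recommended: ranked list of item IDs, best first
    relevant: set of item IDs the user actually found relevant
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ranked recommendations and ground truth for one user
recommended = ["m3", "m7", "m1", "m9", "m4"]
relevant = {"m1", "m3", "m5"}

p, r, f1 = precision_recall_at_k(recommended, relevant, k=3)
print("Precision@3:", round(p, 3))
print("Recall@3:", round(r, 3))
print("F1@3:", round(f1, 3))
```

Two of the top three items are relevant, and two of the three relevant items were found, so all three values equal 2/3 here.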

### **Real-world applications of recommendation systems: E-commerce, social media, and others**

🛒 Recommendation systems play a crucial role in various real-world applications, including e-commerce platforms, social media platforms, and other domains. These systems aim to provide personalized recommendations to users, enhancing their experience and helping them discover relevant products or content.

📦 In the context of e-commerce, recommendation systems can be used to suggest products to customers based on their browsing history, purchase behavior, and preferences. By analyzing user data, these systems can generate personalized recommendations, showcasing items that are likely to be of interest to the user. This can increase customer engagement, improve conversion rates, and enhance customer satisfaction.

👥 Social media platforms also leverage recommendation systems to curate personalized content for their users. These systems analyze user interactions, such as likes, shares, and comments, to understand their interests and preferences. They then recommend relevant posts, articles, videos, or accounts to follow, creating a more tailored user experience. By keeping users engaged with relevant content, social media platforms can increase user retention and ad revenue.

🔍 Recommendation systems are not limited to e-commerce and social media alone. They find applications in various other domains as well. For example, in the entertainment industry, streaming platforms utilize recommendation systems to suggest movies, TV shows, or songs based on users' viewing or listening history. This helps users discover new content and improves user engagement on the platform.

📰 News recommendation systems are employed by news websites and applications to deliver personalized news articles to their readers. These systems consider factors such as the user's reading history, preferred topics, and trending news to recommend articles that align with the user's interests. By providing tailored news recommendations, these systems keep users informed and engaged with relevant content.

📱 Mobile applications also utilize recommendation systems to enhance user experiences. For instance, music streaming apps recommend playlists or songs based on a user's music preferences, creating a personalized soundtrack. Similarly, travel apps suggest destinations, hotels, or activities based on a user's past travel history and preferences, assisting users in planning their trips more effectively.

💻 Implementing recommendation systems involves various techniques such as collaborative filtering, content-based filtering, and hybrid approaches that combine multiple methods. These systems rely on machine learning algorithms to analyze user data, build user profiles, and generate accurate recommendations.

📊 Evaluation of recommendation systems is crucial to assess their performance. Techniques like offline evaluation, online A/B testing, and user feedback analysis are used to measure the effectiveness of recommendations, ensure their quality, and make improvements.
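
As an illustration of offline evaluation, the sketch below holds out each user's most recent interaction, builds recommendations from the rest, and checks how often the held-out item appears in the top k. The leave-one-out protocol, the popularity baseline, and all names and data are assumptions chosen for this example. Other protocols are equally valid.

```python
from collections import Counter


def leave_one_out_split(interactions):
    """Hold out each user's most recent item. interactions: user -> [(item, timestamp)]."""
    train, held_out = {}, {}
    for user, events in interactions.items():
        ordered = [item for item, _ in sorted(events, key=lambda e: e[1])]
        if len(ordered) < 2:
            train[user] = ordered  # too little history to hold anything out
            continue
        train[user] = ordered[:-1]
        held_out[user] = ordered[-1]
    return train, held_out


def most_popular(user, train):
    """Baseline recommender: globally popular items the user has not seen."""
    counts = Counter(item for items in train.values() for item in items)
    seen = set(train.get(user, []))
    return [item for item, _ in counts.most_common() if item not in seen]


def hit_rate_at_k(recommend_fn, train, held_out, k):
    """Share of users whose held-out item appears in their top-k recommendations."""
    if not held_out:
        return 0.0
    hits = sum(1 for user, item in held_out.items() if item in recommend_fn(user, train)[:k])
    return hits / len(held_out)


# Hypothetical interaction log: (item, timestamp) pairs per user
interactions = {
    "u1": [("a", 1), ("b", 2), ("c", 3)],
    "u2": [("a", 1), ("c", 2)],
    "u3": [("c", 1), ("b", 5)],
}

train, held_out = leave_one_out_split(interactions)
print("Train:", train)
print("Held out:", held_out)
print("Hit rate@2:", hit_rate_at_k(most_popular, train, held_out, k=2))
```

Any recommender with the same `(user, train)` signature can be passed in place of `most_popular`, which makes it easy to compare candidates offline before running an online A/B test.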

📈 In conclusion, recommendation systems have become integral in e-commerce, social media, and other domains, providing personalized recommendations to users. They enhance user experiences, increase engagement, and contribute to the success of various real-world applications.

### **Activity 2: True or False**



1. Precision measures the proportion of correctly recommended items out of the total recommended items.

2. Recall measures the proportion of correctly recommended items out of the total relevant items.

3. F1-score is the arithmetic mean of precision and recall.

4. Precision, recall, and F1-score are the only evaluation metrics used for recommendation systems.

5. Evaluation of recommendation systems is not essential since the success of the system can be easily observed.



#### **Activity 2 Answers**



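1. **True**.

2. **True**.

3. **False**. F1-score is the harmonic mean of precision and recall, not the arithmetic mean.

4. **False**. Other metrics, such as Mean Average Precision (MAP), are also used.

5. **False**. Evaluation is crucial for measuring the effectiveness of recommendations and guiding improvements.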


# **✅ Summary ✅**



### 📚 **What Did You Learn?** 🤔

In this lesson, we covered the basics of recommender systems, which play a crucial role in helping users discover relevant items in various domains, such as e-commerce and social media.

We discussed different types of recommendation systems, including collaborative filtering, content-based filtering, and hybrid models. Collaborative filtering relies on user-item interactions to make recommendations, while content-based filtering considers the characteristics of items and users' preferences. Hybrid models combine both approaches to leverage their strengths.

We delved into evaluation metrics for recommendation systems, including precision, recall, F1-score, and others. These metrics help assess the effectiveness of recommendation algorithms in terms of accurately predicting user preferences and capturing relevant items.

Real-world applications of recommendation systems were discussed, highlighting their importance in e-commerce and social media platforms. In e-commerce, recommendation systems assist users in finding products they are likely to purchase, enhancing the shopping experience. In social media, recommendation systems suggest relevant content, connections, or groups to users, increasing engagement.

💡 By the end of this lesson, you should be able to understand the different types of recommendation systems, evaluate their performance using appropriate metrics, and be aware of techniques to enhance and personalize recommendations for users in real-world applications.


## 👍 **Best Practices and Tips for Recommender Systems** 👍

✅ Understand recommendation systems: Recommender systems are designed to suggest relevant items to users based on their preferences and behavior. There are three main types of recommendation systems: collaborative filtering, content-based filtering, and hybrid models. Understanding these approaches will help you design effective recommendation systems. 🎯📚

✅ Collaborative filtering: Collaborative filtering analyzes user behavior and preferences to make recommendations. It identifies similarities between users or items and predicts user preferences based on the behavior of similar users or items. Consider using collaborative filtering when you have sufficient user-item interaction data. 🤝👥

✅ Content-based filtering: Content-based filtering recommends items to users based on the characteristics or features of the items and the user's preferences. It analyzes item attributes such as genre, keywords, or descriptions and matches them with the user's profile. Content-based filtering is useful when you have detailed information about items and user preferences. 📄🔎

✅ Hybrid models: Hybrid models combine collaborative filtering and content-based filtering to provide more accurate and diverse recommendations. These models leverage the strengths of both approaches and can be particularly effective when individual methods have limitations. Consider exploring hybrid models to improve the performance of your recommendation system. 🔄🔀
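
A weighted blend is one of the simplest hybrid designs. The sketch below assumes you already have one collaborative score and one content-based score per item; the score values, the `alpha` weight, and the function names are hypothetical.

```python
import numpy as np

def min_max(x):
    """Rescale scores to [0, 1] so the two sources are comparable."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def hybrid_scores(cf_scores, content_scores, alpha=0.5):
    """Blend the two score lists; alpha is the weight on collaborative filtering."""
    return alpha * min_max(cf_scores) + (1 - alpha) * min_max(content_scores)

# Hypothetical scores for four items. Item 2 has no interaction data yet,
# so its collaborative score is 0 while its content score is high.
cf = [4.2, 3.1, 0.0, 2.5]
content = [0.2, 0.4, 0.9, 0.1]

blended = hybrid_scores(cf, content, alpha=0.6)
top_item = int(np.argmax(blended))
print("Blended scores:", np.round(blended, 3), "top item:", top_item)
```

Lowering `alpha` shifts weight toward the content-based scores, which is one way to surface new items that have few interactions.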

✅ Matrix factorization: Matrix factorization techniques decompose a user-item matrix into latent factors. These factors capture hidden patterns and relationships between users and items, allowing for more accurate predictions. Popular matrix factorization methods include Singular Value Decomposition (SVD) and Alternating Least Squares (ALS). Consider applying matrix factorization to enhance the performance of your recommendation system. 📉🔍

✅ Evaluation metrics for recommendation systems: To assess the performance of your recommendation system, you need appropriate evaluation metrics. Commonly used metrics include precision, recall, F1-score, and Mean Average Precision (MAP). Precision measures how many of the recommended items are relevant, recall measures how many of the relevant items were recommended, the F1-score is the harmonic mean of the two, and MAP also rewards placing relevant items near the top of the list. Choose the evaluation metrics that align with your system's goals and objectives. 📊🎯
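
To make these metrics concrete, here is a small sketch for a single user's top-k list. The item labels and the helper names are assumptions for illustration. Average precision is computed here by dividing by the number of relevant items, which is one common convention; MAP is then the mean of this value over all users.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision, recall and F1 for the first k recommended items."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def average_precision(recommended, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, running_sum = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            running_sum += hits / rank
    return running_sum / len(relevant) if relevant else 0.0

# Hypothetical ranked recommendations and ground-truth relevant items for one user.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "C", "F"}

p, r, f1 = precision_recall_f1_at_k(recommended, relevant, k=3)
ap = average_precision(recommended, relevant)
print(f"precision@3={p:.3f} recall@3={r:.3f} f1@3={f1:.3f} AP={ap:.3f}")
```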

✅ Real-world applications of recommendation systems: Recommendation systems have widespread applications in various domains. E-commerce platforms use them to suggest products to customers, social media platforms recommend posts or connections to users, and streaming services recommend movies or music based on user preferences. Explore these real-world applications to gain insights into how recommendation systems are utilized. 🌍💡

✅ Techniques for improving recommendation systems: Several techniques can enhance the performance of recommendation systems. Context-aware recommendations consider contextual factors such as time, location, or user demographics to provide more personalized suggestions. Active learning techniques actively engage users to provide feedback and improve the recommendations over time. Stay updated with these techniques to continually enhance your recommendation system. 🌟🔧

✅ Stay curious and keep learning: Recommender systems are a dynamic field with ongoing research and advancements. Stay curious, explore new algorithms and techniques, and keep learning to stay at the forefront of recommendation system design. This will enable you to adapt to evolving user preferences and deliver more accurate recommendations. 🌈📚

Remember, practice, experimentation, and continuous learning are the keys to mastering recommendation systems. Best of luck in your journey to create effective recommendation systems! 💪🚀



## 🤔 **Shortcomings to Keep in Mind for Recommender Systems** 🤔

Recommender systems are powerful tools for suggesting items or content to users based on their preferences and behavior. However, there are some shortcomings to keep in mind when working with recommender systems:

📊 Understanding recommendation systems can be complex due to the various techniques involved. There are three main types of recommendation systems: collaborative filtering, content-based filtering, and hybrid models. Each approach has its own strengths and limitations, and it's important to carefully consider which method is most suitable for a particular application.

🔍 Building recommender systems often relies on data availability and quality. Obtaining sufficient and reliable user-item data can be a challenge, especially for new or niche domains. Additionally, dealing with sparse data or the cold-start problem (when there is insufficient data for new users or items) requires careful handling and can affect the performance of the system.
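
A quick way to check for these data problems is to measure how sparse the user-item matrix is and to list users or items with no interactions. The sketch below uses a made-up matrix where 0 means "no rating"; the popularity fallback at the end is one simple, assumed strategy for cold-start users, not the only option.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 means "no rating".
ratings = np.array([
    [5, 0, 3, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 0],   # a brand-new user with no ratings
    [2, 0, 5, 1],
], dtype=float)

observed = ratings > 0
density = observed.sum() / ratings.size     # share of cells that hold a rating
sparsity = 1 - density

# Users and items without any interactions are cold-start cases.
cold_users = np.where(observed.sum(axis=1) == 0)[0]
cold_items = np.where(observed.sum(axis=0) == 0)[0]

# Simple fallback for cold-start users: rank items by how many ratings they have.
popularity = observed.sum(axis=0)
fallback_ranking = np.argsort(-popularity, kind="stable")

print(f"sparsity={sparsity:.3f}, cold users={cold_users}, cold items={cold_items}")
print("Popularity-based fallback ranking:", fallback_ranking)
```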

📈 Evaluating the effectiveness of recommendation systems can be challenging. Evaluation metrics such as precision, recall, F1-score, and others are commonly used to measure the performance of recommender systems. However, selecting the appropriate metrics and interpreting the results can be subjective and context-dependent. It's crucial to consider the specific goals and requirements of the application when evaluating the system.

📉 Translating the results of a recommender system into real-world impact can be difficult. While a recommendation algorithm may perform well in terms of accuracy, the actual user experience and satisfaction may depend on various factors beyond the algorithm itself, such as the presentation of recommendations or the overall user interface. It's important to consider user feedback and conduct user studies to assess the practical effectiveness of the system.

💡 Recommender systems can be further improved by incorporating additional techniques. Context-aware recommendations take into account contextual information such as time, location, or user demographics to provide more personalized recommendations. Active learning techniques can be used to actively acquire user feedback and adapt the system over time. Regular monitoring and maintenance are essential to keep the system up to date and ensure its continued performance.

🎓 By keeping these shortcomings in mind, you can approach the development and evaluation of recommender systems more effectively. Understanding the strengths and limitations of different techniques, considering data quality and availability, selecting appropriate evaluation metrics, and incorporating additional techniques can lead to more accurate and useful recommendations for users in various real-world applications.


# **🧠 Enhance Your Knowledge 🧠**

## **➕ Additional Reading ➕**

#### **If you are interested in learning more about Recommender Systems, here are some additional activities and readings you can explore:**

👨‍💻 Online Tutorials: Many online tutorials and courses cover the topics from this lesson in more depth, including collaborative filtering, content-based filtering, hybrid models, matrix factorization techniques, evaluation metrics, real-world applications, and techniques for improving recommendation systems. You can search for these tutorials on websites like Coursera, edX, or DataCamp.

📖 Books: There are several books that cover Recommender Systems in depth, including "Recommender Systems: An Introduction" by Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich, "Building Recommender Systems with Machine Learning and AI" by Frank Kane, and "Recommender Systems Handbook" edited by Francesco Ricci, Lior Rokach, and Bracha Shapira.

🎓 Practice Problems: To enhance your skills in Recommender Systems, you can find practice problems and datasets online. Websites like Kaggle or DataCamp often provide such resources for hands-on experience.

💡 Additional Tips: To gain a better understanding of Recommender Systems, try experimenting with different types of recommendation algorithms, and practice implementing them on your own datasets. Additionally, stay updated with the latest research papers and industry trends in Recommender Systems, as this field is constantly evolving.

🎓 By exploring these additional activities and readings, you can delve deeper into the world of Recommender Systems and become more proficient in designing and building effective recommendation systems.

### **Matrix factorization: Techniques for decomposing a user-item matrix into latent factors**

🔍 Matrix factorization is a popular technique used in recommender systems and collaborative filtering to decompose a user-item matrix into latent factors. It aims to uncover hidden patterns and relationships between users and items in the matrix, which can be used to make personalized recommendations.

📊 Decomposing a user-item matrix involves breaking it down into two lower-rank matrices: one representing users and their latent preferences, and the other representing items and their latent attributes. By multiplying these two matrices together, the original matrix can be approximated, filling in missing values and providing recommendations based on the learned latent factors.
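
In symbols, the decomposition can be written as follows. The notation is an assumption introduced here for illustration: $R$ is the user-item matrix, $P$ holds one row $p_u$ of latent factors per user, $Q$ holds one row $q_i$ per item, $\mathcal{K}$ is the set of observed user-item pairs, and $\lambda$ is a regularization strength.

$$
R \approx P Q^{\top}, \qquad \hat{r}_{ui} = p_u^{\top} q_i
$$

A common way to learn the factors is to minimize the regularized squared error over the observed ratings only:

$$
\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}} \left[ \left( r_{ui} - p_u^{\top} q_i \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right) \right]
$$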

🔍 There are several techniques available for matrix factorization, including:

**1. Singular Value Decomposition (SVD):** SVD is a widely used technique for matrix factorization. It decomposes the user-item matrix into three matrices: U, Σ, and V^T. U holds the user factors, Σ is a diagonal matrix containing the singular values, and V^T holds the item factors. Keeping only the k largest singular values, together with the corresponding columns of U and rows of V^T, yields a rank-k approximation of the original matrix.

**2. Non-negative Matrix Factorization (NMF):** NMF is a variation of matrix factorization that enforces non-negativity constraints on the resulting matrices. It is particularly useful when dealing with non-negative data, such as ratings or counts. NMF can provide interpretable latent factors and has been successful in various recommendation applications.

**3. Alternating Least Squares (ALS):** ALS is an iterative optimization algorithm commonly used for matrix factorization in collaborative filtering. It alternates between optimizing the user matrix and the item matrix, gradually improving the approximation of the original matrix. ALS is known for its scalability and ability to handle large datasets.
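
The alternating idea behind ALS can be sketched in a few lines of NumPy: fix the item factors and solve a small ridge regression per user, then fix the user factors and do the same per item. This is a minimal illustration under assumptions (a tiny dense matrix where 0 means "unobserved", and hypothetical values for `n_factors`, `reg`, and `n_iters`), not a production implementation.

```python
import numpy as np

def als(R, mask, n_factors=2, reg=0.1, n_iters=20, seed=0):
    """Alternating least squares on the observed entries of R (where mask is True)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))   # item factors
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix Q and solve a ridge regression for each user over the items they rated.
        for u in range(n_users):
            idx = mask[u]
            if idx.any():
                Qu = Q[idx]
                P[u] = np.linalg.solve(Qu.T @ Qu + ridge, Qu.T @ R[u, idx])
        # Fix P and solve a ridge regression for each item over the users who rated it.
        for i in range(n_items):
            idx = mask[:, i]
            if idx.any():
                Pi = P[idx]
                Q[i] = np.linalg.solve(Pi.T @ Pi + ridge, Pi.T @ R[idx, i])
    return P, Q

# Hypothetical ratings: 0 marks an unobserved entry.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
mask = R > 0

P, Q = als(R, mask)
R_hat = P @ Q.T   # predictions for every user-item pair, including unobserved ones
rmse = np.sqrt(np.mean((R_hat[mask] - R[mask]) ** 2))
print(f"RMSE on observed ratings: {rmse:.3f}")
```

Each inner solve involves only an `n_factors` by `n_factors` system, and the per-user and per-item solves are independent of one another, which is why ALS parallelizes well on large datasets.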

📊 The choice of matrix factorization technique depends on the specific requirements of the application, the characteristics of the user-item matrix, and computational considerations.

🔍 Python libraries such as NumPy, SciPy, and scikit-learn provide functions and implementations for matrix factorization techniques. Additionally, specialized libraries like Surprise and TensorRec offer more advanced recommender system capabilities, including matrix factorization-based algorithms.

📊 Here's an example of using SVD for matrix factorization in Python:

```python
import numpy as np
from scipy.linalg import svd

# Create a sample user-item matrix
user_item_matrix = np.array([[1, 2, 3],
                             [4, 5, 6],
                             [7, 8, 9]])

# Perform singular value decomposition (SVD)
U, Sigma, VT = svd(user_item_matrix)

# Select the desired rank for approximation
rank = 2
U = U[:, :rank]
Sigma = np.diag(Sigma[:rank])
VT = VT[:rank, :]

# Approximate the original matrix
approx_matrix = U @ Sigma @ VT
```

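In this example, we use the `svd` function from SciPy to perform SVD on a sample user-item matrix, then keep the two largest singular values to build a rank-2 approximation from the truncated `U`, `Sigma`, and `VT`. Note that this particular sample matrix already has rank 2 (its third row equals twice the second row minus the first), so the rank-2 approximation reproduces it exactly, up to floating-point error. A lower rank, or a matrix of higher rank, gives a genuinely lossy approximation. The resulting `approx_matrix` can be used for making recommendations or filling in missing values in the user-item matrix.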

### **Implementation: Matrix Factorization for Recommender Systems**

1. Import the necessary libraries: Import the `numpy` library as `np` and import the `svd` function from the `scipy.linalg` module.

2. Create a user-item matrix: Define a sample user-item matrix `user_item_matrix` with three users and three items. Feel free to modify the values if desired.

3. Perform singular value decomposition (SVD): Use the `svd` function to decompose the `user_item_matrix` into three matrices: `U`, `Sigma`, and `VT`.

4. Select the desired rank for approximation: Choose a rank value, which determines the number of latent factors to retain in the approximation. Assign the chosen rank value to the `rank` variable.

5. Modify the decomposition matrices: Keep only the first `rank` columns of `U`, the first `rank` singular values in `Sigma`, and the first `rank` rows of `VT`.

6. Approximate the original matrix: Compute the approximation of the original matrix by multiplying the modified `U`, `Sigma`, and `VT` matrices together. Assign the result to the `approx_matrix` variable.

7. Display the approximation matrix: Print the `approx_matrix` to observe the reconstructed matrix based on the selected rank.

(Optional) 8. Experiment with different rank values: Modify the rank value in step 4 and observe how the approximation improves or deteriorates as the rank changes.

(Optional) 9. Apply matrix factorization to real-world data: Use a larger user-item matrix, such as a movie rating dataset, and apply matrix factorization techniques to discover latent factors that can be used for recommendations.

Note: This activity provides an introduction to matrix factorization for recommender systems. Further exploration and experimentation can be done to delve deeper into the topic and explore more advanced techniques and algorithms.

In [None]:
import numpy as np
from scipy.linalg import svd

# Create a sample user-item matrix
user_item_matrix = np.array([[1, 2, 3],
                             [4, 5, 6],
                             [7, 8, 9]])

# Perform singular value decomposition (SVD)
U, Sigma, VT = svd(user_item_matrix)

# Select the desired rank for approximation
rank = 2
U = U[:, :rank]
Sigma = np.diag(Sigma[:rank])
VT = VT[:rank, :]

# Approximate the original matrix
approx_matrix = U @ Sigma @ VT
approx_matrix

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
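
Optional step 8 can be sketched as follows. Because the 3x3 sample matrix has rank 2, this sketch uses a different, made-up matrix so that the reconstruction error visibly changes with the rank. It uses `np.linalg.svd` (NumPy's counterpart to the SciPy function used above) and treats zeros as actual values rather than missing ratings, which is a simplifying assumption.

```python
import numpy as np

# Hypothetical user-item matrix (5 users, 4 items).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

# Thin SVD: s holds the singular values in descending order.
U, s, VT = np.linalg.svd(R, full_matrices=False)

def truncated_reconstruction(U, s, VT, rank):
    """Rebuild the matrix from the top `rank` singular values and vectors."""
    return U[:, :rank] @ np.diag(s[:rank]) @ VT[:rank, :]

errors = {}
for rank in range(1, len(s) + 1):
    approx = truncated_reconstruction(U, s, VT, rank)
    errors[rank] = np.linalg.norm(R - approx)   # Frobenius norm of the residual
    print(f"rank={rank}: reconstruction error = {errors[rank]:.4f}")
```

The error never increases as the rank grows and drops to roughly zero once all singular values are kept. A smaller rank gives a more compressed, smoother approximation at the cost of a larger error.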

## 🤖🌲 **Mnemonic** 🕵️‍♂️🦉

📖 Once upon a time, there was a data scientist named Alex who specialized in recommender systems. Alex worked for a technology company that provided personalized recommendations to its users. Their goal was to help users discover relevant items and enhance their overall experience.

🔍 Alex began by understanding different types of recommendation systems. They learned about collaborative filtering, which analyzes user behavior and similarities to recommend items based on the preferences of similar users. Alex also explored content-based filtering, which recommends items based on the characteristics and attributes of the items themselves. Additionally, they discovered hybrid models that combine collaborative and content-based approaches to provide more accurate recommendations.

📈 As Alex progressed, they realized the importance of evaluating recommendation systems. They learned about evaluation metrics such as precision, recall, and F1-score, which measure the accuracy and effectiveness of recommendations. Alex discovered that precision measures how many of the recommended items are relevant, recall measures how many of the relevant items were actually recommended, and the F1-score balances both metrics.

📉 Alex explored real-world applications of recommendation systems across various domains. They learned about e-commerce platforms using recommender systems to suggest products based on user preferences and browsing history. Social media platforms utilized recommendation systems to display relevant content to users, such as news articles, friends' posts, or suggested connections.

🌟 Alex's expertise in recommender systems proved invaluable for the technology company. Their accurate recommendations led to increased user engagement and satisfaction. By using evaluation metrics, Alex ensured the recommendations met the company's standards. They also discovered new techniques and applications that improved the overall recommendation system, providing a personalized experience for each user.

👍 Thanks to Alex's understanding of recommendation systems, the technology company flourished and gained a competitive edge in the market. Alex's expertise enabled the company to deliver relevant content and products, enhancing the user experience and driving business growth.

## **📖 Additional Reference Paper 📖**

1. https://neptune.ai/blog/recommender-systems-metrics
2. https://www.themarketingtechnologist.co/building-a-recommendation-engine-for-geek-setting-up-the-prerequisites-13/
3. https://www.algolia.com/blog/ai/the-anatomy-of-high-performance-recommender-systems-part-1/
4. https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada

# **Try it Yourself**

## **Task 1: Working on assignment**

### **Exploring Movie Ratings Dataset**

Dataset Description:
- **Ratings Dataset**: This dataset contains information about movie ratings. It includes the following columns:
  - `userId`: Unique identifier of the user who rated the movie.
  - `movieId`: Unique identifier of the movie.
  - `rating`: Rating given by the user to the movie.

- **Movies Dataset**: This dataset contains information about movies. It includes the following columns:
  - `movieId`: Unique identifier of the movie.
  - `title`: Title of the movie.

In this activity, you will perform various tasks using these datasets, including loading the data, calculating statistics, finding similar movies, and exploring movie ratings.

1. Import the required libraries: numpy, pandas, sklearn, matplotlib.pyplot, and seaborn.
2. Load the ratings and movies datasets from the given URLs.
3. Print the following information about the dataset:
   - Number of ratings
   - Number of unique movieIds
   - Number of unique users
   - Average number of ratings per user
   - Average number of ratings per movie
4. Create a DataFrame called `user_freq` that contains the count of ratings for each user.
5. Find the lowest and highest rated movies:
   - Find the movie with the lowest average rating and display its details.
   - Find the movie with the highest average rating and display its details.
   - Display the number of people who rated the movie with the highest rating.
   - Display the number of people who rated the movie with the lowest rating.
6. Calculate movie statistics by grouping the ratings DataFrame:
   - Group by 'movieId' and calculate the count and mean of ratings for each movie.
   - Store the results in a DataFrame called `movie_stats`.
7. Create a user-item matrix using sparse CSR matrix representation:
   - Implement the `create_matrix` function to convert the ratings DataFrame into a sparse CSR matrix.
   - The function should return the matrix `X`, user and movie mappers, and their inverse mappers.
8. Find similar movies using k-nearest neighbors (KNN):
   - Implement the `find_similar_movies` function to find similar movies to a given movie using KNN.
   - The function should take inputs: movie_id, matrix X, number of neighbors (k), metric, and show_distance.
   - It should return a list of similar movie_ids.
9. Define a dictionary called `movie_titles` that maps movieIds to movie titles.
10. Choose a movie_id (e.g., 3) and find similar movies using the `find_similar_movies` function.
11. Print the following statement: "Since you watched [movie_title]", replacing [movie_title] with the chosen movie's title.
12. Display the titles of the similar movies found.

(Note: This activity assumes that the necessary libraries are already installed in the Google Colab environment.)

In [None]:
# Importing the necessary libraries and dataset
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

ratings = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/ratings.csv")
ratings.head()

movies = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv")
movies.head()

# Write your code here


### **Solution**

In [None]:
# code
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

ratings = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/ratings.csv")
ratings.head()

movies = pd.read_csv("https://s3-us-west-2.amazonaws.com/recommender-tutorial/movies.csv")
movies.head()

n_ratings = len(ratings)
n_movies = len(ratings['movieId'].unique())
n_users = len(ratings['userId'].unique())

print(f"Number of ratings: {n_ratings}")
print(f"Number of unique movieId's: {n_movies}")
print(f"Number of unique users: {n_users}")
print(f"Average ratings per user: {round(n_ratings/n_users, 2)}")
print(f"Average ratings per movie: {round(n_ratings/n_movies, 2)}")

user_freq = ratings[['userId', 'movieId']].groupby('userId').count().reset_index()
user_freq.columns = ['userId', 'n_ratings']
user_freq.head()


# Find the lowest- and highest-rated movies
mean_rating = ratings.groupby('movieId')[['rating']].mean()
# Lowest-rated movie
lowest_rated = mean_rating['rating'].idxmin()
movies.loc[movies['movieId'] == lowest_rated]
# Highest-rated movie
highest_rated = mean_rating['rating'].idxmax()
movies.loc[movies['movieId'] == highest_rated]
# Ratings of the highest-rated movie (the row count is the number of people who rated it)
ratings[ratings['movieId']==highest_rated]
# Ratings of the lowest-rated movie (the row count is the number of people who rated it)
ratings[ratings['movieId']==lowest_rated]

# The movies above have very few ratings, so their plain averages are unreliable.
# A Bayesian average is a more robust alternative; it builds on the per-movie
# rating count and mean collected below.
movie_stats = ratings.groupby('movieId')[['rating']].agg(['count', 'mean'])
movie_stats.columns = movie_stats.columns.droplevel()

# Now, we create user-item matrix using scipy csr matrix
from scipy.sparse import csr_matrix

def create_matrix(df):

    N = len(df['userId'].unique())
    M = len(df['movieId'].unique())

    # Map Ids to indices
    user_mapper = dict(zip(np.unique(df["userId"]), list(range(N))))
    movie_mapper = dict(zip(np.unique(df["movieId"]), list(range(M))))

    # Map indices to IDs
    user_inv_mapper = dict(zip(list(range(N)), np.unique(df["userId"])))
    movie_inv_mapper = dict(zip(list(range(M)), np.unique(df["movieId"])))

    user_index = [user_mapper[i] for i in df['userId']]
    movie_index = [movie_mapper[i] for i in df['movieId']]

    X = csr_matrix((df["rating"], (movie_index, user_index)), shape=(M, N))

    return X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper

X, user_mapper, movie_mapper, user_inv_mapper, movie_inv_mapper = create_matrix(ratings)

from sklearn.neighbors import NearestNeighbors

def find_similar_movies(movie_id, X, k, metric='cosine', show_distance=False):
    """Find the k movies most similar to movie_id using KNN."""
    neighbour_ids = []

    movie_ind = movie_mapper[movie_id]
    movie_vec = X[movie_ind]
    k += 1  # ask for one extra neighbour: the query movie typically matches itself first
    kNN = NearestNeighbors(n_neighbors=k, algorithm="brute", metric=metric)
    kNN.fit(X)
    movie_vec = movie_vec.reshape(1,-1)
    neighbour = kNN.kneighbors(movie_vec, return_distance=show_distance)
    if show_distance:
        # kneighbors returns (distances, indices) when return_distance=True
        neighbour = neighbour[1]
    for i in range(0,k):
        n = neighbour.item(i)
        neighbour_ids.append(movie_inv_mapper[n])
    neighbour_ids.pop(0)  # drop the first match, which is the query movie itself
    return neighbour_ids


movie_titles = dict(zip(movies['movieId'], movies['title']))

movie_id = 3

similar_ids = find_similar_movies(movie_id, X, k=10)
movie_title = movie_titles[movie_id]

print(f"Since you watched {movie_title}")
for i in similar_ids:
    print(movie_titles[i])

Number of ratings: 100836
Number of unique movieId's: 9724
Number of unique users: 610
Average ratings per user: 165.3
Average ratings per movie: 10.37
Since you watched Grumpier Old Men (1995)
Grumpy Old Men (1993)
Striptease (1996)
Nutty Professor, The (1996)
Twister (1996)
Father of the Bride Part II (1995)
Broken Arrow (1996)
Bio-Dome (1996)
Truth About Cats & Dogs, The (1996)
Sabrina (1995)
Birdcage, The (1996)


## **Task 2: Community Engagement**

 **🔴Propose a Mnemonic to easily distinguish between Collaborative filtering and Content-based filtering. Share your insights in your Cohort group at AlmaBetter Community Platform.🔴**