#### Collaborative Filtering Algorithm
Collaborative filtering is a technique used in recommendation systems to predict a user's preferences based on the preferences of similar users. It operates on the principle that if two users agree on one issue, they are likely to agree on others as well. There are two main types of collaborative filtering:

- User-based Collaborative Filtering: Recommends items to a user based on the preferences of other users who are similar to them.
- Item-based Collaborative Filtering: Recommends items that are similar to items the user has liked in the past.
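
To make the distinction concrete, here is a minimal sketch on a hypothetical 4-user, 3-item matrix (`toy_ratings` and its values are made up for illustration, with 0 meaning "not rated"). User-based filtering compares rows of the matrix, while item-based filtering compares columns. Cosine similarity is used here as one common choice of measure, not the only one.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, 0 = not rated.
toy_ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
    [0, 1, 4],
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))

# User-based view: each ROW is one user's ratings across all items.
user_0_vs_user_1 = cosine_similarity(toy_ratings[0], toy_ratings[1])
user_0_vs_user_2 = cosine_similarity(toy_ratings[0], toy_ratings[2])

# Item-based view: each COLUMN is one item's ratings across all users.
item_0_vs_item_1 = cosine_similarity(toy_ratings[:, 0], toy_ratings[:, 1])
item_0_vs_item_2 = cosine_similarity(toy_ratings[:, 0], toy_ratings[:, 2])

print(f"User 0 vs User 1: {user_0_vs_user_1:.2f}")
print(f"User 0 vs User 2: {user_0_vs_user_2:.2f}")
print(f"Item 0 vs Item 1: {item_0_vs_item_1:.2f}")
print(f"Item 0 vs Item 2: {item_0_vs_item_2:.2f}")
```

In this toy data, Users 0 and 1 have similar rows and Items 0 and 1 have similar columns, so both views pair them up.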

#### Generate Random Data
We can simulate a user-item rating matrix that represents users’ preferences for various items. Here’s how to generate random data suitable for collaborative filtering:

In [1]:
import numpy as np
import pandas as pd

# Set random seed for reproducibility
np.random.seed(42)

# Generate random user-item rating data
num_users = 10
num_items = 5

# Random ratings between 1 and 5 (0 indicates no rating)
ratings = np.random.randint(1, 6, size=(num_users, num_items))

# Set some values to 0 to simulate users not rating every item
mask = np.random.choice([1, 0], size=ratings.shape, p=[0.7, 0.3])
ratings *= mask

# Create a DataFrame for better visualization
ratings_df = pd.DataFrame(ratings, columns=[f'Item {i+1}' for i in range(num_items)],
                          index=[f'User {i+1}' for i in range(num_users)])

print(ratings_df)


         Item 1  Item 2  Item 3  Item 4  Item 5
User 1        4       5       3       0       5
User 2        2       3       0       3       5
User 3        4       3       0       2       4
User 4        0       0       5       1       4
User 5        0       0       0       1       1
User 6        0       3       2       4       4
User 7        3       4       4       1       3
User 8        5       0       0       1       0
User 9        4       0       0       2       2
User 10       1       2       5       2       4


The user-item rating matrix above captures each user's preferences: each row corresponds to a user, each column to an item, and each value is the rating that user gave that item. Here's a breakdown of what this data means:

#### Matrix Explanation
- Matrix Structure:
    - Rows: Represent individual users (e.g., User 1, User 2, etc.).
    - Columns: Represent different items (e.g., Item 1, Item 2, etc.).
    - Values: Numeric ratings assigned by users to items. In this case:
        - A value of 0 indicates that the user did not rate the item (i.e., no interaction).
        - Positive values (1-5) indicate the user's rating of the item, where a higher number typically signifies a greater preference or satisfaction with the item.

#### Interpretation of the Data
- User Ratings:
    - User 1 rated Item 1 with a 4, Item 2 with a 5, Item 3 with a 3, and Item 5 with a 5. They did not rate Item 4 (indicated by 0).
    - User 2 gave a rating of 2 to Item 1, 3 to Item 2, 3 to Item 4, and 5 to Item 5, with no rating for Item 3.
    - User 3 rated Item 1 as 4, Item 2 as 3, Item 4 as 2, and Item 5 as 4, while not interacting with Item 3.
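
Readings like these can also be pulled from the matrix programmatically. The sketch below hard-codes the matrix printed above so that it runs on its own; `describe_user` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np

# The rating matrix printed above (rows: User 1..10, columns: Item 1..5).
ratings = np.array([
    [4, 5, 3, 0, 5],
    [2, 3, 0, 3, 5],
    [4, 3, 0, 2, 4],
    [0, 0, 5, 1, 4],
    [0, 0, 0, 1, 1],
    [0, 3, 2, 4, 4],
    [3, 4, 4, 1, 3],
    [5, 0, 0, 1, 0],
    [4, 0, 0, 2, 2],
    [1, 2, 5, 2, 4],
])

def describe_user(ratings, user_index):
    """Split one user's row into rated items and unrated items (1-based labels)."""
    row = ratings[user_index]
    rated = {f"Item {i + 1}": int(r) for i, r in enumerate(row) if r > 0}
    unrated = [f"Item {i + 1}" for i, r in enumerate(row) if r == 0]
    return rated, unrated

for user_index in range(3):
    rated, unrated = describe_user(ratings, user_index)
    print(f"User {user_index + 1}: rated {rated}, not rated {unrated}")
```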

#### Implement Collaborative Filtering from Scratch
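Here's a simple implementation of user-based collaborative filtering. It uses cosine similarity to find the most similar users, then scores each item by the similarity-weighted sum of those users' ratings. Note that unrated entries (0) enter the similarity calculation as zeros: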

In [2]:
class CollaborativeFiltering:
    def __init__(self, ratings):
        self.ratings = ratings
    
    def _cosine_similarity(self, user_a, user_b):
        # Calculate the cosine similarity between two users
        dot_product = np.dot(user_a, user_b)
        norm_a = np.linalg.norm(user_a)
        norm_b = np.linalg.norm(user_b)
        return dot_product / (norm_a * norm_b) if norm_a and norm_b else 0

    def get_similar_users(self, target_user_index, top_n=3):
        # Get the ratings of the target user
        target_user_ratings = self.ratings[target_user_index]

        similarities = []
        for i in range(self.ratings.shape[0]):
            if i != target_user_index:
                similarity = self._cosine_similarity(target_user_ratings, self.ratings[i])
                similarities.append((i, similarity))

        # Sort by similarity and get the top_n similar users
        similar_users = sorted(similarities, key=lambda x: x[1], reverse=True)[:top_n]
        return similar_users
    
    def recommend(self, target_user_index):
        similar_users = self.get_similar_users(target_user_index)
        
        # Score every item rated by a similar user; this includes items
        # the target user has already rated
        recommendations = {}
        for user_index, similarity in similar_users:
            for item_index in range(self.ratings.shape[1]):
                if self.ratings[user_index][item_index] > 0:  # Only consider rated items
                    if item_index not in recommendations:
                        recommendations[item_index] = 0
                    recommendations[item_index] += self.ratings[user_index][item_index] * similarity

        # Sort recommendations by score
        recommended_items = sorted(recommendations.items(), key=lambda x: x[1], reverse=True)
        return recommended_items

# Example Usage
ratings_matrix = ratings_df.to_numpy()  # Use the ratings matrix from earlier
cf = CollaborativeFiltering(ratings_matrix)

# Get recommendations for User 1 (index 0)
recommendations = cf.recommend(0)
print("Recommendations for User 1:")
for item_index, score in recommendations:
    print(f"Item {item_index + 1} with score {score:.2f}")


Recommendations for User 1:
Item 5 with score 10.42
Item 2 with score 8.87
Item 1 with score 7.99
Item 4 with score 5.14
Item 3 with score 3.82
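
The scores above are unnormalized sums, and the list includes items User 1 has already rated. One possible variant, sketched here as a standalone function, predicts ratings only for unrated items and divides by the total similarity so that each prediction stays on the 1-5 scale. The name `recommend_unrated` and the normalization choice are assumptions for illustration, not part of the class above.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    return float(np.dot(a, b) / (norm_a * norm_b)) if norm_a and norm_b else 0.0

def recommend_unrated(ratings, target_user_index, top_n=3):
    """Predict ratings only for items the target user has not rated.

    Each prediction is a similarity-weighted average over those of the
    top_n most similar users who actually rated the item.
    """
    target = ratings[target_user_index]
    similarities = [
        (i, cosine_similarity(target, ratings[i]))
        for i in range(ratings.shape[0]) if i != target_user_index
    ]
    neighbours = sorted(similarities, key=lambda x: x[1], reverse=True)[:top_n]

    predictions = {}
    for item_index in np.flatnonzero(target == 0):
        weighted_sum, weight_total = 0.0, 0.0
        for user_index, similarity in neighbours:
            rating = ratings[user_index, item_index]
            if rating > 0:  # skip neighbours who did not rate this item
                weighted_sum += similarity * rating
                weight_total += similarity
        if weight_total > 0:
            predictions[int(item_index)] = weighted_sum / weight_total
    return sorted(predictions.items(), key=lambda x: x[1], reverse=True)

# The rating matrix printed earlier (rows: User 1..10, columns: Item 1..5).
ratings = np.array([
    [4, 5, 3, 0, 5], [2, 3, 0, 3, 5], [4, 3, 0, 2, 4], [0, 0, 5, 1, 4],
    [0, 0, 0, 1, 1], [0, 3, 2, 4, 4], [3, 4, 4, 1, 3], [5, 0, 0, 1, 0],
    [4, 0, 0, 2, 2], [1, 2, 5, 2, 4],
])

for item_index, predicted in recommend_unrated(ratings, 0):
    print(f"Item {item_index + 1}: predicted rating {predicted:.2f}")
```

For User 1 only Item 4 is unrated, so the result contains a single prediction.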


#### When to Use Collaborative Filtering
When to Use:
- Sufficient User Data: When there is a large amount of user interaction data available (ratings, clicks, etc.).
- User Similarity: When groups of users share similar preferences, so that one user's choices are informative about what similar users will like.
- No Item Metadata: Collaborative filtering can recommend items based purely on user behavior, without requiring detailed item descriptions.

When Not to Use:
- Cold Start Problem: When there are new users or items with insufficient data, leading to unreliable recommendations.
- Sparse Data: When the user-item matrix is highly sparse, making it challenging to find meaningful similarities.
- Scalability: For large datasets, computing similarities between all pairs of users or items becomes computationally expensive and may require optimizations such as sparse matrix representations.
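
As a rough illustration of the scalability point, the sketch below replaces a per-pair Python loop with a single matrix product that yields all user-user similarities at once. The function names `all_pairs_cosine` and `top_neighbours` are hypothetical. The approach still builds a full users-by-users matrix, so it is only a first step and would not be enough for very large datasets.

```python
import numpy as np

def all_pairs_cosine(ratings):
    """Cosine similarity between every pair of rows, via one matrix product."""
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # all-zero rows get similarity 0, not NaN
    unit = ratings / norms           # scale every row to unit length
    return unit @ unit.T

def top_neighbours(similarity, top_n=3):
    """Row indices of the top_n most similar users for every user."""
    sim = similarity.copy()
    np.fill_diagonal(sim, -np.inf)   # a user is never their own neighbour
    return np.argsort(-sim, axis=1)[:, :top_n]

# The rating matrix printed earlier (rows: User 1..10, columns: Item 1..5).
ratings = np.array([
    [4, 5, 3, 0, 5], [2, 3, 0, 3, 5], [4, 3, 0, 2, 4], [0, 0, 5, 1, 4],
    [0, 0, 0, 1, 1], [0, 3, 2, 4, 4], [3, 4, 4, 1, 3], [5, 0, 0, 1, 0],
    [4, 0, 0, 2, 2], [1, 2, 5, 2, 4],
])

similarity = all_pairs_cosine(ratings)
neighbours = top_neighbours(similarity)
print("Nearest neighbours of User 1 (0-based indices):", neighbours[0])
```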

In summary, collaborative filtering is a powerful technique for generating recommendations based on user interactions but may face challenges when data is sparse or when new users/items are introduced.
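
Both challenges come down to users having too few rated items in common. The sketch below, which reuses the matrix printed earlier and introduces a hypothetical all-zero `new_user` row, measures the fraction of empty cells and counts co-rated items. A user with no ratings shares no items with anyone, so their cosine similarity to every other user is 0.

```python
import numpy as np

# The rating matrix printed earlier (rows: User 1..10, columns: Item 1..5).
ratings = np.array([
    [4, 5, 3, 0, 5], [2, 3, 0, 3, 5], [4, 3, 0, 2, 4], [0, 0, 5, 1, 4],
    [0, 0, 0, 1, 1], [0, 3, 2, 4, 4], [3, 4, 4, 1, 3], [5, 0, 0, 1, 0],
    [4, 0, 0, 2, 2], [1, 2, 5, 2, 4],
])

rated = (ratings > 0).astype(int)    # 1 where a rating exists, else 0
sparsity = 1.0 - rated.mean()        # fraction of empty cells

# co_rated[i, j] = number of items that users i and j have both rated.
co_rated = rated @ rated.T

print(f"Sparsity: {sparsity:.0%}")
print("Items co-rated by User 1 and User 5:", co_rated[0, 4])

# Hypothetical brand-new user: no ratings, hence no overlap with anyone.
new_user = np.zeros(ratings.shape[1], dtype=int)
overlap_with_new_user = rated @ (new_user > 0).astype(int)
print("Overlap of the new user with each existing user:", overlap_with_new_user)
```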

#### Key Concepts

- Cold Start Problem:

    - New Users: If a new user does not have any ratings or interactions, it becomes challenging to make personalized recommendations. Strategies to mitigate this include using demographic information or asking users to rate a few items at the beginning.
    - New Items: New items without user ratings cannot be recommended until sufficient interaction data is available.

- Sparsity:

    - In many real-world scenarios, the user-item interaction matrix is sparse, meaning most users do not interact with most items. This sparsity can hinder the effectiveness of collaborative filtering algorithms.

- Scalability:

    - As the number of users and items increases, the computational complexity of calculating similarities and generating recommendations also increases. Efficient algorithms and data structures (like sparse matrices) are necessary for scaling.

- Diversity and Serendipity:

    - It's essential to ensure that recommendations are not only accurate but also diverse and serendipitous to enhance user experience. This means including items that the user may not have considered but are still relevant.

- Evaluation Metrics:

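    - Common evaluation metrics for recommendation systems include Precision, Recall, and F1 Score, which measure the quality of the recommended item list, and Mean Absolute Error (MAE), which measures how far predicted ratings are from actual ratings. Evaluating the system's performance is crucial to ensuring it meets user needs.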
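
The metrics in the last point can be sketched in a few lines. The functions below use one common set of definitions (precision and recall over the top `k` recommendations, and MAE over predicted ratings). The example data is hypothetical and not derived from the rating matrix above.

```python
import numpy as np

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommended items."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical example: a ranked list of item indices and the items the
# user actually liked.
recommended = [4, 1, 3, 0]
relevant = {1, 2, 3}
precision, recall = precision_recall_at_k(recommended, relevant, k=3)
print(f"Precision@3: {precision:.2f}, Recall@3: {recall:.2f}, "
      f"F1: {f1_score(precision, recall):.2f}")

# Hypothetical held-out ratings and the model's predictions for them.
mae = mean_absolute_error([4, 3, 5], [3.5, 3.0, 4.0])
print(f"MAE: {mae:.2f}")
```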