
# Requirements & Deliverables

- _Use-case._ Explain how to interpret the _MovieLens_ dataset and what value predictions made from it bring to the business. Alternatively, use any dataset you prefer.
- _Solution Design._ Design a recommendation engine using linear algebraic methods. Document using markdown and LaTeX.
- _Algorithm Engineering._ Implement the engine using Python, numpy, and pandas, or using C++ with whatever linear algebra framework you prefer.
- _Scientific Evaluation._ Evaluate your engine following scientific principles.
- _Engineering Testing._
  - Add comments to your code.
  - Use readable function and variable names.
  - Respect the single-responsibility principle, i.e., a single function solves a single minor problem.
  - Test your code with varying inputs.
  - Document the test cases.

# Template

## Use-case

_Interpretation of the MovieLens dataset._ The MovieLens dataset is a well-known benchmark for recommendation systems. It contains user ratings of movies, along with metadata such as movie titles and genres. Each row of the ratings data typically represents one user's rating of one movie, identified by a user ID and a movie ID. The dataset is valuable for businesses such as movie streaming platforms, where personalized recommendations can improve user satisfaction, engagement, and retention.

## Solution Design

_Recommendation engine using linear algebraic methods._

- _Data representation._ Represent user-item interactions as a matrix, where rows represent users and columns represent items (movies).
- _Similarity calculation._ Use linear algebraic methods such as cosine similarity to calculate the similarity between users or items.
- _Recommendation generation._ For a given user, identify similar users or items based on similarity scores, and recommend items that they have rated highly but the given user hasn't seen yet.
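
The similarity and recommendation steps above can be written down as follows. This is a sketch of the notation only: the symbols $R$, $\mathbf{r}_u$, $N_k(u)$, and $C(u)$ are introduced here for illustration. Let $R$ be the user-item matrix, with $R_{ui} = 0$ when user $u$ has not rated movie $i$, and let $\mathbf{r}_u$ be row $u$ of $R$. The cosine similarity between users $u$ and $v$ is

$$
\operatorname{sim}(u, v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\lVert \mathbf{r}_u \rVert_2 \, \lVert \mathbf{r}_v \rVert_2}
$$

If $N_k(u)$ denotes the $k$ users most similar to $u$, the candidate movies for $u$ are those that $u$ has not rated but at least one neighbor has:

$$
C(u) = \{\, i : R_{ui} = 0 \ \text{and}\ R_{vi} \neq 0 \ \text{for some}\ v \in N_k(u) \,\}
$$

For example, with $\mathbf{r}_u = (4, 5, 0)$ and $\mathbf{r}_v = (3, 0, 4)$ the dot product is $12$ and the norms are $\sqrt{41}$ and $5$, so $\operatorname{sim}(u, v) = 12 / (5\sqrt{41}) \approx 0.375$.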

## Implementation

In [None]:
import numpy as np
import pandas as pd

def load_dataset(file_path):
    """
    Load the MovieLens dataset from a CSV file.
    """
    dataset = pd.read_csv(file_path)
    return dataset

def create_user_item_matrix(dataset):
    """
    Create the user-item interaction matrix from the dataset.
    """
    user_item_matrix = dataset.pivot_table(index='userId', columns='movieId', values='rating', fill_value=0)
    return user_item_matrix

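def cosine_similarity(A, B):
    """
    Calculate cosine similarity between two vectors.
    Returns 0.0 if either vector has zero norm (e.g. a user with no ratings).
    """
    norm_product = np.linalg.norm(A) * np.linalg.norm(B)
    if norm_product == 0:
        return 0.0
    return np.dot(A, B) / norm_product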

def recommend_movies(user_index, user_item_matrix, num_recommendations=3):
    """
    Recommend movies for a given user based on cosine similarity.
    user_index is the 0-based row position in the matrix, not the userId label.
    """
    user_ratings = user_item_matrix.iloc[user_index]
    similarities = []
    # Iterate by row position so every lookup below can use .iloc consistently
    for i in range(len(user_item_matrix)):
        if i != user_index:
            similarity = cosine_similarity(user_ratings, user_item_matrix.iloc[i])
            similarities.append((i, similarity))

    similarities.sort(key=lambda x: x[1], reverse=True)

    # Keep the highest neighbor rating per unrated movie so no movie appears twice
    best_ratings = {}
    for similar_user_index, similarity in similarities[:num_recommendations]:
        for movie_id, rating in user_item_matrix.iloc[similar_user_index].items():
            if user_ratings[movie_id] == 0 and rating != 0:
                best_ratings[movie_id] = max(rating, best_ratings.get(movie_id, 0))

    recommendations = sorted(best_ratings.items(), key=lambda x: x[1], reverse=True)
    return recommendations[:num_recommendations]

# Example usage
file_path = "movielens_dataset.csv"
dataset = load_dataset(file_path)
user_item_matrix = create_user_item_matrix(dataset)
user_index = 0
recommended_movies = recommend_movies(user_index, user_item_matrix)
print("Recommended movies for user", user_index, ":")
for movie_id, rating in recommended_movies:
    print("Movie ID:", movie_id, "with rating:", rating)
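
The loop in `recommend_movies` computes one similarity at a time. As a possible alternative, all pairwise user similarities can be obtained with a single matrix product after normalizing each row. The sketch below assumes the same zero-filled user-item matrix; `pairwise_cosine_similarity` and `toy_matrix` are hypothetical names introduced for illustration, and treating users without ratings as having similarity 0 is an assumption.

```python
import numpy as np
import pandas as pd


def pairwise_cosine_similarity(user_item_matrix):
    """Return a users x users DataFrame of cosine similarities (hypothetical helper)."""
    ratings = user_item_matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    # Assumption: a user with no ratings (zero norm) gets similarity 0 with everyone
    safe_norms = np.where(norms == 0, 1.0, norms)
    normalized = ratings / safe_norms
    # Row-normalized matrix times its transpose gives every pairwise cosine at once
    similarity = normalized @ normalized.T
    return pd.DataFrame(similarity,
                        index=user_item_matrix.index,
                        columns=user_item_matrix.index)


# Toy data: user 1 rated movies 1 and 2, user 2 rated movies 1 and 3, user 3 rated nothing
toy_matrix = pd.DataFrame(
    {1: [4, 3, 0], 2: [5, 0, 0], 3: [0, 4, 0]},
    index=pd.Index([1, 2, 3], name="userId"),
)
similarity_matrix = pairwise_cosine_similarity(toy_matrix)
print(similarity_matrix.round(3))
```

For users 1 and 2 this gives 12 / (5 * sqrt(41)), roughly 0.375. The result is indexed by `userId` labels, so it should be read with `.loc` rather than `.iloc`.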


## Engineering Testing

- _Unit tests._ Test each function individually to ensure it produces the expected output for different inputs.
- _Integration tests._ Test the entire recommendation engine pipeline to ensure that it produces meaningful recommendations.
- _Edge cases._ Test the engine with edge cases such as users with few ratings or rare movies to ensure robustness.
- _Performance testing._ Measure the performance of the engine, especially for large datasets, to ensure it scales efficiently.
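
One way to approach the performance measurement is to time the similarity computation on synthetic matrices of growing size. The sketch below is an illustration under assumptions, not a benchmark of the engine itself: `time_pairwise_similarity`, the 5% rating density, and the chosen sizes are all hypothetical.

```python
import time

import numpy as np


def time_pairwise_similarity(num_users, num_movies, density=0.05, seed=0):
    """Time an all-pairs cosine similarity on a synthetic rating matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Synthetic data: ratings from 1 to 5, with roughly `density` of the entries kept
    ratings = rng.integers(1, 6, size=(num_users, num_movies)).astype(float)
    ratings *= rng.random((num_users, num_movies)) < density

    start = time.perf_counter()
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    normalized = ratings / np.where(norms == 0, 1.0, norms)
    similarity = normalized @ normalized.T
    elapsed = time.perf_counter() - start
    return similarity.shape, elapsed


for num_users in (100, 200, 400):
    shape, elapsed = time_pairwise_similarity(num_users, num_movies=500)
    print(f"{num_users} users -> similarity {shape}, {elapsed:.4f} s")
```

Timings depend on the machine, so they are better compared across input sizes than against a fixed threshold.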

In [None]:
# Test case for loading dataset
def test_load_dataset():
    file_path = "movielens_dataset.csv"  # replace with the path to your ratings CSV file
    dataset = load_dataset(file_path)
    assert isinstance(dataset, pd.DataFrame)

# Test case for creating user-item matrix
def test_create_user_item_matrix():
    dataset = pd.DataFrame({'userId': [1, 1, 2, 2],
                            'movieId': [1, 2, 1, 3],
                            'rating': [4, 5, 3, 4]})
    user_item_matrix = create_user_item_matrix(dataset)
    assert user_item_matrix.shape == (2, 3)

# Test case for recommending movies
def test_recommend_movies():
    user_item_matrix = pd.DataFrame({'movie1': [4, 0],
                                     'movie2': [0, 3],
                                     'movie3': [0, 4]})
    user_index = 0
    recommended_movies = recommend_movies(user_index, user_item_matrix, num_recommendations=1)
    assert len(recommended_movies) == 1
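
The loading test above needs a real file on disk. A possible way to make it independent of the environment is to write a small CSV to a temporary directory first. The sketch below repeats `load_dataset` and `create_user_item_matrix` from the Implementation section so that it runs on its own; the test names and the sample data are hypothetical, and the column names `userId`, `movieId`, and `rating` are assumed to match the ratings file.

```python
import os
import tempfile

import pandas as pd


# Repeated from the Implementation section so this block runs on its own
def load_dataset(file_path):
    return pd.read_csv(file_path)


def create_user_item_matrix(dataset):
    return dataset.pivot_table(index='userId', columns='movieId',
                               values='rating', fill_value=0)


def make_sample_ratings():
    """Three ratings: user 1 rated movies 10 and 20, user 2 rated movie 10 only."""
    return pd.DataFrame({'userId': [1, 1, 2],
                         'movieId': [10, 20, 10],
                         'rating': [4.0, 5.0, 3.0]})


def test_load_dataset_from_temporary_file():
    # Write the sample to a temporary CSV, then load it back through load_dataset
    with tempfile.TemporaryDirectory() as directory:
        file_path = os.path.join(directory, "ratings_sample.csv")
        make_sample_ratings().to_csv(file_path, index=False)
        loaded = load_dataset(file_path)
    assert list(loaded.columns) == ['userId', 'movieId', 'rating']
    assert len(loaded) == 3


def test_unrated_pairs_are_filled_with_zero():
    # User 2 never rated movie 20, so that cell must be 0 in the matrix
    matrix = create_user_item_matrix(make_sample_ratings())
    assert matrix.shape == (2, 2)
    assert matrix.loc[2, 20] == 0
    assert matrix.loc[1, 20] == 5.0


test_load_dataset_from_temporary_file()
test_unrated_pairs_are_filled_with_zero()
print("all tests passed")
```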


## Scientific Evaluation

The recommendation engine can be evaluated offline with ranking metrics such as precision, recall, and mean average precision, computed by comparing each user's recommendations against ratings held out from the data used to build the user-item matrix. Additionally, A/B testing can be conducted to measure the impact of the recommendations on user engagement and retention.
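
The metrics named above can be sketched for a single user as follows. This is an illustration under assumptions rather than a fixed evaluation protocol: the function names are hypothetical, precision is divided by `k` (with `k` at least 1), and average precision is normalized by the number of relevant items. Mean average precision would then be the mean of `average_precision` over all evaluated users.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant (assumes k >= 1)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def average_precision(recommended, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical example: ranked movie IDs for one user and that user's held-out liked movies
recommended = [10, 20, 30, 40]
relevant = {10, 30, 50}

print("precision@2:", precision_at_k(recommended, relevant, 2))
print("recall@2:", recall_at_k(recommended, relevant, 2))
print("average precision:", average_precision(recommended, relevant))
```

In this example one of the top two recommendations is relevant, so precision@2 is 1/2 and recall@2 is 1/3; the relevant items sit at ranks 1 and 3, giving an average precision of (1 + 2/3) / 3.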