# AIML Engineer Take-Home Project


## Exercise Info and Requirements

### Objective
You are tasked with building a recommendation system using a provided public dataset,
focusing on both classical recommendation models and modern LLM-based approaches.


### Project Requirements
Package your application as a service using a suitable framework, e.g. FastAPI or Flask. We
are intentionally not being prescriptive here, but do your best to demonstrate your
understanding of best practices when building your solution.
1. Dataset
   - Use the [Anime Recommendations Database from Kaggle](https://www.kaggle.com/datasets/CooperUnion/anime-recommendations-database). This dataset contains information on various anime, including user ratings, genres, and other relevant attributes.
2. Data Endpoint
   - Your service should have an API endpoint to query the dataset.
3. Classical Recommendation System
   - Build a recommendation system that utilizes the dataset to suggest the top k anime for a user based on their viewing history and preferences.
   - Ensure that the recommendation logic excludes recently viewed items (e.g., anime watched within the last 7 days).
4. Contextual LLM-Based Personalization
   - Implement a feature where users can get personalized anime recommendations based on a natural language description of their current mood or preferences (e.g., "I want something uplifting and adventurous").




### Submission
Submit a single .zip file that includes:
- All source code
- A System Design doc
  - Provide a document (e.g. `SYSTEM_DESIGN.md`) that explains your choices and the architecture.
  - Discuss how you would extend the current system to make it more accurate in response to more “vague” user input.
  - Include recommendations on how to transition this to an LLM deployed in-house.
- A presentation slide deck to present your project for 30 minutes during a panel interview
  - Can include information from the System Design doc

## Exercise

In [None]:
# !pip install pandas
# !pip install numpy
# # !pip install seaborn
# !pip install scikit-learn


### First ideas

My first idea is to use non-negative matrix factorization (`NMF`), but let's first look at the current state of the art (SOTA) for recommender systems.

#### Current state of the art

There is no single best algorithm for recommender systems. Solutions for the Netflix Prize include[^1][^2]:
- Decomposition models (SVD, NMF, SVD++, etc.)
- Restricted Boltzmann Machines (RBM)
- Decision tree-based methods (Gradient Boosted Decision Trees, etc.)
- Neural networks
- Support Vector Machines (SVM)

It is also common to blend several models, as in the BellKor solution to the Netflix Grand Prize[^3]. This makes sense, since ensembles often outperform their individual members.

#### Way to go

Considering this and the scope of this project, I will test simple models.


[^1]: Stephen Gower. [Netflix Prize and SVD](http://buzzard.ups.edu/courses/2014spring/420projects/math420-UPS-spring-2014-gower-netflix-SVD.pdf). April 18, 2014.

[^2]: [Netflix Recommendations: Beyond the 5 stars (Part 2)](https://netflixtechblog.com/netflix-recommendations-beyond-the-5-stars-part-2-d9b96aa399f5)

[^3]: [The BellKor Solution to the Netflix Grand Prize](https://www2.seas.gwu.edu/~simhaweb/champalg/cf/papers/KorenBellKor2009.pdf)

Considering this and the scope of this project, I will use the NMF algorithm.
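
As a rough illustration of the idea, the sketch below factorizes a toy user-item matrix with scikit-learn's `NMF`. The matrix values and the number of components are invented for illustration, and the `nmf` helper from `utils` used later in this notebook may work differently. Note that plain NMF treats the zeros as real ratings rather than missing values, which is a known limitation of this simple setup.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy user-item rating matrix (rows: users, columns: anime).
# 0 stands for "not rated"; all values are made up for this example.
R = np.array([
    [10, 8, 0, 0],
    [9, 0, 0, 1],
    [0, 0, 7, 9],
    [1, 0, 8, 10],
], dtype=float)

# Factorize R ~ W @ H with 2 latent components (an arbitrary choice here).
model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(R)   # user factors, shape (n_users, 2)
H = model.components_        # item factors, shape (2, n_items)

# Reconstructed scores, including the cells the users have not rated.
R_hat = W @ H
```

The entries of `R_hat` in the unrated positions can then be read as predicted preferences.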

### Feature engineering

Some ideas for features to be used in the model:
- User's anime ratings
- User's watching history
- User's history by anime type
- User's history by episode count
- Anime's average rating
- Anime's popularity
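
A minimal sketch of how a few of these features could be derived with pandas is shown below. The toy data and the column names (`user_id`, `anime_id`, `rating`) are assumptions modeled on the Kaggle ratings file, and the feature names (`avg_rating`, `n_watchers`, and so on) are hypothetical.

```python
import pandas as pd

# Toy ratings table; -1 marks "watched but not rated".
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3],
    "anime_id": [10, 20, 10, 30, 20],
    "rating":   [8, -1, 6, 9, 7],
})

# Only explicit ratings should contribute to averages.
rated = ratings[ratings["rating"] != -1]

# Item-level features: average rating and popularity (distinct watchers).
item_features = pd.DataFrame({
    "avg_rating": rated.groupby("anime_id")["rating"].mean(),
    "n_watchers": ratings.groupby("anime_id")["user_id"].nunique(),
})

# User-level features: mean given rating and size of the watching history.
user_features = pd.DataFrame({
    "mean_rating": rated.groupby("user_id")["rating"].mean(),
    "n_watched": ratings.groupby("user_id")["anime_id"].nunique(),
})
```

Popularity is counted over all rows, including the unrated `-1` entries, because watching an anime without rating it still signals interest.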

In [1]:
# import numpy as np
# import pandas as pd
# import matplotlib.pyplot as plt
# from sklearn.metrics import mean_squared_error
# from sklearn.model_selection import train_test_split
# from sklearn.pipeline import Pipeline
# from sklearn.linear_model import LinearRegression

from utils import *

In [3]:
ratings, anime = preprocess_data(RATINGS_PATH, ANIME_PATH)

In [5]:
rated = ratings[ratings.rating != -1]
w1, h1, user_item_matrix = nmf(rated.iloc[0:1000], redo=True)

NMF components (w1 and h1) have been saved to 'data/nmf_components.pkl'
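
Given the two factor matrices, top-k recommendations for a user can be read off the reconstructed scores. The sketch below is one possible approach, not the `recommend` function from `utils`: `recommend_top_k` is a hypothetical helper, it assumes plain NumPy arrays where 0 means "not rated", and the `exclude` argument stands in for recently viewed items (the dataset itself has no timestamps).

```python
import numpy as np

def recommend_top_k(W, H, user_item_matrix, user_idx, k=5, exclude=()):
    """Return column indices of the k highest-scoring unseen items for one user."""
    scores = W[user_idx] @ H                       # predicted score for every item
    seen = user_item_matrix[user_idx] > 0          # items the user already rated
    scores = np.where(seen, -np.inf, scores)       # never recommend seen items
    exclude_idx = np.fromiter(exclude, dtype=int)  # e.g. recently viewed items
    scores[exclude_idx] = -np.inf
    k = min(k, int(np.isfinite(scores).sum()))     # do not return masked items
    return np.argsort(-scores)[:k].tolist()

# Tiny hand-made example with 2 users, 2 latent factors, and 4 items.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
H = np.array([[5.0, 4.0, 3.0, 0.0], [0.0, 1.0, 2.0, 6.0]])
M = np.array([[5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 6.0]])

top_for_user_0 = recommend_top_k(W, H, M, user_idx=0, k=2)
top_without_recent = recommend_top_k(W, H, M, user_idx=0, k=2, exclude=[1])
```

The returned values are column positions in the matrix, so they still need to be mapped back to `anime_id` values.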


In [10]:
from utils import recommend
