
# Anime Recommendation System  
### Content-Based Recommendation using Genres & Numeric Features

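This notebook implements a **content-based anime recommendation system**.
It computes cosine similarity **on demand**, comparing only the query title against every other title, instead of storing a full similarity matrix.
This keeps memory proportional to the feature matrix: a full 12,232 × 12,232 similarity matrix of 64-bit floats would need about 1.2 GB, whereas each query only needs a single pass over the features.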



## Project Pipeline

1. Load raw anime dataset  
2. Preprocess and clean data  
3. Feature engineering (genres + numeric features)  
4. Generate recommendations  
5. Evaluate recommendation quality  



## Dataset Description

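The raw file `data/raw/anime.csv` (12,294 rows, 7 columns) contains:
- anime_id – Anime identifier  
- name – Anime title  
- genre – Comma-separated genres  
- type – Release format (e.g. TV, Movie)  
- episodes – Episode count  
- rating – Average user rating  
- members – Number of community members (a popularity signal)  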


In [1]:

```python
# Step 1: Load dataset
from src.load_data import load_raw_dataset

df_raw = load_raw_dataset()
df_raw.head() if df_raw is not None else None
```


```text
[INFO] Loading dataset from: data/raw/anime.csv
[INFO] Dataset loaded successfully.
[INFO] Dataset shape: (12294, 7)
[INFO] Columns: ['anime_id', 'name', 'genre', 'type', 'episodes', 'rating', 'members']
```


|   | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |



## Data Preprocessing

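Cleaning steps:
- Remove duplicate rows
- Drop rows without a title or genre
- Convert `rating`, `members`, and `episodes` to numeric (invalid values become NaN)
- Fill missing numeric values with the column median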
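The cleaning steps above can be sketched in pandas as below. This is a hypothetical re-creation, not the contents of `src/preprocess.py`: the function name `preprocess_sketch` and the toy rows are illustrative, and only the order of operations is taken from the log printed by the next cell.

```python
import pandas as pd

NUMERIC_COLUMNS = ["rating", "members", "episodes"]


def preprocess_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the cleaning steps (not src/preprocess.py)."""
    df = df.drop_duplicates()
    # Rows without a title or genre cannot be described by genre features
    df = df.dropna(subset=["name", "genre"]).copy()
    for col in NUMERIC_COLUMNS:
        # Non-numeric entries (e.g. placeholder strings) become NaN ...
        df[col] = pd.to_numeric(df[col], errors="coerce")
        # ... and are then replaced by the column median
        df[col] = df[col].fillna(df[col].median())
    return df.reset_index(drop=True)


# Toy input: one row lacks a title, and each numeric column has one bad value
raw = pd.DataFrame({
    "name": ["A", "B", "C", None],
    "genre": ["Action", "Comedy, Drama", "Action, Drama", "Action"],
    "rating": ["8.0", "bad", "6.0", "7.0"],
    "members": [100, 200, None, 50],
    "episodes": ["12", "24", "Unknown", "1"],
})
clean = preprocess_sketch(raw)
print(clean)
```

Filling with the median rather than the mean keeps a few extreme values (for example, very long-running series in `episodes`) from pulling the fill value upward.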


In [2]:

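```python
# Step 2: Preprocess data
from src.preprocess import preprocess_dataset, save_cleaned_dataset

# Stop early so later cells never reference an undefined df_clean
if df_raw is None:
    raise RuntimeError("Raw dataset could not be loaded; see the log above.")

df_clean = preprocess_dataset(df_raw)
save_cleaned_dataset(df_clean)

# Top-level expression, so Jupyter actually renders the preview
df_clean.head()
```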


```text
[2025-12-30 18:53:44] [INFO] Starting preprocessing...
[2025-12-30 18:53:44] [INFO] Removed 0 duplicate rows.
[2025-12-30 18:53:44] [INFO] Dropped rows without title/genre.
[2025-12-30 18:53:44] [INFO] Converted rating to numeric (invalid values → NaN).
[2025-12-30 18:53:44] [INFO] Filled missing values in 'rating' using median.
[2025-12-30 18:53:44] [INFO] Converted members to numeric (invalid values → NaN).
[2025-12-30 18:53:44] [INFO] Filled missing values in 'members' using median.
[2025-12-30 18:53:44] [INFO] Converted episodes to numeric (invalid values → NaN).
[2025-12-30 18:53:44] [INFO] Filled missing values in 'episodes' using median.
[2025-12-30 18:53:44] [INFO] Preprocessing completed successfully.
[2025-12-30 18:53:44] [INFO] Created folder: data/processed
[2025-12-30 18:53:44] [INFO] Cleaned dataset saved to: data/processed/cleaned_anime.csv
```



## Feature Engineering

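Features include:
- Custom TF-IDF-like genre encoding over the genre vocabulary (43 genres in this run)
- Scaled numeric features (rating, members, episodes)
- L2-normalized feature vectors, so that cosine similarity between two titles reduces to a dot product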
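A minimal sketch of the three feature steps follows. It assumes a smoothed IDF weight on binary genre indicators (the formula scikit-learn's `TfidfTransformer` uses by default, `ln((1 + n) / (1 + df)) + 1`) and min-max scaling for the numeric columns. Both choices, and the helper name `build_features`, are assumptions; the notebook only says the genre encoding is "TF-IDF-like" and the numeric features are "scaled".

```python
import numpy as np


def build_features(genres: list[str], numeric: np.ndarray) -> tuple[list[str], np.ndarray]:
    """Hypothetical sketch: IDF-weighted genre indicators + min-max scaled
    numeric columns, with every row L2-normalized."""
    # Parse "Action, Comedy" style strings into sets of genre tokens
    genre_sets = [{g.strip() for g in s.split(",") if g.strip()} for s in genres]
    vocab = sorted(set().union(*genre_sets))
    column = {g: j for j, g in enumerate(vocab)}

    n = len(genre_sets)
    genre_matrix = np.zeros((n, len(vocab)))
    for i, gs in enumerate(genre_sets):
        for g in gs:
            genre_matrix[i, column[g]] = 1.0

    # Smoothed IDF: genres shared by fewer titles receive larger weights
    doc_freq = genre_matrix.sum(axis=0)
    idf = np.log((1 + n) / (1 + doc_freq)) + 1.0
    genre_matrix *= idf

    # Min-max scale each numeric column into [0, 1]
    numeric = np.asarray(numeric, dtype=float)
    col_min = numeric.min(axis=0)
    col_range = numeric.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant column: avoid division by zero
    numeric_scaled = (numeric - col_min) / col_range

    # Concatenate and L2-normalize each row
    features = np.hstack([genre_matrix, numeric_scaled])
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return vocab, features / norms


genres = ["Action, Comedy", "Action, Drama", "Romance"]
numeric = np.array([[8.0, 1000, 12], [7.0, 500, 24], [9.0, 2000, 1]])  # rating, members, episodes
vocab, X = build_features(genres, numeric)
print(vocab)    # ['Action', 'Comedy', 'Drama', 'Romance']
print(X.shape)  # (3, 7)
```

On the real data, a 43-genre vocabulary plus three numeric columns would give 43 + 3 = 46 columns, matching the shape reported by the next cell.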


In [3]:

```python
# Step 3: Feature engineering
from src.feature_engineering import extract_features, save_feature_artifacts

config, feature_matrix = extract_features(df_clean)
save_feature_artifacts(config, feature_matrix)

feature_matrix.shape
```


```text
[2025-12-30 18:53:44] [INFO] Starting Feature Engineering...
[2025-12-30 18:53:44] [INFO] Building genre vocabulary...
[2025-12-30 18:53:44] [INFO] Genre vocabulary size: 43
[2025-12-30 18:53:44] [INFO] Encoding genre vectors...
[2025-12-30 18:53:44] [INFO] Scaling numeric features...
[2025-12-30 18:53:44] [INFO] Final feature matrix shape: (12232, 46)
[TIMER] Feature Engineering completed in 0.06 seconds.
[INFO] Created folder: models
[2025-12-30 18:53:44] [INFO] Feature artifacts saved.
```


```text
(12232, 46)
```


## Recommendation Example

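Recommendations are generated by computing cosine similarity on the fly: the query title's feature vector is compared with every other title, and the 10 most similar titles (excluding the query itself) are returned, sorted by similarity score.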
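On-demand retrieval for one query can be sketched with NumPy: a single matrix-vector product scores the query against all titles in O(n·d) time, and `np.argpartition` selects the top k without sorting the whole array. The function `recommend_top_k` is hypothetical; it assumes L2-normalized rows and leaves out the title-to-row lookup that `recommend_anime` needs for a name such as "Naruto".

```python
import numpy as np


def recommend_top_k(features: np.ndarray, query_idx: int, k: int = 10) -> list[tuple[int, float]]:
    """Hypothetical sketch: the k rows most cosine-similar to features[query_idx].

    Assumes every row of `features` is already L2-normalized, so a dot
    product equals cosine similarity. No n x n matrix is ever built.
    """
    # One matrix-vector product: similarity of the query to every title, O(n * d)
    scores = features @ features[query_idx]
    scores[query_idx] = -np.inf  # never recommend the query title itself
    k = min(k, len(scores) - 1)
    if k <= 0:
        return []
    # argpartition isolates the k largest scores without a full sort ...
    top = np.argpartition(-scores, k - 1)[:k]
    # ... and only those k are sorted, highest similarity first
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]


# Toy matrix with unit-length rows
X = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]])
print(recommend_top_k(X, query_idx=0, k=2))  # row 1 (score 0.8) first, then row 3 (score 0.6)
```

Only the n scores for the current query exist at any time, which is what keeps memory proportional to the feature matrix rather than to n².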


In [4]:

```python
# Step 4: Get recommendations
from src.recommend import recommend_anime

recommend_anime("Naruto")
```


```text
[2025-12-30 18:53:44] [INFO] Computing recommendations for 'Naruto'...
[2025-12-30 18:53:44] [INFO] Loaded cleaned dataset.
[TIMER] Recommendation Completed completed in 0.04 seconds.
```


|   | Recommended Anime | Genre | Rating | Similarity Score |
|---|---|---|---|---|
| 615 | Naruto: Shippuuden | Action, Comedy, Martial Arts, Shounen, Super P... | 7.94 | 0.94231 |
| 582 | Bleach | Action, Comedy, Shounen, Super Power, Supernat... | 7.95 | 0.898968 |
| 206 | Dragon Ball Z | Action, Adventure, Comedy, Fantasy, Martial Ar... | 8.32 | 0.890968 |
| 6 | Hunter x Hunter (2011) | Action, Adventure, Shounen, Super Power | 9.13 | 0.877171 |
| 86 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | 8.54 | 0.868898 |
| 346 | Dragon Ball | Adventure, Comedy, Fantasy, Martial Arts, Shou... | 8.16 | 0.856454 |
| 175 | Katekyo Hitman Reborn! | Action, Comedy, Shounen, Super Power | 8.37 | 0.837497 |
| 281 | Kill la Kill | Action, Comedy, School, Super Power | 8.23 | 0.836572 |
| 288 | Fairy Tail | Action, Adventure, Comedy, Fantasy, Magic, Sho... | 8.22 | 0.822927 |
| 304 | D.Gray-man | Action, Adventure, Comedy, Shounen | 8.2 | 0.811815 |



## Evaluation

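Evaluation metrics, computed on the top-K recommendations for a set of query titles:
- Precision@K: fraction of the top-K recommendations that are relevant
- Recall@K: fraction of all relevant titles that appear in the top K
- F1-score@K: harmonic mean of Precision@K and Recall@K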
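The standard per-query definitions can be written as a small function. The notebook does not show how `src.evaluation` decides which titles are relevant, so this sketch simply takes the relevant set as an input; `precision_recall_f1_at_k` is a hypothetical name, although its return keys mirror the dictionary printed at the end of this section.

```python
def precision_recall_f1_at_k(
    recommended: list[str], relevant: set[str], k: int = 10
) -> dict[str, float]:
    """Per-query Precision@K, Recall@K and F1@K (standard definitions)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    # A "hit" is a recommended title that belongs to the relevant set
    hits = sum(1 for title in top_k if title in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean; defined as 0 when both components are 0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1}


# Toy example: "A" and "D" are relevant hits, "E" is relevant but missed
metrics = precision_recall_f1_at_k(["A", "B", "C", "D"], {"A", "D", "E"}, k=4)
print(metrics)  # precision 0.5, recall 2/3, F1 4/7
```

A system-level score is then typically the mean of these values over the evaluated query titles. Under these definitions, a perfect 1.0 on both Precision@K and Recall@K means every evaluated query had exactly K relevant titles and all of them were recommended, so perfect scores hinge on the relevance rule used in `src.evaluation`.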


In [5]:

```python
# Step 5: Evaluate system
from src.evaluation import evaluate_system
from src.recommend import load_cleaned_data

df_eval = load_cleaned_data()
evaluate_system(df_eval)
```


```text
[2025-12-30 18:53:44] [INFO] Loaded cleaned dataset.
[2025-12-30 18:53:44] [INFO] Starting system evaluation...
[2025-12-30 18:53:44] [INFO] Evaluating anime: X OVA
[2025-12-30 18:53:44] [INFO] Computing recommendations for 'X OVA'...
[2025-12-30 18:53:44] [INFO] Loaded cleaned dataset.
[TIMER] Recommendation Completed completed in 0.02 seconds.
[2025-12-30 18:53:44] [INFO] Evaluating anime: Kyuukyoku no Sex Adventure Kamasutra
[2025-12-30 18:53:44] [INFO] Computing recommendations for 'Kyuukyoku no Sex Adventure Kamasutra'...
[2025-12-30 18:53:44] [INFO] Loaded cleaned dataset.
[TIMER] Recommendation Completed completed in 0.02 seconds.
[2025-12-30 18:53:44] [INFO] Evaluating anime: Son Gokuu no Koutsuu Rule Shugyou Chuu
[2025-12-30 18:53:44] [INFO] Computing recommendations for 'Son Gokuu no Koutsuu Rule Shugyou Chuu'...
[2025-12-30 18:53:44] [INFO] Loaded cleaned dataset.
[TIMER] Recommendation Completed completed in 0.02 seconds.
[2025-12-30 18:53:44] [INFO] Evaluating anime: Islan
```

```text
{'precision': 1.0, 'recall': 1.0, 'f1_score': 1.0}
```


## Conclusion

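This notebook demonstrates a modular content-based recommendation system: loading, preprocessing, feature engineering, recommendation, and evaluation each live in their own `src` module.
On the 12,232-title cleaned dataset, feature engineering took about 0.06 s and each recommendation query took 0.02 to 0.04 s in this run, without ever building the full similarity matrix, which makes the approach practical for real-world catalogs of this size.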
