Students: 

fc66661 Iaroslav Sagan

fc66662 Anna Maksymchuk

fc66663 Mariia Samosudova

Advanced Databases Course Project: Recommendation System

Project Description

Infrastructure and Data Source 

In [1]:
import sys
import os

project_root = os.getcwd()
if project_root not in sys.path:
    sys.path.append(project_root)
    print(f"Added project root to path: {project_root}")

Added project root to path: c:\MariaSamosudova\Projects\UNIVER\ADB\Project\MARS_1.0\project_db


In [7]:
# Index & Utility Managers
from dotenv import load_dotenv
from schema_manager import SchemaManager
from mongoDB_tools import MongoTools
from mysql_index_manager import MySQLIndexManager
from mongo_index_manager import MongoIndexManager
from database_connection_manager import DatabaseConnectionManager
from pathlib import Path

# Loaders and Data Processors
from load_mySQL import LoadMySQL
from load_MongoDB import LoadMongoDB

# Test and Recommendation Logic
from anime_recommendation import AnimeRecommendation
from concurrency_rating_test import ConcurrencyRatingTest
from performance_test import PerformanceTest

# Load environment variables
load_dotenv()

# Define configuration (using os.getenv)
MYSQL_CONFIG = {
    "host": os.getenv("MYSQL_HOST"),
    "user": os.getenv("MYSQL_USER"),
    "password": os.getenv("MYSQL_PASSWORD"),
    "database": os.getenv("MYSQL_DB")
}

MONGO_CONFIG = {
    "uri": os.getenv("MONGO_URI"),
    "db_name": os.getenv("MONGO_DB_NAME"),
    "collection": os.getenv("MONGO_COLLECTION", "animes")
}

DATA_DIR = Path(r"C:\MariaSamosudova\Projects\UNIVER\ADB\Project\MARS_1.0\data")
CURRENT_DATASET = DATA_DIR.joinpath(r"DataSet\anime_hernan4444")

CSV_PATHS = {
    "anime": CURRENT_DATASET.joinpath(r"anime.csv"),
    "animelist": CURRENT_DATASET.joinpath(r"animelist.csv"),
    "profiles": CURRENT_DATASET.joinpath(r"profiles.csv"),
    "ratings": CURRENT_DATASET.joinpath(r"rating_complete.csv")
}

# Initialize Connection Manager
db_manager = DatabaseConnectionManager(MYSQL_CONFIG, MONGO_CONFIG)
mongo_tools = MongoTools(db_manager)

# Initialize specific managers
mysql_idx_mgr = MySQLIndexManager(db_manager)
mongo_idx_mgr = MongoIndexManager(mongo_tools)

print("\nReady to run methods.")

Connected to MySQL
Connected to MongoDB

Ready to run methods.


Chapter 1: Data Selection, Modeling, and Database Creation

TODO:
● Select and analyze the dataset to identify relationships and data hierarchies.
● Design a relational schema for MySQL and a document schema for MongoDB.
● Decide which data should reside in each system:
○ MySQL: structured, transactional, or reference data.
○ MongoDB: nested, user-driven, or flexible data.
● Clean (if needed) and import data.
● Ensure integration points between the databases (e.g., shared user or item IDs).
● ER diagram and schema definitions (SQL DDL + MongoDB schema documentation)
● Data loading scripts

The dataset we are working with: https://www.kaggle.com/datasets/hernan4444/anime-recommendation-database-2020

In [8]:
# Example data-loading scripts for MySQL and MongoDB
schema_manager = SchemaManager(db_manager)
schema_manager.create_schema()  # create the MySQL tables before loading data

load_my_sql = LoadMySQL(db_manager)
load_my_sql.load_paths(CSV_PATHS)
load_my_sql.load_genres()

load_mongo = LoadMongoDB(mongo_tools)
load_mongo.load_synopsis(
    csv_path=CSV_PATHS["anime"],
    collection_name="animes",
    overwrite_existing=True,
    upsert_missing=True
)


Loading genres and anime_genre mapping
Inserted 44 unique genres.
Inserted 50261 anime_genre mappings.
Done loading synopsis. read=0, updated=0, upserts=0, skipped=17,562
Docs with synopsis: 16206 / 16216
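
The log above reports 44 unique genres and 50,261 anime_genre mappings. Below is a minimal sketch of how a comma-separated genre column could be normalized into a lookup table and a mapping table. It is not the actual `LoadMySQL.load_genres` implementation: the function name `split_genres` is hypothetical, the sample rows are illustrative rather than copied from the dataset, and the comma-separated input format is an assumption about `anime.csv`.

```python
from typing import Dict, Iterable, List, Tuple


def split_genres(
    rows: Iterable[Tuple[int, str]]
) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Normalize (mal_id, "Genre A, Genre B") rows into two tables.

    Returns a genre-name -> genre_id lookup and a list of
    (mal_id, genre_id) mapping rows.
    """
    genre_ids: Dict[str, int] = {}
    mappings: List[Tuple[int, int]] = []
    for mal_id, raw in rows:
        # dict.fromkeys removes duplicate names within a row and keeps their order
        names = dict.fromkeys(n.strip() for n in raw.split(",") if n.strip())
        for name in names:
            # Reuse the existing id, or assign the next free one
            genre_id = genre_ids.setdefault(name, len(genre_ids) + 1)
            mappings.append((mal_id, genre_id))
    return genre_ids, mappings


# Illustrative input rows (hypothetical values)
sample = [
    (1, "Action, Adventure, Sci-Fi"),
    (5114, "Action, Adventure, Fantasy"),
]
genres, anime_genre = split_genres(sample)
print(genres)
print(anime_genre)
```

Keeping genres in their own table lets MySQL filter and join on an integer key instead of parsing strings at query time.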


Chapter 2: Query Design and Implementation

TODO:
● Design queries and operations that power a basic recommendation system (must include simple and complex queries).
● MySQL should handle structured analytics (e.g., top-rated items, user history).
● MongoDB should handle contextual or preference-based queries (e.g., user interests, item attributes).
● Implement at least one combined or federated operation where data from both systems contribute to a recommendation (e.g., fetching user data from MySQL and preference data from MongoDB).
● SQL and MongoDB queries with documentation
● Examples of query outputs
● Explanation of how each query supports recommendations

In [9]:
# --- SCENARIO 1: [Federation] Mongo Text Search: 'bounty hunter space'
# [1] Top 'bounty hunter space' Anime in united_states
# This scenario implements a federated recommendation query that combines unstructured text search with structured demographic filtering. It works in three steps:
#     Content Discovery (MongoDB): A full-text search in MongoDB finds the IDs of anime whose synopsis matches the user's keywords (here, "bounty hunter space"). This lets users find content by narrative theme rather than by metadata alone.
#     Demographic Filtering (MySQL): Those IDs are passed to MySQL, which calculates the average rating for each anime, counting only votes from users in a specific country (here, 'united_states').
#     Enrichment: The statistical results are merged back with the full synopsis text from MongoDB before display.
# How it supports the user: It answers the question "What are the highest-rated anime about [Topic X] according to people in [My Country]?" and produces a localized recommendation list based on specific plot interests.

AnimeRecommendation.print_recommend_top_by_synopsis_and_country(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    keywords="bounty hunter space",
    country="united_states",
    limit=5
)

# --- SCENARIO 2: Pure SQL (Index Benchmark) ---
# Fetch raw reviews for 'Fullmetal Alchemist: Brotherhood' (ID 5114)
# This scenario implements a high-volume data retrieval query designed primarily for benchmarking database performance rather than for direct user recommendation. It works in two steps:
#     Direct Lookup: It queries the MySQL anime_user_rating table for a specific anime ID (mal_id).
#     Bulk Retrieval: It fetches a large result set (up to 10,000 rows) of raw userID and user_rating pairs, filtering out invalid or empty ratings.
# How it supports the user: It does not generate recommendations itself. It provides a heavy read workload on anime_user_rating, which makes it a baseline for measuring index performance.
AnimeRecommendation.print_get_raw_anime_reviews(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    mal_id=5114,
    limit=5
)

# --- SCENARIO 3: Pure SQL (Global Sorting) ---
# Simple top-ranked anime
# This scenario implements a global leaderboard query that ranks anime by overall score and popularity. It works in three steps:
#     Data Integration (MySQL): It joins the static anime metadata table with the dynamic anime_statistics table to link titles with their performance metrics.
#     Global Sorting: It sorts the entire dataset by score (primary sort key) and members count (tie-breaker) to identify the top-ranked titles.
#     Enrichment: It fetches the synopsis from MongoDB for each title in the final list (limit=5 in this call) to provide context.
# How it supports the user: It answers the foundational question "What are the greatest anime of all time?" with a "Top Rated" view, which helps new users find widely acclaimed content without specifying genres or keywords.
AnimeRecommendation.print_recommend_global_top30(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    limit=5
)

# --- SCENARIO 4: Federated (Text -> Geospatial) ---
# [4] Nearest Neighbors for User 603 interested in 'bounty hunter space'
# This scenario implements a context-aware geospatial recommendation query that blends content relevance with physical proximity. It works in three steps:
#     Content Filtering (MongoDB): It identifies anime that match the user's textual interests (here, "bounty hunter space") using MongoDB's full-text search.
#     Geospatial Cohort Analysis (MySQL): It locates "neighbor" users who are physically close to the target user (within a defined latitude/longitude delta) to form a local peer group.
#     Social Validation: It keeps only the content-relevant anime that this local peer group has rated positively, ranked by their average "neighbor score".
# How it supports the user: It answers the question "What are people near me watching that matches my interest in [Topic X]?"
# It assumes that regional trends (e.g., what is popular at a local convention or in a city) may influence a user's taste,
# so the recommendation is both personally relevant and grounded in the local community.
AnimeRecommendation.print_find_nearest_neighbors_contextual(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    user_id=603,
    keywords="bounty hunter space",
    limit=5
)

# --- SCENARIO 5: Personalized Hybrid (Demographics) ---
# Find 'robot' anime popular among peers of User 603 (same country/gender)
# This scenario implements a personalized hybrid recommendation query that combines current user interests with demographic collaborative filtering. It works in three steps:
#     Content Relevance (MongoDB): It uses MongoDB to find anime that match the user's immediate search intent (here, "robot").
#     Profile Resolution (MySQL): It looks up the target user's demographic profile (country and gender).
#     Demographic Peer Filtering (MySQL): It ranks the content-relevant anime using only the ratings of "peer users", that is, people who share the target user's country and gender.
# How it supports the user: It answers the identity-driven question "What do people like me (same gender and background) think about [Topic X]?"
# Ratings from a specific demographic peer group can be more relevant to the user than a generic global average.
AnimeRecommendation.print_recommend_personal_demographic_hybrid(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    user_id=603,
    keywords="robot",
    limit=5
)



--- [1] Top 'bounty hunter space' Anime in united_states ---
[Federation] Mongo Text Search: 'bounty hunter space'...
Found 20 results. Showing top 5:
   MAL_ID                              name  local_rating  votes                                           synopsis
0   11061            Hunter x Hunter (2011)        9.1094  34701  Hunter x Hunter is set in a world where Hunter...
1   31758  Kizumonogatari III: Reiketsu-hen        8.7994  12078  fter helping revive the legendary vampire Kiss...
2       1                      Cowboy Bebop        8.7076  35824  In the year 2071, humanity has colonized sever...
3   29727                          Paradise        8.6667     23  "A highly energetic story told from outer spac...
4   31757   Kizumonogatari II: Nekketsu-hen        8.5623  12492  No longer truly human, Koyomi Araragi decides ...
------------------------------------------------------------
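
The three steps of Scenario 1 (text search, country-filtered aggregation, enrichment) can be sketched without a live database. This is a rough illustration under stated assumptions, not the code behind `print_recommend_top_by_synopsis_and_country`: SQLite stands in for MySQL, an in-memory list stands in for the MongoDB collection, a naive keyword match stands in for a `$text` query, and the `users` table layout, the `user_id` and `country` columns, and all sample rows are hypothetical.

```python
import sqlite3

# Stand-in for the MongoDB collection: one document per anime (illustrative text).
DOCS = [
    {"MAL_ID": 1, "synopsis": "A bounty hunter crew drifts through space."},
    {"MAL_ID": 2, "synopsis": "A high school club learns to bake."},
    {"MAL_ID": 3, "synopsis": "A hunter exam tests young candidates."},
]


def text_search(keywords):
    """Step 1: naive match on any keyword, standing in for a MongoDB $text query."""
    terms = keywords.lower().split()
    return [d["MAL_ID"] for d in DOCS
            if any(t in d["synopsis"].lower() for t in terms)]


def top_by_country(conn, mal_ids, country, limit):
    """Step 2: average rating per anime, counting only users from `country`."""
    if not mal_ids:
        return []
    placeholders = ",".join("?" for _ in mal_ids)
    sql = f"""
        SELECT r.mal_id,
               ROUND(AVG(r.user_rating), 4) AS local_rating,
               COUNT(*) AS votes
        FROM anime_user_rating AS r
        JOIN users AS u ON u.user_id = r.user_id
        WHERE r.mal_id IN ({placeholders}) AND u.country = ?
        GROUP BY r.mal_id
        ORDER BY local_rating DESC, votes DESC
        LIMIT ?
    """
    return conn.execute(sql, (*mal_ids, country, limit)).fetchall()


def enrich(rows):
    """Step 3: attach the synopsis text back to each statistical row."""
    synopsis = {d["MAL_ID"]: d["synopsis"] for d in DOCS}
    return [(mal_id, rating, votes, synopsis[mal_id])
            for mal_id, rating, votes in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE anime_user_rating (user_id INTEGER, mal_id INTEGER, user_rating INTEGER);
    INSERT INTO users VALUES (1, 'united_states'), (2, 'united_states'), (3, 'portugal');
    INSERT INTO anime_user_rating VALUES
        (1, 1, 9), (2, 1, 8), (3, 1, 2),
        (1, 3, 7), (3, 3, 10),
        (2, 2, 10);
""")

ids = text_search("bounty hunter space")
results = enrich(top_by_country(conn, ids, "united_states", limit=5))
for row in results:
    print(row)
```

Passing only the matched IDs into the SQL `IN (...)` list keeps the cross-database traffic small: MySQL never sees the synopsis text, and MongoDB never sees the ratings.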

--- [2] Raw Reviews for: 'Fullmetal Alchemist: Brotherhood' (ID 5114) ---
Found 10000 resu
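
The neighbor lookup described for Scenario 4 can be approximated with a latitude/longitude bounding box. The sketch below is only an illustration, not the project's `print_find_nearest_neighbors_contextual` code: SQLite stands in for MySQL, the `latitude` and `longitude` column names, the `delta` and `min_rating` parameters, and every sample row are hypothetical, and the candidate IDs are assumed to come from a prior text search.

```python
import sqlite3


def neighbor_favorites(conn, user_id, candidate_ids, delta=1.0, min_rating=7, limit=5):
    """Rank candidate anime by the average rating given by nearby users."""
    row = conn.execute(
        "SELECT latitude, longitude FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None or not candidate_ids:
        return []
    lat, lon = row
    placeholders = ",".join("?" for _ in candidate_ids)
    sql = f"""
        SELECT r.mal_id, AVG(r.user_rating) AS neighbor_score, COUNT(*) AS votes
        FROM users AS u
        JOIN anime_user_rating AS r ON r.user_id = u.user_id
        WHERE u.user_id <> ?
          AND u.latitude  BETWEEN ? AND ?
          AND u.longitude BETWEEN ? AND ?
          AND r.mal_id IN ({placeholders})
          AND r.user_rating >= ?
        GROUP BY r.mal_id
        ORDER BY neighbor_score DESC, votes DESC
        LIMIT ?
    """
    params = (user_id, lat - delta, lat + delta, lon - delta, lon + delta,
              *candidate_ids, min_rating, limit)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
    CREATE TABLE anime_user_rating (user_id INTEGER, mal_id INTEGER, user_rating INTEGER);
    INSERT INTO users VALUES
        (603, 38.7, -9.1),   -- target user
        (1,   38.9, -9.3),   -- inside the box
        (2,   39.2, -8.8),   -- inside the box
        (3,   52.5, 13.4);   -- far away
    INSERT INTO anime_user_rating VALUES
        (1, 10, 9), (2, 10, 8), (2, 20, 7), (3, 20, 10), (1, 30, 10);
""")

# Candidate IDs 10 and 20 are assumed to come from the text-search step.
neighbors = neighbor_favorites(conn, user_id=603, candidate_ids=[10, 20])
print(neighbors)
```

A bounding box is cheap to evaluate and can use an ordinary B-tree index on the coordinate columns, but it ignores longitude wrap-around and treats a degree of longitude as the same distance at every latitude, so it is only a coarse proximity filter.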

Chapter 3: Concurrency Testing

TODO:
● Develop scripts or programs that simulate multiple concurrent users accessing and updating both databases.
● Measure:
○ Response times
○ Conflicts or deadlocks
○ Consistency of cross-database operations
● Document your testing setup (tools, concurrency level, sample code)
● Concurrency test scripts
● Performance metrics
● Summary of concurrency behavior and findings

In [10]:
concurrency_test = ConcurrencyRatingTest(db_manager)
concurrency_test.run_benchmark(num_users=5, mode="unsafe")
concurrency_test.run_benchmark(num_users=5, mode="safe")

Test start: UNSAFE simulating 5 users rates
Searching for an existing Anime with MEMBERS=0...
Test setup: found not rated anime: 'Cowboy Bebop' (ID: 1)
Test setup: Preparing 5 Temp Users for Anime ID 1...
Initial anime state:
  - Score:       0.00 (Reset for test)
  - Members:     0
  - Real Rows:   0
--------------------------------------------------------------------------------
[Unsafe] User 8916787 READ: M=0, S=0.00. Submitting RATING 4. Writes 1
[Unsafe] User 8916789 READ: M=0, S=0.00. Submitting RATING 8. Writes 1
[Unsafe] User 8916788 READ: M=0, S=0.00. Submitting RATING 7. Writes 1
[Unsafe] User 8916790 READ: M=0, S=0.00. Submitting RATING 4. Writes 1
[Unsafe] User 8916791 READ: M=0, S=0.00. Submitting RATING 5. Writes 1
--------------------------------------------------------------------------------
Final anime state:
  - Score:       5.00
  - Members:     1 (Final Count)
  - Real Rows:   5 (Actual Inserts)

Summary:
Avg Response Time: 1.0879 sec
Data loss: 4 votes count lost.

[{'duration': 0.9727467000047909, 'error': None},
 {'duration': 3.814005799998995, 'error': None},
 {'duration': 4.01076990000729, 'error': None},
 {'duration': 6.76862409999012, 'error': None},
 {'duration': 9.700516400000197, 'error': None}]
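
The unsafe run above shows a classic lost update: all five users read Members=0, so each one writes Members=1 and the last rating (5) becomes the score. The sketch below reproduces that outcome deterministically and contrasts it with an atomic in-database update. It is an illustration under assumptions, not the `ConcurrencyRatingTest` code: SQLite stands in for MySQL, the column layout of `anime_statistics` is hypothetical, and the project's actual "safe" mode may rely on a different mechanism (for example row locks).

```python
import sqlite3

RATINGS = [4, 8, 7, 4, 5]  # the five ratings submitted in the unsafe run above


def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE anime_statistics "
        "(mal_id INTEGER PRIMARY KEY, score REAL, members INTEGER)"
    )
    conn.execute("INSERT INTO anime_statistics VALUES (1, 0.0, 0)")
    return conn


def read_stats(conn):
    return conn.execute(
        "SELECT score, members FROM anime_statistics WHERE mal_id = 1"
    ).fetchone()


def unsafe_run(conn):
    """Read-modify-write: every user reads the same stale snapshot first."""
    snapshots = [read_stats(conn) for _ in RATINGS]
    for (score, members), rating in zip(snapshots, RATINGS):
        new_members = members + 1
        new_score = (score * members + rating) / new_members
        conn.execute(
            "UPDATE anime_statistics SET score = ?, members = ? WHERE mal_id = 1",
            (new_score, new_members),
        )


def safe_run(conn):
    """Atomic update: the database computes new values from the current row."""
    for rating in RATINGS:
        # score is assigned before members on purpose: MySQL evaluates
        # single-table UPDATE assignments left to right, so this order keeps
        # the formula working on the old members value there as well.
        conn.execute(
            "UPDATE anime_statistics "
            "SET score = (score * members + ?) / (members + 1), "
            "    members = members + 1 "
            "WHERE mal_id = 1",
            (rating,),
        )


unsafe_db = fresh_db()
unsafe_run(unsafe_db)
safe_db = fresh_db()
safe_run(safe_db)

print("unsafe:", read_stats(unsafe_db))
print("safe:  ", read_stats(safe_db))
```

The unsafe variant ends with a single member and a score equal to the last rating, matching the final state logged above, while the atomic variant counts all five votes.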

Chapter 4: Performance Optimization and Integration Tuning

TODO:
● Apply indexing, query optimization, and schema tuning in both MySQL and MongoDB.
● Improve data access patterns between the two systems (e.g., batching reads/writes, minimizing redundant joins).
● Re-test query performance after optimization.
● Reflect on the benefits and trade-offs of the hybrid approach
● Table with Before-and-after time performance comparison
● Explanation of optimization techniques used
● Final integrated system diagram

In [None]:
# This will execute all 5 scenarios (With Index vs. Without Index) and print the report
performance_test = PerformanceTest(db_manager, mysql_idx_mgr, mongo_idx_mgr, mongo_tools)
performance_test.run_targeted_benchmark()

Performance Benchmark (5 Optimized Scenarios)

Scenario: 1. Federated: Text Search + Country
MySQL creating index idx_users_country_age on users...
Index idx_users_country_age already exists.
With Index (idx_users_country_age)...
[Federation] Mongo Text Search: 'bounty hunter space'...
[Federation] Mongo Text Search: 'bounty hunter space'...
Time: 2.6489s
MySQL dropping index idx_users_country_age on users...
Dropped.
No Index...
[Federation] Mongo Text Search: 'bounty hunter space'...
Time: 2.7431s
MySQL creating index idx_users_country_age on users...
Done in 2.19s
----------------------------------------
Scenario: 2. Pure SQL: High Volume Read
MySQL creating index idx_mal_rating on anime_user_rating...
Index idx_mal_rating already exists.
With Index (idx_mal_rating)...
Time: 0.2563s
MySQL dropping index idx_mal_rating on anime_user_rating...
Dropped.
No Index...
Time: 13.7159s
MySQL creating index idx_mal_rating on anime_user_rating...
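
The with-index versus without-index comparison in the log above can be reproduced in miniature. The following is a minimal sketch, assuming SQLite as a stand-in for MySQL and synthetic rows. The index name `idx_mal_rating` comes from the log, but its column list `(mal_id, user_rating)` is an assumption, and the helpers `timed` and `plan` are hypothetical names, not part of `PerformanceTest`.

```python
import sqlite3
import time


def timed(conn, sql, params=()):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start


def plan(conn, sql, params=()):
    """Return the planner's description of how the query will be executed."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(row[3] for row in cursor)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE anime_user_rating "
    "(user_id INTEGER, mal_id INTEGER, user_rating INTEGER)"
)
# Synthetic data: 50,000 ratings spread evenly over 500 anime IDs
conn.executemany(
    "INSERT INTO anime_user_rating VALUES (?, ?, ?)",
    ((u, u % 500, u % 10 + 1) for u in range(50_000)),
)

QUERY = "SELECT user_id, user_rating FROM anime_user_rating WHERE mal_id = ?"

# Without the index the planner has to scan the whole table
rows_no_index, t_no_index = timed(conn, QUERY, (114,))
plan_no_index = plan(conn, QUERY, (114,))

# With the index the planner can seek directly to the matching rows
conn.execute("CREATE INDEX idx_mal_rating ON anime_user_rating (mal_id, user_rating)")
rows_index, t_index = timed(conn, QUERY, (114,))
plan_index = plan(conn, QUERY, (114,))

print(f"no index:   {t_no_index:.6f}s  plan: {plan_no_index}")
print(f"with index: {t_index:.6f}s  plan: {plan_index}")
```

Checking the query plan alongside the timing helps separate a real index effect from caching noise, which matters when the measured difference is small, as in Scenario 1 above.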
