Students: 

fc66661 Iaroslav Sagan

fc66662 Anna Maksymchuk

fc66663 Mariia Samosudova

Advanced Databases Course Project: Recommendation System

Project Description

Infrastructure and Data Source 

In [1]:
import sys
import os

project_root = os.getcwd()
if project_root not in sys.path:
    sys.path.append(project_root)
    print(f"Added project root to path: {project_root}")

Added project root to path: c:\MariaSamosudova\Projects\UNIVER\ADB\Project\MARS_1.0\project_db


In [3]:
# Index & Utility Managers
from dotenv import load_dotenv
from schema_manager import SchemaManager
from mongoDB_tools import MongoTools
from mysql_index_manager import MySQLIndexManager
from mongo_index_manager import MongoIndexManager
from database_connection_manager import DatabaseConnectionManager

# Loaders and Data Processors
from load_mySQL import LoadMySQL
from load_MongoDB import LoadMongoDB

# Test and Recommendation Logic
from anime_recommendation import AnimeRecommendation
from concurrency_rating_test import ConcurrencyRatingTest
from performance_test import PerformanceTest

# Load environment variables
load_dotenv()

# Define configuration (using os.getenv)
MYSQL_CONFIG = {
    "host": os.getenv("MYSQL_HOST"),
    "user": os.getenv("MYSQL_USER"),
    "password": os.getenv("MYSQL_PASSWORD"),
    "database": os.getenv("MYSQL_DB")
}

MONGO_CONFIG = {
    "uri": os.getenv("MONGO_URI"),
    "db_name": os.getenv("MONGO_DB_NAME"),
    "collection": os.getenv("MONGO_COLLECTION", "animes")
}

# Local dataset folder (machine-specific path)
DATA_DIR = r"C:\MariaSamosudova\Projects\UNIVER\ADB\Project\MARS_1.0\data\DataSet\anime_hernan4444"

CSV_PATHS = {
    "anime": os.path.join(DATA_DIR, "anime.csv"),
    "animelist": os.path.join(DATA_DIR, "animelist.csv"),
    "profiles": os.path.join(DATA_DIR, "profiles.csv"),
    "ratings": os.path.join(DATA_DIR, "rating_complete.csv"),
}

# Initialize Connection Manager
db_manager = DatabaseConnectionManager(MYSQL_CONFIG, MONGO_CONFIG)
mongo_tools = MongoTools(db_manager)

# Initialize specific managers
mysql_idx_mgr = MySQLIndexManager(db_manager)
mongo_idx_mgr = MongoIndexManager(mongo_tools)

print("\nReady to run methods.")

Connected to MySQL
Connected to MongoDB

Ready to run methods.
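os.getenv returns None for an unset variable, so a missing .env entry only surfaces later as an opaque connection error. A small fail-fast check, run right after load_dotenv(), could look like the sketch below; the helper names are hypothetical, and the variable names are the ones read by the configuration cell above.

```python
import os

# Variables read by the configuration cell; MONGO_COLLECTION is omitted because it has a default ("animes")
REQUIRED_ENV_VARS = (
    "MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DB",
    "MONGO_URI", "MONGO_DB_NAME",
)


def missing_env_vars(env=None, required=REQUIRED_ENV_VARS):
    """Return the required names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def require_env(env=None):
    """Raise early with a readable message instead of failing later inside a database driver."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_env() before building MYSQL_CONFIG and MONGO_CONFIG turns a vague driver error into a list of the exact variables to add. Note that this check also flags an intentionally empty MYSQL_PASSWORD.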


Chapter 1: Data Selection, Modeling, and Database Creation

TODO:
● Select and analyze the dataset to identify relationships and data hierarchies.
● Design a relational schema for MySQL and a document schema for MongoDB.
● Decide which data should reside in each system:
○ MySQL: structured, transactional, or reference data.
○ MongoDB: nested, user-driven, or flexible data.
● Clean (if needed) and import data.
● Ensure integration points between the databases (e.g., shared user or item IDs).
● ER diagram and schema definitions (SQL DDL + MongoDB schema documentation)
● Data loading scripts

The dataset we are working with: https://www.kaggle.com/datasets/hernan4444/anime-recommendation-database-2020
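The split described in the TODO list (structured, transactional data in MySQL; free text and flexible attributes in MongoDB, joined through shared IDs) can be sketched as DDL plus an example document. This is a minimal sketch, not the project's SchemaManager: it uses SQLite syntax through Python's sqlite3 so it runs standalone, and the column names are assumptions, partly inferred from the index names used in Chapter 4 (users.country, age, gender, lat, lon; anime_user_rating.mal_id, rating; anime.score, members).

```python
import sqlite3

# Hypothetical relational side (MySQL), written in SQLite syntax for a self-contained sketch.
# mal_id and user_id are the integration keys shared with the MongoDB side.
DDL = """
CREATE TABLE anime (
    mal_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    score   REAL,
    members INTEGER
);
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    country TEXT,
    gender  TEXT,
    age     INTEGER,
    lat     REAL,
    lon     REAL
);
CREATE TABLE anime_user_rating (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    mal_id  INTEGER NOT NULL REFERENCES anime(mal_id),
    rating  INTEGER NOT NULL,
    PRIMARY KEY (user_id, mal_id)
);
"""

# Hypothetical document side (MongoDB "animes" collection): long free text,
# linked back to the relational rows only through mal_id.
EXAMPLE_ANIME_DOC = {
    "mal_id": 1,
    "name": "Cowboy Bebop",
    "synopsis": "<free-text synopsis>",
}


def create_relational_schema():
    """Create the hypothetical tables in an in-memory database with FK checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(DDL)
    return conn
```

Keeping ratings in a junction table with foreign keys lets the relational side reject ratings for unknown users or anime, while the document side can grow new fields without schema migrations.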

In [22]:
# Data loading scripts for MySQL and MongoDB
schema_manager = SchemaManager(db_manager)
schema_manager.create_schema()  # call the method; without parentheses it is only referenced, never run

load_my_sql = LoadMySQL(db_manager)
load_my_sql.load_paths(CSV_PATHS)
load_my_sql.load_genres()

load_mongo = LoadMongoDB(mongo_tools)
load_mongo.load_synopsis(
    csv_path=CSV_PATHS["anime"],
    collection_name="animes",
    overwrite_existing=True,
    upsert_missing=True
)


Loading genres and anime_genre mapping
Inserted 44 unique genres.
Inserted 50261 anime_genre mappings.
Done loading synopsis. read=0, updated=0, upserts=0, skipped=17,562
Docs with synopsis: 16206 / 16216
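The loader reports 44 unique genres and 50,261 anime_genre mappings, which suggests the comma-separated Genres column of anime.csv is normalized into a lookup table plus a junction table. A minimal sketch of that normalization, assuming values look like "Action, Sci-Fi" and that missing values appear as "Unknown" or empty strings; normalize_genres is a hypothetical helper, not LoadMySQL.load_genres itself.

```python
def normalize_genres(anime_rows):
    """
    anime_rows: iterable of (mal_id, genres_string) pairs, e.g. (1, "Action, Sci-Fi").
    Returns (genre_ids, mappings):
      genre_ids: {genre_name: genre_id} for the lookup table
      mappings:  sorted list of (mal_id, genre_id) rows for the junction table
    """
    genre_ids = {}
    mappings = set()  # a set drops duplicate genres listed twice for one anime
    for mal_id, genres in anime_rows:
        if not genres or genres.strip() == "Unknown":
            continue
        for raw in genres.split(","):
            name = raw.strip()
            if not name:
                continue
            # Assign ids in first-seen order
            genre_id = genre_ids.setdefault(name, len(genre_ids) + 1)
            mappings.add((mal_id, genre_id))
    return genre_ids, sorted(mappings)
```

Both results can then be bulk-inserted (e.g. with a DB-API executemany) into the genre and anime_genre tables, so genre filters become indexed joins instead of substring matches.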


Chapter 2: Query Design and Implementation

TODO:
● Design queries and operations that power a basic recommendation system (must include simple and complex queries).
● MySQL should handle structured analytics (e.g., top-rated items, user history).
● MongoDB should handle contextual or preference-based queries (e.g., user interests, item attributes).
● Implement at least one combined or federated operation where data from both systems contribute to a recommendation (e.g., fetching user data from MySQL and preference data from MongoDB).
● SQL and MongoDB queries with documentation
● Examples of query outputs
● Explanation of how each query supports recommendations

In [9]:
# --- SCENARIO 1: Federated (Mongo Text Search -> MySQL) ---
# Top 'bounty hunter space' anime in united_states

AnimeRecommendation.print_recommend_top_by_synopsis_and_country(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    keywords="bounty hunter space",
    country="united_states",
    limit=5
)

# --- SCENARIO 2: Pure SQL (Index Benchmark) ---
# Fetch raw reviews for 'Fullmetal Alchemist: Brotherhood' (ID 5114)
AnimeRecommendation.print_get_raw_anime_reviews(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    mal_id=5114,
    limit=5
)

# --- SCENARIO 3: Pure SQL (Global Sorting) ---
# Global top-ranked anime (showing 5)
AnimeRecommendation.print_recommend_global_top30(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    limit=5
)

# --- SCENARIO 4: Federated (Text -> Geo Spatial) ---
# [4] Nearest Neighbors for User 603 interested in 'bounty hunter space'
AnimeRecommendation.print_find_nearest_neighbors_contextual(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    user_id=603,
    keywords="bounty hunter space",
    limit=5
)

# --- SCENARIO 5: Personalized Hybrid (Demographics) ---
# Find 'robot' anime popular among peers of User 603 (same country/gender)
AnimeRecommendation.print_recommend_personal_demographic_hybrid(
    connection=db_manager.get_mysql_connection(),
    mongo_tools=mongo_tools,
    user_id=603,
    keywords="robot",
    limit=5
)



[1] Top 'bounty hunter space' Anime in united_states
[Federation] Mongo Text Search: 'bounty hunter space'...
Found 20 results. Showing top 5:
   MAL_ID                              name  local_rating  votes                                           synopsis
0   11061            Hunter x Hunter (2011)        9.1094  34701  Hunter x Hunter is set in a world where Hunter...
1   31758  Kizumonogatari III: Reiketsu-hen        8.7994  12078  fter helping revive the legendary vampire Kiss...
2       1                      Cowboy Bebop        8.7076  35824  In the year 2071, humanity has colonized sever...
3   29727                          Paradise        8.6667     23  "A highly energetic story told from outer spac...
4   31757   Kizumonogatari II: Nekketsu-hen        8.5623  12492  No longer truly human, Koyomi Araragi decides ...
------------------------------------------------------------

[2] Raw Reviews for Anime ID 5114
Found 10000 results. Showing top 5:
   userID  user_rating
0    
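Scenario 4 is labelled Text -> Geo Spatial, and Chapter 4 toggles an idx_users_lat_lon index, which suggests users carry latitude/longitude and neighbors are ranked by distance. A minimal sketch of that ranking, assuming decimal-degree coordinates: a cheap bounding-box prefilter (the kind of range condition a (lat, lon) index can serve) followed by exact great-circle distance with the haversine formula. The function names and the 5-degree box are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighbors(target, candidates, k=5, box_deg=5.0):
    """
    target: (user_id, lat, lon); candidates: iterable of (user_id, lat, lon).
    Returns up to k candidates inside the bounding box, closest first.
    Note: a simple degree box does not wrap around the antimeridian (lon = +/-180).
    """
    user_id, lat0, lon0 = target
    nearby = [
        c for c in candidates
        if c[0] != user_id and abs(c[1] - lat0) <= box_deg and abs(c[2] - lon0) <= box_deg
    ]
    nearby.sort(key=lambda c: haversine_km(lat0, lon0, c[1], c[2]))
    return nearby[:k]
```

In SQL, the box prefilter corresponds to a WHERE lat BETWEEN ... AND lon BETWEEN ... condition; only the surviving rows need the more expensive trigonometric distance.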

Chapter 3: Concurrency Testing

TODO:
● Develop scripts or programs that simulate multiple concurrent users accessing and updating both databases.
● Measure:
○ Response times
○ Conflicts or deadlocks
○ Consistency of cross-database operations
● Document your testing setup (tools, concurrency level, sample code)
● Concurrency test scripts
● Performance metrics
● Summary of concurrency behavior and findings
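The benchmark in this chapter returns one {'duration', 'error'} record per simulated user. A harness of that shape can be sketched with concurrent.futures; run_concurrent and timed_call are hypothetical names, not ConcurrencyRatingTest's API. In a real run each worker should open its own database connection, since most Python MySQL drivers do not allow one connection to be shared between threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fn, *args):
    """Run fn(*args) and record its wall-clock duration and any exception, without re-raising."""
    start = time.perf_counter()
    error = None
    try:
        fn(*args)
    except Exception as exc:  # record the failure so the other simulated users keep running
        error = repr(exc)
    return {"duration": time.perf_counter() - start, "error": error}


def run_concurrent(fn, args_list, max_workers=None):
    """Start one task per argument tuple and return the records in submission order."""
    workers = max_workers or max(1, len(args_list))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(timed_call, fn, *args) for args in args_list]
        return [f.result() for f in futures]
```

Recording errors instead of raising keeps deadlock or lock-timeout exceptions visible in the results, which is what the "Conflicts or deadlocks" metric needs.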

In [4]:
concurrency_test = ConcurrencyRatingTest(db_manager)
concurrency_test.run_benchmark(num_users=5, mode="unsafe")
concurrency_test.run_benchmark(num_users=5, mode="safe")

Test start: UNSAFE simulating 5 users rates
Searching for an existing Anime with MEMBERS=0...
Test setup: found not rated anime: 'Cowboy Bebop' (ID: 1)
Test setup: Preparing 5 Temp Users for Anime ID 1...
Initial anime state:
  - Score:       0.00 (Reset for test)
  - Members:     0
  - Real Rows:   0
--------------------------------------------------------------------------------
[Unsafe] User 8908468 READ: M=0, S=0.00. Submitting RATING 7. Writes 1
[Unsafe] User 8908467 READ: M=0, S=0.00. Submitting RATING 3. Writes 1
[Unsafe] User 8908471 READ: M=0, S=0.00. Submitting RATING 9. Writes 1
[Unsafe] User 8908470 READ: M=0, S=0.00. Submitting RATING 2. Writes 1
[Unsafe] User 8908469 READ: M=0, S=0.00. Submitting RATING 1. Writes 1
--------------------------------------------------------------------------------
Final anime state:
  - Score:       2.00
  - Members:     1 (Final Count)
  - Real Rows:   5 (Actual Inserts)

Summary:
Avg Response Time: 1.1122 sec
Data loss: 4 votes count lost.

[{'duration': 0.9539822000078857, 'error': None},
 {'duration': 4.0214005000307225, 'error': None},
 {'duration': 6.997524600010365, 'error': None},
 {'duration': 9.96811209997395, 'error': None},
 {'duration': 13.080298700020649, 'error': None}]
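The unsafe run shows a lost update: all five sessions read M=0 and S=0.00 before any of them wrote, so each wrote members=1, and the final score of 2.00 is consistent with the last writer's rating overwriting the others. The sketch below reproduces that read-modify-write race in memory, assuming the stored score is a running mean updated as (score * members + rating) / (members + 1); the class and function names are hypothetical. In MySQL/InnoDB the usual counterparts of the lock are a locking read (SELECT ... FOR UPDATE inside a transaction) or a single atomic UPDATE.

```python
import threading

RATINGS = [7, 3, 9, 2, 1]  # the ratings submitted in the unsafe run above


class AnimeStats:
    """In-memory stand-in for one row of a hypothetical anime table (score, members)."""

    def __init__(self):
        self.score = 0.0
        self.members = 0
        self.lock = threading.Lock()


def new_average(score, members, rating):
    """Fold one new rating into an existing mean."""
    return (score * members + rating) / (members + 1)


def rate_unsafe(stats, rating, barrier):
    score, members = stats.score, stats.members         # READ
    barrier.wait()                                      # force every thread to read before anyone writes
    stats.score = new_average(score, members, rating)   # WRITE based on a stale read
    stats.members = members + 1


def rate_safe(stats, rating):
    with stats.lock:  # read-modify-write runs as one critical section
        stats.score = new_average(stats.score, stats.members, rating)
        stats.members += 1


def run(ratings, safe):
    stats = AnimeStats()
    if safe:
        threads = [threading.Thread(target=rate_safe, args=(stats, r)) for r in ratings]
    else:
        barrier = threading.Barrier(len(ratings))
        threads = [threading.Thread(target=rate_unsafe, args=(stats, r, barrier)) for r in ratings]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

With the lock, members ends at 5 and the score equals the plain mean of all ratings; the cost is that updates to the same row are serialized, so per-user response time grows with the number of waiting sessions.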

Chapter 4: Performance Optimization and Integration Tuning

TODO:
● Apply indexing, query optimization, and schema tuning in both MySQL and MongoDB.
● Improve data access patterns between the two systems (e.g., batching reads/writes, minimizing redundant joins).
● Re-test query performance after optimization.
● Reflect on the benefits and trade-offs of the hybrid approach.
● Table with Before-and-after time performance comparison
● Explanation of optimization techniques used
● Final integrated system diagram
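The benchmark in this chapter toggles one index per scenario and times the same query with and without it. A minimal sketch of that toggle-and-time pattern, using sqlite3 as a stand-in for MySQL on synthetic random ratings; the table and index names mirror the report, while the (mal_id, rating) column list of idx_mal_rating is an assumption. Unlike the report's harness, this sketch measures without the index first and leaves the index in place at the end.

```python
import random
import sqlite3
import time


def build_ratings_db(n_rows=200_000, n_anime=5_000, seed=42):
    """Synthetic stand-in for anime_user_rating: (user_id, mal_id, rating) rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE anime_user_rating (user_id INTEGER, mal_id INTEGER, rating INTEGER)")
    conn.executemany(
        "INSERT INTO anime_user_rating VALUES (?, ?, ?)",
        ((uid, rng.randint(1, n_anime), rng.randint(1, 10)) for uid in range(n_rows)),
    )
    conn.commit()
    return conn


def mean_query_time(conn, sql, params, repeats=5):
    """Average wall-clock seconds per execution, including fetching all rows."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) / repeats


def toggle_index_benchmark(conn, index_name, create_sql, query_sql, params):
    """Time a query without the index, then with it; speedup = no-index time / with-index time."""
    conn.execute(f"DROP INDEX IF EXISTS {index_name}")  # identifiers cannot be bound as parameters
    t_without = mean_query_time(conn, query_sql, params)
    conn.execute(create_sql)
    t_with = mean_query_time(conn, query_sql, params)
    return {
        "index": index_name,
        "time_no_index_s": t_without,
        "time_with_index_s": t_with,
        "speedup": t_without / t_with,
    }


if __name__ == "__main__":
    db = build_ratings_db()
    print(toggle_index_benchmark(
        db, "idx_mal_rating",
        "CREATE INDEX idx_mal_rating ON anime_user_rating (mal_id, rating)",
        "SELECT user_id, rating FROM anime_user_rating WHERE mal_id = ?", (1,),
    ))
```

Averaging several repeats reduces noise from caching and timer resolution; the absolute numbers will differ from MySQL, but the relative effect of turning a full scan into an index lookup is the same idea.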

In [5]:
# This will execute all 5 scenarios (With Index vs. Without Index) and print the report
performance_test = PerformanceTest(db_manager, mysql_idx_mgr, mongo_idx_mgr, mongo_tools)
performance_test.run_targeted_benchmark()

Performance benchmark (5 Optimized Scenarios)

--- Scenario: 1. [Federated] Top 'bounty hunter space' in united_states ---
MySQL creating index idx_users_country_age on users...
Index idx_users_country_age already exists.
-- Running WITH index idx_users_country_age...
[Federation] Mongo Text Search: 'bounty hunter space'...
[Federation] Mongo Text Search: 'bounty hunter space'...
-- Time: 2.7262s
MySQL dropping index idx_users_country_age on users...
Dropped.
-- Running WITHOUT index...
[Federation] Mongo Text Search: 'bounty hunter space'...
-- Time: 2.7985s
MySQL creating index idx_users_country_age on users...
Done in 2.20s
------------------------------------------------------------
--- Scenario: 2. [SQL Pure] 10k Reviews (ID 1 - Cowboy Bebop) ---
MySQL creating index idx_mal_rating on anime_user_rating...
Index idx_mal_rating already exists.
-- Running WITH index idx_mal_rating...
-- Time: 0.2457s
MySQL dropping index idx_mal_rating on anime_user_rating...
Dropped.
-- Running WITH

   Test Case                                          Index Toggled             Time (No Index)  Time (With Index)  Speedup
0  1. [Federated] Top 'bounty hunter space' in un...  idx_users_country_age     2.7985s          2.7262s            1.0x
1  2. [SQL Pure] 10k Reviews (ID 1 - Cowboy Bebop)    idx_mal_rating            14.7952s         0.2457s            60.2x
2  3. [SQL Pure] Global Top 30                        idx_score_members         0.3244s          0.2022s            1.6x
3  4. [Federated] Neighbors for User 603 ('bounty...  idx_users_lat_lon         0.5836s          0.5372s            1.1x
4  5. [Federated] Personalized for User 603 (Cont...  idx_users_country_gender  1.3303s          1.2256s            1.1x

Times are in seconds; Speedup = Time (No Index) / Time (With Index).
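The 60.2x speedup for scenario 2 is what an equality filter on a large rating table should show: without idx_mal_rating every row is scanned, with it the engine jumps straight to the matching mal_id entries. The federated scenarios gain only about 1.0x to 1.1x, presumably because most of their time is spent outside the indexed SQL step. A quick way to confirm which plan a query gets is to inspect it; the sketch below uses SQLite's EXPLAIN QUERY PLAN through sqlite3 as a stand-in (in MySQL, EXPLAIN reports the chosen index in its key column), and the index column list is an assumption.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anime_user_rating (user_id INTEGER, mal_id INTEGER, rating INTEGER)")
QUERY = "SELECT user_id, rating FROM anime_user_rating WHERE mal_id = ?"

plan_without = query_plan(conn, QUERY, (1,))  # expected: full table SCAN
conn.execute("CREATE INDEX idx_mal_rating ON anime_user_rating (mal_id, rating)")
plan_with = query_plan(conn, QUERY, (1,))     # expected: SEARCH using idx_mal_rating

print("without index:", plan_without)
print("with index:   ", plan_with)
```

Checking the plan before timing avoids attributing a speedup (or its absence) to an index the optimizer never chose.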
