# Database Operations

This notebook walks through database operations that come up regularly when debugging the project. Each section explains what the code does so you can follow what's happening.

## Setting Up the Environment

First, let's check our current working directory and navigate to the project root. This ensures we can import modules correctly.


In [None]:
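import os

# A bare `%pwd` in the middle of a cell does not display its result,
# so print the directory explicitly.
print(f"Current working directory: {os.getcwd()}")

# Navigate to the parent directory (project root)
%cd ..

print(f"New working directory (project root): {os.getcwd()}")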

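### Database Libraries

The next cell imports the following:

- `pymongo`: The official MongoDB driver for Python
- `ConnectionFailure`: The `pymongo` exception used for handling connection errors
- `load_dotenv`: To load environment variables from a `.env` file
- `json` and `Path`: To locate and read the backup file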

In [9]:
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure
from pymongo.collection import Collection
import os
from dotenv import load_dotenv
import json
from pathlib import Path

Create a `.env` file at the root of the project to hold sensitive information, such as your MongoDB and MyAnimeList credentials.

In [7]:
# Load environment variables from the .env file
print("Loading environment variables...")
load_dotenv()

Loading environment variables...


True
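`load_dotenv()` returning `True` only tells us that a `.env` file was found, not that every variable we need is set. A minimal sketch of an explicit check could look like the one below. `MONGO_URI` is the name used in this notebook, while `MAL_CLIENT_ID` is a hypothetical name chosen only for illustration.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# MONGO_URI is read later in this notebook; MAL_CLIENT_ID is a hypothetical
# placeholder for whatever MyAnimeList credential the project expects.
missing = missing_env_vars(["MONGO_URI", "MAL_CLIENT_ID"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dict as `environ` makes the helper easy to try without touching the real environment.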

### Establishing a Database Connection

In [4]:
mongo_uri = os.getenv("MONGO_URI", "mongodb://localhost:27017/") # Default to localhost if not set
print(f"Using MongoDB URI: {mongo_uri}")

Using MongoDB URI: mongodb://tcmpa:mongo@localhost:27018/admin
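The output above prints the full URI, password included. If the notebook is shared or committed, it may be worth masking the password before printing. The helper below is a sketch using only the standard library; `redact_uri` is a hypothetical name, not part of the project or of `pymongo`.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri: str) -> str:
    """Replace the password in a connection URI with '***'."""
    parts = urlsplit(uri)
    # Split "user:password@host:port" at the last '@'
    userinfo, at, hostinfo = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        return uri  # no password present, nothing to hide
    user = userinfo.split(":", 1)[0]
    netloc = f"{user}:***@{hostinfo}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(redact_uri("mongodb://user:secret@localhost:27017/admin"))
# → mongodb://user:***@localhost:27017/admin
```

In the cell above, this would amount to printing `redact_uri(mongo_uri)` instead of `mongo_uri`.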


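Now we check that the database connection works before moving on. If the `seasonals` collection (or the whole `anime` database) is missing, the cell tries to restore it from the backup file.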

In [10]:
try:
    client = MongoClient(mongo_uri, serverSelectionTimeoutMS=5000)  # 5 second timeout
    backup_db = Path('database/anime.seasonals.json')

    # Check if connection is successful by running a simple command
    client.admin.command("ping")
    print("✅ Connection successful! MongoDB server is running.")

    # Access database and collection
    db = client.anime
    collection = db.seasonals # default collection name

    # Check if the database exists by listing all database names
    print("\n--- Database Information ---")
    if "anime" in client.list_database_names():
        print(f"✅ Database 'anime' exists")

        # List collections in the database
        collections = db.list_collection_names()
        print(f"Collections in 'anime' database: {collections}")

        # Check if our target collection exists
        if "seasonals" in collections:
            print(f"✅ Collection 'seasonals' exists")

            # Count documents in the collection
            doc_count = collection.count_documents({})
            print(f"Collection contains {doc_count} documents")

            # Show a sample document if any exist
            if doc_count > 0:
                print("\nSample document:")
                print(collection.find_one())
        else:
            print("❌ Collection 'seasonals' does not exist")
            # Check if backup file exists
            if backup_db.exists():
                print(f"✅ Backup file '{backup_db}' exists")
                # Load the backup file
                with open(backup_db, 'r') as f:
                    backup_data = json.load(f)
                    print(f"Loaded {len(backup_data)} records from backup")
                if isinstance(backup_data, list) and len(backup_data) > 0:
                    result = collection.insert_many(backup_data)
                    print(f"✅ Successfully imported {len(result.inserted_ids)} documents into 'seasonals' collection")
                else:
                    print("⚠️ Backup file doesn't contain valid data (expected a non-empty array)")
            else:
                print(f"❌ Backup file not found: {backup_db}")
                print("📝 The collection will be created but remain empty")
    else:
        print("❌ Database 'anime' does not exist")
        # Database will be created automatically when we insert data
        # Check if backup file exists and create collection from it
        if backup_db.exists():
            print(
                f"\n📥 Creating 'anime' database and 'seasonals' collection from backup file: {backup_db}"
            )
            with open(backup_db, "r") as f:
                backup_data = json.load(f)

            if isinstance(backup_data, list) and len(backup_data) > 0:
                result = collection.insert_many(backup_data)
                print(f"✅ Successfully imported {len(result.inserted_ids)} documents")
                print(
                    "✅ Database 'anime' and collection 'seasonals' have been created"
                )
            else:
                print(
                    "⚠️ Backup file doesn't contain valid data (expected a non-empty array)"
                )
        else:
            print(f"❌ Backup file not found: {backup_db}")
            print("📝 The database and collection will be created but remain empty")

except ConnectionFailure as e:
    print(f"❌ Connection failed: {e}")
    client = None
    collection = None
except Exception as e:
    print(f"❌ An error occurred: {e}")


✅ Connection successful! MongoDB server is running.

--- Database Information ---
✅ Database 'anime' exists
Collections in 'anime' database: ['seasonal_entries', 'winter_2025', 'producers', 'new_entries', 'karma_ranks', 'posts_of_the_week', 'mal', 'hourly_data', 'seasonals', 'ranime_table', 'karma_watch']
✅ Collection 'seasonals' exists
Collection contains 179 documents

Sample document:
{'_id': ObjectId('67e861e0d4ef7ca5850ca0a2'), 'id': 21, 'broadcast': {'day_of_the_week': 'sunday', 'start_time': '23:15'}, 'genres': [{'id': 1, 'name': 'Action'}, {'id': 2, 'name': 'Adventure'}, {'id': 10, 'name': 'Fantasy'}, {'id': 27, 'name': 'Shounen'}], 'images': {'large': 'https://cdn.myanimelist.net/images/anime/1244/138851l.jpg', 'medium': 'https://cdn.myanimelist.net/images/anime/1244/138851.jpg'}, 'media_type': 'tv', 'members': 2505263, 'num_episodes': 0, 'reddit_karma': None, 'score': None, 'season': 'fall', 'source': 'manga', 'start_date': '1999-10-20', 'status': 'currently_airing', 'streams
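The connection cell reads and validates the backup file in two separate branches. One way to avoid that duplication is a small helper like the sketch below. `load_backup` is a hypothetical name, and the sketch keeps the same rule as the cell: only a non-empty JSON array counts as valid. If the backup was exported in MongoDB Extended JSON, `bson.json_util.loads` may be a better fit than `json.load`; that is an assumption about the file, not something this notebook confirms.

```python
import json
from pathlib import Path


def load_backup(path):
    """Return the backup records as a list, or an empty list if unusable."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as f:
        data = json.load(f)
    # Only a non-empty JSON array is treated as valid data
    return data if isinstance(data, list) and data else []


records = load_backup("database/anime.seasonals.json")
print(f"Loaded {len(records)} records from backup")

# With a live connection, the import step would then be:
# if records:
#     collection.insert_many(records)
```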

## Get the Season Schedule helper class

In [11]:
from util.seasonal_schedule import SeasonScheduler

# 'episodes': Friday-to-Thursday schedule used for the ranks
# 'post': schedule that covers the week up to when the rank is posted
schedule_type = 'episodes'

schedule = SeasonScheduler(schedule_type=schedule_type)

2025-03-31 11:09 | INFO | seasonal_schedule.py:_get_schedule_details:73 | Getting schedule details for episodes at 2025-03-31 14:09:52.128132+00:00 on src/season_references/2025/episodes.csv
2025-03-31 11:09 | DEBUG | seasonal_schedule.py:calculate_derived_fields:66 | Derived fields calculated for episodes: {
  "schedule_type": "episodes",
  "post_time": "2025-03-31T14:09:52.128132Z",
  "base_path": "src/season_references",
  "year": 2025,
  "month": 3,
  "schedule_csv": "src/season_references/2025/episodes.csv",
  "schedule_detals": {
    "week_id": 1,
    "start_date": "2025-03-28T00:00:00Z",
    "end_date": "2025-04-03T23:59:59.999999Z",
    "season": 2
  },
  "season_name": "spring",
  "season_number": 2,
  "week_id": 1,
  "airing_period": {
    "airing_period": "Airing Period: March, 28 - April, 03",
    "season": "spring",
    "week_id": 1
  }
}
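The log above shows the `episodes` week running from Friday 00:00 to Thursday 23:59:59.999999 (UTC). `SeasonScheduler` reads these dates from a CSV file, so the sketch below is not its implementation. It only illustrates, with a hypothetical helper, how such a Friday-to-Thursday window can be derived from a timestamp.

```python
from datetime import datetime, timedelta, timezone


def episode_week_window(moment):
    """Return (start, end) of the Friday-to-Thursday week containing `moment`."""
    # weekday(): Monday is 0, ..., Friday is 4
    days_since_friday = (moment.weekday() - 4) % 7
    start = (moment - timedelta(days=days_since_friday)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # The window ends one microsecond before the next Friday begins
    end = start + timedelta(days=7) - timedelta(microseconds=1)
    return start, end


start, end = episode_week_window(datetime(2025, 3, 31, 14, 9, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

For the post time in the log, this gives the same start and end dates as `schedule_detals`.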


### Print all the parameters

In [12]:
from pprint import pprint

pprint(schedule.model_dump(), indent=2)

{ 'airing_period': { 'airing_period': 'Airing Period: March, 28 - April, 03',
                     'season': 'spring',
                     'week_id': 1},
  'base_path': 'src/season_references',
  'month': 3,
  'post_time': datetime.datetime(2025, 3, 31, 14, 9, 52, 128132, tzinfo=datetime.timezone.utc),
  'schedule_csv': PosixPath('src/season_references/2025/episodes.csv'),
  'schedule_detals': { 'end_date': Timestamp('2025-04-03 23:59:59.999999+0000', tz='UTC'),
                       'season': 2,
                       'start_date': Timestamp('2025-03-28 00:00:00+0000', tz='UTC'),
                       'week_id': 1},
  'schedule_type': 'episodes',
  'season_name': 'spring',
  'season_number': 2,
  'week_id': 1,
  'year': 2025}


### Store the main parameters in variables

In [13]:
season = schedule.season_name
year = schedule.year
week_id = schedule.week_id

### Build the field path used to query the db

The format is *reddit_karma.current_year.current_season*

In [18]:
reddit_karma = f"reddit_karma.{year}.{season}"
reddit_karma

'reddit_karma.2025.spring'
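MongoDB reads a dotted path like this one as a walk through nested fields. The sketch below mimics that walk on a plain Python dict, which can help when inspecting documents already fetched into the notebook. `get_by_path` is a hypothetical helper and only handles nested dicts, not array indexes.

```python
def get_by_path(doc, dotted_path, default=None):
    """Follow a dot-separated path through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Keys are strings: the f-string turns the integer year into "2025"
doc = {"reddit_karma": {"2025": {"spring": [{"week_id": 1, "karma": 232}]}}}
print(get_by_path(doc, "reddit_karma.2025.spring"))
# → [{'week_id': 1, 'karma': 232}]
```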

### Look up a show by its MyAnimeList ID

In [26]:
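# Example show: MAL ID 60154, which the output below identifies as
# "Ore wa Seikan Kokka no Akutoku Ryoushu!"
mal_id = 60154

# Return only the fields of interest; `reddit_karma` holds the dotted path built above
projection = {"_id": 0, "id": 1, "title": 1, "title_english": 1, reddit_karma: 1}
show = collection.find_one({"id": mal_id}, projection)

pprint(show, indent=2)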

{ 'id': 60154,
  'reddit_karma': { '2025': { 'spring': [ { 'comments': 166,
                                            'episode': '3',
                                            'karma': 232,
                                            'reddit_id': '1jmkcy6',
                                            'upvote_ratio': 0.92,
                                            'url': 'https://www.reddit.com/r/anime/comments/1jmkcy6/ore_wa_seikan_kokka_no_akutoku_ryoushu_im_the/',
                                            'week_id': 1}]}},
  'title': 'Ore wa Seikan Kokka no Akutoku Ryoushu!',
  'title_english': "I'm the Evil Lord of an Intergalactic Empire!"}
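Once a show is loaded, a common debugging step is to check which entries exist for a given week. The sketch below uses a hypothetical helper, `entries_for_week`, on a trimmed copy of the document shown above. It also tolerates `reddit_karma` being `None`, as seen in the sample document earlier in this notebook.

```python
def entries_for_week(show, year, season, week_id):
    """Return the karma entries of `show` recorded for the given week."""
    if not show:
        return []
    # Year keys are stored as strings ('2025'), so convert before the lookup
    by_season = (show.get("reddit_karma") or {}).get(str(year), {})
    return [e for e in by_season.get(season, []) if e.get("week_id") == week_id]


# Trimmed copy of the document printed above
sample_show = {
    "id": 60154,
    "reddit_karma": {"2025": {"spring": [{"week_id": 1, "episode": "3", "karma": 232}]}},
}
print(entries_for_week(sample_show, 2025, "spring", 1))
# → [{'week_id': 1, 'episode': '3', 'karma': 232}]
```

In the notebook this would be called as `entries_for_week(show, year, season, week_id)`.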


In [23]:
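# Push a weekly entry onto the show's karma list for the current season.
# Note: this entry is for Blue Miburo (see the URL), so `mal_id` must hold
# that show's MyAnimeList ID before this cell runs.

week_entry = {
    "week_id": 1,
    "episode": "24",
    "karma": 44,
    "comments": 17,
    "upvote_ratio": 0.82,
    "reddit_id": "1jmj1qj",  # same key name as the stored entries shown above
    "url": "https://www.reddit.com/r/anime/comments/1jmj1qj/ao_no_miburo_blue_miburo_episode_24_discussion/",
}

collection.update_one(
    {"id": mal_id},
    {"$push": {reddit_karma: week_entry}},
)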

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
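Running a `$push` cell twice appends the same entry twice. One common way to guard against that is to add a `$ne` condition to the filter, so the update only matches when no element of the array already has the same Reddit post ID. The sketch below builds the filter and update documents; `build_push_if_absent` is a hypothetical helper, and the `reddit_id` key is assumed from the stored entry shown earlier.

```python
def build_push_if_absent(mal_id, field_path, entry, key="reddit_id"):
    """Build (filter, update) that appends `entry` only when no element of
    the array at `field_path` already has the same `key` value."""
    query = {"id": mal_id, f"{field_path}.{key}": {"$ne": entry[key]}}
    update = {"$push": {field_path: entry}}
    return query, update


entry = {"week_id": 1, "episode": "3", "reddit_id": "1jmkcy6"}
query, update = build_push_if_absent(60154, "reddit_karma.2025.spring", entry)
print(query)
print(update)

# With a live connection:
# result = collection.update_one(query, update)
# result.modified_count is 0 when the entry was already present
# (or when no show with that id exists).
```

Checking `matched_count` and `modified_count` on the returned `UpdateResult` tells you whether anything was actually written.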