# Introduction

In this notebook, I will get current ratings for 50,000 Lichess players and save them to my local player-opening database.

## The Problem

Ope! I did a silly. I forgot to get the ratings of all 50,000 lichess players in my database.

## The Solution

Luckily, this is an easy problem to solve. Here are the steps:

### 1. Database

- Update the database schema in db_utils to take a `rating` integer on the player table
- Note that this database has already been made so we're not actually changing anything immediately by doing this.  But if the schema is ever used to duplicate or remake the DB, we'll want that rating in there.

### 2. Update existing DB
- In step 1, we updated our schema but that'll only add `rating` if the schema is run again to make a new DB
- So, in this notebook, we'll add a simple integer `rating` column to `player`

### 3. Get player ratings

1. Get all 50,000 player usernames from our DB
2. In batches of 300 usernames, call POST lichess.org/api/users with my lichess auth token (env LICHESS_TOKEN), with the usernames in an array
    - See notebook 15's lichess.org api call for structure
3. For each batch, do a bulk update of usernames in the DB.

NOTE ON API LIMITS:
- We can only get max 8,000 users every ten minutes, and max 120,000 users per day
- The 120k limit shouldn't be an issue since we only have 50k players
- But we'll need simple timeouts and backoffs. Two requests per minute (of 300 players each) seems sensible; that's 6,000 players per ten minutes, well within the limit.
- So this will take about 90-120 minutes to run, which is fine.

## Rating specs:

We use blitz, rapid and classical games for our data. So there's some question about which rating to use. We'll employ the following formula and specs:

- Blitz is the most common time control in our dataset, so we'll prefer blitz.
- If rating deviation (RD) is <110, use blitz rating
- If blitz RD is >110 and Rapid RD more than 20 points below Blitz RD, use Rapid rating
- Only use Classical rating if both Blitz and Rapid are more than 110, and Classical is less than 110
- If all else fails, use Blitz rating
- The player will always have at least one of these ratings because we're only using highly active players, but if somehow they don't, use NAN or something else that makes sense
- We'll make a list of every player who gets listed as NAN and print them at the end

## Variables

At the top we'll define helpful variables that might need tweaking, including:
- How many lichess API calls to run before stopping
- How many max players to do total
- backoff times (default to two calls every one minute)

## Timing:

We'll include logging that is fairly detailed about the number of batches done so far, number of players remaining, number of NAN ratings, and ETA for the rest of the job.



## API response example

A single user from the lichess API looks like this:

[
  {
    "id": "thibault",
    "username": "thibault",
    "perfs": {
      "ultraBullet": {
        "games": 3,
        "rating": 1688,
        "rd": 350,
        "prog": 0,
        "prov": true
      },
      "bullet": {
        "games": 7475,
        "rating": 1787,
        "rd": 78,
        "prog": -6
      },
      "blitz": {
        "games": 11536,
        "rating": 1778,
        "rd": 47,
        "prog": 3
      },
      "rapid": {
        "games": 873,
        "rating": 1746,
        "rd": 131,
        "prog": -71,
        "prov": true
      },
      "classical": {
        "games": 24,
        "rating": 1806,
        "rd": 246,
        "prog": -5,
        "prov": true
      },
      "correspondence": {
        "games": 377,
        "rating": 1942,
        "rd": 148,
        "prog": -12,
        "prov": true
      },
      "chess960": {
        "games": 348,
        "rating": 1551,
        "rd": 254,
        "prog": 61,
        "prov": true
      },
      "kingOfTheHill": {
        "games": 94,
        "rating": 1744,
        "rd": 275,
        "prog": 14,
        "prov": true
      },
      "threeCheck": {
        "games": 66,
        "rating": 1728,
        "rd": 246,
        "prog": 132,
        "prov": true
      },
      "antichess": {
        "games": 72,
        "rating": 1512,
        "rd": 272,
        "prog": -20,
        "prov": true
      },
      "atomic": {
        "games": 99,
        "rating": 1633,
        "rd": 288,
        "prog": 18,
        "prov": true
      },
      "horde": {
        "games": 46,
        "rating": 1592,
        "rd": 268,
        "prog": -20,
        "prov": true
      },
      "racingKings": {
        "games": 13,
        "rating": 1552,
        "rd": 320,
        "prog": -75,
        "prov": true
      },
      "crazyhouse": {
        "games": 50,
        "rating": 1567,
        "rd": 286,
        "prog": -34,
        "prov": true
      },
      "puzzle": {
        "games": 5696,
        "rating": 1954,
        "rd": 71,
        "prog": 0
      },
      "storm": {
        "runs": 44,
        "score": 33
      },
      "racer": {
        "runs": 82,
        "score": 51
      },
      "streak": {
        "runs": 49,
        "score": 33
      }
    },
    "flair": "nature.seedling",
    "patron": true,
    "patronColor": 9,
    "verified": true,
    "createdAt": 1290415680000,
    "profile": {
      "bio": "I turn coffee into bugs.",
      "realName": "Thibault Duplessis",
      "links": "github.com/ornicar\r\nmas.to/@thibault"
    },
    "seenAt": 1758635849909,
    "playTime": {
      "total": 6436845,
      "tv": 17974
    }
  },
  {
    "id": "maia1",
    "username": "maia1",
    "perfs": {
      "bullet": {
        "games": 193064,
        "rating": 1512,
        "rd": 45,
        "prog": 17
      },
      "blitz": {
        "games": 357366,
        "rating": 1412,
        "rd": 45,
        "prog": -12
      },
      "rapid": {
        "games": 408246,
        "rating": 1590,
        "rd": 45,
        "prog": -25
      },
      "classical": {
        "games": 118909,
        "rating": 1626,
        "rd": 45,
        "prog": 10
      },
      "correspondence": {
        "games": 265,
        "rating": 1527,
        "rd": 206,
        "prog": -18,
        "prov": true
      }
    },
    "title": "BOT",
    "verified": true,
    "createdAt": 1582579972726,
    "profile": {
      "bio": "Maia is a human-like neural network chess engine. This version was trained by learning from over 10 million Lichess games between 1100s. \r\n\r\nMaia Chess is an ongoing research project aiming to make a more human-friendly, useful, and fun chess AI. For more information go to maiachess.com. You can also play against @maia5 and @maia9. Developed by¬†@ashtonanderson, @sidsen¬†and¬†@reidmcy.",
      "realName": "Maia Chess 1100",
      "links": "https://maiachess.com\r\nhttps://github.com/CSSLab/maia-chess\r\nhttps://twitter.com/maiachess"
    },
    "seenAt": 1758635494337,
    "playTime": {
      "total": 1083092110,
      "tv": 3019408
    }
  },
  {
    "id": "maia5",
    "username": "maia5",
    "perfs": {
      "bullet": {
        "games": 59096,
        "rating": 1569,
        "rd": 45,
        "prog": 17
      },
      "blitz": {
        "games": 161720,
        "rating": 1485,
        "rd": 45,
        "prog": -7
      },
      "rapid": {
        "games": 176792,
        "rating": 1640,
        "rd": 45,
        "prog": -14
      },
      "classical": {
        "games": 43066,
        "rating": 1621,
        "rd": 45,
        "prog": 8
      },
      "correspondence": {
        "games": 18,
        "rating": 1792,
        "rd": 228,
        "prog": 30,
        "prov": true
      }
    },
    "title": "BOT",
    "verified": true,
    "createdAt": 1582580198358,
    "profile": {
      "bio": "Maia is a human-like neural network chess engine. This version was trained by learning from over 10 million Lichess games between 1500s. Maia Chess is an ongoing research project aiming to make a more human-friendly, useful, and fun chess AI. For more information go to maiachess.com. You can also play @maia1 and @maia9. Developed by @ashtonanderson,  @sidsen and @reidmcy.",
      "realName": "Maia Chess 1500",
      "links": "https://maiachess.com\r\ngithub.com/CSSLab/maia-chess\r\nhttps://twitter.com/maiachess"
    },
    "seenAt": 1758635443518,
    "playTime": {
      "total": 508638915,
      "tv": 3849828
    }
  }
]

## Step 2: Add Rating Column to Existing Database

Now we'll add the rating column to the existing player table in our database.

In [6]:
# Add rating column to existing player table
from pathlib import Path
from utils.database.db_utils import get_db_connection

# Path to the main player-opening database
DB_PATH = Path.cwd().parent / "data" / "processed" / "chess_games.db"

con = get_db_connection(str(DB_PATH))
try:
    con.execute("""
        ALTER TABLE player
        ADD COLUMN IF NOT EXISTS rating INTEGER;
    """)
    print("‚úì Added 'rating' column to player table (if not already present)")
    
    # Verify the column was added
    columns = con.execute("DESCRIBE player").df()
    print("\nPlayer table schema:")
    print(columns)
finally:
    con.close()

‚úì Added 'rating' column to player table (if not already present)

Player table schema:
  column_name column_type null   key                   default extra
0          id     INTEGER   NO   PRI  nextval('player_id_seq')  None
1        name     VARCHAR  YES   UNI                      None  None
2       title     VARCHAR  YES  None                      None  None
3      rating     INTEGER  YES  None                      None  None


## Step 3: Configuration and Helper Functions

Define constants, API configuration, and helper functions for rating selection and API calls.

In [None]:
# Configuration and setup
import os
from typing import Optional, Dict, Any, List, Tuple
from dataclasses import dataclass
from dotenv import load_dotenv
import requests
import time
from pathlib import Path
from utils.database.db_utils import get_db_connection

# Load environment variables
load_dotenv()
lichess_api_token = os.getenv("LICHESS_TOKEN")

# --- CONFIGURATION CONSTANTS ---
BATCH_SIZE = 300  # Lichess API limit per request
MAX_BATCHES: Optional[int] = 1  # Set to None for unlimited, or a number to limit batches
MAX_PLAYERS: Optional[int] = None  # Set to None for all players, or a number to limit
REQUESTS_PER_MINUTE = 2  # Rate limiting: 2 requests per minute = 600 players/min
DELAY_BETWEEN_REQUESTS = 60.0 / REQUESTS_PER_MINUTE  # Seconds between API calls
MAX_CONSECUTIVE_API_FAILURES = 3  # Stop after this many consecutive failures

# Rating deviation thresholds for rating selection
RD_THRESHOLD_GOOD = 110  # Rating deviation below this is considered reliable
RD_THRESHOLD_DIFFERENCE = 20  # Difference needed to prefer one rating over another

DB_PATH = Path.cwd().parent / "data" / "processed" / "chess_games.duckdb"

# API configuration
LICHESS_API_URL = "https://lichess.org/api/users"
headers = {"Authorization": f"Bearer {lichess_api_token}"}

print("‚úì Configuration loaded")
print(f"  Batch size: {BATCH_SIZE}")
print(f"  Max batches: {MAX_BATCHES if MAX_BATCHES else 'Unlimited'}")
print(f"  Max players: {MAX_PLAYERS if MAX_PLAYERS else 'All available'}")
print(f"  Requests per minute: {REQUESTS_PER_MINUTE}")
print(f"  Delay between requests: {DELAY_BETWEEN_REQUESTS:.1f}s")

‚úì Configuration loaded
  Batch size: 300
  Max batches: 1
  Max players: All available
  Requests per minute: 2
  Delay between requests: 30.0s


In [8]:
# Helper data structures and functions

@dataclass
class RatingInfo:
    """Represents a rating with its deviation."""
    rating: int
    rd: int
    
    def is_reliable(self) -> bool:
        """Check if this rating is reliable (RD < threshold)."""
        return self.rd < RD_THRESHOLD_GOOD


def extract_rating_info(perfs: Dict[str, Any], time_control: str) -> Optional[RatingInfo]:
    """
    Extract rating and RD for a specific time control from perfs object.
    
    Args:
        perfs: The 'perfs' dictionary from Lichess API response
        time_control: One of 'blitz', 'rapid', or 'classical'
    
    Returns:
        RatingInfo object if the time control exists, None otherwise
    """
    if time_control not in perfs:
        return None
    
    tc_data = perfs[time_control]
    rating = tc_data.get('rating')
    rd = tc_data.get('rd')
    
    if rating is None or rd is None:
        return None
    
    return RatingInfo(rating=int(rating), rd=int(rd))


def select_player_rating(perfs: Dict[str, Any]) -> Optional[int]:
    """
    Select the most appropriate time control rating for a player based on rating deviations.
    
    Rating selection logic:
    1. If blitz RD < 110, use blitz rating (preferred)
    2. If blitz RD >= 110 and rapid RD is 20+ points better, use rapid
    3. If both blitz and rapid RD >= 110, but classical RD < 110, use classical
    4. Otherwise, use blitz rating (fallback)
    5. If no ratings exist at all, return None
    
    Args:
        perfs: The 'perfs' dictionary from Lichess API response
    
    Returns:
        The selected rating as an integer, or None if no ratings available
    """
    # Extract rating info for each time control
    blitz = extract_rating_info(perfs, 'blitz')
    rapid = extract_rating_info(perfs, 'rapid')
    classical = extract_rating_info(perfs, 'classical')
    
    # If no ratings exist at all
    if not any([blitz, rapid, classical]):
        return None
    
    # Rule 1: Prefer blitz if it's reliable
    if blitz and blitz.is_reliable():
        return blitz.rating
    
    # Rule 2: Use rapid if it's significantly more reliable than blitz
    if blitz and rapid:
        if rapid.rd < blitz.rd - RD_THRESHOLD_DIFFERENCE:
            return rapid.rating
    
    # Rule 3: Use classical if both blitz and rapid are unreliable, but classical is reliable
    if classical and classical.is_reliable():
        if (not blitz or not blitz.is_reliable()) and (not rapid or not rapid.is_reliable()):
            return classical.rating
    
    # Rule 4: Fallback to blitz if it exists
    if blitz:
        return blitz.rating
    
    # Rule 5: Final fallback to rapid, then classical
    if rapid:
        return rapid.rating
    if classical:
        return classical.rating
    
    return None


print("‚úì Helper functions defined")
print("  - RatingInfo dataclass for type safety")
print("  - extract_rating_info() for parsing API responses")
print("  - select_player_rating() with rating selection logic")

‚úì Helper functions defined
  - RatingInfo dataclass for type safety
  - extract_rating_info() for parsing API responses
  - select_player_rating() with rating selection logic


## Step 4: Test Rating Selection Logic

Quick test to verify our rating selection logic works correctly.

In [9]:
# Test rating selection logic with sample data

test_cases = [
    {
        "name": "Reliable blitz (should pick blitz)",
        "perfs": {
            "blitz": {"rating": 1800, "rd": 50},
            "rapid": {"rating": 1750, "rd": 100},
        },
        "expected": 1800
    },
    {
        "name": "Unreliable blitz, good rapid (should pick rapid)",
        "perfs": {
            "blitz": {"rating": 1800, "rd": 150},
            "rapid": {"rating": 1750, "rd": 80},
        },
        "expected": 1750
    },
    {
        "name": "All unreliable, good classical (should pick classical)",
        "perfs": {
            "blitz": {"rating": 1800, "rd": 150},
            "rapid": {"rating": 1750, "rd": 140},
            "classical": {"rating": 1820, "rd": 90},
        },
        "expected": 1820
    },
    {
        "name": "All unreliable (should fallback to blitz)",
        "perfs": {
            "blitz": {"rating": 1800, "rd": 150},
            "rapid": {"rating": 1750, "rd": 140},
        },
        "expected": 1800
    },
    {
        "name": "No ratings (should return None)",
        "perfs": {},
        "expected": None
    }
]

print("Testing rating selection logic:\n")
all_passed = True
for test in test_cases:
    result = select_player_rating(test["perfs"])
    passed = result == test["expected"]
    all_passed = all_passed and passed
    status = "‚úì" if passed else "‚úó"
    print(f"{status} {test['name']}")
    print(f"  Expected: {test['expected']}, Got: {result}\n")

if all_passed:
    print("‚úì All tests passed!")
else:
    print("‚úó Some tests failed. Please review the logic.")

Testing rating selection logic:

‚úì Reliable blitz (should pick blitz)
  Expected: 1800, Got: 1800

‚úì Unreliable blitz, good rapid (should pick rapid)
  Expected: 1750, Got: 1750

‚úì All unreliable, good classical (should pick classical)
  Expected: 1820, Got: 1820

‚úì All unreliable (should fallback to blitz)
  Expected: 1800, Got: 1800

‚úì No ratings (should return None)
  Expected: None, Got: None

‚úì All tests passed!


## Step 5: Main Pipeline - Fetch and Update Player Ratings

This is the main pipeline that:
1. Fetches players without ratings from the database
2. Calls Lichess API in batches
3. Selects appropriate ratings using our logic
4. Updates the database in bulk
5. Provides detailed progress tracking and ETA

In [10]:
# Main pipeline for fetching and updating player ratings

import numpy as np
from typing import List, Tuple, Dict

# --- Statistics Tracking ---
@dataclass
class PipelineStats:
    """Track statistics throughout the pipeline execution."""
    total_players_processed: int = 0
    total_ratings_updated: int = 0
    total_nan_ratings: int = 0
    consecutive_api_failures: int = 0
    nan_player_names: List[str] = None
    
    def __post_init__(self):
        if self.nan_player_names is None:
            self.nan_player_names = []


def fetch_players_to_update(con, limit: Optional[int] = None) -> List[str]:
    """
    Fetch player names that need rating updates (where rating is NULL).
    
    Args:
        con: Database connection
        limit: Optional limit on number of players to fetch
    
    Returns:
        List of player names
    """
    query = "SELECT name FROM player WHERE rating IS NULL ORDER BY id"
    if limit:
        query += f" LIMIT {limit}"
    
    return con.execute(query).df()['name'].tolist()


def fetch_ratings_from_api(usernames: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """
    Fetch ratings for a batch of usernames from Lichess API.
    
    Args:
        usernames: List of usernames to fetch
    
    Returns:
        Tuple of (ratings_dict, failed_usernames)
        - ratings_dict: Maps username -> rating (excludes users with no valid rating)
        - failed_usernames: Usernames that couldn't be processed
    """
    try:
        resp = requests.post(
            LICHESS_API_URL,
            data=",".join(usernames),
            headers=headers,
            timeout=30
        )
        
        if resp.status_code != 200:
            print(f"  ‚ö† API Error! Status: {resp.status_code}")
            return {}, usernames
        
        users_data = resp.json()
        ratings_dict: Dict[str, int] = {}
        failed_usernames: List[str] = []
        
        # Create a set of returned usernames for quick lookup
        returned_usernames = {user['username'].lower() for user in users_data if user}
        
        for user in users_data:
            if not user:
                continue
            
            username = user['username']
            perfs = user.get('perfs', {})
            rating = select_player_rating(perfs)
            
            if rating is not None:
                ratings_dict[username] = rating
            else:
                failed_usernames.append(username)
        
        # Add usernames that weren't returned by the API to failed list
        for username in usernames:
            if username.lower() not in returned_usernames:
                failed_usernames.append(username)
        
        return ratings_dict, failed_usernames
        
    except requests.RequestException as e:
        print(f"  ‚ö† Network Error: {e}")
        return {}, usernames


def bulk_update_ratings(con, ratings_dict: Dict[str, int]) -> int:
    """
    Bulk update player ratings in the database.
    
    Args:
        con: Database connection
        ratings_dict: Dictionary mapping username -> rating
    
    Returns:
        Number of rows updated
    """
    if not ratings_dict:
        return 0
    
    # Create a temporary table with the updates
    con.execute("CREATE TEMP TABLE IF NOT EXISTS temp_ratings (name VARCHAR, rating INTEGER)")
    con.execute("DELETE FROM temp_ratings")
    
    # Insert all ratings into temp table
    for username, rating in ratings_dict.items():
        con.execute(
            "INSERT INTO temp_ratings (name, rating) VALUES (?, ?)",
            (username, rating)
        )
    
    # Bulk update using a join
    con.execute("""
        UPDATE player
        SET rating = temp_ratings.rating
        FROM temp_ratings
        WHERE player.name = temp_ratings.name
    """)
    
    rows_updated = len(ratings_dict)
    con.execute("DROP TABLE temp_ratings")
    
    return rows_updated


def format_time(seconds: float) -> str:
    """Format seconds into a readable time string."""
    if seconds < 60:
        return f"{seconds:.0f}s"
    elif seconds < 3600:
        return f"{seconds/60:.1f}m"
    else:
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        return f"{hours}h {minutes}m"


print("‚úì Pipeline functions defined")
print("  - fetch_players_to_update() to get players needing ratings")
print("  - fetch_ratings_from_api() to call Lichess API")
print("  - bulk_update_ratings() for efficient database updates")
print("  - format_time() for readable time displays")

‚úì Pipeline functions defined
  - fetch_players_to_update() to get players needing ratings
  - fetch_ratings_from_api() to call Lichess API
  - bulk_update_ratings() for efficient database updates
  - format_time() for readable time displays


In [11]:
# Execute the main pipeline

print("=" * 80)
print("STARTING PLAYER RATING UPDATE PIPELINE")
print("=" * 80)

pipeline_start_time = time.time()
stats = PipelineStats()

# Connect to database
con = get_db_connection(str(DB_PATH))

try:
    # Fetch all players that need rating updates
    print("\nüìä Fetching players from database...")
    players_to_update = fetch_players_to_update(con, limit=MAX_PLAYERS)
    total_players = len(players_to_update)
    
    if total_players == 0:
        print("‚úì No players need rating updates. Database is up to date!")
    else:
        print(f"‚úì Found {total_players:,} players needing rating updates")
        
        # Calculate total batches
        total_batches = (total_players + BATCH_SIZE - 1) // BATCH_SIZE
        if MAX_BATCHES:
            total_batches = min(total_batches, MAX_BATCHES)
            print(f"  (Limited to {MAX_BATCHES} batches as configured)")
        
        print(f"  Total batches: {total_batches}")
        print(f"  Estimated time: {format_time(total_batches * DELAY_BETWEEN_REQUESTS)}")
        print("\n" + "-" * 80)
        
        # Process in batches
        batch_num = 0
        for i in range(0, total_players, BATCH_SIZE):
            # Check if we've hit the batch limit
            if MAX_BATCHES and batch_num >= MAX_BATCHES:
                print(f"\n‚ö† Reached maximum batch limit ({MAX_BATCHES}). Stopping.")
                break
            
            # Check for too many consecutive failures
            if stats.consecutive_api_failures >= MAX_CONSECUTIVE_API_FAILURES:
                print(f"\n‚ùå Stopping due to {stats.consecutive_api_failures} consecutive API failures.")
                break
            
            batch_num += 1
            batch_start_time = time.time()
            
            # Get batch of usernames
            batch_usernames = players_to_update[i:i + BATCH_SIZE]
            current_batch_size = len(batch_usernames)
            
            print(f"\nüîÑ Batch {batch_num}/{total_batches} ({current_batch_size} players)")
            
            # Fetch ratings from API
            print(f"  ‚Üí Calling Lichess API...")
            ratings_dict, failed_usernames = fetch_ratings_from_api(batch_usernames)
            
            if not ratings_dict and not failed_usernames:
                # Complete API failure
                stats.consecutive_api_failures += 1
                print(f"  ‚ùå API call failed. Consecutive failures: {stats.consecutive_api_failures}")
                continue
            else:
                # Reset failure counter on successful API call
                stats.consecutive_api_failures = 0
            
            # Update database
            print(f"  ‚Üí Updating database...")
            rows_updated = bulk_update_ratings(con, ratings_dict)
            
            # Update statistics
            batch_nan_count = len(failed_usernames)
            stats.total_players_processed += current_batch_size
            stats.total_ratings_updated += rows_updated
            stats.total_nan_ratings += batch_nan_count
            stats.nan_player_names.extend(failed_usernames)
            
            # Report batch results
            print(f"  ‚úì Updated {rows_updated} ratings")
            if batch_nan_count > 0:
                print(f"  ‚ö† {batch_nan_count} players with no valid rating (NaN)")
            
            # Calculate and display progress statistics
            batch_duration = time.time() - batch_start_time
            total_elapsed = time.time() - pipeline_start_time
            
            # Calculate rates and ETA
            batches_remaining = total_batches - batch_num
            avg_time_per_batch = total_elapsed / batch_num if batch_num > 0 else 0
            eta_seconds = batches_remaining * avg_time_per_batch
            
            players_remaining = total_players - stats.total_players_processed
            
            print(f"\n  üìà Progress: {stats.total_players_processed:,}/{total_players:,} " +
                  f"({stats.total_players_processed/total_players*100:.1f}%)")
            print(f"  ‚è±  Batch time: {batch_duration:.1f}s | " +
                  f"Total elapsed: {format_time(total_elapsed)}")
            print(f"  üéØ ETA: {format_time(eta_seconds)} " +
                  f"(~{players_remaining:,} players remaining)")
            print(f"  üìä Cumulative NaN ratings: {stats.total_nan_ratings} " +
                  f"({stats.total_nan_ratings/stats.total_players_processed*100:.1f}%)")
            
            # Rate limiting delay (skip on last batch)
            if batch_num < total_batches and players_remaining > 0:
                print(f"  ‚è∏  Waiting {DELAY_BETWEEN_REQUESTS:.0f}s (rate limiting)...")
                time.sleep(DELAY_BETWEEN_REQUESTS)
        
        # Final summary
        print("\n" + "=" * 80)
        print("PIPELINE COMPLETED")
        print("=" * 80)
        
        total_duration = time.time() - pipeline_start_time
        print(f"\n‚è±  Total Duration: {format_time(total_duration)}")
        print(f"üìä Players Processed: {stats.total_players_processed:,}")
        print(f"‚úì Ratings Updated: {stats.total_ratings_updated:,} " +
              f"({stats.total_ratings_updated/stats.total_players_processed*100:.1f}%)")
        print(f"‚ö† NaN Ratings: {stats.total_nan_ratings} " +
              f"({stats.total_nan_ratings/stats.total_players_processed*100:.1f}%)")
        
        if stats.nan_player_names:
            print(f"\n‚ö† Players with NaN ratings ({len(stats.nan_player_names)}):")
            # Show first 20, with option to see more
            display_limit = 20
            for idx, name in enumerate(stats.nan_player_names[:display_limit], 1):
                print(f"  {idx}. {name}")
            
            if len(stats.nan_player_names) > display_limit:
                remaining = len(stats.nan_player_names) - display_limit
                print(f"  ... and {remaining} more")
                print(f"\n  Full list stored in: stats.nan_player_names")
        
        # Performance metrics
        if total_duration > 0:
            players_per_second = stats.total_players_processed / total_duration
            print(f"\nüöÄ Performance: {players_per_second:.2f} players/second")
        
finally:
    con.close()
    print("\n‚úì Database connection closed")
    print("=" * 80)

STARTING PLAYER RATING UPDATE PIPELINE

üìä Fetching players from database...

‚úì Database connection closed


CatalogException: Catalog Error: Table with name player does not exist!
Did you mean "pg_type"?

LINE 1: SELECT name FROM player WHERE rating IS NULL ORDER BY id
                         ^

## Step 6: Verification

Verify that ratings were successfully added to the database.

In [None]:
# Verify ratings were added successfully

con = get_db_connection(str(DB_PATH))
try:
    # Get rating statistics
    rating_stats = con.execute("""
        SELECT 
            COUNT(*) as total_players,
            COUNT(rating) as players_with_rating,
            COUNT(*) - COUNT(rating) as players_without_rating,
            MIN(rating) as min_rating,
            MAX(rating) as max_rating,
            AVG(rating) as avg_rating,
            MEDIAN(rating) as median_rating
        FROM player
    """).df()
    
    print("=" * 60)
    print("RATING VERIFICATION REPORT")
    print("=" * 60)
    print(f"\nTotal Players: {rating_stats['total_players'][0]:,}")
    print(f"Players with Rating: {rating_stats['players_with_rating'][0]:,} " +
          f"({rating_stats['players_with_rating'][0]/rating_stats['total_players'][0]*100:.1f}%)")
    print(f"Players without Rating: {rating_stats['players_without_rating'][0]:,} " +
          f"({rating_stats['players_without_rating'][0]/rating_stats['total_players'][0]*100:.1f}%)")
    
    if rating_stats['players_with_rating'][0] > 0:
        print(f"\nüìä Rating Statistics:")
        print(f"  Min Rating: {rating_stats['min_rating'][0]}")
        print(f"  Max Rating: {rating_stats['max_rating'][0]}")
        print(f"  Average Rating: {rating_stats['avg_rating'][0]:.1f}")
        print(f"  Median Rating: {rating_stats['median_rating'][0]:.0f}")
        
        # Show sample players with ratings
        print(f"\nüë• Sample Players with Ratings:")
        sample_players = con.execute("""
            SELECT name, title, rating
            FROM player
            WHERE rating IS NOT NULL
            ORDER BY RANDOM()
            LIMIT 10
        """).df()
        
        for _, row in sample_players.iterrows():
            title = f" ({row['title']})" if row['title'] else ""
            print(f"  ‚Ä¢ {row['name']}{title}: {row['rating']}")
    
    print("\n" + "=" * 60)
    
finally:
    con.close()
    print("‚úì Database connection closed")