# Check for Null ECO Codes in Parquet File
This notebook scans a single Parquet file and displays every game whose `ECO` (Encyclopaedia of Chess Openings) code is null. It is a data-quality check to run before cleaning the data.

This is a one-time notebook I'm using for debugging. We can delete it later.
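
As a quick sanity check on the null test itself, here is a minimal sketch of how `IS NULL` behaves on a toy table. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs without the parquet file; the `games` table and its three rows are made up for illustration and are not taken from the dataset.

```python
import sqlite3

# Toy stand-in for the parquet file: three hypothetical games,
# one with a NULL ECO and one with an empty-string ECO.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE games (Site TEXT, ECO TEXT)")
con.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [("game-1", "B01"), ("game-2", None), ("game-3", "")],
)

# IS NULL matches only true NULLs; the empty string in game-3 is not counted.
null_count = con.execute(
    "SELECT COUNT(*) FROM games WHERE ECO IS NULL"
).fetchone()[0]

# "= NULL" never evaluates to true in SQL, so this comparison matches no rows.
eq_count = con.execute(
    "SELECT COUNT(*) FROM games WHERE ECO = NULL"
).fetchone()[0]

print(null_count, eq_count)
con.close()
```

The point of the sketch is only that the filter has to be `IS NULL`, and that empty or placeholder ECO values would need a separate check if they turn out to matter.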

In [1]:
import duckdb
import pandas as pd
from pathlib import Path

# Define the path to the parquet file.
# Assumes the notebook is run from notebooks/, so the current working
# directory's parent is the project root.
project_root = Path.cwd().parent
file_path = project_root / "data" / "raw" / "train-00000-of-00066.parquet"

print(f"Searching for games with null ECO in: {file_path}")

# Establish a connection to DuckDB
con = duckdb.connect()

# Query to find games with null 'ECO'
# Exclude UTCTime and UTCDate to avoid issues with unsupported data types
query = f"""
SELECT * EXCLUDE (UTCTime, UTCDate)
FROM '{file_path}'
WHERE ECO IS NULL
"""

try:
    # Execute the query and fetch results into a pandas DataFrame
    null_eco_games_df = con.execute(query).df()

    null_eco_game_count = len(null_eco_games_df)

    if null_eco_game_count > 0:
        print(f"Found {null_eco_game_count} games with null ECO codes. Details below:")
        # Print details for each game with a null ECO (the row index is not needed)
        for _, row in null_eco_games_df.iterrows():
            print("\n--- Game Details ---")
            print(row)
            print("--------------------")
    else:
        print("No games with null ECO codes were found in this file.")

    print(f"\nTotal number of games with null ECO found: {null_eco_game_count}")

except Exception as e:
    print(f"An error occurred: {e}")

finally:
    # Close the DuckDB connection
    con.close()

Searching for games with null ECO in: /Users/a/Documents/personalprojects/chess-opening-recommender/data/raw/train-00000-of-00066.parquet
Found 2608 games with null ECO codes. Details below:

--- Game Details ---
Event              Rated Blitz tournament https://lichess.org/tou...
Site                                    https://lichess.org/YEWKPVLZ
White                                                     nowheremen
Black                                                      Jeycat-08
Result                                                           0-1
WhiteTitle                                                      None
BlackTitle                                                      None
WhiteElo                                                        1500
BlackElo                                                        1500
WhiteRatingDiff                                                 -242
BlackRatingDiff                                                  242
ECO                         