# Chess Game Analysis with Spark

## Introduction
This notebook explores a dataset of chess games played on Lichess in September 2020. The dataset provides information on player ratings, game types, and in-depth annotations for moves, including blunders, mistakes, and inaccuracies.

The goals of this project are:
1. To analyze patterns in chess games, such as errors and win probabilities.
2. To evaluate the impact of player rating and game type on gameplay metrics.
3. To predict game outcomes using machine learning techniques.

The analysis uses PySpark so that it scales to large datasets. Key questions include:
- How often do blunders, mistakes, and inaccuracies occur per move in each player category?
- How do chess openings influence win probabilities?
- How well can game outcomes be predicted from gameplay features?

Let's begin by loading and understanding the dataset.


In [None]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

# Initialize Spark session
spark = (
    SparkSession.builder
    .appName("ChessGameAnalysis")
    .getOrCreate()
)

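# Load the dataset; inferSchema=True makes Spark take an extra pass over the
# file to guess column types instead of reading every column as a string
file_path = "Sept_20_analysis.csv"
df = spark.read.csv(file_path, header=True, inferSchema=True)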

# Show schema and first few rows
df.printSchema()
df.show(5, truncate=False)

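# Count null values in each column: when() yields a non-null value only for
# null cells, and count() counts only non-null values
null_counts = df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns])
null_counts.show()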
