# Pandas & NumPy Event Data Analysis

This notebook demonstrates how to analyze game event data with Pandas and NumPy. We load sample data, compute reaction time percentiles, plot a histogram of scores, and build a heatmap from a 2D histogram.

## 1. Import Required Libraries
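We use pandas and NumPy for data analysis, and matplotlib and seaborn for visualization. All four must be installed in the kernel's environment; the `ModuleNotFoundError` shown under the cell below means matplotlib was missing when the cell was run.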

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style (set_theme is the current name; sns.set is an alias for it)
sns.set_theme(style="whitegrid")

ModuleNotFoundError: No module named 'matplotlib'
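
One way to catch a missing package before the import cell fails is to check for each one up front. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the original notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The four libraries imported in the cell above
required = ["pandas", "numpy", "matplotlib", "seaborn"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Inside Jupyter, running `%pip install` followed by the package names installs into the environment of the running kernel; restart the kernel afterwards if the import still fails.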

## 2. Load Sample Event Data (CSV/Parquet)
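Let's load the sample event data from the `assets` folder. The same data is stored in two formats, CSV and Parquet, and we load both. Note that `pd.read_parquet` needs a Parquet engine such as `pyarrow` or `fastparquet` to be installed.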

In [None]:
# Load CSV
df_csv = pd.read_csv("./assets/sample_events.csv")
print("CSV Data:")
display(df_csv)

# Load Parquet
df_parquet = pd.read_parquet("./assets/sample_events.parquet")
print("Parquet Data:")
display(df_parquet)
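
By default `pd.read_csv` leaves the `timestamp` column as plain strings. If time-based operations are needed, the column can be converted after loading. The sketch below uses two rows copied from the sample data and reads them from an in-memory string so that it runs without the `assets` folder; the name `df_check` is only for illustration.

```python
import io
import pandas as pd

# Two rows in the same layout as sample_events.csv (values from the sample data)
csv_text = """session_id,event_type,timestamp,score,x,y,reaction_time
s1,move,2025-11-16T12:00:00Z,100,10,20,0.45
s1,move,2025-11-16T12:00:01Z,150,12,22,0.38
"""

df_check = pd.read_csv(io.StringIO(csv_text))
timestamp_dtype_before = df_check["timestamp"].dtype  # strings straight after read_csv

# Convert the ISO 8601 strings to timezone-aware (UTC) datetimes
df_check["timestamp"] = pd.to_datetime(df_check["timestamp"], utc=True)
print(df_check.dtypes)
```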

## 3. Compute Reaction Time Percentiles
We will calculate the 25th, 50th (median), and 75th percentiles of the `reaction_time` column.

In [None]:
percentiles = [25, 50, 75]
rt_percentiles = np.percentile(df_csv['reaction_time'], percentiles)
for p, v in zip(percentiles, rt_percentiles):
    print(f"{p}th percentile: {v:.3f} seconds")
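
As a cross-check, the same percentiles can be computed with pandas' `Series.quantile`, which takes fractions (0.25) instead of percentages (25). The sketch below hard-codes the six `reaction_time` values from the sample data so it runs on its own; both functions use linear interpolation by default, so the results should agree.

```python
import numpy as np
import pandas as pd

# reaction_time values from the sample data
reaction_time = pd.Series([0.45, 0.38, 0.52, 0.41, 0.47, 0.36], name="reaction_time")

via_numpy = np.percentile(reaction_time, [25, 50, 75])      # percentages
via_pandas = reaction_time.quantile([0.25, 0.50, 0.75])     # fractions

print(via_numpy)
print(via_pandas.to_numpy())
```

For these six values the 25th, 50th, and 75th percentiles come out to 0.3875, 0.43, and 0.465 seconds.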

## 4. Plot Histogram of Scores
Let's visualize the distribution of scores using a histogram.

In [None]:
plt.figure(figsize=(7, 4))
sns.histplot(df_csv['score'], bins=8, kde=True, color='skyblue')
plt.title('Histogram of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
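
To see the numbers behind the bars, the counts can be computed directly with `np.histogram`. This is a sketch that assumes the plot uses eight equal-width bins spanning the data's minimum to maximum, which is how `np.histogram` treats `bins=8`; the scores are copied from the sample data.

```python
import numpy as np

# score values from the sample data
scores = np.array([100, 150, 200, 180, 120, 210])

# Eight equal-width bins between min(scores) and max(scores)
counts, edges = np.histogram(scores, bins=8)

for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:7.2f} to {hi:7.2f}: {c}")
```

Eight bins produce nine edges, and the counts sum to the number of rows.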

## 5. Create and Plot Heatmap from numpy.histogram2d
We'll use `numpy.histogram2d` to create a 2D histogram of reaction time vs. score, then plot it as a heatmap.

In [None]:
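# 2D histogram: reaction_time vs. score
x = df_csv['reaction_time']
y = df_csv['score']
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(5, 5))

# Label each bin by its range: 5 bins have 6 edges, so the raw edges
# would give one label too many and shift the labels off the cells
xlabels = [f"{lo:.2f}-{hi:.2f}" for lo, hi in zip(xedges[:-1], xedges[1:])]
ylabels = [f"{lo:.0f}-{hi:.0f}" for lo, hi in zip(yedges[:-1], yedges[1:])]

plt.figure(figsize=(6, 5))
# Transpose so reaction time runs along the x-axis and score along the y-axis
ax = sns.heatmap(heatmap.T, cmap='YlGnBu', annot=True, fmt='.0f',
                 xticklabels=xlabels, yticklabels=ylabels)
ax.invert_yaxis()  # put the lowest score bin at the bottom
plt.title('Heatmap: Reaction Time vs. Score')
plt.xlabel('Reaction Time Bin')
plt.ylabel('Score Bin')
plt.show()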
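
The layout of what `np.histogram2d` returns explains the transpose in the plotting call. The sketch below uses the `reaction_time` and `score` values from the sample data and only inspects the arrays, so it runs without matplotlib or seaborn.

```python
import numpy as np

# reaction_time and score values from the sample data
x = np.array([0.45, 0.38, 0.52, 0.41, 0.47, 0.36])
y = np.array([100, 150, 200, 180, 120, 210])

H, xedges, yedges = np.histogram2d(x, y, bins=(5, 5))

# H[i, j] counts points whose x falls in bin i and whose y falls in bin j,
# so the first axis is x. Heatmap functions draw rows along the vertical axis,
# which is why H is transposed before plotting.
print(H.shape)                      # one row per x bin, one column per y bin
print(len(xedges), len(yedges))     # each axis has one more edge than bins
print(int(H.sum()))                 # every data point lands in exactly one cell
```

The fastest reaction time (0.36 s) belongs to the highest score (210), so that point lands in `H[0, 4]`: the first reaction time bin and the last score bin.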

## 6. Export Notebook as PDF
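To export this notebook as a PDF, use the menu: `File` → `Export Notebook As...` → `PDF`. PDF export goes through nbconvert and LaTeX, so nbconvert, Pandoc, and a TeX distribution that provides XeLaTeX must all be installed. The same export is available from the command line with `jupyter nbconvert --to pdf` followed by the notebook's filename.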

## 7. Include Assets Folder with Example Data

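The `assets` folder contains the `sample_events.csv` and `sample_events.parquet` files used in this notebook. If the files are missing, run the cell below before the loading cell in section 2; it recreates both files. The same code can be adapted to generate similar data programmatically: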

In [None]:
import os
os.makedirs("assets", exist_ok=True)

data = {
    "session_id": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "event_type": ["move", "move", "jump", "move", "move", "jump"],
    "timestamp": [
        "2025-11-16T12:00:00Z", "2025-11-16T12:00:01Z", "2025-11-16T12:00:02Z",
        "2025-11-16T12:00:03Z", "2025-11-16T12:00:04Z", "2025-11-16T12:00:05Z"
    ],
    "score": [100, 150, 200, 180, 120, 210],
    "x": [10, 12, 15, 16, 11, 13],
    "y": [20, 22, 25, 26, 21, 23],
    "reaction_time": [0.45, 0.38, 0.52, 0.41, 0.47, 0.36]
}
df = pd.DataFrame(data)
df.to_csv("assets/sample_events.csv", index=False)
df.to_parquet("assets/sample_events.parquet", index=False)
print("Sample data files saved to assets/.")
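
Six rows are enough to exercise the code but make for sparse plots. A larger randomized dataset with the same columns could be generated along the following lines. This is only a sketch: the function name `make_events`, the two-events-per-session layout, and the value ranges (chosen to match the minimum and maximum of the sample data) are all assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def make_events(n_rows, seed=0):
    """Build a random event table with the same columns as the sample data."""
    rng = np.random.default_rng(seed)  # seeded so the output is reproducible
    start = pd.Timestamp("2025-11-16T12:00:00Z")
    # One event per second, formatted like the sample timestamps
    timestamps = (start + pd.to_timedelta(np.arange(n_rows), unit="s")).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    return pd.DataFrame({
        "session_id": [f"s{i // 2 + 1}" for i in range(n_rows)],  # assumed: 2 events per session
        "event_type": rng.choice(["move", "jump"], size=n_rows),
        "timestamp": timestamps,
        "score": rng.integers(100, 211, size=n_rows),             # assumed range 100..210
        "x": rng.integers(10, 17, size=n_rows),                   # assumed range 10..16
        "y": rng.integers(20, 27, size=n_rows),                   # assumed range 20..26
        "reaction_time": rng.uniform(0.36, 0.52, size=n_rows).round(2),
    })

df_big = make_events(50)
print(df_big.head())
```

The result can be written out with `to_csv` and `to_parquet` exactly as in the cell above.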