```json
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pixel Paladin RL: Data Exploration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Description:** Train an AI agent using Reinforcement Learning to master a custom-built Pygame environment, learning complex strategies beyond simple scripted behavior. It's dangerous to go alone; train this!\n",
    "\n",
    "**Features:**\n",
    "\n",
    "- Custom Pygame 2D environment (e.g., maze navigation, simple platformer, or top-down shooter).\n",
    "- Implementation of a core Reinforcement Learning algorithm (like Q-Learning or basic DQN).\n",
    "- Agent learns optimal policies through trial-and-error interaction with the game.\n",
    "- Visualization of the agent's learning progress (e.g., rewards per episode).\n",
    "- The agent exhibits emergent behaviors not explicitly programmed.\n",
    "\n",
    "**Objective:** Explore simulated training data (logs) from RL training sessions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup and Data Loading"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import json\n",
    "import os\n",
    "from scipy import stats\n",
    "\n",
    "# Set plot style\n",
    "sns.set_theme(style=\"whitegrid\")\n",
    "plt.rcParams['figure.figsize'] = (12, 6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate Simulated Training Data\n",
    "\n",
    "Since we don't have actual logs yet, we'll simulate two runs of a typical RL training session (e.g., DQN) and save them as CSV files under `data/`. Each log records the episode number, reward, steps taken, loss, and exploration rate (epsilon)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
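    "def generate_simulated_data(num_episodes=1000, random_seed=42):\n",
    "    \"\"\"Simulate two RL training logs, save them as CSVs under data/, and return both DataFrames.\"\"\"\n",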
    "    np.random.seed(random_seed)\n",
    "    \n",
    "    episodes = np.arange(1, num_episodes + 1)\n",
    "    \n",
    "    # Simulate Reward: Starts low, increases, then plateaus with noise\n",
    "    base_reward = -500 + 700 * (1 - np.exp(-episodes / 200))\n",
    "    noise_reward = np.random.normal(0, 50, num_episodes)\n",
    "    # Add some occasional dips/spikes\n",
    "    for _ in range(num_episodes // 50):\n",
    "        idx = np.random.randint(0, num_episodes)\n",
    "        noise_reward[idx] *= np.random.uniform(1.5, 3.0) * np.random.choice([-1, 1])\n",
    "    reward = base_reward + noise_reward\n",
    "    \n",
    "    # Simulate Steps: Starts high, then decays toward a floor of about 50 as the agent learns\n",
    "    base_steps = 1000 * np.exp(-episodes / 300) + 50\n",
    "    noise_steps = np.random.normal(0, 20, num_episodes)\n",
    "    steps = np.maximum(10, base_steps + noise_steps).astype(int) # Ensure steps are positive\n",
    "    \n",
    "    # Simulate Loss (e.g., DQN loss): Starts high, decreases with noise\n",
    "    base_loss = 10 * np.exp(-episodes / 150) + 0.1\n",
    "    noise_loss = np.random.lognormal(0, 0.5, num_episodes) * 0.1 # Log-normal noise often seen in loss\n",
    "    loss = np.maximum(0.01, base_loss + noise_loss)\n",
    "    \n",
    "    # Simulate Epsilon (Exponential Decay)\n",
    "    epsilon_start = 1.0\n",
    "    epsilon_end = 0.05\n",
    "    epsilon_decay = 0.995\n",
    "    epsilon = []\n",
    "    current_epsilon = epsilon_start\n",
    "    for _ in range(num_episodes):\n",
    "        epsilon.append(current_epsilon)\n",
    "        current_epsilon = max(epsilon_end, current_epsilon * epsilon_decay)\n",
    "    epsilon = np.array(epsilon)\n",
    "    \n",
    "    df = pd.DataFrame({\n",
    "        'episode': episodes,\n",
    "        'reward': reward,\n",
    "        'steps': steps,\n",
    "        'loss': loss,\n",
    "        'epsilon': epsilon\n",
    "    })\n",
    "    \n",
    "    # Simulate a second run with slightly different parameters (e.g., learning rate)\n",
    "    np.random.seed(random_seed + 1) # Change seed for variation\n",
    "    base_reward_2 = -600 + 750 * (1 - np.exp(-episodes / 250)) # Slower learning, lower start (-600) and lower plateau (150 vs. 200 in run 1)\n",
    "    noise_reward_2 = np.random.normal(0, 60, num_episodes)\n",
    "    reward_2 = base_reward_2 + noise_reward_2\n",
    "    \n",
    "    base_steps_2 = 1100 * np.exp(-episodes / 350) + 60\n",
    "    noise_steps_2 = np.random.normal(0, 25, num_episodes)\n",
    "    steps_2 = np.maximum(15, base_steps_2 + noise_steps_2).astype(int)\n",
    "    \n",
    "    base_loss_2 = 12 * np.exp(-episodes / 180) + 0.15\n",
    "    noise_loss_2 = np.random.lognormal(0, 0.55, num_episodes) * 0.12\n",
    "    loss_2 = np.maximum(0.015, base_loss_2 + noise_loss_2)\n",
    "    \n",
    "    df_run2 = pd.DataFrame({\n",
    "        'episode': episodes,\n",
    "        'reward': reward_2,\n",
    "        'steps': steps_2,\n",
    "        'loss': loss_2,\n",
    "        'epsilon': epsilon # Assuming same epsilon schedule for comparison\n",
    "    })\n",
    "    \n",
    "    # Save to CSV\n",
    "    os.makedirs('data', exist_ok=True)\n",
    "    df.to_csv('data/training_log_run1.csv', index=False)\n",
    "    df_run2.to_csv('data/training_log_run2.csv', index=False)\n",
    "    \n",
    "    print(\"Simulated data generated and saved to 'data/training_log_run1.csv' and 'data/training_log_run2.csv'\")\n",
    "    return df, df_run2\n",
    "\n",
    "df_run1, df_run2 = generate_simulated_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    df_run1 = pd.read_csv('data/training_log_run1.csv')\n",
    "    df_run2 = pd.read_csv('data/training_log_run2.csv')\n",
    "    print(\"Training logs loaded successfully.\")\n",
    "except FileNotFoundError:\n",
    "    print(\"Error: Training log files not found. Please ensure they are in the 'data' directory.\")\n",
    "    # In a real scenario, you might stop execution or handle this differently\n",
    "    # For this example, we'll proceed with the data generated in the previous step if loading fails\n",
    "    if 'df_run1' not in locals():\n",
    "        print(\"Generating fresh simulated data as files were not found.\")\n",
    "        df_run1, df_run2 = generate_simulated_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inspect Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"--- Run 1: First 5 Rows ---\")\n",
    "print(df_run1.head())\n",
    "\n",
    "print(\"\\n--- Run 1: Data Info ---\")\n",
    "df_run1.info()\n",
    "\n",
    "print(\"\\n--- Run 1: Descriptive Statistics ---\")\n",
    "print(df_run1.describe())\n",
    "\n",
    "print(\"\\n--- Run 1: Missing Values ---\")\n",
    "print(df_run1.isnull().sum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"--- Run 2: First 5 Rows ---\")\n",
    "print(df_run2.head())\n",
    "\n",
    "print(\"\\n--- Run 2: Data Info ---\")\n",
    "df_run2.info()\n",
    "\n",
    "print(\"\\n--- Run 2: Descriptive Statistics ---\")\n",
    "print(df_run2.describe())\n",
    "\n",
    "print(\"\\n--- Run 2: Missing Values ---\")\n",
    "print(df_run2.isnull().sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Exploratory Data Analysis (EDA)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Learning Curves (Reward per Episode)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(14, 7))\n",
    "sns.lineplot(data=df_run1, x='episode', y='reward', label='Run 1 Reward', alpha=0.7)\n",
    "sns.lineplot(data=df_run2, x='episode', y='reward', label='Run 2 Reward', alpha=0.7)\n",
    "\n",
    "# Add rolling average for smoother trend\n",
    "rolling_window = 50\n",
    "df_run1['reward_rolling'] = df_run1['reward'].rolling(window=rolling_window).mean()\n",
    "df_run2['reward_rolling'] = df_run2['reward'].rolling(window=rolling_window).mean()\n",
    "\n",
    "sns.lineplot(data=df_run1, x='episode', y='reward_rolling', label=f'Run 1 Reward (Rolling Avg {rolling_window})', linewidth=2)\n",
    "sns.lineplot(data=df_run2, x='episode', y='reward_rolling', label=f'Run 2 Reward (Rolling Avg {rolling_window})', linewidth=2)\n",
    "\n",
    "plt.title('Reward per Episode Over Training')\n",
    "plt.xlabel('Episode')\n",
    "plt.ylabel('Total Reward')\n",
    "plt.legend()\n",
    "plt.grid(True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Steps per Episode"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(14, 7))\n",
    "sns.lineplot(data=df_run1, x='episode', y='steps', label='Run 1 Steps', alpha=0.7)\n",
    "sns.lineplot(data=df_run2, x='episode', y='steps', label='Run 2 Steps', alpha=0.7)\n",
    "\n",
    "# Add rolling average\n",
    "df_run1['steps_rolling'] = df_run1['steps'].rolling(window=rolling_window).mean()\n",
    "df_run2['steps_rolling'] = df_run2['steps'].rolling(window=rolling_window).mean()\n",
    "\n",
    "sns.lineplot(data=df_run1, x='episode', y='steps_rolling', label=f'Run 1 Steps (Rolling Avg {rolling_window})', linewidth=2)\n",
    "sns.lineplot(data=df_run2, x='episode', y='steps_rolling', label=f'Run 2 Steps (Rolling Avg {rolling_window})', linewidth=2)\n",
    "\n",
    "plt.title('Steps per Episode Over Training')\n",
    "plt.xlabel('Episode')\n",
    "plt.ylabel('Steps Taken')\n",
    "plt.legend()\n",
    "plt.grid(True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loss Curve (if applicable, e.g., for DQN)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(14, 7))\n",
    "sns.lineplot(data=df_run1, x='episode', y='loss', label='Run 1 Loss', alpha=0.7)\n",
    "sns.lineplot(data=df_run2, x='episode', y='loss', label='Run 2 Loss', alpha=0.7)\n",
    "\n",
    "# Add rolling average\n",
    "df_run1['loss_rolling'] = df_run1['loss'].rolling(window=rolling_window).mean()\n",
    "df_run2['loss_rolling'] = df_run2['loss'].rolling(window=rolling_window).mean()\n",
    "\n",
    "sns.lineplot(data=df_run1, x='episode', y='loss_rolling', label=f'Run 1 Loss (Rolling Avg {rolling_window})', linewidth=2)\n",
    "sns.lineplot(data=df_run2, x='episode', y='loss_rolling', label=f'Run 2 Loss (Rolling Avg {rolling_window})', linewidth=2)\n",
    "\n",
    "plt.title('Training Loss per Episode')\n",
    "plt.xlabel('Episode')\n",
    "plt.ylabel('Loss')\n",
    "plt.yscale('log') # Loss often benefits from a log scale\n",
    "plt.legend()\n",
    "plt.grid(True, which=\"both\", ls=\"--\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Epsilon Decay"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(14, 7))\n",
    "sns.lineplot(data=df_run1, x='episode', y='epsilon', label='Epsilon Decay')\n",
    "plt.title('Exploration Rate (Epsilon) Over Training')\n",
    "plt.xlabel('Episode')\n",
    "plt.ylabel('Epsilon Value')\n",
    "plt.legend()\n",
    "plt.grid(True)\n",
    "plt.show()\n",
    "# Note: Epsilon is the same for both runs in this simulation, so only one line is plotted."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Distribution of Rewards (Overall)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(14, 7))\n",
    "sns.histplot(df_run1['reward'], kde=True, label='Run 1 Reward Distribution', color='skyblue', bins=50)\n",
    "sns.histplot(df_run2['reward'], kde=True, label='Run 2 Reward Distribution', color='lightcoral', bins=50)\n",
    "plt.title('Overall Distribution of Rewards per Episode')\n",
    "plt.xlabel('Total Reward')\n",
    "plt.ylabel('Frequency')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Distribution of Rewards (Late Training Phase)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "late_phase_start_episode = int(len(df_run1) * 0.8) + 1 # First episode of the last 20% (episodes are 1-indexed)\n",
    "df_run1_late = df_run1[df_run1['episode'] >= late_phase_start_episode]\n",
    "df_run2_late = df_run2[df_run2['episode'] >= late_phase_start_episode]\n",
    "\n",
    "plt.figure(figsize=(14, 7))\n",
    "sns.histplot(df_run1_late['reward'], kde=True, label='Run 1 Reward (Late Phase)', color='skyblue', bins=30)\n",
    "sns.histplot(df_run2_late['reward'], kde=True, label='Run 2 Reward (Late Phase)', color='lightcoral', bins=30)\n",
    "plt.title(f'Distribution of Rewards per Episode (Episodes >= {late_phase_start_episode})')\n",
    "plt.xlabel('Total Reward')\n",
    "plt.ylabel('Frequency')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Correlation Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "correlation_matrix_run1 = df_run1[['reward', 'steps', 'loss', 'epsilon']].corr()\n",
    "correlation_matrix_run2 = df_run2[['reward', 'steps', 'loss', 'epsilon']].corr()\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(16, 6))\n",
    "\n",
    "sns.heatmap(correlation_matrix_run1, annot=True, cmap='coolwarm', fmt='.2f', ax=axes[0])\n",
    "axes[0].set_title('Run 1: Correlation Matrix')\n",
    "\n",
    "sns.heatmap(correlation_matrix_run2, annot=True, cmap='coolwarm', fmt='.2f', ax=axes[1])\n",
    "axes[1].set_title('Run 2: Correlation Matrix')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Statistical Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary Statistics for Key Metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"--- Run 1: Summary Statistics ---\")\n",
    "print(df_run1[['reward', 'steps', 'loss']].agg(['mean', 'median', 'std', 'min', 'max']))\n",
    "\n",
    "print(\"\\n--- Run 2: Summary Statistics ---\")\n",
    "print(df_run2[['reward', 'steps', 'loss']].agg(['mean', 'median', 'std', 'min', 'max']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance Comparison (Late Training Phase)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"--- Comparison of Late Phase (Episodes >= {late_phase_start_episode}) ---\")\n",
    "\n",
    "print(\"\\n--- Run 1 (Late Phase) ---\")\n",
    "print(df_run1_late[['reward', 'steps', 'loss']].agg(['mean', 'median', 'std']))\n",
    "\n",
    "print(\"\\n--- Run 2 (Late Phase) ---\")\n",
    "print(df_run2_late[['reward', 'steps', 'loss']].agg(['mean', 'median', 'std']))\n",
    "\n",
    "# Perform t-test to compare mean rewards in the late phase\n",
    "t_stat, p_value = stats.ttest_ind(df_run1_late['reward'], df_run2_late['reward'], equal_var=False) # Welch's t-test\n",
    "\n",
    "print(\"\\n--- Welch's t-test for Mean Reward in Late Phase ---\")\n",
    "print(f\"T-statistic: {t_stat:.4f}\")\n",
    "print(f\"P-value: {p_value:.4f}\")\n",
    "\n",
    "alpha = 0.05\n",
    "if p_value < alpha:\n",
    "    print(f\"The difference in mean rewards between Run 1 and Run 2 during the late phase is statistically significant (p < {alpha}).\")\n",
    "else:\n",
    "    print(f\"There is no statistically significant difference in mean rewards between Run 1 and Run 2 during the late phase (p >= {alpha}).\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Learning Stability (Rolling Standard Deviation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rolling_std_window = 50\n",
    "df_run1['reward_rolling_std'] = df_run1['reward'].rolling(window=rolling_std_window).std()\n",
    "df_run2['reward_rolling_std'] = df_run2['reward'].rolling(window=rolling_std_window).std()\n",
    "\n",
    "plt.figure(figsize=(14, 7))\n",
    "sns.lineplot(data=df_run1, x='episode', y='reward_rolling_std', label=f'Run 1 Reward Rolling Std Dev ({rolling_std_window})')\n",
    "sns.lineplot(data=df_run2, x='episode', y='reward_rolling_std', label=f'Run 2 Reward Rolling Std Dev ({rolling_std_window})')\n",
    "\n",
    "plt.title('Reward Stability (Rolling Standard Deviation)')\n",
    "plt.xlabel('Episode')\n",
    "plt.ylabel(f'Rolling Std Dev (Window={rolling_std_window})')\n",
    "plt.legend()\n",
    "plt.grid(True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Feature Engineering Experiments (on Log Data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We already created rolling averages (`reward_rolling`, `steps_rolling`, `loss_rolling`) and rolling standard deviation (`reward_rolling_std`) during EDA. Let's add cumulative reward and a simple success rate metric."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cumulative Reward"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_run1['cumulative_reward'] = df_run1['reward'].cumsum()\n",
    "df_run2['cumulative_reward'] = df_run2['reward'].cumsum()\n",
    "\n",
    "plt.figure(figsize=(14, 7))\n",
    "sns.lineplot(data=df_run1, x='episode', y='cumulative_reward', label='Run 1 Cumulative Reward')\n",
    "sns.lineplot(data=df_run2, x='episode', y='cumulative_reward', label='Run 2 Cumulative Reward')\n",
    "\n",
    "plt.