{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Alpha Metrics Analysis\n",
    "## Understanding Finishing Alpha and Playmaking Alpha\n",
    "\n",
    "This notebook explains the \"Alpha\" metrics used in this project, then examines how they are distributed across players, positions, leagues, and teams."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from adjustText import adjust_text\n",
    "import sys\n",
    "sys.path.append('../src')\n",
    "from analysis import get_data\n",
    "\n",
    "plt.style.use('seaborn-v0_8-whitegrid')\n",
    "df = get_data()\n",
    "print(f\"Loaded {len(df)} players\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. What is Alpha?\n",
    "\n",
    "In finance, **Alpha** represents the excess return of an investment relative to the return of a benchmark.\n",
    "\n",
    "In football analytics, we apply the same idea, with the expected-goals models (xG and xAG) as the benchmark:\n",
    "\n",
    "- **Finishing Alpha** = Goals - xG (Expected Goals)\n",
    "- **Playmaking Alpha** = Assists - xAG (Expected Assisted Goals)\n",
    "\n",
    "A positive alpha means the player is **overperforming** the model's expectation, a negative alpha means the player is **underperforming** it, and an alpha of zero means output matched expectation exactly."
   ]
  },
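  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because both metrics are plain column differences, they can be computed in one vectorized step. The cell below is a minimal sketch on a small **hypothetical** table (invented players and numbers, not project data). It assumes the column names used elsewhere in this notebook (`gls`, `xg`, `ast`, `xag`) and labels each player's finishing alpha as over, under, or as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch on HYPOTHETICAL data (not the project dataset).\n",
    "# Column names mirror the notebook's (gls, xg, ast, xag); all values are invented.\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'player': ['Player A', 'Player B', 'Player C'],\n",
    "    'gls': [20, 8, 3],\n",
    "    'xg': [15.5, 10.0, 3.0],\n",
    "    'ast': [4, 9, 1],\n",
    "    'xag': [5.0, 6.5, 1.0],\n",
    "})\n",
    "\n",
    "# Alpha = actual output minus model-expected output\n",
    "toy['finishing_alpha'] = toy['gls'] - toy['xg']\n",
    "toy['playmaking_alpha'] = toy['ast'] - toy['xag']\n",
    "\n",
    "# Sign of alpha: positive = overperforming, negative = underperforming, zero = as expected\n",
    "toy['finishing_verdict'] = np.select(\n",
    "    [toy['finishing_alpha'] > 0, toy['finishing_alpha'] < 0],\n",
    "    ['over', 'under'],\n",
    "    default='as expected',\n",
    ")\n",
    "print(toy[['player', 'finishing_alpha', 'playmaking_alpha', 'finishing_verdict']])"
   ]
  },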
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
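    "# Alpha is precomputed in the data pipeline; this cell shows one worked example\n",
    "print(\"Finishing Alpha = Goals - xG\")\n",
    "print(\"Playmaking Alpha = Assists - xAG\")\n",
    "\n",
    "# na=False treats missing names as non-matches instead of propagating NaN\n",
    "matches = df[df['player'].str.contains('Haaland', case=False, na=False)]\n",
    "if matches.empty:\n",
    "    print(\"\\nNo player matching 'Haaland' found in this dataset\")\n",
    "else:\n",
    "    example = matches.iloc[0]\n",
    "    print(\"\\nExample:\")\n",
    "    print(f\"Player: {example['player']}\")\n",
    "    # Show xG to 2 decimals so the subtraction visibly adds up to the printed alpha\n",
    "    print(f\"Goals: {example['gls']:.0f}, xG: {example['xg']:.2f}\")\n",
    "    print(f\"Finishing Alpha: {example['gls']:.0f} - {example['xg']:.2f} = {example['finishing_alpha']:.2f}\")"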
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Finishing Alpha Distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "# Histogram\n",
    "axes[0].hist(df['finishing_alpha'], bins=50, edgecolor='black', color='#3498db')\n",
    "axes[0].axvline(x=0, color='red', linestyle='--', linewidth=2, label='Zero (Expected)')\n",
    "axes[0].axvline(x=df['finishing_alpha'].mean(), color='green', linestyle='--', linewidth=2, label=f\"Mean: {df['finishing_alpha'].mean():.2f}\")\n",
    "axes[0].set_xlabel('Finishing Alpha')\n",
    "axes[0].set_ylabel('Count')\n",
    "axes[0].set_title('Finishing Alpha Distribution')\n",
    "axes[0].legend()\n",
    "\n",
    "# Box plot by position\n",
    "df.boxplot(column='finishing_alpha', by='main_pos', ax=axes[1])\n",
    "axes[1].axhline(y=0, color='red', linestyle='--', linewidth=2)\n",
    "axes[1].set_title('Finishing Alpha by Position')\n",
    "axes[1].set_xlabel('Position')\n",
    "axes[1].set_ylabel('Finishing Alpha')\n",
    "\n",
    "plt.suptitle('')  # remove the automatic 'Boxplot grouped by main_pos' title that pandas adds\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
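  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The box plot shows spread by position visually; a small table makes the same comparison explicit. This is a sketch on a **hypothetical** table: the position codes (`FW`, `MF`, `DF`) are assumptions, and `iqr` is an illustrative helper. On the real data, the same `groupby` chain could be applied to `df`, which already has the `main_pos` column used in the plot above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch on HYPOTHETICAL data: per-position summary of finishing alpha.\n",
    "import pandas as pd\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'main_pos': ['FW', 'FW', 'FW', 'MF', 'MF', 'MF', 'DF', 'DF', 'DF'],\n",
    "    'finishing_alpha': [4.5, -3.0, 1.5, 0.8, -0.5, 0.2, 0.1, -0.2, 0.0],\n",
    "})\n",
    "\n",
    "def iqr(s):\n",
    "    # Interquartile range: a spread measure less sensitive to single outliers than std\n",
    "    return s.quantile(0.75) - s.quantile(0.25)\n",
    "\n",
    "summary = (\n",
    "    toy.groupby('main_pos')['finishing_alpha']\n",
    "    .agg(['count', 'mean', 'std', iqr])\n",
    "    .sort_values('std', ascending=False)\n",
    ")\n",
    "print(summary.round(2))"
   ]
  },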
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. xG vs Actual Goals Scatter Plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
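    "from matplotlib.colors import CenteredNorm\n",
    "\n",
    "plt.figure(figsize=(12, 10))\n",
    "# CenteredNorm maps alpha = 0 to the middle of the diverging colormap, so green means above xG and red means below\n",
    "scatter = plt.scatter(df['xg'], df['gls'], alpha=0.5, c=df['finishing_alpha'], cmap='RdYlGn', norm=CenteredNorm(), s=50)\n",
    "plt.colorbar(scatter, label='Finishing Alpha')\n",
    "\n",
    "# Reference line: Goals = xG, i.e. zero finishing alpha\n",
    "max_val = max(df['xg'].max(), df['gls'].max())\n",
    "plt.plot([0, max_val], [0, max_val], 'k--', linewidth=2, label='Goals = xG (zero alpha)')\n",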
    "\n",
    "# Label outliers\n",
    "top_outliers = df.nlargest(10, 'finishing_alpha')\n",
    "worst_outliers = df.nsmallest(5, 'finishing_alpha')\n",
    "outliers = pd.concat([top_outliers, worst_outliers])\n",
    "\n",
    "texts = []\n",
    "for _, row in outliers.iterrows():\n",
    "    texts.append(plt.text(row['xg'], row['gls'], row['player'], fontsize=9, fontweight='bold'))\n",
    "adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray', lw=0.5))\n",
    "\n",
    "plt.xlabel('Expected Goals (xG)', fontsize=12)\n",
    "plt.ylabel('Actual Goals', fontsize=12)\n",
    "plt.title('Expected Goals vs Actual Goals\\n(Above line = Overperforming, Below line = Underperforming)', fontsize=14)\n",
    "plt.legend(loc='upper left')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
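  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The scatter plot above labels the 10 highest and 5 lowest finishing alphas by concatenating `nlargest` and `nsmallest`. On a filtered table (for example, fewer than 15 players) the two selections can overlap, and the same name would be drawn twice. A minimal sketch of a deduplicating selection follows; `select_extremes` is a hypothetical helper, not part of the project's `analysis` module, it assumes a unique row index, and the data are invented."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: pick the top-n and bottom-n rows by a column without double-counting.\n",
    "# select_extremes is a hypothetical helper, not part of the project's code.\n",
    "import pandas as pd\n",
    "\n",
    "def select_extremes(frame, col, top=10, bottom=5):\n",
    "    picked = pd.concat([frame.nlargest(top, col), frame.nsmallest(bottom, col)])\n",
    "    # The same row can appear in both selections (e.g. fewer than top + bottom rows); keep it once\n",
    "    return picked[~picked.index.duplicated(keep='first')]\n",
    "\n",
    "# Usage on HYPOTHETICAL data: 4 players, 3 + 2 requested, so one row overlaps\n",
    "toy = pd.DataFrame({\n",
    "    'player': ['A', 'B', 'C', 'D'],\n",
    "    'finishing_alpha': [3.0, -1.0, 0.5, -2.5],\n",
    "})\n",
    "print(select_extremes(toy, 'finishing_alpha', top=3, bottom=2))"
   ]
  },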
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Top Clinical Finishers (Highest Finishing Alpha)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "top_finishers = df.nlargest(15, 'finishing_alpha')[['player', 'squad', 'comp', 'gls', 'xg', 'finishing_alpha']]\n",
    "top_finishers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12, 8))\n",
    "top15 = df.nlargest(15, 'finishing_alpha')\n",
    "plt.barh(top15['player'], top15['finishing_alpha'], color='#2ecc71')\n",
    "plt.xlabel('Finishing Alpha (Goals - xG)')\n",
    "plt.title('Top 15 Clinical Finishers - Overperforming xG')\n",
    "plt.gca().invert_yaxis()\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Worst Finishers (Lowest Finishing Alpha)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "worst_finishers = df.nsmallest(15, 'finishing_alpha')[['player', 'squad', 'comp', 'gls', 'xg', 'finishing_alpha']]\n",
    "worst_finishers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12, 8))\n",
    "worst15 = df.nsmallest(15, 'finishing_alpha')\n",
    "plt.barh(worst15['player'], worst15['finishing_alpha'], color='#e74c3c')\n",
    "plt.xlabel('Finishing Alpha (Goals - xG)')\n",
    "plt.title('15 Biggest Underperformers - Below xG')\n",
    "plt.gca().invert_yaxis()\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
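  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sections 4 and 5 plot the two ends of the finishing-alpha ranking separately. As a sketch of an alternative layout (a design choice, not something the original analysis uses), both ends can share one axis so the sign of alpha is visible at a glance. The data and the `k` cut-off are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch on HYPOTHETICAL data: best and worst finishers in one diverging bar chart.\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'player': ['A', 'B', 'C', 'D', 'E', 'F'],\n",
    "    'finishing_alpha': [4.5, 2.0, 0.3, -0.8, -2.2, -3.1],\n",
    "})\n",
    "k = 2  # how many players to show from each end (hypothetical cut-off)\n",
    "ends = pd.concat([toy.nlargest(k, 'finishing_alpha'), toy.nsmallest(k, 'finishing_alpha')])\n",
    "ends = ends.sort_values('finishing_alpha')\n",
    "\n",
    "# Green for positive alpha, red for negative\n",
    "colors = ['#2ecc71' if a > 0 else '#e74c3c' for a in ends['finishing_alpha']]\n",
    "fig, ax = plt.subplots(figsize=(8, 4))\n",
    "ax.barh(ends['player'], ends['finishing_alpha'], color=colors)\n",
    "ax.axvline(0, color='black', linewidth=0.5)\n",
    "ax.set_xlabel('Finishing Alpha (Goals - xG)')\n",
    "ax.set_title('Best and worst finishers (hypothetical data)')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },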
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Playmaking Alpha Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
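    "from matplotlib.colors import CenteredNorm\n",
    "\n",
    "plt.figure(figsize=(12, 10))\n",
    "# CenteredNorm maps alpha = 0 to the middle of the diverging colormap, so green means above xAG and red means below\n",
    "scatter = plt.scatter(df['xag'], df['ast'], alpha=0.5, c=df['playmaking_alpha'], cmap='RdYlGn', norm=CenteredNorm(), s=50)\n",
    "plt.colorbar(scatter, label='Playmaking Alpha')\n",
    "\n",
    "# Reference line: Assists = xAG, i.e. zero playmaking alpha\n",
    "max_val = max(df['xag'].max(), df['ast'].max())\n",
    "plt.plot([0, max_val], [0, max_val], 'k--', linewidth=2, label='Assists = xAG (zero alpha)')\n",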
    "\n",
    "top_playmakers = df.nlargest(10, 'playmaking_alpha')\n",
    "texts = []\n",
    "for _, row in top_playmakers.iterrows():\n",
    "    texts.append(plt.text(row['xag'], row['ast'], row['player'], fontsize=9, fontweight='bold'))\n",
    "adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray', lw=0.5))\n",
    "\n",
    "plt.xlabel('Expected Assisted Goals (xAG)', fontsize=12)\n",
    "plt.ylabel('Actual Assists', fontsize=12)\n",
    "plt.title('Expected Assisted Goals vs Actual Assists\\n(Above line = Overperforming, Below line = Underperforming)', fontsize=14)\n",
    "plt.legend(loc='upper left')\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
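  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sections 3 and 6 repeat nearly the same plotting code for goals and assists. As a sketch of how that duplication could be folded into one function: `plot_alpha_scatter` is a hypothetical name, it assumes a `player` column, and the data are invented. For brevity it uses plain `annotate` labels instead of `adjust_text`, and it centers the colormap at zero alpha with matplotlib's `CenteredNorm` (matplotlib 3.4+), which is an assumption of this sketch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: one hypothetical helper for both the xG and xAG scatter plots.\n",
    "# plot_alpha_scatter is an illustrative name, not part of the project's code.\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "from matplotlib.colors import CenteredNorm\n",
    "\n",
    "def plot_alpha_scatter(frame, expected_col, actual_col, alpha_col, label_n=5, ax=None):\n",
    "    ax = ax if ax is not None else plt.gca()\n",
    "    # CenteredNorm places alpha = 0 at the midpoint of the diverging colormap\n",
    "    sc = ax.scatter(frame[expected_col], frame[actual_col], c=frame[alpha_col],\n",
    "                    cmap='RdYlGn', norm=CenteredNorm(), alpha=0.6, s=50)\n",
    "    plt.colorbar(sc, ax=ax, label=alpha_col)\n",
    "    # Points on this diagonal have actual output equal to expected output\n",
    "    top = max(frame[expected_col].max(), frame[actual_col].max())\n",
    "    ax.plot([0, top], [0, top], 'k--', label='actual = expected (zero alpha)')\n",
    "    # Label only the strongest overperformers to limit clutter\n",
    "    for _, row in frame.nlargest(label_n, alpha_col).iterrows():\n",
    "        ax.annotate(row['player'], (row[expected_col], row[actual_col]), fontsize=9)\n",
    "    ax.set_xlabel(expected_col)\n",
    "    ax.set_ylabel(actual_col)\n",
    "    ax.legend(loc='upper left')\n",
    "    return ax\n",
    "\n",
    "# Usage on HYPOTHETICAL data\n",
    "toy = pd.DataFrame({'player': ['A', 'B', 'C'], 'xag': [5.0, 6.5, 1.0], 'ast': [4, 9, 1]})\n",
    "toy['playmaking_alpha'] = toy['ast'] - toy['xag']\n",
    "fig, ax = plt.subplots(figsize=(6, 5))\n",
    "plot_alpha_scatter(toy, 'xag', 'ast', 'playmaking_alpha', label_n=2, ax=ax)\n",
    "plt.show()"
   ]
  },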
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. League Comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "league_alpha = df.groupby('comp').agg({\n",
    "    'finishing_alpha': 'mean',\n",
    "    'playmaking_alpha': 'mean'\n",
    "}).round(3)\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n",
    "\n",
    "# Finishing Alpha by League\n",
    "league_order = league_alpha.sort_values('finishing_alpha').index\n",
    "colors = ['#2ecc71' if league_alpha.loc[l, 'finishing_alpha'] > 0 else '#e74c3c' for l in league_order]\n",
    "axes[0].barh(league_order, league_alpha.loc[league_order, 'finishing_alpha'], color=colors)\n",
    "axes[0].axvline(x=0, color='black', linewidth=0.5)\n",
    "axes[0].set_xlabel('Average Finishing Alpha')\n",
    "axes[0].set_title('League Finishing Efficiency')\n",
    "\n",
    "# Playmaking Alpha by League\n",
    "league_order = league_alpha.sort_values('playmaking_alpha').index\n",
    "colors = ['#2ecc71' if league_alpha.loc[l, 'playmaking_alpha'] > 0 else '#e74c3c' for l in league_order]\n",
    "axes[1].barh(league_order, league_alpha.loc[league_order, 'playmaking_alpha'], color=colors)\n",
    "axes[1].axvline(x=0, color='black', linewidth=0.5)\n",
    "axes[1].set_xlabel('Average Playmaking Alpha')\n",
    "axes[1].set_title('League Playmaking Efficiency')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
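  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each league average above is computed over a different number of players, so a raw difference in means says little on its own. A sketch, on **hypothetical** data with invented league names, of reporting the player count and the standard error of the mean (`sem`) next to each average using pandas named aggregation. Using `sem` as the uncertainty measure is an assumption of this sketch, not part of the original analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch on HYPOTHETICAL data: league means reported alongside sample sizes.\n",
    "import pandas as pd\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'comp': ['League X', 'League X', 'League X', 'League Y', 'League Y'],\n",
    "    'finishing_alpha': [1.0, -0.5, 0.5, -1.0, 0.4],\n",
    "    'playmaking_alpha': [0.2, 0.0, -0.4, 0.6, 0.1],\n",
    "})\n",
    "\n",
    "# Named aggregation: output column = (input column, aggregation)\n",
    "league_summary = toy.groupby('comp').agg(\n",
    "    players=('finishing_alpha', 'size'),\n",
    "    finishing_mean=('finishing_alpha', 'mean'),\n",
    "    finishing_sem=('finishing_alpha', 'sem'),  # standard error of the mean\n",
    "    playmaking_mean=('playmaking_alpha', 'mean'),\n",
    ")\n",
    "print(league_summary.round(3))"
   ]
  },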
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Team Efficiency Rankings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "team_alpha = df.groupby('squad').agg({\n",
    "    'finishing_alpha': 'mean',\n",
    "    'player': 'count'\n",
    "}).rename(columns={'player': 'num_players'})\n",
    "\n",
    "# Filter teams with at least 5 players\n",
    "team_alpha = team_alpha[team_alpha['num_players'] >= 5]\n",
    "\n",
    "# Top 15 most clinical teams\n",
    "top_teams = team_alpha.nlargest(15, 'finishing_alpha')\n",
    "\n",
    "plt.figure(figsize=(12, 8))\n",
    "colors = ['#2ecc71' if x > 0 else '#e74c3c' for x in top_teams['finishing_alpha']]\n",
    "plt.barh(top_teams.index, top_teams['finishing_alpha'], color=colors)\n",
    "plt.xlabel('Average Finishing Alpha')\n",
    "plt.title('Top 15 Most Clinical Teams')\n",
    "plt.gca().invert_yaxis()\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
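  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The team ranking keeps only squads with at least 5 players, because a mean over one or two players is dominated by individual outliers. A sketch of the same logic as a reusable function with the threshold exposed as a parameter; `team_rankings` is a hypothetical helper and the squads are invented. In the usage example, `Team R` has the highest mean but is excluded for having only two players."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch on HYPOTHETICAL data: team ranking with a minimum-squad-size filter.\n",
    "# team_rankings is an illustrative helper, not part of the project's code.\n",
    "import pandas as pd\n",
    "\n",
    "def team_rankings(frame, min_players=5, n=15):\n",
    "    teams = frame.groupby('squad').agg(\n",
    "        finishing_alpha=('finishing_alpha', 'mean'),\n",
    "        num_players=('player', 'count'),\n",
    "    )\n",
    "    # Means over very few players are noisy, so drop small squads before ranking\n",
    "    eligible = teams[teams['num_players'] >= min_players]\n",
    "    return eligible.nlargest(n, 'finishing_alpha')\n",
    "\n",
    "toy = pd.DataFrame({\n",
    "    'squad': ['Team P'] * 5 + ['Team Q'] * 5 + ['Team R'] * 2,\n",
    "    'player': [f'P{i}' for i in range(5)] + [f'Q{i}' for i in range(5)] + ['R0', 'R1'],\n",
    "    'finishing_alpha': [0.5, 0.2, -0.1, 0.3, 0.1, -0.4, 0.0, -0.2, 0.1, -0.3, 2.0, 1.5],\n",
    "})\n",
    "\n",
    "# Team R has the highest mean but only 2 players, so it is filtered out\n",
    "print(team_rankings(toy, min_players=5, n=3))"
   ]
  },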
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Key Insights\n",
    "\n",
    "1. **Distribution:** Most players have alpha values close to zero, as expected if the xG and xAG models are well calibrated\n",
    "2. **Outliers:** A small number of elite finishers outperform their xG by a wide margin\n",
    "3. **Position Effect:** Forwards tend to show a wider spread of finishing alpha than other positions\n",
    "4. **League Differences:** Average finishing and playmaking alpha vary from league to league\n",
    "5. **Team Patterns:** On average, some teams' players over- or underperform their expected output"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}