# Home Advantage & Goal Patterns — EDA
This notebook explores home advantage and goal-scoring patterns in match data prepared from Premier League CSVs.
Run `python src/prepare_data.py` first to create `data/processed/matches.parquet`.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from pathlib import Path

path = Path('data/processed/matches.parquet')
assert path.exists(), 'Run prepare_data.py first'
df = pd.read_parquet(path)
df.head()
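
The cells below rely on the derived columns `home_win` and `total_goals`, which are presumably created by `src/prepare_data.py` (that script is not shown here). As a rough sketch of how they could be derived from the full-time goal columns `FTHG` and `FTAG`, assuming a home win simply means more home goals than away goals. The helper name `add_derived_columns` and the sample rows are hypothetical:

```python
import pandas as pd

def add_derived_columns(matches: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add home_win and total_goals from full-time goals."""
    out = matches.copy()
    # 1 if the home side scored more than the away side, else 0
    out['home_win'] = (out['FTHG'] > out['FTAG']).astype(int)
    # Goals scored by both sides in the match
    out['total_goals'] = out['FTHG'] + out['FTAG']
    return out

# Made-up rows for illustration only, not real results
sample = pd.DataFrame({'FTHG': [2, 1, 0, 3], 'FTAG': [1, 1, 2, 0]})
derived = add_derived_columns(sample)
print(derived)
```

If the real preparation script defines these columns differently (for example from a result column), the rates in this notebook follow that definition instead.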

## 1. Overall home advantage
Home win rate and goals per game.

In [None]:
home_rate = df['home_win'].mean()
print(f'Home win rate: {home_rate:.2%}')

ax = df[['FTHG','FTAG']].mean().plot(kind='bar', title='Average Goals: Home vs Away')
ax.set_ylabel('Goals per match')
plt.show()
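
The home win rate alone does not show how the remaining matches split between draws and away wins. A small sketch of the full breakdown, computed from `FTHG` and `FTAG`; the helper name `result_shares` and the sample rows are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def result_shares(matches: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: share of home wins, draws and away wins."""
    diff = matches['FTHG'] - matches['FTAG']
    # Sign of the goal difference: 1 = home win, 0 = draw, -1 = away win
    labels = np.sign(diff).map({1: 'home', 0: 'draw', -1: 'away'})
    return labels.value_counts(normalize=True)

# Made-up rows for illustration only
sample = pd.DataFrame({'FTHG': [2, 1, 0, 3], 'FTAG': [1, 1, 2, 0]})
shares = result_shares(sample)
print(shares)
```

On the real data this would be called as `result_shares(df)`; the `home` share should match `df['home_win'].mean()` if `home_win` is defined as more home goals than away goals.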

## 2. Trend over time
Home win rate and average goals per match by season.

In [None]:
season_stats = df.groupby('season').agg(home_rate=('home_win','mean'), goals=('total_goals','mean')).reset_index()
fig = px.line(season_stats, x='season', y=['home_rate','goals'], title='Home Win Rate and Goals per Match by Season')
fig.show()

## 3. Team-level view
Top and bottom 10 teams by average home goals per match.

In [None]:
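team_home = df.groupby('HomeTeam')['FTHG'].mean().sort_values(ascending=False)
# Show both ends of the ranking, since the section covers top and bottom teams
pd.concat([team_home.head(10), team_home.tail(10)], keys=['top', 'bottom'])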

## 4. (Optional) Moving averages
Compute the rolling home win rate over a 100-match window, with matches ordered by date.

In [None]:
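# Index by match date so the x-axis follows time (assumes 'Date' is already a datetime column)
df_sorted = df.sort_values('Date').set_index('Date')
roll = df_sorted['home_win'].rolling(100, min_periods=50).mean()
ax = roll.plot(title='Rolling Home Win Rate (window=100 matches)')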
ax.set_ylabel('Rate')
plt.show()
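
The window above counts matches, not days. If a window defined by calendar time is preferred, pandas also accepts a time offset when the index is a `DatetimeIndex`. A minimal sketch on synthetic data, assuming `Date` is a datetime column; the 28-day window and the toy values are arbitrary choices for illustration:

```python
import pandas as pd

# Synthetic data for illustration only: one match per week
dates = pd.date_range('2020-01-04', periods=8, freq='7D')
toy = pd.DataFrame({'Date': dates, 'home_win': [1, 0, 1, 1, 0, 0, 1, 0]})

# A time-based window needs a sorted DatetimeIndex
by_date = toy.sort_values('Date').set_index('Date')['home_win']

# Mean over the trailing 28 days; NaN until at least 2 matches are in the window
roll_28d = by_date.rolling('28D', min_periods=2).mean()
print(roll_28d)
```

With real fixtures the number of matches per window varies (for example across the summer break), so `min_periods` matters more here than with a fixed match count.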

## Next steps
- Add more seasons
- Add a simple baseline predictor
- Export key charts as PNG for your README
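
For the baseline predictor, one common starting point is a majority-class baseline: always predict the most frequent outcome seen in the training data. A sketch under assumptions: outcomes are encoded as `'H'`, `'D'`, `'A'` (a column the notebook does not show), and the helper name `majority_baseline_accuracy` and the toy series are hypothetical:

```python
import pandas as pd

def majority_baseline_accuracy(train_results: pd.Series, test_results: pd.Series):
    """Hypothetical helper: predict the most common training outcome everywhere."""
    # Most frequent label in the training data (first one if there is a tie)
    majority = train_results.mode().iloc[0]
    # Fraction of test matches where that single prediction is correct
    accuracy = (test_results == majority).mean()
    return majority, accuracy

# Made-up outcomes for illustration only
train = pd.Series(['H', 'H', 'D', 'A', 'H'])
test = pd.Series(['H', 'A', 'H', 'D'])
label, acc = majority_baseline_accuracy(train, test)
print(label, acc)
```

Any later model would need to beat this accuracy on held-out seasons to be worth keeping.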