# FPL Prediction Pipeline

This notebook runs the complete FPL prediction pipeline: data fetching, feature engineering, model training, prediction generation, and team optimization.

### Quick Start

This notebook supports two workflows:
- **Train New Model**: Full pipeline including model training
- **Use Existing Model**: Load a pre-trained model and generate predictions

Both require:
1. Data fetching and feature processing (Common Setup)
2. Model (train new or load existing)
3. Prediction generation
4. Team optimization (optional, requires team ID for transfers)

### Alternative: Command Line

You can also run the pipeline from the command line:

- Full pipeline:
```bash
python main.py --team-id YOUR_TEAM_ID
```
- Full pipeline with adjusted budget (for starters):
```bash
python main.py --team-id YOUR_TEAM_ID --budget YOUR_BUDGET
```
- Training and prediction pipeline without team optimization:
```bash
python main.py --skip-optimization
```
- Team optimization only (if you have already generated predictions):
```bash
python main.py --skip-pipeline
```

## ⚠️ Important Notes

### **Note on `initial_transfers` configuration**

The `initial_transfers` parameter can be set to `"auto"` in order to infer the number of available transfers at the start of the optimization horizon.

This option assumes that the manager has **not previously used Wildcard or Free Hit** during the current season.

If either of these chips has already been used, the automatic inference may be incorrect due to the way FPL transfer carryover rules are applied.  
In such cases, it is recommended to **explicitly set `initial_transfers` in `config.yaml`** (e.g. `initial_transfers: 1`), as done in the current configuration.
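
To see why chip usage can throw off the inference, here is a minimal sketch of a naive carryover count. It is not the project's implementation: the function name `infer_free_transfers`, the `max_banked` cap, and the rule of one extra free transfer per gameweek are all assumptions made for illustration.

```python
def infer_free_transfers(transfers_per_gw, max_banked=5):
    """Naively estimate free transfers available for the next gameweek.

    transfers_per_gw: transfers made in each past gameweek, oldest first.
    max_banked: assumed cap on banked free transfers (hypothetical default).
    """
    available = 1  # assume a single free transfer in the first counted gameweek
    for made in transfers_per_gw:
        # Used transfers reduce the bank (never below zero) ...
        available = max(available - made, 0)
        # ... and one more free transfer is granted for the following gameweek.
        available = min(available + 1, max_banked)
    return available


quiet_weeks = infer_free_transfers([0, 0, 1])
chip_week = infer_free_transfers([0, 0, 15])  # e.g. a full squad rebuild on a chip
print(quiet_weeks, chip_week)
```

The naive count treats the chip-week transfers as ordinary ones and empties the bank, which may not match how FPL handles chip weeks. Setting `initial_transfers` explicitly avoids that ambiguity.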


### **Seasons "21-22" and earlier do not have `expected_*` columns.**
If your `past_seasons` config includes any of these older seasons, you must filter out every feature whose name contains `"expected_"` before training.

Run this code **after the common setup section and before training**:

```python
import features

fe_config = pipeline.config["feature_engineering"]

# Get the feature list; the last argument follows the scaling configuration
feature_list = features.get_feature_set(
    fe_config["rolling_window"],
    fe_config["difficulty_window"],
    fe_config["team_window"],
    fe_config["form_window"],
    bool(fe_config["scale_features"]),
)

# Filter out expected_ features
feature_list = [f for f in feature_list if "expected_" not in f]

rolling_columns = features.get_rolling_features(fe_config["rolling_window"])
rolling_columns = [f for f in rolling_columns if "expected_" not in f]

difficulty_columns = features.get_difficulty_features(fe_config["difficulty_window"])
difficulty_columns = [f for f in difficulty_columns if "expected_" not in f]

# Pass these filtered lists to train_model()
model = pipeline.train_model(
    feature_list=feature_list,
    rolling_cols=rolling_columns,
    difficulty_cols=difficulty_columns
)
```

**Alternative:** Remove seasons `"21-22"` and earlier from the `past_seasons` list in `config.yaml`.
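
If you take this route, the cutoff can be applied programmatically. The sketch below assumes season labels follow the `"YY-YY"` pattern used above and that `"22-23"` is the first season with `expected_*` columns (implied by the note, not confirmed separately). `seasons_with_expected_stats` is a hypothetical helper, not part of the pipeline.

```python
def seasons_with_expected_stats(past_seasons, first_supported="22-23"):
    """Keep only seasons at or after the first one that has expected_* columns."""
    def start_year(label):
        # "22-23" -> 22; assumes two-digit years within the same century
        return int(label.split("-")[0])

    cutoff = start_year(first_supported)
    return [season for season in past_seasons if start_year(season) >= cutoff]


past_seasons = ["20-21", "21-22", "22-23", "23-24"]
print(seasons_with_expected_stats(past_seasons))  # ['22-23', '23-24']
```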

# Training / Prediction

## Workflow 1: Train New Model

Run this if you want to train a new model and use it to generate predictions.

In [1]:
from fpl.main import FPLPipeline

In [None]:
pipeline = FPLPipeline(config_path="config.yaml")

pipeline.fetch_past_data()
pipeline.fetch_current_data()
past_gws_df = pipeline.process_past_data()

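# When player `code` is a categorical feature, keep only players present in the latest training season
if "code" in pipeline.config["model"]["categorical_features"]: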
    pipeline.training_data = pipeline.training_data[
        pipeline.training_data.code.isin(
            pipeline.training_data[pipeline.training_data.season == pipeline.training_data.season.max()].code.unique()
        )
    ]

pipeline.fetch_future_data()
future_gws_df = pipeline.process_future_data()


feature_list = pipeline.config["feature_engineering"]["feature_set"]
nan_summary_past = past_gws_df[feature_list].isnull().sum()
nan_summary_future = future_gws_df[feature_list].isnull().sum()
print(f"Columns with NaN values (past dataset):\n{nan_summary_past[nan_summary_past > 0]}")
print(f"Columns with NaN values (future dataset):\n{nan_summary_future[nan_summary_future > 0]}")

In [None]:
model = pipeline.train_model()

In [None]:
pipeline.generate_predictions()

## Workflow 2: Use Existing Model (skip training)
Run this if you have a pre-trained model and only want to generate predictions.

In [None]:
from fpl.main import FPLPipeline
from fpl.utils.utils import handle_double_gameweeks
from fpl.models.rf_models import RandomForest
from fpl.models.gbm_models import GradientBoosting
from fpl.models.lgbm_models import LightGradientBoosting

In [None]:
pipeline = FPLPipeline(config_path="config.yaml")

pipeline.fetch_past_data()
pipeline.fetch_current_data()
past_gws_df = pipeline.process_past_data()

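# When player `code` is a categorical feature, keep only players present in the latest training season
if "code" in pipeline.config["model"]["categorical_features"]: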
    pipeline.training_data = pipeline.training_data[
        pipeline.training_data.code.isin(
            pipeline.training_data[pipeline.training_data.season == pipeline.training_data.season.max()].code.unique()
        )
    ]

pipeline.fetch_future_data()
future_gws_df = pipeline.process_future_data()

feature_list = pipeline.config["feature_engineering"]["feature_set"]
nan_summary = past_gws_df[feature_list].isnull().sum()
print(f"Columns with NaN values:\n{nan_summary[nan_summary > 0]}")

In [None]:
# Adjust the saved model's file name and the model class to match its type
model = GradientBoosting.from_saved_model("models/gb_model_gw18.pkl")

preds = model.predict(X=pipeline.future_data)

pipeline.predictions = handle_double_gameweeks(preds)
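
The internals of `handle_double_gameweeks` are not shown in this notebook. As a rough illustration of the idea, and assuming the helper combines a player's per-fixture predictions into one total per gameweek, the sketch below sums predicted points by player and gameweek. The function name, the tuple layout, and the summing rule are all assumptions.

```python
from collections import defaultdict


def sum_points_per_gameweek(fixture_predictions):
    """Sum per-fixture predicted points into one value per (player, gameweek)."""
    totals = defaultdict(float)
    for player, gameweek, points in fixture_predictions:
        totals[(player, gameweek)] += points
    return dict(totals)


# Hypothetical data: player "A" has two fixtures in gameweek 19.
rows = [("A", 19, 4.5), ("A", 19, 3.0), ("B", 19, 5.0)]
print(sum_points_per_gameweek(rows))  # {('A', 19): 7.5, ('B', 19): 5.0}
```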

# Team Optimization

You do not need your team ID in any of these cases:
1. It is the first gameweek of the season (Wildcard Optimization runs automatically).

2. You want to use the Wildcard chip (set `use_wildcard: true` in the config, or run the Wildcard Optimization subsection below).

3. You want to use the Free Hit chip (set `use_freehit: true` in the config, or run the Free Hit Optimization subsection below).
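
The three cases above reduce to a simple check. This sketch is illustrative only: `needs_team_id` is a hypothetical helper, and its parameters mirror the `use_wildcard` and `use_freehit` config flags mentioned above.

```python
def needs_team_id(gameweek, use_wildcard=False, use_freehit=False):
    """Return True when optimization needs an existing squad to plan transfers."""
    builds_squad_from_scratch = gameweek == 1 or use_wildcard or use_freehit
    return not builds_squad_from_scratch


print(needs_team_id(gameweek=1))                     # False
print(needs_team_id(gameweek=12, use_freehit=True))  # False
print(needs_team_id(gameweek=12))                    # True
```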

In [None]:
team_id = pipeline.config.get("TEAM_ID")
result = pipeline.optimize_team(team_id=team_id)

## Wildcard Optimization

In [None]:
from fpl.optimization.team_selection import WildcardOptimization

wildcard = WildcardOptimization(
    predictions_df=pipeline.predictions,
    budget=pipeline.config["team_selection"]["wildcard"]["budget"],
    valid_status=pipeline.config["team_selection"]["valid_status"],
    num_starters=pipeline.config["team_selection"]["wildcard"]["num_starters"],
    future_weights=pipeline.config["team_selection"]["wildcard"]["future_weights"]
)

optimal_team_df, optimal_starters_df = wildcard.optimize_wildcard()

## Free Hit Optimization

In [None]:
from fpl.optimization.team_selection import FreeHitOptimization

freehit = FreeHitOptimization(
    predictions_df=pipeline.predictions,
    budget=pipeline.config["team_selection"]["freehit"]["budget"],
    valid_status=pipeline.config["team_selection"]["valid_status"]
)

optimal_team_df = freehit.optimize_freehit()