# Lineup Value Playground
Use this notebook to exercise `compute_lineup_value_features` on a small synthetic data set before wiring it into the full pipeline.

The sections build synthetic matches, lineups, and player market values, run the helper on them, and inspect the resulting per-match aggregates.

## Imports and path bootstrap
Walks up from the current working directory until it finds a `src/` directory, then adds it to `sys.path` so we can import the feature module without installing the package.

In [1]:
from __future__ import annotations

import sys
from pathlib import Path

import pandas as pd

PROJECT_ROOT = Path.cwd().resolve()
while not (PROJECT_ROOT / 'src').is_dir() and PROJECT_ROOT != PROJECT_ROOT.parent:
    PROJECT_ROOT = PROJECT_ROOT.parent
SRC_DIR = PROJECT_ROOT / 'src'
if not SRC_DIR.is_dir():
    raise RuntimeError(f'Source directory not found relative to {PROJECT_ROOT}')
if str(SRC_DIR) not in sys.path:
    sys.path.append(str(SRC_DIR))

from data.features.lineup_value import (
    LineupValueColumns,
    compute_lineup_value_features,
)

pd.options.display.precision = 2
pd.options.display.float_format = '{:.2f}'.format


## Synthetic fixtures, lineups, and valuations
This block fabricates two matches, per-player lineup rows with minutes played (including one unused substitute with 0 minutes), and market-value snapshots (player 501 has two) so we can validate the filtering, merge, and aggregation logic.

In [2]:
matches_df = pd.DataFrame(
    {
        "id": [1001, 1002],
        "utcDate": pd.to_datetime(
            ["2024-03-01 20:00:00+00:00", "2024-03-05 15:00:00+00:00"], utc=True
        ),
        "homeTeam.id": [10, 12],
        "awayTeam.id": [11, 13],
    }
)

lineups_df = pd.DataFrame(
    {
        "match_id": [
            1001, 1001, 1001, 1001, 1001, 1001,  # home side (6 players)
            1001, 1001, 1001, 1001,  # away side (4 players, one with only 5 minutes)
            1002, 1002, 1002, 1002, 1002, 1002,
            1002, 1002, 1002, 1002, 1002, 1002,
        ],
        "team_id": [
            10, 10, 10, 10, 10, 10,
            11, 11, 11, 11,
            12, 12, 12, 12, 12, 12,
            13, 13, 13, 13, 13, 13,
        ],
        "player_id": [
            501, 502, 503, 504, 505, 506,
            601, 602, 603, 604,
            701, 702, 703, 704, 705, 706,
            801, 802, 803, 804, 805, 806,
        ],
        "minutes_played": [
            90, 85, 75, 65, 45, 15,
            90, 88, 70, 5,
            90, 90, 90, 70, 20, 10,
            90, 90, 80, 60, 30, 0,  # last player sat on bench
        ],
    }
)

player_values_df = pd.DataFrame(
    {
        "player_id": [
            501, 501, 502, 503, 504, 505, 506,
            601, 602, 603, 604,
            701, 702, 703, 704, 705, 706,
            801, 802, 803, 804, 805, 806,
        ],
        "market_value_eur": [
            10_000_000, 11_000_000, 7_500_000, 6_000_000, 5_000_000, 3_000_000, 1_000_000,
            9_000_000, 4_000_000, 2_000_000, 500_000,
            8_000_000, 7_000_000, 6_500_000, 5_500_000, 2_500_000, 1_500_000,
            12_000_000, 9_500_000, 4_500_000, 3_500_000, 2_500_000, 750_000,
        ],
        "snapshot_date": pd.to_datetime(
            [
                "2024-02-01", "2024-02-20", "2024-02-18", "2024-02-15", "2024-02-10", "2024-02-05", "2024-02-01",
                "2024-02-21", "2024-02-10", "2024-01-15", "2024-02-25",
                "2024-02-22", "2024-02-18", "2024-02-05", "2024-02-01", "2024-02-01", "2024-02-15",
                "2024-02-15", "2024-02-12", "2024-02-07", "2024-02-05", "2024-02-03", "2024-02-01",
            ],
            utc=True,
        ),
    }
)

matches_df, lineups_df.head(), player_values_df.head()


(     id                   utcDate  homeTeam.id  awayTeam.id
 0  1001 2024-03-01 20:00:00+00:00           10           11
 1  1002 2024-03-05 15:00:00+00:00           12           13,
    match_id  team_id  player_id  minutes_played
 0      1001       10        501              90
 1      1001       10        502              85
 2      1001       10        503              75
 3      1001       10        504              65
 4      1001       10        505              45,
    player_id  market_value_eur             snapshot_date
 0        501          10000000 2024-02-01 00:00:00+00:00
 1        501          11000000 2024-02-20 00:00:00+00:00
 2        502           7500000 2024-02-18 00:00:00+00:00
 3        503           6000000 2024-02-15 00:00:00+00:00
 4        504           5000000 2024-02-10 00:00:00+00:00)
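
Before running the helper, the fixtures can be sanity-checked by hand. The sketch below rebuilds only the team and minutes columns from the lineups above and counts how many players per side survive the `min_minutes_played=1.0` threshold used in this notebook. It assumes the helper keeps players with `minutes_played >= threshold`; the names `lineup_minutes`, `MIN_MINUTES`, and `players_per_team` are illustrative and not part of the feature module.

```python
import pandas as pd

# Compact copy of the match_id / team_id / minutes_played columns from lineups_df.
lineup_minutes = pd.DataFrame(
    {
        "match_id": [1001] * 10 + [1002] * 12,
        "team_id": [10] * 6 + [11] * 4 + [12] * 6 + [13] * 6,
        "minutes_played": [
            90, 85, 75, 65, 45, 15,
            90, 88, 70, 5,
            90, 90, 90, 70, 20, 10,
            90, 90, 80, 60, 30, 0,
        ],
    }
)

MIN_MINUTES = 1.0  # assumed to behave like the helper's min_minutes_played

# Keep only players who were on the pitch for at least MIN_MINUTES
# (the >= comparison is an assumption about the helper).
fielded = lineup_minutes[lineup_minutes["minutes_played"] >= MIN_MINUTES]

# Count the surviving players per match and team.
players_per_team = fielded.groupby(["match_id", "team_id"]).size()
print(players_per_team)
```

Under that assumption, team 13 should count 5 players rather than 6, because player 806 logged 0 minutes.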

## Compute lineup-value features
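The helper filters out appearances shorter than `min_minutes_played` (1 minute here), joins each remaining player with the latest valuation at or before kickoff, and aggregates to home/away totals. Player 806 logged 0 minutes, so the away side of match 1002 counts 5 players instead of 6.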

In [3]:
lineup_features = compute_lineup_value_features(
    matches=matches_df,
    lineups=lineups_df,
    player_values=player_values_df,
    min_minutes_played=1.0,
)
lineup_features


|  | id | lineup_value_home | lineup_value_away | lineup_value_players_with_values_home | lineup_value_players_with_values_away | lineup_value_players_total_home | lineup_value_players_total_away | lineup_value_players_coverage_home | lineup_value_players_coverage_away | lineup_value_diff | lineup_value_log_diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1001 | 33500000.0 | 15500000.0 | 6 | 4 | 6 | 4 | 1.0 | 1.0 | 18000000.0 | 0.77 |
| 1 | 1002 | 31000000.0 | 32000000.0 | 6 | 5 | 6 | 5 | 1.0 | 1.0 | -1000000.0 | -0.03 |
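
The "latest valuation at or before kickoff" join can be illustrated with `pandas.merge_asof`. This is a minimal sketch of one way to do such a join, not the helper's actual implementation, which this notebook does not show. The frames `players` and `snapshots` are illustrative, and the post-kickoff snapshot for player 603 is hypothetical, added only to show that later valuations are ignored.

```python
import pandas as pd

# Two fielded players from match 1001 and the kickoff time of that match.
players = pd.DataFrame(
    {
        "player_id": [501, 603],
        "kickoff": pd.to_datetime(["2024-03-01 20:00:00"] * 2, utc=True),
    }
)

# Valuation snapshots. The last row is HYPOTHETICAL (dated after kickoff).
snapshots = pd.DataFrame(
    {
        "player_id": [501, 501, 603, 603],
        "market_value_eur": [10_000_000, 11_000_000, 2_000_000, 2_500_000],
        "snapshot_date": pd.to_datetime(
            ["2024-02-01", "2024-02-20", "2024-01-15", "2024-03-10"], utc=True
        ),
    }
)

# merge_asof requires both frames to be sorted by their time keys.
valued = pd.merge_asof(
    players.sort_values("kickoff"),
    snapshots.sort_values("snapshot_date"),
    left_on="kickoff",
    right_on="snapshot_date",
    by="player_id",  # match snapshots to the same player only
    direction="backward",  # latest snapshot at or before kickoff
)
print(valued[["player_id", "snapshot_date", "market_value_eur"]])
```

Player 501 picks up the 2024-02-20 snapshot (11,000,000) rather than the older one, which is consistent with the 33,500,000 home total for match 1001.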


## Inspect coverage and differentials
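These diagnostic slices put totals, differentials, and coverage side by side. Every fielded player in the synthetic set has a valuation dated before kickoff, so coverage is 1.0 on both sides; drop rows from `player_values_df` to check how the coverage ratios and log differences respond when valuations are missing.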

In [4]:
columns_of_interest = [
    "lineup_value_home",
    "lineup_value_away",
    "lineup_value_diff",
    "lineup_value_log_diff",
    "lineup_value_players_with_values_home",
    "lineup_value_players_total_home",
    "lineup_value_players_coverage_home",
    "lineup_value_players_with_values_away",
    "lineup_value_players_total_away",
    "lineup_value_players_coverage_away",
]
lineup_features[["id"] + columns_of_interest]


|  | id | lineup_value_home | lineup_value_away | lineup_value_diff | lineup_value_log_diff | lineup_value_players_with_values_home | lineup_value_players_total_home | lineup_value_players_coverage_home | lineup_value_players_with_values_away | lineup_value_players_total_away | lineup_value_players_coverage_away |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1001 | 33500000.0 | 15500000.0 | 18000000.0 | 0.77 | 6 | 6 | 1.0 | 4 | 4 | 1.0 |
| 1 | 1002 | 31000000.0 | 32000000.0 | -1000000.0 | -0.03 | 6 | 6 | 1.0 | 5 | 5 | 1.0 |
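
The derived columns can be checked by hand from the totals above. The sketch below assumes that `lineup_value_diff` is home minus away, that `lineup_value_log_diff` is `log(home) - log(away)`, and that coverage is players with a valuation divided by players fielded. These definitions reproduce the displayed numbers but are inferred from the output, not read from the helper's source.

```python
import math

# (home total, away total) per match id, copied from the output above.
totals = {
    1001: (33_500_000.0, 15_500_000.0),
    1002: (31_000_000.0, 32_000_000.0),
}

checks = {}
for match_id, (home_value, away_value) in totals.items():
    # ASSUMED definitions: plain difference and difference of natural logs.
    value_diff = home_value - away_value
    log_diff = math.log(home_value) - math.log(away_value)
    checks[match_id] = (value_diff, round(log_diff, 2))
    print(match_id, value_diff, round(log_diff, 2))

# Coverage for the home side of match 1001, ASSUMED to be
# players with a valuation / players fielded.
players_with_values_home, players_total_home = 6, 6
coverage_home = players_with_values_home / players_total_home
print(coverage_home)
```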


## Next steps
Replace the synthetic frames with actual football-data matches, lineup feeds (e.g., FBref or WhoScored), and Transfermarkt/SoFIFA market-value snapshots once those extracts are available.
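
When swapping in real extracts, a quick column check can catch schema mismatches before the helper runs. The sketch below uses the column names from the synthetic frames in this notebook; whether the helper requires exactly these columns is an assumption, and `EXPECTED_COLUMNS`, `missing_columns`, and `raw_lineups` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Column names taken from the synthetic frames in this notebook.
# Whether the helper needs exactly these is an ASSUMPTION.
EXPECTED_COLUMNS = {
    "matches": ["id", "utcDate", "homeTeam.id", "awayTeam.id"],
    "lineups": ["match_id", "team_id", "player_id", "minutes_played"],
    "player_values": ["player_id", "market_value_eur", "snapshot_date"],
}


def missing_columns(frame: pd.DataFrame, expected: list[str]) -> list[str]:
    """Return the expected column names that are absent from `frame`."""
    return [name for name in expected if name not in frame.columns]


# Hypothetical lineup extract that uses "minutes" instead of "minutes_played".
raw_lineups = pd.DataFrame(
    {"match_id": [1001], "team_id": [10], "player_id": [501], "minutes": [90]}
)
missing = missing_columns(raw_lineups, EXPECTED_COLUMNS["lineups"])
print(missing)
```

An empty list means the frame has every expected column; anything else names the columns to rename or derive first.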