**Objective**: Analyze DraftKings DFS contest data to predict successful lineups for cash games (top 50%).


In [1]:
import pandas as pd
import numpy as np
from utils.draftkingsCashReview import load_split_dk_results, load_player_data, add_lineup_features, calculate_lineup_rating, normalize_features



**Input Data**:
1. `contest-standings.csv`: load the CSV and split it at column 6 into two DataFrames, `playerpool_df` and `entries_df`.
2. `entries_df` (contest results):  
   Columns: `Rank`, `EntryId`, `EntryName`, `TimeRemaining`, `Points`, `Lineup`, `cash_win`, `cash_result`.
3. `player_results_df` (player projections, built from `playerpool_df` by `load_player_data`):  
   Columns: `Player`, `Roster Position`, `%Drafted`, `FPTS`, `teamabbrev`, `avgpointspergame`, `projpts`, `projown`, `salary`.
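The loaded `entries_df` does not yet contain `cash_win`, so it has to be derived from the standings. Below is a minimal sketch that assumes `cash_win` means "finished in the top 50% by `Rank`" (the cash line from the objective). The helper `add_cash_win` and its `cash_fraction` parameter are hypothetical, and `cash_result` is not covered here.

```python
import math

import pandas as pd


def add_cash_win(entries: pd.DataFrame, cash_fraction: float = 0.5) -> pd.DataFrame:
    """Flag entries inside the cash line (hypothetical helper).

    Assumes a lower Rank is better and that the top `cash_fraction` of
    entries cash. Entries that share a Rank get the same flag.
    """
    out = entries.copy()
    cash_line = math.floor(len(out) * cash_fraction)  # worst Rank that still cashes (assumed)
    out["cash_win"] = (out["Rank"] <= cash_line).astype(int)
    return out
```

Usage: `entries_df = add_cash_win(entries_df)`. Real double-up payout lines can sit slightly below 50%, so `cash_fraction` is left adjustable.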

**Steps to Execute**:
1. **[Data Prep]**:  
   - Clean `Player` strings in both DataFrames.  
   - Parse `Lineup` strings into individual player lists.  
   - Compute new features per lineup (stacks, value exposure, etc.).
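The two data-prep steps can be sketched as follows. This assumes the `Lineup` format seen in the `entries_df` output (`DST Ravens FLEX George Pickens QB Jared Goff ...`), where slot labels and player names alternate, and that no player name contains a standalone uppercase slot token. `clean_player`, `parse_lineup` and `SLOT_PATTERN` are hypothetical names, not part of `utils.draftkingsCashReview`.

```python
import re

# Slot labels assumed for DraftKings classic NFL lineup strings
SLOT_PATTERN = re.compile(r"\b(QB|RB|WR|TE|FLEX|DST)\b")


def clean_player(name) -> str:
    """Trim and collapse whitespace so names match across DataFrames."""
    return " ".join(str(name).split())


def parse_lineup(lineup) -> list[tuple[str, str]]:
    """Split 'DST Ravens FLEX George Pickens ...' into (slot, player) pairs."""
    if not isinstance(lineup, str) or not lineup.strip():
        return []  # empty or missing lineups (e.g. NaN) yield no players
    parts = SLOT_PATTERN.split(lineup)
    # re.split with a capture group yields ['', slot, name, slot, name, ...]
    slots, names = parts[1::2], parts[2::2]
    return [(slot, clean_player(name)) for slot, name in zip(slots, names)]
```

Usage: `entries_df["players"] = entries_df["Lineup"].apply(parse_lineup)` and `player_results_df["Player"] = player_results_df["Player"].map(clean_player)`. This only works on the full lineup strings; the `...` in the printed preview is display truncation.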


In [2]:
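# Load the data
week = 'week02'
results_file_name = 'contest-standings.csv'  # DraftKings contest export (not passed to the loaders below)
# Split the contest standings into the player pool and the entry results
playerpool_df, entries_df = load_split_dk_results(week)
# Add salary and projection columns to the player pool for the same week
player_results_df = load_player_data(week, playerpool_df)

player_results_df.head(3)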

|  | Player | Roster Position | %Drafted | FPTS | position | name + id | name | id | roster position | salary | game info | teamabbrev | avgpointspergame | team | projpts | projown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Christian McCaffrey | RB | 72.56% | 22.7 | RB | Christian McCaffrey (39971377) | Christian McCaffrey | 39971377.0 | RB/FLEX | 7500.0 | SF@NO 09/14/2025 01:00PM ET | SF | 23.2 | 49ers | 19.5 | 34.7 |
| 1 | Hollywood Brown | WR | 52.76% | 8.0 | WR | Hollywood Brown (39971707) | Hollywood Brown | 39971707.0 | WR/FLEX | 5200.0 | PHI@KC 09/14/2025 04:25PM ET | KC | 19.9 | Chiefs | 14.0 | 15.7 |
| 2 | Harold Fannin Jr. | TE | 50.17% | 9.8 | TE | Harold Fannin Jr. (39972149) | Harold Fannin Jr. | 39972149.0 | TE/FLEX | 3100.0 | CLE@BAL 09/14/2025 01:00PM ET | CLE | 13.6 | Browns | 9.6 | 13.3 |


In [4]:
entries_df.head(3)

|  | Rank | EntryId | EntryName | TimeRemaining | Points | Lineup |
|---|---|---|---|---|---|---|
| 0 | 1 | 4852565135 | calebzsnyder | 0 | 206.86002 | DST Ravens FLEX George Pickens QB Jared Goff ... |
| 1 | 2 | 4852356677 | trh1010 | 0 | 204.36002 | DST Colts FLEX Ja'Marr Chase QB Mac Jones RB ... |
| 2 | 3 | 4853930626 | mbhurls | 0 | 200.76 | DST Cowboys FLEX De'Von Achane QB Jared Goff ... |



2. **[Feature Engineering]**:  
   - Expand each `Lineup` into its individual players and attach their projections.
   - For each lineup, create features like:  
     - `sum_projpts`, `sum_projown`  
     - `num_stacks`, `num_value_players` (projown < 20%)  
     - `avg_floor` (using avgpointspergame)  
     - `qb_wr_stacks`  
     - `num_chalk_players` (projown > 60%)  
   - Export to `training_data.csv`.
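A minimal sketch of the per-lineup aggregation, assuming a long table with one row per (`EntryId`, player) that already carries `position`, `teamabbrev`, `projpts`, `projown` and `avgpointspergame` (for example, exploded parsed lineups merged with `player_results_df` on `Player`). The function name `lineup_features` is hypothetical. The spec does not define `num_stacks` or `qb_wr_stacks` precisely, so both definitions below are assumptions. The 20/60 thresholds only make sense on raw ownership percentages, so this should run before any normalization.

```python
import pandas as pd


def lineup_features(long_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate one row per (EntryId, player) into one row per EntryId."""
    flags = long_df.assign(
        is_value=long_df["projown"] < 20,  # value threshold from the spec
        is_chalk=long_df["projown"] > 60,  # chalk threshold from the spec
    )
    feats = flags.groupby("EntryId").agg(
        sum_projpts=("projpts", "sum"),
        sum_projown=("projown", "sum"),
        avg_floor=("avgpointspergame", "mean"),
        num_value_players=("is_value", "sum"),
        num_chalk_players=("is_chalk", "sum"),
    )

    # Assumption: a "stack" is any team with 2+ non-DST players in the lineup
    non_dst = long_df[long_df["position"] != "DST"]
    team_counts = non_dst.groupby(["EntryId", "teamabbrev"]).size()
    feats["num_stacks"] = (team_counts >= 2).groupby(level="EntryId").sum()

    # Assumption: qb_wr_stacks counts WRs on the same team as the lineup's QB
    qb = long_df.loc[long_df["position"] == "QB", ["EntryId", "teamabbrev"]]
    wr = long_df.loc[long_df["position"] == "WR", ["EntryId", "teamabbrev"]]
    paired = wr.merge(qb, on=["EntryId", "teamabbrev"])
    feats["qb_wr_stacks"] = paired.groupby("EntryId").size()

    # Lineups with no matching rows come back as NaN after index alignment
    for col in ("num_stacks", "qb_wr_stacks"):
        feats[col] = feats[col].fillna(0).astype(int)
    return feats
```

The export step is then `lineup_features(long_df).to_csv("training_data.csv")`.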


In [5]:
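# Keep lineup-level features in their own DataFrame so player_results_df
# still has the player columns that normalize_features needs below
lineup_df = add_lineup_features(entries_df, player_results_df)

In [7]:
lineup_df.columns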

Index(['Rank', 'EntryId', 'EntryName', 'TimeRemaining', 'Points', 'Lineup', 0], dtype='object')

In [None]:
# Rescale projected ownership and points so they are on comparable scales
player_results_df = normalize_features(player_results_df, ['projown', 'projpts'])
player_results_df.head(3)

|  | Player | Roster Position | %Drafted | FPTS | position | name + id | name | id | roster position | salary | game info | teamabbrev | avgpointspergame | team | projpts | projown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Christian McCaffrey | RB | 72.56% | 22.7 | RB | Christian McCaffrey (39971377) | Christian McCaffrey | 39971377.0 | RB/FLEX | 7500.0 | SF@NO 09/14/2025 01:00PM ET | SF | 23.2 | 49ers | 1.881923 | 6.232744 |
| 1 | Hollywood Brown | WR | 52.76% | 8.0 | WR | Hollywood Brown (39971707) | Hollywood Brown | 39971707.0 | WR/FLEX | 5200.0 | PHI@KC 09/14/2025 04:25PM ET | KC | 19.9 | Chiefs | 0.819337 | 2.403032 |
| 2 | Harold Fannin Jr. | TE | 50.17% | 9.8 | TE | Harold Fannin Jr. (39972149) | Harold Fannin Jr. | 39972149.0 | TE/FLEX | 3100.0 | CLE@BAL 09/14/2025 01:00PM ET | CLE | 13.6 | Browns | -0.030732 | 1.919279 |
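The rescaled `projpts` and `projown` values above are consistent with a per-column z-score. Across the three rows shown, the raw and normalized values differ by a constant factor per column, implying standard deviations of roughly 5.18 for `projpts` and 4.96 for `projown`. The notebook does not show whether `normalize_features` uses the sample or the population standard deviation. The sketch below is a hypothetical stand-in (`zscore_columns`) that assumes the sample version.

```python
import pandas as pd


def zscore_columns(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Return a copy with each column in `cols` standardized to mean 0, std 1.

    Hypothetical stand-in for normalize_features; uses the sample std (ddof=1).
    """
    out = df.copy()
    for col in cols:
        series = pd.to_numeric(out[col], errors="coerce")  # non-numeric values become NaN
        out[col] = (series - series.mean()) / series.std(ddof=1)
    return out
```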



3. **[Model Training]**:  
   - **Regression**:  
     Train `LinearRegression` to predict `Points` from lineup features.  
     - Save coefficients as `reg_weights.csv`.  
   - **Classification**:  
     Train `LogisticRegression` to predict `cash_win` from lineup features.  
     - Save coefficients as `logit_weights.csv`.  
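Both training steps can be sketched with scikit-learn as follows, assuming `X` holds the numeric lineup features with one row per entry and that `points` and `cash_win` are aligned with it. The function name `train_and_save_weights` and the `feature`/`weight` CSV layout are assumptions. The intercept is stored as an extra row so the weights can be reloaded by name.

```python
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression


def train_and_save_weights(X: pd.DataFrame, points: pd.Series, cash_win: pd.Series,
                           out_dir: str = ".") -> tuple[LinearRegression, LogisticRegression]:
    """Fit both models on lineup features and save their coefficients as CSV."""
    reg = LinearRegression().fit(X, points)
    logit = LogisticRegression(max_iter=1000).fit(X, cash_win)

    out = Path(out_dir)
    # One row per feature plus the intercept
    pd.DataFrame({
        "feature": [*X.columns, "intercept"],
        "weight": [*reg.coef_, reg.intercept_],
    }).to_csv(out / "reg_weights.csv", index=False)
    # Binary LogisticRegression stores coefficients with shape (1, n_features)
    pd.DataFrame({
        "feature": [*X.columns, "intercept"],
        "weight": [*logit.coef_[0], logit.intercept_[0]],
    }).to_csv(out / "logit_weights.csv", index=False)
    return reg, logit
```

Coefficients are only comparable across features when the features share a scale. That is one reason to standardize them before fitting, especially since `LogisticRegression` applies L2 regularization by default.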



4. **[Evaluation]**:  
   - Correlate the predicted `Points` with actual `Points`.  
   - Correlate the `cash_win` prediction with actual `cash_win`.  
   - Evaluate improvement over:  
     - Raw `sum_projpts`.  
     - Baseline rating (`0.5*projown + 0.5*projpts`).  
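One way to lay out the comparison is a table of Pearson correlations, one row per candidate rating. This sketch assumes lineup-level `sum_projpts` and `sum_projown` columns, with the baseline computed from them. The function name `compare_ratings` is hypothetical. Measuring improvement honestly requires predictions made on entries the models were not trained on, such as a held-out split or another week's contest.

```python
import numpy as np
import pandas as pd


def compare_ratings(feats: pd.DataFrame, points, cash_win,
                    pred_points, pred_cash_prob) -> pd.DataFrame:
    """Pearson r of each candidate rating against actual Points and cash_win."""
    candidates = {
        "regression_pred": np.asarray(pred_points, dtype=float),
        "logit_prob": np.asarray(pred_cash_prob, dtype=float),
        "sum_projpts": feats["sum_projpts"].to_numpy(dtype=float),
        # Baseline from the spec; assumes projown/projpts are on comparable scales
        "baseline": (0.5 * feats["sum_projown"] + 0.5 * feats["sum_projpts"]).to_numpy(dtype=float),
    }
    actual_points = np.asarray(points, dtype=float)
    actual_cash = np.asarray(cash_win, dtype=float)
    rows = []
    for name, score in candidates.items():
        rows.append({
            "rating": name,
            "corr_points": np.corrcoef(score, actual_points)[0, 1],
            # Pearson r against a 0/1 target is the point-biserial correlation
            "corr_cash_win": np.corrcoef(score, actual_cash)[0, 1],
        })
    return pd.DataFrame(rows).set_index("rating")
```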



5. **[Output]**:  
   - Save final DataFrame with best `lineup_rating` to `final_entries.csv`.  
   - Generate plots:  
     - Feature importance for `cash_win` prediction.  
     - Actual vs predicted `Points`.  
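The output step can be sketched as below. The spec does not define `lineup_rating`, so this assumes it is the intercept plus the weighted sum of lineup features. The weights are taken as a Series indexed by feature name, for example `pd.read_csv("logit_weights.csv", index_col="feature")["weight"]` if that file stores `feature`/`weight` columns. `export_results` and the PNG file names are hypothetical. Matplotlib is used for the plots with the non-interactive `Agg` backend.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def export_results(lineups: pd.DataFrame, feature_cols: list[str], weights: pd.Series,
                   pred_points, out_dir: str = ".") -> pd.DataFrame:
    """Rate lineups, save them best-first, and write the two diagnostic plots."""
    out = Path(out_dir)
    w = weights[feature_cols].to_numpy(dtype=float)

    # Assumed lineup_rating: intercept + weighted sum of the lineup features
    ranked = lineups.copy()
    ranked["lineup_rating"] = (
        ranked[feature_cols].to_numpy(dtype=float) @ w + float(weights.get("intercept", 0.0))
    )
    ranked = ranked.sort_values("lineup_rating", ascending=False)
    ranked.to_csv(out / "final_entries.csv", index=False)

    # Feature importance: sorted by |coefficient| so the largest bar is drawn on top
    coefs = weights[feature_cols].astype(float)
    coefs = coefs.loc[coefs.abs().sort_values().index]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.barh(coefs.index.astype(str), coefs.to_numpy())
    ax.set_xlabel("coefficient for cash_win")
    fig.tight_layout()
    fig.savefig(out / "feature_importance.png")
    plt.close(fig)

    # Actual vs predicted Points, with a y = x reference line
    pred = np.asarray(pred_points, dtype=float)
    actual = lineups["Points"].to_numpy(dtype=float)
    lims = [min(pred.min(), actual.min()), max(pred.max(), actual.max())]
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(pred, actual, s=8, alpha=0.5)
    ax.plot(lims, lims, linestyle="--", color="gray")
    ax.set_xlabel("predicted Points")
    ax.set_ylabel("actual Points")
    fig.tight_layout()
    fig.savefig(out / "actual_vs_predicted.png")
    plt.close(fig)
    return ranked
```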

**Coding Language**: Python (Pandas, Scikit-learn, NumPy).  
**Goal**: Derive optimal lineup rating for cash-game success.