# Prepare Data for StatsBomb-SkillCorner Synchronization

This notebook helps you gather all the files needed to run the SkillCorner toolkit's synchronization script.

**Required files:**
1. SkillCorner match metadata (JSON)
2. SkillCorner tracking data (JSONL) 
3. StatsBomb events (JSON) - ✓ Already have: `data/sb_events.json`
4. StatsBomb lineup (JSON) - Need to obtain
5. StatsBomb home team ID - Need to extract

**Match ID:** StatsBomb match_id = 3925601

In [None]:
import importlib
import json

import pandas as pd
from skillcorner.client import SkillcornerClient

# Reload credentials so edits to credentials.py take effect without restarting the kernel
import credentials
importlib.reload(credentials)
from credentials import username, password

# Initialize client (strip stray whitespace from the stored credentials)
username = username.strip()
password = password.strip()
client = SkillcornerClient(username=username, password=password)

## Step 1: Find the SkillCorner Match

Identify the SkillCorner match that corresponds to StatsBomb match 3925601.
If you have a `data/matches_mapping.csv` file, look the match up there. Otherwise, use the teams listed in the StatsBomb events to find it manually.

In [None]:
# Check if you have a matches mapping file
try:
    mapping = pd.read_csv('data/matches_mapping.csv')
    print("Matches mapping found:")
    print(mapping.head())
except FileNotFoundError:
    print("No matches mapping file found. You'll need to identify the match manually.")
    print("\nLet's check your StatsBomb events to get more info:")
    
    with open('data/sb_events.json', 'r') as f:
        sb_events = json.load(f)
    
    # Get some event info to help identify the match
    events = sb_events if isinstance(sb_events, list) else [sb_events]
    print(f"\nStatsBomb Match ID: {events[0].get('match_id')}")
    # Team names narrow down which SkillCorner match to look for
    team_names = sorted({e['team']['name'] for e in events if isinstance(e.get('team'), dict)})
    print(f"Teams in events: {team_names}")
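
If a mapping file is available, the lookup can be done in code rather than by eye. The sketch below is illustrative only: the column names `statsbomb_id` and `skillcorner_id` and the helper `find_skc_match_id` are assumptions, not something the mapping file is known to contain, so adjust them to your file's actual header. It uses the standard-library `csv` module on made-up rows so that it runs on its own.

```python
import csv
import io

def find_skc_match_id(rows, sb_match_id, sb_col="statsbomb_id", skc_col="skillcorner_id"):
    """Return the SkillCorner ID mapped to a StatsBomb match ID, or None if absent."""
    # Column names are assumptions; csv.DictReader yields every value as a string.
    for row in rows:
        if row.get(sb_col) == str(sb_match_id):
            return row.get(skc_col)
    return None

# Made-up rows standing in for open('data/matches_mapping.csv')
example_csv = io.StringIO("statsbomb_id,skillcorner_id\n111,9001\n222,9002\n")
example_rows = list(csv.DictReader(example_csv))
print(find_skc_match_id(example_rows, 222))
```

With a real file, pass `list(csv.DictReader(open('data/matches_mapping.csv', newline='')))` as `rows` and `3925601` as `sb_match_id`. A `None` result means the match is not in the mapping and has to be identified manually.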

## Step 2: Get SkillCorner Match Metadata

Once you know the SkillCorner match ID, get the metadata.
Replace `YOUR_SKC_MATCH_ID` with the actual match ID.

In [None]:
# Replace with your actual SkillCorner match ID
skc_match_id = "YOUR_SKC_MATCH_ID"  # TODO: Update this

# Get match metadata
try:
    match_metadata = client.get_match(skc_match_id)
    
    # Save to file
    with open('data/skc_match_metadata.json', 'w') as f:
        json.dump(match_metadata, f, indent=2)
    
    print(f"✓ Saved SkillCorner match metadata to: data/skc_match_metadata.json")
    print(f"\nMatch info:")
    print(f"  Home team: {match_metadata['home_team']['name']}")
    print(f"  Away team: {match_metadata['away_team']['name']}")
    print(f"  Date: {match_metadata['date_time']}")
except Exception as e:
    print(f"Error: {e}")
    print("\nPlease update skc_match_id with the correct match ID")

## Step 3: Get SkillCorner Tracking Data

Download the tracking data for this match.

In [None]:
# Get tracking data (extrapolated version)
try:
    tracking_filepath = f'data/{skc_match_id}_tracking.jsonl'
    
    client.save_match_tracking_data(
        skc_match_id,
        params={'data_version': 3},  # 3 = extrapolated
        filepath=tracking_filepath
    )
    
    print(f"✓ Saved tracking data to: {tracking_filepath}")
except Exception as e:
    print(f"Error: {e}")
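
After the download, it may be worth checking that the file really is valid JSONL before passing it to the synchronization script. The following is a minimal sketch, not part of the SkillCorner client: `summarize_jsonl` is a hypothetical helper, and the demo writes a tiny made-up file whose field names say nothing about the real tracking schema.

```python
import json
import os
import tempfile

def summarize_jsonl(path):
    """Count the JSON records in a JSONL file and collect the keys of the first one."""
    n_records, first_keys = 0, []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)  # raises json.JSONDecodeError on a corrupt line
            if n_records == 0 and isinstance(record, dict):
                first_keys = sorted(record)
            n_records += 1
    return n_records, first_keys

# Demo on a throwaway file with made-up content
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_tracking.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write('{"frame": 0, "x": 1.5}\n{"frame": 1, "x": 1.7}\n')
    demo_summary = summarize_jsonl(demo_path)
print(demo_summary)
```

Calling `summarize_jsonl(tracking_filepath)` on the downloaded file gives a quick record count. A count of zero or a decode error suggests the download was incomplete.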

## Step 4: Get StatsBomb Lineup Data

You need the StatsBomb lineup file (separate from events).
This typically comes from StatsBomb's API or data downloads.

**If you have StatsBomb access**, use their API to get lineup for match 3925601.
**If you have the file already**, place it in the `data/` folder.

In [None]:
# Check if lineup file exists
import os

lineup_file = 'data/3925601-lineup.json'  # Standard StatsBomb naming
if os.path.exists(lineup_file):
    print(f"✓ Lineup file found: {lineup_file}")
    
    with open(lineup_file, 'r') as f:
        lineup = json.load(f)
    
    print(f"\nLineup contains {len(lineup)} teams")
    for team in lineup:
        print(f"  Team: {team['team_name']} (ID: {team['team_id']}) - {len(team['lineup'])} players")
else:
    print(f"✗ Lineup file not found: {lineup_file}")
    print("\nYou need to obtain the StatsBomb lineup file for match 3925601")
    print("Place it in the data/ folder as '3925601-lineup.json'")

## Step 5: Identify StatsBomb Home Team ID

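The cell below lists the team IDs found in the events. It does not decide which team played at home, so confirm that against your StatsBomb match metadata or lineup file.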

In [None]:
# Try to determine home team from events
with open('data/sb_events.json', 'r') as f:
    sb_events = json.load(f)

# Get team info from first few events
teams = set()
for event in sb_events[:20]:
    if 'team' in event and 'id' in event['team']:
        team_id = event['team']['id']
        team_name = event['team']['name']
        teams.add((team_id, team_name))

print("Teams found in events:")
for team_id, team_name in sorted(teams):  # sorted for a stable display order
    print(f"  Team ID: {team_id} - {team_name}")

print("\n⚠️ You need to identify which team is the HOME team")
print("This information should be in your StatsBomb match metadata or lineup file")
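
If you also have a StatsBomb matches file for the competition, the home team can be read from it directly. This sketch assumes each match record carries a `match_id` and a nested `home_team` object with `home_team_id` and `home_team_name`, as in StatsBomb's open-data layout. Treat that layout, and the helper name `home_team_from_matches`, as assumptions to verify against your own file. The example record is made up.

```python
def home_team_from_matches(matches, match_id):
    """Return (home_team_id, home_team_name) for match_id, or None if not listed."""
    for match in matches:
        if match.get("match_id") == match_id:
            home = match["home_team"]  # assumed nested object, see lead-in
            return home["home_team_id"], home["home_team_name"]
    return None

# Made-up record in the assumed layout, standing in for json.load() on a matches file
example_matches = [
    {
        "match_id": 1,
        "home_team": {"home_team_id": 10, "home_team_name": "Team A"},
        "away_team": {"away_team_id": 20, "away_team_name": "Team B"},
    }
]
print(home_team_from_matches(example_matches, 1))
```

For this notebook, call it with the loaded matches list and `3925601`. The first element of the result is the value to pass as `--statsbomb_home_team_id`.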

## Step 6: Run the Synchronization Script

Once you have all files ready, run this command in PowerShell from the skillcorner-toolkit folder:

```powershell
cd C:\Users\User\Desktop\skillcorner-toolkit

py tools\with_tracking\run_statsbomb.py `
    --match_data_path "C:\Users\User\Desktop\Capstone Project\data\skc_match_metadata.json" `
    --tracking_data_path "C:\Users\User\Desktop\Capstone Project\data\YOUR_MATCH_ID_tracking.jsonl" `
    --statsbomb_events_path "C:\Users\User\Desktop\Capstone Project\data\sb_events.json" `
    --statsbomb_match_data_path "C:\Users\User\Desktop\Capstone Project\data\3925601-lineup.json" `
    --statsbomb_home_team_id YOUR_HOME_TEAM_ID `
    --save_outputs_dir "C:\Users\User\Desktop\Capstone Project\data\sync_output"
```

Replace:
- `YOUR_MATCH_ID` with the actual SkillCorner match ID
- `YOUR_HOME_TEAM_ID` with the StatsBomb home team ID
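
Before running the command, a short pre-flight check can confirm that every input file exists and that the output directory is in place. The helper below is a hypothetical sketch, not part of the toolkit. The demo runs in a temporary directory with one placeholder file, so it can be executed as-is.

```python
import tempfile
from pathlib import Path

def preflight(required_files, output_dir):
    """Create output_dir if needed and return the names of required files that are missing."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return [Path(p).name for p in required_files if not Path(p).is_file()]

# Demo in a throwaway directory: one file present, one absent
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "sb_events.json").write_text("[]", encoding="utf-8")
    required = [data_dir / "sb_events.json", data_dir / "skc_match_metadata.json"]
    missing = preflight(required, data_dir / "sync_output")
    output_dir_created = (data_dir / "sync_output").is_dir()
print(missing)
```

In the notebook, point `required_files` at the four input paths used in the command and `output_dir` at `data/sync_output`. An empty list means all inputs are present.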

## Summary Checklist

Check off each item as you complete it:

- [ ] SkillCorner match ID identified
- [ ] SkillCorner match metadata downloaded (`skc_match_metadata.json`)
- [ ] SkillCorner tracking data downloaded (`*_tracking.jsonl`)
- [ ] StatsBomb events file ready (`sb_events.json`) ✓
- [ ] StatsBomb lineup file obtained (`3925601-lineup.json`)
- [ ] StatsBomb home team ID identified
- [ ] Output directory created (`data/sync_output/`)
- [ ] Synchronization script executed