# Data Collection Demo

This notebook demonstrates how to use the data collection modules to fetch NBA data.

## What you'll learn
- How to fetch game data from the NBA API
- How to fetch player statistics
- How to fetch team information
- How to save and load data

In [None]:
import sys
import os
import pandas as pd

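# Add the project root (the parent of this notebook's folder) to the import path
# so the `src` package can be imported; assumes the notebook sits one level below the root
sys.path.append(os.path.abspath('..'))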

from src.data_collection.game_data import GameDataCollector
from src.data_collection.player_data import PlayerDataCollector
from src.data_collection.team_data import TeamDataCollector
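The `sys.path.append(os.path.abspath('..'))` line only works when the notebook runs exactly one folder below the project root. A slightly more robust sketch walks up from the current directory until it finds a folder containing `src`. The helper name `find_project_root` and the "first ancestor with a `src` directory" rule are assumptions for illustration, not part of the project.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the first directory at or above `start` that contains `marker`.

    Hypothetical helper: assumes the project root is identified by a `src` folder.
    """
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains {marker!r}")


# Possible use in the setup cell (hypothetical):
# import sys
# sys.path.insert(0, str(find_project_root(Path.cwd())))
```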

## 1. Fetch Game Data

In [None]:
# Initialize game data collector
game_collector = GameDataCollector()

# Fetch games from a specific date range
games = game_collector.fetch_games_by_date_range(
    start_date="2023-10-01",
    end_date="2023-10-31"
)

print(f"Fetched {len(games)} games")
if games:  # avoid an IndexError when the date range has no games
    print("\nFirst game:")
    print(games[0])

### Enrich game data with additional features

In [None]:
# Add winner, score differential, etc.
enriched_games = game_collector.enrich_game_data(games)

# Convert to DataFrame for easy viewing
games_df = pd.DataFrame(enriched_games)
games_df.head()
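The notebook does not show what `enrich_game_data` computes internally. The sketch below shows one way a winner and a score differential could be derived from a single game record. The key names `home_team`, `visitor_team`, `home_team_score`, and `visitor_team_score` are assumptions; the real fields depend on the API response.

```python
from typing import Any, Dict


def enrich_game(game: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `game` with winner and score-differential fields added.

    Key names are illustrative assumptions, not the collector's actual schema.
    """
    home_score = game["home_team_score"]
    visitor_score = game["visitor_team_score"]
    enriched = dict(game)  # shallow copy so the raw record is left untouched
    # A positive differential means the home team won.
    enriched["score_differential"] = home_score - visitor_score
    if home_score > visitor_score:
        enriched["winner"] = game["home_team"]
    elif visitor_score > home_score:
        enriched["winner"] = game["visitor_team"]
    else:
        enriched["winner"] = None  # equal scores, e.g. a game not yet played
    return enriched


sample_game = {"home_team": "Team A", "visitor_team": "Team B",
               "home_team_score": 100, "visitor_team_score": 95}
print(enrich_game(sample_game))
```

Copying the record keeps the raw game data unchanged, so the enriched and raw lists can be compared side by side.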

### Save game data to file

In [None]:
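# Save to JSON (the path is relative to the notebook's working directory)
os.makedirs("data/raw/games", exist_ok=True)  # make sure the output folder exists
game_collector.save_games_to_file(
    enriched_games,
    "data/raw/games/demo_games.json"
)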
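The goals above include loading data, but the notebook only shows saving. A minimal save/load round trip might look like the sketch below. It assumes the saved file is a JSON array of flat records; that format is an assumption about `save_games_to_file`, and the helper names `save_records` and `load_records` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

import pandas as pd


def save_records(records: List[Dict[str, Any]], path: Union[str, Path]) -> None:
    """Write records to `path` as a JSON array, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


def load_records(path: Union[str, Path]) -> pd.DataFrame:
    """Read a JSON array of records back into a DataFrame."""
    with Path(path).open(encoding="utf-8") as f:
        return pd.DataFrame(json.load(f))
```

If the demo file really is a JSON array, `load_records("data/raw/games/demo_games.json")` would rebuild a DataFrame like `games_df`. `pd.read_json` is an alternative for the same format.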

## 2. Fetch Player Data

In [None]:
# Initialize player data collector
player_collector = PlayerDataCollector()

# Search for players by name
lebron_matches = player_collector.search_players("LeBron")
print(f"Found {len(lebron_matches)} players matching 'LeBron'")
if lebron_matches:  # avoid an IndexError when nothing matches
    print(lebron_matches[0])
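`search_players` presumably queries the API. When a list of players has already been fetched, a local case-insensitive filter avoids another request. The `first_name`/`last_name` keys and the `search_players_local` name are assumptions for this sketch.

```python
from typing import Any, Dict, List


def search_players_local(players: List[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Return players whose full name contains `query`, ignoring case.

    Hypothetical helper; assumes each player dict has first_name/last_name keys.
    """
    needle = query.casefold()
    return [
        p for p in players
        if needle in f"{p.get('first_name', '')} {p.get('last_name', '')}".casefold()
    ]


roster = [{"first_name": "LeBron", "last_name": "James"},
          {"first_name": "Jane", "last_name": "Doe"}]
print(search_players_local(roster, "lebron"))
```

Matching against the joined full name also lets a query such as `"lebron james"` succeed.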

### Fetch player statistics for a date range

In [None]:
# Get player stats
stats = player_collector.fetch_player_stats_by_date_range(
    start_date="2023-10-01",
    end_date="2023-10-31"
)

print(f"Fetched {len(stats)} player stat records")

# Convert to DataFrame
stats_df = pd.DataFrame(stats)
stats_df.head()
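Each stat record appears to be one player's line for one game, so a typical next step is averaging per player. This sketch assumes flat columns named `player_id`, `pts`, and `reb`. If the API nests player info, `pd.json_normalize` can flatten the records first. The helper name is hypothetical.

```python
from typing import List

import pandas as pd


def per_player_averages(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """Average `columns` per player, add a games-played count, sort by the first column."""
    grouped = df.groupby("player_id")
    averages = grouped[columns].mean()
    averages["games_played"] = grouped.size()  # aligned on the player_id index
    return averages.sort_values(columns[0], ascending=False)


sample_stats = pd.DataFrame({"player_id": [1, 1, 2],
                             "pts": [10, 20, 30],
                             "reb": [5, 7, 2]})
print(per_player_averages(sample_stats, ["pts", "reb"]))
```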

## 3. Fetch Team Data

In [None]:
# Initialize team data collector
team_collector = TeamDataCollector()

# Fetch all teams
teams = team_collector.fetch_all_teams()
print(f"Fetched {len(teams)} NBA teams")

# Convert to DataFrame
teams_df = pd.DataFrame(teams)
teams_df
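Game records often reference teams by ID. A lookup from ID to display name makes the games table easier to read. The `id` and `full_name` keys are assumptions; both are parameters, so they can be switched to whatever the team records actually contain.

```python
from typing import Any, Dict, List


def build_team_lookup(team_list: List[Dict[str, Any]],
                      key: str = "id", label: str = "full_name") -> Dict[Any, str]:
    """Map each team's identifier (`key`) to a display name (`label`)."""
    return {team[key]: team[label] for team in team_list}


sample_teams = [{"id": 1, "full_name": "Team A"}, {"id": 2, "full_name": "Team B"}]
print(build_team_lookup(sample_teams))
```

With pandas, something like `games_df["home_team_id"].map(build_team_lookup(teams))` would add names, assuming a `home_team_id` column exists (hypothetical).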

### Calculate team season statistics

In [None]:
# Calculate team stats from games
team_stats = team_collector.calculate_all_team_season_stats(games, teams)

# Convert to DataFrame for viewing
team_stats_df = pd.DataFrame(team_stats).T
team_stats_df.head()
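The internals of `calculate_all_team_season_stats` are not shown. This sketch illustrates one plausible way to tally per-team records from game results. It uses the same hypothetical key names as the enrichment sketch and treats equal scores as unplayed games, because NBA games cannot end in a tie. It returns a dict keyed by team, so `pd.DataFrame(...).T` gives one row per team, as in the cell above.

```python
from collections import defaultdict
from typing import Any, Dict, List


def season_records(game_list: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Tally wins, losses, points for/against, and win percentage per team."""
    records: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"wins": 0, "losses": 0, "points_for": 0, "points_against": 0}
    )
    for g in game_list:
        home, visitor = g["home_team"], g["visitor_team"]
        home_pts, visitor_pts = g["home_team_score"], g["visitor_team_score"]
        if home_pts == visitor_pts:
            continue  # treat equal scores as an unplayed game
        records[home]["points_for"] += home_pts
        records[home]["points_against"] += visitor_pts
        records[visitor]["points_for"] += visitor_pts
        records[visitor]["points_against"] += home_pts
        winner, loser = (home, visitor) if home_pts > visitor_pts else (visitor, home)
        records[winner]["wins"] += 1
        records[loser]["losses"] += 1
    for rec in records.values():
        # Every team here played at least one counted game, so no division by zero.
        rec["win_pct"] = rec["wins"] / (rec["wins"] + rec["losses"])
    return dict(records)
```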

### Calculate standings

In [None]:
# Calculate overall standings
standings = team_collector.calculate_standings(team_stats)

standings_df = pd.DataFrame(standings)
standings_df
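Standings are essentially team records sorted by winning percentage. This sketch ranks teams that way and breaks ties by point differential. That tiebreak is a simplification for illustration, not the NBA's official tiebreak procedure. It assumes each team's stats include `win_pct`, `points_for`, and `points_against`.

```python
from typing import Any, Dict, List


def rank_standings(stats_by_team: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort teams by win percentage, breaking ties by point differential (simplified)."""
    rows = [
        {"team": team, **stats,
         "point_diff": stats["points_for"] - stats["points_against"]}
        for team, stats in stats_by_team.items()
    ]
    rows.sort(key=lambda r: (r["win_pct"], r["point_diff"]), reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```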

## 4. Data Collection Pipeline

Here's an example of collecting a complete dataset for a season:

In [None]:
# Note: This will make many API calls and may take several minutes
# Uncomment to run:

# with GameDataCollector() as gc:
#     # Collect games for the 2023 season
#     games = gc.fetch_games_by_season(2023)
#     enriched = gc.enrich_game_data(games)
#     gc.save_games_to_file(enriched, "data/raw/games/2023_regular_season.json")

# with PlayerDataCollector() as pc:
#     # Collect all players
#     players = pc.collect_all_player_data()
#     
#     # Collect stats for 2023
#     pc.collect_season_stats(2023)

# with TeamDataCollector() as tc:
#     # Collect all teams
#     teams = tc.collect_all_team_data()
#     
#     # Generate season report
#     tc.generate_season_report(2023, games)

## Summary

You've learned how to:
- ✅ Fetch game data from the NBA API
- ✅ Fetch player statistics
- ✅ Fetch team information
- ✅ Enrich and save data
- ✅ Calculate team statistics and standings

Next steps:
- Check out the data processing notebook to learn about cleaning and feature engineering
- Run the collection scripts to build your dataset
- Explore the collected data

In [None]:
# Don't forget to close collectors!
game_collector.close()
player_collector.close()
team_collector.close()
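Closing collectors in a final cell is skipped if an earlier cell raises. The collectors support `with`, as the pipeline section shows, so `contextlib.ExitStack` can close several of them even when an error occurs. The sketch uses a hypothetical stand-in class, `DummyCollector`, so it runs without network access.

```python
from contextlib import ExitStack


class DummyCollector:
    """Hypothetical stand-in for a collector that supports close() and `with`."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "DummyCollector":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()  # returning None lets any exception propagate


collectors = [DummyCollector("games"), DummyCollector("players"), DummyCollector("teams")]
try:
    with ExitStack() as stack:
        for c in collectors:
            stack.enter_context(c)
        raise RuntimeError("simulated API failure")
except RuntimeError:
    pass

print([c.closed for c in collectors])  # [True, True, True]
```

With the real classes, `stack.enter_context(GameDataCollector())` would register each collector the same way.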