# NCAA Soccer Player Analysis - Goals Per Game

This notebook explains how the `goals_per_game.py` script works. The script calculates each NCAA soccer player's average goals per game from data stored in a CSV file.

## 1. Understanding the CSV Data Structure

First, let's examine the CSV file structure that stores our player data:

In [ ]:
import pandas as pd

# Load the CSV file
player_data = pd.read_csv('player_goals.csv')

# Display the structure
print("CSV File Structure:")
print(player_data.head(10))

# Basic information about the data
print("\nData Information:")
print(f"Number of rows: {len(player_data)}")
print(f"Columns: {', '.join(player_data.columns)}")
print(f"Unique players: {player_data['player_name'].nunique()}")
print(f"Total games: {player_data['game_id'].nunique()}")
print(f"Total goals: {player_data['goals'].sum()}")
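
The cells in this notebook read three columns: `player_name`, `game_id`, and `goals`, with one row per player per game. As a minimal sketch of that layout, the snippet below parses a small in-memory CSV with the same columns. The player names and numbers are made up for illustration and are not taken from `player_goals.csv`.

```python
import io

import pandas as pd

# Hypothetical sample rows: one row per player per game.
# Names and values are illustrative only.
SAMPLE_CSV = """player_name,game_id,goals
Alex Example,1,1
Alex Example,2,0
Alex Example,3,2
Sam Sample,1,0
Sam Sample,2,1
"""

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(sample)
print(f"Rows: {len(sample)}")
print(f"Unique players: {sample['player_name'].nunique()}")
print(f"Unique game IDs: {sample['game_id'].nunique()}")
print(f"Total goals: {sample['goals'].sum()}")
```

Note that `nunique()` on `game_id` counts distinct games across the whole file, not games per player.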

## 2. Loading Data from CSV

The script uses a function to load data from the CSV file. Let's examine how this works:

In [ ]:
import os

def load_data_from_csv(csv_file):
    """
    Load player goals data from a CSV file.
    
    Args:
        csv_file (str): Path to the CSV file
        
    Returns:
        dict: Dictionary mapping player names to their goals data
    """
    if not os.path.exists(csv_file):
        print(f"Error: File {csv_file} not found")
        return {}
    
    player_data = {}
    
    df = pd.read_csv(csv_file)
    
    # groupby yields (player name, rows for that player) pairs
    for name, player in df.groupby('player_name'):
        goals_list = player['goals'].tolist()
        games_played = len(goals_list)
        
        player_data[name] = {
            'goals': goals_list,
            'games_played': games_played
        }
    
    return player_data

# Let's load the data and see what we get
csv_file = 'player_goals.csv'
player_data = load_data_from_csv(csv_file)

# Display the structured data
print("Processed Player Data:")
for player_name, data in player_data.items():
    print(f"\nPlayer: {player_name}")
    print(f"  Goals in each game: {data['goals']}")
    print(f"  Games played: {data['games_played']}")
    print(f"  Total goals: {sum(data['goals'])}")
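
To make the returned structure concrete, here is a minimal sketch that builds the same kind of dictionary from an in-memory DataFrame instead of a file. The helper name `build_player_dict` and the sample rows are hypothetical; the grouping logic mirrors `load_data_from_csv` above.

```python
import pandas as pd


def build_player_dict(df):
    """Group rows by player and collect per-game goals (hypothetical helper)."""
    result = {}
    # groupby yields (group key, sub-DataFrame) pairs; the key is the player name
    for name, rows in df.groupby('player_name'):
        goals_list = rows['goals'].tolist()
        result[name] = {
            'goals': goals_list,
            'games_played': len(goals_list),
        }
    return result


# Illustrative data only
sample_df = pd.DataFrame({
    'player_name': ['Alex Example', 'Alex Example', 'Sam Sample'],
    'game_id': [1, 2, 1],
    'goals': [1, 2, 0],
})

sample_players = build_player_dict(sample_df)
print(sample_players)
```

Each player maps to a small dictionary holding the per-game goal list and the number of rows found for that player, which is the shape the later cells rely on.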

## 3. Goals Per Game Calculation Function

Here's the core function that calculates the average goals per game:

In [ ]:
def calculate_goals_per_game(goals, games_played):
    """
    Calculate the average goals per game for a soccer player.
    
    Args:
        goals (list): A list of goals scored in each game
        games_played (int): Number of games played
        
    Returns:
        float: The average goals per game, or 0.0 if no games were played
        
    Example:
        >>> calculate_goals_per_game([1, 0, 2, 1], 4)
        1.0
    """
    if games_played == 0:
        return 0.0
    
    total_goals = sum(goals)
    return total_goals / games_played

# Calculate goals per game for each player
print("Goals Per Game Calculations:")
for player_name, data in player_data.items():
    goals = data['goals']
    games = data['games_played']
    avg = calculate_goals_per_game(goals, games)
    
    print(f"\nPlayer: {player_name}")
    print(f"  Goals: {goals}")
    print(f"  Games: {games}")
    print(f"  Average: {avg:.2f} goals per game")
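
A few quick checks help confirm how the function behaves at the edges. This sketch repeats the function so it runs on its own; the second helper, `goals_per_game_from_list`, is a hypothetical variant that derives the game count from the list length, which avoids a mismatch between the two arguments.

```python
def calculate_goals_per_game(goals, games_played):
    """Average goals per game; 0.0 when no games were played."""
    if games_played == 0:
        return 0.0
    return sum(goals) / games_played


def goals_per_game_from_list(goals):
    """Hypothetical variant: infer games played from the list length."""
    return calculate_goals_per_game(goals, len(goals))


print(calculate_goals_per_game([1, 0, 2, 1], 4))  # 4 goals over 4 games
print(calculate_goals_per_game([], 0))            # no games played
print(goals_per_game_from_list([3, 0]))           # 3 goals over 2 games
```

The zero-games guard matters because dividing by zero would otherwise raise `ZeroDivisionError` for a player with no recorded games.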

## 4. Visualizing Player Statistics

Let's visualize the data to better understand player performance:

In [ ]:
import matplotlib.pyplot as plt

# Calculate stats for each player
player_names = []
total_goals = []
avg_goals = []
games_played = []

for name, data in player_data.items():
    player_names.append(name)
    total_goals.append(sum(data['goals']))
    games_played.append(data['games_played'])
    avg_goals.append(calculate_goals_per_game(data['goals'], data['games_played']))

# Create a figure with multiple subplots
fig, axs = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Total goals per player
axs[0, 0].bar(player_names, total_goals, color='blue')
axs[0, 0].set_title('Total Goals by Player')
axs[0, 0].set_ylabel('Total Goals')
# Rotate the existing tick labels (set_xticklabels without fixed ticks triggers a warning)
plt.setp(axs[0, 0].get_xticklabels(), rotation=45, ha='right')
axs[0, 0].grid(True, alpha=0.3)

# Plot 2: Games played per player
axs[0, 1].bar(player_names, games_played, color='green')
axs[0, 1].set_title('Games Played by Player')
axs[0, 1].set_ylabel('Games Played')
# Rotate the existing tick labels (set_xticklabels without fixed ticks triggers a warning)
plt.setp(axs[0, 1].get_xticklabels(), rotation=45, ha='right')
axs[0, 1].grid(True, alpha=0.3)

# Plot 3: Average goals per game
axs[1, 0].bar(player_names, avg_goals, color='red')
axs[1, 0].set_title('Average Goals per Game')
axs[1, 0].set_ylabel('Goals per Game')
# Rotate the existing tick labels (set_xticklabels without fixed ticks triggers a warning)
plt.setp(axs[1, 0].get_xticklabels(), rotation=45, ha='right')
axs[1, 0].grid(True, alpha=0.3)

# Plot 4: Goals per game by player (as line charts)
axs[1, 1].set_title('Goals in Each Game by Player')
axs[1, 1].set_xlabel('Game Number')
axs[1, 1].set_ylabel('Goals Scored')
axs[1, 1].grid(True, alpha=0.3)

colors = ['blue', 'orange', 'green', 'red', 'purple', 'brown']
for i, (name, data) in enumerate(player_data.items()):
    color = colors[i % len(colors)]
    games = range(1, len(data['goals']) + 1)
    axs[1, 1].plot(games, data['goals'], marker='o', label=name, color=color)

axs[1, 1].legend()

plt.tight_layout()
plt.show()

## 5. Pandas Analysis for Advanced Insights

We can use pandas to perform more advanced analysis on our player data:

In [ ]:
# Load the data
df = pd.read_csv('player_goals.csv')

# Create a player summary
player_summary = df.groupby('player_name').agg(
    total_goals=('goals', 'sum'),
    games_played=('game_id', 'count'),
    avg_goals=('goals', 'mean')
).sort_values('avg_goals', ascending=False)

# Add consistency metrics (standard deviation of goals)
player_std = df.groupby('player_name')['goals'].std().fillna(0)
player_summary['consistency'] = player_std

# Display the summary
print("Player Performance Summary:")
print(player_summary)

# Identify top performers
print("\nTop Scorer (Total Goals):")
print(player_summary.sort_values('total_goals', ascending=False).head(1))

print("\nBest Average (Goals per Game):")
print(player_summary.sort_values('avg_goals', ascending=False).head(1))

print("\nMost Consistent Player (Lowest StdDev):")
print(player_summary[player_summary['games_played'] > 1].sort_values('consistency').head(1))
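
The `consistency` column is a sample standard deviation: pandas' `Series.std()` uses `ddof=1` by default, so a player with a single game gets `NaN`, which the cell above replaces with 0 via `fillna(0)`. As a small worked sketch with made-up numbers, the standard library's `statistics.stdev` computes the same sample statistic:

```python
import math
import statistics

# Illustrative per-game goals for one hypothetical player
goals = [1, 0, 2, 1]

mean = sum(goals) / len(goals)
# Sample variance divides by n - 1, matching pandas' default ddof=1
sample_var = sum((g - mean) ** 2 for g in goals) / (len(goals) - 1)
manual_std = math.sqrt(sample_var)

print(f"Manual sample std: {manual_std:.4f}")
print(f"statistics.stdev:  {statistics.stdev(goals):.4f}")

# With a single game the sample standard deviation is undefined
try:
    statistics.stdev([2])
except statistics.StatisticsError as exc:
    print(f"Single game: {exc}")
```

A lower value means the player's goal count varies less from game to game; it does not by itself mean the player scores more. This is also why the "most consistent" lookup filters to players with more than one game, since a single-game player would otherwise win with a filled-in 0.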

## 6. Full Python Script Breakdown

The complete `goals_per_game.py` script combines all of these elements:

1. **Imports**: Loads necessary libraries (pandas, rich)
2. **Data Loading**: Reads player data from a CSV file
3. **Data Processing**: Groups by player and calculates stats
4. **Calculation**: Computes goals per game average
5. **Visualization**: Displays results in a formatted table
6. **Fallback**: Includes example data if CSV isn't found

The script structure follows good practices:
- Modular functions with clear responsibilities
- Proper error handling for file operations
- Well-documented code with docstrings
- User-friendly output formatting
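
The script itself is not reproduced in this notebook, so the following is only a sketch of how the six steps might fit together. To stay dependency-free it uses the standard library's `csv` module and plain string formatting in place of pandas and rich, and the function names, fallback data, and default file name are assumptions, not the script's actual contents.

```python
import csv
import os
from collections import defaultdict

# Hypothetical fallback data used when the CSV is missing
EXAMPLE_DATA = {
    'Example Player A': [1, 0, 2],
    'Example Player B': [0, 1],
}


def load_goals(csv_file):
    """Return {player_name: [goals per game]} or the fallback data."""
    if not os.path.exists(csv_file):
        print(f"{csv_file} not found, using example data")
        return dict(EXAMPLE_DATA)
    goals_by_player = defaultdict(list)
    with open(csv_file, newline='') as handle:
        for row in csv.DictReader(handle):
            goals_by_player[row['player_name']].append(int(row['goals']))
    return dict(goals_by_player)


def summarize(goals_by_player):
    """Return (name, games, total, average) tuples, best average first."""
    rows = []
    for name, goals in goals_by_player.items():
        games = len(goals)
        average = sum(goals) / games if games else 0.0
        rows.append((name, games, sum(goals), average))
    return sorted(rows, key=lambda row: row[3], reverse=True)


def format_table(rows):
    """Render the summary as fixed-width text (stand-in for a rich table)."""
    lines = [f"{'Player':<20}{'Games':>6}{'Goals':>6}{'Avg':>7}"]
    for name, games, total, average in rows:
        lines.append(f"{name:<20}{games:>6}{total:>6}{average:>7.2f}")
    return "\n".join(lines)


def main(csv_file='player_goals.csv'):
    print(format_table(summarize(load_goals(csv_file))))


if __name__ == '__main__':
    main()
```

Keeping loading, summarizing, and formatting in separate functions is what makes each step easy to test on its own, for example by passing `EXAMPLE_DATA` straight to `summarize`.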

## 7. Summary and Next Steps

This notebook has explained how the `goals_per_game.py` script works with CSV data to calculate and display player statistics.

**Key concepts covered:**
- Reading data from CSV files with pandas
- Grouping and aggregating player statistics
- Calculating per-game averages
- Visualizing player performance
- Displaying formatted output with rich

**Potential next steps:**
- Add more advanced metrics (scoring rate, game impact)
- Implement filtering by game date or opponent
- Add team-level statistics
- Create visualizations for trends over time
- Add data validation for CSV input
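
As one way to approach the last item, here is a minimal validation sketch. The function name `validate_player_goals`, the set of required columns (taken from the columns this notebook reads), and the specific checks are assumptions about what "valid" should mean for this data.

```python
import pandas as pd

REQUIRED_COLUMNS = {'player_name', 'game_id', 'goals'}


def validate_player_goals(df):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []

    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        # Without the expected columns the remaining checks cannot run
        return [f"Missing columns: {', '.join(sorted(missing))}"]

    if df['player_name'].isna().any():
        problems.append("Some rows have no player name")

    goals = pd.to_numeric(df['goals'], errors='coerce')
    if goals.isna().any():
        problems.append("Some goal values are missing or not numeric")
    if (goals < 0).any():
        problems.append("Some goal values are negative")

    # The same player should not appear twice for the same game
    if df.duplicated(subset=['player_name', 'game_id']).any():
        problems.append("Duplicate player/game rows found")

    return problems


# Illustrative data with two deliberate problems
bad = pd.DataFrame({
    'player_name': ['Alex Example', 'Alex Example', 'Sam Sample'],
    'game_id': [1, 1, 1],
    'goals': [1, 2, -1],
})
print(validate_player_goals(bad))
```

Returning a list of messages rather than raising on the first problem lets the caller report everything wrong with a file in one pass.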