# Soccer Analytics: Data Structures

This notebook explores Python's core data structures (lists, dictionaries, and sets), along with `collections.Counter` from the standard library, through the lens of soccer analytics. Each section implements a small function for a common sports-analytics operation and demonstrates it on sample data.

## 1. Unique Goal Scorers

This function extracts a list of unique goal scorers from a match report while preserving the order in which they first scored.

In [None]:
def unique_goal_scorers(match_report):
    """
    Extract a list of unique goal scorers from a match report while preserving the order.
    
    Args:
        match_report (list): A list of goal scorer names, which may contain duplicates
                           for players who scored multiple goals
        
    Returns:
        list: A list of unique goal scorers in the order they first scored
    """
    seen = set()  # Set membership tests take O(1) time on average
    unique_scorers = []
    for scorer in match_report:
        if scorer not in seen:
            seen.add(scorer)
            unique_scorers.append(scorer)
    return unique_scorers

In [None]:
# Example usage
match_report = ['Messi', 'Ronaldo', 'Messi', 'Neymar', 'Ronaldo']
print(f"Match report: {match_report}")
print(f"Unique scorers: {unique_goal_scorers(match_report)}")
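
Dictionaries preserve insertion order (guaranteed since Python 3.7), so the same order-preserving deduplication can also be written in one line with `dict.fromkeys`. This is an alternative idiom, not a replacement for the function above. The name `unique_goal_scorers_compact` is hypothetical.

```python
def unique_goal_scorers_compact(match_report):
    """Order-preserving deduplication via dict keys (hypothetical helper name)."""
    # dict.fromkeys maps each scorer to None. Repeated keys are ignored,
    # and the keys keep the order in which each scorer first appeared.
    return list(dict.fromkeys(match_report))


match_report = ['Messi', 'Ronaldo', 'Messi', 'Neymar', 'Ronaldo']
print(unique_goal_scorers_compact(match_report))  # ['Messi', 'Ronaldo', 'Neymar']
```

The explicit `seen` set version is easier to extend, for example to skip own goals. The `dict.fromkeys` form is shorter when plain deduplication is all that is needed.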

## 2. Merge Player Statistics

This function merges player statistics from two different seasons, with the more recent season taking precedence.

In [None]:
def merge_player_stats(season1_stats, season2_stats):
    """
    Merge player statistics from two different seasons. 
    If a player appears in both seasons, use the most recent (season2) statistics.
    
    Args:
        season1_stats (dict): Player statistics from season 1
        season2_stats (dict): Player statistics from season 2
        
    Returns:
        dict: Merged player statistics with season2 taking precedence for players in both seasons
    """
    merged_stats = season1_stats.copy()  # Shallow copy: season1_stats itself is not modified
    merged_stats.update(season2_stats)   # season2 entries replace season1 entries for the same player
    return merged_stats

In [None]:
# Example usage
season1 = {'Messi': {'goals': 30, 'assists': 10}}
season2 = {'Messi': {'goals': 25, 'assists': 15}, 'Ronaldo': {'goals': 28, 'assists': 5}}

print(f"Season 1 stats: {season1}")
print(f"Season 2 stats: {season2}")
print(f"Merged stats: {merge_player_stats(season1, season2)}")
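
`dict.copy()` is shallow. The merged dictionary is a new object, but the nested per-player stat dictionaries are shared with the inputs. The sketch below redefines the sample seasons so it is self-contained, shows the effect, and avoids it with `copy.deepcopy`. It also uses the dict union operator `|`, which assumes Python 3.9 or later. The helper name `merge_player_stats_deep` is hypothetical.

```python
import copy

season1 = {'Messi': {'goals': 30, 'assists': 10}}
season2 = {'Messi': {'goals': 25, 'assists': 15}, 'Ronaldo': {'goals': 28, 'assists': 5}}

# Shallow merge: the new top-level dict still points at season2's inner dicts.
shallow = season1.copy()
shallow.update(season2)
shallow['Ronaldo']['goals'] += 1      # this also changes season2['Ronaldo']
print(season2['Ronaldo']['goals'])    # 29


def merge_player_stats_deep(season1_stats, season2_stats):
    """Hypothetical variant: merge, then deep-copy so no nested dict is shared."""
    # `|` (Python 3.9+) builds a new dict; keys from the right operand win.
    return copy.deepcopy(season1_stats | season2_stats)


merged = merge_player_stats_deep(season1, season2)
merged['Messi']['goals'] = 0
print(season2['Messi']['goals'])      # 25 (the input is unchanged)
```

Deep copying costs extra time and memory. It is only worth it when the merged stats will be edited independently of the original seasons.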

## 3. Find Top Scorer

This function identifies the player who scored the most goals in a series of matches.

In [None]:
def find_top_scorer(match_goals):
    """
    Find the player who scored the most goals in a series of matches.
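    If multiple players share the highest count, the player whose first goal appears
    earliest in match_goals is returned (Counter preserves first-insertion order, and
    max returns the first maximal element it encounters).
    
    Args:
        match_goals (list): A list of goal scorers across multiple matches
        
    Returns:
        str: The name of the top goal scorer
    
    Raises:
        ValueError: If match_goals is empty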
    """
    from collections import Counter
    goal_counts = Counter(match_goals)  # Count occurrences of each player
    return max(goal_counts, key=goal_counts.get)  # Get player with maximum count

In [None]:
# Example usage
match_goals = ['Messi', 'Ronaldo', 'Messi', 'Neymar', 'Ronaldo', 'Messi']

from collections import Counter
goal_counts = Counter(match_goals)

print(f"All goals: {match_goals}")
print(f"Goal counts: {dict(goal_counts)}")
print(f"Top scorer: {find_top_scorer(match_goals)}")
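
`find_top_scorer` returns a single name even when several players share the highest count, and it raises `ValueError` on an empty list. The sketch below returns every player tied for the lead, and an empty list when no goals were recorded. `find_top_scorers` is a hypothetical name.

```python
from collections import Counter


def find_top_scorers(match_goals):
    """Return all players tied for the most goals (hypothetical helper)."""
    goal_counts = Counter(match_goals)
    if not goal_counts:                # no goals recorded
        return []
    top = max(goal_counts.values())    # highest goal tally
    # Keep every player whose tally equals the maximum, in first-scored order.
    return [player for player, goals in goal_counts.items() if goals == top]


print(find_top_scorers(['Messi', 'Ronaldo', 'Ronaldo', 'Messi', 'Neymar']))  # ['Messi', 'Ronaldo']
print(find_top_scorers([]))  # []
```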

## 4. Group Players by Position

This function groups a list of player dictionaries by their positions.

In [None]:
def group_players_by_position(players, position_key):
    """
    Group a list of player dictionaries by their position.
    
    Args:
        players (list): A list of player dictionaries
        position_key (str): The key in each dictionary that specifies the player's position
        
    Returns:
        dict: A dictionary where keys are positions and values are lists of players with that position
    """
    grouped = {}
    for player in players:
        position = player[position_key]
        if position not in grouped:
            grouped[position] = []
        grouped[position].append(player)
    return grouped

In [None]:
# Example usage
players = [
    {'name': 'Alisson', 'position': 'Goalkeeper'}, 
    {'name': 'Van Dijk', 'position': 'Defender'},
    {'name': 'Salah', 'position': 'Forward'},
    {'name': 'Robertson', 'position': 'Defender'}
]

grouped_players = group_players_by_position(players, 'position')

# Print each position group
for position, players_list in grouped_players.items():
    print(f"\n{position}:")
    for player in players_list:
        print(f"  - {player['name']}")
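
`collections.defaultdict(list)` removes the need for the `if position not in grouped` check, because it creates an empty list the first time a missing key is accessed. The sketch below is an equivalent variant built on that idea. `group_players_by_position_dd` is a hypothetical name. The function converts the result back to a plain `dict`, so looking up an absent position later raises `KeyError` instead of silently inserting an empty list.

```python
from collections import defaultdict


def group_players_by_position_dd(players, position_key):
    """Group player dicts by position using defaultdict (hypothetical variant)."""
    grouped = defaultdict(list)        # missing keys start as []
    for player in players:
        grouped[player[position_key]].append(player)
    return dict(grouped)               # plain dict: missing positions raise KeyError


players = [
    {'name': 'Alisson', 'position': 'Goalkeeper'},
    {'name': 'Van Dijk', 'position': 'Defender'},
    {'name': 'Salah', 'position': 'Forward'},
    {'name': 'Robertson', 'position': 'Defender'},
]
grouped = group_players_by_position_dd(players, 'position')
print([p['name'] for p in grouped['Defender']])  # ['Van Dijk', 'Robertson']
```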

## 5. Total Tournament Goals

This function calculates the total number of goals across all divisions and matches in a tournament.

In [None]:
def total_tournament_goals(tournament_data):
    """
    Calculate the total number of goals across all divisions and matches in a tournament.
    The data is a flat list that alternates division names (str) with lists of
    [match_name, goals] pairs; only the list items contribute goals.
    
    Args:
        tournament_data (list): Tournament data in the form
                              [division, [[match, goals], ...], division, [[match, goals], ...], ...]
        
    Returns:
        int: The total number of goals in the tournament
    """
    total_goals = 0
    for item in tournament_data:
        if isinstance(item, list):  # Check if item is a list (containing matches)
            for match in item:
                total_goals += match[1]  # Add the goals (second element in match list)
    return total_goals

In [None]:
# Example usage
tournament_data = [
    'Division 1', 
    [
        ['Match 1', 3],
        ['Match 2', 2]
    ],
    'Division 2',
    [
        ['Match 1', 1],
        ['Match 2', 4],
        ['Match 3', 2]
    ]
]

# Print tournament structure and goals by division
division_goals = {}
current_division = None

for item in tournament_data:
    if isinstance(item, str):  # This is a division name
        current_division = item
        division_goals[current_division] = 0
    elif isinstance(item, list) and current_division:  # These are matches
        for match in item:
            division_goals[current_division] += match[1]

print("Tournament Goals by Division:")
for division, goals in division_goals.items():
    print(f"{division}: {goals} goals")

print(f"\nTotal Tournament Goals: {total_tournament_goals(tournament_data)}")
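
The flat list above relies on position to pair each division name with its matches. The sketch below stores the same sample data as a dictionary that maps each division to a list of `(match, goals)` tuples. It assumes division names are unique. With this layout, per-division and overall totals become plain `sum` expressions. The names `tournament` and `division_totals` are illustrative only.

```python
# Same sample data as above, keyed by division; each match is an immutable (name, goals) tuple.
tournament = {
    'Division 1': [('Match 1', 3), ('Match 2', 2)],
    'Division 2': [('Match 1', 1), ('Match 2', 4), ('Match 3', 2)],
}

# Per-division totals: unpack each tuple and sum the goals.
division_totals = {
    division: sum(goals for _match, goals in matches)
    for division, matches in tournament.items()
}

print(division_totals)                # {'Division 1': 5, 'Division 2': 7}
print(sum(division_totals.values()))  # 12
```

Keying by division removes the need to track `current_division` while iterating. Tuples signal that each match record is a fixed `(name, goals)` pair.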

## Summary

In this notebook, we've explored several Python data structures through the lens of soccer analytics:

1. **Sets and Lists**: Used to track unique goal scorers while preserving order
2. **Dictionaries**: Used to merge player statistics and group players by position
3. **Counter (from collections)**: Used to find the top scorer
4. **Nested Structures**: Used to process complex tournament data

These data structures are fundamental to sports analytics, allowing us to efficiently organize, process, and analyze player and match data.