Q3 - The GroupBy Gauntlet

Question: Welcome to the GroupBy Gauntlet! You are given a dataset of wacky wizard tournaments.
Each tournament has multiple rounds, and wizards earn points in each round.

Your task is to perform complex groupby operations to answer the following questions:

- Calculate the total points for each wizard across all tournaments.
- Identify the wizard with the highest average points per round.
- Determine the tournament with the highest total points.
- Find the wizard who won the most rounds (i.e., highest points in each round).
- Calculate the average points per round for each tournament.
- Determine the standard deviation of points for each wizard across all tournaments.
- Identify the top 3 wizards with the most consistent performance (lowest standard deviation in points).
- Calculate the cumulative points for each wizard across all tournaments over time.
- Find the round in each tournament with the highest average points scored.
- Determine the correlation, across wizards, between the number of rounds played and total points scored.

Datasets:

wizard_tournaments: Contains columns (tournament_id, round_id, wizard_name, points).

In [None]:
import pandas as pd
import numpy as np

# Seed for reproducibility
np.random.seed(42)

# Generate synthetic data
tournament_ids = np.arange(1, 6)
round_ids = np.arange(1, 11)
wizard_names = ['Merlin', 'Gandalf', 'Harry', 'Voldemort', 'Saruman', 'Dumbledore', 'Hermione', 'Ron']

data = []
for tournament in tournament_ids:
    for round_id in round_ids:
        for wizard in wizard_names:
            points = np.random.randint(0, 101)
            data.append([tournament, round_id, wizard, points])

# Create DataFrame
wizard_tournaments = pd.DataFrame(data, columns=['tournament_id', 'round_id', 'wizard_name', 'points'])

# Display the dataset
wizard_tournaments.head()

In [None]:
# Calculate the total points for each wizard across all tournaments.
total_points_wizard = wizard_tournaments.groupby('wizard_name')['points'].sum().reset_index()
total_points_wizard

In [None]:
# Identify the wizard with the highest average points per round
highest_avg_points = wizard_tournaments.groupby(['wizard_name'])['points'].mean().reset_index()
highest_avg_points.loc[highest_avg_points['points'].idxmax()]

In [None]:
# Determine the tournament with the highest total points
tournament_highest_points = wizard_tournaments.groupby(['tournament_id'])['points'].sum().reset_index()
tournament_highest_points.loc[tournament_highest_points['points'].idxmax()]

In [None]:
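# Find the wizard who won the most rounds (i.e., highest points in each round)
# idxmax returns the first row holding the maximum, so a tie within a round
# is credited to whichever tied wizard appears first in the data
round_winners = wizard_tournaments.loc[wizard_tournaments.groupby(['tournament_id', 'round_id'])['points'].idxmax()]
# Count wins per wizard and take the wizard with the most
winning_wizard = round_winners['wizard_name'].value_counts().idxmax()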
winning_wizard
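
Because `idxmax` keeps only the first row with the maximum, a round where two wizards share the top score credits just one of them. The sketch below counts wins with ties included. It assumes that every wizard who shares the top score counts as a winner, which the question does not specify. The `count_round_wins` helper and the small `demo` frame are hypothetical and exist only for illustration.

```python
import pandas as pd

def count_round_wins(df):
    """Count round wins per wizard, crediting every wizard tied for the top score."""
    # Broadcast each round's maximum score back onto the rows of that round.
    round_max = df.groupby(['tournament_id', 'round_id'])['points'].transform('max')
    # Keep every row that matches its round's maximum (ties included).
    winners = df[df['points'] == round_max]
    return winners['wizard_name'].value_counts()

# Hypothetical data: Merlin and Gandalf tie in round 1, Gandalf wins round 2.
demo = pd.DataFrame({
    'tournament_id': [1, 1, 1, 1, 1, 1],
    'round_id':      [1, 1, 1, 2, 2, 2],
    'wizard_name':   ['Merlin', 'Gandalf', 'Harry'] * 2,
    'points':        [90, 90, 10, 20, 80, 30],
})
wins = count_round_wins(demo)
print(wins)
```

Calling `count_round_wins(wizard_tournaments)` would apply the same logic to the full dataset.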

In [None]:
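# Calculate the average points per round for each tournament
# This is the mean of all individual scores in a tournament,
# i.e. the average points per wizard per round
avg_points_tournament = wizard_tournaments.groupby(['tournament_id'])['points'].mean().reset_index()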
avg_points_tournament
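
The cell above averages every individual score in a tournament. If "average points per round" is read instead as the average of each round's combined score, the calculation takes two steps: sum within each round, then average those sums per tournament. The sketch below shows that alternative reading. The `avg_round_total` helper and the `demo` frame are hypothetical names used only for illustration.

```python
import pandas as pd

def avg_round_total(df):
    """Average of the per-round total points, for each tournament."""
    # Step 1: total points scored by all wizards in each round.
    round_totals = df.groupby(['tournament_id', 'round_id'])['points'].sum()
    # Step 2: average those round totals within each tournament.
    return round_totals.groupby(level='tournament_id').mean()

# Hypothetical data: tournament 1 has two rounds, tournament 2 has one.
demo = pd.DataFrame({
    'tournament_id': [1, 1, 1, 1, 2, 2],
    'round_id':      [1, 1, 2, 2, 1, 1],
    'wizard_name':   ['Merlin', 'Gandalf'] * 3,
    'points':        [10, 20, 30, 40, 50, 70],
})
result = avg_round_total(demo)
print(result)
```

When every round has the same number of wizards, as in the synthetic dataset here, the two readings differ only by that constant factor.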

In [None]:
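# Determine the standard deviation of points for each wizard across all tournaments
# std() computes the sample standard deviation (ddof=1)
std_dev_wizard = wizard_tournaments.groupby('wizard_name')['points'].std().reset_index()
# std() is NaN for a wizard with a single score; treat that case as zero spread
std_dev_wizard['points'] = std_dev_wizard['points'].fillna(0)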
std_dev_wizard

In [None]:
# Identify the top 3 wizards with the most consistent performance (lowest standard deviation in points)
most_consistent_wizards = std_dev_wizard.nsmallest(3, 'points')
most_consistent_wizards

In [None]:
# Calculate the cumulative points for each wizard across all tournaments over time
wizard_tournaments_sorted = wizard_tournaments.sort_values(['wizard_name', 'tournament_id', 'round_id'])
wizard_tournaments_sorted['cumulative_points'] = wizard_tournaments_sorted.groupby('wizard_name')['points'].cumsum()
wizard_tournaments_sorted[['wizard_name', 'tournament_id', 'round_id', 'cumulative_points']].head()

In [None]:
# Find the round in each tournament with the highest average points scored
avg_points_round = wizard_tournaments.groupby(['tournament_id', 'round_id'])['points'].mean().reset_index()
rounds_highest_avg = avg_points_round.loc[avg_points_round.groupby('tournament_id')['points'].idxmax()]
rounds_highest_avg
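
The last task in the list, the correlation between rounds played and total points, has no cell above. A minimal sketch follows, assuming the intended measure is the Pearson correlation taken across wizards, with one (rounds, total) pair per wizard. The `rounds_points_correlation` helper and the `demo` frame are hypothetical.

```python
import pandas as pd

def rounds_points_correlation(df):
    """Pearson correlation, across wizards, between rounds played and total points."""
    per_wizard = df.groupby('wizard_name').agg(
        rounds_played=('points', 'count'),  # one row per round played
        total_points=('points', 'sum'),
    )
    return per_wizard['rounds_played'].corr(per_wizard['total_points'])

# Hypothetical data in which wizards play different numbers of rounds.
demo = pd.DataFrame({
    'tournament_id': [1, 1, 1, 1, 1, 1],
    'round_id':      [1, 2, 3, 1, 2, 1],
    'wizard_name':   ['Merlin', 'Merlin', 'Merlin', 'Gandalf', 'Gandalf', 'Harry'],
    'points':        [10, 10, 10, 10, 10, 10],
})
corr = rounds_points_correlation(demo)
print(corr)
```

In the synthetic `wizard_tournaments` data every wizard plays the same 50 rounds (5 tournaments of 10 rounds each), so the rounds column has zero variance and the Pearson correlation is undefined there. The result should come back as NaN rather than a number.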