# **Table of Contents**

##### **IMPORTANT: Input Cleaned CSV, Shot CSV, Point CSV, and Point Json into "Cleaned" folder for the match in Google Drive**

### **Shot CSV:**
1. [Load in Data](#Load-Data)
2. [Manually Input Meta Data](#manually-input-meta-data)
- **You must manually type in values**
3. [Shot CSV Error Checks](#shot-csv-Error-Checks)
4. [Add Shot CSV Columns](#Add-Shot-CSV-Columns)
5. [Output Shot CSV](#Output-ShotCSV)

### **Point CSV:**
1. [Create Point CSV](#Create-PointCSV)
2. [Add Point CSV Columns](#add-point-csv-columns)
3. [Point CSV Error Checks](#point-csv-error-checks)
4. [Output Point CSV](#output-pointcsv)
- **Point CSV is for visuals** 
- **Point Json is for Upload to Website**

### **EDA:**
1. [Summary Stats](#Summary-stats)
2. [Serve and Return Stats](#serve-and-return-stats)
3. [Breakpoint Stats](#breakpoint-stats)
4. [Serve Win Percentage](#serve-win-percentage)
5. [Error Stats](#error-stats)

# **To Do:**
#### **Error Checks to Fix**
- [Check rows with mismatched serve in/serve zone](#Check-rows-with-mismatched-serve-in/serve-zone)


#### **Columns**
- [Change isApproach to be aggregated from Coordinate Data](#isapproach-column)
        - see if the next consecutive shotInRally coordinates are further up into the court
- [Depths Count (Short, Deep) Columns](#depths-count-short-deep-columns)
- [Add Columns from Leo (isLet, serverLocation, returnerLocation?)](#reorder-dataframe-for-output)

# **Load Data**

In [276]:
import pandas as pd
import numpy as np
import os 
import re

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Put your Uncleaned .csv file name here
your_file_name = 'https://docs.google.com/spreadsheets/d/1Te3dWUI4mpi52MPdcD4ti9zDKzll8wM3DalE2Pb1we0/export?format=csv'
# your_file_name = 'swingvision_SpencerJohnson_ArdaAzkara.csv' # if working from a local file
shot_data = pd.read_csv(your_file_name)
shot_data.shape

shot_data.columns

Index(['pointScore', 'gameScore', 'setScore', 'isPointStart', 'pointStartTime',
       'isPointEnd', 'pointEndTime', 'pointNumber', 'isBreakPoint',
       'shotInRally', 'side', 'serverName', 'serverFarNear', 'firstServeIn',
       'firstServeZone', 'firstServeXCoord', 'firstServeYCoord',
       'secondServeIn', 'secondServeZone', 'secondServeXCoord',
       'secondServeYCoord', 'isAce', 'shotContactX', 'shotContactY',
       'shotDirection', 'shotFhBh', 'isSlice', 'isVolley', 'isOverhead',
       'isApproach', 'isDropshot', 'isExcitingPoint', 'atNetPlayer1',
       'atNetPlayer2', 'isLob', 'shotLocationX', 'shotLocationY', 'isWinner',
       'isErrorWideR', 'isErrorWideL', 'isErrorNet', 'isErrorLong',
       'clientTeam', 'Date', 'Division', 'Event', 'lineupPosition',
       'matchDetails', 'matchVenue', 'opponentTeam', 'player1Name',
       'player2Name', 'player1Hand', 'player2Hand', 'Round', 'Surface',
       'Notes', 'isTopspin', 'isFlat', 'isKick'],
      dtype='object')

In [277]:
# your_file_name = '/Users/cjgimena/Desktop/Github/swingvision/points-2.csv'
# data = pd.read_csv(your_file_name)

# # Assuming df is your DataFrame
# desired_order = ['pointScore', 'gameScore', 'setScore', 'isPointStart', 'pointStartTime',
#                  'isPointEnd', 'pointEndTime', 'pointNumber', 'isBreakPoint',
#                  'shotInRally', 'side', 'serverName', 'serverFarNear', 'firstServeIn',
#                  'firstServeZone', 'firstServeXCoord', 'firstServeYCoord',
#                  'secondServeIn', 'secondServeZone', 'secondServeXCoord',
#                  'secondServeYCoord', 'isAce', 'shotContactX', 'shotContactY',
#                  'shotDirection', 'shotFhBh', 'isSlice', 'isVolley', 'isOverhead',
#                  'isApproach', 'isDropshot', 'isExcitingPoint', 'atNetPlayer1',
#                  'atNetPlayer2', 'isLob', 'shotLocationX', 'shotLocationY', 'isWinner',
#                  'isErrorWideR', 'isErrorWideL', 'isErrorNet', 'isErrorLong',
#                  'clientTeam', 'Date', 'Division', 'Event', 'lineupPosition',
#                  'matchDetails', 'matchVenue', 'opponentTeam', 'player1Name',
#                  'player2Name', 'player1Hand', 'player2Hand', 'Round', 'Surface',
#                  'Notes', 'isTopspin', 'isFlat', 'isKick']

# # Reorder the DataFrame columns
# data = data.reindex(columns=desired_order)
# data.head()

# # Output the DataFrame to a CSV file
# data.to_csv('points_kaylan.csv', index=False)


In [278]:
shot_data.head()

Unnamed: 0,pointScore,gameScore,setScore,isPointStart,pointStartTime,isPointEnd,pointEndTime,pointNumber,isBreakPoint,shotInRally,side,serverName,serverFarNear,firstServeIn,firstServeZone,firstServeXCoord,firstServeYCoord,secondServeIn,secondServeZone,secondServeXCoord,secondServeYCoord,isAce,shotContactX,shotContactY,shotDirection,shotFhBh,isSlice,isVolley,isOverhead,isApproach,isDropshot,isExcitingPoint,atNetPlayer1,atNetPlayer2,isLob,shotLocationX,shotLocationY,isWinner,isErrorWideR,isErrorWideL,isErrorNet,isErrorLong,clientTeam,Date,Division,Event,lineupPosition,matchDetails,matchVenue,opponentTeam,player1Name,player2Name,player1Hand,player2Hand,Round,Surface,Notes,isTopspin,isFlat,isKick
0,0-0,0-0,0-0,1.0,970,,,1,,1,Deuce,Player1,Near,1.0,T,-5.368734,241.474584,,,,,,17.144144,-441.747539,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,Spencer Johnson,Arda Azkara,,,,,,,,
1,0-0,0-0,0-0,,1730,1.0,1730.0,1,,2,Deuce,Player1,Near,,,,,,,,,,-29.728527,522.327575,Crosscourt,Forehand,,,,,,,,,,47.52693,329.740841,,,,1.0,,,,,,,,,,,,,,,,,,1.0,
2,15-0,0-0,0-0,1.0,18469,,,2,,1,Ad,Player1,Near,1.0,Wide,123.82161,190.280424,,,,,,-63.999896,-476.441902,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,15-0,0-0,0-0,,19370,1.0,19370.0,2,,2,Ad,Player1,Near,,,,,,,,,,196.340622,480.667968,Down the Line,Forehand,1.0,,,,,,,,,110.142558,262.003746,,,,1.0,,,,,,,,,,,,,,,,,,,
4,30-0,0-0,0-0,1.0,34330,,,3,,1,Deuce,Player1,Near,1.0,Wide,-106.626635,203.142771,,,,,,17.227204,-452.204134,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


# **Manually Input Meta Data**

In [279]:
# Fill in meta data
shot_data['clientTeam'] = 'UCLA'
shot_data['Date'] = '10/6/25'
shot_data['Division'] = ''
shot_data['Event'] = 'LWH Solano Challenger'
shot_data.loc[0, 'lineupPosition'] = ''                 # use.loc[0, .] to only assign to first row
shot_data.loc[0, 'matchDetails'] = ''
shot_data['matchVenue'] = ''
shot_data['opponentTeam'] = 'TCU'
shot_data['player1Name'] = 'Emon van Loben Sels'
shot_data['player2Name'] = 'Roger Pascual Ferra'
shot_data['player1Hand'] = 'Right'
shot_data['player2Hand'] = 'Left'
shot_data['Round'] = ''
shot_data['Surface'] = 'Hard'
shot_data.loc[0, 'Notes'] = ''

  shot_data.loc[0, 'lineupPosition'] = ''                 # use.loc[0, .] to only assign to first row
  shot_data.loc[0, 'matchDetails'] = ''
  shot_data.loc[0, 'Notes'] = ''


##### Set 'is_ad_match' to True if match has Ad scoring (non regular season match)

In [280]:
is_ad_match = True

## Assign Player Names

In [281]:
player1_name = shot_data['player1Name'].iloc[0]
player2_name = shot_data['player2Name'].iloc[0]
p = shot_data['serverName'].unique()

if (player1_name == None) or (player1_name == ''):
    raise ValueError('player1Name is blank')

if (player2_name == None) or (player2_name == ''):
    raise ValueError('player1Name is blank')

# Assign values in the serverName column
shot_data['serverName'] = shot_data['serverName'].replace({'Player1': player1_name, 'Player2': player2_name})

if not len(shot_data['serverName'].unique()) == 2:
    print(shot_data['serverName'].unique())
    raise ValueError('Unkown Name!')

print('Check Passed ✓')
shot_data['serverName'].unique()

Check Passed ✓


array(['Emon van Loben Sels', 'Roger Pascual Ferra'], dtype=object)

# **Shot CSV Error Checks**

#### Check for missing player1Hand and player2Hand

In [282]:
player1_hand =  shot_data.at[0, 'player1Hand']
player2_hand =  shot_data.at[0, 'player2Hand']

if (player1_hand == None) or (player1_hand == ''):
    raise ValueError('player1Hand has no value!')

if (player2_hand == None) or (player2_hand == '') == None:
    raise ValueError('player2Hand has no value!')

print('Check Passed ✓')

Check Passed ✓


#### Check for missing 'Deuce' or 'Ad' sides in side column

In [283]:
# Dataframe of all missing sides
missing_sides = shot_data[~shot_data['side'].isin(['Deuce', 'Ad'])][['pointScore', 'gameScore', 'setScore','side', 'shotInRally']]

if len(missing_sides) > 0:
    display(missing_sides)
    raise ValueError('Missing Deuce or Ad value in sides Column')
else: print('Check Passed ✓')

Check Passed ✓


####  Check pointNumber increases consecutively

If this check fails:
- swingvision data: check score columns (most likely incorrect) and check how the point ends columns 
(isErrorLong, isErrorWideR, isErrorWideL, isErrorNet likely marked wrong)
- add a helper column in the google sheets and use a formula to correct column (e.g. '=I51+1' and drag down)

#### If there are too many incorrect pointNumbers that would be too manually tedious to fix:

<span style="color: red;">Fix all points FIRST (IMPORTANT!!!) aka run through ALL error checks below</span>
- Most important error checks involve 'isPointStart' column, as that's how we'll fill pointNumber column
- Then delete all values from pointNumber column to let pointNumber fill (ONLY AFTER RUNNING THROUGH ALL ERROR CHECKS BELOW!!!)
 

In [None]:
# Fill in pointNumber if blank -- AFTER FIXING 'isPointStart' error checks
if 'pointNumber' not in shot_data.columns or shot_data['pointNumber'].isnull().any():
    point_starts = (shot_data['isPointStart'] == 1)
    shot_data['pointNumber'] = point_starts.cumsum()

# Check if pointNumber incresase consecutively
point_numbers = shot_data['pointNumber'].unique()
non_consecutive = [point_numbers[i] for i in range(1, len(point_numbers)) if point_numbers[i] != point_numbers[i-1] + 1]

if non_consecutive:
    non_consecutive_rows = shot_data[shot_data['pointNumber'].isin(non_consecutive)]
    display(non_consecutive_rows[['pointScore', 'gameScore', 'setScore', 'pointNumber', 'pointStartTime']])
    print(non_consecutive_rows['pointNumber'].unique())
    raise ValueError(f"Non-consecutive point numbers found: {list(non_consecutive_rows['pointNumber'].unique())}")
else: print('Check Passed ✓')

Check Passed ✓


### Check for ServerName not alternating
- WARNING: Needs to accomodate for 6-6 (tiebreakers)
- Ignore 6-6 games for now

In [285]:
shot_data['gameNumber'] = (shot_data['gameScore'] != shot_data['gameScore'].shift()).cumsum()



def find_bad_games(shot_data):
    bad_games = []  # Initialize an empty list
    games = shot_data['gameNumber'].unique()

    for i in games:
        df = shot_data[shot_data['gameNumber'] == i]

        # Check if there are multiple unique serverNames within the game
        if df['serverName'].nunique() > 1:
            bad_games.append(i)

    return bad_games  # Return bad games at the end

# Call the function
bad_games = find_bad_games(shot_data)
print(bad_games)  # Output the bad games list


display(shot_data[shot_data['gameNumber'].isin(bad_games)][['pointScore', 'gameScore', 'setScore', 'pointNumber', 'pointStartTime']])

[]


Unnamed: 0,pointScore,gameScore,setScore,pointNumber,pointStartTime


#### Check for NA values
- all should be 0, except if there's a tiebreak then missing pointScore should match amount of tiebreak shots

In [286]:
# Count empty strings in each column
empty_string_counts = ((shot_data.isna()).sum())
non_zero_counts = empty_string_counts[(empty_string_counts == shot_data.shape[0])]

na_counts = shot_data[[ 'pointScore', 'shotInRally', 'gameScore', 'setScore', 'side', 'serverName']].isna().sum()
has_na = shot_data[['pointScore', 'shotInRally', 'gameScore', 'setScore', 'side', 'serverName']].isna().any().any()

if has_na:
    display(na_counts)
    raise ValueError('Empty of cells in these columns')
else: 
    print('Check Passed ✓')
    display(non_zero_counts)

Check Passed ✓


isApproach         392
isDropshot         392
isExcitingPoint    392
atNetPlayer1       392
atNetPlayer2       392
isLob              392
dtype: int64

#### Check that score doesn't have incorrect date format
- accounts for all variations of dates eg. (0-00-0000, 0/0/0000)

In [287]:
# Make Jan-00 back into 1-0 for Game/Set Score
# Make Scores Strings not Date Time
columns_to_convert = ['gameScore', 'setScore']
shot_data[columns_to_convert] = shot_data[columns_to_convert].astype(object)

# Define a mapping for month abbreviations
month_mapping = {'Jan': '1', 'Feb': '2', 'Mar': '3', 'Apr': '4', 'May': '5', 'Jun': '6',
                 'Jul': '7', 'Aug': '8', 'Sep': '9', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

# Function to convert string like 'Jan-00' to '1-0'
def convert_score_string(score_str):
    # Check if the string has a month abbreviation and a year ending with '00'
    if re.match(r'^\d{1,2}-[A-Za-z]{3}$', score_str):
        # Extract year and month abbreviation
        year, month = score_str.split('-')

        # Remove leading zeros from the year
        year = str(int(year))

        # Replace month abbreviation with corresponding number
        month_number = month_mapping.get(month, month)

        # Concatenate the parts to form the transformed string
        transformed_str = f'{year}-{month_number}'
        return transformed_str

    # Check if the string has a month abbreviation and a year with leading '0's
    elif re.match(r'^[A-Za-z]{3}-\d{1,2}$', score_str):
        # Extract month abbreviation and year
        month, year = score_str.split('-')

        # Replace month abbreviation with corresponding number
        month_number = month_mapping.get(month, month)

        # Remove leading zeros from the year
        year = str(int(year))

        # Concatenate the parts to form the transformed string
        transformed_str = f'{month_number}-{year}'
        return transformed_str

    # Check if the string has a date in the format 'month/day/year'
    elif re.match(r'^\d{1,2}/\d{1,2}/\d{4}$', score_str):
        # Extract month, day, and year
        month, day, year = score_str.split('/')

        # Remove leading zeros from month and day
        month = str(int(month))
        day = str(int(day))

        # Concatenate the parts to form the transformed string
        transformed_str = f'{month}-{day}'
        return transformed_str

    # Check if the string has a date in the format 'month-day-year'
    elif re.match(r'^\d{1,2}-\d{1,2}-\d{4}$', score_str):
        # Extract month, day, and year
        month, day, year = score_str.split('-')

        # Remove leading zeros from month and day
        month = str(int(month))
        day = str(int(day))

        # Concatenate the parts to form the transformed string
        transformed_str = f'{month}-{day}'
        return transformed_str

    return score_str

# Apply the conversion function to the relevant columns in shot_data
shot_data['gameScore'] = shot_data['gameScore'].apply(convert_score_string)
shot_data['setScore'] = shot_data['setScore'].apply(convert_score_string)

####  Check for missing shotInRally rows

In [288]:
# All rows of missing shotInRally
empty_shot_rows = shot_data[shot_data['shotInRally'].isnull()]

if not empty_shot_rows.empty:
    display(empty_shot_rows[['pointScore', 'gameScore', 'setScore', 'pointStartTime', 'shotInRally']])
else:
    print('Check Passed ✓')


Check Passed ✓


#### Check missing isPointStart and isPointEnd

In [None]:
missing_point_start = list()
missing_point_end = list()
point_number = list()

for i in point_numbers:
    current_df = shot_data[shot_data['pointNumber'] == i].reset_index(drop=True)
    if not current_df.loc[0,'isPointStart'] == 1:
        # Find the shot_data index of the missing isPointStart row
        index_start = shot_data.index[(shot_data['pointNumber'] == i) & (shot_data['shotInRally'] == 1)][0]
        missing_point_start.append(index_start)
        point_number.append(i)

    if not current_df.loc[len(current_df) - 1,'isPointEnd'] == 1:
        # Find last shotInRally of current_df
        last_rally = current_df['shotInRally'].unique()[-1]
        index_end = shot_data.index[(shot_data['pointNumber'] == i) & (shot_data['shotInRally'] == last_rally)][0]
        missing_point_end.append(index_end)
        point_number.append(i)

if (len(missing_point_start) > 0) or (len(missing_point_end) > 0):
    print('Number of rows with isPointStart = 1:', len(missing_point_start))
    print('Number of rows with isPointEnd = 1:', len(missing_point_end), '\n')
    print('Missing isPointStart rows:')
    display(shot_data.loc[missing_point_start])
    print('Missing isPointEnd rows:')
    display(shot_data.loc[missing_point_end])
    raise ValueError(f"Manually check points: {sorted(set(point_number))}")

print('Check Passed ✓')

Check Passed ✓


#### Check all rows where isPointStart does not start at beginning of rally

In [290]:
filtered_rows = shot_data[(shot_data['isPointStart'] == 1) & (shot_data['shotInRally'] != 1)]

if filtered_rows.empty:
    print('Check Passed ✓')
else:
    display(filtered_rows)
    raise ValueError('Rows where isPointStart = 1 and shotInRally != 1')

Check Passed ✓


#### Find the rows where isPointEnd = 1 and shotInRally = 1 but is neither an ace or a double fault

In [291]:
filtered_rows = shot_data[
    (shot_data['isPointEnd'] == 1) & 
    (shot_data['shotInRally'] == 1) & 
    (shot_data['firstServeIn'] != 0) &
    (shot_data['secondServeIn'] != 0) &
    (shot_data['isAce'] != 1)
]

if filtered_rows.empty:
    print('Check Passed ✓')
else:
    display(filtered_rows)
    raise ValueError('Rows where isPointEnd = 1, shotInRally = 1, firstServeIn and secondServeIn are both not 0, and isAce != 1')

Check Passed ✓



#### Check rows where there are duplicate isPointStart = 1 and isPointEnd points
- Ignores 40-40 and Ad Scoring if is_ad_match = True

In [292]:
if is_ad_match:
    point_filter = ~shot_data['pointScore'].astype(str).str.contains('Ad|40-40', case=False, na=False)
else:
    point_filter = True  

# Output rows where isPointStart is 1 and pointScore, gameScore, and setScore have the same value
filtered_rows = shot_data[(shot_data['isPointStart'] == 1) & point_filter]
output_rows = filtered_rows[filtered_rows.duplicated(subset=['pointScore', 'gameScore', 'setScore'], keep=False)]

if not output_rows.empty:
    display(output_rows)
    raise ValueError("Rows where isPointStart is 1 and pointScore, gameScore, and setScore have the same value (conditional Ad/40-40 filter applied):")

# Output rows where isPointEnd is 1 and pointScore, gameScore, and setScore have the same value
filtered_rows = shot_data[(shot_data['isPointEnd'] == 1) & point_filter]
output_rows = filtered_rows[filtered_rows.duplicated(subset=['pointScore', 'gameScore', 'setScore'], keep=False)]

if not output_rows.empty:
    display(output_rows[['pointScore', 'gameScore', 'setScore', 'isPointStart']])
    raise ValueError("Rows where isPointEnd is 1 and pointScore, gameScore, and setScore have the same value (conditional Ad/40-40 filter applied):")

print('Check Passed ✓')


Check Passed ✓


#### Check for same amount of isPointStart and isPointEnd

In [293]:
# Count of isPointStart and isPointEnd
num_point_start = shot_data['isPointStart'].sum()
num_point_end = shot_data['isPointEnd'].sum()

if num_point_start != num_point_end:
    print("Number of rows with isPointStart = 1:", num_point_start)
    print("Number of rows with isPointEnd = 1:", num_point_end)
    raise ValueError("Error: count of isPointStart = 1 and isPointEnd = 1 are not the same.")
else: print('Check Passed ✓')

Check Passed ✓


#### Check for incorrect game and set scores

In [294]:
# Assuming shot_data is your DataFrame
set_scores = shot_data['setScore'].unique()
game_scores = shot_data['gameScore'].unique()

print("Unique Set Scores:\n", set_scores)
print("Unique Game Scores:\n", game_scores)

Unique Set Scores:
 ['0-0' '1-0']
Unique Game Scores:
 ['0-0' '1-0' '1-1' '1-2' '2-2' '3-2' '4-2' '5-2' '5-3' '0-1' '2-1' '3-1'
 '4-1' '5-1']


#### Check if shotInRally is duplicated or consectively increasing

In [295]:
if is_ad_match:
    point_filter = ~shot_data['pointScore'].astype(str).str.contains('Ad|40-40', case=False, na=False)
else:
    point_filter = True 
filtered_data = shot_data[point_filter]

# Check if shotInRally is not duplicated
output_rows = filtered_data[filtered_data.duplicated(
    subset=['pointScore', 'gameScore', 'setScore', 'shotInRally'], keep=False
)]['pointNumber'].unique().tolist()

rows = filtered_data[filtered_data.duplicated(
    subset=['pointScore', 'gameScore', 'setScore', 'shotInRally'], keep=False
)]

if len(output_rows) > 0:
    display(rows)
    raise ValueError(f'Duplicated shotInRally rows! \nCheck pointNumber(s): {output_rows}')

# Check if shotInRally is consecutively increasing
shotInRally_error = list()

# Only consider pointNumbers in filtered data
point_numbers = filtered_data['pointNumber'].unique().tolist()

for i in point_numbers:
    current_df = filtered_data[filtered_data['pointNumber'] == i]
    # Check if shotInRally in current pointNumber is strictly increasing
    if not (current_df['shotInRally'].diff().dropna() > 0).all():
        shotInRally_error.append(i)

if len(shotInRally_error) > 0:
    raise ValueError(f'shotInRally not consecutively increasing! \nCheck pointNumber(s): {shotInRally_error}')

print('Check Passed ✓')


Check Passed ✓


#### Check all the rows where isPointEnd != 1 and there is  isWinner, isErrorWideL, isErrorWideR, isErrorNet, isErrorLong

In [296]:
point_error = shot_data[(shot_data['isPointEnd'] != 1) & 
                        (shot_data['isPointStart'] != 1) & # Added isPointStart: swingvision marks isErrorNet = 1 for serves
                        ((shot_data['isWinner'] == 1) | 
                         (shot_data['isErrorNet'] == 1) | 
                         (shot_data['isErrorLong'] == 1) |
                         (shot_data['isErrorWideL'] == 1) |
                         (shot_data['isErrorWideR'] == 1))]

point_error_numbers = point_error['pointNumber'].to_list()

if len(point_error) > 0:
    display(point_error)
    raise ValueError('Manually check points', point_error_numbers)

print('Check Passed ✓')


Check Passed ✓


#### Check all the rows where there is isPointEnd == 1 but there is no isWinner, isErrorWideL, isErrorWideR, isErrorNet, isErrorLong
- Cj recommendation: have this error check automatically fill in how the point ends based on coordinate data

In [297]:
point_error = shot_data[(shot_data['isPointEnd'] == 1) &
                          (shot_data['isWinner'] != 1) &
                          (shot_data['isErrorWideL'] != 1) &
                          (shot_data['isErrorWideR'] != 1) &
                          (shot_data['isErrorNet'] != 1) & 
                          (shot_data['isErrorLong'] != 1) &
                          (shot_data['firstServeIn'] != 0) & 
                          (shot_data['secondServeIn'] != 0)]

point_error_numbers = point_error['pointNumber'].to_list()

if point_error.empty:
    print('Check Passed ✓')
else:
    display(point_error)
    raise ValueError('Manually check points', point_error_numbers)

Check Passed ✓


#### Check rows with mismatched serve in/serve zone
- **NEED TO DO: FIX, logic is wrong with "x+2"**

In [298]:
# Find indices of all rows with firstServeIn data
first_serve_row = shot_data.index[shot_data['firstServeIn'].notnull()]
# Find indices of all rows with firstServeZone data
first_serve_placement_row = shot_data.index[shot_data['firstServeZone'].notnull()]
# Check which indices are not found in one column but are found in the other
mismatched_first_serve_rows = list(set(first_serve_row).difference(first_serve_placement_row))
# Add 2 to the indices to match numbering of Google Sheets
mismatched_first_serve_rows = [x+2 for x in mismatched_first_serve_rows]
if not mismatched_first_serve_rows:
    print("Check passed for first serves.")
else:
    print("Rows where first serve in and first serve zone are not found together: " + str(mismatched_first_serve_rows))

second_serve_row = shot_data.index[shot_data['secondServeIn'].notnull()]
second_serve_placement_row = shot_data.index[shot_data['secondServeZone'].notnull()]
mismatched_second_serve_rows = list(set(second_serve_row).difference(first_serve_placement_row))
mismatched_second_serve_rows = [x+2 for x in mismatched_second_serve_rows]
if not mismatched_second_serve_rows:
    print("Check passed for second serves.")
else:
    print("Rows where second serve in and second serve zone are not found together: " + str(mismatched_second_serve_rows))

Check passed for first serves.
Rows where second serve in and second serve zone are not found together: [354, 107, 44, 245]


#### Check all points where double fault occurs (firstServeIn == 0 & secondServeIn == 0) but len(shotInRally) > 1

In [299]:
double_fault_error = shot_data[
    (shot_data['firstServeIn'] == 0) &
    (shot_data['secondServeIn'] == 0) &
    (shot_data['shotInRally'] > 1)
]

double_fault_error_numbers = double_fault_error['pointNumber'].to_list()

if len(double_fault_error) > 0:
    display(double_fault_error)
    raise ValueError('Manually check points', double_fault_error_numbers)

print('Check Passed ✓')


Check Passed ✓


#### Check all the points where everytime the server changes, the first pointScore should be "0-0". If not output error
- **NEED TO DO: Check incorrect scoring**
- **swing_vision Govind Nanda vs Cooper Williams (Harvard) row 380**

In [300]:
shot_data['server_changed'] = shot_data['serverName'] != shot_data['serverName'].shift(1)

incorrect_scoring = shot_data[
    (shot_data['server_changed']) & 
    ((shot_data['pointScore'] != "0-0") |
    (shot_data['shotInRally'] != 1)) &
    (shot_data.index != 0)  # skip first point overall
]

if (len(incorrect_scoring) > 0):
    display(incorrect_scoring)
    raise ValueError('Manually check points', incorrect_scoring['pointNumber'].to_list())


print("Check Passed ✓")

Check Passed ✓


##### Check all cases where server should've changed at new game (changed gameScore) but did not

- These may indicate errors in gameScores. Please rewatch match video to confirm and make changes as necessary.

In [301]:
# Check where a new game starts but the server doesn't change
wrong_server_cases = shot_data[
    (shot_data['gameScore'] != shot_data['gameScore'].shift(1)) &
    (shot_data['serverName'] == shot_data['serverName'].shift(1)) &
    (shot_data.index != 0)  # ignore very first point
]

# Raise an error if any found
if not wrong_server_cases.empty:
    display(wrong_server_cases)
    raise ValueError(
        f"Incorrect server assignment at game start: {wrong_server_cases['pointNumber'].tolist()}"
    )
else:
    print("Check Passed ✓")


Check Passed ✓


# **Add Shot CSV Columns**

#### tiebreakScore Column

In [302]:
def reverse_point_score(score):
    if '-' in score:
        parts = score.split('-')
        return '-'.join(parts[::-1])
    return score

if 'tiebreakScore' not in shot_data.columns or shot_data['tiebreakScore'].isnull().any():
    shot_data.loc[shot_data['gameScore'] == '6-6', 'tiebreakScore'] = shot_data['pointScore']
    # Apply reverse_point_score function where serverName is 'Player2' and tiebreakScore is not NaN
    shot_data.loc[
    (shot_data['serverName'] == player2_name) & (shot_data['tiebreakScore'].notna()), 
    'tiebreakScore'] = shot_data.loc[
    (shot_data['serverName'] == player2_name) & (shot_data['tiebreakScore'].notna()), 
    'tiebreakScore'
    ].apply(reverse_point_score)

# Set the pointScore to NaN where tiebreakScore is not NaN
shot_data.loc[pd.notna(shot_data['tiebreakScore']), 'pointScore'] = np.nan


#### returnerName Column

In [303]:
def get_returner_name(server_name):
    return player2_name if server_name == player1_name else player1_name

shot_data['returnerName'] = shot_data['serverName'].apply(get_returner_name)
print(f"Player 1 = {player1_name}, Player 2 = {player2_name}")

Player 1 = Emon van Loben Sels, Player 2 = Roger Pascual Ferra


#### shotHitBy Column

In [304]:
shot_data['shotHitBy'] = shot_data.apply(lambda row: row['serverName'] if row['shotInRally'] % 2 == 1 else row['returnerName'], axis=1)

#### isInsideOut and isInsideIn Column

In [305]:
shot_data['InsideOut'] = None
shot_data['InsideIn'] = None

def inside_out(hit_by, side, fhbh, direction):
    if hit_by == player1_name:
        player_hand = player1_hand
    else:
        player_hand = player2_hand

    if player_hand == "Right":
        if side == "Deuce" and fhbh == "Backhand" and direction == "Crosscourt":
            return 1
        elif side == "Ad" and fhbh == "Forehand" and direction == "Crosscourt":
            return 1
    else:
        if side == "Ad" and fhbh == "Backhand" and direction == "Crosscourt":
            return 1
        elif side == "Deuce" and fhbh == "Forehand" and direction == "Crosscourt":
            return 1        
            
def inside_in(hit_by, side, fhbh, direction):
    if hit_by == player1_name:
        player_hand = player1_hand
    else:
        player_hand = player2_hand

    if player_hand == "Right":
        if side == "Deuce" and fhbh == "Backhand" and direction == "Down the Line":
            return 1
        elif side == "Ad" and fhbh == "Forehand" and direction == "Down the Line":
            return 1
    else:
        if side == "Ad" and fhbh == "Backhand" and direction == "Down the Line":
            return 1
        elif side == "Deuce" and fhbh == "Forehand" and direction == "Down the Line":
            return 1        

shot_data['InsideOut'] = shot_data.apply(lambda x: inside_out(x['shotHitBy'], x['side'], x['shotFhBh'], x['shotDirection']), axis = 1)
shot_data['InsideIn'] = shot_data.apply(lambda x: inside_in(x['shotHitBy'], x['side'], x['shotFhBh'], x['shotDirection']), axis = 1)

#### isAce Column

In [306]:
# Add the Ace column
shot_data['isAce'] = None

for index, row in shot_data.iterrows():
    if row['isPointEnd'] == 1:
        if row['shotInRally'] == 1: # last point is serve
            if (row['firstServeIn'] == 1 or row['secondServeIn'] == 1): # either first or second serve went in
                shot_data.at[index, 'isAce'] = 1

#### isDoubleFault Column

In [307]:
# Add the DoubleFault column
shot_data['isDoubleFault'] = None

for index, row in shot_data.iterrows():
    if row['isPointEnd'] == 1:
        if row['shotInRally'] == 1: # last point is serve
            if (row['firstServeIn'] != 1 and row['secondServeIn'] != 1): # either first or second serve went in
                shot_data.at[index, 'isDoubleFault'] = 1

#### pointWonBy and lastShotError Columns

In [308]:
# Add the 'pointWonBy' column
shot_data['pointWonBy'] = None

# Add the 'lastShotError' column
shot_data['lastShotError'] = None

for index, row in shot_data.iterrows():
    if row['isPointEnd'] == 1:
        if row['shotInRally'] == 1: # last point is serve
            if row['isAce'] == 1: 
                shot_data.at[index, 'pointWonBy'] = row['serverName']
            elif row['isDoubleFault'] == 1: 
                shot_data.at[index, 'pointWonBy'] = row['returnerName']

                
        elif row['shotInRally'] != 1:
            if row['isErrorWideR'] == 1 or row['isErrorWideL'] == 1 or row['isErrorNet'] == 1 or row['isErrorLong'] == 1: # if error
                shot_data.at[index, 'lastShotError'] = 1
                
                if row['shotInRally'] % 2 == 0:
                    shot_data.at[index, 'pointWonBy'] = row['serverName']
                else:
                    shot_data.at[index, 'pointWonBy'] = row['returnerName']
        
            elif row['isWinner'] == 1:
                if row['shotInRally'] % 2 == 0:
                    shot_data.at[index, 'pointWonBy'] = row['returnerName']
                else:
                    shot_data.at[index, 'pointWonBy'] = row['serverName']

# Backward fill pointWonBy
shot_data['pointWonBy'].bfill();

#### serveResult and serveInPlacement Columns

In [309]:
conditions = [
    (shot_data['isPointStart'] == 1) & (shot_data['firstServeIn'] == 1),
    (shot_data['isPointStart'] == 1) & (shot_data['firstServeIn'] != 1) & (shot_data['secondServeIn'] == 1),
    (shot_data['isPointStart'] == 1) & (shot_data['firstServeIn'] != 1) & (shot_data['secondServeIn'] != 1),]

# Define the values to be assigned for each condition
values_result = ['1st Serve In', '2nd Serve In', 'Double Fault']
values_placement = [shot_data['firstServeZone'], shot_data['secondServeZone'], shot_data['secondServeZone']]

# Use numpy.select to assign values based on conditions
shot_data['serveResult'] = np.select(conditions, values_result, default='')
shot_data['serveInPlacement'] = np.select(conditions, values_placement, default='')

In [310]:
shot_data.replace('', None, inplace=True)

#### depth Column

In [311]:
def depth_metric(shotInRally, x, y, side):
    
    if (x >= -157.5) & (x <= 157.5):
    
        if side == 'Near':
            if shotInRally % 2 == 0:
                if -455 < y < -350: return 'Deep'
                if -350 < y < 0: return 'Short'
                if y < -455: return 'Long'

            elif shotInRally % 2 == 1:
                if 455 > y > 350: return 'Deep'
                if 0 < y < 350: return 'Short'
                if y > 455: return 'Long'

        elif side == 'Far':
            if shotInRally % 2 == 1:
                if -455 < y < -350: return 'Deep'
                if -350 < y < 0: return 'Short'
                if y < -455: return 'Long'

            elif shotInRally % 2 == 0:
                if 455 > y > 350: return 'Deep'
                if 0 < y < 350: return 'Short'
                if y > 455: return 'Long'
        
shot_data['depth'] = shot_data.apply(lambda x: depth_metric(x['shotInRally'], x['shotLocationX'], x['shotLocationY'], x['serverFarNear']), axis=1)

#### atNetPlayer1 and atNetPlayer2 Columns

In [312]:
shot_data['atNetPlayer1'] = None
shot_data['atNetPlayer2'] = None


# Define the criteria for being at the player's net
def is_at_player_net(x, y):
    return 1 if -245 <= y <= 245 and -157.5 <= x <= 157.5 else ''

player1Name = shot_data['player1Name'].loc[0] 
player2Name = shot_data['player2Name'].loc[0]

# Apply the criteria based on the serverName
shot_data.loc[shot_data['shotHitBy'] == player1Name, 'atNetPlayer1'] = shot_data.apply(lambda row: is_at_player_net(row['shotContactX'], row['shotContactY']), axis=1)
shot_data.loc[shot_data['shotHitBy'] == player2Name, 'atNetPlayer2'] = shot_data.apply(lambda row: is_at_player_net(row['shotContactX'], row['shotContactY']), axis=1)

# **Output ShotCSV**

In [313]:
player1NameNoSpace = shot_data.iloc[0]['player1Name'].replace(" ", "")
player2NameNoSpace = shot_data.iloc[0]['player2Name'].replace(" ", "")

# Save csv
shot_data.to_csv(f'Shot_Visuals_{player1NameNoSpace}_{player2NameNoSpace}.csv', index=False)

# **Create PointCSV**

In [314]:
# Creating point_df (with only 1 row for each pointNumber)
point_df = shot_data.drop_duplicates(subset='pointNumber')[['pointNumber']]

# **Add Point CSV Columns**

In [315]:
#### player1Name and player2Name Columns

In [316]:
# Extract the first value of player1Name and player2Name from shot_data
player1_name = shot_data['player1Name'].iloc[0]
player2_name = shot_data['player2Name'].iloc[0]

# Fill in the first value into all rows of point_df['player1Name'] and point_df['player2Name']
point_df['player1Name'] = player1_name
point_df['player2Name'] = player2_name

#### Scores Columns

In [317]:
point_df['pointScore'] = shot_data.groupby('pointNumber')['pointScore'].first().values
point_df['gameScore'] = shot_data.groupby('pointNumber')['gameScore'].first().values
point_df['setScore'] = shot_data.groupby('pointNumber')['setScore'].first().values
point_df['tiebreakScore'] = shot_data.groupby('pointNumber')['tiebreakScore'].first().values

#### side Column

In [318]:
# Group shot_data by 'pointNumber' and get the first 'side' value for each group
side_values = shot_data.groupby('pointNumber')['side'].first().reset_index()
point_df['side'] = side_values['side'].values

#### serverName, returnerName, and team Columns

In [319]:
# Adds Server and Returner Names and pointScore
point_df['serverName'] = shot_data.groupby('pointNumber')['serverName'].first().values
point_df['returnerName'] = shot_data.groupby('pointNumber')['returnerName'].first().values

client_team_value = shot_data.loc[0, 'clientTeam']
opponent_team_value = shot_data.loc[0, 'opponentTeam']

point_df['clientTeam'] = client_team_value
point_df['opponentTeam'] = opponent_team_value

#### pointStartTime, pointEndPosition, and Duration Columns

In [320]:
# Add Start and End times per point
for index, row in shot_data.iterrows():
    point_number = row['pointNumber']
    
    if row['isPointStart'] == 1:
        point_df.loc[point_df['pointNumber'] == point_number, 'Position'] = row['pointStartTime']
    if row['isPointEnd'] == 1:
        point_df.loc[point_df['pointNumber'] == point_number, 'pointEndPosition'] = row['pointEndTime']

# Add Duration
point_df['Duration'] = point_df['pointEndPosition'] - point_df['Position']

#### rallyCount Column

In [321]:
# Find the highest shotInRally for each pointNumber in shot_data
max_rally_per_point = shot_data.groupby('pointNumber')['shotInRally'].max().reset_index()
point_df['rallyCount'] = list(max_rally_per_point['shotInRally'])

# Add 'rallyCountFreq' column 
point_df['rallyCountFreq'] = point_df['rallyCount'].apply(lambda x: '1-4' if 1 <= x <= 4 else 
                                                          ('5-8' if 5 <= x <= 8 else 
                                                           ('9-12' if 9 <= x <= 12 else 
                                                            ('13+' if x >= 13 else 'Error'))))

# Convert to Categorical with specific levels
point_df['rallyCountFreq'] = pd.Categorical(point_df['rallyCountFreq'], 
                                             categories=['1-4', '5-8', '9-12', '13+'], 
                                             ordered=True)

point_df['rallyCountFreq']


0       1-4
2       1-4
4       1-4
6       5-8
11      1-4
12      1-4
16      5-8
21      1-4
24      5-8
30     9-12
40      1-4
41      1-4
42      1-4
45      1-4
48      13+
61      5-8
68      1-4
72     9-12
82      5-8
87      5-8
95      1-4
99      1-4
102     1-4
104     5-8
109     1-4
111     5-8
116     5-8
123     5-8
131     5-8
137     1-4
140     5-8
145     1-4
149    9-12
159    9-12
170     1-4
174     1-4
178     1-4
179     1-4
180     1-4
183     1-4
186     1-4
190     1-4
191     1-4
194     1-4
196     1-4
198     1-4
201     1-4
205     1-4
206     1-4
209     1-4
211     5-8
216     1-4
220     1-4
223     1-4
225     1-4
226     1-4
229     1-4
231     1-4
233     1-4
235     1-4
238     5-8
243     1-4
245     1-4
248     5-8
253     5-8
260     5-8
266     1-4
270     5-8
277     1-4
279     1-4
282     13+
297     1-4
299     1-4
301     1-4
305     1-4
309     1-4
311     1-4
314     1-4
315     5-8
320     5-8
328     1-4
331     13+
346     5-8
352 

#### Serve Columns

In [322]:
point_df['firstServeIn'] = 0
point_df['secondServeIn'] = 0

for point_number in shot_data['pointNumber'].unique():
    # Assign firstServeIn 
    if any((shot_data['pointNumber'] == point_number) & (shot_data['firstServeIn'] == 1)):
        point_df.loc[point_df['pointNumber'] == point_number, 'firstServeIn'] = 1
    # Assign secondServeIn
    if any((shot_data['pointNumber'] == point_number) & (shot_data['secondServeIn'] == 1)):
        point_df.loc[point_df['pointNumber'] == point_number, 'secondServeIn'] = 1

# Add serveResult and serveInPlacement
start_points = shot_data[shot_data['isPointStart'] == 1]
point_df['serveResult'] = start_points['serveResult'].values
point_df['serveInPlacement'] = start_points['serveInPlacement'].values

# Add firstServeZone and secondServeZone
serve_zones = shot_data.loc[shot_data['shotInRally'] == 1, ['pointNumber', 'firstServeZone', 'secondServeZone', 'firstServeIn', 'secondServeIn']].drop_duplicates()
point_df['firstServeZone'] = shot_data.groupby('pointNumber')['firstServeZone'].first().values
point_df['secondServeZone'] = shot_data.groupby('pointNumber')['secondServeZone'].first().values

#### Ace Column

In [323]:
point_df['isAce'] = ((point_df['rallyCount'] == 1) & ((point_df['serveResult'] != "Double Fault")))

#### Server Coordinate Data Columns

In [324]:
# Add serverFarNear
point_df['serverFarNear'] = shot_data.groupby('pointNumber')['serverFarNear'].first().values

# Add firstServeXCoord and firstServeYCoord
point_df['firstServeXCoord'] = shot_data.groupby('pointNumber')['firstServeXCoord'].first().values
point_df['firstServeYCoord'] = shot_data.groupby('pointNumber')['firstServeYCoord'].first().values

# Add secondServeXCoord and secondServeYCoord
point_df['secondServeXCoord'] = shot_data.groupby('pointNumber')['secondServeXCoord'].first().values
point_df['secondServeYCoord'] = shot_data.groupby('pointNumber')['secondServeYCoord'].first().values

#### Return Columns

In [325]:
point_df['returnDirection'] = None
point_df['returnFhBh'] = None

for point_number in shot_data['pointNumber'].unique():
    # shotInRally == 2 for returns
    if 2 in shot_data.loc[shot_data['pointNumber'] == point_number, 'shotInRally'].values:
        row_with_return_info = shot_data[(shot_data['pointNumber'] == point_number) & (shot_data['shotInRally'] == 2)].iloc[0]

        # Add/assign returnDirection and returnFhBh
        point_df.loc[point_df['pointNumber'] == point_number, 'returnDirection'] = row_with_return_info['shotDirection']
        point_df.loc[point_df['pointNumber'] == point_number, 'returnFhBh'] = row_with_return_info['shotFhBh']

#### errorType Column

In [326]:
# Create an empty DataFrame to store the results
error_results = pd.DataFrame(columns=['errorType', 'pointNumber'])

# Iterate through entire shot_data
for index, row in shot_data.iterrows():
    pointNumber = row['pointNumber']
    point_error_value = None
    
    if row['isErrorWideR'] == 1:
        point_error_value = 'Wide Right'
    elif row['isErrorWideL'] == 1:
        point_error_value = 'Wide Left'
    elif 'isErrorNet' in row and row['isErrorNet'] == 1:
        point_error_value = 'Net'
    elif row['isErrorLong'] == 1:
        point_error_value = 'Long'
    

    # If an error is found, append the result to the error_results DataFrame
    if point_error_value is not None:
        error_results = pd.concat([error_results, pd.DataFrame({'pointNumber': [pointNumber], 'errorType': [point_error_value]})], ignore_index=True)


# Drop duplicates based on 'pointNumber'
error_results = error_results.drop_duplicates(subset=['pointNumber'])

In [327]:
# Create a dictionary mapping 'pointNumber' to 'errorType' in error_results
error_type_mapping = dict(zip(error_results['pointNumber'], error_results['errorType']))

# Create 'errorType' column in point_df based on the mapping
point_df['errorType'] = point_df['pointNumber'].map(error_type_mapping)

point_df = point_df.replace({np.nan: None})

#### returnError Column

In [328]:
def get_return_error(row):
    if row['rallyCount'] == 2:
        return row['errorType']
    else:
        return None

point_df.loc[point_df['pointNumber'] == point_number, 'serveInPlacement'] = shot_data['secondServeZone']  

# Apply the functions to create the new columns
point_df['returnError'] = point_df.apply(get_return_error, axis=1)


#### lastShot Columns

In [329]:
point_df['lastShotDirection'] = None
point_df['lastShotFhBh'] = None
point_df['lastShotHitBy'] = None  
point_df['lastShotResult'] = None  

# Iterate through unique pointNumbers in shot_data
for point_number in shot_data['pointNumber'].unique():
    # Check if isPointEnd == 1 exists for the given pointNumber
    if 1 in shot_data.loc[shot_data['pointNumber'] == point_number, 'isPointEnd'].values:
        # Get the information from the corresponding row
        row_with_lastshot_info = shot_data[(shot_data['pointNumber'] == point_number) & (shot_data['isPointEnd'] == 1)].iloc[0]

        # Assign values to 'lastShotDirection' and 'lastShotFhBh' columns
        point_df.loc[point_df['pointNumber'] == point_number, 'lastShotDirection'] = row_with_lastshot_info['shotDirection']
        point_df.loc[point_df['pointNumber'] == point_number, 'lastShotFhBh'] = row_with_lastshot_info['shotFhBh']
        point_df.loc[point_df['pointNumber'] == point_number, 'lastShotHitBy'] = row_with_lastshot_info['shotHitBy']
        
        # Determine lastShotResult based on conditions
        if row_with_lastshot_info['isWinner'] == 1 and not row_with_lastshot_info['isAce']:
            point_df.loc[point_df['pointNumber'] == point_number, 'lastShotResult'] = "Winner"
        elif row_with_lastshot_info['lastShotError'] == 1:
            point_df.loc[point_df['pointNumber'] == point_number, 'lastShotResult'] = "Error"

#### pointWonBy Column

In [330]:
# Initialize variables to keep track of the state
prev_point_number = None
point_won_by_list = []

# Iterate through the DataFrame
for index, row in shot_data.iterrows():
    if row['isPointEnd'] == 1:
        # Check if pointNumber is different and consecutively increasing
        if prev_point_number is None or row['pointNumber'] == prev_point_number + 1:
            # Append pointWonBy to the list
            point_won_by_list.append(row['pointWonBy'])
            prev_point_number = row['pointNumber']
        else:
            raise ValueError("Error: Point numbers are not different or consecutively increasing.")
            break

# Add point_won_by_list as a new column to point_df
point_df['pointWonBy'] = point_won_by_list

#### isExcitingPoint Column

In [331]:
point_df['isExcitingPoint'] = shot_data.groupby('pointNumber')['isExcitingPoint'].count()

#### isBreakPoint Column

In [332]:
break_point_values = ['0-40', '15-40', '30-40', '40-40', '40-AD']
point_df['isBreakPoint'] = point_df['pointScore'].isin(break_point_values)

#### atNetPlayer1 and atNetPlayer2 Columns

In [333]:
for i in point_numbers:
    # atNetPlayer1
    if any((shot_data['pointNumber'] == i) & (shot_data['atNetPlayer1'] == 1)):
        point_df.loc[point_df['pointNumber'] == i, 'atNetPlayer1'] = 1
    # atNetPlayer2
    if any((shot_data['pointNumber'] == i) & (shot_data['atNetPlayer2'] == 1)):
        point_df.loc[point_df['pointNumber'] == i, 'atNetPlayer2'] = 1

# Add atNetPlayer Columns
point_df['atNetPlayer1'] = point_df['atNetPlayer1'].replace({0: "", 1: player1_name})
point_df['atNetPlayer2'] = point_df['atNetPlayer2'].replace({0: "", 1: player2_name})

#### setNum Column

In [334]:
point_df['setNum'] = point_df['setScore'].apply(lambda x: sum(int(char) for char in x if char.isdigit()) + 1)

#### Depths Count (Short, Deep) Columns
- NEED TO DO: Group by pointNumber and sum the counts of Deep, Short, and Long --> put into columns deepCount, and shortCount

In [335]:
# Add counts for each player on how many short and deep balls they hit in the point (group by pointNumber)
# - don't have to do long since we already know what points end with isErrorLong (and also isErrorNet)


shot_data['deep'] = np.where(shot_data['depth'] == 'Deep', 1, 0)
shot_data['short'] = np.where(shot_data['depth'] == 'Short', 1, 0)


#point_df['deepCountPlayer1'] = shot_data.groupby(['pointNumber', 'player1Name'])['deep'].sum().values
#point_df['shortCountPlayer1'] = shot_data.groupby(['pointNumber', 'player1Name'])['short'].sum().values

#point_df['deepCountPlayer2'] = shot_data.groupby(['pointNumber', 'player2Name'])['deep'].sum().values
#point_df['shortCountPlayer2'] = shot_data.groupby(['pointNumber', 'player2Name'])['short'].sum().values



### attempt 2:

deep_group = shot_data.pivot_table(index='pointNumber', columns='shotHitBy', values='deep', aggfunc='sum')
deep_group.columns = ['deepCountPlayer1', 'deepCountPlayer2']
# change NaN's from double faults to 0?
deep_group.fillna(0, inplace=True)


short_group = shot_data.pivot_table(index='pointNumber', columns='shotHitBy', values='short', aggfunc='sum')
short_group.columns = ['shortCountPlayer1', 'shortCountPlayer2']
# change NaN's from double faults to 0?
short_group.fillna(0, inplace=True)


point_df = pd.merge(point_df, deep_group, how='left', on='pointNumber')
point_df = pd.merge(point_df, short_group, how='left', on='pointNumber')

### Add Column: Game Number, Set Number, Game/Set/Point for each player
- **NEED TO DO: Fix pointscore: player1PointScore and player2PointScore change based on side; eg. suppose score is "15-0" player2 serving, currently it just takes 15 and assigns to player1PointScore**

In [336]:
point_df[['player1SetScore', 'player2SetScore']] = point_df['setScore'].str.split('-', expand=True)
point_df[['player1GameScore', 'player2GameScore']] = point_df['gameScore'].str.split('-', expand=True)
point_df[['player1PointScore', 'player2PointScore']] = point_df['pointScore'].str.split('-', expand=True) # NEED TO FIX
if not point_df['tiebreakScore'].isnull().all() and not point_df['tiebreakScore'].eq("").all():
    # Perform the operation only when tiebreakScore is not empty
    point_df[['player1TiebreakScore', 'player2TiebreakScore']] = point_df['tiebreakScore'].str.split('-', expand=True)
else:
    # Set player1TiebreakScore and player2TiebreakScore to NaN
    point_df['player1TiebreakScore'] = np.nan
    point_df['player2TiebreakScore'] = np.nan
    
def calculate_game_number(score):
    return int(score.split('-')[0]) + int(score.split('-')[1]) + 1

# Apply the function to create the 'gameNumber' column
point_df['gameNumber'] = point_df['gameScore'].apply(calculate_game_number)

#### player1ServeResult Column

In [337]:
# Add the 'player1ServeResult' column
point_df['player1ServeResult'] = None

point_df.loc[point_df['serverName'] == point_df['player1Name'], 'player1ServeResult'] = point_df['serveResult']
point_df.loc[point_df['isAce'] == True, 'player1ServeResult'] = 'Ace'

#### player1ServePlacement Column

In [338]:
# Add the 'player1ServePlacement' column
point_df['player1ServePlacement'] = None
point_df.loc[point_df['serverName'] == point_df['player1Name'], 'player1ServePlacement'] = point_df['side'] + ': ' + point_df['serveInPlacement']

#### player1ReturnPlacement Column

In [339]:
# Add the 'player1ReturnPlacement' column
point_df['player1ReturnPlacement'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['returnerName'] == point_df['player1Name'], 'player1ReturnPlacement'] = point_df['returnDirection']

#### player1ReturnFhBh Column

In [340]:
# Add the 'player1ReturnFhBh' column
point_df['player1ReturnFhBh'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['returnerName'] == point_df['player1Name'], 'player1ReturnFhBh'] = point_df['returnFhBh']

#### player1LastShotPlacement Column

In [341]:
# Add the 'player1LastShotFhBh' column
point_df['player1LastShotPlacement'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['lastShotHitBy'] == point_df['player1Name'], 'player1LastShotPlacement'] = point_df['lastShotDirection']

#### player1LastShotFhBh Column

In [342]:
# Add the 'player1LastShotFhBh' column
point_df['player1LastShotFhBh'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['lastShotHitBy'] == point_df['player1Name'], 'player1LastShotFhBh'] = point_df['lastShotFhBh']

#### player1LastShotResult Column

In [343]:
# Add the 'player1LastShotResult' column
point_df['player1LastShotResult'] = None

# Set player1LastShotResult based on conditions, excluding 'Ace' and 'Double Fault'
point_df.loc[
    (point_df['lastShotHitBy'] == point_df['player1Name']) & 
    ~point_df['player1ServeResult'].isin(['Ace', 'Double Fault']), 
    'player1LastShotResult'
] = point_df['lastShotResult']


#### player2ServeResult Column

In [344]:
# Add the 'player2ServeResult' column
point_df['player2ServeResult'] = None

# Set player1ServeResult based on conditions
point_df.loc[point_df['serverName'] == point_df['player2Name'], 'player2ServeResult'] = point_df['serveResult']
point_df.loc[point_df['isAce'] == True, 'player1ServeResult'] = 'Ace'

#### player2ServePlacement Column

In [345]:
# Add the 'player1ServePlacement' column
point_df['player2ServePlacement'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['serverName'] == point_df['player2Name'], 'player2ServePlacement'] = point_df['side'] + ': ' + point_df['serveInPlacement']

#### player2ReturnPlacement Column

In [346]:
# Add the 'player2ReturnPlacement' column
point_df['player2ReturnPlacement'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['returnerName'] == point_df['player2Name'], 'player2ReturnPlacement'] = point_df['returnDirection']

#### player2ReturnFhBh Column

In [347]:
# Add the 'player1ReturnFhBh' column
point_df['player2ReturnFhBh'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['returnerName'] == point_df['player2Name'], 'player2ReturnFhBh'] = point_df['returnFhBh']

#### player2LastShotPlacement Column

In [348]:
# Add the 'player1LastShotFhBh' column
point_df['player2LastShotPlacement'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['lastShotHitBy'] == point_df['player2Name'], 'player2LastShotPlacement'] = point_df['lastShotDirection']

#### player2LastShotFhBh Column

In [349]:
# Add the 'player1LastShotFhBh' column
point_df['player2LastShotFhBh'] = None

# Set player1ServePlacement based on conditions
point_df.loc[point_df['lastShotHitBy'] == point_df['player2Name'], 'player2LastShotFhBh'] = point_df['lastShotFhBh']

#### player2LastShotResult Column

In [350]:
# Add the 'player2LastShotResult' column
point_df['player2LastShotResult'] = None

# Set player1LastShotResult based on conditions, excluding 'Ace' and 'Double Fault'
point_df.loc[
    (point_df['lastShotHitBy'] == point_df['player2Name']) &
    ~point_df['player2ServeResult'].isin(['Ace', 'Double Fault']),
    'player2LastShotResult'
] = point_df['lastShotResult']


#### Name Column

In [351]:
# Change pointScore to the specified format
point_df['Name'] = point_df.apply(lambda row: f"Set {row['setNum']}: {row['gameScore']}, {row['tiebreakScore']} {row['serverName']} Serving" if pd.notna(row['tiebreakScore']) else f"Set {row['setNum']}: {row['gameScore']}, {row['pointScore']} {row['serverName']} Serving", axis=1)

#### Reorder DataFrame for Output
- **NEED TO DO: Update this with new columns from Leo (firstServeLocation and isLet)**

In [352]:
desired_order = ['pointNumber', 'player1Name', 'player2Name', 'pointScore', 'gameScore',
       'setScore', 'tiebreakScore', 'side', 'serverName', 'returnerName',
       'clientTeam', 'opponentTeam', 'Position', 'pointEndPosition',
       'Duration', 'rallyCount', 'rallyCountFreq', 'firstServeIn',
       'secondServeIn', 'serveResult', 'serveInPlacement', 'firstServeZone',
       'secondServeZone', 'isAce', 'serverFarNear', 'serverStartLocation', 'returnerStartLocation', 
       'firstServeXCoord','firstServeYCoord', 'secondServeXCoord', 'secondServeYCoord',
       'returnDirection', 'returnFhBh', 'errorType', 'returnError',
       'lastShotDirection', 'lastShotFhBh', 'lastShotHitBy', 'lastShotResult',
       'pointWonBy', 'isExcitingPoint', 'isBreakPoint', 'atNetPlayer1',
       'atNetPlayer2', 'setNum', 'player1SetScore', 'player2SetScore',
       'player1GameScore', 'player2GameScore', 'player1PointScore',
       'player2PointScore', 'player1TiebreakScore', 'player2TiebreakScore',
       'gameNumber', 'player1ServeResult', 'player1ServePlacement',
       'player1ReturnPlacement', 'player1ReturnFhBh',
       'player1LastShotPlacement', 'player1LastShotFhBh',
       'player1LastShotResult', 'player2ServeResult', 'player2ServePlacement',
       'player2ReturnPlacement', 'player2ReturnFhBh',
       'player2LastShotPlacement', 'player2LastShotFhBh',
       'player2LastShotResult', 'deepCountPlayer1', 'deepCountPlayer2',
       'shortCountPlayer1', 'shortCountPlayer2', 'Name']

# Reorder the columns
point_df = point_df.reindex(columns=desired_order)

In [353]:
point_df_copy = point_df.copy()

# **Point CSV Error Checks**
#### Check Game Number is consecutive increasing. Ex: 1,2,3,4,5,6. End of Set 1. 1,2,3,4,5,6,7,8

In [354]:
game_numbers = point_df['gameNumber'].tolist()

# Initialize variables
seen = set()
prev = None

# Iterate through gameNumber column
for num in game_numbers:
    # If the number is not in seen or it's different from the previous one, print it
    if num not in seen or num != prev:
        print(num, end=', ')
    # If the number is the same as the previous one but not consecutive, print it
    elif num == prev and num not in seen:
        print(num, end=', ')
    # Update seen set and prev variable
    seen.add(num)
    prev = num

1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 

#### Check if the columns and their order are the same

In [355]:
print(point_df.shape)
print(point_df_copy.shape)

if (point_df.shape == point_df_copy.shape):
    print('Check passed.')
else:
    raise ValueError('Error: Not the same!')

(93, 73)
(93, 73)
Check passed.


# Get the set of column names for each DataFrame
point_df_columns = set(point_df.columns)
point_df_copy_columns = set(point_df_copy.columns)

# Find the column names unique to each DataFrame
unique_to_point_df = point_df_columns - point_df_copy_columns
unique_to_point_df_copy = point_df_copy_columns - point_df_columns

# Output the results
if unique_to_point_df:
    print("Columns unique to point_df:", unique_to_point_df)
else:
    print("All columns in point_df are also in point_df_copy")

if unique_to_point_df_copy:
    print("Columns unique to point_df_copy:", unique_to_point_df_copy)
else:
    print("All columns in point_df_copy are also in point_df")


In [356]:
#### Change all empty cells to ""
point_df.replace([pd.NA, None, pd.NaT, float('nan')], "", inplace=True)

# **Output PointCSV**

In [357]:
player1NameNoSpace = point_df.iloc[0]['player1Name'].replace(" ", "")
player2NameNoSpace = point_df.iloc[0]['player2Name'].replace(" ", "")

# Save DataFrame to CSV file with modified player names
point_df.to_csv(f'Point_Visuals_{player1NameNoSpace}_{player2NameNoSpace}.csv', index=False)

# Function to change csv to json
def csv_to_json(csv_file_path, json_file_name):
    df = pd.read_csv(csv_file_path)
    json_data = df.to_json(orient='records')
    json_file_path = f'{json_file_name}'
    with open(json_file_path, 'w') as json_file:
        json_file.write(json_data)
    
    return json_file_path

# Convert CSV to JSON and save in the same directory
csv_file_path = f"Point_Visuals_{player1NameNoSpace}_{player2NameNoSpace}.csv"
json_file_name = f"Point_Visuals_{player1NameNoSpace}_{player2NameNoSpace}.json"
csv_to_json(csv_file_path, json_file_name)

'Point_Visuals_EmonvanLobenSels_RogerPascualFerra.json'

# **EDA**


### Shot CSV EDA

In [358]:
first_player1Name = point_df['player1Name'].iloc[0]
first_player2Name = point_df['player2Name'].iloc[0]

### Summary Stats

In [359]:
shot_eda = shot_data.copy()
point_df_eda = point_df.copy()

# Can input CSV Directly here for statistics functions
## [Depth EDA]
# Filter shots for Player1:
print(f"\n\nShot Results for {first_player1Name} for match overall:")

player1_shots = shot_eda[(shot_eda['shotHitBy'] == player1Name) & (shot_eda['lastShotError'] != 1) & (shot_eda['shotInRally'] != 1)]
#player1_shots = player1_shots[player1_shots['shotInRally'] != 1]
#print(player1_shots.head(10))
num_player1_shots = len(player1_shots)
num_deep_player1_shots = player1_shots['deep'].sum()
num_short_player1_shots = player1_shots['short'].sum()

print(f"Number of Deep Shots (count): {num_deep_player1_shots}" )
print(f"Number of Short Shots (count): {num_short_player1_shots}" )

# average of ALL shots in match for Player1 (including errors/long shots but excluding serves)
print(f"Number of Deep Shots (%): {num_deep_player1_shots / num_player1_shots:.2f}%")
print(f"Number of Short Shots (%): {num_short_player1_shots / num_player1_shots:.2f}%")


# separate into forehand, backhand, slice, and volley (all sepearate from each other):
# [remove slice and volleys from fh and bh count] ??

fh_shots = player1_shots[(player1_shots['shotFhBh'] == 'Forehand')] #& (player1_shots['isSlice'] != 1) & (player1_shots['isVolley'] != 1)]
bh_shots = player1_shots[(player1_shots['shotFhBh'] == 'Backhand')] #& (player1_shots['isSlice'] != 1) & (player1_shots['isVolley'] != 1)]
slice_shots = player1_shots[player1_shots['isSlice'] == 1]
volley_shots = player1_shots[player1_shots['isVolley'] == 1]

fh_slice = player1_shots[(player1_shots['shotFhBh'] == 'Forehand') & (player1_shots['isSlice'] == 1)]
bh_slice = player1_shots[(player1_shots['shotFhBh'] == 'Backhand') & (player1_shots['isSlice'] == 1)]
fh_volley = player1_shots[(player1_shots['shotFhBh'] == 'Forehand') & (player1_shots['isVolley'] == 1)]
bh_volley = player1_shots[(player1_shots['shotFhBh'] == 'Backhand') & (player1_shots['isVolley'] == 1)]

    
print(f"\nTotal number of Forehands (count): {len(fh_shots)}" )
print(f"Forehands Deep (count): {fh_shots['deep'].sum()}" )
print(f"Forehands Short (count): {fh_shots['short'].sum()}" )
print(f"Forehands Deep (%): {( fh_shots['deep'].sum() / len(fh_shots) ):.2f}%")
print(f"Forehands Short (%): {( fh_shots['short'].sum() / len(fh_shots) ):.2f}%")


print(f"\nTotal number of Backhands (count): {len(bh_shots)}" )
print(f"Backhands Deep (count): {bh_shots['deep'].sum()}" )
print(f"Backhands Short (count): {bh_shots['short'].sum()}" )
print(f"Backhands Deep (%): {( bh_shots['deep'].sum() / len(bh_shots) ):.2f}%")
print(f"Backhands Short (%): {( bh_shots['short'].sum() / len(bh_shots) ):.2f}%")


print(f"\nTotal number of Slices (count): {len(slice_shots)}" )
print(f"Slices Deep (count): {slice_shots['deep'].sum()}" )
print(f"Slices Short (count): {slice_shots['short'].sum()}" )
print(f"Slices Deep (%): {( slice_shots['deep'].sum() / len(slice_shots) ):.2f}%")
print(f"Slices Short (%): {( slice_shots['short'].sum() / len(slice_shots) ):.2f}%")

print(f"Slices Deep Forehand(%): {(fh_slice[('deep')].sum() / len(slice_shots) ):.2f}%")
print(f"Slices Short Forehand (%): {(fh_slice[('short')].sum()/ len(slice_shots) ):.2f}%")
print(f"Slices Deep Backhand (%): {(bh_slice[('deep')].sum()/ len(slice_shots) ):.2f}")
print(f"Slices Short Backhand (%): {(bh_slice[('short')].sum()/ len(slice_shots) ):.2f}")


print(f"\nTotal number of Volleys (count): {len(volley_shots)}" )
print(f"Volleys Deep (count): {volley_shots['deep'].sum()}" )
print(f"Volleys Short (count): {volley_shots['short'].sum()}" )
print(f"Volleys Deep (%): {( volley_shots['deep'].sum() / len(volley_shots) ):.2f}%")
print(f"Volleys Short (%): {( volley_shots['short'].sum() / len(volley_shots) ):.2f}%")

print(f"Volleys Deep Forehand (%): {(fh_volley[('deep')].sum()/ len(volley_shots) ):.2f}")
print(f"Volleys Short Forehand (%): {(fh_volley[('short')].sum()/ len(volley_shots) ):.2f}")
print(f"Volleys Deep Backhand (%): {(bh_volley[('deep')].sum()/ len(volley_shots) ):.2f}")
print(f"Volleys Short Backhand (%): {(bh_volley[('short')].sum()/ len(volley_shots) ):.2f}")
# Filter shots for Player1:
print(f"\n\nShot Results for {first_player1Name} per point:")

print(f"Average Deep Shots (%): {(point_df_eda['deepCountPlayer1'].sum() / len(point_df_eda)):.2f}%")
print(f"Average Short Shots (%): {(point_df_eda['shortCountPlayer1'].sum() / len(point_df_eda)):.2f}%")


# Approach Shots
player1Name = shot_eda.iloc[0]['player1Name']

# Filter shot_data based on the conditions
approach_data_player1 = shot_eda[(shot_eda['isApproach'] == 1) & (shot_eda['shotHitBy'] == player1Name)]

# Count the distinct pointNumbers
distinct_point_numbers = approach_data_player1['pointNumber'].nunique()

# Print the result
print(f"Number of Approach Shots hit by {player1Name}: {distinct_point_numbers}" )

# print(approach_data_player1)



#print(point_df_eda.columns)
#point_df_eda
#player1_shots = shot_eda[shot_eda['shotHitBy'] == player1Name]

# fix above code: make avg in each point, then avg the avg?
# your_file_name = "filename.csv"
# shot_eda = pd.read_csv(your_file_name)



Shot Results for Emon van Loben Sels for match overall:
Number of Deep Shots (count): 45
Number of Short Shots (count): 83
Number of Deep Shots (%): 0.35%
Number of Short Shots (%): 0.65%

Total number of Forehands (count): 51
Forehands Deep (count): 19
Forehands Short (count): 32
Forehands Deep (%): 0.37%
Forehands Short (%): 0.63%

Total number of Backhands (count): 72
Backhands Deep (count): 26
Backhands Short (count): 46
Backhands Deep (%): 0.36%
Backhands Short (%): 0.64%

Total number of Slices (count): 10
Slices Deep (count): 2
Slices Short (count): 8
Slices Deep (%): 0.20%
Slices Short (%): 0.80%
Slices Deep Forehand(%): 0.00%
Slices Short Forehand (%): 0.40%
Slices Deep Backhand (%): 0.20
Slices Short Backhand (%): 0.40

Total number of Volleys (count): 5
Volleys Deep (count): 0
Volleys Short (count): 5
Volleys Deep (%): 0.00%
Volleys Short (%): 1.00%
Volleys Deep Forehand (%): 0.00
Volleys Short Forehand (%): 0.00
Volleys Deep Backhand (%): 0.00
Volleys Short Backhand (%): 

### Point CSV EDA

#### Serve and Return Stats

In [360]:
first_player1Name = point_df_eda['player1Name'].iloc[0]



# Display the results
print(f"\nServe Results for {first_player1Name}:")

# Assuming point_df_eda is your DataFrame
total_serves = len(point_df_eda[point_df_eda['serverName'] == first_player1Name])
first_serve_in_count = len(point_df_eda[(point_df_eda['serverName'] == first_player1Name) & (point_df_eda['firstServeIn'] == 1)])
first_serve_won_count = len(point_df_eda[(point_df_eda['serverName'] == first_player1Name) & (point_df_eda['firstServeIn'] == 1) & (point_df_eda['pointWonBy'] == first_player1Name)])
percentage_first_serve_in = (first_serve_in_count / total_serves) * 100 if total_serves > 0 else 0
percentage_first_serve_won = (first_serve_won_count / first_serve_in_count) * 100 if first_serve_in_count > 0 else 0

second_serve_total_count = len(point_df_eda[(point_df_eda['serverName'] == first_player1Name) & (point_df_eda['firstServeIn'] == 0)])
second_serve_in_count = len(point_df_eda[(point_df_eda['serverName'] == first_player1Name) & (point_df_eda['firstServeIn'] == 0)& (point_df_eda['secondServeIn'] == 1)])
second_serve_won_count = len(point_df_eda[(point_df_eda['serverName'] == first_player1Name) & (point_df_eda['firstServeIn'] == 0)& (point_df_eda['secondServeIn'] == 1) & (point_df_eda['pointWonBy'] == first_player1Name)])
percentage_second_serve_in = (second_serve_in_count / second_serve_total_count) * 100 if second_serve_total_count > 0 else 0
percentage_second_serve_won = (second_serve_won_count / second_serve_in_count) * 100 if second_serve_in_count > 0 else 0



# Display the results
print("\nTotal Serves:", total_serves)
print("First Serve In (Count):", first_serve_in_count)
print("First Serve Won (Count):", first_serve_won_count)
print(f"First Serve In (%): {percentage_first_serve_in:.2f}%")
print(f"First Serve Won (%): {percentage_first_serve_won:.2f}%")

print("Second Serve In (Count):", second_serve_in_count)
print("Second Serve Total (Count):", second_serve_total_count)
print("Second Serve Won (Count):", second_serve_won_count)
print(f"Second Serve In (%): {percentage_second_serve_in:.2f}%")
print(f"Second Serve Won (%): {percentage_second_serve_won:.2f}%")

# Assuming point_df is your DataFrame
count_is_ace = (point_df_eda[point_df_eda['serverName'] == first_player1Name]['isAce']).sum()
count_is_double_fault = ((point_df_eda['serverName'] == first_player1Name) & (point_df_eda['serveResult'] == "Double Fault")).sum()

# Display the results
print("Ace (Count):", count_is_ace)
print("Double Fault (Count):", count_is_double_fault)

# Count of rows where serverName is equal to the first row of player1Name and pointWonBy is equal to the first row of player1Name
total_service_points_won = len(point_df_eda[(point_df_eda['serverName'] == first_player1Name) & (point_df_eda['pointWonBy'] == first_player1Name)])
total_service_points_won_percentage = total_service_points_won / total_serves *100

# Display the results
print(f"Points Won on Serve (Count) {total_service_points_won}")

print(f"Points Won on Serve (%): {total_service_points_won_percentage:.2f}%")

# Assuming point_df is your DataFrame
return_points = point_df_eda[(point_df_eda['returnerName'] == first_player1Name)] # CHANGED THIS JERRY, REMOVED RALLY COUNT >= 2

total_return = len(return_points)
returnMade = len(return_points[(return_points['rallyCount'] > 2) | ((return_points['rallyCount'] == 2) & (return_points['lastShotResult'] != 'Error'))])
returnError = len(return_points[(return_points['lastShotResult'] == 'Error') & (return_points['rallyCount'] == 2)])
returnWinner = len(return_points[(return_points['lastShotResult'] == 'Winner') & (return_points['rallyCount'] == 2)])
returnMadePercentage = returnMade/total_return

returnWonByPlayer1 = len(return_points[return_points['pointWonBy'] == first_player1Name])
returnWonByPlayer1Percentage = returnWonByPlayer1 / returnMade * 100 if returnMade > 0 else 0

deuceReturnCount = len(return_points[return_points['side'] == 'Deuce'])
adReturnCount = len(return_points[return_points['side'] == 'Ad'])


deuceReturnMade = len(return_points[(return_points['side'] == 'Deuce') & ((return_points['rallyCount'] > 2) | ((return_points['rallyCount'] == 2) & (return_points['lastShotResult'] != 'Error')))])
adReturnMade = len(return_points[(return_points['side'] == 'Ad') & ((return_points['rallyCount'] > 2) | ((return_points['rallyCount'] == 2) & (return_points['lastShotResult'] != 'Error')))])

deuceReturnMadePercentage = deuceReturnMade/deuceReturnCount
adReturnMadePercentage = adReturnMade/adReturnCount

deuceReturnWonByPlayer1 = len(return_points[(return_points['side'] == 'Deuce') & (return_points['pointWonBy'] == first_player1Name) | ((return_points['rallyCount'] == 2) & (return_points['lastShotResult'] != 'Error'))])
adReturnWonByPlayer1 = len(return_points[(return_points['side'] == 'Ad') & (return_points['pointWonBy'] == first_player1Name) | ((return_points['rallyCount'] == 2) & (return_points['lastShotResult'] != 'Error'))])

deuceReturnWonByPlayer1Percentage = deuceReturnWonByPlayer1 / deuceReturnMade * 100 if deuceReturnMade > 0 else 0
adReturnWonByPlayer1Percentage = adReturnWonByPlayer1 / adReturnMade * 100 if adReturnMade > 0 else 0




print(f"\nReturn Results for {first_player1Name}:\n")

print("Total Return (Count):", total_return)
print("Return Won (Count):", returnWonByPlayer1)
print("Return Won (%):", returnWonByPlayer1Percentage)

print("\nReturn Made (Count):", returnMade)
print("Return Made (%):", returnMadePercentage)
print("Return Error (Count):", returnError)
print("Return Winner (Count):", returnWinner)

print("\nDeuce Return (Count):", deuceReturnCount)
print("Deuce Return Made (Count):", deuceReturnMade)
print("Deuce Return Made (%):", deuceReturnMadePercentage)
print("Deuce Return Won by Player1 (%):", deuceReturnWonByPlayer1Percentage)
print("Deuce Return Won by Player1 (Count):", deuceReturnWonByPlayer1)


print("\nAd Return (Count):", adReturnCount)
print("Ad Return Made (Count):", adReturnMade)
print("Ad Return Made (%):", adReturnMadePercentage)
print("Ad Return Won by Player1 (Count):", adReturnWonByPlayer1)
print("Ad Return Won by Player1 (%):", adReturnWonByPlayer1Percentage)

# Assuming return_points is your DataFrame
deuce_return_points = return_points[(return_points['side'] == 'Deuce') & (return_points['returnerName'] == first_player1Name) & (return_points['rallyCount'] >= 2)]

# Deuce Return Points Separated by returnFhBh
deuce_forehand_return_points = deuce_return_points[deuce_return_points['returnFhBh'] == 'Forehand']
deuce_backhand_return_points = deuce_return_points[deuce_return_points['returnFhBh'] == 'Backhand']


# Count for Deuce Return Points - Made
count_deuce_forehand_made = len(deuce_forehand_return_points[(deuce_forehand_return_points['rallyCount'] > 2) | ((deuce_forehand_return_points['rallyCount'] == 2) & (deuce_forehand_return_points['lastShotResult'] != 'Error'))])
count_deuce_backhand_made = len(deuce_backhand_return_points[(deuce_backhand_return_points['rallyCount'] > 2) | ((deuce_backhand_return_points['rallyCount'] == 2) & (deuce_backhand_return_points['lastShotResult'] != 'Error'))])

# Count for Deuce Return Points - Error
count_deuce_forehand_error = len(deuce_forehand_return_points[(deuce_forehand_return_points['lastShotResult'] == 'Error') & (deuce_forehand_return_points['rallyCount'] == 2)])
count_deuce_backhand_error = len(deuce_backhand_return_points[(deuce_backhand_return_points['lastShotResult'] == 'Error') & (deuce_backhand_return_points['rallyCount'] == 2)])

# Display the counts
print("\nDeuce Forehand Return Points - Made:", count_deuce_forehand_made)
print("Deuce Forehand Return Points - Error:", count_deuce_forehand_error)

print("Deuce Backhand Return Points - Made:", count_deuce_backhand_made)
print("Deuce Backhand Return Points - Error:", count_deuce_backhand_error)

# Assuming return_points is your DataFrame
ad_return_points = return_points[(return_points['side'] == 'Ad') & (return_points['returnerName'] == first_player1Name) & (return_points['rallyCount'] >= 2)]

# Ad Return Points Separated by returnFhBh
ad_forehand_return_points = ad_return_points[ad_return_points['returnFhBh'] == 'Forehand']
ad_backhand_return_points = ad_return_points[ad_return_points['returnFhBh'] == 'Backhand']

# Count for Ad Return Points - Made
count_ad_forehand_made = len(ad_forehand_return_points[(ad_forehand_return_points['rallyCount'] > 2) | ((ad_forehand_return_points['rallyCount'] == 2) & (ad_forehand_return_points['lastShotResult'] != 'Error'))])
count_ad_backhand_made = len(ad_backhand_return_points[(ad_backhand_return_points['rallyCount'] > 2) | ((ad_backhand_return_points['rallyCount'] == 2) & (ad_backhand_return_points['lastShotResult'] != 'Error'))])

# Count for Ad Return Points - Error
count_ad_forehand_error = len(ad_forehand_return_points[(ad_forehand_return_points['lastShotResult'] == 'Error') & (ad_forehand_return_points['rallyCount'] == 2)])
count_ad_backhand_error = len(ad_backhand_return_points[(ad_backhand_return_points['lastShotResult'] == 'Error') & (ad_backhand_return_points['rallyCount'] == 2)])

# Display the counts
print("\nAd Forehand Return Points - Made:", count_ad_forehand_made)
print("Ad Forehand Return Points - Error:", count_ad_forehand_error)

print("Ad Backhand Return Points - Made:", count_ad_backhand_made)
print("Ad Backhand Return Points - Error:", count_ad_backhand_error)

print(f"\nAt Net Results for {first_player1Name}:\n")


# Total points where atNetPlayer1 = first_player1Name
total_at_net_player1 = len(point_df_eda[point_df_eda['atNetPlayer1'] == first_player1Name])

# Percentage of points where atNetPlayer1 = 1 out of total points
percentage_at_net_player1 = (total_at_net_player1 / len(point_df_eda)) * 100 if len(point_df_eda) > 0 else 0

# Display the total count and percentage of points where atNetPlayer1 = 1
print(f"Total Net Points for {first_player1Name}: {total_at_net_player1}")
print(f"Percentage of Net Points for {first_player1Name}: {percentage_at_net_player1:.2f}%")

# Points where atNetPlayer1 = first_player1Name and pointWonBy = first_player1Name
at_net_player1_and_won_by_player1 = len(point_df_eda[(point_df_eda['atNetPlayer1'] == first_player1Name) & (point_df_eda['pointWonBy'] == first_player1Name)])

# Percentage of points where atNetPlayer1 = first_player1Name and pointWonBy = first_player1Name out of total points where atNetPlayer1 = 1
percentage_at_net_player1_and_won_by_player1 = (at_net_player1_and_won_by_player1 / total_at_net_player1) * 100 if total_at_net_player1 > 0 else 0

# Display the count and percentage of points where atNetPlayer1 = 1 and pointWonBy = first_player1Name
print(f"\nTotal Net Points won by {first_player1Name}: {at_net_player1_and_won_by_player1}")
print(f"Percentage of Net Points won by {first_player1Name}: {percentage_at_net_player1_and_won_by_player1:.2f}%")



Serve Results for Emon van Loben Sels:

Total Serves: 43
First Serve In (Count): 24
First Serve Won (Count): 22
First Serve In (%): 55.81%
First Serve Won (%): 91.67%
Second Serve In (Count): 11
Second Serve Total (Count): 19
Second Serve Won (Count): 8
Second Serve In (%): 57.89%
Second Serve Won (%): 72.73%
Ace (Count): 6
Double Fault (Count): 8
Points Won on Serve (Count) 33
Points Won on Serve (%): 76.74%

Return Results for Emon van Loben Sels:

Total Return (Count): 50
Return Won (Count): 21
Return Won (%): 51.21951219512195

Return Made (Count): 41
Return Made (%): 0.82
Return Error (Count): 6
Return Winner (Count): 2

Deuce Return (Count): 27
Deuce Return Made (Count): 24
Deuce Return Made (%): 0.8888888888888888
Deuce Return Won by Player1 (%): 62.5
Deuce Return Won by Player1 (Count): 15

Ad Return (Count): 23
Ad Return Made (Count): 17
Ad Return Made (%): 0.7391304347826086
Ad Return Won by Player1 (Count): 8
Ad Return Won by Player1 (%): 47.05882352941176

Deuce Forehand R

### Breakpoint Stats

In [361]:
# Caitlin Breakpoint Data

# points_returned = point_df_eda[point_df_eda[]]

# print(f"\nBreakpoint Results for {first_player1Name}:\n")

# # Total points where isBreakPoint = 1
# total_breakpoint = len(point_df_eda[point_df_eda['isBreakPoint'] == 1])

# # Points where isBreakPoint = 1 and pointWonBy = first_player1Name
# breakpoint_and_won_by_player1 = len(point_df_eda[(point_df_eda['isBreakPoint'] == 1) & (point_df_eda['pointWonBy'] == first_player1Name)])

# # Percentage of points where isBreakPoint = 1 and pointWonBy = first_player1Name out of total points where isBreakPoint = 1
# percentage_breakpoint_and_won_by_player1 = (breakpoint_and_won_by_player1 / total_breakpoint) * 100 if total_breakpoint > 0 else 0

# # Display the total count of points where isBreakPoint = 1
# print(f"Total Breakpoints: {total_breakpoint}")

# # Display the count and percentage of points where isBreakPoint = 1 and pointWonBy = first_player1Name
# print(f"Total Breakpoints won by {first_player1Name}: {breakpoint_and_won_by_player1}")
# print(f"Percentage of Breakpoints won by {first_player1Name}: {percentage_breakpoint_and_won_by_player1:.2f}%")

# # Total points where isBreakPoint = 1 and serverName = first_player1Name
# total_breakpoint_serve = len(point_df_eda[(point_df_eda['isBreakPoint'] == 1) & (point_df_eda['serverName'] == first_player1Name)])

# # Points where isBreakPoint = 1, serverName = first_player1Name, and pointWonBy = first_player1Name
# breakpoint_and_won_by_player1_serve = len(point_df_eda[(point_df_eda['isBreakPoint'] == 1) & (point_df_eda['serverName'] == first_player1Name) & (point_df_eda['pointWonBy'] == first_player1Name)])

# # Percentage of points where isBreakPoint = 1, serverName = first_player1Name, and pointWonBy = first_player1Name out of total points where isBreakPoint = 1 and serverName = first_player1Name
# percentage_breakpoint_and_won_by_player1_serve = (breakpoint_and_won_by_player1_serve / total_breakpoint_serve) * 100 if total_breakpoint_serve > 0 else 0

# # Display the total count of points where isBreakPoint = 1 and serverName = first_player1Name
# print(f"\nTotal Breakpoints on Serve for {first_player1Name}: {total_breakpoint_serve}")

# # Display the count and percentage of points where isBreakPoint = 1, serverName = first_player1Name, and pointWonBy = first_player1Name
# print(f"Total Breakpoints won on Serve by {first_player1Name}: {breakpoint_and_won_by_player1_serve}")
# print(f"Percentage of Breakpoints won on Serve by {first_player1Name}: {percentage_breakpoint_and_won_by_player1_serve:.2f}%")

# Total points where isBreakPoint = 1 and returnerName = first_player1Name
total_breakpoint_return = len(point_df_eda[(point_df_eda['isBreakPoint'] == 1) & (point_df_eda['returnerName'] == first_player1Name)])

# Points where isBreakPoint = 1, returnerName = first_player1Name, and pointWonBy = first_player1Name
breakpoint_and_won_by_player1_return = len(point_df_eda[(point_df_eda['isBreakPoint'] == 1) & (point_df_eda['returnerName'] == first_player1Name) & (point_df_eda['pointWonBy'] == first_player1Name)])

# Percentage of points where isBreakPoint = 1, returnerName = first_player1Name, and pointWonBy = first_player1Name out of total points where isBreakPoint = 1 and returnerName = first_player1Name
percentage_breakpoint_and_won_by_player1_return = (breakpoint_and_won_by_player1_return / total_breakpoint_return) * 100 if total_breakpoint_return > 0 else 0

# Display the total count of points where isBreakPoint = 1 and returnerName = first_player1Name
print(f"\nTotal Breakpoints on Return for {first_player1Name}: {total_breakpoint_return}")

# Display the count and percentage of points where isBreakPoint = 1, returnerName = first_player1Name, and pointWonBy = first_player1Name
print(f"Total Breakpoints won on Return by {first_player1Name}: {breakpoint_and_won_by_player1_return}")
print(f"Percentage of Breakpoints won on Return by {first_player1Name}: {percentage_breakpoint_and_won_by_player1_return:.2f}%")

# Jimmy Returning Games Won
# games won/returning games by Jimmy Hou

points_returned = point_df_eda[point_df_eda["returnerName"] == first_player1Name]

# # Return percentage won on first serve
# first_serves_won = points_returned[(points_returned['firstServeIn'] == 1) & (points_returned['pointWonBy'] == first_player1Name)]
# total_first_serves = points_returned[points_returned['firstServeIn'] == 1]
# fs_won_per_player1 = 100 * len(first_serves_won) / len(total_first_serves)
# # print(f"\n{player1} won {fs_won_per_player1:.2f}% of first serves returned.")
# print(f"\nPerecentage of Breakpoinnts won by {first_player1Name} on Return when returning a first serve: {fs_won_per_player1:.2f}%")

# # Return percentage won on second serve
# second_serves_won = points_returned[(points_returned['secondServeIn'] == 1) & (points_returned['pointWonBy'] == first_player1Name)]
# total_second_serves = points_returned[points_returned['secondServeIn'] == 1]
# ss_won_per_player1 = 100 * len(second_serves_won) / len(total_second_serves)
# # print(f"{player1} won {ss_won_per_player1:.2f}% of second serves returned."
# print(f"Perecentage of Breakpoinnts won by {first_player1Name} on Return when returning a second serve: {ss_won_per_player1:.2f}%")



Total Breakpoints on Return for Emon van Loben Sels: 10
Total Breakpoints won on Return by Emon van Loben Sels: 6
Percentage of Breakpoints won on Return by Emon van Loben Sels: 60.00%


In [362]:
point_df_eda.columns

Index(['pointNumber', 'player1Name', 'player2Name', 'pointScore', 'gameScore',
       'setScore', 'tiebreakScore', 'side', 'serverName', 'returnerName',
       'clientTeam', 'opponentTeam', 'Position', 'pointEndPosition',
       'Duration', 'rallyCount', 'rallyCountFreq', 'firstServeIn',
       'secondServeIn', 'serveResult', 'serveInPlacement', 'firstServeZone',
       'secondServeZone', 'isAce', 'serverFarNear', 'serverStartLocation',
       'returnerStartLocation', 'firstServeXCoord', 'firstServeYCoord',
       'secondServeXCoord', 'secondServeYCoord', 'returnDirection',
       'returnFhBh', 'errorType', 'returnError', 'lastShotDirection',
       'lastShotFhBh', 'lastShotHitBy', 'lastShotResult', 'pointWonBy',
       'isExcitingPoint', 'isBreakPoint', 'atNetPlayer1', 'atNetPlayer2',
       'setNum', 'player1SetScore', 'player2SetScore', 'player1GameScore',
       'player2GameScore', 'player1PointScore', 'player2PointScore',
       'player1TiebreakScore', 'player2TiebreakScore', 'gam

### Serve Win Percentage

In [363]:
# Filter points where serverName is equal to first_player1Name
filtered_points = point_df[point_df['serverName'] == first_player1Name]

# Group the filtered points by player1ServePlacement and count the occurrences
serve_placement_counts = filtered_points.groupby('player1ServePlacement').size()

serve_placements_ad = serve_placement_counts.filter(like='Ad').sum()
serve_placements_deuce = serve_placement_counts.filter(like='Deuce').sum()

# Iterate over filtered_points
for index, point in filtered_points.iterrows():
    serve_placement = point['player1ServePlacement']

    # Check if serve placement is not in serve_placement_counts
    if serve_placement not in serve_placement_counts:
        print(point)


# Initialize dictionaries to store counts and percentages
point_won_counts = {}
point_won_percentages = {}
print(f"Total {len(filtered_points)}")

# Iterate over serve placements
for serve_placement, count in serve_placement_counts.items():
    # Filter points with the specific serve placement
    serve_placement_points = filtered_points[filtered_points['player1ServePlacement'] == serve_placement]
    
    # Count points won by first_player1Name
    point_won_count = serve_placement_points[serve_placement_points['pointWonBy'] == first_player1Name].shape[0]

    # Calculate percentage
    point_won_percentage = (point_won_count / count) * 100 if count > 0 else 0

    # Store counts and percentages
    point_won_counts[serve_placement] = point_won_count
    point_won_percentages[serve_placement] = point_won_percentage
    


    
    
# Filter serve placements for Ad and Deuce [CHANGED: BRIAN NTOES]
serve_placements_ad = serve_placement_counts.filter(like='Ad').sum()
serve_placements_deuce = serve_placement_counts.filter(like='Deuce').sum()    

# Print counts and percentages [CHANGED: BRIAN NOTES]
for serve_placement, count in serve_placement_counts.items():
    print(f"Serve Placement: {serve_placement}")
    print(f"Total Serves: {count}")
    
    if "Deuce" in serve_placement: # [CHANGED: BRIAN NOTES]
        deuce_serve_format_number = f"{count/serve_placements_deuce:.2f}"
        deuce_serve_percent = round(float(deuce_serve_format_number) * 100,2)
        print(f"Serve Frequency: {deuce_serve_percent}% ({count}/{serve_placements_deuce}) Deuce Serves")
    if "Ad" in serve_placement: # [CHANGED: BRIAN NOTES]
        ad_serve_format_number = f"{count/serve_placements_ad:.2f}"
        ad_serve_percent = round(float(ad_serve_format_number) * 100,2)
        print(f"Serve Frequency: {ad_serve_percent}% ({count}/{serve_placements_ad}) Ad Serves")
        
    print(f"Serves Won by {first_player1Name}: {point_won_counts.get(serve_placement, 0)}")
    print(f"Percentage: {point_won_percentages.get(serve_placement, 0):.2f}%\n")

    
# print("This is Ad count: " + str(len(point_df[(point_df['serverName'] == first_player1Name) & (point_df['side'] == 'Ad')])))
# print("This is Deuce count: " + str(len(point_df[(point_df['serverName'] == first_player1Name) & (point_df['side'] == 'Deuce')])))

Total 43
Serve Placement: 
Total Serves: 3
Serves Won by Emon van Loben Sels: 2
Percentage: 66.67%

Serve Placement: Ad: Body
Total Serves: 4
Serve Frequency: 20.0% (4/20) Ad Serves
Serves Won by Emon van Loben Sels: 2
Percentage: 50.00%

Serve Placement: Ad: T
Total Serves: 5
Serve Frequency: 25.0% (5/20) Ad Serves
Serves Won by Emon van Loben Sels: 5
Percentage: 100.00%

Serve Placement: Ad: Wide
Total Serves: 11
Serve Frequency: 55.0% (11/20) Ad Serves
Serves Won by Emon van Loben Sels: 8
Percentage: 72.73%

Serve Placement: Deuce: Body
Total Serves: 3
Serve Frequency: 15.0% (3/20) Deuce Serves
Serves Won by Emon van Loben Sels: 3
Percentage: 100.00%

Serve Placement: Deuce: T
Total Serves: 11
Serve Frequency: 55.0% (11/20) Deuce Serves
Serves Won by Emon van Loben Sels: 8
Percentage: 72.73%

Serve Placement: Deuce: Wide
Total Serves: 6
Serve Frequency: 30.0% (6/20) Deuce Serves
Serves Won by Emon van Loben Sels: 5
Percentage: 83.33%



### Error Stats

In [364]:
print(f"\nError Data for {first_player1Name}:\n")
# Filter the DataFrame based on specified conditions
total_errors = point_df[(point_df['lastShotHitBy'] == first_player1Name) &
                           (point_df['lastShotResult'] == 'Error')]

import numpy as np

# Filter rows without NaN values in relevant columns
forehand_errors = point_df[(point_df['lastShotHitBy'] == first_player1Name) &
                           (point_df['lastShotResult'] == 'Error') &
                           (point_df['lastShotFhBh'] == 'Forehand') &
                           (~point_df['errorType'].isnull())]  # Ensure 'errorType' column doesn't have NaN
 
backhand_errors = point_df[(point_df['lastShotHitBy'] == first_player1Name) &
                           (point_df['lastShotResult'] == 'Error') &
                           (point_df['lastShotFhBh'] == 'Backhand') &
                           (~point_df['errorType'].isnull())]  # Ensure 'errorType' column doesn't have NaN

# Count the occurrences of 'Forehand' and 'Backhand' separately
forehand_counts = forehand_errors.shape[0]  # Count rows
backhand_counts = backhand_errors.shape[0]  # Count rows

# Print the total error counts for verification
total_error_counts = forehand_counts + backhand_counts



# Desired output order 
desired_order = ['Net', 'Long', 'Wide Right', 'Wide Left'] 

# # Get value counts of 'errorType' for Forehand errors [CHANGED: BRIAN]
# forehand_error_types = forehand_errors['errorType'].value_counts(dropna=False).loc[desired_order]  # Include NaN values in count
# forehand_error_types_df = pd.DataFrame(forehand_error_types) # change into dataframe to erase object line
# Get value counts of 'errorType' for Backhand errors



######################### CHANGED SECTION ##############################################
# [CHANGED: BRIAN]

forehand_error_types = forehand_errors['errorType'].value_counts(dropna=False)

# Create a Series with desired index containing zeros
zeros_series = pd.Series(0, index=desired_order)

# Combine the original Series with the zeros Series
forehand_error_types_combined = forehand_error_types.combine(zeros_series, max, fill_value=0)

# Reindex the Series to follow the desired order
forehand_error_types_ordered = forehand_error_types_combined.reindex(desired_order, fill_value=0)

# Create the DataFrame
forehand_error_types_df = pd.DataFrame(forehand_error_types_ordered, columns=['Count'])

# Get value counts of 'errorType' for Backhand errors [CHANGED: BRIAN]
# backhand_error_types = backhand_errors['errorType'].value_counts(dropna=False).loc[desired_order]  # Include NaN values in count
# backhand_error_types_df = pd.DataFrame(backhand_error_types) # change into dataframe to erase object line

# Get value counts of 'errorType' for Backhand errors
backhand_error_types = backhand_errors['errorType'].value_counts(dropna=False)

# Combine the original Series with the zeros Series
backhand_error_types_combined = backhand_error_types.combine(zeros_series, max, fill_value=0)

# Reindex the Series to follow the desired order
backhand_error_types_ordered = backhand_error_types_combined.reindex(desired_order, fill_value=0)

# Create the DataFrame
backhand_error_types_df = pd.DataFrame(backhand_error_types_ordered, columns=['Count'])



######################### CHANGED SECTION ##############################################


# Print the counts and error types
print("Count of Total errors:", total_error_counts)
print("Count of Forehand errors:", forehand_counts)
print(f"Forehand Error %: {(forehand_counts/total_error_counts)*100:.2f}%")
print("Count of Backhand errors:", backhand_counts)
print(f"Backhand Error %: {(backhand_counts/total_error_counts)*100:.2f}%")
print("\nForehand errors:\n", forehand_error_types_df)


# # Group by both 'lastShotDirection' and 'errorType', and then count occurrences
forehand_error_counts = forehand_errors.groupby(['player1LastShotPlacement', 'errorType']).size().unstack(fill_value=0)  # Fill NaN with 0
forehand_error_counts_ordered = forehand_error_counts.reindex(columns = desired_order) # [CHANGED: BRIAN]

print("\nValue counts of 'errorType' for Forehand errors with different directions:\n", forehand_error_counts_ordered)


print("\nBackhand errors:\n", backhand_error_types_df)


# Group by both 'lastShotDirection' and 'errorType', and then count occurrences
backhand_error_counts = backhand_errors.groupby(['player1LastShotPlacement', 'errorType']).size().unstack(fill_value=0) # Fill NaN with 0
backhand_error_counts_ordered = backhand_error_counts.reindex(columns = desired_order) # [CHANGED: BRIAN]

print("\nValue counts of 'errorType' for Backhand errors with different directions:\n", backhand_error_counts_ordered)


Error Data for Emon van Loben Sels:

Count of Total errors: 24
Count of Forehand errors: 9
Forehand Error %: 37.50%
Count of Backhand errors: 15
Backhand Error %: 62.50%

Forehand errors:
             Count
Net             5
Long            3
Wide Right      1
Wide Left       0

Value counts of 'errorType' for Forehand errors with different directions:
 errorType                 Net  Long  Wide Right  Wide Left
player1LastShotPlacement                                  
Crosscourt                  2     2           1        NaN
Down the Line               3     1           0        NaN

Backhand errors:
             Count
Net             8
Long            2
Wide Right      5
Wide Left       0

Value counts of 'errorType' for Backhand errors with different directions:
 errorType                 Net  Long  Wide Right  Wide Left
player1LastShotPlacement                                  
Crosscourt                  1     2           3        NaN
Down the Line               7     0         