### Tennis ###
By:<br>
\-> Vlad Liviu Alexandru <br>
\-> Sandu Victor Iulian<br>

_**Run [Setup](#setup) before anything else!**_<br>


Data source: [Jeff Sackmann (GitHub)](https://github.com/JeffSackmann/tennis_atp)<br>
Variable descriptions: [Jeff Sackmann (GitHub)](https://github.com/JeffSackmann/tennis_atp/blob/master/matches_data_dictionary.txt)

### Setup ###

In [None]:
# ========== CONFIGURATION =========

# If set to False, large outputs such as tables are hidden.
# Default: False
ShowBigPrints = True


# ============= Imports ============
import pandas as pd
import numpy as np
import dateutil as du
import re
from itertools import permutations
from itertools import product

# ======== Global variables ========
class Paths:
  # Forward slashes work on Windows, Linux and macOS alike.
  atp_matches_2020 = "data/atp_matches_2020.csv"

data = pd.read_csv(Paths.atp_matches_2020)

# ========== Custom logic ==========
def cfp(f, *a): # custom function print
  if ShowBigPrints:
    f(*a)
  else:
    print("> Big print hidden.")

def cp(*a): # custom print
  cfp(print, *a)

def p(*a):
  print(*a)

### 1. Data processing ###

In [None]:
# Parse the YYYYMMDD integers into datetime values
data['tourney_date'] = pd.to_datetime(data['tourney_date'].astype(str), format='%Y%m%d')

# Create unique id
if data.columns[0] != 'Id':
  data.insert(0, 'Id', data['tourney_id'].astype(str) + '-' + data['match_num'].astype(str))

# Deleting columns that are not necessary
data.drop(columns= [c for c in data.columns.to_list() if re.search('(seed|entry)', c)], inplace=True)
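The three processing steps above (date parsing, key creation, column removal) can be tried in isolation on a small table. The sketch below is only an illustration: the `sample` frame and its values are made up, and `DataFrame.filter(regex=...)` is used here as one possible alternative to the list comprehension.

```python
import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration.
sample = pd.DataFrame({
    "tourney_id": ["2020-0001", "2020-0001", "2020-0002"],
    "match_num": [300, 299, 1],
    "tourney_date": [20200106, 20200106, 20200113],
    "winner_seed": [1.0, None, 3.0],
    "loser_entry": [None, "Q", "WC"],
})

# Parse the YYYYMMDD integers into timestamps.
sample["tourney_date"] = pd.to_datetime(
    sample["tourney_date"].astype(str), format="%Y%m%d"
)

# Build the match key only if it is missing, so re-running is harmless.
if "Id" not in sample.columns:
    sample.insert(
        0, "Id", sample["tourney_id"] + "-" + sample["match_num"].astype(str)
    )

# Drop every column whose name contains "seed" or "entry".
sample = sample.drop(columns=sample.filter(regex="seed|entry").columns)

print(sample)
```

Checking `"Id" not in sample.columns` instead of the first column name keeps the guard valid even if the column order changes later.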

In [None]:
# Initial Dataset
p(f"{'Dataset:':<13} atp_matches_2020")
p(f"{'Variables:':<13} {data.shape[1]}")
p(f"{'Observations:':<13} {data.shape[0]}")
cfp(display, data)

#### 1.1 About the data ####


Each column of the source dataset is described below. `Id` is added during processing, and the seed and entry columns are dropped in the same step:

- `Id`: Unique identifier for each match.
- `tourney_id`: Identifier for the tournament.
- `tourney_name`: Name of the tournament.
- `surface`: Surface type of the court (e.g., Hard, Clay, Grass).
- `draw_size`: Number of players in the draw, often rounded up to the nearest power of 2.
- `tourney_level`: Level of the tournament (e.g., A for ATP Tour, G for Grand Slam).
- `tourney_date`: Date of the tournament.
- `match_num`: Match number within the tournament.
- `winner_id`: Unique identifier for the winner player.
- `winner_seed`: Seed of the winner player.
- `winner_entry`: Entry type of the winner player (e.g., WC for Wild Card, Q for Qualifier).
- `winner_name`: Name of the winner player.
- `winner_hand`: Dominant hand of the winner player (e.g., R for Right, L for Left).
- `winner_ht`: Height of the winner player in centimetres.
- `winner_ioc`: Country code of the winner player.
- `winner_age`: Age of the winner player.
- `loser_id`: Unique identifier for the loser player.
- `loser_seed`: Seed of the loser player.
- `loser_entry`: Entry type of the loser player.
- `loser_name`: Name of the loser player.
- `loser_hand`: Dominant hand of the loser player.
- `loser_ht`: Height of the loser player in centimetres.
- `loser_ioc`: Country code of the loser player.
- `loser_age`: Age of the loser player.
- `score`: Match score.
- `best_of`: Maximum number of sets in the match (3 or 5).
- `round`: Round of the tournament.
- `minutes`: Duration of the match in minutes.
- `w_ace`: Number of aces hit by the winner player.
- `w_df`: Number of double faults committed by the winner player.
- `w_svpt`: Number of service points played by the winner player.
- `w_1stIn`: Number of first serves in by the winner player.
- `w_1stWon`: Number of first serve points won by the winner player.
- `w_2ndWon`: Number of second serve points won by the winner player.
- `w_SvGms`: Number of service games played by the winner player.
- `w_bpSaved`: Number of break points saved by the winner player.
- `w_bpFaced`: Number of break points faced by the winner player.
- `l_ace`: Number of aces hit by the loser player.
- `l_df`: Number of double faults committed by the loser player.
- `l_svpt`: Number of service points played by the loser player.
- `l_1stIn`: Number of first serves in by the loser player.
- `l_1stWon`: Number of first serve points won by the loser player.
- `l_2ndWon`: Number of second serve points won by the loser player.
- `l_SvGms`: Number of service games played by the loser player.
- `l_bpSaved`: Number of break points saved by the loser player.
- `l_bpFaced`: Number of break points faced by the loser player.
- `winner_rank`: Ranking of the winner player.
- `winner_rank_points`: Ranking points of the winner player.
- `loser_rank`: Ranking of the loser player.
- `loser_rank_points`: Ranking points of the loser player.
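Several of the serve columns listed above are raw counts that only become comparable once they are turned into rates. The sketch below shows one way to derive such rates; the `stats` frame and its numbers are hypothetical, and the derived names (`first_in_pct`, `first_won_pct`, `bp_saved_pct`) are not part of the dataset.

```python
import pandas as pd

# Hypothetical serve statistics for two matches (made-up numbers).
stats = pd.DataFrame({
    "w_svpt": [80, 60],      # service points played
    "w_1stIn": [52, 45],     # first serves in
    "w_1stWon": [39, 36],    # first-serve points won
    "w_bpFaced": [4, 0],     # break points faced
    "w_bpSaved": [3, 0],     # break points saved
})

# Share of service points that started with a first serve in.
first_in_pct = stats["w_1stIn"] / stats["w_svpt"] * 100

# Share of first-serve points won.
first_won_pct = stats["w_1stWon"] / stats["w_1stIn"] * 100

# Break points saved; matches with no break points faced become NaN
# instead of dividing by zero.
faced = stats["w_bpFaced"].where(stats["w_bpFaced"] > 0)
bp_saved_pct = stats["w_bpSaved"] / faced * 100

print(pd.DataFrame({
    "first_in_pct": first_in_pct.round(2),
    "first_won_pct": first_won_pct.round(2),
    "bp_saved_pct": bp_saved_pct.round(2),
}))
```

The same expressions apply to the `l_` columns for the losing player.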

### 2. Computations ###

First, we count the wins and losses of players grouped by their dominant hand (right, left or unknown).

In [None]:
labels = {
  'R': f"{'Right-handed':<13}",
  'L': f"{'Left-handed':<13}",
  'U': f"{'Unknown':<13}",
}

# For each hand: (matches won, matches lost)
wins = {l: (sum(data['winner_hand']==l), sum(data['loser_hand']==l)) for l in labels.keys()}


for l in list(labels.keys()):
  p(
    f'{labels[l]}'
    f'{{wins: {wins[l][0]:>4}, '
    f'losses: {wins[l][1]:>4} }}'
  )

Next, we compute the percentages: each hand's win rate (wins divided by matches played) and its share of all wins.

In [None]:
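# Extend each (wins, losses) pair with the win rate and the share of all wins.
# Slicing with [:2] keeps the cell safe to re-run.
total_wins = data['winner_hand'].count()
wins = {l: (*wins[l][:2], wins[l][0]/sum(wins[l][:2]), wins[l][0]/total_wins) for l in labels.keys()}

for l in list(labels.keys()):
  p(
    f'{labels[l]}'
    f'{{ wins: {wins[l][0]:>4} '
    f'losses: {wins[l][1]:>4} '
    f'win rate: {round(wins[l][2]*100,2):>5}% '
    f'share of all wins: {round(wins[l][3]*100,2):>5}% '
    f'}}'
  )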
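The same two ratios can also be obtained with pandas aggregation rather than a dictionary of tuples. This is a sketch under assumptions, not the notebook's method: the `matches` frame is a made-up five-row sample, and the column names `win_pct` and `share_of_wins_pct` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical sample: one row per match, with the hand of each player.
matches = pd.DataFrame({
    "winner_hand": ["R", "R", "L", "R", "L"],
    "loser_hand":  ["L", "R", "R", "R", "R"],
})

# Count appearances on the winning and the losing side per hand.
summary = pd.DataFrame({
    "wins": matches["winner_hand"].value_counts(),
    "losses": matches["loser_hand"].value_counts(),
}).fillna(0).astype(int)

# Win rate: wins divided by all matches played with that hand.
summary["win_pct"] = (
    summary["wins"] / (summary["wins"] + summary["losses"]) * 100
).round(2)

# Share of all wins that went to players with that hand.
summary["share_of_wins_pct"] = (
    summary["wins"] / summary["wins"].sum() * 100
).round(2)

print(summary)
```

Because the two `value_counts` results are aligned on the hand label, a hand that appears only on one side still gets a row, with the missing count filled as 0.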

In [None]:
# Count matches for every (winner hand, loser hand) pairing, ignoring unknown hands.
labels.pop('U', None)
perms = list(product(labels.keys(), repeat=2))  # ordered pairs: R-R, R-L, L-R, L-L

matchups = {
  f'{w}-{l}': data[(data['winner_hand'] == w) & (data['loser_hand'] == l)].shape[0]
  for w, l in perms
}

for m in matchups.keys():
  p(f'{m}: {matchups[m]}')
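The matchup counts form a small contingency table, so `pd.crosstab` is a possible alternative to filtering once per pairing. The sketch below assumes a made-up `matches` sample; rows are the winner's hand and columns the loser's hand.

```python
import pandas as pd

# Hypothetical sample, including one match with an unknown hand ("U").
matches = pd.DataFrame({
    "winner_hand": ["R", "R", "L", "R", "L", "U"],
    "loser_hand":  ["L", "R", "R", "R", "R", "R"],
})

# Keep only matches where both hands are known.
hands = ["R", "L"]
known = matches[
    matches["winner_hand"].isin(hands) & matches["loser_hand"].isin(hands)
]

# Rows: winner's hand, columns: loser's hand, cells: number of matches.
table = pd.crosstab(known["winner_hand"], known["loser_hand"])

print(table)
```

Reading a single pairing then becomes a label lookup, for example `table.loc["L", "R"]` for left-handed winners against right-handed losers.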



### Python requirements (struck-through items are solved): ###

- accessing data with loc and iloc<br>
- processing of data sets with merge / join<br>
- graphical representation of the data with the matplotlib package<br>
- using scikit-learn package (clustering, logistic regression)<br>
- using statsmodels package (multiple regression).<br>
<br>
- ~~using lists and dictionaries, including their specific methods~~<br>
- ~~using sets and tuples, including their specific methods~~<br>
- ~~defining and calling some functions~~<br>
- ~~using conditional structures~~<br>
- ~~using repetitive structures~~<br>
- ~~importing a csv or json file into the pandas package~~<br>
- ~~modifying data in the pandas package~~<br>
- ~~using group functions~~<br>
- ~~dealing with missing values~~<br>
- ~~deleting columns and records~~<br>
- ~~statistical processing, grouping and aggregation of data in the pandas package~~<br>
