### Tennis ###
By:<br>
\-> Vlad Liviu Alexandru <br>
\-> Sandu Victor Iulian<br>

_**Run [Setup](#setup) before anything else!**_<br>


Data sources: [Jeff Sackmann (GitHub)](https://github.com/JeffSackmann/tennis_atp)<br>
Variable descriptions: [Jeff Sackmann (GitHub)](https://github.com/JeffSackmann/tennis_atp/blob/master/matches_data_dictionary.txt)

### Setup ###

In [None]:
# ========== CONFIGURATION =========

# If set to False, hides large outputs such as tables.
# Default: False
ShowBigPrints = True


# ============= Imports ============
import pandas as pd
import numpy as np
import dateutil as du
import re
from itertools import permutations
from itertools import product

# ======== Global variables ========
class Paths:
  atp_matches_2020 = "./data/atp_matches_2020.csv"  # forward slashes work on Windows, Linux and macOS

data = pd.read_csv(Paths.atp_matches_2020)

# ========== Custom logic ==========
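def cfp(f, *a): # custom function print: calls f(*a) only if ShowBigPrints is True
  if ShowBigPrints:
    f(*a)
  else:
    print("> Big print hidden.")

def cp(*a): # custom print: print() gated by ShowBigPrints
  cfp(print, *a)

def p(*a): # plain print, always shown
  print(*a)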

### 1. Data processing ###

In [None]:
# Convert the YYYYMMDD integer to a datetime
data['tourney_date'] = pd.to_datetime(data['tourney_date'].astype(str), format='%Y%m%d')

# Create unique id
if data.columns[0] != 'Id':
  data.insert(0, 'Id', data['tourney_id'].astype(str) + '-' + data['match_num'].astype(str))

# Drop the seed and entry columns, which are not needed
data.drop(columns=[c for c in data.columns if re.search('(seed|entry)', c)], inplace=True)
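
The date conversion and id construction above can be tried on a small frame first. The following is a minimal sketch, assuming `tourney_date` is stored as a `YYYYMMDD` integer as described in the data dictionary; the `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy rows mimicking the relevant columns (values are hypothetical).
toy = pd.DataFrame({
    "tourney_id": ["2020-0001", "2020-0001"],
    "match_num": [300, 299],
    "tourney_date": [20200106, 20200106],
})

# Parse the YYYYMMDD integers with an explicit format.
toy["tourney_date"] = pd.to_datetime(toy["tourney_date"].astype(str), format="%Y%m%d")

# Build the match id from tournament id and match number, then check uniqueness.
toy.insert(0, "Id", toy["tourney_id"] + "-" + toy["match_num"].astype(str))
ids_are_unique = toy["Id"].is_unique
print(toy)
print("Ids unique:", ids_are_unique)
```

Checking `is_unique` on the real `data['Id']` column is a cheap way to confirm that the combination of `tourney_id` and `match_num` really identifies a match.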

In [None]:
# Initial Dataset
p(f"{'Dataset:':<13} atp_matches_2020")
p(f"{'Variables:':<13} {data.shape[1]}")
p(f"{'Observations:':<13} {data.shape[0]}")
cfp(display, data)

### 2. Computations ###

First, we count the wins and losses of players grouped by their dominant hand.

In [None]:
# Display labels, padded to a fixed width
labels = {
  'R': f"{'Right-handed':<13}",
  'L': f"{'Left-handed':<13}",
  'U': f"{'Unknown':<13}",
}

wins = {l: (sum(data['winner_hand']==l), sum(data['loser_hand']==l)) for l in labels.keys()}


for l in labels:
  p(
    f'{labels[l]}'
    f'{{wins: {wins[l][0]:>4}, '
    f'losses: {wins[l][1]:>4} }}'
  )
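
The same counts can also be obtained with `value_counts`, which avoids one pass over the data per hand. This is only a sketch of an alternative, not the approach used in this notebook; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Toy data: hand of the winner and of the loser for four made-up matches.
toy = pd.DataFrame({
    "winner_hand": ["R", "R", "L", "R"],
    "loser_hand":  ["L", "R", "R", "U"],
})

# Count appearances per hand on each side and align them in one table.
# Hands missing on one side become NaN, so fill them with 0.
counts = pd.DataFrame({
    "wins": toy["winner_hand"].value_counts(),
    "losses": toy["loser_hand"].value_counts(),
}).fillna(0).astype(int)

print(counts)
```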

Next, we compute the actual percentages: each hand's win rate (wins divided by matches played) and its share of all match wins.

In [None]:
# Each entry becomes (wins, losses, win rate, share of all match wins).
# Slicing with [:2] keeps the cell safe to re-run.
wins = {l: (*wins[l][:2], wins[l][0]/sum(wins[l][:2]), wins[l][0]/data['winner_hand'].count()) for l in labels.keys()}

for l in labels:
  p(
    f'{labels[l]}'
    f'{{ wins: {wins[l][0]:>4} '
    f'losses: {wins[l][1]:>4} '
    f'win rate: {round(wins[l][2]*100,2):>5}% '
    f'share of total: {round(wins[l][3]*100,2):>5}% '
    f'}}'
  )
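
To make the two percentages concrete, here is a small worked example. The helper `win_rates` and the counts passed to it are hypothetical and only illustrate the arithmetic; the guard against empty groups is an addition that the cell above does not have.

```python
def win_rates(wins, losses, total_matches):
    """Return (win rate, share of all match wins) for one group."""
    played = wins + losses
    # Win rate: wins divided by matches played by this group.
    win_rate = wins / played if played else float("nan")
    # Share: wins of this group divided by the number of matches overall.
    share = wins / total_matches if total_matches else float("nan")
    return win_rate, share

# Hypothetical counts, not taken from the dataset.
rate, share = win_rates(wins=30, losses=20, total_matches=100)
print(f"win rate: {rate:.2%}  share of total: {share:.2%}")
```

The win rate answers "how often does this group win when it plays", while the share answers "what fraction of all matches was won by this group", so a large group can have a high share even with an average win rate.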

In [None]:
# Count matches for every (winner hand, loser hand) pair, ignoring unknown hands
labels.pop('U', None)
pairs = list(product(labels.keys(), repeat=2))

matchups = {
  f'{winner}-{loser}':
  data[(data['winner_hand'] == winner) & (data['loser_hand'] == loser)].shape[0]
  for winner, loser in pairs
}

for m in matchups:
  p(f'{m}: {matchups[m]}')
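
The matchup counts can also be laid out as a winner-by-loser table with `pd.crosstab`. This is a sketch of an alternative under the same assumption that unknown hands are excluded; the `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Toy data: six made-up matches, one with an unknown loser hand.
toy = pd.DataFrame({
    "winner_hand": ["R", "R", "L", "R", "L", "R"],
    "loser_hand":  ["L", "R", "R", "U", "L", "L"],
})

# Keep only matches where both hands are known, then cross-tabulate.
known_hands = ["R", "L"]
known = toy[toy["winner_hand"].isin(known_hands) & toy["loser_hand"].isin(known_hands)]
matchup_table = pd.crosstab(known["winner_hand"], known["loser_hand"])

print(matchup_table)
```

Rows are the winner's hand and columns the loser's hand, so `matchup_table.loc["L", "R"]` is the number of matches a left-hander won against a right-hander.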



### Python Requirements solved: ###

- accessing data with loc and iloc<br>
- processing of data sets with merge / join<br>
- graphical representation of the data with the matplotlib package<br>
- using scikit-learn package (clustering, logistic regression)<br>
- using statsmodels package (multiple regression).<br>
<br>
- ~~using lists and dictionaries, including their specific methods~~<br>
- ~~using sets and tuples, including their specific methods~~<br>
- ~~defining and calling some functions~~<br>
- ~~using conditional structures~~<br>
- ~~using repetitive structures~~<br>
- ~~importing a csv or json file into the pandas package~~<br>
- ~~modifying data in the pandas package~~<br>
- ~~using group functions~~<br>
- ~~dealing with missing values~~<br>
- ~~deleting columns and records~~<br>
- ~~statistical processing, grouping and aggregation of data in the pandas package~~<br>
