# Tennis data processing

## Collected data

* tournament name
* year in which the tournament was played
* names of the players in each match
* surface on which the tournament was played

## Analysis

Using this data I will analyze the number of games played per match over the years, as a function of the surface. The analysis can also be narrowed down to a single tournament or player. We can likewise focus only on the matches that a given player won or lost.

## Hypothesis

* in the past, players played fewer games per match on average than they do today, because the surfaces were faster

## Sources

All data was collected from the [ATP World Tour](http://www.atpworldtour.com/en) website. It covers everything from 1968 through the official part of the 2015 season.

In [3]:
# Render our plots inline
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot') # Make the graphs a bit prettier (replaces the removed pandas option 'display.mpl_style')
plt.rcParams['figure.figsize'] = (15, 5)

Here we load the files podatki (match data), igralci (players) and turnirji (tournaments). Note: the path below may not be correct on your machine.

In [4]:
podatki = pd.read_csv('C:/Users/Miha/pandas-cookbook/cookbook/projekt/podatki.csv')

In [5]:
igralci = pd.read_csv('C:/Users/Miha/pandas-cookbook/cookbook/projekt/igralci.csv')

In [6]:
turnirji = pd.read_csv('C:/Users/Miha/pandas-cookbook/cookbook/projekt/turnirji.csv')
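
Since the hard-coded paths above only work on one machine, a small loader with a configurable folder may be easier to reuse. This is a minimal sketch: `DATA_DIR` and `load_table` are hypothetical names introduced here, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd

# Hypothetical base folder; replace it with wherever the three CSV files live.
DATA_DIR = Path("projekt")


def load_table(name, data_dir=DATA_DIR):
    """Read <data_dir>/<name>.csv and fail with a clear message if it is missing."""
    path = Path(data_dir) / f"{name}.csv"
    if not path.exists():
        raise FileNotFoundError(f"Expected data file not found: {path}")
    return pd.read_csv(path)

# Intended use (not executed here, because it needs the real files):
# podatki = load_table("podatki")
# igralci = load_table("igralci")
# turnirji = load_table("turnirji")
```

With this, moving the project only requires changing `DATA_DIR` in one place.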

The podatki table contains the tournament index (generated by the program, with no meaning of its own), the year of the tournament, the surface, st_iger (the number of games in the match), and the indices of the winner and the loser. For those unfamiliar with tennis: a match is played to 2 or 3 sets won, and each set is played to 6 games won.
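
To make the meaning of st_iger concrete, the sketch below adds up the games in a match score. The score format (`"6-4 7-6(5)"`) and the helper `count_games` are assumptions for illustration; the notebook does not show how st_iger was actually computed.

```python
import re


def count_games(score):
    """Sum the games in a score string such as "6-4 7-6(5) 6-3"."""
    total = 0
    for set_score in score.split():
        # Drop an optional tie-break annotation like "(5)".
        set_score = re.sub(r"\(\d+\)", "", set_score)
        won, lost = set_score.split("-")
        total += int(won) + int(lost)
    return total


print(count_games("6-4 6-4"))                # 20
print(count_games("7-6(5) 6-7(3) 6-4 6-3"))  # 45
```

A straight-sets 6-4 6-4 win therefore counts as 20 games, while a four-set match with two tie-breaks reaches 45.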

In [12]:
podatki

Unnamed: 0,turnir,leto,podlaga,st_iger,zmagovalec,porazenec
0,0,1993,Hard,18,0,1
1,0,1993,Hard,29,1,2
2,0,1993,Hard,19,0,3
3,0,1993,Hard,20,1,4
4,0,1993,Hard,20,0,5
5,0,1993,Hard,32,3,6
6,0,1993,Hard,19,2,7
7,0,1993,Hard,15,1,8
8,0,1993,Hard,16,0,9
9,0,1993,Hard,19,4,10


Below I construct 3 additional tables. Each one adds a name: of the tournament, of the winner, and of the loser.

In [32]:
tabela = podatki.merge(turnirji,left_on="turnir", right_on="id")

In [55]:
tabela1 = tabela.merge(igralci,left_on="zmagovalec", right_on="id")

In [56]:
tabela2 = tabela1.merge(igralci,left_on="porazenec", right_on="id")
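
After these three merges the name columns end up as ime_x (tournament), ime_y (winner) and ime (loser), which is easy to mix up. One possible alternative, sketched here on made-up toy tables, is to rename the lookup columns before merging. The column names `ime_turnirja`, `ime_zmagovalca` and `ime_porazenca` and the helper `with_names` are hypothetical and do not appear in the original notebook.

```python
import pandas as pd

# Toy stand-ins for the real tables; the values are made up for illustration.
podatki = pd.DataFrame({
    "turnir": [0, 0],
    "leto": [1993, 1993],
    "st_iger": [18, 29],
    "zmagovalec": [0, 1],
    "porazenec": [1, 2],
})
turnirji = pd.DataFrame({"id": [0], "ime": ["example-open"]})
igralci = pd.DataFrame({"id": [0, 1, 2], "ime": ["Player A", "Player B", "Player C"]})


def with_names(matches, tournaments, players):
    """Attach tournament, winner and loser names under descriptive column names."""
    t = tournaments.rename(columns={"id": "turnir", "ime": "ime_turnirja"})
    w = players.rename(columns={"id": "zmagovalec", "ime": "ime_zmagovalca"})
    l = players.rename(columns={"id": "porazenec", "ime": "ime_porazenca"})
    return matches.merge(t, on="turnir").merge(w, on="zmagovalec").merge(l, on="porazenec")


named = with_names(podatki, turnirji, igralci)
print(named[["st_iger", "ime_turnirja", "ime_zmagovalca", "ime_porazenca"]])
```

Because the key columns share a name on both sides, no duplicate id_x / id_y columns are produced.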

An example showing how many games Djokovic needed in each singles match at Wimbledon in 2015 on his way to winning the tournament.

In [59]:
tabela2[(tabela2.turnir == 254) & (tabela2.leto == 2015) & (tabela2.ime_y == "Novak Djokovic")]

Unnamed: 0,turnir,leto,podlaga,st_iger,zmagovalec,porazenec,id_x,ime_x,id_y,ime_y,id,ime
51393,254,2015,Grass,19,735,673,254,wimbledon,735,Novak Djokovic,673,Roger Federer
53189,254,2015,Grass,28,735,726,254,wimbledon,735,Novak Djokovic,726,Philipp Kohlschreiber
63238,254,2015,Grass,31,735,306,254,wimbledon,735,Novak Djokovic,306,Jarkko Nieminen
68835,254,2015,Grass,20,735,162,254,wimbledon,735,Novak Djokovic,162,Richard Gasquet
73358,254,2015,Grass,20,735,314,254,wimbledon,735,Novak Djokovic,314,Bernard Tomic
79817,254,2015,Grass,29,735,287,254,wimbledon,735,Novak Djokovic,287,Kevin Anderson
83579,254,2015,Grass,30,735,1891,254,wimbledon,735,Novak Djokovic,1891,Marin Cilic


In [61]:
podatki["desetletje"] = 10 * (podatki["leto"]//10)

In [66]:
podatki.groupby("desetletje").mean()

desetletje,turnir,leto,st_iger,zmagovalec,porazenec
1960,151.047714,1968.651093,29.305666,1075.592445,1626.182406
1970,142.303085,1974.820336,23.942969,853.293947,1011.729306
1980,142.722298,1984.353472,24.239878,886.824051,987.176433
1990,143.701248,1994.255802,22.249144,546.000913,638.100624
2000,143.72997,2004.886021,21.700511,553.91066,664.99032
2010,145.013051,2012.467746,21.134346,747.495841,945.642192
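
Only the st_iger column of this table is meaningful; the means of turnir, zmagovalec and porazenec are averages of arbitrary IDs. A minimal sketch on made-up rows shows how the aggregation could be restricted to st_iger while also reporting how many matches stand behind each mean. The variable name `povzetek` is introduced here for illustration. Selecting the column explicitly also sidesteps the text column podlaga, which recent pandas versions may refuse to average.

```python
import pandas as pd

# Made-up sample; only the column names follow the notebook.
podatki = pd.DataFrame({
    "leto": [1968, 1969, 1975, 1993, 1998, 2015],
    "st_iger": [30, 28, 24, 18, 22, 20],
})

# Floor each year to the start of its decade, e.g. 1993 -> 1990.
podatki["desetletje"] = 10 * (podatki["leto"] // 10)

# Average only the games column and count the matches behind each mean.
povzetek = podatki.groupby("desetletje")["st_iger"].agg(["mean", "count"])
print(povzetek)
```

The count column helps judge how much weight a decade's mean deserves, for example when a decade contains only a few seasons.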


The data shows that the average number of games per match has decreased over the years, which contradicts our assumption. Let us look at this more closely for an individual surface.

In [67]:
podatki[podatki.podlaga == "Grass"].groupby("desetletje").mean()

desetletje,turnir,leto,st_iger,zmagovalec,porazenec
1960,193.083333,1968.567073,31.319106,1015.857724,1535.719512
1970,180.221467,1973.996603,29.350883,868.811481,1137.779891
1980,185.613884,1984.506718,28.572937,825.87652,907.230326
1990,202.081081,1994.468955,24.326516,622.424763,673.392622
2000,202.170242,2004.912457,22.855017,671.10519,765.390657
2010,204.473829,2012.439984,22.427784,816.794569,986.681621
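
Instead of filtering one surface at a time, all surfaces can be compared in a single table. The sketch below uses `pivot_table` on made-up rows; the variable name `po_podlagi` is hypothetical.

```python
import pandas as pd

# Made-up sample; only the column names follow the notebook.
podatki = pd.DataFrame({
    "leto": [1975, 1978, 1976, 2012, 2014, 2013],
    "podlaga": ["Grass", "Grass", "Hard", "Grass", "Hard", "Hard"],
    "st_iger": [30, 28, 24, 22, 20, 22],
})
podatki["desetletje"] = 10 * (podatki["leto"] // 10)

# One row per decade, one column per surface, mean games per match in each cell.
po_podlagi = podatki.pivot_table(
    index="desetletje", columns="podlaga", values="st_iger", aggfunc="mean"
)
print(po_podlagi)
```

Calling `po_podlagi.plot()` on such a table should draw one line per surface, which would make the trend across decades easier to read than separate tables.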


Now let us look at the same thing for an individual player.

In [74]:
tabela2[(tabela2.ime_y == "Novak Djokovic") | (tabela2.ime == "Novak Djokovic")].groupby("leto").mean()

leto,turnir,st_iger,zmagovalec,porazenec,id_x,id_y,id,desetletje
2004,142.5,16.5,157.5,735.0,142.5,157.5,735.0,2000
2005,199.75,21.0,568.916667,535.5,199.75,568.916667,535.5,2000
2006,179.35,22.8,632.075,836.925,179.35,632.075,836.925,2000
2007,158.09589,23.164384,665.835616,515.575342,158.09589,665.835616,515.575342,2000
2008,134.328125,23.421875,676.71875,639.03125,134.328125,676.71875,639.03125,2000
2009,132.25,21.336957,706.445652,645.619565,132.25,706.445652,645.619565,2000
2010,141.9,24.157143,697.842857,588.871429,141.9,697.842857,588.871429,2010
2011,139.96,22.36,721.986667,514.413333,139.96,721.986667,514.413333,2010
2012,149.723684,22.736842,686.157895,542.552632,149.723684,686.157895,542.552632,2010
2013,151.943662,22.71831,675.239437,552.056338,151.943662,675.239437,552.056338,2010


In [75]:
tabela2[(tabela2.ime_y == "Novak Djokovic")].groupby("leto").mean()

leto,turnir,st_iger,zmagovalec,porazenec,id_x,id_y,id,desetletje
2005,226.2,19.4,735,256.2,226.2,735,256.2,2000
2006,188.04,23.0,735,898.08,188.04,735,898.08,2000
2007,159.237288,22.542373,735,463.508475,159.237288,735,463.508475,2000
2008,129.54902,24.176471,735,614.568627,129.54902,735,614.568627,2000
2009,127.868421,21.789474,735,626.802632,127.868421,735,626.802632,2000
2010,140.166667,24.333333,735,545.574074,140.166667,735,545.574074,2010
2011,141.366197,22.464789,735,501.985915,141.366197,735,501.985915,2010
2012,148.166667,22.575758,735,513.393939,148.166667,735,513.393939,2010
2013,148.596774,22.290323,735,525.5,148.596774,735,525.5,2010
2014,155.816667,22.266667,735,617.016667,155.816667,735,617.016667,2010


In [76]:
tabela2[(tabela2.ime == "Novak Djokovic")].groupby("leto").mean()

leto,turnir,st_iger,zmagovalec,porazenec,id_x,id_y,id,desetletje
2004,142.5,16.5,157.5,735,142.5,157.5,735,2000
2005,180.857143,22.142857,450.285714,735,180.857143,450.285714,735,2000
2006,164.866667,22.466667,460.533333,735,164.866667,460.533333,735,2000
2007,153.285714,25.785714,374.357143,735,153.285714,374.357143,735,2000
2008,153.076923,20.461538,448.076923,735,153.076923,448.076923,735,2000
2009,153.0625,19.1875,570.8125,735,153.0625,570.8125,735,2000
2010,147.75,23.5625,572.4375,735,147.75,572.4375,735,2010
2011,115.0,20.5,491.0,735,115.0,491.0,735,2010
2012,160.0,23.8,363.8,735,160.0,363.8,735,2010
2013,175.0,25.666667,263.555556,735,175.0,263.555556,735,2010


From these tables we can see that at the start of his career Djokovic lost in fewer games than he won. In recent years, however, he has needed fewer games to win than to lose. Because he now plays against better players, all the numbers have gone up.
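
Comparing wins and losses is easier when both averages sit side by side in one table. A minimal sketch on made-up matches follows; the column names `ime_zmagovalca` / `ime_porazenca`, the output columns `zmage` (wins) / `porazi` (losses) and the helper `games_by_outcome` are hypothetical. In the notebook's tabela2 the corresponding name columns are ime_y and ime.

```python
import pandas as pd

# Made-up matches; "Player A" stands in for the player of interest.
dvoboji = pd.DataFrame({
    "leto": [2005, 2005, 2005, 2006, 2006],
    "st_iger": [20, 24, 18, 22, 26],
    "ime_zmagovalca": ["Player A", "Player A", "Player B", "Player A", "Player C"],
    "ime_porazenca": ["Player B", "Player C", "Player A", "Player B", "Player A"],
})


def games_by_outcome(matches, player):
    """Mean games per year for one player, split into wins and losses."""
    wins = matches[matches["ime_zmagovalca"] == player].groupby("leto")["st_iger"].mean()
    losses = matches[matches["ime_porazenca"] == player].groupby("leto")["st_iger"].mean()
    # Building the frame from two Series aligns them on the year index.
    return pd.DataFrame({"zmage": wins, "porazi": losses})


print(games_by_outcome(dvoboji, "Player A"))
```

A year in which the player has only wins or only losses would show NaN in the other column rather than being dropped.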

The data contradicts our hypothesis that fewer games were played in the past because of faster surfaces. This can probably be attributed to the fact that faster surfaces made it easier for players to hold serve, so sets took longer to decide and matches contained more games.