# 5. Value Betting Calculator

In this notebook we'll use our match dataset and the machine learning model we built in the previous stage to predict each player's chance of winning any given match.

We'll then compare these chances with the odds available for those matches, and build a DataFrame in which each row shows which player, if any, offers value to bet on!

Let's import the libraries needed for this notebook:

In [1]:
import pandas as pd
import pickle

Now we'll load the model trained in the last notebook:

In [2]:
# Use a context manager so the file handle is closed after loading
with open("logistic_regression.lr", "rb") as model_file:
    model = pickle.load(model_file)

Now let's open the coefficients DataFrame saved in the last notebook and build a dictionary from it, where each key is an attribute name and each value is that attribute's coefficient:

In [3]:
coefs_df = pd.read_csv("csv/Coefficients.csv")

In [4]:
coefs_df.head()

Unnamed: 0,Column,Coef
0,Indoor,0.005383
1,Outdoor,0.032588
2,Carpet,0.026138
3,Clay,0.003488
4,Grass,-0.023455


In [5]:
# Map each attribute name to its coefficient
coefs_map = dict(zip(coefs_df['Column'], coefs_df['Coef']))

In [6]:
coefs_map

{'Indoor': 0.005383289793538903,
 'Outdoor': 0.03258838314694357,
 'Carpet': 0.026137510216021474,
 'Clay': 0.003487929835943539,
 'Grass': -0.023455443615196872,
 'Hard': 0.031801676503714314,
 'ATP250': 0.015870547750738567,
 'ATP500': 0.023723057429522527,
 'Grand Slam': -0.008729300106400953,
 'Masters 1000': 0.044999375612260695,
 'Masters Cup': -0.037892007745636565,
 '1st Round': -0.005793480204798979,
 '2nd Round': 0.017924331283681982,
 '3rd Round': -0.0734432367701008,
 '4th Round': 0.02242108081018733,
 'Quarterfinals': -0.026571046639702316,
 'Round Robin': 0.09054984105651548,
 'Semifinals': -0.007501679059091242,
 'The Final': 0.02038586246379168,
 'Rank Index': 0.3709012514680601,
 'Pl0 Recent Form': -0.29802038918091056,
 'Pl0 Form': 0.30286310724108145,
 'Pl1 Recent Form': 0.2979275608951469,
 'Pl1 Form': -0.5040871652507205,
 'Pl0 Perf. vs Similar Opponent': -2.453412915700585,
 'Pl1 Perf. vs Similar Opponent': 2.236855956851412,
 'Pl0 Surface Performance': -3.0225757

Let's take a quick look at the strongest indicators that player 1 will win a match against player 0.
To do that, we keep only the attributes whose coefficient has an absolute value above 0.1:

In [8]:
coefs_df[abs(coefs_df['Coef']) > 0.1].sort_values('Coef', ascending = False)

Unnamed: 0,Column,Coef
27,Pl1 Surface Performance,3.390991
25,Pl1 Perf. vs Similar Opponent,2.236856
19,Rank Index,0.370901
21,Pl0 Form,0.302863
22,Pl1 Recent Form,0.297928
31,Reliability Pl1,-0.102739
20,Pl0 Recent Form,-0.29802
23,Pl1 Form,-0.504087
24,Pl0 Perf. vs Similar Opponent,-2.453413
26,Pl0 Surface Performance,-3.022576


We can see that the most important features are:
- Player performance on the specific surface
- Player performance against similarly ranked opponents
- The opponent's form
- The rank index
- The players' recent form
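
As a rough illustration of the ranking above, here is a minimal sketch that orders a handful of the coefficients by absolute magnitude using plain Python. The `coefs_subset` dictionary and the `rank_by_magnitude` helper are hypothetical names introduced for this example, and the values are copied (rounded) from the table above. Comparing magnitudes like this is only meaningful if the features are on comparable scales.

```python
# Subset of the coefficients shown in the table above (rounded values).
coefs_subset = {
    "Pl1 Surface Performance": 3.390991,
    "Pl0 Surface Performance": -3.022576,
    "Pl0 Perf. vs Similar Opponent": -2.453413,
    "Pl1 Perf. vs Similar Opponent": 2.236856,
    "Pl1 Form": -0.504087,
    "Rank Index": 0.370901,
    "Grass": -0.023455,
}


def rank_by_magnitude(coefs, threshold=0.1):
    """Return (name, coef) pairs with |coef| > threshold, largest magnitude first."""
    kept = [(name, coef) for name, coef in coefs.items() if abs(coef) > threshold]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)


ranked = rank_by_magnitude(coefs_subset)
for name, coef in ranked:
    print(f"{name:32s} {coef:+.6f}")
```

The sign tells us which player the attribute favours (positive favours player 1), while the magnitude tells us how strongly the attribute moves the prediction.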

Let's load our matches dataset:

In [9]:
df = pd.read_csv("csv/FeatureCalculated_Data.csv")

Now our goal is to build another DataFrame containing information about the players, who won each match, and the average and maximum odds available for each player. We'll then use our loaded model to calculate each player's predicted chance of winning:

In [10]:
betting_data_df = df[(df["Avg_Pl0"] > 1) & (df["Avg_Pl1"] > 1)].copy()

In [11]:
inputs = betting_data_df.iloc[:, 14:]

In [12]:
predicted_outputs = model.predict_proba(inputs)
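
To give an intuition for what `predict_proba` returns, here is a minimal sketch of how a binary logistic regression turns a weighted sum of features into two probabilities. It assumes the loaded model is a plain binary logistic regression; the `predict_proba_binary` helper, the feature values, and the zero intercept are hypothetical and only serve the illustration (the real model uses all its coefficients and its own intercept).

```python
import math


def predict_proba_binary(features, coefs, intercept=0.0):
    """Return (P(class 0), P(class 1)) for a binary logistic regression."""
    # Linear score: intercept plus the coefficient-weighted feature values
    z = intercept + sum(coefs[name] * value for name, value in features.items())
    # The sigmoid maps the score to the probability of class 1 (Player 1 wins)
    prob_pl1 = 1.0 / (1.0 + math.exp(-z))
    return 1.0 - prob_pl1, prob_pl1


# Three coefficients taken from the table above; everything else is left out
example_coefs = {
    "Rank Index": 0.370901,
    "Pl0 Surface Performance": -3.022576,
    "Pl1 Surface Performance": 3.390991,
}
# Hypothetical feature values for a single match
example_features = {
    "Rank Index": 0.2,
    "Pl0 Surface Performance": 0.46,
    "Pl1 Surface Performance": 0.39,
}

prob_pl0, prob_pl1 = predict_proba_binary(example_features, example_coefs)
print(prob_pl0, prob_pl1)
```

The two values always sum to 1, which is why the notebook can read Player 0's chance from the first column and Player 1's chance from the second.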

In [13]:
# Column 0 is Player 0's win probability, column 1 is Player 1's
betting_data_df['Pl0 %'] = predicted_outputs[:, 0]
betting_data_df['Pl1 %'] = predicted_outputs[:, 1]

In [14]:
betting_data_df.head()

Unnamed: 0,Date,Player 0,Player 1,Won,Pl0_Rank,Pl1_Rank,Avg_Pl0,Avg_Pl1,Max_Pl1,Max_Pl0,...,Pl0 Perf. vs Similar Opponent,Pl1 Perf. vs Similar Opponent,Pl0 Surface Performance,Pl1 Surface Performance,H2H Index,Exp Index,Reliability Pl0,Reliability Pl1,Pl0 %,Pl1 %
29521,2010-04-19,De Bakker T.,Falla A.,0.0,67,58,1.52,2.46,2.79,1.62,...,0.5,0.39,0.46,0.39,0.0,-0.526316,0.651163,0.665272,0.618568,0.381432
29522,2010-04-19,Starace P.,Hajek J.,1.0,61,86,1.34,3.12,3.64,1.4,...,0.52,0.4,0.52,0.39,0.23438,0.74902,0.727273,0.795699,0.678245,0.321755
29523,2010-04-19,Schwank E.,Fognini F.,0.0,59,74,1.99,1.77,1.85,2.14,...,0.46,0.6,0.45,0.58,-0.23438,-0.278107,0.696629,0.698541,0.324943,0.675057
29524,2010-04-19,Rochus C.,Garcia-Lopez G.,1.0,122,42,3.02,1.36,1.45,3.25,...,0.36,0.59,0.43,0.5,0.0,0.137681,0.73,0.6496,0.326772,0.673228
29525,2010-04-19,Bellucci T.,Nieminen J.,0.0,33,64,1.46,2.63,3.09,1.5,...,0.57,0.46,0.53,0.53,-0.23438,-0.712,0.70765,0.68612,0.51606,0.48394


Let's define a helper function that takes a chance (a probability between 0 and 1) as input and returns the 'fair' decimal odds for that chance value.

In [15]:
def right_odds(chance):
    return round(1/chance,2)
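
As a quick worked example, here is the same helper applied to the model probabilities of the first match shown earlier (De Bakker T. vs Falla A.). The variable names `fair_pl0` and `fair_pl1` are only used for this illustration.

```python
def right_odds(chance):
    # Fair decimal odds are the reciprocal of the win probability
    return round(1 / chance, 2)


# Model probabilities for De Bakker T. (Pl0) and Falla A. (Pl1)
fair_pl0 = right_odds(0.618568)
fair_pl1 = right_odds(0.381432)
print(fair_pl0, fair_pl1)  # 1.62 2.62
```

A player the model gives roughly a 62% chance is therefore fairly priced at about 1.62, so any bookmaker price below that offers no value according to the model.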

Now we have everything we need to decide, for each match, which player offers the most value to bet on!
Next, we're going to append another 4 columns to this betting DataFrame:

the fair odds for both Player 0 and Player 1, based on each player's chance of winning according to the model, and the value we would get by betting on each player at the average odds available.

We use the average odds, rather than the maximum odds, to determine value in order to get a more realistic outcome.

In [16]:
for ix, row in betting_data_df.iterrows():
    right_odds_p0 = right_odds(row['Pl0 %'])
    right_odds_p1 = right_odds(row['Pl1 %'])
    bet_value_p0 = (row['Avg_Pl0'] - right_odds_p0)/max(right_odds_p0,row['Avg_Pl0'])
    bet_value_p1 = (row['Avg_Pl1'] - right_odds_p1)/max(right_odds_p1,row['Avg_Pl1'])
    betting_data_df.loc[ix, 'Fair odds for Pl0'] = right_odds_p0
    betting_data_df.loc[ix, 'Fair odds for Pl1'] = right_odds_p1
    betting_data_df.loc[ix, 'Bet on Pl0 Value'] = bet_value_p0
    betting_data_df.loc[ix, 'Bet on Pl1 Value'] = bet_value_p1
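
To make the value formula from the cell above concrete, here is a small sketch that applies it to two odds pairs taken from the data. The `bet_value` helper is a hypothetical name that simply wraps the expression used in the loop.

```python
def bet_value(avg_odds, fair_odds):
    # Difference between offered and fair odds, scaled by the larger of the two
    return (avg_odds - fair_odds) / max(fair_odds, avg_odds)


# De Bakker T. (Pl0): average odds 1.52, fair odds 1.62 -> no value
value_pl0 = bet_value(1.52, 1.62)
# Fognini F. (Pl1): average odds 1.77, fair odds 1.48 -> positive value
value_pl1 = bet_value(1.77, 1.48)
print(round(value_pl0, 6), round(value_pl1, 6))  # -0.061728 0.163842
```

The value is positive only when the average odds are higher than the fair odds, and because the difference is divided by the larger of the two odds, it always stays between -1 and 1.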

In [17]:
betting_data_df.head()

Unnamed: 0,Date,Player 0,Player 1,Won,Pl0_Rank,Pl1_Rank,Avg_Pl0,Avg_Pl1,Max_Pl1,Max_Pl0,...,H2H Index,Exp Index,Reliability Pl0,Reliability Pl1,Pl0 %,Pl1 %,Fair odds for Pl0,Fair odds for Pl1,Bet on Pl0 Value,Bet on Pl1 Value
29521,2010-04-19,De Bakker T.,Falla A.,0.0,67,58,1.52,2.46,2.79,1.62,...,0.0,-0.526316,0.651163,0.665272,0.618568,0.381432,1.62,2.62,-0.061728,-0.061069
29522,2010-04-19,Starace P.,Hajek J.,1.0,61,86,1.34,3.12,3.64,1.4,...,0.23438,0.74902,0.727273,0.795699,0.678245,0.321755,1.47,3.11,-0.088435,0.003205
29523,2010-04-19,Schwank E.,Fognini F.,0.0,59,74,1.99,1.77,1.85,2.14,...,-0.23438,-0.278107,0.696629,0.698541,0.324943,0.675057,3.08,1.48,-0.353896,0.163842
29524,2010-04-19,Rochus C.,Garcia-Lopez G.,1.0,122,42,3.02,1.36,1.45,3.25,...,0.0,0.137681,0.73,0.6496,0.326772,0.673228,3.06,1.49,-0.013072,-0.087248
29525,2010-04-19,Bellucci T.,Nieminen J.,0.0,33,64,1.46,2.63,3.09,1.5,...,-0.23438,-0.712,0.70765,0.68612,0.51606,0.48394,1.94,2.07,-0.247423,0.212928


As we can see, there are matches for which betting on neither Player 0 nor Player 1 would be a good choice (Bet on Pl0 Value and Bet on Pl1 Value are both negative).
Since there's no point in betting where we don't think we have an edge, we will remove these records from the DataFrame:

In [18]:
value_betting_data_df = betting_data_df[(betting_data_df["Bet on Pl0 Value"] > 0) | (betting_data_df["Bet on Pl1 Value"] > 0)].copy()

In [19]:
value_betting_data_df.head()

Unnamed: 0,Date,Player 0,Player 1,Won,Pl0_Rank,Pl1_Rank,Avg_Pl0,Avg_Pl1,Max_Pl1,Max_Pl0,...,H2H Index,Exp Index,Reliability Pl0,Reliability Pl1,Pl0 %,Pl1 %,Fair odds for Pl0,Fair odds for Pl1,Bet on Pl0 Value,Bet on Pl1 Value
29522,2010-04-19,Starace P.,Hajek J.,1.0,61,86,1.34,3.12,3.64,1.4,...,0.23438,0.74902,0.727273,0.795699,0.678245,0.321755,1.47,3.11,-0.088435,0.003205
29523,2010-04-19,Schwank E.,Fognini F.,0.0,59,74,1.99,1.77,1.85,2.14,...,-0.23438,-0.278107,0.696629,0.698541,0.324943,0.675057,3.08,1.48,-0.353896,0.163842
29525,2010-04-19,Bellucci T.,Nieminen J.,0.0,33,64,1.46,2.63,3.09,1.5,...,-0.23438,-0.712,0.70765,0.68612,0.51606,0.48394,1.94,2.07,-0.247423,0.212928
29526,2010-04-19,Krajinovic F.,Chela J.I.,1.0,328,53,5.5,1.13,1.18,6.76,...,0.0,-0.991471,0.739496,0.694878,0.28598,0.71402,3.5,1.4,0.363636,-0.192857
29529,2010-04-19,Cuevas P.,Zeballos H.,0.0,54,50,1.97,1.77,1.87,2.15,...,0.0,0.44086,0.669421,0.680751,0.608585,0.391415,1.64,2.55,0.167513,-0.305882
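
One way to turn the two value columns into an explicit decision is sketched below in plain Python. This step is not part of the notebook itself: the `pick_side` helper and the `rows` list are hypothetical, and the values are copied from the rows shown above.

```python
# A few matches with their 'Bet on Pl0 Value' and 'Bet on Pl1 Value' figures
rows = [
    {"match": "De Bakker T. vs Falla A.", "value_pl0": -0.061728, "value_pl1": -0.061069},
    {"match": "Schwank E. vs Fognini F.", "value_pl0": -0.353896, "value_pl1": 0.163842},
    {"match": "Krajinovic F. vs Chela J.I.", "value_pl0": 0.363636, "value_pl1": -0.192857},
]


def pick_side(value_pl0, value_pl1):
    """Return 'Pl0', 'Pl1', or None when neither side has positive value."""
    if max(value_pl0, value_pl1) <= 0:
        return None
    return "Pl0" if value_pl0 >= value_pl1 else "Pl1"


picks = {row["match"]: pick_side(row["value_pl0"], row["value_pl1"]) for row in rows}
for match, side in picks.items():
    print(match, "->", side)
```

Matches where the helper returns `None` correspond to the records dropped by the filter above.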


Let's rename some of the columns to make the DataFrame easier to read:

In [20]:
value_betting_data_df.rename(columns={"Avg_Pl1":"Pl1 Avg odds", "Avg_Pl0":"Pl0 Avg odds", "Max_Pl1":"Pl1 Max odds", 
                                "Max_Pl0":"Pl0 Max odds", "Won":"Winner"},inplace = True)

Let's save this dataframe!

In [21]:
value_betting_data_df.to_csv("csv/Betting_Data.csv", index=False)