# 5. Value Betting Calculator

In this notebook we'll use our match dataset and the Logistic Regression model built in the previous stage to predict each player's chance of winning any given match.

We'll then compare these chances with the odds available for those matches, and build a DataFrame where each row tells us which player we would have to bet on. The aim is to simulate bets on matches where we think we have an edge over the betting market (i.e. a value bet occurs when the odds available on the market for an event are higher than the fair odds "predicted" by our model).
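
To make the value-bet condition concrete, here is a minimal sketch. The function names `is_value_bet` and `expected_profit` and the numbers are illustrative assumptions, not part of the notebook. It relies on the fact that, with decimal odds, the fair odds for a probability `p` are `1 / p`, so odds above that level give a positive expected profit if the model's probability is correct.

```python
def is_value_bet(model_prob, offered_odds):
    """True when the offered decimal odds exceed the model's fair odds (1 / p)."""
    fair_odds = 1 / model_prob
    return offered_odds > fair_odds


def expected_profit(model_prob, offered_odds):
    """Expected profit per unit staked, assuming the model probability is right."""
    return model_prob * offered_odds - 1


# Illustrative numbers: the model gives a player a 39% chance, the market offers 2.79
underdog_is_value = is_value_bet(0.39, 2.79)
# The opponent gets a 61% chance, and the market offers 1.62
favourite_is_value = is_value_bet(0.61, 1.62)
underdog_edge = expected_profit(0.39, 2.79)
```

Both checks are equivalent: `offered_odds > 1 / p` holds exactly when `p * offered_odds - 1 > 0`.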

Finally, we'll compare several betting strategies and find out, by backtesting them, whether it's possible to make a good ROI.

Let's import the libraries needed for this notebook:

In [1]:
import pandas as pd
from math import exp
import pickle

Now, we'll load the Logistic Regression model trained in the last notebook:

In [2]:
# Use a context manager so the file handle is closed after loading
with open("logistic_regression.lr", 'rb') as model_file:
    lreg = pickle.load(model_file)

Let's take a look at the coefficients and the intercept of this model:

In [3]:
lreg.coef_

array([[-1.72268220e-03, -2.85619140e-02,  2.06740250e-02,
         2.79727410e-02,  2.20402586e-01, -2.21364314e-01,
        -1.84420699e-01,  5.15308023e-02,  1.04770000e-01,
         7.34423981e-02, -3.20472172e-01,  0.00000000e+00,
         1.98313157e-01, -3.99269171e-02,  0.00000000e+00,
         1.64570494e-02,  7.87906639e-02,  1.31455934e-01,
        -5.50446504e-02, -1.77932522e-01, -6.98009296e-02,
         9.93604574e-02, -2.14261743e-02,  6.95745371e-02,
        -9.32666760e-02,  0.00000000e+00,  0.00000000e+00,
         9.62507052e-02,  0.00000000e+00,  3.21911800e-02,
         1.41731123e-02,  4.59924553e-02,  1.85087699e-01,
        -9.13550358e-02,  2.38733880e-02, -1.05396245e-01,
        -1.70035371e-02,  0.00000000e+00,  0.00000000e+00,
         6.71084407e-02,  0.00000000e+00,  1.16951460e-01,
         1.03133453e-01, -1.16351771e-01, -1.90647406e-02,
        -1.18149991e-01,  0.00000000e+00,  1.45689500e-01,
        -2.59640817e-02,  9.74332204e-02,  0.00000000e+0

In [4]:
lreg.intercept_

array([-0.00742354])

Now, let's open the Coefficients dataframe saved in the last notebook, and let's create a dictionary object based on the dataframe, where the key is going to be the attribute name and the value will be the attribute's coefficient:

In [5]:
coefs_df = pd.read_csv("csv/Coefficients.csv")

In [6]:
coefs_df.head()

Unnamed: 0,Column,Coef
0,Acapulco,-0.001723
1,Adelaide,-0.028562
2,Amersfoort,0.020674
3,Amsterdam,0.027973
4,Atlanta,0.220403


In [7]:
coefs_map = {}
for ix, row in coefs_df.iterrows():

    attribute = row['Column']
    attribute_coef = row['Coef']

    coefs_map[attribute] = attribute_coef
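
As a side note, the same mapping can be built without an explicit loop. The sketch below uses a small stand-in dataframe (`toy_coefs_df`, a hypothetical name with two rows copied from the output above) rather than the real `Coefficients.csv`.

```python
import pandas as pd

# Stand-in for coefs_df with the same 'Column' and 'Coef' columns
toy_coefs_df = pd.DataFrame({
    "Column": ["Acapulco", "Adelaide"],
    "Coef": [-0.001723, -0.028562],
})

# Pair each attribute name with its coefficient in one pass
toy_coefs_map = dict(zip(toy_coefs_df["Column"], toy_coefs_df["Coef"]))
```

The values come out as NumPy floats rather than plain Python floats, which makes no difference for the arithmetic later in the notebook.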

In [8]:
coefs_map

{'Acapulco': -0.0017226821977985485,
 'Adelaide': -0.02856191401461512,
 'Amersfoort': 0.02067402499223373,
 'Amsterdam': 0.027972741043474005,
 'Atlanta': 0.2204025864308367,
 'Auckland': -0.22136431421214905,
 'Bangkok': -0.1844206986906572,
 'Barcelona': 0.05153080227687189,
 'Basel': 0.10477000046109268,
 'Bastad': 0.07344239808530392,
 'Beijing': -0.32047217231154124,
 'Belgrade': 0.0,
 'Bogota': 0.1983131573710288,
 'Brighton': -0.039926917095466,
 'Brisbane': 0.0,
 'Bucharest': 0.016457049408396842,
 'Buenos Aires': 0.07879066392423603,
 'Casablanca': 0.13145593419830034,
 'Chennai': -0.055044650358682216,
 'Cincinnati': -0.1779325222999295,
 'Copenhagen': -0.06980092959589043,
 'Costa Do Sauipe': 0.09936045739161234,
 'Delray Beach': -0.02142617428605608,
 'Doha': 0.0695745370852727,
 'Dubai': -0.09326667603792788,
 'Dusseldorf': 0.0,
 'Eastbourne': 0.0,
 'Estoril': 0.09625070516752708,
 'Geneva': 0.0,
 'Gstaad': 0.03219117997457862,
 'Halle': 0.014173112257593828,
 'Hamburg': 

Let's take a quick look at the strongest indicators that player 1 will win a match against player 0.
Let's sort the coefficients to see what matters the most:

In [24]:
coefs_df.sort_values('Coef', ascending = False).head(15)

Unnamed: 0,Column,Coef
135,Pl1 Surface Performance,3.439663
133,Pl1 Perf. vs Similar Opponent,2.278845
129,Pl0 Form,0.471821
127,Rank Index,0.328705
105,Zagreb,0.263068
4,Atlanta,0.220403
81,San Marino,0.210533
71,Palermo,0.201239
12,Bogota,0.198313
32,Ho Chi Min City,0.185088


If we ignore the tournaments for a second, we see that the most important features are:
- Player performance on specific surface
- Player performance against similarly ranked opponents
- The opponent's form
- The rank index
- The player's recent form

Let's load our matches dataset:

In [9]:
df = pd.read_csv("csv/FeatureCalculated_Data.csv")

Now our goal is to build another DataFrame containing information about the players, who won each match, and the average and maximum odds available for each player. We'll use the logistic regression coefficients (and the intercept) to calculate each player's predicted chance of winning:

In [10]:
# Keep only matches with valid (> 1) average and maximum odds for both players
betting_data_df = df[(df["Avg_Pl0"] > 1) & (df["Avg_Pl1"] > 1) & (df["Max_Pl0"] > 1) & (df["Max_Pl1"] > 1)]

betting_df_list = []
for ix, irow in betting_data_df.iterrows():

    # Linear combination: intercept + sum(feature value * coefficient)
    current_linear_combination = lreg.intercept_[0]
    for attribute, attribute_coef in coefs_map.items():
        current_linear_combination += irow[attribute] * attribute_coef

    # The logistic function turns the linear combination into player 1's win probability
    prob_p1_win = round(exp(current_linear_combination) / (1 + exp(current_linear_combination)), 2)
    prob_p0_win = round(1 - prob_p1_win, 2)

    betting_df_list.append({
        "Pl0": irow['Player 0'], "Pl1": irow['Player 1'], "Winner": irow['Won'],
        "Pl0 %": prob_p0_win, "Pl1 %": prob_p1_win,
        "Pl0 Avg odds": irow['Avg_Pl0'], "Pl1 Avg odds": irow['Avg_Pl1'],
        "Pl0 Max odds": irow['Max_Pl0'], "Pl1 Max odds": irow['Max_Pl1'],
    })

betting_df = pd.DataFrame(betting_df_list)
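
The row-by-row loop above can be slow on large datasets. As a rough sketch of the same calculation in vectorized form, the block below computes the probabilities for all rows at once. Everything prefixed with `toy_` is a made-up stand-in (two features, two matches), not the notebook's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for coefs_map, lreg.intercept_[0] and the matches dataframe
toy_coefs = {"Rank Index": 0.33, "Pl0 Form": 0.47}
toy_intercept = -0.0074
toy_df = pd.DataFrame({"Rank Index": [0.5, -1.0], "Pl0 Form": [0.2, 0.1]})

# Fix the feature order once so columns and weights line up
features = list(toy_coefs)
weights = np.array([toy_coefs[name] for name in features])

# Linear combination for every row, then the logistic function
z = toy_intercept + toy_df[features].to_numpy() @ weights
prob_p1 = 1 / (1 + np.exp(-z))
prob_p0 = 1 - prob_p1
```

Note that `1 / (1 + exp(-z))` is algebraically the same as `exp(z) / (1 + exp(z))` used in the loop. If `lreg` is a scikit-learn `LogisticRegression` (the `coef_` and `intercept_` attributes suggest so, but the notebook doesn't state it), calling `lreg.predict_proba` on the feature columns in training order should give the same probabilities.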

In [11]:
betting_df.head()

Unnamed: 0,Pl0,Pl1,Winner,Pl0 %,Pl1 %,Pl0 Avg odds,Pl1 Avg odds,Pl0 Max odds,Pl1 Max odds
0,De Bakker T.,Falla A.,0.0,0.61,0.39,1.52,2.46,1.62,2.79
1,Starace P.,Hajek J.,1.0,0.66,0.34,1.34,3.12,1.4,3.64
2,Schwank E.,Fognini F.,0.0,0.35,0.65,1.99,1.77,2.14,1.85
3,Rochus C.,Garcia-Lopez G.,1.0,0.31,0.69,3.02,1.36,3.25,1.45
4,Bellucci T.,Nieminen J.,0.0,0.58,0.42,1.46,2.63,1.5,3.09


Let's define a helper function that takes a chance as input and returns the 'fair' odds for that chance value.

In [12]:
def right_odds(chance):
    return round(1/chance,2)
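
As a quick check of the helper, the sketch below applies it to the two probabilities of the first match shown earlier (0.61 and 0.39). The `safe_right_odds` variant is a hypothetical addition, not part of the notebook: it guards against a probability that was rounded down to 0.00, which would otherwise raise a `ZeroDivisionError`.

```python
def right_odds(chance):
    # Same helper as in the notebook: fair decimal odds are the inverse of the chance
    return round(1 / chance, 2)


def safe_right_odds(chance):
    # Hypothetical guard: treat a zero chance as infinitely long odds
    return round(1 / chance, 2) if chance > 0 else float("inf")


fair_odds_p0 = right_odds(0.61)  # 1.64
fair_odds_p1 = right_odds(0.39)  # 2.56
```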

Now we have everything we need to decide, for each match, which player offers the most value to bet on!
In fact, next we're going to append another 4 columns to this betting dataframe:

the fair odds for both player 0 and player 1, based on each player's chance of winning according to the Logistic Regression model, and the value that we would get by betting on each player at the max odds available!

In [13]:
for ix, row in betting_df.iterrows():
    right_odds_p0 = right_odds(row['Pl0 %'])
    right_odds_p1 = right_odds(row['Pl1 %'])
    bet_value_p0 = (row['Pl0 Max odds'] - right_odds_p0)/max(right_odds_p0,row['Pl0 Max odds'])
    bet_value_p1 = (row['Pl1 Max odds'] - right_odds_p1)/max(right_odds_p1,row['Pl1 Max odds'])
    betting_df.at[ix, 'Fair odds for Pl0'] = right_odds_p0
    betting_df.at[ix, 'Fair odds for Pl1'] = right_odds_p1
    betting_df.at[ix, 'Bet on Pl0 Value'] = bet_value_p0
    betting_df.at[ix, 'Bet on Pl1 Value'] = bet_value_p1
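
To see what the value formula does, here is a small vectorized sketch for player 1 only, using the probabilities and max odds of the first two matches shown earlier. The dataframe name `toy` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Player 1 figures for the first two matches of the betting dataframe
toy = pd.DataFrame({"Pl1 %": [0.39, 0.34], "Pl1 Max odds": [2.79, 3.64]})

# Fair odds are the inverse of the predicted chance, rounded as in right_odds
fair_odds = (1 / toy["Pl1 %"]).round(2)

# Difference between offered and fair odds, divided by the larger of the two
value = (toy["Pl1 Max odds"] - fair_odds) / np.maximum(fair_odds, toy["Pl1 Max odds"])
```

Dividing by the larger of the two odds keeps the value strictly between -1 and 1: it is positive when the market offers more than the fair odds, and negative when it offers less.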

In [14]:
betting_df

Unnamed: 0,Pl0,Pl1,Winner,Pl0 %,Pl1 %,Pl0 Avg odds,Pl1 Avg odds,Pl0 Max odds,Pl1 Max odds,Fair odds for Pl0,Fair odds for Pl1,Bet on Pl0 Value,Bet on Pl1 Value
0,De Bakker T.,Falla A.,0.0,0.61,0.39,1.52,2.46,1.62,2.79,1.64,2.56,-0.012195,0.082437
1,Starace P.,Hajek J.,1.0,0.66,0.34,1.34,3.12,1.40,3.64,1.52,2.94,-0.078947,0.192308
2,Schwank E.,Fognini F.,0.0,0.35,0.65,1.99,1.77,2.14,1.85,2.86,1.54,-0.251748,0.167568
3,Rochus C.,Garcia-Lopez G.,1.0,0.31,0.69,3.02,1.36,3.25,1.45,3.23,1.45,0.006154,0.000000
4,Bellucci T.,Nieminen J.,0.0,0.58,0.42,1.46,2.63,1.50,3.09,1.72,2.38,-0.127907,0.229773
...,...,...,...,...,...,...,...,...,...,...,...,...,...
17099,Zverev A.,Youzhny M.,0.0,0.52,0.48,1.60,2.30,1.66,2.45,1.92,2.08,-0.135417,0.151020
17100,Troicki V.,Wawrinka S.,1.0,0.16,0.84,4.34,1.21,4.91,1.24,6.25,1.19,-0.214400,0.040323
17101,Zverev A.,Berdych T.,0.0,0.15,0.85,2.68,1.46,3.12,1.57,6.67,1.18,-0.532234,0.248408
17102,Bautista Agut R.,Wawrinka S.,1.0,0.25,0.75,3.40,1.31,3.80,1.41,4.00,1.33,-0.050000,0.056738


We just obtained 17104 matches where we can bet on the player whose value is higher!
Let's save this dataframe!

In [15]:
betting_df.to_csv("csv/Betting_Data.csv", index=False)
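
As a possible next step, the sketch below shows one way to pick the side with the higher value for each match. It uses two rows copied from the output above; the column names `Bet on` and `Best value` and the `min_value` threshold are hypothetical choices, not something the notebook defines.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Pl0": ["De Bakker T.", "Rochus C."],
    "Pl1": ["Falla A.", "Garcia-Lopez G."],
    "Bet on Pl0 Value": [-0.012195, 0.006154],
    "Bet on Pl1 Value": [0.082437, 0.000000],
})

# Choose player 1 when their value is strictly higher, otherwise player 0
pick_p1 = toy["Bet on Pl1 Value"] > toy["Bet on Pl0 Value"]
toy["Bet on"] = np.where(pick_p1, toy["Pl1"], toy["Pl0"])
toy["Best value"] = toy[["Bet on Pl0 Value", "Bet on Pl1 Value"]].max(axis=1)

# Hypothetical threshold: only keep matches where the best value is positive
min_value = 0.0
candidates = toy[toy["Best value"] > min_value]
```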