# Group odds from betting lines

## Getting the odds

Here is an example of getting the odds to win each group from Bet Online using Beautiful Soup and regular expressions. The HTML is saved in the file "bo73.html".

In [1]:
import pandas as pd
import re

from bs4 import BeautifulSoup

In [2]:
with open("bo73.html", "r") as f:
    html = f.read()

In [3]:
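# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")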

Using the "Inspect" option in Chrome, we find the CSS class of the headings we're interested in: `"offering-contests__table-description-text"`

In [4]:
divs = soup.find_all("div", class_="offering-contests__table-description-text")
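
To see what `find_all` with `class_` returns, here is a minimal sketch on a made-up HTML fragment. The markup and the names `sample_html`, `sample_soup`, `sample_divs` and `sample_headings` are assumptions for illustration only; the real Bet Online page is more complicated.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, not the actual page.
sample_html = """
<div class="offering-contests__table-description-text"> Odds to Win Group A  - Group Betting</div>
<div class="something-else">Not a heading</div>
<div class="offering-contests__table-description-text"> Top Goalscorer </div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# class_ (with the trailing underscore) filters on the CSS class attribute.
sample_divs = sample_soup.find_all("div", class_="offering-contests__table-description-text")
sample_headings = [d.text.strip() for d in sample_divs]
print(sample_headings)
```

Only the two divs with the matching class are returned; the one with a different class is skipped.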

Here is the text of each div in this list. We will focus on just the four group betting sections.

In [5]:
for div in divs:
    print(div.text)

 Odds to Win 2022 UEFA Women's EURO 
 Odds to Win Group A  - Group Betting
 Top Goalscorer 
 Odds to Win Group B  - Group Betting
 Odds to Win Group C  - Group Betting
 Odds to Win Group D  - Group Betting


In [6]:
group_divs = [div for div in divs if "Group Betting" in div.text]

We'll use regular expressions to get the corresponding group letter.  (This is probably overkill, but we'll use regular expressions throughout, so this is just practice.)

Here is an example with just one of the objects.

In [7]:
div = group_divs[1]
re.search("Group [A-D]", div.text)[0]

'Group B'
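
One thing to watch for: `re.search` returns `None` when nothing matches, and the text "Group Betting" itself begins with "Group B". Here is a small sketch of a more defensive version. The helper name `group_name` is hypothetical, and the `\b` word boundary is an addition that is not used elsewhere on this page.

```python
import re

def group_name(heading):
    """Return e.g. 'Group B' from a heading, or None if no group letter is present."""
    # \b requires the letter to end a word, so "Group Betting" does not count as "Group B".
    match = re.search(r"Group [A-D]\b", heading)
    return match[0] if match else None

print(group_name(" Odds to Win Group B  - Group Betting"))
print(group_name(" Top Goalscorer "))
print(group_name(" Some Market - Group Betting"))
```

For the headings on this page the simpler pattern works, because the group letter always appears before the words "Group Betting".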

The object `div` is a Beautiful Soup `Tag`, so we can use Beautiful Soup's `find_next` method to get the element that comes right after each group div in the HTML.

In [8]:
type(div)

bs4.element.Tag

In [9]:
odds_raw = div.find_next().text

In [10]:
odds_raw

'05:00 PMSpain  -120 Germany  +135 Denmark  +850 Finland  +5000 '
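
To illustrate what `find_next()` does when called with no arguments, here is a sketch on a made-up fragment. The markup and the names `sample_html`, `heading` and `next_tag` are assumptions, not the structure of the real page. It also shows why the time and the first team name end up stuck together: `.text` joins the text of all nested elements with no separator.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a heading div followed by a table holding the time and odds.
sample_html = (
    '<div class="heading">Odds to Win Group B</div>'
    '<table><tr><td>05:00 PM</td><td>Spain  -120 </td></tr></table>'
)
sample_soup = BeautifulSoup(sample_html, "html.parser")
heading = sample_soup.find("div", class_="heading")

# With no arguments, find_next() returns the first tag after `heading` in document order.
next_tag = heading.find_next()
print(next_tag.name)
print(repr(next_tag.text))
```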

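The text starts with a time (`05:00 PM`) that is not part of the odds, so we get rid of it. First we locate it with a regular expression, then we slice the string from where the match ends.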

In [11]:
re.search(r"\d\d:\d\d (?:AM|PM)", odds_raw)

<re.Match object; span=(0, 8), match='05:00 PM'>

In [12]:
re.search(r"\d\d:\d\d (?:AM|PM)", odds_raw).span()[1]

8

In [13]:
odds2 = odds_raw[re.search(r"\d\d:\d\d (?:AM|PM)", odds_raw).span()[1]:]

In [14]:
odds2

'Spain  -120 Germany  +135 Denmark  +850 Finland  +5000 '
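
As an alternative to slicing, the time can be removed in one step with `re.sub`. This is only a sketch of a different route to the same string; the variable name `odds_no_time` is made up here, and the `^` anchor is an assumption that the time always comes first.

```python
import re

odds_raw = "05:00 PMSpain  -120 Germany  +135 Denmark  +850 Finland  +5000 "

# Replace a leading "hh:mm AM/PM" with nothing; ^ anchors the match to the start.
odds_no_time = re.sub(r"^\d\d:\d\d (?:AM|PM)", "", odds_raw, count=1)
print(repr(odds_no_time))
```

If the string has no leading time, `re.sub` returns it unchanged, whereas calling `.span()` on a failed `re.search` raises an error.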

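I've never met anyone who can read regular expressions comfortably, so don't worry if the following looks very difficult to understand. The pattern looks for a team name (letters and spaces) followed immediately by the odds (a sign and then digits).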

In [15]:
matches = re.finditer(r"(?P<Team>[A-Za-z ]+)(?P<Odds>[+-]\d+)", odds2)

In [16]:
for match in matches:
    print(match["Team"])
    print(match["Odds"])

Spain  
-120
 Germany  
+135
 Denmark  
+850
 Finland  
+5000
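
If the pattern is hard to read, it may help to see it spread over several lines with comments. The sketch below uses `re.VERBOSE` for that; the names `team_odds_pattern` and `pairs` are made up for this example. It is meant to be the same pattern, just laid out differently.

```python
import re

# In verbose mode, whitespace in the pattern is ignored except inside [...] classes,
# so the space in [A-Za-z ] still counts.
team_odds_pattern = re.compile(
    r"""
    (?P<Team>[A-Za-z ]+)   # team name: one or more letters or spaces
    (?P<Odds>[+-]\d+)      # the odds: a plus or minus sign followed by digits
    """,
    re.VERBOSE,
)

odds2 = "Spain  -120 Germany  +135 Denmark  +850 Finland  +5000 "
pairs = [(m["Team"].strip(), m["Odds"]) for m in team_odds_pattern.finditer(odds2)]
print(pairs)
```

Note that `[A-Za-z ]` only covers unaccented letters and spaces, so a team name containing a hyphen or an accented letter would need a wider character class.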


We can use this method to get the odds for the various groups.  We will put the results into a DataFrame.

In [17]:
odds_list = []
for div in group_divs:
    group = re.search("Group [A-D]", div.text)[0]
    odds_raw = div.find_next().text
    odds2 = odds_raw[re.search(r"\d\d:\d\d (?:AM|PM)", odds_raw).span()[1]:]
    matches = re.finditer(r"(?P<Team>[A-Za-z ]+)(?P<Odds>[+-]\d+)", odds2)
    for match in matches:
        team = match["Team"].strip()
        odds = match["Odds"].strip()
        odds_list.append((group, team, odds))

In [18]:
df_odds = pd.DataFrame(odds_list, columns=["Group", "Team", "Odds"])

In [19]:
df_odds

| | Group | Team | Odds |
|---|---|---|---|
| 0 | Group A | England | -250 |
| 1 | Group A | Norway | 250 |
| 2 | Group A | Austria | 1400 |
| 3 | Group A | Northern Ireland | 5000 |
| 4 | Group B | Spain | -120 |
| 5 | Group B | Germany | 135 |
| 6 | Group B | Denmark | 850 |
| 7 | Group B | Finland | 5000 |
| 8 | Group C | Netherlands | -125 |
| 9 | Group C | Sweden | 125 |


## Converting to probabilities

We convert the odds into break-even probabilities (the win probability at which a bet at those odds has zero expected profit), and then rescale those probabilities within each group so that they sum to 1.
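
Written out as formulas, the two steps look roughly like this. The notation is introduced here for illustration: $x$ is the odds as a signed integer, $p(x)$ is the break-even probability, and the sum runs over the teams in the same group.

$$
p(x) =
\begin{cases}
\dfrac{-x}{100 - x} & \text{if } x < 0,\\[6pt]
\dfrac{100}{100 + x} & \text{if } x \ge 0,
\end{cases}
\qquad
\text{Implied}_i = \frac{p(x_i)}{\sum_{j \in \text{group}} p(x_j)}.
$$

For instance, $p(-110) = 110/210 \approx 0.5238$ and $p(+300) = 100/400 = 0.25$.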

In [20]:
def odds_to_prob(s):
    x = int(s)
    if x < 0:
        y = -x
        return y/(100+y)
    else:
        return 100/(100+x)

In [21]:
odds_to_prob("-110")

0.5238095238095238

In [22]:
odds_to_prob("+300")

0.25

In [23]:
df_odds["Prob"] = df_odds["Odds"].map(odds_to_prob)

In [24]:
# Select the column before summing so the text columns are not aggregated.
gp_totals = df_odds.groupby("Group")["Prob"].sum()

The totals come out greater than 1 because the bookmaker builds a margin into the odds. For example, if you add up all the break-even probabilities for Group C, the total is approximately 1.09.

In [25]:
gp_totals

Group
Group A    1.086275
Group B    1.095857
Group C    1.091057
Group D    1.090114
Name: Prob, dtype: float64
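
As a check on one of these totals, here is a sketch that recomputes the Group B figure by hand from the four odds listed earlier. The helper `break_even` and the names `group_b_odds`, `total` and `overround` are made up for this example. The amount by which the total exceeds 1 is often called the overround.

```python
# Group B odds as listed above, as plain integers.
group_b_odds = {"Spain": -120, "Germany": 135, "Denmark": 850, "Finland": 5000}

def break_even(odds):
    """Break-even probability for integer American odds."""
    if odds < 0:
        return -odds / (100 - odds)
    return 100 / (100 + odds)

total = sum(break_even(o) for o in group_b_odds.values())
overround = total - 1   # how far the total exceeds 1
print(round(total, 6), round(overround, 6))
```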

We divide each break-even probability by its group total to get the implied probabilities.

In [26]:
# Series.map accepts another Series and looks each group name up in its index.
df_odds["Implied"] = df_odds["Prob"]/df_odds["Group"].map(gp_totals)

In [27]:
df_odds

| | Group | Team | Odds | Prob | Implied |
|---|---|---|---|---|---|
| 0 | Group A | England | -250 | 0.714286 | 0.657555 |
| 1 | Group A | Norway | 250 | 0.285714 | 0.263022 |
| 2 | Group A | Austria | 1400 | 0.066667 | 0.061372 |
| 3 | Group A | Northern Ireland | 5000 | 0.019608 | 0.018051 |
| 4 | Group B | Spain | -120 | 0.545455 | 0.497742 |
| 5 | Group B | Germany | 135 | 0.425532 | 0.38831 |
| 6 | Group B | Denmark | 850 | 0.105263 | 0.096056 |
| 7 | Group B | Finland | 5000 | 0.019608 | 0.017893 |
| 8 | Group C | Netherlands | -125 | 0.555556 | 0.50919 |
| 9 | Group C | Sweden | 125 | 0.444444 | 0.407352 |
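
The same rescaling can also be done in one step with `groupby(...).transform("sum")`, which returns the group total aligned to each row. The sketch below uses a small stand-in DataFrame called `toy`, built from the Group A and Group B values shown above, rather than the full `df_odds`.

```python
import pandas as pd

# Stand-in data: the break-even probabilities for Groups A and B from the table above.
toy = pd.DataFrame({
    "Group": ["Group A"] * 4 + ["Group B"] * 4,
    "Team": ["England", "Norway", "Austria", "Northern Ireland",
             "Spain", "Germany", "Denmark", "Finland"],
    "Prob": [0.714286, 0.285714, 0.066667, 0.019608,
             0.545455, 0.425532, 0.105263, 0.019608],
})

# transform("sum") gives each row its own group's total, so the division lines up row by row.
toy["Implied"] = toy["Prob"] / toy.groupby("Group")["Prob"].transform("sum")
print(toy)
```

This avoids building a separate totals Series, at the cost of being a little less explicit about the intermediate step.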


As a sanity check, let's make sure the implied probabilities sum to 1 within each group.

In [28]:
df_odds.groupby("Group")["Implied"].sum()

Group
Group A    1.0
Group B    1.0
Group C    1.0
Group D    1.0
Name: Implied, dtype: float64

In [29]:
df_odds["Team"]

0              England
1               Norway
2              Austria
3     Northern Ireland
4                Spain
5              Germany
6              Denmark
7              Finland
8          Netherlands
9               Sweden
10         Switzerland
11            Portugal
12              France
13               Italy
14             Iceland
15             Belgium
Name: Team, dtype: object

Let's save these results.  We will also record the site and the date.

In [30]:
df_odds["Site"] = "Bet Online"
df_odds["Date"] = "2022-07-03"

In [31]:
df_odds.to_csv("data/gp_odds.csv", index=False)
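
Note that `to_csv` does not create the `data` folder if it is missing. Here is a small sketch of a helper that creates the folder first. The function name `save_odds` and the stand-in DataFrame `toy` are made up, and the example writes to a temporary directory so that it does not touch any real files.

```python
from pathlib import Path
import tempfile

import pandas as pd

def save_odds(df, path):
    """Write df to CSV, creating the parent directory first if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return path

# Usage with a throwaway directory and a one-row stand-in DataFrame.
toy = pd.DataFrame({"Group": ["Group B"], "Team": ["Spain"], "Odds": ["-120"]})
with tempfile.TemporaryDirectory() as tmp:
    out = save_odds(toy, Path(tmp) / "data" / "gp_odds.csv")
    # dtype=str keeps the odds as text when reading the file back.
    round_trip = pd.read_csv(out, dtype=str)
print(round_trip)
```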