# Introduction

We are now moving to the final part of the workshop, which involves formulating business recommendations. Our tasks are:
- Determining the global betting odds,
- Dividing the dataset into categories: A, B, C, D, where A is the best group and D is the weakest group,
- Determining the odds, and the risk attached to them, for each category based on the adopted parameters.

As the last task, in a discussion format, we must consider the fact that we are a new betting company. When formulating our recommendations, we need to identify the risks that may affect our operations. We will perform this task together in a brainstorming session.

# Notebook Configuration

## Import necessary libraries

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Loading data into the workspace

> Remember to correctly specify the column separator

In [2]:
df = pd.read_csv('data/processed/hockey_teams.csv', sep=';')
df

Unnamed: 0,team,season,victories,defeats,overtime_defeats,victory_percentage,scored_goals,received_goals,goal_difference,goals_ratio
0,Boston Bruins,1990,44,24,0,0.550,299,264,35,1.132576
1,Buffalo Sabres,1990,31,30,0,0.388,292,278,14,1.050360
2,Calgary Flames,1990,46,26,0,0.575,344,263,81,1.307985
3,Chicago Blackhawks,1990,49,23,0,0.613,284,211,73,1.345972
4,Detroit Red Wings,1990,34,38,0,0.425,273,298,-25,0.916107
...,...,...,...,...,...,...,...,...,...,...
577,Tampa Bay Lightning,2011,38,36,8,0.463,235,281,-46,0.836299
578,Toronto Maple Leafs,2011,35,37,10,0.427,231,264,-33,0.875000
579,Vancouver Canucks,2011,51,22,9,0.622,249,198,51,1.257576
580,Washington Capitals,2011,42,32,8,0.512,222,230,-8,0.965217


### Checking data loading accuracy

In [3]:
df.head()

Unnamed: 0,team,season,victories,defeats,overtime_defeats,victory_percentage,scored_goals,received_goals,goal_difference,goals_ratio
0,Boston Bruins,1990,44,24,0,0.55,299,264,35,1.132576
1,Buffalo Sabres,1990,31,30,0,0.388,292,278,14,1.05036
2,Calgary Flames,1990,46,26,0,0.575,344,263,81,1.307985
3,Chicago Blackhawks,1990,49,23,0,0.613,284,211,73,1.345972
4,Detroit Red Wings,1990,34,38,0,0.425,273,298,-25,0.916107


In [4]:
df['total games'] = df['victories'] + df['defeats'] + df['overtime_defeats']
df

Unnamed: 0,team,season,victories,defeats,overtime_defeats,victory_percentage,scored_goals,received_goals,goal_difference,goals_ratio,total games
0,Boston Bruins,1990,44,24,0,0.550,299,264,35,1.132576,68
1,Buffalo Sabres,1990,31,30,0,0.388,292,278,14,1.050360,61
2,Calgary Flames,1990,46,26,0,0.575,344,263,81,1.307985,72
3,Chicago Blackhawks,1990,49,23,0,0.613,284,211,73,1.345972,72
4,Detroit Red Wings,1990,34,38,0,0.425,273,298,-25,0.916107,72
...,...,...,...,...,...,...,...,...,...,...,...
577,Tampa Bay Lightning,2011,38,36,8,0.463,235,281,-46,0.836299,82
578,Toronto Maple Leafs,2011,35,37,10,0.427,231,264,-33,0.875000,82
579,Vancouver Canucks,2011,51,22,9,0.622,249,198,51,1.257576,82
580,Washington Capitals,2011,42,32,8,0.512,222,230,-8,0.965217,82
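
A quick consistency check on the new column can be useful at this point. The sketch below copies three rows from the table above and compares `victories / total games` with the stored `victory_percentage`. The column names `recomputed` and `matches` and the 0.002 tolerance are assumptions made for this illustration only. The 1990 rows do not match, which suggests that some games (possibly ties, which is a guess) are not captured by the three outcome columns.

```python
import pandas as pd

# Three rows copied from the table above (two from 1990, one from 2011).
rows = pd.DataFrame({
    "team": ["Boston Bruins", "Buffalo Sabres", "Vancouver Canucks"],
    "season": [1990, 1990, 2011],
    "victories": [44, 31, 51],
    "defeats": [24, 30, 22],
    "overtime_defeats": [0, 0, 9],
    "victory_percentage": [0.550, 0.388, 0.622],
})

# Same definition as in the notebook cell above.
rows["total games"] = rows["victories"] + rows["defeats"] + rows["overtime_defeats"]

# Hypothetical helper columns: recomputed win rate and whether it agrees
# with the stored value within an assumed tolerance of 0.002.
rows["recomputed"] = (rows["victories"] / rows["total games"]).round(3)
rows["matches"] = (rows["recomputed"] - rows["victory_percentage"]).abs() < 0.002

print(rows[["team", "season", "total games", "victory_percentage", "recomputed", "matches"]])
```

If the stored percentage was computed over a longer schedule than `total games`, the global probability is still well defined, but per-team ratios based on `total games` would overstate win rates for the early seasons.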


# Determining Betting Odds

Let's review the content of the page: [click](https://trustbet.pl/kursy-bukmacherskie/), where information about methods for determining betting odds can be found. First, we will determine the global odds, which will be the starting point for our analysis (the so-called _baseline scenario_). At this point, we ignore the margin and assume that we are calculating decimal odds. Without a margin, decimal odds are simply the reciprocal of the event's probability.

Here is the list of steps to be performed to obtain the desired value:
- we will complete the definition of the `get_betting_odds` function, which takes the `probability` of a given event as a parameter. We will use it multiple times, so it is worth preparing its implementation now,
- then we need to aggregate the dataset appropriately and determine the **global** probability of a team's victory.

## Implementation of the `get_betting_odds` function

In [5]:
def get_betting_odds(probability):
    return 1 / probability

In [6]:
# the 'total games' column (victories + defeats + overtime defeats) was created in the previous cell


win_probability = df['victories'].sum() / df['total games'].sum()

print(f"Probability of any team winning: {win_probability}")

Probability of any team winning: 0.5


### Some tests to check the correctness of the implementation
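
A minimal sketch of such tests, assuming the function is defined exactly as in the implementation cell (it is repeated here only so that the block runs on its own). The chosen probabilities are arbitrary examples.

```python
import math


def get_betting_odds(probability):
    # Decimal odds without margin: the reciprocal of the probability.
    return 1 / probability


# Known values: a 50% event pays 2.0, a 25% event pays 4.0, a certain event pays 1.0.
assert math.isclose(get_betting_odds(0.5), 2.0)
assert math.isclose(get_betting_odds(0.25), 4.0)
assert math.isclose(get_betting_odds(1.0), 1.0)

# Less likely events must have higher odds.
assert get_betting_odds(0.2) > get_betting_odds(0.8)

# A probability of zero has no finite odds; with a plain Python number
# the division raises ZeroDivisionError.
try:
    get_betting_odds(0)
    zero_raises = False
except ZeroDivisionError:
    zero_raises = True

print("All checks passed, zero probability raises:", zero_raises)
```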

### Determining the global odds

Here, determine the probability of any team winning

In [7]:
total_wins = df['victories'].sum()


total_games = df['total games'].sum()


win_probability = total_wins / total_games


global_odds = round(get_betting_odds(win_probability), 2)

print(f"Global betting odds: {global_odds}")

# The global odds are based on the overall win probability across all teams and seasons.
# Decimal odds of 2.0 correspond to a 50% win probability (no margin applied).

Global betting odds: 2.0


Set the global odds here using the `get_betting_odds` function. Round the result to two decimal places.
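
The baseline deliberately ignores the margin. As a rough sketch of what adding one could look like, the hypothetical helper `odds_with_margin` below inflates the implied probability by a proportional margin before inverting it. This is one common convention, not necessarily the one described on the linked page, and the 5% margin is an arbitrary assumption.

```python
def odds_with_margin(probability, margin):
    """Hypothetical helper: decimal odds after adding a proportional margin.

    The implied probability is scaled by (1 + margin), so the offered odds
    are lower than the fair odds 1 / probability.
    """
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    if margin < 0:
        raise ValueError("margin must not be negative")
    return 1 / (probability * (1 + margin))


fair_odds = odds_with_margin(0.5, 0.0)                    # no margin: same as 1 / 0.5
offered_odds = round(odds_with_margin(0.5, 0.05), 2)      # assumed 5% margin

print(f"Fair odds: {fair_odds}, offered odds with 5% margin: {offered_odds}")
```

With a zero margin the helper reduces to the baseline calculation, which makes it easy to compare both scenarios side by side.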

# Team Categorization

Let's discuss how we can classify teams into _leagues_. We want to establish 4 leagues:
- A - league consisting of the best teams,
- B - league consisting of good teams,
- C - league consisting of average teams,
- D - league consisting of the weakest teams.

The above terms are quite subjective, so for the purpose of this exercise, we will adopt the following assumptions:
- A - the top 5% of teams,
- B - teams performing better than 70% of the group but worse than league A,
- C - teams performing better than 20% of the group but worse than league B,
- D - the remaining teams.

To accomplish this task, we will additionally implement the function `assign_team_to_league`.

> Note: This task looks unassuming, but it is difficult. Remember that during the class, you have access to the instructor, and later to a mentor.

## Determination of cutoff points for individual leagues

In [8]:
# first, determine the percentile thresholds

percentile_95 = df['victory_percentage'].quantile(0.95)
percentile_70 = df['victory_percentage'].quantile(0.70)
percentile_20 = df['victory_percentage'].quantile(0.20)

print(f"League A: > {percentile_95}")
print(f"{percentile_70} < League B <= {percentile_95}")
print(f"{percentile_20} < League C <= {percentile_70}")
print(f"League D: <= {percentile_20}")

League A: > 0.6218499999999998
0.512 < League B <= 0.6218499999999998
0.366 < League C <= 0.512
League D: <= 0.366
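
Cutoffs like these can also be turned into labels in a vectorized way. The sketch below uses `pd.cut` on a small toy dataset; the ten values are made up for illustration and are not taken from the workshop data. With the default right-closed bins, a value equal to a cutoff falls into the lower league, which corresponds to "strictly better than" in the league assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: ten win rates, not taken from the workshop dataset.
toy = pd.DataFrame({
    "victory_percentage": [0.30, 0.35, 0.40, 0.45, 0.50, 0.52, 0.55, 0.58, 0.61, 0.65]
})

# Cutoffs at the 20th, 70th and 95th percentiles, as in the league assumptions.
p20, p70, p95 = toy["victory_percentage"].quantile([0.20, 0.70, 0.95])

# Bins are right-closed: (-inf, p20] -> D, (p20, p70] -> C, (p70, p95] -> B, (p95, inf) -> A.
toy["league"] = pd.cut(
    toy["victory_percentage"],
    bins=[-np.inf, p20, p70, p95, np.inf],
    labels=["D", "C", "B", "A"],
)

print(toy)
print(toy["league"].value_counts().sort_index())
```

An explicit `if`/`elif` function applied row by row gives the same labels; the vectorized form mainly helps on larger datasets.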


In [9]:
# then create a function that assigns individual teams to leagues based on percentiles

def assign_team_to_league(x):
    """Return the league label (A to D) for a victory percentage, using the cutoffs above."""
    if x > percentile_95:
        return 'A'
    elif x > percentile_70:
        return 'B'
    elif x > percentile_20:
        return 'C'
    else:
        return 'D'

In [10]:
# assign a league to each team-season record (one row per team and season)

df['league'] = df['victory_percentage'].apply(assign_team_to_league)

best_teams = df[['team', 'victory_percentage', 'league']]

best_teams

Unnamed: 0,team,victory_percentage,league
0,Boston Bruins,0.550,B
1,Buffalo Sabres,0.388,C
2,Calgary Flames,0.575,B
3,Chicago Blackhawks,0.613,B
4,Detroit Red Wings,0.425,C
...,...,...,...
577,Tampa Bay Lightning,0.463,C
578,Toronto Maple Leafs,0.427,C
579,Vancouver Canucks,0.622,A
580,Washington Capitals,0.512,C


In [11]:
# assign a league to each team based on its average victory percentage across all seasons
# note: the cutoffs were computed on single-season records, so the averages are compared against per-season thresholds

best_teams_avg = df.groupby('team')['victory_percentage'].mean().sort_values(ascending=False).reset_index()

best_teams_avg['league'] = best_teams_avg['victory_percentage'].apply(assign_team_to_league)

print(best_teams_avg)

                       team  victory_percentage league
0         Detroit Red Wings            0.586000      B
1         New Jersey Devils            0.534333      B
2             Anaheim Ducks            0.522333      B
3              Dallas Stars            0.516889      B
4        Colorado Avalanche            0.516062      B
5       Pittsburgh Penguins            0.498810      C
6       Philadelphia Flyers            0.496952      C
7             Boston Bruins            0.484905      C
8           St. Louis Blues            0.482571      C
9         Vancouver Canucks            0.480571      C
10      Washington Capitals            0.477143      C
11           Buffalo Sabres            0.475143      C
12      Nashville Predators            0.471769      C
13         New York Rangers            0.468952      C
14       Montreal Canadiens            0.461952      C
15       Chicago Blackhawks            0.454286      C
16      Toronto Maple Leafs            0.453714      C
17        

## Determination of odds per league

Here we set the betting odds for each league, which will allow us to draw final conclusions and establish the basic odds for individual teams.

> Remember: After generating the results, it is worth checking if they are reasonable.

In [12]:
league_win_prob = best_teams_avg.groupby('league')['victory_percentage'].mean()

print("Average probability for every league:")
print(league_win_prob)



Average probability for every league:
league
B    0.535124
C    0.439471
D    0.363571
Name: victory_percentage, dtype: float64


In [13]:
# Betting odds for each league
league_odds = league_win_prob.apply(lambda x: round(get_betting_odds(x), 2))

print("\nBetting odds for every league:")
print(league_odds)



Betting odds for every league:
league
B    1.87
C    2.28
D    2.75
Name: victory_percentage, dtype: float64
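
Following the reminder to check whether the results are reasonable, here is a small sketch of automated sanity checks. The odds are copied from the output above; the three checks and their names (`missing_leagues`, `odds_increase`, `odds_valid`) are assumptions about what "reasonable" could mean here, not a complete validation.

```python
import pandas as pd

# Odds copied from the output above; league A does not appear there.
league_odds = pd.Series({"B": 1.87, "C": 2.28, "D": 2.75})

expected_leagues = ["A", "B", "C", "D"]

# 1. Every league should have odds assigned.
missing_leagues = [name for name in expected_leagues if name not in league_odds.index]

# 2. Weaker leagues should pay more, so odds should rise from A to D.
present = [name for name in expected_leagues if name in league_odds.index]
odds_increase = league_odds.loc[present].is_monotonic_increasing

# 3. Decimal odds without a margin can never be below 1.
odds_valid = bool((league_odds >= 1).all())

print(f"Missing leagues: {missing_leagues}")
print(f"Odds increase from best to weakest league: {odds_increase}")
print(f"All odds are at least 1: {odds_valid}")
```

The first check flags that no team was assigned to league A, which is worth raising in the discussion.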


# Discussion

We have obtained odds values for each league. But how does this translate into real business? The entire task was about determining the values from which a bookmaker can begin operations. Setting these values correctly is critical: attractive odds bring customers to place bets with us, while badly set odds may lead to financial losses in the first days of operation.

For this reason, before translating the results and recommendations into business objectives, the analysis is subjected to discussion. Therefore, we will now take on a review role and would like to verify the steps. To that end, we will collectively discuss and critique our work by answering the following questions together:
- What elements of the analysis were simplified? What was omitted in the analysis?
- Are there any inconsistencies in the estimated odds? What are they?
- How can we improve the odds estimates?
- How can we enrich our initial dataset to make the estimates more accurate and less risky?
- How can we simulate the outcomes of our analysis to verify that they do not lead to financial losses?

This is a discussion panel, and every idea is valuable here.
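
As a starting point for the last question, a simple Monte Carlo simulation can show how offered odds translate into bookmaker profit. The sketch below is a deliberately simplified model: the function name `simulate_bookmaker_profit`, the fixed stake, the number of bets, and the assumption that every bet is placed on an event with the same true probability are all hypothetical.

```python
import numpy as np


def simulate_bookmaker_profit(true_probability, offered_odds, n_bets=100_000, stake=1.0, seed=0):
    """Hypothetical simulation: bookmaker profit over n_bets independent bets.

    Each bettor stakes `stake` on an event that happens with `true_probability`.
    The bookmaker keeps the stake and pays stake * offered_odds when the bettor wins.
    """
    rng = np.random.default_rng(seed)
    bettor_wins = rng.random(n_bets) < true_probability
    payouts = np.where(bettor_wins, stake * offered_odds, 0.0)
    return n_bets * stake - payouts.sum()


# Odds slightly below the fair value of 2.0 for a 50% event (a built-in margin).
profit_with_margin = simulate_bookmaker_profit(true_probability=0.5, offered_odds=1.90)

# Mispriced odds: the event is more likely (55%) than the odds of 2.0 imply.
profit_mispriced = simulate_bookmaker_profit(true_probability=0.55, offered_odds=2.00)

print(f"Profit with margin:       {profit_with_margin:.0f}")
print(f"Profit with mispricing:   {profit_mispriced:.0f}")
```

The expected profit per bet in this model is `stake * (1 - true_probability * offered_odds)`, so underestimating a team's win probability turns the margin negative.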


In [14]:
# 1. Which parts of the analysis were simplified and what was omitted?
# - A lot of data was missing. We didn't know whether the team was playing home or away.
# - How many fouls were committed per game
# - Player names are missing

# 2. Are there any inconsistencies in the estimated odds? What are they?
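# - The odds are ordered consistently: B (1.87) < C (2.28) < D (2.75).
# - However, league A is missing from the per-league results: the cutoffs were computed
#   on single-season records but applied to multi-season team averages,
#   and no team's average exceeds the league A cutoff.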

# 3. How can we improve odds estimates?
# - Know the current team roster
# - Consider only recent seasons (more recent data carries more weight)
# - Add more data

# 4. How can we improve the dataset for more accurate and less risky estimates?
# - Add home/away match data
# - Add player information (injuries, transfers)
# - Head-to-head statistics
# - Current form (last 5 matches)
# - Playoff results

# 5. How can we simulate the analysis results and verify they don't lead to financial losses?
# - Backtesting - test odds on historical data
# - Compare with real bookmaker odds
