# Introduction

We are now moving to the final part of the workshop, which involves formulating business recommendations. Our tasks are:
- Determining the global betting odds,
- Dividing the dataset into categories: A, B, C, D, where A is the best group and D is the weakest group,
- Determining the risk of odds based on accepted parameters for each category.

The last task takes the form of a discussion. Because we are a new betting company, our recommendations must identify the risks that may affect our operations. We will work through this together in a brainstorming session.

# Notebook Configuration

## Import necessary libraries

In [16]:
import pandas as pd
import matplotlib.pyplot as plt

## Loading data into the workspace

> Remember to correctly specify the column separator

In [17]:
df = pd.read_csv('../data/processed/hockey_teams.csv', sep=';')
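
The separator note matters because `pd.read_csv` assumes a comma by default, so a semicolon-separated file read without `sep=';'` loads without an error but collapses into a single column. The sketch below illustrates this on a small in-memory sample; the sample text is an assumption made for illustration, not the real workshop file.

```python
import io

import pandas as pd

# Illustrative in-memory sample; the column names mirror the workshop file,
# but this is not the real dataset.
raw = (
    "team;season;victories;defeats\n"
    "Boston Bruins;1990;44;24\n"
    "Buffalo Sabres;1990;31;30\n"
)

wrong = pd.read_csv(io.StringIO(raw))           # default sep=',' -> one wide column
right = pd.read_csv(io.StringIO(raw), sep=";")  # explicit separator -> four columns

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 4)
```

Checking `df.shape` or `df.columns` right after loading is a quick way to catch this mistake.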

### Checking data loading accuracy

In [18]:
df.head()

| | team | season | victories | defeats | overtime_defeats | victory_percentage | scored_goals | received_goals | goal_difference | goals_ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Boston Bruins | 1990 | 44 | 24 | 0 | 0.55 | 299 | 264 | 35 | 1.132576 |
| 1 | Buffalo Sabres | 1990 | 31 | 30 | 0 | 0.388 | 292 | 278 | 14 | 1.05036 |
| 2 | Calgary Flames | 1990 | 46 | 26 | 0 | 0.575 | 344 | 263 | 81 | 1.307985 |
| 3 | Chicago Blackhawks | 1990 | 49 | 23 | 0 | 0.613 | 284 | 211 | 73 | 1.345972 |
| 4 | Detroit Red Wings | 1990 | 34 | 38 | 0 | 0.425 | 273 | 298 | -25 | 0.916107 |


# Determining Betting Odds

Let's review the content of this page: [click](https://trustbet.pl/kursy-bukmacherskie/), which describes methods for determining betting odds. First, we will determine the global odds, which will be the starting point for our analysis (the so-called _baseline scenario_). At this point, we ignore the margin and assume that we are calculating decimal odds.

Here is the list of steps to be performed to obtain the desired value:
- we will complete the definition of the `get_betting_odds` function, which takes the `probability` of a given event as a parameter. We will use it multiple times, so it is worth preparing its implementation now,
- then we need to appropriately aggregate the set and determine the **global** probability of the team's victory.

## Implementation of the `get_betting_odds` function

In [19]:
def get_betting_odds(probability):
    """Return the decimal odds (without margin) for an event with the given probability."""
    return 1/probability

### Some tests to check the correctness of the implementation

In [20]:
def test_get_betting_odds():
    assert get_betting_odds(1) == 1, "Expected 1"
    assert get_betting_odds(0.5) == 2, "Expected 2"
    assert get_betting_odds(0.25) == 4, "Expected 4"
    assert get_betting_odds(0.1) == 10, "Expected 10"
    try:
        get_betting_odds(0)
    except ZeroDivisionError:
        pass
    else:
        assert False, "Expected ZeroDivisionError"

    print("All tests passed!")

test_get_betting_odds()

All tests passed!
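
The baseline deliberately ignores the bookmaker's margin. As a hedged sketch of how a margin could later be layered on top, the hypothetical helper `get_betting_odds_with_margin` (not part of the workshop code) inflates the probability by an assumed margin before inverting it. The 5% value is an arbitrary illustration, and proportional scaling is only one of several possible conventions.

```python
def get_betting_odds(probability):
    """Fair decimal odds, as defined in the notebook."""
    return 1 / probability


def get_betting_odds_with_margin(probability, margin=0.05):
    """Hypothetical helper: decimal odds after inflating the probability by `margin`."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return 1 / (probability * (1 + margin))


fair = get_betting_odds(0.5)
offered = get_betting_odds_with_margin(0.5, margin=0.05)
print(fair, round(offered, 3))  # 2.0 1.905
```

The offered odds are always lower than the fair odds, and the gap is the bookmaker's expected edge. For probabilities close to 1 this scaling can push the odds below 1, so a real pricing rule would need a floor.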


### Determining the global odds

Here, we determine the probability that a team wins a game, pooled across all teams and seasons.

In [6]:
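# Note: despite its name, `global_odd` holds a probability (victories / (victories + defeats)), not odds.
# Overtime defeats are not included in the denominator.
global_odd = df['victories'].sum() / (df['defeats'] + df['victories']).sum()
global_odd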

np.float64(0.5331134859041432)

Compute the global odds here using the `get_betting_odds` function. Round the result to two decimal places.

In [7]:
global_betting_odd = get_betting_odds(global_odd)
global_betting_odd = round(global_betting_odd, 2)
print(f"Global betting odds: {global_betting_odd}")

Global betting odds: 1.88


# Team Categorization

Let's discuss how we can classify teams into _leagues_. We want to establish 4 leagues:
- A - league consisting of the best teams,
- B - league consisting of good teams,
- C - league consisting of average teams,
- D - league consisting of the weakest teams.

The above terms are quite subjective, so for the purpose of this exercise, we will adopt the following assumptions:
- A - the top 5% of teams,
- B - teams performing better than 70% of the group but worse than league A,
- C - teams performing better than 20% of the group but worse than league B,
- D - the remaining teams.

To accomplish this task, we will additionally implement the function `assign_team_to_league`.

> Note: This task looks unassuming, but it is difficult. Remember that during the class, you have access to the instructor, and later to a mentor.

## Determination of cutoff points for individual leagues

In [8]:
# First compute each team's overall win probability, then derive the percentiles from those values.
# Computing the percentiles on the raw season rows instead would give a somewhat different distribution.
# This choice is debatable.

df_agg_team = (
    df
    .groupby('team')
    .agg({'victories': 'sum', 'defeats': 'sum'})
    .apply(lambda x: x / x.sum(), axis=1)
)
df_agg_team

| team | victories | defeats |
|---|---|---|
| Anaheim Ducks | 0.590805 | 0.409195 |
| Atlanta Thrashers | 0.439024 | 0.560976 |
| Boston Bruins | 0.570629 | 0.429371 |
| Buffalo Sabres | 0.553411 | 0.446589 |
| Calgary Flames | 0.530924 | 0.469076 |
| Carolina Hurricanes | 0.527664 | 0.472336 |
| Chicago Blackhawks | 0.53125 | 0.46875 |
| Colorado Avalanche | 0.594381 | 0.405619 |
| Columbus Blue Jackets | 0.436782 | 0.563218 |
| Dallas Stars | 0.606941 | 0.393059 |


In [9]:
top_a_cutoff = df_agg_team['victories'].quantile(0.95)
top_b_cutoff = df_agg_team['victories'].quantile(0.70)
top_c_cutoff = df_agg_team['victories'].quantile(0.20)

In [10]:
def assign_team_to_league(x):
    if x >= top_a_cutoff:
        return 'A'
    elif x >= top_b_cutoff:
        return 'B'
    elif x >= top_c_cutoff:
        return 'C'
    else:
        return 'D'
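
For larger tables, the same branching can also be expressed without a Python-level function by using `pd.cut`. The sketch below is an alternative, not the workshop's method. The cutoffs and win rates are hypothetical values chosen only to exercise every branch, and `assign` is a local copy of the branching logic so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical cutoffs and win rates, chosen only to exercise every branch.
cut_a, cut_b, cut_c = 0.60, 0.55, 0.45
rates = pd.Series([0.62, 0.60, 0.57, 0.55, 0.50, 0.45, 0.40])


def assign(x):
    """Same branching as assign_team_to_league, with local cutoffs."""
    if x >= cut_a:
        return "A"
    elif x >= cut_b:
        return "B"
    elif x >= cut_c:
        return "C"
    return "D"


# right=False makes every bin closed on the left, matching the >= comparisons.
leagues_cut = pd.cut(
    rates,
    bins=[-np.inf, cut_c, cut_b, cut_a, np.inf],
    labels=["D", "C", "B", "A"],
    right=False,
)

leagues_apply = rates.apply(assign)
print((leagues_cut.astype(str) == leagues_apply).all())  # True
```

Note that `pd.cut` requires strictly increasing bin edges, so it would fail if two quantile cutoffs happened to coincide; the explicit function has no such restriction.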

In [11]:
df_agg_team['league'] = df_agg_team['victories'].apply(assign_team_to_league)
df_agg_team

| team | victories | defeats | league |
|---|---|---|---|
| Anaheim Ducks | 0.590805 | 0.409195 | B |
| Atlanta Thrashers | 0.439024 | 0.560976 | D |
| Boston Bruins | 0.570629 | 0.429371 | B |
| Buffalo Sabres | 0.553411 | 0.446589 | B |
| Calgary Flames | 0.530924 | 0.469076 | C |
| Carolina Hurricanes | 0.527664 | 0.472336 | C |
| Chicago Blackhawks | 0.53125 | 0.46875 | C |
| Colorado Avalanche | 0.594381 | 0.405619 | B |
| Columbus Blue Jackets | 0.436782 | 0.563218 | D |
| Dallas Stars | 0.606941 | 0.393059 | B |
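
A quick way to verify the assignment is to compare the share of teams in each league with the intended 5% / 25% / 50% / 20% split. The sketch below does this on 20 hypothetical, evenly spaced win rates rather than on the workshop data; `assign` is a local copy of the branching logic so that the block is self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical win rates for 20 evenly spaced teams (not the workshop data).
rates = pd.Series(np.linspace(0.40, 0.59, 20))

cut_a = rates.quantile(0.95)
cut_b = rates.quantile(0.70)
cut_c = rates.quantile(0.20)


def assign(x):
    """Same branching as assign_team_to_league, with local cutoffs."""
    if x >= cut_a:
        return "A"
    elif x >= cut_b:
        return "B"
    elif x >= cut_c:
        return "C"
    return "D"


leagues = rates.apply(assign)

# Share of teams per league; the targets are 5% / 25% / 50% / 20%.
shares = leagues.value_counts(normalize=True).reindex(["A", "B", "C", "D"])
print(shares)
```

With a small number of teams the shares can only approximate the targets, because each team accounts for several percentage points on its own.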


## Determination of odds per league

Here we set the betting odds for each league, which will allow us to draw final conclusions and establish the basic odds for individual teams.

> Remember: After generating the results, it is worth checking if they are reasonable.

In [12]:
df = df.merge(
    df_agg_team['league'],
    left_on='team',
    right_index=True
)

df.head()

| | team | season | victories | defeats | overtime_defeats | victory_percentage | scored_goals | received_goals | goal_difference | goals_ratio | league |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Boston Bruins | 1990 | 44 | 24 | 0 | 0.55 | 299 | 264 | 35 | 1.132576 | B |
| 1 | Buffalo Sabres | 1990 | 31 | 30 | 0 | 0.388 | 292 | 278 | 14 | 1.05036 | B |
| 2 | Calgary Flames | 1990 | 46 | 26 | 0 | 0.575 | 344 | 263 | 81 | 1.307985 | C |
| 3 | Chicago Blackhawks | 1990 | 49 | 23 | 0 | 0.613 | 284 | 211 | 73 | 1.345972 | C |
| 4 | Detroit Red Wings | 1990 | 34 | 38 | 0 | 0.425 | 273 | 298 | -25 | 0.916107 | A |
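
Because the merge attaches one league label to many season rows, it is worth guarding against silently dropped or duplicated rows. The sketch below uses toy stand-ins for `df` and `df_agg_team` (the 1991 row is invented for illustration) and relies on the `how` and `validate` parameters of `DataFrame.merge`.

```python
import pandas as pd

# Toy stand-ins for `df` (one row per team and season) and
# `df_agg_team` (one row per team, indexed by team name).
seasons = pd.DataFrame({
    "team": ["Boston Bruins", "Boston Bruins", "Calgary Flames"],
    "season": [1990, 1991, 1990],
})
team_leagues = pd.Series(
    ["B", "C"],
    index=pd.Index(["Boston Bruins", "Calgary Flames"], name="team"),
    name="league",
)

merged = seasons.merge(
    team_leagues,
    left_on="team",
    right_index=True,
    how="left",              # keep every season row, even without a league
    validate="many_to_one",  # raise if a team appears twice on the right
)

# No season row should be lost or left without a league.
assert len(merged) == len(seasons)
assert merged["league"].notna().all()
print(merged)
```

In a notebook, re-running a merge cell on an already merged `df` would produce suffixed `league_x` / `league_y` columns, so a check like this also helps to spot accidental re-execution.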


In [13]:
df_agg_team.head()

| team | victories | defeats | league |
|---|---|---|---|
| Anaheim Ducks | 0.590805 | 0.409195 | B |
| Atlanta Thrashers | 0.439024 | 0.560976 | D |
| Boston Bruins | 0.570629 | 0.429371 | B |
| Buffalo Sabres | 0.553411 | 0.446589 | B |
| Calgary Flames | 0.530924 | 0.469076 | C |


In [14]:
df.head()

| | team | season | victories | defeats | overtime_defeats | victory_percentage | scored_goals | received_goals | goal_difference | goals_ratio | league |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Boston Bruins | 1990 | 44 | 24 | 0 | 0.55 | 299 | 264 | 35 | 1.132576 | B |
| 1 | Buffalo Sabres | 1990 | 31 | 30 | 0 | 0.388 | 292 | 278 | 14 | 1.05036 | B |
| 2 | Calgary Flames | 1990 | 46 | 26 | 0 | 0.575 | 344 | 263 | 81 | 1.307985 | C |
| 3 | Chicago Blackhawks | 1990 | 49 | 23 | 0 | 0.613 | 284 | 211 | 73 | 1.345972 | C |
| 4 | Detroit Red Wings | 1990 | 34 | 38 | 0 | 0.425 | 273 | 298 | -25 | 0.916107 | A |


In [15]:
df_agg_league = (
    df
    .groupby('league')
    .agg({'victories': 'sum', 'defeats': 'sum'})
    .apply(lambda x: x['victories'] / (x['victories'] + x['defeats']), axis=1)
)

df_agg_league.apply(get_betting_odds)

league
A    1.552089
B    1.741296
C    1.925992
D    2.275578
dtype: float64
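
In the spirit of checking whether the results are reasonable, the sketch below runs two simple sanity checks on the league odds: they should increase from A to D, and the global baseline should lie between the extremes. The values are copied from the outputs above; the checks themselves are a suggestion, not part of the original analysis.

```python
import pandas as pd

# League odds copied from the output above.
league_odds = pd.Series(
    {"A": 1.552089, "B": 1.741296, "C": 1.925992, "D": 2.275578},
    name="odds",
)
global_betting_odd = 1.88  # global baseline computed earlier

# Stronger leagues should pay less, so odds must rise from A to D.
assert league_odds.is_monotonic_increasing

# Implied win probability is the inverse of the decimal odds.
implied_probability = (1 / league_odds).round(3)
print(implied_probability)

# The global baseline should fall inside the range spanned by the leagues.
assert league_odds.min() < global_betting_odd < league_odds.max()
```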

# Discussion

We have obtained odds for each league. But how does this translate into real business? The whole task was about determining the values from which a bookmaker can begin operations. Setting them correctly is critical: attractive odds bring customers to us, while badly set odds can lead to financial losses in the very first days of operation.

For this reason, the analysis is discussed before the results and recommendations are translated into business objectives. We will now take on the role of reviewers and verify the steps by answering the following questions together:
- What elements of the analysis were simplified? What was omitted in the analysis?
- Are there any inconsistencies in the estimated odds? What are they?
- How can we improve the odds estimates?
- How can we enrich our initial dataset to make the estimates more accurate and less risky?
- How can we simulate the outcomes of our analysis to verify that they do not lead to financial losses?

This is a discussion panel, and every idea is valuable here.
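
As one possible starting point for the simulation question, the sketch below runs a simple Monte Carlo experiment. The helper `simulate_bookmaker_profit` is hypothetical, and the setup is heavily simplified: every bet has the same stake and backs a team win, and the "true" win probability is an assumption supplied by the analyst.

```python
import random


def simulate_bookmaker_profit(true_probability, offered_odds, n_bets=10_000,
                              stake=1.0, seed=0):
    """Hypothetical helper: bookmaker profit when every bet backs a team win."""
    rng = random.Random(seed)
    profit = 0.0
    for _ in range(n_bets):
        if rng.random() < true_probability:
            # Bettor wins: pay out the winnings on top of the returned stake.
            profit -= stake * (offered_odds - 1)
        else:
            # Bettor loses: the bookmaker keeps the stake.
            profit += stake
    return profit


# League A odds from the table above; the "true" probabilities are assumptions.
odds_a = 1.552089
print(simulate_bookmaker_profit(1 / odds_a, odds_a))  # fair odds: close to zero on average
print(simulate_bookmaker_profit(0.70, odds_a))        # underestimated team: expected loss
```

If the true win probability is higher than the one implied by the offered odds, the bookmaker loses money on average, which is exactly the risk the questions above ask about.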
