# Introduction

We are now moving to the final part of the workshop, which involves formulating business recommendations. Our tasks are:
- Determining global betting odds,
- Dividing the dataset into categories: A, B, C, D, where A is the best group and D is the weakest group,
- Determining the odds, and the risk they carry, for each category based on the adopted parameters.

Finally, in a discussion, we will take into account that we are a new betting company. When formulating our recommendations, we need to identify the risks that may affect our operations. We will do this together in a brainstorming session.

# Notebook Configuration

## Import necessary libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


## Loading data into the workspace

> Remember to correctly specify the column separator

In [2]:
import pandas as pd

df = pd.read_csv("../data/processed/hockey_teams.csv", sep=";")
df

Unnamed: 0,team,season,victories,defeats,overtime_defeats,victory_percentage,scored_goals,received_goals,goal_difference
0,Boston Bruins,1990,44,24,,0.550,299,264,35
1,Buffalo Sabres,1990,31,30,,0.388,292,278,14
2,Calgary Flames,1990,46,26,,0.575,344,263,81
3,Chicago Blackhawks,1990,49,23,,0.613,284,211,73
4,Detroit Red Wings,1990,34,38,,0.425,273,298,-25
...,...,...,...,...,...,...,...,...,...
577,Tampa Bay Lightning,2011,38,36,8.0,0.463,235,281,-46
578,Toronto Maple Leafs,2011,35,37,10.0,0.427,231,264,-33
579,Vancouver Canucks,2011,51,22,9.0,0.622,249,198,51
580,Washington Capitals,2011,42,32,8.0,0.512,222,230,-8


### Checking data loading accuracy

In [4]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 582 entries, 0 to 581
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   team                582 non-null    object 
 1   season              582 non-null    int64  
 2   victories           582 non-null    int64  
 3   defeats             582 non-null    int64  
 4   overtime_defeats    358 non-null    float64
 5   victory_percentage  582 non-null    float64
 6   scored_goals        582 non-null    int64  
 7   received_goals      582 non-null    int64  
 8   goal_difference     582 non-null    int64  
dtypes: float64(2), int64(6), object(1)
memory usage: 41.1+ KB
None


# Determining Betting Odds

Let's review this page, which describes methods for determining betting odds: [trustbet.pl/kursy-bukmacherskie](https://trustbet.pl/kursy-bukmacherskie/). First, we will determine the global odds, which will be the starting point for our analysis (the so-called _baseline scenario_). At this point, we ignore the margin and calculate decimal odds, that is, the reciprocal of the probability of the event.

Here are the steps needed to obtain the desired value:
- First, we will complete the definition of the `get_betting_odds` function, which takes the probability of a given event as a parameter. We will use it multiple times, so it is worth preparing its implementation now.
- Then, we will aggregate the dataset appropriately and determine the **global** probability of a team's victory.

## Implementation of the `get_betting_odds` function

In [3]:
def get_betting_odds(victory_percentage):
    if victory_percentage <= 0 or victory_percentage > 1:
        return None
    return round(1 / victory_percentage, 2)


In [5]:
df["win_odds"] = df["victory_percentage"].apply(get_betting_odds)
df

Unnamed: 0,team,season,victories,defeats,overtime_defeats,victory_percentage,scored_goals,received_goals,goal_difference,win_odds
0,Boston Bruins,1990,44,24,,0.550,299,264,35,1.82
1,Buffalo Sabres,1990,31,30,,0.388,292,278,14,2.58
2,Calgary Flames,1990,46,26,,0.575,344,263,81,1.74
3,Chicago Blackhawks,1990,49,23,,0.613,284,211,73,1.63
4,Detroit Red Wings,1990,34,38,,0.425,273,298,-25,2.35
...,...,...,...,...,...,...,...,...,...,...
577,Tampa Bay Lightning,2011,38,36,8.0,0.463,235,281,-46,2.16
578,Toronto Maple Leafs,2011,35,37,10.0,0.427,231,264,-33,2.34
579,Vancouver Canucks,2011,51,22,9.0,0.622,249,198,51,1.61
580,Washington Capitals,2011,42,32,8.0,0.512,222,230,-8,1.95


### Some tests to check the correctness of the implementation

In [7]:
def test_get_betting_odds():
    assert get_betting_odds(1) == 1.0, "Expected 1.0"
    assert get_betting_odds(0.5) == 2.0, "Expected 2.0"
    assert get_betting_odds(0.25) == 4.0, "Expected 4.0"
    assert get_betting_odds(0.1) == 10.0, "Expected 10.0"
    assert get_betting_odds(0) is None, "Expected None for invalid input"
    assert get_betting_odds(-0.1) is None, "Expected None for negative input"
    assert get_betting_odds(1.1) is None, "Expected None for input > 1"

    print("✅ All tests passed!")

test_get_betting_odds()

✅ All tests passed!


### Determining the global odds

First, determine the global probability that a team wins a game, aggregated over the whole dataset.

Then pass that probability to the `get_betting_odds` function to obtain the global odds, rounded to two decimal places.
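One possible way to approach this step is sketched below. It assumes that the global probability can be approximated by the unweighted mean of `victory_percentage`; other aggregations (for example, weighting by the number of games played) are equally defensible. The `sample` frame is only a stand-in for the full `df` and reuses the five 1990 rows displayed earlier, and `get_betting_odds` is repeated so that the cell runs on its own.

```python
import pandas as pd


def get_betting_odds(victory_percentage):
    # Same logic as the workshop function: fair decimal odds are 1 / p.
    if victory_percentage <= 0 or victory_percentage > 1:
        return None
    return round(1 / victory_percentage, 2)


# Stand-in for the full dataset: the five 1990 rows shown in the notebook.
sample = pd.DataFrame(
    {
        "team": [
            "Boston Bruins",
            "Buffalo Sabres",
            "Calgary Flames",
            "Chicago Blackhawks",
            "Detroit Red Wings",
        ],
        "victory_percentage": [0.550, 0.388, 0.575, 0.613, 0.425],
    }
)

# Assumption: the global win probability is the plain mean of the column.
global_probability = sample["victory_percentage"].mean()
global_odds = get_betting_odds(global_probability)

print(f"global probability: {global_probability:.4f}")
print(f"global odds: {global_odds}")
```

On the real data, the same two lines would be applied to `df` instead of `sample`.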

# Team Categorization

Let's discuss how we can classify teams into _leagues_. We want to establish 4 leagues:
- A - league consisting of the best teams,
- B - league consisting of good teams,
- C - league consisting of average teams,
- D - league consisting of the weakest teams.

The above terms are quite subjective, so for the purpose of this exercise, we will adopt the following assumptions:
- A - the top 5% of teams,
- B - teams performing better than 70% of the group but worse than league A,
- C - teams performing better than 20% of the group but worse than league B,
- D - the remaining teams.

To accomplish this task, we will additionally implement the function `assign_team_to_league`.

> Note: This task looks unassuming, but it is difficult. Remember that during the class, you have access to the instructor, and later to a mentor.

## Determination of cutoff points for individual leagues

In [None]:
def assign_team_to_league(x):
    pass

In [9]:
def assign_team_to_league(df):
    """
    Assign each team to a league (A to D) based on the percentile distribution of victory_percentage.
    """
    # Compute the percentile thresholds
    threshold_A = df["victory_percentage"].quantile(0.95)
    threshold_B = df["victory_percentage"].quantile(0.70)
    threshold_C = df["victory_percentage"].quantile(0.20)

    def classify_team(vp):
        if vp >= threshold_A:
            return "A"
        elif vp > threshold_B:
            return "B"
        elif vp > threshold_C:
            return "C"
        else:
            return "D"

    df["league"] = df["victory_percentage"].apply(classify_team)
    return df

In [11]:
df = assign_team_to_league(df)
df

Unnamed: 0,team,season,victories,defeats,overtime_defeats,victory_percentage,scored_goals,received_goals,goal_difference,win_odds,league
0,Boston Bruins,1990,44,24,,0.550,299,264,35,1.82,B
1,Buffalo Sabres,1990,31,30,,0.388,292,278,14,2.58,C
2,Calgary Flames,1990,46,26,,0.575,344,263,81,1.74,B
3,Chicago Blackhawks,1990,49,23,,0.613,284,211,73,1.63,B
4,Detroit Red Wings,1990,34,38,,0.425,273,298,-25,2.35,C
...,...,...,...,...,...,...,...,...,...,...,...
577,Tampa Bay Lightning,2011,38,36,8.0,0.463,235,281,-46,2.16,C
578,Toronto Maple Leafs,2011,35,37,10.0,0.427,231,264,-33,2.34,C
579,Vancouver Canucks,2011,51,22,9.0,0.622,249,198,51,1.61,A
580,Washington Capitals,2011,42,32,8.0,0.512,222,230,-8,1.95,C
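A quick way to sanity-check the assignment is to look at the share of rows in each league. Under the adopted assumptions we would expect roughly 5% in A, 25% in B, 50% in C and 20% in D. The sketch below illustrates the check on synthetic data (the `demo` frame and its uniformly drawn values are purely hypothetical); on the real dataset the shares will deviate slightly because many teams share the same rounded `victory_percentage`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1000 random win percentages, used only to illustrate the check.
rng = np.random.default_rng(42)
demo = pd.DataFrame({"victory_percentage": rng.uniform(0.2, 0.8, size=1000)})

# Same cutoffs as in assign_team_to_league.
vp = demo["victory_percentage"]
threshold_A = vp.quantile(0.95)
threshold_B = vp.quantile(0.70)
threshold_C = vp.quantile(0.20)

# np.select picks the first matching condition, mirroring the if/elif chain.
demo["league"] = np.select(
    [vp >= threshold_A, vp > threshold_B, vp > threshold_C],
    ["A", "B", "C"],
    default="D",
)

# Share of rows per league, sorted A to D.
league_share = demo["league"].value_counts(normalize=True).sort_index()
print(league_share)
```

For the workshop data, the equivalent check would be `df["league"].value_counts(normalize=True)`.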


## Determination of odds per league

Here we set the betting odds for each league, which will allow us to draw final conclusions and establish the basic odds for individual teams.

> Remember: After generating the results, it is worth checking if they are reasonable.

In [12]:
league_odds = df.groupby("league")["win_odds"].mean().round(2)
print(league_odds)

league
A    1.56
B    1.79
C    2.24
D    3.37
Name: win_odds, dtype: float64
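When checking whether these values are reasonable, it may help to remember that the mean of the individual odds is not the same as the odds of the mean probability: because `1 / p` is convex, averaging the odds tends to give a slightly higher value. The sketch below compares both aggregations on the nine rows displayed earlier (a stand-in for the full `df`, so league D is absent); which one is preferable is a modelling choice, not something the dataset decides.

```python
import pandas as pd

# Stand-in for the full dataset: the nine rows displayed in the notebook output.
sample = pd.DataFrame(
    {
        "league": ["B", "C", "B", "B", "C", "C", "C", "A", "C"],
        "victory_percentage": [
            0.550, 0.388, 0.575, 0.613, 0.425, 0.463, 0.427, 0.622, 0.512,
        ],
    }
)
sample["win_odds"] = (1 / sample["victory_percentage"]).round(2)

grouped = sample.groupby("league")
comparison = pd.DataFrame(
    {
        # Approach used in the notebook: average the per-row odds.
        "mean_of_odds": grouped["win_odds"].mean().round(2),
        # Alternative: average the probabilities first, then convert to odds.
        "odds_of_mean": (1 / grouped["victory_percentage"].mean()).round(2),
    }
)
print(comparison)
```

The gap between the two columns grows with the spread of win percentages inside a league, so it is likely to matter most for the weakest league.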


# Discussion

We have obtained odds values for each league. But how does this translate into real business? The whole task was about determining the values from which a bookmaker can begin operations. Setting them correctly is critical for attracting customers to place bets with us, while setting them badly may lead to financial losses in the first days of operation.

For this reason, the analysis is discussed before its results and recommendations are translated into business objectives. We will now take on the role of reviewers and verify the steps we took, critiquing our work together by answering the following questions:
- What elements of the analysis were simplified? What was omitted in the analysis?
- Are there any inconsistencies in the estimated odds? What are they?
- How can we improve the odds estimates?
- How can we enrich our initial dataset to make the estimates more accurate and less risky?
- How can we simulate the outcomes of our analysis to verify that they do not lead to financial losses?

This is a discussion panel, and every idea is valuable here.
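As one starting point for the last question, a simple Monte Carlo simulation can estimate the bookmaker's average profit per bet. The sketch below is only an illustration under strong assumptions: every bettor stakes one unit on the same outcome, the true win probability `p` is known, and the margin is applied by multiplying the fair odds by `1 - margin`. The function name `simulate_bookmaker_profit` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_bookmaker_profit(p_win, offered_odds, n_bets=100_000, seed=0):
    """Average bookmaker profit per unit stake over n_bets simulated bets."""
    rng = np.random.default_rng(seed)
    # True where the bettor wins, which happens with probability p_win.
    wins = rng.random(n_bets) < p_win
    # The bookmaker collects 1 unit per bet and pays offered_odds on each win.
    profit = 1.0 - wins * offered_odds
    return profit.mean()


p = 0.5                      # assumed true probability of the outcome
fair_odds = round(1 / p, 2)  # decimal odds without margin
margin = 0.05                # hypothetical margin
odds_with_margin = round(fair_odds * (1 - margin), 2)

fair_profit = simulate_bookmaker_profit(p, fair_odds)
margin_profit = simulate_bookmaker_profit(p, odds_with_margin)

print(f"average profit at fair odds: {fair_profit:.4f}")
print(f"average profit with margin:  {margin_profit:.4f}")
```

At fair odds the average profit should hover around zero, while with the margin it should be close to the margin itself. Replacing `p` with a probability that differs from the one used to set the odds shows how quickly misestimated odds turn into losses.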


Feedback
- duplicate imports again (for example, `import pandas as pd` appears twice and matplotlib is never used)
- types and NaN: `df["win_odds"]` will contain `None` for invalid inputs; it is better to use `np.nan` so that the column stays numeric

# --- Betting odds ---
def get_betting_odds(victory_percentage: float):
    """
    Return fair odds (decimal) as 1 / p for p in (0, 1].
    Returns np.nan for invalid p.
    """
    try:
        p = float(victory_percentage)
    except (TypeError, ValueError):
        return np.nan
    if not (0 < p <= 1):
        return np.nan
    return round(1.0 / p, 2)


- `assign_team_to_league` is defined twice

Otherwise fine.