# Arbitrage Opportunities Analysis: PSG vs Atletico Madrid

## Objective and Context

This notebook analyzes live betting odds data, extracted from multiple sources during an event, to explore whether arbitrage opportunities arose during the match.

We chose the PSG vs Atletico Madrid football match played on 2025/06/15 (the date of the scraped timestamps). A high-profile match between two strong teams should offer fair odds, good liquidity, and close to ideal betting market conditions.

**CONTEXT:**

Arbitrage betting involves placing bets on all possible outcomes of an event across different bookmakers to guarantee a profit, regardless of the result.
We explored the two largest Italian markets: [Lottomatica](https://www.lottomatica.it) and [Sisal](https://www.sisal.it).

**KEY GOALS:**

1. Create working scrapers to extract data during a live event;
2. Compare odds across different bookmakers;
3. Detect arbitrage opportunities, where the total implied probability is below 100%;
4. Quantify potential profit margins for identified opportunities;
5. Visualize odds movements and arbitrage windows over time.


**DISCLAIMER:**

This is an academic exercise carried out for a university course. We were informally tasked with downloading live data about a specific event as a way to practice web scraping techniques. The entire project spanned a single weekend. We make no claim about the accuracy or quality of this project.


## Arbitrage Betting Theory

Arbitrage opportunities can arise from:
- **Market Inefficiencies:** Different bookmakers assess probabilities differently;
- **Timing Differences:** Odds updates happen at different speeds across platforms;
- **Market Liquidity:** Lower-liquidity markets may be priced less efficiently;
- **Information Asymmetry:** Some bookmakers react faster to new information (injuries, weather, etc.).

In the following analysis, we will examine theoretical betting scenarios without considering real-world factors such as rounding errors, transaction fees, interest rates, or inflation. This is a reasonable approximation for live online bets where outcomes are determined within a short timeframe during a single match. Additionally, betting large round amounts (e.g., $1000) minimizes most rounding errors. We assume sufficient liquidity is available and immediate cash rewards are possible.
All odds are expressed in decimal format (total payout per unit staked) and, when present, are strictly positive.

### Two-Outcome Scenario

For a two-outcome event (_e.g._, a tennis match), arbitrage is possible when: **`1/odds1 + 1/odds2 < 1`**

Where:
- `odds1` = odds for outcome 1 at bookmaker A
- `odds2` = odds for outcome 2 at bookmaker B

**Proof:**

Suppose we stake `p` on outcome 1 (at bookmaker A) and `q` on outcome 2 (at bookmaker B):
- the total stake is `S = p + q`
- the payout is `odds1 * p` if outcome 1 occurs, or `odds2 * q` if outcome 2 occurs

Outcomes 1 and 2 are mutually exclusive and exhaust the probability space, so exactly one of the two bets pays. The payout is the same in both cases when `odds1 * p = odds2 * q`. Therefore, for a stake `p` on outcome 1, we need:

`q = (odds1/odds2) * p`

The arbitrage opportunity (disregarding discounting and other real-world effects) arises when the total stake is smaller than the guaranteed payout: `S = p + q < odds1 * p (= odds2 * q)`. Substituting `q` gives `p * (1 + odds1/odds2) < odds1 * p`; dividing both sides by `odds1 * p > 0` yields the arbitrage condition above.
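
As a minimal sketch of the stake split derived in the proof, the helper below divides a fixed total stake so that both outcomes pay the same amount. The function name `two_way_stakes` and the odds in the usage lines are hypothetical and chosen for illustration only; they are not taken from the scraped data.

```python
def two_way_stakes(odds1: float, odds2: float, total_stake: float = 1000.0):
    """Split total_stake across two outcomes so that both payouts are equal.

    Returns (p, q, payout), or None when 1/odds1 + 1/odds2 >= 1 (no arbitrage).
    """
    implied = 1 / odds1 + 1 / odds2
    if implied >= 1:
        return None
    # Equal payouts require odds1 * p == odds2 * q, with p + q == total_stake.
    p = total_stake / (odds1 * implied)
    q = total_stake / (odds2 * implied)
    payout = total_stake / implied
    return p, q, payout


# Hypothetical odds: outcome 1 at bookmaker A, outcome 2 at bookmaker B.
result = two_way_stakes(2.10, 2.05)
if result is not None:
    p, q, payout = result
    print(f"stake on 1: {p:.2f}, stake on 2: {q:.2f}, payout: {payout:.2f}")
```

The ratio `q / p` equals `odds1 / odds2`, matching the relationship used in the proof.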


### Three-Outcome Scenario

For a three-outcome event (e.g., football match: Win/Draw/Loss), arbitrage is possible when: **`1/odds1 + 1/odds2 + 1/odds3 < 1`**

Where:
- `odds1` = odds for outcome 1 (e.g., Home Win)
- `odds2` = odds for outcome 2 (e.g., Draw)
- `odds3` = odds for outcome 3 (e.g., Away Win)

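The proof is analogous to the two-outcome scenario: choose the three stakes so that all payouts are equal, then require the total stake to be smaller than that common payout.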
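
For completeness, the two-outcome argument can be written out for an event with $n$ mutually exclusive, exhaustive outcomes. The symbols $o_i$ (decimal odds for outcome $i$), $s_i$ (stake on outcome $i$), $S$ (total stake), and $R$ (common payout) are notation introduced here for illustration.

$$
\begin{aligned}
o_1 s_1 = o_2 s_2 = \dots = o_n s_n = R
&\;\Rightarrow\; s_i = \frac{R}{o_i}, \\
S = \sum_{i=1}^{n} s_i = R \sum_{i=1}^{n} \frac{1}{o_i}
&\;\Rightarrow\; R = \frac{S}{\sum_{i=1}^{n} 1/o_i}, \\
R > S
&\iff \sum_{i=1}^{n} \frac{1}{o_i} < 1 .
\end{aligned}
$$

Setting $n = 3$ recovers the three-outcome condition stated above.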

Finally, for an event with `n` outcomes, the **arbitrage percentage** is computed as: `(1/odds1 + 1/odds2 + ... + 1/oddsn) * 100`

- If < 100%: Arbitrage opportunity exists
- If = 100%: Break-even (no profit, no loss)
- If > 100%: No arbitrage possible (the excess over 100% is the bookmaker edge)
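
The arbitrage percentage and the resulting guaranteed return can be sketched as two small functions. The names `arbitrage_percentage` and `guaranteed_return_pct` are hypothetical, and the return formula assumes stakes are split so that all payouts are equal, as in the proofs above.

```python
from typing import Iterable


def arbitrage_percentage(odds: Iterable[float]) -> float:
    """Sum of implied probabilities (1/odds), expressed as a percentage."""
    return sum(1 / o for o in odds) * 100


def guaranteed_return_pct(odds: Iterable[float]) -> float:
    """Profit as a percentage of the total stake with equal-payout stakes.

    Positive only when the arbitrage percentage is below 100.
    """
    return (100 / arbitrage_percentage(list(odds)) - 1) * 100


# Hypothetical best odds for home win, draw, away win across bookmakers.
best_odds = [2.10, 3.60, 4.50]
print(arbitrage_percentage(best_odds), guaranteed_return_pct(best_odds))
```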

## Analysis

In [6]:
import pandas as pd

In [13]:
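# Load the Lottomatica scrape; the first column (timestamp) becomes a DatetimeIndex.
lottomatica = pd.read_csv("../data/lottomatica_scraper_20250615_210109.csv", sep=",", index_col=0, parse_dates=True)

# Note: info() prints its summary directly and returns None,
# so wrapping it in display() also shows a trailing "None".
display(lottomatica.info())
display(lottomatica.describe())
display(lottomatica.head())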


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 69 entries, 2025-06-15 21:01:55.166929 to 2025-06-15 23:03:05.735462
Data columns (total 18 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   source                69 non-null     object 
 1   match_id              69 non-null     object 
 2   home_team             69 non-null     object 
 3   away_team             69 non-null     object 
 4   home_win              57 non-null     float64
 5   draw                  67 non-null     float64
 6   away_win              67 non-null     float64
 7   home_or_draw          0 non-null      float64
 8   away_or_draw          0 non-null      float64
 9   home_or_away          0 non-null      float64
 10  over_1_5              0 non-null      float64
 11  under_1_5             0 non-null      float64
 12  over_2_5              0 non-null      float64
 13  under_2_5             0 non-null      float64
 14  over_3_5              0 

None

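| | home_win | draw | away_win | home_or_draw | away_or_draw | home_or_away | over_1_5 | under_1_5 | over_2_5 | under_2_5 | over_3_5 | under_3_5 | both_teams_score_yes | both_teams_score_no |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 57.0 | 67.0 | 67.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| mean | 1.349825 | 8.981343 | 54.183582 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| std | 0.350431 | 7.978654 | 71.125114 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| min | 1.01 | 3.3 | 3.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 25% | 1.05 | 4.475 | 7.75 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 50% | 1.3 | 4.65 | 10.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 75% | 1.35 | 10.5 | 55.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| max | 2.0 | 50.0 | 225.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |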


| timestamp | source | match_id | home_team | away_team | home_win | draw | away_win | home_or_draw | away_or_draw | home_or_away | over_1_5 | under_1_5 | over_2_5 | under_2_5 | over_3_5 | under_3_5 | both_teams_score_yes | both_teams_score_no |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2025-06-15 21:01:55.166929 | Lottomatica | psg-atletico-madrid | PSG | Atletico Madrid | 1.95 | 3.5 | 3.35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2025-06-15 21:02:40.413956 | Lottomatica | psg-atletico-madrid | PSG | Atletico Madrid | 2.0 | 3.55 | 3.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2025-06-15 21:04:13.500877 | Lottomatica | psg-atletico-madrid | PSG | Atletico Madrid | 2.0 | 3.5 | 3.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2025-06-15 21:05:31.916741 | Lottomatica | psg-atletico-madrid | PSG | Atletico Madrid | 2.0 | 3.5 | 3.35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2025-06-15 21:07:42.582592 | Lottomatica | psg-atletico-madrid | PSG | Atletico Madrid | 2.0 | 3.5 | 3.35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


Unfortunately, due to a technical error, only the 1X2 odds (home win, draw, away win) were scraped from the Lottomatica webpage; every other market column is empty. Let's clean up the data:

In [19]:
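# Keep only the 1X2 columns, which are the only ones that were populated.
lotto = lottomatica[["home_win", "away_win", "draw"]].copy()
# Drop snapshots in which all three odds are missing.
lotto = lotto.dropna(axis=0, how="all")
lotto.head()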

| timestamp | home_win | away_win | draw |
|---|---|---|---|
| 2025-06-15 21:01:55.166929 | 1.95 | 3.35 | 3.5 |
| 2025-06-15 21:02:40.413956 | 2.0 | 3.3 | 3.55 |
| 2025-06-15 21:04:13.500877 | 2.0 | 3.3 | 3.5 |
| 2025-06-15 21:05:31.916741 | 2.0 | 3.35 | 3.5 |
| 2025-06-15 21:07:42.582592 | 2.0 | 3.35 | 3.5 |
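
As a quick sanity check on a single bookmaker, the sum of implied probabilities can be computed for the five snapshots shown above. This is a sketch that retypes those five rows by hand instead of reading the CSV; the names `sample`, `implied_sum`, and `overround_pct` are introduced here for illustration.

```python
import pandas as pd

# The five Lottomatica snapshots printed above, retyped by hand.
sample = pd.DataFrame(
    {
        "home_win": [1.95, 2.0, 2.0, 2.0, 2.0],
        "away_win": [3.35, 3.3, 3.3, 3.35, 3.35],
        "draw": [3.5, 3.55, 3.5, 3.5, 3.5],
    },
    index=pd.to_datetime(
        [
            "2025-06-15 21:01:55.166929",
            "2025-06-15 21:02:40.413956",
            "2025-06-15 21:04:13.500877",
            "2025-06-15 21:05:31.916741",
            "2025-06-15 21:07:42.582592",
        ]
    ),
)

# Sum of implied probabilities per snapshot; NaN if any of the three odds is missing.
implied_sum = (1 / sample).sum(axis=1, skipna=False)
# Bookmaker margin (overround), in percent.
overround_pct = (implied_sum - 1) * 100
print(overround_pct.round(2))
```

Within a single bookmaker the sum stays above 1 in all five snapshots (about 1.097 in the first), which is the expected bookmaker edge. An arbitrage can only appear when the best odds for each outcome are combined across bookmakers.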


In [None]:
sisal = pd.read_csv("../data/sisal_scraper_20250615_210219.csv", sep=",", index_col=0, parse_dates=True)
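
Since the two scrapers record snapshots at different instants, the series have to be aligned in time before odds can be compared. One possible approach, sketched below, uses `pd.merge_asof` to pair each snapshot of one bookmaker with the nearest snapshot of the other, then takes the best odds per outcome. This assumes the Sisal file shares the `home_win`, `draw`, and `away_win` column names, which has not been verified here. The function name `best_odds_arbitrage`, the 60-second tolerance, and the toy frames are hypothetical.

```python
import pandas as pd

COLS = ["home_win", "draw", "away_win"]


def best_odds_arbitrage(a: pd.DataFrame, b: pd.DataFrame, tolerance: str = "60s") -> pd.DataFrame:
    """Align two odds frames on their DatetimeIndex and flag arbitrage windows."""
    merged = pd.merge_asof(
        a[COLS].sort_index(),
        b[COLS].sort_index(),
        left_index=True,
        right_index=True,
        direction="nearest",
        tolerance=pd.Timedelta(tolerance),
        suffixes=("_a", "_b"),
    )
    out = pd.DataFrame(index=merged.index)
    for col in COLS:
        # Best available odds for this outcome across the two bookmakers.
        out[col] = merged[[f"{col}_a", f"{col}_b"]].max(axis=1)
    # Arbitrage percentage; NaN when an outcome has no odds at either bookmaker.
    out["arb_pct"] = (1 / out[COLS]).sum(axis=1, skipna=False) * 100
    out["is_arb"] = out["arb_pct"] < 100
    return out


# Toy frames with made-up odds, only to show the mechanics.
toy_a = pd.DataFrame(
    {"home_win": [2.10, 2.00], "draw": [3.60, 3.50], "away_win": [3.30, 3.30]},
    index=pd.to_datetime(["2025-06-15 21:00:00", "2025-06-15 21:01:00"]),
)
toy_b = pd.DataFrame(
    {"home_win": [1.95, 2.00], "draw": [3.40, 3.50], "away_win": [4.50, 3.40]},
    index=pd.to_datetime(["2025-06-15 21:00:10", "2025-06-15 21:01:05"]),
)
windows = best_odds_arbitrage(toy_a, toy_b)
print(windows)
```

Nearest-snapshot matching is a simplification: the paired odds may not have been available at exactly the same moment, so any flagged window should be read as a candidate and not as a confirmed opportunity.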