<a href="https://colab.research.google.com/github/alessiomodonesi/Python-Exercises/blob/main/ai/lab5/Intelligenza_Artificiale_Lab5_extra_exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## EXERCISE 1: Weather's probability
You are given a (fake) <a href="https://drive.google.com/file/d/1LjZLE9ozaHcBwiCl90mHaS1nXKcglfr4/view">padua_weather.csv</a>
of historical records for Padua's weather. The weather, which can be either rainy (= 1 in the dataset), misty (= 2), or sunny (= 3), is reported for each day of the week, for a whole year (52 weeks).

After formalising the problem (i.e. identifying the random variables and the necessary mathematical formulae), write a Python program that reads the dataset and computes the following:
- the probability of sunny weather during the weekend (on one or both days);
- the expected weather for each day of the week (*);
- suppose you don't know which day of the week it is today: although this is very unrealistic, how could you guess the day based only on the weather?

(\*) An expected value of, for example, 2.5 can be interpreted as "a mix of misty and sunny weather".
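
One possible formalisation is sketched below. The symbols $W_d$, $D$, $w_{k,d}$ and $n$ are notation assumed here, not given in the exercise. Let $W_d \in \{1, 2, 3\}$ be the weather on day $d$, let $w_{k,d}$ be the value recorded for day $d$ in week $k$, and let $n = 52$ be the number of weeks. Probabilities are estimated by relative frequencies, and the last line assumes a uniform prior $P(D = d) = 1/7$ over the days of the week.

$$
\begin{aligned}
P(W_{\text{Sat}} = 3 \lor W_{\text{Sun}} = 3) &\approx \frac{1}{n} \sum_{k=1}^{n} \mathbb{1}\left[w_{k,\text{Sat}} = 3 \lor w_{k,\text{Sun}} = 3\right] \\
E[W_d] &= \sum_{w=1}^{3} w \, P(W_d = w) \approx \frac{1}{n} \sum_{k=1}^{n} w_{k,d} \\
P(D = d \mid W = w) &= \frac{P(W = w \mid D = d)\, P(D = d)}{\sum_{d'} P(W = w \mid D = d')\, P(D = d')} \propto P(W_d = w)
\end{aligned}
$$

Under the uniform prior, the best guess for the day is therefore $\arg\max_d P(W_d = w)$, i.e. the day on which the observed weather was historically most frequent.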




In [4]:
import pandas as pd
import io

# Contents of padua_weather.csv embedded directly in the code
csv_data = """Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday
1,2,1,1,1,2,1
2,1,2,1,1,2,1
2,1,2,2,1,1,1
1,1,1,3,3,2,1
3,3,3,3,1,3,2
3,1,1,1,3,2,1
2,3,2,2,1,2,2
1,3,1,1,3,2,2
2,3,3,2,1,3,2
1,3,1,1,3,1,2
2,2,1,1,1,2,1
1,3,2,2,2,1,3
3,2,1,2,2,2,3
2,1,1,2,2,1,1
2,2,1,1,2,1,3
3,3,2,2,3,3,1
2,2,1,2,3,1,1
2,1,3,2,2,2,1
1,2,1,3,3,2,1
1,3,3,2,1,2,1
2,2,2,1,3,2,3
1,3,2,2,2,3,1
1,3,3,1,2,2,1
3,1,3,1,2,1,3
3,3,3,3,2,1,2
3,1,2,3,3,1,2
3,3,2,1,3,2,3
3,1,3,3,1,1,1
3,1,1,1,3,3,3
3,3,2,1,3,1,1
2,1,3,3,2,1,3
3,2,3,3,1,3,3
3,2,2,3,1,1,1
3,3,3,3,3,2,2
2,1,3,3,3,3,3
3,1,3,2,1,2,1
3,3,1,2,2,1,2
1,2,2,3,1,1,1
3,2,3,1,2,2,1
3,1,1,3,2,3,1
1,3,2,1,2,3,1
1,3,3,2,3,3,1
1,2,3,2,3,3,1
1,2,2,3,1,2,1
2,1,1,1,2,2,3
1,2,2,2,1,2,3
3,3,2,3,1,1,1
3,1,3,2,3,1,3
1,1,2,1,1,3,1
2,1,1,1,2,1,3
3,2,3,3,1,1,2
1,1,2,1,1,1,2
"""

# Read the string as a DataFrame
df = pd.read_csv(io.StringIO(csv_data))
weather_codes = [1, 2, 3]  # 1=Rainy, 2=Misty, 3=Sunny
weather_names = {1: "Rainy", 2: "Misty", 3: "Sunny"}

# --- 1. Probability of sunny weather during the weekend ---
# A weekend counts as sunny if Saturday, Sunday, or both are sunny.
weekend_sunny = (df['Saturday'] == 3) | (df['Sunday'] == 3)
prob_sunny_weekend = weekend_sunny.sum() / len(df)

# --- 2. Expected weather for each day of the week ---
# The column mean is the expected value of the weather code for that day.
expected_weather = df.mean()

# --- 3. Guessing the day from the weather alone ---
# Count how often each weather code occurs on each day (rows: codes, columns: days).
weather_counts = df.apply(lambda col: col.value_counts(), axis=0).fillna(0)
for w in weather_codes:
    if w not in weather_counts.index:
        weather_counts.loc[w] = 0
weather_counts = weather_counts.sort_index()
# For each weather code, pick the day on which it occurred most often.
max_likelihood_day = weather_counts.idxmax(axis=1)

# --- Final output ---
print("### Historical Weather Analysis for Padua ###")
print("=" * 70)

# Result 1
print(" 1. Probability of sunny weather during the weekend (at least one day):")
print(f"   P(sunny weekend) = {prob_sunny_weekend:.4f} (i.e. about {prob_sunny_weekend*100:.2f}%)")

# Result 2
print("\n 2. Expected weather for each day of the week (1=Rainy, 2=Misty, 3=Sunny):")
print("-" * 35)
print(expected_weather.to_string(float_format="%.4f", header=True))
print("-" * 35)

# Result 3
print("\n 3. Guessing the day (D) from today's weather (W) alone:")
print("   (Choose the day that maximises the historical probability $P(W | D=d)$)")
print("-" * 70)
print("| Observed weather (W) | Description | Most likely day (D) |")
print("-" * 70)
for weather_code, day_guess in max_likelihood_day.items():
    weather_name = weather_names[weather_code]
    print(f"| {weather_code:<20} | {weather_name:<11} | {day_guess:<19} |")
print("-" * 70)

### Historical Weather Analysis for Padua ###
 1. Probability of sunny weather during the weekend (at least one day):
   P(sunny weekend) = 0.4423 (i.e. about 44.23%)

 2. Expected weather for each day of the week (1=Rainy, 2=Misty, 3=Sunny):
-----------------------------------
Monday      2.0769
Tuesday     1.9808
Wednesday   2.0385
Thursday    1.9423
Friday      1.9615
Saturday    1.8462
Sunday      1.7500
-----------------------------------

 3. Guessing the day (D) from today's weather (W) alone:
   (Choose the day that maximises the historical probability $P(W | D=d)$)
----------------------------------------------------------------------
| Observed weather (W) | Description | Most likely day (D) |
----------------------------------------------------------------------
| 1                    | Rainy       | Sunday              |
| 2                    | Misty       | Saturday            |
| 3                    | Sunny
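
The table above reports only the single most likely day for each weather code. As a complement, the sketch below turns the per-day frequencies into a full posterior $P(D = d \mid W = w)$ via Bayes' rule. It is a minimal illustration, not part of the original solution: the function name `day_posterior` is hypothetical, a uniform prior over the days is assumed, and only the first four weeks of the dataset are used to keep the example short, so its numbers differ from the full-year results.

```python
import csv
import io

# First four weeks of padua_weather.csv, used only to keep the sketch short.
SAMPLE = """Monday,Tuesday,Wednesday,Thursday,Friday,Saturday,Sunday
1,2,1,1,1,2,1
2,1,2,1,1,2,1
2,1,2,2,1,1,1
1,1,1,3,3,2,1
"""


def day_posterior(rows, observed):
    """Return P(D=d | W=observed), assuming a uniform prior over the days."""
    days = list(rows[0].keys())
    n_weeks = len(rows)
    # Likelihood P(W=observed | D=d): relative frequency of the code on day d.
    likelihood = {
        d: sum(int(r[d]) == observed for r in rows) / n_weeks for d in days
    }
    total = sum(likelihood.values())
    if total == 0:
        # The observed weather never occurs in the data: no information.
        return {d: 0.0 for d in days}
    # With a uniform prior, the prior cancels and only normalisation remains.
    return {d: p / total for d, p in likelihood.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
posterior = day_posterior(rows, observed=3)  # 3 = Sunny
for day, prob in posterior.items():
    print(f"P(D={day} | W=Sunny) = {prob:.3f}")
```

Because the prior is uniform, the day with the highest posterior is the same day that a maximum-likelihood choice would pick. A non-uniform prior would have to be multiplied into each likelihood before normalising.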

## EXERCISE 2: Broad Street cholera outbreak

The following is a simplified version of an example from Judea Pearl's *The Book of Why*. It concerns a cholera epidemic, caused by contaminated water, which killed hundreds of people in London between 1853 and 1854. The diagram below illustrates some of the key factors explaining this epidemic, in particular:
- $X$ indicates whether the water company's intake was downstream of London's sewers;
- $W$ indicates whether the water was contaminated or not;
- $Z$ indicates the presence of other external factors (e.g. poverty, miasma, etc.);
- $Y$ indicates the outbreak of cholera.

<img src='https://drive.google.com/uc?id=10O10x_nuuxF55rqRk0TpanHV_7Q819MA'>

(Please note that the probabilities in the diagram are fake.)

> - Formalise the problem using appropriate mathematical notation and derive an expression for the probability distribution of cholera given that the water company's intake is upstream (i.e. what is the query, and how can it be decomposed?).
> - Write a Python program that computes the actual probabilities of this distribution using the information from the given CPTs.
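
A possible decomposition of the query is sketched here. It assumes the network structure implied by the CPTs used in the solution: $X$ and $Z$ are root nodes, $W$ has parents $X$ and $Z$, and $Y$ has parents $W$ and $Z$. Writing $\neg x$ for $X = f$ (intake upstream), the query is $P(Y \mid \neg x)$, obtained by summing the hidden variables $W$ and $Z$ out of the joint distribution:

$$
\begin{aligned}
P(Y \mid \neg x) &= \frac{P(Y, \neg x)}{P(\neg x)} = \frac{\sum_{w} \sum_{z} P(\neg x)\, P(z)\, P(w \mid \neg x, z)\, P(Y \mid w, z)}{P(\neg x)} \\
&= \sum_{z} P(z) \sum_{w} P(w \mid \neg x, z)\, P(Y \mid w, z)
\end{aligned}
$$

The factor $P(\neg x)$ cancels, so under this structure the result does not depend on the prior of $X$.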

In [7]:
## 🐍 Python program for P(Y | X=f), i.e. P(Y | ¬X), keeping the exercise's terminology

# Conditional probability tables (CPTs) and marginals
P_X_f = 0.50  # P(X=f)
P_Z_t = 0.25  # P(Z=t)
P_Z_f = 1 - P_Z_t

# CPT for W: P(W=t | X, Z), keyed by (x, z)
CPT_W = {
    ('t', 't'): 0.90,
    ('t', 'f'): 0.85,
    ('f', 't'): 0.10,
    ('f', 'f'): 0.02
}

# CPT for Y: P(Y=t | W, Z), keyed by (w, z)
CPT_Y = {
    ('t', 't'): 0.80,
    ('t', 'f'): 0.75,
    ('f', 't'): 0.15,
    ('f', 'f'): 0.05
}

# 1. Initialisation for inference by enumeration
P_Yt_Xf = 0  # Numerator: P(Y=t, X=f)

# Iterate over all combinations of W and Z
for w in ['t', 'f']:
    for z in ['t', 'f']:

        # Probability of Z
        Prob_Z = P_Z_t if z == 't' else P_Z_f

        # Conditional probability P(W | X=f, Z)
        key_W = ('f', z)
        Prob_W = CPT_W[key_W] if w == 't' else (1 - CPT_W[key_W])

        # Conditional probability P(Y=t | W, Z)
        key_Y = (w, z)
        P_Yt_cond = CPT_Y[key_Y]

        # Joint term P(X=f, Z, W, Y=t)
        Term_congiunto_Yt = P_X_f * Prob_Z * Prob_W * P_Yt_cond
        P_Yt_Xf += Term_congiunto_Yt

# 2. Conditional probability P(Y | X=f): divide the joint by P(X=f)
P_Yt_dato_Xf = P_Yt_Xf / P_X_f
P_Yf_dato_Xf = 1 - P_Yt_dato_Xf

print(f"P(Y=t | ¬X) = {P_Yt_dato_Xf:.4f}")
print(f"P(Y=f | ¬X) = {P_Yf_dato_Xf:.4f}")

P(Y=t | ¬X) = 0.1018
P(Y=f | ¬X) = 0.8982
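
As a cross-check, the same enumeration can be written as a small reusable function that works for either value of $X$ and computes $P(Y = f \mid X)$ directly instead of by complement. This is a sketch under assumptions: the names `bernoulli` and `p_y_given_x` are hypothetical, booleans replace the `'t'`/`'f'` strings, and the CPT values are copied from the program above. Since $P(X)$ cancels in the conditional, it is omitted.

```python
from itertools import product

# CPT values copied from the solution above; True/False replace 't'/'f'.
P_Z = {True: 0.25, False: 0.75}
P_W_TRUE = {  # P(W=t | X, Z), keyed by (x, z)
    (True, True): 0.90, (True, False): 0.85,
    (False, True): 0.10, (False, False): 0.02,
}
P_Y_TRUE = {  # P(Y=t | W, Z), keyed by (w, z)
    (True, True): 0.80, (True, False): 0.75,
    (False, True): 0.15, (False, False): 0.05,
}


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def p_y_given_x(y, x):
    """P(Y=y | X=x) by summing out W and Z (inference by enumeration)."""
    total = 0.0
    for w, z in product([True, False], repeat=2):
        total += (
            P_Z[z]
            * bernoulli(P_W_TRUE[(x, z)], w)
            * bernoulli(P_Y_TRUE[(w, z)], y)
        )
    return total


# Distribution of Y when the intake is upstream (X = False).
upstream = {y: p_y_given_x(y, x=False) for y in (True, False)}
print(f"P(Y=t | X=f) = {upstream[True]:.4f}")
print(f"P(Y=f | X=f) = {upstream[False]:.4f}")
```

Computing both entries independently gives a simple sanity check: the two probabilities should sum to 1, which would fail if a CPT were mis-keyed.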
