## Exercise 1

a) Identify the independencies. The local independencies of each non-root node are obtained with `local_independencies`, and the global ones with `get_independencies`.

```python
from pgmpy.models import DiscreteBayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

m = DiscreteBayesianNetwork([("S","O"), ("S","L"), ("S","M"), ("L","M")])

print("Local(O):", m.local_independencies("O"))
print("Local(L):", m.local_independencies("L"))
print("Local(M):", m.local_independencies("M"))

print("\n", m.get_independencies())
```

```text
Local(O): (O ⟂ M, L | S)
Local(L): (L ⟂ O | S)
Local(M): (M ⟂ O | L, S)

 (M ⟂ O | S)
(O ⟂ L | S)
```
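
The local independencies follow the local Markov property: each node is independent of its non-descendants given its parents. A minimal stdlib sketch, assuming the same edge list as `m` and a hypothetical helper `local_markov` (not part of pgmpy), should reproduce the three local statements above without pgmpy:

```python
# Edge list copied from the DiscreteBayesianNetwork above
edges = [("S", "O"), ("S", "L"), ("S", "M"), ("L", "M")]
nodes = {v for e in edges for v in e}

def descendants(node):
    # Depth-first search following outgoing edges
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for u, v in edges:
            if u == cur and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def local_markov(node):
    # Hypothetical helper: (non-descendants excluding parents, parents)
    parents = {u for u, v in edges if v == node}
    others = nodes - {node} - descendants(node) - parents
    return others, parents

local = {x: local_markov(x) for x in sorted(nodes)}
for x, (others, parents) in local.items():
    if others:  # the root S has no non-descendants left, so it is skipped
        print(f"({x} ⟂ {', '.join(sorted(others))} | {', '.join(sorted(parents))})")
```

The root S is skipped because every other node descends from it, which is why the notebook only queries O, L and M.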


b) Determine how the Bayesian network classifies emails based on the attributes O, L and M. The nested loops below compute P(S = 1 | O, L, M) for all eight combinations and label an email as spam when this probability exceeds 0.5. With these CPDs, an email is classified as spam only when it contains an offer (O = 1) together with at least one of L = 1 or M = 1; an offer on its own (P ≈ 0.45) or any email without an offer is not classified as spam. This matches the intuition that an offer combined with a long message or a link is typical of scam emails, and when all three attributes are present the email is almost certainly spam (P ≈ 0.95).

```python
# Each CPD has one row per state of the variable and one column per parent configuration
cpd_S = TabularCPD("S", 2, [[0.6],[0.4]])
cpd_O = TabularCPD("O", 2, [[0.9, 0.3],[0.1, 0.7]], evidence=["S"], evidence_card=[2])
cpd_L = TabularCPD("L", 2, [[0.7, 0.2],[0.3, 0.8]], evidence=["S"], evidence_card=[2])
# Columns of P(M | S, L): (S=0,L=0), (S=0,L=1), (S=1,L=0), (S=1,L=1)
cpd_M = TabularCPD("M", 2, [[0.8, 0.4, 0.5, 0.1],[0.2, 0.6, 0.5, 0.9]], evidence=["S","L"], evidence_card=[2,2])

m.add_cpds(cpd_S, cpd_O, cpd_L, cpd_M)

infer = VariableElimination(m)
# Posterior P(S=1 | O, L, M) for every combination; spam when it exceeds 0.5
for O in (0,1):
    for L in (0,1):
        for M in (0,1):
            p_spam = float(infer.query(["S"], evidence={"O":O,"L":L,"M":M}).values[1])
            print(f"O={O} L={L} M={M} -> P(S=1|...)={p_spam:.6f} => {'Spam' if p_spam>0.5 else 'Not spam'}")
```


```text
O=0 L=0 M=0 -> P(S=1|...)=0.038168 => Not spam
O=0 L=0 M=1 -> P(S=1|...)=0.136986 => Not spam
O=0 L=1 M=0 -> P(S=1|...)=0.129032 => Not spam
O=0 L=1 M=1 -> P(S=1|...)=0.470588 => Not spam
O=1 L=0 M=0 -> P(S=1|...)=0.454545 => Not spam
O=1 L=0 M=1 -> P(S=1|...)=0.769231 => Spam
O=1 L=1 M=0 -> P(S=1|...)=0.756757 => Spam
O=1 L=1 M=1 -> P(S=1|...)=0.949153 => Spam
```
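
Because O, L and M are all observed, the posterior needs no elimination: by Bayes' rule, P(S = s | o, l, m) is proportional to P(s) P(o | s) P(l | s) P(m | s, l). A plain-Python sketch, with the CPD numbers copied from the cell above and a hypothetical helper `joint`, should reproduce the table:

```python
from itertools import product

# CPD values from the cell above
P_S = [0.6, 0.4]
P_O = [[0.9, 0.3], [0.1, 0.7]]            # P_O[o][s]
P_L = [[0.7, 0.2], [0.3, 0.8]]            # P_L[l][s]
P_M = [[[0.8, 0.4], [0.5, 0.1]],          # P_M[0][s][l]
       [[0.2, 0.6], [0.5, 0.9]]]          # P_M[1][s][l]

def joint(s, o, l, mv):
    # Hypothetical helper: factorisation implied by S->O, S->L, S->M, L->M
    return P_S[s] * P_O[o][s] * P_L[l][s] * P_M[mv][s][l]

posterior = {}
for ev in product((0, 1), repeat=3):      # ev = (o, l, m), same order as the loops above
    num = joint(1, *ev)
    posterior[ev] = num / (num + joint(0, *ev))
    label = "Spam" if posterior[ev] > 0.5 else "Not spam"
    print(f"O={ev[0]} L={ev[1]} M={ev[2]} -> {posterior[ev]:.6f} => {label}")
```

Normalising over only the two states of S is enough here because every other variable is fixed by the evidence.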


## Exercise 2

Solving exercise 1.b from Lab 2 with a Bayesian network. Querying C gives the exact probability of drawing a red ball.

```python
model = DiscreteBayesianNetwork([("D", "C")])

cpd_D = TabularCPD("D", 3, [[1/2], [1/6], [1/3]])
cpd_C_given_D = TabularCPD(
    "C", 3,
    [[0.3, 0.4, 0.3],   # C=red
     [0.4, 0.4, 0.5],   # C=blue
     [0.3, 0.2, 0.2]],  # C=black
    evidence=["D"], evidence_card=[3]
)

model.add_cpds(cpd_D, cpd_C_given_D)

p_red = float(VariableElimination(model).query(["C"]).values[0])
print(f"{p_red:.6f}")
```


```text
0.316667
```
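
Since C has D as its only parent, the query reduces to the law of total probability, P(C = red) = Σ_d P(D = d) P(C = red | D = d). Working it with exact fractions taken from the CPDs above gives 19/60, whose decimal value matches the pgmpy result:

```python
from fractions import Fraction as F

p_D = [F(1, 2), F(1, 6), F(1, 3)]                # P(D = d)
p_red_given_D = [F(3, 10), F(4, 10), F(3, 10)]   # red row of P(C | D)

# Law of total probability: sum over the states of D
p_red_exact = sum(pd * pr for pd, pr in zip(p_D, p_red_given_D))
print(p_red_exact)   # 19/60
```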


## Exercise 3

1) Estimate which of the two players has the higher chance of winning by simulating the game 10,000 times. Across 5 separate runs of 10,000 games each, P1's estimated win rate was between roughly 56% and 58.5%, while P0 has a win rate of 4

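```python
import random, math

N = 10_000
wins = [0, 0]
for _ in range(N):
    S = random.randint(0, 1)          # starter: P0 or P1
    n = random.randint(1, 6)          # die roll
    other = 1 - S                     # the non-starter flips the coin 2n times
    p = 4/7 if other == 1 else 0.5    # P1's coin has P(heads) = 4/7
    # Named `heads` so the Exercise 1 model `m` is not overwritten
    heads = sum(random.random() < p for _ in range(2*n))
    winner = S if n >= heads else other   # starter wins if heads <= n
    wins[winner] += 1
print("Sim win rates  P0≈", wins[0]/N, " P1≈", wins[1]/N)
```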

```text
Sim win rates  P0≈ 0.4296  P1≈ 0.5704
```
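
Repeated runs differ because of sampling error: with N games, the standard error of an estimated win rate p̂ is sqrt(p̂(1 - p̂)/N), about 0.005 for N = 10,000, so an approximate 95% interval is about ±1 percentage point. A minimal sketch, wrapping the loop above in a hypothetical `simulate` function with a seed for reproducibility:

```python
import math
import random

def simulate(n_games, seed=None):
    # Hypothetical helper: same game rules as the loop above, returns P1's win rate
    rng = random.Random(seed)
    p1_wins = 0
    for _ in range(n_games):
        starter = rng.randint(0, 1)
        n = rng.randint(1, 6)
        other = 1 - starter
        p = 4/7 if other == 1 else 0.5
        heads = sum(rng.random() < p for _ in range(2 * n))
        winner = starter if n >= heads else other
        p1_wins += winner == 1
    return p1_wins / n_games

p1_hat = simulate(10_000, seed=0)
se = math.sqrt(p1_hat * (1 - p1_hat) / 10_000)   # standard error of a proportion
print(f"P1 ≈ {p1_hat:.4f} ± {1.96 * se:.4f} (approx. 95% interval)")
```

Using a local `random.Random` instance keeps the seed from affecting other cells that use the global `random` module.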


2) Define a Bayesian network that models the game above: S is the starter, N is the die roll and M is the number of heads. Since the die is fair, N does not depend on who starts, so S and N are both root nodes and M depends on both (S determines whose coin is flipped, N the number of flips, 2N). Because the network encodes the probabilities of the game exactly, queries on it give exact answers without simulation.

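```python
# S = starter (0 = P0, 1 = P1), N = die roll, M = number of heads
bn = DiscreteBayesianNetwork([("S","M"),("N","M")])

cpd_S = TabularCPD("S", 2, [[0.5],[0.5]])
# State k of N corresponds to a die value of k + 1
cpd_N = TabularCPD("N", 6, [[1/6],[1/6],[1/6],[1/6],[1/6],[1/6]])

def binom_col(n, p):
    # P(M = k) for 2n flips with P(heads) = p, k = 0..12 (zero beyond 2n)
    col = [0.0]*13
    for k in range(2*n + 1):
        col[k] = math.comb(2*n, k) * (p**k) * ((1-p)**(2*n-k))
    return col

cols = []
for s in (0,1):
    # The non-starter flips: P1's coin (4/7) if P0 starts, P0's fair coin if P1 starts
    p = 4/7 if s==0 else 0.5
    for n in range(1,7):
        cols.append(binom_col(n, p))

# Transpose to rows = states of M, columns = (S, N) combinations with S varying slowest
cpd_M = TabularCPD("M", 13, list(map(list, zip(*cols))), evidence=["S","N"], evidence_card=[2,6])

bn.add_cpds(cpd_S, cpd_N, cpd_M)
infer = VariableElimination(bn)
```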
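
The network above stops at M, so it does not give the winner directly. A sketch that sums over S and N by hand, using the same binomial probabilities as `binom_col` and the winning rule from the simulation (the starter wins when heads ≤ n), should give the exact win probability that part 1 estimates; `p_heads_at_most` is a hypothetical helper:

```python
from fractions import Fraction
from math import comb

def p_heads_at_most(n, p):
    # Hypothetical helper: P(at most n heads in 2n flips with P(heads) = p)
    return sum(comb(2 * n, k) * p**k * (1 - p)**(2 * n - k) for k in range(n + 1))

p1_win = Fraction(0)
for starter in (0, 1):
    p = Fraction(4, 7) if starter == 0 else Fraction(1, 2)   # coin of the non-starter
    for n in range(1, 7):
        starter_wins = p_heads_at_most(n, p)
        p1_given = starter_wins if starter == 1 else 1 - starter_wins
        # P(S = starter) * P(N = n) * P(P1 wins | starter, n)
        p1_win += Fraction(1, 2) * Fraction(1, 6) * p1_given

print(f"Exact  P0 = {float(1 - p1_win):.4f}  P1 = {float(p1_win):.4f}")
```

Exact fractions avoid rounding; the resulting P1 value should lie inside the range of the simulated estimates.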

3) Knowing that only one head was obtained (M = 1), determine who most likely started the game. A low head count is more likely when the coin being flipped is less biased towards heads. If P1 starts, P0 flips the fair coin (P(heads) = 1/2); if P0 starts, P1 flips the coin with P(heads) = 4/7. Observing M = 1 therefore favours S = 1, so the posterior P(S = 1 | M = 1) is greater than 0.5 and P1 is the more likely starter.

```python
post = infer.query(["S"], evidence={"M":1}).values
print("P(S=0|m=1)≈", post[0], "  P(S=1|m=1)≈", post[1], "  => starter≈", ("P1" if post[1]>post[0] else "P0"))
```

```text
P(S=0|m=1)≈ 0.4528939794932878   P(S=1|m=1)≈ 0.5471060205067122   => starter≈ P1
```
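
The posterior can be checked by hand. Marginalising the die out, P(M = 1 | S = s) = (1/6) Σ_n 2n·p_s·(1 - p_s)^(2n-1), where p_s is the heads probability of the non-starter's coin; Bayes' rule with the uniform prior on S then normalises the two likelihoods. A stdlib sketch with a hypothetical helper `likelihood_m`:

```python
from math import comb

def likelihood_m(heads, s):
    # Hypothetical helper: P(M = heads | S = s), with N uniform over 1..6
    p = 4/7 if s == 0 else 0.5          # the non-starter's coin
    total = 0.0
    for n in range(1, 7):
        if heads <= 2 * n:
            total += comb(2 * n, heads) * p**heads * (1 - p)**(2 * n - heads) / 6
    return total

prior = [0.5, 0.5]
unnorm = [prior[s] * likelihood_m(1, s) for s in (0, 1)]
post_check = [u / sum(unnorm) for u in unnorm]
print(post_check)
```

The likelihood under S = 1 is larger for every die value, which is the reasoning in part 3 made numeric.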
