### Exercise 2: Bayesian Network for Fraud Detection

**Objective:**
Implement a Bayesian Network to model dependencies in financial transactions. You will perform **Causal Inference** (predicting effects from known causes), **Diagnostic Inference** (inferring causes from observed symptoms), and verify **Conditional Independence**.

**Context:**
You are building a system to detect online transaction fraud. The system considers the following variables:
* **F (Fraud):** Whether the transaction is actually fraudulent.
* **L (Foreign Location):** Whether the transaction originates from a foreign IP address.
* **A (Large Amount):** Whether the transaction amount is significantly larger than average.
* **R (High-Risk Merchant):** Whether the merchant is flagged as high-risk by the system (based on the transaction amount).

**Network Structure:**
The dependencies are modeled as:
1.  Fraud influences location ($F \rightarrow L$).
2.  Fraud influences the amount ($F \rightarrow A$).
3.  The Amount influences the Merchant Risk classification ($A \rightarrow R$).

Graphically: $L \leftarrow F \rightarrow A \rightarrow R$
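Each node depends only on its parents in this graph. By the chain rule for Bayesian networks, the joint distribution therefore factorizes as follows. Every query in the tasks below can be answered by summing this product over the unobserved variables and normalizing.

$$
P(F, L, A, R) = P(F)\,P(L \mid F)\,P(A \mid F)\,P(R \mid A)
$$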

**Conditional Probability Tables (CPTs):**

1.  **Prior Probability (Fraud):**
    * $P(F=Yes) = 0.005$ (0.5% of transactions are fraudulent), so $P(F=No) = 0.995$

2.  **Location ($L$) given Fraud ($F$):**
    * If Fraud = Yes $\rightarrow$ 90% chance of Foreign Location.
    * If Fraud = No  $\rightarrow$ 5% chance of Foreign Location.

| F | P(L=Yes \| F) | P(L=No \| F) |
| :--- | :--- | :--- |
| **Yes** | 0.90 | 0.10 |
| **No** | 0.05 | 0.95 |

3.  **Amount ($A$) given Fraud ($F$):**
    * If Fraud = Yes $\rightarrow$ 80% chance of Large Amount.
    * If Fraud = No  $\rightarrow$ 10% chance of Large Amount.

| F | P(A=Yes \| F) | P(A=No \| F) |
| :--- | :--- | :--- |
| **Yes** | 0.80 | 0.20 |
| **No** | 0.10 | 0.90 |

4.  **Merchant Risk ($R$) given Amount ($A$):**
    * If Amount = Large ($A=Yes$) $\rightarrow$ 95% chance of High Risk flag.
    * If Amount = Normal ($A=No$) $\rightarrow$ 15% chance of High Risk flag.

| A | P(R=Yes \| A) | P(R=No \| A) |
| :--- | :--- | :--- |
| **Yes** | 0.95 | 0.05 |
| **No** | 0.15 | 0.85 |
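Before moving to pgmpy, it can help to sanity-check the tables in plain Python. The sketch below uses only the standard library and no pgmpy. The helper names `bernoulli` and `joint` are illustrative, not part of any library.

It works in three steps:
* It stores each CPT as the probability of "Yes".
* It multiplies the four CPT entries along $L \leftarrow F \rightarrow A \rightarrow R$.
* It checks that the 16 joint probabilities sum to 1.

```python
from itertools import product

# CPTs copied from the tables above; True = "Yes", False = "No".
P_F = {True: 0.005, False: 0.995}        # P(F)
P_L_given_F = {True: 0.90, False: 0.05}  # P(L=Yes | F)
P_A_given_F = {True: 0.80, False: 0.10}  # P(A=Yes | F)
P_R_given_A = {True: 0.95, False: 0.15}  # P(R=Yes | A)


def bernoulli(p_yes, value):
    """Return P(X=value) for a binary variable with P(X=Yes) = p_yes."""
    return p_yes if value else 1.0 - p_yes


def joint(f, l, a, r):
    """P(F=f, L=l, A=a, R=r) = P(F) P(L|F) P(A|F) P(R|A)."""
    return (P_F[f]
            * bernoulli(P_L_given_F[f], l)
            * bernoulli(P_A_given_F[f], a)
            * bernoulli(P_R_given_A[a], r))


# Enumerate all 2^4 assignments of (F, L, A, R); a valid network sums to 1.
total = sum(joint(*assignment) for assignment in product([True, False], repeat=4))
print(f"Sum over all 16 assignments: {total:.6f}")
```

If any row of a CPT does not sum to 1, the total drifts away from 1. pgmpy's `check_model()` catches the same class of mistake.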

**Tasks (Using `pgmpy`):**

1.  **Model Definition:** Define the network structure and the CPTs in Python.
2.  **Diagnostic Inference:**
    * We observe a transaction from a **Foreign Location** ($L=Yes$) involving a **High-Risk Merchant** ($R=Yes$).
    * What is the probability that this transaction is **Fraud** ($F$)?
    * Compare this to the prior probability (0.005).
3.  **Conditional Independence (Theory in Practice):**
    * Calculate the probability of a **High-Risk Merchant** flag given that we observed a **Large Amount**, i.e. $P(R=Yes \mid A=Yes)$.
    * Now assume we *also* know the transaction is **Fraud** ($F=Yes$). Calculate $P(R=Yes \mid A=Yes, F=Yes)$.
    * **Question:** Does knowing $F$ change the probability of $R$ if we already know $A$? Explain why based on the graph structure ($F \rightarrow A \rightarrow R$).
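The network has only four binary variables. Every query in Tasks 2 and 3 can therefore also be answered by brute-force enumeration of the joint distribution, which gives an independent cross-check on your pgmpy results. This is a plain-Python sketch with no pgmpy. `joint` and `prob` are illustrative helper names, not library functions.

Enumeration is exponential in the number of variables, so treat it only as a checking tool for tiny networks. Variable Elimination avoids building the full joint by summing out one variable at a time.

```python
from itertools import product

# CPTs from the exercise tables; True = "Yes", False = "No".
P_F = {True: 0.005, False: 0.995}        # P(F)
P_L_given_F = {True: 0.90, False: 0.05}  # P(L=Yes | F)
P_A_given_F = {True: 0.80, False: 0.10}  # P(A=Yes | F)
P_R_given_A = {True: 0.95, False: 0.15}  # P(R=Yes | A)

VARS = ("F", "L", "A", "R")


def bernoulli(p_yes, value):
    """Return P(X=value) for a binary variable with P(X=Yes) = p_yes."""
    return p_yes if value else 1.0 - p_yes


def joint(x):
    """Joint probability of a full assignment x (dict: variable -> bool)."""
    return (P_F[x["F"]]
            * bernoulli(P_L_given_F[x["F"]], x["L"])
            * bernoulli(P_A_given_F[x["F"]], x["A"])
            * bernoulli(P_R_given_A[x["A"]], x["R"]))


def prob(query, evidence):
    """P(query | evidence): sum the joint over assignments consistent with the evidence."""
    num = den = 0.0
    for values in product([True, False], repeat=len(VARS)):
        x = dict(zip(VARS, values))
        if any(x[v] != val for v, val in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(x)
        den += p
        if all(x[v] == val for v, val in query.items()):
            num += p
    return num / den


# Task 2: diagnostic inference (reasoning from effects L, R back to cause F)
p_fraud = prob({"F": True}, {"L": True, "R": True})

# Task 3: does F matter for R once A is known?
p_r_a = prob({"R": True}, {"A": True})
p_r_af = prob({"R": True}, {"A": True, "F": True})

print(f"P(F=Yes | L=Yes, R=Yes) = {p_fraud:.4f}  (prior: {P_F[True]})")
print(f"P(R=Yes | A=Yes)        = {p_r_a:.4f}")
print(f"P(R=Yes | A=Yes, F=Yes) = {p_r_af:.4f}")
```

These numbers should match what `VariableElimination` returns for the same evidence. If they differ, the usual culprit is a CPT whose rows or columns were entered in a different state order.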

from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# HANDLE VERSION DIFFERENCES
# The name of the Bayesian network class depends on the installed pgmpy version,
# so try the candidates from newest to oldest.
try:
    # Recent pgmpy releases name the class DiscreteBayesianNetwork
    from pgmpy.models import DiscreteBayesianNetwork as BayesianNetwork
except ImportError:
    try:
        # Earlier releases use BayesianNetwork
        from pgmpy.models import BayesianNetwork
    except ImportError:
        # Legacy releases use BayesianModel
        from pgmpy.models import BayesianModel as BayesianNetwork

# ==========================================
# EXERCISE 2: Fraud Detection (Student Template)
# ==========================================

# 1. MODEL DEFINITION
# -------------------

# TODO: Define the Network Structure
# Hint: The list of tuples represents edges. E.g., [('Parent', 'Child')]
# Structure: L <- F -> A -> R
model = BayesianNetwork([
    # ('Source Node', 'Target Node'), ...
    # Fill in the edges here
])

# TODO: Define the CPTs
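# Note on pgmpy convention:
# values=[[Row 0 (State 0)], [Row 1 (State 1)]]
# Rows follow the order of the variable's states in state_names; columns follow
# the order of the evidence (parent) variable's states. Every variable here uses
# ['Yes', 'No'], so Row 0 = Yes, Row 1 = No, Col 0 = parent Yes, Col 1 = parent No.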

# P(F) - Fraud
cpd_f = TabularCPD(variable='F', variable_card=2, 
                   values=[[0.005], [0.995]], 
                   state_names={'F': ['Yes', 'No']})

# P(L | F) - Foreign Location given Fraud
# Cols: F=Yes, F=No
cpd_l = TabularCPD(variable='L', variable_card=2, 
                   values=[[None, None],  # Row L=Yes
                           [None, None]], # Row L=No
                   evidence=['F'], evidence_card=[2],
                   state_names={'L': ['Yes', 'No'], 'F': ['Yes', 'No']})

# P(A | F) - Large Amount given Fraud
# Cols: F=Yes, F=No
cpd_a = TabularCPD(variable='A', variable_card=2, 
                   values=[[None, None], 
                           [None, None]],
                   evidence=['F'], evidence_card=[2],
                   state_names={'A': ['Yes', 'No'], 'F': ['Yes', 'No']})

# P(R | A) - High Risk given Large Amount
# Cols: A=Yes, A=No
cpd_r = TabularCPD(variable='R', variable_card=2, 
                   values=[[None, None], 
                           [None, None]],
                   evidence=['A'], evidence_card=[2],
                   state_names={'R': ['Yes', 'No'], 'A': ['Yes', 'No']})

# Adding CPDs to the model
# (Uncomment the line below once you have defined the CPDs)
# model.add_cpds(cpd_f, cpd_l, cpd_a, cpd_r)

# Check if model is valid
# print("Model Check:", model.check_model())

# Initialize Inference Engine
# infer = VariableElimination(model)

# 2. DIAGNOSTIC INFERENCE
# -----------------------
# Scenario: Foreign Location = Yes, High-Risk Merchant = Yes.
# Find P(F = Yes)

print("\n--- Task 2: Diagnostic Inference ---")
# TODO: Use infer.query(...)
# e.g. infer.query(variables=['F'], evidence={'L': 'Yes', 'R': 'Yes'})
# prob_fraud = ...
# print(prob_fraud)


# 3. CONDITIONAL INDEPENDENCE
# ---------------------------
# Compare P(R|A) vs P(R|A, F)

print("\n--- Task 3: Conditional Independence ---")

# Case A: We know Amount=Yes. What is P(R=Yes)?
# prob_risk_given_amount = ...

# Case B: We know Amount=Yes AND Fraud=Yes. What is P(R=Yes)?
# prob_risk_given_amount_and_fraud = ...

# print(f"P(R=Yes | A=Yes): {prob_risk_given_amount}")
# print(f"P(R=Yes | A=Yes, F=Yes): {prob_risk_given_amount_and_fraud}")

"""
QUESTION:
Does knowing Fraud change the probability of High Risk if we already know the Amount is Large?
Explain why based on the concept of 'd-separation' or 'Markov Blanket'.

STUDENT ANSWER:
(Write answer here)
"""