# Bayesian Network Tutorial

## Part 1: Bayesian Network Tutorial



### Steps Spectre Can Take:

- **Hire professional hackers** who can write programs to launch a cyberattack on Q’s personal database.
- **Buy a state-of-the-art computer called “Contra”** to actually launch this cyberattack.
- **Hire ruthless mercenaries** to kidnap M and get access to the key.
- **Ensure Bond is not available** with M at the time of the kidnapping.
- **Use the cipher and key** to access the target “Double-0” files.

### Variables:

- **H**: Spectre hires professional hackers.
- **C**: Spectre buys Contra.
- **M**: Spectre hires mercenaries.
- **B**: Bond is guarding M at the time of the kidnapping.
- **Q**: Q’s database is hacked and the cipher is compromised.
- **K**: M gets kidnapped and has to give away the key.
- **D**: Spectre succeeds in obtaining the “Double-0” files.

### Classified Information:

- Spectre fails to hire skilled professional hackers with a probability of 0.5.
- Spectre obtains Contra with a probability of 0.3.
- Spectre fails to hire mercenaries with a probability of 0.2.
- Bond is protecting M with a probability of 0.5.
- If professional hackers are hired:
  - They can crack Q's database without Contra with a probability of 0.55.
  - With Contra, they can crack Q's database with a probability of 0.9.
- If professional hackers are not hired:
  - Less experienced employees attempt the hack.
  - Q's database remains secure with a probability of 0.75 with Contra.
  - Q's database remains secure with a probability of 0.95 without Contra.
- When Bond is protecting M:
  - M stays safe with a probability of 0.85 if mercenaries attack.
  - M stays safe with a probability of 0.99 if mercenaries do not attack.
- When Bond is not protecting M:
  - M gets kidnapped with a probability of 0.95 with mercenaries.
  - M gets kidnapped with a probability of 0.75 without mercenaries.
- With both the cipher and the key, Spectre accesses the “Double-0” files with a probability of 0.99.
- With neither, the probability drops to 0.02.
- With just the cipher, the probability is 0.4.
- With just the key, the probability is 0.65.
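
The list above mixes success and failure probabilities, so it can help to restate every entry as the probability that the variable is true before building any tables. The following is a minimal sketch in plain Python, assuming each variable is binary with 1 meaning the event in the Variables list happens. The dictionary names (`priors`, `p_q`, `p_k`, `p_d`) are illustrative and are not part of `pgmpy`.

```python
# Priors, stated as P(X = 1). "Fails with probability p" becomes 1 - p.
priors = {
    "H": 1 - 0.5,  # hackers are hired
    "C": 0.3,      # Contra is obtained
    "M": 1 - 0.2,  # mercenaries are hired
    "B": 0.5,      # Bond is guarding M
}

# P(Q = 1 | H, C), keyed by (H, C). "Remains secure" entries are complemented.
p_q = {
    (1, 0): 0.55,
    (1, 1): 0.9,
    (0, 1): 1 - 0.75,
    (0, 0): 1 - 0.95,
}

# P(K = 1 | B, M), keyed by (B, M). "Stays safe" entries are complemented.
p_k = {
    (1, 1): 1 - 0.85,
    (1, 0): 1 - 0.99,
    (0, 1): 0.95,
    (0, 0): 0.75,
}

# P(D = 1 | Q, K), keyed by (Q, K).
p_d = {(1, 1): 0.99, (0, 0): 0.02, (1, 0): 0.4, (0, 1): 0.65}

# Print both rows of each table so the complements can be checked by eye.
for name, table in [("Q", p_q), ("K", p_k), ("D", p_d)]:
    for parent_states, p_true in sorted(table.items()):
        print(f"P({name}=1 | {parent_states}) = {p_true:.2f}, "
              f"P({name}=0 | {parent_states}) = {1 - p_true:.2f}")
```

Keeping every table in terms of the "true" state makes it harder to mix up a row when the numbers are later typed into a CPD.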

### 1a: Casting the Net

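Use the description above to design a Bayesian network for this model. We'll use the `pgmpy` package to represent the nodes and the directed edges between them, with a conditional probability table attached to each node. The `protain_network` function in this part walks through the same `pgmpy` workflow on a separate protein-stability network.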
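
One way to read the structure off the description is to list each dependency as a (parent, child) pair. The sketch below assumes the edges implied by the Classified Information list (hackers and Contra influence the hack, Bond and mercenaries influence the kidnapping, and the hack and kidnapping together influence access to the files). It uses only the standard library (`graphlib`, Python 3.9+) to confirm that the graph is acyclic. The names `edges`, `parents`, and `order` are illustrative.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Directed edges (parent, child) read off the scenario description.
edges = [("H", "Q"), ("C", "Q"), ("B", "K"), ("M", "K"), ("Q", "D"), ("K", "D")]

# Map each node to the set of its parents.
parents = {}
for parent, child in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# static_order() lists parents before children and raises CycleError
# if the graph contains a cycle, i.e. if it is not a valid DAG.
order = list(TopologicalSorter(parents).static_order())
print(order)

# Nodes with no parents need only a prior; the rest need a conditional table.
roots = sorted(node for node, pa in parents.items() if not pa)
print(roots)
```

A list of tuples in this form is the same shape that `add_edges_from` receives in the `pgmpy` code in this part.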

In [1]:
# Import necessary libraries
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

def protain_network():
    """Build the structure of the protein network.

    Variables (all boolean):
    L: ligand
    G: gene activation
    M: mutation
    F: correct folding
    H: hydrolysis
    S: stability

    Probability tables required by this structure:
    p(L = 1): probability of a ligand being present
    p(M = 1): probability of a mutation occurring
    p(H = 1): probability of a protein being hydrolyzed
    p(G = 1|L): probability of a gene being activated given the presence of a ligand
    p(F = 1|M, G): probability of the protein correctly folding given the presence of a mutation and the gene being activated
    p(S = 1|F, H): probability of the protein being stable given the protein correctly folding and being hydrolyzed
    """
    # Defining the network structure
    model = BayesianNetwork()
    model.add_nodes_from(['L', 'G', 'M', 'F', 'H', 'S'])
    model.add_edges_from([('L', 'G'), ('M', 'F'), ('G', 'F'), ('F', 'S'), ('H', 'S')])

    return model

    


### 1b: Setting the Probabilities

Now, set the conditional probabilities for the necessary variables on the network you just built.

In [3]:
from pgmpy.factors.discrete import TabularCPD

def protain_network_cpd(model):
    # Defining the CPDs. Each row of `values` is a state of the variable (0, then 1);
    # each column is a combination of evidence states, with the last evidence
    # variable changing fastest.
    # P(L=1) = 0.7, P(L=0) = 0.3
    cpd_L = TabularCPD(variable='L', variable_card=2, values=[[0.3], [0.7]])
    # P(M=1) = 0.3, P(M=0) = 0.7
    cpd_M = TabularCPD(variable='M', variable_card=2, values=[[0.7], [0.3]])
    # P(H=1) = 0.4, P(H=0) = 0.6
    cpd_H = TabularCPD(variable='H', variable_card=2, values=[[0.6], [0.4]])
    # P(G=1 | L=1) = 0.9, P(G=1 | L=0) = 0.2
    cpd_G = TabularCPD(variable='G', variable_card=2,
                       values=[[0.8, 0.1], [0.2, 0.9]],
                       evidence=['L'], evidence_card=[2])
    # P(F=1 | M=1, G=1) = 0.3, P(F=1 | M=1, G=0) = 0.1
    # P(F=1 | M=0, G=1) = 0.9, P(F=1 | M=0, G=0) = 0.2
    cpd_F = TabularCPD(variable='F', variable_card=2,
                       values=[[0.8, 0.1, 0.9, 0.7], [0.2, 0.9, 0.1, 0.3]],
                       evidence=['M', 'G'], evidence_card=[2, 2])
    # P(S=1 | H=1, F=1) = 0.2,  P(S=1 | H=1, F=0) = 0.1
    # P(S=1 | H=0, F=1) = 0.95, P(S=1 | H=0, F=0) = 0.3
    cpd_S = TabularCPD(variable='S', variable_card=2,
                       values=[[0.7, 0.05, 0.9, 0.8], [0.3, 0.95, 0.1, 0.2]],
                       evidence=['H', 'F'], evidence_card=[2, 2])

    # Associating the CPDs with the network structure.
    model.add_cpds(cpd_L, cpd_M, cpd_H, cpd_G, cpd_F, cpd_S)
    # check_model() validates the network structure against the CPDs: each CPD must
    # match its node's parents, and every column must sum to 1. It raises an error otherwise.
    model.check_model()

    return model

def main():
    model = protain_network()
    model = protain_network_cpd(model)
    print(model.get_cpds('L'))
    print(model.get_cpds('M'))
    print(model.get_cpds('H'))
    print(model.get_cpds('G'))
    print(model.get_cpds('F'))
    print(model.get_cpds('S'))

if __name__ == '__main__':
    main()

+------+-----+
| L(0) | 0.3 |
+------+-----+
| L(1) | 0.7 |
+------+-----+
+------+-----+
| M(0) | 0.7 |
+------+-----+
| M(1) | 0.3 |
+------+-----+
+------+-----+
| H(0) | 0.6 |
+------+-----+
| H(1) | 0.4 |
+------+-----+
+------+------+------+
| L    | L(0) | L(1) |
+------+------+------+
| G(0) | 0.8  | 0.1  |
+------+------+------+
| G(1) | 0.2  | 0.9  |
+------+------+------+
+------+------+------+------+------+
| M    | M(0) | M(0) | M(1) | M(1) |
+------+------+------+------+------+
| G    | G(0) | G(1) | G(0) | G(1) |
+------+------+------+------+------+
| F(0) | 0.8  | 0.1  | 0.9  | 0.7  |
+------+------+------+------+------+
| F(1) | 0.2  | 0.9  | 0.1  | 0.3  |
+------+------+------+------+------+
+------+------+------+------+------+
| H    | H(0) | H(0) | H(1) | H(1) |
+------+------+------+------+------+
| F    | F(0) | F(1) | F(0) | F(1) |
+------+------+------+------+------+
| S(0) | 0.7  | 0.05 | 0.9  | 0.8  |
+------+------+------+------+------+
| S(1) | 0.3  | 0.95 | 0.1  | 0.2  |
+------+------+------+------+------+
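
The header rows of the printed tables show how the columns of `values` line up with the evidence states: the first evidence variable changes slowest and the last changes fastest. The sketch below reproduces that ordering with `itertools.product` for the `F` table, without using `pgmpy`. The names `f1_row`, `columns`, and `p_f1` are illustrative.

```python
from itertools import product

evidence = ["M", "G"]
evidence_card = [2, 2]
f1_row = [0.2, 0.9, 0.1, 0.3]  # P(F = 1 | M, G), copied from cpd_F

# product() varies the last variable fastest, which matches the printed
# header rows: (M=0, G=0), (M=0, G=1), (M=1, G=0), (M=1, G=1).
columns = list(product(*(range(card) for card in evidence_card)))

# Pair each column position with its probability.
p_f1 = dict(zip(columns, f1_row))
for (m, g), p in p_f1.items():
    print(f"P(F=1 | M={m}, G={g}) = {p}")
```

Checking a table this way before calling `add_cpds` can catch a transposed column, which `check_model()` would not flag as long as each column still sums to 1.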

### 1c: Probability Calculations - Perform Inference

Now, perform inference on the network to calculate the following probabilities:

1. What is the **marginal probability** that the “Double-0” files get compromised?
2. Given that **Contra is unavailable**, what is the **conditional probability** that the “Double-0” files get compromised?
3. Given that **Contra is unavailable** and **Bond is guarding M**, what is the **conditional probability** that the “Double-0” files get compromised?
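
With only seven binary variables, the three questions above can also be answered by brute-force enumeration of the joint distribution. The following is a sketch under stated assumptions: the structure is H, C → Q; B, M → K; Q, K → D, and the tables hold the "true" probabilities derived from the Classified Information list. The helper names (`bern`, `joint`, `prob_d`) are hypothetical and not part of `pgmpy`.

```python
from itertools import product

NAMES = ["H", "C", "M", "B", "Q", "K", "D"]
PRIOR = {"H": 0.5, "C": 0.3, "M": 0.8, "B": 0.5}                # P(X = 1)
P_Q = {(1, 1): 0.9, (1, 0): 0.55, (0, 1): 0.25, (0, 0): 0.05}   # keyed by (H, C)
P_K = {(1, 1): 0.15, (1, 0): 0.01, (0, 1): 0.95, (0, 0): 0.75}  # keyed by (B, M)
P_D = {(1, 1): 0.99, (1, 0): 0.4, (0, 1): 0.65, (0, 0): 0.02}   # keyed by (Q, K)


def bern(p_true, value):
    """Probability of a binary value given the probability of 1."""
    return p_true if value == 1 else 1.0 - p_true


def joint(a):
    """Joint probability of a full assignment via the chain rule of the network."""
    p = 1.0
    for root, p_true in PRIOR.items():
        p *= bern(p_true, a[root])
    p *= bern(P_Q[(a["H"], a["C"])], a["Q"])
    p *= bern(P_K[(a["B"], a["M"])], a["K"])
    p *= bern(P_D[(a["Q"], a["K"])], a["D"])
    return p


def prob_d(evidence=None):
    """P(D = 1 | evidence), summing over all assignments consistent with the evidence."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([0, 1], repeat=len(NAMES)):
        a = dict(zip(NAMES, values))
        if any(a[name] != value for name, value in evidence.items()):
            continue
        p = joint(a)
        denominator += p
        if a["D"] == 1:
            numerator += p
    return numerator / denominator


print(round(prob_d(), 4))                  # marginal
print(round(prob_d({"C": 0}), 4))          # Contra unavailable
print(round(prob_d({"C": 0, "B": 1}), 4))  # Contra unavailable, Bond guarding M
```

Under these assumptions the three values come out to roughly 0.4825, 0.4529, and 0.2094. Enumeration scales exponentially with the number of variables, so it is only useful as a cross-check on small networks like this one.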

In [4]:
from pgmpy.inference import VariableElimination

network = protain_network()
network = protain_network_cpd(network)

# Suppose a ligand was present, the gene was activated, no mutation occurred, the protein
# folded correctly, and hydrolysis occurred. Given those, what is the probability that the
# protein was stable? Round your answer to 3 decimal places. (Note: this question hints at
# part of the solution to later questions.)

# Doing exact inference using Variable Elimination
inference = VariableElimination(network)    
stable_prob = inference.query(variables=['S'], evidence={'L': 1, 'G': 1, 'M': 0, 'F': 1, 'H': 1}, joint=False)
print(stable_prob['S'].values[1])




In [None]:
# In a scenario where we know a mutation did not occur and a ligand was present, what is the
# probability that a gene was activated? Round your answer to 3 decimal places.

gene_prob = inference.query(variables=['G'], evidence={'L': 1, 'M': 0}, joint=False)
print(gene_prob['G'].values[1])

In [None]:
# In a scenario where we know a mutation did not occur, a ligand was present, and the protein
# was folded incorrectly, what is the probability that a gene was activated? Round your answer
# to 6 decimal places.

gene_prob = inference.query(variables=['G'], evidence={'L': 1, 'M': 0, 'F': 0}, joint=False)
print(gene_prob['G'].values[1])
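
The last two queries can be checked by hand from the protein CPDs. When `F` is not observed, `M` carries no information about `G`, so the answer is just P(G=1 | L=1). Once `F` is observed, `M` and `G` become coupled through their common child, and Bayes' rule is needed. The sketch below works through both cases in plain Python. The variable names are illustrative, and the numbers are copied from `cpd_G` and `cpd_F`.

```python
# Values copied from the CPDs defined in part 1b.
p_g1_given_l1 = 0.9               # P(G=1 | L=1)
p_f0_given_m0 = {1: 0.1, 0: 0.8}  # P(F=0 | M=0, G=g), keyed by g

# Query 1: evidence L=1, M=0. With F unobserved, M is independent of G given L,
# so the posterior equals P(G=1 | L=1).
print(round(p_g1_given_l1, 3))

# Query 2: evidence L=1, M=0, F=0. Apply Bayes' rule over the two values of G:
# P(G=1 | L=1, M=0, F=0) is proportional to P(G=1 | L=1) * P(F=0 | M=0, G=1).
numerator = p_g1_given_l1 * p_f0_given_m0[1]
denominator = numerator + (1 - p_g1_given_l1) * p_f0_given_m0[0]
posterior = numerator / denominator
print(round(posterior, 6))
```

Observing an incorrectly folded protein lowers the belief that the gene was activated from 0.9 to about 0.529, because incorrect folding is much more likely when the gene is inactive.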