In this example we build the cancer Bayesian network (http://www.bnlearn.com/bnrepository/discrete-small.html#cancer) with pgmpy and run some simple queries on it.

In pgmpy, the general flow for defining a network is to first define the network structure and then add the parameters (the CPDs) to it.

The graph
<img src="http://www.bnlearn.com/bnrepository/cancer/cancer.png" width="400">

**Given the above information, we need to convert the probabilities into the input format that TabularCPD expects.**  

<pre>

P(P):
+----------+----------+-----------+
|    P     | 0(low)   |  1(high)  |
+----------+----------+-----------+
|   P(P)   |   0.9    |    0.1    |
+----------+----------+-----------+

P(S):
+----------+----------+-----------+
|    S     | 0(T)     |     1(F)  |
+----------+----------+-----------+
|   P(S)   |   0.3    |    0.7    |
+----------+----------+-----------+

To order the evidence as ['Smoker', 'Pollution'], rearrange the bnlearn definition below into the table that follows it:
probability ( Cancer | Pollution, Smoker ) {
  (low, True) 0.03, 0.97;
  (high, True) 0.05, 0.95;
  (low, False) 0.001, 0.999;
  (high, False) 0.02, 0.98;
}

P(C | S, P):
+------+------+------+------+------+
|   S  |      0      |      1      |
+------+------+------+------+------+
|   P  |   0  |   1  |   0  |   1  |
+------+------+------+------+------+
|  C=0 | 0.03 |0.05  |0.001 |0.02  |
+------+------+------+------+------+
|  C=1 | 0.97 |0.95  |0.999 |0.98  |
+------+------+------+------+------+

</pre>

```python
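from pgmpy.factors.discrete import TabularCPD

# Columns follow the table above: (S=0, P=0), (S=0, P=1), (S=1, P=0), (S=1, P=1)
cpd_cancer = TabularCPD(variable='Cancer', variable_card=2,
                        values=[[0.03, 0.05, 0.001, 0.02],
                                [0.97, 0.95, 0.999, 0.98]],
                        evidence=['Smoker', 'Pollution'],
                        evidence_card=[2, 2])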
```
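
The column order of `values` can be derived mechanically rather than by hand. The sketch below uses only the standard library and assumes the layout shown in the table above, where the first evidence variable (Smoker) varies slowest and the last (Pollution) varies fastest. The `cpt` dictionary and the other names are illustrative and not part of pgmpy.

```python
from itertools import product

# Conditional probabilities copied from the table above.
# key = (smoker_state, pollution_state), value = (P(C=0 | ...), P(C=1 | ...))
cpt = {
    (0, 0): (0.03, 0.97),    # Smoker=0 (T), Pollution=0 (low)
    (0, 1): (0.05, 0.95),    # Smoker=0 (T), Pollution=1 (high)
    (1, 0): (0.001, 0.999),  # Smoker=1 (F), Pollution=0 (low)
    (1, 1): (0.02, 0.98),    # Smoker=1 (F), Pollution=1 (high)
}

evidence_card = [2, 2]  # cardinalities for ['Smoker', 'Pollution']

# Enumerate parent configurations with the last evidence variable varying fastest.
columns = list(product(*(range(card) for card in evidence_card)))

# One row per state of Cancer, one column per parent configuration.
values = [[cpt[col][state] for col in columns] for state in range(2)]

print(values)
```

The resulting nested list has the same shape as the `values` argument: one row per state of Cancer and one column per (Smoker, Pollution) combination.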


In [1]:
# Starting with defining the network structure
from pgmpy.models import BayesianModel

cancer_model = BayesianModel([('Pollution', 'Cancer'), 
                              ('Smoker', 'Cancer'),
                              ('Cancer', 'Xray'),
                              ('Cancer', 'Dyspnoea')])


In [None]:
# Now defining the parameters.
from pgmpy.factors.discrete import TabularCPD

cpd_poll = TabularCPD(variable='Pollution', variable_card=2,
                      values=[[0.9], [0.1]])
cpd_smoke = TabularCPD(variable='Smoker', variable_card=2,
                       values=[[0.3], [0.7]])
cpd_cancer = TabularCPD(variable='Cancer', variable_card=2,
                        values=[[0.03, 0.05, 0.001, 0.02],
                                [0.97, 0.95, 0.999, 0.98]],
                        evidence=['Smoker', 'Pollution'],
                        evidence_card=[2, 2])
cpd_xray = TabularCPD(variable='Xray', variable_card=2,
                      values=[[0.9, 0.2], [0.1, 0.8]],
                      evidence=['Cancer'], evidence_card=[2])
cpd_dysp = TabularCPD(variable='Dyspnoea', variable_card=2,
                      values=[[0.65, 0.3], [0.35, 0.7]],
                      evidence=['Cancer'], evidence_card=[2])

In [None]:
# Associating the parameters with the model structure.
cancer_model.add_cpds(cpd_poll, cpd_smoke, cpd_cancer, cpd_xray, cpd_dysp)

# Checking if the cpds are valid for the model.
cancer_model.check_model()
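
Among other things, `check_model()` verifies that every CPD is normalized, that is, each column sums to 1 for every parent configuration. The following is a minimal stand-alone sketch of that one check in plain Python, applied to the same value tables defined above. The helper `columns_sum_to_one` is hypothetical and not a pgmpy function.

```python
import math

def columns_sum_to_one(values, tol=1e-9):
    """Return True if every column of a row-major probability table sums to 1."""
    n_cols = len(values[0])
    return all(
        math.isclose(sum(row[j] for row in values), 1.0, abs_tol=tol)
        for j in range(n_cols)
    )

# Value tables copied from the CPD definitions above.
tables = {
    'Pollution': [[0.9], [0.1]],
    'Smoker': [[0.3], [0.7]],
    'Cancer': [[0.03, 0.05, 0.001, 0.02], [0.97, 0.95, 0.999, 0.98]],
    'Xray': [[0.9, 0.2], [0.1, 0.8]],
    'Dyspnoea': [[0.65, 0.3], [0.35, 0.7]],
}

report = {name: columns_sum_to_one(vals) for name, vals in tables.items()}
print(report)

# A deliberately broken table (first column sums to 1.1) fails the check.
print(columns_sum_to_one([[0.9, 0.2], [0.2, 0.8]]))
```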

In [None]:
# Doing some simple queries on the network
cancer_model.is_active_trail('Pollution', 'Smoker')

In [None]:
cancer_model.is_active_trail('Pollution', 'Smoker', observed=['Cancer'])
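
These two trail queries probe the v-structure Pollution -> Cancer <- Smoker: the two parents are independent a priori but become dependent once their common child is observed. A brute-force calculation with the CPD values above illustrates this numerically. This is a plain-Python sketch with illustrative helper names, not pgmpy's d-separation algorithm.

```python
# Priors and P(Cancer=0 | Smoker, Pollution), copied from the CPDs above.
p_poll = {0: 0.9, 1: 0.1}
p_smoke = {0: 0.3, 1: 0.7}
p_c0 = {(0, 0): 0.03, (0, 1): 0.05, (1, 0): 0.001, (1, 1): 0.02}  # key: (S, P)

def joint(p, s, c):
    """P(Pollution=p, Smoker=s, Cancer=c) from the chain rule of the network."""
    pc = p_c0[(s, p)] if c == 0 else 1.0 - p_c0[(s, p)]
    return p_poll[p] * p_smoke[s] * pc

def prob_high_pollution(cancer=None, smoker=None):
    """P(Pollution=1 | evidence), where None means 'not observed'."""
    def total(p_states):
        return sum(
            joint(p, s, c)
            for p in p_states
            for s in ([smoker] if smoker is not None else [0, 1])
            for c in ([cancer] if cancer is not None else [0, 1])
        )
    return total([1]) / total([0, 1])

# Without observing Cancer, knowing Smoker does not change the belief in Pollution.
print(prob_high_pollution(), prob_high_pollution(smoker=0))

# Once Cancer is observed, additionally knowing Smoker shifts the belief.
print(prob_high_pollution(cancer=0), prob_high_pollution(cancer=0, smoker=0))
```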

In [None]:
cancer_model.local_independencies('Xray')

In [None]:
cancer_model.get_independencies()

In [None]:
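from pgmpy.inference import VariableElimination
cancer_infer = VariableElimination(cancer_model)

# Posterior over Xray given Smoker=0 and Dyspnoea=0.
q = cancer_infer.query(variables=['Xray'], evidence={'Smoker': 0, 'Dyspnoea': 0})
# Note: this indexing assumes a pgmpy release where query() returns a dict keyed by
# variable name; if query() returns the factor directly, use print(q) instead.
print(q['Xray'])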
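
As a sanity check on the inference result, the same posterior can be computed by brute-force enumeration over the hidden variables. The sketch below is plain Python using the CPD values defined above. It is not how `VariableElimination` works internally, only an independent way to obtain the same distribution for a network this small.

```python
from itertools import product

# CPD values copied from the definitions above; state indices follow the same order.
p_poll = [0.9, 0.1]
p_smoke = [0.3, 0.7]
p_cancer = {(0, 0): [0.03, 0.97], (0, 1): [0.05, 0.95],
            (1, 0): [0.001, 0.999], (1, 1): [0.02, 0.98]}  # key: (Smoker, Pollution)
p_xray = {0: [0.9, 0.1], 1: [0.2, 0.8]}                     # key: Cancer
p_dysp = {0: [0.65, 0.35], 1: [0.3, 0.7]}                   # key: Cancer

def joint(p, s, c, x, d):
    """Joint probability of one full assignment via the network factorization."""
    return (p_poll[p] * p_smoke[s] * p_cancer[(s, p)][c]
            * p_xray[c][x] * p_dysp[c][d])

# Evidence: Smoker=0, Dyspnoea=0. Sum out Pollution and Cancer for each Xray state.
unnormalized = [
    sum(joint(p, 0, c, x, 0) for p, c in product(range(2), repeat=2))
    for x in range(2)
]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

print(posterior)  # [P(Xray=0 | evidence), P(Xray=1 | evidence)]
```

Enumeration grows exponentially with the number of hidden variables, so it is only practical as a cross-check on small networks like this one.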