# Features of a Bayesian Network 

So far we have seen that:
* A Bayesian Network represents a joint probability distribution over a set of random variables.

* A Bayesian Network consists of a directed acyclic graph (DAG), with the nodes representing random variables and the edges representing the conditional dependencies between random variables.

* A node is conditionally dependent on its parent nodes. Top-level parent nodes are independent of each other, that is, there is no flow into them.

For a bayesian network, the following conditions must be satisfied:

1. The graph is directed. The dependencies flow in a direction.
2. It is acyclic. That is, if you start from a node and traverse the edges, you cannot end up back at the starting node.
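
The acyclicity condition can be checked mechanically. The following is a minimal sketch, assuming Python 3.9 or later for the standard-library `graphlib` module. The helper name `is_dag` and the three-node edge list are illustrative assumptions, not part of pgmpy.

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(edges):
    """Return True if the directed (parent, child) edges contain no cycle."""
    predecessors = {}
    for parent, child in edges:
        predecessors.setdefault(parent, set())
        predecessors.setdefault(child, set()).add(parent)
    try:
        # static_order() raises CycleError when no topological order exists.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


edges = [("X", "Y"), ("X", "Z"), ("Y", "Z")]   # hypothetical three-node network
print(is_dag(edges))                  # True
print(is_dag(edges + [("Z", "X")]))   # False: Z -> X closes a cycle
```

A graph that fails this check cannot be used as the structure of a Bayesian Network.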

The Bayesian Network for random variables X, Y and Z represents a joint distribution using the <b>Chain Rule</b>:


\begin{align}
 P(X) = \prod_{i=1}^{n} P(X_{i} \mid Par(X_{i})) \\
 \text{where:} \\
 X = \{ X_{1}, X_{2}, \ldots, X_{n} \} \text{ are the nodes} \\
 Par(X_{i}) \text{ represents the parents of the node } X_{i}
\end{align}


Let us consider three random variables X,Y and Z. 

<img src="../images/BN.png", style="width: 300px;">


From the definition of Bayesian Network, it follows that, 

P(X,Y,Z) = P(X) * P(Y | X) * P(Z | X,Y) 
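
The factorization can be made concrete with a small numeric sketch. All probability values below are hypothetical and chosen only for illustration; the point is that multiplying the three factors yields a valid joint distribution over binary X, Y and Z.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1},      # p_y_given_x[x][y]
               1: {0: 0.4, 1: 0.6}}
p_z_given_xy = {(0, 0): {0: 0.8, 1: 0.2},  # p_z_given_xy[(x, y)][z]
                (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.3, 1: 0.7},
                (1, 1): {0: 0.1, 1: 0.9}}


def joint(x, y, z):
    """P(X=x, Y=y, Z=z) = P(x) * P(y | x) * P(z | x, y)."""
    return p_x[x] * p_y_given_x[x][y] * p_z_given_xy[(x, y)][z]


total = sum(joint(x, y, z) for x, y, z in product((0, 1), repeat=3))
print(total)            # sums to 1 up to floating-point rounding
print(joint(1, 1, 1))   # 0.3 * 0.6 * 0.9
```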

The above definition is for discrete random variables. 

For continuous variables, the probability is replaced by a Probability Density Function (PDF).

\begin{align}
 f(X) = \prod_{i=1}^{n} f(X_{i} \mid Par(X_{i})) \\
 \text{where:} \\
 X = \{ X_{1}, X_{2}, \ldots, X_{n} \} \text{ are the nodes} \\
 Par(X_{i}) \text{ represents the parents of } X_{i}
\end{align}


Suppose we have 4 random variables W, X,Y and Z as illustrated below: 

<img src="../images/BN-linear.png", style="width: 300px;">


This represents a linear regression in which the three input variables W, X and Y have an effect on the target variable Z. Another advantage of a Bayesian Network is the way it represents conditional independence: a node is conditionally independent of any non-descendant node given its parents. 

A Bayesian Network can be typically used:

1. To predict outcomes of causal effects (if the structure of the network is known)
2. To discover causal relationships (if the structure is unknown)  - for example back testing a scenario.


## Probabilistic Influence - How does an Influence Flow through the Network?

Let us take an example of Property Valuation. For assessing a property's value let us suppose that the following parameters are evaluated.

1. L - Geographic Location - e.g. city, near coast
2. N - Nature of property
3. U - Usability
4. R - Risk-Prone Area
5. Q - Construction Quality 
6. A - Age of Construction 
7. G - Approval from local authorities
8. M - Maintenance of Property
9. V - Median Value of neighbourhood
10. P - Property Price

This is a simplified scenario compared to a real world valuation, but this demonstrates how a Bayesian Network could be applied for this situation.

The letters indicated are the random variables in question. P is the target random variable indicating the property price.

Let us construct a Bayesian Network based on the above scenario.

<img src="../images/Appraisal.png", style="width: 600px;">

* We can see an active trail from $L \to V \to P$. Hence the property price can be influenced by L, since L influences V and V influences P. Similarly, evidence of P can influence L via V: $L \leftarrow V \leftarrow P$. **This is valid only if V is not observed.**

* Location can affect both Risk-Prone Area and Median Value of neighbourhood. We can flatten this portion of the graph to infer: $V \leftarrow L \to R$.

* V-Structures: $A \to U \leftarrow M$. A and M both affect U, and A has no effect on M if U is not observed. Observing U narrows down the possible values of A and M, and hence activates the trail from A to M or from M to A.


### V-Structures

V-structures are important to understand, as not observing the intermediate node will block our trail. This makes sense: with no information about Usability (U), nothing links the probabilities of Construction Quality (Q), Age of Construction (A) and Maintenance of Property (M). However, any information about Usability will restrict the sample space of the joint probability p(Q, A, M). Hence observing U leads the random variables to influence each other.

* Observing U or its children (or other descendants) will activate the trail.
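
The effect can be illustrated numerically. The sketch below uses only the $A \to U \leftarrow M$ part of the v-structure with binary variables, and every probability in it is a hypothetical value chosen for illustration (state 1 stands for "old", "poorly maintained" and "low usability" respectively). The helper names are assumptions as well.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_A = [0.7, 0.3]                       # P(A=0), P(A=1)
P_M = [0.6, 0.4]                       # P(M=0), P(M=1)
P_U1 = {(0, 0): 0.05, (0, 1): 0.60,    # P(U=1 | A, M)
        (1, 0): 0.60, (1, 1): 0.95}


def joint(a, m, u):
    p_u1 = P_U1[(a, m)]
    return P_A[a] * P_M[m] * (p_u1 if u == 1 else 1.0 - p_u1)


def prob_m1(**evidence):
    """P(M=1 | evidence) by summing the joint over all consistent states."""
    num = den = 0.0
    for a, m, u in product((0, 1), repeat=3):
        world = {"a": a, "m": m, "u": u}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(a, m, u)
            den += p
            num += p if m == 1 else 0.0
    return num / den


print(prob_m1())           # prior P(M=1)
print(prob_m1(a=1))        # unchanged: the trail is blocked while U is unobserved
print(prob_m1(u=1))        # observing U raises P(M=1)
print(prob_m1(u=1, a=1))   # additionally knowing A=1 lowers it again
```

With U unobserved, evidence about A leaves the belief in M untouched. Once U is observed, learning A changes the belief in M, which is the activated trail described above.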

<img src="../images/v-structure1.png", style="width: 70vw;">


# Fraud Modeling Example with pgmpy

pgmpy is one of the popular packages for Bayesian Network modeling. We shall continue to use the fraud modeling example to visualize our network. pgmpy is good for simpler problems and for visualizing the independencies and CPDs. It does not work very well for high-dimensional problems. Other toolkits are also available, such as:

* WINMINE by Microsoft: https://www.microsoft.com/en-us/research/project/winmine-toolkit/
* pyro: Probabilistic Programming by Uber - https://github.com/uber/pyro

You can specify various conditional probability distributions by providing the evidence variables and the number of states (cardinality) of each variable. For example, to specify the gas CPD:

<img src="../images/bayesian_network.png", style="width: 800px;">

## Modeling with pgmpy library

One of the python libraries available online for modeling PGMs is the pgmpy library. You may view pgmpy's documentation to understand and learn all methods under this library (Link to documentation: http://pgmpy.org/index.html). We shall discuss some of the basic methods here:



### Storing and Retrieving Conditional Probability Distributions

Various conditional probability distributions (CPDs) are used in PGM modeling. These values need to be visualized, stored and retrieved efficiently so that modeling is easy. The `TabularCPD` class in the pgmpy.factors.discrete sub-module stores and retrieves CPDs in a tabular format. For our Fraud Modeling example we can create a CPD using the following code:

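``` python
from pgmpy.factors.discrete import TabularCPD

# CPD with three evidence variables: one column per (A, S, F) combination,
# 3 * 2 * 2 = 12 columns in total. Each column must sum to 1
# (for example .002 + .998).
total_cpd = TabularCPD(
                variable='C',
                variable_card=2,
                values=[[.0001, .95, .0005, .95, .0004, .95, .002, .95, .002, .95, .001, .95],
                        [.9999, .05, .9995, .05, .9996, .05, .998, .05, .998, .05, .999, .05]],
                evidence=['A', 'S', 'F'],
                evidence_card=[3, 2, 2])

# CPD of Gas (G) given Fraud (F): one column per state of F.
gas_cpd = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                             [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

```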
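
Before handing a table to a library, it can be useful to confirm that every column is a valid distribution. The following is a small sketch in plain Python; `columns_sum_to_one` is a hypothetical helper written for this illustration and is not a pgmpy function.

```python
def columns_sum_to_one(values, tol=1e-9):
    """Check that every column of a CPD table (rows = states) sums to 1."""
    return all(abs(sum(column) - 1.0) <= tol for column in zip(*values))


gas_values = [[0.2, 0.01],
              [0.8, 0.99]]
print(columns_sum_to_one(gas_values))   # True

bad_values = [[0.002],
              [0.9998]]                 # hypothetical column that sums to 1.0018
print(columns_sum_to_one(bad_values))   # False
```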

## Specify the CPDs

* Given the above examples, specify all CPDs for the fraud model:
* jewelry_cpd
* age_cpd
* fraud_cpd
* sex_cpd

In [20]:
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianModel


gas_cpd = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                           [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

Form the table for jewelry cpd by specifying the order as A, S and F. Use this table as entry points to the values.

In [21]:
fraud_cpd = TabularCPD(variable='F',
                       variable_card=2,
                       values=[[.1, .9]])

jewelry_cpd = TabularCPD(
                variable='J',
                variable_card=2,
                values=[[.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95],
                        [.8, .05, .95, .05, .96, .05, .98, .05, .98, .05, .9, .05]],
                evidence=['A', 'S', 'F'],
                evidence_card=[3, 2, 2])

age_cpd = TabularCPD(variable='A',
                     variable_card=3,
                     values=[[0.25, 0.4, 0.35]])

sex_cpd = TabularCPD(variable='S',
                    variable_card=2,
                    values=[[0.5, 0.5]])

In [22]:
ref_tmp_var = False

import numpy as np

jewelry_cpd_ = TabularCPD(
                variable='J',
                variable_card=2,
                values=[[.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95],
                        [.8, .05, .95, .05, .96, .05, .98, .05, .98, .05, .9, .05]],
                evidence=['A', 'S', 'F'],
                evidence_card=[3, 2, 2])

age_cpd_ = TabularCPD(variable='A',
                     variable_card=3,
                     values=[[0.25, 0.4, 0.35]])

sex_cpd_ = TabularCPD(variable='S',
                    variable_card=2,
                    values=[[0.5, 0.5]])

gas_cpd_ = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                           [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

try:
    if (np.all(gas_cpd.get_values() == gas_cpd_.get_values()) and
    (np.all(age_cpd.get_values() == age_cpd_.get_values()))):
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

## Building the Fraud Model

In the Fraud Modelling example, Jewelry node is a child node of Fraud, Age and Sex nodes. Similarly Gas is a child of Fraud node.
You can start building the Bayesian Model by specifying the dependencies in the Bayesian Network as arguments to BayesianModel() instance:
    
``` python
[('F', 'J'),
('A', 'J'),
('S', 'J'),
('F', 'G')]
```

* Assign the instance to fraud_model.

In [23]:
fraud_model = BayesianModel()

Use BayesianModel([('F', 'J'),
                   ('A', 'J'),
                   ('S', 'J'),
                   ('F', 'G')]) 

In [24]:
fraud_model = BayesianModel([('F', 'J'),
                             ('A', 'J'),
                             ('S', 'J'),
                             ('F', 'G')])

In [25]:
ref_tmp_var = False

a=1
try:
    if a == 1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

Now, the Bayesian model is built and stored within the variable fraud_model. In the subsequent sections, we will see how to use the Bayesian model to store probabilistic dependencies between nodes (or variables) and how to make inferences on nodes based on observed evidence.

## Add CPDs

In Bayesian Networks, the relationships between nodes are specified by CPDs. In order to start working with a BayesianModel in pgmpy, we need to add the CPDs created in the previous sections to the model object.

Add the pre-defined CPDs using BayesianModel's add_cpds() method and then validate the model.

In [26]:
fraud_model.add_cpds(jewelry_cpd, fraud_cpd, age_cpd, sex_cpd, gas_cpd)

In [27]:
fraud_model.check_model()

True

In [28]:
ref_tmp_var = False

a = 1
try:
    if a == 1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

Now that the CPDs are added to the model, we may need to verify them.

## Obtain CPDs, Leaves and Independencies

You can now look at the CPDs, leaves, independencies by invoking the BayesianModel's get_cpds(), get_leaves() and get_independencies() methods respectively.

In [29]:
fraud_model.get_cpds()

[<TabularCPD representing P(J:2 | A:3, S:2, F:2) at 0x126c5747c18>,
 <TabularCPD representing P(F:2) at 0x126c5747be0>,
 <TabularCPD representing P(A:3) at 0x126c546d048>,
 <TabularCPD representing P(S:2) at 0x126c546d0b8>,
 <TabularCPD representing P(G:2 | F:2) at 0x126c5744eb8>]

In [30]:
fraud_model.get_leaves()
fraud_model.get_independencies()

(F _|_ S, A)
(F _|_ S, A | G)
(F _|_ A | S)
(F _|_ S | A)
(F _|_ A | S, G)
(F _|_ S | A, G)
(J _|_ G | F)
(J _|_ G | S, F)
(J _|_ G | A, F)
(J _|_ G | A, S, F)
(A _|_ G, S, F)
(A _|_ S, G | F)
(A _|_ F, G | S)
(A _|_ S, F | G)
(A _|_ G | J, F)
(A _|_ G | S, F)
(A _|_ S | G, F)
(A _|_ F | S, G)
(A _|_ G | J, S, F)
(S _|_ A, G, F)
(S _|_ G, A | F)
(S _|_ F, A | G)
(S _|_ F, G | A)
(S _|_ G | J, F)
(S _|_ A | G, F)
(S _|_ G | A, F)
(S _|_ F | A, G)
(S _|_ G | A, J, F)
(G _|_ S, A)
(G _|_ J, S, A | F)
(G _|_ A | S)
(G _|_ S | A)
(G _|_ S, A | J, F)
(G _|_ J, A | S, F)
(G _|_ J, S | A, F)
(G _|_ A | J, S, F)
(G _|_ S | A, J, F)
(G _|_ J | A, S, F)

In [31]:
ref_tmp_var = False

a = 1
try:
    if a == 1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

## Verifying the CPDs

In order to verify the CPDs, we could use the get_cpds method on the fraud_model.

``` python
for cpd in fraud_model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable))
    print(cpd)
```

In [32]:
# Iterate over fraud_model.get_cpds()

In [33]:
for cpd in fraud_model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable))
    print(cpd)

CPD of J:
╒═════╤═════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤═════╤══════╕
│ A   │ A_0 │ A_0  │ A_0  │ A_0  │ A_1  │ A_1  │ A_1  │ A_1  │ A_2  │ A_2  │ A_2 │ A_2  │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ S   │ S_0 │ S_0  │ S_1  │ S_1  │ S_0  │ S_0  │ S_1  │ S_1  │ S_0  │ S_0  │ S_1 │ S_1  │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ F   │ F_0 │ F_1  │ F_0  │ F_1  │ F_0  │ F_1  │ F_0  │ F_1  │ F_0  │ F_1  │ F_0 │ F_1  │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ J_0 │ 0.2 │ 0.95 │ 0.05 │ 0.95 │ 0.04 │ 0.95 │ 0.02 │ 0.95 │ 0.02 │ 0.95 │ 0.1 │ 0.95 │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ J_1 │ 0.8 │ 0.05 │ 0.95 │ 0.05 │ 0.96 │ 0.05 │ 0.98 │ 0.05 │ 0.98 │ 0.05 │ 0.9 │ 0.05 │
╘═════╧═════╧══════╧══════╧══════╧══════╧══════╧══════╧══════╧══════╧══════╧═════╧══════╛


In [34]:
ref_tmp_var = False

a = 1
try:
    if a == 1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

## Computations of Probabilities

The next logical step is to compute the probabilities and CPDs of various nodes within the Bayesian Model by specifying evidence.
This lets us make inferences about different variables based on the evidence observed.

``` python

from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product

import itertools


class SimpleInference(Inference):
    def query(self, var, evidence):
        # self.factors is a dict of the form of {node: [factors_involving_node]}
        factors_list = set(itertools.chain(*self.factors.values()))
        product = factor_product(*factors_list)
        reduced_prod = product.reduce(evidence, inplace=False)
        reduced_prod.normalize()
        var_to_marg = set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])
        marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)
        return marg_prod
```


### Computing CPDs against Evidence

Let us first begin by observing the evidence of Age, where A=1 (middle category). Refer to the diagram below. In this case, Age is the variable being observed as evidence. Jewelry is the variable being inferred. What do we understand by this evidence?

* Query J|A=1 and assign to j

<img src="../images/fraud_model2.png", style="width: 500px;">

In [35]:
from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product

import itertools


class SimpleInference(Inference):
    def query(self, var, evidence):
        # self.factors is a dict of the form of {node: [factors_involving_node]}
        factors_list = set(itertools.chain(*self.factors.values()))
        product = factor_product(*factors_list)
        reduced_prod = product.reduce(evidence, inplace=False)
        reduced_prod.normalize()
        var_to_marg = set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])
        marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)
        return marg_prod

Use SimpleInference(fraud_model)

In [36]:
infer = SimpleInference(fraud_model)
j = infer.query(var=['J'], evidence=[('A', 1)])
print(j)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8580 │
├─────┼──────────┤
│ J_1 │   0.1420 │
╘═════╧══════════╛


In [37]:
ref_tmp_var = False

import numpy as np

try:
    if abs(j.values[0] - 0.858) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

What do we infer from the output of the code above? If Age is observed with A=1, there is a high probability (0.858) of Jewelry J=0. That is, the probability of Jewelry being bought is low (0.142).
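
The value 0.8580 can be cross-checked by hand. The sketch below sums out Sex and Fraud from the jewelry CPD columns that belong to A=1, using the fraud and sex CPD values defined earlier. The variable names are chosen for this illustration only.

```python
# Priors taken from fraud_cpd and sex_cpd.
p_f = [0.1, 0.9]
p_s = [0.5, 0.5]

# P(J=0 | A=1, S=s, F=f), read from the A_1 columns of the jewelry CPD.
p_j0_given_a1 = {(0, 0): 0.04, (0, 1): 0.95,
                 (1, 0): 0.02, (1, 1): 0.95}   # keyed by (s, f)

# P(J=0 | A=1) = sum over s, f of P(s) * P(f) * P(J=0 | A=1, s, f)
p_j0 = sum(p_s[s] * p_f[f] * p_j0_given_a1[(s, f)]
           for s in (0, 1) for f in (0, 1))

print(round(p_j0, 4))       # 0.858
print(round(1 - p_j0, 4))   # 0.142
```

Because Age, Sex and Fraud have no parents, their priors can be multiplied directly, which is why no normalisation step is needed here.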

# Conditional Independence

Two events A and B are independent if:

$$P(A∩B) = P(A) P(B) $$ and by extension

$$P(A|B) = P(A)$$

We can extend this to conditional independence. Two events A and B are conditionally independent given an event C with 
P(C)>0 if

$$P(A∩B|C)=P(A|C)P(B|C)$$

Let us say we have 3 random variables X, Y and Z.

By definition, X  and Y are conditionally independent [given Z] if given the knowledge of Z, probability of X gives no information on the probability of Y, and vice versa.

For random variables X and Y,


\begin{align}
 P \models (X \perp Y \mid Z) \\
 \text{if:} \\
 P(X \cap Y \mid Z) = P(X \mid Z) * P(Y \mid Z) \\
 P(X \mid Y, Z) = P(X \mid Z) \\
 P(Y \mid X, Z) = P(Y \mid Z)
\end{align}
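
The definition can be tested numerically on a small joint distribution. The sketch below builds a hypothetical common-cause model $X \leftarrow Z \to Y$ with made-up probabilities, then checks the product condition for every combination of states. The function names are assumptions for this illustration.

```python
from itertools import product

# Hypothetical common-cause model Z -> X, Z -> Y with binary variables.
P_Z = [0.1, 0.9]
P_X0_GIVEN_Z = [0.2, 0.01]   # P(X=0 | Z=z)
P_Y0_GIVEN_Z = [0.3, 0.95]   # P(Y=0 | Z=z)


def joint(x, y, z):
    px = P_X0_GIVEN_Z[z] if x == 0 else 1 - P_X0_GIVEN_Z[z]
    py = P_Y0_GIVEN_Z[z] if y == 0 else 1 - P_Y0_GIVEN_Z[z]
    return P_Z[z] * px * py


def prob(**fixed):
    """Probability that the named variables take the given values."""
    return sum(joint(x, y, z)
               for x, y, z in product((0, 1), repeat=3)
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))


def conditionally_independent(tol=1e-12):
    """Check P(x, y | z) == P(x | z) * P(y | z) for all states."""
    for x, y, z in product((0, 1), repeat=3):
        pz = prob(z=z)
        lhs = prob(x=x, y=y, z=z) / pz
        rhs = (prob(x=x, z=z) / pz) * (prob(y=y, z=z) / pz)
        if abs(lhs - rhs) > tol:
            return False
    return True


def marginally_independent(tol=1e-12):
    """Check P(x, y) == P(x) * P(y) for all states."""
    return all(abs(prob(x=x, y=y) - prob(x=x) * prob(y=y)) <= tol
               for x, y in product((0, 1), repeat=2))


print(conditionally_independent())   # True: X and Y are independent given Z
print(marginally_independent())      # False: they are dependent when Z is unknown
```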

## Study of Conditional Independence

To study conditional independence using pgmpy, let us look at the individual CPDs of gas and jewelry from the Fraud Modeling example. Please refer to the example from the section "Fraud Modeling Example with pgmpy".

What happens if evidence of Fraud is provided to Gas and Jewelry? Let us reassign evidence of Fraud, ('F', 0) to the infer.query() and see the results.

<img src="../images/fraud_model4.png", style="width: 500px;">

In [73]:
gas = infer.query(var=['G'], evidence=[])
print(gas)

jewelry = infer.query(var=['J'], evidence=[])
print(jewelry)

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.0290 │
├─────┼──────────┤
│ G_1 │   0.9710 │
╘═════╧══════════╛
╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8614 │
├─────┼──────────┤
│ J_1 │   0.1386 │
╘═════╧══════════╛


In [74]:
gas = infer.query(var=[('G')], evidence=[('F', 0)])
print(gas)

jewelry = infer.query(var=['J'], evidence=[('F', 0)])
print(jewelry)

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.2000 │
├─────┼──────────┤
│ G_1 │   0.8000 │
╘═════╧══════════╛
╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.0642 │
├─────┼──────────┤
│ J_1 │   0.9357 │
╘═════╧══════════╛


In [75]:
ref_tmp_var = False

try:
    if (abs(gas.values[0] - 0.2) < 0.1) and (abs(jewelry.values[0] - 0.0643) < 0.1):
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

## V-Structure

With no evidence, $A \perp S$. To study the V-structure, let us look at the Age and Sex CPDs.


<img src="../images/v-structure2.png", style="width: 500px;">


### With Evidence

* What happens if evidence of Jewelry is observed? Use evidence ('J', 0) and see CPDs change.

In [76]:
age = infer.query(var=['A'], evidence=[])
print(age)

sex = infer.query(var=['S'], evidence=[])
print(sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2500 │
├─────┼──────────┤
│ A_1 │   0.4000 │
├─────┼──────────┤
│ A_2 │   0.3500 │
╘═════╧══════════╛
╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5000 │
├─────┼──────────┤
│ S_1 │   0.5000 │
╘═════╧══════════╛


Use evidence=[('J', 0)]

In [77]:
age = infer.query(var=['A'], evidence=[('J', 0)])
print(age)

sex = infer.query(var=['S'], evidence=[('J', 0)])
print(sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2518 │
├─────┼──────────┤
│ A_1 │   0.3984 │
├─────┼──────────┤
│ A_2 │   0.3498 │
╘═════╧══════════╛
╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5005 │
├─────┼──────────┤
│ S_1 │   0.4995 │
╘═════╧══════════╛


In [78]:
ref_tmp_var = False

try:
    if (abs(age.values[0] - 0.2518) < 0.1) and (abs(sex.values[0] - 0.5005) < 0.1):
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

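As you can see above, when Jewelry is observed with J=0, the distribution of Age shifts slightly towards the lowest group (A=0 rises from 0.2500 to 0.2518).
Similarly, the sex of the fraud perpetrators is slightly more likely to be male (S=0 rises from 0.5000 to 0.5005).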

<img src="../images/fraud_model3.png", style="width: 500px;">

### Observe Jewelry=1

* Does age/sex cpds change for Jewelry=1?

In [79]:
age = infer.query(var=['A'], evidence=[('J', 0)])
print(age)

sex = infer.query(var=['S'], evidence=[('J', 0)])
print(sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2518 │
├─────┼──────────┤
│ A_1 │   0.3984 │
├─────┼──────────┤
│ A_2 │   0.3498 │
╘═════╧══════════╛
╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5005 │
├─────┼──────────┤
│ S_1 │   0.4995 │
╘═════╧══════════╛


Use ('J', 1) as evidence

In [80]:
age = infer.query(var=['A'], evidence=[('J', 1)])
print(age)

sex = infer.query(var=['S'], evidence=[('J', 1)])
print(sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2390 │
├─────┼──────────┤
│ A_1 │   0.4099 │
├─────┼──────────┤
│ A_2 │   0.3511 │
╘═════╧══════════╛
╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.4968 │
├─────┼──────────┤
│ S_1 │   0.5032 │
╘═════╧══════════╛


In [81]:
ref_tmp_var = False

try:
    if (abs(age.values[0] - 0.2390) < 0.1) and (abs(sex.values[0] - 0.4968) < 0.1):
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

It is evident that when Jewelry is observed with J=1, Age tends to be higher, with higher probabilities for A=1 and A=2. Similarly, the sex of the fraud perpetrators is slightly more likely to be female (S=1).

Since Age, Jewelry and Sex form a v-structure, observing Jewelry activates the trail between Age and Sex; without that observation the trail is blocked. Now let us get to the next question,

## Does Sex Influence Age Without Observing Jewelry?

Let us observe evidence of Sex and see how Sex influences Age. First let us observe S=0. 

In [82]:
age_sex = infer.query(var=['A'], evidence=())

Use evidence=[('S', 0)]

In [83]:
age_sex = infer.query(var=['A'], evidence=[('S', 0)])
print(age_sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2500 │
├─────┼──────────┤
│ A_1 │   0.4000 │
├─────┼──────────┤
│ A_2 │   0.3500 │
╘═════╧══════════╛


In [84]:
ref_tmp_var = False

try:
    if abs(age_sex.values[0] - 0.2500) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

We can infer that Sex does not influence Age without observing Jewelry. This means that, without Jewelry being observed, Age is independent of Sex.

<img src="../images/fraud_model6.png", style="width: 500px;">


## Does Age Influence Sex Without Observing Jewelry?

In [85]:

sex_age = infer.query(var=['S'], evidence=[])
print(sex_age)

╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5000 │
├─────┼──────────┤
│ S_1 │   0.5000 │
╘═════╧══════════╛


Use evidence=[('A', 0)]

In [86]:
sex_age = infer.query(var=['S'], evidence=[('A', 0)])
print(sex_age)

╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5000 │
├─────┼──────────┤
│ S_1 │   0.5000 │
╘═════╧══════════╛


In [87]:
ref_tmp_var = False

try:
    if abs(sex_age.values[0] - 0.500) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

We can also infer that Age does not influence Sex without observing jewelry. This means without Jewelry being observed, Sex is independent of Age.

## What if We Observe Jewelry? Does the Influence Flow from Sex to Age?


<img src="../images/fraud_model3.png", style="width: 500px;">

Let us look at the fraud modeling example, with Jewelry being observed. What do we infer about the flow from Sex to age and vice versa?  

In [88]:
age_sex = infer.query(var=['A'], evidence=[('S', 0)])
print(age_sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2500 │
├─────┼──────────┤
│ A_1 │   0.4000 │
├─────┼──────────┤
│ A_2 │   0.3500 │
╘═════╧══════════╛


Use ('J', 0) as additional evidence.

In [89]:
age_sex = infer.query(var=['A'], evidence=[('S', 0), ('J', 0)])
print(age_sex)

╒═════╤══════════╕
│ A   │   phi(A) │
╞═════╪══════════╡
│ A_0 │   0.2537 │
├─────┼──────────┤
│ A_1 │   0.3985 │
├─────┼──────────┤
│ A_2 │   0.3478 │
╘═════╧══════════╛


In [90]:
ref_tmp_var = False

try:
    if abs(age_sex.values[0] - 0.2537) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

It is clear that with Jewelry being observed, Sex does influence Age: the probability of A=0 rises from 0.2500 to 0.2537. In other words, Age is not conditionally independent of Sex given Jewelry.
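
The posterior over Age given S=0 and J=0 can be reproduced with a few lines of arithmetic. This is a sketch that reads the S_0 columns of the jewelry CPD defined earlier and sums out Fraud; the variable names are assumptions for this illustration.

```python
P_A = [0.25, 0.4, 0.35]   # age_cpd
P_F = [0.1, 0.9]          # fraud_cpd

# P(J=0 | A=a, S=0, F=f), read from the jewelry CPD; keyed by (a, f).
P_J0_S0 = {(0, 0): 0.2,  (0, 1): 0.95,
           (1, 0): 0.04, (1, 1): 0.95,
           (2, 0): 0.02, (2, 1): 0.95}

# Unnormalised posterior: P(A=a) * sum over f of P(F=f) * P(J=0 | a, S=0, f).
# P(S=0) is a common factor and cancels in the normalisation.
weights = [P_A[a] * sum(P_F[f] * P_J0_S0[(a, f)] for f in (0, 1))
           for a in range(3)]
posterior = [w / sum(weights) for w in weights]

print([round(p, 4) for p in posterior])   # [0.2537, 0.3985, 0.3478]
```

The different likelihood of J=0 in each age group is what moves the posterior away from the prior (0.25, 0.4, 0.35).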

## Observing Jewelry - Does the Influence Flow from Age to Sex?

From the above case, let us look at influence flow from Age to Sex, given Jewelry.

In [91]:
sex_age = infer.query(var=['S'], evidence=[])
print(sex_age)

╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5000 │
├─────┼──────────┤
│ S_1 │   0.5000 │
╘═════╧══════════╛


Use evidence=[('A', 0), ('J', 0)]

In [92]:
sex_age = infer.query(var=['S'], evidence=[('A', 0), ('J', 0)])
print(sex_age)

╒═════╤══════════╕
│ S   │   phi(S) │
╞═════╪══════════╡
│ S_0 │   0.5043 │
├─────┼──────────┤
│ S_1 │   0.4957 │
╘═════╧══════════╛


In [93]:
ref_tmp_var = False

try:
    if abs(sex_age.values[0] - 0.5043) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

Similarly, with Jewelry observed as J=0 (jewelry not being bought) and A=0 (the lowest age group) as evidence, the probability of S=0 (male being the perpetrator of fraud) rises slightly, from 0.5000 to 0.5043. Influence therefore also flows from Age to Sex once Jewelry is observed.

## Conditional Independence - Does Jewelry Influence Gas?

Let us now look at the conditional independence of Jewelry and Gas.

First let us take the evidence of J=0 (no jewelry being bought) alone, and then add F=0 (no fraud happening). What do we infer about Gas being bought?

In [94]:
gas_j = infer.query(var=['G'], evidence=[('J', 0)])
print(gas_j)

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.0114 │
├─────┼──────────┤
│ G_1 │   0.9886 │
╘═════╧══════════╛


In [95]:
gas_j = infer.query(var=['G'], evidence=[('J', 0), ('F', 0)])
print(gas_j)

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.2000 │
├─────┼──────────┤
│ G_1 │   0.8000 │
╘═════╧══════════╛


In [96]:
ref_tmp_var = False

try:
    if abs(gas_j.values[0] - 0.2000) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

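With J=0 as the only evidence, there is a high probability of Gas being bought (G=1 with probability 0.9886). Once F=0 is also observed, the distribution of Gas becomes 0.2000/0.8000, which is exactly the result obtained earlier with F=0 as the only evidence. Jewelry therefore adds no information about Gas once Fraud is known: Jewelry and Gas are conditionally independent given Fraud.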
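
The two queries above can be cross-checked without pgmpy by enumerating the full joint distribution. The following is a minimal sketch: the CPD values are copied from the CPDs defined earlier, while the `joint` and `query` helpers are hypothetical names written for this illustration.

```python
from itertools import product

P_A = [0.25, 0.4, 0.35]
P_S = [0.5, 0.5]
P_F = [0.1, 0.9]
P_G0_GIVEN_F = [0.2, 0.01]
# P(J=0 | A, S, F): columns ordered with F changing fastest, then S, then A.
P_J0 = [.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95]

NAMES = ["A", "S", "F", "J", "G"]
CARDS = {"A": 3, "S": 2, "F": 2, "J": 2, "G": 2}


def joint(a, s, f, j, g):
    """P(a, s, f, j, g) = P(a) P(s) P(f) P(j | a, s, f) P(g | f)."""
    pj0 = P_J0[a * 4 + s * 2 + f]
    pg0 = P_G0_GIVEN_F[f]
    return (P_A[a] * P_S[s] * P_F[f]
            * (pj0 if j == 0 else 1 - pj0)
            * (pg0 if g == 0 else 1 - pg0))


def query(var, evidence):
    """Posterior over `var` given an evidence dict, by brute-force enumeration."""
    totals = [0.0] * CARDS[var]
    for states in product(*(range(CARDS[n]) for n in NAMES)):
        world = dict(zip(NAMES, states))
        if all(world[k] == v for k, v in evidence.items()):
            totals[world[var]] += joint(*states)
    z = sum(totals)
    return [t / z for t in totals]


print([round(p, 4) for p in query("G", {"J": 0})])           # [0.0114, 0.9886]
print([round(p, 4) for p in query("G", {"J": 0, "F": 0})])   # [0.2, 0.8]
print([round(p, 4) for p in query("G", {"F": 0})])           # [0.2, 0.8]
```

Enumeration scales exponentially with the number of variables, so it is only practical as a check on small networks such as this one.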

## p(J|G) if Fraud is Not Observed

<img src="../images/fraud_model7.png", style="width: 500px;">

From the above diagram, let us take the case of Fraud not being observed. What is the impact of Gas on Jewelry?

First let us look at the case of Gas being bought (G=1).

In [38]:
jewel_g = infer.query(var=['J'], evidence=[])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8614 │
├─────┼──────────┤
│ J_1 │   0.1386 │
╘═════╧══════════╛


In [98]:
jewel_g = infer.query(var=['J'], evidence=[('G', 1)])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8770 │
├─────┼──────────┤
│ J_1 │   0.1230 │
╘═════╧══════════╛


In [99]:
ref_tmp_var = False

try:
    if abs(jewel_g.values[0] - 0.8770) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

With Gas being bought (G=1) and Fraud not observed, the probability that Jewelry is not being bought rises slightly, from 0.8614 to 0.8770.

## p(J|G) if Fraud is Not Observed

What happens when Gas is not bought (G=0)? Let us set the evidence of G = 0.

In [100]:
jewel_g = infer.query(var=['J'], evidence=[])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8614 │
├─────┼──────────┤
│ J_1 │   0.1386 │
╘═════╧══════════╛


Use ('G', 0)

In [101]:
jewel_g = infer.query(var=['J'], evidence=[('G', 0)])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.3391 │
├─────┼──────────┤
│ J_1 │   0.6609 │
╘═════╧══════════╛


In [102]:
ref_tmp_var = False

try:
    if abs(jewel_g.values[0] - 0.3391) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

There is a significant increase in the probability that Jewelry is bought: P(J=1) rises from 0.1386 with no evidence to 0.6609 when G=0.

When Fraud is not observed, Jewelry and Gas are negatively correlated. When Gas is bought and nothing is known about Fraud, Jewelry is most likely not bought (P(J=0 | G=1) = 0.8770). Similarly, when Gas is not bought, the probability of Jewelry being bought is high (P(J=1 | G=0) = 0.6609).
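
The negative correlation can be checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the exercise: it copies the conditional probabilities printed above, takes P(G=0) = 0.0290 from the marginal query on G that appears later in this notebook, and treats J and G as 0/1 indicators. The variable names are chosen here for illustration.

```python
# Probabilities copied from the query outputs in this notebook.
p_j1_given_g0 = 0.6609   # P(J=1 | G=0)
p_j1_given_g1 = 0.1230   # P(J=1 | G=1)
p_g0 = 0.0290            # P(G=0), from the marginal query on G
p_g1 = 1.0 - p_g0        # P(G=1)

# Law of total probability: P(J=1) = sum over g of P(J=1 | G=g) * P(G=g)
p_j1 = p_j1_given_g0 * p_g0 + p_j1_given_g1 * p_g1

# Covariance of two binary indicators: Cov(J, G) = P(J=1, G=1) - P(J=1) * P(G=1)
cov_jg = p_j1_given_g1 * p_g1 - p_j1 * p_g1

print(f"P(J=1) recovered from the conditionals: {p_j1:.4f}")
print(f"Cov(J, G): {cov_jg:.4f}")
```

The recovered P(J=1) agrees with the marginal 0.1386 printed by the query with no evidence, and the covariance is negative, which is consistent with the negative correlation described above.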


## p(J|G) if Fraud is Observed

Let us now turn to how Gas and Jewelry relate when Fraud is observed.

<img src="../images/fraud_model4.png", style="width: 500px;">


More precisely, what happens when Fraud is observed with the evidence F=0?

In [103]:
jewel_g = infer.query(var=['J'], evidence=[('G', 0)])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.3391 │
├─────┼──────────┤
│ J_1 │   0.6609 │
╘═════╧══════════╛


Use evidence of ('G', 0), ('F', 0)

In [104]:
jewel_g = infer.query(var=['J'], evidence=[('G', 0), ('F', 0)])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.0643 │
├─────┼──────────┤
│ J_1 │   0.9358 │
╘═════╧══════════╛


In [105]:
ref_tmp_var = False

try:
    if abs(jewel_g.values[0] - 0.0643) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

The results make it clear that there is a strong tendency to buy Jewelry when Gas is not bought and Fraud is observed (F=0): P(J=1 | G=0, F=0) = 0.9358.

## Jewelry Given Gas Transaction = 1 and Fraud Observed Together

<img src="../images/fraud_model4.png", style="width: 500px;">

Similarly, the results below show that even when Gas is bought and Fraud is observed, the strong tendency to buy Jewelry remains. This confirms our inference that Gas is conditionally independent of Jewelry given an observation of Fraud.

In [106]:
jewel_g = infer.query(var=['J'], evidence=[])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8614 │
├─────┼──────────┤
│ J_1 │   0.1386 │
╘═════╧══════════╛


Use evidence of ('G', 1), ('F', 0)

In [107]:
jewel_g = infer.query(var=['J'], evidence=[('G', 1), ('F', 0)])
print(jewel_g)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.0643 │
├─────┼──────────┤
│ J_1 │   0.9358 │
╘═════╧══════════╛


In [108]:
ref_tmp_var = False

try:
    if abs(jewel_g.values[0] - 0.0643) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

When Fraud is observed (F=0) and Gas is bought, the tendency to buy Jewelry is unchanged: P(J=1 | G=1, F=0) = 0.9358, the same value obtained with G=0.
This shows that when Fraud is observed, Jewelry is independent of Gas. ($ J \perp G \space | \space  F $)
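
To see why a common parent produces this behaviour, here is a minimal sketch of a three-node network F -> J, F -> G written in plain Python. It is not the fraud model used in this notebook: the CPD values are hypothetical, chosen only for illustration, and the helper `prob_j1` is a name introduced here. The real model may contain additional nodes.

```python
from itertools import product

# Hypothetical CPDs for a common-cause network F -> J, F -> G.
# Illustrative numbers only; these are NOT the CPDs of the fraud model.
p_f = {0: 0.3, 1: 0.7}                      # P(F)
p_j_given_f = {0: {0: 0.1, 1: 0.9},         # P(J | F=0)
               1: {0: 0.8, 1: 0.2}}         # P(J | F=1)
p_g_given_f = {0: {0: 0.6, 1: 0.4},         # P(G | F=0)
               1: {0: 0.1, 1: 0.9}}         # P(G | F=1)

# Chain rule for this graph: P(F, J, G) = P(F) * P(J | F) * P(G | F)
joint = {
    (f, j, g): p_f[f] * p_j_given_f[f][j] * p_g_given_f[f][g]
    for f, j, g in product((0, 1), repeat=3)
}


def prob_j1(evidence):
    """Return P(J=1 | evidence); evidence maps 'F' and/or 'G' to a state."""
    def matches(f, g):
        # A variable missing from the evidence matches every state.
        return evidence.get('F', f) == f and evidence.get('G', g) == g

    numerator = sum(p for (f, j, g), p in joint.items() if j == 1 and matches(f, g))
    denominator = sum(p for (f, j, g), p in joint.items() if matches(f, g))
    return numerator / denominator


# F unobserved: the answer changes with G, so J and G are dependent.
print(prob_j1({'G': 0}), prob_j1({'G': 1}))

# F observed: the answer no longer changes with G, so J and G are independent given F.
print(prob_j1({'G': 0, 'F': 0}), prob_j1({'G': 1, 'F': 0}))
```

Enumerating the full joint is only practical for very small networks, but it keeps every step of the calculation visible.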

## Probability of Gas given J=1, F=0

Let us also look at how Jewelry impacts Gas when Fraud is observed (F=0). Conditional independence is symmetric, so you can infer that Gas is independent of Jewelry when Fraud is observed. ($ G \perp J \space | \space  F $)
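
The symmetry can be sketched with Bayes' rule conditioned on F. Assuming $ J \perp G \space | \space F $, so that $ P(J | G, F) = P(J | F) $, and assuming $ P(J | F) > 0 $:

\begin{align}
P(G \mid J, F) &= \frac{P(J \mid G, F) \, P(G \mid F)}{P(J \mid F)} \\
               &= \frac{P(J \mid F) \, P(G \mid F)}{P(J \mid F)} \\
               &= P(G \mid F)
\end{align}

So once F is known, learning J does not change the distribution of G.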

In [109]:
gas_j = infer.query(var=['G'], evidence=[])
print(gas_j)

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.0290 │
├─────┼──────────┤
│ G_1 │   0.9710 │
╘═════╧══════════╛


Use ('J', 1), ('F', 0) 

In [110]:
gas_j = infer.query(var=['G'], evidence=[('J', 1), ('F', 0)])
print(gas_j)

╒═════╤══════════╕
│ G   │   phi(G) │
╞═════╪══════════╡
│ G_0 │   0.2000 │
├─────┼──────────┤
│ G_1 │   0.8000 │
╘═════╧══════════╛


In [111]:
ref_tmp_var = False

try:
    if abs(gas_j.values[0] - 0.2) < 0.1:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var