# Fraud Modeling Example with pgmpy

pgmpy is a popular Python package for Bayesian network modeling. We continue with the fraud modeling example and use pgmpy to build and inspect the network. pgmpy works well for smaller problems, where you want to examine the independencies and CPDs (conditional probability distributions), but it does not scale well to high-dimensional problems. Other available toolkits include:

* WINMINE by Microsoft: https://www.microsoft.com/en-us/research/project/winmine-toolkit/
* pyro: Probabilistic Programming by Uber - https://github.com/uber/pyro

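You specify a conditional probability distribution (CPD) by providing the variable, its cardinality (number of states), the table of values and, when the variable has parents, the evidence variables with their cardinalities. For example, to specify the gas CPD: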

<img src="../images/bayesian_network.png" style="width: 800px;">


``` python

gas_cpd = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                             [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

```
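
As a quick sanity check on the layout, here is a minimal sketch in plain Python (no pgmpy needed). It assumes the usual pgmpy convention that rows of `values` index the states of the variable and columns index the states of the evidence, so each column is a distribution over G and should sum to 1. The helper name `columns_sum_to_one` is illustrative and not part of pgmpy.

```python
def columns_sum_to_one(values, tol=1e-9):
    """Return True if every column of a row-major CPD table sums to 1."""
    n_cols = len(values[0])
    return all(
        abs(sum(row[col] for row in values) - 1.0) <= tol
        for col in range(n_cols)
    )


# Rows are G_0, G_1; columns are F_0, F_1 (same numbers as gas_cpd above).
gas_values = [[0.2, 0.01],
              [0.8, 0.99]]

print(columns_sum_to_one(gas_values))
```

A table whose columns do not sum to 1 is not a valid conditional distribution, so this is a cheap check to run before handing the numbers to a modeling library.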

The fraud CPD can be specified as:

``` python

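fraud_cpd = TabularCPD(variable='F',
                       variable_card=2,
                       # One row per state of F and a single column (no evidence);
                       # recent pgmpy releases require this (2, 1) shape.
                       values=[[.1], [.9]])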
```

## Specify the CPDs

Given the above examples, specify all the CPDs for the fraud model:

* jewelry_cpd
* age_cpd
* fraud_cpd
* sex_cpd

In [1]:
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianModel


gas_cpd = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                             [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

fraud_cpd = TabularCPD(variable='F',
                       variable_card=2,
                       # Column shape (2, 1): one row per state of F.
                       values=[[.1], [.9]])

Form the table for the jewelry CPD with the evidence ordered as A, S and F. Each column of the values table then corresponds to one combination of (A, S, F) states; use that table to fill in the entries.
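
To see which column belongs to which combination of states, the sketch below enumerates the columns in plain Python. It assumes the last evidence variable varies fastest, which is the order shown in the printed CPD of J later in this notebook. The names `evidence`, `evidence_card` and `columns` are only illustrative.

```python
from itertools import product

evidence = ['A', 'S', 'F']
evidence_card = [3, 2, 2]

# product() varies its last argument fastest, matching the assumed layout.
columns = [
    tuple(f"{name}_{state}" for name, state in zip(evidence, states))
    for states in product(*(range(card) for card in evidence_card))
]

for index, column in enumerate(columns):
    print(index, column)
```

With cardinalities 3, 2 and 2 there are 12 columns, so each row of the jewelry table needs 12 entries.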

In [2]:
jewelry_cpd = TabularCPD(
                variable='J',
                variable_card=2,
                values=[[.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95],
                        [.8, .05, .95, .05, .96, .05, .98, .05, .98, .05, .9, .05]],
                evidence=['A', 'S', 'F'],
                evidence_card=[3, 2, 2])

# Variables without evidence use one row per state and a single column.
age_cpd = TabularCPD(variable='A',
                     variable_card=3,
                     values=[[0.25], [0.4], [0.35]])

sex_cpd = TabularCPD(variable='S',
                     variable_card=2,
                     values=[[0.5], [0.5]])

gas_cpd = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                             [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

In [3]:
ref_tmp_var = False

import numpy as np

jewelry_cpd_ = TabularCPD(
                variable='J',
                variable_card=2,
                values=[[.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95],
                        [.8, .05, .95, .05, .96, .05, .98, .05, .98, .05, .9, .05]],
                evidence=['A', 'S', 'F'],
                evidence_card=[3, 2, 2])

age_cpd_ = TabularCPD(variable='A',
                      variable_card=3,
                      values=[[0.25], [0.4], [0.35]])

sex_cpd_ = TabularCPD(variable='S',
                      variable_card=2,
                      values=[[0.5], [0.5]])

gas_cpd_ = TabularCPD(variable='G',
                     variable_card=2,
                     values=[[.2, 0.01],
                             [.8, 0.99]],
                     evidence=['F'],
                     evidence_card=[2])

try:
    if (np.all(gas_cpd.get_values() == gas_cpd_.get_values()) and
    (np.all(age_cpd.get_values() == age_cpd_.get_values()))):
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var

## Building the Fraud Model

You can specify the dependencies in the Bayesian network by passing a list of (parent, child) edges to the BayesianModel() constructor:

``` python
[('F', 'J'),
('A', 'J'),
('S', 'J'),
('F', 'G')]
```

* Assign the instance to fraud_model.
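
Each tuple is a directed edge from parent to child. As a plain-Python sketch (independent of pgmpy; the `parents` dictionary is just an illustrative name), the parent sets recovered from the edge list should match the evidence lists used in the CPDs above:

```python
edges = [('F', 'J'), ('A', 'J'), ('S', 'J'), ('F', 'G')]

# Map every node to the set of its parents; root nodes get an empty set.
parents = {}
for parent, child in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

for node in sorted(parents):
    print(node, sorted(parents[node]))
```

Here J has parents A, F and S, and G has parent F, which is why jewelry_cpd takes three evidence variables and gas_cpd takes one.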

In [4]:
fraud_model = BayesianModel()

Hint: pass the edge list directly to the constructor, i.e. BayesianModel([('F', 'J'), ('A', 'J'), ('S', 'J'), ('F', 'G')]).

In [5]:
# Each tuple is a directed edge (parent, child).
fraud_model = BayesianModel([('F', 'J'),
                             ('A', 'J'),
                             ('S', 'J'),
                             ('F', 'G')])

## Add CPDs

Add the CPDs to the model with add_cpds(), then validate the model with check_model().

In [7]:
fraud_model.add_cpds(jewelry_cpd, fraud_cpd, age_cpd, sex_cpd, gas_cpd)

In [8]:
fraud_model.check_model()

True
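
Validation of this kind typically confirms that each CPD is consistent with the graph. The sketch below illustrates two such checks in plain Python. It is not pgmpy's actual implementation, and the names `declared_evidence`, `expected_shape` and `evidence_matches_parents` are hypothetical.

```python
from math import prod

cardinality = {'F': 2, 'A': 3, 'S': 2, 'J': 2, 'G': 2}
edges = [('F', 'J'), ('A', 'J'), ('S', 'J'), ('F', 'G')]

# Evidence lists as declared in the CPDs of this notebook.
declared_evidence = {'J': ['A', 'S', 'F'], 'G': ['F'], 'F': [], 'A': [], 'S': []}


def expected_shape(node):
    """Rows = states of the node, columns = joint states of its evidence."""
    n_columns = prod(cardinality[e] for e in declared_evidence[node])
    return (cardinality[node], n_columns)


def evidence_matches_parents(node):
    """The evidence of a CPD should be exactly the node's parents in the graph."""
    graph_parents = {parent for parent, child in edges if child == node}
    return graph_parents == set(declared_evidence[node])


for node in sorted(cardinality):
    print(node, expected_shape(node), evidence_matches_parents(node))
```

If a CPD listed evidence that is not a parent in the graph, or its table had the wrong number of columns, the model would be inconsistent and validation should fail.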

## Obtain CPDs, Leaves and Independencies

You can now inspect the model's CPDs, leaves and independencies.
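
Leaves are the nodes with no outgoing edges, and roots are the nodes with no incoming edges. A minimal plain-Python sketch, using the same edge list as the model (the variable names are illustrative):

```python
edges = [('F', 'J'), ('A', 'J'), ('S', 'J'), ('F', 'G')]

nodes = {node for edge in edges for node in edge}
nodes_with_children = {parent for parent, _ in edges}
nodes_with_parents = {child for _, child in edges}

leaves = sorted(nodes - nodes_with_children)
roots = sorted(nodes - nodes_with_parents)

print(leaves)  # ['G', 'J']
print(roots)   # ['A', 'F', 'S']
```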

In [10]:
fraud_model.get_cpds()

[<TabularCPD representing P(J:2 | A:3, S:2, F:2) at 0x187d78c0a90>,
 <TabularCPD representing P(F:2) at 0x187d4083898>,
 <TabularCPD representing P(A:3) at 0x187d78c0a58>,
 <TabularCPD representing P(S:2) at 0x187d78c0ac8>,
 <TabularCPD representing P(G:2 | F:2) at 0x187d78c0b70>]

In [11]:
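# A notebook displays only the value of the last expression, so the result of
# get_leaves() is not shown below; wrap it in print() to see it.
fraud_model.get_leaves()
fraud_model.get_independencies()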

(F _|_ A, S)
(F _|_ A, S | G)
(F _|_ S | A)
(F _|_ A | S)
(F _|_ S | G, A)
(F _|_ A | G, S)
(J _|_ G | F)
(J _|_ G | A, F)
(J _|_ G | S, F)
(J _|_ G | A, S, F)
(A _|_ G, S, F)
(A _|_ S, F | G)
(A _|_ G, F | S)
(A _|_ G, S | F)
(A _|_ F | G, S)
(A _|_ S | G, F)
(A _|_ G | S, F)
(A _|_ G | F, J)
(A _|_ G | S, F, J)
(S _|_ G, A, F)
(S _|_ A, F | G)
(S _|_ G, F | A)
(S _|_ G, A | F)
(S _|_ F | G, A)
(S _|_ A | G, F)
(S _|_ G | A, F)
(S _|_ G | F, J)
(S _|_ G | A, F, J)
(G _|_ A, S)
(G _|_ S | A)
(G _|_ A | S)
(G _|_ A, S, J | F)
(G _|_ S, J | A, F)
(G _|_ A, J | S, F)
(G _|_ A, S | F, J)
(G _|_ J | A, S, F)
(G _|_ S | A, F, J)
(G _|_ A | S, F, J)
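
Any statement in this list can be checked numerically against the CPD values. The sketch below does so for (J _|_ G | F) by enumerating the full joint distribution in plain Python. It assumes the column order of the jewelry table is (A, S, F) with F varying fastest, and the helper names `joint` and `prob` are illustrative.

```python
from itertools import product

p_f = [0.1, 0.9]
p_a = [0.25, 0.4, 0.35]
p_s = [0.5, 0.5]
p_g_given_f = [[0.2, 0.01], [0.8, 0.99]]  # indexed as [g][f]
# P(J_0 | A, S, F); column index = a * 4 + s * 2 + f
p_j0 = [.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95]


def joint(f, a, s, j, g):
    """P(F, A, S, J, G) as the product of the five CPDs."""
    p0 = p_j0[a * 4 + s * 2 + f]
    p_j = p0 if j == 0 else 1.0 - p0
    return p_f[f] * p_a[a] * p_s[s] * p_j * p_g_given_f[g][f]


def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for f, a, s, j, g in product(range(2), range(3), range(2), range(2), range(2)):
        assignment = {'f': f, 'a': a, 's': s, 'j': j, 'g': g}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(f, a, s, j, g)
    return total


# Conditional independence: P(J, G | F) == P(J | F) * P(G | F) for all states.
max_gap = 0.0
for f, j, g in product(range(2), repeat=3):
    lhs = prob(f=f, j=j, g=g) / prob(f=f)
    rhs = (prob(f=f, j=j) / prob(f=f)) * (prob(f=f, g=g) / prob(f=f))
    max_gap = max(max_gap, abs(lhs - rhs))

print(max_gap < 1e-9)
```

By contrast, J and F are directly connected, so P(J, F) differs from P(J) P(F) and no unconditional independence between them appears in the list.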

## Verifying the CPDs

Print each CPD attached to the model to confirm that the tables match what you specified:

``` python
for cpd in fraud_model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable))
    print(cpd)
```

In [13]:
# Iterate over fraud_model.get_cpds()

In [14]:
for cpd in fraud_model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable))
    print(cpd)

CPD of J:
╒═════╤═════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤══════╤═════╤══════╕
│ A   │ A_0 │ A_0  │ A_0  │ A_0  │ A_1  │ A_1  │ A_1  │ A_1  │ A_2  │ A_2  │ A_2 │ A_2  │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ S   │ S_0 │ S_0  │ S_1  │ S_1  │ S_0  │ S_0  │ S_1  │ S_1  │ S_0  │ S_0  │ S_1 │ S_1  │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ F   │ F_0 │ F_1  │ F_0  │ F_1  │ F_0  │ F_1  │ F_0  │ F_1  │ F_0  │ F_1  │ F_0 │ F_1  │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ J_0 │ 0.2 │ 0.95 │ 0.05 │ 0.95 │ 0.04 │ 0.95 │ 0.02 │ 0.95 │ 0.02 │ 0.95 │ 0.1 │ 0.95 │
├─────┼─────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼─────┼──────┤
│ J_1 │ 0.8 │ 0.05 │ 0.95 │ 0.05 │ 0.96 │ 0.05 │ 0.98 │ 0.05 │ 0.98 │ 0.05 │ 0.9 │ 0.05 │
╘═════╧═════╧══════╧══════╧══════╧══════╧══════╧══════╧══════╧══════╧══════╧═════╧══════╛


## Computations of Probabilities

The SimpleInference class answers a query by brute force: it multiplies all the factors of the model into the joint distribution, reduces the joint by the evidence, normalizes it, and marginalizes out the variables that are neither queried nor observed.

``` python

from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product

import itertools


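class SimpleInference(Inference):
    def query(self, var, evidence):
        # self.factors is a dict of the form {node: [factors_involving_node]}
        factors_list = set(itertools.chain(*self.factors.values()))
        # Multiply every factor together to obtain the full joint distribution.
        product = factor_product(*factors_list)
        # Keep only the entries consistent with the evidence, e.g. [('A', 1)].
        reduced_prod = product.reduce(evidence, inplace=False)
        # Rescale so that the remaining entries sum to 1.
        reduced_prod.normalize()
        # Sum out every variable that is neither queried nor observed.
        var_to_marg = set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])
        marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)
        return marg_prod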
```
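
The same product, reduce, marginalize and normalize steps can be traced without pgmpy. The following is a minimal plain-Python sketch using the CPD values from this notebook. It assumes the jewelry columns are ordered (A, S, F) with F varying fastest, and `query_j_given_a` is an illustrative helper, not a pgmpy function.

```python
from itertools import product

p_f = [0.1, 0.9]
p_a = [0.25, 0.4, 0.35]
p_s = [0.5, 0.5]
p_g_given_f = [[0.2, 0.01], [0.8, 0.99]]  # indexed as [g][f]
# P(J_0 | A, S, F); column index = a * 4 + s * 2 + f
p_j0 = [.2, .95, .05, .95, .04, .95, .02, .95, .02, .95, .1, .95]


def query_j_given_a(a):
    """Return [P(J_0 | A=a), P(J_1 | A=a)] by enumerating the joint."""
    unnormalized = [0.0, 0.0]
    # Reduce: A is fixed to the observed state, so it is not looped over.
    # Marginalize: F, S and G are summed out, leaving a table over J.
    for f, s, j, g in product(range(2), repeat=4):
        p0 = p_j0[a * 4 + s * 2 + f]
        p_j = p0 if j == 0 else 1.0 - p0
        # Product: one joint entry is the product of all five CPDs.
        unnormalized[j] += p_f[f] * p_a[a] * p_s[s] * p_j * p_g_given_f[g][f]
    # Normalize: divide by the total so the result sums to 1.
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


posterior = query_j_given_a(1)
print([round(p, 4) for p in posterior])  # [0.858, 0.142]
```

This sketch normalizes after marginalizing, whereas SimpleInference normalizes first; the result is the same because normalization only divides by a constant.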

### Computing CPDs against Evidence

* Query P(J | A=1) and assign the result to j.

<img src="../images/fraud_model2.png" style="width: 500px;">

In [16]:
from pgmpy.inference.base import Inference
from pgmpy.factors import factor_product

import itertools


class SimpleInference(Inference):
    def query(self, var, evidence):
        # self.factors is a dict of the form of {node: [factors_involving_node]}
        factors_list = set(itertools.chain(*self.factors.values()))
        product = factor_product(*factors_list)
        reduced_prod = product.reduce(evidence, inplace=False)
        reduced_prod.normalize()
        var_to_marg = set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])
        marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)
        return marg_prod

Hint: create the inference object with SimpleInference(fraud_model), then call its query() method.

In [17]:
infer = SimpleInference(fraud_model)
j = infer.query(var=['J'], evidence=[('A', 1)])
print(j)

╒═════╤══════════╕
│ J   │   phi(J) │
╞═════╪══════════╡
│ J_0 │   0.8580 │
├─────┼──────────┤
│ J_1 │   0.1420 │
╘═════╧══════════╛
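
This value can be reproduced by hand. Because A, S and F are root nodes, they are mutually independent, so the query reduces to a weighted sum over the A_1 columns of the jewelry table, using the CPD values specified earlier:

$$
\begin{aligned}
P(J_0 \mid A_1) &= \sum_{s}\sum_{f} P(J_0 \mid A_1, s, f)\,P(s)\,P(f) \\
&= 0.5\,(0.1 \cdot 0.04 + 0.9 \cdot 0.95) + 0.5\,(0.1 \cdot 0.02 + 0.9 \cdot 0.95) \\
&= 0.5 \cdot 0.859 + 0.5 \cdot 0.857 \\
&= 0.4295 + 0.4285 = 0.858
\end{aligned}
$$

and therefore $P(J_1 \mid A_1) = 1 - 0.858 = 0.142$.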


In [19]:
ref_tmp_var = False

try:
    # The expected value is P(J_0 | A=1) = 0.858; allow only rounding error.
    if abs(j.values[0] - 0.858) < 1e-3:
        ref_assert_var = True
        ref_tmp_var = True
    else:
        ref_assert_var = False
        print('Please follow the instructions given and use the same variables provided in the instructions.')
except Exception:
    print('Please follow the instructions given and use the same variables provided in the instructions.')

assert ref_tmp_var