# Lab 06 - Probability based learning

In this lab we are going to practise Naive Bayes and Bayesian networks. First, let's import the necessary libraries and a
dataset.

### Import libraries

In [1]:
import numpy as np
import pandas as pd

from pomegranate import DiscreteDistribution, ConditionalProbabilityTable, State, BayesianNetwork

from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

In [2]:
dataset, target = datasets.load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(dataset, target)

## Gaussian Naive Bayes algorithm

The scikit-learn package has a few implementations of the Naive Bayes algorithm. You can find them under sklearn.naive_bayes. For this dataset we are going to use the Gaussian Naive Bayes algorithm, which is used for classification tasks with
numerical features. It assumes that, within each class, every feature follows a normal distribution, and uses the following likelihood function:

$$
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)
$$

You can find the documentation for the Gaussian Naive Bayes algorithm
[here](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB).
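
As a rough illustration of the likelihood function above, the following sketch evaluates the Gaussian density in plain Python and combines it with class priors using Bayes' rule for a single feature. The class names, priors, means and variances are hypothetical values chosen for illustration; they are not estimated from the iris data, and this is not how scikit-learn implements GaussianNB internally.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical per-class statistics for one feature (illustrative values only).
class_stats = {
    "short": {"prior": 0.5, "mean": 1.5, "var": 0.04},
    "long": {"prior": 0.5, "mean": 4.5, "var": 0.25},
}


def posterior(x):
    """Posterior P(class | x) for a single feature, via Bayes' rule."""
    joint = {
        name: stats["prior"] * gaussian_likelihood(x, stats["mean"], stats["var"])
        for name, stats in class_stats.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


print(gaussian_likelihood(0.0, 0.0, 1.0))  # density of the standard normal at 0
print(posterior(1.7))                      # a value close to the "short" mean
```

With several features, Naive Bayes multiplies the per-feature likelihoods for each class before normalising.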

In [3]:
gnb = GaussianNB()

Now we have to train our classifier. Since we are using the GaussianNB classifier, all of the features should be
numerical. If you have categorical data, you have to encode it in a numerical format first. Since the iris
dataset has only 4 numerical features, we can use it directly to train our classifier.
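
For a dataset that does contain categorical columns, one possible encoding is one-hot encoding. The sketch below uses a small made-up dataset (the colours, lengths and labels are hypothetical) and scikit-learn's OneHotEncoder; other encoders may suit your data better, so treat this as one option rather than the required approach.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical column and one numerical column.
colours = np.array([["red"], ["green"], ["blue"], ["green"], ["red"], ["blue"]])
lengths = np.array([[1.0], [2.5], [3.1], [2.4], [0.9], [3.3]])
labels = np.array([0, 1, 2, 1, 0, 2])

# One-hot encode the categorical column; toarray() converts the sparse result.
encoder = OneHotEncoder(handle_unknown="ignore")
colour_onehot = encoder.fit_transform(colours).toarray()

# Stack the numerical column and the encoded columns into one feature matrix.
X = np.hstack([lengths, colour_onehot])

clf = GaussianNB().fit(X, labels)
print(encoder.categories_[0])
print(X.shape)
print(clf.predict(X))
```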

In [4]:
gnb.fit(X_train, y_train)

GaussianNB()

Let's evaluate the Naive Bayes model we trained using the test set.

In [5]:
pred = gnb.predict(X_test)

print(classification_report(y_pred=pred, y_true=y_test))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       0.94      0.94      0.94        16
           2       0.90      0.90      0.90        10

    accuracy                           0.95        38
   macro avg       0.95      0.95      0.95        38
weighted avg       0.95      0.95      0.95        38
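
To make the columns of the report concrete, here is a minimal sketch of how per-class precision and recall can be computed by hand. The label lists are hypothetical and are not the iris predictions from the cell above; the function name is also only illustrative.

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical true and predicted labels for three classes.
y_true = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 1]

for label in (0, 1, 2):
    print(label, precision_recall(y_true, y_pred, label))
```

The f1-score in the report is the harmonic mean of these two values, and support is the number of true samples of each class.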



### Task 1
* Train a Gaussian Naive Bayes model using the dataset you cleaned in labs 02 and 03 and measure its performance. Use
proper encoding techniques for the categorical data.
* Train a [Categorical Naive Bayes](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html) model using only the categorical features of your dataset.
    * Compare the model performance for the following $\alpha$ values.
        * $\alpha$ = 0
        * $\alpha$ = 0.5
        * $\alpha$ = 1
        * $\alpha$ = 2
* Split the data using each of the following proportions and check how the performance of the model varies.
    * 30% test and 70% train
    * 40% test and 60% train
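
To see what the $\alpha$ parameter in the task above does, the following sketch applies additive (Laplace) smoothing to the category counts of a single feature within one class: each estimate is (count + $\alpha$) / (N + $\alpha \cdot k$), where N is the number of samples and k the number of categories. The observed values and category names are hypothetical.

```python
from collections import Counter


def smoothed_probabilities(values, categories, alpha):
    """Estimate category probabilities with additive smoothing."""
    counts = Counter(values)
    total = len(values) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}


# Hypothetical values of one categorical feature within a single class.
observed = ["red", "red", "green", "red"]
categories = ["red", "green", "blue"]

for alpha in (0, 0.5, 1, 2):
    print(alpha, smoothed_probabilities(observed, categories, alpha))
```

With $\alpha$ = 0 the unseen category gets probability zero, which would force the whole Naive Bayes product to zero; larger values pull all estimates towards a uniform distribution.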

## Bayesian Networks

*This is an optional and slightly advanced exercise for students who would like to invest more time in coding and
experimenting. It requires some extra reading on the part of the student.*

Bayesian networks are a powerful inference tool, in which
* a set of variables is represented as nodes
* each directed edge represents a direct dependence of one variable on another

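We will use the pomegranate library to create a Bayesian network. Unfortunately, it is not installed along with Anaconda, so you
have to install it into your Python environment. Note that the classes used in this lab (DiscreteDistribution,
ConditionalProbabilityTable, State) belong to the pomegranate API before version 1.0, so install a version older than 1.0.
You can use the installation instructions given here: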

https://pomegranate.readthedocs.io/en/latest/install.html

While Bayesian networks can have extremely complex emission probabilities - usually Gaussian or conditional Gaussian
distributions - pomegranate currently supports only discrete Bayesian networks.


### The Monty Hall problem

The Monty Hall problem arose from the TV game show _Let's Make a Deal_, where a guest had to choose which one of three doors
had a prize behind it. The twist was that after the guest chose, the host, originally Monty Hall, would then open one of
the doors the guest **did not pick** (and never the one hiding the prize) and ask if the guest wanted to _switch_ the
door they had picked. What should the guest do?

#### Modeling the problem
Let's try to solve it using Bayesian networks. We need a Bayesian network with 3 nodes: one for the guest, one for the prize
and one for Monty (the host).

For a discrete (aka categorical) Bayesian network we use DiscreteDistribution objects for the root nodes and
ConditionalProbabilityTable objects for the inner and leaf nodes.

The door the guest initially chooses and the door the prize is behind are both completely random across the three
doors. We can model these using discrete distributions. A discrete distribution is made up of characters (here, the door
labels) and their probabilities, which are assumed to sum to 1.0.

In [6]:
# The guest's initial door selection is completely random
guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})

# The door the prize is behind is also completely random
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})

The door that Monty opens depends on both the door the guest chooses (it cannot be the door the guest chooses)
and the door the prize is behind (it cannot be the door with the prize behind it). We can model this using a
conditional probability table. In each row of a ConditionalProbabilityTable, the first columns hold the parents' values in the
order in which the parents (the second argument) are specified, the next column is the value the ConditionalProbabilityTable
itself takes, and the last column is the probability of that value given the parents' values.

In [7]:
# Monty is dependent on both the guest and the prize.
monty = ConditionalProbabilityTable(
        [[ 'A', 'A', 'A', 0.0 ],
         [ 'A', 'A', 'B', 0.5 ],
         [ 'A', 'A', 'C', 0.5 ],
         [ 'A', 'B', 'A', 0.0 ],
         [ 'A', 'B', 'B', 0.0 ],
         [ 'A', 'B', 'C', 1.0 ],
         [ 'A', 'C', 'A', 0.0 ],
         [ 'A', 'C', 'B', 1.0 ],
         [ 'A', 'C', 'C', 0.0 ],
         [ 'B', 'A', 'A', 0.0 ],
         [ 'B', 'A', 'B', 0.0 ],
         [ 'B', 'A', 'C', 1.0 ],
         [ 'B', 'B', 'A', 0.5 ],
         [ 'B', 'B', 'B', 0.0 ],
         [ 'B', 'B', 'C', 0.5 ],
         [ 'B', 'C', 'A', 1.0 ],
         [ 'B', 'C', 'B', 0.0 ],
         [ 'B', 'C', 'C', 0.0 ],
         [ 'C', 'A', 'A', 0.0 ],
         [ 'C', 'A', 'B', 1.0 ],
         [ 'C', 'A', 'C', 0.0 ],
         [ 'C', 'B', 'A', 1.0 ],
         [ 'C', 'B', 'B', 0.0 ],
         [ 'C', 'B', 'C', 0.0 ],
         [ 'C', 'C', 'A', 0.5 ],
         [ 'C', 'C', 'B', 0.5 ],
         [ 'C', 'C', 'C', 0.0 ]], [guest, prize])

Here the first three columns hold the door picked by the guest, the prize door and the door opened by Monty, respectively. The fourth column is the probability that Monty opens that door given the other two.
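
Typing 27 rows by hand is error-prone, so it can help to generate them from the two rules that define Monty's behaviour. The sketch below is plain Python and does not call pomegranate; the function name monty_cpt_rows is hypothetical. It assumes Monty picks uniformly at random when two doors are allowed, which matches the 0.5 entries in the table above.

```python
from itertools import product

doors = ["A", "B", "C"]


def monty_cpt_rows():
    """Build rows of the form [guest, prize, monty, probability]."""
    rows = []
    for guest_door, prize_door in product(doors, repeat=2):
        # Monty may open neither the guest's door nor the prize door.
        allowed = [d for d in doors if d != guest_door and d != prize_door]
        for monty_door in doors:
            p = 1.0 / len(allowed) if monty_door in allowed else 0.0
            rows.append([guest_door, prize_door, monty_door, p])
    return rows


rows = monty_cpt_rows()
for row in rows[:6]:
    print(row)
```

The resulting list has the same row order as the table above, so it could be passed as the first argument of ConditionalProbabilityTable in place of the hand-written rows.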

Now we have to create three nodes for the network using the distributions created above.

In [8]:
s1 = State(guest, name="guest")
s2 = State(prize, name="prize")
s3 = State(monty, name="monty")

Then we have to initialize a Bayesian network object. We can give it a meaningful name.

In [9]:
model = BayesianNetwork("Monty Hall Problem")

Let's add our three nodes to the graph.

In [10]:
model.add_states(s1, s2, s3)

Now we have to add the edges to the model. Edges are added from parent to child, so the first argument of add_edge() is
the parent node and the second is the child.

In [11]:
model.add_edge(s1, s3)
model.add_edge(s2, s3)

To finalize the network creation we have to call the bake() method.

In [12]:
model.bake()

#### Predicting probabilities
We can calculate the probability of each scenario using the network we created. Let's calculate the probability of the following scenario: the guest initially picked door A, the prize was actually behind door B, and Monty then opened door C. The values are given in the order in which the states were added (guest, prize, monty).

In [13]:
model.probability([['A', 'B', 'C']]) 

0.11111111111111109

Let's see the probability when the guest chooses 'A', the prize is behind the same door the guest selected, and Monty opens 'C'.

In [14]:
model.probability([['A', 'A', 'C']])

0.05555555555555554
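
Both values can be checked by hand with the chain rule of the network: P(guest, prize, monty) = P(guest) · P(prize) · P(monty | guest, prize). The following sketch uses exact fractions instead of pomegranate; the helper names p_monty and joint are hypothetical.

```python
from fractions import Fraction

doors = ("A", "B", "C")
prior = Fraction(1, 3)  # P(guest = d) and P(prize = d) for every door d


def p_monty(monty, guest, prize):
    """P(monty | guest, prize): uniform over the doors Monty may open."""
    allowed = [d for d in doors if d not in (guest, prize)]
    return Fraction(1, len(allowed)) if monty in allowed else Fraction(0)


def joint(guest, prize, monty):
    """Joint probability from the factorisation of the network."""
    return prior * prior * p_monty(monty, guest, prize)


print(joint("A", "B", "C"))  # 1/9
print(joint("A", "A", "C"))  # 1/18
```

These fractions agree with the floating-point values 0.111... and 0.0555... returned by model.probability.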

#### Predicting Solution

Let's say the guest chooses door A. We pass None for the variables we have not observed.

In [15]:
model.predict_proba([['A', None, None]])

[array(['A',
        {
     "class" : "Distribution",
     "dtype" : "str",
     "name" : "DiscreteDistribution",
     "parameters" : [
         {
             "A" : 0.3333333333333333,
             "B" : 0.3333333333333333,
             "C" : 0.3333333333333333
         }
     ],
     "frozen" : false
 },
        {
     "class" : "Distribution",
     "dtype" : "str",
     "name" : "DiscreteDistribution",
     "parameters" : [
         {
             "B" : 0.49999999999999983,
             "A" : 0.0,
             "C" : 0.49999999999999983
         }
     ],
     "frozen" : false
 }], dtype=object)]

You can see that the probability distribution over the prize is still the same for all doors (0.33).
But Monty cannot open door A because the guest picked it.

Let's say Monty opened door 'C'. What will the new probability distribution for the prize be?

In [16]:
model.predict_proba([{'guest': 'A', 'monty': 'C'}])

[array(['A',
        {
     "class" : "Distribution",
     "dtype" : "str",
     "name" : "DiscreteDistribution",
     "parameters" : [
         {
             "A" : 0.3333333333333334,
             "B" : 0.6666666666666664,
             "C" : 0.0
         }
     ],
     "frozen" : false
 },
        'C'], dtype=object)]
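
The distribution over the prize in this output can be verified by hand with Bayes' rule. Since the guest's choice and the prize door are independent, the prior over the prize stays at 1/3 per door, and the conditional probabilities for Monty come from the table defined earlier:

$$
P(\text{prize}=B \mid \text{guest}=A, \text{monty}=C)
= \frac{P(\text{monty}=C \mid \text{guest}=A, \text{prize}=B)\, P(\text{prize}=B)}
       {\sum_{d \in \{A,B,C\}} P(\text{monty}=C \mid \text{guest}=A, \text{prize}=d)\, P(\text{prize}=d)}
= \frac{1 \cdot \frac{1}{3}}{\frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 \cdot \frac{1}{3}}
= \frac{2}{3}
$$

The same calculation with $\text{prize}=A$ in the numerator gives $\frac{1}{2} \cdot \frac{1}{3} \big/ \frac{1}{2} = \frac{1}{3}$.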

So there is a 2/3 chance that the prize is behind door B. If the contestant chooses to switch,
they have twice the chance of winning compared to sticking with the initial pick.
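
As an independent sanity check, the game can also be simulated directly. The sketch below is a plain Monte Carlo simulation with Python's random module, not part of the pomegranate model; the function names, the number of games and the seed are arbitrary choices made for illustration.

```python
import random

DOORS = ["A", "B", "C"]


def play(switch, rng):
    """Play one game and return True if the guest wins the prize."""
    prize = rng.choice(DOORS)
    guest = rng.choice(DOORS)
    # Monty opens a door that is neither the guest's pick nor the prize door.
    monty = rng.choice([d for d in DOORS if d != guest and d != prize])
    if switch:
        # Switch to the only remaining closed door.
        guest = next(d for d in DOORS if d != guest and d != monty)
    return guest == prize


def win_rate(switch, n_games=20000, seed=0):
    """Fraction of games won under a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_games)) / n_games


print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

The switching strategy should win in roughly two thirds of the games and the staying strategy in roughly one third, up to sampling noise.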

### Task 2 (Optional)
Code the example on slide 60 of the Probability based learning - II lecture slides and get the answer.