<center><h2>Artificial and Computational Intelligence (Assignment - 2)</h2></center>

## Problem Statement

As part of the 2nd Assignment, we'll implement Bayesian Networks and also learn to use the pomegranate library.

You are required to create a Bayesian network model that predicts the probabilities asked for in the problem description. The detailed problem description, along with the marking scheme, is attached as a PDF as part of this assignment.

### What is a Bayesian Network?

A Bayesian network, Bayes network, belief network, decision network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). 

Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. 
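As a minimal sketch of the disease and symptom example, the following applies Bayes' rule to a two-node network (Disease -> Symptom). All three input probabilities are made-up illustrative numbers, not values from any dataset.

```python
# Hypothetical two-node network: Disease -> Symptom.
# All numbers below are illustrative assumptions.
p_disease = 0.01                 # P(Disease = yes)
p_symptom_given_disease = 0.90   # P(Symptom = yes | Disease = yes)
p_symptom_given_healthy = 0.05   # P(Symptom = yes | Disease = no)

# Total probability of observing the symptom.
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' rule: P(Disease = yes | Symptom = yes).
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(round(p_disease_given_symptom, 4))
```

With these assumed numbers the posterior is roughly 15%, even though the symptom is present in 90% of disease cases, because the disease itself is rare.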

### Dataset

The dataset can be downloaded from https://drive.google.com/drive/folders/1oMtKmmvPkN4O8DmrHMJe6M8CbB93Z5kw (accessible only with your BITS ID). The same dataset is also attached to the assignment.

#### Dataset Description
##### Sample Tuple

Y	won	5wickets	lost	2nd	vWest_Indies	Home	6-Nov-11

##### Explanation
- The first column represents if Ashwin was in the playing 11 or not. 
- The second column represents the result of the match. 'won' indicates India won the match.
- The third column represents the margin of victory / loss.
- The fourth column represents the result of the toss. 'won' indicates India won the toss.
- The fifth column represents the batting order, i.e. whether India batted 1st or 2nd.
- The sixth column represents the opponent.
- The seventh column represents the location of the match, i.e. whether it was held at Home (India) or Away.
- The last column represents the start date of the match.
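To make the column layout concrete, here is a small sketch that splits the sample tuple into named fields. The five kept names appear in the data description further down, and 'Margin', 'Opposition' and 'Start Date' are the columns dropped during pre-processing; the order in which they are listed here follows the explanation above and is an assumption about the spreadsheet.

```python
# Column names in the order given by the explanation above (assumed order).
COLUMNS = ["Ashwin", "Result", "Margin", "Toss", "Bat",
           "Opposition", "Location", "Start Date"]

# The sample tuple from the dataset description, tab-separated.
sample = "Y\twon\t5wickets\tlost\t2nd\tvWest_Indies\tHome\t6-Nov-11"

# Pair each value with its column name.
record = dict(zip(COLUMNS, sample.split("\t")))
for name, value in record.items():
    print(f"{name:<10} {value}")
```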


In [1]:
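#Import libraries
import pandas as pd
#Explicit imports instead of a wildcard; these classes belong to the pre-1.0 pomegranate API
from pomegranate import BayesianNetwork, ConditionalProbabilityTable, DiscreteDistribution, State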

In [4]:
#Read data
df = pd.read_excel('India_Test_Stats.xlsx')

In [5]:
#Pre-process data: drop the columns that the model does not use
df = df.drop(columns=['Margin', 'Opposition', 'Start Date'])

In [54]:
#Data Description
df.describe()

| | Ashwin | Result | Toss | Bat | Location |
|---|---|---|---|---|---|
| count | 85 | 85 | 85 | 85 | 85 |
| unique | 2 | 3 | 2 | 2 | 2 |
| top | Y | won | lost | 1st | Home |
| freq | 70 | 47 | 45 | 46 | 43 |


In [28]:
#function to calculate prior probability of any given variable
def prior_prob(col_array):
    col_list = list(col_array)
    values = set(col_list)
    dic = {}
    for val in values:
        dic[val] = col_list.count(val)/len(col_list)
    return dic

In [46]:
#Solution for part 1 
prior_prob(df.Location)

{'Home': 0.5058823529411764, 'Away': 0.49411764705882355}

In [53]:
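#function to calculate the scores used for the conditional probability table
def posterior_prob(prev_array, mid_array, last_array):
    #prev_array and mid_array name the conditioning columns, last_array the target column
    post_prob = []
    prev_list = list(set(df[prev_array]))
    mid_list = list(set(df[mid_array]))
    last_list = list(set(df[last_array]))

    for prev_item in prev_list:
        for mid_item in mid_list:
            for last_item in last_list:
                #rows where all three values occur together
                num = len(df[(df[prev_array] == prev_item) & (df[mid_array] == mid_item) & (df[last_array] == last_item)])
                #NOTE: this adds the two marginal counts; it is not the joint count of
                #(prev_item, mid_item). Within each (prev_item, mid_item) group temp_prob is
                #therefore only proportional to P(last_item | prev_item, mid_item),
                #and the group does not sum to 1
                den = len(df[df[prev_array] == prev_item]) + len(df[df[mid_array] == mid_item])
                temp_prob = num / den
                post_prob.append([prev_item, mid_item, last_item, temp_prob])
    return post_prob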

In [52]:
#Solution for part 2 
posterior_prob('Location','Ashwin','Result')

[['Home', 'Y', 'draw', 0.061946902654867256],
 ['Home', 'Y', 'won', 0.2920353982300885],
 ['Home', 'Y', 'lost', 0.02654867256637168],
 ['Home', 'N', 'draw', 0.0],
 ['Home', 'N', 'won', 0.0],
 ['Home', 'N', 'lost', 0.0],
 ['Away', 'Y', 'draw', 0.05357142857142857],
 ['Away', 'Y', 'won', 0.08035714285714286],
 ['Away', 'Y', 'lost', 0.10714285714285714],
 ['Away', 'N', 'draw', 0.05263157894736842],
 ['Away', 'N', 'won', 0.08771929824561403],
 ['Away', 'N', 'lost', 0.12280701754385964]]
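Within each (Location, Ashwin) pair the values above do not sum to 1, because the denominator in `posterior_prob` adds the two marginal counts instead of counting the rows where both conditions hold. A conditional probability P(Result | Location, Ashwin) divides the joint count by the count of the conditioning pair. The sketch below shows that calculation in plain Python. The counts are back-computed from the printed output (value × denominator, using 43 Home and 70 Y matches from the data description), so treat them as an illustration rather than values read from the spreadsheet; `conditional_prob` is a hypothetical helper name.

```python
from collections import Counter

# (Location, Ashwin, Result) -> number of matches.
# Back-computed from the output above; an assumption, not read from the file.
joint_counts = Counter({
    ("Home", "Y", "won"): 33, ("Home", "Y", "draw"): 7, ("Home", "Y", "lost"): 3,
    ("Away", "Y", "won"): 9, ("Away", "Y", "draw"): 6, ("Away", "Y", "lost"): 12,
    ("Away", "N", "won"): 5, ("Away", "N", "draw"): 3, ("Away", "N", "lost"): 7,
})


def conditional_prob(counts):
    """Return {(a, b, c): P(c | a, b)} from joint counts keyed by (a, b, c)."""
    # Count of each conditioning pair (a, b), summed over c.
    pair_totals = Counter()
    for (a, b, _), n in counts.items():
        pair_totals[(a, b)] += n
    # Pairs that never occur (here: Home, N) are absent, since the
    # conditional probability is undefined for them.
    return {(a, b, c): n / pair_totals[(a, b)] for (a, b, c), n in counts.items()}


probs = conditional_prob(joint_counts)
for key in sorted(probs):
    print(key, round(probs[key], 4))
```

Each group now sums to 1. Because the original denominator is constant within a group, the original values are proportional to these ones.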

In [38]:
#Solution for part 3 
#Construction of Bayesian Network using pomegranate library
play     = DiscreteDistribution(prior_prob(df.Ashwin))
bat      = DiscreteDistribution(prior_prob(df.Bat))
result   = ConditionalProbabilityTable(posterior_prob('Ashwin','Bat','Result'),[play , bat])

s1 = State(play, name="Ashwin")
s2 = State(bat, name="Bat")
s3 = State(result, name="Result")

model = BayesianNetwork("Ashwin Assignment ACI")
model.add_states(s1, s2, s3)

model.add_edge(s1, s3)
model.add_edge(s2, s3)

model.bake()

In [39]:
#Solution for part 4
# a)
model.predict_proba({'Ashwin' : 'Y', 'Bat' :'1st'})

array(['Y', '1st',
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "draw" :0.10810810810810831,
            "won" :0.7027027027027024,
            "lost" :0.1891891891891893
        }
    ],
    "frozen" :false
}], dtype=object)

In [40]:
# b) 
model.predict_proba({'Ashwin' : 'Y', 'Bat' :'2nd'})

array(['Y', '2nd',
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "draw" :0.2727272727272728,
            "won" :0.48484848484848486,
            "lost" :0.24242424242424246
        }
    ],
    "frozen" :false
}], dtype=object)

In [41]:
# c)
model.predict_proba({'Ashwin' : 'N', 'Bat' :'2nd'})

array(['N', '2nd',
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "draw" :0.16666666666666677,
            "won" :0.0,
            "lost" :0.833333333333333
        }
    ],
    "frozen" :false
}], dtype=object)

In [42]:
# d) 
model.predict_proba({'Ashwin' : 'N', 'Bat' :'1st'})

array(['N', '1st',
       {
    "class" :"Distribution",
    "dtype" :"str",
    "name" :"DiscreteDistribution",
    "parameters" :[
        {
            "draw" :0.2222222222222223,
            "won" :0.5555555555555554,
            "lost" :0.2222222222222223
        }
    ],
    "frozen" :false
}], dtype=object)
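When only one parent is observed, the network averages over the other one. As a hedged sketch, P(Result = won | Ashwin = Y) can be worked out by hand from the outputs of a) and b), weighted by the prior on Bat. The sketch assumes, as the network structure above does, that Ashwin and Bat are independent, and takes P(Bat = 1st) = 46/85 from the data description. The resulting number is a hand calculation, not an output of the notebook.

```python
# P(Result = won | Ashwin = Y, Bat = b), copied from outputs a) and b).
p_won_given_y = {"1st": 0.7027027027027024, "2nd": 0.48484848484848486}

# Prior on Bat: '1st' occurs 46 times in 85 matches (data description),
# so '2nd' accounts for the remaining 39.
p_bat = {"1st": 46 / 85, "2nd": 39 / 85}

# Marginalize over the unobserved parent Bat.
p_won = sum(p_bat[b] * p_won_given_y[b] for b in p_bat)
print(round(p_won, 4))
```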

<h3><center> Happy Coding!</center></h3>