<center><h2>Artificial and Computational Intelligence (Assignment - 2)</h2></center>

## Problem Statement

As part of the second assignment, we implement a Bayesian network and learn to use the pomegranate library.

You are required to create a Bayesian network model that predicts the probability of match outcomes. The detailed problem description and the marking scheme are attached to this assignment as a PDF.

### What is a Bayesian Network?

A Bayesian network, Bayes network, belief network, decision network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). 

Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. 
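
The disease and symptom case can be worked through with Bayes' rule for a single disease and a single symptom. The sketch below is only an illustration: the function name `posterior` and all three probabilities are made-up assumptions, not values from this assignment.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(disease | symptom) using Bayes' rule."""
    # P(symptom) = P(symptom | disease) P(disease) + P(symptom | healthy) P(healthy)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers chosen only for illustration
p_disease = 0.01                 # P(disease)
p_symptom_given_disease = 0.9    # P(symptom | disease)
p_symptom_given_healthy = 0.05   # P(symptom | no disease)

p = posterior(p_disease, p_symptom_given_disease, p_symptom_given_healthy)
print(round(p, 4))
```

Even with a symptom that is quite likely given the disease, the posterior stays modest because the assumed prior is small.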

### Dataset

The dataset can be downloaded from https://drive.google.com/drive/folders/1oMtKmmvPkN4O8DmrHMJe6M8CbB93Z5kw (accessible only with your BITS ID). The same dataset is also attached to the assignment.

#### Dataset Description
##### Sample Tuple

Y	won	5wickets	lost	2nd	vWest_Indies	Home	6-Nov-11

##### Explanation
- The first column indicates whether Ashwin was in the playing 11 (Y) or not (N).
- The second column is the result of the match; "won" indicates that India won the match.
- The third column is the margin of victory or loss.
- The fourth column is the result of the toss; "won" indicates that India won the toss.
- The fifth column is the batting order: whether India batted 1st or 2nd.
- The sixth column is the opponent.
- The seventh column is the location of the match: Home (India) or Away.
- The last column is the start date of the match.
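
As a small illustration of the layout described above, the sample tuple can be split on tabs and paired with column names. The names in `COLUMNS` are taken from the DataFrame summary printed later in this notebook; the variable names `COLUMNS`, `sample` and `record` are assumptions made for this sketch.

```python
# Column names as they appear in the DataFrame summary of this notebook
COLUMNS = ["Ashwin", "Result", "Margin", "Toss", "Bat",
           "Opposition", "Location", "Start Date"]

# The sample tuple, with tab-separated fields
sample = "Y\twon\t5wickets\tlost\t2nd\tvWest_Indies\tHome\t6-Nov-11"

# Pair each field with its column name
record = dict(zip(COLUMNS, sample.split("\t")))

for name, value in record.items():
    print(f"{name}: {value}")
```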


### Evaluation
We will evaluate your submission on:
- the coding practices followed
- comments that explain the code and the logic behind each step
- your understanding and explanation of the data
- how well the model performs

In [2]:
# 2018ab04535 , Nitesh
# 2018ab04542 , JYOTSANA
# 2018ab04701, KRATIKA GUPTA

In [3]:
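# Import libraries
import pandas as pd
# Explicit imports instead of a wildcard; these classes belong to the pre-1.0 pomegranate API
from pomegranate import (BayesianNetwork, ConditionalProbabilityTable,
                         DiscreteDistribution, State)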

In [4]:
df = pd.read_excel('India_Test_Stats.xlsx')

In [5]:
# dropna() returns a new DataFrame, so the result must be assigned back
df = df.dropna()
print(df.describe())
print('----------------------------------------------------------')
print(df.info())

       Ashwin Result Margin  Toss  Bat   Opposition Location  \
count      85     85     85    85   85           85       85   
unique      2      3     61     2    2            8        2   
top         Y    won      -  lost  1st  v Australia     Home   
freq       70     47     16    45   46           20       43   
first     NaN    NaN    NaN   NaN  NaN          NaN      NaN   
last      NaN    NaN    NaN   NaN  NaN          NaN      NaN   

                 Start Date  
count                    85  
unique                   85  
top     2019-11-22 00:00:00  
freq                      1  
first   2011-11-06 00:00:00  
last    2019-11-22 00:00:00  
----------------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 8 columns):
Ashwin        85 non-null object
Result        85 non-null object
Margin        85 non-null object
Toss          85 non-null object
Bat           85 non-null object
Opposition    8

### Data Description
The network assumes the following dependencies between columns (parent -> child):

- Location -> Ashwin playing
- Toss -> Batting
- Ashwin playing -> Result
- Batting -> Result
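
A Bayesian network requires these dependencies to form a directed acyclic graph. The following sketch checks that with the standard-library `graphlib` module (Python 3.9 or later), which raises `CycleError` if the edges contain a cycle. The node labels follow the DataFrame column names, and the variable names are assumptions made for this illustration.

```python
from graphlib import TopologicalSorter

# (parent, child) pairs for the dependencies listed above
edges = [
    ("Location", "Ashwin"),
    ("Toss", "Bat"),
    ("Ashwin", "Result"),
    ("Bat", "Result"),
]

# Map every node to the set of its parents
parents = {}
for parent, child in edges:
    parents.setdefault(child, set()).add(parent)
    parents.setdefault(parent, set())

# static_order() raises graphlib.CycleError if the graph is not acyclic
order = list(TopologicalSorter(parents).static_order())
print(order)
```

Any valid order lists each parent before its children, which is also the order in which the distributions have to be defined.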

In [6]:
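def priorProbability(df, column):
    # Assigns a uniform prior: every distinct value of the column gets the same
    # probability, regardless of how often it occurs in the data.
    unique_values = df[column].unique()
    prior_prob = dict()
    for value in unique_values:
        prior_prob[value] = 1 / len(unique_values)
    return prior_prob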
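
The function above gives every distinct value the same probability. An alternative, sketched here as an assumption rather than as the method used in this notebook, is to estimate the prior from relative frequencies. The helper name `empirical_prior` is hypothetical; the counts (43 Home matches out of 85) come from the DataFrame summary shown earlier.

```python
from collections import Counter

def empirical_prior(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Counts taken from the summary output: 85 matches, 43 of them at Home
locations = ["Home"] * 43 + ["Away"] * 42
prior = empirical_prior(locations)
print(prior)
```

For this dataset the two approaches differ only slightly, since the Home and Away counts are nearly equal.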

In [7]:
def conditionalProbability(df, B, A):
    # Estimate P(A | B) from relative frequencies; B is a list of parent columns.
    # Relative frequency of each value of A within every group of B
    dataframe = lambda g: g[A].value_counts() / len(g[A])
    conditional_prob = df.groupby(B).apply(dataframe).reset_index()

    # Copy B so the caller's list is not modified when A is appended
    allColumn = list(B) + [str(A)]
    uniqueDependentVal = list()
    for column in allColumn:
        uniqueDependentVal.append(df[column].unique().tolist())
    # Every combination of parent and child values, including unobserved ones
    mux = pd.MultiIndex.from_product(uniqueDependentVal)
    conditional_prob.rename(columns={conditional_prob.filter(regex='level_.*').columns[0]: A, A: "Prob"}, inplace=True)
    # Unobserved combinations are filled with probability 0
    return conditional_prob.set_index(allColumn).reindex(mux, fill_value=0).reset_index().values.tolist()
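
The same idea, counting within each parent group and filling unobserved combinations with 0, can be sketched without pandas for a single parent column. The helper `conditional_table` and the toy rows are hypothetical and are not the assignment data.

```python
from collections import Counter
from itertools import product

def conditional_table(rows, parent_values, child_values):
    """Build [parent, child, P(child | parent)] rows from (parent, child) pairs."""
    pair_counts = Counter(rows)
    parent_counts = Counter(parent for parent, _ in rows)
    table = []
    # Visit every combination so that unobserved pairs appear with probability 0
    for parent, child in product(parent_values, child_values):
        total = parent_counts[parent]
        prob = pair_counts[(parent, child)] / total if total else 0.0
        table.append([parent, child, prob])
    return table

# Toy data: (Location, Ashwin) pairs, with no ("Home", "N") observation
rows = [("Home", "Y")] * 3 + [("Away", "Y")] * 2 + [("Away", "N")] * 1
table = conditional_table(rows, ["Home", "Away"], ["Y", "N"])
for row in table:
    print(row)
```

The output rows have the same `[parent, child, probability]` shape that `ConditionalProbabilityTable` receives in the cells below.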

In [8]:
Location = DiscreteDistribution(priorProbability(df,'Location'))
print("Prior probability of Location ", Location.parameters[0])

Prior probability of Location  {'Home': 0.5, 'Away': 0.5}


In [9]:
Toss = DiscreteDistribution(priorProbability(df,'Toss'))
print("Prior probability of Toss ",Toss.parameters[0])

Prior probability of Toss  {'lost': 0.5, 'won': 0.5}


In [10]:
AshwinPlaying = ConditionalProbabilityTable(conditionalProbability(df,['Location'],'Ashwin'),[Location])
print("Conditional probability of Ashwin playing given Location ",AshwinPlaying.parameters[0])

Conditional probability of Ashwin playing given Location  [['Home', 'Y', 1.0], ['Home', 'N', 0.0], ['Away', 'Y', 0.6428571428571429], ['Away', 'N', 0.35714285714285715]]


In [11]:
Batting = ConditionalProbabilityTable(conditionalProbability(df,['Toss'],'Bat'),[Toss])
print("Conditional probability of Batting order given Toss ",Batting.parameters[0])

Conditional probability of Batting order given Toss  [['lost', '2nd', 0.7777777777777778], ['lost', '1st', 0.2222222222222222], ['won', '2nd', 0.1], ['won', '1st', 0.9]]


In [12]:
Result = ConditionalProbabilityTable(conditionalProbability(df,['Bat', 'Ashwin'],'Result'),[Batting,AshwinPlaying])
print("Conditional probability of match result given Batting order and Ashwin playing ",Result.parameters[0])

Conditional probability of match result given Batting order and Ashwin playing  [['2nd', 'Y', 'won', 0.48484848484848486], ['2nd', 'Y', 'draw', 0.2727272727272727], ['2nd', 'Y', 'lost', 0.24242424242424243], ['2nd', 'N', 'won', 0.0], ['2nd', 'N', 'draw', 0.16666666666666666], ['2nd', 'N', 'lost', 0.8333333333333334], ['1st', 'Y', 'won', 0.7027027027027027], ['1st', 'Y', 'draw', 0.10810810810810811], ['1st', 'Y', 'lost', 0.1891891891891892], ['1st', 'N', 'won', 0.5555555555555556], ['1st', 'N', 'draw', 0.2222222222222222], ['1st', 'N', 'lost', 0.2222222222222222]]


In [13]:
d1 = State(Location, name="location")
d2 = State(Toss, name="toss")
d3 = State(AshwinPlaying, name="ashwinPlaying")
d4 = State(Batting, name="bating")
d5 = State(Result, name="result")

In [14]:
# Building the Bayesian Network
network = BayesianNetwork("Solving the Ashwin selection problem with Bayesian Networks")
network.add_states(d1, d2, d3, d4, d5)
network.add_edge(d1, d3)
network.add_edge(d2, d4)
network.add_edge(d4, d5)
network.add_edge(d3, d5)
network.bake()

In [15]:
# a. India winning, batting 2nd, Ashwin playing, given the match is in New Zealand (location: Away)
beliefs = network.predict_proba({'location':'Away','bating': '2nd','ashwinPlaying':'Y'})
print("P(result=Won|bat=2nd,ashwinPlaying='Y') "+ str(beliefs[4].parameters[0]['won']))

P(result=Won|bat=2nd,ashwinPlaying='Y') 0.48484848484848475


In [16]:
# b. India winning, batting 2nd, Ashwin not playing, given the match is in New Zealand (location: Away)
beliefs = network.predict_proba({'location':'Away','bating': '2nd','ashwinPlaying':'N'})
print("P(result=Won|bat=2nd,ashwinPlaying='N') "+ str(beliefs[4].parameters[0]['won']))

P(result=Won|bat=2nd,ashwinPlaying='N') 0.0
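
A probability of exactly 0.0 here reflects that no win was observed for this combination, not that a win is impossible. The CPT values for batting 2nd without Ashwin (0, 1/6 and 5/6) are consistent with 6 matches: 0 won, 1 drawn, 5 lost. One common remedy, shown only as a sketch and not used in this notebook, is additive (Laplace) smoothing; the helper `smooth` and the choice `alpha=1.0` are assumptions.

```python
def smooth(counts, alpha=1.0):
    """Additive smoothing: add alpha to every count before normalising."""
    total = sum(counts.values()) + alpha * len(counts)
    return {outcome: (count + alpha) / total for outcome, count in counts.items()}

# Counts inferred from the CPT row for batting 2nd without Ashwin
counts = {"won": 0, "draw": 1, "lost": 5}
smoothed = smooth(counts)
print(smoothed)
```

With `alpha=1.0` the win probability becomes 1/9 instead of 0, while the ordering of the outcomes is preserved.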


In [17]:
# c. India losing, batting 2nd, Ashwin playing, given the match is in New Zealand (location: Away)
beliefs = network.predict_proba({'location':'Away','bating': '2nd','ashwinPlaying':'Y'})
print("P(result=Lost|bat=2nd,ashwinPlaying='Y') "+ str(beliefs[4].parameters[0]['lost']))

P(result=Lost|bat=2nd,ashwinPlaying='Y') 0.24242424242424246


In [18]:
# d. India losing, batting 2nd, Ashwin not playing, given the match is in New Zealand (location: Away)
beliefs = network.predict_proba({'location':'Away','bating': '2nd','ashwinPlaying':'N'})
print("P(result=Lost|bat=2nd,ashwinPlaying='N') "+ str(beliefs[4].parameters[0]['lost']))

P(result=Lost|bat=2nd,ashwinPlaying='N') 0.8333333333333329
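
All four queries fix Ashwin's selection as evidence. If selection were left unobserved, the network would have to sum over both possibilities, weighting each by P(Ashwin | location). The sketch below does that by hand for an away match with India batting 2nd, using the CPT values printed earlier in this notebook; the variable names are assumptions, and the calculation relies on Ashwin's selection and the batting order being independent in this network when the result is unobserved.

```python
# P(Ashwin | Location = Away), from the printed CPT
p_ashwin_given_away = {"Y": 0.6428571428571429, "N": 0.35714285714285715}

# P(Result = won | Bat = 2nd, Ashwin), from the printed CPT
p_won_given_2nd = {"Y": 0.48484848484848486, "N": 0.0}

# Marginalise over Ashwin's selection
p_won = sum(p_ashwin_given_away[a] * p_won_given_2nd[a] for a in ("Y", "N"))
print(round(p_won, 4))
```

The result lies between the two conditional answers from queries a and b, as expected for a weighted average.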


<h3><center> Happy Coding!</center></h3>