<center><h2>Artificial and Computational Intelligence (Assignment - 2)</h2></center>

## Problem Statement

As part of the second assignment, we will implement a Bayesian network and learn to use the pomegranate library.

You are required to create a Bayesian network model that predicts probabilities from the match data. The detailed problem description and the marking scheme are attached to this assignment as a PDF.

### What is a Bayesian Network?

A Bayesian network, Bayes network, belief network, decision network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). 
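The DAG corresponds to a factorisation of the joint distribution in which each variable is conditioned only on its parents in the graph. As a standard statement of that property (the notation $\mathrm{Pa}(X_i)$ for the parents of $X_i$ is introduced here for illustration):

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```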

Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. 
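The disease and symptom example can be made concrete with the smallest possible network, one disease node pointing at one symptom node. The sketch below applies Bayes' rule directly; the function name and all three probabilities are hypothetical values chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(disease | symptom) via Bayes' rule for a two-node network."""
    # P(symptom) by total probability over disease present / absent
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, not taken from any dataset
p_disease = 0.01               # P(disease)
p_symptom_given_d = 0.90       # P(symptom | disease)
p_symptom_given_not_d = 0.05   # P(symptom | no disease)

p = posterior(p_disease, p_symptom_given_d, p_symptom_given_not_d)
print(round(p, 4))
```

Even with a symptom that is common among the diseased, a low prior keeps the posterior modest, which is the kind of reasoning from effect back to cause that the paragraph above describes.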

### Dataset

The dataset can be downloaded from https://drive.google.com/drive/folders/1oMtKmmvPkN4O8DmrHMJe6M8CbB93Z5kw (accessible only with your BITS ID). The same dataset is also attached to the assignment.

#### Dataset Description
##### Sample Tuple

Y	won	5wickets	lost	2nd	vWest_Indies	Home	6-Nov-11

##### Explanation
- The first column indicates whether Ashwin was in the playing eleven.
- The second column is the result of the match; won indicates that India won the match.
- The third column is the margin of victory or loss.
- The fourth column is the result of the toss; won indicates that India won the toss.
- The fifth column is the batting order: whether India batted 1st or 2nd.
- The sixth column is the opponent.
- The seventh column is the location of the match: Home (India) or Away.
- The last column is the start date of the match.
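As a quick check of the column layout described above, the sample tuple can be split into named fields. This is a minimal sketch assuming the row is tab-separated as shown; the column names in `COLUMNS` and the helper `parse_row` are illustrative and not part of the dataset.

```python
from datetime import datetime

# Illustrative labels for the eight fields described above
COLUMNS = ["ashwin", "result", "margin", "toss", "bat",
           "opposition", "location", "start_date"]


def parse_row(line):
    """Split one tab-separated row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    # Dates in the sample look like 6-Nov-11 (day, abbreviated month, 2-digit year)
    row["start_date"] = datetime.strptime(row["start_date"], "%d-%b-%y").date()
    return row


sample = "Y\twon\t5wickets\tlost\t2nd\tvWest_Indies\tHome\t6-Nov-11"
row = parse_row(sample)
print(row["result"], row["location"], row["start_date"])
```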


### Evaluation
We will evaluate submissions based on:
- the coding practices followed
- comments that explain the code and the logic behind it
- your understanding and explanation of the data
- how well the model performs

In [1]:
# BITS RollNumbers , Names. 
# 2018AB04638, Abhilash A
# 2018AB04575, Jeevan George Antony
# 2018AB04553, Nandu Raj
# 2018AB04573, Chithira P

In [2]:
#Install libraries
!pip install pomegranate==0.11.2
!pip install pandas

#Import libraries
import pandas as pd
from pomegranate import *



In [3]:
#Read data
data = pd.read_excel('India_Test_Stats.xlsx')
data.groupby(['Result', 'Toss']).size()
data

| Unnamed: 0 | Ashwin | Result | Margin | Toss | Bat | Opposition | Location | Start Date |
|---|---|---|---|---|---|---|---|---|
| 0 | Y | won | 5 wickets | lost | 2nd | v West Indies | Home | 2011-11-06 |
| 1 | Y | won | inns & 15 runs | won | 1st | v West Indies | Home | 2011-11-14 |
| 2 | Y | draw | - | lost | 2nd | v West Indies | Home | 2011-11-22 |
| 3 | Y | lost | 122 runs | lost | 2nd | v Australia | Away | 2011-12-26 |
| 4 | Y | lost | inns & 68 runs | won | 1st | v Australia | Away | 2012-01-03 |
| 5 | N | lost | inns & 37 runs | lost | 1st | v Australia | Away | 2012-01-13 |
| 6 | Y | lost | 298 runs | lost | 2nd | v Australia | Away | 2012-01-24 |
| 7 | Y | won | inns & 115 runs | won | 1st | v New Zealand | Home | 2012-08-23 |
| 8 | Y | won | 5 wickets | lost | 2nd | v New Zealand | Home | 2012-08-31 |
| 9 | Y | won | 9 wickets | won | 1st | v England | Home | 2012-11-15 |


In [4]:
#Pre-process data (Whatever you feel might be required)

#Data Description
#Columns and their definitions:

1) Ashwin: Whether Ashwin played in the match. Domain: (Y, N) <br>
2) Result: The final result of the match. Domain: (won, lost, draw) <br>
3) Margin: The margin of victory or loss. Domain: (X wickets, X runs, inns & X runs, - for a draw) <br>
4) Toss: Whether India won the toss. Domain: (won, lost) <br>
5) Bat: Whether India batted first or second. Domain: (1st, 2nd) <br>
6) Opposition: The team India played against. Domain: (v X) <br>
7) Location: Whether the match was played in India or abroad. Domain: (Home, Away) <br>
8) Start Date: The date on which the match started. <br>
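The Margin column mixes several textual forms, so one possible pre-processing step is to split it into a kind and a number. The sketch below is only an illustration under the assumption that margins follow the forms listed above; `parse_margin` and its return format are hypothetical and are not used by the cells that follow.

```python
import re


def parse_margin(margin):
    """Return (kind, amount) for a margin string, or None for a draw ('-')."""
    text = margin.strip()
    if text == "-":
        return None
    match = re.fullmatch(r"(inns & )?(\d+) (runs|wickets)", text)
    if match is None:
        raise ValueError(f"unrecognised margin: {margin!r}")
    innings, amount, unit = match.groups()
    # Prefix the unit when the win or loss was by an innings
    kind = f"innings_and_{unit}" if innings else unit
    return kind, int(amount)


for value in ["5 wickets", "inns & 15 runs", "122 runs", "-"]:
    print(value, "->", parse_margin(value))
```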



In [5]:
#Construction of Bayesian Network goes here 
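def bayesian_network(variable_conditionality):
    """Build and bake the network from a dict of named distributions."""
    # One state per variable; the order used in add_states below is the
    # order in which values are later passed to the model
    s1 = State(variable_conditionality['test_location'], name="test_location")
    s2 = State(variable_conditionality['ashwin_playing'], name="ashwin_playing")
    s3 = State(variable_conditionality['toss'], name="toss")
    s4 = State(variable_conditionality['result'], name="result")
    s5 = State(variable_conditionality['batting'], name="batting")

    model = BayesianNetwork("Team Selection")
    model.add_states(s1, s2, s3, s4, s5)

    # Edges run from parent to child
    model.add_edge(s1, s2)  # test_location -> ashwin_playing
    model.add_edge(s3, s5)  # toss -> batting
    model.add_edge(s5, s4)  # batting -> result

    model.bake()
    return model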
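The three edges above define the graph structure, and each node contributes one factor conditioned on its parents. The sketch below is plain Python rather than pomegranate: it rebuilds the same edge list, uses the standard-library `graphlib` module (Python 3.9+) to confirm that the graph has a valid topological order, and prints the implied factorisation. The names `EDGES`, `parents`, and `factors` are illustrative.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges as (parent, child), mirroring the add_edge calls above
EDGES = [
    ("test_location", "ashwin_playing"),
    ("toss", "batting"),
    ("batting", "result"),
]

# Map each node to the set of its parents
parents = {}
for parent, child in EDGES:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# static_order() raises CycleError if the graph is not a DAG
order = list(TopologicalSorter(parents).static_order())

# One factor per node: P(node | parents), or P(node) for a root
factors = [
    f"P({node} | {', '.join(sorted(parents[node]))})" if parents[node]
    else f"P({node})"
    for node in order
]
print(" * ".join(factors))
```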

In [6]:
#Solution for part 1 
def get_probability(variable_series):
    """Return {value: relative frequency} for a pandas Series."""
    prob_dict = dict()
    counts = variable_series.value_counts()
    # Iterate in order of first appearance in the column
    for value in variable_series.unique():
        prob_dict[value] = counts[value] / len(variable_series)
    return prob_dict
prob_dict = get_probability(data['Result'])
print(prob_dict)

{'won': 0.5529411764705883, 'draw': 0.18823529411764706, 'lost': 0.25882352941176473}
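The estimate above is a relative frequency: the count of each value divided by the number of rows. The same idea can be sketched without pandas using `collections.Counter`. The input below is the Result column of only the ten rows shown in the data preview, so the numbers differ from the full-dataset output; `relative_frequencies` is an illustrative name.

```python
from collections import Counter


def relative_frequencies(values):
    """Estimate P(value) as count / total for a list of observations."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Result column of the ten previewed rows (not the full dataset)
results = ["won", "won", "draw", "lost", "lost",
           "lost", "lost", "won", "won", "won"]
probs = relative_frequencies(results)
print(probs)
```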


In [7]:
#Solution for part 2 
def get_conditional_probability(independent_series, target_series):
    """Return rows of [parent values..., target value, P(target | parents)]."""
    columns = list(independent_series.columns)
    combined = pd.concat([independent_series, target_series], axis=1)
    # P(parents): relative frequency of each combination of parent values
    independent_prob = combined.groupby(columns).size().div(len(combined))
    # P(parents, target): relative frequency of each joint combination
    columns = columns + [target_series.name]
    joint_prob = combined.groupby(columns).size().div(len(combined))
    # P(target | parents) = P(parents, target) / P(parents)
    conditional_prob = joint_prob.div(independent_prob, axis=0)
    return conditional_prob.reset_index().values.tolist()
conditional_probability = get_conditional_probability(data[['Location']], data['Ashwin'])
print(conditional_probability)

[['Away', 'N', 0.35714285714285715], ['Away', 'Y', 0.6428571428571428], ['Home', 'Y', 1.0]]
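The conditional table above divides each joint frequency by the frequency of the parent value. A pandas-free sketch of the same counting is shown below, again using only the ten previewed rows as (Location, Ashwin) pairs, so the values differ from the full-dataset output; `conditional_frequencies` is an illustrative name.

```python
from collections import Counter


def conditional_frequencies(pairs):
    """Estimate P(child | parent) from (parent, child) pairs by counting."""
    pair_counts = Counter(pairs)
    parent_counts = Counter(parent for parent, _ in pairs)
    return {
        (parent, child): count / parent_counts[parent]
        for (parent, child), count in pair_counts.items()
    }


# (Location, Ashwin) for the ten previewed rows only
pairs = [("Home", "Y"), ("Home", "Y"), ("Home", "Y"), ("Away", "Y"),
         ("Away", "Y"), ("Away", "N"), ("Away", "Y"), ("Home", "Y"),
         ("Home", "Y"), ("Home", "Y")]
cpt = conditional_frequencies(pairs)
print(cpt)
```

Combinations that never occur, such as Home with N here, simply do not appear in the result.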


In [8]:
#Solution for part 3 
test_location = DiscreteDistribution(get_probability(data['Location']))
toss = DiscreteDistribution(get_probability(data['Toss']))
batting = ConditionalProbabilityTable(get_conditional_probability(data[['Toss']], data['Bat']), [toss])
con_prob = get_conditional_probability(data[['Location']], data['Ashwin'])
print(con_prob)
# No row has Location == "Home" and Ashwin == "N"; ConditionalProbabilityTable
# raised an error without this combination, so it is added with probability 0.
con_prob.append(['Home', 'N', 0.0])
ashwin_playing = ConditionalProbabilityTable(con_prob, [test_location])
result = ConditionalProbabilityTable(get_conditional_probability(data[['Bat']], data['Result']), [batting])

variable_conditionality = {'test_location':test_location,'toss':toss,'batting':batting,'ashwin_playing':ashwin_playing,'result':result}
model = bayesian_network(variable_conditionality)

[['Away', 'N', 0.35714285714285715], ['Away', 'Y', 0.6428571428571428], ['Home', 'Y', 1.0]]
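The printed table has no row for Home with N because that combination never occurs in the data. Rather than appending such rows by hand, the missing combinations could be filled in generically. The helper below is a hypothetical sketch, not part of pomegranate; it assumes the caller supplies the full domain of each column.

```python
from itertools import product


def fill_missing_rows(rows, domains):
    """Add a 0.0 row for every value combination absent from `rows`.

    rows    -- list of [value_1, ..., value_k, probability]
    domains -- list of k lists giving each column's possible values
    """
    seen = {tuple(row[:-1]) for row in rows}
    completed = [list(row) for row in rows]
    for combo in product(*domains):
        if combo not in seen:
            completed.append([*combo, 0.0])
    return completed


rows = [["Away", "N", 0.35714285714285715],
        ["Away", "Y", 0.6428571428571428],
        ["Home", "Y", 1.0]]
full = fill_missing_rows(rows, [["Away", "Home"], ["N", "Y"]])
print(full[-1])
```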


In [9]:
#Solution for part 4
# Values are passed in the order of add_states:
# test_location, ashwin_playing, toss, result, batting.
# None is passed for the variables that are not observed.
# a)
probability_predicted = model.probability([None, 'Y', None, 'won', '2nd'])
print('Solution of 4.a India winning, batting 2nd, Ashwin playing : ' , probability_predicted)

# b) 
probability_predicted = model.probability([None, 'N', None, 'won', '2nd'])
print('\n\nSolution of 4.b India winning, batting 2nd, Ashwin not playing : ' , probability_predicted)

# c)
probability_predicted = model.probability([None, 'Y', None, 'lost', '2nd'])
print('\n\nSolution of 4.c India losing, batting 2nd, Ashwin playing : ' , probability_predicted)

# d) 
probability_predicted = model.probability([None, 'N', None, 'lost', '2nd'])
print('\n\nSolution of 4.d India losing, batting 2nd, Ashwin not playing : ' , probability_predicted)

Solution of 4.a India winning, batting 2nd, Ashwin playing :  0.41025641025641024


Solution of 4.b India winning, batting 2nd, Ashwin not playing :  0.41025641025641024


Solution of 4.c India losing, batting 2nd, Ashwin playing :  0.33333333333333337


Solution of 4.d India losing, batting 2nd, Ashwin not playing :  0.33333333333333337
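The outputs for (a) and (b) are identical, as are those for (c) and (d), so the value supplied for Ashwin did not change these numbers. If what is wanted is the joint probability of Ashwin, Result, and Bat with Location and Toss summed out, it can be computed by enumerating the network's factors. The sketch below is plain Python, not the pomegranate API. P(Ashwin | Location) is copied from the printed table above; every other probability is a hypothetical placeholder, so the printed values are illustrative only.

```python
# Factors of the network: P(L) * P(A|L) * P(T) * P(B|T) * P(R|B)
p_location = {"Home": 0.5, "Away": 0.5}      # placeholder values
p_toss = {"won": 0.5, "lost": 0.5}           # placeholder values
p_ashwin = {                                 # from the printed table
    ("Away", "N"): 0.35714285714285715,
    ("Away", "Y"): 0.6428571428571428,
    ("Home", "Y"): 1.0,
    ("Home", "N"): 0.0,
}
p_bat = {                                    # placeholder values
    ("won", "1st"): 0.8, ("won", "2nd"): 0.2,
    ("lost", "1st"): 0.3, ("lost", "2nd"): 0.7,
}
p_result = {                                 # placeholder values
    ("2nd", "won"): 0.4, ("2nd", "lost"): 0.35, ("2nd", "draw"): 0.25,
}


def joint_marginal(ashwin, result, bat):
    """P(Ashwin, Result, Bat) with Location and Toss summed out."""
    total = 0.0
    for loc, p_l in p_location.items():
        for toss, p_t in p_toss.items():
            total += (p_l * p_ashwin[(loc, ashwin)] * p_t
                      * p_bat[(toss, bat)] * p_result[(bat, result)])
    return total


print(joint_marginal("Y", "won", "2nd"))
print(joint_marginal("N", "won", "2nd"))
```

With this computation the two Ashwin values give different results, because the sum over Location weights P(Ashwin | Location) by P(Location).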


In [10]:
#Feel free to add cells where necessary. 

<h3><center> Happy Coding!</center></h3>