#### About
> Bayes Theorem

Bayes' theorem is a formula for computing conditional probabilities. It gives the probability of an event A given that another event B has occurred, expressed in terms of the reverse conditional probability P(B|A) and the individual probabilities of A and B.

The formula for Bayes' theorem is:

P(A|B) = P(B|A) * P(A) / P(B)

where P(A|B) is the posterior probability of A given that B has occurred, P(B|A) is the likelihood of B given that A has occurred, P(A) is the prior probability of A, and P(B) is the marginal probability of B (the evidence), which must be nonzero.
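
As a quick numeric illustration of the formula, here is a minimal sketch in plain Python. The helper name `bayes_posterior` and the three input probabilities are made up for illustration. P(B) is not passed in directly but expanded with the law of total probability, P(B) = P(B|A) P(A) + P(B|not A) P(not A).

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) from P(B|A), P(A) and P(B|not A)."""
    # Marginal probability of B via the law of total probability
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05
posterior = bayes_posterior(p_b_given_a=0.9, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 4))  # 0.1538
```

Even with a high likelihood P(B|A), the posterior stays small here because the prior P(A) is small.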

> Bayes' theorem on the iris dataset

In [4]:
# Import modules
import pandas as pd
import numpy as np

In [7]:
iris_df = pd.read_csv('/home/suraj/Downloads/Iris.csv')

In [8]:
iris_df

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa
...,...,...,...,...,...,...
145,146,6.7,3.0,5.2,2.3,Iris-virginica
146,147,6.3,2.5,5.0,1.9,Iris-virginica
147,148,6.5,3.0,5.2,2.0,Iris-virginica
148,149,6.2,3.4,5.4,2.3,Iris-virginica


In [9]:
# Recode the target as a binary label: 1 for Iris-setosa, 0 for any other species
iris_df['Is_setosa'] = (iris_df['Species'] == 'Iris-setosa').astype(int)


In [18]:
iris_df['Is_setosa']

0      1
1      1
2      1
3      1
4      1
      ..
145    0
146    0
147    0
148    0
149    0
Name: Is_setosa, Length: 150, dtype: int64

In [10]:
# Split the data into training (80%) and test (20%) sets
train_set = iris_df.sample(frac=0.8, random_state=1)
test_set = iris_df.drop(train_set.index)
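
A plain random sample does not guarantee that the class proportions in the training set match those of the full data. One possible alternative is to sample within each class. The sketch below uses pandas' `DataFrameGroupBy.sample` (available from pandas 1.1) on a small made-up frame. The frame `toy_df`, its `value` column and the names `toy_train` and `toy_test` are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data: 5 positive rows followed by 10 negative rows
toy_df = pd.DataFrame({
    "value": list(range(15)),
    "Is_setosa": [1] * 5 + [0] * 10,
})

# Draw 80% of the rows from each class separately, then keep the rest for testing
toy_train = toy_df.groupby("Is_setosa").sample(frac=0.8, random_state=1)
toy_test = toy_df.drop(toy_train.index)

print(len(toy_train), len(toy_test))
print(toy_train["Is_setosa"].mean(), toy_df["Is_setosa"].mean())
```

With this kind of split, the class prior estimated from the training rows equals the one in the full frame whenever the per-class sample sizes divide evenly, as they do in this toy case.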


In [11]:
# Calculate the prior probabilities of the target variable
p_setosa = train_set['Is_setosa'].mean()
p_not_setosa = 1 - p_setosa

In [12]:
print('Prior probabilities:')
print('P(setosa) =', p_setosa)
print('P(not setosa) =', p_not_setosa)

Prior probabilities:
P(setosa) = 0.3333333333333333
P(not setosa) = 0.6666666666666667
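
The mean of a 0/1 column is the fraction of ones, which is why it serves as the prior estimate. The same idea extends to every class at once with `value_counts(normalize=True)`. A minimal sketch on made-up labels (the series `species` below is hypothetical and not read from `Iris.csv`):

```python
import pandas as pd

# Hypothetical labels, chosen only to illustrate the calculation
species = pd.Series(
    ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    name="Species",
)

# Relative frequency of each class, i.e. the estimated prior P(class)
priors = species.value_counts(normalize=True)

# Equivalent binary computation for a single class
p_setosa_toy = (species == "Iris-setosa").mean()

print(priors)
print(p_setosa_toy)
```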


In [19]:
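# Define a function to calculate the conditional probabilities of a feature given the target variable.
# Note: the `== 1` tests assume a binary (0/1) feature. The iris measurements are continuous,
# so exact matches are rare and the estimates come out as 0.0. Both values are also computed
# on the same class subset: the second one is P(feature=0 | target), not P(feature=1 | not target).
def calculate_conditional_probabilities(feature, target_value):
    target_subset = train_set[train_set['Is_setosa'] == target_value]
    p_feature_given_target = (target_subset[feature] == 1).mean()
    p_feature_given_not_target = ((1 - target_subset[feature]) == 1).mean()
    return p_feature_given_target, p_feature_given_not_target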

# Calculate the conditional probabilities of each feature given the target variable
p_sepal_length_given_setosa, p_sepal_length_given_not_setosa = calculate_conditional_probabilities('SepalLengthCm', 1.0)
p_sepal_width_given_setosa, p_sepal_width_given_not_setosa = calculate_conditional_probabilities('SepalWidthCm', 1.0)
p_petal_length_given_setosa, p_petal_length_given_not_setosa = calculate_conditional_probabilities('PetalLengthCm', 1.0)
p_petal_width_given_setosa, p_petal_width_given_not_setosa = calculate_conditional_probabilities('PetalWidthCm', 1.0)

print('Conditional probabilities:')
print('P(SepalLengthCm=1|setosa) =', p_sepal_length_given_setosa)
print('P(SepalLengthCm=1|not setosa) =', p_sepal_length_given_not_setosa)
print('P(SepalWidthCm=1|setosa) =', p_sepal_width_given_setosa)
print('P(SepalWidthCm=1|not setosa) =', p_sepal_width_given_not_setosa)
print('P(PetalLengthCm=1|setosa) =', p_petal_length_given_setosa)
print('P(PetalLengthCm=1|not setosa) =', p_petal_length_given_not_setosa)
print('P(PetalWidthCm=1|setosa) =', p_petal_width_given_setosa)
print('P(PetalWidthCm=1|not setosa) =', p_petal_width_given_not_setosa)


Conditional probabilities:
P(SepalLengthCm=1|setosa) = 0.0
P(SepalLengthCm=1|not setosa) = 0.0
P(SepalWidthCm=1|setosa) = 0.0
P(SepalWidthCm=1|not setosa) = 0.0
P(PetalLengthCm=1|setosa) = 0.0
P(PetalLengthCm=1|not setosa) = 0.0
P(PetalWidthCm=1|setosa) = 0.0
P(PetalWidthCm=1|not setosa) = 0.0
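
Because the measurements are continuous, one way to obtain non-trivial estimates is to binarize a feature at a threshold and condition on each class separately. The sketch below does this on a small made-up table and then applies Bayes' theorem to get P(setosa | sepal length <= threshold). The values in `toy_train`, the `threshold` of 5.0 and the helper name `conditional_probabilities` are all assumptions for illustration. This is a swapped-in technique, not the method used in the cells above.

```python
import pandas as pd

# Hypothetical measurements, not taken from Iris.csv
toy_train = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 5.4, 7.0, 6.4, 4.9, 5.8],
    "Is_setosa":     [1,   1,   1,   1,   0,   0,   0,   0],
})


def conditional_probabilities(df, feature, threshold, target_col="Is_setosa"):
    """Return P(feature > threshold | target=1) and P(feature > threshold | target=0)."""
    is_large = df[feature] > threshold
    p_given_target = is_large[df[target_col] == 1].mean()
    p_given_not_target = is_large[df[target_col] == 0].mean()
    return p_given_target, p_given_not_target


threshold = 5.0  # assumed cut-off, chosen only for this example
p_large_given_setosa, p_large_given_not_setosa = conditional_probabilities(
    toy_train, "SepalLengthCm", threshold
)

# Likelihoods of the complementary event "sepal length <= threshold"
p_short_given_setosa = 1 - p_large_given_setosa
p_short_given_not_setosa = 1 - p_large_given_not_setosa

# Prior and evidence, then Bayes' theorem
p_setosa_toy = toy_train["Is_setosa"].mean()
evidence = (p_short_given_setosa * p_setosa_toy
            + p_short_given_not_setosa * (1 - p_setosa_toy))
p_setosa_given_short = p_short_given_setosa * p_setosa_toy / evidence

print("P(SepalLengthCm > 5.0 | setosa) =", p_large_given_setosa)
print("P(SepalLengthCm > 5.0 | not setosa) =", p_large_given_not_setosa)
print("P(setosa | SepalLengthCm <= 5.0) =", p_setosa_given_short)
```

The threshold is a modelling choice. Other options include a per-feature median from the training data, or skipping binarization and modelling each feature with a class-conditional Gaussian density.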
