# PART A: Selecting the right ML Model
© 2023 Zaka AI, Inc. All Rights Reserved.

---

Please report your choices for part A in the code cells below.

**NOTE:**
* For the algo_type, answer with 'supervised' or 'unsupervised'.
* For the task_type, answer with 'regression', 'classification', 'clustering', or 'association analysis'.
* For the suggested algorithms, pick one or more from the following list, written exactly AS THEY APPEAR: ['linear regression', 'multilinear regression', 'decision tree Classifier', 'svm', 'logistic regression', 'knn', 'random forest classifier', 'naive bayes', 'k-means clustering', 'apriori']

Even if you select only one algorithm, make sure to write it inside a list (as shown below).

Sample answer:

algo_type_x = 'supervised' <br>
task_type_x = 'classification' <br>
algos_x = ['naive bayes']


In [None]:
# For the Telecom Users dataset:
algo_type_1 = 'supervised'
task_type_1 = 'regression'
algos_1 = ['multilinear regression']

In [None]:
#For the Mobile Prices dataset:
algo_type_2 = 'supervised'
task_type_2 = 'classification'
algos_2 = ['random forest classifier','knn']

In [None]:
#For the Iris dataset:
algo_type_3 = 'supervised'
task_type_3 = 'classification'
algos_3 = ['naive bayes','svm']

In [None]:
#For the student performance dataset:
algo_type_4 = 'unsupervised'
task_type_4 = 'clustering'
algos_4 = ['k-means clustering']

In [None]:
#For the stroke dataset, the algorithm type is:
algo_type_5 = 'supervised'
task_type_5 = 'classification'
algos_5 = ['logistic regression','decision tree Classifier']


# PART B: Gaussian Naive Bayes Classifier

---



## **Case Study:** Iris Dataset

**Objective:** This challenge introduces Naive Bayes applied to numerical values.

**Dataset Columns:**<br>
*  Petal Height
*  Petal Width
*  Sepal Height
*  Sepal Width
*  Target: The kind of the Iris flower (Virginica, Setosa, Versicolor)

# Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import math

In [None]:
import warnings
warnings.filterwarnings('ignore')

# Loading the Dataset

Load the dataset, and make sure your columns are named as follows: 'Sepal Height', 'Sepal Width', 'Petal Height', 'Petal Width', and 'Target'.

In [None]:
df = pd.read_csv('/content/iris.csv')
df.columns = ['Sepal Height', 'Sepal Width', 'Petal Height', 'Petal Width','Target']
df.head()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Sepal Height  150 non-null    float64
 1   Sepal Width   150 non-null    float64
 2   Petal Height  150 non-null    float64
 3   Petal Width   150 non-null    float64
 4   Target        150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


## Data Preprocessing

We will change the string values of the Target column to numerical ones.

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit(df['Target'])
df['Target'] = le.transform(df['Target'])
df.head()

|   | Sepal Height | Sepal Width | Petal Height | Petal Width | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
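
As a rough illustration of what the encoding step does: `LabelEncoder` sorts the distinct labels and numbers them starting from 0. The sketch below mimics that behaviour in plain Python. The label strings are assumed for illustration and may not match the exact spelling in your CSV.

```python
# Hypothetical label strings; the actual spelling in iris.csv may differ.
labels = ["Setosa", "Virginica", "Versicolor", "Setosa", "Versicolor"]

# Sort the distinct labels, then number them 0..n-1 (alphabetical order).
classes = sorted(set(labels))
to_code = {name: code for code, name in enumerate(classes)}

# Encode every row, then decode again to confirm the mapping is reversible.
encoded = [to_code[name] for name in labels]
decoded = [classes[code] for code in encoded]

print(classes)  # distinct labels in sorted order
print(encoded)  # integer code for each row
```

With the fitted encoder from the cell above, `le.classes_` lists the labels in code order and `le.inverse_transform` maps codes back to the original strings.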


Make sure we have no null values.

In [None]:
df.isnull().sum()

Sepal Height    0
Sepal Width     0
Petal Height    0
Petal Width     0
Target          0
dtype: int64

# Naive Bayes

## Finding different Classes

First, let's find how many classes we have in our dataset, although this should also be stated in the description of your dataset.

In [None]:
targets = len(df['Target'].unique())
print(targets)

3


So we have 3 classes of flowers.
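
The class counts also give the prior probabilities P(Class) that Naive Bayes needs later. Here is a minimal sketch using only the standard library. The 150 rows match the `df.info()` output above, while the even 50/50/50 split is an assumption based on the classic Iris data.

```python
from collections import Counter

# Assumed encoded Target column: 50 rows per class (classic Iris layout).
target = [0] * 50 + [1] * 50 + [2] * 50

# Count the rows of each class, then divide by the total to get the priors.
counts = Counter(target)
n_classes = len(counts)
priors = {cls: n / len(target) for cls, n in counts.items()}

print(n_classes)
print(priors)
```

On the real dataframe, `df['Target'].value_counts(normalize=True)` should give the same kind of result.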

Remember the basic formula that we used for Naive Bayes. <br>
<img src="https://equatio-api.texthelp.com/svg/%5C%20P(%5Ctextcolor%7B%232B7FBB%7D%7BClass%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BFeatures%7D)%3D%5Cfrac%7BP(%5Ctextcolor%7B%23E94D40%7D%7BFeatures%7D%7C%5Ctextcolor%7B%232B7FBB%7D%7BClass%7D)%5Ccdot%20P%5Cleft(%5Ctextcolor%7B%232B7FBB%7D%7BClass%7D%5Cright)%7D%7BP(%5Ctextcolor%7B%23E94D40%7D%7BFeatures%7D)%7D" alt="P of open paren C l a. s s divides F of e a. t u r e s close paren equals the fraction with numerator P of open paren F of e a. t u r e s divides C l a. s s close paren times P of open paren C l a. s s close paren and denominator P of F of e a. t u r e s">

Since we have 3 classes and 4 features, we need to calculate the following probabilities:<br>
<img src="https://equatio-api.texthelp.com/svg/P(%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BF1%2CF2%2CF3%2CF4%7D)" alt="P of open paren C l a. s s sub 0 divides F of 1 comma F of 2 comma F of 3 comma F of 4 close paren"> <br>
<img src="https://equatio-api.texthelp.com/svg/P(%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BF1%2CF2%2CF3%2CF4%7D)" alt="P of open paren C l a. s s sub 1 divides F of 1 comma F of 2 comma F of 3 comma F of 4 close paren"> <br>
<img src="https://equatio-api.texthelp.com/svg/P(%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BF1%2CF2%2CF3%2CF4%7D)" alt="P of open paren C l a. s s sub 2 divides F of 1 comma F of 2 comma F of 3 comma F of 4 close paren">


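The denominator P(Features) is the same for every class, so it does not affect which class wins, and Naive Bayes assumes the features are independent given the class. So in practice we only need to calculate the following: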

$$P_0 = P(F_1 \mid \text{Class}_0)\,P(F_2 \mid \text{Class}_0)\,P(F_3 \mid \text{Class}_0)\,P(F_4 \mid \text{Class}_0)\,P(\text{Class}_0)$$

$$P_1 = P(F_1 \mid \text{Class}_1)\,P(F_2 \mid \text{Class}_1)\,P(F_3 \mid \text{Class}_1)\,P(F_4 \mid \text{Class}_1)\,P(\text{Class}_1)$$

$$P_2 = P(F_1 \mid \text{Class}_2)\,P(F_2 \mid \text{Class}_2)\,P(F_3 \mid \text{Class}_2)\,P(F_4 \mid \text{Class}_2)\,P(\text{Class}_2)$$



We then compare these values and assign the class whose value is the greatest.
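
To make the decision rule concrete, here is a small worked sketch. All numbers are made up for illustration: each class gets a prior and four per-feature likelihood values, the products are compared, and the largest one wins. Dividing by the sum of the products recovers proper posterior probabilities, but that step does not change which class is chosen.

```python
# Made-up priors and per-feature likelihoods for one sample (illustration only).
priors = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
likelihoods = {
    0: [2.0, 1.0, 2.5, 3.0],
    1: [0.5, 0.8, 0.01, 0.02],
    2: [0.2, 0.6, 0.001, 0.001],
}

def unnormalized_posterior(cls):
    # P(F1|C) * P(F2|C) * P(F3|C) * P(F4|C) * P(C)
    score = priors[cls]
    for value in likelihoods[cls]:
        score *= value
    return score

scores = {cls: unnormalized_posterior(cls) for cls in priors}

# Optional: normalize so the three values sum to 1.
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}

# The predicted class is the one with the largest score.
predicted = max(scores, key=scores.get)
print(predicted)
```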

The likelihoods P(F_i | Class) will be approximated using a distribution.
In this example, we will use the Gaussian distribution.

## Gaussian Probability Density Function

Recall that the Gaussian probability density function is given by:
<br>
<img src="https://equatio-api.texthelp.com/svg/f%5Cleft(x%5Cright)%3D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%7D%5Ctextcolor%7B%238D44AD%7D%7B%5Csigma%7D%7D%5Cexp%5Cleft%5C%7B-%5Cfrac%7B%5Cleft(x-%5Ctextcolor%7B%233697DC%7D%7Bmean%7D%5Cright)%5E2%7D%7B2%5Ctextcolor%7B%238D44AD%7D%7B%5Csigma%7D%5E2%7D%5Cright%5C%7D" alt="f of x equals 1 over the square root of 2 pi sigma the exp of open brace negative the fraction with numerator open paren x minus m e a. n close paren squared and denominator 2 sigma squared close brace">

In [None]:
def probability (x, mean, sigma):
  prob = (1 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-0.5 * ((x - mean) / sigma) ** 2)
  return prob
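
One way to sanity-check the function above is to compare it against the standard library's `statistics.NormalDist`. The sketch below redefines the function so that it runs on its own. The values of `x`, `mean`, and `sigma` are illustrative, not taken from the dataset.

```python
import math
from statistics import NormalDist

def probability(x, mean, sigma):
    # Same Gaussian density formula as in the cell above.
    return (1 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-0.5 * ((x - mean) / sigma) ** 2)

# Illustrative values only.
x, mean, sigma = 1.4, 1.46, 0.17

ours = probability(x, mean, sigma)
reference = NormalDist(mu=mean, sigma=sigma).pdf(x)

# The density at the mean; it can exceed 1 when sigma is small.
peak = probability(mean, mean, sigma)

print(ours, reference, peak)
```

Note that the returned value is a density rather than a true probability, which is why it may be larger than 1. This is fine for Naive Bayes, because only the relative size of the products across classes matters.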

## Naive Bayes Implementation

Let's implement the Naive Bayes function and then test it with a prediction.

In [None]:
def naive_bayes (df, features, target_name):
  # A function that receives a dataframe, a feature vector, and a target name, and returns the most likely class for the target_name given the feature vector and the dataframe.
  # df: the dataframe from which we derive the probabilities.
  # features: a 1-dimensional numpy array that contains the features of the one sample we want to test.
  # target_name: the column name inside the dataframe df that represents the target we want to predict.
  # Refer to the code cell below to see how this function will be used, and code it accordingly.
  n_examples = len(df)
  n_features = features.shape[0]

  target_0 = df[df[target_name] == 0]
  target_1 = df[df[target_name] == 1]
  target_2 = df[df[target_name] == 2]

  p_target_0 = len(target_0)/float(n_examples)
  p_target_1 = len(target_1)/float(n_examples)
  p_target_2 = len(target_2)/float(n_examples)

  p_feature_given_target_0 = []
  p_feature_given_target_1 = []
  p_feature_given_target_2 = []

  for i in range (n_features):
    p_f_given_t_0 = probability(features[i],np.mean(target_0.iloc[:,i]),np.std(target_0.iloc[:,i]))
    p_feature_given_target_0.append(p_f_given_t_0)

    p_f_given_t_1 = probability(features[i],np.mean(target_1.iloc[:,i]),np.std(target_1.iloc[:,i]))
    p_feature_given_target_1.append(p_f_given_t_1)

    p_f_given_t_2 = probability(features[i],np.mean(target_2.iloc[:,i]),np.std(target_2.iloc[:,i]))
    p_feature_given_target_2.append(p_f_given_t_2)

  p0 = np.prod(p_feature_given_target_0) * p_target_0
  p1 = np.prod(p_feature_given_target_1) * p_target_1
  p2 = np.prod(p_feature_given_target_2) * p_target_2


  return np.argmax([p0, p1, p2])


In [None]:
features = np.array([4.9, 3.0, 1.4, 0.2])
result = naive_bayes(df, features, 'Target')
print("This Iris flower is in the class ",result)

This Iris flower is in the class  0
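
The function above hard-codes three classes and multiplies the densities directly. As a possible variation (not the approach required by this assignment), the sketch below handles any number of classes and sums log-densities instead, which avoids very small products when there are many features. It uses plain Python lists and a tiny made-up dataset, and the names `log_gaussian` and `naive_bayes_log` are hypothetical.

```python
import math
from collections import defaultdict

def log_gaussian(x, mean, sigma):
    # Logarithm of the Gaussian density.
    return -math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * ((x - mean) / sigma) ** 2

def naive_bayes_log(rows, labels, sample):
    # Group the training rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    best_label, best_score = None, -math.inf
    for label, members in by_class.items():
        score = math.log(len(members) / len(rows))  # log prior
        for i, x in enumerate(sample):
            column = [m[i] for m in members]
            mean = sum(column) / len(column)
            # Population standard deviation (divide by n), like np.std's default.
            sigma = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
            score += log_gaussian(x, mean, sigma)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set: two features, two classes.
rows = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 0.5), (3.2, 0.7), (2.8, 0.3)]
labels = [0, 0, 0, 1, 1, 1]

print(naive_bayes_log(rows, labels, (1.1, 2.1)))
print(naive_bayes_log(rows, labels, (3.1, 0.4)))
```

This sketch assumes every feature has a non-zero standard deviation within each class; a constant column would cause a division by zero.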


See the performance of our NB model

Now we will split our data into 2 sets:

*   One from which the Naive Bayes model will derive the probabilities (the **train** set).
*   One that it hasn't seen before, to test on (the **test** set).

Split your dataset 80/20 with a random_state of 0.

In [None]:
from sklearn.model_selection import train_test_split
x = df.drop('Target',axis=1)
y = df['Target']
x_train, x_test, y_train, y_test = train_test_split (x , y ,test_size = 0.2 , random_state = 0)
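
To illustrate what an 80/20 split with a fixed seed means, here is a minimal standard-library sketch. It is not scikit-learn's exact shuffling procedure, so the selected rows will differ from those of `train_test_split`, and the helper name `split_indices` is hypothetical.

```python
import random

def split_indices(n_rows, test_fraction, seed):
    # Shuffle the row indices reproducibly, then cut off the test portion.
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(150, 0.2, seed=0)
print(len(train_idx), len(test_idx))

# The same seed always yields the same split.
again_train, again_test = split_indices(150, 0.2, seed=0)
print(test_idx == again_test)
```

With 150 rows, the split leaves 120 rows for training and 30 for testing, and fixing the seed makes the result repeatable between runs.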

Compute the accuracy of Naive Bayes on the test set.

In [None]:
# Concatenate x_train and y_train into one dataframe, because the naive_bayes
# function you coded expects the features and the target together.
train_dataFrame = pd.concat([x_train, y_train], axis=1)

def accuracy(df, x_test, y_test):
    # Predict every test row from the training dataframe, then compare with the true labels.
    y_pred = [naive_bayes(df, features, 'Target') for features in x_test.values]
    accuracy = sum(y_pred == y_test) / len(y_test)
    return accuracy * 100

# Store the result under a different name so the accuracy function is not overwritten.
acc = accuracy(train_dataFrame, x_test, y_test)
print("Accuracy of our Naive Bayes is:", acc, "%")

Accuracy of our Naive Bayes is: 96.66666666666667 %
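
A single accuracy figure hides which classes get confused. With 30 test samples, 96.67% corresponds to 29 correct predictions and one mistake. The sketch below shows how accuracy and a simple confusion count can be computed. The labels in it are made up for illustration and are not the notebook's actual predictions.

```python
from collections import Counter

# Hypothetical true and predicted labels (not the notebook's real output).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 1, 2, 2, 2, 2, 1]

# Accuracy: share of positions where prediction and truth agree.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy_pct = 100 * correct / len(y_true)

# Confusion counts keyed by (true label, predicted label).
confusion = Counter(zip(y_true, y_pred))

print(accuracy_pct)
print(confusion[(1, 2)])  # times class 1 was predicted as class 2
```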


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. **Download your notebook as an .ipynb file and put it in a zip folder along with the PART A explanation, then upload that folder to the platform.**