## Iris Dataset
* The Iris dataset is one of the most well-known and commonly used datasets in the field of machine learning and statistics.
* In this article, we will explore the Iris dataset in depth and learn about its uses and applications.
* The Iris dataset consists of 150 samples of iris flowers from three different species: Setosa, Versicolor, and Virginica.
* Each sample includes four features: sepal length, sepal width, petal length, and petal width.
* It was introduced by the British biologist and statistician Ronald Fisher in 1936 as an example of discriminant analysis.
* The Iris dataset is often used as a beginner's dataset to understand classification and clustering algorithms in machine learning.
* Using the four measurements, researchers and data scientists can classify each sample into one of the three species.
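A quick way to confirm the numbers above (150 samples, four features, three species) is to inspect the object returned by `load_iris()` directly. This is a minimal sketch using scikit-learn and NumPy; it also shows that each species contributes 50 samples:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples x 4 features
print(iris.data.shape)            # → (150, 4)
# Feature names: sepal/petal length and width, in centimetres
print(iris.feature_names)
# Species names, indexed by the integer labels 0, 1, 2
print(iris.target_names)
# Number of samples per species
print(np.bincount(iris.target))   # → [50 50 50]
```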

In [1]:
from sklearn import datasets

# List all dataset functions in sklearn.datasets
dataset_list = [func for func in dir(datasets) if func.startswith(('load_', 'fetch_'))]

# Print each dataset function on a new line
for dataset in dataset_list:
    print(dataset)


fetch_20newsgroups
fetch_20newsgroups_vectorized
fetch_california_housing
fetch_covtype
fetch_kddcup99
fetch_lfw_pairs
fetch_lfw_people
fetch_mldata
fetch_olivetti_faces
fetch_openml
fetch_rcv1
fetch_species_distributions
load_boston
load_breast_cancer
load_diabetes
load_digits
load_files
load_iris
load_linnerud
load_sample_image
load_sample_images
load_svmlight_file
load_svmlight_files
load_wine


## Load the Iris data

In [2]:
from sklearn.datasets import load_iris
import pandas as pd


iris = load_iris()       # Load the Iris dataset
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df

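|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|-----|-------------------|------------------|-------------------|------------------|
| 0   | 5.1 | 3.5 | 1.4 | 0.2 |
| 1   | 4.9 | 3.0 | 1.4 | 0.2 |
| 2   | 4.7 | 3.2 | 1.3 | 0.2 |
| 3   | 4.6 | 3.1 | 1.5 | 0.2 |
| 4   | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |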
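The DataFrame above holds only the four features. As a sketch, attaching the species name to each row makes it easy to compare per-species averages; the `species` column name is our own choice and is not part of the dataset:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Map the integer targets 0/1/2 to their species names
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Mean of each feature per species (observed=True keeps only categories that occur)
species_means = df.groupby("species", observed=True).mean()
print(species_means)
```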


In [3]:
from sklearn.datasets import load_iris
import pandas as pd


iris = load_iris()       # Load the Iris dataset
X = iris.data            # Independent Variables  (Features)
y = iris.target          # Dependent Variable     (Target Labels)
print(X)
print("  ")
print(y)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

In [4]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples (30 of 150) for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print("Dataset split into training and test sets")

Dataset split into training and test sets
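The split above is purely random, so the three species need not appear in equal numbers in the test set (judging from the GaussianNB predictions below, which score 1.00, this test set holds 10, 9 and 11 samples of classes 0, 1 and 2). A possible refinement, not used in the rest of this notebook, is to pass `stratify=y` so that each species keeps its one-third share in both splits:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions of y in both the training and test splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train_s.shape, X_test_s.shape)  # → (120, 4) (30, 4)
print(np.bincount(y_test_s))            # → [10 10 10]
```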


## Import Naive Bayes classifiers (GaussianNB, BernoulliNB, MultinomialNB)

In [5]:
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB
from sklearn.metrics import accuracy_score

## Train each model and predict the test set

In [6]:
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_gnb = gnb.predict(X_test)
print(y_pred_gnb)

[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]


In [7]:
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred_bnb = bnb.predict(X_test)
print(y_pred_bnb)

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]


In [8]:
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred_mnb = mnb.predict(X_test)
print(y_pred_mnb)

[1 0 2 1 1 0 1 2 1 1 1 0 0 0 0 1 2 1 1 2 0 1 0 2 1 2 2 2 0 0]


## Accuracy check

In [9]:
accuracy_gnb = accuracy_score(y_test, y_pred_gnb)
print(f"GaussianNB Accuracy: {accuracy_gnb:.2f}")

GaussianNB Accuracy: 1.00


In [10]:
accuracy_bnb = accuracy_score(y_test, y_pred_bnb)
print(f"BernoulliNB Accuracy: {accuracy_bnb:.2f}")

BernoulliNB Accuracy: 0.30
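The 0.30 score has a simple cause: `BernoulliNB` binarizes its input with the default threshold `binarize=0.0`, and every Iris measurement is positive, so every sample becomes `[1, 1, 1, 1]`. The model then predicts one class for all 30 test samples (class 1, the most frequent class in this training split), and 0.30 is just the share of class 1 in the test set (9 of 30). One possible workaround, sketched below, thresholds each feature at its training-set median (our own choice of threshold, not a scikit-learn default) and passes `binarize=None` because the input is then already binary:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Default: binarize=0.0 turns every positive measurement into 1
bnb_default = BernoulliNB().fit(X_train, y_train)
pred_default = bnb_default.predict(X_test)

# Workaround (assumption): 1 if a feature exceeds its training-set median, else 0
medians = np.median(X_train, axis=0)
X_train_bin = (X_train > medians).astype(int)
X_test_bin = (X_test > medians).astype(int)

# binarize=None tells BernoulliNB that the input is already binary
bnb_median = BernoulliNB(binarize=None).fit(X_train_bin, y_train)
pred_median = bnb_median.predict(X_test_bin)

print("Default threshold:", accuracy_score(y_test, pred_default))
print("Median threshold: ", accuracy_score(y_test, pred_median))
```

Computing the medians on the training split only keeps test-set information out of the preprocessing.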


In [11]:
accuracy_mnb = accuracy_score(y_test, y_pred_mnb)
print(f"MultinomialNB Accuracy: {accuracy_mnb:.2f}")

MultinomialNB Accuracy: 0.90
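Accuracy alone does not show which species get confused with each other. A confusion matrix (rows are true classes, columns are predicted classes) breaks the MultinomialNB score down; this sketch recomputes the same split and model so it runs on its own:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

mnb = MultinomialNB().fit(X_train, y_train)
y_pred_mnb = mnb.predict(X_test)

# Rows: true species 0/1/2, columns: predicted species 0/1/2
cm = confusion_matrix(y_test, y_pred_mnb)
print(cm)

# Correct predictions lie on the diagonal
print("Accuracy from the matrix:", cm.trace() / cm.sum())
```

In the predictions printed earlier, all three MultinomialNB mistakes are class 2 (virginica) samples predicted as class 1 (versicolor).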


### Understanding the Theory and Formulas
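### 1. Bayes' Theorem
\[
P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}
\]
Where:
- \( P(A|B) \) is the posterior probability of class \( A \) given evidence \( B \).
- \( P(B|A) \) is the likelihood of evidence \( B \) given class \( A \).
- \( P(A) \) is the prior probability of class \( A \).
- \( P(B) \) is the prior probability of the evidence.

Naïve Bayes adds the "naïve" assumption that the features \( x_1, \dots, x_n \) are conditionally independent given the class \( C \), so \( P(x_1, \dots, x_n|C) = \prod_{i=1}^{n} P(x_i|C) \).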
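As a worked example on the Iris data, take \( A \) = "the flower is setosa" and \( B \) = "petal length is below 2.5 cm" (the 2.5 cm threshold is an illustrative choice). Plugging the estimated probabilities into Bayes' theorem gives the same posterior as counting directly:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]   # column 2 = petal length (cm)
is_setosa = iris.target == 0     # event A
is_short = petal_length < 2.5    # event B (illustrative threshold)

p_a = is_setosa.mean()                     # prior P(A)
p_b = is_short.mean()                      # evidence P(B)
p_b_given_a = is_short[is_setosa].mean()   # likelihood P(B|A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = p_b_given_a * p_a / p_b

# Direct count: fraction of short-petal flowers that are setosa
direct = is_setosa[is_short].mean()

print(posterior, direct)
```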

---

### 2. Gaussian Naïve Bayes (GaussianNB)
- Assumes features are normally distributed within each class.
- Uses the Gaussian (normal) distribution formula:
\[
P(x|C) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
\]
Where:
- \( \mu \) = mean of the feature in class \( C \).
- \( \sigma^2 \) = variance of the feature in class \( C \).
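The density can be written directly in NumPy. This sketch evaluates it for petal length in the setosa class, using that class's sample mean and variance, and cross-checks the result against `scipy.stats.norm.pdf` (SciPy is assumed to be installed; it is a scikit-learn dependency). The measurement `x = 1.5` is hypothetical:

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris

def gaussian_likelihood(x, mu, var):
    """P(x|C) for a normal distribution with mean mu and variance var."""
    return np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

iris = load_iris()
setosa_petal_length = iris.data[iris.target == 0, 2]

mu = setosa_petal_length.mean()
var = setosa_petal_length.var()   # population variance (ddof=0)

x = 1.5  # hypothetical new measurement, in cm
p_manual = gaussian_likelihood(x, mu, var)
p_scipy = norm.pdf(x, loc=mu, scale=np.sqrt(var))   # scale is the standard deviation
print(p_manual, p_scipy)
```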

---

### 3. Bernoulli Naïve Bayes (BernoulliNB)
- Used for binary (0/1) features.
- Probability formula:
\[
P(x|C) = p^{x} (1 - p)^{(1 - x)}
\]

Where:
- \( x \in \{0, 1\} \) is the feature value.
- \( p \) = probability of the feature being 1 in class \( C \).
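Because \( x \) is either 0 or 1, the formula simply selects \( p \) when the feature is present and \( 1 - p \) when it is absent. A minimal sketch, with a hypothetical \( p = 0.75 \):

```python
def bernoulli_likelihood(x, p):
    """P(x|C) = p**x * (1 - p)**(1 - x) for a binary feature x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.75  # hypothetical probability that the feature is 1 in class C
print(bernoulli_likelihood(1, p))  # → 0.75
print(bernoulli_likelihood(0, p))  # → 0.25
```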

---

### 4. Multinomial Naïve Bayes (MultinomialNB)
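- Used for text classification with word counts/frequencies.
- Formula:
\[
P(x|C) = \frac{\left(\sum_{i=1}^{n} x_i\right)!}{x_1! \cdot x_2! \cdots x_n!} \times \prod_{i=1}^{n} P(w_i|C)^{x_i}
\]
Where:
- \( x_i \) is the count of word \( w_i \) in the document.
- \( P(w_i|C) \) is the probability of word \( w_i \) in class \( C \).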
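A sketch of the formula for a hypothetical three-word vocabulary, with made-up word probabilities and counts, checked against `scipy.stats.multinomial.pmf`:

```python
from math import factorial, prod

from scipy.stats import multinomial

def multinomial_likelihood(counts, word_probs):
    """P(x|C) = (sum x_i)! / (x_1! ... x_n!) * prod P(w_i|C)**x_i."""
    coeff = factorial(sum(counts)) / prod(factorial(c) for c in counts)
    return coeff * prod(p ** c for p, c in zip(word_probs, counts))

counts = [2, 1, 0]             # hypothetical word counts x_i in one document
word_probs = [0.5, 0.3, 0.2]   # hypothetical P(w_i|C); must sum to 1

p_manual = multinomial_likelihood(counts, word_probs)
p_scipy = multinomial.pmf(counts, n=sum(counts), p=word_probs)
print(p_manual, p_scipy)
```

The leading coefficient does not depend on the class \( C \), so it never changes which class has the highest posterior; scikit-learn's `MultinomialNB` works with log probabilities and leaves it out.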

---

### 5. Manual Calculation (By Hand)
Steps to implement by hand:
1. Compute priors: \( P(\text{class}) = \frac{\text{samples in class}}{\text{total samples}} \)
2. Compute the mean and variance of each feature per class (for GaussianNB).
3. Apply Bayes' formula to get the posterior probabilities.
4. Select the class with the maximum posterior probability.
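These four steps can be turned into a from-scratch Gaussian Naive Bayes classifier. This is a minimal sketch, not scikit-learn's implementation: it works in log space to avoid numerical underflow and adds a small variance floor (mirroring `GaussianNB`'s default `var_smoothing=1e-9`, scaled by the largest feature variance) so that its predictions can be compared with `GaussianNB` on the same split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classes = np.unique(y_train)
eps = 1e-9 * X_train.var(axis=0).max()   # small variance floor, as in GaussianNB's var_smoothing

# Step 1: priors P(class) = samples in class / total samples
priors = np.array([np.mean(y_train == c) for c in classes])

# Step 2: per-class mean and variance of every feature
means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
variances = np.array([X_train[y_train == c].var(axis=0) for c in classes]) + eps

# Step 3: log posterior (up to a constant) = log prior + sum of per-feature Gaussian log-likelihoods
def log_posteriors(x):
    log_lik = -0.5 * np.log(2 * np.pi * variances) - (x - means) ** 2 / (2 * variances)
    return np.log(priors) + log_lik.sum(axis=1)

# Step 4: pick the class with the largest posterior
y_pred_manual = np.array([classes[np.argmax(log_posteriors(x))] for x in X_test])

y_pred_sklearn = GaussianNB().fit(X_train, y_train).predict(X_test)
print("Manual predictions match GaussianNB:", np.array_equal(y_pred_manual, y_pred_sklearn))
```

Working with log probabilities turns the product over features into a sum, which keeps very small likelihoods from rounding to zero; it does not change which class wins, because the logarithm is increasing.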