# Naive Bayes Algorithm in Machine Learning

## Introduction to Naive Bayes
Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It is widely used for classification tasks due to its simplicity and effectiveness. Despite its "naive" assumption that features are independent of each other given the class, it often performs well in practice.

### Bayes' Theorem
Bayes' Theorem provides a way to calculate the probability of a hypothesis (H) given evidence (E):

\[
P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}
\]

Where:
- **P(H|E):** Posterior probability (the probability of hypothesis H given evidence E).
- **P(E|H):** Likelihood (the probability of evidence E given hypothesis H).
- **P(H):** Prior probability of hypothesis H.
- **P(E):** Marginal probability of evidence E.
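
As a quick numeric illustration of the formula, the sketch below computes a posterior for a two-way hypothesis (H or not H). The function name `posterior` and the spam figures are invented for illustration and do not come from any dataset.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) for a binary hypothesis using Bayes' Theorem."""
    # P(E) from the law of total probability over H and not-H.
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / evidence


# Hypothetical numbers: 20% of emails are spam; the word "offer" appears
# in 60% of spam emails and in 5% of non-spam emails.
p_spam_given_offer = posterior(prior_h=0.2, p_e_given_h=0.6, p_e_given_not_h=0.05)
print(round(p_spam_given_offer, 3))  # 0.75
```

Seeing the word raises the probability of spam from the prior of 0.20 to a posterior of 0.75.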

## Naive Assumption
The "naive" aspect of Naive Bayes is the assumption that all features are conditionally independent of each other given the class label. This rarely holds exactly in real-world data, but it reduces the joint likelihood to a product of per-feature likelihoods, and the resulting classifier still works surprisingly well in many cases.
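
Written with the notation of Bayes' Theorem above, and taking the evidence E to be n feature values \(x_1, \dots, x_n\) (symbols introduced here for illustration), the assumption gives:

\[
P(H \mid x_1, \dots, x_n) \propto P(H) \prod_{i=1}^{n} P(x_i \mid H)
\]

The denominator P(E) is omitted because it does not depend on H, so it does not change which hypothesis scores highest.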

## Types of Naive Bayes Classifiers
1. **Gaussian Naive Bayes:** Assumes that each continuous feature follows a Gaussian (normal) distribution within each class.
2. **Multinomial Naive Bayes:** Suitable for discrete count data, such as word counts in text classification.
3. **Bernoulli Naive Bayes:** Used when every feature is binary (0 or 1), such as whether a word is present or absent.
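
The three variants map onto three scikit-learn classes. The sketch below fits each one on the kind of feature matrix it is designed for; the toy arrays are hypothetical and chosen only so the two classes are easy to separate.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: four samples, two classes.
y = np.array([0, 0, 1, 1])

X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])  # real-valued measurements
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])          # e.g. word counts
X_binary = (X_counts > 0).astype(int)                                      # word present or absent

# Pair each variant with the feature type it models.
variants = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in variants.items():
    # Fit on the toy data and predict the same rows back.
    predictions[name] = model.fit(X, y).predict(X)
    print(name, predictions[name])
```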

## Naive Bayes Algorithm
1. **Calculate Prior Probabilities:** Compute the probability of each class in the dataset.
2. **Calculate Likelihood:** For each feature, calculate the conditional probability of the feature value given the class.
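3. **Apply Bayes' Theorem:** Multiply the prior by the likelihoods of all features to get a score proportional to the posterior probability of each class. The evidence term P(E) is the same for every class, so it can be ignored when comparing classes.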
4. **Predict the Class:** Assign the class label with the highest posterior probability.
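
The four steps above can be sketched from scratch for the Gaussian case. This is a minimal illustration, not how scikit-learn implements it: the function names, the toy data, and the small variance floor are all assumptions of the sketch. Scores are computed in log space so that multiplying many small probabilities does not underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Steps 1-2: estimate class priors and per-feature Gaussian parameters."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The 1e-9 floor keeps every variance strictly positive (sketch assumption).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = {"prior": n / len(y), "means": means, "variances": variances}
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Steps 3-4: log prior plus summed log likelihoods, then pick the best class."""
    scores = {
        label: math.log(p["prior"]) + sum(
            log_gaussian(x, m, v)
            for x, m, v in zip(row, p["means"], p["variances"]))
        for label, p in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: two features, two well-separated classes.
X = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [5.0, 8.0], [5.5, 7.5], [4.8, 8.2]]
y = ["absent", "absent", "absent", "present", "present", "present"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.1]))  # absent
print(predict(model, [5.2, 7.9]))  # present
```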

## Steps in Naive Bayes Classification
1. Prepare the dataset and preprocess the features.
2. Calculate the prior probabilities for each class.
3. Compute the likelihood of each feature value given each class.
4. Use Bayes' Theorem to calculate the posterior probabilities.
5. Make predictions based on the class with the highest posterior probability.

## Use Cases of Naive Bayes
1. **Text Classification:** Spam detection, sentiment analysis, and news categorization.
2. **Medical Diagnosis:** Predicting diseases based on symptoms.
3. **Document Categorization:** Automatically tagging or grouping documents.
4. **Recommendation Systems:** Suggesting products or services.
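
For the text classification use case, a common pattern is to turn each document into word counts and feed them to Multinomial Naive Bayes. The sketch below does this with scikit-learn; the four-message corpus and its labels are invented for illustration and are far too small for a real spam filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training messages and labels.
messages = [
    "win a free prize now",
    "free offer click now",
    "meeting agenda for monday",
    "lunch with the project team",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer builds the word-count matrix; MultinomialNB models the counts.
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

new_messages = ["free prize offer", "monday team meeting"]
predicted = spam_filter.predict(new_messages)
print(list(predicted))  # ['spam', 'ham']
```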

## Conclusion
Naive Bayes is a powerful yet simple algorithm, especially effective for text classification and problems with high-dimensional data. Understanding its assumptions and limitations helps in leveraging it effectively in real-world scenarios.

![Screenshot (8124).png](attachment:56bcfd58-a410-47bc-aa27-8e0058b509c3.png)

![Screenshot (8126).png](attachment:d07811c3-6b4b-4514-aec1-c869b8439c75.png)

![Screenshot (8127).png](attachment:172205c1-5ffb-4019-9014-df07a5d04686.png)

![Screenshot (8128).png](attachment:52e6dd85-cb7c-4a16-9c74-2a29831729a4.png)

In [5]:
import pandas as pd

In [13]:
data = pd.read_csv('kyphosis.csv')

In [16]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 81 entries, 0 to 80
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Kyphosis  81 non-null     object
 1   Age       81 non-null     int64 
 2   Number    81 non-null     int64 
 3   Start     81 non-null     int64 
dtypes: int64(3), object(1)
memory usage: 2.7+ KB


In [18]:
# Features: every column except the target label
x = data.drop('Kyphosis', axis=1)

In [22]:
x.head()

   Age  Number  Start
0   71       3      5
1  158       3     14
2  128       4      5
3    2       5      1
4    1       4     15


In [24]:
y = data['Kyphosis']
y.head()

0     absent
1     absent
2    present
3     absent
4     absent
Name: Kyphosis, dtype: object

In [26]:
from sklearn.model_selection import train_test_split
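# Hold out 30% of the rows for testing. No random_state is set, so the split
# (and every result below) changes from run to run.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)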

In [28]:
from sklearn.naive_bayes import GaussianNB
NB = GaussianNB()
NB.fit(x_train, y_train)
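
To see what `fit` estimates, the sketch below trains a separate `GaussianNB` on just the five rows shown by `x.head()` and `y.head()` above and prints its learned parameters. Five rows are far too few for a useful model; the names `x_small`, `y_small`, and `nb_small` are introduced here only for illustration.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# The five rows displayed by x.head() and y.head().
x_small = pd.DataFrame({
    "Age": [71, 158, 128, 2, 1],
    "Number": [3, 3, 4, 5, 4],
    "Start": [5, 14, 5, 1, 15],
})
y_small = pd.Series(["absent", "absent", "present", "absent", "absent"], name="Kyphosis")

nb_small = GaussianNB().fit(x_small, y_small)

print(nb_small.classes_)      # class labels in sorted order
print(nb_small.class_prior_)  # fraction of training rows in each class
print(nb_small.theta_)        # per-class mean of each feature
```

The priors are the class frequencies (4 of 5 rows are absent), and each row of `theta_` holds the feature means for one class; these are the quantities the Gaussian likelihood is built from.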

In [32]:
pred = NB.predict(x_test)

In [34]:
pred

array(['absent', 'absent', 'absent', 'absent', 'absent', 'absent',
       'absent', 'absent', 'absent', 'absent', 'absent', 'absent',
       'absent', 'absent', 'present', 'absent', 'absent', 'absent',
       'absent', 'absent', 'absent', 'absent', 'absent', 'absent',
       'absent'], dtype='<U7')

In [36]:
y_test

61    present
47     absent
64     absent
13     absent
1      absent
71     absent
79    present
52    present
29     absent
78     absent
28     absent
70     absent
41     absent
12     absent
21    present
58     absent
24    present
49     absent
19     absent
72     absent
73     absent
55     absent
75     absent
80     absent
35     absent
Name: Kyphosis, dtype: object

In [38]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)

0.84
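
An accuracy of 0.84 is easier to judge against a baseline. The sketch below takes the labels printed by `y_test` above and computes the accuracy of always predicting the most common class; the variable names are introduced here for illustration.

```python
from collections import Counter

# Test labels copied, in order, from the y_test output above.
y_true = [
    "present", "absent", "absent", "absent", "absent",
    "absent", "present", "present", "absent", "absent",
    "absent", "absent", "absent", "absent", "present",
    "absent", "present", "absent", "absent", "absent",
    "absent", "absent", "absent", "absent", "absent",
]

# Accuracy of a classifier that always predicts the majority class.
majority_label, majority_count = Counter(y_true).most_common(1)[0]
baseline_accuracy = majority_count / len(y_true)
print(majority_label, baseline_accuracy)  # absent 0.8
```

Because 20 of the 25 test rows are absent, this baseline already reaches 0.80, so the model's 0.84 is only a small improvement on this split.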

In [42]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, pred)

array([[20,  0],
       [ 4,  1]], dtype=int64)
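
Precision and recall for the present class follow directly from the confusion matrix above. In the sketch below the counts are read from that output; scikit-learn sorts the labels, so row and column 0 are absent and row and column 1 are present, with rows as true labels and columns as predictions.

```python
# Counts read from the confusion matrix above.
tn, fp = 20, 0  # true 'absent' rows: predicted absent / predicted present
fn, tp = 4, 1   # true 'present' rows: predicted absent / predicted present

precision = tp / (tp + fp)  # share of predicted 'present' that is correct
recall = tp / (tp + fn)     # share of actual 'present' that was found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=1.00 recall=0.20 f1=0.33
```

The model finds only 1 of the 5 present cases on this split, which the overall accuracy of 0.84 hides.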