**The probability that an email is spam**

This model works by calculating the probability of each word given that a message is normal or spam, then combining those probabilities with Bayes' theorem to score the whole message.

It is called "naive" because it assumes the features are independent of each other given the class.

It can be used for face detection, digit recognition, news classification, and spam filtering.
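
As a rough illustration of the word-probability idea, the sketch below scores a message by hand. The word likelihoods and class priors are made-up numbers, not estimates from real data, and `spam_posterior` is a hypothetical helper name.

```python
import math

# Hypothetical per-word likelihoods, invented for illustration only.
p_word_given_spam = {"free": 0.30, "meeting": 0.02, "winner": 0.20}
p_word_given_normal = {"free": 0.05, "meeting": 0.25, "winner": 0.01}
p_spam, p_normal = 0.4, 0.6  # assumed class priors


def spam_posterior(words):
    """Return P(spam | words) under the naive independence assumption."""
    # Sum log-probabilities instead of multiplying raw probabilities,
    # which avoids numerical underflow on long messages.
    log_spam = math.log(p_spam) + sum(math.log(p_word_given_spam[w]) for w in words)
    log_normal = math.log(p_normal) + sum(math.log(p_word_given_normal[w]) for w in words)
    # Normalise so the two class posteriors sum to 1.
    spam_score = math.exp(log_spam)
    normal_score = math.exp(log_normal)
    return spam_score / (spam_score + normal_score)


posterior_promo = spam_posterior(["free", "winner"])
posterior_work = spam_posterior(["meeting"])
print(round(posterior_promo, 3), round(posterior_work, 3))
```

The first message contains only words that are far more likely under spam, so its posterior is close to 1. The second leans strongly towards normal.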

# 📌 When to Use Naive Bayes to Train a Model

Naive Bayes is a simple yet powerful probabilistic classifier.  
Use it when your problem matches the conditions below.

---

## ✅ Best Use-Cases for Naive Bayes

### **1. Text Classification (Top Strength)**
Naive Bayes performs exceptionally well on text data:
- Spam detection  
- Sentiment analysis  
- Email filtering  
- News/article classification  

**Why?**  
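Word occurrences are not truly independent, but the assumption is usually close enough for NB to pick the right class, and per-word probabilities are cheap to estimate from counts.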
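
A minimal sketch of text classification with scikit-learn, assuming a bag-of-words representation and `MultinomialNB`. The six messages and their labels are a tiny invented corpus, so this only shows the mechanics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only.
messages = [
    "win a free prize now",
    "free cash winner claim now",
    "claim your free prize",
    "meeting moved to monday",
    "lunch with the team on monday",
    "project report for the meeting",
]
labels = ["spam", "spam", "spam", "normal", "normal", "normal"]

# CountVectorizer turns each message into word counts;
# MultinomialNB models those counts per class.
text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(messages, labels)

new_messages = ["free prize winner", "team meeting on monday"]
predicted = text_clf.predict(new_messages)
print(list(predicted))
```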

---

### **2. When Features Are Mostly Independent**
NB assumes features are conditionally independent given the class.  
If this is roughly true, accuracy is usually good.

Example:
- Medical diagnosis using symptoms  
- Simple categorical datasets  

---

### **3. When You Need a Fast Model**
Naive Bayes is extremely fast:
- Very fast to train  
- Very fast to predict  
- Works well in real-time systems  

Examples:
- Chat moderation  
- Spam filtering  
- Real-time text classification  

---

### **4. When You Have Small Training Data**
Naive Bayes works well on:
- Small datasets  
- Low sample size  
- Sparse data  

Because its strong independence assumption leaves few parameters to estimate, it is less prone to overfitting than more flexible models.

---

### **5. High-Dimensional Data**
Great for datasets with thousands of features, especially:
- Bag-of-words  
- TF-IDF vectors  

NB handles high dimensionality easily.

---

### **6. When Features Are Categorical**
NB works well with:
- Categorical inputs  
- Binary features  

Examples:
- Simple fraud detection  
- Customer churn with basic attributes  
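
For categorical inputs, one possible approach is scikit-learn's `CategoricalNB` on ordinal-encoded columns. The churn table below is invented for illustration, and the column names are assumptions rather than a real dataset.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Made-up customer records, for illustration only.
customers = pd.DataFrame({
    "contract": ["monthly", "monthly", "yearly", "yearly", "monthly", "yearly"],
    "support_calls": ["many", "many", "few", "few", "many", "few"],
    "churned": [1, 1, 0, 0, 1, 0],
})

# CategoricalNB expects non-negative integer category codes.
encoder = OrdinalEncoder(dtype=int)
X = encoder.fit_transform(customers[["contract", "support_calls"]])
y = customers["churned"]

cat_model = CategoricalNB()  # default alpha=1.0 applies Laplace smoothing
cat_model.fit(X, y)

new_customer = pd.DataFrame({"contract": ["monthly"], "support_calls": ["many"]})
prediction = cat_model.predict(encoder.transform(new_customer))
print(prediction)
```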

---

## ❌ When NOT to Use Naive Bayes

### **1. When Features Are Strongly Correlated**
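If features depend on each other heavily, NB counts the same evidence more than once, so its probability estimates become overconfident and accuracy can drop.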

Examples:
- Image pixels  
- Financial time-series with correlated features  
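
The effect of correlated features can be seen in an extreme case, where one feature is simply copied. The sketch below uses synthetic data with balanced classes. The duplicated column adds no new information, yet the model becomes more confident.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Two balanced classes described by one informative feature (synthetic data).
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])
y = np.array([0] * 200 + [1] * 200)

X_single = x.reshape(-1, 1)
X_duplicated = np.column_stack([x, x])  # second column is a perfect copy

# Confidence = probability assigned to the predicted class.
conf_single = GaussianNB().fit(X_single, y).predict_proba(X_single).max(axis=1)
conf_duplicated = GaussianNB().fit(X_duplicated, y).predict_proba(X_duplicated).max(axis=1)

print(conf_single.mean(), conf_duplicated.mean())
```

Because NB treats the copy as independent evidence, the log-odds for every sample are doubled, which pushes the probabilities towards 0 or 1.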

---

### **2. Complex Numerical Patterns**
For non-linear or complex numeric data, Naive Bayes is not ideal.  
Better choices:
- SVM  
- Random Forest  
- Neural Networks  

---

## 📝 Quick Summary Table

| Condition | Naive Bayes Suitable? |
|----------|------------------------|
| Text data | ✅ Excellent |
| Small dataset | ✅ Good |
| High-dimensional data | ✅ Very good |
| Features independent | ✅ Good |
| Correlated features | ❌ No |
| Complex numeric decision boundary | ❌ No |

---



In [1]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("titanicNB.csv")
df.head(3)

Unnamed: 0,PassengerId,Name,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Survived
0,1,"Braund, Mr. Owen Harris",3,male,22.0,1,0,A/5 21171,7.25,,S,0
1,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1,female,38.0,1,0,PC 17599,71.2833,C85,C,1
2,3,"Heikkinen, Miss. Laina",3,female,26.0,0,0,STON/O2. 3101282,7.925,,S,1


In [6]:
df.drop(['PassengerId','Name','SibSp','Parch','Ticket','Cabin','Embarked'],axis='columns',inplace=True)
df.head()

Unnamed: 0,Pclass,Sex,Age,Fare,Survived
0,3,male,22.0,7.25,0
1,1,female,38.0,71.2833,1
2,3,female,26.0,7.925,1
3,1,female,35.0,53.1,1
4,3,male,35.0,8.05,0


In [7]:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse_output=False , drop='first' , dtype=np.int32)

In [10]:
sex_encoded = ohe.fit_transform(df[['Sex']])
sex_encoded[:5]

array([[1],
       [0],
       [0],
       [0],
       [1]], dtype=int32)

In [13]:
ohe.get_feature_names_out(['Sex'])

array(['Sex_male'], dtype=object)

In [16]:
encoded_df = pd.DataFrame(sex_encoded,columns=ohe.get_feature_names_out(['Sex']))
encoded_df = pd.concat([df,encoded_df],axis='columns' )
encoded_df.drop(['Sex'],axis='columns',inplace=True)
encoded_df.head()

Unnamed: 0,Pclass,Age,Fare,Survived,Sex_male
0,3,22.0,7.25,0,1
1,1,38.0,71.2833,1,0
2,3,26.0,7.925,1,0
3,1,35.0,53.1,1,0
4,3,35.0,8.05,0,1


In [19]:
targets = encoded_df.Survived
inputs = encoded_df.drop(['Survived'],axis='columns')

In [21]:
inputs.isna().sum()

Pclass        0
Age         177
Fare          0
Sex_male      0
dtype: int64

In [23]:
inputs.Age = inputs.Age.fillna(inputs.Age.mean())
inputs.isna().sum()

Pclass      0
Age         0
Fare        0
Sex_male    0
dtype: int64

In [61]:
from sklearn.model_selection import train_test_split
X_train , X_test , y_train , y_test = train_test_split(inputs,targets,test_size=0.2)


In [62]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train,y_train)
round(model.score(X_test,y_test)*100,2)

80.45

In [64]:
y_test

30     0
502    0
602    0
268    1
112    0
      ..
540    1
682    0
341    1
75     0
640    0
Name: Survived, Length: 179, dtype: int64

In [67]:
y_pred = model.predict(X_test)
y_pred

array([0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1,
       0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
       1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0,
       0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
       1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0,
       1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       1, 0, 0])

In [75]:
model.classes_

array([0, 1])

In [78]:
model.predict_proba(X_test)[:10]  # class probabilities for the first 10 records; columns follow model.classes_

array([[7.35102337e-01, 2.64897663e-01],
       [3.77103200e-01, 6.22896800e-01],
       [6.94808031e-01, 3.05191969e-01],
       [1.94849777e-04, 9.99805150e-01],
       [9.52824598e-01, 4.71754023e-02],
       [9.08419907e-01, 9.15800934e-02],
       [9.08828759e-01, 9.11712407e-02],
       [9.49622277e-01, 5.03777234e-02],
       [1.69132342e-01, 8.30867658e-01],
       [9.52801699e-01, 4.71983015e-02]])

In [68]:
from sklearn.metrics import accuracy_score

In [73]:
round(accuracy_score(y_test , y_pred)*100,2) == round(model.score(X_test,y_test)*100,2)

True
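
The last few cells can be tied together in a small sketch: `predict` returns the class with the highest `predict_proba` value, and `score` is the fraction of correct predictions, the same number `accuracy_score` returns. The data here is synthetic and the fixed `random_state` is an assumption added for repeatability.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Synthetic data: two classes with three numeric features.
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(1.5, 1.0, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)

y_hat = clf.predict(X_te)
proba = clf.predict_proba(X_te)

# Picking the most probable column and mapping it through classes_
# reproduces the output of predict().
from_proba = clf.classes_[np.argmax(proba, axis=1)]

# Accuracy is the share of test rows where prediction equals the true label.
manual_accuracy = np.mean(y_hat == y_te)

print(manual_accuracy, clf.score(X_te, y_te), accuracy_score(y_te, y_hat))
```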