## 🎯 AIM:
##### Build a spam classifier that detects whether a given message is SPAM or HAM (not spam) using the Naive Bayes algorithm.

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

messages = [
    "Win a free iPhone now", "Limited time offer, buy now", "Meeting at 10am tomorrow",
    "Congratulations! You have won a lottery", "Let's have lunch today", "Earn money fast!!!",
    "Your Amazon order has been shipped", "You have been selected for a prize"
]
labels = ["spam", "spam", "ham", "spam", "ham", "spam", "ham", "spam"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

model = MultinomialNB()
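# Fit on the training split only. Fitting on all of X (as in the run recorded below)
# lets the model see the test messages, so the reported accuracy is inflated.
model.fit(X_train, y_train)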

y_pred = model.predict(X_test)

print("📊 Accuracy:", accuracy_score(y_test, y_pred))

custom_message = ["Claim your free vacation now"]
custom_vector = vectorizer.transform(custom_message)
prediction = model.predict(custom_vector)
print(f"💬 The message '{custom_message[0]}' is classified as:", prediction[0])


📊 Accuracy: 1.0
💬 The message 'Claim your free vacation now' is classified as: spam


### 1. train_test_split → FUNCTION from sklearn.model_selection
- Purpose: Splits the dataset into a training set and a testing set.
- Without this: Every message is used for training, so no unseen data is left to evaluate the model on.
### 2. CountVectorizer → CLASS from sklearn.feature_extraction.text
- Purpose: Converts text data into numerical vectors (Bag of Words model).
- Without this: The model can't process text, because ML algorithms need numbers, not raw words.
### 3. MultinomialNB → CLASS from sklearn.naive_bayes
- Purpose: Naive Bayes algorithm for classification; works well for text/spam filtering.
- Without this: We have no model to train; the code will not classify anything.
### 4. accuracy_score → FUNCTION from sklearn.metrics
- Purpose: Calculates the fraction of predictions that are correct (a value between 0 and 1).
- Without this: We won’t know how accurate our model is.

## STEP 1: CREATE DATASET (messages + labels)
 ------------------------------------------------------
 Dataset: A list of 8 text messages (input) and a parallel list of labels ("spam"/"ham"), one label per message (5 spam, 3 ham).
  - 'spam' → Unwanted promotional/scam messages
  - 'ham'  → Normal genuine messages

## STEP 2: CONVERT TEXT TO NUMERIC FORM
 ------------------------------------------------------
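 Why? → Machine learning models work on numeric arrays, not raw strings.
 CountVectorizer:
  - Lowercases each message and splits it into words (by default, single-character words such as "a" are dropped)
  - Learns all unique words in the dataset
  - Assigns each word a column position
  - Counts how many times each word appears in each message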

## STEP 3: SPLIT DATA INTO TRAINING & TESTING
------------------------------------------------------
train_test_split():
   - test_size=0.3 → 30% of data for testing, 70% for training. With 8 messages this works out to 3 test messages and 5 training messages.
   - random_state=42 → Fixed seed, so the same split is produced on every run.
   - X_train: Training messages (numeric form)
   - X_test: Testing messages (numeric form)
   - y_train: Training labels ("spam"/"ham")
   - y_test: Testing labels ("spam"/"ham")
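
As a quick check of the split sizes and the effect of the seed, here is a small sketch. The `items` list of integers is only a stand-in for the 8 vectorized messages; the names `items`, `tags`, `a_train`, etc. are made up for this illustration.

```python
from sklearn.model_selection import train_test_split

items = list(range(8))  # stand-ins for the 8 messages
tags = ["spam", "spam", "ham", "spam", "ham", "spam", "ham", "spam"]

a_train, a_test, b_train, b_test = train_test_split(
    items, tags, test_size=0.3, random_state=42
)

# 30% of 8 is 2.4, which is rounded up to 3 test items; the other 5 are for training
print(len(a_train), len(a_test))  # 5 3

# The same random_state reproduces exactly the same split
again = train_test_split(items, tags, test_size=0.3, random_state=42)
print(again[1] == a_test)  # True
```

With only 3 test messages, each one is worth a third of the accuracy score, so results on a dataset this small should be read with caution.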


## STEP 4: CREATE & TRAIN NAIVE BAYES MODEL
 ------------------------------------------------------
 MultinomialNB:
   - Designed for discrete data (word counts).
   - Works well for spam filtering because spam often uses certain words repeatedly.
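 fit():
   - Estimates from the training data how common each class is and how often each word occurs in spam vs. ham messages.
   - Should only be given the training split (X_train, y_train), so the test messages stay unseen.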
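
To make concrete what fit() stores, the sketch below trains on a tiny hand-written count matrix. The three-word vocabulary and the counts are invented for illustration; `classes_`, `class_count_`, `feature_count_` and `feature_log_prob_` are attributes scikit-learn sets on a fitted MultinomialNB.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Columns are counts of the words ["free", "lunch", "now"] (hypothetical vocabulary)
X_toy = np.array([
    [2, 0, 1],  # spam
    [1, 0, 2],  # spam
    [0, 2, 0],  # ham
])
y_toy = ["spam", "spam", "ham"]

clf = MultinomialNB()  # alpha=1.0 (add-one smoothing) is the default
clf.fit(X_toy, y_toy)

print(clf.classes_)        # ['ham' 'spam']  (sorted alphabetically)
print(clf.class_count_)    # [1. 2.]         (messages seen per class)
print(clf.feature_count_)  # word totals per class
# [[0. 2. 0.]
#  [3. 0. 3.]]

# Smoothed estimate of P("free" | spam) = (3 + 1) / (6 + 3) = 4/9
p_free_given_spam = np.exp(clf.feature_log_prob_[1, 0])
print(round(p_free_given_spam, 3))  # 0.444
```

The smoothing term is why a word never seen in a class (such as "lunch" in spam) still gets a small non-zero probability instead of zeroing out the whole message.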

## STEP 5: PREDICT ON TEST DATA
 ------------------------------------------------------
 predict():
   - Uses the trained model to guess labels for unseen messages.
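
A minimal sketch of predict(), again on an invented three-word vocabulary rather than the real dataset. It also shows predict_proba(), which returns the per-class probabilities behind each label.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Columns are counts of the words ["free", "lunch", "now"] (hypothetical vocabulary)
X_toy = np.array([[2, 0, 1], [1, 0, 2], [0, 2, 0]])
y_toy = ["spam", "spam", "ham"]
clf = MultinomialNB().fit(X_toy, y_toy)

# Two unseen messages, already converted to counts
new_counts = np.array([
    [1, 0, 1],  # "free now"
    [0, 1, 0],  # "lunch"
])

print(clf.predict(new_counts))  # ['spam' 'ham']

# One row per message; columns follow the order of clf.classes_ (ham, spam)
probs = clf.predict_proba(new_counts)
print(probs.round(2))
```

predict() simply picks, for each row, the class with the highest probability in predict_proba().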


## STEP 6: CALCULATE ACCURACY
 ------------------------------------------------------
 accuracy_score():
   - Compares predictions (y_pred) to actual labels (y_test).
   - Returns a value between 0 and 1 (e.g., 0.95 = 95% accuracy).
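
The calculation is simple enough to reproduce by hand. The two label lists below are invented for illustration.

```python
from sklearn.metrics import accuracy_score

actual    = ["spam", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham",  "ham"]  # third prediction is wrong

print(accuracy_score(actual, predicted))  # 0.75

# The same number computed manually: correct predictions / total predictions
manual = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(manual)  # 0.75
```

Accuracy treats every mistake the same way. It does not distinguish spam that slipped through from a genuine message wrongly flagged as spam.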

## STEP 7: TEST MODEL WITH CUSTOM MESSAGE
------------------------------------------------------
 Process:
   - Write a new message
   - Convert it into a numeric vector with the same fitted vectorizer (transform(), not fit_transform(), so the columns match what the model was trained on)
   - Predict spam/ham
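
One detail worth seeing in code: transform() only counts words the vectorizer already knows. The sketch below fits on just two of the messages (an assumption made to keep the vocabulary small) and then transforms the custom message.

```python
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer()
vec.fit(["Win a free iPhone now", "Meeting at 10am tomorrow"])

# transform() reuses the learned vocabulary; it does not add new words
new_vec = vec.transform(["Claim your free vacation now"])

# 8 known words → 8 columns ("a" is dropped as a single-character token)
print(new_vec.shape)  # (1, 8)

# Only "free" and "now" are known; "claim", "your", "vacation" are ignored
print(new_vec.sum())  # 2
```

So the prediction for a new message rests only on the words it shares with the training vocabulary. A message made entirely of unseen words becomes an all-zero vector, and the model falls back on how common each class was in training.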