Importing Required Libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report


Loading the Dataset

In [None]:
data = pd.read_csv("/content/drive/MyDrive/Datasets/spam dec/spam.csv", encoding='latin-1')
data = data[['v1', 'v2']]
print(data.head())

     v1                                                 v2
0   ham  Go until jurong point, crazy.. Available only ...
1   ham                      Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3   ham  U dun say so early hor... U c already then say...
4   ham  Nah I don't think he goes to usf, he lives aro...


Data Pre-processing

In [None]:
data['v1'] = data['v1'].map({'ham': 0, 'spam': 1})


Split Data into Training & Testing

In [None]:
X = data['v2']
y = data['v1']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


Convert Text to Numerical Form (TF-IDF)

In [None]:
vectorizer = TfidfVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)


Training the Spam Detection Model

In [None]:
model = MultinomialNB()
model.fit(X_train_vec, y_train)


Testing the Model

In [None]:
y_pred = model.predict(X_test_vec)

print("Accuracy:", accuracy_score(y_test, y_pred))
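# Name the classes so the report shows ham/spam instead of the numeric labels 0/1
print(classification_report(y_test, y_pred, target_names=['ham', 'spam']))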


Accuracy: 0.9668161434977578
              precision    recall  f1-score   support

         ham       0.96      1.00      0.98       965
        spam       1.00      0.75      0.86       150

    accuracy                           0.97      1115
   macro avg       0.98      0.88      0.92      1115
weighted avg       0.97      0.97      0.96      1115
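
The figures in the report above can be reproduced from four confusion-matrix counts. The counts in this sketch are inferred from the printed accuracy and per-class support, not printed by the notebook, so treat them as an assumption: 1078 of the 1115 test messages are classified correctly, and a spam precision of 1.00 with a recall of 0.75 is consistent with 113 spam messages caught, 37 missed, and no ham flagged as spam. The helper name `prf` is hypothetical.

```python
# Counts inferred from the printed report (an assumption, not notebook output).
tp, fn = 113, 37   # spam predicted as spam / spam predicted as ham
tn, fp = 965, 0    # ham predicted as ham / ham predicted as spam


def prf(true_pos, false_pos, false_neg):
    """Precision, recall, and F1 for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = tp + fn + tn + fp
accuracy = (tp + tn) / total
# Accuracy of a trivial model that labels every message as ham.
baseline = (tn + fp) / total

spam_p, spam_r, spam_f1 = prf(tp, fp, fn)
# For the ham class the roles swap: a missed spam counts as a false "ham".
ham_p, ham_r, ham_f1 = prf(tn, fn, fp)

print(f"accuracy {accuracy:.4f}, always-ham baseline {baseline:.4f}")
print(f"spam precision {spam_p:.2f} recall {spam_r:.2f} f1 {spam_f1:.2f}")
print(f"ham  precision {ham_p:.2f} recall {ham_r:.2f} f1 {ham_f1:.2f}")
```

Under these counts, a model that always answers ham would already score about 0.87 on this split, so the spam recall of 0.75 says more about the classifier than the 0.97 accuracy does.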



Predict New Email

In [None]:
email = ["Congratulations! you regsitered succcefully"]
email_vec = vectorizer.transform(email)

prediction = model.predict(email_vec)

if prediction[0] == 1:
    print(" Spam Mail Detected")
else:
    print(" Not Spam (Ham Mail)")


 Not Spam (Ham Mail)


Summary: Spam Mail Detection Code

This notebook builds a spam mail detector in Python with scikit-learn. It trains on a labeled dataset in which each message is marked as spam or ham (not spam).

The dataset is loaded and reduced to the label column v1 and the text column v2, and the text labels are mapped to numbers (ham to 0, spam to 1). The data is then split 80/20 into training and test sets so that the model is evaluated on messages it did not see during training.

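Since machine learning models cannot work directly with text data, the email messages are converted into numerical features using TF-IDF vectorization. TF-IDF weights each word by how often it appears in a message, scaled down for words that appear in many messages, so distinctive words count for more than common ones. English stop words are removed, and the vectorizer is fitted on the training set only, then applied unchanged to the test set.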
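
As a rough illustration of the weighting, the sketch below computes TF-IDF by hand for three made-up messages. It assumes the smoothed IDF formula that scikit-learn documents as the TfidfVectorizer default, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and it skips the tokenization rules, stop-word removal, and L2 normalization that the real vectorizer also applies. The messages and the helper name `tfidf` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus, not taken from spam.csv.
messages = [
    "free prize call now",
    "call me later",
    "free entry call now",
]


def tfidf(docs):
    """Return one {term: weight} dict per document (no normalization)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rarer terms get a larger multiplier.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


weights = tfidf(messages)
for message, w in zip(messages, weights):
    print(message, {t: round(v, 3) for t, v in w.items()})
```

In this toy corpus the word "call" appears in every message and gets the minimum weight of 1.0, while "prize", which appears in only one message, gets the largest weight in its message.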

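A multinomial Naive Bayes classifier is trained on the training features to learn which words are associated with spam and with ham. The trained model is then evaluated on the held-out test set using accuracy and per-class precision, recall, and F1. It reaches about 0.97 accuracy. Spam precision is 1.00, but spam recall is 0.75, so roughly a quarter of the spam messages in the test set are classified as ham.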
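
To make the idea behind the classifier concrete, here is a minimal multinomial Naive Bayes written in plain Python. It is a sketch under stated assumptions, not the scikit-learn implementation: it works on raw word counts, whereas the notebook feeds TF-IDF weights to MultinomialNB, and it uses additive smoothing with alpha = 1.0, which matches the documented scikit-learn default. The training messages and the function names `fit` and `predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training messages: 1 = spam, 0 = ham.
train = [
    ("win free prize now", 1),
    ("free entry win cash", 1),
    ("are we meeting now", 0),
    ("call me when free", 0),
    ("see you at lunch", 0),
]


def fit(samples, alpha=1.0):
    """Estimate log priors and smoothed per-class word log likelihoods."""
    vocab = {w for text, _ in samples for w in text.split()}
    model = {}
    for label in {y for _, y in samples}:
        texts = [text for text, y in samples if y == label]
        counts = Counter(w for text in texts for w in text.split())
        total = sum(counts.values())
        model[label] = {
            "log_prior": math.log(len(texts) / len(samples)),
            # Additive smoothing keeps an unseen word from zeroing out a class.
            "log_lik": {
                w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                for w in vocab
            },
        }
    return model


def predict(model, text):
    """Return the label with the highest log posterior score."""
    scores = {
        label: p["log_prior"]
        + sum(p["log_lik"][w] for w in text.split() if w in p["log_lik"])
        for label, p in model.items()
    }
    return max(scores, key=scores.get)


nb = fit(train)
print(predict(nb, "free cash prize"))   # scored as spam (1) on this toy data
print(predict(nb, "meeting at lunch"))  # scored as ham (0) on this toy data
```

Words outside the training vocabulary are ignored at prediction time, which is also what happens when the fitted vectorizer in the notebook meets a word it has never seen.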

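Finally, the fitted vectorizer and model are used to classify a new email as spam or ham. Classifying a single message is fast enough for interactive use, although the spam recall reported on the test set means that some spam will still get through.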
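
One possible refinement, sketched here as an assumption and not as part of the notebook, is to work with the predicted spam probability instead of the hard label, so the decision threshold can be lowered to catch more spam at the cost of some false alarms. The sketch relies on the `predict_proba` method and `classes_` attribute that scikit-learn classifiers such as MultinomialNB provide. The function names `spam_probability` and `classify` and the threshold values are hypothetical.

```python
def spam_probability(text, vectorizer, model):
    """Return the model's estimated probability that `text` is spam."""
    features = vectorizer.transform([text])
    # Look up the column for label 1 (spam) instead of assuming its position.
    spam_column = list(model.classes_).index(1)
    return float(model.predict_proba(features)[0][spam_column])


def classify(text, vectorizer, model, threshold=0.5):
    """Label `text` as spam when its spam probability reaches `threshold`."""
    probability = spam_probability(text, vectorizer, model)
    return "spam" if probability >= threshold else "ham"
```

With the objects fitted in the notebook, a call would look like `classify("Free entry to win cash", vectorizer, model, threshold=0.3)`. Any threshold other than 0.5 should be chosen on held-out data, not guessed.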

Gradio

In [None]:
!pip install gradio


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
import gradio as gr

# Load the dataset
data = pd.read_csv("/content/drive/MyDrive/Datasets/spam dec/spam.csv", encoding='latin-1')
data = data[['v1', 'v2']]
data['v1'] = data['v1'].map({'ham': 0, 'spam': 1})

# Split data
X = data['v2']
y = data['v1']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# TF-IDF Vectorization
vectorizer = TfidfVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train model
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# Test model
y_pred = model.predict(X_test_vec)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Prediction function for Gradio
def predict_spam(email_text):
    email_vec = vectorizer.transform([email_text])
    prediction = model.predict(email_vec)

    if prediction[0] == 1:
        return "🚨 Spam Mail Detected"
    else:
        return "✅ Not Spam (Ham Mail)"

# Gradio Interface
interface = gr.Interface(
    fn=predict_spam,
    inputs=gr.Textbox(lines=4, placeholder="Enter email text here..."),
    outputs="text",
    title="📧 Spam Email Detection System",
    description="Enter an email message to check whether it is Spam or Ham."
)

# Launch app
interface.launch()


Accuracy: 0.9668161434977578
              precision    recall  f1-score   support

           0       0.96      1.00      0.98       965
           1       1.00      0.75      0.86       150

    accuracy                           0.97      1115
   macro avg       0.98      0.88      0.92      1115
weighted avg       0.97      0.97      0.96      1115

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://dc37701e951183d8cb.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


