# Spam Ham Detection Using BERT and TensorFlow

### <u>Project Summary</u>

### <u>GitHub Link</u>
[Click Here](https://github.com/ajitmane36/spam-ham-detection-bert-tensorflow.git)

### <u>Problem Statement</u>

- The dataset consists of emails labelled as spam or ham (non-spam). The goal of this project is to fine-tune a pre-trained BERT model with TensorFlow to predict, from an email's content, whether it is spam. The aim is accurate and efficient classification, so that legitimate emails reach the inbox while spam is filtered out.

### <u>Data Description</u>

- **text**: The email content as raw text.
- **spam**: Indicates whether the email is spam (1) or not (0).
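
To make the schema concrete, the following minimal sketch builds a tiny DataFrame with the same two columns and counts the labels. The three rows are invented for illustration and are not taken from the dataset; the names `toy` and `label_counts` are likewise hypothetical.

```python
import pandas as pd

# Toy rows (hypothetical, not from the real dataset) using the same schema
toy = pd.DataFrame({
    "text": [
        "Subject: win a free prize now",
        "Subject: re : meeting notes attached",
        "Subject: cheap software cds",
    ],
    "spam": [1, 0, 1],
})

# Count how many emails fall in each class (1 = spam, 0 = ham)
label_counts = toy["spam"].value_counts().sort_index()
print(label_counts)
```

Running the same `value_counts()` call on the real `spam` column is a quick way to check how balanced the two classes are before training.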

In [55]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# filter warnings
import warnings
warnings.filterwarnings('ignore')

In [64]:
# Dataset Loading
df=pd.read_csv(r"C:\Users\ajitm\Downloads\DS Projects\Deep Larning Projects\1. Text Classification Using BERT & Tensorflow\spam_ham_dataset.csv")
df

Unnamed: 0,text,spam,spam.1,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109
0,Subject: naturally irresistible your corporate...,1,spam,,,,,,,,...,,,,,,,,,,
1,Subject: the stock trading gunslinger fanny i...,1,spam,,,,,,,,...,,,,,,,,,,
2,Subject: unbelievable new homes made easy im ...,1,spam,,,,,,,,...,,,,,,,,,,
3,Subject: 4 color printing special request add...,1,spam,,,,,,,,...,,,,,,,,,,
4,"Subject: do not have money , get software cds ...",1,spam,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5725,Subject: re : research and development charges...,0,ham,,,,,,,,...,,,,,,,,,,
5726,"Subject: re : receipts from visit jim , than...",0,ham,,,,,,,,...,,,,,,,,,,
5727,Subject: re : enron case study update wow ! a...,0,ham,,,,,,,,...,,,,,,,,,,
5728,"Subject: re : interest david , please , call...",0,ham,,,,,,,,...,,,,,,,,,,
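
The loading cell above reads from an absolute path on one machine. As a minimal sketch of a more portable alternative, the hypothetical helper `load_spam_csv` below accepts any path and fails early with a clear message if the file is missing. The temporary demo file and its two rows are assumptions made only so the sketch runs anywhere; they are not part of the dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_spam_csv(csv_path):
    """Read the dataset from `csv_path`, failing early if the file is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at {path}")
    return pd.read_csv(path)


# Demonstration on a throwaway file (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "spam_ham_dataset.csv"
    demo_path.write_text(
        "text,spam\nSubject: hello,0\nSubject: free offer,1\n",
        encoding="utf-8",
    )
    demo_df = load_spam_csv(demo_path)

print(demo_df.shape)
```

In the notebook, the helper would be called with wherever `spam_ham_dataset.csv` lives relative to the project.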


In [67]:
# First five observations
df.head()

Unnamed: 0,text,spam,spam.1,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109
0,Subject: naturally irresistible your corporate...,1,spam,,,,,,,,...,,,,,,,,,,
1,Subject: the stock trading gunslinger fanny i...,1,spam,,,,,,,,...,,,,,,,,,,
2,Subject: unbelievable new homes made easy im ...,1,spam,,,,,,,,...,,,,,,,,,,
3,Subject: 4 color printing special request add...,1,spam,,,,,,,,...,,,,,,,,,,
4,"Subject: do not have money , get software cds ...",1,spam,,,,,,,,...,,,,,,,,,,


In [69]:
# Last five observations
df.tail()

Unnamed: 0,text,spam,spam.1,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109
5725,Subject: re : research and development charges...,0,ham,,,,,,,,...,,,,,,,,,,
5726,"Subject: re : receipts from visit jim , than...",0,ham,,,,,,,,...,,,,,,,,,,
5727,Subject: re : enron case study update wow ! a...,0,ham,,,,,,,,...,,,,,,,,,,
5728,"Subject: re : interest david , please , call...",0,ham,,,,,,,,...,,,,,,,,,,
5729,Subject: news : aurora 5 . 2 update aurora ve...,0,ham,,,,,,,,...,,,,,,,,,,


#### <u>Data Inspection</u>

In [79]:
# Shape of dataset (rows, columns)
n_rows, n_cols = df.shape
print(f'Dataset has {n_rows} observations and {n_cols} columns.')

Dataset has 5730 observations and 110 columns.


In [91]:
# Dataset columns
print(df.columns.tolist())

['text', 'spam', 'spam.1', 'Unnamed: 3', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Unnamed: 9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14', 'Unnamed: 15', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31', 'Unnamed: 32', 'Unnamed: 33', 'Unnamed: 34', 'Unnamed: 35', 'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39', 'Unnamed: 40', 'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43', 'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47', 'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 'Unnamed: 51', 'Unnamed: 52', 'Unnamed: 53', 'Unnamed: 54', 'Unnamed: 55', 'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59', 'Unnamed: 60', 'Unnamed: 61', 'Unnamed: 62', 'Unnamed: 63', 'Unnamed: 64', 'Unnamed: 65', 'Unnamed: 66', 'Unnamed: 67', 'Unna

In [100]:
# Basic information of dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5730 entries, 0 to 5729
Columns: 110 entries, text to Unnamed: 109
dtypes: float64(1), object(109)
memory usage: 4.8+ MB
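
The `info()` output reports 110 columns, whereas the data description lists only `text` and `spam`. One possible cleanup, sketched here on a small stand-in frame, keeps those two columns and coerces `spam` to an integer label. The helper name `keep_text_and_label` and the stand-in rows are hypothetical, and the assumption that the extra `Unnamed` columns carry nothing needed for modelling should be checked against the real file before dropping them.

```python
import pandas as pd


def keep_text_and_label(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only `text` and `spam`, with `spam` as an integer 0/1 label."""
    out = frame[["text", "spam"]].copy()
    # Coerce non-numeric label values to NaN instead of raising
    out["spam"] = pd.to_numeric(out["spam"], errors="coerce")
    # Drop rows with missing text or a missing/invalid label
    out = out.dropna(subset=["text", "spam"])
    out = out[out["spam"].isin([0, 1])].copy()
    out["spam"] = out["spam"].astype(int)
    return out.reset_index(drop=True)


# Stand-in data (hypothetical) mimicking the wide, mostly empty layout
raw = pd.DataFrame({
    "text": ["Subject: offer", "Subject: re : notes", "Subject: broken row"],
    "spam": ["1", "0", "not a label"],
    "spam.1": ["spam", "ham", None],
    "Unnamed: 3": [None, None, None],
})

clean = keep_text_and_label(raw)
print(clean.shape)
```

Coercing with `errors="coerce"` is a deliberate choice in this sketch: rows whose label is not a clean 0 or 1 are discarded rather than causing the conversion to fail.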
