## Spam Message Detection

#### Step 1: Install Required Libraries

```bash
pip install tensorflow pandas numpy scikit-learn
```

#### Step 2: Load and Preprocess Data

Load the SMS Spam Collection dataset from the UCI Machine Learning Repository. The code in this step shows the basic structure for reading it into a DataFrame.

**Download the dataset**

1. Go to the [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/228/sms+spam+collection) and download the `SMSSpamCollection` file.
2. Place the file in your working directory.

In [4]:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

In [5]:


# Load the dataset
data = pd.read_csv('SMSSpamCollection', sep='\t', names=['label', 'message'])
data['label'] = data['label'].map({'ham': 0, 'spam': 1})  # Encode labels as binary
print(data.head())

# Split into sentences and labels
sentences = data['message'].values
labels = data['label'].values


   label                                            message
0      0  Go until jurong point, crazy.. Available only ...
1      0                      Ok lar... Joking wif u oni...
2      1  Free entry in 2 a wkly comp to win FA Cup fina...
3      0  U dun say so early hor... U c already then say...
4      0  Nah I don't think he goes to usf, he lives aro...
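The file format itself is simple: one message per line, with the label and the text separated by a tab. As a minimal sketch of that parsing and label-encoding step, here is a standard-library version run on two made-up lines. The `sample` text and the `parse_sms_lines` helper are illustrative assumptions, not part of the dataset or of pandas.

```python
import csv
import io

# Hypothetical sample in the same "label<TAB>message" layout as SMSSpamCollection.
sample = 'ham\tSee you at 5 then\nspam\tWIN a "free" prize, reply now\n'

LABEL_TO_ID = {"ham": 0, "spam": 1}


def parse_sms_lines(handle):
    """Return (labels, messages) parsed from a tab-separated file object."""
    labels, messages = [], []
    # QUOTE_NONE keeps double quotes inside a message as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for label, message in reader:
        labels.append(LABEL_TO_ID[label])
        messages.append(message)
    return labels, messages


labels_demo, messages_demo = parse_sms_lines(io.StringIO(sample))
print(labels_demo)       # [0, 1]
print(messages_demo[1])  # WIN a "free" prize, reply now
```

`pd.read_csv` accepts the same `quoting=csv.QUOTE_NONE` argument, which may be worth trying if the number of rows loaded looks lower than expected.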


#### Step 3: Preprocess the Text Data

We’ll use `tf.keras.layers.TextVectorization` for preprocessing. It maps each message to a fixed-length sequence of integer token IDs, which is the numerical format the model expects.

In [6]:


# Parameters for vectorization
max_vocab_size = 1000  # Vocabulary size
max_sequence_length = 50  # Maximum length of a sequence

# Vectorization layer
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=max_vocab_size,
    output_mode='int',
    output_sequence_length=max_sequence_length
)

# Adapt the vectorization layer on the text data
vectorize_layer.adapt(sentences)

# Apply the vectorization to the text data
vectorized_sentences = vectorize_layer(sentences)
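To make the vectorization step less opaque, the following pure-Python sketch approximates what the layer does with its default settings: lowercase the text, strip punctuation, split on whitespace, keep the most frequent tokens, and pad or truncate to a fixed length. The helper names are hypothetical, index 0 is assumed to be padding and index 1 the out-of-vocabulary token, and the exact punctuation set and tie-breaking order may differ from the real layer.

```python
import string
from collections import Counter

PAD_ID = 0  # assumed padding index
OOV_ID = 1  # assumed out-of-vocabulary index


def standardize(text):
    """Lowercase the text and drop ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def build_vocab(texts, max_tokens):
    """Map the most frequent tokens to IDs, reserving 0 and 1."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    kept = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return {tok: i + 2 for i, tok in enumerate(kept)}


def vectorize(text, vocab, seq_len):
    """Convert one message to a fixed-length list of token IDs."""
    ids = [vocab.get(tok, OOV_ID) for tok in standardize(text).split()]
    ids = ids[:seq_len]                           # truncate long messages
    return ids + [PAD_ID] * (seq_len - len(ids))  # pad short ones


corpus = ["Free prize now", "free lunch now now"]
vocab = build_vocab(corpus, max_tokens=4)
print(vocab)                                 # {'now': 2, 'free': 3}
print(vectorize("Free prize NOW!", vocab, 5))  # [3, 1, 2, 0, 0]
```

In this toy run "prize" falls outside the four-token budget, so it maps to the out-of-vocabulary ID. The same thing happens in the tutorial to any word outside the 1000 most frequent tokens.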


#### Step 4: Split Data into Training and Testing Sets

In [None]:


# Splitting data
training_sentences, testing_sentences, training_labels, testing_labels = train_test_split(
    vectorized_sentences.numpy(), labels, test_size=0.2, random_state=42
)

#### Step 5: Build the Model

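Define a simple neural network with Keras: an embedding layer, average pooling over the sequence, a small hidden layer, and a sigmoid output that gives the spam probability.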

In [8]:


# Building the model
model = Sequential([
    Embedding(input_dim=max_vocab_size, output_dim=16, input_length=max_sequence_length),
    GlobalAveragePooling1D(),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')  # Sigmoid for binary classification
])

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 50, 16)            16000     
                                                                 
 global_average_pooling1d (  (None, 16)                0         
 GlobalAveragePooling1D)                                         
                                                                 
 dense (Dense)               (None, 16)                272       
                                                                 
 dense_1 (Dense)             (None, 1)                 17        
                                                                 
Total params: 16289 (63.63 KB)
Trainable params: 16289 (63.63 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
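The parameter counts in the summary can be reproduced by hand. This short sketch does the arithmetic, assuming 32-bit (4-byte) floats for the size estimate; the helper functions are illustrative and not part of Keras.

```python
def embedding_params(vocab_size, embed_dim):
    """One embedding vector of length embed_dim per vocabulary entry."""
    return vocab_size * embed_dim


def dense_params(inputs, units):
    """A weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units


embedding = embedding_params(1000, 16)  # max_vocab_size x output_dim
hidden = dense_params(16, 16)           # pooled embedding -> 16 units
output = dense_params(16, 1)            # 16 units -> 1 sigmoid unit
total = embedding + hidden + output

print(embedding, hidden, output)     # 16000 272 17
print(total)                         # 16289
print(round(total * 4 / 1024, 2))    # 63.63 (KB, assuming float32)
```

Nearly all of the parameters sit in the embedding table, so the model size is driven mainly by `max_vocab_size` and the embedding dimension.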


#### Step 6: Train the Model

In [9]:
# Train the model
history = model.fit(
    training_sentences, training_labels,
    epochs=10,
    batch_size=32,
    validation_data=(testing_sentences, testing_labels)
)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
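The `history` object returned by `model.fit` stores per-epoch metrics in its `history` attribute, a dictionary of lists. As a minimal sketch of how to inspect it, the hypothetical `best_epoch` helper below finds the epoch with the lowest validation loss. The numbers in `example_history` are made up for illustration and are not results from this model.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch, value) where the given metric is lowest."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Made-up numbers for illustration only.
# With a trained model, pass history.history instead.
example_history = {
    "loss": [0.40, 0.20, 0.10, 0.08],
    "val_loss": [0.35, 0.18, 0.12, 0.14],
}

print(best_epoch(example_history))          # (3, 0.12)
print(best_epoch(example_history, "loss"))  # (4, 0.08)
```

If validation loss starts rising while training loss keeps falling, as in the made-up numbers above, that is usually a sign of overfitting and a reason to train for fewer epochs.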


#### Step 7: Evaluate the Model

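After training, we evaluate the model’s performance on the test set to estimate how well it generalizes. Note that the same split was passed as `validation_data` during training, so this figure matches the final validation accuracy rather than coming from fully unseen data.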

In [10]:
# Evaluate the model
test_loss, test_accuracy = model.evaluate(testing_sentences, testing_labels)
print(f'Test Accuracy: {test_accuracy:.2f}')

Test Accuracy: 0.99
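Accuracy alone can look high when one class dominates, and most messages in this dataset are ham. Precision and recall for the spam class give a fuller picture. The sketch below computes them from plain lists of 0/1 labels; the `precision_recall` helper and the toy labels are assumptions for illustration, not results from the model above.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: 8 ham (0) and 2 spam (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_ham = [0] * 10

print(accuracy(y_true, always_ham))          # 0.8
print(precision_recall(y_true, always_ham))  # (0.0, 0.0)
```

A classifier that never flags spam still scores 0.8 accuracy on these toy labels while catching no spam at all. To apply this to the tutorial's model, the predicted labels could be obtained by thresholding `model.predict(testing_sentences)` at 0.5.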


#### Step 8: Make Predictions

Finally, we can use the model to make predictions on new text data.

In [12]:
def predict_spam(text):
    vectorized_text = vectorize_layer([text])
    prediction = model.predict(vectorized_text)
    print(f'Prediction: {prediction}')
    return "Spam" if prediction[0][0] > 0.5 else "Ham"  # prediction has shape (1, 1)

# Test the function
print(predict_spam("Congratulations! You've won a free iPhone. Claim now!"))
print(predict_spam("Let's catch up for lunch tomorrow."))


Prediction: [[0.79941714]]
Spam
Prediction: [[0.00977313]]
Ham
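The 0.5 cutoff is a choice, not a requirement. As a small sketch, the hypothetical helper below applies a configurable threshold to the two scores printed above, which shows how a stricter cutoff changes the label.

```python
def label_from_probability(probability, threshold=0.5):
    """Map a sigmoid output to a label using the given cutoff."""
    return "Spam" if probability > threshold else "Ham"


# The two scores printed by the predictions above.
scores = [0.79941714, 0.00977313]

print([label_from_probability(s) for s in scores])        # ['Spam', 'Ham']
print(label_from_probability(scores[0], threshold=0.9))   # Ham
```

Raising the threshold trades missed spam for fewer legitimate messages flagged by mistake. Which side to favor depends on the application.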
