# Neural Networks

Neural networks are a class of machine learning models loosely inspired by the structure and function of the human brain. They consist of interconnected layers of nodes (neurons); each layer transforms its input using learned weights and biases followed by an activation function.

## Key Components
1. **Neurons (Nodes)**: Basic computational units that compute a weighted sum of their inputs, add a bias, and pass the result through an activation function.
2. **Layers**:
   - **Input Layer**: Receives the raw data.
   - **Hidden Layers**: Perform computations, enabling the network to learn complex patterns.
   - **Output Layer**: Produces the final result.
3. **Weights and Biases**: Parameters adjusted during training to minimize error.
4. **Activation Functions**: Introduce non-linearity, allowing the network to learn more complex mappings (e.g., ReLU, sigmoid, tanh).
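
The weighted-sum-plus-activation step performed by a single neuron can be sketched in a few lines of plain Python. This is an illustrative sketch rather than code from any library: the function names `relu`, `sigmoid`, and `neuron_output` and the sample numbers are assumptions chosen for the example.

```python
import math

def relu(z):
    # ReLU passes positive values through and clamps negatives to zero
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, then the activation function
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical inputs and parameters, chosen only for illustration
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.5

print(neuron_output(inputs, weights, bias, relu))     # weighted sum is -1.0, so ReLU gives 0.0
print(neuron_output(inputs, weights, bias, sigmoid))  # sigmoid(-1.0), about 0.269
```

A layer is many such neurons sharing the same inputs, which is why frameworks implement it as a matrix multiplication.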

## Types of Neural Networks
1. **Feedforward Neural Networks (FNN)**: Data flows in one direction, often used for tasks like classification and regression.
2. **Convolutional Neural Networks (CNNs)**: Specialized for image data, using convolution layers to extract features.
3. **Recurrent Neural Networks (RNNs)**: Designed for sequential data (e.g., time series, text), using recurrent connections that carry a hidden state from one step to the next so the network retains information about previous inputs.
4. **Generative Adversarial Networks (GANs)**: Consist of two networks (generator and discriminator) that compete to generate realistic data.
5. **Transformers**: Attention-based architectures (e.g., GPT, BERT) for sequence processing that have replaced RNNs in many applications.

## Applications
- **Manufacturing**: Predictive maintenance, quality control, demand forecasting.
- **Healthcare**: Disease diagnosis, drug discovery.
- **Finance**: Fraud detection, algorithmic trading.
- **Natural Language Processing (NLP)**: Chatbots, translation, sentiment analysis.
- **Image Processing**: Object detection, facial recognition.

## Training a Neural Network
1. **Forward Propagation**: Compute the output for a given input.
2. **Loss Function**: Measure the difference between the predicted output and the actual target.
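3. **Backward Propagation**: Compute the gradient of the loss with respect to each weight and bias by applying the chain rule backward through the network.
4. **Optimization**: Update the weights and biases in the direction that reduces the loss, using an algorithm such as stochastic gradient descent (SGD), Adam, or RMSprop.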
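
The four steps above can be sketched for the smallest possible network: one sigmoid neuron with one input, trained with plain gradient descent on binary cross-entropy. Everything here is an assumption made for illustration (the toy data, the learning rate of 0.5, the 100 iterations, and the name `train_step`); real frameworks compute the gradients automatically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, xs, ys, lr):
    # 1. Forward propagation: predicted probability for every sample
    preds = [sigmoid(w * x + b) for x in xs]

    # 2. Loss function: mean binary cross-entropy (eps guards against log(0))
    eps = 1e-12
    loss = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(preds, ys)
    ) / len(xs)

    # 3. Backward propagation: for sigmoid + cross-entropy the gradient of the
    #    loss with respect to the pre-activation is simply (prediction - target)
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)

    # 4. Optimization: one gradient descent update
    return w - lr * grad_w, b - lr * grad_b, loss

# Toy data: negative inputs belong to class 0, positive inputs to class 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
losses = []
for _ in range(100):
    w, b, loss = train_step(w, b, xs, ys, lr=0.5)
    losses.append(loss)

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The first loss equals ln 2 (about 0.6931) because an untrained neuron predicts 0.5 for every sample; it falls as the weight grows in the positive direction.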

## Challenges
- **Overfitting**: The network performs well on training data but poorly on unseen data.
- **Data Requirements**: Neural networks often require large datasets.
- **Computation**: Training deep networks can be resource-intensive.

---



# Identifying spam e-mails with neural networks

A common use for binary classification is sorting spam e-mails from legitimate e-mails. I use Keras to build a binary classifier for e-mails, train it with a collection of e-mails labeled with 0s (for not spam) and 1s (for spam), and then run a few e-mails through it to see how well it performs.

## Load and prepare the data


In [1]:
import pandas as pd

df = pd.read_csv('ham-spam.csv')
df.head()

|   | IsSpam | Text |
|---|--------|------|
| 0 | 0 | key issues going forwarda year end reviews rep... |
| 1 | 0 | congrats contratulations the execution the cen... |
| 2 | 0 | key issues going forwardall under control set... |
| 3 | 0 | epmi files protest entergy transcoattached our... |
| 4 | 0 | california power please contact kristin walsh ... |


Find out how many rows the dataset contains and confirm that there are no missing values.

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   IsSpam  1000 non-null   int64 
 1   Text    1000 non-null   object
dtypes: int64(1), object(1)
memory usage: 15.8+ KB


Remove duplicate rows from the dataset and check for balance.

In [3]:
df = df.drop_duplicates()
df.groupby('IsSpam').describe()

| IsSpam | Text count | Text unique | Text top | Text freq |
|--------|------------|-------------|----------|-----------|
| 0 | 499 | 499 | key issues going forwarda year end reviews rep... | 1 |
| 1 | 500 | 500 | take the reinsbecomeyour employer substantial ... | 1 |


Create a feature column *x* containing the text in the "Text" column with stopwords removed, and a label column *y*.

In [4]:
# Download the NLTK data. Only 'stopwords' is used by remove_stop_words();
# 'punkt' and 'punkt_tab' matter only if you switch to
# nltk.tokenize.word_tokenize.
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('punkt_tab')

from nltk.corpus import stopwords

# Build the stop-word set once instead of on every call
stop_words = set(stopwords.words('english'))

def remove_stop_words(text):
    # Lowercase, split on whitespace, and keep only purely alphabetic
    # tokens that are not stop words
    words = text.lower().split()
    words = [word for word in words if word.isalpha() and word not in stop_words]
    return ' '.join(words)

x = df['Text'].apply(remove_stop_words)
y = df['IsSpam']

[nltk_data] Downloading package punkt to C:\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to C:\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt_tab to C:\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
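
One detail of `remove_stop_words` is worth seeing in isolation: because the text is split on whitespace and then filtered with `isalpha()`, any token with punctuation attached (such as `tuesday?`) is dropped entirely. The sketch below reproduces that logic with a tiny hypothetical stop-word set (`DEMO_STOP_WORDS`, standing in for NLTK's English list) so that it runs without any downloads, and contrasts it with a variant that strips punctuation first. The variant `remove_stop_words_keep_punctuated` is an assumption for illustration, not what this notebook uses.

```python
import string

# Hypothetical stand-in for stopwords.words('english'), kept tiny for the demo
DEMO_STOP_WORDS = {'can', 'you', 'a', 'on', 'the', 'is', 'to'}

def remove_stop_words_demo(text):
    # Same logic as the notebook's function: split on whitespace, keep
    # alphabetic tokens that are not stop words
    words = text.lower().split()
    return ' '.join(w for w in words if w.isalpha() and w not in DEMO_STOP_WORDS)

def remove_stop_words_keep_punctuated(text):
    # Variant: delete ASCII punctuation first so 'tuesday?' becomes 'tuesday'
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    return ' '.join(w for w in text.split() if w.isalpha() and w not in DEMO_STOP_WORDS)

sample = 'Can you attend a code review on Tuesday?'
print(remove_stop_words_demo(sample))             # attend code review
print(remove_stop_words_keep_punctuated(sample))  # attend code review tuesday
```

Switching to the variant would change the tokens the model is trained on, so the network would have to be retrained before its predictions could be compared.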


Tokenize the text and create padded sequences from it.

In [5]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_words = 20000
max_length = 500

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(x)
sequences = tokenizer.texts_to_sequences(x)
x = pad_sequences(sequences, maxlen=max_length)
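
To make the two steps concrete, here is a simplified pure-Python sketch of what `Tokenizer` and `pad_sequences` do with their default settings: words are numbered from 1 in order of decreasing frequency, words ranked at or beyond `num_words` are skipped, and sequences are padded (and truncated) at the front with zeros. The function names `fit_word_index`, `texts_to_sequences`, and `pad_pre` are hypothetical, and the real `Tokenizer` additionally lowercases text and strips punctuation.

```python
from collections import Counter

def fit_word_index(texts):
    # Rank words by frequency; the most common word gets index 1.
    # Index 0 is reserved for padding.
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank + 1 for rank, (word, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, word_index, num_words=None):
    # Replace each known word with its index; skip unknown words and
    # words whose index is num_words or higher
    sequences = []
    for text in texts:
        seq = []
        for word in text.split():
            idx = word_index.get(word)
            if idx is not None and (num_words is None or idx < num_words):
                seq.append(idx)
        sequences.append(seq)
    return sequences

def pad_pre(seq, maxlen):
    # Keep the last maxlen items, then left-pad with zeros to length maxlen
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

texts = ['free money now', 'meeting now']
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index)

print(word_index)                           # {'now': 1, 'free': 2, 'money': 3, 'meeting': 4}
print([pad_pre(s, 5) for s in sequences])   # [[0, 0, 2, 3, 1], [0, 0, 0, 4, 1]]
```

With `max_length = 500`, every e-mail becomes a row of exactly 500 integers, which is the fixed-size input the network expects.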

## Train a neural network to identify spam

Create a neural network containing an [`Embedding`](https://keras.io/api/layers/core_layers/embedding/) layer for converting sequences into arrays of word vectors and a [`Dense`](https://keras.io/api/layers/core_layers/dense/) layer for classifying arrays of word vectors.

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Embedding

model = Sequential() 
model.add(Embedding(max_words, 32, input_length=max_length)) 
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 500, 32)           640000    
                                                                 
 flatten (Flatten)           (None, 16000)             0         
                                                                 
 dense (Dense)               (None, 128)               2048128   
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
Total params: 2,688,257
Trainable params: 2,688,257
Non-trainable params: 0
_________________________________________________________________
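
The parameter counts in the summary can be checked by hand. The short calculation below is only arithmetic on the layer sizes used above; the variable names `embedding_dim` and `dense_units` are introduced here for readability and do not appear in the notebook.

```python
max_words = 20000     # vocabulary size passed to Embedding
max_length = 500      # padded sequence length
embedding_dim = 32    # size of each word vector
dense_units = 128     # units in the hidden Dense layer

embedding_params = max_words * embedding_dim             # one 32-value vector per word index
flatten_size = max_length * embedding_dim                # 500 x 32 values flattened to 16000
dense_params = flatten_size * dense_units + dense_units  # weights plus one bias per unit
output_params = dense_units * 1 + 1                      # 128 weights plus 1 bias

total = embedding_params + dense_params + output_params
print(embedding_params, dense_params, output_params, total)
# 640000 2048128 129 2688257
```

Most of the parameters sit in the first `Dense` layer, because flattening turns every position of every word vector into a separate input.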



Train the network.

In [8]:
hist = model.fit(x, y, validation_split=0.2, epochs=5, batch_size=20)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
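
One caveat about `validation_split`: Keras takes the validation fraction from the end of `x` and `y`, before any shuffling. The `df.head()` output earlier shows only non-spam rows, which suggests (but does not prove) that ham-spam.csv is grouped by class. If it is, the last 20% of the rows would all be spam and the validation accuracy would say nothing about legitimate e-mails. The sketch below illustrates the effect with hypothetical labels; `tail_split` is a made-up helper that mimics the split, not a Keras function.

```python
import random

def tail_split(items, validation_split):
    # The last fraction of the data, in its original order,
    # becomes the validation set
    cut = int(len(items) * (1 - validation_split))
    return items[:cut], items[cut:]

# Hypothetical labels sorted by class: all ham (0) first, then all spam (1)
labels = [0] * 5 + [1] * 5

train, val = tail_split(labels, 0.2)
print(val)  # [1, 1]  (the validation set contains only spam)

# Shuffling once before the split makes the tail a random sample of the data
shuffled = labels[:]
random.Random(42).shuffle(shuffled)
train_shuffled, val_shuffled = tail_split(shuffled, 0.2)
print(len(train_shuffled), len(val_shuffled))  # 8 2
```

In the notebook, one way to get the same effect would be to shuffle the rows before building `x` and `y`, for example with `df = df.sample(frac=1, random_state=42)`; the seed value is arbitrary.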


Chart the training and validation accuracy for each epoch.

In [9]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set()

acc = hist.history['accuracy']
val = hist.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, '-', label='Training accuracy')
plt.plot(epochs, val, ':', label='Validation accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()

## Train a convolutional neural network to identify spam

Convolutional neural networks (CNNs) are primarily used to classify images, but they can be helpful for text classification, too. One advantage of a CNN is that it can recognize patterns spanning several adjacent words and factor them into its predictions rather than treating words independently. Let's train a CNN and see if it can achieve a higher validation accuracy than the fully connected network above.

In [10]:
from keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D

model = Sequential() 
model.add(Embedding(max_words, 32, input_length=max_length)) 
model.add(Conv1D(32, 7, activation='relu'))
model.add(MaxPooling1D(5))
model.add(Conv1D(32, 7, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 500, 32)           640000    
                                                                 
 conv1d (Conv1D)             (None, 494, 32)           7200      
                                                                 
 max_pooling1d (MaxPooling1D  (None, 98, 32)           0         
 )                                                               
                                                                 
 conv1d_1 (Conv1D)           (None, 92, 32)            7200      
                                                                 
 global_max_pooling1d (Globa  (None, 32)               0         
 lMaxPooling1D)                                                  
                                                                 
 dense_4 (Dense)             (None, 1)                
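
The output shapes and parameter counts in this summary follow from the layer settings. The sketch below recomputes them, assuming the Keras defaults of `'valid'` padding and stride 1 for `Conv1D`, and a stride equal to the pool size for `MaxPooling1D`. The helper names are hypothetical. The summary above is cut off before the final `Dense` layer's parameter count and the totals, so the last two numbers printed here are computed by arithmetic rather than copied from the output.

```python
def conv1d_output_length(length, kernel_size):
    # 'valid' padding with stride 1: the kernel must fit entirely inside the input
    return length - kernel_size + 1

def max_pool1d_output_length(length, pool_size):
    # 'valid' padding with stride equal to the pool size
    return (length - pool_size) // pool_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    # One kernel of size kernel_size x in_channels per filter, plus one bias per filter
    return kernel_size * in_channels * filters + filters

after_conv1 = conv1d_output_length(500, 7)               # 494
after_pool = max_pool1d_output_length(after_conv1, 5)    # 98
after_conv2 = conv1d_output_length(after_pool, 7)        # 92
print(after_conv1, after_pool, after_conv2)              # 494 98 92

conv_params = conv1d_params(32, 32, 7)                   # 7200 for each Conv1D layer
dense_params = 32 * 1 + 1                                # 32 pooled features plus a bias
total = 20000 * 32 + 2 * conv_params + dense_params
print(conv_params, dense_params, total)                  # 7200 33 654433
```

A kernel size of 7 means each filter looks at seven consecutive word vectors at a time, which is how the network picks up on short phrases instead of isolated words.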

Train the network.

In [11]:
hist = model.fit(x, y, validation_split=0.2, epochs=5, batch_size=20)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Chart the training and validation accuracy for each epoch.

In [12]:
acc = hist.history['accuracy']
val = hist.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, '-', label='Training accuracy')
plt.plot(epochs, val, ':', label='Validation accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()

## Use the model to classify e-mails

Now let's see how the model classifies some sample e-mails. We'll start with a message that is not spam. The model's `predict` method predicts the probability that the input belongs to the positive class (spam).

In [13]:
cleaned_text = remove_stop_words('Can you attend a code review on Tuesday? Need to make sure the logic is rock solid.')
sequence = tokenizer.texts_to_sequences([cleaned_text])
padded_sequence = pad_sequences(sequence, maxlen=max_length)
model.predict(padded_sequence)[0][0]



0.3992149

Now test the model with a spam message.

In [14]:
cleaned_text = remove_stop_words('Why pay more for expensive meds when you can order them online and save $$$?')
sequence = tokenizer.texts_to_sequences([cleaned_text])
padded_sequence = pad_sequences(sequence, maxlen=max_length)
model.predict(padded_sequence)[0][0]



0.6705084
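
The two cells above repeat the same four steps: clean, tokenize, pad, predict. A small helper could bundle them. The sketch below is one possible shape for such a helper; the names `spam_probability` and `classify` are hypothetical, the 0.5 cutoff is a conventional choice rather than a tuned value, and the notebook objects are passed in as arguments so that the block itself runs without Keras.

```python
def spam_probability(text, clean, to_sequences, pad, predict):
    # Run the same four steps as the cells above: clean, tokenize, pad, predict
    cleaned = clean(text)
    sequence = to_sequences([cleaned])
    padded = pad(sequence)
    return float(predict(padded)[0][0])

def classify(probability, threshold=0.5):
    # Probabilities above the threshold count as spam
    return 'SPAM' if probability > threshold else 'NOT SPAM'

# In the notebook, the call could look like this (not executed here):
# p = spam_probability(
#     'Some e-mail text',
#     clean=remove_stop_words,
#     to_sequences=tokenizer.texts_to_sequences,
#     pad=lambda s: pad_sequences(s, maxlen=max_length),
#     predict=model.predict,
# )

# The two probabilities reported above, run through the threshold
print(classify(0.3992149))  # NOT SPAM
print(classify(0.6705084))  # SPAM
```

Both predictions land on the expected side of 0.5, though neither is far from it, so a different threshold or more training data could change the outcome.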

Finally, test the model with the most recent e-mail from my advisor.

In [16]:
# Test with a new email
test_email = "There is no group meeting today. Because of faculty recruiting,.. the end of this semester was a lot busier than expected. Since some of you will leave town soon, I wish you all happy holidays and all the best for the New Year via e-mail.  We will have a get together early next year."
cleaned_text = remove_stop_words(test_email)
sequence = tokenizer.texts_to_sequences([cleaned_text])
padded_sequence = pad_sequences(sequence, maxlen=max_length)
prediction = model.predict(padded_sequence)[0][0]

print(f"Email text: {test_email}")
print(f"Probability of being spam: {prediction:.2%}")
print(f"Classification: {'SPAM' if prediction > 0.5 else 'NOT SPAM'}")

Email text: There is no group meeting today. Because of faculty recruiting,.. the end of this semester was a lot busier than expected. Since some of you will leave town soon, I wish you all happy holidays and all the best for the New Year via e-mail.  We will have a get together early next year.
Probability of being spam: 14.33%
Classification: NOT SPAM


And now we have a working spam filter.