# **Deep Learning - AT&T**

## **Project**
One of the main pain points for AT&T users is constant exposure to spam messages.

AT&T has been flagging spam messages manually for some time, but is now looking for an automated way of detecting spam to protect its users.

## **Goals**
The goal is to build a spam detector that automatically flags spam messages as they arrive, based solely on the content of the SMS.
The study is structured into four key components:

1. **Initial explorations** of the data present in the dataset

2. **Data Cleaning and Preprocessing** 

3. **Model training**

4. **Performance analysis**


In [384]:
# Download the spaCy English model (en_core_web_sm)
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 3.0 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [385]:
# Library imports for project analysis

import pandas as pd
import io
import os
import re
import shutil
import string
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, GRU, LSTM
import spacy
import plotly.graph_objects as go

### 1. **Initial explorations** of the data present in the dataset

In [386]:
df = pd.read_csv("spam.csv", encoding='cp1252')

In [387]:
print("Number of rows and columns:  : {}".format(df.shape))
print()

print("Display of dataset: ")
display(df.head())
print()

print("Basics statistics: ")
data_desc = df.describe(include='all')
display(data_desc)
print()

print("Percentage of missing values: ")
display(100*df.isnull().sum()/df.shape[0])

Number of rows and columns:  : (5572, 5)

Display of dataset: 


| | v1 | v2 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | | | |
| 1 | ham | Ok lar... Joking wif u oni... | | | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | | | |
| 3 | ham | U dun say so early hor... U c already then say... | | | |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | | | |



Basics statistics: 


| | v1 | v2 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| count | 5572 | 5572 | 50 | 12 | 6 |
| unique | 2 | 5169 | 43 | 10 | 5 |
| top | ham | Sorry, I'll call later | bt not his girlfrnd... G o o d n i g h t . . .@" | MK17 92H. 450Ppw 16" | GNT:-)" |
| freq | 4825 | 30 | 3 | 2 | 2 |



Percentage of missing values: 


v1             0.000000
v2             0.000000
Unnamed: 2    99.102656
Unnamed: 3    99.784637
Unnamed: 4    99.892319
dtype: float64

In [388]:
df["v1"].value_counts()

v1
ham     4825
spam     747
Name: count, dtype: int64

In [389]:
# Keep only the first two columns for analysis
# (.copy() avoids pandas' SettingWithCopyWarning when columns are added later)
df = df[["v1","v2"]].copy()
df.head()

| | v1 | v2 |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |


### 2. **Data Cleaning and Preprocessing** 

In [390]:
# Import Stop words and en_core_web_sm (NLP model)
from spacy.lang.en.stop_words import STOP_WORDS
nlp = spacy.load('en_core_web_sm')

In [391]:
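# Data Cleaning
# 1. Keep only alphanumeric characters and spaces
df["v2_format_clean"] = df["v2"].apply(lambda x:''.join(ch for ch in x if ch.isalnum() or ch==" "))
# 2. Collapse repeated spaces (re.sub is needed: str.replace does not interpret regex), lowercase, trim
df["v2_format_clean"] = df["v2_format_clean"].apply(lambda x: re.sub(" +", " ", x).lower().strip())
# 3. Lemmatize with spaCy and drop stop words
df["v2_format_clean"] = df["v2_format_clean"].apply(lambda x: " ".join([token.lemma_ for token in nlp(x) if (token.lemma_ not in STOP_WORDS) and (token.text not in STOP_WORDS)]))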

display(100*df["v2_format_clean"].isnull().sum()/df.shape[0])

0.0
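
The first two cleaning steps (character filtering and whitespace normalization) can be illustrated on a single string without spaCy. This is only a minimal sketch: `basic_clean` is a hypothetical helper name that does not appear in the notebook, and lemmatization and stop-word removal are deliberately left out.

```python
import re

def basic_clean(text: str) -> str:
    """Keep alphanumeric characters and spaces, collapse space runs, lowercase."""
    # Drop every character that is neither alphanumeric nor a space
    kept = "".join(ch for ch in text if ch.isalnum() or ch == " ")
    # re.sub treats " +" as a regular expression; str.replace would look for
    # the literal two-character string " +" and leave repeated spaces alone
    collapsed = re.sub(r" +", " ", kept)
    return collapsed.lower().strip()

print(basic_clean("Ok lar...  Joking wif u oni..."))
```

Removing punctuation can leave runs of spaces behind, which is why the collapse step comes after the character filter.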

In [392]:
df.head()

| | v1 | v2 | v2_format_clean |
|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | jurong point crazy available bugis n great wor... |
| 1 | ham | Ok lar... Joking wif u oni... | ok lar joke wif u oni |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | free entry 2 wkly comp win fa cup final tkts 2... |
| 3 | ham | U dun say so early hor... U c already then say... | u dun early hor u c |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | nah think usf live |


In [393]:
# Encode ham and spam into 0 and 1
label_encoder = LabelEncoder()
df['v1'] = label_encoder.fit_transform(df['v1'])
df.head()

| | v1 | v2 | v2_format_clean |
|---|---|---|---|
| 0 | 0 | Go until jurong point, crazy.. Available only ... | jurong point crazy available bugis n great wor... |
| 1 | 0 | Ok lar... Joking wif u oni... | ok lar joke wif u oni |
| 2 | 1 | Free entry in 2 a wkly comp to win FA Cup fina... | free entry 2 wkly comp win fa cup final tkts 2... |
| 3 | 0 | U dun say so early hor... U c already then say... | u dun early hor u c |
| 4 | 0 | Nah I don't think he goes to usf, he lives aro... | nah think usf live |


In [394]:
# Encoding messages
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000) # instantiate the tokenizer
tokenizer.fit_on_texts(df["v2_format_clean"])
df["v2_encoded"] = tokenizer.texts_to_sequences(df.v2_format_clean)
df["len_v2"] = df["v2_encoded"].apply(lambda x: len(x))
df = df[df["len_v2"]!=0]
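
To see why some rows end up with an empty encoding and are dropped, the frequency-based indexing can be mimicked in plain Python. This is a simplified stand-in rather than the Keras implementation: the function names are hypothetical, there is no lowercasing or punctuation filtering, and no out-of-vocabulary token is assumed. It reproduces the one behavior that matters here: words ranked at or beyond `num_words` are silently skipped.

```python
from collections import Counter

def fit_word_index(texts):
    """Rank words by frequency; index 1 is the most frequent word (0 is left for padding)."""
    counts = Counter(word for text in texts for word in text.split())
    # sorted() is stable, so words with equal counts keep their first-seen order
    ranked = sorted(counts, key=lambda w: -counts[w])
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Map words to indices, dropping words whose index is >= num_words."""
    return [
        [word_index[w] for w in text.split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]

texts = ["free entry win", "ok lar", "win free prize", "rare"]
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index, num_words=4)
print(sequences)
```

Messages made up only of rare words map to an empty list, which is the case the `len_v2 != 0` filter removes.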

TensorFlow cannot directly create a tensor dataset from lists of different lengths, so all encoded texts must first be stored in a single NumPy array. Because the sequences do not all have the same length, they need to be padded: `tf.keras.preprocessing.sequence.pad_sequences` adds zeros at the beginning (`padding="pre"`) or at the end (`padding="post"`) of each sequence so that they all have equal length. The next cell pads the sequences.
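
The difference between the two padding modes can be shown with a hand-rolled illustration. The helpers `pad_post` and `pad_pre` below are hypothetical names written for this sketch; the notebook itself relies on `pad_sequences`, and truncation of long sequences is not covered here.

```python
def pad_post(sequences, value=0):
    """Right-pad every sequence with `value` up to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [value] * (max_len - len(seq)) for seq in sequences]

def pad_pre(sequences, value=0):
    """Left-pad every sequence with `value` up to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [[value] * (max_len - len(seq)) + list(seq) for seq in sequences]

encoded = [[328, 138, 26], [339, 80], [177]]
print(pad_post(encoded))
print(pad_pre(encoded))
```

With post-padding the real tokens stay at the start of each row, which matches the zero-filled tails visible in the batch printed further down.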

In [395]:
df_pad = tf.keras.preprocessing.sequence.pad_sequences(df.v2_encoded, padding="post")

In [396]:
full_ds = tf.data.Dataset.from_tensor_slices((df_pad, df.v1.values))

In [397]:
# Train Test Split
TAKE_SIZE = int(0.7*df.shape[0])

train_data = full_ds.take(TAKE_SIZE).shuffle(TAKE_SIZE)
train_data = train_data.batch(64)

test_data = full_ds.skip(TAKE_SIZE)
test_data = test_data.batch(64)

for mail, spam_ham in train_data.take(1):
  print(mail, spam_ham)

tf.Tensor(
[[328 138  26 ...   0   0   0]
 [339  80   0 ...   0   0   0]
 [ 30  68 104 ...   0   0   0]
 ...
 [177   0   0 ...   0   0   0]
 [  1   0   0 ...   0   0   0]
 [830 352 831 ...   0   0   0]], shape=(64, 47), dtype=int32) tf.Tensor(
[0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1], shape=(64,), dtype=int64)
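
The split above takes the first 70% of rows, in file order, for training and keeps the rest for validation. One quick sanity check, sketched here on toy labels rather than the notebook's data, is to compare the share of spam in the two partitions. The helper names are hypothetical.

```python
def split_sequential(items, train_fraction=0.7):
    """Split a list into a leading train part and a trailing test part."""
    cut = int(train_fraction * len(items))
    return items[:cut], items[cut:]

def positive_rate(labels):
    """Share of labels equal to 1 (spam)."""
    return sum(labels) / len(labels)

# Toy labels, not the notebook's data: 1 = spam, 0 = ham
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
train_labels, test_labels = split_sequential(labels)
print(len(train_labels), len(test_labels))
print(positive_rate(train_labels), positive_rate(test_labels))
```

If the two rates differed widely, shuffling before the split or a stratified split would be worth considering.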


### 3. **Model training**

a) Simple RNN

In [398]:
vocab_size = 1000
model = tf.keras.Sequential([
                  # Word Embedding layer           
                  Embedding(vocab_size+1, 64, input_shape=[mail.shape[1],],name="embedding"),
                  # Two stacked recurrent layers
                  SimpleRNN(units=64, return_sequences=True), # maintains the sequential nature
                  SimpleRNN(units=32, return_sequences=False), # returns the last output
                  # Dense layers once the data is flat
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  # output layer: a single neuron with sigmoid activation,
                  # giving the predicted probability that the message is spam
                  Dense(1, activation="sigmoid")
])

In [399]:
model.summary()

Model: "sequential_30"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 47, 64)            64064     
                                                                 
 simple_rnn_26 (SimpleRNN)   (None, 47, 64)            8256      
                                                                 
 simple_rnn_27 (SimpleRNN)   (None, 32)                3104      
                                                                 
 dense_96 (Dense)            (None, 16)                528       
                                                                 
 dense_97 (Dense)            (None, 8)                 136       
                                                                 
 dense_98 (Dense)            (None, 1)                 9         
                                                                 
Total params: 76097 (297.25 KB)
Trainable params: 760

In [400]:
optimizer= tf.keras.optimizers.Adam()

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])



In [401]:
# Weight Categories

weights = 1/(df["v1"]).value_counts()
weights = weights * len(df)/2
weights = {index : values for index , values in zip(weights.index,weights.values)}
weights

{0: 0.5798157274480394, 1: 3.632214765100671}
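
The weighting rule used above is `n_samples / (n_classes * class_count)`, so the rarer class receives the larger weight. A minimal sketch of the same formula follows, applied to the raw class counts printed earlier (4825 ham, 747 spam). The function name is hypothetical, and the result differs slightly from the dictionary above because the notebook computes its weights after rows with empty encodings have been dropped.

```python
def balanced_class_weights(counts):
    """weight[c] = n_samples / (n_classes * count[c]), so rarer classes weigh more."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Counts from the raw value_counts() output (0 = ham, 1 = spam)
weights_example = balanced_class_weights({0: 4825, 1: 747})
print(weights_example)
```

With these weights each class contributes the same total weight to the loss, which offsets the roughly 6:1 imbalance between ham and spam.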

In [402]:
# Model training 
model.fit(train_data,
          epochs=20, 
          validation_data=test_data,
          class_weight=weights)

Epoch 1/20
...
Epoch 20/20


<keras.src.callbacks.History at 0x355464bd0>

In [403]:
model.save("./model/model_simpleRNN.h5")
import json
# Use a context manager so the history file is flushed and closed
with open("./history/simpleRNN_history.json", 'w') as f:
    json.dump(model.history.history, f)


You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.



b) GRU

In [404]:
vocab_size = 1000
model_gru = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[mail.shape[1],],name="embedding"),
                  GRU(units=64, return_sequences=True), # maintains the sequential nature
                  GRU(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="sigmoid")
])

In [405]:
model_gru.summary()

Model: "sequential_31"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 47, 64)            64064     
                                                                 
 gru_18 (GRU)                (None, 47, 64)            24960     
                                                                 
 gru_19 (GRU)                (None, 32)                9408      
                                                                 
 dense_99 (Dense)            (None, 16)                528       
                                                                 
 dense_100 (Dense)           (None, 8)                 136       
                                                                 
 dense_101 (Dense)           (None, 1)                 9         
                                                                 
Total params: 99105 (387.13 KB)
Trainable params: 991

In [406]:
optimizer= tf.keras.optimizers.Adam()

model_gru.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])



In [407]:
model_gru.fit(train_data,
              epochs=20, 
              validation_data=test_data,
              class_weight=weights)

Epoch 1/20
...
Epoch 20/20


<keras.src.callbacks.History at 0x355941a10>

In [408]:
model_gru.save("./model/model_gru.h5")
import json
# Use a context manager so the history file is flushed and closed
with open("./history/GRU_history.json", 'w') as f:
    json.dump(model_gru.history.history, f)

c) LSTM

In [409]:
vocab_size = 1000

model_lstm = tf.keras.Sequential([
                  Embedding(vocab_size+1, 64, input_shape=[mail.shape[1],],name="embedding"),
                  LSTM(units=64, return_sequences=True), # maintains the sequential nature
                  LSTM(units=32, return_sequences=False), # returns the last output
                  Dense(16, activation='relu'),
                  Dense(8, activation='relu'),

                  Dense(1, activation="sigmoid", name="last")
])


In [410]:
model_lstm.summary()

Model: "sequential_32"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 47, 64)            64064     
                                                                 
 lstm_30 (LSTM)              (None, 47, 64)            33024     
                                                                 
 lstm_31 (LSTM)              (None, 32)                12416     
                                                                 
 dense_102 (Dense)           (None, 16)                528       
                                                                 
 dense_103 (Dense)           (None, 8)                 136       
                                                                 
 last (Dense)                (None, 1)                 9         
                                                                 
Total params: 110177 (430.38 KB)
Trainable params: 11

In [411]:
optimizer= tf.keras.optimizers.Adam()

model_lstm.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])



In [412]:
model_lstm.fit(train_data,
              epochs=100, 
              validation_data=test_data,
              class_weight=weights)

Epoch 1/100
...
Epoch 78/100
Epoch 7

<keras.src.callbacks.History at 0x35767b2d0>

In [413]:
model_lstm.save("./model/model_lstm.h5")
import json
# Use a context manager so the history file is flushed and closed
with open("./history/LSTM_history.json", 'w') as f:
    json.dump(model_lstm.history.history, f)

d) LSTM with Attention

In [414]:
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding, Attention
from tensorflow.keras.models import Model

# Input definition
inputs = Input(shape=(mail.shape[1],))

# Embedding
embedding_layer = Embedding(vocab_size+1, 64, input_length=mail.shape[1], name="embedding")
embedding_output = embedding_layer(inputs)

# First LSTM
lstm_output = LSTM(units=64, return_sequences=True)(embedding_output)

# Attention
attention_output = Attention()([lstm_output, lstm_output])

# Second LSTM with Attention
lstm_with_attention_output = LSTM(units=32, return_sequences=False)(attention_output)

dense_output = Dense(16, activation='relu')(lstm_with_attention_output)
dense_output = Dense(8, activation='relu')(dense_output)

outputs = Dense(1, activation="sigmoid", name="last")(dense_output)

# Model
model_LSTM_with_attention = Model(inputs=inputs, outputs=outputs)

model_LSTM_with_attention.summary()


Model: "model_7"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_8 (InputLayer)        [(None, 47)]                 0         []                            
                                                                                                  
 embedding (Embedding)       (None, 47, 64)               64064     ['input_8[0][0]']             
                                                                                                  
 lstm_32 (LSTM)              (None, 47, 64)               33024     ['embedding[0][0]']           
                                                                                                  
 attention_7 (Attention)     (None, 47, 64)               0         ['lstm_32[0][0]',             
                                                                     'lstm_32[0][0]']       
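
The `Attention()([lstm_output, lstm_output])` call passes the same tensor as query and value. As far as I understand the Keras layer, its default scoring is an unscaled dot product with the key defaulting to the value, which is consistent with the summary above: no trainable parameters and an output shape equal to the input shape. The pure-Python sketch below illustrates that computation on a toy sequence. It is an assumption-laden illustration, not the Keras code, and the function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_self_attention(sequence):
    """For each timestep, return a weighted average of all timesteps.

    Weights are softmax(query . key) with query = key = value = sequence,
    i.e. unscaled dot-product scores.
    """
    outputs = []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) for key in sequence]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, sequence))
            for d in range(len(query))
        ])
    return outputs

# Toy "LSTM outputs": 3 timesteps, 2 features each (made-up numbers)
toy_sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = dot_product_self_attention(toy_sequence)
print(len(attended), len(attended[0]))
```

Each output row is a convex combination of the input rows, so the second LSTM receives a sequence of the same shape in which every timestep has been mixed with the timesteps most similar to it.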

In [415]:
optimizer= tf.keras.optimizers.Adam()

model_LSTM_with_attention.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy()])



In [416]:
model_LSTM_with_attention.fit(train_data,
              epochs=100, 
              validation_data=test_data,
              class_weight=weights)

Epoch 1/100


2024-02-08 18:41:38.932581: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "CPU" model: "0" frequency: 2400 num_cores: 10 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }




2024-02-08 18:41:41.093645: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:693] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "CPU" model: "0" frequency: 2400 num_cores: 10 environment { key: "cpu_instruction_set" value: "ARM NEON" } environment { key: "eigen" value: "3.4.90" } l1_cache_size: 16384 l2_cache_size: 524288 l3_cache_size: 524288 memory_size: 268435456 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }


Epoch 2/100
...
Epoch 78/100
Epoch 7

<keras.src.callbacks.History at 0x345ebd8d0>

In [417]:
model_LSTM_with_attention.save("./model/model_LSTM_with_attention.h5")
import json
# Use a context manager so the history file is flushed and closed
with open("./history/LSTM_att_history.json", 'w') as f:
    json.dump(model_LSTM_with_attention.history.history, f)

### 4. **Performance analysis**

a) RNN

In [418]:
simpleRNN_history = json.load(open("./history/simpleRNN_history.json", 'r'))
model_simpleRNN = tf.keras.models.load_model("./model/model_simpleRNN.h5")
fig = go.Figure()
fig.add_trace(go.Scatter(y=simpleRNN_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=simpleRNN_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()




b) GRU

In [419]:
GRU_history = json.load(open("./history/GRU_history.json", 'r'))
model_gru = tf.keras.models.load_model("./model/model_gru.h5")

fig = go.Figure()
fig.add_trace(go.Scatter(y=GRU_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=GRU_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()



c) LSTM

In [420]:
LSTM_history = json.load(open("./history/LSTM_history.json", 'r'))
model_lstm = tf.keras.models.load_model("./model/model_lstm.h5")

fig = go.Figure()
fig.add_trace(go.Scatter(y=LSTM_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=LSTM_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()




d) LSTM with Attention

In [422]:
LSTM_wa_history = json.load(open("./history/LSTM_att_history.json", 'r'))
# Load into its own variable so the plain LSTM model is not overwritten
model_LSTM_with_attention = tf.keras.models.load_model("./model/model_LSTM_with_attention.h5")

fig = go.Figure()
fig.add_trace(go.Scatter(y=LSTM_wa_history["loss"],
                    mode='lines',
                    name='loss'))
fig.add_trace(go.Scatter(y=LSTM_wa_history["val_loss"],
                    mode='lines',
                    name='val_loss'))
fig.show()




### Conclusion 


This section compares the four models trained to classify SMS messages as spam or ham, and concludes on the best choice with overfitting and other factors in mind.

* RNN Model:
    * Achieves a binary accuracy of around 99.39% on the training set and 95.63% on the validation set.
    * Appears to generalize well, with validation accuracy improving consistently across epochs.
    * Shows minimal signs of overfitting: validation loss rises slightly towards the end, but accuracy stays high.

* GRU Model:
    * Starts with relatively poor performance, and binary accuracy drops significantly during training.
    * Does not appear to learn effectively, as indicated by the low accuracy on both training and validation sets.
    * Shows signs of severe overfitting, with a large gap between training and validation accuracies.

* LSTM Model:
    * Initially achieves a binary accuracy of around 86.17% on the training set and 97.54% on the validation set.
    * Maintains high accuracy throughout training, with a slight divergence between training and validation accuracies that suggests mild overfitting.
    * Nevertheless learns the patterns in the data and generalizes well.

* LSTM with Attention Model:
    * Starts with a binary accuracy of around 86.12% on the training set and 97.48% on the validation set.
    * Maintains high accuracy throughout training with minimal signs of overfitting, similar to the basic LSTM model.
    * Incorporates an attention mechanism, which may help the model focus on the important parts of the input sequence and could improve performance.


Considering both performance and the risk of overfitting, the LSTM with Attention model seems to be the most promising choice. It achieves high accuracy on both the training and validation sets while remaining resilient to overfitting. The attention mechanism also adds a degree of interpretability and could enhance performance further.
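
The comparison above is based on reading the loss curves. A small helper can put numbers on the same idea using the `loss` and `val_loss` lists stored in the history JSON files. This is a sketch under assumptions: `summarize_history` is a hypothetical name, and the toy values below are invented for illustration and are not results from this notebook.

```python
def summarize_history(history):
    """Return the best validation epoch and the final train/validation loss gap."""
    val_loss = history["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_epoch + 1,  # 1-based, like the Keras log
        "best_val_loss": val_loss[best_epoch],
        "final_gap": val_loss[-1] - history["loss"][-1],
    }

# Made-up numbers for illustration; not results from this notebook
toy_history = {
    "loss": [0.60, 0.30, 0.15, 0.08, 0.04],
    "val_loss": [0.50, 0.28, 0.22, 0.25, 0.31],
}
summary = summarize_history(toy_history)
print(summary)
```

Applied to a dictionary loaded from a file such as `./history/LSTM_history.json`, a best epoch well before the last one together with a large positive final gap would point to overfitting, and would suggest stopping training earlier.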