# SMS Phishing Detection Using Advanced NLP Models

In [None]:
# Edit all the Markdown cells below with the appropriate information
# Run all cells, containing your code
# Save this Jupyter with the outputs of your executed cells
#
# PS: Save again the notebook with this outcome.
# PSPS: Don't forget to include the dataset in your submission

**Team:**
* Abahana Zelalem
* Benjamin Arnosti
* Brandon Botezr

**Course:** AI 574 – Natural Language Processing (Fall, 2024)

## Problem Statement
The objective of this project is to develop an effective SMS phishing detection system by leveraging cutting-edge NLP models such as BERT, RoBERTa, and GPT-3. These transformer-based models have revolutionized language understanding and exhibit the capability to detect phishing attempts in a more context-aware manner. By comparing their performance to traditional machine learning models like Logistic Regression, Support Vector Machines (SVM), and Long Short-Term Memory (LSTM) networks, this project aims to highlight the strengths and limitations of each approach. 


Our system will analyze both the textual content of SMS messages and any embedded features, such as URLs, email addresses, or phone numbers, to classify messages as smishing, spam, or legitimate (ham). While traditional models rely on handcrafted features and are generally faster, they may lack the sophistication needed to handle the complexities of modern phishing techniques. Transformer-based models, on the other hand, can capture nuanced patterns in short messages, which is crucial for detecting sophisticated phishing attacks hidden in mobile text formats.


By comparing traditional and advanced models, the project will provide insights into which models strike the best balance between performance, interpretability, and computational efficiency for real-time phishing detection in mobile environments.


* **Keywords:** SMS, Phishing, Smishing, text, phone, Natural Language Processing

## Data Collection
* Source (url): https://data.mendeley.com/datasets/f45bkkt8pr/1/
* Short Description: The dataset is a set of labeled text messages collected for SMS phishing research. It has 5971 text messages labeled as legitimate (ham), spam, or smishing: 4844 ham messages, 489 spam messages, and 638 smishing messages (Mishra & Soni, 2022).

* Keywords: SMS, Phishing, Smishing, text, phone

## Required packages

- pandas
- numpy
- scikit-learn (imported as `sklearn`)
- PyTorch (imported as `torch`)
- transformers by HuggingFace
- datasets by HuggingFace
- matplotlib
- imbalanced-learn (imported as `imblearn`)

* These libraries can be installed with Conda inside a virtual environment. For specific information on installing PyTorch, see: https://pytorch.org/get-started/locally/. For information on HuggingFace transformers or datasets, see: https://huggingface.co/learn/nlp-course/chapter0/1?fw=pt.


## Library Imports

In [None]:
## Your code begins here
import os

# Basic imports
import pandas as pd
import numpy as np
import sklearn.utils

# PyTorch Imports
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# May use transformers
import transformers
# Get BERT from HF as well as Trainer class
from transformers import AutoTokenizer, BertForSequenceClassification, Trainer, TrainingArguments, DataCollatorWithPadding
from datasets import Dataset, load_dataset

# Sklearn Metric items and splitting
from sklearn import metrics
from sklearn.model_selection import train_test_split

# Plotting
import matplotlib.pyplot as plt

## GPU Setup

In [None]:
# Trigger the GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
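# get_device_name() raises an error without a GPU, so only call it when CUDA is available
device_name = torch.cuda.get_device_name() if torch.cuda.is_available() else 'CPU'
print(f'You are using {device} on a {device_name}.')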

## Model and Tokenizer Loading

Models and tokenizers will be loaded here so they can be used later on.

In [None]:
# RNN Model?
# Should these go here or should they have their own notebook?

In [None]:
# API Calls?

In [None]:
# BERT Model Loads
# Load the BERT-base-cased model
BERT_model_name = 'google-bert/bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(BERT_model_name) # Keep on CPU
model = BertForSequenceClassification.from_pretrained(BERT_model_name, num_labels=3).to(device) # Move to GPU

## Data Preprocessing

We are using the (Mishra & Soni, 2022) dataset for the project.  This dataset contains a collection of labeled SMS messages with labels indicating whether they are smishing or legitimate.  While the dataset is largely clean, we have had to do some pre-processing to match ‘Spam’ and ‘spam’ as well as ‘Ham’ and ‘ham’ labels.  Each entry includes the label (ham, spam, smishing), the SMS message, and if any URL, email address, or phone number is present.  Table 1 shows a sample of the dataset.  The dataset itself contains 5971 text messages of which 4844 are ham, 489 are spam, and 638 are smishing.


In [None]:
# Load the dataset from local directory
ds = pd.read_csv("../data/processed/Dataset_5971.csv")
ds

The Trainer expects the target column to be named `labels`, so I'll rename `LABEL` to `labels` here. Converting the labels to integers for classification comes later.

In [None]:
cols = {'LABEL':'labels'}
ds.rename(columns=cols, inplace=True)
ds

The labels Spam and spam are the same, as are Smishing and smishing, so I'll convert them all to lowercase.

In [None]:
ds['labels'] = ds['labels'].str.lower()
ds.labels.unique()

In [None]:
# A distribution of the message labels
# One bin for each of the 3 categories (ham, spam, smishing)
plt.hist(ds['labels'], bins=3)
plt.title('Training Data Labels');

The histogram above shows that the data is heavily weighted toward ham. We may need to balance this dataset.
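
One alternative to resampling is to weight the loss by inverse class frequency. The sketch below is an illustration only and is not part of this notebook's pipeline. It applies the common "balanced" heuristic, weight = n_samples / (n_classes * class_count), to the class counts reported in the dataset description. The helper name `balanced_class_weights` is hypothetical.

```python
# Illustrative sketch: inverse-frequency ("balanced") class weights.
# Counts are those reported for the dataset (4844 ham, 489 spam, 638 smishing).
def balanced_class_weights(counts):
    """Return weight = n_samples / (n_classes * count) for each class."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

label_counts = {"ham": 4844, "spam": 489, "smishing": 638}
class_weights = balanced_class_weights(label_counts)

for label, weight in class_weights.items():
    print(f"{label:>8}: {weight:.3f}")
```

Weights like these could be passed, ordered by label id, to a weighted loss such as `torch.nn.CrossEntropyLoss(weight=...)`. Whether that helps more than undersampling here would need to be tested.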

I'm also curious to see how long the longest message in our data is, since BERT accepts at most 512 tokens.

In [None]:
max_len = 0
for item in ds['TEXT']:
    if len(item) > max_len:
        max_len = len(item)

print(f'The max text length is: {max_len}')

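The longest message exceeds 512 characters. Note that this is a character count, while BERT's limit of 512 is measured in tokens, so this is only a rough check. I wonder what the average length is for the dataset.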

In [None]:
storage = 0
for item in ds['TEXT']:
    storage += len(item)

print(f'The average text length is: {storage/len(ds):.2f}')

That's a lot better for BERT.  What's the distribution of lengths?

In [None]:
length_list = []
for item in ds['TEXT']:
    length_list.append(len(item))

plt.hist(length_list);

Most messages are short, so padding every input to 512 tokens will add a lot of padding.
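
To get a feel for the cost of that padding, the sketch below compares the number of token slots used when every sequence is padded to a fixed 512 with padding each batch only to its longest member (dynamic padding). The helper names are hypothetical and the token lengths in `example_lengths` are made-up values, not measurements from this dataset.

```python
# Illustrative sketch: token slots used by fixed-length vs. per-batch padding.
def padded_slots_fixed(lengths, max_length=512):
    """Every sequence is truncated or padded to exactly max_length."""
    return len(lengths) * max_length

def padded_slots_dynamic(lengths, batch_size, max_length=512):
    """Each batch is padded to its longest (truncated) sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = [min(n, max_length) for n in lengths[start:start + batch_size]]
        total += max(batch) * len(batch)
    return total

# Hypothetical token lengths for eight short messages (not real data).
example_lengths = [12, 30, 25, 18, 40, 22, 16, 35]

fixed = padded_slots_fixed(example_lengths)
dynamic = padded_slots_dynamic(example_lengths, batch_size=4)
print(f"fixed: {fixed} slots, dynamic: {dynamic} slots")
```

In the Hugging Face stack, `DataCollatorWithPadding` pads each batch to its longest sequence, so tokenizing without `padding='max_length'` would be one way to get the dynamic behavior.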

In [None]:
# Set values to categories.
ds[['labels', 'URL', 'EMAIL', 'PHONE']] = ds[['labels', 'URL', 'EMAIL', 'PHONE']].astype('category')
ds.info()

In [None]:
# Set category values to numeric values via codes
ds['labels'] = ds['labels'].cat.codes
ds['URL'] = ds['URL'].cat.codes
ds['EMAIL'] = ds['EMAIL'].cat.codes
ds['PHONE'] = ds['PHONE'].cat.codes
ds
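
`cat.codes` numbers the categories in sorted order, so it helps to keep an explicit mapping for reading reports and confusion matrices later. The sketch below mimics that ordering in plain Python, assuming the only labels left after lowercasing are ham, smishing, and spam. The names `label2id` and `id2label` and the toy labels are hypothetical.

```python
# Illustrative sketch: reproduce a sorted-category integer coding in plain Python.
raw_labels = ["ham", "Spam", "smishing", "ham", "Smishing", "spam"]  # toy examples

# Lowercase first so that 'Spam' and 'spam' share one id.
normalized = [label.lower() for label in raw_labels]

# Sorted unique labels get consecutive ids, starting at 0.
label2id = {name: idx for idx, name in enumerate(sorted(set(normalized)))}
id2label = {idx: name for name, idx in label2id.items()}

encoded = [label2id[label] for label in normalized]
print(label2id)
print(encoded)
```

Keeping `id2label` around makes it possible to print class names instead of bare integers in later evaluation cells.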

In [None]:
# Split the dataset: 80% training, then split the remaining 20% into validation (70% of it) and test (30% of it).
train_ds, temp_ds = train_test_split(ds[['labels','TEXT', 'URL', 'EMAIL', 'PHONE']], test_size=0.2, random_state=226)
val_ds, test_ds = train_test_split(temp_ds, test_size=0.3, random_state=226)


train_ds = train_ds.reset_index(drop=True)
val_ds = val_ds.reset_index(drop=True)
test_ds = test_ds.reset_index(drop=True)

print(f'Lengths of training: {len(train_ds)}')
print(f'Lengths of validation: {len(val_ds)}')
print(f'Lengths of test: {len(test_ds)}')

In [None]:
# Take another look at the dataset.
train_ds

How do the distributions between the train and test set look?

In [None]:
# Training distribution between ham, spam, smish
plt.hist(train_ds['labels']);

In [None]:
# Val distribution between ham, spam, smish
plt.hist(val_ds['labels']);

In [None]:
# Test distribution between ham, spam, smish
plt.hist(test_ds['labels'])

The distributions are similar.
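
A plain random split keeps class proportions similar only on average, while a stratified split enforces them per class. The sketch below is a dependency-free illustration of the idea (scikit-learn's `train_test_split` offers a `stratify` argument for the same purpose). The helper `stratified_split` is hypothetical and the toy labels are not the real data.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction, seed=226):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with roughly the dataset's imbalance (80 ham, 10 smishing, 10 spam).
toy_labels = [0] * 80 + [1] * 10 + [2] * 10
train_idx, test_idx = stratified_split(toy_labels, test_fraction=0.2)

print(Counter(toy_labels[i] for i in train_idx))
print(Counter(toy_labels[i] for i in test_idx))
```

With small minority classes, stratifying guards against a validation or test split that happens to contain very few spam or smishing messages.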

In [None]:
from datasets import DatasetDict, Dataset # bring this in again to make sure I have the right one.

# Move the train, val, and test into datasets to then be moved to a DatasetDict
train_dataset = Dataset.from_pandas(train_ds)
val_datset = Dataset.from_pandas(val_ds)
test_dataset = Dataset.from_pandas(test_ds)

train_dataset

In [None]:
# Move data to the DatasetDict
data = DatasetDict({
    'train': train_dataset,
    'val': val_datset,
    'test': test_dataset
})
data

This matches the above histogram.

We now have all the message data mapped to three integer labels (ham, smishing, spam).  We are now in a good place to begin tokenizing and fine-tuning the model.

### Tokenizing the Data

We can now tokenize all of the messages using the pretrained BERT tokenizer.
Let's just run a quick tokenizer test to ensure we know how it works.

In [None]:
# A quick test to ensure the tokenizer is working as expected.
test = tokenizer('Hello there.')
print(f"Input_ids: {test['input_ids']}\nConversion back: {tokenizer.convert_ids_to_tokens(test['input_ids'])}")


In [None]:
print(f'Length of the tokenizer: {len(tokenizer)}\nCurrent word_embedding: {model.bert.embeddings.word_embeddings}')

Now let's tokenize everything with a function to use with map.

In [None]:
# Create a tokenizing function to apply via a map
def tokenize_the_data(dskey):
    # Set the max length to 512 as that is the BERT max.
    tokenized_data = tokenizer(dskey['TEXT'], padding='max_length', max_length=512, truncation=True,
                               return_tensors='pt', return_attention_mask=True )
    return tokenized_data

In [None]:
# Tokenize the data
tokenized_dataset = DatasetDict({
    'train': train_dataset.map(tokenize_the_data),
    'val': val_datset.map(tokenize_the_data),
    'test': test_dataset.map(tokenize_the_data)
})

In [None]:
# Check the tokenized_dataset for its layout
tokenized_dataset

In [None]:
# Check some tokenized dataset values to ensure it worked as intended.
text_holder = np.array(tokenized_dataset['train']['input_ids'][0]).flatten()
# text_holder = text_holder[0]
print(f"Input_ids: {tokenized_dataset['train']['input_ids'][0]}\nConversion back: {tokenizer.convert_ids_to_tokens(text_holder)}")

### Data Cleanup and Formatting for Torch

The tokenized dataset still has extra columns that are no longer needed.  While they should be automatically removed for the forward pass in the Trainer, I'll remove them manually here and save the Trainer the effort.

I'll also end up running into an issue if I don't `squeeze()` my data as the trainer is looking for shapes of (batch, seq_len).  Right now there is an extra dimension that needs to be taken out which I'll do here as well.

In [None]:
# Look at the dataset
tokenized_dataset

In [None]:
# Remove the unused features and reduce the dataset
tokenized_dataset_reduced = tokenized_dataset.remove_columns(['TEXT', 'URL', 'EMAIL', 'PHONE', 'token_type_ids'])
tokenized_dataset_reduced

In [None]:
# Does everything exist that should exist.
# I did drop token_type_ids which may be able to be passed along later... I'll have to get this working first.
assert 'input_ids' in tokenized_dataset_reduced['train'].column_names
assert 'attention_mask' in tokenized_dataset_reduced['train'].column_names
assert 'labels' in tokenized_dataset_reduced['train'].column_names

In [None]:
tokenized_dataset_reduced.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])

There is still a shaping issue that must be resolved for training.  Because the tokenizer was called on one example at a time with `return_tensors='pt'`, each `input_ids` and `attention_mask` entry carries an extra leading dimension of size 1.  I'll squeeze that out here.

In [None]:
# Showing the poor shaping
# Double quotes outside so the nested single quotes also parse on Python versions before 3.12
print(f"Poor shape of input_ids:      {tokenized_dataset_reduced['train'][0]['input_ids'].shape}")
print(f"Poor shape of attention_mask: {tokenized_dataset_reduced['train'][0]['attention_mask'].shape}")
print(f"Poor shape of labels:         {tokenized_dataset_reduced['train'][0]['labels'].shape}")

In [None]:
# Function to squeeze my dimensions
def squeeze_dims(dskey):
    if 'input_ids' in dskey:
        dskey['input_ids'] = torch.squeeze(dskey['input_ids'])
    if 'attention_mask' in dskey:
        dskey['attention_mask'] = torch.squeeze(dskey['attention_mask'])
    if 'labels' in dskey:
        dskey['labels'] = torch.squeeze(dskey['labels'])

    return dskey

In [None]:
# Squeeze the dimensions.
tokenized_dataset_reduced = tokenized_dataset_reduced.map(squeeze_dims)

In [None]:
# Showing the good shaping
print(f"Good shape of input_ids:      {tokenized_dataset_reduced['train'][0]['input_ids'].shape}")
print(f"Good shape of attention_mask: {tokenized_dataset_reduced['train'][0]['attention_mask'].shape}")
print(f"Good shape of labels:         {tokenized_dataset_reduced['train'][0]['labels'].shape}")

In [None]:
# Check the first few labels and their types to double check they are integers for the categorization

print(tokenized_dataset_reduced['train']['labels'][:10])  # First 10 labels
print(type(tokenized_dataset_reduced['train']['labels'][0]))  # Type of the first label

We now have a working dataset to use with the models.

### Setting up an Undersampled Dataset

Because the data is so heavily weighted toward ham, there is a chance of issues when training the models.  Here we'll set up a second dataset whose training split is undersampled from the full dataset.

In [None]:
# Undersampling setup

from imblearn import under_sampling
from datasets import Dataset, DatasetDict # Make sure I have the right classes

# With its default strategy, RandomUnderSampler cuts every class down to the
# size of the smallest class, sampling without replacement.
rus = under_sampling.RandomUnderSampler(random_state=226, replacement=False)

# Convert the training split to a DataFrame
train_df = data['train'].to_pandas()

# Ensure labels are in integer format
train_df['labels'] = train_df['labels'].astype(int)

# Separate features and labels
X = train_df.drop(columns=['labels'])
y = train_df['labels']

# Apply RandomUnderSampler
X_under, y_under = rus.fit_resample(X, y)

# Combine the resampled features and labels back into a DataFrame
train_resampled = X_under.copy()
train_resampled['labels'] = y_under

# Build a new DatasetDict: undersampled train split, original val and test splits
data_usample = DatasetDict({
    'train': Dataset.from_pandas(train_resampled, preserve_index=False),
    'val': data['val'],
    'test': data['test']
})
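
For intuition, random undersampling without replacement can be sketched in a few lines of plain Python. This illustrates the idea only and is not imblearn's implementation. The helper `random_undersample` is hypothetical and the messages are toy data.

```python
import random
from collections import Counter, defaultdict

def random_undersample(rows, label_key="labels", seed=226):
    """Keep as many rows per class as the smallest class has, sampled without replacement."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    # Every class is reduced to the size of the smallest class.
    n_keep = min(len(group) for group in by_class.values())
    sampled = []
    for group in by_class.values():
        sampled.extend(rng.sample(group, n_keep))
    rng.shuffle(sampled)
    return sampled

# Toy rows: 8 ham, 3 smishing, 2 spam.
toy_rows = (
    [{"TEXT": f"ham message {i}", "labels": 0} for i in range(8)]
    + [{"TEXT": f"smishing message {i}", "labels": 1} for i in range(3)]
    + [{"TEXT": f"spam message {i}", "labels": 2} for i in range(2)]
)

balanced_rows = random_undersample(toy_rows)
print(Counter(row["labels"] for row in balanced_rows))
```

The trade-off is visible even in the toy case: balancing the classes discards most of the ham examples, so the undersampled training set is much smaller than the original.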

In [None]:
# Check the new DatasetDict
data_usample

In [None]:
# Make sure the undersampling is even
plt.hist(data_usample['train']['labels']);

Now we must go through all the data cleanup and formatting again.

In [None]:
# Rebuild the data for training with the undersampled set

# Tokenize the undersampled
tokenized_usample_dataset = DatasetDict({
    'train': data_usample['train'].map(tokenize_the_data),
    'val': data_usample['val'].map(tokenize_the_data),
    'test': data_usample['test'].map(tokenize_the_data)
})

# Remove extra columns (from the undersampled dataset, not the full one)
tokenized_usample_dataset_reduced = tokenized_usample_dataset.remove_columns(['TEXT', 'URL', 'EMAIL', 'PHONE', 'token_type_ids'])
tokenized_usample_dataset_reduced

# Does everything exist that should exist.
# I did drop token_type_ids which may be able to be passed along later... I'll have to get this working first.
assert 'input_ids' in tokenized_usample_dataset_reduced['train'].column_names
assert 'attention_mask' in tokenized_usample_dataset_reduced['train'].column_names
assert 'labels' in tokenized_usample_dataset_reduced['train'].column_names

# Set the Tensor format
tokenized_usample_dataset_reduced.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])

# Showing the poor shaping
print(f"Poor shape of input_ids:      {tokenized_usample_dataset_reduced['train'][0]['input_ids'].shape}")
print(f"Poor shape of attention_mask: {tokenized_usample_dataset_reduced['train'][0]['attention_mask'].shape}")
print(f"Poor shape of labels:         {tokenized_usample_dataset_reduced['train'][0]['labels'].shape}")

tokenized_usample_dataset_reduced = tokenized_usample_dataset_reduced.map(squeeze_dims)

print(f"Good shape of input_ids:      {tokenized_usample_dataset_reduced['train'][0]['input_ids'].shape}")
print(f"Good shape of attention_mask: {tokenized_usample_dataset_reduced['train'][0]['attention_mask'].shape}")
print(f"Good shape of labels:         {tokenized_usample_dataset_reduced['train'][0]['labels'].shape}\n")

# Check the labels
# Check the first few labels and their types to double check they are integers for the categorization
print(tokenized_usample_dataset_reduced['train']['labels'][:10])  # First 10 labels
print(type(tokenized_usample_dataset_reduced['train']['labels'][0]))  # Type of the first label

Now we have an undersampled dataset to use as well.

## Methodology

1. Explain your Deep Learning process / methodology

Our approach to classify ham, spam, and smishing messages using neural networks involves the following stages:
1. Text Preprocessing: The messages will undergo tokenization, stopword removal, and lemmatization. Additionally, URL features will be extracted and processed separately.
2. Traditional Models: We will use TensorFlow to build traditional models such as Logistic Regression, SVM, and LSTM.  These baseline models will be used to classify messages based on TF-IDF features.
3. Advanced Models:
    - BERT: We will fine-tune a BERT model for SMS classification, leveraging its ability to capture contextual relationships even in short texts.  This BERT model uses its own tokenizer on the full, unedited dataset.
    - `RoBERTa: As a more robust version of BERT, RoBERTa will be used for text classification, aiming to improve the precision of phishing detection.` - **REMOVE THIS??**
    - LLMs accessed via API: We will experiment with an LLM accessed via API for message generation to simulate phishing SMS scenarios and evaluate how well it detects sophisticated phishing attacks.  Some models we plan on accessing are Google’s Gemini and OpenAI’s GPT-4.
4. `Ensemble Method: We will explore ensemble learning to combine the predictions from different models, including both textual and URL-based classifiers.` - **REMOVE THIS??**


2. Introduce the Deep Neural Networks you used in your project
 * RNN - [type?]
    * Description 
 
 * BERT
    * BERT is a transformer model that learns bidirectional representations by predicting masked words and sentence relationships, enabling strong performance across various NLP tasks.
 
 * LLM
     * Description 
 
 
**Keywords:** natural language processing, recurrent neural networks, transformers, sentiment analysis, multi-class classification, prediction, large language models

___

**Example**
* ConvNet
    * A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery (source: Wikipedia).
 
* **Keywords:** supervised learning, classification, ...

### Model Fitting and Validation

1. model 1
    - description
2. model 2
    - description

### RNN

In [None]:
## Your code begins here

#### Model Evaluation

* Examine your models (coefficients, parameters, errors, etc...)

* Compute and interpret your results in terms of accuracy, precision, recall, ROC etc.

In [None]:
## Your code begins here


### BERT - Full Dataset

Now that we have tokenized data, let us fine-tune the BERT model.

This is done with the Trainer class which takes TrainingArguments.

In [None]:
# Set up the Training Args
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10, # Prior runs show that I may be over fitting the data at 25 epochs...
    weight_decay=0.01,
    logging_dir='./logs',
    # logging_steps=10, # This made my loss vs epoch plot too noisy...
    logging_strategy='epoch',
    save_strategy='epoch',
)

# Set up the Trainer

data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True, return_tensors='pt')

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset_reduced['train'],
    eval_dataset=tokenized_dataset_reduced['val'],  # Evaluate on the validation split; keep the test split for final predictions
    tokenizer=tokenizer,
    data_collator=data_collator
)

In [None]:
# Run the Trainer
trainer.train()

#### Model Evaluation

* Examine your models (coefficients, parameters, errors, etc...)

* Compute and interpret your results in terms of accuracy, precision, recall, ROC etc. 

In [None]:
# Check the training and validation losses
# Grab the losses from the trainer's log history.
training_losses = [log['loss'] for log in trainer.state.log_history if 'loss' in log]
evaluation_losses = [log['eval_loss'] for log in trainer.state.log_history if 'eval_loss' in log and 'epoch' in log]

# Plot the losses per epoch.
plt.plot(range(len(training_losses)), training_losses, label='train')
plt.plot(range(len(evaluation_losses)), evaluation_losses, label='eval')
plt.legend()
plt.title('BERT Training and Validation Loss per Epoch on Full Dataset');

At late epochs, we begin to see some signs of overfitting but they are not drastic yet.
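
One way to act on this observation is to stop training once the validation loss has not improved for a few epochs. The sketch below shows that patience logic in plain Python. The loss values are invented for illustration and `best_epoch_with_patience` is a hypothetical helper. The transformers library also provides an `EarlyStoppingCallback` for use with the Trainer; check its documentation for the TrainingArguments it requires.

```python
def best_epoch_with_patience(eval_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops at the first epoch where the loss has failed to improve
    on the best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(eval_losses) - 1

# Invented validation losses: improving at first, then drifting upward.
example_eval_losses = [0.21, 0.12, 0.09, 0.10, 0.11, 0.13]
best_epoch, stop_epoch = best_epoch_with_patience(example_eval_losses, patience=2)
print(f"best epoch: {best_epoch}, stopped at epoch: {stop_epoch}")
```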

In [None]:
# Generate predictions from the test set
preds = trainer.predict(tokenized_dataset_reduced['test'])

In [None]:
# Grab the logits
preds_logits = preds.predictions
print(f'Example logit: {preds_logits[0]}')

# Which label does the logit correspond to, use proper axis
preds_labels = np.argmax(preds_logits, axis=1)
print(f'Class Label: {preds_labels[0]}')

# Get the actual labels from the test set
true_labels = preds.label_ids
print(f'True Label: {true_labels[0]}')
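
The logits are unnormalized scores. A softmax turns them into class probabilities, and the argmax of either gives the same predicted label. A minimal sketch with made-up logits (not output from this model):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

example_logits = [4.0, -1.5, -2.0]  # made-up values for one message
probs = softmax(example_logits)

# Index of the largest probability, i.e. the predicted class id.
predicted = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 4) for p in probs])
print(f"predicted class id: {predicted}")
```

The probabilities are what a ROC analysis would use as scores, since the predicted labels alone carry no ranking information.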

In [None]:
# Run some metrics on the classification of the model
# cat.codes assigns ids in sorted category order: 0 = ham, 1 = smishing, 2 = spam
class_report = metrics.classification_report(true_labels, preds_labels,
                                             target_names=['ham', 'smishing', 'spam'])

print("Classification Report:\n", class_report)

In [None]:
# Confusion matrix to see where things are going sideways
cm = metrics.confusion_matrix(true_labels, preds_labels)
disp = metrics.ConfusionMatrixDisplay(cm)
disp.plot()
plt.title("Confusion Matrix for BERT on Full Dataset")
plt.show();
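
The per-class numbers in a classification report can be read directly off a confusion matrix. With rows as true labels and columns as predictions (scikit-learn's convention), recall divides the diagonal entry by its row sum and precision divides it by its column sum. The matrix below is invented for illustration and `per_class_scores` is a hypothetical helper.

```python
def per_class_scores(cm):
    """Return a list of (precision, recall, f1) per class for a square confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[row][k] for row in range(n))  # column sum
        actual_k = sum(cm[k])                              # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

# Invented 3-class confusion matrix (rows: true, columns: predicted).
example_cm = [
    [95, 3, 2],
    [4, 14, 2],
    [1, 3, 6],
]
example_scores = per_class_scores(example_cm)
for class_id, (p, r, f1) in enumerate(example_scores):
    print(f"class {class_id}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

For smishing detection, the recall of the smishing class is likely the number to watch, since a missed smishing message is usually costlier than a false alarm.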

In [None]:
# Check the distribution of the predictions as well
# Overlay the histograms

plt.hist(preds_labels, bins=3, color='blue', alpha=0.5, label='Predictions')
plt.hist(true_labels, bins=3, color='orange', alpha=0.5, label='Truth')
plt.legend()
plt.xlabel('Category')
plt.ylabel('Count')
plt.title('Prediction and Truth Histograms from BERT on Full Dataset')
plt.show();

In [None]:
# ROC CURVE
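
For a three-class problem, ROC analysis is usually done one-vs-rest: treat one class as positive, use the model's score for that class, and repeat for each class. The sketch below computes the area under that curve with the rank (Mann-Whitney) formulation in plain Python. The scores are invented and `one_vs_rest_auc` is a hypothetical helper.

```python
def one_vs_rest_auc(scores, labels, positive_class):
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == positive_class]
    negatives = [s for s, y in zip(scores, labels) if y != positive_class]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative example.")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented scores for the class with id 1, alongside the true labels.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 2, 0]
auc = one_vs_rest_auc(example_scores, example_labels, positive_class=1)
print(f"one-vs-rest AUC for class 1: {auc:.3f}")
```

For plotting the curves themselves, scikit-learn's `metrics.roc_curve` and `metrics.roc_auc_score` would be the natural tools, applied to softmax probabilities rather than to the argmax labels.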


### BERT - Undersampled Dataset

In [None]:
# Build another BERT model for training
model_usamp = BertForSequenceClassification.from_pretrained(BERT_model_name, num_labels=3).to(device)

In [None]:
# Set up the Training Args
training_args_usamp = TrainingArguments(
    output_dir="./results_usamp",  # Separate directory so checkpoints from the full-dataset run are not overwritten
    eval_strategy="epoch",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10, # Prior runs show that I may be over fitting the data at 25 epochs...
    weight_decay=0.01,
    logging_dir='./logs_usamp',
    # logging_steps=10, # This made my loss vs epoch plot too noisy...
    logging_strategy='epoch',
    save_strategy='epoch',
)

# Setup trainer
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding=True, return_tensors='pt')

trainer_usamp = Trainer(
    model=model_usamp,
    args=training_args_usamp,
    train_dataset=tokenized_usample_dataset_reduced['train'],  # Train on the undersampled split
    eval_dataset=tokenized_usample_dataset_reduced['val'],
    tokenizer=tokenizer,
    data_collator=data_collator
)

In [None]:
# Run the trainer
trainer_usamp.train()

#### Model Evaluation

In [None]:
# Check the training and validation losses
# Grab the losses from the trainer's log history.
training_losses = [log['loss'] for log in trainer_usamp.state.log_history if 'loss' in log]
evaluation_losses = [log['eval_loss'] for log in trainer_usamp.state.log_history if 'eval_loss' in log and 'epoch' in log]

# Plot the losses per epoch.
plt.plot(range(len(training_losses)), training_losses, label='train')
plt.plot(range(len(evaluation_losses)), evaluation_losses, label='eval')
plt.legend()
plt.title('BERT Training and Validation Loss per epoch on Undersampled Dataset');

In [None]:
# Generate predictions from the full test set
preds = trainer_usamp.predict(tokenized_dataset_reduced['test'])

In [None]:
# Grab the logits
preds_logits = preds.predictions
print(f'Example logit: {preds_logits[0]}')

# Which label does the logit correspond to, use proper axis
preds_labels = np.argmax(preds_logits, axis=1)
print(f'Class Label: {preds_labels[0]}')

# Get the actual labels from the test set
true_labels = preds.label_ids
print(f'True Label: {true_labels[0]}')

In [None]:
# Run some metrics on the classification of the model
# cat.codes assigns ids in sorted category order: 0 = ham, 1 = smishing, 2 = spam
class_report = metrics.classification_report(true_labels, preds_labels,
                                             target_names=['ham', 'smishing', 'spam'])

print("Classification Report:\n", class_report)

In [None]:
# Confusion matrix to see where things are going sideways
cm = metrics.confusion_matrix(true_labels, preds_labels)
disp = metrics.ConfusionMatrixDisplay(cm)
disp.plot()
plt.title("Confusion Matrix for BERT on Undersampled Dataset")
plt.show();

In [None]:
# Check the distribution of the predictions as well
# Overlay the histograms

plt.hist(preds_labels, bins=3, color='blue', alpha=0.5, label='Predictions')
plt.hist(true_labels, bins=3, color='orange', alpha=0.5, label='Truth')
plt.legend()
plt.xlabel('Category')
plt.ylabel('Count')
plt.title('Prediction and Truth Histograms from BERT on Undersampled Dataset')
plt.show();

In [None]:
# ROC CURVE


### LLM Calls

In [None]:
## Your code begins here

#### Model Evaluation 

* Examine your models (coefficients, parameters, errors, etc...)

* Compute and interpret your results in terms of accuracy, precision, recall, ROC etc. 

In [None]:
## Your code begins here


### Issues / Improvements
1. Dataset is very small
2. Use regularization / initialization
3. Use cross-validation
4. ...

###  References
   - Academic (if any)
   - Online (if any)
	

### Credits

- If you use and/or adapt your code from existing projects, you must provide links and acknowledge the authors. Keep in mind that all documents in your projects and code will be checked against the official plagiarism detection tool used by Penn State ([Turnitin](https://turnitin.psu.edu))

> *This code is based on .... (if any)*

In [None]:
# End of Project