# IS4487 Week 12 - Practice Code

This notebook is designed to help you follow along with the **Week 12 Reading** on advanced data modeling techniques.
It includes practical code examples for the three machine learning models discussed:

- **Naïve Bayes Classifier** – For spam email detection  
- **Support Vector Machine (SVM)** – For fraud detection  
- **Neural Network** – For fraud detection using deep learning

Each section contains short explanations and annotated code that reflect the steps in the reading.

<a href="https://colab.research.google.com/github/Stan-Pugsley/is_4487_base/blob/main/Demos/demo_12_bayes_svm_neural.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



### Naïve Bayes: A Probabilistic Classifier

Naïve Bayes is a simple and efficient supervised machine learning algorithm based on Bayes' Theorem.
It's especially effective in text classification problems like spam filtering, where it estimates the probability
of each class from the words (features) present in a message, under the "naïve" assumption that features are independent of one another given the class.
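To make the Bayes' Theorem step concrete, here is a minimal sketch of the arithmetic for a single word. The probabilities are hypothetical values chosen only for illustration; they are not estimated from the notebook's emails.

```python
# Hypothetical probabilities, chosen only to illustrate the arithmetic
p_spam = 0.5                  # P(spam): prior probability that an email is spam
p_not_spam = 0.5              # P(not spam)
p_free_given_spam = 0.4       # P("free" appears | spam)
p_free_given_not_spam = 0.1   # P("free" appears | not spam)

# Total probability of seeing the word "free" in any email
evidence = p_free_given_spam * p_spam + p_free_given_not_spam * p_not_spam

# Bayes' Theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_free = p_free_given_spam * p_spam / evidence

print(round(p_spam_given_free, 3))  # prints 0.8
```

A Naïve Bayes classifier repeats this kind of update for every word in the message, multiplying the per-word likelihoods together because it treats the words as independent given the class.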

#### Context: Spam Email Detection
In this example, we use a small set of email text samples. Each email is labeled as either:
- `1` for **Spam** (e.g., containing promotional keywords like "win", "free", "prize")
- `0` for **Not Spam** (e.g., work-related messages like "project", "meeting")

The model learns the patterns of words that typically appear in spam vs. non-spam emails and makes predictions accordingly.


In [None]:
# Import required libraries for text processing and Naïve Bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample email dataset and their corresponding labels (1 = Spam, 0 = Not Spam)
emails = [
    "Win a free lottery ticket now",
    "Limited offer! Claim your free prize today",
    "Urgent! Update your bank account details",
    "Meeting tomorrow at 10 AM",
    "Project deadline extended to next week",
    "Hey, can you send me the report?",
    "Congratulations! You won a vacation trip",
    "Special discount just for you, act fast!",
    "Let's schedule a call for the project discussion",
    "Can we reschedule our meeting to next Monday?"
]
labels = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

Prepare Data

In [None]:
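# Split the raw emails first (70/30 split) so the vocabulary is learned from the training emails only
emails_train, emails_test, y_train, y_test = train_test_split(emails, labels, test_size=0.3, random_state=42)

# Convert text into numeric vectors using Bag of Words (CountVectorizer)
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(emails_train)  # Learn the vocabulary and count words
X_test = vectorizer.transform(emails_test)        # Reuse the training vocabulary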

Create Model

In [None]:
# Initialize and train a Multinomial Naïve Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)



Evaluate Model

In [None]:
# Predict on the test set and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Print test results and model performance
print("=== Spam Email Classifier Results ===\n")
for email, pred in zip(vectorizer.inverse_transform(X_test), y_pred):
    print(f"Email Words: {email} --> Predicted as {'Spam' if pred == 1 else 'Not Spam'}")

print("\nModel Accuracy:", round(accuracy * 100, 2), "%")
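With only ten emails, a 70/30 split leaves three test emails, so a single accuracy figure can swing by a third with each misclassification. One possible way to get a steadier estimate is cross-validation. The sketch below is an optional addition rather than part of the reading; it wraps the vectorizer and classifier in a pipeline so the vocabulary is re-learned inside each fold.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same ten emails and labels as in the example above (1 = Spam, 0 = Not Spam)
emails = [
    "Win a free lottery ticket now",
    "Limited offer! Claim your free prize today",
    "Urgent! Update your bank account details",
    "Meeting tomorrow at 10 AM",
    "Project deadline extended to next week",
    "Hey, can you send me the report?",
    "Congratulations! You won a vacation trip",
    "Special discount just for you, act fast!",
    "Let's schedule a call for the project discussion",
    "Can we reschedule our meeting to next Monday?"
]
labels = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

# Vectorizer + classifier in one estimator, refit from scratch on every fold
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())

# 5 folds of 2 emails each, keeping one spam and one non-spam email per fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, emails, labels, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", round(scores.mean() * 100, 2), "%")
```

Even with cross-validation, ten emails are far too few for a reliable estimate; the point is the workflow, not the number.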

### Support Vector Machine (SVM): Finding the Optimal Hyperplane

SVM is a powerful classification algorithm that works by finding the hyperplane that separates the classes with the widest possible margin.
It performs well on high-dimensional data and can use kernel functions (such as the RBF kernel used below) to handle non-linear boundaries.
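As a rough illustration of what a kernel function does, the sketch below computes the RBF (Gaussian) kernel by hand with NumPy. The helper name `rbf_kernel_value`, the sample points, and the `gamma` value are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Hypothetical points in a 3-feature space
origin = [0.0, 0.0, 0.0]
near = [0.1, 0.0, 0.0]
far = [3.0, 0.0, 0.0]
gamma = 0.5  # arbitrary value for illustration

k_same = rbf_kernel_value(origin, origin, gamma)  # identical points
k_near = rbf_kernel_value(origin, near, gamma)    # close points
k_far = rbf_kernel_value(origin, far, gamma)      # distant points

print(k_same, k_near, k_far)
```

The kernel acts as a similarity score: it equals 1 for identical points and decays toward 0 as points move apart. A larger `gamma` makes that decay faster, which gives the SVM a more flexible (and more easily overfit) decision boundary.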

#### Context: Fraud Detection (Structured Data)
This example uses a synthetic dataset of 1,000 credit card transactions. Each transaction is labeled as:
- `0` for **Legitimate** transactions (lower amounts, moderate frequency, older accounts)
- `1` for **Fraudulent** transactions (high amounts, frequent activity, newly opened accounts)

SVM tries to distinguish between the two classes based on three features: transaction amount, frequency, and account age.


In [None]:
# Import required libraries for data generation, visualization, and modeling
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Generate synthetic dataset for fraud detection
np.random.seed(42)
legit_transactions = np.random.normal(loc=[50, 5, 24], scale=[15, 2, 12], size=(900, 3))
fraud_transactions = np.random.normal(loc=[500, 20, 3], scale=[100, 5, 1], size=(100, 3))
X = np.vstack((legit_transactions, fraud_transactions))
y = np.array([0]*900 + [1]*100)

Prepare Data

In [None]:
# Split dataset (stratified 70/30) and standardize features; the scaler is fit on the training set only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
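The sketch below shows what `StandardScaler` computes, using a tiny hypothetical array rather than the transaction data: each feature is shifted by the training mean and divided by the training standard deviation, and the same training statistics are then applied to the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for two features (e.g., amount and frequency)
train = np.array([[50.0, 5.0], [70.0, 7.0], [30.0, 3.0]])
test = np.array([[500.0, 20.0]])

# Manual standardization using statistics from the training rows only
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)  # population standard deviation (ddof=0)
manual_test = (test - train_mean) / train_std

# The same computation with StandardScaler
demo_scaler = StandardScaler().fit(train)
sklearn_test = demo_scaler.transform(test)

matches = np.allclose(manual_test, sklearn_test)
print(matches)  # prints True
```

Scaling matters for SVMs because the RBF kernel depends on distances: without it, the transaction amount (in the hundreds) would dominate the other two features.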


Create Model

In [None]:
# Train the SVM with RBF kernel
svm = SVC(kernel='rbf', gamma='scale', C=1.0, random_state=42)
svm.fit(X_train, y_train)

Evaluate Model

In [None]:
# Predict and evaluate
y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(5, 4))
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d", xticklabels=["Legit", "Fraud"], yticklabels=["Legit", "Fraud"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix - SVM Fraud Detection")
plt.show()

# Print classification report and accuracy
print(classification_report(y_test, y_pred, target_names=["Legit", "Fraud"]))
print(f"Model Accuracy: {round(accuracy * 100, 2)}%")
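Because the synthetic dataset contains 900 legitimate and 100 fraudulent transactions, accuracy alone can be misleading. As a point of comparison, the sketch below scores a trivial "classifier" that labels every transaction as legitimate; the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Same class balance as the synthetic dataset: 900 legitimate, 100 fraudulent
y_true = np.array([0] * 900 + [1] * 100)

# A baseline that never flags fraud
y_always_legit = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_always_legit)
fraud_recall = recall_score(y_true, y_always_legit)  # share of fraud cases caught

print(f"Baseline accuracy: {baseline_accuracy:.2f}")  # prints 0.90
print(f"Fraud recall: {fraud_recall:.2f}")            # prints 0.00
```

A model needs to beat 90% accuracy to add any value here, which is why the classification report's precision and recall for the Fraud class are more informative than the headline accuracy.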

### Neural Networks: Learning Complex Patterns

Neural networks consist of layers of interconnected nodes (neurons). Each layer multiplies its inputs by learned weights, adds a bias, and applies a non-linear activation function.
They are flexible enough to model complex relationships in both structured data (such as transaction tables) and unstructured data (such as text or images).

#### Context: Fraud Detection Using Deep Learning
Using the same synthetic dataset as the SVM example, this neural network attempts to learn patterns that indicate fraudulent transactions.
Neural networks are especially useful when the relationship between input features and outcomes is non-linear or more complex.
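To show how inputs flow through weighted connections, here is a minimal NumPy sketch of a single forward pass. The layer sizes (3 inputs, 16 and 8 hidden units, 1 output) mirror the Keras model built in this section, but the weights are random and untrained and the input row is hypothetical, so the output probability is meaningless beyond illustrating the mechanics.

```python
import numpy as np

def relu(z):
    """ReLU activation: keeps positive values, zeroes out negatives."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid activation: squashes any value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Random (untrained) weights and zero biases for a 3 -> 16 -> 8 -> 1 network
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)

# One hypothetical, already-standardized transaction (amount, frequency, account age)
x = np.array([[0.5, -1.2, 0.3]])

h1 = relu(x @ W1 + b1)      # first hidden layer, shape (1, 16)
h2 = relu(h1 @ W2 + b2)     # second hidden layer, shape (1, 8)
p = sigmoid(h2 @ W3 + b3)   # output "probability of fraud", shape (1, 1)

print(h1.shape, h2.shape, p.shape)
```

Training adjusts `W1`, `W2`, `W3` and the biases so that the output moves closer to the true labels; Keras handles that step through `model.fit`.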


In [None]:
# Import required TensorFlow modules for neural networks
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Reuse X, y, and scaler from the SVM section: re-create the same split and standardize the features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Create Model

In [None]:
# Build the neural network model with two hidden layers
model = Sequential([
    Dense(16, activation='relu', input_shape=(3,)),  # First hidden layer (expects 3 input features)
    Dense(8, activation='relu'),                     # Second hidden layer
    Dense(1, activation='sigmoid')                   # Output layer (binary classification)
])

# Compile the model with optimizer and loss function
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

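# Train the model and store training history
# Note: the test set doubles as validation data here, so the validation curves are not an independent estimate
history = model.fit(X_train, y_train, epochs=50, batch_size=16, validation_data=(X_test, y_test), verbose=1)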
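As a quick sanity check on the architecture, the number of trainable parameters in a stack of Dense layers can be worked out by hand: each layer has one weight per input-output pair plus one bias per output unit. The sketch below does this for the 3 → 16 → 8 → 1 network defined above; the variable names are illustrative.

```python
# Units per layer: 3 input features, then Dense(16), Dense(8), Dense(1)
layer_sizes = [3, 16, 8, 1]

# Each Dense layer has (inputs * outputs) weights plus (outputs) biases
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # prints [64, 136, 9]
print(total_params)      # prints 209
```

These figures should match the "Param #" column reported by `model.summary()` for the same model.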

Evaluate Model

In [None]:
# Convert predicted probabilities to class labels (threshold 0.5) and evaluate performance
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(5, 4))
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d", xticklabels=["Legit", "Fraud"], yticklabels=["Legit", "Fraud"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix - Neural Network Fraud Detection")
plt.show()

# Print classification report and accuracy
print(classification_report(y_test, y_pred, target_names=["Legit", "Fraud"]))
print(f"Model Accuracy: {round(accuracy * 100, 2)}%")

# Plot training/validation loss and accuracy over epochs
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training & Validation Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training & Validation Accuracy')
plt.legend()
plt.show()