# TensorFlow/Keras Approach for Binary Classification

This section demonstrates how to build, train, and evaluate a simple deep learning model for binary classification using TensorFlow and Keras.

In [14]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam

# Generate a sample binary classification dataset with 1000 samples and 20 features
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a simple deep learning model using Keras Sequential API
keras_binary_classifier = Sequential([
    Input(shape=(X_train.shape[1],)),  # Explicit input layer: one value per feature (20 here)
    Dense(32, activation='relu'),      # First hidden layer with 32 units and ReLU activation
    Dense(16, activation='relu'),      # Second hidden layer with 16 units and ReLU activation
    Dense(1, activation='sigmoid')     # Output layer with sigmoid activation for binary classification
])

# Compile the model with Adam optimizer and binary cross-entropy loss
keras_binary_classifier.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Train the model for 10 epochs with batch size 32, using 20% of training data for validation
keras_binary_classifier.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set and print the test accuracy
loss, accuracy = keras_binary_classifier.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")

Epoch 1/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.4250 - loss: 0.8410 - val_accuracy: 0.4750 - val_loss: 0.7658
Epoch 2/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.5359 - loss: 0.7068 - val_accuracy: 0.6000 - val_loss: 0.6762
Epoch 3/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.6578 - loss: 0.6305 - val_accuracy: 0.6875 - val_loss: 0.6170
Epoch 4/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7500 - loss: 0.5720 - val_accuracy: 0.7563 - val_loss: 0.5640
Epoch 5/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7969 - loss: 0.5203 - val_accuracy: 0.7937 - val_loss: 0.5167
Epoch 6/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.8172 - loss: 0.4744 - val_accuracy: 0.8125 - val_loss: 0.4718
Epoch 7/10
20/20 ━━━━━━━━━━

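The `20/20` step count in the training log follows from the split sizes. A minimal sketch of that arithmetic, assuming the 1000-sample dataset and the settings used in this cell; the variable names are illustrative only. Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling.

```python
import math

n_samples = 1000         # rows produced by make_classification
test_size = 0.2          # fraction held out by train_test_split
validation_split = 0.2   # fraction of the training rows held out by fit()
batch_size = 32

n_test = round(n_samples * test_size)           # 200 rows for the final evaluation
n_train_total = n_samples - n_test              # 800 rows passed to fit()
n_val = round(n_train_total * validation_split) # 160 rows used for val_loss / val_accuracy
n_fit = n_train_total - n_val                   # 640 rows actually used for gradient updates

# One step per mini-batch; a partial final batch would still count as a step.
steps_per_epoch = math.ceil(n_fit / batch_size)

print(f"fit rows: {n_fit}, validation rows: {n_val}, steps per epoch: {steps_per_epoch}")
```

So the model in this section is fitted on 640 rows per epoch, not on the full 800-row training split.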
In [15]:
# The feature set X consists of 20 numerical features for each sample, as generated by make_classification.
# Each row in X represents a sample, and each column represents a feature.
# The features are synthetic and do not have specific real-world meanings, as they are randomly generated for binary classification tasks.
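# With the make_classification defaults used here (n_informative=2, n_redundant=2, n_repeated=0),
# 2 features are informative, 2 are linear combinations of the informative ones, and the remaining 16 are noise.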
print(f"Shape of X: {X.shape}")
print("Example feature vector (first sample):")
print(X[0])

Shape of X: (1000, 20)
Example feature vector (first sample):
[-0.6693561  -1.49577819 -0.87076638  1.14183093  0.02160555  1.73062972
 -1.25169805  0.28930464  0.35716259 -0.19681112  0.82927369  0.15485045
 -0.21997009 -0.73913656  1.80201193  1.63460551 -0.93817985 -1.26733697
 -1.2763343   1.01664321]
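To see which columns carry signal, one option is to regenerate the data with `shuffle=False`. This is a sketch for inspection only, not the dataset used in the cells of this notebook: without shuffling, scikit-learn places the informative columns first, then the redundant ones, then any repeated ones, and fills the rest with noise. The `*_ordered` names are illustrative.

```python
from sklearn.datasets import make_classification

n_features, n_informative, n_redundant, n_repeated = 20, 2, 2, 0

# shuffle=False keeps the column order: informative, redundant, repeated, then noise.
# It also leaves the rows unshuffled, so use this only to inspect the feature layout.
X_ordered, y_ordered = make_classification(
    n_samples=1000,
    n_features=n_features,
    n_informative=n_informative,
    n_redundant=n_redundant,
    n_repeated=n_repeated,
    n_classes=2,
    shuffle=False,
    random_state=42,
)

n_useful = n_informative + n_redundant + n_repeated
n_noise = n_features - n_useful

print(f"Shape: {X_ordered.shape}")
print(f"Useful columns: X_ordered[:, :{n_useful}], noise columns: {n_noise}")
```

With the default `shuffle=True`, as in the cells here, the same kinds of columns exist but their positions are permuted, so column index alone says nothing about informativeness.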


In [25]:
import os
import torch

# Save the model's state_dict
# (torch_binary_classifier is defined and trained in the PyTorch cell, In [20])
torch.save(torch_binary_classifier.state_dict(), 'torch_binary_classifier.pth')

# Print the size of the saved file in bytes and MB
size_bytes = os.path.getsize('torch_binary_classifier.pth')
size_mb = size_bytes / (1024 * 1024)
print(f"Model file size: {size_bytes} bytes ({size_mb:.6f} MB)")


Model file size: 7720 bytes (0.007362 MB)
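The reported file size can be sanity-checked against the parameter count. A rough sketch, assuming the 20 → 32 → 16 → 1 architecture of `TorchBinaryClassifier` and 4-byte float32 parameters; treating the remaining bytes as serialization overhead (file structure, tensor names, metadata) is an assumption, not something measured here.

```python
# (in_features, out_features) for fc1, fc2, fc3
layer_shapes = [(20, 32), (32, 16), (16, 1)]

# Each Linear layer stores in*out weights plus one bias per output unit.
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(params_per_layer)

bytes_per_param = 4                      # float32
raw_bytes = total_params * bytes_per_param

saved_file_bytes = 7720                  # size printed by the cell above
overhead_bytes = saved_file_bytes - raw_bytes

print(f"Parameters per layer: {params_per_layer}")
print(f"Total parameters: {total_params}")
print(f"Raw parameter bytes: {raw_bytes}, other bytes in file: {overhead_bytes}")
```

For a model this small, the non-parameter bytes make up a sizeable share of the file; for larger models the parameter data dominates.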


In [28]:
# Recreate the model architecture
recreated_model = TorchBinaryClassifier(X_test.shape[1])

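# Load the saved state_dict
# (weights_only=True restricts unpickling to tensors and plain containers, which is all a state_dict needs)
recreated_model.load_state_dict(torch.load('torch_binary_classifier.pth', weights_only=True))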

# Set model to evaluation mode
recreated_model.eval()

# Run inference on a few test samples
# torch.no_grad() is a context manager that disables gradient calculation.
# This is useful during inference, as it reduces memory usage and speeds up computations.
with torch.no_grad():
    sample_inputs = X_test_tensor[:5]  # Select the first 5 test samples as input
    outputs = recreated_model(sample_inputs)  # Run the recreated model to get output probabilities
    predictions = (outputs > 0.5).float()  # Convert probabilities to binary predictions (threshold at 0.5)
    print("Predicted labels:", predictions.squeeze().numpy())  # Print predicted labels for the samples
    print("True labels:", y_test_tensor[:5].squeeze().numpy())  # Print true labels for the same samples

Predicted labels: [1. 1. 0. 1. 1.]
True labels: [1. 1. 1. 1. 1.]




In [29]:
# Print the state_dict of the recreated_model
state_dict = recreated_model.state_dict()
for param_tensor in state_dict:
    print(f"{param_tensor}: {state_dict[param_tensor].shape}")

print("\nExplanation:")
print("The state_dict contains all learnable parameters of the model (weights and biases) for each layer.")
print("- 'fc1.weight': Weights of the first fully connected (Linear) layer (shape: [32, 20])")
print("- 'fc1.bias': Biases of the first fully connected layer (shape: [32])")
print("- 'fc2.weight': Weights of the second fully connected layer (shape: [16, 32])")
print("- 'fc2.bias': Biases of the second fully connected layer (shape: [16])")
print("- 'fc3.weight': Weights of the output layer (shape: [1, 16])")
print("- 'fc3.bias': Bias of the output layer (shape: [1])")
print("Each 'weight' is a matrix that transforms inputs from the previous layer, and each 'bias' is a vector added to the output of the corresponding layer.")

fc1.weight: torch.Size([32, 20])
fc1.bias: torch.Size([32])
fc2.weight: torch.Size([16, 32])
fc2.bias: torch.Size([16])
fc3.weight: torch.Size([1, 16])
fc3.bias: torch.Size([1])

Explanation:
The state_dict contains all learnable parameters of the model (weights and biases) for each layer.
- 'fc1.weight': Weights of the first fully connected (Linear) layer (shape: [32, 20])
- 'fc1.bias': Biases of the first fully connected layer (shape: [32])
- 'fc2.weight': Weights of the second fully connected layer (shape: [16, 32])
- 'fc2.bias': Biases of the second fully connected layer (shape: [16])
- 'fc3.weight': Weights of the output layer (shape: [1, 16])
- 'fc3.bias': Bias of the output layer (shape: [1])
Each 'weight' is a matrix that transforms inputs from the previous layer, and each 'bias' is a vector added to the output of the corresponding layer.
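The shapes above can be read as a chain of matrix products. A minimal NumPy sketch of the same forward pass, using random stand-in values rather than the trained weights; the layout `[out_features, in_features]` matches how `nn.Linear` stores `weight`, which is why each weight is transposed before multiplying.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the same shapes as the state_dict entries.
fc1_weight, fc1_bias = rng.standard_normal((32, 20)) * 0.1, np.zeros(32)
fc2_weight, fc2_bias = rng.standard_normal((16, 32)) * 0.1, np.zeros(16)
fc3_weight, fc3_bias = rng.standard_normal((1, 16)) * 0.1, np.zeros(1)

x = rng.standard_normal((5, 20))  # batch of 5 samples with 20 features each

# Linear layer: y = x @ W.T + b, followed by the activation.
h1 = np.maximum(x @ fc1_weight.T + fc1_bias, 0.0)    # (5, 20) @ (20, 32) -> (5, 32), ReLU
h2 = np.maximum(h1 @ fc2_weight.T + fc2_bias, 0.0)   # (5, 32) @ (32, 16) -> (5, 16), ReLU
logits = h2 @ fc3_weight.T + fc3_bias                # (5, 16) @ (16, 1)  -> (5, 1)
probs = 1.0 / (1.0 + np.exp(-logits))                # sigmoid -> one probability per sample

print(h1.shape, h2.shape, probs.shape)
```

Replacing the stand-ins with the arrays from `state_dict[...].numpy()` should reproduce the model's outputs up to floating-point precision, though that is not verified here.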


In [30]:
for param_tensor in state_dict:
    print(f"{param_tensor}: {state_dict[param_tensor]}")

fc1.weight: tensor([[-4.5171e-05,  4.6646e-03, -9.8214e-02, -1.6041e-01, -2.9317e-01,
         -1.4571e-01,  1.2863e-01,  7.2176e-02, -1.8728e-01,  7.6353e-02,
         -1.6650e-02, -2.9129e-01,  1.2376e-02, -1.4162e-01,  1.5673e-01,
          2.6968e-01,  1.1151e-01,  2.4467e-02,  1.5969e-01,  2.1049e-01],
        [-2.5328e-01, -6.1873e-02, -5.1968e-02,  6.6241e-02, -1.4882e-02,
         -1.4669e-01,  3.6497e-03,  8.5239e-03,  8.1167e-02,  1.4268e-01,
          9.1777e-02,  8.7566e-02, -1.1516e-01, -2.7174e-02,  2.9731e-01,
         -2.4536e-01,  1.6482e-02,  1.3655e-01,  1.4651e-01,  5.5287e-02],
        [ 1.0175e-01, -2.8811e-01,  6.9915e-02, -8.3768e-02, -2.3513e-01,
          1.5969e-01,  7.6754e-02,  1.5865e-01, -3.5868e-02, -4.8863e-02,
          2.4992e-01,  6.0635e-02,  6.3158e-02,  1.5905e-02,  2.4044e-02,
         -1.1957e-01,  1.3696e-01, -8.9048e-02, -1.1089e-01,  4.7672e-02],
        [ 9.2829e-03, -7.9966e-02,  1.2839e-03, -1.4355e-01,  1.0766e-01,
          3.7560e-01, -

# PyTorch Approach for Binary Classification


In [20]:
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

import torch.nn as nn
import torch.optim as optim

# Generate a sample binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

# Define a simple neural network
class TorchBinaryClassifier(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        # Forward pass through the network
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

# Instantiate the model
torch_binary_classifier = TorchBinaryClassifier(X_train.shape[1])

# Loss and optimizer
criterion = nn.BCELoss()  # Binary cross-entropy loss
optimizer = optim.Adam(torch_binary_classifier.parameters(), lr=0.001)

# Training loop
epochs = 10
batch_size = 32
for epoch in range(epochs):
    permutation = torch.randperm(X_train_tensor.size()[0])
    for i in range(0, X_train_tensor.size()[0], batch_size):
        indices = permutation[i:i+batch_size]
        batch_x, batch_y = X_train_tensor[indices], y_train_tensor[indices]
        
        optimizer.zero_grad()  # Zero the gradients
        outputs = torch_binary_classifier(batch_x)  # Forward pass
        loss = criterion(outputs, batch_y)  # Compute loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights

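# Evaluation (eval() changes nothing for this architecture, but it is the usual habit before inference)
torch_binary_classifier.eval()
with torch.no_grad():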
    outputs = torch_binary_classifier(X_test_tensor)  # Forward pass on test set
    predicted = (outputs > 0.5).float()  # Threshold predictions
    accuracy = (predicted.eq(y_test_tensor).sum().item()) / y_test_tensor.size(0)
    print(f"Test Accuracy: {accuracy:.2f}")

Test Accuracy: 0.84


# Custom (NumPy-from-Scratch) Approach for Binary Classification

In [17]:
# Simple neural network from scratch (no frameworks) for binary classification

# Use X_train, y_train, X_test, y_test from previous cells

# Sigmoid activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)

# Initialize parameters
np.random.seed(42)
input_dim = X_train.shape[1]
hidden1_dim = 32
hidden2_dim = 16
output_dim = 1

W1 = np.random.randn(input_dim, hidden1_dim) * 0.01
b1 = np.zeros((1, hidden1_dim))
W2 = np.random.randn(hidden1_dim, hidden2_dim) * 0.01
b2 = np.zeros((1, hidden2_dim))
W3 = np.random.randn(hidden2_dim, output_dim) * 0.01
b3 = np.zeros((1, output_dim))

lr = 0.01
epochs = 10
batch_size = 32

for epoch in range(epochs):
    permutation = np.random.permutation(X_train.shape[0])
    for i in range(0, X_train.shape[0], batch_size):
        idx = permutation[i:i+batch_size]
        Xb = X_train[idx]
        yb = y_train[idx].reshape(-1, 1)
        
        # Forward pass
        z1 = Xb @ W1 + b1
        a1 = sigmoid(z1)
        z2 = a1 @ W2 + b2
        a2 = sigmoid(z2)
        z3 = a2 @ W3 + b3
        a3 = sigmoid(z3)
        
        # Compute loss (binary cross-entropy)
        loss = -np.mean(yb * np.log(a3 + 1e-8) + (1 - yb) * np.log(1 - a3 + 1e-8))
        
        # Backward pass
        # Average gradients over the actual number of samples in this batch,
        # which can be smaller than batch_size for the final batch.
        m = Xb.shape[0]
        dz3 = a3 - yb  # Gradient of BCE loss w.r.t. z3 when the output activation is sigmoid
        dW3 = a2.T @ dz3 / m
        db3 = np.sum(dz3, axis=0, keepdims=True) / m

        da2 = dz3 @ W3.T
        dz2 = da2 * sigmoid_deriv(z2)
        dW2 = a1.T @ dz2 / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m

        da1 = dz2 @ W2.T
        dz1 = da1 * sigmoid_deriv(z1)
        dW1 = Xb.T @ dz1 / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m
        
        # Update weights
        W3 -= lr * dW3
        b3 -= lr * db3
        W2 -= lr * dW2
        b2 -= lr * db2
        W1 -= lr * dW1
        b1 -= lr * db1

    # Note: this is the loss of the last mini-batch in the epoch, not an average over the epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {loss:.4f}")

# Evaluate on test set
z1 = X_test @ W1 + b1
a1 = sigmoid(z1)
z2 = a1 @ W2 + b2
a2 = sigmoid(z2)
z3 = a2 @ W3 + b3
a3 = sigmoid(z3)
preds = (a3 > 0.5).astype(int)
accuracy = np.mean(preds.flatten() == y_test)
print(f"Test Accuracy: {accuracy:.2f}")

Epoch 1/10, Loss: 0.6929
Epoch 2/10, Loss: 0.6933
Epoch 3/10, Loss: 0.6928
Epoch 4/10, Loss: 0.6952
Epoch 5/10, Loss: 0.6954
Epoch 6/10, Loss: 0.6956
Epoch 7/10, Loss: 0.6924
Epoch 8/10, Loss: 0.6967
Epoch 9/10, Loss: 0.6954
Epoch 10/10, Loss: 0.6944
Test Accuracy: 0.47
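The per-epoch losses above hover around 0.693, which is ln 2: the binary cross-entropy of a model that outputs 0.5 for every sample. Together with the 0.47 test accuracy, this suggests the from-scratch network has not moved away from chance-level predictions. The sketch below checks that baseline with the same loss formula as the cell above; the toy labels are made up for illustration. It also evaluates the sigmoid derivative at its maximum, since each sigmoid hidden layer scales the backpropagated gradient by at most 0.25, which is one plausible contributor to the slow learning here (alongside the small initial weights and learning rate).

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)

def binary_cross_entropy(y_true, y_prob, eps=1e-8):
    # Same formula as in the training loop, with eps guarding against log(0).
    return -np.mean(y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps))

# Toy labels (illustrative only) and a "model" that always predicts 0.5.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)
always_half = np.full_like(y_true, 0.5)

baseline_loss = binary_cross_entropy(y_true, always_half)
max_sigmoid_slope = sigmoid_deriv(0.0)  # the derivative peaks at x = 0

print(f"Loss for constant 0.5 predictions: {baseline_loss:.4f}")
print(f"ln(2): {np.log(2):.4f}")
print(f"Maximum sigmoid derivative: {max_sigmoid_slope:.2f}")
```

A loss that stays at this baseline is a useful signal when comparing the three approaches: it points to a training problem rather than to noise in the accuracy estimate.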



### Assignment 1: Model Comparison
**Task:**  
Compare the test accuracy of the three binary classification approaches (Keras/TensorFlow, PyTorch, and NumPy-from-scratch).  
**Instructions:**  
- Collect the test accuracy from each approach.
- Present the results in a table.
- Briefly discuss which approach performed best and why you think that is.

---

### Assignment 2: Model Architecture Exploration
**Task:**  
Modify the neural network architectures in both the PyTorch and NumPy-from-scratch implementations.  
**Instructions:**  
- Change the number of hidden layers and/or the number of neurons per layer.
- Retrain the models and report the new test accuracies.
- Discuss how the changes affected performance.

---

### Assignment 3: Model Persistence and Inference
**Task:**  
Work with the saved PyTorch model.  
**Instructions:**  
- Load the saved model weights into a new model instance.
- Use the loaded model to make predictions on new data (e.g., `sample_inputs`).
- Compare the predictions to the true labels.

---

### Assignment 4: Feature Importance Investigation
**Task:**  
Investigate which features are most important for the classification task.  
**Instructions:**  
- Use the weights of the first layer in either the PyTorch or NumPy model.
- Identify which input features have the largest absolute weights.
- Discuss what this might mean about the data.

---

### Assignment 5: Custom Evaluation Metric
**Task:**  
Implement a custom evaluation metric (e.g., precision, recall, F1-score) for the predictions of any model.  
**Instructions:**  
- Write a function to compute the metric.
- Apply it to the predictions and true labels.
- Interpret the results.

