<a href="https://colab.research.google.com/github/Jlauf-MBAPMP/NewGitTest/blob/master/Neural_Networks_Labs_student.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Dependencies

Execute the code block below to ensure that you have all the necessary dependencies.

In [None]:
!pip install mplcursors ipywidgets

# Imports

This first part imports the necessary libraries we'll use throughout the tutorial. These libraries help us create data, build models, and visualize results.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import ipywidgets as widgets
from IPython.display import display, clear_output
import plotly.graph_objs as go
from plotly.subplots import make_subplots

np.random.seed(42)
torch.manual_seed(42)

# Introduction to Neural Networks

Neural networks are a class of machine learning models inspired by the human brain. They consist of interconnected nodes (neurons) organized in layers. In this lab, we'll focus on feed-forward neural networks and explore various security aspects related to them.

## Feed-Forward Neural Networks

A feed-forward neural network is the simplest form of artificial neural network. It consists of an input layer, one or more hidden layers, and an output layer. Information flows in one direction, from input to output.

Let's create a simple feed-forward neural network:

In [None]:
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, output_size)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.activation(self.hidden(x))
        x = self.output(x)
        return x

# Create a simple dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
y_train = torch.LongTensor(y_train)
X_test = torch.FloatTensor(X_test)
y_test = torch.LongTensor(y_test)

# Create and train the model
model = SimpleNN(2, 10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

print("Training completed")
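
The training loop above only reports that it finished; it does not check how well the model performs. A minimal sketch of an evaluation helper follows. The names `accuracy`, `demo_model`, `X_demo`, and `y_demo` are illustrative assumptions, and the hand-set weights stand in for a trained network so the result is easy to verify by hand.

```python
import torch
import torch.nn as nn

def accuracy(model, X, y):
    """Fraction of rows in X whose arg-max logit matches the label in y."""
    model.eval()                      # evaluation mode (matters for dropout/batch norm)
    with torch.no_grad():             # gradients are not needed for evaluation
        predicted = model(X).argmax(dim=1)
    return (predicted == y).float().mean().item()

# Hypothetical stand-in model with hand-set weights: class 1 wins when x0 > 0
demo_model = nn.Linear(2, 2)
with torch.no_grad():
    demo_model.weight.copy_(torch.tensor([[0.0, 0.0], [1.0, 0.0]]))
    demo_model.bias.zero_()

X_demo = torch.tensor([[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0], [-3.0, 1.0]])
y_demo = torch.tensor([1, 0, 1, 1])   # the last label disagrees with the rule on purpose

demo_acc = accuracy(demo_model, X_demo, y_demo)
print(f"Accuracy: {demo_acc:.2f}")
```

With the notebook's own variables, the same helper could be called as `accuracy(model, X_test, y_test)`.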

## Backpropagation

Backpropagation is the algorithm used to train neural networks. It calculates the gradient of the loss function with respect to the network's weights, allowing us to update the weights and minimize the loss.

The code below shows how backpropagation works for a neural network with 2 inputs, 3 hidden nodes, and 1 output node:

```
Input Layer    Hidden Layer    Output Layer
   [x]             [h]            
   [x]    ---->    [h]    ---->   [y]
                   [h]          
```
![picture](https://drive.google.com/uc?id=1lgcl5EIpa6INh11xYPpo6M63J_GaK71j)


**Forward Pass:**

- Information (data) enters through the input layer.
- It travels through the hidden layer(s).
- Finally, it reaches the output layer, giving a prediction.

**Error Calculation:**

- We compare the prediction to the actual answer.
- The difference is our error.

**Backpropagation:**

- We send this error backwards through the network.
- It's like water flowing backwards through pipes.
- As it flows, it adjusts the strength (weights) of connections between neurons.

**Weight Update:**

- Connections that contributed more to the error are adjusted more.
- This process aims to minimize the error in future predictions.



**Stochastic Gradient Descent (SGD):**

Think of SGD as a hiker trying to find the lowest point in a hilly landscape:

- The hiker (our algorithm) takes small steps downhill.
- Sometimes they might go uphill briefly, but overall they're moving down.
- "Stochastic" means we use random samples of data for each step, not all data at once.
- This randomness can help the hiker escape small dips, although it does not guarantee reaching the true lowest point.
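
The hiker analogy can be made concrete with a small sketch of mini-batch SGD fitting a straight line. The data (`y = 3x + 2`), the learning rate, and the batch size are illustrative assumptions, not values taken from this lab.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data from a known line y = 3x + 2 (chosen for illustration)
x = np.linspace(-1, 1, 64)
y = 3 * x + 2

w, b = 0.0, 0.0           # start away from the solution
learning_rate = 0.1
batch_size = 8

for epoch in range(200):
    order = rng.permutation(len(x))               # reshuffle the data every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]     # one random mini-batch
        error = (w * x[idx] + b) - y[idx]
        grad_w = 2 * np.mean(error * x[idx])      # d(MSE)/dw on this batch only
        grad_b = 2 * np.mean(error)               # d(MSE)/db on this batch only
        w -= learning_rate * grad_w               # small step downhill
        b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")
```

Each step uses only eight points, so individual steps are noisy, but on this simple problem the estimates should settle close to the true slope and intercept.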

Now, let's look at a simplified Python code example:
  

In [None]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # x is expected to be the sigmoid OUTPUT (an activation), not the raw input
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize weights randomly
        self.W1 = np.random.randn(self.input_size, self.hidden_size)
        print("First layer weights:")
        print("Weights connected to the first input ", self.W1[0])
        print("Weights connected to the second input ", self.W1[1])
        print()
        self.W2 = np.random.randn(self.hidden_size, self.output_size)
        print("Second layer weights: ")
        print("Weight connected to the first (top) hidden node ", self.W2[0])
        print("Weight connected to the second (middle) hidden node ", self.W2[1])
        print("Weight connected to the third (bottom) hidden node ", self.W2[2])
        print()

    def forward(self, X):
        # Forward pass
        self.z1 = np.dot(X, self.W1)
        self.a1 = sigmoid(self.z1)
        self.final_node_output = np.dot(self.a1, self.W2)  #There is no sigmoid on the last layer
        return self.final_node_output

    def backward(self, X, y, y_hat):
        # Backward pass for the loss E = 0.5 * sum((y - y_hat)**2)

        learning_rate = 0.1

        # dE/dy_hat = -(y - y_hat), so (y - y_hat) is the descent direction
        # with respect to the output of the model (y_hat)
        self.Error = y - y_hat

        # Descent direction for W2: (dE/dy_hat) * (dy_hat/dW2) = a1.T @ Error
        grad_W2 = self.a1.T @ self.Error

        # Error with respect to the output of the middle layer (a1):
        # (dE/dy_hat) * (dy_hat/da1). Uses W2 BEFORE it is updated.
        self.partial_of_yhat_to_a1 = np.dot(self.Error, self.W2.T)

        # Error with respect to the sum operation of the middle layer (z1):
        # (dE/da1) * (da1/dz1)
        self.partial_a1_to_z1 = self.partial_of_yhat_to_a1 * sigmoid_derivative(self.a1)

        # Descent direction for W1: (dE/dz1) * (dz1/dW1) = X.T @ partial_a1_to_z1
        grad_W1 = np.dot(X.T, self.partial_a1_to_z1)

        # Update both weight matrices only after all gradients are computed
        self.W2 += learning_rate * grad_W2  # weights between hidden and output layer
        self.W1 += learning_rate * grad_W1  # weights between input and hidden layer

    def train(self, X, y, epochs):
        for _ in range(epochs):
            output = self.forward(X)
            self.backward(X, y, output)

# Example usage (named xor_net so it does not shadow the torch.nn alias `nn`)
xor_net = NeuralNetwork(2, 3, 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
xor_net.train(X, y, 10000)

# Test the trained network on the four XOR inputs
print(xor_net.forward(np.array([0, 0])), "This should be close to 0")
print(xor_net.forward(np.array([1, 1])), "This should be close to 0")
print(xor_net.forward(np.array([0, 1])), "This should be close to 1")
print(xor_net.forward(np.array([1, 0])), "This should be close to 1")
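
A common way to check hand-written backpropagation is to compare it against finite-difference gradients. The sketch below does this for the same 2-3-1 architecture (sigmoid hidden layer, linear output, no biases), assuming the loss is half the sum of squared errors. The helper names `loss`, `analytic_grads`, and `numeric_grad` are illustrative and not part of the class above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(W1, W2, X, y):
    # Assumed loss: 0.5 * sum of squared errors, with a linear output layer
    y_hat = sigmoid(X @ W1) @ W2
    return 0.5 * np.sum((y - y_hat) ** 2)

def analytic_grads(W1, W2, X, y):
    # Chain rule, layer by layer
    a1 = sigmoid(X @ W1)
    y_hat = a1 @ W2
    d_out = y_hat - y                            # dE/dy_hat
    grad_W2 = a1.T @ d_out                       # dE/dW2
    d_hidden = (d_out @ W2.T) * a1 * (1 - a1)    # dE/dz1
    grad_W1 = X.T @ d_hidden                     # dE/dW1
    return grad_W1, grad_W2

def numeric_grad(f, W, eps=1e-6):
    # Central differences: nudge one weight at a time and re-evaluate the loss
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old                             # restore the original weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
W1 = rng.standard_normal((2, 3))
W2 = rng.standard_normal((3, 1))

g1, g2 = analytic_grads(W1, W2, X, y)
n1 = numeric_grad(lambda: loss(W1, W2, X, y), W1)
n2 = numeric_grad(lambda: loss(W1, W2, X, y), W2)
max_diff = max(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
print("Largest difference between analytic and numeric gradients:", max_diff)
```

A very small difference suggests the chain-rule formulas are implemented consistently; a large one usually points to a sign error or a transposed matrix.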

# Decision Boundaries in Neural Networks

Let's create a spiral dataset and use a neural network to classify it. This will showcase the power of neural networks in learning complex, non-linear decision boundaries.



In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Generate spiral dataset
def generate_spiral_data(n_points, n_classes):
    X = np.zeros((n_points*n_classes, 2))
    y = np.zeros(n_points*n_classes, dtype='uint8')
    for class_idx in range(n_classes):
        ix = range(n_points*class_idx, n_points*(class_idx+1))
        r = np.linspace(0.0, 1, n_points)  # radius
        t = np.linspace(class_idx*4, (class_idx+1)*4, n_points) + np.random.randn(n_points)*0.2
        X[ix] = np.c_[r*np.sin(t*2.5), r*np.cos(t*2.5)]
        y[ix] = class_idx
    return X, y

X, y = generate_spiral_data(1000, 3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.LongTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.LongTensor(y_test)

# Define a more complex neural network
class SpiralNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SpiralNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Create and train the model
model = SpiralNN(2, 100, 3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
n_epochs = 1000
for epoch in range(n_epochs):
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{n_epochs}], Loss: {loss.item():.4f}')

print("Training completed")

# Function to plot decision boundaries
def plot_decision_boundary(model, X, y):
    plt.figure(figsize=(10, 8))

    # Define the grid on which we will evaluate the classifier
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))

    # Evaluate the neural network on the grid
    Z = model(torch.FloatTensor(np.c_[xx.ravel(), yy.ravel()])).detach().numpy()
    Z = np.argmax(Z, axis=1).reshape(xx.shape)

    # Plot the decision boundary
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdYlBu)

    # Plot the spiral dataset
    scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolor='black')

    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Neural Network Decision Boundary for Spiral Data')

    # Add a color bar
    plt.colorbar(scatter)

    plt.show()

# Plot the decision boundary
plot_decision_boundary(model, X, y)

# Evaluate the model
model.eval()
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    _, predicted = torch.max(test_outputs, 1)
    accuracy = (predicted == y_test_tensor).float().mean()
    print(f'Test Accuracy: {accuracy.item():.4f}')

"""
This visualization demonstrates the power of neural networks in learning complex, non-linear decision boundaries.

Key observations:
1. The spiral dataset consists of three intertwined classes, which is a challenging classification problem.
2. The decision boundary learned by the neural network closely follows the spiral pattern of the data.
3. The model effectively separates the three classes, as evidenced by the distinct color regions in the plot.
4. A high test accuracy indicates that the model generalizes well to unseen points drawn from the same spirals.

This example highlights why neural networks are powerful for complex pattern recognition tasks:
- They can learn intricate, non-linear decision boundaries.
- They can automatically extract relevant features from the raw input data.
- They can handle high-dimensional data and complex relationships between features.

These capabilities make neural networks suitable for a wide range of applications, from image and speech recognition to natural language processing and beyond.
"""

# Adversarial Examples in Neural Networks

Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. In the context of neural networks, these are often small perturbations to input data that can dramatically change the model's output.
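
Before the full MNIST example, the core idea can be sketched on a toy model. The snippet below applies one step of the Fast Gradient Sign Method (FGSM), the same attack used later in this section, to a hypothetical two-class linear classifier with hand-set weights. The weights, input, and `epsilon` are assumptions chosen so the arithmetic is easy to follow.

```python
import torch
import torch.nn as nn

# Hypothetical toy classifier: a single linear layer with hand-set weights
toy_model = nn.Linear(2, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.tensor([[1.0, -1.0], [-1.0, 1.0]]))
    toy_model.bias.zero_()

loss_fn = nn.CrossEntropyLoss()
x = torch.tensor([[0.2, 0.0]], requires_grad=True)   # clean input, classified as class 0
label = torch.tensor([0])

# Gradient of the loss with respect to the INPUT, not the weights
clean_loss = loss_fn(toy_model(x), label)
clean_loss.backward()

# FGSM step: move every input feature by epsilon in the direction that raises the loss
epsilon = 0.3
x_adv = (x + epsilon * x.grad.sign()).detach()

with torch.no_grad():
    clean_pred = toy_model(x).argmax(dim=1).item()
    adv_pred = toy_model(x_adv).argmax(dim=1).item()
    adv_loss = loss_fn(toy_model(x_adv), label).item()

print(f"clean prediction: {clean_pred}, adversarial prediction: {adv_pred}")
print(f"loss before: {clean_loss.item():.4f}, loss after: {adv_loss:.4f}")
```

No clamping is applied here because the toy inputs are not images; for image data the perturbed input is usually clipped back to the valid pixel range.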

Let's create a function to generate adversarial examples:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Set random seed for reproducibility
torch.manual_seed(42)

# Load MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Define a simple CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the model, loss function, and optimizer
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
def train_model(epochs=5):
    for epoch in range(epochs):
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = (x.to(device) for x in data)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f'Epoch {epoch + 1}, Loss: {running_loss / len(trainloader):.3f}')

print("Training the model...")
train_model()  # Train before attacking; the attack only makes sense on a trained model
print("Training completed.")

# Function to generate adversarial examples using FGSM
def fgsm_attack(image, epsilon, data_grad):
    sign_data_grad = data_grad.sign()
    perturbed_image = image + epsilon * sign_data_grad
    perturbed_image = torch.clamp(perturbed_image, -1, 1)
    return perturbed_image

# Function to test the model with adversarial examples
def test_adversarial(epsilon):
    model.eval()
    correct = 0
    adv_examples = []

    for data, target in trainloader:
        data = data.to(device)
        target = target.to(device)
        data.requires_grad = True
        output = model(data)
        init_pred = output.max(1, keepdim=True)[1]  # init_pred shape: [batch_size, 1]

        # Iterate over each sample in the batch
        for i in range(data.size(0)):
            if init_pred[i].item() != target[i].item():
                continue

            loss = criterion(output[i:i+1], target[i:i+1])  # Calculate loss for individual sample
            model.zero_grad()
            loss.backward(retain_graph=True)  # Retain graph for subsequent iterations

            data_grad = data.grad.data[i:i+1]  # Extract gradient for individual sample
            perturbed_data = fgsm_attack(data[i:i+1], epsilon, data_grad)
            output_perturbed = model(perturbed_data)
            final_pred = output_perturbed.max(1, keepdim=True)[1]

            if final_pred.item() == target[i].item():
                correct += 1
            else:
                adv_ex = perturbed_data.squeeze().detach().cpu().numpy()
                adv_examples.append((init_pred[i].item(), final_pred.item(), adv_ex))

            if len(adv_examples) == 5:
                break

        if len(adv_examples) == 5:
            break

    return correct, adv_examples

# Generate and display adversarial examples
epsilons = [0, .05, .1, .15, .2, .25, .3]

plt.figure(figsize=(16, 8))
for i in range(len(epsilons)):
    epsilon = epsilons[i]
    _, examples = test_adversarial(torch.tensor(epsilon, device=device))

    if len(examples) > 0:
        ex = examples[0]
        plt.subplot(2, 4, i + 1)
        plt.xticks([], [])
        plt.yticks([], [])
        plt.title(f"ε: {epsilon}")
        plt.imshow(ex[2], cmap="gray")
        plt.xlabel(f"{ex[0]} → {ex[1]}")

plt.tight_layout()
plt.show()

The label under each image shows the original prediction → the prediction after perturbation. The top-left panel is blank because with ε = 0 the input is unchanged, so no adversarial example is produced.

# Data Poisoning

This interactive example demonstrates the concept of data poisoning in machine learning. Data poisoning is a technique where an attacker intentionally adds misleading data points to a training dataset to manipulate the behavior of a machine learning model.
In this example, we have a simple binary classification problem with two classes: red and blue. The neural network tries to learn the boundary between these two classes. Use the X and Y sliders and the label dropdown to choose a new data point, then click "Add Point" to add it to the dataset and retrain the model. By strategically adding points, you can observe how the decision boundary of the neural network changes, potentially leading to misclassification in certain areas.

## Try to find the location (feature values) that switches points from one classification to the other using the minimum number of points. You can reset the chart by rerunning the code block that generates the chart.

In [None]:
!pip install plotly==5.5.0

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from ipywidgets import FloatSlider, Dropdown, Button, HBox, VBox, Output
from IPython.display import display, clear_output

# Generate initial dataset
np.random.seed(0)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Create and train the initial model
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=1000, random_state=42)  # same settings as the retrained model
model.fit(X_scaled, y)

# Create widgets
x_slider = FloatSlider(min=-3, max=3, step=0.1, description='X:')
y_slider = FloatSlider(min=-3, max=3, step=0.1, description='Y:')
label_dropdown = Dropdown(options=[('Red', 0), ('Blue', 1)], description='Label:')
add_button = Button(description="Add Point")
#out = Output()

def update_plot():
    plt.figure(figsize=(10, 8))
    #model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)
    #model.fit(X_scaled, y)
    # Plot decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    #Z = model.predict_proba(scaler.transform(np.c_[xx.ravel(), yy.ravel()])).reshape(xx.shape)
    Z = model.predict_proba(scaler.transform(np.c_[xx.ravel(), yy.ravel()]))[:,1].reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdBu)
    plt.contour(xx, yy, Z, [0.5], linewidths=2, colors='white')

    # Plot data points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu, edgecolor='black')

    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.title("Data Poisoning Example")
    plt.xlabel("X")
    plt.ylabel("Y")

    #plt.colorbar(scatter)

    plt.show()

def add_point(b):
    global X, y, model, scaler, X_scaled # Add X_scaled to the global variables
    new_x = x_slider.value
    new_y = y_slider.value
    new_label = label_dropdown.value

    # Add new point to dataset
    X = np.vstack((X, [[new_x, new_y]]))
    y = np.append(y, new_label)

    # Retrain the model
    X_scaled = scaler.fit_transform(X)
    model = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=1000, random_state=42)
    model.fit(X_scaled, y)

    #clear_output(wait=True)
    update_plot()
    display(VBox([HBox([x_slider, y_slider]), HBox([label_dropdown, add_button])]))
    print(f"Added point at ({new_x:.2f}, {new_y:.2f}) with label {'Red' if new_label == 0 else 'Blue'}")

# Connect button to add_point function
add_button.on_click(add_point)


# Initial plot
update_plot()

# Display widgets
display(VBox([HBox([x_slider, y_slider]), HBox([label_dropdown, add_button])]))

The background colors in the plot represent the model's predicted probability of the "Blue" class (label 1) at each point in the 2D space. Here's what the colors symbolize:
<br/>
Dark Blue: Areas where the model is very confident (probability close to 1) that points belong to the "Blue" class (label 1).

Light Blue: Areas where the model predicts the "Blue" class, but with less confidence.

White or Very Light Colors: Areas near the decision boundary where the model is uncertain (probabilities close to 0.5 for both classes).

Light Red: Areas where the model predicts the "Red" class, but with less confidence.

Dark Red: Areas where the model is very confident (probability close to 0) that points belong to the "Red" class (label 0).

The color gradient from red to blue represents the continuous probability output of the neural network:

0 (Dark Red) → 0.5 (White/Very Light) → 1 (Dark Blue)

This visualization helps to show:

- The decision boundary: Where the colors transition from blue to red (often visible as a white or very light colored region).
- The model's confidence: Darker colors indicate higher confidence in the prediction for that region.
- Areas of uncertainty: Lighter colors, especially near the boundary between blue and red, show where the model is less certain.

When you add new points, you should see these color regions shift, demonstrating how the model's predictions change across the entire space due to the new data.
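
The same effect can be reproduced without widgets. The sketch below trains a classifier on clean data, then retrains it after adding a cluster of deliberately mislabeled points around one target location and compares the predicted probability at that location. The target coordinates, the number of poison points, and the helper name `prob_class1` are illustrative assumptions; feature scaling is omitted to keep the example short.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Clean data: class 1 ("Blue") when x0 + x1 > 0, otherwise class 0 ("Red")
X_clean = rng.standard_normal((100, 2))
y_clean = (X_clean[:, 0] + X_clean[:, 1] > 0).astype(int)

# Hypothetical target point, deep inside the class-1 region
target = np.array([[1.5, 1.5]])

def prob_class1(X, y):
    """Train a fresh model on (X, y) and return P(class 1) at the target."""
    clf = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=1000, random_state=42)
    clf.fit(X, y)
    return clf.predict_proba(target)[0, 1]

# Poison: 25 points clustered around the target, deliberately labeled 0
X_poison = target + 0.05 * rng.standard_normal((25, 2))
y_poison = np.zeros(25, dtype=int)

p_before = prob_class1(X_clean, y_clean)
p_after = prob_class1(np.vstack([X_clean, X_poison]),
                      np.concatenate([y_clean, y_poison]))

print(f"P(Blue) at target before poisoning: {p_before:.3f}")
print(f"P(Blue) at target after poisoning:  {p_after:.3f}")
```

Reducing the number of poison points shows roughly how many are needed before the prediction at the target changes, which is the question the interactive exercise asks.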

# Model Extraction Attack (Model Stealing)

This example shows how an attacker can clone a model: query the target model many times, record its predictions as labels, and use those query-label pairs as training data for the cloned model.





In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Generate a synthetic dataset
X, y = make_moons(n_samples=1000, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.LongTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.LongTensor(y_test)

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        x = self.output(x)
        return x

# Create and train the "black-box" model
black_box_model = SimpleNN(2, 10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(black_box_model.parameters(), lr=0.01)

for epoch in range(1000):
    outputs = black_box_model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

black_box_model.eval()

# Function to query the black-box model
def query_black_box(X):
    with torch.no_grad():
        outputs = black_box_model(torch.FloatTensor(X))
        _, predicted = torch.max(outputs, 1)
    return predicted.numpy()

# Generate a dataset for the attack by querying the black-box model
# This simulates querying an API for a model and getting the label with different input feature values
X_attack = np.random.rand(5000, 2) * 4 - 2  # Generate random points in the feature space
y_attack = query_black_box(X_attack)

# Convert attack data to PyTorch tensors
X_attack_tensor = torch.FloatTensor(X_attack)
y_attack_tensor = torch.LongTensor(y_attack)

# Create and train the "attack" model
attack_model = SimpleNN(2, 10, 2)
attack_optimizer = optim.Adam(attack_model.parameters(), lr=0.01)

for epoch in range(1000):
    outputs = attack_model(X_attack_tensor)
    loss = criterion(outputs, y_attack_tensor)
    attack_optimizer.zero_grad()
    loss.backward()
    attack_optimizer.step()

attack_model.eval()

# Evaluate both models
def evaluate_model(model, X, y):
    with torch.no_grad():
        outputs = model(X)
        _, predicted = torch.max(outputs, 1)
        accuracy = (predicted == y).float().mean()
    return accuracy.item()

print(f"Black-box model accuracy: {evaluate_model(black_box_model, X_test_tensor, y_test_tensor):.4f}")
print(f"Attack model accuracy: {evaluate_model(attack_model, X_test_tensor, y_test_tensor):.4f}")

# Visualize the results
def plot_decision_boundary(ax, model, X, y, title):
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
    Z = query_black_box(np.c_[xx.ravel(), yy.ravel()]) if model == black_box_model else model(torch.FloatTensor(np.c_[xx.ravel(), yy.ravel()])).argmax(1).detach().numpy()
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.4, cmap='RdYlBu')
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap='RdYlBu', edgecolor='black')
    ax.set_title(title)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
plot_decision_boundary(ax1, black_box_model, X_test, y_test, "Original Black-box Model")
plot_decision_boundary(ax2, attack_model, X_test, y_test, "Extracted Attack Model")
plt.tight_layout()
plt.show()

As the plots show, the extracted model's decision boundary should closely resemble that of the target model, even though the attacker never had access to the original training data.
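
Test accuracy measures how well each model matches the true labels. For model stealing, a more direct measure is how often the clone agrees with the target, often called fidelity. A minimal sketch follows; the helper name `fidelity` and the example predictions are illustrative assumptions.

```python
import numpy as np

def fidelity(target_preds, clone_preds):
    """Fraction of inputs on which the clone predicts the same label as the target."""
    target_preds = np.asarray(target_preds)
    clone_preds = np.asarray(clone_preds)
    return float(np.mean(target_preds == clone_preds))

# Made-up predictions for four inputs: the models disagree on the third one
demo_fidelity = fidelity([0, 1, 1, 0], [0, 1, 0, 0])
print(f"Fidelity: {demo_fidelity:.2f}")
```

With the notebook's variables, this could be called as `fidelity(query_black_box(X_test), attack_model(X_test_tensor).argmax(1).numpy())`.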

# Training Set Data Leakage

## Training Dataset Leakage/Memorization in Neural Networks

Training dataset leakage, also known as data memorization, occurs when a neural network learns to recognize specific examples from its training set rather than generalizing patterns. This can lead to abnormally high confidence levels for inputs similar to or identical to those in the training set, while performing poorly on truly novel data.
<br/><br/>

Causes:
<br/>

**Overtraining:** Training the model for too many epochs can cause it to memorize the training data instead of learning general patterns.

**Long-tail distribution:** When the training data contains rare examples, the model may memorize these instances rather than learning to generalize.

**Insufficient regularization:** Lack of proper regularization techniques can allow the model to fit noise in the training data.

**Limited dataset size:** Smaller datasets increase the risk of memorization as the model has fewer examples to learn from.

**High model capacity:** Models with excessive parameters relative to the dataset size are more prone to memorization.
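
One common mitigation for overtraining is early stopping: halt training once the validation loss stops improving. The framework-independent sketch below shows the stopping rule only. The function name `early_stopping_epoch`, the `patience` value, and the loss values are illustrative assumptions.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop.

    Training stops at the first epoch where the validation loss has failed to
    improve on its best value for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss                      # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Made-up validation losses: they improve until epoch 3, then rise (overfitting)
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.65, 0.70]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"Training would stop at epoch {stop_epoch}")
```

Deep learning frameworks typically provide this behavior as a built-in callback, so in practice the rule rarely needs to be written by hand.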

Here's sample code that demonstrates how to identify inputs that correlate with training data by examining confidence levels:

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt

# Create a simple dataset
np.random.seed(42)
X = np.random.rand(1000, 2)  # 1000 points with 2 features
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # Simple rule: sum of features > 1

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"X_train shape: {X_train.shape}")


# Create a model prone to memorization
model = Sequential([
    Dense(64, activation='relu', input_shape=(2,)),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model for many epochs to encourage memorization
history = model.fit(X_train, y_train, epochs=1000, validation_split=0.2, verbose=0)

# Function to plot predictions
def plot_predictions(X, y, title):
    plt.figure(figsize=(10, 8))
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', alpha=0.7)
    plt.colorbar(label='Predicted probability of class 1')
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

# Predict on training and test data
train_pred = model.predict(X_train).flatten()
test_pred = model.predict(X_test).flatten()

# Plot predictions
plot_predictions(X_train, train_pred, "Training Data Predictions")
plot_predictions(X_test, test_pred, "Test Data Predictions")

# Confidence is the probability assigned to the predicted class: max(p, 1 - p)
train_conf = np.maximum(train_pred, 1 - train_pred)
test_conf = np.maximum(test_pred, 1 - test_pred)

# Print some statistics
print(f"Average confidence on training data: {np.mean(train_conf):.4f}")
print(f"Average confidence on test data: {np.mean(test_conf):.4f}")
print(f"Percentage of high-confidence (>0.9) predictions on training data: {np.mean(train_conf > 0.9):.2%}")
print(f"Percentage of high-confidence (>0.9) predictions on test data: {np.mean(test_conf > 0.9):.2%}")

# Function to test user inputs
def test_user_input():
    while True:
        try:
            x1 = float(input("Enter value for feature 1 (between 0 and 1): "))
            x2 = float(input("Enter value for feature 2 (between 0 and 1): "))
            if 0 <= x1 <= 1 and 0 <= x2 <= 1:
                break
            else:
                print("Please enter values between 0 and 1.")
        except ValueError:
            print("Invalid input. Please enter numeric values.")

    user_input = np.array([[x1, x2]])
    prediction = model.predict(user_input)[0, 0]
    print(f"Model prediction: {prediction:.4f}")
    print(f"Actual class: {int(x1 + x2 > 1)}")

# Test user inputs
print("\nTest your own inputs:")
for _ in range(3):
    test_user_input()
    print()
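
The confidence gap between training and unseen data is what a simple membership inference attack exploits: guess "was in the training set" whenever the model is very confident. The sketch below scores such a threshold rule on made-up probabilities. The function names, the threshold, and all numbers are illustrative assumptions, not outputs of the model above.

```python
import numpy as np

def confidence(p):
    """Probability assigned to the predicted class for a sigmoid output p."""
    p = np.asarray(p, dtype=float)
    return np.maximum(p, 1 - p)

def threshold_attack_accuracy(member_probs, nonmember_probs, threshold=0.9):
    """Accuracy of guessing 'member' whenever confidence exceeds the threshold."""
    member_guess = confidence(member_probs) > threshold        # correct when True
    nonmember_guess = confidence(nonmember_probs) > threshold  # correct when False
    correct = member_guess.sum() + (~nonmember_guess).sum()
    return correct / (len(member_guess) + len(nonmember_guess))

# Made-up sigmoid outputs for four training points and four unseen points
member_probs = [0.99, 0.02, 0.97, 0.85]
nonmember_probs = [0.70, 0.40, 0.95, 0.20]

attack_acc = threshold_attack_accuracy(member_probs, nonmember_probs)
print(f"Attack accuracy: {attack_acc:.2f}")
```

An attack accuracy near 0.5 would mean confidence reveals little about membership; values well above 0.5 suggest the model treats its training data differently from unseen data.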
