In supervised learning, one common task is classification, in which a model assigns each input to one of a discrete set of classes.

In this particular session, our focus will be on **binary classification**, a subtype of classification problems.

Binary classification deals with situations where the outcome can only be one of two possible options. Here are some examples:

- Identifying whether an image features a dog or a cat.
- Using a medical record to determine if a person's tumor is benign or malignant.
- Categorizing an email as either spam or not spam.

In our study of deep neural networks, we've learned that a network with $K$ hidden layers has the following structure:

\begin{align*}
h_1 & = \text{ReLU}(\beta_0 + \mathbf{\Omega}_0x), \\
h_2 & = \text{ReLU}(\beta_1 + \mathbf{\Omega}_1h_1), \\
h_3 & = \text{ReLU}(\beta_2 + \mathbf{\Omega}_2h_2), \\
& \vdots \\
h_K & = \text{ReLU}(\beta_{K-1} + \mathbf{\Omega}_{K-1}h_{K-1}), \\
y & = \beta_K + \mathbf{\Omega}_K h_K.
\end{align*}
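
The layer equations above can be sketched in NumPy. This is a minimal illustration rather than a reference implementation: the layer sizes, the random weights, and the helper names `relu` and `forward` are assumptions chosen for the example.

```python
import numpy as np

def relu(a):
    """Elementwise ReLU: max(a, 0)."""
    return np.maximum(a, 0.0)

def forward(x, betas, omegas):
    """Compute y for a network with K hidden layers.

    betas  = [beta_0, ..., beta_K]   (bias vectors)
    omegas = [Omega_0, ..., Omega_K] (weight matrices)
    """
    h = x
    # Hidden layers: h_{k+1} = ReLU(beta_k + Omega_k h_k)
    for beta, omega in zip(betas[:-1], omegas[:-1]):
        h = relu(beta + omega @ h)
    # Output layer is linear: y = beta_K + Omega_K h_K
    return betas[-1] + omegas[-1] @ h

# Illustrative sizes (assumed): 2 inputs, K = 2 hidden layers of width 4, 1 output
rng = np.random.default_rng(0)
sizes = [2, 4, 4, 1]
omegas = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
betas = [rng.normal(size=n_out) for n_out in sizes[1:]]

x = np.array([0.5, -1.0])
y = forward(x, betas, omegas)
print(y)  # one real number with no fixed range
```

Because the last layer is linear, nothing constrains `y` to any particular interval.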

However, the output $y$ is an unbounded real number, so it cannot be read directly as one of two classes or as a probability.

So, how can we transform the value of $y$ to make it suitable for binary classification? This is where the **sigmoid** activation function comes into play. The sigmoid function, defined as

$$h(z) = \frac{1}{1 + e^{-z}}$$

maps any real number to a value strictly between 0 and 1, which we can interpret as the predicted probability of the positive class.
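
To turn the sigmoid output into a class label, one common choice is to threshold it at 0.5. The sketch below assumes that threshold and uses made-up values of $z$; since $h(0) = 0.5$, the rule is equivalent to checking whether $z \ge 0$.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw network outputs
z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

p = sigmoid(z)                    # values strictly between 0 and 1
labels = (p >= 0.5).astype(int)   # assumed decision rule: class 1 when p >= 0.5

print(np.round(p, 3))
print(labels)  # [0 0 1 1 1]
```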

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-z))

def plot_sigmoid():
    """Plot the sigmoid activation function."""
    z = np.linspace(-10, 10, 100)
    h = sigmoid(z)

    plt.figure(figsize=(9, 6))
    plt.plot(z, h)
    plt.scatter([0], [0.5], color='red')  # Highlight the point (0, 0.5)
    plt.text(0.2, 0.5, 'h(0)=0.5', fontsize=12, verticalalignment='bottom')  # Annotate the point (0, 0.5)
    plt.title('Sigmoid Activation Function')
    plt.xlabel('z')
    plt.ylabel('h(z)')
    plt.show()

# Call the function to plot the sigmoid activation function
plot_sigmoid()

Let's see deep neural networks in action on a 2D dataset: the task is to tell apart points lying on a large circle from points lying on a smaller circle nested inside it.

In [None]:
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

def generate_data():
    """Generate a dataset of circles."""
    X, y = make_circles(n_samples=1_000, factor=0.3, noise=0.05, random_state=0)
    return X, y

def split_data(X, y):
    """Split the dataset into training and testing sets."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
    return X_train, X_test, y_train, y_test

def plot_data(X_train, y_train):
    """Plot the training data."""
    plt.figure(figsize=(8, 8))
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train)
    plt.ylabel("Feature #1")
    plt.xlabel("Feature #0")
    plt.title("Training data")
    plt.show()

# Generate and split the data
X, y = generate_data()
X_train, X_test, y_train, y_test = split_data(X, y)

# Plot the training data
plot_data(X_train, y_train)

Why do we use **test data** in addition to **training data**?

Consider this analogy: imagine you're enrolled in a college course.

The **training data** is akin to the exercises and sample questions you tackle throughout the semester to learn and understand the course material.

On the other hand, the **test data** is like the final exam that assesses your comprehension of the course content.

Just as the final exam is unseen during your study period to ensure an unbiased evaluation of your understanding, the model also shouldn't have access to the test data during its training phase.

This ensures an impartial assessment of the model's performance.
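
As a rough illustration of why this matters, consider a "model" that simply memorizes its training set. The toy data, the 20% label noise, and the helper `memorizer_predict` below are all assumptions made for this sketch; they are not part of the circles example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the label is 1 when x > 0, but about 20% of labels are flipped
x = rng.uniform(-1.0, 1.0, size=400)
y = (x > 0).astype(int)
flip = rng.random(400) < 0.2
y = np.where(flip, 1 - y, y)

# Hold out the last 100 points as the "final exam"
x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

def memorizer_predict(x_query):
    """Return the label of the closest training point (pure memorization)."""
    nearest = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

train_acc = (memorizer_predict(x_train) == y_train).mean()
test_acc = (memorizer_predict(x_test) == y_test).mean()
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The memorizer reproduces every training label, noise included, so its training accuracy looks perfect. Only the held-out points reveal how well it handles data it has not seen.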

In [None]:
import tensorflow as tf

# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(2,)),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])

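# Train the model, holding out 20% of the training data for validation
# so that the test set stays unseen until the final evaluation
history = model.fit(X_train, y_train, epochs=10, validation_split=0.2)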

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {accuracy*100:.2f}%')


In [None]:
model.summary()

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def create_grid(X):
    """Create a grid of points."""
    x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
    y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    return xx, yy

def predict_grid(model, xx, yy):
    """Use the model to predict the grid points."""
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    return Z

def plot_predictions(xx, yy, Z, X_test, y_test):
    """Plot the contour of predictions and the test data."""
    plt.figure(figsize=(8, 8))
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.contour(xx, yy, Z, levels=[0.5], colors='k')  # Add decision boundary
    plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, edgecolors='k')
    plt.ylabel("Feature #1")
    plt.xlabel("Feature #0")
    plt.title("Predictions for Test Data")
    plt.show()

# Create a grid of points
xx, yy = create_grid(X)

# Use the model to predict the grid points
Z = predict_grid(model, xx, yy)

# Plot the contour of predictions and the test data
plot_predictions(xx, yy, Z, X_test, y_test)
