In the previous session, we discussed supervised learning.

To formalize this concept, consider that we have an input tensor, denoted as $\mathbf{x}$, and an output tensor, denoted as $\mathbf{y}$.

There exists a mathematical function $h: \mathbf{X} \to \mathbf{Y}$ that maps the input to the output.

Our model includes parameters, represented by $\boldsymbol{\phi}$, which are currently unknown.

The selection of these parameters establishes the specific relationship between the input and output. This relationship can be expressed as follows:
$$\mathbf{y} = h\left[\mathbf{x}, \boldsymbol{\phi}\right]$$

Learning, or training, a model means determining the parameters $\boldsymbol{\phi}$ from the training data, which consists of $m$ input-output pairs:
$$(x_1, y_1)$$
$$(x_2, y_2)$$
$$\vdots$$
$$(x_m, y_m)$$

To simplify our discussion, let's concentrate on a single variable or one-dimensional scenario for the time being.

In [None]:
import matplotlib.pyplot as plt
import tensorflow as tf

def plot_regression(x, y, dpi=150, color='red'):
    plt.figure(figsize=(5, 5), dpi=dpi)
    plt.scatter(x, y, color=color)
    plt.xlim(0, 2)
    plt.ylim(0, 2)
    plt.xticks(tf.range(0, 2.2, 0.2))
    plt.yticks(tf.range(0, 2.2, 0.2))
    plt.xlabel('Input, x')
    plt.ylabel('Output, y')
    plt.title('1D Linear Regression')
    plt.show()

x = tf.constant([0.03, 0.19, 0.34, 0.46, 0.78, 0.81, 1.08, 1.18, 1.39, 1.60, 1.65, 1.90])
y = tf.constant([0.67, 0.85, 1.05, 1.0, 1.40, 1.5, 1.3, 1.54, 1.55, 1.68, 1.73, 1.6 ])

plot_regression(x, y)

A one-dimensional linear regression model describes the relationship between the input $x$ and the output $y$ as a straight line with intercept $\phi_0$ and slope $\phi_1$:

$$y = \phi_0 + \phi_1 x$$

In [None]:
import matplotlib.pyplot as plt
import tensorflow as tf

def compute_output_1d(x, parameters):
    return parameters[0] + parameters[1] * x

def plot_regression(x, y, parameters, colors, labels, dpi=150):
    x_values = tf.linspace(0, 2, 400)
    plt.figure(figsize=(5, 5), dpi=dpi)

    for parameter, color, label in zip(parameters, colors, labels):
        y_values = compute_output_1d(x_values, parameter)
        plt.plot(x_values, y_values, label=label, color=color)

    plt.scatter(x, y, color='red')
    plt.xlim(0, 2)
    plt.ylim(0, 2)
    plt.xticks(tf.range(0, 2.2, 0.2))
    plt.yticks(tf.range(0, 2.2, 0.2))
    plt.xlabel('Input, x')
    plt.ylabel('Output, y')
    plt.legend()
    plt.show()

x = tf.constant([0.03, 0.19, 0.34, 0.46, 0.78, 0.81, 1.08, 1.18, 1.39, 1.60, 1.65, 1.90])
y = tf.constant([0.67, 0.85, 1.05, 1.0, 1.40, 1.5, 1.3, 1.54, 1.55, 1.68, 1.73, 1.6 ])

parameters = [tf.constant([1.2, -0.1], dtype=tf.float64), tf.constant([0.0, 1.0], dtype=tf.float64), tf.constant([1.0, -0.4], dtype=tf.float64)]
colors = ['black', 'orange', 'cyan']
labels = [r'$\phi_0=1.2, \phi_1=-0.1$', r'$\phi_0=0.0, \phi_1=1.0$', r'$\phi_0=1.0, \phi_1=-0.4$']

plot_regression(x, y, parameters, colors, labels)

We require a systematic method to determine which parameters $\boldsymbol{\phi}$ are superior to others.

For this purpose, we assign each set of parameters a numerical score that measures the mismatch between the model and the data. This score is called the loss; a smaller loss indicates a better fit.

In [None]:
import matplotlib.pyplot as plt
import tensorflow as tf

def compute_output_1d(x, parameters):
    return parameters[0] + parameters[1] * x

def draw_lines_to_model(x, y, parameters, color='gray', linestyle='--'):
    for xi, yi in zip(x, y):
        y_on_line = compute_output_1d(xi, parameters)
        plt.plot([xi, xi], [yi, y_on_line], color=color, linestyle=linestyle)

def plot_regression(x, y, parameters, color, label, dpi=150):
    x_values = tf.linspace(0, 2, 400)
    plt.figure(figsize=(5, 5), dpi=dpi)

    y_values = compute_output_1d(x_values, parameters)
    plt.plot(x_values, y_values, label=label, color=color)

    plt.scatter(x, y, color='red')
    draw_lines_to_model(x, y, parameters)
    plt.xlim(0, 2)
    plt.ylim(0, 2)
    plt.xticks(tf.range(0, 2.2, 0.2))
    plt.yticks(tf.range(0, 2.2, 0.2))
    plt.xlabel('Input, x')
    plt.ylabel('Output, y')
    plt.legend()
    plt.show()

x = tf.constant([0.03, 0.19, 0.34, 0.46, 0.78, 0.81, 1.08, 1.18, 1.39, 1.60, 1.65, 1.90], dtype=tf.float64)
y = tf.constant([0.67, 0.85, 1.05, 1.0, 1.40, 1.5, 1.3, 1.54, 1.55, 1.68, 1.73, 1.6 ], dtype=tf.float64)

parameters = tf.constant([0.0, 1.0], dtype=tf.float64)
color = 'orange'
label = r'$\phi_0=0.0, \phi_1=1.0$'

plot_regression(x, y, parameters, color, label)

We can treat the loss as a function $L\left[\boldsymbol{\phi}\right]$ of these parameters. When we train the model, we are seeking parameters $\boldsymbol{\phi}$ that minimize this loss function:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}} \left[ L\left[\boldsymbol{\phi}\right] \right]
$$

For instance, in our single-variable scenario, we can define the loss as the sum, over all $m$ training pairs, of the squared difference between the model's prediction and the observed output. This is referred to as the least-squares loss:

$$ \begin{aligned}
L\left[\boldsymbol{\phi}\right] &= \sum_{i=1}^{m} \left(h\left[x_i, \boldsymbol{\phi}\right] - y_i\right)^2 \\
&= \sum_{i=1}^{m} \left(\phi_0 + \phi_1 x_i - y_i\right)^2
\end{aligned}
$$
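
As a worked instance of a single term in this sum, take the first training pair used in this notebook, $(x_1, y_1) = (0.03, 0.67)$, together with the candidate parameters $\phi_0 = 0$ and $\phi_1 = 1$ (one of the lines plotted above). The choice of pair and parameters is only for illustration. Its contribution to the loss is

$$(\phi_0 + \phi_1 x_1 - y_1)^2 = (0 + 1 \cdot 0.03 - 0.67)^2 = (-0.64)^2 = 0.4096$$

The remaining eleven pairs contribute in the same way, and the loss is the sum of all twelve terms.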


In [None]:
import tensorflow as tf

x = tf.constant([0.03, 0.19, 0.34, 0.46, 0.78, 0.81, 1.08, 1.18, 1.39, 1.60, 1.65, 1.90], dtype=tf.float64)
y = tf.constant([0.67, 0.85, 1.05, 1.0, 1.40, 1.5, 1.3, 1.54, 1.55, 1.68, 1.73, 1.6 ], dtype=tf.float64)

def compute_loss(x, y, phi0, phi1):
    # Compute the predicted values
    y_pred = phi0 + phi1 * x
    # Compute the squared differences
    squared_diffs = tf.square(y_pred - y)
    # Return the sum of the squared differences
    return tf.reduce_sum(squared_diffs)

# Loss for the line y = x, i.e. phi0 = 0.0 and phi1 = 1.0
compute_loss(x, y, 0.0, 1.0)
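
To see how the loss ranks candidate models, the sketch below evaluates the least-squares loss for the three parameter settings plotted earlier. It uses NumPy rather than TensorFlow only to keep the example lightweight; the helper name `least_squares_loss` and the `candidates` list are illustrative assumptions and not part of the cells above.

```python
import numpy as np

# Training data copied from the cells above.
x = np.array([0.03, 0.19, 0.34, 0.46, 0.78, 0.81, 1.08, 1.18, 1.39, 1.60, 1.65, 1.90])
y = np.array([0.67, 0.85, 1.05, 1.0, 1.40, 1.5, 1.3, 1.54, 1.55, 1.68, 1.73, 1.6])

def least_squares_loss(x, y, phi0, phi1):
    """Sum of squared differences between phi0 + phi1 * x and y (illustrative helper)."""
    residuals = phi0 + phi1 * x - y
    return float(np.sum(residuals ** 2))

# The three (phi0, phi1) settings drawn in the comparison plot.
candidates = [(1.2, -0.1), (0.0, 1.0), (1.0, -0.4)]
losses = [least_squares_loss(x, y, phi0, phi1) for phi0, phi1 in candidates]

# Print the candidates ordered from smallest to largest loss.
for (phi0, phi1), loss in sorted(zip(candidates, losses), key=lambda pair: pair[1]):
    print(f"phi0={phi0:+.1f}, phi1={phi1:+.1f} -> loss={loss:.4f}")

# Position of the best candidate in the original list.
best_index = int(np.argmin(losses))
```

The setting with the smallest loss fits the data best among these three, although that does not mean it is close to the overall minimum of the loss.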

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def compute_loss_grid(x, y, phi0_values, phi1_values, compute_loss):
    phi0_grid, phi1_grid = np.meshgrid(phi0_values, phi1_values)
    loss_grid = np.zeros_like(phi0_grid)
    for i in range(phi0_grid.shape[0]):
        for j in range(phi0_grid.shape[1]):
            loss_grid[i, j] = compute_loss(x, y, phi0_grid[i, j], phi1_grid[i, j])
    return phi0_grid, phi1_grid, loss_grid

def plot_loss_surface(phi0_grid, phi1_grid, loss_grid):
    fig = plt.figure(figsize=(10, 10), dpi=150)
    ax = fig.add_subplot(111, projection='3d')
    surf = ax.plot_surface(phi0_grid, phi1_grid, loss_grid, cmap='BrBG')

    ax.set_xlabel('Intercept, $\\phi_0$')
    ax.set_ylabel('Slope, $\\phi_1$')
    ax.set_zlabel('Loss $L(\\phi)$')
    fig.colorbar(surf, shrink=0.5, aspect=10)
    plt.show()

# Define the range of phi0 and phi1
phi0_values = np.linspace(0, 2, 100)
phi1_values = np.linspace(-1, 1, 100)

# Compute the loss for each combination of phi0 and phi1
phi0_grid, phi1_grid, loss_grid = compute_loss_grid(x, y, phi0_values, phi1_values, compute_loss)

# Plot the loss surface
plot_loss_surface(phi0_grid, phi1_grid, loss_grid)
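
Training asks for the parameters that minimize the loss. As a rough illustration of that argmin, and not the optimization procedure a real training run would use, the sketch below picks the grid point with the smallest loss over the same parameter ranges as the surface plot. It is written in NumPy with broadcasting so that it runs on its own; the names `loss_on_grid`, `best_phi0`, `best_phi1`, and `best_loss` are assumptions introduced for this example.

```python
import numpy as np

# Training data copied from the cells above.
x = np.array([0.03, 0.19, 0.34, 0.46, 0.78, 0.81, 1.08, 1.18, 1.39, 1.60, 1.65, 1.90])
y = np.array([0.67, 0.85, 1.05, 1.0, 1.40, 1.5, 1.3, 1.54, 1.55, 1.68, 1.73, 1.6])

# Same parameter ranges as the loss surface.
phi0_values = np.linspace(0, 2, 100)
phi1_values = np.linspace(-1, 1, 100)
phi0_grid, phi1_grid = np.meshgrid(phi0_values, phi1_values)

# Broadcast to shape (100, 100, 12): one prediction per grid point and data point.
predictions = phi0_grid[..., np.newaxis] + phi1_grid[..., np.newaxis] * x

# Least-squares loss at every grid point, shape (100, 100).
loss_on_grid = np.sum((predictions - y) ** 2, axis=-1)

# Convert the flat index of the smallest loss back to a (row, column) pair.
i, j = np.unravel_index(np.argmin(loss_on_grid), loss_on_grid.shape)
best_phi0, best_phi1 = phi0_grid[i, j], phi1_grid[i, j]
best_loss = loss_on_grid[i, j]

print(f"Grid minimum: phi0={best_phi0:.3f}, phi1={best_phi1:.3f}, loss={best_loss:.4f}")
```

The answer is only as precise as the grid spacing, so it should be read as an approximation to the minimizer rather than its exact value.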
