***

*Course:* [Math 535](https://people.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS)  
*Chapter:* 3-Optimization theory and algorithms   
*Author:* [Sebastien Roch](https://people.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison  
*Updated:* May 30, 2024   
*Copyright:* &copy; 2024 Sebastien Roch

***

## Auto-quizzes

This notebook generates randomized quizzes together with their answers. Set `seed` to any integer to produce a different quiz.

In [None]:
# Python 3
import numpy as np
from numpy import linalg as LA
from numpy.random import default_rng
import torch

In [None]:
# Set the `seed` to any integer
seed = 535

In [None]:
rng = default_rng(seed)

**AQ3.1**  

***

*Use the following code to generate the quiz questions. You should be able to answer them by hand -- that is, without the help of numerical computation.*

***

Consider the least-squares objective function

$$
f(\mathbf{x}) = \|A \mathbf{x} - \mathbf{b}\|_2^2,
$$

with the following matrix $A$:

In [None]:
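# Random 3x2 matrix with entries in {-1, +1}
A = np.sign(2 * rng.random(size=(3, 2)) - 1)
# Set the first entry of row 1 or row 2 (0-indexed, chosen at random) to zero
i = rng.integers(low=1, high=3, size=1)
A[i, 0] = 0
print(A)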

and the following column vector $\mathbf{b}$:

In [None]:
b = np.zeros(3)
b[0] = -2
print(b)

(a) What is the rank of $A$?

(b) Compute the gradient and Hessian of $f$ at $\mathbf{x}^0 = \mathbf{0}$.

(c) Perform one step of gradient descent from $\mathbf{x}^0 = \mathbf{0}$ with stepsize $1/2$.

(d) Compute the stationary points of $f$.

***

*Use the following code to generate the answers.*

***

In [None]:
# (a)
LA.matrix_rank(A)

In [None]:
# (b)
# Write f(x) = (1/2) x^T P x + q^T x + r, so that grad f(x) = P x + q
P = 2 * A.T @ A
q = - 2 * A.T @ b
r = LA.norm(b) ** 2
x0 = np.zeros(2)
gradient = P @ x0 + q
print(gradient)

In [None]:
# The Hessian of f is the constant matrix P = 2 A^T A
hessian = P
print(hessian)
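
The names `P`, `q` and `r` in the answer code for (b) come from expanding the objective as a quadratic function. The following short derivation uses only standard matrix calculus; it is spelled out here for convenience and is not part of the quiz itself.

$$
f(\mathbf{x}) 
= (A \mathbf{x} - \mathbf{b})^T (A \mathbf{x} - \mathbf{b})
= \mathbf{x}^T A^T A \,\mathbf{x} - 2\, \mathbf{b}^T A \,\mathbf{x} + \|\mathbf{b}\|_2^2
= \frac{1}{2} \mathbf{x}^T P \,\mathbf{x} + \mathbf{q}^T \mathbf{x} + r,
$$

with $P = 2 A^T A$, $\mathbf{q} = -2 A^T \mathbf{b}$ and $r = \|\mathbf{b}\|_2^2$. Since $P$ is symmetric,

$$
\nabla f(\mathbf{x}) = P \mathbf{x} + \mathbf{q} = 2 A^T (A \mathbf{x} - \mathbf{b}),
$$

and the Hessian is the constant matrix $P$. In particular, at $\mathbf{x}^0 = \mathbf{0}$ the gradient reduces to $\mathbf{q} = -2 A^T \mathbf{b}$.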

In [None]:
# (c)
stepsize = 1/2
x1 = x0 - stepsize * gradient
print(x1)

In [None]:
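# (d)
# Stationary points satisfy grad f(x) = P x + q = 0;
# P is invertible when A has full column rank (see (a)), so the solution is then unique
xstar = LA.solve(P, -q)
print(xstar)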
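
As an optional cross-check of (d), the sketch below compares the solution of the normal equations with NumPy's least-squares solver and verifies that the gradient vanishes there. The matrix `A_ex` and vector `b_ex` are hypothetical example data of the same shape as in the quiz; they are not generated from the notebook's `rng`.

```python
import numpy as np

# Hypothetical example data (an assumption, not the quiz's random draw):
# a 3x2 matrix with entries in {-1, 0, 1} and a single zero in the first column
A_ex = np.array([[1., -1.],
                 [0., 1.],
                 [-1., 1.]])
b_ex = np.array([-2., 0., 0.])

# Stationary points of ||A x - b||^2 solve the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A_ex.T @ A_ex, A_ex.T @ b_ex)

# Least-squares solver, for comparison
x_lstsq, *_ = np.linalg.lstsq(A_ex, b_ex, rcond=None)

# Gradient 2 A^T (A x - b) evaluated at the candidate stationary point
grad_at_x = 2 * A_ex.T @ (A_ex @ x_normal - b_ex)

print(x_normal)
print(x_lstsq)
print(grad_at_x)
```

For this particular example, the stationary point works out by hand to $(-1, 0)$, and both solvers should agree with it up to rounding.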

$\lhd$

**AQ3.2**  

***

*Use the following code to generate the quiz questions. You should be able to answer them by hand -- that is, without the help of numerical computation.*

***

Consider the following function:

$$
f(\mathbf{x}) = x_1^{a_1} x_2^{a_2} x_3^{a_3} + x_4^{a_4} x_5^{a_5} x_6^{a_6},
$$

where $\mathbf{x} = (x_1,\ldots,x_6)$ and $\mathbf{a} = (a_1,\ldots,a_6)$ are column vectors. Note that $\mathbf{x}$ is the variable in $f$ while $\mathbf{a}$ is a fixed parameter vector defined by:

In [None]:
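# All exponents equal 1, except for one even exponent (2, 4 or 6)
# at a random position in each group of three
a_np = np.ones(6)
i = rng.integers(low=0, high=3, size=1)
j = rng.integers(low=3, high=6, size=1)
a_np[i] = 2 * rng.integers(low=1, high=4, size=1)
a_np[j] = 2 * rng.integers(low=1, high=4, size=1)
print(a_np)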

Consider also the following column vector $\mathbf{p}$:

In [None]:
p = (-1) * np.ones(6)
print(p)

(a) Compute the gradient of $f$ at $\mathbf{x} = \mathbf{p}$.

(b) Compute the Hessian of $f$ at $\mathbf{x} = \mathbf{p}$.

(c) Perform one step of gradient descent from $\mathbf{x}^0 = \mathbf{p}$ with stepsize $1/2$.

(d) Let $h(z) = -\log z$ for $z > 0$. Use the *Chain rule* to compute the gradient of $h(f(\mathbf{x}))$ at $\mathbf{x} = \mathbf{p}$.

***

*Use the following code to generate the answers.*

***

In [None]:
# Code by ChatGPT

# Convert Numpy array to PyTorch tensor
a = torch.tensor(a_np, dtype=torch.float, requires_grad=False)

# Define variables
x = torch.tensor([-1.0, -1.0, -1.0, -1.0, -1.0, -1.0], 
                 requires_grad=True)

# Define the function
f = x[0]**a[0] * x[1]**a[1] * x[2]**a[2] + x[3]**a[3] * x[4]**a[4] * x[5]**a[5]

# Compute the gradient
gradient = torch.autograd.grad(f, x, create_graph=True)[0]

# Prepare the Hessian matrix
hessian = torch.zeros((6, 6), dtype=torch.float)

# Compute the Hessian entry by entry
# (the loop variables are named row and col so that the indices i and j
# used to define the exponents are not overwritten)
for row in range(6):
    for col in range(6):
        # Compute the second derivative with respect to x[row] and x[col]
        if gradient[row].requires_grad:
            grad2 = torch.autograd.grad(gradient[row], x, 
                                        retain_graph=True)[0][col]
        else:
            # If the gradient doesn't require gradient (constant), the second derivative is zero
            grad2 = torch.tensor(0., dtype=torch.float)
        hessian[row, col] = grad2

# Convert the gradient and Hessian from PyTorch tensors to Numpy arrays
gradient_np = gradient.detach().numpy()
hessian_np = hessian.detach().numpy()

In [None]:
# (a)
print(gradient_np)

In [None]:
# (b) 
print(hessian_np)
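
The automatic-differentiation answer to (a) can also be checked against the closed-form partial derivatives of a monomial. The following is a minimal NumPy sketch under stated assumptions: the helper names `f_np` and `grad_f_np` and the exponent vector `a_ex` are hypothetical, chosen to mimic the structure of $\mathbf{a}$ (one even exponent per group of three) rather than taken from the notebook's random draw.

```python
import numpy as np

def f_np(x, a):
    """Evaluate f(x) = x1^a1 x2^a2 x3^a3 + x4^a4 x5^a5 x6^a6."""
    return np.prod(x[:3] ** a[:3]) + np.prod(x[3:] ** a[3:])

def grad_f_np(x, a):
    """Closed-form gradient: the partial derivative in x_k is
    a_k * x_k^(a_k - 1) times the other two factors of its monomial."""
    g = np.zeros(6)
    for block in (slice(0, 3), slice(3, 6)):
        xb, ab = x[block], a[block]
        for k in range(3):
            others = np.prod(np.delete(xb ** ab, k))
            g[block.start + k] = ab[k] * xb[k] ** (ab[k] - 1) * others
    return g

# Hypothetical exponents (integers, one even exponent in each group of three)
a_ex = np.array([2, 1, 1, 1, 4, 1])
p_ex = -np.ones(6)

g_exact = grad_f_np(p_ex, a_ex)

# Central finite differences as an independent check
eps = 1e-6
g_fd = np.array([
    (f_np(p_ex + eps * e, a_ex) - f_np(p_ex - eps * e, a_ex)) / (2 * eps)
    for e in np.eye(6)
])

print(g_exact)
print(g_fd)
```

With these exponents each monomial equals $1$ at $\mathbf{p}$, so the $k$-th partial derivative is $a_k \cdot 1 / p_k = -a_k$; that is, the gradient at $\mathbf{p}$ is $-\mathbf{a}$.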

In [None]:
# (c)
x0 = p
stepsize = 1/2
x1 = x0 - stepsize * gradient_np
print(x1)

In [None]:
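# (d)
# Evaluate f at p directly from its definition, without relying on the indices i and j
f_p = np.prod(p[:3] ** a_np[:3]) + np.prod(p[3:] ** a_np[3:])
# Chain rule: grad (h o f)(p) = h'(f(p)) * grad f(p) with h'(z) = -1/z
gradient_comp = - (1/f_p) * gradient_np
print(gradient_comp)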
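
The answer to (d) rests on the *Chain rule* for a scalar function composed with $f$. As a brief worked version, using $h'(z) = -1/z$:

$$
\nabla (h \circ f)(\mathbf{x}) 
= h'(f(\mathbf{x})) \,\nabla f(\mathbf{x})
= -\frac{1}{f(\mathbf{x})} \,\nabla f(\mathbf{x}).
$$

At $\mathbf{x} = \mathbf{p}$, each of the two products contains one even exponent and two exponents equal to $1$, so each product equals $(-1)^{\text{even}} \cdot (-1) \cdot (-1) = 1$. Hence $f(\mathbf{p}) = 2 > 0$, so $h$ is well-defined there, and

$$
\nabla (h \circ f)(\mathbf{p}) = -\frac{1}{2} \,\nabla f(\mathbf{p}).
$$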

$\lhd$