***

*Course:* [Math 535](https://people.math.wisc.edu/~roch/mmids/) - Mathematical Methods in Data Science (MMiDS)  
*Chapter:* 6-Optimization theory and algorithms   
*Author:* [Sebastien Roch](https://people.math.wisc.edu/~roch/), Department of Mathematics, University of Wisconsin-Madison  
*Updated:* Jan 6, 2024   
*Copyright:* &copy; 2024 Sebastien Roch

***

## Auto-quizzes

This notebook generates automated quizzes together with their answers. Set `seed` to any integer to produce a different quiz.

In [None]:
# Python 3
import numpy as np
from numpy import linalg as LA
from numpy.random import default_rng

In [None]:
# Set the `seed` to any integer
seed=535

In [None]:
rng = default_rng(seed)

**AQ6.1**  

***

*Use the following code to generate the quiz questions.*

***

Consider the least-squares objective function

$$
f(\mathbf{x}) = \|A \mathbf{x} - \mathbf{b}\|_2^2,
$$

with the following matrix $A$:

In [None]:
A = np.sign(2 * rng.random(size=(3,2)) - 1)
i = rng.integers(low=1,high=3,size=1)
A[i,0] = 0
print(A)

and the following column vector $\mathbf{b}$:

In [None]:
b = np.zeros(3)
b[0] = -2
print(b)

(a) What is the rank of $A$?

(b) Compute the gradient and Hessian of $f$ at $\mathbf{x}^0 = \mathbf{0}$.

(c) Perform one step of gradient descent from $\mathbf{x}^0 = \mathbf{0}$ with stepsize $1/2$.

(d) Compute the stationary points of $f$.

***

*Use the following code to generate the answers.*

***

In [None]:
# (a)
LA.matrix_rank(A)

In [None]:
# (b) 
P = 2 * A.T @ A
q = - 2 * A.T @ b
r = LA.norm(b) ** 2
x0 = np.zeros(2)
gradient = P @ x0 + q
print(gradient)

In [None]:
hessian = P
print(hessian)
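
The names `P`, `q` and `r` used in the answer to (b) come from writing the objective as a quadratic function of $\mathbf{x}$. A short derivation, using only the definitions in this question, is sketched here:

$$
f(\mathbf{x}) 
= (A \mathbf{x} - \mathbf{b})^T (A \mathbf{x} - \mathbf{b})
= \mathbf{x}^T A^T A \mathbf{x} - 2 \mathbf{b}^T A \mathbf{x} + \|\mathbf{b}\|_2^2
= \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r,
$$

with $P = 2 A^T A$, $\mathbf{q} = - 2 A^T \mathbf{b}$ and $r = \|\mathbf{b}\|_2^2$. Since $P$ is symmetric,

$$
\nabla f(\mathbf{x}) = P \mathbf{x} + \mathbf{q} = 2 A^T (A \mathbf{x} - \mathbf{b}),
$$

and the Hessian is the constant matrix $P$. In particular, at $\mathbf{x}^0 = \mathbf{0}$ the gradient reduces to $\mathbf{q}$.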

In [None]:
# (c)
stepsize = 1/2
x1 = x0 - stepsize * gradient
print(x1)

In [None]:
# (d) 
xstar = LA.solve(P, -q)
print(xstar)
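
The closed-form answers to (b) and (d) can be cross-checked numerically. The sketch below is one possible sanity check, not part of the quiz itself: it compares the gradient formula with a central-difference approximation and compares the normal-equations solution with NumPy's least-squares solver. The matrix `A_demo`, the vector `b_demo` and the helper `central_diff_gradient` are hypothetical names introduced for illustration, and the data is a hand-picked instance of the same shape as the quiz, not the seeded output.

```python
import numpy as np

# Hypothetical instance shaped like the quiz data (not the seeded output).
A_demo = np.array([[1.0, -1.0],
                   [0.0, 1.0],
                   [-1.0, -1.0]])
b_demo = np.array([-2.0, 0.0, 0.0])

def f_ls(x):
    """Least-squares objective ||A x - b||_2^2."""
    residual = A_demo @ x - b_demo
    return residual @ residual

def grad_ls(x):
    """Closed-form gradient 2 A^T (A x - b)."""
    return 2 * A_demo.T @ (A_demo @ x - b_demo)

def central_diff_gradient(func, x, h=1e-6):
    """Approximate the gradient of func at x by central differences."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (func(x + e) - func(x - e)) / (2 * h)
    return g

x0_demo = np.zeros(2)
grad_exact = grad_ls(x0_demo)
grad_fd = central_diff_gradient(f_ls, x0_demo)

# Stationary point two ways: normal equations and NumPy's least-squares solver.
x_normal = np.linalg.solve(A_demo.T @ A_demo, A_demo.T @ b_demo)
x_lstsq, *_ = np.linalg.lstsq(A_demo, b_demo, rcond=None)

print(grad_exact, grad_fd)
print(x_normal, x_lstsq)
```

Because this `A_demo` has full column rank, the stationary point is unique and both solvers should agree up to rounding.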

$\lhd$

**AQ6.2**  

***

*Use the following code to generate the quiz questions.*

***

Consider the following function:

$$
f(\mathbf{x}) = x_1^{a_1} x_2^{a_2} x_3^{a_3} + x_4^{a_4} x_5^{a_5} x_6^{a_6},
$$

where $\mathbf{x} = (x_1,\ldots,x_6)$ and $\mathbf{a} = (a_1,\ldots,a_6)$ are column vectors. Note that $\mathbf{x}$ is the variable in $f$ while $\mathbf{a}$ is a fixed parameter vector defined by:

In [None]:
a = np.ones(6)
i = rng.integers(low=0,high=3,size=1)
j = rng.integers(low=3,high=6,size=1)
a[i] = 2 * rng.integers(low=1,high=4,size=1)
a[j] = 2 * rng.integers(low=1,high=4,size=1)
print(a)

Consider also the following column vector $\mathbf{p}$:

In [None]:
p = (-1) * np.ones(6)
print(p)

(a) Compute the gradient of $f$ at $\mathbf{x} = \mathbf{p}$.

(b) Compute the Hessian of $f$ at $\mathbf{x} = \mathbf{p}$.

(c) Perform one step of gradient descent from $\mathbf{x}^0 = \mathbf{p}$ with stepsize $1/2$.

(d) Let $h(z) = -\log z$ for $z > 0$. Use the *Chain rule* to compute the gradient of $h(f(\mathbf{x}))$ at $\mathbf{x} = \mathbf{p}$.

***

*Use the following code to generate the answers.*

***

In [None]:
# (a)
gradient = np.zeros(6)
for k in range(3):
    if (k == i):
        gradient[k] = a[k] * ((-1) ** (a[k]-1))
    else:
        gradient[k] = - ((-1) ** a[i])
for k in range(3,6):
    if (k == j):
        gradient[k] = a[k] * ((-1) ** (a[k]-1))
    else:
        gradient[k] = - ((-1) ** a[j])
print(gradient)

In [None]:
# (b) 
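hessian = np.zeros((6,6))
# Off-diagonal entries (upper triangle) for the first term x_1^{a_1} x_2^{a_2} x_3^{a_3}
for k in range(3):
    for l in range(k+1,3):
        if (k == i) or (l == i):
            hessian[k,l] = - a[i] * ((-1) ** (a[i]-1))
        else:
            # Neither index is i: the mixed partial equals x_i^{a_i} evaluated at -1
            hessian[k,l] = (-1) ** a[i]
# Off-diagonal entries (upper triangle) for the second term x_4^{a_4} x_5^{a_5} x_6^{a_6}
for k in range(3,6):
    for l in range(k+1,6):
        if (k == j) or (l == j):
            hessian[k,l] = - a[j] * ((-1) ** (a[j]-1))
        else:
            # Neither index is j: the mixed partial equals x_j^{a_j} evaluated at -1
            hessian[k,l] = (-1) ** a[j]
# Fill in the lower triangle by symmetry
hessian += hessian.T
# Diagonal entries are nonzero only where the exponent is not 1
for k in range(6):
    if (k == i) or (k == j):
        hessian[k,k] = a[k] * (a[k]-1) * ((-1) ** (a[k]-2))
print(hessian)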

In [None]:
# (c)
x0 = p
stepsize = 1/2
x1 = x0 - stepsize * gradient
print(x1)

In [None]:
# (d) 
f = (-1) ** a[i] + (-1) ** a[j]
gradient_comp = - (1/f) * gradient
print(gradient_comp)
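
Hand-derived gradients and Hessians of this kind are easy to get wrong by a sign, so a numerical cross-check can be useful. The sketch below approximates the gradient, the Hessian and the chain-rule gradient by central finite differences. It assumes a hypothetical parameter vector `a_demo` of the same form as the quiz (one even exponent in each group of three), not the seeded output, and the helpers `fd_gradient` and `fd_hessian` are illustrative names rather than library functions.

```python
import numpy as np

# Hypothetical parameter vector (one even exponent per group of three);
# this is NOT the seeded quiz output.
a_demo = np.array([2, 1, 1, 1, 4, 1])
p_demo = -np.ones(6)

def f_demo(x):
    """x_1^{a_1} x_2^{a_2} x_3^{a_3} + x_4^{a_4} x_5^{a_5} x_6^{a_6}."""
    return np.prod(x[:3] ** a_demo[:3]) + np.prod(x[3:] ** a_demo[3:])

def fd_gradient(func, x, h=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    n = x.size
    I = np.eye(n)
    return np.array([(func(x + h * I[k]) - func(x - h * I[k])) / (2 * h)
                     for k in range(n)])

def fd_hessian(func, x, h=1e-4):
    """Central-difference approximation of the Hessian of func at x."""
    n = x.size
    I = np.eye(n)
    H = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            H[k, l] = (func(x + h * I[k] + h * I[l])
                       - func(x + h * I[k] - h * I[l])
                       - func(x - h * I[k] + h * I[l])
                       + func(x - h * I[k] - h * I[l])) / (4 * h * h)
    return H

grad_fd = fd_gradient(f_demo, p_demo)
hess_fd = fd_hessian(f_demo, p_demo)
# Gradient of h(f(x)) with h(z) = -log z, approximated directly
grad_comp_fd = fd_gradient(lambda x: -np.log(f_demo(x)), p_demo)

print(np.round(grad_fd, 4))
print(np.round(hess_fd, 4))
print(np.round(grad_comp_fd, 4))
```

The printed values are approximations, so they should be compared with the closed-form answers up to a small tolerance rather than exactly.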

$\lhd$

**AQ6.3**  

***

*Use the following code to generate the quiz questions.*

***

Consider the following matrix $A$:

In [None]:
A = rng.integers(low=1,high=4,size=(2,4))
S = np.sign(2 * rng.random(size=(2,4)) - 1)
A = A * S
print(A)

and denote its columns by $\mathbf{a}_1,\ldots,\mathbf{a}_4$. Define $B_1 = \mathbf{a}_1 \otimes \mathbf{a}_2^T$ and $B_2= \mathbf{a}_3 \otimes \mathbf{a}_4^T$.

(a) Compute $O = B_1 \odot B_2$.

(b) Compute $X = B_1 \otimes B_2$.

(c) Compute $\mathrm{rk}(O)$.

(d) Compute $\mathrm{rk}(X)$.

***

*Use the following code to generate the answers.*

***

In [None]:
# (a)
B1 = np.outer(A[:,0],A[:,1])
B2 = np.outer(A[:,2],A[:,3])
O = B1 * B2
print(O)

In [None]:
# (b)
X = np.kron(B1,B2)
print(X)

In [None]:
# (c)
LA.matrix_rank(O)

In [None]:
# (d)
LA.matrix_rank(X)
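
The ranks in (c) and (d) can also be reasoned about structurally. The sketch below illustrates three standard facts on a hand-picked matrix `A_demo` (a hypothetical instance, not the seeded output): the Kronecker product of a column vector with a row vector is their outer product, the Hadamard product of two outer products is again an outer product, and the rank of a Kronecker product is the product of the ranks. All names ending in `_demo` are introduced here for illustration only.

```python
import numpy as np

# Hypothetical 2x4 matrix with nonzero integer entries (not the seeded output).
A_demo = np.array([[1.0, -2.0, 3.0, -1.0],
                   [-3.0, 2.0, 1.0, 2.0]])
a1, a2, a3, a4 = A_demo.T  # columns of A_demo

B1_demo = np.outer(a1, a2)
B2_demo = np.outer(a3, a4)

# Kronecker product of a column vector and a row vector equals the outer product.
kron_as_outer = np.kron(a1.reshape(-1, 1), a2.reshape(1, -1))

# Hadamard product of two outer products is the outer product of the
# entrywise products of the corresponding vectors.
O_demo = B1_demo * B2_demo
O_factored = np.outer(a1 * a3, a2 * a4)

# Rank of a Kronecker product is the product of the ranks.
X_demo = np.kron(B1_demo, B2_demo)
rank_O = np.linalg.matrix_rank(O_demo)
rank_X = np.linalg.matrix_rank(X_demo)
rank_product = np.linalg.matrix_rank(B1_demo) * np.linalg.matrix_rank(B2_demo)

print(rank_O, rank_X, rank_product)
```

Since every entry of `A_demo` is nonzero, both `B1_demo` and `B2_demo` are nonzero outer products, which is why the ranks here do not depend on the particular signs or magnitudes chosen.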

$\lhd$