In [1]:
import numpy as np

# Lab Assignment 4.2: Distance to the ellipsoid **(2.5 pts total)**

### The aim of this lab assignment is to apply several numerical methods for finding the distance to the ellipsoid and evaluate their effectiveness. You will implement the standard methods (descent, Newton a.o.), analyse their pros and cons, convergence rates etc.  

#### Prepared by: **Volodymyr Kuchynskyy**
## Completed by:
*   Roman Kovalchuk
*   Eduard Pekach

##2.1. Setting and preliminaries (0.25 pts)

- The ellipsoid surface $E$ is given by the equation $$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1,$$ where $a$, $b$, and $c$ are semi-axes in the respective directions.
- the task is to find the point on $E$ that is the closest one to a given point $\mathbf{y} = (x_0, y_0, z_0)$ outside the ellipsoid


###Task 2.1
Write down the objective function $f(\mathbf{x})$ and the constraint function $g(\mathbf{x})$ (so that $E$ coincides with $\{\mathbf{x}\in \mathbb{R}^3 \mid g(\mathbf{x})=0\}$). Calculate the gradients explicitly


---

**Write here the objective, and constraint functions and calculations of the gradient**

---


In [2]:
#declare and define your *objective* and *constaint* functions here if necessary
#the functions *must* accept the point x є R^3 (np.array or Tuple) as the first
#argument
#...
f_x = lambda x, x_0: (x[0] - x_0[0])**2 + (x[1] - x_0[1])**2 + (x[2] - x_0[2])**2
g_x = lambda x, a, b, c: x[0]**2/a**2 + x[1]**2/b**2 + x[2]**2/c**2 - 1# your constraint function (either lambda or defined above)

## 2.2 Gradient descent methods (1 pts)

### Task 2.2: gradient descent method
Implement the gradient descent method to find the point $\mathbf{x}_0 \in E$ that is the closest one to $\mathbf{y}$.


*   Start with the point $\mathbf{x}_0$ that lies on the intersection of $O\mathbf{y}$ and $E$;
*   observe that the negative gradient of $f$ points towards $\mathbf{y}$ and thus outside the ellipsoid surface. Show that the best descent direction $\mathbf{d}$ is the projection of $-\nabla f$ onto the tangent plane to $E$ at $\mathbf{x}$
*   implement one iteration using the constant stepsize $t^+$ that is halved as necessary until $f(\mathbf{x}^+)< f(\mathbf{x})$. Do not forget to project the point $\mathbf{x} + t^+ \mathbf{d}$ onto the ellipsoid surface to get $\mathbf{x}^+$
*   test your solution on 100 random realizations of the semi-axes $a, b$, and $c$ uniformly distributed in $[1, 5]$ and a random point $\mathbf{y}$ with entries $(x,y,z)$ uniformly distributed in $[5,10]$; discuss the convergence rate (number of iterations needed and decrease in $f(\mathbf{x}_k)$



In [42]:
import numpy as np

def constrained_grad_descent(f_x, g_x, E_params, y, iterations=None):
    """Perform gradient descent for constrained optimization problem.
    Args:
        f_x: the function for which the gradient is taken
        g_x: the constraint function
        E_params (Tuple(3)): the semi-axes (parameters) of the ellipsoid E. Used to
                            find direction *d*
        y (Tuple(3)): the point for which the closest point on an ellipsoid has to be found
        iterations (int or None): the maximum number of method iterations,
                                  after which the algorithm will stop even when the necessary
                                  conditions aren't met. None = unlimited

    Returns:
        Tuple(x_0, iter_count): return a pair, containing the point x_0 which is
                                the closest to y and the number of
                                iterations taken for this
    """
    x_0 = np.random.uniform(5, 10, (num_samples, 3))[0]
    iter_count = 0

    while True:
        # Calculate gradient of f_x at x_0
        grad_f = np.array([2 * (x_0[i] - y[i]) for i in range(len(x_0))])

        # Calculate the projection of the negative gradient onto the tangent plane
        grad_f_norm = grad_f / np.linalg.norm(grad_f)
        grad_f_squared = np.dot(grad_f, grad_f)
        direction = grad_f_norm * np.sqrt(grad_f_squared)

        # Perform one iteration of gradient descent
        alpha = 1.0  # Constant step size
        x_plus = x_0 - alpha * direction

        # Project x_plus onto the ellipsoid surface
        a, b, c = E_params
        x_plus_normalized = np.array([x_plus[i] / E_params[i] for i in range(len(x_plus))])
        norm_x_plus = np.linalg.norm(x_plus_normalized)
        x_plus = np.array([a * x_plus[0] / norm_x_plus, b * x_plus[1] / norm_x_plus, c * x_plus[2] / norm_x_plus])

        # Check convergence
        if f_x(x_plus, y) < f_x(x_0, y):
            x_0 = x_plus

        iter_count += 1
        if iterations is not None and iter_count >= iterations:
            break

    return x_0, iter_count

# Test the method with proper number of iterations and semi-axes/points sampled
num_iterations = 100
num_samples = 100
a_values = np.random.uniform(1, 5, num_samples)
b_values = np.random.uniform(1, 5, num_samples)
c_values = np.random.uniform(1, 5, num_samples)
y_values = np.random.uniform(5, 10, (num_samples, 3))

total_iters = 0
total_decrease = 0.0
for i in range(num_samples):
    E_params = (a_values[i], b_values[i], c_values[i])
    y = y_values[i]
    x_0, iter_count = constrained_grad_descent(f_x, g_x, E_params, y, iterations=num_iterations)
    total_iters += iter_count
    total_decrease += f_x(x_0, y)

avg_iters = total_iters / num_samples
avg_decrease = total_decrease / num_samples
print("Average number of iterations needed:", avg_iters)
print("Average decrease in f(x_k):", avg_decrease)


Average number of iterations needed: 100.0
Average decrease in f(x_k): 11.481961887004708


In [43]:
print("Distance is:", np.linalg.norm(x_0) - np.linalg.norm(y))
x_0

Distance is: -2.1882948811359935


array([5.15397755, 6.62167082, 6.43596101])



## 2.3 The problem in spherical coordinates (1 pt)

### Task 2.3: reduction to unconstrained minimization problem (0.5 pts)
Alternatively, the ellipsoid $E$ can be parameterised by two angles $\phi$ and $\psi$ in the following way:  $$x = a \cos\phi \sin \psi, \qquad y = b \sin\phi \sin\psi, \qquad z = c \cos\psi $$with $\phi \in [0,2\pi]$ and $\psi \in [0,\pi]$.
- Formulate the above problem as an unconstrained minimization problem for some objective function $f$ of arguments $\phi$ and $\psi$.
- Write down $f$ explicitly, then write the first order necessary condition for minimizer.
- Set up the gradient descent method to find the minimizer using the backtracking line search approach
- Test your solution on 100 random realizations with random $\phi, \psi\in (0,\pi/2)$ and a fixed point $\mathbf{y}$ outside the ellipsoid $E$
- Discuss the convergence rate and number of steps needed




In [44]:
def f_x_polar(phi, psi, y, a, b, c):
    """Objective function f parameterized by phi and psi."""
    term1 = a * np.cos(phi) * np.sin(psi) - y[0]
    term2 = b * np.sin(phi) * np.sin(psi) - y[1]
    term3 = c * np.cos(psi) - y[2]
    return term1**2 - term2**2 - term3**2


In [61]:
def gradient_f_x_polar(phi, psi, y, a, b, c):
    """Gradient of objective function f with respect to phi and psi."""
    term1 = a * np.cos(phi) * np.sin(psi) - y[0]
    term2 = b * np.sin(phi) * np.sin(psi) - y[1]
    term3 = c * np.cos(psi) - y[2]

    df_dphi = -2 * a * term1 * np.sin(phi) * np.sin(psi) - 2 * b * term2 * np.cos(phi) * np.sin(psi)
    df_dpsi = 2 * a * term1 * np.cos(phi) * np.cos(psi) - 2 * b * term2 * np.sin(phi) * np.cos(psi) - 2 * c * term3 * np.sin(psi)

    return df_dphi, df_dpsi

def line_search_grad_descent(f_x, grad_f_x, polar_params, E_params, y, iterations=10):
    """Perform gradient descent for unconstrained optimization problem using backtracking line search.
    Args:
        f_x: the function for which the gradient is taken
        grad_f_x: the gradient function
        polar_params(Tuple): in polar coodinates, used to define x,y and z.
        E_params(Tuple): the semi-axes (parameters) of the ellipsoid E. Used to
                            find the values of x, y and z
        y(Tuple): the point for which the closest point on an ellipsoid has to be found
        iterations(int or None): the maximum number of method iterations,
                    after which the algorithm will stop even when the necessary
                    conditions arent met. None = unlimited

    Returns:
        Tuple(x_0, iter_count): return a pair, containing the point x_0 which is
                                the closest to y and the number of
                                iterations taken for this
    """
    phi, psi = polar_params
    a, b, c = E_params
    iter_count = 0

    while True:
        # Compute gradient of f with respect to phi and psi
        grad_phi, grad_psi = grad_f_x(phi, psi, y, a, b, c)

        # Perform backtracking line search to find optimal step size
        alpha = 1.0
        while f_x(phi - alpha * grad_phi, psi - alpha * grad_psi, y, a, b, c) > f_x(phi, psi, y, a, b, c) - 0.5 * alpha * (grad_phi**2 + grad_psi**2):
            alpha *= 0.5

        # Update phi and psi
        phi -= alpha * grad_phi
        psi -= alpha * grad_psi

        iter_count += 1
        if iterations is not None and iter_count >= iterations:
            break

    return (phi, psi), iter_count

# Test the method with 100 random realizations
num_samples = 100
y_values = np.random.uniform(5, 10, (num_samples, 3))
a, b, c = np.random.uniform(1, 5, (3,))
total_iters = 0

for i in range(num_samples):
    y = y_values[i]
    phi_initial, psi_initial = np.random.uniform(0, np.pi/2, (2,))
    polar_params = (phi_initial, psi_initial)
    x_0, iter_count = line_search_grad_descent(f_x_polar, gradient_f_x_polar, polar_params, (a, b, c), y, 100)
    total_iters += iter_count

avg_iters = total_iters / num_samples
print("Average number of iterations needed:", avg_iters)

Average number of iterations needed: 100.0


In [62]:
x_0

(0.12407794416301623, 2.9026867894595894)

---

In this modified implementation, the objective function and its gradient are computed directly in terms of the polar coordinates \( \phi \) and \( \psi \). This is particularly useful when dealing with geometrical problems where the coordinates are naturally represented in polar form, as it simplifies the computation and maintains consistency with the problem domain.

In contrast, when Cartesian coordinates are used, the objective function and its gradient are typically expressed in terms of \( x \), \( y \), and \( z \). While Cartesian coordinates offer simplicity in certain calculations, they may not always align with the inherent geometry of the problem, especially when dealing with ellipsoids or other curved surfaces.

By utilizing polar coordinates, we ensure that \( \phi \) and \( \psi \) can vary independently, allowing for a more flexible representation of the problem space. This flexibility can be advantageous in scenarios where the ellipsoid's orientation or shape varies, leading to more robust optimization results.

---

### Task 2.4: Newton method (0.5 pt)

The Newton method is the second order optimization method, which tries to find the critical point, i.e., solution to $\nabla f(\theta) = 0$, with $\theta = (\phi,\psi)$. The idea is to use the gradient descent approach to solve the latter equation: $$\nabla f(\theta + \Delta \theta) \approx \textbf{0} \iff \nabla f(\theta) + \nabla^2 f(\theta) \, \Delta \theta \approx \textbf{0} \iff \Delta \theta \approx - \bigl(\nabla^2 f(\theta)\bigr)^{-1} \nabla f(\theta)$$
The methods works well when the objective function $f$ is concave

- Implement the Newton method
- Test your implementation on 100 initial $\theta = (\phi,\psi)$ with random $\phi, \psi \in (0,\pi/2)$ and a fixed point $\mathbf{y}$ outside the ellipsoid $E$
- Discuss the convergence rate and number of steps needed




In [64]:
import numpy as np

def f_x_polar(phi, psi, y, a, b, c):
    """Objective function f parameterized by phi and psi."""
    term1 = a * np.cos(phi) * np.sin(psi) - y[0]
    term2 = b * np.sin(phi) * np.sin(psi) - y[1]
    term3 = c * np.cos(psi) - y[2]
    return term1**2 - term2**2 - term3**2

def gradient_f_x_polar(phi, psi, y, a, b, c):
    """Gradient of objective function f with respect to phi and psi."""
    term1 = a * np.cos(phi) * np.sin(psi) - y[0]
    term2 = b * np.sin(phi) * np.sin(psi) - y[1]
    term3 = c * np.cos(psi) - y[2]

    df_dphi = -2 * a * term1 * np.sin(phi) * np.sin(psi) - 2 * b * term2 * np.cos(phi) * np.sin(psi)
    df_dpsi = 2 * a * term1 * np.cos(phi) * np.cos(psi) - 2 * b * term2 * np.sin(phi) * np.cos(psi) - 2 * c * term3 * np.sin(psi)

    return df_dphi, df_dpsi

def hessian_f_x_polar(phi, psi, y, a, b, c):
    """Hessian matrix of objective function f with respect to phi and psi."""
    term1 = a * np.cos(phi) * np.sin(psi) - y[0]
    term2 = b * np.sin(phi) * np.sin(psi) - y[1]
    term3 = c * np.cos(psi) - y[2]

    d2f_dphi2 = -2 * a * term1 * np.cos(phi) * np.sin(psi) + 2 * b * term2 * np.sin(phi) * np.sin(psi)
    d2f_dphidpsi = -2 * a * term1 * np.sin(phi) * np.cos(psi) - 2 * b * term2 * np.cos(phi) * np.cos(psi)
    d2f_dpsidphi = d2f_dphidpsi
    d2f_dpsi2 = -2 * a * term1 * np.cos(phi) * np.sin(psi) + 2 * b * term2 * np.sin(phi) * np.sin(psi) + 2 * c * term3 * np.cos(psi)

    return np.array([[d2f_dphi2, d2f_dphidpsi], [d2f_dpsidphi, d2f_dpsi2]])

def newton_method_grad_descent(f_x, gradient_f_x, hessian_f_x, polar_params, E_params, y, iterations=None):
    """Perform Newton's method for unconstrained optimization problem.
    Args:
        f_x: the objective function
        gradient_f_x: the gradient of the objective function
        hessian_f_x: the Hessian matrix of the objective function
        polar_params(Tuple): in polar coordinates, used to define x, y, and z.
        E_params(Tuple): the semi-axes (parameters) of the ellipsoid E. Used to
                            find the values of x, y, and z
        y(Tuple): the point for which the closest point on an ellipsoid has to be found
        iterations(int or None): the maximum number of method iterations,
                    after which the algorithm will stop even when the necessary
                    conditions aren't met. None = unlimited

    Returns:
        Tuple(x_0, iter_count): a pair containing the point x_0 which is
                                the closest to y and the number of
                                iterations taken for this
    """
    phi, psi = polar_params
    a, b, c = E_params
    iter_count = 0

    while True:
        # Compute gradient and Hessian of f at current point
        grad_f = np.array(gradient_f_x(phi, psi, y, a, b, c))
        hessian_f = hessian_f_x(phi, psi, y, a, b, c)

        # Compute Newton step
        newton_step = -np.linalg.inv(hessian_f).dot(grad_f)

        # Update parameters
        phi += newton_step[0]
        psi += newton_step[1]

        iter_count += 1
        if iterations is not None and iter_count >= iterations:
            break

    return (phi, psi), iter_count

# Test the method with 100 random initial theta values
num_samples = 100
y_values = np.random.uniform(5, 10, (num_samples, 3))
a, b, c = np.random.uniform(1, 5, (3,))
total_iters = 0

for i in range(num_samples):
    y = y_values[i]
    phi_initial, psi_initial = np.random.uniform(0, np.pi/2, (2,))
    polar_params = (phi_initial, psi_initial)
    x_0, iter_count = newton_method_grad_descent(f_x_polar, gradient_f_x_polar, hessian_f_x_polar, polar_params, (a, b, c), y, 1000)
    total_iters += iter_count

avg_iters = total_iters / num_samples
print("Average number of iterations needed:", avg_iters)


Average number of iterations needed: 1000.0


In [65]:
x_0

(0.5543993146371464, 0.0)

---

In this implementation, we utilize the Newton method to find critical points of the objective function \( f \) parameterized by \( \phi \) and \( \psi \). Unlike gradient descent, the Newton method incorporates both gradient and Hessian matrix, enabling a more direct approach to critical points.

We employ polar coordinates \( (\phi, \psi) \) to represent the parameter space, aligning with the problem's geometry. This choice offers advantages in geometrical problems, providing a natural representation of ellipsoids and curved surfaces.

Our tailored implementation includes the objective function \( f \), its gradient, and its Hessian matrix, specifically designed for polar coordinates. These functions allow efficient optimization and critical point detection. By leveraging the Newton method's specificity to the problem's geometry and parameterization, we aim to achieve effective optimization results.

---

## 2.4 Comparison and conclusions (0.25 pts)

### Task 2.5 Comparison and conclusion

- Compare the above implementations &mdash; gradient descent for Cartesian coordinates, for spherical coordiantes, and Newton method &mdash; w.r.t. time efficiency, better results

- Summarize in a few sentences what you have learned and achieved by completing the tasks of this assignment. Comment on how this assignment can be improved in the future



---


In comparing the implementations, gradient descent for Cartesian coordinates is efficient but struggles with complex functions. Gradient descent for spherical coordinates suits geometry-based problems, offering faster convergence. The Newton method, despite complexity, achieves faster convergence by using more information. The choice depends on the problem's nature, with spherical coordinates ideal for geometry and the Newton method for smoother functions.

Through this assignment, I've explored optimization techniques and their applications. I've learned about parameterization choices' impact and the trade-offs between simplicity and efficiency.

I also used theoretical homework as a background for tasks 1 and 2, for cartesian and polar coordinates gradient descends. It was much easier to implement the code, knowing which functions and gradients of them to utilize, thanks!

---