In [0]:
from IPython.display import HTML
HTML("""
<style>
@import "https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css";
</style>
""")

# Non-Linear Optimization: Multivariable Functions
This exercise is about implementing non-linear optimization of multivariable functions. Non-linear optimization requires:
- Computing partial derivatives.
- Calculating forward and backward passes of the function.
- Iteratively applying the forward and backward passes.



In [0]:
# import packages
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

## Function Definition
Define the function:

$$ f(x, y) = e^{-x^2 - y^2} \sin(x) \cos(y) $$
$f$ is an interesting function because it combines exponential decay with sinusoidal variation in both variables.
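
As a quick sanity check, and assuming the definition above, a few values can be worked out by hand and used later to verify an implementation:

$$ f(0, 0) = e^{0} \sin(0) \cos(0) = 0 $$

$$ f\left(\tfrac{\pi}{2}, 0\right) = e^{-\pi^2/4} \sin\left(\tfrac{\pi}{2}\right) \cos(0) = e^{-\pi^2/4} \approx 0.0848 $$

Since $\sin(-x) = -\sin(x)$ while the remaining factors are even in $x$, it also follows that $f(-x, y) = -f(x, y)$.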
## Class Implementation
The cell below defines the class `ExpTrig` with the following methods:
1. `forward` should return the function value of `f(x,y)`.
2. `df_dx` should return the partial derivative of the function with respect to `x`.
3. `df_dy` should return the partial derivative of the function with respect to `y`.
4. `backward` should return the gradient of `f(x,y)` as a tuple `(df_dx, df_dy)`.
5. `display_function` makes a figure of the function defined in `forward`.

Your task will be to implement these in the following steps.


In [0]:
class ExpTrig:

    def forward(self, x, y):
        """
        Args:
        x: x-values (Can be a single float or a numpy array)
        y: y-values (Can be a single float or a numpy array)

        Returns:
        The function values of f (size-like x and y)
        """
        # Write the code here

    def df_dx(self, x,y):
        """
        Args:
        x: x-values (Can be single float or 1D numpy array)
        y: y-values (Can be single float or 1D numpy array)

        Returns:
        The partial derivative of f with respect to x (size-like x and y)
        """
        # Write the code here

    def df_dy(self, x,y):
        """
        Args:
        x: x-values (Can be single float or 1D numpy array)
        y: y-values (Can be single float or 1D numpy array)

        Returns:
        The partial derivative of f with respect to y (size-like x and y)
        """
        # Write the code here

    def backward(self, x, y):
        """
        Args:
        x: x-values
        y: y-values

        Returns:
        Partial derivatives of the function (i.e. the gradient) as a tuple (df_dx, df_dy)
        """
        # Write the code here

    def display_function(self):
        
        x = np.linspace(-2, 2, 400)
        y = np.linspace(-2, 2, 400)
        x, y = np.meshgrid(x, y)

        z = self.forward(x,y)

        # Create a 3D plot  
        fig = plt.figure(figsize=(10, 8))
        ax = fig.add_subplot(111, projection='3d')
        ax.plot_surface(x, y, z, cmap='viridis', alpha=0.8)
        ax.set_xlabel('X axis')
        ax.set_ylabel('Y axis')
        ax.set_zlabel('f(x, y)')
        ax.set_title('Surface of f(x, y) = e^{-x^2 - y^2} sin(x) cos(y)')
        plt.show()

<article class="message task"><a class="anchor" id="forward"></a>
    <div class="message-header">
        <span>Task 1: Forward pass</span>
        <span class="has-text-right">
          <i class="bi bi-code"></i><i class="bi bi-stoplights easy"></i>
        </span>
    </div>
<div class="message-body">


Implement the `forward` function in the class above to evaluate the function `f(x,y)`.


</div></article>

<article class="message task"><a class="anchor" id="testing"></a>
    <div class="message-header">
        <span>Task 2: Testing the function</span>
        <span class="has-text-right">
          <i class="bi bi-stoplights easy"></i>
        </span>
    </div>
<div class="message-body">


Run the code below to visualize the function.


</div></article>



In [0]:
# Instantiate the function and plot its surface
f = ExpTrig()

f.display_function()

## Partial Derivatives
<article class="message task"><a class="anchor" id="partial"></a>
    <div class="message-header">
        <span>Task 3: Backward pass</span>
        <span class="has-text-right">
          <i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


1. On a piece of paper, find the partial derivatives of the function `f(x,y)` with respect to `x` and `y` (i.e. $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$).

2. Implement the functions `df_dx` and `df_dy` in the `ExpTrig` class, which return the values of the partial derivatives for a given `x` and `y`.

3. Implement the `backward` method in the `ExpTrig` class so that it returns the gradient evaluated at `(x, y)`.




</div></article>
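
Hand-derived partial derivatives are easy to get wrong, so it can help to compare them against a numerical estimate. The block below is a minimal sketch of a central-difference gradient check. The helper name `numerical_gradient`, the step size `h`, and the stand-in function `g` are assumptions made for illustration; they are not part of the exercise code.

```python
import numpy as np

def numerical_gradient(func, x, y, h=1e-5):
    """Central-difference estimate of (df/dx, df/dy) at the point (x, y).

    `func` is any callable taking (x, y); `h` is an assumed step size.
    """
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy

# Stand-in function with a known gradient (not the exercise function):
# g(x, y) = x**2 * y, so dg/dx = 2*x*y and dg/dy = x**2.
def g(x, y):
    return x**2 * y

num_dx, num_dy = numerical_gradient(g, 1.5, -2.0)
analytic_dx, analytic_dy = 2 * 1.5 * (-2.0), 1.5**2

# The numerical and analytic gradients should agree closely.
gradients_match = np.allclose([num_dx, num_dy], [analytic_dx, analytic_dy])
```

Once `ExpTrig` is implemented, the same idea applies: the result of `numerical_gradient(f.forward, x, y)` should closely match `f.backward(x, y)` at any test point.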

## Optimization Method
Basic gradient descent minimizes a function by iteratively adjusting the variables in the direction opposite to the gradient:
$$ x_{t+1} = x_{t} - \lambda \nabla_x f(x_{t}),$$

where $\lambda$ is the learning rate and $t$ refers to the iteration step. Here $x$ denotes the full vector of variables, which in this exercise is the pair $(x, y)$.
In the following steps you will implement gradient descent for the function $f(x,y)$, as implemented in the class `ExpTrig`.
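
To make the update rule concrete, here is a minimal one-dimensional sketch on a stand-in function, $g(x) = (x - 3)^2$, whose minimum is at $x = 3$. The function, the starting point, the learning rate, and the variable names are illustrative assumptions and deliberately differ from the exercise function.

```python
def quadratic(x):
    """Stand-in objective g(x) = (x - 3)^2 (not the exercise function)."""
    return (x - 3.0) ** 2

def quadratic_grad(x):
    """Derivative of the stand-in objective: g'(x) = 2 (x - 3)."""
    return 2.0 * (x - 3.0)

x_cur = 0.0           # assumed starting point
learning_rate = 0.1   # assumed learning rate (lambda in the formula)

for t in range(50):
    # Update rule: x_{t+1} = x_t - lambda * g'(x_t)
    x_cur = x_cur - learning_rate * quadratic_grad(x_cur)

print(f"x after 50 steps: {x_cur:.5f}, g(x) = {quadratic(x_cur):.2e}")
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor of $1 - 2\lambda = 0.8$, so the iterate approaches $x = 3$ geometrically.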
<article class="message task"><a class="anchor" id="partial2"></a>
    <div class="message-header">
        <span>Task 4: Gradient descent</span>
        <span class="has-text-right">
          <i class="bi bi-code"></i><i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


Implementing the `optimize_function` involves a series of steps that must be carried out both before and within the function itself. Here's a step-by-step guide:
1. **Initialize Coordinates**: Start by initializing `x` and `y` with `start_x` and `start_y`, respectively.

2. **Initialize History Lists**: Create two lists, `x_all` and `y_all`, to record the history of coordinates visited during the optimization.

3. **Start Gradient Descent Loop**: Implement a loop that runs for the number of iterations specified. Within each iteration:
    - Use the `backward` method of `func` to compute the gradients at the current `(x, y)`.
    - Update `x` and `y` by taking a step in the direction opposite to the gradient, scaled by the `learning_rate`.

4. **Logging (optional)**: Add a logging mechanism to print the current state of optimization. For example, after every 25 iterations, print the current iteration number, `x`, `y`, and the function value at this point.

5. **Record History**: In each iteration, append the current values of `x` and `y` to `x_all` and `y_all` respectively.

6. **Return the Result**: After completing all iterations, return `x_all` and `y_all`.




</div></article>



In [0]:
def optimize_function(func, start_x, start_y, learning_rate, iterations):
    """
    Optimize a given function using gradient descent.

    Args:
    - func (Function): A function object that must have 'forward' and 'backward' methods.
                       The 'forward' method computes the function value at a given point (x, y),
                       and the 'backward' method computes the gradient at that point.
    - start_x (float): The starting x-coordinate for the optimization.
    - start_y (float): The starting y-coordinate for the optimization.
    - learning_rate (float): The step size for each iteration in the gradient descent.
    - iterations (int): The total number of iterations for the optimization process.

    Returns:
    - x_all (list of float): List of all x-coordinates of the points visited during the optimization.
    - y_all (list of float): List of all y-coordinates of the points visited during the optimization.

    This function performs gradient descent on the provided function object. Starting from 
    (start_x, start_y), it iteratively moves in the direction opposite to the gradient, 
    with step sizes determined by the learning rate. The function's value and current 
    coordinates are printed every 25 iterations. The function returns the history of 
    coordinates visited during the optimization.
    """
    # Write the optimization routine here
    # return ...

# Example optimization
x_arr, y_arr = optimize_function(ExpTrig(), start_x=-1.5, start_y=-1., learning_rate=0.2, iterations=100)

The last code snippet plots the optimization path found by `optimize_function` for the `ExpTrig` function.



In [0]:
x = np.linspace(-2, 2, 400)
y = np.linspace(-2, 2, 400)
x, y = np.meshgrid(x, y)
z = ExpTrig().forward(x,y)

# Convert the optimization path returned by `optimize_function` to arrays
x_values = np.array(x_arr)
y_values = np.array(y_arr)
z_values = np.exp(-x_values**2 - y_values**2) * np.sin(x_values) * np.cos(y_values)

# Plot the optimization path on the surface
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z, cmap='viridis', alpha=0.6)
ax.plot(x_values, y_values, z_values, marker='o', color='r', markersize=5, label='Optimization Path')
ax.plot(x_values[-1], y_values[-1], z_values[-1], marker='x', color='b', markersize=10, label='Final Point')
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('f(x, y)')
ax.set_title('Optimization Path on f(x, y) Surface')
ax.legend()
plt.show()

<article class="message task"><a class="anchor" id="reflection"></a>
    <div class="message-header">
        <span>Task 5: Reflection</span>
        <span class="has-text-right">
          <i class="bi bi-lightbulb-fill"></i><i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


Assess the performance of the gradient descent algorithm.
1. Use the following starting positions:

$$x_{start}=1.5, y_{start}=1.5 $$


$$x_{start}=-1.5,  y_{start}=-1.5$$


$$x_{start}=-1.0, y_{start}=1.3$$


$$x_{start}=1.2, y_{start}=-1.5$$


- What do you observe?
- Explain why the optimization function sometimes fails to find the $\textbf{global}$ minimum.

2. Do different learning rates (try $\lambda \in \{0.1, 0.5, 1.0\}$) affect the result?
3. List two different issues with a simple gradient descent optimization function.
    - List a potential solution for each of the problems.





</div></article>

