# **Minimizing a Sextic Polynomial using Gradient Descent**

This notebook demonstrates the application of the Gradient Descent algorithm to find the local minima of a sixth-degree polynomial function.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
# mpl_toolkits.mplot3d is not directly used in this script but is included
# for consistency with previous notebooks if 3D plotting were to be added.
from mpl_toolkits.mplot3d import Axes3D 

np.random.seed(42) # For reproducibility of random processes


# **1. Parameters**

Define the learning rate (`eta`), the total number of iterations (`num_iters`), and how many of the first and last iterations to print (`num_iter_to_display`) when summarizing a Gradient Descent run.

In [ ]:
eta = 0.0005  # Learning rate: controls the step size in each iteration
num_iters = 1000 # Total number of iterations for gradient descent
num_iter_to_display = 50 # Number of initial and final iterations to print


# **2. Sextic Polynomial Definition**

Define the coefficients of the sixth-degree polynomial function. The polynomial is given by:

$$ P(x) = \frac{1}{10} (x^6 -4x^5 -26x^4 +56x^3 + 253x^2 + 20x -300) $$

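Counting multiplicity, this polynomial has six real roots, $[-3, -2, -2, 1, 5, 5]$: the roots at $x = -2$ and $x = 5$ are double roots, so the curve touches the x-axis there without crossing it.
Here the polynomial is represented by a NumPy `poly1d` object, which simplifies evaluation and differentiation.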

In [ ]:
# Coefficients of the sextic polynomial, from highest degree to constant term
sextic_pol_coeffs = 1/10 * np.array([1, -4, -26, 56, 253, 20, -300])

# Create a polynomial object using np.poly1d for easy evaluation and differentiation
P = np.poly1d(sextic_pol_coeffs)

# Generate x values for plotting the polynomial function
m = 1000
x_range = np.linspace(-7, 8.5, m)

# Evaluate the polynomial function over the x_range
y_values = P(x_range)
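
As a quick sanity check on the coefficients, the polynomial can be rebuilt from the roots stated above and compared with the numbers inside the parentheses of $P(x)$. This is only a minimal sketch: the names `roots`, `coeffs_from_roots`, and `expected` are introduced here for illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# Roots stated in the text; -2 and 5 each appear twice (double roots)
roots = [-3, -2, -2, 1, 5, 5]

# np.poly returns the coefficients of the monic polynomial with these roots
coeffs_from_roots = np.poly(roots)

# Coefficients inside the parentheses of P(x), before the 1/10 scaling
expected = np.array([1, -4, -26, 56, 253, 20, -300])

print(coeffs_from_roots)
print("Match:", np.allclose(coeffs_from_roots, expected))
```

The overall factor of 1/10 only rescales the function values; it does not move the roots or the critical points.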


# **3. Polynomial Visualization**

Visualize the sextic polynomial over a broad range and then in a zoomed-in view. The zoomed-in plot highlights the local minima and maxima; Gradient Descent will attempt to find the minima.

In [ ]:
plt.figure(figsize=(10, 6))
plt.plot(x_range, y_values, linewidth=2)
plt.grid(True) # Add a grid for better readability
plt.title('Sextic Polynomial Function')
plt.xlabel('x')
plt.ylabel('P(x)')
plt.show()

plt.figure(figsize=(10, 6))
plt.plot(x_range, y_values, linewidth=2)
plt.grid(True)
plt.xlim([-6, 8])  # Set x-axis limits for zoomed view
plt.ylim([-50, 140]) # Set y-axis limits for zoomed view
plt.title('Sextic Polynomial Function (Zoomed In)')
plt.xlabel('x')
plt.ylabel('P(x)')

# Add annotations for axes (approximating MATLAB's annotation function)
# x-axis arrow
plt.annotate('', xy=(plt.xlim()[1], 0), xytext=(plt.xlim()[0], 0), 
             arrowprops=dict(facecolor='black', shrink=0.05, width=1, headwidth=8))
plt.text(plt.xlim()[1] + 0.2, 0, 'x', fontsize=12, ha='left', va='center')

# y-axis arrow
plt.annotate('', xy=(0, plt.ylim()[1]), xytext=(0, plt.ylim()[0]), 
             arrowprops=dict(facecolor='black', shrink=0.05, width=1, headwidth=8))
plt.text(0, plt.ylim()[1] + 5, 'P(x)', fontsize=12, ha='center', va='bottom')

plt.show()


# **4. Gradient Calculation**

Compute the first derivative of the polynomial, which in one dimension is its gradient. For a `poly1d` object this is done with the `.deriv()` method. The derivative drives Gradient Descent: its sign gives the direction of steepest ascent, so the algorithm steps in the opposite direction.

In [ ]:
# Compute the first derivative of the polynomial P
grad_P = P.deriv()

plt.figure(figsize=(10, 6))
plt.plot(x_range, y_values, linewidth=2, label='Sextic Polynomial Function')
plt.grid(True)
# Plot the derivative (gradient) function
plt.plot(x_range, grad_P(x_range), '-r', linewidth=2, label='Its Derivative (Gradient)')
plt.legend(loc='upper left', fontsize=12) # Add a legend to distinguish plots
plt.xlim([-5.5814, 7.2100]) # Set x-axis limits
plt.ylim([-650.015, 1015.517]) # Set y-axis limits
plt.xlabel('x')
plt.ylabel('Value')
plt.title('Polynomial Function and Its Derivative')
plt.show()
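
One way to gain confidence in the analytic derivative is to compare it with a numerical one. The sketch below uses a central finite difference; the step `h`, the sample grid `x_check`, and the variable names are illustrative assumptions rather than part of the original notebook.

```python
import numpy as np

# Same polynomial and derivative as defined in the notebook
P = np.poly1d(1/10 * np.array([1, -4, -26, 56, 253, 20, -300]))
grad_P = P.deriv()

# Central finite difference: (P(x + h) - P(x - h)) / (2h)
h = 1e-5                            # assumed step size
x_check = np.linspace(-5, 7, 25)    # assumed sample points
numeric_grad = (P(x_check + h) - P(x_check - h)) / (2 * h)

# Largest absolute discrepancy between the numerical and analytic derivatives
max_abs_err = np.max(np.abs(numeric_grad - grad_P(x_check)))
print(f"max |numeric - analytic| = {max_abs_err:.3e}")
```

A small discrepancy relative to the size of the derivative suggests that `grad_P` is the derivative of `P`.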


# **5. Critical Points of the Polynomial**

The real roots of the derivative are the critical points of the polynomial, where the slope is zero. For this polynomial, each of them is either a local minimum or a local maximum.

In [ ]:
# Find the roots of the derivative (gradient) polynomial
location_of_local_minima_and_maxima = grad_P.roots
print(f"Locations of local minima and maxima (roots of the gradient): {location_of_local_minima_and_maxima}")

# For this specific polynomial, the approximate location of the true absolute minimum is at x = -0.04007195,
# where the function value is y = -30.03988514.
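
The roots alone do not say which critical points are minima and which are maxima. A minimal sketch of the second-derivative test follows; `hess_P`, `crit`, `kinds`, and the `1e-6` tolerance for discarding imaginary parts are assumptions made for this illustration.

```python
import numpy as np

P = np.poly1d(1/10 * np.array([1, -4, -26, 56, 253, 20, -300]))
grad_P = P.deriv()     # first derivative
hess_P = P.deriv(2)    # second derivative

# Keep only real critical points, sorted from left to right
crit = grad_P.roots
crit = np.sort(crit[np.abs(crit.imag) < 1e-6].real)

kinds = []
for xc in crit:
    curvature = hess_P(xc)
    # Positive curvature: local minimum; negative curvature: local maximum
    if curvature > 0:
        kind = "minimum"
    elif curvature < 0:
        kind = "maximum"
    else:
        kind = "undetermined"
    kinds.append(kind)
    print(f"x = {xc: .5f}  P(x) = {P(xc): .5f}  P''(x) = {curvature: .3f}  -> local {kind}")
```

For this polynomial the test should report three local minima (near $x = -2.70$, $x = -0.04$, and at $x = 5$) separated by two local maxima (at $x = -2$ and near $x = 3.08$).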


# **6. Gradient Descent Algorithm Implementation**

This function implements the core Gradient Descent algorithm. It iteratively updates the current 'x' value by moving in the direction opposite to the gradient, scaled by the learning rate, aiming to reach a local minimum of the polynomial function.
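
Written out, with $x_k$ denoting the value of `x` after $k$ updates (notation introduced here) and $\eta$ the learning rate `eta`, each pass of the loop applies

$$ x_{k+1} = x_k - \eta \, P'(x_k) $$

As a worked first step, take $x_0 = 2$ and $\eta = 0.0005$. Then $P'(2) = \frac{1}{10}(192 - 320 - 832 + 672 + 1012 + 20) = 74.4$, so $x_1 = 2 - 0.0005 \times 74.4 = 1.9628$. The derivative is positive at $x_0$, so the iterate moves to the left, toward smaller values of $P$.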

In [ ]:
def minimization_using_GD(pol, x0, grad, eta, num_iters):
    """
    Performs minimization of a polynomial function using Gradient Descent.

    Args:
        pol (np.poly1d): The polynomial function to minimize.
        x0 (float): The initial starting point for x.
        grad (np.poly1d): The derivative (gradient) of the polynomial function.
        eta (float): The learning rate.
        num_iters (int): The number of iterations to perform.

    Returns:
        tuple: A tuple containing:
            - x_history (np.array): History of x values during optimization.
            - pol_history (np.array): History of polynomial values at x_history.
    """
    # Initialize arrays to store the history of x values and corresponding polynomial values
    x_history = np.zeros(num_iters + 1)
    pol_history = np.zeros(num_iters + 1)

    # Store the initial values
    x_history[0] = x0
    pol_history[0] = pol(x0)

    old_x = x0

    # Gradient Descent loop
    for k in range(num_iters):
        # Calculate the new x value using the gradient descent update rule
        # new_x = old_x - learning_rate * gradient(old_x)
        new_x = old_x - eta * grad(old_x)
        
        # Update old_x for the next iteration
        old_x = new_x
        
        # Store the current x value and its corresponding polynomial value
        x_history[k + 1] = new_x
        pol_history[k + 1] = pol(new_x)

    return x_history, pol_history
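
The function above always runs for a fixed number of iterations. A common alternative is to stop once the gradient is close to zero. The sketch below shows that variant under stated assumptions: `gd_with_tolerance` and its `tol` parameter are hypothetical names introduced here, not part of the original notebook.

```python
import numpy as np

def gd_with_tolerance(grad, x0, eta, max_iters, tol=1e-6):
    """Hypothetical variant: stop once |grad(x)| < tol or after max_iters updates."""
    x = float(x0)
    for k in range(max_iters):
        g = grad(x)
        if abs(g) < tol:
            return x, k          # gradient is small enough after k updates
        x = x - eta * g          # standard gradient descent update
    return x, max_iters          # tolerance not reached within max_iters

P = np.poly1d(1/10 * np.array([1, -4, -26, 56, 253, 20, -300]))
x_min, n_steps = gd_with_tolerance(P.deriv(), x0=2, eta=0.0005, max_iters=1000)
print(f"stopped at x = {x_min:.6f} after {n_steps} updates")
```

Keeping the fixed-length version, as the notebook does, has the advantage that the history arrays have a known size in advance.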


# **7. Displaying Iterations**

This helper function prints a summary of a Gradient Descent run: the first and the last `num_iter_to_display` iterations, separated by an ellipsis when the iterations in between are omitted.

In [ ]:
def displayIterations(x_his, pol_his, num_iter_to_display, x0):
    """
    Displays a summary of the Gradient Descent iterations.

    Args:
        x_his (np.array): History of x values.
        pol_his (np.array): History of polynomial values.
        num_iter_to_display (int): Number of initial and final iterations to display.
        x0 (float): The initial starting point for x.
    """
    print(f"\n\nDisplaying the first {num_iter_to_display} and the last {num_iter_to_display} iterations\nwhen starting from x0={x0}")
    
    # Display the starting point (it:0) and the first num_iter_to_display iterations
    for i in range(min(num_iter_to_display + 1, len(x_his))):
        print(f"it:{i:<4d}  x:{x_his[i]:<10.5f}  pol(x):{pol_his[i]:<10.5f}")
    
    # Display an ellipsis if some iterations in between are skipped
    if len(x_his) > 2 * num_iter_to_display + 1:
        print('...')

    # Display the last iterations, without repeating any already printed above
    for i in range(max(num_iter_to_display + 1, len(x_his) - num_iter_to_display), len(x_his)):
        print(f"it:{i:<4d}  x:{x_his[i]:<10.5f}  pol(x):{pol_his[i]:<10.5f}")


# **8. Running Gradient Descent from Different Starting Points**

Run the Gradient Descent algorithm from several initial values (`x0`) to show how the result depends on the starting point. With a sufficiently small learning rate, the iterates settle into the local minimum whose basin of attraction contains the starting point, which need not be the nearest minimum or the global minimum.

In [ ]:
# Starting from x0 = -6: Expected to converge to the first local minimum from the left (approx. x = -2.70)
x0_1 = -6
x_his_1, pol_his_1 = minimization_using_GD(P, x0_1, grad_P, eta, num_iters)
displayIterations(x_his_1, pol_his_1, num_iter_to_display, x0_1)

# Starting from x0 = +8: Expected to converge to the local minimum on the right (approx. x = +5)
x0_2 = +8
x_his_2, pol_his_2 = minimization_using_GD(P, x0_2, grad_P, eta, num_iters)
displayIterations(x_his_2, pol_his_2, num_iter_to_display, x0_2)

# Starting from x0 = +2: Expected to converge to the true absolute minimum (approx. x = -0.04)
x0_3 = +2
x_his_3, pol_his_3 = minimization_using_GD(P, x0_3, grad_P, eta, num_iters)
displayIterations(x_his_3, pol_his_3, num_iter_to_display, x0_3)
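
To see how the starting point determines the outcome, the same update can be run from a handful of additional starting points while recording only the final iterate. This is a sketch under assumptions: the helper `final_x`, the dictionary `results`, and the list `starts` are illustrative and not part of the original notebook.

```python
import numpy as np

P = np.poly1d(1/10 * np.array([1, -4, -26, 56, 253, 20, -300]))
grad_P = P.deriv()
eta, num_iters = 0.0005, 1000   # same settings as in the Parameters section

def final_x(x0):
    """Run plain gradient descent from x0 and return only the last iterate."""
    x = float(x0)
    for _ in range(num_iters):
        x = x - eta * grad_P(x)
    return x

# Assumed sample of starting points, not an exhaustive scan
starts = [-6, -4, -2.5, -1.5, 0.5, 2, 3, 3.2, 4, 6, 8]
results = {x0: final_x(x0) for x0 in starts}

for x0, xf in results.items():
    print(f"x0 = {x0:>5}  ->  x = {xf: .5f}   P(x) = {P(xf): .5f}")
```

The starting points 3 and 3.2 lie on opposite sides of the local maximum near $x = 3.08$, so they should end in different minima even though they are close together.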
