# Gradient Descent with Adaptive Learning Rate

The modified code adds an adaptive learning rate driven by the change in gradient direction between iterations. The rule is meant to increase the learning rate when the angle between the new and previous gradient vectors is less than 45 degrees and to decrease it otherwise.

**1. Adaptive Learning Rate Function (`adaptive_learning_rate`):**
   - This function computes the dot product of the new and previous gradients and compares it with the threshold 1/sqrt(2) (about 0.707), the cosine of 45 degrees. The gradients are not normalized, so the comparison corresponds exactly to a 45-degree angle test only when both gradients have unit length.
   - If the dot product exceeds the threshold, the learning rate is multiplied by 1.25. Otherwise, it is multiplied by 0.9.

**2. Integration of Adaptive Learning Rate in `descent` Function:**
   - The `adaptive_learning_rate` function is called within the main optimization loop to adjust the learning rate for each iteration.

**3. Additional Information in Print Statements:**
   - The learning rate is now printed in each iteration for better insight into the adaptive process.

**4. Minor Adjustments:**
   - A line break is added after printing the initial objective value for better readability.
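
The angle test described in item 1 can also be written with an explicitly normalized dot product, so that the comparison with 1/sqrt(2) really is a comparison of cosines. The following is a minimal sketch, not the code used in this notebook: `adaptive_learning_rate_cosine` and its `cos_threshold`, `up`, and `down` parameters are hypothetical names introduced here, and leaving the rate unchanged for a zero gradient is an assumption.

```python
import math

def adaptive_learning_rate_cosine(lr, grad_prev, grad_new,
                                  cos_threshold=1 / math.sqrt(2),
                                  up=1.25, down=0.9):
    """Scale lr by `up` when the angle between the gradients is below 45 degrees, else by `down`."""
    dot = sum(a * b for a, b in zip(grad_new, grad_prev))
    norm_new = math.sqrt(sum(a * a for a in grad_new))
    norm_prev = math.sqrt(sum(b * b for b in grad_prev))
    norm_product = norm_new * norm_prev
    if norm_product == 0.0:
        # Assumption: the angle is undefined for a zero gradient, so keep lr as it is.
        return lr
    cosine = dot / norm_product
    return lr * up if cosine > cos_threshold else lr * down

# Parallel gradients (angle 0): the rate grows.
print(adaptive_learning_rate_cosine(0.1, [25, 80], [25, 80]))
# Positive dot product but an angle of roughly 73 degrees: the rate shrinks.
print(adaptive_learning_rate_cosine(0.1, [25, 80], [6.25, 0]))
```

With the unnormalized test in `adaptive_learning_rate`, the second case counts as an increase, because the raw dot product (156.25) is far above 0.707 even though the two directions differ by more than 45 degrees.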

The overall structure of the code is unchanged apart from the adaptive learning rate mechanism. Running the modified code shows how the learning rate adapts during the optimization.
We experimented with several values for the adaptive learning threshold and for the factors by which the learning rate is increased or decreased. Of the values we tried, the ones used here (1.25 and 0.9) worked best for this particular objective function. Other problems may call for different values, which can be found by the same kind of experimentation.
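
One way to run this kind of experiment is to wrap the loop in a quiet function that reports how each setting behaves. This is only a sketch under stated assumptions: `run_descent` and its `dot_threshold`, `up`, `down`, and `tol` parameters are hypothetical names, the per-iteration printing is dropped, and the candidate values in the loop are illustrative, not the ones actually tried for this notebook.

```python
import math

def F(w):
    """Objective function from the notebook."""
    return 3 * w[0]**2 + 4 * w[1]**2 - 5 * w[0] + 7

def grad(w):
    """Gradient of the objective function."""
    return [6 * w[0] - 5, 8 * w[1]]

def run_descent(w0, lr, dot_threshold, up, down, tol=1e-6, max_iter=1000):
    """Return (status, iterations, w) for one setting of the adaptive rule."""
    w = list(w0)
    grad_prev = grad(w)
    for iteration in range(1, max_iter + 1):
        grad_new = grad(w)
        # Same unnormalized dot-product test as adaptive_learning_rate.
        dot = grad_new[0] * grad_prev[0] + grad_new[1] * grad_prev[1]
        lr = lr * up if dot > dot_threshold else lr * down
        w_next = [w[0] - lr * grad_new[0], w[1] - lr * grad_new[1]]
        if math.isnan(F(w_next)):
            return "diverged", iteration, w_next
        if F(w_next) > F(w):
            return "objective increased", iteration, w_next
        if math.dist(w_next, w) < tol:
            return "converged", iteration, w_next
        w, grad_prev = w_next, grad_new
    return "max_iter reached", max_iter, w

# Illustrative candidates only; the first row is the setting used in the notebook.
candidates = [(1 / math.sqrt(2), 1.25, 0.9),
              (1 / math.sqrt(2), 1.1, 0.5),
              (0.0, 1.25, 0.9)]
for dot_threshold, up, down in candidates:
    status, iterations, w = run_descent([5, 10], 0.1, dot_threshold, up, down)
    print(dot_threshold, up, down, status, iterations, w)
```

Because the objective is a separable quadratic, its minimizer is w = (5/6, 0), which gives a fixed reference point for judging each run.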


import numpy as np
import math

def F(w):
    """Objective function."""
    return 3 * w[0]**2 + 4 * w[1]**2 - 5 * w[0] + 7

def grad(w):
    """Gradient of the objective function."""
    g = [0] * 2
    g[0] = 6 * w[0] - 5
    g[1] = 8 * w[1]
    return g

def has_converged(w_new, w_prev, threshold):
    """Check if the parameters have converged."""
    return np.linalg.norm(np.array(w_new, dtype=float) - np.array(w_prev, dtype=float)) < threshold

def adaptive_learning_rate(lr, grad_prev, grad_new):
    """Adapt the learning rate based on the change in gradient direction."""
    # 1/sqrt(2) is cos(45 degrees). The dot product is not normalized, so this
    # equals a 45-degree angle test only for unit-length gradients.
    if np.dot(grad_new, grad_prev) > (1 / math.sqrt(2)):
        # Dot product above the threshold: increase the learning rate.
        return lr * 1.25
    else:
        # Dot product at or below the threshold: decrease the learning rate.
        return lr * 0.9

def descent(w_new, w_prev, lr, threshold, max_iter=1000):
    """Gradient Descent optimization with adaptive learning rate."""
    print("Initial Parameters:", w_prev)
    print("Initial Objective Value:", F(w_prev),"\n")
    
    grad_prev = grad(w_prev)
    
    for iteration in range(max_iter):
        w_prev = w_new
        grad_new = grad(w_prev)
        
        # Update the learning rate adaptively
        lr = adaptive_learning_rate(lr, grad_prev, grad_new)
        
        w_0 = w_prev[0] - lr * grad_new[0]
        w_1 = w_prev[1] - lr * grad_new[1]
        w_new = [w_0, w_1]
        
        print("Iteration:", iteration + 1)
        print("Learning rate: ",lr)
        print("Updated Parameters:", w_new)
        print("Objective Value:", F(w_new),"\n")
        
        # Check if the objective function is increasing
        if F(w_new) > F(w_prev):
            print("Objective function is increasing. Try reducing the learning rate.")
            break

        # Check for convergence
        if has_converged(w_new, w_prev, threshold):
            print(f"Converged after {iteration+1} iterations.")
            break
        
        # Check for divergence (NaN values)
        if any(math.isnan(val) for val in [F(w_new)] + grad_new):
            print("Divergence detected. Try reducing the learning rate.")
            break
        
        # Update the gradient for the next iteration
        grad_prev = grad_new

# Example usage with adaptive learning rate
descent([5, 10], [5, 10], 0.1, pow(10, -6))


Initial Parameters: [5, 10]
Initial Objective Value: 457 

Iteration: 1
Learning rate:  0.125
Updated Parameters: [1.875, 0.0]
Objective Value: 8.171875 

Iteration: 2
Learning rate:  0.15625
Updated Parameters: [0.8984375, 0.0]
Objective Value: 4.92938232421875 

Iteration: 3
Learning rate:  0.1953125
Updated Parameters: [0.8221435546875, 0.0]
Objective Value: 4.917042300105095 

Iteration: 4
Learning rate:  0.17578125
Updated Parameters: [0.8339452743530273, 0.0]
Objective Value: 4.916667790082101 

Iteration: 5
Learning rate:  0.158203125
Updated Parameters: [0.8333644084632397, 0.0]
Objective Value: 4.916666669563657 

Iteration: 6
Learning rate:  0.1423828125
Updated Parameters: [0.8333378610768705, 0.0]
Objective Value: 4.916666666728168 

Iteration: 7
Learning rate:  0.12814453125000003
Updated Parameters: [0.8333343798434314, 0.0]
Objective Value: 4.916666666669952 

Iteration: 8
Learning rate:  0.11533007812500003
Updated Parameters: [0.8333336556788832, 0.0]
Objective Value: 