# Second Wolfe conditions

## Introduction to optimization and operations research.

Michel Bierlaire


In [None]:

import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import fsolve


In this lab, you will practice the **second Wolfe condition**
in an inexact line search. Starting from a quadratic objective and an initial point, you
will (i) compute the gradient and Hessian, (ii) form Newton’s direction and verify it is a
**descent direction** via the directional derivative, and (iii) study how the ratio
⟨∇f(x_α), d_N⟩ / ⟨∇f(x_0), d_N⟩ behaves along the line. You will plot this ratio, locate the
α values that satisfy the condition for a given β₂, and compare them to the minimizer along
the line. The goal is to connect the algebra (∇f, ∇²f), the geometry of line search, and the
practical choice of step sizes that are long enough to make sufficient progress.

Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^2} f(x)=x_1^2+x_1x_2+2x_2^2,
$$
and the point $x_0=(1,1)^T$.


- Calculate Newton's direction at $x_0$.
- Verify that it is a descent direction.
- Consider the second Wolfe condition with $\beta_2=0.7$. What are
the values of the step $\alpha$ that verify the condition?

First, implement the function and its derivatives

In [None]:


def f(x: np.ndarray) -> float:
    """Objective function"""
    # Python starts the numbering at zero
    x_1 = x[0]
    x_2 = x[1]
    result = x_1 * x_1 + x_1 * x_2 + 2 * x_2 * x_2
    return result



In [None]:
x_zero = np.array([1.0, 1.0])
f_zero = f(x_zero)
print(f'f({x_zero}) = {f_zero}')



In [None]:
def gradient(x: np.ndarray) -> np.ndarray:
    """Gradient of the objective function"""
    x_1 = x[0]
    x_2 = x[1]
    g_1 = 2 * x_1 + x_2
    g_2 = x_1 + 4 * x_2
    return np.array([g_1, g_2])



In [None]:
g_zero = gradient(x_zero)
print(f'Gradient of f({x_zero}) = {g_zero}')



In [None]:
def hessian(x: np.ndarray) -> np.ndarray:
    """Second derivative matrix of the objective function"""
    # In this case, the Hessian is constant: x is kept only for a uniform interface
    h_1_1 = 2
    h_1_2 = 1
    h_2_1 = 1
    h_2_2 = 4
    h = np.array([[h_1_1, h_1_2], [h_2_1, h_2_2]])
    return h



In [None]:
h_zero = hessian(x_zero)
print(f'Hessian of f({x_zero}) =\n{h_zero}')


Note that there exist Python packages for automatic differentiation,
such as ``autograd`` or ``jax``.
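
As a lighter alternative that needs only NumPy, the hand-coded gradient can be sanity-checked with central finite differences. This is a different technique from automatic differentiation, and the sketch below is only illustrative: the helper `finite_difference_gradient`, the step `h`, and the tolerance are assumptions, not part of the lab.

```python
import numpy as np


def f(x: np.ndarray) -> float:
    """Objective function of the lab."""
    return x[0] ** 2 + x[0] * x[1] + 2 * x[1] ** 2


def analytic_gradient(x: np.ndarray) -> np.ndarray:
    """Hand-derived gradient of f."""
    return np.array([2 * x[0] + x[1], x[0] + 4 * x[1]])


def finite_difference_gradient(x: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Central finite-difference approximation of the gradient (hypothetical helper)."""
    approx = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        # Central difference along coordinate i
        approx[i] = (f(x + step) - f(x - step)) / (2 * h)
    return approx


x_check = np.array([1.0, 1.0])
fd_gradient = finite_difference_gradient(x_check)
print(f'Analytic gradient:          {analytic_gradient(x_check)}')
print(f'Finite-difference gradient: {fd_gradient}')
print(f'Match: {np.allclose(fd_gradient, analytic_gradient(x_check), atol=1e-6)}')
```

If the two vectors disagree beyond the tolerance, the analytic gradient is the first place to look for a sign or coefficient error.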

Calculate Newton's direction

In [None]:
newton_direction = np.linalg.solve(h_zero, -g_zero)
print(f"Newton's direction: {newton_direction}")
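
For this particular quadratic, the objective can be written as $f(x)=\frac{1}{2}x^THx$ with $H=\nabla^2 f$, so the gradient is $Hx$ and a full Newton step ($\alpha=1$) should land on the minimizer $x^*=0$. The sketch below checks this numerically. The names `H`, `x_start`, `g_start`, `d_newton` and `x_full_step` are chosen for this illustration and are independent of the cells above.

```python
import numpy as np

# Hessian of f and the starting point, as in the lab.
H = np.array([[2.0, 1.0], [1.0, 4.0]])
x_start = np.array([1.0, 1.0])

# For this quadratic, f(x) = 0.5 * x^T H x, so the gradient is H x.
g_start = H @ x_start

# Newton's direction solves H d = -g.
d_newton = np.linalg.solve(H, -g_start)

# A full Newton step (alpha = 1) should reach the point where the gradient vanishes.
x_full_step = x_start + d_newton
print(f"Newton's direction: {d_newton}")
print(f'Point after a full step: {x_full_step}')
print(f'Gradient after a full step: {H @ x_full_step}')
```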


Verify that it is a descent direction.
We calculate the directional derivative.

In [None]:
directional_derivative = np.inner(newton_direction, g_zero)
print(f'Directional derivative: {directional_derivative}')


It must be negative.

In [None]:
if directional_derivative < 0:
    print('Descent direction')
else:
    print('Not a descent direction')



Write the function that maps a step $\alpha$ along Newton's direction to the value of
the objective function.

In [None]:
def linesearch(alpha: float) -> float:
    """
    Objective function along Newton's direction

    :param alpha: step along the direction
    :return: value of the objective function
    """
    new_point = x_zero + alpha * newton_direction
    return f(new_point)



We plot the function.

In [None]:
alpha_values = np.linspace(0, 2.5, 100)
objective_values = [linesearch(alpha) for alpha in alpha_values]
plt.plot(alpha_values, objective_values)
plt.xlabel('Step alpha')
plt.ylabel('Objective Function Value')
plt.title('Line Search Plot')
plt.grid(True)
plt.ylim(top=4)
plt.show()



Consider the second Wolfe condition with $\beta_2=0.7$.
$$
\nabla f(x_\alpha)^Td_N \geq \beta_2 \nabla f(x_0)^Td_N.
$$

In [None]:
beta_2 = 0.7



Consider this equivalent version of the same condition.
$$
\frac{\nabla f(x_\alpha)^Td_N}{\nabla f(x_0)^Td_N} \leq \beta_2.
$$
Note that the direction of the inequality is reversed because we divide by $\nabla f(x_0)^Td_N$, which is negative.

We define a function that calculates that ratio.

In [None]:
def second_wolfe(alpha: float) -> float:
    """
    Second Wolfe condition, written as a ratio

    :param alpha: step along the direction
    :return: ratio of the second Wolfe condition
    """
    new_point = x_zero + alpha * newton_direction
    numerator = np.inner(
        gradient(new_point), newton_direction
    )
    denominator = directional_derivative
    return numerator / denominator



Plot the ratio as a function of $\alpha$, together with the objective function.

In [None]:
wolfe_values = [second_wolfe(alpha) for alpha in alpha_values]
plt.plot(alpha_values, objective_values)
plt.plot(alpha_values, wolfe_values)
plt.axhline(y=beta_2, color='blue', linestyle='--')
plt.text(
    alpha_values[-1], beta_2, f'beta_2={beta_2}', color='blue', ha='right', va='bottom'
)
plt.xlabel('Step alpha')
plt.ylabel('Objective Function Value')
plt.title('Line Search Plot')
plt.grid(True)
plt.ylim(top=4)
plt.show()


The line representing the ratio should pass through two points:

- At $\alpha=0$, it is equal to 1.
- At the value of $\alpha$ corresponding to the minimum of the function along the direction, it is equal to 0.
Indeed, the directional derivative is zero for this value of $\alpha$.
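
A short derivation, using only the quantities defined above, suggests why the ratio is a straight line in this example. Because $f$ is a quadratic with no linear term, $\nabla f(x) = Hx$ with $H=\nabla^2 f$, so that $d_N = -H^{-1}\nabla f(x_0) = -x_0$ and
$$
x_\alpha = x_0 + \alpha d_N = (1-\alpha)x_0, \qquad
\nabla f(x_\alpha) = (1-\alpha)\nabla f(x_0), \qquad
\frac{\nabla f(x_\alpha)^Td_N}{\nabla f(x_0)^Td_N} = 1-\alpha.
$$
The ratio is therefore 1 at $\alpha=0$ and 0 at $\alpha=1$. This is specific to the example: for a function that is not quadratic, the ratio is generally not linear in $\alpha$.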

What are the values of the step $\alpha$ that verify the condition?

We need to find the values of $\alpha$ such that the difference between $\beta_2$ and the ratio is nonnegative.

In [None]:
def difference(alpha: float) -> float:
    """
    Difference between beta_2 and the ratio of the second Wolfe condition

    :param alpha: step along the direction
    :return: beta_2 minus the ratio; nonnegative when the condition is verified
    """
    return beta_2 - second_wolfe(alpha)



Find the root of that function. Use the function `fsolve` from `scipy`.

In [None]:
guess = 0.25
root = fsolve(difference, guess)[0]
print(f'Point where the ratio equals beta_2: {root:.2g}')
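
The result of `fsolve` depends on the initial guess. Since the ratio equals 1 at $\alpha=0$ and 0 at the minimizer along the line, the difference changes sign between these two steps, and a bracketing method can be used as a cross-check. The sketch below uses `brentq` from `scipy.optimize`. The bracket $[0, 1]$ assumes that the minimizer along Newton's direction is at $\alpha=1$, which holds for this quadratic. The names `wolfe_gap` and `bracketed_root` are introduced only for this illustration.

```python
import numpy as np
from scipy.optimize import brentq

# Problem data, restated so that the example is self-contained.
H = np.array([[2.0, 1.0], [1.0, 4.0]])
x_start = np.array([1.0, 1.0])
g_start = H @ x_start  # gradient of the quadratic at x_start
d_newton = np.linalg.solve(H, -g_start)
beta_2 = 0.7


def wolfe_gap(alpha: float) -> float:
    """beta_2 minus the ratio of the second Wolfe condition (hypothetical helper)."""
    x_alpha = x_start + alpha * d_newton
    ratio = np.inner(H @ x_alpha, d_newton) / np.inner(g_start, d_newton)
    return beta_2 - ratio


# wolfe_gap(0) < 0 and wolfe_gap(1) > 0, so the root is bracketed by [0, 1].
bracketed_root = brentq(wolfe_gap, 0.0, 1.0)
print(f'Root found by bracketing: {bracketed_root:.2g}')
```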


Plot the function with the root.

In [None]:
plt.plot(alpha_values, objective_values)
plt.plot(alpha_values, wolfe_values)
plt.axvline(root, color='red', linestyle='--')
plt.axhline(y=beta_2, color='blue', linestyle='--')
plt.text(
    alpha_values[-1], beta_2, f'beta_2={beta_2}', color='blue', ha='right', va='bottom'
)
plt.text(
    root, 2.5, f'alpha={root:.2g}', color='blue', ha='left', va='bottom', rotation=90
)
plt.xlabel('Step alpha')
plt.ylabel('Objective Function Value')
plt.title('Line Search Plot')
plt.ylim(top=4)
plt.show()


The values of the step $\alpha$ that verify the condition are
$$ \alpha \geq 0.3. $$
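
As a final check, the condition can be tested directly in its original form, $\nabla f(x_\alpha)^Td_N \geq \beta_2 \nabla f(x_0)^Td_N$, for a few candidate steps. The sketch below is a minimal illustration: the helper `satisfies_second_wolfe` and the list of candidate steps are assumptions made for this example.

```python
import numpy as np


def gradient(x: np.ndarray) -> np.ndarray:
    """Gradient of the objective function of the lab."""
    return np.array([2 * x[0] + x[1], x[0] + 4 * x[1]])


x_start = np.array([1.0, 1.0])
hessian_matrix = np.array([[2.0, 1.0], [1.0, 4.0]])
d_newton = np.linalg.solve(hessian_matrix, -gradient(x_start))


def satisfies_second_wolfe(alpha: float, beta_2: float = 0.7) -> bool:
    """Check the second Wolfe condition at step alpha (hypothetical helper)."""
    lhs = np.inner(gradient(x_start + alpha * d_newton), d_newton)
    rhs = beta_2 * np.inner(gradient(x_start), d_newton)
    return bool(lhs >= rhs)


# Candidate steps chosen away from the boundary to avoid rounding issues.
results = {alpha: satisfies_second_wolfe(alpha) for alpha in (0.1, 0.25, 0.5, 1.0, 2.0)}
for alpha, verified in results.items():
    print(f'alpha = {alpha}: {"verified" if verified else "violated"}')
```

Note that $\alpha=2$ verifies the condition even though $f(x_0 + 2 d_N) = f(x_0)$: the second Wolfe condition only rules out steps that are too short. Steps that are too long are excluded by the first Wolfe condition (sufficient decrease).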