# CIR Calibration with Steepest Descent, Newton, and Gauss–Newton

This notebook calibrates a Cox–Ingersoll–Ross (CIR) term structure model to U.S. Treasury yield data using three optimization algorithms: steepest descent, Newton, and Gauss–Newton.

In [None]:
import numpy as np
import math
import textwrap
from pprint import pprint


### Treasury bond data file format (tbonds.txt)

* The first non-empty line is a header:

```
w | t    1/12 0.25  0.5    1    3    5    7   10   20   30
```

  * "w" is a week label, "t" labels maturities.
  * The maturities (time to maturity in years) are: (1/12, 0.25, 0.5, 1, 3, 5, 7, 10, 20, 30).

* Remaining lines give one row per week, with blank lines between:

```
09/12/24 5.18 5.06 4.68 4.09 3.47 3.47 3.57 3.68 4.07 4.00

09/19/24 4.89 4.80 4.46 3.93 3.47 3.49 3.60 3.73 4.11 4.06
```

  * First token: date string.
  * Next 10 tokens: yields in percent for the 10 maturities.


In [None]:

def read_tbonds_file(filename="tbonds.txt"):
    maturities = []
    dates = []
    rates_rows = []

    with open(filename, "r") as f:
        lines = f.readlines()

    # Filter out lines that are completely empty (or whitespace only)
    non_empty_lines = [ln.strip() for ln in lines if ln.strip() != ""]

    if len(non_empty_lines) == 0:
        raise ValueError("File appears empty or only whitespace")

    # First non-empty line is header
    header_line = non_empty_lines[0]
    header_line = header_line.replace("|", " ")
    header_tokens = header_line.split()

    # header_tokens should look like:
    # ["w", "t", "1/12", "0.25", "0.5", "1", "3", "5", "7", "10", "20", "30"]
    if len(header_tokens) < 3:
        raise ValueError("Header line does not contain maturities")

    maturity_tokens = header_tokens[2:]  # skip "w" and "t"

    for tok in maturity_tokens:
        if "/" in tok:
            num, den = tok.split("/")
            maturities.append(float(num) / float(den))
        else:
            maturities.append(float(tok))

    # Remaining non-empty lines are data rows
    for line in non_empty_lines[1:]:
        tokens = line.split()
        if len(tokens) == 0:
            continue
        date_str = tokens[0]
        rates = [float(x) for x in tokens[1:]]
        dates.append(date_str)
        rates_rows.append(rates)

    t_vec = np.array(maturities, dtype=float)           # shape (M,)
    R = np.array(rates_rows, dtype=float)               # shape (W, M)
    return t_vec, dates, R


t_vec, dates, R = read_tbonds_file("tbonds.txt")
print("Maturities:", t_vec)
print("Dates:", dates)
print("R shape:", R.shape)
print("First row of rates:", R[0])
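`read_tbonds_file` needs `tbonds.txt` on disk. It also accepts rows with the wrong number of yields without complaint. For quick tests, the sketch below parses the same format from in-memory text and rejects malformed rows. The names `parse_tbonds_text` and `SAMPLE` are hypothetical, and the sample rows are the two shown in the format description above.

```python
import io
from fractions import Fraction

import numpy as np

# Two sample rows copied from the tbonds.txt format description
SAMPLE = """w | t    1/12 0.25  0.5    1    3    5    7   10   20   30

09/12/24 5.18 5.06 4.68 4.09 3.47 3.47 3.57 3.68 4.07 4.00

09/19/24 4.89 4.80 4.46 3.93 3.47 3.49 3.60 3.73 4.11 4.06
"""


def parse_tbonds_text(stream):
    """Parse tbonds-style text from any iterable of lines (hypothetical helper)."""
    lines = [ln.strip() for ln in stream if ln.strip()]
    if not lines:
        raise ValueError("no non-empty lines")
    header = lines[0].replace("|", " ").split()
    # Fraction accepts both "1/12" and "0.25"; skip the "w" and "t" labels
    t_vec = np.array([float(Fraction(tok)) for tok in header[2:]])
    dates, rows = [], []
    for ln in lines[1:]:
        tokens = ln.split()
        # One date plus one yield per maturity
        if len(tokens) != len(t_vec) + 1:
            raise ValueError(f"expected {len(t_vec) + 1} tokens, got {len(tokens)}: {ln!r}")
        dates.append(tokens[0])
        rows.append([float(x) for x in tokens[1:]])
    return t_vec, dates, np.array(rows)


t_demo, dates_demo, R_demo = parse_tbonds_text(io.StringIO(SAMPLE))
print(R_demo.shape)  # (2, 10)
```

Because the function accepts any iterable of lines, an open file handle works just as well as `io.StringIO`.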



### CIR zero-coupon bond and yield formulas
We use the Cox–Ingersoll–Ross (CIR) model under no arbitrage. The price of a zero-coupon bond with time to maturity \(t\) and current short rate \(r\) is
\[
P(r,t;a,b,\sigma) = A(t;a,b,\sigma)\, e^{-B(t;a,\sigma)\,r}
\]
with
\[
h(a,\sigma) = \sqrt{a^2 + 2\sigma^2},
\]
\[
A(t;a,b,\sigma) =
\left[
\frac{2 h \exp\left(\tfrac{(a+h)\,t}{2}\right)}
{2h + (a+h)(e^{h t}-1)}
\right]^{\frac{2ab}{\sigma^2}},
\quad
B(t;a,\sigma) =
\frac{2(e^{h t}-1)}{2h + (a+h)(e^{h t}-1)}.
\]
The continuously compounded yield, in percent to match the data, is
\[
R_{\text{model}}(t; r,a,b,\sigma)
= -\frac{\log P(r,t;a,b,\sigma)}{t} \times 100.
\]
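The formulas give two easy sanity checks. As \(t \to 0\), \(B(t) \approx t\) and \(A(t) \to 1\), so the yield tends to \(100\,r\). As \(t \to \infty\), \(B(t) \to 2/(a+h)\) stays bounded, and \(-\log A(t)/t \to \tfrac{2ab}{\sigma^2}\cdot\tfrac{h-a}{2} = \tfrac{2ab}{a+h}\), so long yields flatten to \(100 \cdot 2ab/(a+h)\). The sketch below checks both limits numerically. `cir_yield_pct` is a hypothetical standalone copy of the formula, written in log space, and the parameter values are simply the notebook's starting guess.

```python
import math


def cir_yield_pct(r, t, a, b, sigma):
    """CIR yield in percent from log P = log A - B r (illustrative standalone copy)."""
    h = math.sqrt(a * a + 2.0 * sigma * sigma)
    em1 = math.expm1(h * t)                      # e^{ht} - 1, accurate for small t
    denom = 2.0 * h + (a + h) * em1
    log_C = math.log(2.0 * h) + 0.5 * (a + h) * t - math.log(denom)
    log_A = (2.0 * a * b / (sigma * sigma)) * log_C
    B = 2.0 * em1 / denom
    return -(log_A - B * r) / t * 100.0


r, a, b, sigma = 0.04, 0.5, 0.05, 0.10
h = math.sqrt(a * a + 2.0 * sigma * sigma)

short_end = cir_yield_pct(r, 1e-6, a, b, sigma)   # should approach 100 r
long_end = cir_yield_pct(r, 200.0, a, b, sigma)   # should approach 100 * 2ab / (a + h)
print(short_end, 100 * r)
print(long_end, 100 * 2 * a * b / (a + h))
```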


In [None]:

def cir_h(a, sigma):
    return math.sqrt(a * a + 2.0 * sigma * sigma)


def cir_A_B(t, a, b, sigma):
    """
    Return (A(t; a,b,sigma), B(t; a,sigma)) for the CIR formula.
    """
    h = cir_h(a, sigma)
    exp_ht = math.exp(h * t)
    numerator = 2.0 * h * math.exp((a + h) * t / 2.0)
    denom = 2.0 * h + (a + h) * (exp_ht - 1.0)
    C = numerator / denom
    q = 2.0 * a * b / (sigma * sigma)
    A = C ** q
    B = 2.0 * (exp_ht - 1.0) / denom
    return A, B


def cir_price(r, t, a, b, sigma):
    A, B = cir_A_B(t, a, b, sigma)
    return A * math.exp(-B * r)


def cir_yield_percent(r, t, a, b, sigma):
    """
    r, a, b, sigma are given as decimal rates,
    but the returned yield is in percent (%),
    to match the data units in tbonds.txt.
    """
    P = cir_price(r, t, a, b, sigma)
    return - (math.log(P) / t) * 100.0
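`cir_yield_percent` forms \(A = C^{2ab/\sigma^2}\) and only then takes a logarithm. When the exponent is large (small \(\sigma\)) or the maturity is long, \(A\) can underflow to `0.0`, and `math.log(0.0)` raises `ValueError`. A minimal vectorized alternative, assuming NumPy arrays of maturities, computes \(\log P = \tfrac{2ab}{\sigma^2}\log C - B r\) directly and evaluates all maturities in one call. The function names are hypothetical, not part of the notebook.

```python
import numpy as np


def cir_yields_percent_vec(r, t, a, b, sigma):
    """
    Vectorized CIR yields (percent) for an array of maturities t (years).
    Works in log space, so an A(t) that would underflow to 0.0 never reaches log().
    """
    t = np.asarray(t, dtype=float)
    h = np.sqrt(a * a + 2.0 * sigma * sigma)
    em1 = np.expm1(h * t)                                 # e^{ht} - 1
    denom = 2.0 * h + (a + h) * em1
    log_C = np.log(2.0 * h) + 0.5 * (a + h) * t - np.log(denom)
    log_P = (2.0 * a * b / (sigma * sigma)) * log_C - (2.0 * em1 / denom) * r
    return -log_P / t * 100.0


def residuals_week_vec(x, t_vec, R_obs_row):
    """Alternative to residuals_week: one vectorized model evaluation per week."""
    r, a, b, sigma = x
    return cir_yields_percent_vec(r, t_vec, a, b, sigma) - np.asarray(R_obs_row, dtype=float)
```

Where \(A\) does not underflow, the values agree with `cir_yield_percent` to rounding error. The log-space form also avoids the per-maturity Python loop in `residuals_week`.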



### Weekly nonlinear least squares formulation
For a given week \(w\), we have maturities \(t_j\) and observed yields \(R_{w,j}^{\text{obs}}\) (in percent, from the dataset).
We collect the parameters in \(x = (r, a, b, \sigma)^\top\). The model yield is
\[
R_{\text{model}}(t_j; x) = R_{\text{model}}(t_j; r,a,b,\sigma).
\]
Residuals and objective:
\[
f_j(x) = R_{\text{model}}(t_j; x) - R_{w,j}^{\text{obs}}, \quad
\varphi(x) = \tfrac12 \sum_{j=1}^M f_j(x)^2.
\]


In [None]:

def residuals_week(x, t_vec, R_obs_row):
    """
    x = [r, a, b, sigma]
    t_vec: shape (M,)
    R_obs_row: shape (M,) observed yields in percent
    """
    r, a, b, sigma = x
    M = len(t_vec)
    F = np.zeros(M, dtype=float)
    for j in range(M):
        t = t_vec[j]
        R_model = cir_yield_percent(r, t, a, b, sigma)
        F[j] = R_model - R_obs_row[j]
    return F


def objective_week(x, t_vec, R_obs_row):
    F = residuals_week(x, t_vec, R_obs_row)
    return 0.5 * np.dot(F, F)



### Finite-difference Jacobian, gradient, and Gauss–Newton Hessian (weekly)
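For least squares \(\varphi(x) = \tfrac12 \|F(x)\|^2\) we use
\(\nabla \varphi(x) = J_F(x)^T F(x)\) and the Gauss–Newton approximation
\(\nabla^2 \varphi(x) \approx J_F(x)^T J_F(x)\). This approximation drops the term \(\sum_j f_j(x)\, \nabla^2 f_j(x)\), which is small when the residuals are small. We compute \(J_F(x)\) via forward finite differences.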


In [None]:

def jacobian_fd_week(x, t_vec, R_obs_row, eps=1e-6):
    """
    Finite-difference Jacobian for the residuals F(x) for a single week.
    J has shape (M, 4).
    """
    x = np.array(x, dtype=float)
    F0 = residuals_week(x, t_vec, R_obs_row)
    M = len(F0)
    n = len(x)
    J = np.zeros((M, n), dtype=float)

    for k in range(n):
        x_perturbed = x.copy()
        delta = eps * max(1.0, abs(x[k]))
        x_perturbed[k] = x[k] + delta
        Fk = residuals_week(x_perturbed, t_vec, R_obs_row)
        J[:, k] = (Fk - F0) / delta

    return J


def grad_week(x, t_vec, R_obs_row, eps=1e-6):
    F = residuals_week(x, t_vec, R_obs_row)
    J = jacobian_fd_week(x, t_vec, R_obs_row, eps=eps)
    return J.T @ F


def hess_gauss_newton_week(x, t_vec, R_obs_row, eps=1e-6):
    J = jacobian_fd_week(x, t_vec, R_obs_row, eps=eps)
    return J.T @ J
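`jacobian_fd_week` uses forward differences, whose truncation error is proportional to the step size \(\delta\). Central differences cost two residual evaluations per parameter instead of one, but their error is proportional to \(\delta^2\). The sketch below compares the two on a toy function with a known Jacobian. It reuses the notebook's step rule \(\delta = \varepsilon \max(1, |x_k|)\). `jacobian_fd` and the toy function are illustrative, not part of the notebook.

```python
import numpy as np


def jacobian_fd(F, x, eps=1e-6, central=False):
    """Finite-difference Jacobian of a vector function F at x (illustrative)."""
    x = np.asarray(x, dtype=float)
    F0 = np.asarray(F(x), dtype=float)
    J = np.zeros((F0.size, x.size))
    for k in range(x.size):
        delta = eps * max(1.0, abs(x[k]))      # same relative step rule as the notebook
        e = np.zeros_like(x)
        e[k] = delta
        if central:
            J[:, k] = (F(x + e) - F(x - e)) / (2.0 * delta)
        else:
            J[:, k] = (F(x + e) - F0) / delta
    return J


# Toy residuals with a known Jacobian: F(x) = [exp(x0) * x1, sin(x1)]
F = lambda x: np.array([np.exp(x[0]) * x[1], np.sin(x[1])])
x = np.array([0.3, 1.2])
J_exact = np.array([[np.exp(x[0]) * x[1], np.exp(x[0])],
                    [0.0, np.cos(x[1])]])

err_fwd = np.abs(jacobian_fd(F, x, eps=1e-6) - J_exact).max()
err_ctr = np.abs(jacobian_fd(F, x, eps=1e-5, central=True) - J_exact).max()
print(err_fwd, err_ctr)
```

For the 4-parameter weekly problem, central differences would need 8 extra residual evaluations per Jacobian instead of 4.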



### Backtracking line search (Armijo)
All three methods use backtracking to select step lengths. Starting from \(\alpha_0\), shrink by \(\rho\) until the Armijo condition
\( \varphi(x + \alpha p) \le \varphi(x) + c\, \alpha\, \nabla \varphi(x)^T p \) holds.


In [None]:

def backtracking_line_search_week(x, p, t_vec, R_obs_row,
                                  alpha_init=1.0, rho=0.5, c=1e-4,
                                  eps=1e-6):
    """
    Backtracking line search for the weekly objective.
    """
    x = np.array(x, dtype=float)
    p = np.array(p, dtype=float)

    f_x = objective_week(x, t_vec, R_obs_row)
    g_x = grad_week(x, t_vec, R_obs_row, eps=eps)
    alpha = alpha_init

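    while True:
        x_new = x + alpha * p
        try:
            f_new = objective_week(x_new, t_vec, R_obs_row)
        except (OverflowError, ValueError, ZeroDivisionError):
            # Trial point left the region where the CIR formulas are finite
            # (e.g. exp overflow or log of a zero price): treat as a failed step.
            f_new = np.inf
        if f_new <= f_x + c * alpha * np.dot(g_x, p):
            break
        alpha *= rho
        if alpha < 1e-10:
            break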

    return alpha
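To see the Armijo rule at work in isolation, the sketch below applies a generic backtracking loop to an ill-conditioned quadratic \(\varphi(x) = \tfrac12 x^T Q x\) with \(Q = \mathrm{diag}(1, 100)\), starting along the steepest-descent direction. `backtracking` is a hypothetical helper. Unlike `backtracking_line_search_week`, it refuses a direction \(p\) with \(\nabla\varphi^T p \ge 0\). For such a direction the Armijo right-hand side grows with \(\alpha\), so an uphill step could be accepted.

```python
import numpy as np


def backtracking(f, g, x, p, alpha=1.0, rho=0.5, c=1e-4, alpha_min=1e-10):
    """Generic Armijo backtracking: shrink alpha until sufficient decrease holds."""
    fx, slope = f(x), float(np.dot(g(x), p))
    if slope >= 0:
        raise ValueError("p is not a descent direction")
    while alpha >= alpha_min:
        if f(x + alpha * p) <= fx + c * alpha * slope:
            return alpha
        alpha *= rho
    return alpha                      # below alpha_min: caller should treat as failure


# Ill-conditioned quadratic phi(x) = 0.5 * x^T Q x with gradient Q x
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
g = lambda x: Q @ x

x = np.array([1.0, 1.0])
p = -g(x)                             # steepest-descent direction
alpha = backtracking(f, g, x, p)
print(alpha)                          # 0.015625
```

Six halvings are needed because the curvature of 100 in the second coordinate limits the usable step. The weekly CIR problem, where \(r, a, b, \sigma\) have very different scales, presents a similar difficulty.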



### Optimization algorithms for a single week
We implement steepest descent, Newton (using the Gauss–Newton Hessian for stability), and Gauss–Newton with optional line search.


In [None]:

def steepest_descent_week(x0, t_vec, R_obs_row,
                          max_iters=1000, tol=1e-6, eps=1e-6,
                          verbose=False):
    x = np.array(x0, dtype=float)
    for k in range(max_iters):
        g = grad_week(x, t_vec, R_obs_row, eps=eps)
        ng = np.linalg.norm(g)
        if verbose and k % 50 == 0:
            print(f"[SD] iter {k}, f={objective_week(x, t_vec, R_obs_row):.6e}, ||grad||={ng:.6e}")
        if ng < tol:
            break

        p = -g
        alpha = backtracking_line_search_week(x, p, t_vec, R_obs_row,
                                              alpha_init=1.0, rho=0.5, c=1e-4,
                                              eps=eps)
        x = x + alpha * p

    return x


In [None]:

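def hess_fd_week(x, t_vec, R_obs_row, h=1e-4):
    """
    Full Hessian of objective_week via central second differences.
    Only used by newton_week(..., use_gauss_newton_hessian=False).
    """
    x = np.array(x, dtype=float)
    n = len(x)
    steps = h * np.maximum(1.0, np.abs(x))
    H = np.zeros((n, n), dtype=float)

    def phi(z):
        return objective_week(z, t_vec, R_obs_row)

    for i in range(n):
        ei = np.zeros(n)
        ei[i] = steps[i]
        for k in range(i, n):
            ek = np.zeros(n)
            ek[k] = steps[k]
            H[i, k] = (phi(x + ei + ek) - phi(x + ei - ek)
                       - phi(x - ei + ek) + phi(x - ei - ek)) / (4.0 * steps[i] * steps[k])
            H[k, i] = H[i, k]
    return H


def newton_week(x0, t_vec, R_obs_row,
                max_iters=50, tol=1e-8, eps=1e-6,
                use_gauss_newton_hessian=True,
                verbose=False):
    x = np.array(x0, dtype=float)

    for k in range(max_iters):
        g = grad_week(x, t_vec, R_obs_row, eps=eps)
        ng = np.linalg.norm(g)
        if verbose:
            print(f"[Newton] iter {k}, f={objective_week(x, t_vec, R_obs_row):.6e}, ||grad||={ng:.6e}")
        if ng < tol:
            break

        if use_gauss_newton_hessian:
            # Gauss–Newton approximation J^T J (positive semidefinite)
            H = hess_gauss_newton_week(x, t_vec, R_obs_row, eps=eps)
        else:
            # Full Hessian, including the second-order residual terms
            H = hess_fd_week(x, t_vec, R_obs_row)

        try:
            p = np.linalg.solve(H, -g)
        except np.linalg.LinAlgError:
            H_reg = H + 1e-6 * np.eye(len(x))
            p = np.linalg.solve(H_reg, -g)

        # An indefinite full Hessian can give an ascent direction; fall back to -g
        if np.dot(g, p) >= 0:
            p = -g

        alpha = backtracking_line_search_week(x, p, t_vec, R_obs_row,
                                              alpha_init=1.0, rho=0.5, c=1e-4,
                                              eps=eps)
        x = x + alpha * p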

    return x


In [None]:

def gauss_newton_week(x0, t_vec, R_obs_row,
                      max_iters=50, tol=1e-8, eps=1e-6,
                      use_line_search=True,
                      verbose=False):
    x = np.array(x0, dtype=float)

    for k in range(max_iters):
        F = residuals_week(x, t_vec, R_obs_row)
        J = jacobian_fd_week(x, t_vec, R_obs_row, eps=eps)
        g = J.T @ F
        ng = np.linalg.norm(g)
        if verbose:
            print(f"[GN] iter {k}, f={0.5 * np.dot(F, F):.6e}, ||J^T F||={ng:.6e}")
        if ng < tol:
            break

        H_gn = J.T @ J
        try:
            p = np.linalg.solve(H_gn, -g)
        except np.linalg.LinAlgError:
            H_reg = H_gn + 1e-6 * np.eye(len(x))
            p = np.linalg.solve(H_reg, -g)

        if use_line_search:
            alpha = backtracking_line_search_week(x, p, t_vec, R_obs_row,
                                                  alpha_init=1.0, rho=0.5, c=1e-4,
                                                  eps=eps)
        else:
            alpha = 1.0

        x = x + alpha * p

    return x
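Gauss–Newton converges quickly when residuals are small, because \(J^T J\) is then close to the full Hessian. The sketch below runs plain Gauss–Newton with full steps on a toy zero-residual fit \(y = c\,e^{-kt}\). One deliberate change from `gauss_newton_week`: it solves the linear least-squares subproblem with `np.linalg.lstsq` instead of the normal equations \((J^T J)p = -J^T F\). Forming \(J^T J\) squares the condition number of \(J\). The helper name and toy data are illustrative assumptions.

```python
import numpy as np


def gauss_newton(residual, jacobian, x0, max_iters=50, tol=1e-10):
    """Plain Gauss–Newton with full steps (no line search), for illustration."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iters):
        F, J = residual(x), jacobian(x)
        g = J.T @ F
        if np.linalg.norm(g) < tol:
            return x, k
        # Solve min ||J p + F|| directly; lstsq avoids forming J^T J
        p, *_ = np.linalg.lstsq(J, -F, rcond=None)
        x = x + p
    return x, max_iters


# Toy zero-residual problem: y = c * exp(-k t) with c = 2, k = 0.7
t = np.linspace(0.0, 5.0, 20)
y = 2.0 * np.exp(-0.7 * t)
residual = lambda x: x[0] * np.exp(-x[1] * t) - y
jacobian = lambda x: np.column_stack([np.exp(-x[1] * t),
                                      -x[0] * t * np.exp(-x[1] * t)])

x_hat, iters = gauss_newton(residual, jacobian, x0=[1.8, 0.6])
print(x_hat, iters)
```

Full steps work here because the start is close to the solution. Farther away, the backtracking used by `gauss_newton_week` becomes necessary.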



## Part (1): Weekly calibration \((r_w, a_w, b_w, \sigma_w)\)
For each week we minimize
\(\varphi_w(r,a,b,\sigma) = \tfrac12 \sum_j (R_{\text{model}}(t_j; r,a,b,\sigma) - R_{w,j}^{\text{obs}})^2\)
using the three optimizers.


In [None]:

W, M = R.shape
print("Number of weeks:", W, "Number of maturities:", M)


def compute_sse_rmse_week(x, t_vec, R_obs_row):
    F = residuals_week(x, t_vec, R_obs_row)
    sse = np.dot(F, F)
    rmse = math.sqrt(sse / len(F))
    return sse, rmse


results_part1 = []

for w in range(W):
    R_obs_row = R[w, :]
    x0 = np.array([0.04, 0.5, 0.05, 0.10], dtype=float)

    print(f"\n=== Week {w} ({dates[w]}) ===")
    x_sd = steepest_descent_week(x0, t_vec, R_obs_row,
                                 max_iters=1000, tol=1e-6, eps=1e-6,
                                 verbose=False)
    x_nt = newton_week(x0, t_vec, R_obs_row,
                       max_iters=50, tol=1e-8, eps=1e-6,
                       use_gauss_newton_hessian=True,
                       verbose=False)
    x_gn = gauss_newton_week(x0, t_vec, R_obs_row,
                             max_iters=50, tol=1e-8, eps=1e-6,
                             use_line_search=True,
                             verbose=False)

    for method_name, x_hat in [("SteepestDescent", x_sd),
                               ("Newton", x_nt),
                               ("GaussNewton", x_gn)]:
        sse, rmse = compute_sse_rmse_week(x_hat, t_vec, R_obs_row)
        results_part1.append({
            "week_index": w,
            "week_date": dates[w],
            "method": method_name,
            "params": x_hat,
            "SSE": sse,
            "RMSE": rmse
        })
        print(f"{method_name}: params={x_hat}, SSE={sse:.6e}, RMSE={rmse:.6e}")



### Weekly calibration discussion
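Gauss–Newton typically needs far fewer iterations than steepest descent. Its step uses the curvature information in \(J^T J\), whereas steepest descent follows only the gradient and tends to zig-zag when \(r, a, b, \sigma\) have very different sensitivities. As called above (`use_gauss_newton_hessian=True`), `newton_week` computes the same direction \(-(J^T J)^{-1} J^T F\) as `gauss_newton_week`, with the same line search and stopping rule. The "Newton" and "GaussNewton" rows should therefore coincide. A true Newton method would use the full Hessian \(J^T J + \sum_j f_j \nabla^2 f_j\). In both cases a nearly singular \(J^T J\) can produce very long trial steps, which the backtracking line search then has to shrink.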
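The sketch below illustrates the iteration-count gap on a toy exponential fit rather than on the Treasury data. Both methods use the same Armijo backtracking and the same stopping rule \(\|J^T F\| < 10^{-6}\). The toy model, starting point, and helper names are assumptions for illustration only. The count ignores per-iteration cost, which here is one Jacobian for either method.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 20)
y = 2.0 * np.exp(-0.7 * t)


def residual(x):
    return x[0] * np.exp(-x[1] * t) - y


def jacobian(x):
    e = np.exp(-x[1] * t)
    return np.column_stack([e, -x[0] * t * e])


def phi(x):
    F = residual(x)
    return 0.5 * F @ F


def armijo(x, p, g, rho=0.5, c=1e-4):
    """Backtrack from alpha = 1 until the Armijo condition holds."""
    alpha = 1.0
    while phi(x + alpha * p) > phi(x) + c * alpha * (g @ p) and alpha > 1e-12:
        alpha *= rho
    return alpha


def run(method, x0, tol=1e-6, max_iters=5000):
    """Return the iteration count needed to reach ||J^T F|| < tol."""
    x = np.array(x0, dtype=float)
    for k in range(max_iters):
        F, J = residual(x), jacobian(x)
        g = J.T @ F
        if np.linalg.norm(g) < tol:
            return k
        if method == "sd":
            p = -g
        else:  # "gn": Gauss–Newton direction from the normal equations
            p = np.linalg.solve(J.T @ J, -g)
        x = x + armijo(x, p, g) * p
    return max_iters


sd_iters = run("sd", [1.8, 0.6])
gn_iters = run("gn", [1.8, 0.6])
print("steepest descent:", sd_iters, "Gauss–Newton:", gn_iters)
```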



## Part (2): Global model with shared \((a,b,\sigma)\) and week-specific \(r_w\)
Parameters:
\(x = (a, b, \sigma, r_1, \dots, r_W)^\top\). Residuals stack all weeks and maturities:
\(f_{w,j}(x) = R_{\text{model}}(t_j; r_w, a,b,\sigma) - R_{w,j}^{\text{obs}}\). Objective \(\Phi(x) = \tfrac12 \sum_{w,j} f_{w,j}(x)^2\).
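Each \(r_w\) enters only the \(M\) residuals of week \(w\), while \(a, b, \sigma\) enter all of them. The global Jacobian is therefore mostly zeros, and \(J^T J\) has an "arrowhead" pattern: a dense border for the three shared parameters and a diagonal block for the \(r_w\). The sketch below builds that nonzero pattern for a small case. `global_jacobian_pattern` is a hypothetical helper that follows the column order used by `residuals_global`.

```python
import numpy as np


def global_jacobian_pattern(W, M):
    """
    Boolean nonzero pattern of the global Jacobian, with columns ordered
    (a, b, sigma, r_0, ..., r_{W-1}) and rows stacked week by week.
    """
    pattern = np.zeros((W * M, 3 + W), dtype=bool)
    pattern[:, :3] = True                          # shared a, b, sigma touch every residual
    for w in range(W):
        pattern[w * M:(w + 1) * M, 3 + w] = True   # r_w only touches week w
    return pattern


P = global_jacobian_pattern(W=4, M=10)
JTJ_pattern = (P.T.astype(int) @ P.astype(int)) > 0
print(P.sum(), "nonzeros out of", P.size)
print(JTJ_pattern.astype(int))
```

Solvers can exploit this structure, for example by eliminating the diagonal \(r_w\) block first or by grouping finite-difference perturbations. The dense `np.linalg.solve` in `gauss_newton_global` ignores it, which is fine for a modest number of weeks.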


In [None]:

def residuals_global(x, t_vec, R):
    """
    x = [a, b, sigma, r_0, ..., r_{W-1}]
    R: shape (W, M) observed yields
    Returns a vector of length W*M (stacked by week then maturity).
    """
    x = np.array(x, dtype=float)
    a, b, sigma = x[0], x[1], x[2]
    W, M = R.shape
    F = np.zeros(W * M, dtype=float)
    idx = 0
    for w in range(W):
        r_w = x[3 + w]
        for j in range(M):
            t = t_vec[j]
            R_model = cir_yield_percent(r_w, t, a, b, sigma)
            F[idx] = R_model - R[w, j]
            idx += 1
    return F


def objective_global(x, t_vec, R):
    F = residuals_global(x, t_vec, R)
    return 0.5 * np.dot(F, F)


def jacobian_fd_global(x, t_vec, R, eps=1e-6):
    x = np.array(x, dtype=float)
    F0 = residuals_global(x, t_vec, R)
    m = len(F0)
    n = len(x)
    J = np.zeros((m, n), dtype=float)

    for k in range(n):
        x_perturbed = x.copy()
        delta = eps * max(1.0, abs(x[k]))
        x_perturbed[k] = x[k] + delta
        Fk = residuals_global(x_perturbed, t_vec, R)
        J[:, k] = (Fk - F0) / delta

    return J


def grad_global(x, t_vec, R, eps=1e-6):
    F = residuals_global(x, t_vec, R)
    J = jacobian_fd_global(x, t_vec, R, eps=eps)
    return J.T @ F


def hess_gauss_newton_global(x, t_vec, R, eps=1e-6):
    J = jacobian_fd_global(x, t_vec, R, eps=eps)
    return J.T @ J
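`jacobian_fd_global` calls `residuals_global` \(3 + W + 1\) times, and each call evaluates all \(W \cdot M\) yields. Because the columns for \(r_0, \dots, r_{W-1}\) never share a nonzero row, all \(r_w\) can be perturbed at once and their columns read off week by week. This is the column-grouping idea of Curtis, Powell, and Reid. The sketch below, with hypothetical names, needs only 5 residual calls regardless of \(W\). It is checked against ordinary column-by-column differences on a toy residual function with the same structure. To use it in the notebook one could pass `lambda z: residuals_global(z, t_vec, R)`, which is an untested assumption here.

```python
import numpy as np


def jacobian_fd_grouped(residual_fn, x, W, M, eps=1e-6):
    """
    Forward-difference Jacobian for x = (a, b, sigma, r_0, ..., r_{W-1})
    that perturbs all r_w together: r_w only affects the rows of week w,
    so one extra residual call recovers all W of those columns.
    """
    x = np.asarray(x, dtype=float)
    F0 = residual_fn(x)
    J = np.zeros((W * M, 3 + W))
    for k in range(3):                             # shared parameters: one call each
        xp = x.copy()
        d = eps * max(1.0, abs(x[k]))
        xp[k] += d
        J[:, k] = (residual_fn(xp) - F0) / d
    d_r = eps * np.maximum(1.0, np.abs(x[3:]))     # one step per r_w
    xp = x.copy()
    xp[3:] += d_r
    dF = residual_fn(xp) - F0                      # all r_w perturbed at once
    for w in range(W):
        rows = slice(w * M, (w + 1) * M)
        J[rows, 3 + w] = dF[rows] / d_r[w]
    return J


# Toy residuals with the same structure: shared (a, b, s), one r_w per week
W_toy, M_toy = 3, 4
t_toy = np.linspace(0.5, 2.0, M_toy)


def toy_residuals(x):
    a, b, s = x[:3]
    r = x[3:]
    F = a * np.exp(-b * t_toy)[None, :] + s * t_toy[None, :] + (r ** 2)[:, None]
    return F.ravel()                               # stacked by week, then maturity


x_toy = np.array([1.0, 0.3, 0.2, 0.1, 0.2, 0.3])
J_grouped = jacobian_fd_grouped(toy_residuals, x_toy, W_toy, M_toy)

# Reference: ordinary column-by-column forward differences
F0 = toy_residuals(x_toy)
J_full = np.zeros_like(J_grouped)
for k in range(x_toy.size):
    xp = x_toy.copy()
    d = 1e-6 * max(1.0, abs(x_toy[k]))
    xp[k] += d
    J_full[:, k] = (toy_residuals(xp) - F0) / d
print(np.abs(J_grouped - J_full).max())
```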


In [None]:

def backtracking_line_search_global(x, p, t_vec, R,
                                    alpha_init=1.0, rho=0.5, c=1e-4,
                                    eps=1e-6):
    x = np.array(x, dtype=float)
    p = np.array(p, dtype=float)

    f_x = objective_global(x, t_vec, R)
    g_x = grad_global(x, t_vec, R, eps=eps)
    alpha = alpha_init

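    while True:
        x_new = x + alpha * p
        try:
            f_new = objective_global(x_new, t_vec, R)
        except (OverflowError, ValueError, ZeroDivisionError):
            # Non-finite CIR evaluation at the trial point: reject the step
            f_new = np.inf
        if f_new <= f_x + c * alpha * np.dot(g_x, p):
            break
        alpha *= rho
        if alpha < 1e-10:
            break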

    return alpha


In [None]:

def gauss_newton_global(x0, t_vec, R,
                        max_iters=100, tol=1e-8, eps=1e-6,
                        use_line_search=True,
                        verbose=False):
    x = np.array(x0, dtype=float)

    for k in range(max_iters):
        F = residuals_global(x, t_vec, R)
        J = jacobian_fd_global(x, t_vec, R, eps=eps)
        g = J.T @ F
        ng = np.linalg.norm(g)
        if verbose:
            print(f"[GN-global] iter {k}, f={0.5 * np.dot(F, F):.6e}, ||J^T F||={ng:.6e}")
        if ng < tol:
            break

        H_gn = J.T @ J
        try:
            p = np.linalg.solve(H_gn, -g)
        except np.linalg.LinAlgError:
            H_reg = H_gn + 1e-6 * np.eye(len(x))
            p = np.linalg.solve(H_reg, -g)

        if use_line_search:
            alpha = backtracking_line_search_global(x, p, t_vec, R,
                                                    alpha_init=1.0, rho=0.5, c=1e-4,
                                                    eps=eps)
        else:
            alpha = 1.0

        x = x + alpha * p

    return x



### Global initial guess from weekly Gauss–Newton estimates
We initialize \((a,b,\sigma)\) using the average of the weekly Gauss–Newton estimates and set each \(r_w\) to its weekly Gauss–Newton value.


In [None]:

# Extract weekly Gauss–Newton parameters from part 1

gn_params_by_week = []
for res in results_part1:
    if res["method"] == "GaussNewton":
        gn_params_by_week.append(res["params"])

gn_params_by_week = np.array(gn_params_by_week)
if gn_params_by_week.shape[0] != W:
    print("Warning: mismatch in Gauss–Newton estimates; check results_part1.")

# Average a, b, sigma across weeks
a_avg = np.mean(gn_params_by_week[:, 1])
b_avg = np.mean(gn_params_by_week[:, 2])
sigma_avg = np.mean(gn_params_by_week[:, 3])

# Initialize r_w from weekly Gauss–Newton r's
r_init = gn_params_by_week[:, 0]

x0_global = np.concatenate([[a_avg, b_avg, sigma_avg], r_init])
print("Initial global x0:", x0_global)

x_global_hat = gauss_newton_global(x0_global, t_vec, R,
                                   max_iters=100, tol=1e-8, eps=1e-6,
                                   use_line_search=True,
                                   verbose=False)

print("Fitted global parameters (a,b,sigma,r_0,...):")
print(x_global_hat)


In [None]:

def compute_sse_rmse_global(x, t_vec, R):
    F = residuals_global(x, t_vec, R)
    sse = np.dot(F, F)
    rmse = math.sqrt(sse / len(F))
    return sse, rmse

sse_global, rmse_global = compute_sse_rmse_global(x_global_hat, t_vec, R)
print(f"Global model SSE={sse_global:.6e}, RMSE={rmse_global:.6e}")



### Comparing Part (1) (weekly-flex) vs Part (2) (global shared parameters)
We compare the fully flexible weekly model (using Gauss–Newton weekly fits) with the global shared-parameter model.


In [None]:

# Aggregate SSE/RMSE for "weekly fully flexible" model (using Gauss-Newton)
sse_week_total = 0.0
num_points = W * M

for w in range(W):
    gn_params_w = None
    for res in results_part1:
        if res["week_index"] == w and res["method"] == "GaussNewton":
            gn_params_w = res["params"]
            break
    if gn_params_w is None:
        raise RuntimeError(f"No Gauss–Newton result found for week {w}")

    F_w = residuals_week(gn_params_w, t_vec, R[w, :])
    sse_w = np.dot(F_w, F_w)
    sse_week_total += sse_w

rmse_week_model = math.sqrt(sse_week_total / num_points)
print(f"Weekly-flex model SSE={sse_week_total:.6e}, RMSE={rmse_week_model:.6e}")
print(f"Global model SSE={sse_global:.6e}, RMSE={rmse_global:.6e}")



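The weekly-flex model has \(4W\) parameters (four per week), while the global model has \(W + 3\): three shared parameters plus one level \(r_w\) per week. The global model is the weekly model with \((a,b,\sigma)\) constrained to be equal across weeks, so its minimum SSE cannot be lower than the weekly-flex minimum. If the printed numbers show otherwise, at least one weekly fit stopped short of its minimum. The trade-off illustrates bias–variance considerations: the flexible model fits better in sample, while the global model enforces a common mean-reversion structure.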
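Raw SSE always favors the model with more parameters. One way to account for the parameter counts is a degrees-of-freedom-adjusted RMSE, \(\sqrt{\mathrm{SSE}/(n-k)}\), together with the Akaike information criterion. Under an i.i.d. Gaussian residual assumption (not verified for these data), AIC reduces to \(n \log(\mathrm{SSE}/n) + 2k\). The helper below is a hypothetical sketch of that comparison. The numbers in the usage line are placeholders, not results from this notebook. With the notebook's variables one would call `compare_models(sse_week_total, sse_global, W, M)`.

```python
import math


def compare_models(sse_weekly, sse_global, W, M):
    """
    Parameter-count-aware comparison of the two fits (illustrative).
    k = number of fitted parameters, n = number of observed yields.
    AIC assumes i.i.d. Gaussian residuals: n * log(SSE / n) + 2k.
    """
    n = W * M
    out = {}
    for name, sse, k in [("weekly-flex", sse_weekly, 4 * W),
                         ("global", sse_global, 3 + W)]:
        out[name] = {
            "k": k,
            "rmse": math.sqrt(sse / n),
            "dof_adjusted_rmse": math.sqrt(sse / (n - k)),   # requires n > k
            "aic": n * math.log(sse / n) + 2 * k,
        }
    return out


# Placeholder inputs for illustration only, not calibration results
summary = compare_models(sse_weekly=1.0, sse_global=2.0, W=10, M=10)
for name, stats in summary.items():
    print(name, stats)
```

A lower AIC favors a model. The penalty \(2k\) grows by 8 per week for the weekly-flex model but only by 2 per week for the global model.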



## Troubleshooting and robustness helpers
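If Gauss–Newton raises an exception, for example a `LinAlgError` from the regularized solve or an `OverflowError` from the CIR formulas, the helpers below fall back to an alternative. The weekly helper switches to steepest descent, and the global helper restarts from a slightly perturbed starting point. Neither helper detects silent non-convergence, such as hitting `max_iters`.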


In [None]:

def robust_weekly_fit(t_vec, R_obs_row, x0=None):
    if x0 is None:
        x0 = np.array([0.04, 0.5, 0.05, 0.10], dtype=float)

    try:
        x_gn = gauss_newton_week(x0, t_vec, R_obs_row,
                                 max_iters=50, tol=1e-8, eps=1e-6,
                                 use_line_search=True,
                                 verbose=False)
        return x_gn, "GaussNewton"
    except Exception as e:
        print("Gauss–Newton failed, falling back to steepest descent:", e)
        x_sd = steepest_descent_week(x0, t_vec, R_obs_row,
                                     max_iters=2000, tol=1e-6, eps=1e-6,
                                     verbose=False)
        return x_sd, "SteepestDescent"


def robust_global_fit(x0_global, t_vec, R):
    try:
        return gauss_newton_global(x0_global, t_vec, R,
                                   max_iters=100, tol=1e-8, eps=1e-6,
                                   use_line_search=True,
                                   verbose=False)
    except Exception as e:
        print("Global Gauss–Newton failed, retrying from a perturbed starting point:", e)
        rng = np.random.default_rng(0)  # fixed seed so the retry is reproducible
        x0_perturbed = x0_global + 1e-3 * rng.standard_normal(x0_global.shape)
        return gauss_newton_global(x0_perturbed, t_vec, R,
                                   max_iters=200, tol=1e-8, eps=1e-6,
                                   use_line_search=True,
                                   verbose=False)
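The helpers above add a fixed \(10^{-6} I\) only after `np.linalg.solve` fails. A common alternative, not what this notebook implements, is Levenberg–Marquardt. It solves \((J^T J + \lambda I)p = -J^T F\) at every iteration and adapts \(\lambda\): it shrinks \(\lambda\) after a successful step, which moves toward Gauss–Newton, and grows it after a failed one, which moves toward a short steepest-descent step. A minimal sketch on the toy exponential fit, with hypothetical names:

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, x0, lam=1e-3, max_iters=200, tol=1e-10):
    """Minimal Levenberg–Marquardt loop with multiplicative damping updates."""
    x = np.asarray(x0, dtype=float)
    F = residual(x)
    cost = 0.5 * F @ F
    for _ in range(max_iters):
        J = jacobian(x)
        g = J.T @ F
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -g)
        F_new = residual(x + p)
        cost_new = 0.5 * F_new @ F_new
        if np.isfinite(cost_new) and cost_new < cost:
            x, F, cost = x + p, F_new, cost_new
            lam = max(lam / 10.0, 1e-12)   # good step: behave more like Gauss–Newton
        else:
            lam *= 10.0                    # rejected step: shorter, more gradient-like
            if lam > 1e12:
                break
    return x


# Toy zero-residual problem: y = c * exp(-k t) with c = 2, k = 0.7
t = np.linspace(0.0, 5.0, 20)
y = 2.0 * np.exp(-0.7 * t)
residual = lambda x: x[0] * np.exp(-x[1] * t) - y
jacobian = lambda x: np.column_stack([np.exp(-x[1] * t),
                                      -x[0] * t * np.exp(-x[1] * t)])

x_lm = levenberg_marquardt(residual, jacobian, x0=[1.0, 1.5])
print(x_lm)
```

Because rejected steps are detected through the cost, including non-finite values, this loop needs no exception-based fallback. Adapting it to the CIR residuals would also require catching the `OverflowError` that `math.exp` can raise.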



### Final summary outputs
The cells above already print per-week parameter estimates and SSE/RMSE for all three methods, along with the global fit metrics. Re-running the notebook from top to bottom will reproduce the calibration and the comparative diagnostics.
