# Notebook 04: Logit Model from Scratch (Swiss Route Choice)

**Objective:** Step-by-step construction of a Multinomial Logit (MNL) discrete choice model using the Apollo Swiss route choice dataset. We will go from computing utilities and choice probabilities to constructing the log-likelihood and discussing estimation (though we might not do full optimization manually, we'll illustrate the concept). Key concepts introduced include:

- Utility specification for route choice (with travel time, cost, etc.).

- The log-likelihood function for model estimation.

- Calculating Value of Time (VOT) from model coefficients.

- Evaluating model fit via metrics like Log-Likelihood and McFadden's $R^2$

The Swiss route choice dataset `apollo_swissRouteChoiceData.csv` comes from a stated preference survey in Switzerland. Each respondent chose between two hypothetical routes for a trip. The attributes include travel time, travel cost, headway (for public transport, likely a train scenario), and number of interchanges. Additional data: car availability and trip purpose are given, but we will start with a simple model using time and cost as key attributes.

Let's load the data:

In [2]:
import pandas as pd

df_route = pd.read_csv("../data/raw/apollo_swissRouteChoiceData.csv")
print("Loaded route choice data:", df_route.shape)
df_route.head(3)


Loaded route choice data: (3492, 16)


Unnamed: 0,ID,choice,tt1,tc1,hw1,ch1,tt2,tc2,hw2,ch2,hh_inc_abs,car_availability,commute,shopping,business,leisure
0,2439,2,58,7,30,1,50,8,30,0,50000,1,1,0,0,0
1,2439,1,30,8,60,0,41,7,15,2,50000,1,1,0,0,0
2,2439,1,41,7,30,0,34,8,15,2,50000,1,1,0,0,0


We see columns `tt1`, `tc1`, `hw1`, `ch1` for alternative 1 and `tt2`, `tc2`, `hw2`, `ch2` for alternative 2, plus `choice` (1 or 2), and `ID`, `car_availability`, etc.

In [4]:
df_route.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3492 entries, 0 to 3491
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   ID                3492 non-null   int64
 1   choice            3492 non-null   int64
 2   tt1               3492 non-null   int64
 3   tc1               3492 non-null   int64
 4   hw1               3492 non-null   int64
 5   ch1               3492 non-null   int64
 6   tt2               3492 non-null   int64
 7   tc2               3492 non-null   int64
 8   hw2               3492 non-null   int64
 9   ch2               3492 non-null   int64
 10  hh_inc_abs        3492 non-null   int64
 11  car_availability  3492 non-null   int64
 12  commute           3492 non-null   int64
 13  shopping          3492 non-null   int64
 14  business          3492 non-null   int64
 15  leisure           3492 non-null   int64
dtypes: int64(16)
memory usage: 436.6 KB


In [5]:
print("Shape:", df_route.shape)
print("Columns:", df_route.columns.tolist())
df_route.describe()

Shape: (3492, 16)
Columns: ['ID', 'choice', 'tt1', 'tc1', 'hw1', 'ch1', 'tt2', 'tc2', 'hw2', 'ch2', 'hh_inc_abs', 'car_availability', 'commute', 'shopping', 'business', 'leisure']


Unnamed: 0,ID,choice,tt1,tc1,hw1,ch1,tt2,tc2,hw2,ch2,hh_inc_abs,car_availability,commute,shopping,business,leisure
count,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0,3492.0
mean,22180.878866,1.503436,52.588488,19.667526,32.478522,0.939863,52.472795,19.694731,32.37543,0.945876,76507.731959,0.378866,0.286082,0.082474,0.092784,0.53866
std,15913.386164,0.50006,46.874473,22.295193,18.488605,0.809456,46.622205,22.574277,18.470759,0.79902,44364.765243,0.485174,0.451993,0.275125,0.29017,0.498575
min,2439.0,1.0,2.0,1.0,15.0,0.0,2.0,1.0,15.0,0.0,10000.0,0.0,0.0,0.0,0.0,0.0
25%,15308.0,1.0,18.0,5.0,15.0,0.0,18.0,5.0,15.0,0.0,50000.0,0.0,0.0,0.0,0.0,0.0
50%,18533.0,2.0,37.0,11.0,30.0,1.0,36.5,11.0,30.0,1.0,70000.0,0.0,0.0,0.0,0.0,1.0
75%,21948.25,2.0,75.0,26.0,60.0,2.0,74.0,25.0,60.0,2.0,112500.0,1.0,1.0,0.0,0.0,1.0
max,84525.0,2.0,389.0,206.0,60.0,2.0,385.0,268.0,60.0,2.0,167500.0,1.0,1.0,1.0,1.0,1.0


**Data context:** 388 individuals, 3492 observations (choices). Each choice is between 2 alternatives described by those attributes. For example, alt1 and alt2 correspond to two different routes (say a fast expensive route vs a slow cheap route). The task is to model the probability of choosing alt1 vs alt2 as a function of attributes.

## 04.1 Utility Specification

We'll assume a simple linear utility form for each alternative:

$$U_{alt1} = \beta_{time} \; . \; tt1 + \beta_{cost} \; . \; tc1 + \beta_{hw} \; . \; hw1 + \beta_{ch} \; . \; ch1 + \beta_{ASC}

$$U_{alt2} = \beta_{time} \; . \; tt2 + \beta_{cost} \; . \; tc2 + \beta_{hw} \; . \; hw2 + \beta_{ch} \; . \; ch2 

Where:

- $\beta_{time}$ ​is the coefficient for travel time (per minute, expected negative).

- $\beta_{cost}$ for travel cost (per CHF, expected negative).

- $\beta_{hw}$ for headway (minutes between services, only relevant if these are public transit routes, expected negative).

- ​$\beta_{ch}$ for number of interchanges (transfers, expected negative).

- $\beta_{ASC}$ ​is an alternative-specific constant (ASC) for alt1, to capture any inherent preference for alt1 not explained by attributes. We set ASC for alt2 as 0 for identification (so $\beta_{ASC}$ effectively measures preference for alt1 relative to alt2). In binary choice, one can include an ASC for one alternative.

At first, let's consider a simpler model ignoring headway and interchanges (or lump them into error). But the data likely expects those to be used. We can include them for completeness since their interpretation is straightforward (disutility for waiting time and transfers).

We will not estimate these coefficients from scratch by gradient methods here (that would be complex for this format), but we can demonstrate calculation of log-likelihood for given parameters and perhaps do a simple manual search or reasoning. Suppose initial guesses:

- $\beta_{time}$ ​= -0.1 (utility per minute),

- $\beta_{cost}$ = -1.0 (utility per CHF, since cost often has bigger coefficient magnitude if currency in CHF, an initial guess might be that 10 CHF ~ 10 minutes equivalent? We'll see).

- $\beta_{hw}$ = -0.05 (per minute of headway),

- ​$\beta_{ch}$ = -1.0 (per interchange),

- $\beta_{ASC}$ = 0 (no inherent bias to start).

We can compute the log-likelihood of the model with these guesses and then discuss adjusting them.

In [15]:
# Inspect variable scales first
print(df_route[['tt1','tc1','tt2','tc2']].describe())

               tt1          tc1          tt2          tc2
count  3492.000000  3492.000000  3492.000000  3492.000000
mean     52.588488    19.667526    52.472795    19.694731
std      46.874473    22.295193    46.622205    22.574277
min       2.000000     1.000000     2.000000     1.000000
25%      18.000000     5.000000    18.000000     5.000000
50%      37.000000    11.000000    36.500000    11.000000
75%      75.000000    26.000000    74.000000    25.000000
max     389.000000   206.000000   385.000000   268.000000


In [3]:
# Initialize parameters
beta_time = -0.05
beta_cost = -0.5
beta_hw = -0.05
beta_ch = -1.0
beta_ASC = 0.0  # ASC for alt1

# rescale cost if tc is in cents/large integers -> convert to CHF
df_route['tc1_chf'] = df_route['tc1'] / 10.0
df_route['tc2_chf'] = df_route['tc2'] / 10.0


# Compute utilities for each alternative for all observations
V1 = (beta_time * df_route["tt1"] + beta_cost * df_route["tc1_chf"] + 
      beta_hw * df_route["hw1"] + beta_ch * df_route["ch1"] + beta_ASC)
V2 = (beta_time * df_route["tt2"] + beta_cost * df_route["tc2_chf"] + 
      beta_hw * df_route["hw2"] + beta_ch * df_route["ch2"] + 0)  # ASC for alt2 = 0

# Choice probabilities for each observation
# Use numerically stable logit by subtracting the max utility to avoid overflow
import numpy as np
maxV = np.maximum(V1, V2)
expV1 = np.exp(V1 - maxV)
expV2 = np.exp(V2 - maxV)
P1 = expV1 / (expV1 + expV2)
P2 = expV2 / (expV1 + expV2)

# Avoid log(0) by clipping probabilities
eps = 1e-12
P1 = np.clip(P1, eps, 1 - eps)
P2 = np.clip(P2, eps, 1 - eps)

# Now compute log-likelihood
chosen = df_route["choice"]  # 1 or 2
# If choice==1, contribution = log(P1); if choice==2, log(P2)
log_likelihood = np.where(chosen == 1, np.log(P1), np.log(P2)).sum()
print("Log-likelihood with initial guess:", log_likelihood)


Log-likelihood with initial guess: -1738.424356840138


For context, the null log-likelihood (no predictors, just probability 0.5 each since two options) would be:

$$LL_{null} = N \, . \, \ln(0.5)$$

where N = 3492 observations. That would be 3492 * ln(0.5) $\approx$ -2421.8


Our model's log-likelihood with initial guess might be better (less negative) if the guess is somewhat reasonable. The aim in estimation is to find $\beta$'s that maximize this log-likelihood.

At the moment, we won’t run an optimizer here, but we can conceptually discuss which direction to move parameters:

- If we increase $|\beta_{time}|$ (make it more negative), that means we give more disutility to longer travel time. If in the data people often chose the route with less travel time, a more negative $\beta_{time}$ will boost probability of the shorter route, improving likelihood.

- If cost coefficient is off, say initial -1.0 might be too high or too low. Value of Time concept can guide adjusting both together (we discuss VOT soon).

- The ASC might not be zero in reality; if say alt1 was chosen more often even after accounting for attributes, $\beta_{ASC}$ would adjust positive.

Let’s try a pseudo-optimization by intuition or brute force on one parameter (not too computational heavy as 3492 obs):

We could loop over, say, $\beta_{time}$ values and see how log-likelihood changes:

In [26]:
for bt in [-0.01, -0.05, -0.1, -0.2, -0.3]:
    V1 = bt*df_route["tt1"] + beta_cost*df_route["tc1_chf"] + beta_hw*df_route["hw1"] + beta_ch*df_route["ch1"]
    V2 = bt*df_route["tt2"] + beta_cost*df_route["tc2_chf"] + beta_hw*df_route["hw2"] + beta_ch*df_route["ch2"]
    expV1, expV2 = np.exp(V1), np.exp(V2)
    P1 = expV1/(expV1+expV2)
    loglik = np.where(chosen==1, np.log(P1), np.log(1-P1)).sum()
    print(f"beta_time={bt}: logLik={loglik:.1f}")


beta_time=-0.01: logLik=-1807.0
beta_time=-0.05: logLik=-1738.4
beta_time=-0.1: logLik=-1985.4
beta_time=-0.2: logLik=-3099.4
beta_time=-0.3: logLik=-4572.6


This will show which $\beta_{time}$ yields higher logLik (less negative). We expect a certain negative value around optimum. A rigorous approach would vary all parameters, but for brevity, let's assume we find something like $\beta_{time} \approx -0.05$, $\beta_{cost} \approx -0.8$, $\beta_{hw} \approx -0.1$, $\beta_{ch} \approx -0.2$, $\beta_{ASC} \approx ???$ yields a good fit. (These are guessy; let's focus on concepts.)

## 04.2 Value of Time (VOT)

Value of Time is a key derivative from logit models: it is the trade-off between time and cost, i.e., 

$$VOT = -\frac{\beta_{time}}{\beta_{cost}}$$

in currency units per time (e.g., CHF per minute, which can be converted to CHF per hour by $\times 60$). Essentially, how much money a traveler is willing to pay to save time (the marginal rate of substitution between time and cost).

Once we have estimated $\beta_{time}$ and $\beta_{cost}$, we compute VOT:

In [40]:
beta_time_est = -0.05  # assume from estimation
beta_cost_est = -0.8  # assume from estimation
VOT_minutes = - beta_time_est / beta_cost_est  # in CHF per minute
VOT_hours = VOT_minutes * 60
print(f"Value of Time: {VOT_minutes:.2f} CHF per minute ({VOT_hours:.2f} CHF per hour)")


Value of Time: -0.06 CHF per minute (-3.75 CHF per hour)


If $\beta_{time} = -0.05$, $\beta_{cost} = -0.8$, then $VOT = -(-0.0.05/-0.8) = -0.06$ CHF/minute, which is 3.75 CHF/hour. That would be a plausible number depending on context. If many business travelers, VOT might be higher.

Interpretation: A VOT of 3.75 CHF/hour means, on average, travelers are willing to pay 3.75 CHF to save one hour of travel time. If our model found that, it quantifies the trade-off in a single metric. Typically, one compares that to wage rates or policy values to see if it makes sense.

## 04.3 Model Fit and McFadden's $R^2$


Let's fit the model by minimizing the negative log‑likelihood using BFGS on the original (unstandardized) predictors.

In [None]:

import numpy as np
from scipy.optimize import minimize

# ensure cost is scaled (adjust divisor if your describe() suggests different scale)
if "tc1_chf" not in df_route.columns:
    df_route["tc1_chf"] = df_route["tc1"] / 10.0
    df_route["tc2_chf"] = df_route["tc2"] / 10.0

chosen = df_route["choice"].to_numpy()

def neg_loglike(params):
    bt, bc, bhw, bch, basc = params
    V1 = bt * df_route["tt1"].to_numpy() + bc * df_route["tc1_chf"].to_numpy() + bhw * df_route["hw1"].to_numpy() + bch * df_route["ch1"].to_numpy() + basc
    V2 = bt * df_route["tt2"].to_numpy() + bc * df_route["tc2_chf"].to_numpy() + bhw * df_route["hw2"].to_numpy() + bch * df_route["ch2"].to_numpy()
    # numerically stable log-likelihood (log-sum-exp)
    maxV = np.maximum(V1, V2)
    log_denom = maxV + np.log(np.exp(V1 - maxV) + np.exp(V2 - maxV))
    logP1 = V1 - log_denom
    logP2 = V2 - log_denom
    ll = np.where(chosen == 1, logP1, logP2).sum()
    return -ll  # minimize negative log-likelihood

# starting values
start = np.array([-0.05, -0.5, -0.05, -1.0, 0.0])

res = minimize(neg_loglike, start, method="BFGS", options={"disp": True})

# results
params_est = res.x
ll_final = -res.fun
N = len(df_route)
null_ll = N * np.log(0.5)
mcff_r2 = 1 - (ll_final / null_ll)

# approximate std errors from inverse Hessian (BFGS provides hess_inv)
if hasattr(res, "hess_inv") and hasattr(res.hess_inv, "todense") is False:
    cov = res.hess_inv
else:
    try:
        cov = res.hess_inv.todense()
    except Exception:
        cov = None

se = np.sqrt(np.diag(cov)) if cov is not None else np.full_like(params_est, np.nan)

print("Estimates (bt, bc, bhw, bch, ASC):", np.round(params_est, 4))
print("Std. errors:", np.round(se, 4))
print("Log-likelihood (final):", ll_final)
print("Null log-likelihood:", null_ll)
print("McFadden R^2:", np.round(mcff_r2, 4))

# Value of Time in CHF per minute and per hour
beta_time_est, beta_cost_est = params_est[0], params_est[1]
vot_min = -beta_time_est / beta_cost_est
vot_hr = vot_min * 60
print(f"VOT: {vot_min:.4f} CHF/min ({vot_hr:.2f} CHF/hour)")

# simple prediction accuracy
V1 = beta_time_est * df_route["tt1"] + beta_cost_est * df_route["tc1_chf"] + params_est[2] * df_route["hw1"] + params_est[3] * df_route["ch1"] + params_est[4]
V2 = beta_time_est * df_route["tt2"] + beta_cost_est * df_route["tc2_chf"] + params_est[2] * df_route["hw2"] + params_est[3] * df_route["ch2"]
pred = np.where(V1 > V2, 1, 2)
acc = (pred == df_route["choice"]).mean()
print("Prediction accuracy:", np.round(acc, 4))

import numpy as np
from scipy.optimize import minimize

# ensure cost is scaled (adjust divisor if your describe() suggests different scale)
if "tc1_chf" not in df_route.columns:
    df_route["tc1_chf"] = df_route["tc1"] / 10.0
    df_route["tc2_chf"] = df_route["tc2"] / 10.0

chosen = df_route["choice"].to_numpy()

def neg_loglike(params):
    bt, bc, bhw, bch, basc = params
    V1 = bt * df_route["tt1"].to_numpy() + bc * df_route["tc1_chf"].to_numpy() + bhw * df_route["hw1"].to_numpy() + bch * df_route["ch1"].to_numpy() + basc
    V2 = bt * df_route["tt2"].to_numpy() + bc * df_route["tc2_chf"].to_numpy() + bhw * df_route["hw2"].to_numpy() + bch * df_route["ch2"].to_numpy()
    # numerically stable log-likelihood (log-sum-exp)
    maxV = np.maximum(V1, V2)
    log_denom = maxV + np.log(np.exp(V1 - maxV) + np.exp(V2 - maxV))
    logP1 = V1 - log_denom
    logP2 = V2 - log_denom
    ll = np.where(chosen == 1, logP1, logP2).sum()
    return -ll  # minimize negative log-likelihood

# starting values
start = np.array([-0.05, -0.5, -0.05, -1.0, 0.0])

res = minimize(neg_loglike, start, method="BFGS", options={"disp": True})

# results
params_est = res.x
ll_final = -res.fun
N = len(df_route)
null_ll = N * np.log(0.5)
mcff_r2 = 1 - (ll_final / null_ll)

# approximate std errors from inverse Hessian (BFGS provides hess_inv)
if hasattr(res, "hess_inv") and hasattr(res.hess_inv, "todense") is False:
    cov = res.hess_inv
else:
    try:
        cov = res.hess_inv.todense()
    except Exception:
        cov = None

se = np.sqrt(np.diag(cov)) if cov is not None else np.full_like(params_est, np.nan)

print("Estimates (bt, bc, bhw, bch, ASC):", np.round(params_est, 4))
print("Std. errors:", np.round(se, 4))
print("Log-likelihood (final):", ll_final)
print("Null log-likelihood:", null_ll)
print("McFadden R^2:", np.round(mcff_r2, 4))

# Value of Time in CHF per minute and per hour
beta_time_est, beta_cost_est = params_est[0], params_est[1]
vot_min = -beta_time_est / beta_cost_est
vot_hr = vot_min * 60
print(f"VOT: {vot_min:.4f} CHF/min ({vot_hr:.2f} CHF/hour)")

# simple prediction accuracy
V1 = beta_time_est * df_route["tt1"] + beta_cost_est * df_route["tc1_chf"] + params_est[2] * df_route["hw1"] + params_est[3] * df_route["ch1"] + params_est[4]
V2 = beta_time_est * df_route["tt2"] + beta_cost_est * df_route["tc2_chf"] + params_est[2] * df_route["hw2"] + params_est[3] * df_route["ch2"]
pred = np.where(V1 > V2, 1, 2)
acc = (pred == df_route["choice"]).mean()
print("Prediction accuracy:", np.round(acc, 4))


         Current function value: 1665.619946
         Iterations: 15
         Function evaluations: 168
         Gradient evaluations: 28
Estimates (bt, bc, bhw, bch, ASC): [-0.0598 -1.3173 -0.0374 -1.1521 -0.0159]
Std. errors: [0.0064 0.0964 0.0051 0.0785 0.0416]
Log-likelihood (final): -1665.6199462956301
Null log-likelihood: -2420.469954515329
McFadden R^2: 0.3119
VOT: -0.0454 CHF/min (-2.72 CHF/hour)
Prediction accuracy: 0.7864
         Current function value: 1665.619946
         Iterations: 15
         Function evaluations: 168
         Gradient evaluations: 28
Estimates (bt, bc, bhw, bch, ASC): [-0.0598 -1.3173 -0.0374 -1.1521 -0.0159]
Std. errors: [0.0064 0.0964 0.0051 0.0785 0.0416]
Log-likelihood (final): -1665.6199462956301
Null log-likelihood: -2420.469954515329
McFadden R^2: 0.3119
VOT: -0.0454 CHF/min (-2.72 CHF/hour)
Prediction accuracy: 0.7864


Now, let's repeat estimation but standardizes time and cost before optimization and uses L‑BFGS‑B. 

In [7]:

import numpy as np
from scipy.optimize import minimize

# --- scale numeric predictors to improve conditioning ---
# scale time and cost across both alternatives
time_vals = pd.concat([df_route["tt1"], df_route["tt2"]])
cost_vals = pd.concat([df_route["tc1_chf"], df_route["tc2_chf"]])
time_scale = time_vals.std() if time_vals.std() != 0 else 1.0
cost_scale = cost_vals.std() if cost_vals.std() != 0 else 1.0

df_route["tt1_s"] = df_route["tt1"] / time_scale
df_route["tt2_s"] = df_route["tt2"] / time_scale
df_route["tc1_s"] = df_route["tc1_chf"] / cost_scale
df_route["tc2_s"] = df_route["tc2_chf"] / cost_scale

chosen = df_route["choice"].to_numpy()

def neg_loglike(params):
    bt, bc, bhw, bch, basc = params
    V1 = bt * df_route["tt1_s"].to_numpy() + bc * df_route["tc1_s"].to_numpy() + bhw * df_route["hw1"].to_numpy() + bch * df_route["ch1"].to_numpy() + basc
    V2 = bt * df_route["tt2_s"].to_numpy() + bc * df_route["tc2_s"].to_numpy() + bhw * df_route["hw2"].to_numpy() + bch * df_route["ch2"].to_numpy()
    # numerically stable log-likelihood (log-sum-exp)
    maxV = np.maximum(V1, V2)
    log_denom = maxV + np.log(np.exp(V1 - maxV) + np.exp(V2 - maxV))
    logP1 = V1 - log_denom
    logP2 = V2 - log_denom
    ll = np.where(chosen == 1, logP1, logP2).sum()
    return -ll

# starting values adjusted for standardized predictors (smaller mags)
start = np.array([-0.06, -1.3 * cost_scale, -0.04, -1.0, 0.0])  # if you prefer, use smaller for cost too

# use L-BFGS-B (more robust for large problems) and allow more iterations
res = minimize(neg_loglike, start, method="L-BFGS-B",
               options={"disp": True, "maxiter": 2000})

# check solver status
print("Optimization success:", res.success)
print("Message:", res.message)

# transform estimated betas back to original units:
# bt (per std(time)) -> per minute: bt / time_scale
# bc (per std(cost)) -> per CHF: bc / cost_scale
params_std = res.x
params_est = params_std.copy()
params_est[0] = params_std[0] / time_scale
params_est[1] = params_std[1] / cost_scale
# bhw, bch, basc unchanged (they were not scaled)
print("Estimated (transformed to original units):", np.round(params_est, 6))

# compute final reported metrics using transformed params
bt, bc, bhw, bch, basc = params_est
V1 = bt * df_route["tt1"] + bc * df_route["tc1_chf"] + bhw * df_route["hw1"] + bch * df_route["ch1"] + basc
V2 = bt * df_route["tt2"] + bc * df_route["tc2_chf"] + bhw * df_route["hw2"] + bch * df_route["ch2"]
maxV = np.maximum(V1, V2)
log_denom = maxV + np.log(np.exp(V1 - maxV) + np.exp(V2 - maxV))
logP1 = V1 - log_denom
logP2 = V2 - log_denom
ll_final = np.where(chosen == 1, logP1, logP2).sum()
N = len(df_route)
null_ll = N * np.log(0.5)
print("Log-likelihood (final):", ll_final)
print("Null log-likelihood:", null_ll)
print("McFadden R^2:", 1 - (ll_final / null_ll))


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            5     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.95807D+03    |proj g|=  1.28487D+03

At iterate    1    f=  1.95563D+03    |proj g|=  2.17328D+02

At iterate    2    f=  1.95249D+03    |proj g|=  7.04387D+02

At iterate    3    f=  1.93917D+03    |proj g|=  2.02540D+03

At iterate    4    f=  1.89941D+03    |proj g|=  4.32160D+03

At iterate    5    f=  1.83745D+03    |proj g|=  5.53711D+03

At iterate    6    f=  1.76980D+03    |proj g|=  4.33477D+03

At iterate    7    f=  1.73797D+03    |proj g|=  2.12364D+03

At iterate    8    f=  1.72935D+03    |proj g|=  4.96715D+02

At iterate    9    f=  1.72814D+03    |proj g|=  2.10130D+02

At iterate   10    f=  1.72777D+03    |proj g|=  3.45625D+02

At iterate   11    f=  1.72687D+03    |proj g|=  6.24157D+02

At iterate   12    f=  1.72468D+03    |proj g|=  1.00332D+03

At iterate   13    f=  1.7

 This problem is unconstrained.


In the next notebook, we will apply what we learned to a couple of mini-projects using the Apollo time use and drug choice datasets. We'll do some exploratory data analysis and maybe a simple simulation or model on each, to further solidify these skills.