```python
import sys
import os
# Add the book's root directory to the path so textbook_utils can be imported
if not any(path.endswith('textbook') for path in sys.path):
    sys.path.append(os.path.abspath('../../..'))
from textbook_utils import *
```

```python
import pandas as pd

outlier_czs = [34105, 34113, 34112, 34106]
df = (
    pd.read_csv('data/mobility.csv')
    # filter out rows with NaN AUM values
    .dropna(subset=['aum'])
    # take out outlier CZs
    .query('cz not in @outlier_czs')
)

predictors = [
    'frac_traveltime_lt15',
    'gini',
    'dropout_r',
    'rel_tot',
    'cs_fam_wkidsinglemom',
    'taxrate',
    'gradrate_r',
    'frac_worked1416',
    'cs_born_foreign',
]

X = (df[predictors]
    # Some rows have missing predictor values; we drop those rows for simplicity
    .dropna()
    .assign(intr=1)
    # Move intercept column to appear first
    [['intr', *predictors]]
)
y = df.loc[X.index, 'aum']
X = X.reset_index(drop=True)
y = y.reset_index(drop=True)
```

(sec:linear_multi_fit)=
# Fitting the Multiple Linear Model

For an $ n \times (p + 1) $ design matrix $ X $, an $ n $-dimensional
column vector of outcomes $ y $, and a $ (p + 1) $-dimensional column
vector of model parameters $ \theta $, we assume that:

$$
\begin{aligned}
y = X \theta + \epsilon
\end{aligned}
$$

Here, $ \epsilon $ is an $ n $-dimensional column vector that represents the
sampling error.
We define the multiple linear model as:

$$
\begin{aligned}
f_{\theta}(X) = X \theta
\end{aligned}
$$
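
As a small sketch of how these pieces fit together in NumPy, here is a design matrix, a parameter vector, and the resulting predictions. The numbers are made up for illustration and are not taken from the mobility data; the leading column of ones plays the role of the `intr` intercept column built earlier.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations, p = 2 predictors
features = np.array([
    [0.5, 1.0],
    [1.5, 0.0],
    [2.0, 2.0],
    [3.0, 1.0],
])
n, p = features.shape

# Design matrix: prepend a column of ones for the intercept,
# giving shape n x (p + 1)
X = np.column_stack([np.ones(n), features])

# An arbitrary (p + 1)-dimensional parameter vector
theta = np.array([1.0, 2.0, -0.5])

# Model predictions f_theta(X) = X theta, an n-dimensional vector
predictions = X @ theta  # values: 1.5, 4.0, 4.0, 6.5
```

The `@` operator performs the matrix-vector product, so each prediction is the dot product of one row of $ X $ with $ \theta $.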

Similar to the simple linear model, we'll fit $ f_{\theta}(X) $ using
the squared loss function.
We want to find the model parameters $ \hat{\theta} $ that minimize the
mean squared loss:

$$
\begin{aligned}
L(\theta, X, y)
 &= \frac{1}{n} \left | y - f_{\theta}(X) \right|^2
\end{aligned}
$$

Here, we're using the notation $ |v|^2 $ for a vector $ v $ as a
shorthand for the sum of each vector element squared [^l2]:
$ |v|^2 = \sum_i v_i^2 $ .

[^l2]: $ |v| = \sqrt{\sum_i v_i^2} $ is also called the $ \ell_2 $ norm
(or Euclidean length) of a vector $ v $.
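
The loss can be sketched directly in NumPy. The function name `mean_squared_loss` and the toy data below are hypothetical, chosen only for illustration. The sketch also checks that the sum of squared elements matches the squared $ \ell_2 $ norm returned by `np.linalg.norm`.

```python
import numpy as np

def mean_squared_loss(theta, X, y):
    """L(theta, X, y) = (1/n) * |y - X theta|^2."""
    residual = y - X @ theta
    n = len(y)
    # |v|^2 is the sum of the squared elements of v
    return np.sum(residual ** 2) / n

# Hypothetical toy data: n = 4 observations, p + 1 = 3 columns
X = np.array([
    [1.0, 0.5, 1.0],
    [1.0, 1.5, 0.0],
    [1.0, 2.0, 2.0],
    [1.0, 3.0, 1.0],
])
y = np.array([2.0, 3.5, 4.0, 7.0])
theta = np.array([1.0, 2.0, -0.5])

loss = mean_squared_loss(theta, X, y)  # 0.1875

# |v|^2 agrees with the squared l2 norm computed by np.linalg.norm
residual = y - X @ theta
assert np.isclose(np.sum(residual ** 2), np.linalg.norm(residual) ** 2)
```

Here the residual is $ [0.5, -0.5, 0, 0.5] $, so the sum of squares is $ 0.75 $ and dividing by $ n = 4 $ gives $ 0.1875 $.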

In this section, we'll fit our model by finding the minimizing
$ \hat{\theta} $.
One idea is to use calculus as we did for the simple linear model.
However, this approach needs knowledge of vector calculus that we won't
cover in this book.
Instead, we'll use a geometric argument.

## A Geometric Problem

Our goal is to find the $ \hat{\theta} $ that minimizes our loss
function: we want to make $ L(\theta, X, y) $ as small as possible
for a given $ X $ and $ y $.
The key insight is that we can restate this goal in a geometric way.
Remember: the model predictions $ f_{\theta}(X) $ and the true outcomes
$ y $ are both vectors.
We can treat vectors as points; for example, we can plot
the vector $ [ 2, 3 ] $ as the point $ (2, 3) $ in 2D space.
The quantity $ |y - f_{\theta}(X)| $ is the distance between the points
$ y $ and $ f_{\theta}(X) $, and $ L(\theta, X, y) $ is that distance
squared and divided by $ n $.
So minimizing $ L(\theta, X, y) $ is equivalent to finding
$ \hat{\theta} $ that makes $ f_{\theta}(X) $ as close as possible to
$ y $ when we plot them as points.
As depicted in {numref}`Figure %s <fig:geom-2d>`, different values of
$ \theta $ give different predictions $ f_{\theta}(X) $ (hollow points).
Then, $ \hat{\theta} $ is the vector of parameters that puts
$ f_{\hat{\theta}}(X) $ as close to $ y $ (filled point) as possible.

```{figure} figures/geom-2d.svg
---
name: fig:geom-2d
width: 250px
---

A plot showing different values of $ f_{\theta}(X) $ (hollow points) and
the outcome vector $ y $ (filled point).
```
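
To make the geometric view concrete, here is a small sketch with made-up data (not from the mobility dataset). It evaluates a few arbitrary candidate values of $ \theta $, playing the role of the hollow points, and checks that the mean squared loss equals the squared Euclidean distance between $ y $ and $ X \theta $ divided by $ n $. As a result, the candidate whose prediction point sits closest to $ y $ is also the one with the smallest loss.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations, intercept plus one predictor
X = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
    [1.0, 3.0],
])
y = np.array([1.0, 2.5, 2.5, 4.0])

# A few arbitrary candidate parameter vectors (the "hollow points")
candidates = [
    np.array([0.0, 1.0]),
    np.array([1.0, 1.0]),
    np.array([1.0, 0.5]),
]

n = len(y)
for theta in candidates:
    prediction = X @ theta
    # Euclidean distance between the points y and X theta
    distance = np.linalg.norm(y - prediction)
    loss = np.mean((y - prediction) ** 2)
    # The loss is the squared distance divided by n
    assert np.isclose(loss, distance ** 2 / n)

# The candidate closest to y also has the smallest loss
best = min(candidates, key=lambda t: np.linalg.norm(y - X @ t))
```

With these numbers, `best` is $ [1, 1] $: its predictions $ [1, 2, 3, 4] $ are a distance of about $ 0.707 $ from $ y $, compared with about $ 2.121 $ and $ 1.871 $ for the other two candidates.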

Next, we'll look at the possible values of $ f_{\theta}(X) $.
In {numref}`Figure %s <fig:geom-2d>`, we showed a few possible values of
$ f_{\theta}(X) $.
Instead of just plotting a few possible points, we can
plot *all* possible values of $ f_{\theta}(X) $ by varying $ \theta $.
This results in a subspace of possible $ f_{\theta}(X) $ values, as shown in
{numref}`Figure %s <fig:geom-span>`.

```{figure} figures/geom-span.svg
---
name: fig:geom-span
width: 250px
---

A plot showing all possible values of $ f_{\theta}(X) $ as a line.
```
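
One way to check numerically that every prediction lands in the same subspace is sketched below with a made-up $ 3 \times 2 $ design matrix. Stacking a prediction $ X \theta $ next to the columns of $ X $ never raises the matrix rank (computed with `np.linalg.matrix_rank`), so the prediction adds no new direction. A vector that lies outside that subspace does raise the rank. Here $ n = 3 $ and $ p + 1 = 2 $, so the possible predictions form a plane inside 3D space.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical design matrix: n = 3 observations, p + 1 = 2 columns
X = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
])
rank_X = np.linalg.matrix_rank(X)  # 2

# Try many random parameter vectors
for _ in range(1000):
    theta = rng.normal(size=2)
    prediction = X @ theta
    # Appending the prediction as an extra column never increases the rank
    assert np.linalg.matrix_rank(np.column_stack([X, prediction])) == rank_X

# A vector outside the subspace does increase the rank
y = np.array([1.0, 0.0, 1.0])
rank_with_y = np.linalg.matrix_rank(np.column_stack([X, y]))  # 3
```

The vector $ [1, 0, 1] $ cannot be written as $ a [1, 1, 1] + b [0, 1, 2] $: the first two entries force $ a = 1 $ and $ b = -1 $, which makes the third entry $ -1 $ rather than $ 1 $.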

Next, notice that the model predictions $ f_{\theta}(X) = X \theta $
always lie in the span of the columns of the design matrix $ X $.
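
To see why, write $ X $ in terms of its columns. Using $ X_{:,j} $ as a shorthand (introduced here only for this derivation) for column $ j $ of $ X $, the matrix-vector product expands as:

$$
\begin{aligned}
X \theta
&= \begin{bmatrix} X_{:,0} & X_{:,1} & \cdots & X_{:,p} \end{bmatrix}
\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_p \end{bmatrix} \\
&= \theta_0 X_{:,0} + \theta_1 X_{:,1} + \cdots + \theta_p X_{:,p}
\end{aligned}
$$

Each choice of $ \theta $ picks a different set of weights, but the result is always a weighted sum of the same $ p + 1 $ columns, which is exactly what it means to lie in their span.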