# Simple fits

In this notebook, we fit a straight line from scratch.
The next notebook covers fancier fits with neural networks, but let's start with a basic problem and make it more complicated as we go along.


In [1]:
from typing import Tuple

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


We start by generating a synthetic dataset that is simple enough for the results to be visualized easily. For this reason, the dataset contains only two variables.

The simulated data follow $f(x) = 3 x + \epsilon$, where $x \sim \mathcal{N}(\mu=0, \sigma=2)$ and $\epsilon \sim \mathcal{N}(\mu=0, \sigma=0.5)$.


In [2]:
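def generate_data(N: int) -> np.ndarray:
    """Return an (N, 2) float32 array with columns x and y = 3x + noise."""
    x = 2*np.random.randn(N, 1)          # inputs, standard deviation 2
    epsilon = 0.5*np.random.randn(N, 1)  # Gaussian noise, standard deviation 0.5
    y = 3*x + epsilon
    return np.concatenate((x, y), axis=1).astype(np.float32)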

data = generate_data(N=1000)
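
As a quick sanity check on this kind of simulated data, the sketch below rebuilds an equivalent array and inspects its shape and spread. The seeded `np.random.default_rng(0)` generator and the name `check_data` are assumptions made for reproducibility here; the cell above uses the unseeded `np.random.randn` instead.

```python
import numpy as np

# Assumption: a seeded generator, so that the check is reproducible.
rng = np.random.default_rng(0)

N = 1000
x = 2 * rng.standard_normal((N, 1))          # inputs, standard deviation 2
epsilon = 0.5 * rng.standard_normal((N, 1))  # noise, standard deviation 0.5
check_data = np.concatenate((x, 3 * x + epsilon), axis=1).astype(np.float32)

print(check_data.shape)        # (1000, 2): one row per sample, columns x and y
print(check_data.std(axis=0))  # the first entry should be close to 2

# Since y = 3x + epsilon with independent noise, the standard deviation of y
# should be close to sqrt(3**2 * 2**2 + 0.5**2).
expected_y_std = np.sqrt(9 * 4 + 0.25)
print(expected_y_std)
```

If the printed spreads are far from these values, the generation step is probably not doing what the formula says.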

We can fit this line from scratch by assuming the model $y = f(x) = \beta x + \alpha + \epsilon$, where $\beta$ is the slope, $\alpha$ is the intercept, and $\epsilon$ is zero-mean Gaussian noise.

How would you do it? Feel free to use standard Python modules. The solution gives a simple mathematical expression for this fit, together with a full derivation.

Tip: See the documentation for `numpy.linalg.lstsq`.
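
One possible way to follow this tip is sketched below; it is not necessarily the approach taken in the solution. It regenerates data with a seeded generator (an assumption made so the example is self-contained), stacks $x$ with a column of ones so that the intercept $\alpha$ is estimated alongside the slope $\beta$, and compares the result with the textbook covariance-over-variance expression for the slope. The variable names are illustrative.

```python
import numpy as np

# Assumption: seeded data drawn from the same model, y = 3x + epsilon.
rng = np.random.default_rng(42)
x = 2 * rng.standard_normal(1000)
y = 3 * x + 0.5 * rng.standard_normal(1000)

# Design matrix: first column multiplies beta (slope),
# second column of ones multiplies alpha (intercept).
A = np.column_stack((x, np.ones_like(x)))

# lstsq returns (solution, residuals, rank, singular values).
(beta, alpha), residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(f"lstsq:       beta = {beta:.3f}, alpha = {alpha:.3f}")

# Closed-form least-squares estimates for a single input variable.
beta_cf = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha_cf = y.mean() - beta_cf * x.mean()
print(f"closed form: beta = {beta_cf:.3f}, alpha = {alpha_cf:.3f}")
```

Both estimates should agree with each other and land close to the true values $\beta = 3$ and $\alpha = 0$ used to simulate the data.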