# Linear Regression with NumPy Implementation

## 1. Introduction

Linear regression is one of the most widely used machine learning models for predicting a continuous response. I will summarize its methodology and implement it from scratch using NumPy.

The problem we want to solve is prediction of a **continuous response**. For example, we want to use house size and the number of rooms, $(x_1, x_2)$, to predict the house price, $y \in \mathbb{R}$.

## 2. Linear Regression Model

Generally, we would like to use multiple features $(x_1, ..., x_p)$ to better predict the response $y$, given the observed data $\{(y_i, x_{i1}, ..., x_{ip}),\ i = 1, ..., n\}$, where $n$ is the number of examples. Suppose

$$
y_i = b + w_1 x_{i1} + ...+ w_p x_{ip} + \epsilon_i = b + x_i^T w + \epsilon_i,
$$

where
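- $x_i = (x_{i1}, ..., x_{ip})^T$ is the feature vector of the $i$-th example,
- $b$ and $w = (w_1, ..., w_p)^T$ are the unknown model parameters; $b$ is the model bias and $w$ is the vector of model weights, and
- $\epsilon_i = y_i - (b + x_i^T w)$ is the error term, i.e. the part of $y_i$ not explained by the linear function.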
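
As a minimal sketch of the model above, the linear part $b + x_i^T w$ can be computed for all examples at once with a matrix-vector product. The helper name `predict` and the demo values are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np


def predict(X, w, b):
    """Return b + x_i^T w for every row x_i of X (hypothetical helper)."""
    return b + X @ w


# Illustrative values only: two examples with p = 2 features.
X_demo = np.array([[1.0, 2.0],
                   [0.0, 1.0]])
w_demo = np.array([2.0, -3.4])
b_demo = 4.2

y_hat = predict(X_demo, w_demo, b_demo)
print(y_hat)
```

Stacking the examples as rows of `X` keeps the code close to the formula and avoids an explicit Python loop over $i$.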

We want to estimate $(b, w)$ by minimizing the **mean squared error** (scaled by $\frac{1}{2}$, a common convention that simplifies the gradient):

$$
J(b, w) = \frac{1}{2n} \sum_{i=1}^n (y_i - (b + x_i^T w))^2.
$$
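
The objective can be sketched in NumPy as follows: the sum of squared residuals divided by twice the number of examples. The function name `mse_loss` and the toy inputs are hypothetical and serve only to illustrate the formula.

```python
import numpy as np


def mse_loss(X, y, w, b):
    """Halved mean squared error J(b, w) (hypothetical helper)."""
    n = X.shape[0]                  # number of examples
    residuals = y - (b + X @ w)     # y_i - (b + x_i^T w) for every i
    return np.sum(residuals ** 2) / (2 * n)


# Toy data, chosen by hand so the result is easy to verify.
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
y_toy = np.array([1.0, 2.0])

# With w = 0 and b = 0 the residuals equal y, so J = (1 + 4) / (2 * 2).
loss_zero = mse_loss(X_toy, y_toy, np.zeros(2), 0.0)

# With w = (1, 2) and b = 0 the fit is exact, so J = 0.
loss_exact = mse_loss(X_toy, y_toy, np.array([1.0, 2.0]), 0.0)
print(loss_zero, loss_exact)
```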

[To be continued.]

## 3. Data Preparation and Preprocessing

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
from sklearn.model_selection import train_test_split

In [3]:
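# True parameters used to simulate the data.
true_w = np.array([2, -3.4])
true_b = 4.2

n_examples = 1000
n_inputs = len(true_w)

# Draw features from N(0, 1) and add N(0, 0.01^2) noise to the linear response.
X = np.random.normal(scale=1, size=(n_examples, n_inputs))
y = np.matmul(X, true_w) + true_b + np.random.normal(scale=0.01, size=n_examples)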

In [4]:
X[:3]

array([[0.87476274, 0.54884147],
       [0.15622398, 0.95024411],
       [0.11297256, 2.09362462]])

In [5]:
y[:3]

array([ 4.06239435,  1.276877  , -2.68890923])
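
As a quick sanity check, the three rows printed above can be plugged back into the generating formula. The sketch below copies the displayed values by hand (the variable names are assumptions for illustration) and compares the noiseless response $b + x_i^T w$ with the displayed $y_i$; the differences should be small, on the order of the noise scale 0.01.

```python
import numpy as np

# Parameters from the simulation above.
true_w = np.array([2, -3.4])
true_b = 4.2

# Values copied from the printed outputs of X[:3] and y[:3].
X_head = np.array([[0.87476274, 0.54884147],
                   [0.15622398, 0.95024411],
                   [0.11297256, 2.09362462]])
y_head = np.array([4.06239435, 1.276877, -2.68890923])

# Noiseless response implied by the model, and the leftover noise.
noiseless = X_head @ true_w + true_b
noise = y_head - noiseless
print(noise)
```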

In [6]:
# Split data into training and test datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=71, shuffle=True)

## 4. Fitting Linear Regression from Scratch

[To be continued.]