In the past, I've built some models using several Python libraries without really understanding what I was doing. Now I want to implement multiple learning models using fewer (or **no**) *pre-made* ML resources. I hope this will lead to a deeper understanding of what each learning model does.

This time, I'll be implementing **Linear regression** in Python.

I'll be using `pandas` to import and process data sets, `numpy` to get useful linear algebra functions, and `pyplot` to plot graphics.

```python
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

## Import and examine data

```python
# load the dataset into a DataFrame
df = pd.read_csv('../input/kc_house_data.csv')
```

Looking at the original dataset, column 2, `price`, will be `y`, my prediction target. Column 0, `id`, is not a predictor variable, so I won't include it in my feature matrix `X`, which holds all of the remaining columns.

```python
# Reorder the dataframe's columns in a way that is more convenient for the next steps
cols = ['id', 'price', 'floors', 'bedrooms', 'bathrooms', 'condition', 'grade', 'sqft_living',
        'sqft_lot', 'sqft_above', 'sqft_basement', 'sqft_living15', 'sqft_lot15', 'view',
        'waterfront', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long', 'date']
df = df[cols]

# Get y (price) and X (feature matrix)
y = df.iloc[:, 1:2].copy()
X = df.iloc[:, 2:].copy()
```
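Selecting with the slice `1:2` rather than the single position `1` keeps `y` as a one-column DataFrame (an m × 1 matrix) instead of a one-dimensional Series, which tends to be more convenient for matrix operations later. A minimal sketch on a made-up three-row frame (the values are invented; only the column layout mirrors the notebook):

```python
import pandas as pd

# Toy frame with invented values; the column order mirrors the reordered df
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [100.0, 200.0, 300.0],
    "floors": [1.0, 2.0, 1.0],
    "bedrooms": [3, 3, 2],
})

y_frame = df.iloc[:, 1:2]   # slice -> DataFrame with one column, shape (3, 1)
y_series = df.iloc[:, 1]    # single position -> Series, shape (3,)
X = df.iloc[:, 2:]          # every column after price

print(y_frame.shape, y_series.shape, X.shape)
```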

Now I'll decide which features need to be normalized. To do that, I'll look at the minimum and maximum values of every feature.

```python
# Build a 2 x n table with the min and max value of every feature
def minMax(x):
    return pd.Series(index=['min', 'max'], data=[x.min(), x.max()])

print(X.apply(minMax))
```

```
     floors  bedrooms  bathrooms  condition  grade  sqft_living  sqft_lot  \
min     1.0         0        0.0          1      1          290       520   
max     3.5        33        8.0          5     13        13540   1651359   

     sqft_above  sqft_basement  sqft_living15  sqft_lot15  view  waterfront  \
min         290              0            399         651     0           0   
max        9410           4820           6210      871200     4           1   

     yr_built  yr_renovated  zipcode      lat     long             date  
min      1900             0    98001  47.1559 -122.519  20140502T000000  
max      2015          2015    98199  47.7776 -121.315  20150527T000000  
```
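The custom `minMax` helper works, but pandas can also build the same min/max table directly with `DataFrame.agg`, which accepts a list of function names and returns one row per statistic. A small sketch on an invented frame:

```python
import pandas as pd

# Invented values, only to illustrate the call
X = pd.DataFrame({
    "floors": [1.0, 3.5, 2.0],
    "bedrooms": [0, 33, 3],
    "waterfront": [0, 1, 0],
})

# One row per statistic ('min', 'max'), one column per feature,
# the same layout that X.apply(minMax) produces
summary = X.agg(["min", "max"])
print(summary)
```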


`floors`, `bedrooms`, `bathrooms`, `condition`, `grade`, and `view` **can** be normalized and scaled; doing so doesn't hurt, so I'll do it.

All `sqft_#` features **should** be normalized and scaled.

`waterfront` acts like a boolean, taking only the values 0 or 1, so I won't normalize it.

```python
# Positional indexes (in X) of the columns to normalize: floors through view
toNormalize = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```
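The hard-coded positions 0 to 11 rely on the column order chosen earlier (`floors` through `view`); if that order ever changes, the indexes silently point at the wrong columns. A sketch that derives the positions from column names instead, using `Index.get_loc`. The shortened column list and the name `to_normalize_names` are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical empty frame holding a subset of the notebook's columns
X = pd.DataFrame(columns=["floors", "bedrooms", "view", "waterfront", "yr_built"])

# Names of the features to normalize (waterfront is deliberately left out)
to_normalize_names = ["floors", "bedrooms", "view"]

# Translate names into positional indexes, as expected by a position-based normalize()
toNormalize = [X.columns.get_loc(name) for name in to_normalize_names]
print(toNormalize)
```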

To normalize, we implement the following formula as a Python function, where $\mu_{i}$ is the mean and $s_{i}$ the standard deviation of feature $i$:

$$
x_{i} := \dfrac{x_{i} - \mu_{i}}{s_{i}}
$$
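A quick numeric check of the formula: for a feature with values 1, 2, 3, the mean is 2 and the sample standard deviation is 1, so the normalized values are -1, 0, 1. Note that pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to the population one (`ddof=0`), so `ddof=1` is passed explicitly here to match what pandas computes:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # a single feature column
mu = x.mean()                   # mean: 2.0
s = x.std(ddof=1)               # sample std, same default as pandas .std(): 1.0

x_norm = (x - mu) / s           # x_i := (x_i - mu_i) / s_i
print(x_norm)                   # [-1.  0.  1.]
```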

```python
# X is the feature matrix; arr lists the positional indexes of the columns to normalize
def normalize(X, arr):
    X_norm = X.copy()   # explicit copy, so the original X is left untouched
    mu = []             # mean of each normalized column
    sigma = []          # standard deviation of each normalized column
    for j in arr:
        col = X_norm.columns[j]          # column name at position j
        mu.append(X_norm[col].mean())    # mean of column j
        sigma.append(X_norm[col].std())  # sample std (ddof=1) of column j
        # replace the whole column, so integer columns can become float cleanly
        X_norm[col] = (X_norm[col] - mu[-1]) / sigma[-1]
    return mu, sigma, X_norm

# mu contains the mean of every normalized column
# sigma contains the std of every normalized column
# X_norm is X with those columns normalized
mu, sigma, X_norm = normalize(X, toNormalize)
```
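Returning `mu` and `sigma` matters once the model is trained: any new example must be scaled with the training set's statistics, not its own, or the learned parameters no longer match the inputs. A sketch of a hypothetical helper `apply_normalization` (not part of the original notebook) that reuses them, assuming the same positional-index convention as `normalize`; the data is made up:

```python
import pandas as pd

def apply_normalization(X_new, arr, mu, sigma):
    """Hypothetical helper: scale columns of X_new with previously computed mu/sigma."""
    X_out = X_new.copy()
    for k, j in enumerate(arr):
        col = X_out.columns[j]
        # use the training statistics, never statistics computed on X_new
        X_out[col] = (X_out[col] - mu[k]) / sigma[k]
    return X_out

# Made-up training data; mu and sigma computed the same way as in normalize()
X_train = pd.DataFrame({"bedrooms": [2, 3, 4], "waterfront": [0, 1, 0]})
mu = [X_train["bedrooms"].mean()]      # [3.0]
sigma = [X_train["bedrooms"].std()]    # [1.0]

# A new house to predict on
X_new = pd.DataFrame({"bedrooms": [5], "waterfront": [1]})
result = apply_normalization(X_new, [0], mu, sigma)
print(result)
```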

Now we modify `X_norm` by adding the intercept term (a column of ones).
