# Practical Practices

## Feature Scaling

When features are on a similar scale, gradient descent can converge faster, because the contours of the cost function become rounder.

For example, rescale each feature to a range of roughly $(0, 1)$ or $(-1, 1)$.
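
As a rough illustration of the "rounder contour" idea, the sketch below compares the eigenvalue ratio of $X^TX/m$ before and after rescaling. For a squared-error cost that ratio measures how elongated the contours are, and a value near 1 means nearly circular. The data is randomly generated and hypothetical (a mileage-like column and an engine-size-like column), the intercept term is ignored for simplicity, and `rescale` and `contour_elongation` are illustrative helper names.

```python
import numpy as np

# Hypothetical data: one feature in the tens of thousands, one between 1 and 3.
rng = np.random.default_rng(0)
mileage = rng.uniform(1_000, 100_000, size=200)
engine_size = rng.uniform(1.0, 3.0, size=200)
X_raw = np.column_stack([mileage, engine_size])

def rescale(column):
    """Subtract the mean and divide by the range (max - min)."""
    return (column - column.mean()) / (column.max() - column.min())

X_scaled = np.column_stack([rescale(mileage), rescale(engine_size)])

def contour_elongation(X):
    """Ratio of the largest to the smallest eigenvalue of X^T X / m."""
    eigenvalues = np.linalg.eigvalsh(X.T @ X / len(X))  # ascending order
    return eigenvalues[-1] / eigenvalues[0]

raw_ratio = contour_elongation(X_raw)
scaled_ratio = contour_elongation(X_scaled)
print(raw_ratio, scaled_ratio)
```

The unscaled ratio should come out many orders of magnitude larger than the scaled one. That gap is why gradient descent needs a very small step size, and therefore many iterations, on unscaled features.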

### Mean normalization

Replace $x_i$ with $x_i - \mu_i$ so that each feature has approximately zero mean, then divide by the feature's range:

$\frac{x_i - \mu_i}{s_i}$

$\mu_i$ = mean of feature $i$

$s_i$ = range of feature $i$ (max - min)

In [25]:
# import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [26]:
# get audi car price dataset
# https://www.kaggle.com/adityadesai13/used-car-dataset-ford-and-mercedes
dataset = pd.read_csv('data/audi.csv')

# print dataset
dataset.iloc[:5,:]

Unnamed: 0,model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
0,A1,2017,12500,Manual,15735,Petrol,150,55.4,1.4
1,A6,2016,16500,Automatic,36203,Diesel,20,64.2,2.0
2,A1,2016,11000,Manual,29946,Petrol,30,55.4,1.4
3,A4,2017,16800,Automatic,25952,Diesel,145,67.3,2.0
4,A3,2019,17300,Manual,1998,Petrol,145,49.6,1.0


In [27]:
# get only A4's
dataset['model'] = dataset['model'].str.strip()
a4_dataset = dataset[dataset['model'] == "A4"]
a4_dataset.iloc[:5,:]

Unnamed: 0,model,year,price,transmission,mileage,fuelType,tax,mpg,engineSize
3,A4,2017,16800,Automatic,25952,Diesel,145,67.3,2.0
7,A4,2016,11750,Manual,75185,Diesel,20,70.6,2.0
25,A4,2017,18500,Automatic,17418,Diesel,145,62.8,2.0
28,A4,2018,17200,Automatic,25138,Diesel,145,70.6,2.0
38,A4,2017,16000,Manual,29063,Diesel,145,70.6,2.0


In [28]:
# build the feature matrix: one row per car, columns = mileage, engineSize, tax, mpg
input_data = a4_dataset[['mileage', 'engineSize', 'tax', 'mpg']].to_numpy(dtype=float)
print(input_data)

[[2.5952e+04 2.0000e+00 1.4500e+02 6.7300e+01]
 [7.5185e+04 2.0000e+00 2.0000e+01 7.0600e+01]
 [1.7418e+04 2.0000e+00 1.4500e+02 6.2800e+01]
 ...
 [2.3700e+04 2.0000e+00 3.0000e+01 6.1400e+01]
 [7.8000e+04 3.0000e+00 3.0500e+02 3.9800e+01]
 [9.5000e+04 2.0000e+00 1.4500e+02 5.3300e+01]]


In [29]:
def mean_normalization(data):
    # subtract the mean, then divide by the range (max - min)
    d_range = np.max(data) - np.min(data)
    d_mean = np.mean(data)
    return (data - d_mean) / d_range

# normalize each feature (column) independently
for i in range(input_data.shape[1]):
    input_data[:,i] = mean_normalization(input_data[:,i])

print(input_data)

[[-0.02526515 -0.00322403  0.06086995  0.25125454]
 [ 0.30741273 -0.00322403 -0.15842829  0.31533221]
 [-0.08293121 -0.00322403  0.06086995  0.1638759 ]
 ...
 [-0.0404824  -0.00322403 -0.14088443  0.13669144]
 [ 0.32643429  0.23487121  0.34157171 -0.28272604]
 [ 0.44130691 -0.00322403  0.06086995 -0.02059012]]
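
The same per-column normalization can also be written without a loop by using NumPy broadcasting. The sketch below is one possible version, not the notebook's own method: `mean_normalize_columns` is an illustrative name, and the sample matrix holds only the six rows printed above, so its values differ from the full-dataset output.

```python
import numpy as np

def mean_normalize_columns(X):
    """Mean-normalize every column: (x - mean) / (max - min)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                # per-column mean
    s = X.max(axis=0) - X.min(axis=0)  # per-column range; zero for a constant column
    return (X - mu) / s

# The six A4 rows printed above (mileage, engineSize, tax, mpg).
sample = np.array([
    [25952, 2.0, 145, 67.3],
    [75185, 2.0, 20, 70.6],
    [17418, 2.0, 145, 62.8],
    [23700, 2.0, 30, 61.4],
    [78000, 3.0, 305, 39.8],
    [95000, 2.0, 145, 53.3],
])

normalized = mean_normalize_columns(sample)

# Sanity checks: each column now has zero mean and a range of exactly 1.
print(normalized.mean(axis=0))
print(normalized.max(axis=0) - normalized.min(axis=0))
```

A constant column has a range of zero, so this sketch would divide by zero for it. Such a column would need to be dropped or handled separately.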


## Learning Rate

With a suitable learning rate $\alpha$, $J(\theta)$ should decrease after every iteration.

Declare convergence if $J(\theta)$ decreases by less than a small threshold, such as $10^{-3}$, in one iteration.

- If $J(\theta)$ is increasing, use a smaller $\alpha$
- If $J(\theta)$ is going up and down, use a smaller $\alpha$
- If $J(\theta)$ converges too slowly, use a larger $\alpha$
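
The rules above can be sketched as a small batch gradient descent loop that records $J(\theta)$ at every iteration and stops once the decrease falls below the threshold. This is a minimal sketch for linear regression with a squared-error cost. The data is synthetic and hypothetical, and the function names, status strings, and the two $\alpha$ values are chosen only for illustration.

```python
import numpy as np

def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    residuals = X @ theta - y
    return residuals @ residuals / (2 * len(y))

def gradient_descent(X, y, alpha, epsilon=1e-3, max_iters=10_000):
    """Run batch gradient descent and report why it stopped."""
    theta = np.zeros(X.shape[1])
    costs = [cost(X, y, theta)]
    for _ in range(max_iters):
        gradient = X.T @ (X @ theta - y) / len(y)
        theta = theta - alpha * gradient
        costs.append(cost(X, y, theta))
        decrease = costs[-2] - costs[-1]
        if decrease < 0:
            # J went up: alpha is too large for this problem
            return theta, costs, "cost_increased"
        if decrease < epsilon:
            # J decreased by less than epsilon in one iteration
            return theta, costs, "converged"
    return theta, costs, "max_iters_reached"

# Hypothetical, already-scaled data: y = 1 + 2x with x in [-1, 1].
x = np.linspace(-1, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x

_, costs_ok, status_ok = gradient_descent(X, y, alpha=0.1)
_, costs_bad, status_bad = gradient_descent(X, y, alpha=2.5)
print(status_ok, len(costs_ok) - 1)
print(status_bad, len(costs_bad) - 1)
```

Plotting `costs_ok` against the iteration number is a common way to check the learning rate visually. Note that a threshold of $10^{-3}$ can stop the loop while $\theta$ is still some distance from the optimum, so a smaller threshold may be needed when precision matters.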

## Features

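Create new features from existing ones when a combination or transformation of the current features describes the target better than the raw inputs.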
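
As a small illustration, the sketch below derives two new features from the `year` and `mileage` columns of the first five dataset rows shown earlier. The reference year of 2020 and the feature names `age` and `mileage_per_year` are assumptions made for this example, not part of the dataset.

```python
import numpy as np

# year and mileage of the first five rows of the dataset printed earlier
year = np.array([2017, 2016, 2016, 2017, 2019])
mileage = np.array([15735, 36203, 29946, 25952, 1998])

REFERENCE_YEAR = 2020  # hypothetical: the year the prices are assumed to refer to

# new feature 1: age of the car in years
age = REFERENCE_YEAR - year

# new feature 2: average mileage per year (undefined when age is 0)
mileage_per_year = mileage / age

print(age)
print(mileage_per_year)
```

Whether such derived features actually help depends on the data, so they are best compared against the raw features on a validation set.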

## Polynomial Regression

$h_\theta(x) = \theta_0 + \theta_1(\text{size}) + \theta_2(\text{size})^2 + \theta_3(\text{size})^3 + \dots$

$x_1 = \text{size}$

$x_2 = \text{size}^2$

$x_3 = \text{size}^3$
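
Treating each power of the size as its own feature turns the polynomial model into an ordinary linear regression on $x_1, x_2, x_3$. The sketch below builds those columns and fits $\theta$ on hypothetical, noise-free data. It uses `np.linalg.lstsq` as a stand-in for gradient descent to keep the example short, and `polynomial_features` is an illustrative helper name.

```python
import numpy as np

def polynomial_features(size, degree=3):
    """Return the columns [size, size^2, ..., size^degree]."""
    size = np.asarray(size, dtype=float)
    return np.column_stack([size ** p for p in range(1, degree + 1)])

# Hypothetical data generated from a known quadratic: y = 1 + 2*size + 0.5*size^2
size = np.arange(1, 7)
X_poly = polynomial_features(size)
y = 1 + 2 * size + 0.5 * size ** 2

# Prepend a column of ones for theta_0, then solve the least-squares problem.
A = np.column_stack([np.ones(len(size)), X_poly])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(theta)
print(X_poly.max(axis=0) - X_poly.min(axis=0))  # range of each polynomial feature
```

Even for sizes from 1 to 6, the three columns span ranges of 5, 35, and 215. Feature scaling therefore matters even more when polynomial features are fitted with gradient descent.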