
# Gradient Descent Lab

* We implement our very own gradient descent algorithm to predict median house values in Californian districts, given a number of features from these districts.
* In this notebook, we strip out a lot of the data investigation work. In addition, we only consider a small subset of the columns.
* Read and run through the notebook, then find the exercises at the end.


# Setup

First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20.

In [None]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np

import os
import tarfile
import urllib.request

import pandas as pd

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import mean_squared_error


# Get the Data
   
But first, a few constants and functions.

In [None]:
# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "end_to_end_project"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

## Download the Data

In [None]:
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create the target directory if needed, then download and unpack the archive
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)

In [None]:
fetch_housing_data()

In [None]:
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

In [None]:
housing = load_housing_data()
housing.head()

## Create a Test Set

In [None]:
# to make this notebook's output identical at every run
np.random.seed(42)

In [None]:
import numpy as np

# For illustration only. Sklearn has train_test_split()
def split_train_test(data, test_ratio):
    shuffled_indices = np.random.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]

In [None]:
train_set, test_set = split_train_test(housing, 0.2)
len(train_set)

In [None]:
len(test_set)
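
The comment above notes that Scikit-Learn provides `train_test_split()`. As a minimal sketch, assuming a small made-up DataFrame (`toy` is a hypothetical stand-in for `housing`), the same 80/20 split could be written as:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the housing DataFrame: 10 rows, 2 columns.
toy = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2})

# test_size plays the role of test_ratio; random_state fixes the shuffle.
toy_train, toy_test = train_test_split(toy, test_size=0.2, random_state=42)

print(len(toy_train), len(toy_test))  # 8 2
```

Passing `random_state` makes the split reproducible without relying on the global NumPy seed.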

# Discover and Visualize the Data to Gain Insights

## Lots of graphs and geographical displays ... will skip for this lab

## Looking for Correlations ... will skip for this lab

## Experimenting with Attribute Combinations ... will skip for this lab

# Prepare the Data for Machine Learning Algorithms

## Data Cleaning ... minimal

In [None]:
housing_cols = ['housing_median_age', 'total_rooms', 'population', 'median_income']
housing_num = housing[housing_cols].fillna(housing[housing_cols].median())
housing_labels = housing['median_house_value']

## Transformation Pipelines

Now let's build a pipeline for preprocessing the numerical attributes:

In [None]:
num_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="median")),
        ('std_scaler', StandardScaler()),
    ])

housing_num_tr = num_pipeline.fit_transform(housing_num)

In [None]:
housing_num_tr
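
To see what the pipeline does, here is a minimal sketch on a made-up array (`toy` and `toy_pipeline` are hypothetical names, not the housing data). The imputer fills the missing value with the column median, and the scaler then rescales each column to zero mean and unit variance.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one missing value in the first column.
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [np.nan, 30.0],
                [4.0, 40.0]])

toy_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
toy_tr = toy_pipeline.fit_transform(toy)

# After scaling, every column has mean ~0 and standard deviation ~1.
print(toy_tr.mean(axis=0))
print(toy_tr.std(axis=0))
```

Standardized features matter for gradient descent because a single learning rate then suits all columns reasonably well.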

# Select and Train a Model

## Easy way : Training and Evaluating on the Training Set

In [None]:
sgd_reg = SGDRegressor(max_iter=10000, eta0=1e-3, tol=1e-3, random_state=42)
sgd_reg.fit(housing_num_tr, housing_labels)

In [None]:
housing_predictions = sgd_reg.predict(housing_num_tr)
sgd_mse = mean_squared_error(housing_labels, housing_predictions)
sgd_rmse = np.sqrt(sgd_mse)
sgd_rmse


In [None]:
print(" theta values from SGD",sgd_reg.coef_)
print(" y intercept ", sgd_reg.intercept_)

In [None]:
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(housing_num_tr, housing_labels)

In [None]:
from sklearn.metrics import mean_squared_error

housing_predictions = lin_reg.predict(housing_num_tr)
lin_mse = mean_squared_error(housing_labels, housing_predictions)
lin_rmse = np.sqrt(lin_mse)
lin_rmse

In [None]:
print(" theta values from Linear regression", lin_reg.coef_)
print(" y intercept ", lin_reg.intercept_)

## Manual SGD : Training and Evaluating on the Training Set

Scikit-Learn's SGD and linear regression models add a term for the y-intercept automatically, but we have to do it ourselves for the manual solution.

In [None]:
housing_num_tr_b = np.c_[np.ones((housing_num_tr.shape[0], 1)), housing_num_tr] # add x0 = 1 to each instance
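
As a small illustration on a made-up array (`X_toy` and `theta_toy` are hypothetical), prepending the column of ones turns an `(m, n)` matrix into an `(m, n + 1)` matrix, so that the first entry of theta acts as the intercept in the matrix product:

```python
import numpy as np

X_toy = np.array([[2.0, 3.0],
                  [4.0, 5.0]])

# Prepend x0 = 1 to each instance, exactly as done for the housing data.
X_toy_b = np.c_[np.ones((X_toy.shape[0], 1)), X_toy]

theta_toy = np.array([10.0, 1.0, 1.0])  # intercept 10, both slopes 1
print(X_toy_b.shape)        # (2, 3)
print(X_toy_b @ theta_toy)  # [15. 19.]
```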

### Exercise : Cost Function

Implement the mean squared error (MSE) cost function for linear regression (p. 114 of the book).
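
For reference, one common way to write this cost for $m$ training instances is shown below. One assumption to flag: the expected value quoted in the test cell further down appears consistent with the variant that includes an extra factor of $\tfrac{1}{2}$ (half the MSE), so that is the form used here. The symbols $m$, $\mathbf{x}^{(i)}$ and $y^{(i)}$ are notation introduced for this sketch.

$$
J(\boldsymbol{\theta}) = \frac{1}{2m} \sum_{i=1}^{m} \left( \boldsymbol{\theta}^\top \mathbf{x}^{(i)} - y^{(i)} \right)^2
$$

With $\boldsymbol{\theta} = \mathbf{0}$ every prediction is zero, so the cost reduces to $\frac{1}{2m} \sum_{i=1}^{m} \left( y^{(i)} \right)^2$, which gives a quick way to check an implementation by hand.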

In [None]:
# Define computeCost function
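def computeCost(X, y, theta):
    # TODO (exercise): compute and return the cost for the given theta
    raise NotImplementedError("Exercise: implement computeCost")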

We test our cost function for specific theta values.

In [None]:
# Calculate computeCost with theta equal to zeros
theta = np.zeros(housing_num_tr_b.shape[1])

J1 = computeCost(housing_num_tr_b, housing_labels, theta)
print("With theta = %s, \nCost computed = %.2f " % (theta, J1))

# Answer should be:
#    With theta = [0. 0. 0. 0. 0.],
#    Cost computed = 28052415994.94

### Exercise : Batch Gradient Descent 

Implement the batch gradient descent algorithm, save the cost at every iteration, and return the cost history along with the improved theta value.
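
As a reference for the update rule, assume the cost is taken as half the MSE, $J(\boldsymbol{\theta}) = \frac{1}{2m} \lVert \mathbf{X}\boldsymbol{\theta} - \mathbf{y} \rVert^2$. This is an assumption about the convention; with the plain MSE the gradient gains a factor of 2. Each batch step then moves $\boldsymbol{\theta}$ against the gradient computed over the full training set:

$$
\nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{m} \mathbf{X}^\top \left( \mathbf{X}\boldsymbol{\theta} - \mathbf{y} \right),
\qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \, \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta})
$$

Here $\mathbf{X}$ is the $m \times (n + 1)$ feature matrix including the bias column, $\mathbf{y}$ is the vector of labels, and $\eta$ is the learning rate passed as `eta`.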

In [None]:
#  Define the gradient descent algorithm and return cost history and theta

def gradientDescent(X, y, theta, iterations, eta):
    J_hist = np.zeros([iterations])

    # TODO (exercise): update theta at each iteration and store the cost in J_hist

    return theta, J_hist

Now run the batch gradient descent algorithm and check against my results

In [None]:
theta, J_hist = gradientDescent(housing_num_tr_b, housing_labels, theta, iterations=5000, eta=1e-3)
J1 = computeCost(housing_num_tr_b, housing_labels, theta)
J1

# With theta initialized to zero, I get 3221295137.061254 for the cost 

In [None]:
print(" theta values from Manual BGD", theta[1:])
print(" y intercept ", theta[0])

In [None]:
# Let's look at how the cost changes through the iterations
plt.title("J_history Plot")
plt.xlabel("Iteration")
plt.ylabel("Cost J")
plt.plot(J_hist)

### Exercise : Compute the root mean squared error 

Compare the results to linear regression and stochastic gradient descent. Why do you think the results are not quite as good as the Scikit-Learn results? (May want to look 

I get 80265.74782634563 with theta initialized to zero, iterations=5000, eta=1e-3.
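
As a sanity check on the numbers quoted in this lab, and assuming the cost function returns half the MSE (an assumption inferred from the reference values, not stated in the notebook), the RMSE can be recovered from the final cost as `sqrt(2 * J)`:

```python
import math

# Final cost quoted earlier for theta initialized to zero,
# iterations=5000, eta=1e-3.
J_final = 3221295137.061254

# Assumption: J is half the mean squared error, so MSE = 2 * J.
rmse_from_cost = math.sqrt(2 * J_final)
print(round(rmse_from_cost, 2))  # 80265.75
```

If your own cost function omits the factor of one half, drop the `2 *` accordingly.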

### Exercise : Hyperparameters

Change some of the hyperparameters (iterations or eta) to get an improved root mean squared error. (I was able to match the Scikit-Learn result of 80211.)
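
To illustrate why these two hyperparameters matter, here is a minimal sketch on a made-up one-dimensional problem (the function `minimize_quadratic` and the target value 3 are hypothetical, not part of the lab). For a fixed number of iterations, a learning rate that is too small leaves the parameter well short of the minimum.

```python
def minimize_quadratic(eta, iterations, t0=0.0):
    """Gradient descent on f(t) = (t - 3)**2, whose minimum is at t = 3."""
    t = t0
    for _ in range(iterations):
        grad = 2.0 * (t - 3.0)  # derivative of (t - 3)**2
        t -= eta * grad
    return t

# Same iteration budget, three different learning rates.
for eta in (1e-3, 1e-2, 1e-1):
    t = minimize_quadratic(eta, iterations=100)
    print(f"eta={eta:g}: t={t:.4f}, distance from minimum={abs(t - 3):.4f}")
```

The same trade-off applies to the housing problem: a larger `eta` or more iterations brings theta closer to the optimum, as long as `eta` stays small enough for the updates to remain stable.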

### Graduate students, Extra credit 

Implement stochastic gradient descent and compare its performance to batch gradient descent. Explain the results.