<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Gradient Descent in Sklearn

_Authors: Kiefer Katovich (SF)_

---

Until now we've been using specific sklearn model classes, such as `LinearRegression` and `LogisticRegression`, to perform regression and classification. These methods work well on smaller datasets with relatively few columns, but once you get into "Medium Data" they take up so much memory that fitting them slows to a crawl (especially on a laptop).

Luckily, sklearn comes with stochastic gradient descent solvers for regression and classification:
- `SGDRegressor`
- `SGDClassifier`

Because these solvers minimize the loss function iteratively on small portions of the data, they avoid the severe slowdown other models suffer on large datasets.

> **Note:** The gradient descent solvers are very flexible and can fit a variety of different model types not covered here. I highly recommend reading their documentation in detail.
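To make "iteratively on smaller portions of the data" concrete, here is a minimal NumPy sketch of mini-batch gradient descent for least squares. It is not how sklearn implements its solvers; the synthetic data, the learning rate, the batch size, and the number of epochs are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic regression problem (hypothetical data, not the assessor dataset).
rng = np.random.default_rng(0)
n_rows, n_cols = 10_000, 3
X = rng.normal(size=(n_rows, n_cols))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=n_rows)

w = np.zeros(n_cols)      # weights to learn
learning_rate = 0.01      # assumed step size
batch_size = 100          # assumed mini-batch size

for epoch in range(5):    # each epoch is one full pass over the data
    order = rng.permutation(n_rows)
    for start in range(0, n_rows, batch_size):
        batch = order[start:start + batch_size]
        # Gradient of the mean squared error, computed on this batch only.
        residual = X[batch] @ w - y[batch]
        gradient = 2 * X[batch].T @ residual / len(batch)
        w -= learning_rate * gradient

print(np.round(w, 2))
```

Each update touches only `batch_size` rows, so the cost of a step does not grow with the size of the full dataset.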

---

### SF assessor data

This lab uses data from the SF assessor's office on housing prices in San Francisco. It has already been cleaned.

The dataset has several hundred thousand rows (the sample loaded below has shape `(754147, 17)`). Expanding it with dummy-coded categorical columns can make it quite large, so be careful not to exceed the memory on your computer.
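One way to get a feel for how much dummy-coding inflates a dataset is to measure it on a small stand-in. The sketch below uses a made-up frame (the column names mirror the assessor data, but the values and category counts are invented) and compares dense dummies with the optional `sparse=True` setting of `pd.get_dummies`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame; the real assessor data is not used here.
rng = np.random.default_rng(0)
n_rows = 10_000
toy = pd.DataFrame({
    "sqft": rng.integers(500, 5000, size=n_rows),
    "neighborhood": rng.choice([f"hood_{i}" for i in range(40)], size=n_rows),
    "zone": rng.choice([f"zone_{i}" for i in range(20)], size=n_rows),
})

# Dummy-code the two categorical columns, densely and sparsely.
dense = pd.get_dummies(toy, columns=["neighborhood", "zone"])
sparse = pd.get_dummies(toy, columns=["neighborhood", "zone"], sparse=True)


def megabytes(frame):
    """Total memory used by a DataFrame, in megabytes."""
    return frame.memory_usage(deep=True).sum() / 1e6


print(dense.shape)
print(round(megabytes(dense), 2), round(megabytes(sparse), 2))
```

Three columns become 61 once the 40 neighborhoods and 20 zones are expanded. On the full dataset the same expansion applies to every row, which is why sampling down first is worthwhile.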


In [29]:
import numpy as np
import scipy 
import seaborn as sns
import pandas as pd
import scipy.stats as stats

import patsy

import matplotlib
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'retina'
%matplotlib inline

plt.style.use('fivethirtyeight')

### 1. Load the data.

Examine the columns.

In [2]:
prop = pd.read_csv('./datasets/assessor_sample.csv')

In [3]:
# A:

(754147, 17)

### 2. Sample down the data

Despite this already being a sample of the full assessor dataset, you should sample the data down further for the sake of speed and your computer's memory.

Use the `.sample()` method of pandas DataFrames to subset this down to at most 25,000 rows.

Sampling down large datasets is a common procedure. Fitting on a larger subset may change the selected hyperparameters and will get you closer to the best coefficients, but past a certain point the returns are marginal.

In [34]:
# prop_samp = prop.sample(n=25000)
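As a small illustration of `.sample()`, the sketch below draws from a made-up frame rather than the assessor file. The name `prop_demo` and the `random_state` value are assumptions; fixing `random_state` simply makes the draw reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded `prop` DataFrame.
rng = np.random.default_rng(0)
prop_demo = pd.DataFrame({
    "value": rng.normal(size=100_000),
    "sqft": rng.normal(size=100_000),
})

# Draw a fixed number of rows without replacement (the default).
prop_samp = prop_demo.sample(n=25000, random_state=42)

# Alternatively, draw a fraction of the rows.
prop_frac = prop_demo.sample(frac=0.1, random_state=42)

print(prop_samp.shape, prop_frac.shape)
```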

### 3. Regression with stochastic gradient descent

Below, the X and y data are set up to predict `value` (housing price) from the remaining variables. After sampling down to 25,000 rows and dummy-coding the categorical variables, the predictor matrix has 163 columns.


The `SGDRegressor` is very general and flexible, and can be customized with a variety of keyword arguments.

**Arguments**
- `loss`: `['squared_loss','huber', ...]`
    - The `'squared_loss'` loss corresponds to solving a regression with the least squares loss (it is named `'squared_error'` in scikit-learn 1.0 and later). This is what I expect you'll use, but there are other options. Huber loss is a "robust" regression loss.
- `penalty`: `['none','l1','l2','elasticnet']`
    - This defines the penalty on the regression that you would like to solve. `'l1'` and `'l2'` are the Lasso and Ridge penalties, and `'elasticnet'` is a combination of both.
- `alpha`
    - The regularization strength to be used with a chosen penalty. Same as in Lasso and Ridge.
- `l1_ratio`
    - The mix of the Lasso and Ridge penalties when elasticnet is chosen as the penalty.
- `n_iter`
    - The number of training "epochs" over the data. This is the number of passes that the gradient descent algorithm will make over the data to iteratively fit the weights (defaults to 5). In scikit-learn 0.19 and later this argument is called `max_iter`, and `n_iter` was removed in 0.21.
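For reference, the way `loss`, `alpha`, and `l1_ratio` fit together can be written as a single objective. This follows the form given in the scikit-learn user guide for SGD with the elasticnet penalty; the symbols $L$ (the chosen loss), $\rho$ (standing for `l1_ratio`), and $n$ (the number of rows) are notation introduced here.

$$
E(w, b) = \frac{1}{n} \sum_{i=1}^{n} L\big(y_i,\; w^\top x_i + b\big) + \alpha \left[ \frac{1 - \rho}{2} \sum_{j} w_j^2 + \rho \sum_{j} |w_j| \right]
$$

Setting $\rho = 0$ leaves only the Ridge term and $\rho = 1$ leaves only the Lasso term, while $\alpha$ scales the whole penalty.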

`SGDRegressor` is most often used in tandem with grid searching to find the optimal parameters for certain models. 

**It is up to you how you want to define the model. You should:**

1. Choose a target to estimate (this should be continuous).
2. Select predictors to use.
3. Standardize your predictor matrix.
4. Build a stochastic gradient descent solver to fit your model. You will likely want to do some kind of gridsearch to find the optimal parameters for your model.
5. Describe the model selected through gridsearch and compare the performance to baseline.
6. Examine and interpret the coefficients.
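The steps above can be sketched end to end on synthetic data. This is one possible setup, not the intended answer to the exercise: the data, the parameter grid, and the use of a pipeline are all assumptions, and the code uses `max_iter` (the name in current scikit-learn) with the default squared-error loss.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the predictors and target (hypothetical data).
rng = np.random.RandomState(42)
X = rng.normal(loc=50, scale=10, size=(2000, 5))
true_coefs = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
y = X @ true_coefs + rng.normal(scale=5.0, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Standardize inside the pipeline so scaling is refit on every CV fold.
pipe = make_pipeline(
    StandardScaler(),
    SGDRegressor(penalty="elasticnet", max_iter=1000, tol=1e-4, random_state=42),
)

# Assumed grid over regularization strength and the L1/L2 mix.
param_grid = {
    "sgdregressor__alpha": [1e-4, 1e-2, 1.0],
    "sgdregressor__l1_ratio": [0.1, 0.5, 0.9],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Baseline: always predict the mean of the training target.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_r2 = baseline.score(X_test, y_test)
model_r2 = search.score(X_test, y_test)

# Coefficients are on the standardized scale of the predictors.
coefs = search.best_estimator_.named_steps["sgdregressor"].coef_

print(search.best_params_)
print(round(baseline_r2, 3), round(model_r2, 3))
print(np.round(coefs, 2))
```

Because the predictors are standardized, each coefficient is the expected change in the target for a one standard deviation change in that predictor, which makes their magnitudes comparable.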

In [35]:
# A:

value ~ baths + beds + lot_depth + basement_area + front_ft + owner_pct + rooms + property_class + neighborhood + tax_rate + volume + sqft + stories + year_recorded + year_built + zone -1
(25000,) (25000, 163)


### 4. Classification with stochastic gradient descent

The `SGDClassifier` is very similar to the `SGDRegressor`. The main difference is that the loss functions are changed to classification loss functions.

**Arguments**
- `loss`: `['log', ...]`
    - The `'log'` loss corresponds to solving a logistic regression classifier (it is named `'log_loss'` in scikit-learn 1.1 and later). This is what I expect you'll use, but there are many other options.
- `penalty`: `['none','l1','l2','elasticnet']`
    - This defines the penalty on the classifier that you would like to solve. `'l1'` and `'l2'` are the Lasso and Ridge penalties, and `'elasticnet'` is a combination of both.
- `alpha`
    - The regularization strength to be used with a chosen penalty. Same as in Lasso and Ridge.
- `l1_ratio`
    - The mix of the Lasso and Ridge penalties when elasticnet is chosen as the penalty.
- `n_iter`
    - The number of training "epochs" over the data. This is the number of passes that the gradient descent algorithm will make over the data to iteratively fit the weights (defaults to 5). In scikit-learn 0.19 and later this argument is called `max_iter`, and `n_iter` was removed in 0.21.

Like `SGDRegressor`, `SGDClassifier` is most often used in tandem with grid searching to find the optimal parameters for certain models. 

**It is up to you how you want to define the model. You should:**

1. Choose a target to classify (you may need to engineer one from existing variables).
2. Calculate the baseline accuracy.
3. Select predictors to use.
4. Standardize your predictor matrix.
5. Build a stochastic gradient descent solver to fit your model. You will likely want to do some kind of gridsearch to find the optimal parameters for your model.
6. Describe the model selected through gridsearch and compare the performance to baseline.
7. Examine and interpret the coefficients.
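A comparable sketch for classification, again on synthetic data, might look like the following. The binary target is engineered by thresholding a continuous signal (similar in spirit to flagging properties above some value), and the data, threshold, and grid are assumptions. Since the name of the logistic loss depends on the scikit-learn version, the code picks it at runtime.

```python
import numpy as np
import sklearn
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The logistic loss is "log_loss" in scikit-learn >= 1.1 and "log" before that.
version = tuple(int(part) for part in sklearn.__version__.split(".")[:2])
LOGISTIC_LOSS = "log_loss" if version >= (1, 1) else "log"

# Synthetic predictors and an engineered binary target (hypothetical data).
rng = np.random.RandomState(0)
X = rng.normal(size=(3000, 4))
signal = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=1.0, size=3000)
y = (signal > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline accuracy: always predict the majority class seen in training.
majority_class = np.bincount(y_train).argmax()
baseline_acc = np.mean(y_test == majority_class)

pipe = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss=LOGISTIC_LOSS, penalty="elasticnet",
                  max_iter=1000, tol=1e-3, random_state=0),
)

# Assumed grid over regularization strength and the L1/L2 mix.
param_grid = {
    "sgdclassifier__alpha": [1e-4, 1e-3, 1e-2],
    "sgdclassifier__l1_ratio": [0.15, 0.5, 0.85],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

model_acc = search.score(X_test, y_test)
coefs = search.best_estimator_.named_steps["sgdclassifier"].coef_

print(search.best_params_)
print(round(baseline_acc, 3), round(model_acc, 3))
print(np.round(coefs, 2))
```

With the logistic loss, each coefficient is the change in the log-odds of the positive class for a one standard deviation change in that predictor.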

In [49]:
# A:

Index([u'baths', u'beds', u'lot_depth', u'basement_area', u'front_ft',
       u'owner_pct', u'rooms', u'property_class', u'neighborhood', u'tax_rate',
       u'volume', u'sqft', u'stories', u'year_recorded', u'year_built',
       u'zone', u'value'],
      dtype='object')