# Benchmark SGD Implementations  
In this IPython notebook, we benchmark the parallel implementation of SGD against the SGD implementation in the SciKit-Learn library. In both cases, SGD is used to fit a linear regression model on synthetic data.

## The Dataset  
The dataset will be generated using SciKit-Learn's `make_regression` function.  

In [1]:
from sklearn.datasets import make_regression

In [2]:
n_samples = 1000
n_features = 100
seed = 1

In [3]:
X, y = make_regression(n_samples=n_samples,
                       n_features=n_features,
                       random_state=seed)
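As a sanity check on what the benchmark is fitting, the sketch below regenerates the same data with `coef=True`, which makes `make_regression` also return the ground-truth coefficients (the name `true_coef` is just a local variable chosen here). With the defaults `noise=0.0` and `bias=0.0`, the targets are an exact linear function of the features, and only `n_informative` features (default 10) carry signal. This helps explain why an R² score very close to 1 is attainable on this data.

```python
import numpy as np
from sklearn.datasets import make_regression

# Same settings as the cell above, plus the ground-truth coefficients.
X, y, true_coef = make_regression(n_samples=1000, n_features=100,
                                  random_state=1, coef=True)

# noise=0.0 and bias=0.0 by default, so y is exactly X @ true_coef.
print(np.allclose(X @ true_coef, y))

# Only the n_informative features (default 10) have non-zero weights.
print(np.count_nonzero(true_coef))
```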

Split the dataset into a training set (80%) and a test set (20%).

In [4]:
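from sklearn.model_selection import ShuffleSplit

In [5]:
# Draw a single random 80/20 split; seeding it makes the benchmark reproducible
splitter = ShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
train, test = next(splitter.split(X))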

In [6]:
X_train = X[train]
X_test = X[test]
y_train = y[train]
y_test = y[test]
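For a single split, `sklearn.model_selection.train_test_split` is a more compact alternative to cells [4] to [6]: one call shuffles the rows and returns the four arrays directly. The sketch below assumes the same data and an 80/20 split with a fixed seed; the resulting rows will not necessarily match the `ShuffleSplit` indices above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=100, random_state=1)

# One seeded 80/20 split, returned as (X_train, X_test, y_train, y_test).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

print(X_train.shape, X_test.shape)  # (800, 100) (200, 100)
```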

## SciKit-Learn SGD

In [7]:
from sklearn.linear_model import SGDRegressor

In [8]:
sgd = SGDRegressor()

In [9]:
%time sgd.fit(X_train, y_train)

CPU times: user 3.76 ms, sys: 1.77 ms, total: 5.53 ms
Wall time: 15.6 ms


SGDRegressor(alpha=0.0001, average=False, epsilon=0.1, eta0=0.01,
       fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',
       loss='squared_loss', n_iter=5, penalty='l2', power_t=0.25,
       random_state=None, shuffle=True, verbose=0, warm_start=False)
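The repr above lists the settings that define the update rule being benchmarked: `loss='squared_loss'`, `penalty='l2'` with `alpha=0.0001`, `learning_rate='invscaling'` with `eta0=0.01` and `power_t=0.25` (step size `eta = eta0 / t**power_t`), `n_iter=5` passes, and `shuffle=True`. As a rough illustration of what those settings mean, here is a simplified per-sample SGD in plain NumPy. It is a sketch, not scikit-learn's (Cython) implementation, and the function name `sgd_regression` is hypothetical.

```python
import numpy as np


def sgd_regression(X, y, alpha=0.0001, eta0=0.01, power_t=0.25,
                   n_iter=5, seed=0):
    """Per-sample SGD for least squares with an L2 penalty (sketch only).

    Mirrors the configuration shown above; omits details of the real
    scikit-learn implementation.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    t = 1  # update counter used by the 'invscaling' schedule
    for _ in range(n_iter):                   # one pass over the data per epoch
        for i in rng.permutation(n_samples):  # shuffle=True
            eta = eta0 / t ** power_t         # eta = eta0 / t^power_t
            err = X[i] @ w + b - y[i]         # d/dp of 0.5 * (p - y)^2
            w -= eta * (err * X[i] + alpha * w)  # loss gradient + L2 term
            b -= eta * err                    # the intercept is not penalized
            t += 1
    return w, b
```

On noise-free data like the one generated above, enough passes should bring `w` and `b` close to the true coefficients; the L2 term only shrinks them very slightly at this `alpha`.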

In [10]:
sgd.score(X_test, y_test)

0.99982297026241373
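For a regressor, `score` returns the coefficient of determination R² of the predictions on the given data, where 1.0 is a perfect fit. The hypothetical helper below spells out the formula, R² = 1 - SS_res / SS_tot. Applied to `y_test` and `sgd.predict(X_test)`, it should agree with `sgd.score(X_test, y_test)` up to floating-point error, which gives a way to score `psgd` the same way.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


print(r2_score_manual([1, 2, 3], [1, 2, 3]))  # 1.0 (perfect predictions)
print(r2_score_manual([1, 2, 3], [2, 2, 2]))  # 0.0 (always predicting the mean)
```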

## Parallel SGD
**Work in progress.**  
The `ParallelSGDRegressor` benchmarked below does not run in parallel yet.  
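The notebook does not say which parallelization strategy `parallel_sgd.py` is meant to use. Purely as a point of reference, here is a sketch of one simple strategy, one-shot parameter averaging: each worker runs ordinary SGD on its own disjoint shard of the data, and the final model is the average of the per-shard weights and intercepts. The helper names are hypothetical, and threads are used only to keep the sketch portable; because the inner loop is pure Python, a real speedup would need `ProcessPoolExecutor` (same `map()` interface, run from a script with a `__main__` guard).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def _sgd_shard(args):
    """Plain per-sample SGD (squared loss, L2 penalty) on one data shard."""
    X, y, alpha, eta0, power_t, n_iter, seed = args
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 1
    for _ in range(n_iter):
        for i in rng.permutation(len(y)):
            eta = eta0 / t ** power_t          # 'invscaling' step size
            err = X[i] @ w + b - y[i]
            w -= eta * (err * X[i] + alpha * w)
            b -= eta * err
            t += 1
    return w, b


def parallel_sgd_averaged(X, y, n_workers=4, alpha=0.0001, eta0=0.01,
                          power_t=0.25, n_iter=5, seed=0):
    """Parameter-averaging parallel SGD (hypothetical sketch).

    Splits the rows into n_workers shards, fits each shard independently,
    then averages the resulting coefficients and intercepts.
    """
    shards = np.array_split(np.arange(len(y)), n_workers)
    jobs = [(X[idx], y[idx], alpha, eta0, power_t, n_iter, seed + k)
            for k, idx in enumerate(shards)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(_sgd_shard, jobs))
    coef = np.mean([w for w, _ in results], axis=0)
    intercept = float(np.mean([b for _, b in results]))
    return coef, intercept
```

Averaging is cheap and needs no communication during training, but each worker only sees a fraction of the data, so on harder (noisy or ill-conditioned) problems the averaged model can be worse than one trained on all rows.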

In [20]:
import parallel_sgd
from importlib import reload  # reload() is no longer a builtin in Python 3
reload(parallel_sgd)

<module 'parallel_sgd' from 'parallel_sgd.py'>

In [21]:
psgd = parallel_sgd.ParallelSGDRegressor()

In [22]:
%time psgd.fit(X_train, y_train)

CPU times: user 3.06 ms, sys: 2.6 ms, total: 5.67 ms
Wall time: 5.12 ms


ParallelSGDRegressor(alpha=0.0001, average=False, epsilon=0.1, eta0=0.01,
           fit_intercept=True, l1_ratio=0.15, learning_rate='invscaling',
           loss='squared_loss', n_iter=5, penalty='l2', power_t=0.25,
           random_state=None, shuffle=True, verbose=0, warm_start=False)
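A single `%time` run is a noisy basis for comparison at this scale: the two wall times above are 15.6 ms and 5.12 ms, while the CPU totals are nearly identical. IPython's `%timeit` magic repeats a statement automatically; outside IPython, a sketch with the standard library's `timeit.repeat` could look like this. The helper `time_fit` and its signature are hypothetical.

```python
import statistics
import timeit


def time_fit(make_estimator, X, y, repeat=7):
    """Time repeated fits and return (best, median) wall-clock seconds.

    make_estimator is any zero-argument callable that returns an unfitted
    estimator, e.g. an estimator class with default parameters.
    """
    # number=1: each measurement is exactly one fit; the (cheap) estimator
    # construction is included in the timed call.
    times = timeit.repeat(lambda: make_estimator().fit(X, y),
                          number=1, repeat=repeat)
    return min(times), statistics.median(times)
```

In this notebook it could be called as `time_fit(SGDRegressor, X_train, y_train)` and `time_fit(parallel_sgd.ParallelSGDRegressor, X_train, y_train)`. The `timeit` documentation suggests the minimum as the most useful lower bound; the median is included to show the typical case.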