# Ridge Regression Demo
Ridge regression extends linear regression by adding an L2 penalty on the coefficients. The penalty can reduce the variance of the estimated coefficients and improves the conditioning of the problem.
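
As a minimal sketch of what the L2 penalty does (plain NumPy on toy data, not cuML's implementation): with no intercept, ridge minimizes `||y - Xw||^2 + alpha * ||w||^2`, which has the closed-form solution `w = (X^T X + alpha * I)^-1 X^T y`. The helper name `ridge_closed_form` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept)."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
# Make the third column nearly a copy of the first so X^T X is poorly conditioned.
X_toy[:, 2] = X_toy[:, 0] + 1e-3 * rng.normal(size=50)
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_small = ridge_closed_form(X_toy, y_toy, alpha=1e-8)
w_large = ridge_closed_form(X_toy, y_toy, alpha=10.0)

# Compare coefficient norms and condition numbers with and without a sizeable penalty.
gram_toy = X_toy.T @ X_toy
cond_plain = np.linalg.cond(gram_toy)
cond_ridge = np.linalg.cond(gram_toy + 10.0 * np.eye(3))
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
print(cond_plain, cond_ridge)
```

A larger `alpha` shrinks the coefficient vector and lowers the condition number of the matrix being inverted, which is the conditioning improvement mentioned above.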

The model accepts array-like inputs: NumPy arrays in host memory, device arrays (Numba arrays or any cuda_array_interface-compliant object), and cuDF DataFrames.

For information about cuDF, refer to the cuDF documentation: https://rapidsai.github.io/projects/cudf/en/latest/

For information about cuML's ridge regression implementation, refer to the cuML documentation: https://rapidsai.github.io/projects/cuml/en/latest/api.html#ridge-regression

In [None]:
import os

import numpy as np

import pandas as pd
import cudf as gd

from sklearn.datasets import make_regression

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

from cuml.linear_model import Ridge as cuRidge
from sklearn.linear_model import Ridge as skRidge

## Define Parameters

In [None]:
n_samples = 2**20
n_features = 399
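
A quick back-of-the-envelope check of how much memory these parameters imply, assuming the features are stored as dense float64 values (the dtype `make_regression` returns). The variable names `feature_bytes` and `feature_gib` are illustrative.

```python
import numpy as np

n_samples = 2**20
n_features = 399

# Bytes needed for the dense feature matrix at 8 bytes per float64 value.
bytes_per_value = np.dtype(np.float64).itemsize
feature_bytes = n_samples * n_features * bytes_per_value
feature_gib = feature_bytes / 2**30

print(f"{feature_bytes} bytes = {feature_gib:.2f} GiB")
```

That is roughly 3.1 GiB for the feature matrix alone, before the train/test split and the copy to the GPU, so reduce `n_samples` if host or device memory is tight.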

## Generate Data

### Host

In [None]:
%%time
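# generate a synthetic regression problem on the host
X, y = make_regression(n_samples=n_samples, n_features=n_features, random_state=0)

# wrap the arrays in pandas DataFrames so they can be converted to cuDF later
X = pd.DataFrame(X)
y = pd.DataFrame(y)

# hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)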

### GPU

In [None]:
%%time
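# copy the training and test features to the GPU as cuDF DataFrames
X_cudf = gd.DataFrame.from_pandas(X_train)
X_cudf_test = gd.DataFrame.from_pandas(X_test)

# take the single target column as a cuDF Series
y_cudf = gd.Series(y_train.values[:, 0])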

## Scikit-learn Model

### Fit

In [None]:
%%time
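# normalize is ignored by scikit-learn when fit_intercept=False, and the
# argument was removed in scikit-learn 1.2; drop it on newer versions.
ridge_sk = skRidge(fit_intercept=False,
                   normalize=True,
                   alpha=0.1)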

ridge_sk.fit(X_train, y_train)

### Predict

In [None]:
%%time
predict_sk = ridge_sk.predict(X_test)
error_sk = mean_squared_error(y_test, predict_sk)

## cuML Model

### Fit

In [None]:
%%time
# Fit the cuML ridge model on the training data. The 'eig' solver is faster, but 'svd' is more accurate.
ridge_cuml = cuRidge(fit_intercept=False,
                     normalize=True,
                     solver='svd',
                     alpha=0.1)

ridge_cuml.fit(X_cudf, y_cudf)
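
To illustrate the speed/accuracy trade-off between the two solvers, here is a NumPy sketch of both approaches on toy data. It is not cuML's code, and `ridge_eig` and `ridge_svd` are hypothetical helper names. The eigendecomposition route works on the small `n_features x n_features` matrix `X^T X`, which is cheap but squares the condition number of `X`; the SVD route factorizes `X` directly, which costs more but avoids that squaring.

```python
import numpy as np

def ridge_eig(X, y, alpha):
    """Ridge via eigendecomposition of the Gram matrix X^T X."""
    eigvals, V = np.linalg.eigh(X.T @ X)
    # w = V diag(1 / (lambda_i + alpha)) V^T X^T y
    return V @ ((V.T @ (X.T @ y)) / (eigvals + alpha))

def ridge_svd(X, y, alpha):
    """Ridge via the thin SVD of X itself."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # w = V diag(s_i / (s_i^2 + alpha)) U^T y
    return Vt.T @ ((s / (s**2 + alpha)) * (U.T @ y))

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = X_toy @ np.arange(1.0, 6.0) + 0.01 * rng.normal(size=200)

w_eig = ridge_eig(X_toy, y_toy, alpha=0.1)
w_svd = ridge_svd(X_toy, y_toy, alpha=0.1)

# On well-conditioned data the two routes agree to within rounding error.
print(np.max(np.abs(w_eig - w_svd)))
```

On well-conditioned data like this the two answers match closely; the difference in accuracy only shows up when the columns of `X` are nearly collinear.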

### Predict

In [None]:
%%time
# predict on the GPU, then copy the predictions back to the host as a NumPy array
predict_cuml = ridge_cuml.predict(X_cudf_test).to_array()
error_cuml = mean_squared_error(y_test, predict_cuml)

## Evaluate Results

In [None]:
print("SKL MSE(y): %s" % error_sk)
print("CUML MSE(y): %s" % error_cuml)
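
If both models fit the same objective, the two MSE values should be close. One way to check that programmatically rather than by eye is a relative-tolerance comparison. The helper `errors_match` and its default tolerance are assumptions for illustration, and the numbers passed below are placeholders, not results from this notebook.

```python
import math

def errors_match(error_a, error_b, rel_tol=1e-3):
    """Return True when two error values agree within a relative tolerance."""
    return math.isclose(error_a, error_b, rel_tol=rel_tol)

# Placeholder values only; in the notebook, pass error_sk and error_cuml instead.
close_pair = errors_match(1.0000, 1.0002)
far_pair = errors_match(1.0, 1.5)
print(close_pair, far_pair)
```

In the notebook itself this would be called as `errors_match(error_sk, error_cuml)`, with `rel_tol` chosen to suit the solver and precision in use.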