This introductory notebook explores linear regression models, following O'Reilly's Introduction to Machine Learning with Python.

In [15]:
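import mglearn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic "wave" regression dataset from the book's helper package
X, y = mglearn.datasets.make_wave(n_samples=60)
# Default split: 75% training data, 25% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)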

Standard linear regression (ordinary least squares) serves as the baseline.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

In [22]:
lr = LinearRegression().fit(X_train, y_train)

The score is the coefficient of determination R^2, defined as R^2 = 1 - u/v, where u is the residual sum of squares, ((y_true - y_pred) ** 2).sum(), and v is the total sum of squares, ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0. The score can be negative when the model fits worse than always predicting the mean of y_true.

In [23]:
print("Train set score: {:.8f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.8f}".format(lr.score(X_test, y_test)))

Train set score: 0.67008903
Test set score: 0.65933686
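
As a check on the R^2 definition above, here is a minimal sketch that computes the score by hand and compares it with score(). The synthetic data and the names X_demo, y_demo and y_bad are assumptions made for illustration. They stand in for the wave dataset so the snippet runs without mglearn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic one-feature data (an assumption; stands in for the wave dataset)
rng = np.random.RandomState(0)
X_demo = rng.uniform(-3, 3, size=(60, 1))
y_demo = 0.5 * X_demo[:, 0] + rng.normal(scale=0.5, size=60)

model = LinearRegression().fit(X_demo, y_demo)
y_pred = model.predict(X_demo)

u = ((y_demo - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_demo - y_demo.mean()) ** 2).sum()  # total sum of squares
r2_manual = 1 - u / v

print("Manual R^2:  {:.8f}".format(r2_manual))
print("score():     {:.8f}".format(model.score(X_demo, y_demo)))

# A constant prediction far from the mean is worse than predicting the mean,
# so its R^2 is negative.
y_bad = np.full_like(y_demo, y_demo.mean() + 10.0)
r2_bad = 1 - ((y_demo - y_bad) ** 2).sum() / v
print("R^2 of a bad constant prediction: {:.8f}".format(r2_bad))
```

The two printed scores should agree up to floating-point error, and the last value should be below zero.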


Ridge regression adds L2 regularization to linear least squares: it penalizes large coefficients, which makes the model less likely to overfit.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

In [18]:
from sklearn.linear_model import Ridge
ridge10 = Ridge().fit(X_train, y_train)  # default alpha=1.0
print("Train set score: {:.8f}".format(ridge10.score(X_train, y_train)))
print("Test set score: {:.8f}".format(ridge10.score(X_test, y_test)))

Train set score: 0.67005993
Test set score: 0.65779469
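
To make the L2 penalty concrete, the sketch below solves the ridge objective ||y - Xw||^2 + alpha * ||w||^2 in closed form, w = (X^T X + alpha * I)^-1 X^T y, and compares the result with Ridge. It assumes fit_intercept=False so that the closed form applies directly. The synthetic data and variable names are hypothetical, not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic three-feature data (an assumption, for illustration only)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 1.0

# Closed-form ridge solution without an intercept term
w_closed = np.linalg.solve(
    X_demo.T @ X_demo + alpha * np.eye(X_demo.shape[1]),
    X_demo.T @ y_demo,
)

# The same model fitted by scikit-learn
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X_demo, y_demo).coef_

print("closed form:", w_closed)
print("Ridge.coef_:", w_sklearn)
```

With alpha = 0 the added term vanishes and the formula reduces to ordinary least squares.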


Decreasing alpha weakens the regularization, so the model moves closer to plain LinearRegression.

In [19]:
ridge01 = Ridge(alpha=0.1).fit(X_train, y_train)
print("Train set score: {:.8f}".format(ridge01.score(X_train, y_train)))
print("Test set score: {:.8f}".format(ridge01.score(X_test, y_test)))

Train set score: 0.67008874
Test set score: 0.65918341
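
One way to see the effect of alpha is to fit Ridge over a range of values and watch the size of the coefficient vector. The following is a rough sketch on synthetic data. The data, the alpha grid and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic five-feature data (an assumption, for illustration only)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(60, 5))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=60)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # Euclidean norm of the coefficients: smaller means stronger shrinkage
    norms.append(np.linalg.norm(ridge.coef_))
    print("alpha={:<7} ||coef||={:.4f}".format(alpha, norms[-1]))
```

The norm should shrink as alpha grows, which is the regularization pulling the coefficients toward zero.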


Lasso regression also regularizes, but with an L1 penalty, which forces some coefficients (theta) to exactly zero. This can lead to underfitting by removing too many features.
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

In [20]:
from sklearn.linear_model import Lasso
lasso10 = Lasso().fit(X_train, y_train)  # default alpha=1.0
print("Train set score: {:.8f}".format(lasso10.score(X_train, y_train)))
print("Test set score: {:.8f}".format(lasso10.score(X_test, y_test)))

Train set score: 0.28528831
Test set score: 0.23759680
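
The wave data has a single feature, so the zeroing of coefficients is hard to see there. The sketch below uses hypothetical synthetic data with ten features, of which only the first three influence the target, and counts the nonzero coefficients. The data, the alpha values and the variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: 10 features, only the first 3 are informative (an assumption)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 10))
true_coef = np.zeros(10)
true_coef[:3] = [3.0, -2.0, 1.5]
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X_demo, y_demo)
lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)

n_used_ols = int(np.sum(ols.coef_ != 0))
n_used_lasso = int(np.sum(lasso.coef_ != 0))
print("Features used by LinearRegression:", n_used_ols)
print("Features used by Lasso(alpha=0.1):", n_used_lasso)

# Once alpha reaches max|Xc^T yc| / n_samples (centered data), every
# coefficient is zero and the model only predicts the mean: underfitting.
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()
alpha_max = np.abs(Xc.T @ yc).max() / len(y_demo)
lasso_all_zero = Lasso(alpha=alpha_max * 1.01).fit(X_demo, y_demo)
print("Features used just above alpha_max:", int(np.sum(lasso_all_zero.coef_ != 0)))
```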


As with Ridge, we can decrease alpha to reduce regularization. max_iter is also raised so that the solver has enough iterations to converge.

In [21]:
from sklearn.linear_model import Lasso
lasso01 = Lasso(alpha=0.1, max_iter=100000).fit(X_train, y_train)
print("Train set score: {:.8f}".format(lasso01.score(X_train, y_train)))
print("Test set score: {:.8f}".format(lasso01.score(X_test, y_test)))

Train set score: 0.66624102
Test set score: 0.63935113
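
max_iter is only an upper bound. After fitting, the n_iter_ attribute reports how many iterations the solver actually ran, which gives a quick check on whether the limit was reached. A small sketch on synthetic data follows. The data and the names X_demo, y_demo and lasso_demo are assumptions, used so the snippet runs without mglearn.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic one-feature data (an assumption; stands in for the wave dataset)
rng = np.random.RandomState(0)
X_demo = rng.uniform(-3, 3, size=(60, 1))
y_demo = 0.5 * X_demo[:, 0] + rng.normal(scale=0.5, size=60)

max_iter = 100000
lasso_demo = Lasso(alpha=0.1, max_iter=max_iter).fit(X_demo, y_demo)

# n_iter_ is the number of iterations the solver ran before stopping
print("Iterations used: {} of {}".format(lasso_demo.n_iter_, max_iter))
```

If n_iter_ equals max_iter, the solver probably stopped before converging, and raising max_iter further may be worthwhile.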
