### Ridge Regression

- Ridge Regression is almost identical to Linear Regression, except that we introduce a small amount of bias. In return, we can get a substantial drop in variance. By starting off with a slightly worse fit to the training data, Ridge Regression often performs better on data that doesn't follow exactly the same pattern as the data the model was trained on.
    
- Ridge Regression is also known as L2-regularized regression, because it adds an L2 penalty term to the loss function of a least squares regression model. As in ordinary least squares, the goal is to find coefficients that fit the data well, resulting in a low residual sum of squares (RSS). However, we introduce the term

$$ \lambda \sum_j \beta_j^2 $$

which is referred to as a shrinkage penalty. The penalty is small when the coefficients are close to zero, so it has the effect of shrinking the coefficient estimates towards zero. The value of $\lambda$ controls the relative impact of the penalty on the coefficient estimates. When $\lambda = 0$ the penalty has no effect, and ridge regression produces the same estimates as ordinary linear regression.
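
To make the penalty concrete, here is a minimal sketch of the penalized objective: the RSS plus the shrinkage term. The helper name `ridge_objective` and the toy arrays are illustrative assumptions and are not part of the notebook's data.

```python
import numpy as np

def ridge_objective(X, y, beta, lam):
    """RSS plus the shrinkage penalty lam * sum(beta_j^2) (hypothetical helper)."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)
    penalty = lam * np.sum(beta ** 2)
    return rss + penalty

# toy data, made up for illustration only
X_toy = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y_toy = np.array([3.0, 2.5, 4.5])
beta_small = np.array([0.1, 0.1])    # coefficients close to zero
beta_large = np.array([2.0, -2.0])   # coefficients far from zero

# with lam = 0 the objective is just the RSS
rss_only = ridge_objective(X_toy, y_toy, beta_small, lam=0.0)

# the penalty part is the difference between lam = 1 and lam = 0
penalty_small = ridge_objective(X_toy, y_toy, beta_small, lam=1.0) - rss_only
penalty_large = (ridge_objective(X_toy, y_toy, beta_large, lam=1.0)
                 - ridge_objective(X_toy, y_toy, beta_large, lam=0.0))
```

Here `penalty_small` is much smaller than `penalty_large`, which is why minimizing the penalized objective pulls the coefficient estimates towards zero.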

In [122]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler


In [139]:
data = pd.read_csv('/Users/Matt/Documents/Intro To Stat Learning/MachineLearningFromScratch/data/Credit.csv')
df = data[['Income','Rating','Balance','Limit']]

In [145]:
y = df['Income']
X = df.iloc[:,1:]
scaled = StandardScaler()
scaled.fit(X)
X = pd.DataFrame(scaled.transform(X))
X.columns = ['Rating','Balance','Limit']

In [151]:
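def addConstantFunc(X):
    """Return a copy of X with a 'constant' column of ones (the intercept term)."""

    # work on a copy so the caller's DataFrame is not modified in place
    X = X.copy()

    #add a constant
    X['constant'] = np.ones(len(X))

    return X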


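def calculate_cost_function(X, y, coefficients):
    """Half mean squared error of the linear model (no penalty term)."""

    #add a constant; if X already has a 'constant' column it is simply overwritten
    X = addConstantFunc(X)

    cost = np.sum((X.dot(coefficients) - y)**2) / (2*len(y))

    return cost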


def LinearRegression(X, y, alpha, n_iterations, Lambda, step_loss=True):

    """
    Fit a linear model by batch gradient descent with an L2 (ridge) penalty.

    alpha is the learning rate and Lambda the penalty strength; Lambda = 0
    gives ordinary least squares. If step_loss is True, the function returns
    (coefficients, cost_history), otherwise only the coefficients.
    """
    X_ = addConstantFunc(X)

    #start with all coefficients (including the intercept) at zero
    coefficients = np.zeros(X_.shape[1])

    cost_history = [0] * n_iterations

    for i in range(n_iterations):

        h = X_.dot(coefficients)

        loss = h - y

        #gradient of the penalized cost; the Lambda term is what shrinks the
        #coefficients (as in regCost below, the intercept is penalized too)
        gradient = (X_.T.dot(loss) + Lambda * coefficients) / len(y)

        coefficients = coefficients - alpha * gradient

        cost = calculate_cost_function(X_, y, coefficients)
        regCost = cost + Lambda/(2*len(y)) * np.sum(coefficients**2)

        cost_history[i] = regCost

    if step_loss:
        return (coefficients, cost_history)

    else:
        return coefficients


def predict(X, coefficients):

    X = addConstantFunc(X)
    prediction = X.dot(coefficients)
    return prediction



In [162]:

coeff, ch = LinearRegression(X, y, 0.001, 1000, 0 , step_loss=True)

In [163]:
ch

[1638.0394731049541,
 1634.1924021780476,
 1630.3592010961572,
 1626.539806763238,
 1622.734156408543,
 1618.9421875848514,
 1615.1638381667078,
 1611.3990463486728,
 1607.64775064358,
 1603.9098898808047,
 1600.185403204543,
 1596.474230072099,
 1592.7763102521815,
 1589.0915838232095,
 1585.4199911716307,
 1581.7614729902457,
 1578.1159702765422,
 1574.4834243310388,
 1570.86377675564,
 1567.256969451995,
 1563.6629446198733,
 1560.0816447555417,
 1556.5130126501538,
 1552.9569913881526,
 1549.4135243456712,
 1545.882555188954,
 1542.36402787278,
 1538.8578866388953,
 1535.3640760144574,
 1531.8825408104835,
 1528.4132261203142,
 1524.9560773180772,
 1521.5110400571662,
 1518.0780602687262,
 1514.6570841601456,
 1511.2480582135602,
 1507.8509291843604,
 1504.4656440997123,
 1501.0921502570836,
 1497.7303952227778,
 1494.3803268304775,
 1491.041893179795,
 1487.7150426348335,
 1484.3997238227507,
 1481.0958856323368,
 1477.8034772125966,
 1474.5224479713395,
 1471.2527475737788,
 1467

In [167]:
coeff,ld  = LinearRegression(X, y, 0.001, 1000, 1, step_loss=True)

In [171]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
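
One place an identity matrix appears in ridge regression is the closed-form solution $\hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty$, which minimizes the RSS plus the shrinkage penalty. The sketch below is an assumption about where this cell was heading, not something the notebook states. It uses synthetic data and a hypothetical helper `ridge_closed_form`, and it penalizes every coefficient (there is no separate unpenalized intercept).

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y (hypothetical helper)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.identity(n_features)
    # solving the linear system avoids forming the inverse explicitly
    return np.linalg.solve(A, X.T @ y)

# synthetic data, for illustration only
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 3))
y_syn = X_syn @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# ordinary least squares as the reference fit
beta_ols, *_ = np.linalg.lstsq(X_syn, y_syn, rcond=None)

beta_0 = ridge_closed_form(X_syn, y_syn, lam=0.0)      # no penalty
beta_10 = ridge_closed_form(X_syn, y_syn, lam=10.0)    # moderate penalty
beta_100 = ridge_closed_form(X_syn, y_syn, lam=100.0)  # strong penalty
```

With `lam=0.0` the result agrees with the least squares fit, and the norm of the coefficient vector decreases as `lam` grows, matching the shrinkage behaviour described at the top of this section.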