**Linear Models**

These are the methods for regression where the target variable is expected to be a linear combination of the features. 

**1.1.1 Ordinary Least Squares**

$$ \text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

The goal of OLS is to estimate the coefficients that minimize the residual sum of squares (RSS) between the observed target values and the values predicted by the linear approximation. 

The fit method of LinearRegression takes the arrays X and y and stores the coefficients 'w' of the linear model in its coef_ attribute. 

In [2]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0,0], [1,1], [2,2]], [0,1,2])
reg.coef_

array([0.5, 0.5])
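As a quick numerical check of the RSS objective, the sketch below fits LinearRegression on a small made-up dataset and confirms that nudging the fitted slope in either direction increases the RSS. The values of X and y and the helper rss are illustrative assumptions, not part of the example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, targets that are not perfectly linear.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])

reg = LinearRegression().fit(X, y)

def rss(w, b):
    """Residual sum of squares for the predictions X @ w + b."""
    residuals = y - (X @ w + b)
    return float(residuals @ residuals)

rss_fit = rss(reg.coef_, reg.intercept_)
# Moving the slope away from the fitted value should only make the RSS larger.
rss_up = rss(reg.coef_ + 0.1, reg.intercept_)
rss_down = rss(reg.coef_ - 0.1, reg.intercept_)
print(rss_fit, rss_up, rss_down)
```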

OLS coefficient estimates rely on the features being independent. Multicollinearity occurs when features are highly correlated, which makes the design matrix nearly singular. When this happens, the coefficient estimates become very sensitive to small random errors in the observed targets, so they have high variance. 

Example: If you're predicting house prices, and your features include both 'house size' and 'number of rooms', which are correlated, your model might give inconsistent or unreliable estimates of how each feature impacts price. 
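The effect can be illustrated with simulated data. In the sketch below, x1 and x2_corr are synthetic stand-ins for standardized 'house size' and 'number of rooms' (all names, the noise levels, and the helper coef_std are assumptions made for illustration). It compares a nearly collinear design against one with an unrelated second feature, and measures how much the OLS coefficients vary when only the noise in the target changes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                    # stand-in for standardized house size
x2_corr = x1 + 0.05 * rng.normal(size=n)   # "rooms": almost a copy of x1
x2_indep = rng.normal(size=n)              # an unrelated second feature

X_corr = np.column_stack([x1, x2_corr])
X_indep = np.column_stack([x1, x2_indep])

def coef_std(X, n_repeats=100):
    """Std. dev. of OLS coefficients over repeated noisy draws of the target."""
    coefs = []
    for _ in range(n_repeats):
        y = X[:, 0] + X[:, 1] + rng.normal(size=n)
        coefs.append(LinearRegression().fit(X, y).coef_)
    return np.std(coefs, axis=0)

corr = np.corrcoef(x1, x2_corr)[0, 1]
cond_corr = np.linalg.cond(X_corr)     # large: design matrix is nearly singular
cond_indep = np.linalg.cond(X_indep)   # close to 1: well conditioned
std_corr = coef_std(X_corr)
std_indep = coef_std(X_indep)

print("correlation:", corr)
print("condition numbers:", cond_corr, cond_indep)
print("coefficient spread (collinear):", std_corr)
print("coefficient spread (independent):", std_indep)
```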

**1.1.1.1. Non-negative Least Squares**

Setting the parameter positive=True in LinearRegression constrains all coefficients to be non-negative. This is useful when the coefficients represent naturally non-negative quantities, such as frequency counts or prices of goods. 
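A minimal sketch of the constraint in action, assuming synthetic data whose true coefficients (chosen here for illustration) include one negative value:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Hypothetical ground truth with one negative coefficient.
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
nnls = LinearRegression(positive=True).fit(X, y)

print(ols.coef_)    # unconstrained: the second coefficient is negative
print(nnls.coef_)   # constrained: every coefficient is >= 0
```

Because the constraint cannot represent a negative effect, the non-negative fit is only appropriate when negative coefficients are not meaningful for the problem.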



**1.1.2 Ridge Regression and Classification**

**1.1.2.1 Regression**

Ridge regression introduces a regularization penalty that penalizes the size of all coefficients in the model, and in doing so, automatically reduces the impact of multicollinearity without explicitly detecting it. This penalty is proportional to the sum of squares of the coefficients. 
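Written in the same notation as the RSS above, the ridge objective can be sketched as follows, where $\alpha \ge 0$ is the regularization strength (the alpha argument of Ridge), $w_j$ are the coefficients, and $p$ (a symbol introduced here for illustration) is the number of features:

$$ \min_{w} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{p} w_j^2 $$

Setting $\alpha = 0$ recovers OLS, and larger values of $\alpha$ shrink the coefficients more strongly.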

In other words, the amount of shrinkage depends on how well the data pin down each coefficient. Correlated predictors carry redundant information, so the data say little about how to split an effect between them. Ridge shrinks these poorly determined directions the most and pulls the coefficients of correlated predictors toward similar, moderate values. Predictors that are uncorrelated with the others contribute unique information and receive comparatively less shrinkage. 
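The stabilizing effect can be checked on simulated data. The sketch below uses two nearly identical synthetic features and an arbitrary alpha=1.0 (all names and values are illustrative assumptions) and compares how much the OLS and ridge coefficients vary across repeated noisy draws of the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])

def coef_std(model, n_repeats=100):
    """Std. dev. of a model's coefficients over repeated noisy targets."""
    coefs = []
    for _ in range(n_repeats):
        y = x1 + x2 + rng.normal(size=n)
        coefs.append(model.fit(X, y).coef_.copy())
    return np.std(coefs, axis=0)

ols_std = coef_std(LinearRegression())
ridge_std = coef_std(Ridge(alpha=1.0))

print("OLS coefficient spread:  ", ols_std)
print("Ridge coefficient spread:", ridge_std)
```

The ridge estimates are biased toward zero, which is the price paid for the lower variance.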

In [3]:
from sklearn import linear_model
reg = linear_model.Ridge(alpha = 0.5)
reg.fit([[0,0], [0,0], [1,1]], [0, .1,1])
reg.coef_
reg.intercept_

0.13636363636363638

The Ridge class lets the user have the solver chosen automatically by setting solver='auto'. With this option, Ridge chooses between the 'lbfgs', 'cholesky' and 'sparse_cg' solvers. It checks the following conditions from top to bottom and picks the first solver whose condition holds: 

'lbfgs' - The positive = True option is specified. 
'cholesky' - The input array X is not sparse. 
'sparse_cg' - None of the above conditions are fulfilled.

**1.1.2.2 Classification**

The ridge regressor has a classifier variant: RidgeClassifier. 

1. Binary Classification (e.g. Cat vs Dog)

Imagine we have a dataset with 2 features (e.g. size and weight) that we use to classify images as either 'cat' or 'dog'.

**Step 1: Convert labels to {-1,1}**

Normally, we might have labels like 0 for 'cat' and 1 for 'dog'.

RidgeClassifier converts these labels to {-1, 1} internally, so 'cat' becomes -1 and 'dog' becomes 1.

**Step 2: Treat the problem as regression**

- RidgeClassifier treats this as a regression problem and fits a ridge regression model to the data, trying to predict a continuous value. 

- For example, RidgeClassifier will try to find a model that best fits the data such that it predicts a value close to 1 for 'dog' and close to -1 for 'cat'. 

**Step 3: Predict and classify based on the sign**

After fitting the model, RidgeClassifier predicts a continuous value (say 0.5 for a test example). If the prediction is positive (greater than 0), the model assigns the class encoded as +1 ('dog'). Otherwise, it assigns the class encoded as -1 ('cat'). 

For instance: 

- For an image of a dog, the model might predict a value like +0.8. Since it is positive, the model classifies it as 'dog'. 

- For an image of a cat, the model might predict -0.6. Since it is negative, the model classifies it as 'cat'. 
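The three steps can be sketched with RidgeClassifier directly. The (size, weight) measurements below are made up for illustration. The continuous regression outputs are available through decision_function, and predict turns their sign back into the original labels.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

# Hypothetical (size, weight) measurements; labels are plain strings.
X = np.array([[25.0, 4.0], [23.0, 3.5], [28.0, 5.0],
              [60.0, 25.0], [55.0, 20.0], [65.0, 30.0]])
y = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

clf = RidgeClassifier(alpha=1.0).fit(X, y)

# Two unseen examples: one close to the cats, one close to the dogs.
X_new = np.array([[24.0, 3.75], [57.5, 22.5]])
scores = clf.decision_function(X_new)   # continuous regression outputs
labels = clf.predict(X_new)             # class chosen from the sign of each score

print(clf.classes_)   # classes in sorted order; the second one is encoded as +1
print(scores)
print(labels)
```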


2. Multiclass Classification

Let's consider a multiclass classification task where we classify data into 3 categories: 'cat', 'dog' and 'rabbit'.

**Step 1: Setup multiclass classification**

RidgeClassifier treats this as a multi-output regression problem. 

For each class, it performs a separate regression and the model predicts a value for each class. 

**Step 2: Training the model**

RidgeClassifier will fit the model and generate a set of coefficients for each class. 

For instance, it will predict a value for 'cat', a value for 'dog', and a value for 'rabbit' for each example in the training set. 

**Step 3: Predict the class with the highest value**

After the model is trained, it will predict three values for a test instance, one for each class (cat, dog, rabbit). 

For example: 

The predicted values for a test instance might be 0.3 for cat, 1.2 for dog, and -0.5 for rabbit.

RidgeClassifier will then classify the test instance as the class with the highest predicted value, which in this case is 'dog'. 
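A minimal sketch of the multiclass case, again with made-up two-feature data. It shows that decision_function returns one score per class and that predict matches the class with the highest score.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

# Hypothetical two-feature measurements for three classes.
X = np.array([[25.0, 4.0], [23.0, 3.5],      # cats
              [60.0, 25.0], [55.0, 20.0],    # dogs
              [40.0, 2.0], [42.0, 2.5]])     # rabbits
y = np.array(["cat", "cat", "dog", "dog", "rabbit", "rabbit"])

clf = RidgeClassifier().fit(X, y)

X_new = np.array([[58.0, 23.0]])
scores = clf.decision_function(X_new)             # shape (1, 3): one score per class
best = clf.classes_[np.argmax(scores, axis=1)]    # class with the highest score

print(clf.coef_.shape)   # one row of coefficients per class
print(scores)
print(best, clf.predict(X_new))
```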




