### Linear Models
Linear Models are a class of models that are widely used in practice and have been studied extensively in the last few decades, with roots going back over a hundred years. Linear models make a prediction using a <i> linear function </i> of the input features. 

### Linear Models For Regression
For regression, the general prediction formula for a linear model looks as follows:

y_pred = w[0] * x[0] + w[1] * x[1] + .... + w[p] * x[p] + b

Here, x[0] to x[p] denote the features of a single data point (in this example, the number of features is p + 1), w and b are parameters of the model that are learned, and y_pred is the prediction the model makes. For a dataset with a single feature this is:

y_pred = w[0] * x[0] + b

This is the equation of a line, with slope w[0] and y-axis offset b.
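As a small illustration of the prediction formula, the sketch below evaluates it once as an explicit sum and once as a dot product. The weights, intercept, and data point are made-up values chosen only for this example.

```python
import numpy as np

# Hypothetical learned parameters and one data point with three features.
w = np.array([2.0, -1.0, 0.5])
b = 4.0
x = np.array([1.0, 3.0, 2.0])

# Term-by-term sum, mirroring w[0] * x[0] + w[1] * x[1] + ... + b.
y_pred_loop = sum(w[i] * x[i] for i in range(len(x))) + b

# The same computation written as a dot product.
y_pred = np.dot(w, x) + b

print(y_pred_loop, y_pred)
```

Both expressions compute the same weighted sum; the dot-product form is the one that scales naturally to many features.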


`Note: The function described above is an affine function. An affine function is a linear function plus a constant term (the intercept), which is precisely what we have here:`

y_pred = w[0] * x[0] + w[1] * x[1] + .... + w[p] * x[p] + b

`The term (b) is the y-axis offset, or intercept, which makes this function affine rather than purely linear. A purely linear function passes through the origin, while an affine function need not, because of the intercept term (b).`

Linear models for regression can be characterized as regression models for which the prediction is a line for a single feature, a plane when using 2 features, or a hyperplane in higher dimensions (i.e., when using more than 2 features).

`Note : `

1. **Restrictiveness of Linear Models in One Dimension**:
   - Linear models can seem overly simplistic and restrictive when applied to one-dimensional data. They may fail to capture the fine details and nuances in the data, which can be better captured by models like KNeighborsRegressor.

2. **Power of Linear Models in High-Dimensional Spaces**:
   - For datasets with many features, linear models can be very powerful and effective. When the number of features exceeds the number of training data points, a linear model can generally fit the training data `perfectly`: provided the data points are linearly independent, any target \( y \) on the training set can be written exactly as a linear function of the features.

3. **Perfect Modeling with Many Features**:
   - In high-dimensional spaces, linear models have the flexibility to fit the training data perfectly because they have enough parameters (weights) to capture the relationships in the data. A perfect training fit is not a goal in itself, though: with this much flexibility the model can overfit, matching the training data very well while generalizing poorly to new, unseen data.

Overall, while linear models may seem limited in low-dimensional cases, their strength lies in their ability to handle high-dimensional datasets effectively.
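The notes above can be checked on synthetic data. The following sketch (random data, not the housing dataset; all names are chosen for illustration) fits a least-squares model with more features than training points to targets that are pure noise, then evaluates it on fresh noise.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 10, 20            # more features than data points
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)            # targets are pure noise

# Append a column of ones so the intercept b is learned together with w.
X_aug = np.hstack([X, np.ones((n_samples, 1))])

# Least-squares solution (the minimum-norm one, since the system is underdetermined).
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
train_residual = np.max(np.abs(X_aug @ params - y))

# Fresh data from the same process: there is nothing real to generalize.
X_new = rng.normal(size=(n_samples, n_features))
y_new = rng.normal(size=n_samples)
X_new_aug = np.hstack([X_new, np.ones((n_samples, 1))])
new_mse = np.mean((X_new_aug @ params - y_new) ** 2)

print(f"largest training residual: {train_residual:.2e}")
print(f"MSE on new data: {new_mse:.2f}")
```

The training residual is zero up to floating-point error even though the targets carry no signal, while the error on new data is not. This is why a perfect training fit says little about generalization.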

### Models...
 **Linear Regression or Ordinary Least Squares (OLS) :** This is the simplest and most classic linear method for regression. Linear regression finds the parameters `w` and `b` that minimize the *mean squared error* between predictions and the true regression targets `y` on the training set. The mean squared error is the sum of the squared differences between the predictions and the true values, divided by the number of samples. Linear regression has no hyperparameters to tune, which is a benefit, but it also has no way to control the model complexity.
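To make the objective concrete, here is a minimal sketch that computes the mean squared error by hand and finds the least-squares `w` and `b` for a toy one-feature dataset with `np.linalg.lstsq`. The data and the helper `mean_squared_error` are defined here for illustration only, and this is not necessarily how scikit-learn solves the problem internally.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Toy data generated from y = 3 * x + 1 with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x + 1.0

# Design matrix: the feature plus a column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
w0, b = solution

mse_fit = mean_squared_error(y, w0 * x + b)      # least-squares parameters
mse_bad = mean_squared_error(y, 2.0 * x + 1.0)   # deliberately wrong slope

print(f"w0 = {w0:.3f}, b = {b:.3f}")
print(f"MSE at the least-squares fit: {mse_fit:.3e}")
print(f"MSE with slope 2 instead of 3: {mse_bad:.3f}")
```

Any other choice of slope or intercept gives a larger mean squared error on this training set than the least-squares solution.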

In [1]:
import pandas as pd
melborn_data = pd.read_csv('../datasets/melb_data.csv')
# Drop rows with missing values. 
melborn_data.dropna(axis = 0, inplace = True)

In [2]:
# Get the targets. 
y = melborn_data.Price

In [3]:

# Select the features we'll use: all numeric columns except the target.
X = melborn_data.select_dtypes(exclude = 'object').drop('Price', axis = 1) # Drop Price; otherwise the target leaks into the features.

In [4]:
X.head()

| | Rooms | Distance | Postcode | Bedroom2 | Bathroom | Car | Landsize | BuildingArea | YearBuilt | Lattitude | Longtitude | Propertycount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 2.5 | 3067.0 | 2.0 | 1.0 | 0.0 | 156.0 | 79.0 | 1900.0 | -37.8079 | 144.9934 | 4019.0 |
| 2 | 3 | 2.5 | 3067.0 | 3.0 | 2.0 | 0.0 | 134.0 | 150.0 | 1900.0 | -37.8093 | 144.9944 | 4019.0 |
| 4 | 4 | 2.5 | 3067.0 | 3.0 | 1.0 | 2.0 | 120.0 | 142.0 | 2014.0 | -37.8072 | 144.9941 | 4019.0 |
| 6 | 3 | 2.5 | 3067.0 | 4.0 | 2.0 | 0.0 | 245.0 | 210.0 | 1910.0 | -37.8024 | 144.9993 | 4019.0 |
| 7 | 2 | 2.5 | 3067.0 | 2.0 | 1.0 | 2.0 | 256.0 | 107.0 | 1890.0 | -37.806 | 144.9954 | 4019.0 |


Train the model. 

In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split 

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)
# Instantiate the model.
lr = LinearRegression()

lr.fit(X_train, y_train)

The `slope` parameters `(w)`, also called weights or *coefficients*, are stored in the **coef_** attribute, while the offset or intercept `(b)` is stored in the **intercept_** attribute : 

In [6]:
print(f'lr.coef_: {lr.coef_}')
print(f'lr.intercept_: {lr.intercept_}')

lr.coef_: [ 2.03021012e+05 -4.13735240e+04  9.64915635e+02 -7.20462366e+02
  2.08742879e+05  7.28485345e+04  1.63253301e+01  1.68715213e+03
 -4.24972311e+03 -1.11997581e+06  7.00877352e+05 -1.78919111e+00]
lr.intercept_: -138403149.04183686


The **intercept_** attribute is a single float for a single-target problem like this one, while the **coef_** attribute is a NumPy array with one entry per input feature (we used 12 features, so it has 12 entries).

Let's look at the training set and test set performance:


In [7]:
import numpy as np
from sklearn.metrics import r2_score

In [8]:
print(f'Training set score : {lr.score(X_train, y_train):.2f}')
print(f'Test set score : {lr.score(X_test, y_test):.2f}')


Training set score : 0.59
Test set score : 0.64
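For regressors, `score` returns the R<sup>2</sup> of the predictions, so it should agree with `r2_score` applied to the output of `predict`. The sketch below checks this on synthetic data from `make_regression`, used here only so the snippet runs without the housing CSV; the variable names are chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the real dataset.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)

# Two routes to the same number.
score_via_method = model.score(X_te, y_te)
score_via_metric = r2_score(y_te, model.predict(X_te))

print(f"model.score: {score_via_method:.4f}")
print(f"r2_score:    {score_via_metric:.4f}")
```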


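An R<sup>2</sup> score of 0.64 on the test set is only fair: the model explains roughly 64% of the variance in price. The training score (0.59) is no higher than the test score, so the model is more likely underfitting than overfitting.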

In [9]:
# Select row 90 as a one-row DataFrame so the feature names match those seen during fit.
lr.predict(X_test.iloc[[90]])



array([825777.16997147])

In [10]:
y_test.iloc[90]

np.float64(865000.0)

Let's try a RandomForestRegressor for fun. 

In [11]:
from sklearn.ensemble import RandomForestRegressor
rfc = RandomForestRegressor()
rfc.fit(X_train, y_train)

In [12]:
# Let's compute the R^2 score of the RandomForestRegressor
print(f'R**2 Score. : {r2_score(y_test, rfc.predict(X_test)):.2f}')

R**2 Score. : 0.81


The RandomForestRegressor reaches a higher test-set R<sup>2</sup> (0.81) than ordinary least squares (0.64).

### Definition of R<sup>2</sup> Score
The R<sup>2</sup> score, also known as the coefficient of determination, is a statistical measure that indicates how well the independent variables in a regression model explain the variability of the dependent variable. It is given by the formula:

R<sup>2</sup> = 1 - (SSres / SStot)

Where:
- SSres (Residual Sum of Squares) is the sum of the squares of the residuals (the differences between the observed and predicted values).
- SStot (Total Sum of Squares) is the sum of the squares of the differences between the observed values and the mean of the observed values.
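The formula translates directly into a few lines of NumPy. The helper name `r2_from_definition` and the four data points below are made up for illustration; on the same inputs the result should match `sklearn.metrics.r2_score`.

```python
import numpy as np

def r2_from_definition(y_true, y_pred):
    """R^2 = 1 - SSres / SStot, computed from the definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# SSres = 1 + 0 + 1 + 0 = 2 and SStot = 9 + 1 + 1 + 9 = 20, so R^2 = 1 - 2/20.
r2 = r2_from_definition(y_true, y_pred)
print(f"R^2 = {r2:.2f}")
```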

### Interpretation of R<sup>2</sup> Score
- **R<sup>2</sup> = 1**: Perfect fit. The model explains all the variance in the target variable. This means that the predictions are exactly on target with the observed values.
- **R<sup>2</sup> = 0**: The model does not explain any of the variance in the target variable. This implies that the model's predictions are no better than simply using the mean of the target variable as the prediction for all data points.
- **0 < R<sup>2</sup> < 1**: Indicates the proportion of the variance in the target variable that is predictable from the independent variables. For example, an R<sup>2</sup> of 0.81 means that 81% of the variance in the target variable is explained by the model.
- **R<sup>2</sup> < 0**: The model performs worse than a horizontal line at the mean of the target variable. This can happen on held-out data when the model does not capture the underlying trend; for ordinary least squares with an intercept it cannot happen on the training set.
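The boundary cases in the list above can be reproduced with three hand-made sets of predictions. This is a toy sketch; the helper `r2` and the data are defined here only for the example.

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SSres / SStot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Predictions identical to the targets: perfect fit.
r2_perfect = r2(y_true, y_true.copy())

# Always predicting the mean of the targets: explains none of the variance.
r2_mean = r2(y_true, np.full_like(y_true, y_true.mean()))

# Predictions in reverse order: worse than predicting the mean.
r2_reversed = r2(y_true, y_true[::-1])

print(r2_perfect, r2_mean, r2_reversed)
```

Here SStot is 5 and the reversed predictions give SSres of 20, so the last score is 1 - 20/5, a negative value.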

### Practical Implications
- **High R<sup>2</sup>**: Generally desirable as it indicates a good fit of the model to the data. However, a very high R<sup>2</sup> (close to 1) on training data but much lower on test data may indicate overfitting.
- **Low R<sup>2</sup>**: Indicates that the model is not explaining much of the variance in the target variable. It could be due to a poor model choice, lack of relevant features, noise in the data, or the inherent unpredictability of the target variable.

### Limitations of R<sup>2</sup>
- **Does not indicate whether the predictions are biased**: R<sup>2</sup> alone does not tell you whether the model systematically over- or under-predicts.
- **Sensitive to outliers**: Outliers can have a large impact on the R<sup>2</sup> score.
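- **Not always appropriate for non-linear models**: For some non-linear models, the variance decomposition behind R<sup>2</sup> does not hold exactly, so error metrics such as mean squared error might be more informative. Adjusted R<sup>2</sup> addresses a different issue: it penalizes the number of features.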
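The sensitivity to outliers is easy to see on a toy example. In the sketch below (made-up numbers, with a helper `r2` defined only for this illustration), a single badly predicted point is enough to push an otherwise high score below zero.

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SSres / SStot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_pred = np.array([11.0, 12.0, 13.0, 16.0, 19.0])

r2_clean = r2(y_true, y_pred)

# Same predictions, except the last point is predicted very badly.
y_pred_outlier = y_pred.copy()
y_pred_outlier[-1] = 40.0
r2_outlier = r2(y_true, y_pred_outlier)

print(f"without the bad point: {r2_clean:.3f}")
print(f"with the bad point:    {r2_outlier:.3f}")
```

Because the residuals are squared, one large error dominates SSres, which is one reason to look at a robust metric or at the residuals themselves alongside R<sup>2</sup>.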

### Conclusion
The R<sup>2</sup> score is a useful metric for assessing the goodness of fit for regression models. It provides a straightforward interpretation of how well the model explains the variability of the target variable, but should be used alongside other metrics and domain knowledge to make informed decisions about model performance.