# Lasso Regression

Lasso (least absolute shrinkage and selection operator) regression is a type of linear regression that uses shrinkage: the coefficient estimates are shrunk towards zero, and some of them are set exactly to zero. The lasso procedure therefore encourages simple, sparse models (i.e. models with fewer nonzero parameters). This type of regression is well-suited to models showing high levels of multicollinearity, or to cases where you want to automate parts of model selection, such as variable selection/parameter elimination.

The lasso is useful because it tends to prefer solutions with fewer nonzero coefficients, which reduces the number of variables the fitted model depends on. This is particularly helpful when the number of variables is large compared to the number of observations, or when a dataset has so many candidate variables that selecting them by hand is impractical.

The lasso coefficients are the values of $\beta$ that minimize the following objective:

$$\min_{\beta} \; \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Where:

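- $y_i$ is the actual value of the dependent variable for observation $i$ (of $n$ observations).
- $\hat{y}_i$ is the predicted value of the dependent variable for observation $i$.
- $\lambda \ge 0$ is the penalty parameter, which controls the strength of the shrinkage; $\lambda = 0$ gives ordinary least squares.
- $\beta_j$ is the coefficient of the $j$-th independent variable (of $p$ variables).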
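
To make the formula concrete, the objective can be evaluated directly with NumPy. This is a minimal sketch: the function name `lasso_objective`, the `intercept` argument, and the toy data are illustrative assumptions, not part of any library. Note that scikit-learn's `Lasso` scales the squared-error term by $1/(2n)$, so its `alpha` is not numerically identical to $\lambda$ as written above.

```python
import numpy as np

def lasso_objective(X, y, beta, lam, intercept=0.0):
    """Residual sum of squares plus lam times the sum of |beta_j| (hypothetical helper)."""
    y_hat = X @ beta + intercept              # predicted values
    rss = np.sum((y - y_hat) ** 2)            # sum of squared residuals
    l1_penalty = lam * np.sum(np.abs(beta))   # lambda * sum of absolute coefficients
    return rss + l1_penalty

# Toy data, chosen only for illustration
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, 0.0])

# Residuals are 0.5 each, so RSS = 0.75; the penalty is 2.0 * 0.5 = 1.0
print(lasso_objective(X, y, beta, lam=2.0))  # 1.75
```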


### Key Concept

The key concept of Lasso Regression is to add a penalty term to the loss function. The penalty is the sum of the absolute values of the coefficients (their L1 norm), multiplied by the penalty parameter $\lambda$. Larger values of $\lambda$ shrink the coefficients more strongly and set more of them exactly to zero.
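
One way to see why an absolute-value penalty produces exact zeros is the soft-thresholding operator. In the special case of orthonormal features, each lasso coefficient is the least-squares coefficient passed through this operator, with a threshold proportional to $\lambda$. The sketch below only illustrates the operator; the name `soft_threshold` and the threshold argument `gamma` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Move z towards zero by gamma; values with |z| <= gamma become zero."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

# Hypothetical unpenalized coefficients
z = np.array([-3.0, -0.5, 0.2, 1.5])
shrunk = soft_threshold(z, gamma=1.0)
print(shrunk)
```

With `gamma=1.0`, the two inputs whose magnitude is below 1 are set to zero, while the others are pulled towards zero by exactly 1.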


### Advantages of Lasso Regression

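1. Lasso regression can select important features of a dataset, because the coefficients of less useful features are set exactly to zero.
2. Lasso regression can reduce the complexity of the model by keeping fewer variables.
3. Lasso regression can help prevent overfitting, because the penalty discourages large coefficients.
4. Lasso regression can handle multicollinearity, typically by keeping one variable from a group of correlated variables and dropping the others.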
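
The feature-selection effect can be checked on synthetic data in which only a few features carry signal. This is a sketch under assumed settings: the dataset sizes, `alpha=1.0`, and `random_state=0` are arbitrary illustrative choices. Ridge regression is fitted alongside for contrast, since it shrinks coefficients without setting them exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 features, of which only 3 actually influence y (illustrative setup)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Count coefficients that are exactly zero in each fitted model
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print('Zero coefficients (lasso):', n_zero_lasso)
print('Zero coefficients (ridge):', n_zero_ridge)
```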

### Disadvantages of Lasso Regression

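1. Lasso regression can be sensitive to outliers, because its loss is still based on squared errors.
2. Lasso regression can be more computationally expensive than ordinary least squares, because it has no closed-form solution in general and is fitted iteratively.
3. Lasso regression can be difficult to interpret, because the shrinkage biases the coefficients and the choice among correlated variables can be arbitrary.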

### Assumptions of Lasso Regression

1. The relationship between the dependent variable and independent variables should be linear.
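2. The independent variables should ideally not be strongly correlated with each other. The lasso can still be fitted under multicollinearity, but which of the correlated variables it keeps can be unstable.
3. The residuals should be normally distributed (this matters mainly for statistical inference rather than for prediction).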
4. The residuals should be homoscedastic.



```python
# Libraries for the lasso and ridge regression comparison
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```

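```python
# Generate a synthetic regression problem with 4 features and a little noise
X, y = make_regression(n_samples=10000, n_features=4, noise=0.1)

# test_size=0.9 holds out 90% of the samples for testing,
# so both models are trained on the remaining 10%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.9)

# Same penalty strength for both models
lasso = Lasso(alpha=0.1)
ridge = Ridge(alpha=0.1)

lasso.fit(X_train, y_train)
ridge.fit(X_train, y_train)

# Predict on the held-out data
y_pred_lasso = lasso.predict(X_test)
y_pred_ridge = ridge.predict(X_test)

print('Mean Squared Error of Lasso Regression:', mean_squared_error(y_test, y_pred_lasso))
print('Mean Squared Error of Ridge Regression:', mean_squared_error(y_test, y_pred_ridge))
```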

Output (exact values vary between runs because no random seed is set):

```
Mean Squared Error of Lasso Regression: 0.046068277743981895
Mean Squared Error of Ridge Regression: 0.010184722756408908
```
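
The experiment above uses a single penalty strength. As a rough sketch of how the penalty parameter controls sparsity, the loop below refits the lasso for several values of `alpha` and counts the nonzero coefficients. The list of `alpha` values, the sample size, and `random_state=0` are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data similar in shape to the example above (seed fixed for repeatability)
X, y = make_regression(n_samples=500, n_features=4, noise=0.1, random_state=0)

alphas = [0.01, 1.0, 10.0, 1000.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha).fit(X, y)
    # Number of coefficients the model keeps at this penalty strength
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))
    print('alpha =', alpha, '-> nonzero coefficients:', nonzero_counts[-1])
```

With a small `alpha` the fit stays close to ordinary least squares, while a sufficiently large `alpha` drives every coefficient to zero.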
