# LASSO - Short Introduction

- LASSO (Least Absolute Shrinkage and Selection Operator) is, like ridge regression, a modification of linear regression.
- Ridge regression is designed primarily to address multicollinearity in the dataset. LASSO offers a different but equally useful advantage:
- It performs both feature selection and regularization.

#### Feature Selection
- Feature selection in machine learning involves selecting a subset of relevant features (or variables) from the original set of features available in the dataset. 
- The goal of feature selection is to improve model performance by reducing the dimensionality of the feature space, removing irrelevant or redundant features, and focusing only on the features that contribute most to the predictive power of the model.

#### Regularization 
- Regularization refers to the technique of adding a penalty term to the objective function during training to prevent overfitting and improve generalization performance.
- The primary goal of regularization is to discourage the model from fitting the training data too closely, which can lead to poor performance on unseen data. 

- Both ridge regression and LASSO add a regularization term that penalizes large coefficients.
- In ridge regression, this term is known as the L2 penalty.
- In LASSO, the term takes a slightly different form and is called the L1 penalty.
- Both penalties serve the same purpose, but their different forms mean that each regularizes the model in a different way.
- Additionally, each penalty has different properties: most notably, the L1 penalty can set coefficients exactly to zero, while the L2 penalty only shrinks them.
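To make the two penalty forms concrete, here is a minimal Python sketch. The function names, the coefficient vector, and the penalty strength `lam` are hypothetical and chosen only for illustration.

```python
def l1_penalty(coefs, lam):
    """LASSO-style penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(b) for b in coefs)


def l2_penalty(coefs, lam):
    """Ridge-style penalty: lam times the sum of squared coefficient values."""
    return lam * sum(b * b for b in coefs)


coefs = [3.0, -0.5, 0.0]  # hypothetical coefficient vector
lam = 1.0                 # hypothetical penalty strength

print("L1 penalty:", l1_penalty(coefs, lam))
print("L2 penalty:", l2_penalty(coefs, lam))
```

In this toy vector, the large coefficient (3.0) contributes far more to the L2 penalty than to the L1 penalty, while the small coefficient (-0.5) contributes less. This is one way to see why the two penalties regularize differently.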

## LASSO - Definition
- As in ordinary linear regression, LASSO minimizes the residual sum of squares (RSS), but augmented by a regularization term called the L1 penalty.
- ![alt text](image.png)
- The parameter lambda (λ) is a tuning parameter that controls how strongly the model is penalized for large coefficients. With λ = 0 the fit reduces to ordinary least squares; larger values of λ shrink the coefficients more.
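For reference, one common way to write this objective in LaTeX is sketched below. The symbols are assumptions rather than taken from the image: $n$ observations, $p$ features, intercept $\beta_0$, and coefficients $\beta_1, \dots, \beta_p$. Some texts and libraries scale the RSS term by $1/(2n)$, which only changes the effective scale of $\lambda$.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

The first term is the RSS and the second is the L1 penalty; the intercept $\beta_0$ is conventionally left unpenalized.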

### What is the difference between LASSO and ridge regression? 
- The expressions minimized by the two models differ only in the norm used for the penalty.
- Ridge regression uses the L2 norm (p=2, in squared form), while LASSO uses the L1 norm (p=1).
- ![alt text](image-1.png)
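Written out, the two penalty terms look as follows. This is a sketch in assumed notation: $\beta_1, \dots, \beta_m$ are the coefficients and $\lambda \ge 0$ is the tuning parameter.

```latex
\text{Ridge (L2):}\quad
  \lambda \,\lVert \beta \rVert_2^{2} = \lambda \sum_{j=1}^{m} \beta_j^{2}
\qquad\qquad
\text{LASSO (L1):}\quad
  \lambda \,\lVert \beta \rVert_1 = \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert
```

The squared term is smooth everywhere, whereas the absolute value has a kink at zero; that kink is what allows LASSO solutions to contain exact zeros.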


### Solution 
- In general, LASSO has no closed-form solution, because the L1 penalty is not differentiable at zero. The coordinate descent algorithm became the key to solving it efficiently.

### What is coordinate descent?
1. Start with an initial guess for the parameters.
2. For each parameter k = 1, ..., p:
    2.1 Hold all other parameters fixed and solve the resulting single-variable optimization problem.
    2.2 Update parameter k to the value that minimizes the objective function with respect to that parameter alone.
3. Repeat step 2 until the convergence criteria are met.

- The key idea behind coordinate descent is that, for many optimization problems, updating one parameter at a time can be simpler and more computationally efficient than updating all parameters simultaneously.
- This approach is particularly well suited to problems with a large number of parameters, or to objective functions that are not differentiable or not easily optimized with gradient-based methods.
- ![alt text](image-2.png)
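The procedure can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions, not a reference one: the objective is scaled as `(1/(2n)) * RSS + lam * sum(|beta_j|)`, no intercept is fitted (the data is assumed to be centered), and the function names and toy data are hypothetical. For this objective, each single-variable problem has a closed-form solution known as soft-thresholding.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-8):
    """Minimize (1/(2n)) * RSS + lam * sum(|beta_j|) by cyclic coordinate descent.

    X is a list of n rows with p features each; y is a list of n targets.
    No intercept is fitted, so the data is assumed to be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p        # initial guess: all coefficients zero
    residual = list(y)      # equals y - X @ beta while beta is all zeros
    # Mean squared value of each column, used to rescale each update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):  # update one coordinate at a time
            if col_sq[j] == 0.0:
                continue    # all-zero column: its coefficient stays at 0
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # converged: no coefficient moved noticeably
            break
    return beta


# Toy data (hypothetical): two orthogonal features, y = 2.0 * x1 + 0.1 * x2.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
print(beta_hat)
```

On this toy data with `lam=0.5`, the strong coefficient is shrunk from 2.0 to about 1.5 and the weak one is set exactly to zero, which shows shrinkage and selection happening in the same fit.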

## Extra Notes

- Because LASSO forces some coefficients to be exactly zero, it is useful for identifying unimportant features that can be dropped from the model.
- Regularization is a technique used in machine learning to penalize complex models to protect them from overfitting. 
- By doing this, regularization helps to prevent models from over-interpreting the noise and randomness found in data sets. 
- LASSO regularization encourages sparsity: as the penalty grows, some coefficients shrink until they become exactly zero, while others shrink less dramatically.
- Ridge regularization can be more effective than LASSO when there are many collinear variables, because it prevents individual coefficients from becoming too large and overwhelming the others.
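The sparsity behaviour described in these notes can be seen in the simplest possible setting: a single coefficient whose unpenalized estimate is `z`. The sketch below assumes the one-dimensional objectives stated in the docstrings. The function names are hypothetical, and the scaling was chosen so that both solutions have a short closed form.

```python
def lasso_shrink(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + lam * abs(b): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + 0.5 * lam * b**2: proportional shrinkage."""
    return z / (1.0 + lam)


lam = 1.0  # hypothetical penalty strength
for z in [4.0, 1.0, 0.5, -0.25]:
    print(f"z={z:6.2f}  lasso={lasso_shrink(z, lam):6.3f}  ridge={ridge_shrink(z, lam):6.3f}")
```

Under these assumptions, the L1 solution is exactly zero whenever `|z| <= lam`, while the L2 solution only scales `z` down and never reaches zero for a nonzero `z`.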

#### Advantages and Disadvantages
- One of the main advantages of LASSO regression is its ability to perform feature selection.
- This can help reduce the complexity of the model and improve its interpretability.
- LASSO is computationally efficient and can handle a large number of features, which makes it suitable for high-dimensional datasets.

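- However, LASSO regression also has some limitations.
- One of the main disadvantages is that it is not well suited to datasets with correlated features: within a group of highly correlated features, it tends to keep one, somewhat arbitrarily, and set the others to zero.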