# Lasso is the new multiple linear regression
*Using Lasso and tsfresh to distinguish drum samples*

## Introduction 
Every carpenter has a favorite hammer they grab for basic jobs. Only when they need to drill a hole in heavy concrete do they bring out the big guns. Data scientists have a similar escalation ladder when it comes to statistical models. For many, the basic workhorse model is multiple linear regression. It is the first port of call in many analyses, and the benchmark that more complicated models need to beat. One of its strengths is the easy interpretability of the resulting coefficients, something that neural networks in particular struggle with. However, linear regression is not without its challenges. In this article we focus on one particular challenge: feature selection.

Selecting the features for a model can be done either via expert knowledge or by some optimisation mechanism. In the first situation we already know which features will be used, for example because they stem from the underlying physics. In the second situation we construct candidate models and evaluate them using some kind of goal function. It is, however, almost never possible to try all combinations of features because of resource constraints: with p candidate features there are 2^p possible subsets. This is where an optimisation technique comes into play that helps us find a good set of features by testing only a subset of the possible models. One example of such a method is stepwise regression, which is [generally regarded as a poor choice](https://towardsdatascience.com/stopping-stepwise-why-stepwise-selection-is-bad-and-what-you-should-use-instead-90818b3f52df).

Another regression method that includes a more robust way of performing feature selection is [Lasso](https://en.wikipedia.org/wiki/Lasso_(statistics)). Lasso adds an extra term to the least squares objective. Not only does it try to minimize the squared difference between the model and the observations, it also tries to limit the size of the coefficients, measured as the sum of their absolute values. The tradeoff between these two goals is controlled by a hyperparameter of Lasso that needs to be optimised. The shrinkage can push a coefficient all the way down to a value of 0, effectively eliminating that feature from the model. In this way Lasso performs feature selection.
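To make the shrinking behaviour concrete, here is a minimal sketch using sklearn's `Lasso`. The data is synthetic and purely illustrative (it is not the drum sample data used in this article): ten random features, of which only the first three actually influence the target. The `alpha` values are arbitrary choices; `alpha` is sklearn's name for the hyperparameter that sets the tradeoff described above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic, illustrative data: 10 features, only the first 3 matter.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Fit Lasso for increasing penalty strengths and count the
# features whose coefficient is not exactly zero.
counts = {}
for alpha in (0.01, 0.1, 1.0, 10.0):
    model = Lasso(alpha=alpha).fit(X, y)
    counts[alpha] = int(np.sum(model.coef_ != 0))
    print(f"alpha={alpha}: {counts[alpha]} features kept")
```

As `alpha` grows, more coefficients are set to exactly zero, until at a sufficiently large penalty no feature is left in the model. Picking a sensible value in between is the hyperparameter optimisation problem mentioned above.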

In this article you will learn about the following topics:

- How to generate features for Lasso using tsfresh
- How to fit a Lasso in Python using sklearn
- How to optimise the hyperparameter of Lasso

First I will introduce the data we will work with, generate a set of features using tsfresh, and then continue with fitting our Lasso model. Finally I will make my case why I regard Lasso as a good baseline model. 

## The data

## Generating features

## Fitting Lasso

## The case for Lasso as a baseline model

Why Lasso is a good choice:

- It is fast.
- It performs feature selection itself, which makes it somewhat more fire-and-forget.
- The feature selection can be tuned via cross-validation.
- The coefficients are still interpretable, which also makes it a good method for data exploration.
- It forms a nice benchmark for more complex methods such as Gradient Boosting Machines.
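
The points above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis from this article: the data comes from sklearn's `make_regression` rather than from drum samples, the features are standardised before fitting, and `GradientBoostingRegressor` with default settings stands in for "a more complex method". `LassoCV` picks the penalty strength by cross-validation, and `cross_val_score` then compares both models on the same folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, illustrative data: 30 features, 5 of them informative.
X, y = make_regression(
    n_samples=300, n_features=30, n_informative=5, noise=10.0, random_state=0
)

# Lasso baseline: scale the features, then let LassoCV choose alpha
# by 5-fold cross-validation.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5))
lasso.fit(X, y)
lasso_cv = lasso.named_steps["lassocv"]
selected = np.flatnonzero(lasso_cv.coef_)
print(f"chosen alpha: {lasso_cv.alpha_:.4f}")
print(f"features kept: {len(selected)} of {X.shape[1]}")

# Compare the baseline against a more complex model on the same folds.
lasso_r2 = cross_val_score(lasso, X, y, cv=5, scoring="r2").mean()
gbm_r2 = cross_val_score(
    GradientBoostingRegressor(random_state=0), X, y, cv=5, scoring="r2"
).mean()
print(f"Lasso R^2: {lasso_r2:.3f}, gradient boosting R^2: {gbm_r2:.3f}")
```

Because `LassoCV` sits inside the pipeline that is passed to `cross_val_score`, the penalty is re-tuned within each outer fold, so the reported baseline score is not flattered by tuning on the test data.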