# Orientation

where are you in the grand scheme of machine learning?
    - doing _supervised_ learning on a _regression_ problem, i.e. a problem where you are given inputs paired with known outputs and must learn to predict a continuous-valued output for new inputs. the models fitted below are all parametric, although supervised learning does not require that (decision trees, for example, are non-parametric).

many models can be used to do regression:
- linear model: $\hat{\beta} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2$
- LASSO model: $\hat{\beta}_{\text{LASSO}} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2 + \lambda \| \beta \|_1$
- Ridge model: $\hat{\beta}_{\text{Ridge}} = \arg \min_{\beta} \frac{1}{n}\|y-X \beta\|^2_2 + \alpha \| \beta \|_2^2$ 
- logistic regression
- svm
- decision trees
- perceptron       
- neural networks
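
to make the first and third objectives above concrete, here is a minimal numpy sketch that solves them directly. the data, the coefficient vector `beta_true`, and the helper names `ols` and `ridge` are made up for illustration. the LASSO objective has no closed form, so it is left out. note that scikit-learn scales its objectives differently (`Ridge` has no $\frac{1}{n}$ factor, `Lasso` uses $\frac{1}{2n}$), so the `alpha` values are not interchangeable with the ones here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5])  # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)


def ols(X, y):
    """Minimise (1/n) * ||y - X b||^2 with a least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def ridge(X, y, alpha):
    """Minimise (1/n) * ||y - X b||^2 + alpha * ||b||^2.

    Setting the gradient to zero gives (X^T X + n * alpha * I) b = X^T y.
    """
    n_samples, n_features = X.shape
    A = X.T @ X + n_samples * alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


beta_ols = ols(X, y)
beta_ridge = ridge(X, y, alpha=0.5)
print("ols:  ", np.round(beta_ols, 3))
print("ridge:", np.round(beta_ridge, 3))
```

the ridge solution is the OLS solution pulled towards zero; with `alpha=0` the two coincide.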

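however, note that most of these models can also be recycled for classification problems (and logistic regression and the perceptron, despite appearing in this list, are normally used as classifiers), so the line between regression methods and classification methods in machine learning is very much blurred.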
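
as a small illustration of that blur, the sketch below fits the same model family (a decision tree) once as a regressor and once as a classifier. the synthetic data and the thresholding rule used to create labels are assumptions made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y_continuous = X[:, 0] - 2.0 * X[:, 1]      # regression target
y_label = (y_continuous > 0).astype(int)    # classification target built from the same signal

# same model family, two different kinds of output
reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y_continuous)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y_label)

print("regressor predictions: ", reg.predict(X[:3]))   # real numbers
print("classifier predictions:", clf.predict(X[:3]))   # class labels 0 or 1
```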

below, I do a basic fitting of linear models to a linear regression problem to get the ball rolling.

In [40]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso, Ridge, LassoCV
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

In [41]:
rs = np.random.RandomState(123)
n = 1000 # samples
p = 10   # features
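noise = 0.4  # standard deviation of the Gaussian noise added to y
features = p // 2  # number of informative features; the rest have a true coefficient of zero
X, y = make_regression(n_samples=n, n_features=p, noise=noise, n_informative=features, random_state=rs)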
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rs)

# fitting these models to the training data:
m_linear_reg = LinearRegression().fit(X_train, y_train)
m_lasso_reg = Lasso().fit(X_train, y_train)  # default penalty strength alpha=1.0
m_ridge_reg = Ridge().fit(X_train, y_train)  # default penalty strength alpha=1.0

# predictions on training:
ypred_train_linear_reg = m_linear_reg.predict(X_train)
ypred_train_lasso_reg = m_lasso_reg.predict(X_train)
ypred_train_ridge_reg = m_ridge_reg.predict(X_train)

# then test data:
ypred_test_linear_reg = m_linear_reg.predict(X_test)
ypred_test_lasso_reg = m_lasso_reg.predict(X_test)
ypred_test_ridge_reg = m_ridge_reg.predict(X_test)


In [42]:
loss_data = [
    ['linear regression',
     mean_squared_error(y_train, ypred_train_linear_reg),
     mean_squared_error(y_test, ypred_test_linear_reg),
     mean_absolute_error(y_train, ypred_train_linear_reg),
     mean_absolute_error(y_test, ypred_test_linear_reg)
     ],
    ['lasso regression',
     mean_squared_error(y_train, ypred_train_lasso_reg),
     mean_squared_error(y_test, ypred_test_lasso_reg),
     mean_absolute_error(y_train, ypred_train_lasso_reg),
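     # MAE, not MSE; the printed tables were produced with mean_squared_error
     # in this slot, which is why lasso's 'test mae' there equals its 'test mse'
     mean_absolute_error(y_test, ypred_test_lasso_reg)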
     ],
    ['ridge regression',
     mean_squared_error(y_train, ypred_train_ridge_reg),
     mean_squared_error(y_test, ypred_test_ridge_reg),
     mean_absolute_error(y_train, ypred_train_ridge_reg),
     mean_absolute_error(y_test, ypred_test_ridge_reg)
     ]
] 
loss_df = pd.DataFrame(loss_data, columns=['name', 'train mse', 'test mse', 'train mae', 'test mae'])
print(loss_df)


                name  train mse  test mse  train mae  test mae
0  linear regression   0.151018  0.176519   0.312670  0.334087
1   lasso regression   4.907329  4.128131   1.795599  4.128131
2   ridge regression   0.189097  0.193945   0.353909  0.351923
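
the table-building cell repeats the same four metric calls for every model, which is easy to get wrong when copy-pasting. one possible alternative is sketched below: a small helper (the name `score_row` is my own, not part of any library) computes all four numbers for one model, and the table is built in a loop. the data is regenerated with `random_state=0` so the block runs on its own, so the numbers will not match the table above.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       noise=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def score_row(name, model, X_train, X_test, y_train, y_test):
    """Fit on the training split only, then score on both splits."""
    model.fit(X_train, y_train)
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    return {
        "name": name,
        "train mse": mean_squared_error(y_train, pred_train),
        "test mse": mean_squared_error(y_test, pred_test),
        "train mae": mean_absolute_error(y_train, pred_train),
        "test mae": mean_absolute_error(y_test, pred_test),
    }


models = {
    "linear regression": LinearRegression(),
    "lasso regression": Lasso(),
    "ridge regression": Ridge(),
}
loss_df = pd.DataFrame(
    [score_row(name, model, X_train, X_test, y_train, y_test)
     for name, model in models.items()]
)
print(loss_df)
```

adding another model then only means adding one entry to `models`.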


## Analysis

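`lasso regression` performs poorly. this is because its fit depends on the regularisation strength $\lambda$, which scikit-learn calls `alpha` and which `Lasso()` leaves at its default of `1.0`. that penalty is too strong for this low-noise problem: it shrinks the informative coefficients too far and biases the predictions. `LassoCV` instead picks `alpha` by cross-validation on the training data.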
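
to see how much the penalty strength matters, here is a rough sketch that fits `Lasso` at a few hand-picked `alpha` values and scores each on a validation split carved out of the training data (so the test set stays untouched). the `alpha` values, the extra split, and the `random_state=0` data are assumptions for illustration, so the numbers will differ from the table above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       noise=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# hold out part of the training data for comparing alphas
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, random_state=0)

val_mse = {}
for alpha in [1.0, 0.1, 0.01]:
    model = Lasso(alpha=alpha).fit(X_fit, y_fit)
    val_mse[alpha] = mean_squared_error(y_val, model.predict(X_val))
    n_zero = int((model.coef_ == 0).sum())  # coefficients the penalty set exactly to zero
    print(f"alpha={alpha:<5} validation mse={val_mse[alpha]:.4f} zero coefficients={n_zero}")
```

on data like this, with little noise and many more samples than features, the weaker penalties should come out well ahead of `alpha=1.0`.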

In [43]:
from sklearn.linear_model import LassoCV
m_lassoCV = LassoCV().fit(X_train, y_train)  # never fit on the test set; alpha is chosen by CV within the training data
ypred_train_lassoCV = m_lassoCV.predict(X_train)
ypred_test_lassoCV = m_lassoCV.predict(X_test)  # predicting on the test set is fine

lassoCV_row = {'name': 'lasso cv',
               'train mse': mean_squared_error(y_train, ypred_train_lassoCV), 
               'test mse': mean_squared_error(y_test, ypred_test_lassoCV), 
               'train mae': mean_absolute_error(y_train, ypred_train_lassoCV), 
               'test mae': mean_absolute_error(y_test, ypred_test_lassoCV)}
loss_df.loc[len(loss_df)] = lassoCV_row
print(loss_df)

                name  train mse  test mse  train mae  test mae
0  linear regression   0.151018  0.176519   0.312670  0.334087
1   lasso regression   4.907329  4.128131   1.795599  4.128131
2   ridge regression   0.189097  0.193945   0.353909  0.351923
3           lasso cv   0.203116  0.208579   0.365991  0.362380
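
it can help to look at what `LassoCV` actually chose. the sketch below reads the fitted attributes `alpha_` (the selected penalty), `alphas_` (the grid that was searched) and `coef_`. the data is regenerated with `random_state=0` so the block is self-contained, so the values are not those of `m_lassoCV` above. by default the grid runs from the smallest `alpha` that zeroes every coefficient down to `eps=1e-3` times that value.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       noise=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

m = LassoCV(cv=5).fit(X_train, y_train)

print("chosen alpha:       ", m.alpha_)
print("grid searched:      ", m.alphas_.min(), "to", m.alphas_.max())
print("zeroed coefficients:", int(np.sum(m.coef_ == 0)))
```

if the chosen `alpha` sits at the very bottom of the grid, that hints the data would prefer even less regularisation than the default grid allows, which may be why `lasso cv` still trails plain linear regression in the table above. passing a custom `alphas` array is one way to check.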


# Conclusion

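different models solve the regression problem in different ways, and the nature of the problem determines which model is most appropriate. here, with many more samples than features and little noise, plain linear regression had the lowest test error and the regularised models gained nothing from their penalty.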

i could sit here and fit decision trees, SVMs or random forests to the `X_train, y_train` dataset, but that wouldn't be fun and the methods would lose their magic.
instead, i will work through many problems and, for each one, pick out the few models that suit it best.