# Technical features only

### After examining the results, most of the fundamental and derived features turn out to have very unusual distributions. So, let's focus on the technical features first.

All results are based on
- **only technical features given**
- learnt ignoring the id field
- not in an online way (i.e. prediction doesn't update the training model)
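
The feature selection described above can be sketched as follows. This is only an illustration: the column names (`id`, `timestamp`, `y`, and the `technical_` / `fundamental_` / `derived_` prefixes) and the toy values are assumptions, not the actual dataset.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "id": [10, 11, 10, 11],
    "timestamp": [0, 0, 1, 1],
    "derived_3": [0.1, np.nan, 0.3, 0.2],
    "fundamental_7": [1.0, 2.0, np.nan, 4.0],
    "technical_20": [0.01, -0.02, 0.00, 0.03],
    "technical_30": [0.00, 0.01, np.nan, -0.01],
    "y": [0.001, -0.002, 0.000, 0.003],
})

# Keep only the technical features; id, timestamp, fundamental and derived
# columns are all left out of the design matrix.
technical_cols = [c for c in df.columns if c.startswith("technical_")]
X = df[technical_cols].to_numpy()
y = df["y"].to_numpy()

print(technical_cols, X.shape)
```

Missing values are kept as NaN at this stage, so imputation can be chosen separately for each experiment.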

| R train  | R valid  | Model                                               | Normalization | Imputation |
|----------|----------|-----------------------------------------------------|---------------|------------|
| 0.0255   |0.007537  | LinearRegression                                    |      SS       |   Mean     |
| 0.025606 |0.007966  | LinearRegression                                    |      SS       |  Median    |
| 0.001439 |-0.00279  | LinearRegression                                    |      RS       |   Mean     |
| 0.001702 |-0.00283  | LinearRegression                                    |      RS       |  Median    |
| -1.99220 |-2.00206  | RANSACRegressor                                     |      SS       |   Mean     |
| -2.04748 |-2.25799  | RANSACRegressor                                     |      SS       |  Median    |
| -0.01047 |-0.02664  | HuberRegressor                                      |      SS       |  Median    |
| 0.000307 |-0.00285  | Lasso (a = 1)                                       |      SS       |  Median    |
| 0.021038 |0.017875  | Lasso (a = 1e-4)                                    |      SS       |  Median    |
| 0.025340 |0.014118  | Lasso (a = 1e-5)                                    |      SS       |  Median    |
| 0.025606 |0.007966  | Ridge (a = 1)                                       |      SS       |  Median    |
| 0.025606 |0.007966  | Ridge (a = 1e-3)                                    |      SS       |  Median    |
| 0.025603 |0.009000  | BayesianRidge (default)                             |      SS       |  Median    |

Normalization:
- SS: StandardScaler
- RS: RobustScaler

## Notes
- RANSACRegressor performs very badly: many of its predictions fall well outside the range expected from the previous y values.
- Lasso at alpha = 1 and alpha = 0.01 actually kills off all the features (every coefficient is driven to zero)... hmm.
- Ridge is almost useless here: its scores match LinearRegression exactly, probably because the weights are already too small to be affected by regularization.
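
One way to check the Lasso observation is to count nonzero coefficients at each alpha. The sketch below uses synthetic data with a deliberately weak signal (an assumption meant to mimic small return-like targets), not the real features. scikit-learn's Lasso minimizes `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1`, so when every feature's covariance with the target is smaller than alpha, all coefficients end up at zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in data (assumption): one weakly informative feature, small noisy target.
rng = np.random.default_rng(0)
n_samples, n_features = 2000, 20
X = rng.normal(size=(n_samples, n_features))
y = 0.01 * X[:, 0] + rng.normal(scale=0.02, size=n_samples)

nonzero_counts = {}
for alpha in (1.0, 1e-2, 1e-4, 1e-5):
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Features with a nonzero coefficient are the ones Lasso "keeps".
    nonzero_counts[alpha] = int(np.count_nonzero(lasso.coef_))
    print(f"alpha={alpha:g}: {nonzero_counts[alpha]} nonzero coefficients")
```

With targets this small, alpha has to be far below 1 before any feature survives, which would be consistent with the better scores at a = 1e-4 and a = 1e-5 in the table.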

### So far, it seems like Lasso with a small alpha performs the best on validation (0.017875 at a = 1e-4), followed by BayesianRidge and LinearRegression.

# Linear Model Results

### LinearRegression seems to perform the best, followed by RANSACRegressor. The rest are not even close.

Results for sklearn.linear_model.

All results are based on
- all features given
- learnt ignoring the id field
- not in an online way (i.e. prediction doesn't update the training model)
- using median value for imputation
- using RobustScaler for normalizing features.

|R value   | Model                                               | 
|----------|-----------------------------------------------------|
|-0.002833 | Linear regression                                   |
|-0.348907 | Ridge (a = 1)                                       |
|-0.348906 | Ridge (a = 0.01)                                    |
|-0.348906 | Ridge (a = 0.001)                                   |
|-0.016866 | Lasso (a = 1)                                       |
|-0.034226 | Lasso (a = 0.01)                                    |
|-0.047910 | Lasso (a = 0.001)                                   |
|-0.193901 | OrthogonalMatchingPursuit (default)                 |
|-0.025288 | BayesianRidge (default)                             |
|-9.652e82 | SGDRegressor (sq_loss, l2, default)                 |
|-8.153e82 | SGDRegressor (sq_loss, elasticnet, default)         |
|-1.603e83 | SGDRegressor (sq_loss, l1, default)                 |
|-6.465e69 | SGDRegressor (huber, l1, default)                   |
|-0.008357 | PassiveAggressiveRegressor (default)                |
|-0.008184 | HuberRegressor (default)                            |
|-0.002871 | RANSACRegressor (default)                           |


## Details
- Adding regularization (Ridge, Lasso, OMP, etc.) seems to hurt the R value in general.
- Lasso training takes a really long time.
- Lasso at alpha = 1 was able to get pretty decent results even though it only selects about 10 features with nonzero weight.
- ARDRegression kept crashing the notebook, so we didn't end up trying it.
- SGDRegressor has very high error and very large feature weights.
- RANSACRegressor actually performs pretty well, making it the second best.

### Notes: LinearRegression actually performs badly on the testing set (score of -0.853...), while RANSACRegressor gets a score < -1.00 ...
The distribution of LinearRegression's predictions is concerning: most of the values are between 0.000249 and 0.000519, which is very different from the original y distribution, which ranges from -0.0860941 to 0.0934978.
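
A quick way to spot this kind of mismatch is to compare the range and spread of the predictions against those of the targets. The sketch below is a minimal illustration on synthetic numbers; the helper names `range_summary` and `spread_ratio` and the toy predictions are hypothetical, not part of the original experiments.

```python
import numpy as np

def range_summary(values):
    """Return the min, max and standard deviation of an array."""
    values = np.asarray(values, dtype=float)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "std": float(values.std()),
    }

def spread_ratio(y_true, y_pred):
    """Standard deviation of the predictions divided by that of the targets."""
    return float(np.std(y_pred) / np.std(y_true))

# Synthetic stand-ins (assumption): targets with a wide spread, and predictions
# squeezed into a narrow band around a small positive value.
rng = np.random.default_rng(0)
y_true = rng.normal(scale=0.02, size=1000)
y_pred = 0.0004 + 0.01 * y_true

print("targets:    ", range_summary(y_true))
print("predictions:", range_summary(y_pred))
print("spread ratio:", spread_ratio(y_true, y_pred))
```

A spread ratio far below 1 means the model is predicting close to a constant, which matches the narrow prediction band described above.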

# LinearRegression Results
#### In general, LinearRegression does not perform very well: the best result is only about -0.002833.

All results are based on
- all features given
- learnt ignoring the id field
- not in an online way (i.e. prediction doesn't update the training model)

|R value   | Model            | Normalization | Missing |
|----------|------------------|---------------|---------|
|-0.008351 | Always set to 0  |               |         |
|-0.005666 | Linear regression| None          |    0    |
|-0.074787 | Linear regression| None          |  mean   |
|-0.060731 | Linear regression| None          | median  |
|-0.293456 | Linear regression| normalize=True|    0    |
|-0.329960 | Linear regression| normalize=True|  mean   |
|-0.002833 | Linear regression| RS            |    0    |
|-0.005737 | Linear regression| RS            |  mean   |
|-0.002833 | Linear regression| RS            | median  |
|-0.345930 | Linear regression| SS            |    0    |
|-0.329960 | Linear regression| SS            |  mean   |
|-0.348907 | Linear regression| SS            | median  |
|-0.345930 | Linear regression| MAS           |    0    |
|-0.329960 | Linear regression| MAS           |  mean   |
|-0.348907 | Linear regression| MAS           | median  |


Column descriptions:
- Normalization: Which normalization, if any, we apply to the feature values.
  - RS: RobustScaler
  - SS: StandardScaler
  - MAS: MaxAbsScaler
- Features: The features used by the model.
- Id: Whether we use the id to train a different model per row.
- Online: Whether we use the eval data to update the model parameters.
- Missing: The value we use to replace missing values.

## Details

A dumb method of setting everything to 0 actually performs relatively well...
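
To see why an all-zero prediction scores only slightly below zero, here is a small sketch of the metric. It assumes the R value in these tables is the signed square root of R², i.e. `sign(R²) * sqrt(|R²|)`; that definition is an assumption and is not stated in these notes. The targets are synthetic.

```python
import numpy as np

def r_value(y_true, y_pred):
    """Signed square root of R^2 (assumed definition of the R value)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(np.sign(r2) * np.sqrt(abs(r2)))

# Synthetic targets (assumption): tiny positive mean, much larger spread.
rng = np.random.default_rng(0)
y = 0.0002 + rng.normal(scale=0.02, size=10_000)

zero_baseline = r_value(y, np.zeros_like(y))           # always predict 0
mean_baseline = r_value(y, np.full_like(y, y.mean()))  # always predict the mean

print("always 0:   ", zero_baseline)
print("always mean:", mean_baseline)
```

Under this definition, predicting the mean gives exactly 0 and predicting 0 gives roughly `-|mean(y)| / std(y)`, so a target whose mean is tiny compared to its spread makes the zero baseline hard to beat.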

For linear regression without any normalization, the coefficient for each feature is very small (around e^-20 - e^-30), probably because the features are not normalized and thus take very large values. It also seems like one of the weights (derived_3) is 0. The hypothesis is that the derived features are computed from the fundamental features.

We tried a number of NaN imputation and normalization techniques.

Normalization:
- Not doing any normalization actually performs relatively well, especially for a linear regression. Only RobustScaler performs better.
- normalize=True for the LinearRegression model seems to perform a lot worse than not doing normalization (it centers each feature and divides by its l2 norm, so it most likely behaves much like a StandardScaler).
- RobustScaler performs relatively well.
- StandardScaler and MaxAbsScaler perform badly.

Imputation:
- Missing value = 0 is actually fairly good.
- Missing value = "most frequent" needs a really long time for training, not sure why.
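
The normalization and imputation combinations compared above can be swept with a small loop. The sketch below is one possible setup, not the original notebook: it uses synthetic data, `SimpleImputer` from current scikit-learn, and `model.score`, which returns plain R² rather than the R value reported in the tables.

```python
from itertools import product

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, RobustScaler, StandardScaler

SCALERS = {"None": None, "RS": RobustScaler, "SS": StandardScaler, "MAS": MaxAbsScaler}
IMPUTERS = {
    "0": lambda: SimpleImputer(strategy="constant", fill_value=0.0),
    "mean": lambda: SimpleImputer(strategy="mean"),
    "median": lambda: SimpleImputer(strategy="median"),
}

# Synthetic stand-in data (assumption): weak signal, about 10% missing entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = 0.01 * X[:, 0] + rng.normal(scale=0.02, size=1000)
X[rng.random(X.shape) < 0.1] = np.nan
X_train, X_valid = X[:800], X[800:]
y_train, y_valid = y[:800], y[800:]

results = {}
for (scaler_name, scaler_cls), (imputer_name, make_imputer) in product(
    SCALERS.items(), IMPUTERS.items()
):
    steps = [make_imputer()]            # fill NaNs first
    if scaler_cls is not None:
        steps.append(scaler_cls())      # then scale, unless "None"
    steps.append(LinearRegression())
    model = make_pipeline(*steps).fit(X_train, y_train)
    results[(scaler_name, imputer_name)] = model.score(X_valid, y_valid)  # plain R^2

for key, score in sorted(results.items(), key=lambda item: -item[1]):
    print(key, round(score, 6))
```

Putting the imputer and scaler inside the pipeline means their statistics are computed on the training split only, which keeps the validation score honest.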
