## SparseRegression.jl
Git: https://github.com/joshday/SparseRegression.jl

---

#### Summary
SparseRegression.jl is a package for high-performance regression with linear models on large datasets, where the coefficients often turn out to be sparse.
The main call has the form *SModel(x, y, args)*, where the arguments include the loss, the penalty, and the $\lambda$ and $\omega$ arguments.
Predictions are made with a *predict(X, model)* call.

The loss and penalty functions are based on the _LossFunctions_ and _PenaltyFunctions_ JuliaML core packages.

Additionally, one can use learning strategies from the _LearningStrategies_ package. This makes it possible to set parameters that concern only the learning process, such as the optimizer, the maximum number of iterations, or max items.
More on this in the documentation.

This structure allows a single model type to cover many linear models, such as OLS, ridge, and lasso, which all share the same underlying structure.
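
The shared structure can be sketched as a single penalized objective. The notation here is an assumption made for illustration: the symbols $f$, $g$, $\beta$, $n$ and $p$ are not taken from the package, and $\lambda_j$ and $\omega_i$ are assumed to be per-coefficient penalty weights and per-observation weights corresponding to the $\lambda$ and $\omega$ arguments.

$$
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \omega_i \, f\left(y_i, x_i^\top \beta\right) + \sum_{j=1}^{p} \lambda_j \, g(\beta_j)
$$

With a squared-error loss $f(y, \hat{y}) = (y - \hat{y})^2$, the familiar models differ only in the penalty $g$ (up to constant scaling): $g(\beta_j) = 0$ gives OLS, $g(\beta_j) = \beta_j^2$ gives ridge, and $g(\beta_j) = |\beta_j|$ gives lasso.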

---
#### Details

| Test | Results |
| ------------- |:-------------:|
| Package works | Yes |
| Deprecation warnings | No |
| Compatible with JuliaDB | If targets are transformed into an array |
| Contains documentation | Yes, but not great |
| Simplicity | Good |


---
#### Usage

In [1]:
using SparseRegression;
include("load_titanic.jl");

In [2]:
X_train, y_train, X_test, y_test = load();

In [10]:
# Example using lasso regression
model = SModel(X_train,y_train, L2DistLoss(), L1Penalty());
learn!(model);
model

# Example using ridge regression
# model = SModel(X_train,y_train, L2DistLoss(), α*L2Penalty());
# learn!(model);
# model

[1m[36mINFO: [39m[22m[36mMaxIter(100) finished
[39m

---
### Simple benchmark vs Python

(Only lasso regression is tested)

In [None]:
### This cell takes ~5 min to run on my laptop; I suggest trusting the results listed below instead of running it.
IJulia.set_verbose(false)

n_points = 10_000
n_dims = [5000]

avg_times = []

for n_dim in n_dims
    times = []
    for i in 1:5
        x = randn(n_points, n_dim);
        y = x * linspace(-1, 1, n_dim) + randn(n_points);
        s = SModel(x, y);

        # Time only the fitting step; avoid shadowing Base.time
        elapsed = @elapsed learn!(s);

        push!(times, elapsed);
    end
    # Append the mean for this dimension instead of overwriting the array
    push!(avg_times, mean(times));
end

IJulia.set_verbose(true)

#### Results

| Dimensions | Julia | Python |
| ------------- |:-----:|:-----:|
| 10 | 0.00055s | 0.023s |
| 100 | 0.0073s | 0.19s |
| 1000 | 0.29s | 2.05s |
| 5000 | 58s | 17.68s |

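Julia is faster up to 1000 dimensions (by a factor of roughly 7 to 40), but at 5000 dimensions it takes 58s against 17.68s for Python, about 3 times slower.
Clearly, something goes wrong with the package once the number of dimensions passes a certain threshold, while Python's run time grows as expected.
The code for the Python results can be found in *python_scripts.py*.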