![](https://img.shields.io/pypi/l/skmixed)
![](https://img.shields.io/github/workflow/status/aksholokhov/skmixed/Testing%20and%20Coverage/sr3)
![](https://img.shields.io/readthedocs/skmixed)
![](https://img.shields.io/codecov/c/github/aksholokhov/skmixed/sr3?flag=unittests)
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/749695b3c6fd43bb9fdb499ec0ace67b)](https://www.codacy.com/gh/aksholokhov/skmixed/dashboard?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=aksholokhov/skmixed&amp;utm_campaign=Badge_Grade)

# SR3: Feature Selection with a Relaxed Regularization Framework

# Quickstart

SR3 is a library for feature selection via the Sparse Relaxed Regularized Regression (SR3) framework.
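
To make the idea concrete, here is a minimal NumPy sketch of the relaxed L1 problem as it is usually written in the SR3 literature, assuming the identity as the linking matrix: minimize `0.5*||A x - b||^2 + lam*||w||_1 + 0.5*kappa*||x - w||^2` over `x` and `w`. The names `soft_threshold` and `sr3_l1_sketch`, the parameter `kappa`, and the fixed iteration count are illustrative assumptions. This is not the solver that `skmixed` ships.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sr3_l1_sketch(A, b, lam=1.0, kappa=1.0, n_iter=200):
    """Alternating minimization for the relaxed L1 problem (illustrative only).

    x-step: solve (A^T A + kappa * I) x = A^T b + kappa * w
    w-step: w = soft_threshold(x, lam / kappa)
    Returns the dense iterate x and its sparse relaxation w.
    """
    n_features = A.shape[1]
    H = A.T @ A + kappa * np.eye(n_features)  # fixed across iterations
    Atb = A.T @ b
    w = np.zeros(n_features)
    x = np.zeros(n_features)
    for _ in range(n_iter):
        x = np.linalg.solve(H, Atb + kappa * w)
        w = soft_threshold(x, lam / kappa)
    return x, w


# Small demonstration on synthetic data with two relevant features.
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((100, 10))
x_true = np.zeros(10)
x_true[[0, 3]] = [3.0, -2.0]
b_demo = A_demo @ x_true + 0.01 * rng.standard_normal(100)
x_hat, w_hat = sr3_l1_sketch(A_demo, b_demo, lam=1.0, kappa=1.0)
print("selected features:", np.flatnonzero(w_hat))
```

The sparse vector `w` carries the selected support, while `x` stays dense and close to `w`. Larger `lam` shrinks more entries of `w` to exactly zero.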

## Installation

In [1]:
# !pip install skmixed

## Usage

### Linear Models

In [2]:
import numpy as np

from skmixed.linear.problems import LinearProblem

# Create a sample dataset
seed = 42
num_objects = 300
num_features = 500
np.random.seed(seed)
# create a vector of true model's coefficients
true_x = np.random.choice(2, size=num_features, p=np.array([0.9, 0.1]))
# create sample data
a = 10*np.random.randn(num_objects, num_features)
b = a.dot(true_x) + np.random.randn(num_objects)

print(f"The dataset has {a.shape[0]} objects and {a.shape[1]} features; \n"
      f"The vector of true parameters contains {sum(true_x != 0)} non-zero elements out of {num_features}.")

The dataset has 300 objects and 500 features; 
The vector of true parameters contains 55 non-zero elements out of 500.


In [3]:
# Automatic feature selection using an information criterion
from skmixed.linear.models import LinearL1ModelSR3
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RandomizedSearchCV
# scipy.stats.loguniform (SciPy >= 1.4) replaces the older sklearn.utils.fixes backport
from scipy.stats import loguniform

model = LinearL1ModelSR3(logger_keys=('aic', 'bic'))
params = {
    #"el": loguniform(1e-2, 1e2),
    "lam": loguniform(1e-1, 1e2)
}
selector = RandomizedSearchCV(estimator=model,
                              param_distributions=params,
                              n_iter=50,
                              scoring=lambda clf, X, y: -clf.logger_.get('bic'))

selector.fit(a, b)
maybe_x = selector.best_estimator_.coef_['x']
tn, fp, fn, tp = confusion_matrix(true_x, maybe_x != 0).ravel()

print(f"The model found {tp} out of {tp + fn} features correctly, but also chose {fp} extra irrelevant features. \n"
      f"The best parameter is {selector.best_params_}")

The model found 55 out of 55 features correctly, but also chose 2 extra irrelevant features. 
The best parameter is {'lam': 0.15055187290939537}
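
The search above scores each candidate by the negated BIC because `RandomizedSearchCV` keeps the candidate with the highest score, whereas a lower BIC is better. As a rough illustration of what such a criterion rewards, the sketch below uses the textbook Gaussian form `n*log(RSS/n) + k*log(n)`. The helper `gaussian_bic` is hypothetical, and the value that `skmixed` logs under `'bic'` may be defined differently.

```python
import numpy as np


def gaussian_bic(A, b, coef, tol=0.0):
    """Textbook BIC of a linear Gaussian model, up to an additive constant.

    k counts the coefficients whose magnitude exceeds `tol`.
    Note: a perfect fit (RSS == 0) would make the logarithm diverge.
    """
    n = A.shape[0]
    rss = float(np.sum((b - A @ coef) ** 2))
    k = int(np.sum(np.abs(coef) > tol))
    return n * np.log(rss / n) + k * np.log(n)


# A sparse model that explains the data should beat the empty model.
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((50, 5))
coef_true = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
b_demo = A_demo @ coef_true + 0.1 * rng.standard_normal(50)

print("BIC, true support:", gaussian_bic(A_demo, b_demo, coef_true))
print("BIC, empty model: ", gaussian_bic(A_demo, b_demo, np.zeros(5)))
```

The `k*log(n)` term penalizes every extra non-zero coefficient, which is what pushes the search towards sparser models.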


### Linear Mixed-Effects Models

In [8]:
from skmixed.lme.models import L1LmeModelSR3
from skmixed.lme.problems import LMEProblem, LMEStratifiedShuffleSplit

problem, true_parameters = LMEProblem.generate(groups_sizes=[10]*6,
                                               features_labels=["fixed+random"]*20,
                                               beta=np.array([0, 1]*10),
                                               gamma=np.array([0, 0, 0, 1]*5),
                                               obs_var=0.1)
x, y, columns_labels = problem.to_x_y()

model = L1LmeModelSR3()

params = {
    "lam": loguniform(1e-3, 1e3)
}
selector = RandomizedSearchCV(estimator=model,
                              param_distributions=params,
                              n_iter=10,
                              cv=LMEStratifiedShuffleSplit(n_splits=2, test_size=0.5,
                                                           random_state=seed,
                                                           columns_labels=columns_labels),
                              scoring=lambda clf, x, y: -clf.get_information_criterion(x, y, columns_labels=columns_labels, ic="muller_ic"),
                              random_state=seed,
                              n_jobs=20
                              )
selector.fit(x, y, columns_labels=columns_labels)
best_model = selector.best_estimator_

maybe_beta = best_model.coef_["beta"]
maybe_gamma = best_model.coef_["gamma"]
ftn, ffp, ffn, ftp = confusion_matrix(true_parameters["beta"], abs(maybe_beta) > 1e-2).ravel()
rtn, rfp, rfn, rtp = confusion_matrix(true_parameters["gamma"], abs(maybe_gamma) > 1e-2).ravel()

# The number of truly irrelevant features is tn + fp (not tn + fn)
print(f"The model found {ftp} out of {ftp + ffn} correct fixed features, and also chose {ffp} out of {ftn + ffp} extra irrelevant fixed features. \n"
      f"It also identified {rtp} out of {rtp + rfn} random effects correctly, and got {rfp} out of {rtn + rfp} non-present random effects. \n"
      f"The best sparsity parameter is {selector.best_params_}")

The model found 9 out of 10 correct fixed features, and also chose 2 out of 10 extra irrelevant fixed features. 
It also identified 5 out of 5 random effects correctly, and got 0 out of 15 non-present random effects
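
Both examples compare an estimated support with the true one, using `!= 0` in the linear case and a `1e-2` threshold in the mixed-effects case. The bookkeeping can be sketched with plain NumPy as below. The helper `support_recovery` is hypothetical and not part of `skmixed`, and the default threshold simply mirrors the one used above.

```python
import numpy as np


def support_recovery(true_coef, est_coef, tol=1e-2):
    """Count correctly and incorrectly selected coefficients.

    A coefficient counts as selected when |estimate| > tol and as truly
    relevant when its true value is non-zero.
    """
    true_support = np.asarray(true_coef) != 0
    est_support = np.abs(np.asarray(est_coef)) > tol
    return {
        "tp": int(np.sum(true_support & est_support)),    # relevant, selected
        "fp": int(np.sum(~true_support & est_support)),   # irrelevant, selected
        "fn": int(np.sum(true_support & ~est_support)),   # relevant, missed
        "tn": int(np.sum(~true_support & ~est_support)),  # irrelevant, dropped
    }


counts = support_recovery([0, 1, 0, 1], [0.5, 0.9, 0.001, 0.0])
print(counts)
print(f"{counts['tp']} of {counts['tp'] + counts['fn']} relevant coefficients found, "
      f"{counts['fp']} of {counts['tn'] + counts['fp']} irrelevant ones selected")
```

Reporting false positives against `tn + fp` keeps the denominator equal to the number of truly zero coefficients.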
