python-data-analytics-feature-selection

Python project on feature selection

Features

The project wraps the feature selection provided by scikit-learn so that each of the following techniques can be applied in a single line:

Remove low-variance features
Univariate feature selection
- regression: f_regression (default), a univariate linear-regression F-test
- classification: chi2 (default)
L1-based feature selection
- regression: Lasso
- classification: LinearSVC (default)

Usage

For unsupervised learning

The following shows how to do feature selection with data for unsupervised learning:

from ml_feature_selection.library.feature_selector import FeatureSelector
import numpy as np
X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]])
print(X.shape)
s = FeatureSelector(X).should_remove_low_variance_features()
X2 = s.apply()
print(X2.shape)

After the above steps, X2 has two features instead of the original three.
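
For comparison, here is a minimal sketch of the same step written directly against scikit-learn. It assumes that low-variance removal corresponds to `VarianceThreshold` and that the cutoff is the 80% rule from the scikit-learn user guide; neither is confirmed by this README, so treat the threshold as an assumption.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]])

# Assumed cutoff: drop boolean features that take the same value
# in more than 80% of the samples (variance of a Bernoulli is p * (1 - p)).
p = 0.8
selector = VarianceThreshold(threshold=p * (1 - p))
X2 = selector.fit_transform(X)

print(X.shape)                 # (6, 3)
print(X2.shape)                # (6, 2)
print(selector.get_support())  # boolean mask of the columns that were kept
```

The first column is 0 in five of six rows, so its variance (5/36, about 0.14) falls below the 0.16 cutoff and it is dropped.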

For classification

The following shows how to do feature selection with data for classification:

from ml_feature_selection.library.feature_selector import FeatureSelector
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)
s = FeatureSelector(samples=X, categorical_targets=y) \
    .should_apply_univariate_feature_selection(k=3) \
    .should_apply_L1_feature_selection(k=2) 
X2 = s.apply()
print(X2.shape)
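
As a rough guide to what the two chained steps do, the sketch below performs a comparable selection with plain scikit-learn. The mapping to `SelectKBest(chi2)` and `SelectFromModel(LinearSVC)` follows the defaults listed under Features; the values of `C`, `max_iter`, and `random_state`, and the use of `max_features` to keep exactly two columns, are assumptions made for illustration and may differ from what `FeatureSelector` does internally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Step 1: univariate selection, keep the 3 features with the highest chi2 score.
# chi2 requires non-negative feature values, which holds for the iris data.
X_uni = SelectKBest(chi2, k=3).fit_transform(X, y)

# Step 2: L1-based selection, keep the 2 features with the largest
# L1-penalised LinearSVC coefficients (hyperparameters are assumed values).
svc = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=10000, random_state=0)
l1_selector = SelectFromModel(svc, max_features=2, threshold=-np.inf)
X2 = l1_selector.fit_transform(X_uni, y)

print(X.shape, X_uni.shape, X2.shape)
```

Setting `threshold=-np.inf` makes `SelectFromModel` rank features by coefficient magnitude and keep the top `max_features`, rather than applying an importance cutoff.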

For regression

The following shows how to do feature selection with data for regression:

from ml_feature_selection.library.feature_selector import FeatureSelector
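# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
# (or another regression dataset such as load_diabetes).
from sklearn.datasets import load_boston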

boston = load_boston()
X, y = boston['data'], boston['target']
print(X.shape)
s = FeatureSelector(samples=X, numerical_targets=y) \
    .should_apply_univariate_feature_selection(k=3) \
    .should_apply_L1_feature_selection(k=2) 
X2 = s.apply()
print(X2.shape)
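
The regression variant can be sketched the same way as a scikit-learn pipeline. Because `load_boston` is no longer available in scikit-learn 1.2 and later, this sketch substitutes `load_diabetes`; that substitution, the `Lasso` `alpha` value, and the use of `max_features` to keep exactly two columns are assumptions for illustration, not the library's documented behaviour.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

# Stand-in regression dataset (10 numeric features).
X, y = load_diabetes(return_X_y=True)

selection = make_pipeline(
    # Univariate step: keep the 3 features with the highest F-statistic.
    SelectKBest(f_regression, k=3),
    # L1 step: keep the 2 features with the largest Lasso coefficients
    # (alpha is an assumed value).
    SelectFromModel(Lasso(alpha=0.1), max_features=2, threshold=-np.inf),
)
X2 = selection.fit_transform(X, y)

print(X.shape)
print(X2.shape)
```

Chaining the selectors in a pipeline mirrors the fluent `should_apply_...` calls: each step only sees the columns that survived the previous one.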
