## streamml2
<hr>

Example usage of FeatureSelectionStream

Feature Selection Params:

```python
def flow(self,
         models_to_flow=[],
         params=None,
         test_size=0.2,
         nfolds=3,
         nrepeats=3,
         pos_split=1,
         n_jobs=1,
         metrics=[],
         verbose=False,
         regressors=True,
         cut=None,
         ensemble=False):
```

Feature Selection Models:

```python
regression_options = {"mixed_selection": mixed_selection,
                      "svr": supportVectorRegression,
                      "rfr": randomForestRegression,
                      "abr": adaptiveBoostingRegression,
                      "lasso": lassoRegression,
                      "enet": elasticNetRegression,
                      "plsr": partialLeastSquaresRegression}
```

```python
classification_options = {'abc': adaptiveBoostingClassifier,
                          'rfc': randomForestClassifier,
                          'svc': supportVectorClassifier}
```

In [1]:
!pip install --force-reinstall streamml2

Requirement already up-to-date: streamml2 in c:\users\bmccs\appdata\local\continuum\anaconda3\lib\site-packages (0.1)


In [2]:
import pandas as pd
import numpy as np
from streamml2.streams import FeatureSelectionStream
from sklearn.datasets import load_iris
iris=load_iris()
X=pd.DataFrame(iris['data'], columns=iris['feature_names'])
y=pd.DataFrame(iris['target'], columns=['target'])

return_dict = FeatureSelectionStream(X,y).flow(["rfc", "abc", "svc"],
                                                params={},
                                                verbose=True,
                                                regressors=False,
                                                ensemble=True,
                                                featurePercentage=0.5,
                                                n_jobs=3)

print("Feature data ...")
print(pd.DataFrame(return_dict['feature_importances']))
print("Features rankings decision maker...")
print(return_dict['ensemble_results'])
print("Reduced data ...")
print(X[return_dict['kept_features']].head())

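# Note: load_boston was removed in scikit-learn 1.2, so this part needs an older scikit-learn release.
from sklearn.datasets import load_boston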
boston=load_boston()
X=pd.DataFrame(boston['data'], columns=boston['feature_names'])
y=pd.DataFrame(boston['target'],columns=["target"])

return_dict = FeatureSelectionStream(X,y).flow(["plsr", "mixed_selection", "rfr", "abr", "svr"],
                                                params={"mixed_selection__threshold_in":0.01,
                                                        "mixed_selection__threshold_out":0.05,
                                                        "mixed_selection__verbose":True},
                                                verbose=True,
                                                regressors=True,
                                                ensemble=True,
                                                featurePercentage=0.5,
                                                n_jobs=3)

*************************
=> (Classifier) => Feature Selection Streamline: rfc --> abc --> svc
*************************
Constructed RandomForestClassifierPredictiveModel: rfc
Returning rfc best estiminator
Constructed AdaptiveBoostingClassifierPredictiveModel: abc
Returning abc best estiminator
Constructed SupportVectorClassifierPredictiveModel: svc
Returning svc best estiminator
 50.0  % -> (2) features kept.
['petal length (cm)', 'petal width (cm)']
Feature data ...
        rfc   abc       svc
0  0.092803  0.06  0.000238
1  0.040814  0.00  0.033805
2  0.402965  0.46  1.062634
3  0.463418  0.48  0.295574
Features rankings decision maker...
                   TOPSIS  WeightedSum  WeightedProduct
sepal length (cm)       3            3                4
sepal width (cm)        4            4                3
petal length (cm)       1            1                1
petal width (cm)        2            2                2
Reduced data ...
   petal length (cm)  petal width (cm)
0             
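
One plausible reading of the `WeightedSum` column above is an equal-weight sum of each model's importances, after scaling every model's scores so they sum to one. The sketch below illustrates that idea only; it is an assumption, not streamml2's actual implementation, and `weighted_sum_ranking` and `keep_top_fraction` are hypothetical helper names. It reuses the importances printed under "Feature data ...".

```python
# Hypothetical sketch: rank features by an equal-weight sum of normalized
# importances. This is NOT taken from streamml2's source code.

features = ["sepal length (cm)", "sepal width (cm)",
            "petal length (cm)", "petal width (cm)"]

# Importances copied from the "Feature data ..." output above.
importances = {
    "rfc": [0.092803, 0.040814, 0.402965, 0.463418],
    "abc": [0.06, 0.00, 0.46, 0.48],
    "svc": [0.000238, 0.033805, 1.062634, 0.295574],
}


def weighted_sum_ranking(importances, features):
    """Return {feature: rank}, where rank 1 is the most important feature."""
    weight = 1.0 / len(importances)  # equal weight per model (assumption)
    scores = [0.0] * len(features)
    for values in importances.values():
        total = sum(values)
        for i, value in enumerate(values):
            # Scale each model's scores to sum to 1 so models are comparable.
            scores[i] += weight * (value / total if total else 0.0)
    # Feature indices ordered from highest to lowest combined score.
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return {features[i]: rank for rank, i in enumerate(order, start=1)}


def keep_top_fraction(ranking, fraction):
    """Keep the best-ranked features (at least one) up to the given fraction."""
    n_keep = max(1, int(len(ranking) * fraction))
    return sorted(ranking, key=ranking.get)[:n_keep]


ranking = weighted_sum_ranking(importances, features)
kept = keep_top_fraction(ranking, 0.5)
print(ranking)
print(kept)
```

With these inputs the sketch ranks petal length first and petal width second, and keeping half of the features retains those two, which agrees with the `WeightedSum` column and the kept features printed above.
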

## Parameters
<hr>
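<p>Each model wraps a base <code>sklearn</code> estimator. Pass a <code>params</code> dictionary to tune those estimators: each key has the form <code>&lt;model&gt;__&lt;parameter&gt;</code> (for example <code>rfc__n_estimators</code>), and each value lists the candidate settings.</p>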

In [19]:
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

params={'abc__algorithm':['SAMME'],
        'abc__n_estimators':[50, 100, 150],
        'rfc__n_estimators':[50, 100, 150],
        'svc__C':list(np.arange(1e-5,1e-1,0.001)),
        'svc__gamma':list(np.arange(1e-5,1e-1,0.001))}

iris=load_iris()
X=pd.DataFrame(iris['data'], columns=iris['feature_names'])
y=pd.DataFrame(iris['target'], columns=['target'])

return_dict = FeatureSelectionStream(X,y).flow(["rfc", "abc", "svc"],
                                                params=params,
                                                verbose=True,
                                                regressors=False,
                                                ensemble=True,
                                                featurePercentage=0.5,
                                                n_jobs=3)

*************************
=> (Classifier) => Feature Selection Streamline: rfc --> abc --> svc
*************************
Constructed RandomForestClassifierPredictiveModel: rfc
Returning rfc best estiminator
Constructed AdaptiveBoostingClassifierPredictiveModel: abc
Returning abc best estiminator
Constructed SupportVectorClassifierPredictiveModel: svc
Returning svc best estiminator
 50.0  % -> (2) features kept.
['petal length (cm)', 'petal width (cm)']
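
The keys in the `params` dictionary above follow a `<model>__<parameter>` pattern. As a rough illustration of how such keys can be split per model, and of how large each grid becomes, here is a minimal sketch. `group_params_by_model` and `grid_sizes` are hypothetical helpers, not part of streamml2, and the source does not confirm that the library searches every combination.

```python
from math import prod

# Same shape as the params dictionary above. The svc lists mirror
# np.arange(1e-5, 1e-1, 0.001), which yields 100 values.
svc_values = [1e-5 + i * 0.001 for i in range(100)]
params = {
    "abc__algorithm": ["SAMME"],
    "abc__n_estimators": [50, 100, 150],
    "rfc__n_estimators": [50, 100, 150],
    "svc__C": svc_values,
    "svc__gamma": svc_values,
}


def group_params_by_model(params):
    """Split 'model__parameter' keys into {model: {parameter: values}}."""
    grouped = {}
    for key, values in params.items():
        model, _, name = key.partition("__")
        grouped.setdefault(model, {})[name] = values
    return grouped


def grid_sizes(grouped):
    """Count the combinations an exhaustive search would try per model."""
    return {model: prod(len(values) for values in grid.values())
            for model, grid in grouped.items()}


grouped = group_params_by_model(params)
sizes = grid_sizes(grouped)
print(sizes)
```

If every combination is evaluated (an assumption), the 100 x 100 grid for `svc` dominates the runtime, so a coarser range for `C` and `gamma` would shorten the search considerably.
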


In [18]:
return_dict

{'feature_importances': {'rfc': array([0.11345457, 0.02500382, 0.4370457 , 0.42449591]),
  'abc': array([0.02510157, 0.06064175, 0.37710141, 0.53715527]),
  'svc': array([1.40048990e-03, 1.34620687e-04, 2.29755277e-01, 3.49448522e-02])},
 'ensemble_results':                    TOPSIS  WeightedSum  WeightedProduct
 sepal length (cm)       4            4                3
 sepal width (cm)        3            3                4
 petal length (cm)       1            1                1
 petal width (cm)        2            2                2,
 'kept_features': ['petal length (cm)', 'petal width (cm)']}