
MLFeatureSelection

License: MIT. Available on PyPI.

General feature selection based on a given machine learning algorithm and evaluation method

Function Parameters

sf = sequence_selection.Select(Sequence=True, Random=True, Cross=True)

Parameters:

Sequence (bool, optional, (default=True)) - switch for sequence selection; includes forward, backward, and simulated-annealing selection

Random (bool, optional, (default=True)) - switch for random selection of feature combinations

Cross (bool, optional, (default=True)) - switch for cross-term generation; requires calling sf.ImportCrossMethod() afterwards

sf.ImportDF(df,label)

Parameters:

df (pandas.DataFrame) - dataframe that includes all features

label (str) - name of the label column

sf.ImportLossFunction(lossfunction,direction)

Parameters:

lossfunction (function handle) - handle of the loss function; the function should return the score as a float (logloss, AUC, etc.)

direction (str, 'ascend'/'descend') - direction of improvement: 'descend' for logloss, 'ascend' for AUC, etc.
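A loss function handle can be any callable mapping true labels and predictions to a float. A minimal log-loss sketch, assuming binary labels and probability predictions (in practice sklearn.metrics.log_loss serves the same purpose):

```python
import math

def logloss(y_true, y_pred, eps=1e-15):
    """Binary log loss: lower is better, so pair it with direction='descend'."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# sf.ImportLossFunction(logloss, direction='descend')
```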

sf.InitialFeatures(features)

Parameters:

features (list, optional, (default=[])) - initial feature combination. An empty list starts the search from nothing; a list containing all trainable features starts the search with backward selection.

sf.InitialNonTrainableFeatures(features) #only for sequence selection

Parameters:

features (list) - list of features that are not trainable (label name, strings, datetimes, etc.)

sf.GenerateCol(key=None,selectstep=1) #only for sequence selection

Parameters:

key (str or list of str, optional, default=None) - only features whose names contain the keyword will be selected

selectstep (int, optional, default=1) - number of features added or removed per selection step
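Conceptually, the key argument narrows the candidate pool to matching column names. A sketch of that filter (filter_by_key is a hypothetical helper, not part of the library):

```python
def filter_by_key(columns, key=None):
    """Keep only columns whose name contains any of the given keyword(s)."""
    if key is None:
        return list(columns)
    keys = [key] if isinstance(key, str) else key
    return [c for c in columns if any(k in c for k in keys)]

cols = ['price_mean', 'price_max', 'user_id']
filter_by_key(cols, key='price')  # ['price_mean', 'price_max']
```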

sf.SetFeatureEachRound(df,ser, featureeachroundRandom=False)

Parameters:

ser (int) - randomly select ser features from the feature pool each round; can speed up adding features when there are many features

featureeachroundRandom (bool, optional, default=False) - if True, the ser features are selected randomly from the feature pool; if False, they are selected chunk by chunk

sf.SelectRemoveMode(frac=1,batch=1,key='')

Parameters:

frac (float, optional, default=1) - fraction of features to delete from all features; by default the batch setting is used instead

batch (int, optional, default=1) - delete features quantity every iteration

key (str, optional, default='') - only delete features whose names contain the keyword

sf.ImportCrossMethod(CrossMethod)

Parameters:

CrossMethod (dict) - dictionary of cross-term methods, e.g. addition, division, multiplication, and subtraction
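The dict maps an operator label to a function of two feature columns. A minimal sketch (the key strings used to label generated columns are an assumption here; the functions work element-wise on pandas Series as well as plain numbers):

```python
def add(x, y):
    return x + y

def substract(x, y):
    return x - y

def times(x, y):
    return x * y

def divide(x, y):
    return x / (y + 0.001)  # small offset guards against division by zero

# Keys label the generated cross terms; values compute them.
CrossMethod = {'+': add, '-': substract, '*': times, '/': divide}

# sf.ImportCrossMethod(CrossMethod)
```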

sf.AddPotentialFeatures(features)

Parameters:

features (list, optional, default=[]) - list of strong features; acts as the switch for simulated annealing

sf.SetTimeLimit(TimeLimit=inf)

Parameters:

TimeLimit (float, optional, default=inf) - maximum running time, in minutes

sf.SetFeaturesLimit(FeaturesLimit=inf)

Parameters:

FeaturesLimit (int, optional, default=inf) - maximum number of features

sf.SetClassifier(clf)

Parameters:

clf (predictor) - classifier or estimator (sklearn, xgboost, lightgbm, etc.); must match the validate function

sf.SetLogFile(logfile)

Parameters:

logfile (str) - log file name

sf.run(validate)

Parameters:

validate (function handle) - function that returns the evaluation score and the predictor; its inputs are the features dataset X, the label series Y, the used features, the predictor, and the loss-function handle
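A validate handle receives X, Y, the current feature list, the predictor, and the loss-function handle, and returns (score, predictor). A dependency-free sketch with a simple 80/20 holdout (the split choice, the MeanPredictor dummy, the mae loss, and X as a list of dict rows are all illustration assumptions; with a pandas DataFrame you would slice X[features] and use a proper CV split):

```python
def validate(X, y, features, clf, lossfunction):
    """Train on the first 80% of rows, score on the rest,
    and return (score, fitted predictor)."""
    split = int(len(X) * 0.8)
    X_train = [[row[f] for f in features] for row in X[:split]]
    X_test = [[row[f] for f in features] for row in X[split:]]
    clf.fit(X_train, y[:split])
    pred = clf.predict(X_test)
    return lossfunction(y[split:], pred), clf


# Dummy predictor and loss, for illustration only.
class MeanPredictor:
    def fit(self, X, y):
        self.mean = sum(y) / len(y)

    def predict(self, X):
        return [self.mean] * len(X)


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


X = [{'a': i, 'b': i * i} for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
score, fitted = validate(X, y, ['a'], MeanPredictor(), mae)
# sf.run(validate)
```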