
MLFeatureSelection

License: MIT. Available on PyPI.

General feature selection based on a given machine learning algorithm and evaluation method

Function Parameters

sf = sequence_selection.Select(Sequence=True, Random=True, Cross=True)

Parameters:

Sequence (bool, optional, (default=True)) - switch for sequence selection; includes forward, backward, and simulated-annealing selection

Random (bool, optional, (default=True)) - switch for random selection of feature combinations

Cross (bool, optional, (default=True)) - switch for cross-term generation; requires calling sf.ImportCrossMethod() afterwards

sf.ImportDF(df,label)

Parameters:

df (pandas.DataFrame) - dataframe that includes all features

label (str) - name of the label column

sf.ImportLossFunction(lossfunction,direction)

Parameters:

lossfunction (function handle) - handle of the loss function; the function should return the score as a float (logloss, AUC, etc.)

direction (str, 'ascend'/'descend') - direction of improvement: 'descend' for logloss, 'ascend' for AUC, etc.
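A loss function handle can be any callable mapping true labels and predictions to a float. A minimal log-loss sketch, assuming binary labels and probability predictions (in practice sklearn.metrics.log_loss serves the same purpose):

```python
import math

def logloss(y_true, y_pred, eps=1e-15):
    """Binary log loss: lower is better, so pair it with direction='descend'."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# sf.ImportLossFunction(logloss, direction='descend')
```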

sf.InitialFeatures(features)

Parameters:

features (list, optional, (default=[])) - initial feature combination. An empty list starts the search from nothing; a list containing all trainable features starts the search with backward selection.

sf.InitialNonTrainableFeatures(features) #only for sequence selection

Parameters:

features (list) - list of features that are not trainable (label name, strings, datetimes, etc.)

sf.GenerateCol(key=None,selectstep=1) #only for sequence selection

Parameters:

key (str or list of str, optional, default=None) - only features whose names contain the keyword will be selected

selectstep (int, optional, default=1) - number of features added or removed per selection step
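Conceptually, the key argument narrows the candidate pool to matching column names. A sketch of that filter (filter_by_key is a hypothetical helper, not part of the library):

```python
def filter_by_key(columns, key=None):
    """Keep only columns whose name contains any of the given keyword(s)."""
    if key is None:
        return list(columns)
    keys = [key] if isinstance(key, str) else key
    return [c for c in columns if any(k in c for k in keys)]

cols = ['price_mean', 'price_max', 'user_id']
filter_by_key(cols, key='price')  # ['price_mean', 'price_max']
```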

sf.SetFeatureEachRound(df,ser, featureeachroundRandom=False)

Parameters:

ser (int) - randomly select ser features from the feature pool each round; can speed up adding features when there are many features

featureeachroundRandom (bool, optional, default=False) - if True, the ser features are selected randomly from the feature pool; if False, they are selected chunk by chunk

sf.SelectRemoveMode(frac=1,batch=1,key='')

Parameters:

frac (float, optional, default=1) - fraction of features to delete from all features; by default the batch setting is used instead

batch (int, optional, default=1) - delete features quantity every iteration

key (str, optional, default='') - only delete features whose names contain the keyword

sf.ImportCrossMethod(CrossMethod)

Parameters:

CrossMethod (dict) - dictionary of cross-term methods, e.g. addition, division, multiplication, and subtraction
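The dict maps an operator label to a function of two feature columns. A minimal sketch (the key strings used to label generated columns are an assumption here; the functions work element-wise on pandas Series as well as plain numbers):

```python
def add(x, y):
    return x + y

def substract(x, y):
    return x - y

def times(x, y):
    return x * y

def divide(x, y):
    return x / (y + 0.001)  # small offset guards against division by zero

# Keys label the generated cross terms; values compute them.
CrossMethod = {'+': add, '-': substract, '*': times, '/': divide}

# sf.ImportCrossMethod(CrossMethod)
```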

sf.AddPotentialFeatures(features)

Parameters:

features (list, optional, default=[]) - list of strong features; acts as the switch for simulated annealing

sf.SetTimeLimit(TimeLimit=inf)

Parameters:

TimeLimit (float, optional, default=inf) - maximum running time, in minutes

sf.SetFeaturesLimit(FeaturesLimit=inf)

Parameters:

FeaturesLimit (int, optional, default=inf) - maximum number of features

sf.SetClassifier(clf)

Parameters:

clf (predictor) - classifier or estimator (sklearn, xgboost, lightgbm, etc.); must match the validate function

sf.SetLogFile(logfile)

Parameters:

logfile (str) - log file name

sf.run(validate)

Parameters:

validate (function handle) - function that returns the evaluation score and the predictor; its inputs are the features dataset X, the label series Y, the used features, the predictor, and the loss-function handle
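A validate handle receives X, Y, the current feature list, the predictor, and the loss-function handle, and returns (score, predictor). A dependency-free sketch with a simple 80/20 holdout (the split choice, the MeanPredictor dummy, the mae loss, and X as a list of dict rows are all illustration assumptions; with a pandas DataFrame you would slice X[features] and use a proper CV split):

```python
def validate(X, y, features, clf, lossfunction):
    """Train on the first 80% of rows, score on the rest,
    and return (score, fitted predictor)."""
    split = int(len(X) * 0.8)
    X_train = [[row[f] for f in features] for row in X[:split]]
    X_test = [[row[f] for f in features] for row in X[split:]]
    clf.fit(X_train, y[:split])
    pred = clf.predict(X_test)
    return lossfunction(y[split:], pred), clf


# Dummy predictor and loss, for illustration only.
class MeanPredictor:
    def fit(self, X, y):
        self.mean = sum(y) / len(y)

    def predict(self, X):
        return [self.mean] * len(X)


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


X = [{'a': i, 'b': i * i} for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
score, fitted = validate(X, y, ['a'], MeanPredictor(), mae)
# sf.run(validate)
```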