## Adding Your Custom Primitive

First, import the `AutoMLClassifier` class, along with pandas for loading the data.

In [1]:
from alpha_automl import AutoMLClassifier
import pandas as pd

### Generating Pipelines for CSV Datasets

In this example, we generate pipelines for a CSV dataset, 299_libras_move, whose training and test splits are loaded with pandas.

In [2]:
output_path = 'tmp/'
train_dataset = pd.read_csv('datasets/299_libras_move/train_data.csv')
test_dataset = pd.read_csv('datasets/299_libras_move/test_data.csv')

Drop the target column from the training data to obtain the features:

In [3]:
target_column = 'class'
X_train = train_dataset.drop(columns=[target_column])
X_train

|  | xcoord1 | ycoord1 | xcoord2 | ycoord2 | xcoord3 | ycoord3 | xcoord4 | ycoord4 | xcoord5 | ycoord5 | ... | xcoord41 | ycoord41 | xcoord42 | ycoord42 | xcoord43 | ycoord43 | xcoord44 | ycoord44 | xcoord45 | ycoord45 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.82979 | 0.76620 | 0.82979 | 0.76620 | 0.82979 | 0.77083 | 0.82785 | 0.77083 | 0.82979 | 0.76620 | ... | 0.41199 | 0.45370 | 0.37524 | 0.43750 | 0.33269 | 0.43056 | 0.29787 | 0.44213 | 0.26886 | 0.47222 |
| 1 | 0.80271 | 0.54630 | 0.80077 | 0.54398 | 0.80271 | 0.54398 | 0.80271 | 0.54630 | 0.80271 | 0.54398 | ... | 0.20503 | 0.64583 | 0.20503 | 0.68056 | 0.20696 | 0.71296 | 0.21083 | 0.74537 | 0.21277 | 0.77315 |
| 2 | 0.78917 | 0.59028 | 0.79110 | 0.59028 | 0.79110 | 0.59028 | 0.79304 | 0.59028 | 0.79110 | 0.59028 | ... | 0.20503 | 0.57407 | 0.19149 | 0.57176 | 0.18569 | 0.57407 | 0.18956 | 0.57639 | 0.19149 | 0.56944 |
| 3 | 0.88395 | 0.61574 | 0.88201 | 0.61806 | 0.87234 | 0.62037 | 0.87041 | 0.61574 | 0.84526 | 0.61574 | ... | 0.27079 | 0.65972 | 0.26886 | 0.62731 | 0.27660 | 0.59259 | 0.27660 | 0.56250 | 0.27853 | 0.53009 |
| 4 | 0.60155 | 0.77315 | 0.59768 | 0.77315 | 0.59961 | 0.77315 | 0.59381 | 0.77083 | 0.58801 | 0.75000 | ... | 0.63830 | 0.47917 | 0.63250 | 0.52778 | 0.63830 | 0.57407 | 0.63830 | 0.63194 | 0.63830 | 0.68750 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 283 | 0.85300 | 0.57639 | 0.85493 | 0.57407 | 0.85300 | 0.57639 | 0.85300 | 0.57407 | 0.85493 | 0.56944 | ... | 0.19923 | 0.80787 | 0.17021 | 0.79630 | 0.14313 | 0.75231 | 0.12959 | 0.69213 | 0.12959 | 0.62037 |
| 284 | 0.66925 | 0.78009 | 0.66925 | 0.78009 | 0.66925 | 0.78009 | 0.67118 | 0.78009 | 0.66925 | 0.78009 | ... | 0.67311 | 0.28704 | 0.66925 | 0.26389 | 0.66731 | 0.24306 | 0.66731 | 0.22454 | 0.66538 | 0.20602 |
| 285 | 0.57060 | 0.65741 | 0.57060 | 0.65741 | 0.57060 | 0.65509 | 0.56867 | 0.65046 | 0.54739 | 0.64120 | ... | 0.36944 | 0.62500 | 0.42166 | 0.62269 | 0.47389 | 0.62269 | 0.52418 | 0.63194 | 0.56286 | 0.64120 |
| 286 | 0.62282 | 0.65278 | 0.62476 | 0.65046 | 0.62669 | 0.64815 | 0.61315 | 0.63426 | 0.55319 | 0.60185 | ... | 0.28433 | 0.66435 | 0.30754 | 0.63889 | 0.33462 | 0.61574 | 0.37331 | 0.58102 | 0.42747 | 0.54398 |


Select the target column from the training data:

In [4]:
y_train = train_dataset[[target_column]]
y_train

|  | class |
|---|---|
| 0 | 12 |
| 1 | 15 |
| 2 | 7 |
| 3 | 12 |
| 4 | 3 |
| ... | ... |
| 283 | 12 |
| 284 | 8 |
| 285 | 2 |
| 286 | 1 |
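
The two cells above split the loaded frame into features and target. As a minimal sketch, the same operations on a small made-up frame (the values below are hypothetical, not taken from 299_libras_move) show the shape each selection produces. Note that the double brackets in `train_dataset[[target_column]]` return a one-column DataFrame, whereas single brackets would return a Series.

```python
import pandas as pd

# Hypothetical toy frame standing in for train_data.csv
toy = pd.DataFrame({
    "xcoord1": [0.83, 0.80, 0.79],
    "ycoord1": [0.77, 0.55, 0.59],
    "class": [12, 15, 7],
})
target_column = "class"

# Features: every column except the target
X = toy.drop(columns=[target_column])

# Target selected with double brackets -> one-column DataFrame
y_frame = toy[[target_column]]

# Target selected with single brackets -> Series
y_series = toy[target_column]

print(list(X.columns))                          # ['xcoord1', 'ycoord1']
print(type(y_frame).__name__, y_frame.shape)    # DataFrame (3, 1)
print(type(y_series).__name__, y_series.shape)  # Series (3,)
```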


### Adding Your Primitives into AlphaAutoML's Search Space

In [5]:
automl = AutoMLClassifier(output_path, time_bound=1)

In [6]:
from alpha_automl.base_primitive import BasePrimitive

# Create a custom feature selection method
class MyDropFeatureSelector(BasePrimitive):
    # On Windows or in a CUDA environment, define this class in an external module and import it here.
    def __init__(self, variables):
        # Names of the columns to drop
        self.variables = variables

    def fit(self, X, y=None):
        # Nothing to learn: the columns to drop are fixed at construction time
        return self

    def transform(self, X):
        # Return a copy of X without the selected columns
        X_dropped = X.drop(columns=self.variables)
        return X_dropped

# Register MyDropFeatureSelector with AlphaAutoML as a feature selector
object_dfs = MyDropFeatureSelector(['xcoord1', 'xcoord2'])
automl.add_primitives([(object_dfs, 'FEATURE_SELECTOR')])
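
Because the primitive only needs `fit` and `transform`, its logic can be checked on a small frame before handing it to AlphaAutoML. The sketch below is a stand-in under stated assumptions: it omits the `BasePrimitive` base class so it runs without alpha_automl, and the class names `DropColumnsSketch` and `DropConstantColumnsSketch`, as well as the toy data, are hypothetical. The second class illustrates a stateful alternative that decides which columns to drop during `fit`, the usual pattern when the selection should depend on the training data.

```python
import pandas as pd


class DropColumnsSketch:
    """Hypothetical stand-in for MyDropFeatureSelector without the alpha_automl base class."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, X, y=None):
        return self  # nothing to learn: the columns are fixed at construction

    def transform(self, X):
        return X.drop(columns=self.variables)


class DropConstantColumnsSketch:
    """Hypothetical stateful variant: learns which columns to drop during fit."""

    def fit(self, X, y=None):
        # Remember columns that hold a single distinct value in the training data
        self.variables_ = [c for c in X.columns if X[c].nunique(dropna=False) <= 1]
        return self

    def transform(self, X):
        return X.drop(columns=self.variables_)


# Hypothetical toy data reusing column names from 299_libras_move
X = pd.DataFrame({
    "xcoord1": [0.83, 0.80, 0.79],
    "xcoord2": [0.83, 0.80, 0.79],
    "ycoord1": [0.50, 0.50, 0.50],
})

fixed = DropColumnsSketch(["xcoord1", "xcoord2"]).fit(X).transform(X)
learned = DropConstantColumnsSketch().fit(X).transform(X)
print(list(fixed.columns))    # ['ycoord1']
print(list(learned.columns))  # ['xcoord1', 'xcoord2']
```

Whether AlphaAutoML accepts a stateful primitive like the second class unchanged is not covered in this example; if it does, registering it would presumably follow the same `add_primitives` call shown above.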

### Searching Pipelines

In [7]:
automl.fit(X_train, y_train)

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.09722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5972222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:01, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.08333333333333333
INFO:alpha_automl.au

After the pipeline search is complete, we can display the leaderboard:

In [8]:
automl.plot_leaderboard()

| ranking | pipeline | accuracy_score |
|---|---|---|
| 1 | MyDropFeatureSelector, KNeighborsClassifier | 0.694 |
| 2 | MyDropFeatureSelector, LogisticRegression | 0.639 |
| 3 | MyDropFeatureSelector, LinearDiscriminantAnalysis | 0.597 |
| 4 | MaxAbsScaler, MultinomialNB | 0.542 |
| 5 | MyDropFeatureSelector, MultinomialNB | 0.542 |
| 6 | MaxAbsScaler, SelectKBest, KNeighborsClassifier | 0.417 |
| 7 | MyDropFeatureSelector, QuadraticDiscriminantAnalysis | 0.292 |
| 8 | MaxAbsScaler, GenericUnivariateSelect, ExtraTreesClassifier | 0.264 |
| 9 | MaxAbsScaler, GenericUnivariateSelect, RandomForestClassifier | 0.236 |
| 10 | MaxAbsScaler, GenericUnivariateSelect, BaggingClassifier | 0.236 |
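
To check how often the custom primitive appears among the top-ranked pipelines, the leaderboard can be inspected with pandas. This sketch copies the ten rows above by hand; it does not assume that `plot_leaderboard()` returns a DataFrame, since that is not shown in this notebook.

```python
import pandas as pd

# Values copied by hand from the leaderboard above
leaderboard = pd.DataFrame({
    "ranking": range(1, 11),
    "pipeline": [
        "MyDropFeatureSelector, KNeighborsClassifier",
        "MyDropFeatureSelector, LogisticRegression",
        "MyDropFeatureSelector, LinearDiscriminantAnalysis",
        "MaxAbsScaler, MultinomialNB",
        "MyDropFeatureSelector, MultinomialNB",
        "MaxAbsScaler, SelectKBest, KNeighborsClassifier",
        "MyDropFeatureSelector, QuadraticDiscriminantAnalysis",
        "MaxAbsScaler, GenericUnivariateSelect, ExtraTreesClassifier",
        "MaxAbsScaler, GenericUnivariateSelect, RandomForestClassifier",
        "MaxAbsScaler, GenericUnivariateSelect, BaggingClassifier",
    ],
    "accuracy_score": [0.694, 0.639, 0.597, 0.542, 0.542, 0.417, 0.292, 0.264, 0.236, 0.236],
})

# Split each pipeline string into its steps and flag those that include the custom primitive
uses_custom = leaderboard["pipeline"].str.split(", ").apply(
    lambda steps: "MyDropFeatureSelector" in steps
)

print(int(uses_custom.sum()))                            # 5
print(leaderboard.loc[uses_custom, "ranking"].tolist())  # [1, 2, 3, 5, 7]
```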


### Exploring Pipelines

In [None]:
automl.plot_comparison_pipelines()