## Solving Tabular Regression Tasks

First, import the `AutoMLRegressor` class:

In [1]:
from alpha_automl import AutoMLRegressor
import pandas as pd

### Generating Pipelines for CSV Datasets

In this example, we generate pipelines for a CSV dataset, using the 196_autoMpg dataset.

In [2]:
output_path = 'tmp/'
train_dataset = pd.read_csv('datasets/196_autoMpg/train_data.csv')
test_dataset = pd.read_csv('datasets/196_autoMpg/test_data.csv')

Remove the target column from the training dataset to obtain the features:

In [3]:
target_column = 'class'
X_train = train_dataset.drop(columns=[target_column])
X_train

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model,origin
0,8,350.0,165.0,3693,11.5,70,1
1,8,318.0,150.0,3436,11.0,70,1
2,8,302.0,140.0,3449,10.5,70,1
3,8,454.0,220.0,4354,9.0,70,1
4,8,440.0,215.0,4312,8.5,70,1
...,...,...,...,...,...,...,...
293,4,144.0,96.0,2665,13.9,82,3
294,4,135.0,84.0,2370,13.0,82,1
295,4,151.0,90.0,2950,17.3,82,1
296,4,135.0,84.0,2295,11.6,82,1


Select the target column of the training dataset:

In [4]:
y_train = train_dataset[[target_column]]
y_train

Unnamed: 0,class
0,15.0
1,18.0
2,17.0
3,14.0
4,14.0
...,...
293,32.0
294,36.0
295,27.0
296,32.0
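
The two steps above can be wrapped in a small helper. The sketch below uses a toy DataFrame with made-up values (not the 196_autoMpg data), and `split_features_target` is a hypothetical name, not part of the library. It also shows why the notebook uses double brackets: `df[[col]]` returns a one-column DataFrame, whereas `df[col]` would return a Series.

```python
import pandas as pd


def split_features_target(dataset, target_column):
    """Return (X, y): the feature columns, and the target as a one-column DataFrame."""
    X = dataset.drop(columns=[target_column])
    y = dataset[[target_column]]  # double brackets keep a DataFrame, not a Series
    return X, y


# Toy data with made-up values, only to illustrate the shapes involved.
toy = pd.DataFrame({
    'cylinders': [8, 4, 6],
    'weight': [3693, 2370, 2945],
    'class': [15.0, 36.0, 25.0],
})

X_toy, y_toy = split_features_target(toy, 'class')
print(list(X_toy.columns))  # ['cylinders', 'weight']
print(y_toy.shape)          # (3, 1)
```

The same helper could be applied to both the training and the test DataFrame, so that the two splits are guaranteed to use the same target column.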


### Searching Pipelines

In [5]:
automl = AutoMLRegressor(output_path, time_bound=10)
automl.fit(X_train, y_train)

INFO:datamart_profiler.core:Setting column names from header
INFO:datamart_profiler.core:Identifying types, 7 columns...
INFO:datamart_profiler.core:Processing column 0 'cylinders'...
INFO:datamart_profiler.core:Column type http://schema.org/Integer []
INFO:datamart_profiler.core:Processing column 1 'displacement'...
INFO:datamart_profiler.core:Column type http://schema.org/Integer []
INFO:datamart_profiler.core:Processing column 2 'horsepower'...
INFO:datamart_profiler.core:Column type http://schema.org/Integer []
INFO:datamart_profiler.core:Processing column 3 'weight'...
INFO:datamart_profiler.core:Column type http://schema.org/Integer []
INFO:datamart_profiler.core:Processing column 4 'acceleration'...
INFO:datamart_profiler.core:Column type http://schema.org/Float []
INFO:datamart_profiler.core:Processing column 5 'model'...
INFO:datamart_profiler.core:Column type http://schema.org/Integer []
INFO:datamart_profiler.core:Processing column 6 'origin'...
INFO:datamart_profiler.core:C

INFO:alpha_automl.pipeline_search.Coach:COACH ACTION 0
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: S -> IMPUTATION ENCODERS FEATURE_SCALING FEATURE_SELECTION REGRESSION
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.linear_model.LinearRegression
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|sklearn.linear_model.LinearRegression
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.010728438
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 2
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> skl

INFO:alpha_automl.automl_api:Scored pipeline, score=-11.084285028384294
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.011586031
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 2
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: S -> IMPUTATION ENCODERS FEATURE_SCALING FEATURE_SELECTION REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|REGRESSION
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.010356691
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 3
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: S -> IMPUTATION ENCODERS FEATURE_SCALING FEATURE_SELECTION REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|REGRESSION
INFO:alpha_a

INFO:alpha_automl.automl_api:Scored pipeline, score=-18.92494330404196
INFO:alpha_automl.pipeline_synthesis.pipeline_builder:New pipelined created:
Pipeline(steps=[('sklearn.impute.SimpleImputer',
                 SimpleImputer(strategy='most_frequent')),
                ('sklearn.preprocessing.MaxAbsScaler', MaxAbsScaler()),
                ('sklearn.feature_selection.SelectKBest', SelectKBest()),
                ('sklearn.linear_model.RidgeCV', RidgeCV())])
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-11.617772389342438
Traceback (most recent call last):
  File "/Users/rlopez/D3M/alpha-automl/alpha_automl/scorer.py", line 73, in score_pipeline
    scores = cross_val_score(pipeline, X, y, cv=splitting_strategy, scoring=scoring, error_score='raise')
  File "/Users/rlopez/opt/anaconda3/envs/alphaautoml/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 515, in cross_val_score
    cv_resu

INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin 1
INFO:alpha_automl.pipeline_search.Coach:COACH ACTION 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: ENCODERS -> E
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|sklearn.preprocessing.StandardScaler|sklearn.linear_model.LinearRegression
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: IMPUTATION -> sklearn.impute.SimpleImputer
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.StandardScaler|sklearn.linear_model.LinearRegression
INFO:alpha_automl.automl_api:Scored pipeline, score=-10.982554036634546
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin -1
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 2
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|sklearn.preprocessing.Standard

INFO:alpha_automl.automl_api:Scored pipeline, score=-14.347166795046391
INFO:alpha_automl.pipeline_search.pipeline.NNet:EPOCH ::: 1
INFO:alpha_automl.pipeline_search.pipeline.NNet:EPOCH ::: 2
INFO:alpha_automl.pipeline_search.Coach:PITTING AGAINST PREVIOUS VERSION
INFO:alpha_automl.pipeline_search.Arena:Turn 1
INFO:alpha_automl.pipeline_search.Arena:Player 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.011586031
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 2
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: S -> IMPUTATION ENCODERS FEATURE_SCALING FEATURE_SELECTION REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION

INFO:alpha_automl.scorer:Score: -18.95418649071102
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 3
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.linear_model.RidgeCV
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|sklearn.linear_model.RidgeCV
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin -1
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 4
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE AC

INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 4
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.linear_model.Ridge
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|sklearn.linear_model.Ridge
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.011047
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 5
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.ensemble.ExtraTreesRegressor
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: IMPUTATION|ENCODERS|FEATURE_SCALING|FEATURE_SELECTION|sklearn.ensemble.ExtraTreesRegressor
INFO:alpha_automl.pipeline_se

INFO:alpha_automl.scorer:Score: -20.432852435034484
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 2
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.ensemble.RandomForestRegressor
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|sklearn.ensemble.RandomForestRegressor
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-20.432852435034484
INFO:alpha_automl.pipeline_synthesis.pipeline_builder:New pipelined created:
Pipeline(steps=[('sklearn.impute.SimpleImputer',
                 SimpleImputer(strategy='most_frequent')),
                ('sklearn.prepr

INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.linear_model.HuberRegressor
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|FEATURE_SELECTION|sklearn.linear_model.HuberRegressor
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.011981594
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 3
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|FEATURE_SELECTION|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.neighbors.KNeighborsRegressor
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|FEATURE_SELECTION|sklearn.neighbors.KNeighborsRegressor
INFO:alpha_automl.pipeline_search.MCTS:Prediction 0.012327208
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMU

INFO:alpha_automl.automl_api:Scored pipeline, score=-19.668361146810472
INFO:alpha_automl.scorer:Score: -18.0
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin -1
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.linear_model.ARDRegression
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin 1
INFO:alpha_automl.pipeline_search.Arena:Turn 6
INFO:alpha_automl.pipeline_search.Arena:Player 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|sklearn.linear_model.ARDRegression
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin 1
INFO:alpha_automl.pipeline_search.Arena:Turn 1
INFO:alpha_automl.pipeline_search.Arena:Player 1
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: S
INFO:alpha_automl.pipeline_sear

INFO:alpha_automl.automl_api:Scored pipeline, score=-17.80821710539882
INFO:alpha_automl.scorer:Score: -19.973513956840808
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:findwin -1
INFO:alpha_automl.pipeline_search.MCTS:MCTS SIMULATION 3
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|REGRESSION
INFO:alpha_automl.pipeline_search.pipeline.PipelineLogic:MOVE ACTION: REGRESSION -> sklearn.linear_model.LassoCV
INFO:alpha_automl.pipeline_search.pipeline.PipelineGame:PIPELINE: sklearn.impute.SimpleImputer|sklearn.preprocessing.MaxAbsScaler|sklearn.feature_selection.GenericUnivariateSelect|sklearn.linear_model.LassoCV
INFO:alpha_automl.pipeline_synthesis.pipeline_builder:New pipelined created:
Pipeline(steps=[('sklearn.impute.SimpleImputer',
                 SimpleImputer(strategy='most_frequent')),
                ('sklearn.preprocessing.MaxAbsScaler', 
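
The log above shows candidate pipelines being scored through scikit-learn's `cross_val_score`, and the reported scores are negative. The following is a minimal sketch of what scoring one such candidate might look like; it is not the library's internal code. It assumes synthetic data from `make_regression`, 5-fold cross-validation, and `f_regression` as the `SelectKBest` score function, none of which come from the notebook. Because scikit-learn scorers follow a higher-is-better convention, an error metric such as max error is negated, which would explain the sign of the scores in the log.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import RidgeCV
from sklearn.metrics import make_scorer, max_error
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler

# Synthetic stand-in for the training data (assumption, not the autoMpg dataset).
X, y = make_regression(n_samples=200, n_features=7, noise=10.0, random_state=0)

# Same step types as one of the pipelines printed in the log.
pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('scaler', MaxAbsScaler()),
    ('selector', SelectKBest(score_func=f_regression, k=5)),  # assumed settings
    ('regressor', RidgeCV()),
])

# greater_is_better=False makes the scorer return the negated max error.
neg_max_error = make_scorer(max_error, greater_is_better=False)

scores = cross_val_score(pipeline, X, y, cv=5, scoring=neg_max_error,
                         error_score='raise')
print(scores.mean())  # a negative number; closer to zero is better
```

Under this convention, a pipeline with a score of -11 ranks above one with a score of -18, which matches the ordering of the leaderboard.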

After the pipeline search is complete, we can display the leaderboard:

In [6]:
automl.plot_leaderboard()

ranking,pipeline,max_error
1,"SimpleImputer, StandardScaler, LinearRegression",-10.982554
2,"SimpleImputer, StandardScaler, RidgeCV",-11.084285
3,"SimpleImputer, MaxAbsScaler, RidgeCV",-11.617772
4,"SimpleImputer, RobustScaler, SVR",-14.347167
5,"SimpleImputer, MaxAbsScaler, GenericUnivariateSelect, Lasso",-17.808217
6,"SimpleImputer, MaxAbsScaler, GenericUnivariateSelect, ExtraTreesRegressor",-18.0
7,"SimpleImputer, MaxAbsScaler, GenericUnivariateSelect, DecisionTreeRegressor",-18.0
8,"SimpleImputer, MaxAbsScaler, GenericUnivariateSelect, Ridge",-18.591397
9,"SimpleImputer, MaxAbsScaler, GenericUnivariateSelect, LassoCV",-18.90213
10,"SimpleImputer, MaxAbsScaler, GenericUnivariateSelect, RidgeCV",-18.924943


Remove the target column from the test dataset to obtain the features:

In [7]:
X_test = test_dataset.drop(columns=[target_column])
X_test

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model,origin
0,8,307,130.0,3504,12.0,70,1
1,8,304,150.0,3433,12.0,70,1
2,8,429,198.0,4341,10.0,70,1
3,8,390,190.0,3850,8.5,70,1
4,6,198,95.0,2833,15.5,70,1
...,...,...,...,...,...,...,...
95,6,181,110.0,2945,16.4,82,1
96,6,232,112.0,2835,14.7,82,1
97,4,140,86.0,2790,15.6,82,1
98,4,97,52.0,2130,24.6,82,2


Select the target column of the test dataset:

In [8]:
y_test = test_dataset[[target_column]]
y_test

Unnamed: 0,class
0,18.0
1,16.0
2,15.0
3,15.0
4,22.0
...,...
95,25.0
96,22.0
97,27.0
98,44.0


Predictions for the test set are obtained with:

In [9]:
y_pred = automl.predict(X_test)
y_pred

array([[14.88126855],
       [15.11125796],
       [ 9.87748377],
       [12.63954372],
       [18.92734178],
       [25.54918914],
       [22.72537224],
       [ 7.47411631],
       [23.07557363],
       [25.7592489 ],
       [21.14299456],
       [ 9.89495727],
       [ 6.45203815],
       [19.2127908 ],
       [22.57707044],
       [27.44120371],
       [25.14682615],
       [26.33190746],
       [11.23649046],
       [10.88025371],
       [13.72269498],
       [12.72190265],
       [12.77477908],
       [20.43845785],
       [23.82274486],
       [20.75269993],
       [24.8211903 ],
       [27.46991951],
       [ 8.35733714],
       [12.55358309],
       [ 9.68480114],
       [20.86344088],
       [ 8.73571356],
       [27.27370345],
       [22.9574655 ],
       [25.64477326],
       [11.98137611],
       [26.68863942],
       [16.07830408],
       [21.98728636],
       [23.88439743],
       [13.13527892],
       [26.87858751],
       [26.78901014],
       [20.3738318 ],
       [12

The pipeline can be evaluated against a held-out dataset with:

In [10]:
automl.score(X_test, y_test)

INFO:alpha_automl.automl_api:Metric: max_error, Score: 9.331098492370522


{'metric': 'max_error', 'score': 9.331098492370522}
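
For reference, the max error metric is the largest absolute difference between a true value and its prediction. The sketch below computes it by hand for the first five test rows, using the target values and predictions displayed above; it assumes the rows appear in the same order in both outputs. The helper name `max_abs_error` is hypothetical. Since only five rows are used, the result differs from the value reported for the full test set.

```python
def max_abs_error(y_true, y_pred):
    """Largest absolute residual between paired true and predicted values."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# First five rows of y_test and y_pred as displayed in the notebook output.
y_true_head = [18.0, 16.0, 15.0, 15.0, 22.0]
y_pred_head = [14.88126855, 15.11125796, 9.87748377, 12.63954372, 18.92734178]

worst = max_abs_error(y_true_head, y_pred_head)
print(round(worst, 4))  # 5.1225 (third row: 15.0 vs. 9.8775)
```

Note that `automl.score` reports the metric as a positive value, whereas the leaderboard lists the same metric with a negative sign.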

### Visualizing Pipelines Using PipelineProfiler

To explore the generated pipelines, we can use [PipelineProfiler](https://github.com/VIDA-NYU/PipelineVis). PipelineProfiler is a visualization tool that enables users to compare and explore the pipelines generated by the AlphaAutoML system.

Once the pipeline search has completed, we can launch PipelineProfiler with:

In [None]:
automl.plot_comparison_pipelines()

For more information about how to use PipelineProfiler, click [here](https://towardsdatascience.com/exploring-auto-sklearn-models-with-pipelineprofiler-5b2c54136044). There is also a video demo available [here](https://www.youtube.com/watch?v=2WSYoaxLLJ8).