## Solving Tabular Regression Tasks

First, import the `AutoMLRegressor` class:

In [1]:
from alpha_automl import AutoMLRegressor
import pandas as pd

### Generating Pipelines for CSV Datasets

This example generates pipelines for a CSV dataset, using the 196_autoMpg dataset.

In [2]:
output_path = 'tmp/'
train_dataset = pd.read_csv('datasets/196_autoMpg/train_data.csv')
test_dataset = pd.read_csv('datasets/196_autoMpg/test_data.csv')
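Before starting a search, it can help to check the loaded data for missing values. The snippet below is a minimal sketch of such a check. It uses a hypothetical three-row CSV held in memory (the values and the name `toy_dataset` are illustrative assumptions, not the real 196_autoMpg files).

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for train_data.csv (illustrative values only).
csv_text = """cylinders,horsepower,weight,class
8,165.0,3693,15.0
4,,2370,36.0
6,90.0,2950,27.0
"""
toy_dataset = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column; non-zero counts are what an imputation step would fill.
missing_per_column = toy_dataset.isna().sum()
print(missing_per_column)
```

The same `isna().sum()` call can be run on `train_dataset` and `test_dataset` once they are loaded.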

Remove the target column to obtain the training features:

In [3]:
target_column = 'class'
X_train = train_dataset.drop(columns=[target_column])
X_train

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model,origin
0,8,350.0,165.0,3693,11.5,70,1
1,8,318.0,150.0,3436,11.0,70,1
2,8,302.0,140.0,3449,10.5,70,1
3,8,454.0,220.0,4354,9.0,70,1
4,8,440.0,215.0,4312,8.5,70,1
...,...,...,...,...,...,...,...
293,4,144.0,96.0,2665,13.9,82,3
294,4,135.0,84.0,2370,13.0,82,1
295,4,151.0,90.0,2950,17.3,82,1
296,4,135.0,84.0,2295,11.6,82,1


Select the target column for the training dataset:

In [4]:
y_train = train_dataset[[target_column]]
y_train

Unnamed: 0,class
0,15.0
1,18.0
2,17.0
3,14.0
4,14.0
...,...
293,32.0
294,36.0
295,27.0
296,32.0
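The two steps above follow a common pandas pattern: `drop(columns=...)` returns the features, and double-bracket indexing returns the target as a one-column DataFrame. A minimal sketch on a toy frame (the rows and the names `toy`, `X_toy`, and `y_toy` are illustrative assumptions):

```python
import pandas as pd

# Toy frame standing in for the real training data (illustrative values only).
toy = pd.DataFrame({
    "cylinders": [8, 4, 6],
    "weight": [3693, 2370, 2950],
    "class": [15.0, 36.0, 27.0],  # target column
})

target_column = "class"
X_toy = toy.drop(columns=[target_column])  # features only
y_toy = toy[[target_column]]               # double brackets keep a one-column DataFrame

# The target must not appear among the features, and rows must stay aligned.
print(list(X_toy.columns))
print(y_toy.shape)
print(X_toy.index.equals(y_toy.index))
```

Single-bracket indexing (`toy[target_column]`) would return a Series instead of a DataFrame.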


### Searching Pipelines

In [5]:
automl = AutoMLRegressor(output_path, time_bound=10)
automl.fit(X_train, y_train)

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.7033732051184005
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.320064448003014
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.6358713146555934
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.320064448003014
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.7042504499847975
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.4910210599897056
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.3182052729132256
...

INFO:alpha_automl.automl_api:Scored pipeline, score=2.7836817793853896
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.84296743108743
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.8367634922879765
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.8305782029082014
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.320491890780021
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.612588678056791
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.3225426339938173
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
...

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.4646257132067064
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.786475050875051
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.4421053430164883
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.7083452572670157
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.5524322397267345
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=2.8237333333333328
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:17, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=3.3225600839490532
...

[E thread_pool.cpp:113] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:113] Exception in thread pool task: mutex lock failed: Invalid argument
[E thread_pool.cpp:113] Exception in thread pool task: mutex lock failed: Invalid argument


### Exploring Pipelines

After the pipeline search is complete, we can display the leaderboard:

In [6]:
automl.plot_leaderboard()

ranking,pipeline,mean_absolute_error
1,"SimpleImputer, StandardScaler, RandomForestRegressor",1.869
2,"SimpleImputer, RobustScaler, ExtraTreesRegressor",1.891
3,"SimpleImputer, RandomForestRegressor",1.898
4,"SimpleImputer, StandardScaler, ExtraTreesRegressor",1.914
5,"SimpleImputer, MaxAbsScaler, RandomForestRegressor",1.926
6,"SimpleImputer, MaxAbsScaler, ExtraTreesRegressor",1.93
7,"SimpleImputer, RobustScaler, RandomForestRegressor",1.988
8,"SimpleImputer, RobustScaler, GradientBoostingRegressor",2.05
9,"SimpleImputer, MaxAbsScaler, GradientBoostingRegressor",2.064
10,"SimpleImputer, RobustScaler, KNeighborsRegressor",2.313
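The step names in the leaderboard match scikit-learn estimators. As a rough illustration of what a row describes, the top-ranked combination can be assembled by hand as below. This is only a sketch: the synthetic data and the hyperparameters (`strategy="mean"`, `n_estimators=50`, `random_state=0`) are assumptions, not the values selected by the search.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hand-built chain with the same step names as leaderboard row 1.
# Hyperparameters here are illustrative assumptions, not those found by the search.
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

# Small synthetic regression problem, only to show the chain runs end to end.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=40)
X_demo[0, 1] = np.nan  # one missing value for the imputer to fill before scaling

pipeline.fit(X_demo, y_demo)
demo_pred = pipeline.predict(X_demo[:5])
print(demo_pred.shape)
```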


To explore the generated pipelines, we can use [PipelineProfiler](https://github.com/VIDA-NYU/PipelineVis), a visualization tool for comparing and exploring the pipelines produced by the AlphaAutoML system.

Once the search has finished, PipelineProfiler can be opened with:

In [None]:
automl.plot_comparison_pipelines()

For more information about how to use PipelineProfiler, see [this article](https://towardsdatascience.com/exploring-auto-sklearn-models-with-pipelineprofiler-5b2c54136044). A video demo is also available [here](https://www.youtube.com/watch?v=2WSYoaxLLJ8).

### Testing Pipelines

Remove the target column to obtain the test features:

In [8]:
X_test = test_dataset.drop(columns=[target_column])
X_test

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model,origin
0,8,307,130.0,3504,12.0,70,1
1,8,304,150.0,3433,12.0,70,1
2,8,429,198.0,4341,10.0,70,1
3,8,390,190.0,3850,8.5,70,1
4,6,198,95.0,2833,15.5,70,1
...,...,...,...,...,...,...,...
95,6,181,110.0,2945,16.4,82,1
96,6,232,112.0,2835,14.7,82,1
97,4,140,86.0,2790,15.6,82,1
98,4,97,52.0,2130,24.6,82,2


Select the target column for the test dataset:

In [9]:
y_test = test_dataset[[target_column]]
y_test

Unnamed: 0,class
0,18.0
1,16.0
2,15.0
3,15.0
4,22.0
...,...
95,25.0
96,22.0
97,27.0
98,44.0


Pipeline predictions are accessed with:

In [10]:
y_pred = automl.predict(X_test)
y_pred

array([16.006, 16.881, 13.7  , 13.815, 18.974, 26.54 , 23.96 , 11.46 ,
       23.847, 24.971, 20.051, 13.06 , 12.265, 18.766, 22.51 , 27.748,
       27.225, 24.626, 12.92 , 12.26 , 14.785, 14.535, 14.45 , 20.71 ,
       23.565, 20.547, 22.551, 25.975, 11.79 , 14.375, 11.9  , 18.854,
       12.51 , 24.95 , 22.34 , 25.264, 13.37 , 26.005, 14.24 , 19.352,
       24.946, 14.309, 25.91 , 26.168, 19.489, 14.325, 24.234, 26.63 ,
       28.905, 23.841, 22.83 , 30.486, 27.431, 30.366, 14.803, 21.249,
       30.579, 18.19 , 18.771, 15.379, 15.288, 24.684, 28.734, 37.447,
       35.227, 20.838, 19.877, 18.745, 23.022, 19.548, 21.159, 17.521,
       16.632, 34.246, 19.123, 26.637, 37.821, 29.371, 26.239, 36.044,
       32.846, 26.368, 35.096, 35.351, 36.533, 29.394, 29.787, 29.827,
       26.586, 34.352, 36.485, 35.733, 36.875, 35.472, 33.981, 28.269,
       29.111, 26.971, 40.269, 29.548])

The pipeline can be evaluated against a held-out dataset by calling `score`:

In [11]:
automl.score(X_test, y_test)

INFO:alpha_automl.automl_api:Metric: mean_absolute_error, Score: 1.7187900000000007


{'metric': 'mean_absolute_error', 'score': 1.7187900000000007}
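The reported metric, mean absolute error, is the average of the absolute differences between the true and predicted values. The sketch below computes it by hand for the first five test rows only, using the values shown in the `y_test` and `y_pred` outputs above. Because it covers five rows rather than the whole test set, the result differs from the score reported by `automl.score`. The variable names are illustrative.

```python
import numpy as np

# First five targets and predictions, copied from the outputs shown above.
y_true_head = np.array([18.0, 16.0, 15.0, 15.0, 22.0])
y_pred_head = np.array([16.006, 16.881, 13.7, 13.815, 18.974])

# Mean absolute error: average of |true - predicted|.
mae_head = float(np.mean(np.abs(y_true_head - y_pred_head)))
print(round(mae_head, 4))  # 1.6772
```

To repeat this on the full test set, the one-column `y_test` DataFrame would first need flattening (for example with `y_test.to_numpy().ravel()`) so that its shape matches the 1-D `y_pred` array.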