## Solving Tabular Regression Tasks

First, import the `AutoMLRegressor` class:

In [1]:
from alpha_automl import AutoMLRegressor
import pandas as pd

### Generating Pipelines for CSV Datasets

In this example, we generate pipelines for a CSV dataset, using the 196_autoMpg dataset.

In [2]:
output_path = 'tmp/'
train_dataset = pd.read_csv('datasets/196_autoMpg/train_data.csv')
test_dataset = pd.read_csv('datasets/196_autoMpg/test_data.csv')

Removing the target column from the features for the train dataset

In [3]:
target_column = 'class'
X_train = train_dataset.drop(columns=[target_column])
X_train

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model,origin
0,8,350.0,165.0,3693,11.5,70,1
1,8,318.0,150.0,3436,11.0,70,1
2,8,302.0,140.0,3449,10.5,70,1
3,8,454.0,220.0,4354,9.0,70,1
4,8,440.0,215.0,4312,8.5,70,1
...,...,...,...,...,...,...,...
293,4,144.0,96.0,2665,13.9,82,3
294,4,135.0,84.0,2370,13.0,82,1
295,4,151.0,90.0,2950,17.3,82,1
296,4,135.0,84.0,2295,11.6,82,1


Selecting the target column for the train dataset

In [4]:
y_train = train_dataset[[target_column]]
y_train

Unnamed: 0,class
0,15.0
1,18.0
2,17.0
3,14.0
4,14.0
...,...
293,32.0
294,36.0
295,27.0
296,32.0
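The feature/target split above can be wrapped in a small helper that also checks the target column exists. This is only a sketch: `split_features_target` is a hypothetical name, not part of `alpha_automl`, and the tiny DataFrame reuses a few values from the first rows shown above.

```python
import pandas as pd

# Tiny stand-in for train_dataset, built from the first rows shown above.
df = pd.DataFrame({
    "cylinders": [8, 8, 8],
    "horsepower": [165.0, 150.0, 140.0],
    "class": [15.0, 18.0, 17.0],
})


def split_features_target(frame, target):
    """Hypothetical helper: return (features, target) as two DataFrames."""
    if target not in frame.columns:
        raise KeyError(f"target column {target!r} not found")
    X = frame.drop(columns=[target])  # every column except the target
    y = frame[[target]]               # double brackets keep a one-column DataFrame
    return X, y


X_demo, y_demo = split_features_target(df, "class")
print(X_demo.shape, y_demo.shape)
```

Indexing with `[[target_column]]`, as the notebook does, yields a one-column DataFrame rather than a Series.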


### Searching Pipelines

In [5]:
automl = AutoMLRegressor(output_path, time_bound=10)
automl.fit(X_train, y_train)

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-11.084285028384294
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-18.92494330404196
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-11.617772389342438
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-18.92494330404196
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-10.982554036634546
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-14.347166795046391
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-18.95418649071102
INFO:alph

INFO:alpha_automl.automl_api:Scored pipeline, score=-19.15292857142857
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-18.0
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-19.41747619047619
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:07, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-18.0
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:07, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-18.91822000563559
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:07, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-17.55926149284314
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:07, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=-19.21206058472361
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:07, scoring...
INFO:alpha_automl.automl_api:Scored pip
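The `INFO:alpha_automl.automl_api:` prefix matches the default `LEVEL:logger_name:message` format of Python's `logging` module. If the search output is too verbose, raising the logger level may quiet it. This is a sketch that assumes the library logs through the standard `logging` module and does not reset levels itself.

```python
import logging

# Assumption: messages are emitted by loggers under the "alpha_automl"
# namespace, as the "INFO:alpha_automl.automl_api:" prefix suggests.
logging.getLogger("alpha_automl").setLevel(logging.WARNING)

# Child loggers without their own level inherit the parent's level.
child = logging.getLogger("alpha_automl.automl_api")
print(logging.getLevelName(child.getEffectiveLevel()))
```

Run this before calling `fit` so that the level is already in place when the search starts.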

After the pipeline search is complete, we can display the leaderboard:

In [6]:
automl.plot_leaderboard()

ranking,pipeline,max_error
1,"SimpleImputer, RobustScaler, Lars",-10.363686
2,"SimpleImputer, StandardScaler, Lars",-10.535857
3,"SimpleImputer, MaxAbsScaler, Lars",-10.750064
4,"SimpleImputer, Lars",-10.930767
5,"SimpleImputer, MaxAbsScaler, LinearRegression",-10.982554
6,"SimpleImputer, StandardScaler, LinearRegression",-10.982554
7,"SimpleImputer, RobustScaler, LinearRegression",-10.982554
8,"SimpleImputer, LinearRegression",-10.982554
9,"SimpleImputer, Ridge",-10.98471
10,"SimpleImputer, RidgeCV",-11.00335
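The leaderboard scores are negative although `max_error` itself is never negative. One plausible reading, stated here as an assumption rather than confirmed behavior, is that the metric is negated so that a higher score is always better, as in scikit-learn's scorer convention. The sketch below re-ranks three rows copied from the table under that assumption.

```python
# Three rows copied from the leaderboard above (pipeline -> reported score).
leaderboard = {
    "SimpleImputer, Lars": -10.930767,
    "SimpleImputer, RobustScaler, Lars": -10.363686,
    "SimpleImputer, RidgeCV": -11.00335,
}

# Assumption: the reported value is the negated max_error, so the score
# closest to zero corresponds to the smallest error and ranks first.
ranked = sorted(leaderboard.items(), key=lambda item: item[1], reverse=True)

for rank, (pipeline, score) in enumerate(ranked, start=1):
    print(rank, pipeline, "implied max_error:", -score)
```

Sorting in descending order reproduces the relative order of these three pipelines in the table.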


Removing the target column from the features for the test dataset

In [7]:
X_test = test_dataset.drop(columns=[target_column])
X_test

Unnamed: 0,cylinders,displacement,horsepower,weight,acceleration,model,origin
0,8,307,130.0,3504,12.0,70,1
1,8,304,150.0,3433,12.0,70,1
2,8,429,198.0,4341,10.0,70,1
3,8,390,190.0,3850,8.5,70,1
4,6,198,95.0,2833,15.5,70,1
...,...,...,...,...,...,...,...
95,6,181,110.0,2945,16.4,82,1
96,6,232,112.0,2835,14.7,82,1
97,4,140,86.0,2790,15.6,82,1
98,4,97,52.0,2130,24.6,82,2


Selecting the target column for the test dataset

In [8]:
y_test = test_dataset[[target_column]]
y_test

Unnamed: 0,class
0,18.0
1,16.0
2,15.0
3,15.0
4,22.0
...,...
95,25.0
96,22.0
97,27.0
98,44.0


Pipeline predictions are accessed with:

In [9]:
y_pred = automl.predict(X_test)
y_pred

array([16.17594075, 16.5593719 ,  8.40615905, 12.25031677, 19.59061757,
       25.37900055, 22.27470123,  6.83028964, 22.18710614, 25.27320556,
       21.51490654,  8.7639713 ,  4.95537747, 18.84548957, 21.64539569,
       27.83326706, 25.53160554, 25.89589633, 10.27116937,  9.90857079,
       14.71282856, 13.45786287, 13.40529068, 19.03090845, 23.05481909,
       19.49683107, 23.96098277, 27.51667104,  6.32891573, 13.05267487,
        7.76875196, 21.59613888,  7.18102676, 27.25485452, 24.75580235,
       25.45527664, 11.14569674, 26.36192631, 16.71799224, 22.81199429,
       22.83549187, 13.89607338, 26.8280351 , 26.33093855, 20.36982564,
       12.68723518, 22.8418091 , 27.7639779 , 29.700172  , 23.62110191,
       23.42186255, 32.84132424, 28.3261874 , 33.66217693, 17.70705711,
       26.79608139, 32.93052208, 18.85075878, 19.87289665, 15.56689456,
       14.79151918, 23.52046125, 28.90573404, 34.14041032, 35.10682171,
       24.67415083, 21.29016813, 23.46852583, 23.67614548, 24.95

The pipeline can be evaluated against a held-out dataset with the `score` method:

In [10]:
automl.score(X_test, y_test)

INFO:alpha_automl.automl_api:Metric: max_error, Score: 8.569593468604381


{'metric': 'max_error', 'score': 8.569593468604381}
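To make the metric concrete: `max_error` is conventionally the largest absolute difference between a true value and its prediction. The sketch below computes it by hand for only the first five test rows and predictions shown above, so its result differs from the full-dataset score. `max_abs_error` is an illustrative helper, not a library function.

```python
# First five targets from y_test and first five predictions from y_pred above.
y_true = [18.0, 16.0, 15.0, 15.0, 22.0]
y_hat = [16.17594075, 16.5593719, 8.40615905, 12.25031677, 19.59061757]


def max_abs_error(actual, predicted):
    """Illustrative helper: largest absolute residual across paired values."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


worst = max_abs_error(y_true, y_hat)
print(round(worst, 4))  # 6.5938, from the third row: |15.0 - 8.40615905|
```

Because the metric reports only the single worst prediction, one outlier row can dominate the score.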

### Visualizing Pipelines Using PipelineProfiler

To explore the produced pipelines, we can use [PipelineProfiler](https://github.com/VIDA-NYU/PipelineVis), a visualization tool that enables users to compare and explore the pipelines generated by the AlphaAutoML system.

After the pipeline search process is completed, we can use PipelineProfiler with:

In [None]:
automl.plot_comparison_pipelines()

For more information about how to use PipelineProfiler, see [this article](https://towardsdatascience.com/exploring-auto-sklearn-models-with-pipelineprofiler-5b2c54136044). A [video demo](https://www.youtube.com/watch?v=2WSYoaxLLJ8) is also available.