## Export pipelines to Python code

First, import the `AutoML` class. If you plan to run AlphaD3M via Docker or Singularity, import the `DockerAutoML` or `SingularityAutoML` class instead.

In [1]:
from alphad3m import AutoML
# Container-based alternative (use SingularityAutoML in place of DockerAutoML for Singularity):
# from alphad3m_containers import DockerAutoML as AutoML

### Generating pipelines for CSV datasets

This example generates pipelines for a CSV dataset, the [185_baseball_MIN_METADATA dataset](https://gitlab.com/ViDA-NYU/d3m/alphad3m/-/tree/devel/examples/datasets). It contains information about baseball players and their play statistics, including Games_played, At_bats, Runs, Hits, Doubles, Triples, Home_runs, RBIs, Walks, Strikeouts, Batting_average, On_base_pct, Slugging_pct and Fielding_ave.
<div class="alert alert-info"><b>Important</b>
    
Note that dataset paths must be absolute, not relative.
</div>
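
If your paths start out relative, one way to satisfy this requirement is to normalize them before passing them to AlphaD3M. The following is a minimal sketch using only the standard library; the helper name `to_absolute` and the relative path shown are hypothetical and not part of AlphaD3M.

```python
import os


def to_absolute(path):
    """Expand '~' and return an absolute, normalized path string."""
    return os.path.abspath(os.path.expanduser(path))


# Hypothetical relative path, resolved against the current working directory.
train_dataset = to_absolute('datasets/185_baseball_MIN_METADATA/train_data.csv')
print(train_dataset)
```

The resolved path depends on the directory the notebook is started from, so it is worth printing it once to confirm it points at the intended file.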

In [1]:
output_path = '/Users/rlopez/D3M/examples/tmp/'
train_dataset = '/Users/rlopez/D3M/examples/datasets/185_baseball_MIN_METADATA/train_data.csv'
test_dataset = '/Users/rlopez/D3M/examples/datasets/185_baseball_MIN_METADATA/test_data.csv'

In [2]:
automl = AutoML(output_path)
automl.search_pipelines(train_dataset, time_bound=1, target='Hall_of_Fame', metric='f1Macro', task_keywords=['classification', 'multiClass', 'tabular'])

INFO: Initializing AlphaD3M AutoML...
INFO: AlphaD3M AutoML initialized!
INFO: Found pipeline id=87e725e9-09ac-4fa1-bf37-b96f3e101dbb, time=0:00:20.347129, scoring...
INFO: Scored pipeline id=87e725e9-09ac-4fa1-bf37-b96f3e101dbb, f1_macro=0.64214
INFO: Found pipeline id=c6384850-c1f3-42cf-8b3c-cad174a0595c, time=0:00:35.742241, scoring...
INFO: Scored pipeline id=c6384850-c1f3-42cf-8b3c-cad174a0595c, f1_macro=0.61677
INFO: Found pipeline id=279e8c54-4d3b-4a46-9a32-8321a66e899a, time=0:00:51.113531, scoring...
INFO: Search completed, still scoring some pending pipelines...
INFO: Scored pipeline id=279e8c54-4d3b-4a46-9a32-8321a66e899a, f1_macro=0.71535
INFO: Scoring completed for all pipelines!
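
The `f1_macro` values in the log are macro-averaged F1 scores: the F1 score is computed separately for each class and the results are averaged with equal weight. The sketch below illustrates that definition in plain Python. It is not AlphaD3M's scoring code, and the labels are made up for illustration.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Made-up labels for a three-class problem.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(f1_macro(y_true, y_pred))
```

Because every class contributes equally, a pipeline that performs poorly on a rare class is penalized more than it would be under plain accuracy.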


After the pipeline search is complete, we can display the leaderboard:

In [3]:
automl.plot_leaderboard()

| | ranking | id | summary | f1_macro |
|---|---|---|---|---|
| 0 | 1 | 279e8c54-4d3b-4a46-9a32-8321a66e899a | imputer.sklearn, encoder.dsbox, gradient_boosting.sklearn | 0.71535 |
| 1 | 2 | 87e725e9-09ac-4fa1-bf37-b96f3e101dbb | imputer.sklearn, encoder.dsbox, random_forest.sklearn | 0.64214 |
| 2 | 3 | c6384850-c1f3-42cf-8b3c-cad174a0595c | imputer.sklearn, encoder.dsbox, extra_trees.sklearn | 0.61677 |
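
The leaderboard ordering can be reproduced from the scores reported in the search log. The sketch below assumes that a higher `f1_macro` is better and simply sorts the three logged scores. It only illustrates the ranking and is not how AlphaD3M builds the leaderboard internally.

```python
# Scores copied from the search log above.
scores = {
    '87e725e9-09ac-4fa1-bf37-b96f3e101dbb': 0.64214,
    'c6384850-c1f3-42cf-8b3c-cad174a0595c': 0.61677,
    '279e8c54-4d3b-4a46-9a32-8321a66e899a': 0.71535,
}

# Sort by score, highest first, and number the entries from 1.
leaderboard = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for rank, (pipeline_id, score) in enumerate(leaderboard, start=1):
    print(rank, pipeline_id, score)

best_id = leaderboard[0][0]
```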


Retrieve the ID of the top-ranked pipeline and score it against the held-out test dataset:

In [4]:
best_pipeline_id = automl.get_best_pipeline_id()
automl.score(best_pipeline_id, test_dataset)

('f1_macro', 0.64322)

### Exporting Python code for a pipeline

In [5]:
automl.export_pipeline_code(best_pipeline_id)

In [6]:
from d3m_interface.pipeline import Pipeline

pipeline = Pipeline()

input_data = pipeline.make_pipeline_input()

step_0 = pipeline.make_pipeline_step('d3m.primitives.data_transformation.denormalize.Common')
pipeline.connect(input_data, step_0, from_output='dataset')

step_1 = pipeline.make_pipeline_step('d3m.primitives.data_transformation.dataset_to_dataframe.Common')
pipeline.connect(step_0, step_1)

step_2 = pipeline.make_pipeline_step('d3m.primitives.data_transformation.column_parser.Common')
pipeline.connect(step_1, step_2)

step_3 = pipeline.make_pipeline_step('d3m.primitives.data_transformation.extract_columns_by_semantic_types.Common')
pipeline.set_hyperparams(step_3, exclude_columns=[], semantic_types=['https://metadata.datadrivendiscovery.org/types/Attribute'])
pipeline.connect(step_2, step_3)

step_4 = pipeline.make_pipeline_step('d3m.primitives.data_cleaning.imputer.SKlearn')
pipeline.set_hyperparams(step_4, strategy='most_frequent')
pipeline.connect(step_3, step_4)

step_5 = pipeline.make_pipeline_step('d3m.primitives.data_preprocessing.encoder.DSBOX')
pipeline.set_hyperparams(step_5, n_limit=50)
pipeline.connect(step_4, step_5)

step_6 = pipeline.make_pipeline_step('d3m.primitives.data_transformation.extract_columns_by_semantic_types.Common')
pipeline.set_hyperparams(step_6, semantic_types=['https://metadata.datadrivendiscovery.org/types/TrueTarget'])
pipeline.connect(step_1, step_6)

step_7 = pipeline.make_pipeline_step('d3m.primitives.classification.gradient_boosting.SKlearn')
pipeline.connect(step_5, step_7)
pipeline.connect(step_6, step_7, to_input='outputs')

step_8 = pipeline.make_pipeline_step('d3m.primitives.data_transformation.construct_predictions.Common')
pipeline.connect(step_1, step_8, to_input='reference')
pipeline.connect(step_7, step_8)
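
The `connect` calls above describe a directed acyclic graph: the attribute branch (steps 2 to 5) and the target branch (step 6) both start at step 1 and meet again at the classifier in step 7. As a rough illustration, the same wiring can be written as a plain dictionary and checked with the standard library's `graphlib` (Python 3.9+). The string names are labels chosen for this sketch, not `d3m_interface` objects.

```python
from graphlib import TopologicalSorter

# Each step mapped to the steps it receives input from,
# transcribed from the pipeline.connect(...) calls.
predecessors = {
    'step_0': {'input'},
    'step_1': {'step_0'},
    'step_2': {'step_1'},
    'step_3': {'step_2'},
    'step_4': {'step_3'},
    'step_5': {'step_4'},
    'step_6': {'step_1'},            # target extraction branch
    'step_7': {'step_5', 'step_6'},  # classifier: features + targets
    'step_8': {'step_1', 'step_7'},  # predictions + reference dataframe
}

# static_order() raises graphlib.CycleError if the wiring contains a cycle.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Any valid order starts at `input` and ends at `step_8`; the relative position of the two branches between step 1 and step 7 may vary.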


The exported `pipeline` object can be evaluated against a held-out dataset with:

In [7]:
automl.score(pipeline, test_dataset)

('f1_macro', 0.64322)

After the analysis is complete, end the session to stop the process and clean up temporary files:

In [8]:
automl.end_session()

INFO: Ending session...
INFO: Session ended!
