## Use Case 2

First, import the `AutoML` class. To run AlphaD3M through Docker or Singularity, import the `DockerAutoML` or `SingularityAutoML` class instead.

In [1]:
from alphad3m import AutoML
# Container alternative: from alphad3m_containers import DockerAutoML as AutoML (or SingularityAutoML as AutoML)

In this example, we generate pipelines for a CSV dataset, the [185_baseball_MIN_METADATA dataset](https://gitlab.com/ViDA-NYU/d3m/alphad3m/-/tree/devel/examples/datasets). It contains information about baseball players and their play statistics, including `Games_played`, `At_bats`, `Runs`, `Hits`, `Doubles`, `Triples`, `Home_runs`, `RBIs`, `Walks`, `Strikeouts`, `Batting_average`, `On_base_pct`, `Slugging_pct` and `Fielding_ave`. The target predicted below is `Hall_of_Fame`.

In [2]:
output_path = '/Users/rlopez/D3M/examples/tmp/'
train_dataset = '/Users/rlopez/D3M/examples/datasets/185_baseball_MIN_METADATA/train_data.csv'
test_dataset = '/Users/rlopez/D3M/examples/datasets/185_baseball_MIN_METADATA/test_data.csv'
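
The paths above are absolute and specific to one machine. As a minimal sketch, the same three variables can be built from a single base directory with `pathlib`. The `base_dir` location is a hypothetical choice, and the paths are converted to strings on the assumption that `AutoML` expects plain string paths, as in the cell above.

```python
from pathlib import Path

# Hypothetical base directory: point this at wherever the examples folder lives.
base_dir = Path.home() / 'D3M' / 'examples'
dataset_dir = base_dir / 'datasets' / '185_baseball_MIN_METADATA'

# Converted to str because the original cell passes plain strings.
output_path = str(base_dir / 'tmp')
train_dataset = str(dataset_dir / 'train_data.csv')
test_dataset = str(dataset_dir / 'test_data.csv')
```

Changing `base_dir` is then the only edit needed to run the notebook on another machine.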

In [3]:
automl = AutoML(output_path)
automl.search_pipelines(
    train_dataset,
    time_bound=30,
    target='Hall_of_Fame',
    metric='f1Macro',
    task_keywords=['classification', 'multiClass', 'tabular'],
    method='cross_validation',
)

INFO: Initializing AlphaD3M AutoML...
INFO: Starting process...
INFO: Connecting via gRPC to localhost:47086...
INFO: AlphaD3M AutoML initialized!
INFO: Found pipeline id=22411ed6-623b-411d-a5a5-41249404b3b4, time=0:06:50.848407, scoring...
INFO: Found pipeline id=95695c63-c766-4078-9de7-cc876bddcb07, time=0:07:06.038738, scoring...
INFO: Found pipeline id=690c996c-e7df-44a4-a753-dc71e22a8b68, time=0:07:21.233511, scoring...
INFO: Found pipeline id=0cf7e562-dc32-4588-be74-4ed3c5414505, time=0:07:42.471797, scoring...
INFO: Found pipeline id=a4992507-b8e1-43a3-bb9a-41d2d40e5e6e, time=0:08:03.888186, scoring...
INFO: Scored pipeline id=22411ed6-623b-411d-a5a5-41249404b3b4, f1_macro=0.5448
INFO: Scored pipeline id=95695c63-c766-4078-9de7-cc876bddcb07, f1_macro=0.5448
INFO: Found pipeline id=24eb7dd2-ee51-4b2d-b843-3050a5ef4674, time=0:08:19.138738, scoring...
INFO: Scored pipeline id=690c996c-e7df-44a4-a753-dc71e22a8b68, f1_macro=0.57038
INFO: Found pipeline id=fc5659e1-4064-4d11-8fb0-aab

INFO: Scoring completed for all pipelines!
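
The search ranks pipelines by the `f1_macro` value reported in the log, requested through `metric='f1Macro'`. As a rough illustration of what that metric measures, the sketch below computes a macro-averaged F1 score in plain Python using the standard definition: the unweighted mean of the per-class F1 scores. It does not reproduce AlphaD3M's internal scoring code, and the labels are made up for illustration rather than taken from the baseball dataset.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-F1 definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for a three-class problem (illustration only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
score = f1_macro(y_true, y_pred)
```

Because every class contributes equally to the average, a pipeline that performs poorly on a rare class is penalized even if its overall accuracy is high.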


In [4]:
automl.plot_leaderboard()

ranking,id,summary,f1_macro
1,435d1b64-b895-4b93-a140-96bdbf024071,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, gradient_boosting.sklearn",0.65594
2,3f1219a0-84f9-444d-80b2-e530c2d40dec,"add_semantic_types.common, imputer.sklearn, encoder.distiltextencoder, encoder.dsbox, pca_features.pcafeatures, gradient_boosting.sklearn",0.65587
3,b5f0ac3d-d26f-460e-9678-ad90166fe2cd,"add_semantic_types.common, imputer.sklearn, encoder.distiltextencoder, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, gradient_boosting.sklearn",0.65587
4,a0adaa79-5015-4537-8332-bf63033e3675,"add_semantic_types.common, imputer.sklearn, encoder.distiltextencoder, one_hot_encoder.distilonehotencoder, gradient_boosting.sklearn",0.64787
5,eed687a8-0221-4a04-b2ec-607c3a39ab46,"add_semantic_types.common, imputer.sklearn, encoder.distiltextencoder, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, random_forest.common",0.64737
6,1e60cc1c-0e36-485d-86e5-8126e42583ca,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, bagging.sklearn",0.64533
7,754bde19-e2d7-4838-82b4-989c2cfcb31c,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, gradient_boosting.sklearn",0.64376
8,800cc19d-e56b-418d-9f67-65e9d7249a14,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, encoder.dsbox, pca_features.pcafeatures, gradient_boosting.sklearn",0.64376
9,8fe800f9-b5c4-4330-8638-afc0181903ae,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, light_gbm.common",0.64092
10,e9909906-ba78-4c5e-bcfe-1850ec8234d8,"add_semantic_types.common, mean_imputation.dsbox, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, random_forest.common",0.6332
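
Each `summary` entry lists the steps of a pipeline in order, and the last step appears to be the estimator. The sketch below parses leaderboard text with Python's standard `csv` module and extracts that final step for each row. It assumes the leaderboard is available as CSV text in the form shown above. The three rows are copied from ranks 1, 5 and 9, and no AlphaD3M API is used.

```python
import csv
import io

# Rows copied from the leaderboard above (ranks 1, 5 and 9).
leaderboard_csv = '''ranking,id,summary,f1_macro
1,435d1b64-b895-4b93-a140-96bdbf024071,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, gradient_boosting.sklearn",0.65594
5,eed687a8-0221-4a04-b2ec-607c3a39ab46,"add_semantic_types.common, imputer.sklearn, encoder.distiltextencoder, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, random_forest.common",0.64737
9,8fe800f9-b5c4-4330-8638-afc0181903ae,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, light_gbm.common",0.64092
'''

rows = list(csv.DictReader(io.StringIO(leaderboard_csv)))

final_steps = []
for row in rows:
    # The summary is a comma-separated list of steps inside one quoted field.
    steps = [step.strip() for step in row['summary'].split(',')]
    final_steps.append(steps[-1])
    print(row['ranking'], steps[-1], float(row['f1_macro']))
```

Grouping rows by their final step is a quick way to see which estimator families dominate the top of the leaderboard.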


In [5]:
best_pipeline_id = automl.get_best_pipeline_id()
automl.score(best_pipeline_id, test_dataset)

('f1_macro', 0.61315)

In [6]:
pipelines_without_blacklist = automl.create_pipelineprofiler_inputs()

INFO: Inputs for PipelineProfiler created!


In [None]:
automl.plot_comparison_pipelines(precomputed_pipelines=pipelines_without_blacklist)

Next, blacklist the 'FEATURE_SCALING' and 'FEATURE_SELECTION' types and rerun the search with a shorter time bound:

In [14]:
automl2 = AutoML(output_path)
automl2.search_pipelines(
    train_dataset,
    time_bound=10,
    target='Hall_of_Fame',
    metric='f1Macro',
    task_keywords=['classification', 'multiClass', 'tabular'],
    method='cross_validation',
)

INFO: Initializing AlphaD3M AutoML...
INFO: Starting process...
INFO: Connecting via gRPC to localhost:61650...
INFO: AlphaD3M AutoML initialized!
INFO: Found pipeline id=94bbc90a-47b3-4026-a4a1-1a53034f4450, time=0:01:08.766733, scoring...
INFO: Found pipeline id=52f1f6cc-e68e-4f7f-9af9-38dafede5f9b, time=0:01:20.938401, scoring...
INFO: Found pipeline id=8591447d-ff6f-4fad-a7ea-727e1ff81985, time=0:01:39.116916, scoring...
INFO: Scored pipeline id=52f1f6cc-e68e-4f7f-9af9-38dafede5f9b, f1_macro=0.40786
INFO: Found pipeline id=80b8f9ce-daef-46c4-b162-b269dcdeb868, time=0:01:54.324105, scoring...
INFO: Scored pipeline id=94bbc90a-47b3-4026-a4a1-1a53034f4450, f1_macro=0.64376
INFO: Found pipeline id=658adb0c-1f8f-4c00-a094-bd4bd6e1190e, time=0:02:09.546476, scoring...
INFO: Scored pipeline id=8591447d-ff6f-4fad-a7ea-727e1ff81985, f1_macro=0.64533
INFO: Scored pipeline id=80b8f9ce-daef-46c4-b162-b269dcdeb868, f1_macro=0.47896
INFO: Scored pipeline id=658adb0c-1f8f-4c00-a094-bd4bd6e1190e, 

After the pipeline search is complete, we can display the leaderboard:

In [20]:
automl2.plot_leaderboard()

ranking,id,summary,f1_macro
1,8591447d-ff6f-4fad-a7ea-727e1ff81985,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, bagging.sklearn",0.64533
2,94bbc90a-47b3-4026-a4a1-1a53034f4450,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, encoder.dsbox, pca_features.pcafeatures, gradient_boosting.sklearn",0.64376
3,100802de-3bf4-4b74-b514-e925a24f827d,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, one_hot_encoder.distilonehotencoder, pca_features.pcafeatures, gradient_boosting.sklearn",0.64376
4,d2aa8126-e846-49fb-8413-3b84f53c3706,"add_semantic_types.common, imputer.sklearn, tfidf_vectorizer.sklearn, pca_features.pcafeatures, gradient_boosting.sklearn",0.6355
5,647f64bf-df54-4f4d-be44-560c5484baa8,"add_semantic_types.common, mean_imputation.dsbox, tfidf_vectorizer.sklearn, encoder.dsbox, pca_features.pcafeatures, gradient_boosting.sklearn",0.62937
6,70388555-358e-483a-b0fa-4db57d09b5ab,"add_semantic_types.common, data_cleaning.datacleaning, xgboost_gbtree.common",0.62509
7,c78b642f-bb8d-433f-9332-6eacfc503f93,"add_semantic_types.common, data_cleaning.datacleaning, imputer.sklearn, xgboost_gbtree.common",0.62332
8,18aacd17-4eac-4fa5-83eb-c2fc05f235b9,"add_semantic_types.common, imputer.sklearn, xgboost_gbtree.common",0.62332
9,ad26057a-9f85-4c10-8bcb-2e820b6f66ea,"add_semantic_types.common, imputer.sklearn, corex_text.dsbox, pca_features.pcafeatures, gradient_boosting.sklearn",0.62129
10,bec1c8d6-4525-4f8f-a35f-7b85d01c8a16,"add_semantic_types.common, data_cleaning.datacleaning, mean_imputation.dsbox, xgboost_gbtree.common",0.62047


In [22]:
best_pipeline_id = automl2.get_best_pipeline_id()
automl2.score(best_pipeline_id, test_dataset)

('f1_macro', 0.72438)