## Adding FastText Primitives

First, import the `AutoMLClassifier` class and pandas:

In [1]:
from alpha_automl import AutoMLClassifier
import pandas as pd

### Generating Pipelines for CSV Datasets

In this example, we generate pipelines for a CSV dataset, using the sentiment dataset.

In [2]:
output_path = 'tmp/'
train_dataset = pd.read_csv('datasets/sentiment/train_data.csv')
test_dataset = pd.read_csv('datasets/sentiment/test_data.csv')

Removing the target column from the features for the train dataset

In [3]:
target_column = 'sentiment'
X_train = train_dataset.drop(columns=[target_column])
X_train

Unnamed: 0,text,Time of Tweet,Age of User,Country
0,"I`d have responded, if I were going",morning,0-20,Afghanistan
1,Sooo SAD I will miss you here in San Diego!!!,noon,21-30,Albania
2,my boss is bullying me...,night,31-45,Algeria
3,what interview! leave me alone,morning,46-60,Andorra
4,"Sons of ****, why couldn`t they put them on t...",noon,60-70,Angola
...,...,...,...,...
27476,wish we could come see u on Denver husband l...,night,31-45,Ghana
27477,I`ve wondered about rake to. The client has ...,morning,46-60,Greece
27478,Yay good for both of you. Enjoy the break - y...,noon,60-70,Grenada
27479,But it was worth it ****.,night,70-100,Guatemala


Selecting the target column for the train dataset

In [4]:
y_train = train_dataset[[target_column]]
y_train

Unnamed: 0,sentiment
0,neutral
1,negative
2,negative
3,negative
4,negative
...,...
27476,negative
27477,negative
27478,positive
27479,positive
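The feature/target split above can be checked on a small in-memory frame. The sketch below uses a made-up three-row frame (not the sentiment dataset) and assumes only pandas. It shows that `drop(columns=...)` leaves the target out of the features, and that double brackets keep the target as a one-column DataFrame that shares its index with the features.

```python
import pandas as pd

# Toy stand-in for the sentiment data; the rows are invented for illustration.
toy = pd.DataFrame({
    'text': ['great day', 'so tired', 'see you at noon'],
    'Time of Tweet': ['morning', 'night', 'noon'],
    'sentiment': ['positive', 'negative', 'neutral'],
})

target_column = 'sentiment'
X_toy = toy.drop(columns=[target_column])  # features only
y_toy = toy[[target_column]]               # one-column DataFrame, not a Series

print(list(X_toy.columns))
print(y_toy.shape)
print(X_toy.index.equals(y_toy.index))
```

Using `toy[target_column]` with single brackets would return a Series instead; the notebook keeps the DataFrame form.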


### Adding New Primitives into AlphaAutoML's Search Space

In [5]:
automl = AutoMLClassifier(output_path, time_bound=10)

In [6]:
# Download the fastText model if it is not already present
import os
import fasttext
import fasttext.util


fasttext.util.download_model('en', if_exists='ignore')  # English
# The model is saved in the current working directory; change this path if it is stored elsewhere
fasttext_model_path = os.path.join(os.getcwd(), 'cc.en.300.bin')



Downloading https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.en.300.bin.gz
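Because the model path in the cell above is built by hand, it can be worth confirming that the file exists before handing the path to the embedder. The following is a minimal sketch using only the standard library; `resolve_model_path` is a hypothetical helper introduced here for illustration, not part of fastText or Alpha-AutoML.

```python
from pathlib import Path


def resolve_model_path(directory, filename='cc.en.300.bin'):
    """Return the absolute model path as a string, or raise if the file is missing."""
    path = Path(directory).resolve() / filename
    if not path.is_file():
        raise FileNotFoundError(f'fastText model not found at {path}')
    return str(path)


# Example: look for the model in the current working directory.
try:
    checked_model_path = resolve_model_path(Path.cwd())
    print(checked_model_path)
except FileNotFoundError as err:
    print(err)
```

Failing early with a clear message is easier to debug than a load error raised later, during the pipeline search.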

In [7]:
# Wrap the fastText model and add it to AlphaAutoML as a text-encoder primitive
from alpha_automl.wrapper_primitives.fasttext import FastTextEmbedder
fasttext_embedder = FastTextEmbedder(fasttext_model_path)
automl.add_primitives([(fasttext_embedder, 'TEXT_ENCODER')])
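The tuple passed to `add_primitives` pairs a primitive object with its type, here `'TEXT_ENCODER'`. To illustrate what a text encoder does, the sketch below defines a hypothetical `MeanVectorEmbedder` with scikit-learn-style `fit` and `transform` methods that averages per-word vectors from a tiny hand-written table. It is not the `FastTextEmbedder` implementation, the word vectors are invented, and whether Alpha-AutoML would accept such a class unchanged is an assumption.

```python
import numpy as np


class MeanVectorEmbedder:
    """Hypothetical text encoder: averages word vectors from a lookup table."""

    def __init__(self, word_vectors, dim):
        self.word_vectors = word_vectors  # dict mapping word -> list of floats
        self.dim = dim

    def fit(self, X, y=None):
        # Nothing to learn here; the vectors are assumed to be pre-trained.
        return self

    def transform(self, texts):
        rows = []
        for text in texts:
            vectors = [self.word_vectors[w]
                       for w in str(text).lower().split()
                       if w in self.word_vectors]
            # Texts with no known words map to the zero vector.
            rows.append(np.mean(vectors, axis=0) if vectors else np.zeros(self.dim))
        return np.vstack(rows)


toy_vectors = {'good': [1.0, 0.0], 'bad': [0.0, 1.0]}  # invented 2-d vectors
embedder = MeanVectorEmbedder(toy_vectors, dim=2)
features = embedder.fit(None).transform(['good good bad', 'unknown words'])
print(features.shape)
```

Each text becomes one fixed-length numeric row, which is the shape downstream scalers and classifiers expect.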

### Searching for Pipelines

In [8]:
automl.fit(X_train, y_train)

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.31596565274341437
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:38, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.36646776306214524
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:26, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4217726677339543
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4045990394411294
INFO:alpha_automl.automl_api:Found pipeline, time=0:02:51, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6277106680250327
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.43559889390190654
INFO:alpha_automl.automl_api:Found pipeline, time=0:04:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.31596565274341437
INFO:alpha_automl.automl_api:Found pipeline, time=0:04:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6186872362101586
INFO:alpha_automl.automl_api:Found pipeline, time=0:04:18, scoring...
DEBUG:matplotlib:matplotlib data path: /ext3/miniconda3/lib/python3.10/site-packages/matplotlib/mpl-data
DEBUG:matplotlib:CONFIGDIR=/home/yfw215/.config/matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is linux
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4102750691311308
INFO:alpha_automl.automl_api:Found pipeline, time=0:05:17, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5280163003929559
INFO:alpha_automl.automl_api:Found pipeline, time=0:05:17, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:05:18, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.36675884150778637
INFO:alpha_automl.automl_api:Found pipeline, time=0:06:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:06:17, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6787949352350459
INFO:alpha_automl.automl_api:Found pipeline, time=0:06:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:06:31, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4111483044680541
INFO:alpha_automl.automl_api:Found pipeline, time=0:07:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6144665987483627
INFO:alpha_automl.automl_api:Found pipeline, time=0:07:21, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:07:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4142046281472857
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.605006549265027
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:27, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3169844273031582
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:27, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6453209139863193
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:28, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:08:28, scoring...
INFO:alpha_aut



INFO:alpha_automl.automl_api:Scored pipeline, score=0.5613447824188619
INFO:alpha_automl.automl_api:Found pipeline, time=0:09:53, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4222092854024159
INFO:alpha_automl.automl_api:Found pipeline, time=0:09:53, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6917479260660748
INFO:alpha_automl.automl_api:Found pipeline, time=0:09:55, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5268519866103915
INFO:alpha_automl.automl_api:Found pipeline, time=0:09:56, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.42337359918498035
INFO:alpha_automl.automl_api:Found 34 pipelines
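The `Scored pipeline` lines in the log above follow a regular pattern, so the scores can be collected with a regular expression when the log has been saved as text. The sketch below is a standard-library illustration; `extract_scores` is a hypothetical helper, and the sample text is copied from the last few lines of the log rather than read from a file.

```python
import re

# Matches the plain decimal scores that appear in the log above.
SCORE_PATTERN = re.compile(r'Scored pipeline, score=([0-9.]+)')


def extract_scores(log_text):
    """Return every pipeline score found in the log text, in order."""
    return [float(value) for value in SCORE_PATTERN.findall(log_text)]


sample_log = """INFO:alpha_automl.automl_api:Found pipeline, time=0:09:53, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4222092854024159
INFO:alpha_automl.automl_api:Found pipeline, time=0:09:53, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6917479260660748
"""

scores = extract_scores(sample_log)
print(len(scores), max(scores))
```

The pattern only handles plain decimals; scores written in scientific notation would need a broader expression.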


### Exploring Pipelines

After the pipeline search is complete, we can display the leaderboard:

In [9]:
automl.plot_leaderboard()

DEBUG:matplotlib:CACHEDIR=/home/yfw215/.cache/matplotlib
DEBUG:matplotlib.font_manager:font search path [PosixPath('/ext3/miniconda3/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/ttf'), PosixPath('/ext3/miniconda3/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/afm'), PosixPath('/ext3/miniconda3/lib/python3.10/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts')]
INFO:matplotlib.font_manager:generated new fontManager


ranking,pipeline,accuracy_score
1,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, SelectPercentile, LogisticRegression",0.692
2,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, LogisticRegression",0.679
3,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, SelectPercentile, PassiveAggressiveClassifier",0.645
4,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, DecisionTreeClassifier",0.628
5,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, SelectPercentile, DecisionTreeClassifier",0.619
6,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, PassiveAggressiveClassifier",0.614
7,"SimpleImputer, ColumnTransformer, TfidfVectorizer, OneHotEncoder, MaxAbsScaler, DecisionTreeClassifier",0.605
8,"SimpleImputer, ColumnTransformer, FastTextEmbedder, OneHotEncoder, MaxAbsScaler, PassiveAggressiveClassifier",0.561
9,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, SelectKBest, DecisionTreeClassifier",0.528
10,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, SelectKBest, LogisticRegression",0.527
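To see how the newly added primitive fared, the leaderboard rows can be filtered by primitive name. The sketch below assumes the leaderboard is available as CSV text in the shape shown above (three rows are copied here); `pipelines_using` is a hypothetical helper, and no export function of Alpha-AutoML is implied.

```python
import csv
import io

# Three rows copied from the leaderboard above.
leaderboard_csv = '''ranking,pipeline,accuracy_score
7,"SimpleImputer, ColumnTransformer, TfidfVectorizer, OneHotEncoder, MaxAbsScaler, DecisionTreeClassifier",0.605
8,"SimpleImputer, ColumnTransformer, FastTextEmbedder, OneHotEncoder, MaxAbsScaler, PassiveAggressiveClassifier",0.561
9,"SimpleImputer, ColumnTransformer, CountVectorizer, OneHotEncoder, MaxAbsScaler, SelectKBest, DecisionTreeClassifier",0.528
'''


def pipelines_using(primitive, csv_text):
    """Return (ranking, accuracy) pairs for pipelines that contain the primitive."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(int(row['ranking']), float(row['accuracy_score']))
            for row in rows
            if primitive in [step.strip() for step in row['pipeline'].split(',')]]


print(pipelines_using('FastTextEmbedder', leaderboard_csv))
```

In this run, the only top-10 pipeline that uses `FastTextEmbedder` ranks 8th with an accuracy of 0.561, behind the `CountVectorizer` and `TfidfVectorizer` pipelines.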


In [None]:
automl.plot_comparison_pipelines()