## Adding HuggingFace Primitives

First, import the `AutoMLClassifier` class and pandas.

```python
from alpha_automl import AutoMLClassifier
import pandas as pd
```

### Generating Pipelines for CSV Datasets

In this example, we generate pipelines for a CSV dataset: the sentiment dataset, a collection of tweets labeled by sentiment. The training and test splits are stored in separate files.

```python
output_path = 'tmp/'  # output folder passed to AutoMLClassifier below
train_dataset = pd.read_csv('datasets/sentiment/train_data.csv')
test_dataset = pd.read_csv('datasets/sentiment/test_data.csv')
```

Remove the target column from the training data to get the features:

```python
target_column = 'sentiment'
X_train = train_dataset.drop(columns=[target_column])
X_train
```

| | text | Time of Tweet | Age of User | Country |
|---|---|---|---|---|
| 0 | I\`d have responded, if I were going | morning | 0-20 | Afghanistan |
| 1 | Sooo SAD I will miss you here in San Diego!!! | noon | 21-30 | Albania |
| 2 | my boss is bullying me... | night | 31-45 | Algeria |
| 3 | what interview! leave me alone | morning | 46-60 | Andorra |
| 4 | Sons of \*\*\*\*, why couldn\`t they put them on t... | noon | 60-70 | Angola |
| ... | ... | ... | ... | ... |
| 27476 | wish we could come see u on Denver husband l... | night | 31-45 | Ghana |
| 27477 | I\`ve wondered about rake to. The client has ... | morning | 46-60 | Greece |
| 27478 | Yay good for both of you. Enjoy the break - y... | noon | 60-70 | Grenada |
| 27479 | But it was worth it \*\*\*\*. | night | 70-100 | Guatemala |


Select the target column from the training data:

```python
y_train = train_dataset[[target_column]]
y_train
```

| | sentiment |
|---|---|
| 0 | neutral |
| 1 | negative |
| 2 | negative |
| 3 | negative |
| 4 | negative |
| ... | ... |
| 27476 | negative |
| 27477 | negative |
| 27478 | positive |
| 27479 | positive |
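The two selections above differ in shape: `drop(columns=[...])` keeps every column except the target, and double-bracket indexing (`df[[col]]`) returns a one-column DataFrame rather than a Series. A minimal sketch on a made-up three-row frame (not the real sentiment data) that shows both:

```python
import pandas as pd

# Toy frame with the same layout as the sentiment data (made-up rows, illustration only)
df = pd.DataFrame({
    "text": ["great day", "so sad", "ok I guess"],
    "Time of Tweet": ["morning", "noon", "night"],
    "sentiment": ["positive", "negative", "neutral"],
})
target_column = "sentiment"

# Features: every column except the target
X = df.drop(columns=[target_column])

# Double brackets return a one-column DataFrame (the form used in this notebook)...
y_frame = df[[target_column]]
# ...while single brackets return a 1-D Series
y_series = df[target_column]

print(list(X.columns))                          # ['text', 'Time of Tweet']
print(type(y_frame).__name__, y_frame.shape)    # DataFrame (3, 1)
print(type(y_series).__name__, y_series.shape)  # Series (3,)
```

Many scikit-learn estimators expect a 1-D target and warn when given a column vector. Whether `AutoMLClassifier` prefers a Series or a DataFrame is not stated in this notebook, so the DataFrame form above is kept as-is.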


### Adding New Primitives into AlphaAutoML's Search Space

```python
# time_bound limits how long the pipeline search runs
automl = AutoMLClassifier(output_path, time_bound=10)
```

```python
import os

import fasttext
import fasttext.util

# Download the English FastText model (cc.en.300.bin) into the current directory,
# skipping the download if the file already exists
fasttext.util.download_model('en', if_exists='ignore')

# Change this if the model was downloaded somewhere else
fasttext_model_path = os.path.join(os.getcwd(), 'cc.en.300.bin')
```
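Because the model file is large and its location is set by hand, it can help to fail early with a clear error when the path is wrong, rather than deep inside the embedder. A minimal standard-library sketch; `resolve_fasttext_model` is a hypothetical helper, not part of `fasttext` or AlphaAutoML:

```python
import os
import tempfile


def resolve_fasttext_model(directory: str, filename: str = "cc.en.300.bin") -> str:
    """Hypothetical helper: return the full path to a FastText .bin model,
    raising FileNotFoundError if it is missing.

    `cc.en.300.bin` is the file name used in this notebook; pass
    `directory` explicitly if the model was downloaded somewhere else.
    """
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"FastText model not found at {path}")
    return path


# Usage with a throwaway directory standing in for the real download location
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "cc.en.300.bin"), "wb").close()  # empty placeholder file
    print(resolve_fasttext_model(tmp).endswith("cc.en.300.bin"))  # True
```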



```python
# Wrap the FastText model in AlphaAutoML's FastTextEmbedder and add it
# to the search space as a TEXT_ENCODER primitive
from alpha_automl.wrapper_primitives.fasttext import FastTextEmbedder

fasttext_embedder = FastTextEmbedder(fasttext_model_path)
automl.add_primitives([(fasttext_embedder, 'TEXT_ENCODER')])
```
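Judging by its name and the `TEXT_ENCODER` label, the embedder's job is to turn raw text into fixed-length numeric features that downstream classifiers can use. One common way to do that with pre-trained word vectors is to average the vectors of a document's words. The sketch below illustrates that idea with a made-up two-dimensional vocabulary and a scikit-learn-style `fit`/`transform` interface. It is not the source of `FastTextEmbedder`, and the class `AverageWordEmbedder` and its toy vectors are hypothetical:

```python
from typing import Dict, List, Sequence


class AverageWordEmbedder:
    """Hypothetical text encoder: maps each document to the mean of its word vectors.

    Follows the scikit-learn transformer convention: fit returns self,
    transform returns one row per input document.
    """

    def __init__(self, vectors: Dict[str, List[float]], dim: int):
        self.vectors = vectors  # word -> embedding (toy values here)
        self.dim = dim          # embedding length, e.g. 300 for cc.en.300.bin

    def fit(self, X: Sequence[str], y=None) -> "AverageWordEmbedder":
        return self  # nothing to learn: the vectors are pre-trained

    def transform(self, X: Sequence[str]) -> List[List[float]]:
        rows = []
        for doc in X:
            # Lowercase, split on whitespace, keep only words with a known vector
            known = [self.vectors[w] for w in doc.lower().split() if w in self.vectors]
            if not known:
                rows.append([0.0] * self.dim)  # no known words: zero vector
                continue
            # Average each embedding dimension across the known words
            rows.append([sum(col) / len(known) for col in zip(*known)])
        return rows


toy_vectors = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "day": [0.5, 0.5]}
encoder = AverageWordEmbedder(toy_vectors, dim=2)
print(encoder.fit(["good day"]).transform(["Good day", "bad", "unknown words"]))
# [[0.75, 0.25], [0.0, 1.0], [0.0, 0.0]]
```

How AlphaAutoML routes the `text` column to a `TEXT_ENCODER` primitive is not shown in this notebook, so the sketch takes a plain list of strings.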

### Searching Pipelines

```python
automl.fit(X_train, y_train)
```

```
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.31596565274341437
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:25, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.36646776306214524
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:00, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4217726677339543
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4045990394411294
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:36, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6277106680250327
INFO:alpha_
```
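The search log reports a score for each pipeline as it is found. If you only have the saved log text, a minimal standard-library sketch can pull the scores out with a regular expression. It assumes the `score=<number>` format shown in this log and that a higher score is better, as with accuracy-style metrics; the log itself does not name the metric:

```python
import re

# The "Scored pipeline" lines copied from the search log above
log = """\
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4252656090816475
INFO:alpha_automl.automl_api:Scored pipeline, score=0.31596565274341437
INFO:alpha_automl.automl_api:Scored pipeline, score=0.36646776306214524
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4217726677339543
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4045990394411294
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6277106680250327
"""

# Extract every "score=<float>" value (plain decimals only, no exponents or signs)
scores = [float(s) for s in re.findall(r"score=([0-9.]+)", log)]
print(len(scores), max(scores))  # 7 0.6277106680250327
```

The leaderboard remains the notebook's own way to rank pipelines; parsing the log is only a fallback when working from saved output.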

### Exploring Pipelines

After the pipeline search is complete, display the leaderboard and compare the pipelines:

```python
automl.plot_leaderboard()
```

```python
automl.plot_comparison_pipelines()
```