## Solving Semi-Supervised Classification Tasks

First, import the class `AutoMLSemiSupervisedClassifier`:

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from alpha_automl import AutoMLSemiSupervisedClassifier

### Generating Pipelines for CSV Datasets

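In this example, we generate pipelines for a CSV dataset. After splitting, rows with a missing target are removed from the test set so that it contains only labeled examples.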

In [2]:
output_path = 'tmp/'
dataset = pd.read_csv('datasets/semi/learningData.csv')
target_col = "defects"

X = dataset.drop(columns=[target_col])
y = dataset[[target_col]]

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Keep only the labeled rows in the test set (rows whose target is not missing)
labeled_test = y_test[target_col].notna()
X_test = X_test[labeled_test]
y_test = y_test[labeled_test]
y_test.value_counts()

defects
False      466
True       110
dtype: int64
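
The filtering above suggests that unlabeled rows carry a missing value in the target column. The following minimal sketch uses a small made-up DataFrame (the `loc` column and all values are hypothetical, not taken from `learningData.csv`) to show how labeled and unlabeled rows can be counted and separated with the same kind of mask:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the values are illustrative only.
toy = pd.DataFrame({
    "loc": [10, 25, 7, 42, 13, 30],
    "defects": [True, np.nan, False, np.nan, False, True],
})

target_col = "defects"

# True where a label is present, False where the row is unlabeled.
labeled_mask = toy[target_col].notna()

n_labeled = int(labeled_mask.sum())
n_unlabeled = int((~labeled_mask).sum())
print(f"labeled={n_labeled}, unlabeled={n_unlabeled}")

# Keep only labeled rows, mirroring the X_test / y_test filtering.
X_labeled = toy.loc[labeled_mask].drop(columns=[target_col])
y_labeled = toy.loc[labeled_mask, [target_col]]
```

Checking these counts on `y_train` before calling `fit` is a quick way to confirm that the training split still contains both labeled and unlabeled examples.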

### Searching Pipelines

In [3]:
automl = AutoMLSemiSupervisedClassifier(output_path, time_bound=10, verbose=False,
                                        split_strategy_kwargs={'test_size':.2})

automl.fit(X_train, y_train)

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9136242208370436
INFO:gluonts.mx.context:Using CPU
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.778272484416741
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7800534283170079
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:25, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.80

### Exploring Pipelines

After the pipeline search is complete, we can display the leaderboard:

In [4]:
automl.plot_leaderboard()

| ranking | pipeline | f1_score |
|---|---|---|
| 1 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier, RandomForestClassifier | 1.0 |
| 2 | SimpleImputer, MaxAbsScaler, AutonBox, RandomForestClassifier | 1.0 |
| 3 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier, XGBClassifier | 1.0 |
| 4 | SimpleImputer, AutonBox, RandomForestClassifier | 1.0 |
| 5 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier, GradientBoostingClassifier | 1.0 |
| 6 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier, DecisionTreeClassifier | 1.0 |
| 7 | SimpleImputer, RobustScaler, SelfTrainingClassifier, RandomForestClassifier | 1.0 |
| 8 | SimpleImputer, StandardScaler, SelfTrainingClassifier, RandomForestClassifier | 1.0 |
| 9 | SimpleImputer, RobustScaler, AutonBox, RandomForestClassifier | 1.0 |
| 10 | SimpleImputer, StandardScaler, AutonBox, RandomForestClassifier | 1.0 |
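
The top-ranked leaderboard entry chains `SimpleImputer`, `MaxAbsScaler`, `SelfTrainingClassifier`, and `RandomForestClassifier`. As a rough illustration of what such a pipeline does, the sketch below rebuilds a similar chain directly in scikit-learn on synthetic data. It is not the pipeline object produced by the search: the hyperparameters, the synthetic dataset, and the variable names are assumptions. Note also that scikit-learn's `SelfTrainingClassifier` expects unlabeled samples to be marked with `-1`, whereas the CSV in this notebook appears to use missing values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic data for illustration; not the dataset used in this notebook.
X_demo, y_true = make_classification(n_samples=200, n_features=8, random_state=0)

# Hide roughly 70% of the labels. scikit-learn marks unlabeled samples with -1.
rng = np.random.RandomState(0)
y_semi = y_true.copy()
unlabeled = rng.rand(len(y_semi)) < 0.7
y_semi[unlabeled] = -1

# Impute, scale, then self-train a random forest: the base classifier is fit on
# the labeled rows and iteratively pseudo-labels confident unlabeled rows.
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    MaxAbsScaler(),
    SelfTrainingClassifier(RandomForestClassifier(n_estimators=50, random_state=0)),
)
pipeline.fit(X_demo, y_semi)

demo_pred = pipeline.predict(X_demo)
```

The self-training wrapper is what makes the unlabeled rows useful: a plain `RandomForestClassifier` would have to discard them.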


To explore the produced pipelines, we can use [PipelineProfiler](https://github.com/VIDA-NYU/PipelineVis), a visualization tool that lets users compare and explore the pipelines generated by the AlphaAutoML system.

Once the pipeline search has finished, we can launch PipelineProfiler with:

In [None]:
automl.plot_comparison_pipelines()

### Testing Pipelines

Predictions on the test set are obtained with `predict`:

In [6]:
y_pred = automl.predict(X_test)
y_pred

array([[False],
       [True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [True],
       [False],
       [False],
       [False],
       [False],
       [True],
       [False],
       [True],
       [False],
       [False],
       [False],
       [False],
       [True],
       [False],
       [False],
       [False],
       [False],
       [True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [True],
       [True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [True],
       [False],
       [True],
       [False],
       [False],
       [False],
       [True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
   

In [7]:
automl.score(X_test, y_test)

INFO:alpha_automl.automl_api:Metric: f1_score, Score: 1.0


{'metric': 'f1_score', 'score': 1.0}
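
To sanity-check a reported F1 score, it can help to recompute it independently and look at the confusion matrix. The sketch below does this with scikit-learn on small made-up label arrays (the values are illustrative and unrelated to the results above). It assumes binary F1 on the positive class `True`; whether `automl.score` uses the same averaging is not stated in this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Made-up ground truth and predictions, for illustration only.
y_true_demo = np.array([False, True, False, False, True, True, False, False])
y_pred_demo = np.array([False, True, False, True, True, False, False, False])

# Binary F1 for the positive class: 2 * precision * recall / (precision + recall).
f1 = f1_score(y_true_demo, y_pred_demo)

# Rows are true classes and columns are predicted classes, ordered [False, True].
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[False, True])
tn, fp, fn, tp = cm.ravel()

print(f"f1={f1:.3f}, tn={tn}, fp={fp}, fn={fn}, tp={tp}")
```

With the notebook's own objects, the predictions returned by `predict` are two-dimensional, so they would likely need flattening (for example with `ravel()`) before being passed to `f1_score`.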