## Solving Semi-Supervised Classification Tasks

First, import the `AutoMLSemiSupervisedClassifier` class:

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from alpha_automl import AutoMLSemiSupervisedClassifier

### Generating Pipelines for CSV Datasets

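In this example, we generate pipelines for a CSV dataset whose target column, `defects`, is only partially labeled. Rows with a missing target are removed from the test split so that it can be scored.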

In [2]:
output_path = 'tmp/'
dataset = pd.read_csv('datasets/semi/learningData.csv')
target_col = "defects"

X = dataset.drop(columns=[target_col])
y = dataset[[target_col]]

X_train, X_test, y_train, y_test = train_test_split(X, y)

# Keep only the test rows that have a label; rows with a missing target cannot be scored.
labeled_mask = y_test[target_col].notna()
X_test = X_test[labeled_mask]
y_test = y_test[labeled_mask]
y_test.value_counts()

defects
False      466
True       110
dtype: int64
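
The filtering step above relies on unlabeled rows having a missing target. As a minimal sketch on a made-up frame (the `loc` column and all values are illustrative assumptions, not taken from `learningData.csv`), the labeled and unlabeled rows can be counted and separated like this:

```python
import pandas as pd

# Hypothetical toy data: None marks a row without a label.
df = pd.DataFrame({
    "loc": [10, 25, 7, 40, 13, 55],
    "defects": [True, None, False, None, False, True],
})
target_col = "defects"

# Boolean mask that is True only where a label is present.
labeled_mask = df[target_col].notna()

n_labeled = int(labeled_mask.sum())
n_unlabeled = int((~labeled_mask).sum())

# Labeled subset, e.g. for evaluation; the unlabeled rows stay in the training data.
X_labeled = df.loc[labeled_mask, ["loc"]]
y_labeled = df.loc[labeled_mask, target_col]

print(f"labeled={n_labeled}, unlabeled={n_unlabeled}")
```

Checking these counts before the search is a quick way to confirm that the dataset really contains unlabeled rows.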

### Searching Pipelines

In [3]:
automl = AutoMLSemiSupervisedClassifier(output_path, time_bound=10, verbose=False,
                                        split_strategy_kwargs={'test_size': 0.2})

automl.fit(X_train, y_train)

INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9786286731967943
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9884238646482636

Exception ignored on calling ctypes callback function: <function _ThreadpoolInfo._find_modules_with_dl_iterate_phdr.<locals>.match_module_callback at 0x147d5e9f2b00>
Traceback (most recent call last):
  File "/ext3/miniconda3/lib/python3.10/site-packages/threadpoolctl.py", line 400, in match_module_callback
    self._make_module_from_path(filepath)
  File "/ext3/miniconda3/lib/python3.10/site-packages/threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "/ext3/miniconda3/lib/python3.10/site-packages/threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "/ext3/miniconda3/lib/python3.10/site-packages/threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'

INFO:alpha_automl.automl_api:Scored pipeline, score=0.9171861086375779
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7934105075690115
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7818343722172753
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.8806767586821014
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7889581478183437
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9269813000890472
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9919857524487978
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:23, scoring...

INFO:alpha_automl.automl_api:Scored pipeline, score=0.9893143365983972
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:31, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9109528049866429
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:32, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9127337488869101
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:33, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7791629563668745
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:35, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.8023152270703473
INFO:alpha_automl.automl_api:Found pipeline, time=0:02:31, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:02:31, scoring...

INFO:alpha_automl.automl_api:Scored pipeline, score=1.0
INFO:alpha_automl.automl_api:Found pipeline, time=0:02:43, scoring...


INFO:alpha_automl.automl_api:Scored pipeline, score=0.9145146927871772
INFO:alpha_automl.automl_api:Found pipeline, time=0:02:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:02:49, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:02, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.8815672306322351
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:06, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9456812110418522
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:08, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6010685663401603
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:08, scoring...

INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:09, scoring...



INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664


INFO:alpha_automl.automl_api:Found pipeline, time=0:03:10, scoring...


INFO:alpha_automl.automl_api:Scored pipeline, score=0.8121104185218165
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:10, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=1.0
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7836153161175423
INFO:alpha_automl.automl_api:Found pipeline, time=0:03:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.9991095280498664
INFO:alpha_automl.automl_api:Found 55 pipelines
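
The `split_strategy_kwargs={'test_size': .2}` argument used above presumably reserves 20% of the training rows as a holdout set for scoring each candidate pipeline. The sketch below shows what a holdout split of that size does to the row indices; `holdout_split` is a hypothetical helper written for illustration, not AlphaAutoML's internal code.

```python
import random


def holdout_split(n_rows, test_size=0.2, seed=0):
    """Shuffle row indices and reserve a `test_size` fraction for validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(n_rows * test_size))
    # The first n_val shuffled indices form the holdout set, the rest are used for fitting.
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = holdout_split(n_rows=10, test_size=0.2)
print(len(train_idx), len(val_idx))
```

A larger `test_size` gives a steadier score estimate but leaves fewer rows for fitting each pipeline.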


### Exploring Pipelines

After the pipeline search is complete, we can display the leaderboard:

In [4]:
automl.plot_leaderboard()

| ranking | pipeline | f1_score |
|---|---|---|
| 1 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier | 1.0 |
| 2 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier | 1.0 |
| 3 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier | 1.0 |
| 4 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier | 1.0 |
| 5 | SimpleImputer, StandardScaler, SelfTrainingClassifier | 1.0 |
| 6 | SimpleImputer, StandardScaler, SelfTrainingClassifier | 1.0 |
| 7 | SimpleImputer, RobustScaler, SelfTrainingClassifier | 1.0 |
| 8 | SimpleImputer, MaxAbsScaler, SelfTrainingClassifier | 0.999 |
| 9 | SimpleImputer, StandardScaler, SelfTrainingClassifier | 0.999 |
| 10 | SimpleImputer, StandardScaler, SelfTrainingClassifier | 0.999 |
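
Every pipeline in this leaderboard ends in a `SelfTrainingClassifier`. The toy sketch below illustrates the general self-training idea only: fit on the labeled points, pseudo-label the unlabeled points the model is confident about, and repeat until nothing new is added. It uses a one-dimensional nearest-centroid classifier and an arbitrary confidence threshold of 0.75. The function names are hypothetical, and this is not the scikit-learn or AlphaAutoML implementation.

```python
def fit_centroids(points, labels):
    """Return the mean of the points belonging to each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x):
    """Predict the nearest centroid and a crude confidence from relative distance."""
    dists = {y: abs(x - c) for y, c in centroids.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    confidence = 1.0 if total == 0 else 1.0 - dists[label] / total
    return label, confidence


def self_train(points, labels, threshold=0.75, max_iter=10):
    """Iteratively pseudo-label unlabeled points (label None) above the threshold."""
    labels = list(labels)
    centroids = {}
    for _ in range(max_iter):
        labeled = [(x, y) for x, y in zip(points, labels) if y is not None]
        centroids = fit_centroids(*zip(*labeled))
        # Collect confident predictions first, then apply them together.
        updates = {}
        for i, (x, y) in enumerate(zip(points, labels)):
            if y is None:
                pred, conf = predict_with_confidence(centroids, x)
                if conf >= threshold:
                    updates[i] = pred
        if not updates:
            break
        for i, pred in updates.items():
            labels[i] = pred
    return labels, centroids


# Two labeled points (classes 0 and 1) and seven unlabeled ones.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 2.6, 1.5, 4.2]
labels = [0, None, None, 1, None, None, None, None, None]

final_labels, centroids = self_train(points, labels)
print(final_labels)  # the ambiguous point 2.6 stays unlabeled
```

Points that never reach the threshold remain unlabeled, which keeps low-confidence guesses from being fed back into training.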


To explore the generated pipelines, we can use [PipelineProfiler](https://github.com/VIDA-NYU/PipelineVis), a visualization tool for comparing and exploring the pipelines generated by the AlphaAutoML system.

Once the pipeline search has finished, launch PipelineProfiler with:

In [None]:
automl.plot_comparison_pipelines()

### Testing Pipelines

Predictions for the test set are obtained with `predict`; `score` then evaluates the pipeline against the true labels:

In [None]:
y_pred = automl.predict(X_test)
y_pred

In [6]:
automl.score(X_test, y_test)

INFO:alpha_automl.automl_api:Metric: f1_score, Score: 1.0


{'metric': 'f1_score', 'score': 1.0}
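
The reported `f1_score` is the harmonic mean of precision and recall. As a minimal sketch of how such a value is computed for a binary target (the helper `binary_f1` and the toy labels are hypothetical, and AlphaAutoML's own averaging settings may differ):

```python
def binary_f1(y_true, y_pred, positive=True):
    """F1 score of the `positive` class from two equal-length label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [True, False, True, True, False, False]
y_pred = [True, False, False, True, True, False]

score = binary_f1(y_true, y_pred)
print(round(score, 3))
```

Because the test split here is imbalanced (466 `False` against 110 `True`), F1 on the positive class is more informative than plain accuracy.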