# Introduction

This tutorial introduces the basic Auto-PyTorch API together with the classes for featurized and image data.
So far, Auto-PyTorch covers classification and regression on featurized data as well as classification on image data.
To install Auto-PyTorch, please refer to the GitHub page.

# API

There are classes for featurized tasks (classification, multi-label classification, regression) and image tasks (classification). You can import them via:

In [1]:
from autoPyTorch import (AutoNetClassification,
                         AutoNetMultilabel,
                         AutoNetRegression,
                         AutoNetImageClassification,
                         AutoNetImageClassificationMultipleDatasets)

In [2]:
import numpy as np
import openml
import json

When you initialize a class, you can specify its configuration. The *config_preset* argument constrains the search space to one of *tiny_cs*, *medium_cs* or *full_cs*. The presets are defined in *core/presets/*.

In [3]:
autonet = AutoNetClassification(config_preset="full_cs", result_logger_dir="logs/")

Here are some useful methods provided by the API:

In [4]:
# Get the current configuration as dict
current_configuration = autonet.get_current_autonet_config()

# Get the ConfigSpace object with all hyperparameters, conditions, default values and default ranges
hyperparameter_search_space = autonet.get_hyperparameter_search_space()

# Print all possible configuration options
autonet.print_help()

Configure AutoNet with the following keyword arguments.
Pass these arguments to either the constructor or fit().

name                                default                  choices                                  type                                     
additional_logs                     []                       []                                       <class 'str'>                            
-----------------------------------------------------------------------------------------------------------------------------------------------
additional_metrics                  []                       [accuracy,                               <class 'str'>                            
                                                              auc_metric,                                                                      
                                                              pac_metric,                                                                      
                      

The most important methods for using Auto-PyTorch are **fit**, **refit**, **score** and **predict**.

**fit** is used to search for a configuration:

In [5]:
import numpy as np
import openml
import json

# Get some data from an openml task
task = openml.tasks.get_task(task_id=32)
X, y = task.get_X_and_y()
ind_train, ind_test = task.get_train_test_split_indices()
X_train, Y_train = X[ind_train], y[ind_train]
X_test, Y_test = X[ind_test], y[ind_test]

In [None]:
# Search for a configuration for 300 seconds, with a budget of 60 to 120 seconds for each fit
# (use log_level="info" or log_level="debug" for more detailed output)
autonet = AutoNetClassification(config_preset="full_cs", result_logger_dir="logs/")
results_fit = autonet.fit(X_train=X_train,
                          Y_train=Y_train,
                          validation_split=0.3,
                          max_runtime=300,
                          min_budget=60,
                          max_budget=120)

# Save json
with open("logs/results_fit.json", "w") as file:
    json.dump(results_fit, file)
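
Depending on what the returned dictionary contains, `json.dump` may raise a `TypeError` for values that are not plain Python types. The following is a minimal sketch of a more defensive save/load round trip using only the standard library. The helper names `to_jsonable`, `save_results` and `load_results` are hypothetical, and the sample dictionary is a stand-in that does not claim to mirror the actual structure of `results_fit`.

```python
import json
import os
import tempfile


def to_jsonable(value):
    """Fallback for json.dump: convert unknown objects to something serializable."""
    # Objects exposing .tolist() or .item() (as array-like and scalar types often do)
    for attr in ("tolist", "item"):
        method = getattr(value, attr, None)
        if callable(method):
            return method()
    # Last resort: store the string representation
    return str(value)


def save_results(results, path):
    """Write a results dictionary to `path`, creating the directory if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as file:
        json.dump(results, file, indent=2, default=to_jsonable)


def load_results(path):
    """Read a results dictionary back from `path`."""
    with open(path) as file:
        return json.load(file)


# Hypothetical stand-in for a results dictionary (keys are illustrative only)
example_results = {"loss": 0.25, "budget": 120, "tags": {"a", "b"}}
example_path = os.path.join(tempfile.mkdtemp(), "logs", "results_fit.json")
save_results(example_results, example_path)
restored = load_results(example_path)
```

The `default=` hook is only called for values that `json` cannot serialize itself, so plain numbers, strings, lists and dictionaries are written unchanged.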

**refit** allows you to fit a configuration of your choice for a defined time:

In [None]:
# Create an autonet, use tensorboard during fitting
autonet_config = {
    "result_logger_dir" : "logs/",
    "budget_type" : "epochs",
    "log_level" : "info", 
    "use_tensorboard_logger" : True
    }
autonet = AutoNetClassification(**autonet_config)

# This samples a random hyperparameter configuration as an example
hyperparameter_config = autonet.get_hyperparameter_search_space().sample_configuration().get_dictionary()

# Refit with sampled hyperparameter config for 10 epochs
results_refit = autonet.refit(X_train=X_train,
                              Y_train=Y_train,
                              X_valid=X_test,
                              Y_valid=Y_test,
                              hyperparameter_config=hyperparameter_config,
                              autonet_config=autonet.get_current_autonet_config(),
                              budget=10)

# Save json
with open("logs/results_refit.json", "w") as file:
    json.dump(results_refit, file)

**predict** returns the predictions of the incumbent model. **score** can be used to evaluate the model on a test set.

In [None]:
score = autonet.score(X_test=X_test, Y_test=Y_test)
pred = autonet.predict(X=X_test)
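
To make the relationship between predictions and a score concrete, here is a minimal sketch that computes accuracy by hand from predicted and true labels. It assumes the predictions are class labels in the same encoding as the targets, and it makes no claim about which metric `score` actually uses. The function name `accuracy` and the sample labels are introduced here for illustration only.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty label list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only: 3 of 4 positions match
example_accuracy = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
print(example_accuracy)  # 0.75
```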

Finally, you can also get the incumbent model as a PyTorch Sequential model via:

In [None]:
pytorch_model = autonet.get_pytorch_model()
print(pytorch_model)

# Featurized Data

All classes for featurized data (*AutoNetClassification*, *AutoNetMultilabel*, *AutoNetRegression*) can be used as in the example above. The only difference is the type of labels they accept.

# Image Data

Auto-PyTorch provides two classes for image data. *AutoNetImageClassification* performs image classification on a single dataset. *AutoNetImageClassificationMultipleDatasets* searches for configurations for image classification across multiple datasets, that is, Auto-PyTorch tries to choose a configuration that works well on all given datasets.

In [None]:
# Load classes
autonet_image_classification = AutoNetImageClassification(config_preset="full_cs", result_logger_dir="logs/")
autonet_multi_image_classification = AutoNetImageClassificationMultipleDatasets(config_preset="tiny_cs", result_logger_dir="logs/")

To pass your image data to fit, you have two options:

I) Via the path to a comma-separated values (CSV) file that contains the paths to the images and the image labels (the file is assumed to have no header row):

In [None]:
import os

csv_dir = os.path.abspath("../../datasets/example.csv")

X_train = np.array([csv_dir])
Y_train = np.array([0])

II) By directly passing the paths to the images and the labels:

In [None]:
import pandas as pd

df = pd.read_csv(csv_dir, header=None)
X_train = df.values[:,0]
Y_train = df.values[:,1]
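
For reference, a header-less CSV file of the kind described in option I could be written and read back with the standard library alone, as sketched below. The helper names, file names and labels are hypothetical, and the sketch assumes two columns (image path, integer label), which the text above implies but does not spell out.

```python
import csv
import os
import tempfile


def write_image_csv(csv_path, rows):
    """Write (image_path, label) pairs to a CSV file without a header row."""
    with open(csv_path, "w", newline="") as file:
        csv.writer(file).writerows(rows)


def read_image_csv(csv_path):
    """Return (paths, labels) read from a header-less two-column CSV file."""
    paths, labels = [], []
    with open(csv_path, newline="") as file:
        for row in csv.reader(file):
            if not row:
                continue  # skip blank lines
            paths.append(row[0].strip())
            labels.append(int(row[1]))  # assumes integer class labels
    return paths, labels


# Hypothetical file names and labels, for illustration only
rows = [("images/cat_001.png", 0), ("images/dog_001.png", 1)]
example_csv = os.path.join(tempfile.mkdtemp(), "my_dataset.csv")
write_image_csv(example_csv, rows)
paths, labels = read_image_csv(example_csv)
```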

Make sure you specify *images_root_folders* if the image paths are not relative to your current working directory. You can also specify *images_shape* to up- or downscale the images.
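
Before starting a long search, it can help to check that every image path resolves to an existing file. The sketch below assumes that a relative image path is resolved by joining it onto the root folder. That is an assumption about how the paths are interpreted, not documented behavior, and `find_missing_images` is a hypothetical helper.

```python
import os
import tempfile


def find_missing_images(image_paths, root_folder):
    """Return the entries of image_paths that do not exist below root_folder."""
    missing = []
    for relative_path in image_paths:
        full_path = os.path.join(root_folder, relative_path)
        if not os.path.isfile(full_path):
            missing.append(relative_path)
    return missing


# Self-contained demonstration with a temporary folder and one empty file
root = tempfile.mkdtemp()
open(os.path.join(root, "present.png"), "w").close()
missing = find_missing_images(["present.png", "absent.png"], root)
print(missing)  # only "absent.png" is reported
```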

Using the flag *save_checkpoints=True* will save checkpoints to a specified directory:

In [None]:
autonet_image_classification.fit(X_train=X_train,
                                 Y_train=Y_train,
                                 images_shape=[3,32,32],
                                 min_budget=100,
                                 max_budget=200,
                                 max_runtime=400,
                                 save_checkpoints=True,
                                 # The root folder must be a directory, not the CSV file; the folder name here is an assumption
                                 images_root_folders=[os.path.abspath("../../datasets/example_images")])

Auto-PyTorch also supports some common datasets. If you pass a CSV file with just one line, e.g. "CIFAR10, 0", and specify *default_dataset_download_dir*, Auto-PyTorch automatically downloads the data and uses it for the search. Supported datasets are CIFAR10, CIFAR100, SVHN and MNIST.

In [None]:
path_to_cifar_csv = os.path.abspath("../../datasets/CIFAR10.csv")

autonet_image_classification.fit(X_train=np.array([path_to_cifar_csv]),
                                 Y_train=np.array([0]),
                                 min_budget=900,
                                 max_budget=1200,
                                 max_runtime=3000,
                                 default_dataset_download_dir="./datasets",
                                 images_root_folders=["./datasets"])
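
If no such one-line file exists yet, it could be created as sketched below. The helper `write_default_dataset_csv` is hypothetical. The dataset names and the "NAME, 0" line format are taken from the text above, and the meaning of the second field is not specified there, so the sketch simply reproduces it.

```python
import os
import tempfile

# Dataset names listed as supported in this tutorial
SUPPORTED_DATASETS = ("CIFAR10", "CIFAR100", "SVHN", "MNIST")


def write_default_dataset_csv(csv_path, dataset_name, label=0):
    """Write a single 'NAME, LABEL' line, e.g. 'CIFAR10, 0', to csv_path."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {dataset_name!r}")
    with open(csv_path, "w") as file:
        file.write(f"{dataset_name}, {label}\n")
    return csv_path


# Write the file to a temporary folder for demonstration purposes
cifar_csv = write_default_dataset_csv(
    os.path.join(tempfile.mkdtemp(), "CIFAR10.csv"), "CIFAR10")
with open(cifar_csv) as file:
    cifar_line = file.read()
```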

To search across multiple datasets, pass multiple CSV files to the corresponding Auto-PyTorch class. Make sure you specify *images_root_folders* for each of them.

In [None]:
autonet_multi_image_classification.fit(X_train=np.array([path_to_cifar_csv, csv_dir]),
                                       Y_train=np.array([0]),
                                       min_budget=1500,
                                       max_budget=2000,
                                       max_runtime=4000,
                                       default_dataset_download_dir="./datasets",
                                       images_root_folders=["./datasets", "./datasets"],
                                       log_level="info")