## AutoPrognosis API Tutorial

A demonstration of AutoPrognosis (AP) functionality and operation.

This tutorial shows how to use [AutoPrognosis](https://arxiv.org/abs/1802.07207). It uses the breast cancer dataset bundled with scikit-learn (`load_breast_cancer`).

See [installation instructions](../../doc/install.md) to install the dependencies.

Load the dataset and show the first five samples:

In [19]:
import pandas as pd
import initpath_ap
initpath_ap.init_sys_path()
import utilmlab

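from sklearn.datasets import load_breast_cancer

# load_breast_cancer() returns a Bunch object (not a DataFrame) with the features and the binary target
df = load_breast_cancer()
X_ = pd.DataFrame(df.data)
Y_ = pd.DataFrame(df.target)

X_.head()  # show the first five samples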
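
The frames built in the cell above carry integer column labels. As a small optional sketch (not part of the original notebook), the variant below keeps scikit-learn's feature names on the columns and checks the shape and class balance before fitting. The names `dataset`, `X_named` and `y_named` are introduced here for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Hypothetical variant of the loading cell: keep the feature names as column labels.
dataset = load_breast_cancer()
X_named = pd.DataFrame(dataset.data, columns=dataset.feature_names)
y_named = pd.Series(dataset.target, name="target")

print(X_named.head())              # first five samples
print(X_named.shape)               # (n_samples, n_features)
print(y_named.value_counts())      # class balance of the binary target
print(list(dataset.target_names))  # label names for classes 0 and 1
```

Named columns tend to make later plots and reports easier to read, but whether AutoPrognosis uses them is not something this tutorial confirms.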

## Import the AutoPrognosis library

In [20]:
import model

## Run the model with a few iterations

In [None]:
metric = 'aucprc'
# LCB is the default and preferred acquisition type, but it generates excessive warnings; MPI is a good compromise.
acquisition_type = 'MPI'
AP_mdl = model.AutoPrognosis_Classifier(
    metric=metric, CV=5, num_iter=3, kernel_freq=100, ensemble=True,
    ensemble_size=3, Gibbs_iter=100, burn_in=50, num_components=3,
    acquisition_type=acquisition_type)

AP_mdl.fit(X_, Y_)

[ mean, Gradient Boosting ]
[ most_frequent, MultinomialNaiveBayes ]
[ median, LinearSVM ]



[ mean, XGBoost ]
[ mean, BernoullinNaiveBayes ]
[ most_frequent, LinearSVM ]


Iteration number: 1 3s (3s) (8s), Current pipelines:  [[[ median, XGBoost ]]], [[[ mean, BernoullinNaiveBayes ]]], [[[ median, LinearSVM ]]], BO objective: -0.9891936728238395


[ median, XGBoost ]
[ median, BernoullinNaiveBayes ]
[ median, QDA ]


Iteration number: 2 5s (3s) (8s), Current pipelines:  [[[ mean, XGBoost ]]], [[[ mean, BernoullinNaiveBayes ]]], [[[ median, QDA ]]], BO objective: -0.9999999999999997


[ missForest, Random Forest ]
[ mean, Bagging ]
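
The run above is configured with `metric = 'aucprc'`, which presumably stands for the area under the precision-recall curve. How AutoPrognosis computes it is not shown in this tutorial. The sketch below illustrates one common step-wise definition (average precision) in plain Python. The helper name `average_precision` is hypothetical and is not part of the AutoPrognosis API.

```python
def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve for binary 0/1 labels."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank samples from the highest to the lowest score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    prev_recall = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        # Treat all samples sharing one score as a single threshold.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        # Add the rectangle gained by the increase in recall.
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# A perfect ranking places every positive above every negative.
print(average_precision([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
# One negative is ranked above a positive, which lowers the score.
print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A score of 1 means the positives are ranked strictly above the negatives. Unlike ROC AUC, this measure depends on the class balance.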


## Computing model predictions

##### ~~The first element of the output is the prediction of a single model; the second element is the prediction of the ensemble.~~

In [None]:
AP_mdl.predict(X_)

## Compute performance via multi-fold cross-validation

In [None]:
model.evaluate_ens(X_, Y_, AP_mdl, n_folds=5, visualize=True)
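
The internals of `model.evaluate_ens` are not shown in this tutorial. To illustrate what a 5-fold cross-validated estimate involves in general, the sketch below runs stratified k-fold cross-validation with scikit-learn only. The gradient boosting classifier is a stand-in chosen for illustration, not the fitted AutoPrognosis ensemble, and the `average_precision` scorer is assumed to be comparable to `'aucprc'`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Stand-in estimator (assumption): any scikit-learn classifier would do here.
clf = GradientBoostingClassifier(random_state=0)

# Stratified folds keep the class ratio roughly equal in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# "average_precision" is scikit-learn's precision-recall summary score.
scores = cross_val_score(clf, X, y, cv=cv, scoring="average_precision")
print("per-fold scores:", scores)
print("mean: %.3f, std: %.3f" % (scores.mean(), scores.std()))
```

Reporting the spread across folds alongside the mean gives a rough sense of how stable the estimate is.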

## Visualize data...

In [None]:
AP_mdl.visualize_data(X_)

## Visualize the model...

In [None]:
AP_mdl.APReport()