# mfeat-morphological

## Context

Multiple Features Dataset: Morphological  
One of a set of 6 datasets describing features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. Patterns at the same position in the different datasets correspond to the same original character. 200 instances per class (for a total of 2,000 instances) have been digitized as binary images. 

In this dataset, these digits are represented in terms of 6 morphological features. 

### Attribute Information  
The meaning of the features is mostly unknown. They are never named in the original files, and the paper only talks about 'morphological features, such as the number of endpoints'.
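To give a feel for what a morphological feature such as 'the number of endpoints' could look like, here is a minimal sketch. It is not the extraction procedure used for this dataset, which is undocumented. It assumes a stroke that is already one pixel wide and treats a foreground pixel with exactly one foreground 8-neighbour as an endpoint. The function name `count_endpoints` and the two toy images are hypothetical.

```python
def count_endpoints(image):
    """Count foreground pixels that have exactly one foreground 8-neighbour.

    `image` is a list of equal-length rows holding 0 (background) or 1 (stroke).
    Assumption: the stroke is one pixel wide (already skeletonized).
    """
    rows, cols = len(image), len(image[0])
    endpoints = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue  # background pixels are never endpoints
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the pixel itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbours += image[rr][cc]
            if neighbours == 1:
                endpoints += 1
    return endpoints


# Toy "1": a vertical stroke with two open ends.
ONE = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Toy "0": a closed loop without open ends.
ZERO = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

print(count_endpoints(ONE), count_endpoints(ZERO))
```

A feature like this separates open strokes from closed loops, which is why such counts can help tell digits apart.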

In [None]:
from xautoml.util.datasets import openml_task

X_train, y_train = openml_task(18, 0, train=True)
X_train

## Start the Model Building

You load the data set into an AutoML tool you found on the internet to create a predictive model. After the optimization starts, the AutoML tool tests various candidate models and evaluates how good each one is. In the meantime, you have to wait for the optimization to finish.
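The idea of testing candidates until a budget runs out can be sketched with a plain random search. This is a deliberate simplification and not how dswizard works internally. The search space, the `toy_score` function, and all names below are hypothetical stand-ins for training and scoring a real model.

```python
import random
import time


def toy_score(config):
    """Stand-in for 'train a candidate and measure its accuracy' (hypothetical)."""
    # Peaks at depth == 6 and learning_rate == 0.1; a real tool would fit a model here.
    return 1.0 - 0.01 * abs(config["depth"] - 6) - abs(config["learning_rate"] - 0.1)


def budgeted_search(score_fn, time_limit_s, max_candidates, seed=0):
    """Sample random candidates until the time or candidate budget is exhausted."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit_s
    history = []  # every evaluated (score, config) pair, similar to a run history
    while len(history) < max_candidates and time.monotonic() < deadline:
        config = {
            "depth": rng.randint(1, 12),
            "learning_rate": rng.uniform(0.01, 0.5),
        }
        history.append((score_fn(config), config))
    if not history:
        raise ValueError("budget too small to evaluate a single candidate")
    # The best candidate seen so far is often called the incumbent.
    best_score, best_config = max(history, key=lambda item: item[0])
    return best_config, best_score, history


best_config, best_score, history = budgeted_search(
    toy_score, time_limit_s=5.0, max_candidates=50
)
print(f"evaluated {len(history)} candidates, best score {best_score:.3f}")
```

The loop has the same overall shape as the real run: evaluate a candidate, record it, and keep the best one until the budget is spent.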

In [None]:
import pickle
from dswizard.optimizers.bandit_learners import PseudoBandit
from dswizard.optimizers.structure_generators.mcts import TransferLearning, MCTS
from dswizard.optimizers.config_generators import Hyperopt
from dswizard.core.master import Master
from dswizard.core.model import Dataset
from dswizard.util import util

util.setup_logging('/opt/xautoml/dswizard/output/mcts/log.txt')

ds = Dataset(X_train.values, y_train.values, task=18, metric='accuracy', feature_names=X_train.columns)
master = Master(
    ds=ds,
    working_directory='/opt/xautoml/dswizard/output/mcts/',
    n_workers=1,
    model='_rf_complete.pkl',

    wallclock_limit=600,
    cutoff=10,
    pre_sample=False,

    config_generator_class=Hyperopt,
    structure_generator_class=MCTS,
    structure_generator_kwargs={'policy': TransferLearning},
    bandit_learner_class=PseudoBandit
)

pipeline, run_history, ensemble = master.optimize()

with open('/opt/xautoml/dswizard/output/mcts/dswizard.pkl', 'wb') as f:
    pickle.dump((run_history, ensemble), f)


In [None]:
import pickle
import joblib

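# Reload the run history and ensemble written by the optimization cell.
with open('/opt/xautoml/dswizard/output/mcts/dswizard.pkl', 'rb') as f:
    run_history, ensemble = pickle.load(f)

# Reload the best pipeline (incumbent) stored in the working directory.
with open('/opt/xautoml/dswizard/output/mcts/incumbent.pkl', 'rb') as f:
    pipeline = joblib.load(f)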

After waiting for 10 minutes, you are presented with the following results:

### The score of the Final Model

Internally, the AutoML tool uses a metric to judge how good a candidate is, for example the fraction of correct predictions (accuracy). After the optimization, you want to test how good the model actually is before relying on it. Therefore, you have held back a part of the data set, which you now use to evaluate the best model:

In [None]:
from sklearn.metrics import accuracy_score

# Load only the held-out test split; do not overwrite X_train / y_train.
X_test, y_test = openml_task(18, 0, test=True)

predictions = ensemble.predict(X_test.values)
accuracy_score(y_test, predictions)

This means that the generated model correctly classifies this fraction of new digits that it has never seen before.
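For reference, accuracy is simply the number of matching predictions divided by the number of instances. The following sketch computes it by hand on made-up labels; the helper name `accuracy` and the label values are illustrative and not taken from the run above.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where prediction and true label agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


# Made-up digit labels: four of the five predictions match.
y_true = [0, 1, 2, 2, 9]
y_pred = [0, 1, 1, 2, 9]
score = accuracy(y_true, y_pred)
print(score)
```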


### View the Models found by dswizard

Besides the raw performance, the tool also tells you which model performed best:

In [None]:
pipeline

With this information you are good to go and can decide if you actually want to use the generated model.

## Load the Same Results in XAutoML

In [None]:
from xautoml.main import XAutoML
from xautoml.adapter import import_dswizard
from xautoml.util.datasets import openml_task
import pickle

with open('/opt/xautoml/dswizard/output/mcts/dswizard.pkl', 'rb') as f:
    run_history, ensemble = pickle.load(f)

X_test, y_test = openml_task(18, 0, test=True)

rh = import_dswizard(run_history, ensemble)
main = XAutoML(rh, X_test, y_test)
main

In [None]:
main.explain(include={'overview', 'candidate:domain', 'ensemble'})

In [None]:
main.explain_domain(rank=0, exclude={'candidate:domain:performance'})