## In this notebook, we demonstrate how to run machine learning experiments locally and collect performance measurements. These measurements will later be used to train the IRT models.

In [1]:
import sys; sys.path.insert(0, '..')
import numpy
import scipy.stats
import sklearn.datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
import atml.measure
import atml.exp

## To set up the machine learning experiments, we first need to define the datasets and models. This toolbox requires both to be indexed by Python dictionaries that map an integer index to a reference string.

In [2]:
data_dict = {0: 'iris',
             1: 'digits',
             2: 'wine'}

In [3]:
model_dict = {0: 'lr',
              1: 'rf',
              2: 'nb'}

## We also need to provide two functions: one that loads a dataset and one that instantiates a model, each given its reference string. Datasets are assumed to be numpy.ndarray objects, with x as the features and y as the target. Models should follow the scikit-learn classifier interface, with fit() as the training function and predict_proba() as the function that predicts probability vectors.

In [4]:
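def get_data(ref):
    # Load the dataset named by `ref` as (features, target) arrays.
    if ref == 'iris':
        x, y = sklearn.datasets.load_iris(return_X_y=True)
    elif ref == 'digits':
        x, y = sklearn.datasets.load_digits(return_X_y=True)
    elif ref == 'wine':
        x, y = sklearn.datasets.load_wine(return_X_y=True)
    else:
        # Fail early rather than returning unbound variables.
        raise ValueError(f'Unknown dataset reference: {ref!r}')
    return x, y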

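def get_model(ref):
    # Create a fresh, untrained model for the reference `ref`.
    if ref == 'lr':
        mdl = LogisticRegression()
    elif ref == 'rf':
        mdl = RandomForestClassifier()
    elif ref == 'nb':
        mdl = GaussianNB()
    else:
        # Fail early rather than returning an unbound variable.
        raise ValueError(f'Unknown model reference: {ref!r}')
    return mdl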
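
To make the model interface described above concrete, here is a minimal sketch of a custom model that exposes only fit() and predict_proba(). The class name `PriorClassifier` is hypothetical and not part of the toolbox; whether the toolbox calls any other methods on a model is an assumption that should be checked against its source.

```python
import numpy as np


class PriorClassifier:
    """Hypothetical baseline that predicts the training class frequencies."""

    def fit(self, x, y):
        # Store the sorted class labels and their relative frequencies.
        self.classes_, counts = np.unique(y, return_counts=True)
        self.prior_ = counts / counts.sum()
        return self

    def predict_proba(self, x):
        # Return one probability vector per sample, ignoring the features.
        return np.tile(self.prior_, (len(x), 1))


# Tiny synthetic dataset: 4 samples, 2 features, 3 classes.
x = np.zeros((4, 2))
y = np.array([0, 0, 1, 2])
probs = PriorClassifier().fit(x, y).predict_proba(x)
```

Each row of `probs` has one entry per class and sums to 1, which is the shape a probability-based measure such as the Brier score expects.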

## For this example, we use the built-in Brier score measure.

In [5]:
measure = atml.measure.BS()
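
For reference, the multi-class Brier score can be sketched in plain NumPy as the mean squared distance between each predicted probability vector and the one-hot encoded label. The function name `brier_score` is hypothetical, and the normalisation used by atml.measure.BS may differ (some definitions also divide by the number of classes), so treat this as an illustration of the idea rather than the toolbox's implementation.

```python
import numpy as np


def brier_score(probs, y):
    """Mean over samples of the squared error against one-hot labels.

    Assumes labels are integers 0..K-1 matching the columns of `probs`.
    """
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(y)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


# Two samples, two classes.
probs = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
y = np.array([0, 1])
score = brier_score(probs, y)
```

Under this definition a perfect prediction scores 0, and lower values are better.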

## Now we can use the built-in function to perform exhaustive testing, that is, to test every combination of dataset and model and collect the corresponding performance measurements.

In [6]:
res = atml.exp.get_exhaustive_testing(data_dict, get_data, model_dict, get_model, measure)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

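The idea behind exhaustive testing can be sketched as a loop over the Cartesian product of datasets and models. This is not the toolbox's implementation: the 70/30 stratified hold-out split, the fixed random seeds, the reduced set of datasets and models, and the helper names `loaders`, `factories` and `brier` are all assumptions made for illustration.

```python
import itertools

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data_dict = {0: 'iris', 1: 'wine'}
model_dict = {0: 'rf', 1: 'nb'}

# Hypothetical lookup tables standing in for get_data / get_model.
loaders = {'iris': load_iris, 'wine': load_wine}
factories = {'rf': lambda: RandomForestClassifier(random_state=0),
             'nb': GaussianNB}


def brier(probs, y):
    # Mean squared distance to the one-hot labels (labels are 0..K-1).
    onehot = np.eye(probs.shape[1])[y]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))


rows = []
for (d_idx, d_ref), (m_idx, m_ref) in itertools.product(
        data_dict.items(), model_dict.items()):
    x, y = loaders[d_ref](return_X_y=True)
    # Assumed evaluation protocol: a single stratified hold-out split.
    x_tr, x_te, y_tr, y_te = train_test_split(
        x, y, test_size=0.3, random_state=0, stratify=y)
    mdl = factories[m_ref]().fit(x_tr, y_tr)
    rows.append({'data_idx': d_idx, 'data_ref': d_ref,
                 'model_idx': m_idx, 'model_ref': m_ref,
                 'Brier score': brier(mdl.predict_proba(x_te), y_te)})

sketch = pd.DataFrame(rows)
```

The resulting frame has one row per dataset and model pair. Its scores will not match the toolbox's output exactly, because the split and the score normalisation are assumed here.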

## The results are returned as a pandas DataFrame, which we can inspect directly.

In [7]:
res

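|   | data_idx | data_ref | model_idx | model_ref | Brier score |
|---|----------|----------|-----------|-----------|-------------|
| 0 | 0 | iris   | 0 | lr | 0.071075 |
| 1 | 0 | iris   | 1 | rf | 0.076325 |
| 2 | 0 | iris   | 2 | nb | 0.013939 |
| 3 | 1 | digits | 0 | lr | 0.076327 |
| 4 | 1 | digits | 1 | rf | 0.131119 |
| 5 | 1 | digits | 2 | nb | 0.318727 |
| 6 | 2 | wine   | 0 | lr | 0.105201 |
| 7 | 2 | wine   | 1 | rf | 0.094307 |
| 8 | 2 | wine   | 2 | nb | 0.05434 |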


## Save the results (to be used later for IRT training)

In [8]:
res.to_csv('./res_base.csv')
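
Because to_csv() writes the DataFrame index as an unnamed first column by default, reading the file back with a plain read_csv() produces an extra `Unnamed: 0` column. The sketch below shows the round trip on a small placeholder frame, using an in-memory buffer instead of a file; the frame contents are illustrative only.

```python
import io

import pandas as pd

# Placeholder frame with the same kind of reference columns as the results.
df = pd.DataFrame({'data_ref': ['iris', 'wine'],
                   'model_ref': ['lr', 'nb']})

buf = io.StringIO()
df.to_csv(buf)  # the index is written as an unnamed first column

buf.seek(0)
naive = pd.read_csv(buf)               # index comes back as 'Unnamed: 0'

buf.seek(0)
clean = pd.read_csv(buf, index_col=0)  # index restored, no extra column
```

When loading res_base.csv later, passing index_col=0 in the same way should recover the original layout.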