In [24]:
%load_ext autoreload
%autoreload 0



Introduction to the OpenMLResultExtractor
======================

The OpenMLResultExtractor collects OpenML runs into a pandas DataFrame with one row per task and one column per flow. Each cell holds the runs of that flow on that task: a single run, several runs, or nothing (NaN) if the flow has not been run on the task. The DataFrame can be restricted by flows, by tasks, or by both.
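The task-by-flow layout described above can be sketched with plain pandas. This is a minimal sketch, not the extractor's implementation: the `(task_id, flow_id, run_id)` records are placeholder values, and collecting run ids into Python sets is an assumption based on the printed output further down.

```python
import pandas as pd

# Placeholder (task_id, flow_id, run_id) records, not real OpenML runs.
records = [
    (1, 1079, 101),
    (1, 1079, 102),
    (3, 1079, 103),
    (3, 5524, 104),
]
runs = pd.DataFrame(records, columns=["task_id", "flow_id", "run_id"])

# Gather the run ids of each (task, flow) pair into a set, then turn the
# flow ids into columns; pairs without any run are filled with NaN.
table = (
    runs.groupby(["task_id", "flow_id"])["run_id"]
    .agg(set)
    .unstack("flow_id")
)
print(table)
```

Here task 1 has two runs of flow 1079 and none of flow 5524, so its 5524 cell is NaN.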

The result extractor can be initialized in several ways:

* With positional arguments, which are interpreted as flow ids.
* With keyword arguments, which are interpreted as task restrictions.
* With positional and keyword arguments combined.
* Without any arguments, in which case all flows and all tasks they have been run on are considered.
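The four modes above follow Python's usual split between positional (`*args`) and keyword (`**kwargs`) arguments. The function below is a hypothetical illustration of that split only: it reports which mode a call corresponds to and does not query OpenML, and `min_instances` is an invented task restriction, not a documented `ResultExtractor` parameter.

```python
def describe_restrictions(*flow_ids, **task_filters):
    """Hypothetical helper: name the initialization mode a call would use.

    Positional arguments play the role of flow ids and keyword arguments
    the role of task restrictions. Nothing is fetched from OpenML.
    """
    if flow_ids and task_filters:
        return "flows and tasks restricted"
    if flow_ids:
        return "flows restricted"
    if task_filters:
        return "tasks restricted"
    return "all flows and all tasks"


print(describe_restrictions())
print(describe_restrictions(5527, 1079))
# 'min_instances' is an invented keyword, used only for illustration
print(describe_restrictions(min_instances=100))
print(describe_restrictions(5527, min_instances=100))
```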

Restricting the flows considered
-------------------------------------

To limit the results to certain flows, initialize the result extractor with unique flow ids given as positional arguments. The package provides a helper function, `get_flow_ids`, which returns the flow ids matching an arbitrary number of string flow identifiers.


A flow identifier can be a flow name, e.g. **'mlr.classif.svm'**, which matches every version of that flow, or a flow name combined with a flow version, e.g. **'mlr.classif.svm_6'**; the latter is a **unique** flow identifier.

A list of flows is available at https://www.openml.org/search?type=flow and can be sorted as needed.
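The identifier convention can be illustrated with a small parser. This is a hypothetical sketch, not the package's `get_flow_ids`: it assumes that a trailing `_<digits>` is always a version suffix, and it resolves names against a tiny hand-written catalogue instead of querying OpenML. The two catalogue entries reflect ids printed by the cell below (`mlr.classif.svm_6` is 5527 and `weka.RandomForest_5` is 1079).

```python
def parse_flow_identifier(identifier):
    """Split 'name_version' into (name, version); plain names get version None.

    Assumption: a trailing '_<digits>' is a version suffix.
    """
    name, sep, suffix = identifier.rpartition("_")
    if sep and suffix.isdigit():
        return name, int(suffix)
    return identifier, None


def resolve_flow_ids(catalogue, *identifiers):
    """Return the set of flow ids matching any of the given identifiers."""
    ids = set()
    for identifier in identifiers:
        name, version = parse_flow_identifier(identifier)
        for flow_id, flow_name, flow_version in catalogue:
            # A plain name matches every version; name_version matches one.
            if flow_name == name and (version is None or flow_version == version):
                ids.add(flow_id)
    return ids


# Hypothetical local catalogue of (flow_id, name, version) entries.
catalogue = [
    (5527, "mlr.classif.svm", 6),
    (1079, "weka.RandomForest", 5),
]
print(resolve_flow_ids(catalogue, "mlr.classif.svm_6"))
print(resolve_flow_ids(catalogue, "mlr.classif.svm", "weka.RandomForest_5"))
```

With only these two entries, the name-only lookup `'mlr.classif.svm'` returns just 5527; against the full OpenML catalogue it would return every version of the flow.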


In [30]:
# Covering 3 simple use cases of the helper function
from pprint import pprint

from src.util import get_flow_ids

# There are 10 different flow versions 
# for the svm algorithm.
print("Providing only a flow name:")
flow_ids = get_flow_ids('mlr.classif.svm')
pprint(flow_ids)

# A unique flow identifier.
print("Providing a flow name and a version:")
flow_ids = get_flow_ids('mlr.classif.svm_6')
pprint(flow_ids)

# Providing multiple arguments
print("Providing multiple flows:")
flow_ids = get_flow_ids('mlr.classif.svm', 'weka.RandomForest_5')
pprint(flow_ids)

Providing only a flow name:
{5891, 4102, 6599, 4141, 6669, 5969, 6322, 5524, 5527, 4319}
Providing a flow name and a version:
{5527}
Providing multiple flows:
{5891, 4102, 6599, 4141, 6669, 5969, 6322, 5524, 1079, 5527, 4319}


In [31]:
from src.result_extractor import ResultExtractor

# Create a ResultExtractor restricted to the flow ids from the previous cell
result_extractor = ResultExtractor(*flow_ids)
pprint(result_extractor.results)

                                                     1079       5524  \
1       {385697, 148578, 348327, 361032, 475240, 31969...        NaN   
2       {326273, 385698, 365763, 475255, 355976, 47524...        NaN   
3       {185440, 374561, 385699, 355973, 348330, 36103...  {1847845}   
4       {355972, 385700, 439863, 319693, 463565, 45691...        NaN   
5       {291716, 326245, 385701, 439864, 319694, 46356...        NaN   
6       {385702, 185544, 356011, 319695, 439865, 46356...        NaN   
7       {385703, 348329, 319696, 463568, 339925, 29176...        NaN   
9       {456921, 361033, 385705, 319698, 463570, 18543...        NaN   
10      {326241, 385706, 319699, 330739, 463571, 18543...        NaN   
11      {291770, 374560, 326244, 185444, 385707, 31970...        NaN   
12      {355978, 279691, 326251, 385708, 348334, 31970...        NaN   
13      {439872, 326243, 185446, 355975, 385709, 33083...        NaN   
14      {439873, 233251, 185449, 463575, 330860, 32625...       
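Once `results` is available, its cells can be summarised without touching OpenML again. The sketch below uses a toy table shaped like the printed output (task ids as rows, flow ids as columns, sets of run ids or NaN as cells); apart from `{1847845}`, the run ids are placeholders, and task 99 is invented to show an empty row. It assumes cells are sets, lists or tuples of runs and counts any other non-missing value as a single run.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ResultExtractor.results; run ids are placeholders
# except {1847845}, which appears in the printed output above.
results = pd.DataFrame(
    {
        1079: [{11, 12, 13}, {14, 15}, np.nan],
        5524: [np.nan, {1847845}, np.nan],
    },
    index=[1, 3, 99],
)


def run_count(cell):
    """Number of runs in one cell; NaN marks a (task, flow) pair without runs."""
    if isinstance(cell, (set, list, tuple)):
        return len(cell)
    return 0 if pd.isna(cell) else 1


# Apply run_count element-wise, column by column.
counts = results.apply(lambda column: column.map(run_count))
print(counts)

# Drop tasks on which none of the selected flows has been run.
print(results.dropna(how="all"))
```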

In [26]:
from pprint import pprint

from src.operations import get_tasks_by_measure
from src.util import get_flow_ids
from src.result_extractor import ResultExtractor


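# Restrict the extractor to a single, unique flow
flow_ids = get_flow_ids('mlr.classif.ranger_8')
result_extractor = ResultExtractor(*flow_ids)
results = result_extractor.results
# Summarise the results as one accuracy value per task
pprint(get_tasks_by_measure(results))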

       accuracy
3      0.985294
3950   0.999758
9889   0.980441
9914   0.982434
9946   0.958993
9952   0.907476
9957   0.864455
9967   0.994333
9970   0.573432
9971   0.694683
9976   0.735769
9977   0.967968
9978   0.945146
9980   0.914815
9983   0.931442
10093  0.992711
10101  0.770053
14951  0.931442
14952  0.966893
14965  0.906815
14966  0.806452
14971  0.833108
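The table above has one accuracy value per task. How `get_tasks_by_measure` reduces several runs of a task to one value is not shown here, so the sketch below assumes a simple per-task aggregation (mean by default) over placeholder run-level scores. `tasks_by_measure` is a hypothetical stand-in, not the package's function.

```python
import pandas as pd

# Hypothetical run-level evaluations; the scores are placeholders.
evaluations = pd.DataFrame(
    {
        "task_id": [3, 3, 9889],
        "run_id": [1, 2, 3],
        "accuracy": [0.5, 1.0, 0.25],
    }
)


def tasks_by_measure(evaluations, measure="accuracy", how="mean"):
    """Reduce run-level scores to one value per task (the aggregation is an assumption)."""
    return evaluations.groupby("task_id")[measure].agg(how).to_frame()


per_task = tasks_by_measure(evaluations)
print(per_task)
```

Passing `how="max"` instead keeps the best run per task; which reduction is appropriate depends on whether you want typical or best-case performance of a flow.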
