# Axolotl CSV manipulation [Binary Classification].

In this example, we showcase different components of the system:
- Loading a CSV file as a dataset for a classification task.
- Easy use of the backend.
- Use of the simple interface for a predefined search method.
- Exploring the searched pipelines.

## Import the utilities we will be using

In [1]:
import os
from pprint import pprint
import pandas as pd
from sklearn.datasets import make_regression

from d3m import container
from d3m.metadata.pipeline import Pipeline

from axolotl.utils import data_problem, pipeline as pipeline_utils
from axolotl.backend.ray import RayRunner
from axolotl.algorithms.random_search import RandomSearch

# Initialize the Ray backend runner (fixed random seed, 3 workers)
backend = RayRunner(random_seed=42, volumes_dir=None, n_workers=3)

2020-07-12 15:23:25,435	INFO resource_spec.py:212 -- Starting Ray with 4.39 GiB memory available for workers and up to 2.2 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-12 15:23:25,965	INFO services.py:1170 -- View the Ray dashboard at localhost:8265
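The runner above is created with `n_workers=3`. The sketch below is only a rough analogy. It does not use Ray and does not show how `RayRunner` is implemented. It uses the standard library's `concurrent.futures` to evaluate several candidates concurrently, with the pool capped at three workers. `evaluate_all` and the toy `evaluate` function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def evaluate_all(
    candidates: List[str],
    evaluate: Callable[[str], float],
    n_workers: int = 3,
) -> Dict[str, float]:
    """Run evaluate() on every candidate with at most n_workers threads at once."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so scores line up with candidates
        scores = list(pool.map(evaluate, candidates))
    return dict(zip(candidates, scores))


# Hypothetical scoring function standing in for a pipeline evaluation.
def toy_evaluate(name: str) -> float:
    return len(name) / 10


print(evaluate_all(["a", "bb", "ccc"], toy_evaluate))
# {'a': 0.1, 'bb': 0.2, 'ccc': 0.3}
```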


### Load the CSV file and transform it into a dataset

In [2]:
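# Read the iris table and convert it into a dataset plus a problem description.
table_path = os.path.join('..', 'tests', 'data', 'datasets', 'iris_dataset_1', 'tables', 'learningData.csv')
df = pd.read_csv(table_path)
# target_index=5 selects the 'species' column as the prediction target.
dataset, problem_description = data_problem.generate_dataset_problem(df, task='binary_classification', target_index=5)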
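The call above hard-codes `target_index=5` as the position of the `species` column. The sketch below derives that index from the CSV header instead. It assumes the header of this iris table is `d3mIndex,sepalLength,sepalWidth,petalLength,petalWidth,species`, which is an assumption about the file's layout. `find_column_index` is a hypothetical helper and is not part of Axolotl, and the sample row is illustrative.

```python
import csv
import io


def find_column_index(csv_text: str, column_name: str) -> int:
    """Return the 0-based position of column_name in the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the column names
    return header.index(column_name)  # raises ValueError if the column is missing


# Assumed header layout of the iris learningData.csv, with one illustrative row.
sample = (
    "d3mIndex,sepalLength,sepalWidth,petalLength,petalWidth,species\n"
    "0,5.1,3.5,1.4,0.2,Iris-setosa\n"
)
print(find_column_index(sample, "species"))  # 5
```

With the DataFrame already loaded, pandas gives the same answer directly through `df.columns.get_loc('species')`.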

### Create an instance of the search and fit it on the input data.

In [3]:
# search_fit searches for the best pipeline within the time budget, then fits the top-ranked pipeline on input_data.
search = RandomSearch(problem_description=problem_description, backend=backend)
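The comment above describes `search_fit` as searching for the best pipeline within a time budget and then fitting the best-ranked one. The toy sketch below illustrates that general idea. It samples candidates at random until the deadline, skips trials that raise (like the `StepFailedError` lines in the search log), and keeps the lowest rank. It assumes that a lower rank is better, as the scores printed later in this notebook suggest. This is not Axolotl's `RandomSearch` implementation, and `random_search`, `toy_evaluate`, and the candidate names are hypothetical.

```python
import random
import time
from typing import Callable, List, Optional, Tuple


def random_search(
    candidates: List[str],
    evaluate: Callable[[str], float],
    time_limit: float,
    seed: int = 42,
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Try candidates in random order until the time budget runs out.

    evaluate() returns a rank (lower is better) or raises on failure.
    Returns the best candidate and the history of successful trials.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    history: List[Tuple[str, float]] = []
    remaining = list(candidates)
    while remaining and time.monotonic() < deadline:
        # sample without replacement so no candidate is evaluated twice
        candidate = remaining.pop(rng.randrange(len(remaining)))
        try:
            rank = evaluate(candidate)
        except Exception as error:  # a failed trial is reported and skipped
            print(f"Current trial failed. Error: {error!r}")
            continue
        history.append((candidate, rank))
    best = min(history, key=lambda item: item[1])[0] if history else None
    return best, history


ranks = {"p1": 0.23, "p2": 0.33, "p3": 0.67}


def toy_evaluate(name: str) -> float:
    if name == "broken":
        raise RuntimeError("Step 7 failed")
    return ranks[name]


best, history = random_search(["p1", "p2", "p3", "broken"], toy_evaluate, time_limit=5.0)
print(best)  # p1
```

Sampling without replacement from a short fixed list keeps the sketch small. It is a simplification of searching over a space of pipelines.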

In [4]:
fitted_pipeline, fitted_pipeline_result = search.search_fit(input_data=[dataset], time_limit=30)

Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 47ec5c86-46b8-4dee-9562-1e5ebc3d0824 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 64da5190-c2ee-4b8e-abef-697b54cfa32b failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 9e03188f-2120-49ac-a087-1e4fb1b29754 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline af32bc20-64fa-44a5-ab34-bbe810b671b1 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 5dbc9e87-19be-4cda-ac51-c1d7ea9328c1 failed.',)]


(pid=85426) class_weight presets "balanced" or "balanced_subsample" are not recommended for warm_start if the fitted data differs from the full dataset. In order to use "balanced" weights, use compute_class_weight ("balanced", classes, y). In place of y you can use a large enough sample of the full training set target to properly estimate the class frequency distributions. Pass the resulting weights as the class_weight parameter.


Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 918c088e-58dd-4991-8336-deb0b41cb5eb failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 41dfec8f-0b07-4f8e-8ff3-cdbb1dab11c7 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline d465a878-1ea5-4b72-b8a7-3a4122d1a482 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 8c39e981-f446-4fde-8744-5606c35a7fdf failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline df127bce-11af-4fae-b8bb-722cb0666484 failed.',)]


(pid=85426) class_weight presets "balanced" or "balanced_subsample" are not recommended for warm_start if the fitted data differs from the full dataset. In order to use "balanced" weights, use compute_class_weight ("balanced", classes, y). In place of y you can use a large enough sample of the full training set target to properly estimate the class frequency distributions. Pass the resulting weights as the class_weight parameter.


Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 0985e11e-8db0-4c1c-9f34-3ce8fbc626c1 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline 8977a9c0-dd79-4771-9dc1-455586b80947 failed.',)]
Current trial is failed. Error: [StepFailedError('Step 7 for pipeline c0238551-5fbb-41cd-8187-d3d23bc5571d failed.',)]


(pid=85426) class_weight presets "balanced" or "balanced_subsample" are not recommended for warm_start if the fitted data differs from the full dataset. In order to use "balanced" weights, use compute_class_weight ("balanced", classes, y). In place of y you can use a large enough sample of the full training set target to properly estimate the class frequency distributions. Pass the resulting weights as the class_weight parameter.
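The repeated scikit-learn warning recommends `compute_class_weight("balanced", classes, y)`. In scikit-learn, the "balanced" heuristic gives each class the weight `n_samples / (n_classes * count_of_that_class)`. The dependency-free sketch below implements that formula. `balanced_class_weights` is a hypothetical helper, not the scikit-learn function.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(y: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Imbalanced toy labels: the rarer class receives the larger weight.
print(balanced_class_weights(["a", "a", "a", "b"]))
# {'a': 0.6666666666666666, 'b': 2.0}
```

With 50 samples in each of the three iris classes, every weight would be 150 / (3 * 50) = 1.0.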


In [5]:
produce_results = search.produce(fitted_pipeline, [dataset])

In [6]:
produce_results.output

     d3mIndex         species
0           0     Iris-setosa
1           1     Iris-setosa
2           2     Iris-setosa
3           3     Iris-setosa
4           4     Iris-setosa
...       ...             ...
145       145  Iris-virginica
146       146  Iris-virginica
147       147  Iris-virginica
148       148  Iris-virginica
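`produce_results.output` holds one predicted `species` per `d3mIndex`. The sketch below checks predictions against ground-truth labels by joining on `d3mIndex` and computing accuracy. The two dicts are hypothetical stand-ins for the label and prediction tables, and the toy values are invented, not taken from this notebook's output. `produce` is called here on the same dataset used for fitting, so this check would measure training accuracy, not generalization.

```python
from typing import Dict


def accuracy_by_index(truth: Dict[int, str], predicted: Dict[int, str]) -> float:
    """Fraction of shared d3mIndex keys whose predicted label matches the true one."""
    shared = truth.keys() & predicted.keys()  # only compare rows present in both
    if not shared:
        raise ValueError("no overlapping d3mIndex values")
    correct = sum(truth[i] == predicted[i] for i in shared)
    return correct / len(shared)


# Toy labels keyed by d3mIndex (hypothetical, for illustration only).
truth = {0: "Iris-setosa", 1: "Iris-setosa", 145: "Iris-virginica", 146: "Iris-virginica"}
predicted = {0: "Iris-setosa", 1: "Iris-setosa", 145: "Iris-virginica", 146: "Iris-versicolor"}
print(accuracy_by_index(truth, predicted))  # 0.75
```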


### Print the scores of the successful pipelines.
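Each entry in `search.history` has a `rank` alongside its scores. In the output the next cell prints, each pipeline's rank is within 1e-5 of 1 minus its accuracy. This suggests that a lower rank means a better pipeline and that the tiny offsets break ties between equal accuracies. That reading is an inference from the printed numbers, not documented Axolotl behavior. The sketch below copies the four (id prefix, rank, accuracy) triples from that output, checks the relationship, and orders the pipelines by rank. `PipelineScore` is a hypothetical container.

```python
from typing import List, NamedTuple


class PipelineScore(NamedTuple):
    pipeline_id: str  # first block of the UUID printed by the notebook
    rank: float
    accuracy: float


# Values copied from the printed search history.
history: List[PipelineScore] = [
    PipelineScore("676360d8", 0.22667216466666668, 0.773333),
    PipelineScore("85d44359", 0.33333446433333336, 0.666667),
    PipelineScore("3efb07be", 0.6666653826666668, 0.333333),
    PipelineScore("abd9eb99", 0.6666606186666667, 0.333333),
]

# The rank tracks 1 - accuracy up to a tiny offset.
for entry in history:
    assert abs(entry.rank - (1 - entry.accuracy)) < 1e-5

# Lowest rank first; the offset separates the two pipelines tied at 0.333333.
ordered = sorted(history, key=lambda entry: entry.rank)
print([entry.pipeline_id for entry in ordered])
# ['676360d8', '85d44359', 'abd9eb99', '3efb07be']
```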

In [7]:
for pipeline_result in search.history:
    print('-' * 52)
    print('Pipeline id:', pipeline_result.pipeline.id)
    print('Rank:', pipeline_result.rank)
    print(pipeline_result.scores)

----------------------------------------------------
Pipeline id: 676360d8-71ac-401c-b44a-31a810c4e8d3
Rank: 0.22667216466666668
     metric     value  normalized  randomSeed  fold
0  ACCURACY  0.773333    0.773333          42     0
----------------------------------------------------
Pipeline id: 85d44359-0dac-4260-aea8-c78950025c3f
Rank: 0.33333446433333336
     metric     value  normalized  randomSeed  fold
0  ACCURACY  0.666667    0.666667          42     0
----------------------------------------------------
Pipeline id: 3efb07be-28ff-45d8-b1fb-1c49f96b3381
Rank: 0.6666653826666668
     metric     value  normalized  randomSeed  fold
0  ACCURACY  0.333333    0.333333          42     0
----------------------------------------------------
Pipeline id: abd9eb99-a4ba-4210-bb34-c2dec7c3ccfa
Rank: 0.6666606186666667
     metric     value  normalized  randomSeed  fold
0  ACCURACY  0.333333    0.333333          42     0
----------------------------------------------------
Pipeline id: 8948