# FEDOT framework
#### FEDOT version = 0.2.1

Below is a description of the FEDOT framework and its main functions, which can be used to solve various ML tasks, namely:

* Regression
* Classification
* Time series forecasting
* Clustering

FEDOT can construct complex composite models (consisting of multiple machine learning models and preprocessing operations) based on an evolutionary algorithm. Thus, it is possible to create pipelines for solving various tasks.
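To make the idea of a composite model concrete, here is a minimal sketch of a chain as a small directed graph: primary nodes receive the raw data, and a secondary node receives their predictions. `ToyNode`, `depth`, and `collect` are hypothetical names invented for this illustration and are not FEDOT classes or functions. The example builds one structure that is consistent with the `depth` and `length` values FEDOT prints for a chain later in this document; the real chain may be wired differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a chain node; not a FEDOT class."""
    model: str
    parents: List["ToyNode"] = field(default_factory=list)


def depth(node: ToyNode) -> int:
    """Number of nodes on the longest path from `node` down to a primary node."""
    if not node.parents:
        return 1
    return 1 + max(depth(parent) for parent in node.parents)


def collect(node: ToyNode) -> List[ToyNode]:
    """All distinct nodes reachable from `node`, starting with `node` itself."""
    seen: List[ToyNode] = []
    stack = [node]
    while stack:
        current = stack.pop()
        if not any(current is visited for visited in seen):
            seen.append(current)
            stack.extend(current.parents)
    return seen


# Two primary nodes feed their predictions into one secondary (root) node.
qda = ToyNode("qda")
lda = ToyNode("lda")
root = ToyNode("logit", parents=[qda, lda])

chain_depth = depth(root)
chain_length = len(collect(root))
print({"depth": chain_depth, "length": chain_length})
```

For this toy structure the depth is 2 (root plus one layer of primary nodes) and the length is 3 (the total number of nodes).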

The structure of the FEDOT framework can be seen in the figure below:

![fedot_structure.png](../jupyter_media/fedot_structure/fedot_structure_02.png)

Figure 1. The structure of the FEDOT framework. The main modules of the library are shown.

As the figure shows, there are two ways to run FEDOT:
1) API - lets you run the framework's models in a few lines of code;
2) Low-level methods from the core - you call the core methods directly. This requires more code but exposes more functionality.

Below are two examples that solve a classification problem: one using the API methods and one using the FEDOT core functions directly.

## Generate synthetic dataset for classification task

In [12]:
from fedot.utilities.synthetic.data import classification_dataset

# Generate numpy arrays with features and target
features_options = {'informative': 1, 'redundant': 0,
                    'repeated': 0, 'clusters_per_class': 1}
x_data, y_data = classification_dataset(samples_amount=250,
                                        features_amount=3,
                                        classes_amount=2,
                                        features_options=features_options)

## API example

In [20]:
from fedot.api.main import Fedot

# Task selection, initialisation of the framework
# (learning_time is given in minutes: the log below reports a 0:01:00 limit)
fedot_model = Fedot(problem='classification', learning_time=1,
                    seed=42, verbose_level=4)

# During fit, the chain composition algorithm is started
pipeline = fedot_model.fit(features=x_data,
                           target=y_data)

light_tun preset is used. Parameters tuning: True. Set of candidate models: ['logit', 'lda', 'qda', 'dt', 'rf', 'knn', 'xgboost', 'bernb', 'direct_data_model', 'pca_data_model']. Composing time limit: 0:01:00
Model composition started
Hyperparameters tuning started
Start tuning of primary nodes
End tuning
Model composition finished
Fit chain from scratch


In [28]:
from sklearn.metrics import roc_auc_score as roc_auc

prediction = fedot_model.predict_proba(features=x_data)
print(pipeline)
print(f'ROC AUC score on training sample: {roc_auc(y_data, prediction):.3f}')

{'depth': 2, 'length': 3, 'nodes': [logit, qda, lda]}
ROC AUC score on training sample: 0.877
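To clarify what the ROC AUC score measures, the following sketch computes it directly from its ranking definition: the share of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as half. The function name `roc_auc_by_ranking` and the toy values are made up for illustration. This quadratic-time version is meant for reading, not as a replacement for `roc_auc_score`.

```python
from typing import Sequence


def roc_auc_by_ranking(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one sample of each class")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy labels and predicted probabilities of class 1 (made-up values).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc_by_ranking(toy_labels, toy_scores)
print(f"ROC AUC on the toy sample: {toy_auc:.3f}")
```

In the toy sample, three of the four positive-negative pairs are ordered correctly, so the score is 0.750. A score of 0.5 corresponds to random ranking and 1.0 to perfect separation.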


## Core-based example

First, we wrap the data in FEDOT's own format (InputData), which the core algorithms take as input.

In [45]:
import datetime
import numpy as np
from fedot.core.data.data import InputData
from fedot.core.repository.tasks import Task, TaskTypesEnum
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.repository.model_types_repository import ModelTypesRepository

from fedot.core.chains.chain import Chain
from fedot.core.composer.gp_composer.gp_composer import GPComposerBuilder, GPComposerRequirements
from fedot.core.composer.optimisers.gp_optimiser import GPChainOptimiserParameters, GeneticSchemeTypesEnum
from fedot.core.repository.quality_metrics_repository import ClassificationMetricsEnum, MetricsRepository

# Define classification task
task = Task(TaskTypesEnum.classification)

# Prepare data to train the model
input_data = InputData(idx=np.arange(0, len(x_data)), features=x_data,
                       target=y_data, task=task,
                       data_type=DataTypesEnum.table)

In [46]:
# Find the models provided by the framework that can be used as chain nodes for the selected task
available_model_types, _ = ModelTypesRepository().suitable_model(task_type=task.task_type)
available_model_types

['logit',
 'lda',
 'qda',
 'dt',
 'rf',
 'mlp',
 'knn',
 'xgboost',
 'bernb',
 'direct_data_model',
 'pca_data_model']

In [47]:
# Choose the metric used to assess chain quality during composition
metric_function = MetricsRepository().metric_by_id(ClassificationMetricsEnum.ROCAUC_penalty)

In [67]:
# Choose and initialise the GP (genetic programming) search
max_lead_time = datetime.timedelta(minutes=5)
composer_requirements = GPComposerRequirements(
    primary=available_model_types,
    secondary=available_model_types, max_arity=3,
    max_depth=3, pop_size=10, num_of_generations=2,
    crossover_prob=0.8, mutation_prob=0.8, max_lead_time=max_lead_time)
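The role of these parameters is easier to see in a toy evolutionary loop. The sketch below is not FEDOT's optimiser: it evolves bit strings instead of chains, uses a simple generational scheme with elitism, and treats `max_lead_time` as a wall-clock budget, which is an assumption based on the parameter's name. `toy_evolution`, `fitness`, `genome_len`, and `seed` are hypothetical names; the values of `pop_size`, `num_of_generations`, `crossover_prob`, and `mutation_prob` are copied from the cell above.

```python
import datetime
import random
import time
from typing import List, Tuple

Genome = List[int]


def fitness(genome: Genome) -> int:
    """Toy objective: count of ones (stands in for a chain quality metric)."""
    return sum(genome)


def toy_evolution(pop_size: int, num_of_generations: int,
                  crossover_prob: float, mutation_prob: float,
                  max_lead_time: datetime.timedelta,
                  genome_len: int = 16, seed: int = 42) -> Tuple[Genome, int]:
    """Return the best genome found and the number of generations actually run."""
    rng = random.Random(seed)
    deadline = time.monotonic() + max_lead_time.total_seconds()
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    generations_run = 0

    for _ in range(num_of_generations):
        if time.monotonic() >= deadline:
            break  # the time budget takes priority over the generation count
        best = max(population, key=fitness)
        offspring = [best[:]]  # elitism: the best individual always survives
        while len(offspring) < pop_size:
            # Tournament selection: the fitter of two random individuals
            mother = max(rng.sample(population, 2), key=fitness)
            father = max(rng.sample(population, 2), key=fitness)
            child = mother[:]
            if rng.random() < crossover_prob:
                cut = rng.randrange(1, genome_len)  # one-point crossover
                child = mother[:cut] + father[cut:]
            if rng.random() < mutation_prob:
                position = rng.randrange(genome_len)  # flip one random bit
                child[position] = 1 - child[position]
            offspring.append(child)
        population = offspring
        generations_run += 1

    return max(population, key=fitness), generations_run


best_genome, generations_run = toy_evolution(
    pop_size=10, num_of_generations=2, crossover_prob=0.8,
    mutation_prob=0.8, max_lead_time=datetime.timedelta(minutes=5))
print(fitness(best_genome), generations_run)
```

With a five-minute budget the loop always completes both generations; passing `datetime.timedelta(0)` stops it before the first one, which shows how the time limit and the generation count interact in this sketch.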

In [68]:
# Choose the GP optimiser parameters
scheme_type = GeneticSchemeTypesEnum.steady_state
optimiser_parameters = GPChainOptimiserParameters(genetic_scheme_type=scheme_type)

# Create builder for composer and set composer params
builder = GPComposerBuilder(task=task).with_requirements(composer_requirements).with_metrics(
    metric_function).with_optimiser_parameters(optimiser_parameters)

# Create GP-based composer
composer = builder.build()

# Compose the optimal chain; this is the most time-consuming step
chain_evo_composed = composer.compose_chain(data=input_data,
                                            is_visualise=True)

In [69]:
chain_evo_composed.fine_tune_primary_nodes(input_data=input_data,
                                           iterations=50)

In [70]:
chain_evo_composed.fit(input_data=input_data)

OutputData(idx=array([  0,   1,   2,   3,   4, ...
[output truncated]

In [76]:
prediction = chain_evo_composed.predict(input_data)
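# Note: `pipeline` is the chain fitted in the API example above;
# print(chain_evo_composed) should show the chain composed in this section
print(pipeline)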
print(f'ROC AUC score on training sample: {roc_auc(y_data, prediction.predict):.3f}')

{'depth': 2, 'length': 3, 'nodes': [logit, qda, lda]}
ROC AUC score on training sample: 0.876
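Both scores above are computed on the training sample, so they are likely to be optimistic. A common remedy is to hold out part of the data for evaluation. The sketch below only produces the index split; `holdout_indices`, `test_share`, and `seed` are hypothetical names chosen for this illustration, and fitting on the training part and predicting on the test part with FEDOT is left out.

```python
import random
from typing import List, Tuple


def holdout_indices(n_samples: int, test_share: float = 0.2,
                    seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and split them into train and test parts."""
    if not 0.0 < test_share < 1.0:
        raise ValueError("test_share must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_share))
    # The first n_test shuffled indices form the test part, the rest the train part
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# 250 matches the samples_amount used for the synthetic dataset above.
train_idx, test_idx = holdout_indices(n_samples=250)
print(len(train_idx), len(test_idx))
```

With NumPy arrays, `x_data[train_idx]` and `y_data[train_idx]` select the training rows, and the same indexing with `test_idx` selects the rows to score on.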
