# AutoML solution vs single model
#### FEDOT version = 0.5

In [None]:
%pip install fedot==0.5

Below is an example of running an AutoML solution for a classification problem and comparing it with a single model.
## Description of the task and dataset

In [3]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Input data from csv files 
train_data_path = '../data/scoring_train.csv'
test_data_path = '../data/scoring_test.csv'
df = pd.read_csv(train_data_path)
df.head(5)

Unnamed: 0,ID,RevolvingUtilizationOfUnsecuredLines,age,NumberOfTime30.59DaysPastDueNotWorse,DebtRatio,MonthlyIncome,NumberOfOpenCreditLinesAndLoans,NumberOfTimes90DaysLate,NumberRealEstateLoansOrLines,NumberOfTime60.89DaysPastDueNotWorse,NumberOfDependents,target
0,0,0.766127,45,2,0.802982,9120.0,13,0,6,0,2.0,1
1,1,0.957151,40,0,0.121876,2600.0,4,0,0,0,1.0,0
2,2,0.65818,38,1,0.085113,3042.0,2,1,0,0,0.0,0
3,3,0.23381,30,0,0.03605,3300.0,5,0,0,0,0.0,0
4,4,0.907239,49,1,0.024926,63588.0,7,0,1,0,0.0,0


## Baseline model

Let's use the FEDOT API to solve the classification problem. First, we create a pipeline that consists of a single "xgboost" model.
To do this, we pass the model name in the predefined_model parameter of the fit method.

Attention!
"predefined_model" is not an initial assumption for the AutoML algorithm. It is just a single model, fitted without the AutoML part.

In [4]:
from fedot.api.main import Fedot

# task selection, initialisation of the framework
baseline_model = Fedot(problem='classification')

# fit model without optimisation - single XGBoost node is used 
xgb_pipeline = baseline_model.fit(features=train_data_path, target='target', predefined_model='xgboost')

# predict class probabilities for the test data
xgb_predict = baseline_model.predict_proba(features=test_data_path)

In [5]:
from fedot.core.data.data import InputData
from sklearn.metrics import roc_auc_score

# Read data from csv file as InputData
test_data = InputData.from_csv(test_data_path)
roc_auc_baseline = roc_auc_score(test_data.target, xgb_predict)
roc_auc_baseline

0.8328294111806176
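
To make the metric easier to interpret: ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The helper name `pairwise_roc_auc` and the toy labels and scores are made up for illustration and are not part of FEDOT or of the scoring dataset; in the notebook itself, `roc_auc_score` from scikit-learn does the real work.

```python
def pairwise_roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The cost is O(n_pos * n_neg),
    so this is only meant for small illustrations.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration (not from the scoring dataset)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = pairwise_roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

Three of the four positive/negative pairs in the toy data are ordered correctly. A value of 0.5 corresponds to random ranking and 1.0 to a perfect one.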

## FEDOT AutoML for classification

We can identify the model using an evolutionary algorithm built into the core of the FEDOT framework.

Here are some parameters that you can specify when initializing the Fedot class:
* problem - the name of the modelling problem to solve:
    - classification
    - regression
    - ts_forecasting
    - clustering
* seed - value for the fixed random seed
* verbose_level - level of output detail:
    - -1 - nothing
    - 0 - errors
    - 1 - messages
    - 2 - warnings and info
    - 3-4 - basic and detailed debug
* timeout - time limit for model design (in minutes)

In [6]:
# new instance to be used as AutoML tool
auto_model = Fedot(problem='classification', seed=42, verbose_level=0, timeout=2)

In [7]:
# run of the AutoML-based model generation
pipeline = auto_model.fit(features=train_data_path, target='target')

Generations:   0%|                                                                          | 1/10000 [01:03<?, ?gen/s]

Hyperparameters optimization start





  1%|▎                                           | 6/1000 [02:42<7:29:04, 27.11s/trial, best loss: -0.8543467023733671]
Hyperparameters optimization finished
Return tuned pipeline due to the fact that obtained metric 0.854 equal or bigger than initial (- 5% deviation) 0.803


In [8]:
prediction = auto_model.predict_proba(features=test_data_path)

# Calculate metric
roc_auc_auto = roc_auc_score(test_data.target, prediction)

In [9]:
# comparison with the manual pipeline

print(f'Baseline {roc_auc_baseline:.2f}')
print(f'AutoML solution {roc_auc_auto:.2f}')

Baseline 0.83
AutoML solution 0.85


Thus, with just a few lines of code, we were able to launch the FEDOT framework and get a better result* than the single-model baseline (ROC AUC 0.85 vs 0.83).

*Due to the stochastic nature of the algorithm, the metrics for the found solution may differ.
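
Because the result can change from run to run, one way to make the comparison more robust is to repeat the experiment with several seeds and report the mean and the spread of the metric. The sketch below shows only the aggregation step. `summarize_runs` and `fake_run` are hypothetical names introduced here, and `fake_run` returns made-up numbers so the example runs without FEDOT installed.

```python
import statistics


def summarize_runs(run_once, seeds):
    """Call run_once(seed) for every seed and return (mean, sample std)."""
    scores = [run_once(seed) for seed in seeds]
    # stdev needs at least two values; report 0.0 for a single run
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), spread


def fake_run(seed):
    # Placeholder: made-up deterministic scores, NOT real FEDOT results.
    return 0.80 + 0.01 * (seed % 3)


mean_score, std_score = summarize_runs(fake_run, seeds=[0, 1, 2])
print(f'ROC AUC {mean_score:.3f} +/- {std_score:.3f}')
```

In this notebook, `fake_run` would be replaced by a function that creates `Fedot(problem='classification', seed=seed, timeout=2)`, fits it on the training data and returns `roc_auc_score` on the test data. Keep in mind that the total running time grows with the number of seeds.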

If you want to learn more about FEDOT, you can use [this notebook](2_intro_to_fedot.ipynb).