# AutoML solution vs single model
#### FEDOT version = 0.2.1

Below is an example of running an AutoML solution for a classification problem.
## Description of the task and dataset

In [7]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Input data from CSV files
train_data_path = '../data/scoring_train.csv'
test_data_path = '../data/scoring_test.csv'
df = pd.read_csv(train_data_path)
df.head(5)

| Unnamed: 0 | ID | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30.59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60.89DaysPastDueNotWorse | NumberOfDependents | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 0 | 6 | 0 | 2.0 | 1 |
| 1 | 1 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 0 | 0 | 1.0 | 0 |
| 2 | 2 | 0.65818 | 38 | 1 | 0.085113 | 3042.0 | 2 | 1 | 0 | 0 | 0.0 | 0 |
| 3 | 3 | 0.23381 | 30 | 0 | 0.03605 | 3300.0 | 5 | 0 | 0 | 0 | 0.0 | 0 |
| 4 | 4 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 0 | 1 | 0 | 0.0 | 0 |
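
The preview above contains an `Unnamed: 0` column, which usually means the row index was written into the CSV when the file was saved. The sketch below shows one way to read such a file for inspection and to check how the `target` classes are distributed. It rebuilds only the five previewed rows in memory, so the printed shares say nothing about the full dataset. The empty first header field is an assumption about how the file is stored, and `sample_csv`, `preview` and `class_share` are illustrative names.

```python
import io
import pandas as pd

# The five rows from the preview above, rebuilt as an in-memory CSV.
# Assumption: the first header field is empty in the file on disk,
# which is what makes pandas label the column "Unnamed: 0".
sample_csv = io.StringIO(
    ",ID,RevolvingUtilizationOfUnsecuredLines,age,"
    "NumberOfTime30.59DaysPastDueNotWorse,DebtRatio,MonthlyIncome,"
    "NumberOfOpenCreditLinesAndLoans,NumberOfTimes90DaysLate,"
    "NumberRealEstateLoansOrLines,NumberOfTime60.89DaysPastDueNotWorse,"
    "NumberOfDependents,target\n"
    "0,0,0.766127,45,2,0.802982,9120.0,13,0,6,0,2.0,1\n"
    "1,1,0.957151,40,0,0.121876,2600.0,4,0,0,0,1.0,0\n"
    "2,2,0.65818,38,1,0.085113,3042.0,2,1,0,0,0.0,0\n"
    "3,3,0.23381,30,0,0.03605,3300.0,5,0,0,0,0.0,0\n"
    "4,4,0.907239,49,1,0.024926,63588.0,7,0,1,0,0.0,0\n"
)

# index_col=0 treats the first column as the row index instead of a feature.
preview = pd.read_csv(sample_csv, index_col=0)

# Share of each class in the target column (for these five rows only).
class_share = preview["target"].value_counts(normalize=True)
print(class_share)
```

With the real data, the same two calls can be applied to `train_data_path`. A strongly skewed `target` would be worth keeping in mind when reading the F1 values reported later.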


## Baseline model

Let's use the FEDOT API to solve the classification problem. First, we create a chain that consists of a single "xgboost" model.
To do this, we pass the model name in the predefined_model argument of the fit method.

In [8]:
from fedot.api.main import Fedot

# Select the task and initialise the framework
baseline_model = Fedot(problem='classification')

# Fit the model without optimisation: a single XGBoost node is used
baseline_model.fit(features=train_data_path, target='target', predefined_model='xgboost')

# Predict class probabilities for the test data
baseline_model.predict_proba(features=test_data_path)

# Evaluate the quality metrics on the test sample
baseline_metrics = baseline_model.get_metrics()
print(baseline_metrics)

{'roc_auc': 0.827, 'f1': 0.32508833922261476}
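
The two reported metrics measure different things, which is why ROC AUC can be fairly high while F1 stays low. The following is a minimal pure-Python sketch of both definitions on a small hypothetical sample. It is not FEDOT's internal implementation: the labels, the scores and the 0.5 decision threshold are assumptions made for illustration.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 score after turning probabilities into labels at a threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced sample: 7 negatives, 3 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.45, 0.35, 0.40, 0.60]

print("roc_auc:", round(roc_auc(y_true, scores), 3))
print("f1:", round(f1_at_threshold(y_true, scores), 3))
```

In this toy sample the positives are ranked mostly above the negatives, so ROC AUC is high, but only one of the three positives crosses the threshold, so recall and therefore F1 are much lower.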


## FEDOT AutoML for classification

Instead of fixing the model in advance, we can let the evolutionary algorithm built into the core of the FEDOT framework search for a suitable chain of models.

In [9]:
# New instance to be used as an AutoML tool
auto_model = Fedot(problem='classification', seed=42, verbose_level=4)

In [10]:
# Run the AutoML-based model generation
pipeline = auto_model.fit(features=train_data_path, target='target')

light_tun preset is used. Parameters tuning: True. Set of candidate models: ['logit', 'lda', 'qda', 'dt', 'rf', 'knn', 'xgboost', 'bernb', 'direct_data_model', 'pca_data_model']. Composing time limit: 0:02:00
Model composition started
Hyperparameters tuning started
Start tuning of primary nodes
End tuning
Model composition finished
Fit chain from scratch


In [11]:
prediction = auto_model.predict_proba(features=test_data_path)
auto_metrics = auto_model.get_metrics()
print(auto_metrics)

{'roc_auc': 0.849, 'f1': 0.38768529076396807}


In [12]:
# Comparison with the single-model baseline

print('Baseline', round(baseline_metrics['roc_auc'], 3))
print('AutoML solution', round(auto_metrics['roc_auc'], 3))

Baseline 0.827
AutoML solution 0.849


Thus, with just a few lines of code, we launched the FEDOT framework and obtained a better result than the single-model baseline*.

*Due to the stochastic nature of the evolutionary algorithm, the metrics of the solution found may differ between runs.
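
To put the difference between the two runs into numbers, the metric dictionaries printed above can be compared directly. This is a small sketch that hard-codes the values shown in this notebook. The `compare` helper is a hypothetical name introduced here and is not part of the FEDOT API.

```python
# Metric values copied from the outputs printed earlier in this notebook.
baseline_metrics = {'roc_auc': 0.827, 'f1': 0.32508833922261476}
auto_metrics = {'roc_auc': 0.849, 'f1': 0.38768529076396807}


def compare(baseline, candidate):
    """Return (absolute, relative) gain for every metric present in both dicts."""
    gains = {}
    for name, base_value in baseline.items():
        if name in candidate:
            absolute = candidate[name] - base_value
            gains[name] = (absolute, absolute / base_value)
    return gains


gains = compare(baseline_metrics, auto_metrics)
for name, (absolute, relative) in gains.items():
    print(f"{name}: {absolute:+.3f} absolute, {relative:+.1%} relative")
```

Because the search is stochastic, a single pair of runs such as this one is only an indication. Repeating the comparison over several runs would give a more reliable picture.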

To learn more about FEDOT, see [this notebook](2_intro_to_fedot.ipynb).