# Classification
In this example we build a model that classifies iris flowers.  
Dataset: `iris.csv` and `test_iris.csv`, loaded below from `./docs/source/datasets/`.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb)

In [17]:
from hana_automl.automl import AutoML
import pandas as pd
from hana_ml.dataframe import ConnectionContext
from hana_automl.storage import Storage

In [18]:
test_df = pd.read_csv('./docs/source/datasets/test_iris.csv', index_col='Unnamed: 0')
df = pd.read_csv('./docs/source/datasets/iris.csv', index_col='Unnamed: 0')
df.head()

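| Unnamed: 0 | ID | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|---|
| 0 | 30 | 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 1 | 31 | 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 2 | 32 | 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 3 | 33 | 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4 | 34 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |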
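
The first rows shown above all belong to one species, so it can be worth checking how the classes are distributed before training. The following is a minimal sketch, not part of `hana_automl`; the helper name `class_shares` is hypothetical, and it accepts any iterable of labels.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of rows that belong to each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {label: count / total for label, count in counts.items()}


# Toy labels for illustration only; they are not taken from iris.csv.
shares = class_shares(["setosa", "setosa", "versicolor", "virginica"])
print(shares)
```

With the dataframe loaded above, the call would be `class_shares(df['species'])`, since iterating a pandas column yields its values.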


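Pass credentials to the database. The password is omitted from this example; supply your own through the `password` argument of `ConnectionContext`.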

In [19]:
cc = ConnectionContext(address='localhost', port=39015, user='DEVELOPER')
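
Rather than hardcoding credentials in a notebook, one option is to read them from environment variables. The sketch below only builds the keyword arguments; the helper name and the variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER` and `HANA_PASSWORD` are assumptions made for illustration, not something `hana_ml` or `hana_automl` defines.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Build keyword arguments for ConnectionContext from a mapping.

    The variable names used here are hypothetical; rename them to
    match your own deployment.
    """
    return {
        # Address and port fall back to the values used in this example.
        "address": env.get("HANA_ADDRESS", "localhost"),
        "port": int(env.get("HANA_PORT", "39015")),
        # User and password are required; a missing one raises KeyError.
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage (requires a reachable SAP HANA instance):
# from hana_ml.dataframe import ConnectionContext
# cc = ConnectionContext(**hana_connection_kwargs())
```

Failing early with a `KeyError` when a required variable is missing keeps the mistake visible before any connection attempt is made.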

In [20]:
automl = AutoML(connection_context=cc)

In [21]:
automl.fit(
    df=df,
    task='cls', # if task = None, we'll determine it for you
    steps=10,
    target='species',
    table_name='CLASSIFICATION', # optional
    categorical_features=['species'],
    id_column='ID', # optional
    verbosity=1
)

Recreating table CLASSIFICATION with data from dataframe
100%|██████████| 1/1 [00:00<00:00,  6.92it/s]
Task: cls
All iterations completed successfully!
Starting model accuracy evaluation on the validation data!


Save model

In [22]:
storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "iris" # don't forget to specify the name
storage.save_model(automl=automl)
storage.list_models()

| Unnamed: 0 | NAME | VERSION | LIBRARY | CLASS | JSON | TIMESTAMP | MODEL_STORAGE_VER |
|---|---|---|---|---|---|---|---|
| 0 | iris | 1 | PAL | hana_ml.algorithms.pal.svm.SVC | `{"model_attributes": {"c": 0.5398025983275939,...` | 2021-05-20 14:11:13 | 1 |


Load model and predict

In [23]:
new_model = storage.load_model('iris')
new_model.predict(df=test_df, id_column='ID')

Creating table with name: AUTOMLb758836b-8420-43ac-8a16-37176544d312
100%|██████████| 1/1 [00:00<00:00,  7.08it/s]
Preprocessor settings: <hana_automl.preprocess.settings.PreprocessorSettings object at 0x121094970>
Prediction results (first 20 rows): 
     ID   SCORE PROBABILITY
0    0  setosa        None
1    1  setosa        None
2    2  setosa        None
3    3  setosa        None
4    4  setosa        None
5    5  setosa        None
6    6  setosa        None
7    7  setosa        None
8    8  setosa        None
9    9  setosa        None
10  10  setosa        None
11  11  setosa        None
12  12  setosa        None
13  13  setosa        None
14  14  setosa        None
15  15  setosa        None
16  16  setosa        None
17  17  setosa        None
18  18  setosa        None
19  19  setosa        None


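| Unnamed: 0 | ID | SCORE | PROBABILITY |
|---|---|---|---|
| 0 | 0 | setosa | None |
| 1 | 1 | setosa | None |
| 2 | 2 | setosa | None |
| 3 | 3 | setosa | None |
| 4 | 4 | setosa | None |
| 5 | 5 | setosa | None |
| 6 | 6 | setosa | None |
| 7 | 7 | setosa | None |
| 8 | 8 | setosa | None |
| 9 | 9 | setosa | None |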
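
To judge predictions like these, the predicted labels can be compared with known labels. The following is a minimal sketch of a plain accuracy check; the helper `accuracy` is hypothetical and not part of `hana_automl`, and it assumes the two sequences are already aligned row by row.

```python
def accuracy(predicted, actual):
    """Return the share of positions where predicted equals actual."""
    predicted = list(predicted)
    actual = list(actual)
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot compute accuracy of an empty prediction set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted)


# Toy labels for illustration only; they are not taken from the run above.
print(accuracy(["setosa", "setosa", "virginica"],
               ["setosa", "versicolor", "virginica"]))
```

Applied to this notebook, the inputs would be the `SCORE` column of the prediction result and the true labels of the test set, assuming `test_df` carries a `species` column and both are sorted by `ID`.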


Clean up storage

In [24]:
storage.clean_up()

For more information, see the AutoML and Storage class references in the documentation.