# Regression
In this example we build a model that predicts house prices in Boston.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dan0nchik/SAP-HANA-AutoML/blob/dev/docs/source/regression.ipynb)

In [1]:
from hana_automl.automl import AutoML
import pandas as pd
from hana_ml.dataframe import ConnectionContext
from hana_automl.storage import Storage

Let's take a look at the dataset.

In [11]:
test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_test_data.csv')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_data.csv')
df.head()

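|   | ID | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.15876 | 0.0 | 10.81 | 0.0 | 0.413 | 5.961 | 17.5 | 5.2873 | 4.0 | 305.0 | 19.2 | 376.94 | 9.88 | 21.7 |
| 1 | 1 | 0.10328 | 25.0 | 5.13 | 0.0 | 0.453 | 5.927 | 47.2 | 6.932 | 8.0 | 284.0 | 19.7 | 396.9 | 9.22 | 19.6 |
| 2 | 2 | 0.3494 | 0.0 | 9.9 | 0.0 | 0.544 | 5.972 | 76.7 | 3.1025 | 4.0 | 304.0 | 18.4 | 396.24 | 9.97 | 20.3 |
| 3 | 3 | 2.73397 | 0.0 | 19.58 | 0.0 | 0.871 | 5.597 | 94.9 | 1.5257 | 5.0 | 403.0 | 14.7 | 351.85 | 21.45 | 15.4 |
| 4 | 4 | 0.04337 | 21.0 | 5.64 | 0.0 | 0.439 | 6.115 | 63.0 | 6.8147 | 4.0 | 243.0 | 16.8 | 393.97 | 9.43 | 20.5 |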
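Before training, it can help to confirm that the target column is numeric and that nothing is missing. The sketch below rebuilds a few columns of the five rows shown above into a small frame so that it runs without network access. The name `sample` is introduced here for illustration only; on the real data the same calls would be made on `df`.

```python
import pandas as pd

# First five rows from the output above, restricted to a few columns for brevity.
sample = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "crim": [0.15876, 0.10328, 0.3494, 2.73397, 0.04337],
        "rm": [5.961, 5.927, 5.972, 5.597, 6.115],
        "lstat": [9.88, 9.22, 9.97, 21.45, 9.43],
        "medv": [21.7, 19.6, 20.3, 15.4, 20.5],
    }
)

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# The regression target should be numeric.
target_is_numeric = pd.api.types.is_numeric_dtype(sample["medv"])

# Range of the target within this small sample.
target_min = sample["medv"].min()
target_max = sample["medv"].max()

print(missing_per_column)
print("numeric target:", target_is_numeric)
print("medv range:", target_min, "to", target_max)
```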


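Pass credentials to the database. The password is left out of the cell below; supply your own through the `password` argument of `ConnectionContext`.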

In [3]:
cc = ConnectionContext(address='localhost', port=39015, user='DEVELOPER')
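Hard-coded credentials in a notebook are easy to leak. One option, sketched below, is to read them from environment variables. The variable names (`HANA_ADDRESS`, `HANA_PORT`, `HANA_USER`, `HANA_PASSWORD`) and the helper `connection_kwargs_from_env` are assumptions made for this example, not part of `hana_ml` or `hana_automl`. The defaults mirror the values used in the cell above.

```python
import os


def connection_kwargs_from_env(env=None):
    """Build keyword arguments for ConnectionContext from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    return {
        "address": env.get("HANA_ADDRESS", "localhost"),
        "port": int(env.get("HANA_PORT", "39015")),
        "user": env.get("HANA_USER", "DEVELOPER"),
        # No default: a missing password raises KeyError here
        # instead of causing a confusing failure later.
        "password": env["HANA_PASSWORD"],
    }


# Example with an explicit mapping instead of the real environment.
kwargs = connection_kwargs_from_env({"HANA_PASSWORD": "example-only"})
print({key: value for key, value in kwargs.items() if key != "password"})

# With a reachable database, the connection would then be created as:
# cc = ConnectionContext(**kwargs)
```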

In [4]:
automl = AutoML(connection_context=cc)

In [5]:
automl.fit(
    df=df,
    task=None, # library will try to determine task
    steps=10,
    target='medv',
    table_name='REGRESSION', # optional
    id_column='ID', # pass None if no ID column in dataset
    verbosity=1
)

Recreating table REGRESSION with data from dataframe
100%|██████████| 1/1 [00:00<00:00,  1.84it/s]
Task: reg
All iterations completed successfully!
Starting model accuracy evaluation on the validation data!
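With `task=None` the library picks the task itself, and the log above reports `Task: reg`. This page does not show how that decision is made, so the function below is only a plausible heuristic written for illustration. `guess_task` and its `max_classes` threshold are hypothetical, and the `'cls'` label is assumed by analogy with `'reg'`. A numeric target with fractional values or many distinct values is treated as regression, anything else as classification.

```python
import pandas as pd


def guess_task(target: pd.Series, max_classes: int = 10) -> str:
    """Return 'reg' or 'cls' for a target column (illustrative heuristic only)."""
    is_bool = pd.api.types.is_bool_dtype(target)
    if is_bool or not pd.api.types.is_numeric_dtype(target):
        return "cls"
    values = target.dropna()
    # Fractional values or many distinct values suggest a continuous target.
    has_fractions = bool(((values % 1) != 0).any())
    if has_fractions or values.nunique() > max_classes:
        return "reg"
    # Few distinct whole numbers usually mean encoded class labels.
    return "cls"


# Values taken from the first five rows of the dataset shown earlier.
medv = pd.Series([21.7, 19.6, 20.3, 15.4, 20.5], name="medv")
chas = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0], name="chas")

print("medv ->", guess_task(medv))
print("chas ->", guess_task(chas))
```

Passing `task` explicitly to `fit` avoids relying on any such guess.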


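Save the model. Only the last expression in the cell is displayed, so the table below is the output of `list_preprocessors()`.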

In [6]:
storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "boston" # don't forget to specify the name
storage.save_model(automl=automl)
storage.list_models()
storage.list_preprocessors()

|   | MODEL | VERSION | JSON |
|---|---|---|---|
| 0 | boston | 1 | {"num_strategy": ["mean", "median", "zero"], "... |
| 1 | boston | 2 | {"num_strategy": ["mean", "median", "zero"], "... |
| 2 | boston | 3 | {"num_strategy": ["mean", "median", "zero"], "... |
| 3 | boston | 4 | {"num_strategy": ["mean", "median", "zero"], "... |
| 4 | boston | 5 | {"num_strategy": ["mean", "median", "zero"], "... |


In [7]:
storage.list_models()

|   | NAME | VERSION | LIBRARY | CLASS | JSON | TIMESTAMP | MODEL_STORAGE_VER |
|---|---|---|---|---|---|---|---|
| 0 | boston | 1 | PAL | hana_ml.algorithms.pal.trees.GradientBoostingR... | {"model_attributes": {"n_estimators": 280, "su... | 2021-05-20 14:40:32 | 1 |
| 1 | boston | 2 | PAL | hana_ml.algorithms.pal.trees.GradientBoostingR... | {"model_attributes": {"n_estimators": 280, "su... | 2021-05-20 14:45:29 | 1 |
| 2 | boston | 3 | PAL | hana_ml.algorithms.pal.trees.GradientBoostingR... | {"model_attributes": {"n_estimators": 280, "su... | 2021-05-20 14:45:42 | 1 |
| 3 | boston | 4 | PAL | hana_ml.algorithms.pal.trees.GradientBoostingR... | {"model_attributes": {"n_estimators": 294, "su... | 2021-05-20 14:52:14 | 1 |
| 4 | boston | 5 | PAL | hana_ml.algorithms.pal.trees.GradientBoostingR... | {"model_attributes": {"n_estimators": 101, "su... | 2021-05-20 15:26:21 | 1 |
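The five `boston` rows above suggest that each save under the same name adds a new version. To load the newest one without hard-coding the number, the version can be looked up from a listing shaped like this table. The sketch rebuilds the `NAME`, `VERSION` and `TIMESTAMP` columns from the output above so that it runs without a database. `latest_version` is a hypothetical helper, and it assumes `list_models()` returns a pandas DataFrame with these column names.

```python
import pandas as pd

# Columns copied from the list_models() output above.
models = pd.DataFrame(
    {
        "NAME": ["boston"] * 5,
        "VERSION": [1, 2, 3, 4, 5],
        "TIMESTAMP": pd.to_datetime(
            [
                "2021-05-20 14:40:32",
                "2021-05-20 14:45:29",
                "2021-05-20 14:45:42",
                "2021-05-20 14:52:14",
                "2021-05-20 15:26:21",
            ]
        ),
    }
)


def latest_version(listing: pd.DataFrame, name: str) -> int:
    """Return the highest stored version number for the given model name."""
    rows = listing[listing["NAME"] == name]
    if rows.empty:
        raise KeyError(f"no stored model named {name!r}")
    return int(rows["VERSION"].max())


version = latest_version(models, "boston")
print(version)

# With a live Storage object, the result could be passed on like this:
# new_model = storage.load_model('boston', version=version)
```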


Load a saved model by name and version, then predict on the test data.

In [8]:
new_model = storage.load_model('boston', version=1)
new_model.predict(df=test_df)

Creating table with name: AUTOML70c8b478-6008-462b-8d72-83905a5158e3
100%|██████████| 1/1 [00:00<00:00,  2.45it/s]
Preprocessor settings: <hana_automl.preprocess.settings.PreprocessorSettings object at 0x1066e5f70>
Prediction results (first 20 rows): 
     ID    SCORE CONFIDENCE
0    1  34.6182       None
1    2  20.7285       None
2    3  15.3624       None
3    4  25.2115       None
4    5  22.3916       None
5    6  23.9694       None
6    7  20.4695       None
7    8  21.3091       None
8    9  19.0294       None
9   10  41.3506       None
10  11  39.9838       None
11  12  13.5209       None
12  13  10.0027       None
13  14  44.5971       None
14  15  23.6994       None
15  16  12.5791       None
16  17  19.8964       None
17  18   17.987       None
18  19  17.6412       None
19  20  11.2817       None


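|   | ID | SCORE | CONFIDENCE |
|---|---|---|---|
| 0 | 1 | 34.6182 | None |
| 1 | 2 | 20.7285 | None |
| 2 | 3 | 15.3624 | None |
| 3 | 4 | 25.2115 | None |
| 4 | 5 | 22.3916 | None |
| ... | ... | ... | ... |
| 97 | 98 | 25.2976 | None |
| 98 | 99 | 25.7257 | None |
| 99 | 100 | 13.6407 | None |
| 100 | 101 | 40.0866 | None |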
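The returned frame can be tidied before further use. In this run `CONFIDENCE` is empty for every row, presumably because a confidence value only applies to classification. The sketch below rebuilds the first five prediction rows from the output above, drops columns that contain no values, and renames `SCORE` to `medv_pred` (a name chosen here for illustration).

```python
import pandas as pd

# First five rows of the prediction output above.
predictions = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "SCORE": [34.6182, 20.7285, 15.3624, 25.2115, 22.3916],
        "CONFIDENCE": [None] * 5,
    }
)

tidy = (
    predictions
    .dropna(axis="columns", how="all")       # drop columns with no values at all
    .rename(columns={"SCORE": "medv_pred"})  # name the prediction after the target
    .set_index("ID")
)

print(tidy)
```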


Clean up the storage.

In [9]:
storage.clean_up()

For more information, see the AutoML and Storage classes in the documentation.