# Python HANA ML APL

<div class="alert alert-block alert-info">
<b>Predicting a continuous target (regression case).</b> <br>
</div>

## Train

### Create a HANA DataFrame for the training data

In [1]:
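from hana_ml import dataframe as hd

# Connect with a key stored in the SAP HANA secure user store (hdbuserstore)
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
# Order by the key column so that head() returns a stable set of rows
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY "id"'
hdf_train = hd.DataFrame(conn, sql_cmd)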
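
The query string above hard-codes its quoted identifiers. As a hedged aside, a small pure-Python helper could assemble an equivalent statement. `quote_ident` and `build_select` are hypothetical names for illustration only (they are not part of `hana_ml`), and the sketch assumes the standard SQL rule of doubling any double quote embedded in an identifier.

```python
def quote_ident(name):
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def build_select(schema, table, order_by=None):
    """Build a SELECT * statement with quoted identifiers (hypothetical helper)."""
    sql = f"SELECT * FROM {quote_ident(schema)}.{quote_ident(table)}"
    if order_by is not None:
        sql += f" ORDER BY {quote_ident(order_by)}"
    return sql


sql_cmd = build_select("APL_SAMPLES", "CENSUS", order_by="id")
print(sql_cmd)
```

The resulting string can be passed to `hd.DataFrame(conn, sql_cmd)` exactly like the literal query in the cell above.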

In [2]:
hdf_train.head(6).collect()

| | id | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | 0 |
| 1 | 2 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | 0 |
| 2 | 3 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | 0 |
| 3 | 4 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | 0 |
| 4 | 5 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | 0 |
| 5 | 6 | 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | 0 |


### Fit with APL Gradient Boosting

In [3]:
from hana_ml.algorithms.apl.gradient_boosting_regression import GradientBoostingRegressor

# 'age' is the continuous target; 'id' identifies each row
apl_model = GradientBoostingRegressor(eval_metric='MAE', variable_auto_selection=True)
apl_model.fit(hdf_train, label='age', key='id')

### Model Reports

In [4]:
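df = apl_model.get_debrief_report('ClassificationRegression_VariablesContribution').collect()
df = df.sort_values(by=['Rank'])
# Drop the technical columns that are not needed for display
df = df.drop(columns=['Oid', 'Method', 'Rank'])
# Keep only the variables with a positive contribution
df = df[df['Contribution'] > 0]
format_dict = {'Contribution': '{:,.2%}', 'Cumulative': '{:,.2%}'}
# Styler.hide_index() was removed in pandas 2.0; on pandas < 1.4 use .hide_index() instead
df.style.format(format_dict).hide(axis='index')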

| Variable | Contribution | Cumulative |
|---|---|---|
| marital-status | 37.90% | 37.90% |
| relationship | 16.20% | 54.00% |
| workclass | 10.20% | 64.20% |
| class | 7.40% | 71.60% |
| education-num | 6.90% | 78.50% |
| hours-per-week | 6.60% | 85.10% |
| fnlwgt | 5.90% | 91.00% |
| occupation | 4.00% | 95.00% |
| education | 2.50% | 97.50% |
| capital-gain | 2.50% | 100.00% |
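
The `Cumulative` column reads as the running total of `Contribution` in rank order. A minimal local sketch of that relationship, in plain Python and without a HANA connection, is shown here. It uses the rounded percentages displayed above, so the recomputed totals can differ from the report by about 0.1 percentage point (the report presumably accumulates unrounded values; that is an assumption). The 90% threshold is an arbitrary illustration, not an APL setting.

```python
from itertools import accumulate

# Rounded contributions copied from the report output, already in rank order.
contributions = [
    ("marital-status", 0.379), ("relationship", 0.162), ("workclass", 0.102),
    ("class", 0.074), ("education-num", 0.069), ("hours-per-week", 0.066),
    ("fnlwgt", 0.059), ("occupation", 0.040), ("education", 0.025),
    ("capital-gain", 0.025),
]
names = [name for name, _ in contributions]
values = [value for _, value in contributions]

# Running total of the contributions.
cumulative = list(accumulate(values))

# Smallest set of top-ranked variables whose cumulative contribution reaches 90%.
n_top = next(i for i, total in enumerate(cumulative, start=1) if total >= 0.90)
top_variables = names[:n_top]

for name, value, total in zip(names, values, cumulative):
    print(f"{name:<15} {value:>7.2%} {total:>8.2%}")
print(top_variables)
```

With these figures, the first seven variables (through `fnlwgt`) cover at least 90% of the total contribution.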


In [5]:
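# Keep the validation-partition rows for the two indicators of interest
my_filter = "\"Partition\" = 'Validation' and \"Indicator\" in ('MAPE','RMSE')"
df = apl_model.get_debrief_report('ClassificationRegression_Performance').filter(my_filter).collect()
df = df.drop(columns=['Oid'])
format_dict = {'Value': '{:,.3f}'}
# Styler.hide_index() was removed in pandas 2.0; on pandas < 1.4 use .hide_index() instead
df.style.format(format_dict).hide(axis='index')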

| Target | Partition | Indicator | Value |
|---|---|---|---|
| age | Validation | RMSE | 9.8 |
| age | Validation | MAPE | 0.209 |


## Make Predictions

In [6]:
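# No ORDER BY here: LIMIT returns an arbitrary 100 rows, so the ids shown below are not sequential
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" LIMIT 100'
hdf_apply = hd.DataFrame(conn, sql_cmd)
df = apl_model.predict(hdf_apply).collect()
# Rename the output columns (key, actual target, prediction) for readability
df.columns = ['id', 'Actual', 'Prediction']
df.head(8)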

| | id | Actual | Prediction |
|---|---|---|---|
| 0 | 30 | 49 | 40 |
| 1 | 63 | 48 | 42 |
| 2 | 66 | 36 | 41 |
| 3 | 110 | 42 | 40 |
| 4 | 335 | 53 | 40 |
| 5 | 352 | 26 | 40 |
| 6 | 366 | 28 | 41 |
| 7 | 407 | 28 | 41 |
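
To relate these predictions to the indicators in the performance report, the error measures can be recomputed locally. The following is a minimal sketch on the eight rows displayed above, using the textbook definitions of MAE, RMSE and MAPE (assumed, not confirmed, to match the ones APL reports). Eight rows of the apply sample are not comparable to the validation-partition figures, so the numbers are only illustrative.

```python
import math

# Actual and predicted ages copied from the eight rows displayed above.
actual = [49, 48, 36, 42, 53, 26, 28, 28]
predicted = [40, 42, 41, 40, 40, 40, 41, 41]

errors = [a - p for a, p in zip(actual, predicted)]
n = len(errors)

# Mean absolute error: average size of the error, in years.
mae = sum(abs(e) for e in errors) / n
# Root mean squared error: penalises large errors more heavily.
rmse = math.sqrt(sum(e * e for e in errors) / n)
# Mean absolute percentage error: error relative to the actual value.
mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n

print(f"MAE  = {mae:.3f}")
print(f"RMSE = {rmse:.3f}")
print(f"MAPE = {mape:.3f}")
```

On the full result set, the same computation could be applied to the `Actual` and `Prediction` columns of `df`.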
