## Check the setup and connect to the database

In [None]:
%run "010-check_setup.ipynb"

# Tables from SAP HANA

In [None]:
hdf_titanic_train=myconn.table('DATA_LABELED', schema='TITANIC')
print(hdf_titanic_train.columns)

In [None]:
col_id='PassengerId'
col_label='Survived'

In [None]:
col_features=[feature for feature in hdf_titanic_train.columns if not (feature in {col_id, col_label})]
print(col_features)
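
The list comprehension above keeps every column except the key and the label. As a minimal sketch, the same filter can be tried on a plain Python list. The column names below are illustrative assumptions, not the actual output of `hdf_titanic_train.columns`.

```python
# Hypothetical column names, for illustration only; the real list comes
# from hdf_titanic_train.columns.
example_columns = ['PassengerId', 'Survived', 'PClass', 'Age', 'Fare']

example_id = 'PassengerId'
example_label = 'Survived'

# Keep every column that is neither the key nor the label, preserving order.
example_features = [c for c in example_columns if c not in {example_id, example_label}]
print(example_features)  # ['PClass', 'Age', 'Fare']
```

Using a set for the excluded names keeps the membership test short and makes it easy to exclude further columns later.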

# Random Decision Trees classification

Random Decision Trees, aka RDT: https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/pal/algorithms/hana_ml.algorithms.pal.trees.RDTClassifier.html

In [None]:
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

`UnifiedClassification` offers a variety of classification algorithms. We use `RandomDecisionTree` for training.

The available options are:
- 'DecisionTree'
- 'HybridGradientBoostingTree'
- 'LogisticRegression'
- 'MLP'
- 'NaiveBayes'
- 'RandomDecisionTree'
- 'SVM'
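
Because the algorithm is selected with a plain string, it can be handy to check the value against the names listed above before using it. The following is only a sketch: `check_func_name` is a hypothetical helper and not part of `hana_ml`, and the set of names is copied from the list above rather than queried from the library.

```python
# Names copied from the list above; hana_ml itself is not imported here.
SUPPORTED_FUNCS = {
    'DecisionTree', 'HybridGradientBoostingTree', 'LogisticRegression',
    'MLP', 'NaiveBayes', 'RandomDecisionTree', 'SVM',
}

def check_func_name(func):
    """Return func unchanged if it is one of the listed names, else raise ValueError."""
    if func not in SUPPORTED_FUNCS:
        raise ValueError(
            f"Unknown func {func!r}; expected one of {sorted(SUPPORTED_FUNCS)}"
        )
    return func

print(check_func_name('RandomDecisionTree'))
```

The comparison in this sketch is case-sensitive, which may be stricter than what the library itself accepts.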

### The simplest training call

The RDT classifier has many parameters that influence how the model is fitted (see https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/pal/algorithms/hana_ml.algorithms.pal.trees.RDTClassifier.html#rdtclassifier), but for now you run it with the default parameters only.

You will use the `UnifiedClassification` class to create an RDT classifier: https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/pal/algorithms/hana_ml.algorithms.pal.unified_classification.UnifiedClassification.html#unifiedclassification

In [None]:
uc_rdt = UnifiedClassification(func='RandomDecisionTree')

The `fit()` procedure returns a fitted object (https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/pal/algorithms/hana_ml.algorithms.pal.trees.RDTClassifier.html#hana_ml.algorithms.pal.trees.RDTClassifier.fit), i.e. an object with populated attributes such as:
- `model_` (DataFrame): Trained model content.
- `feature_importances_` (DataFrame): The feature importance (the higher, the more important the feature).
- `oob_error_` (DataFrame): Out-of-bag error rate or mean squared error for random decision trees up to the indexed tree. Set to None if `calculate_oob` is False.
- `confusion_matrix_` (DataFrame): Confusion matrix used to evaluate the performance of classification algorithms.

To understand these structures better, check the corresponding PAL documentation: https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-predictive-analysis-library/random-decision-trees-random-decision-trees-9ad576a#ariaid-title3

To understand the mapping between PAL objects and fields in SQL and in Python, check https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/pal/parameter_mappings.html (or https://help.sap.com/doc/cd94b08fe2e041c2ba778374572ddba9/latest/en-US/_static/extra_parameter_mappings.html#/ for a full-screen view).

The simplest training (fit) call passes only the key and the label (the target) of the dataset:

In [None]:
uc_rdt.fit(
    data=hdf_titanic_train,
    key=col_id, 
    label=col_label
);
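
The call above leaves the feature columns implicit. The `fit()` method of `UnifiedClassification` also accepts a `features` argument, so the `col_features` list built earlier could be passed explicitly. A minimal sketch, assuming the classifier and the HANA DataFrame from the cells above; `fit_with_features` is a hypothetical wrapper introduced only for illustration:

```python
def fit_with_features(classifier, data, key, label, features):
    """Fit `classifier` on `data` using an explicit list of feature columns.

    Hypothetical convenience wrapper: it only forwards its arguments to
    classifier.fit() and returns the classifier.
    """
    classifier.fit(data=data, key=key, features=features, label=label)
    return classifier

# Intended use in this notebook (needs a live SAP HANA connection):
# fit_with_features(uc_rdt, hdf_titanic_train, col_id, col_label, col_features)
```

Passing the features explicitly makes it visible which columns the model was trained on, which helps when columns are later dropped or added.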

In [None]:
# Check the fit procedure executed on the database side
print(uc_rdt.get_fit_execute_statement())

# Call prediction

In [None]:
hdf_titanic_test=myconn.table('DATA_TO_PREDICT', schema='TITANIC')

The test table has the same structure as the training table, except that the `Survived` column is missing.

In [None]:
hdf_titanic_test.head(4).collect()

In [None]:
hdf_res = uc_rdt.predict(hdf_titanic_test, key=col_id)

In [None]:
hdf_res.collect()

🤓 **Let's discuss**:
- The structure of the result table `hdf_res`

## Visualize the split of predicted target

In [None]:
from hana_ml.visualizers.eda import EDAVisualizer

In [None]:
EDAVisualizer(enable_plotly=True).pie_plot(data=hdf_res, column='SCORE',
                         legend=False, explode=0,
                         #startangle=90, 
                         #counterclock=False
                        );
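
The pie chart shows how the predicted classes in the `SCORE` column are split. As a rough cross-check, the same shares can be computed by hand. The sketch below works on a small hard-coded list of rows with made-up values (not actual predictions); in the notebook the rows would come from `hdf_res.collect()`. The helper `class_shares` is hypothetical.

```python
from collections import Counter

# Made-up rows for illustration only; real rows would come from hdf_res.collect().
example_rows = [
    {'PassengerId': 1, 'SCORE': '0'},
    {'PassengerId': 2, 'SCORE': '1'},
    {'PassengerId': 3, 'SCORE': '0'},
    {'PassengerId': 4, 'SCORE': '0'},
]

def class_shares(rows, column='SCORE'):
    """Return the fraction of rows per distinct value found in `column`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}

print(class_shares(example_rows))  # {'0': 0.75, '1': 0.25}
```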

🤓 **Let's discuss**:
* What can we say about this prediction?

## Debrief the model

In [None]:
from hana_ml.visualizers.model_debriefing import TreeModelDebriefing