# Deployment of an Intelligent Scenario based on HANA PAL via Python

In this notebook we demonstrate how to work with the SAP HANA Predictive Analysis Library (PAL) through the Python API, using a development build of the [**hana_ml**](https://pypi.org/project/hana-ml/) package.

## Introduction

### Contact Information

- [Christoph Morgen, Product Manager](mailto:christoph.morgen@sap.com)
- [Jonas Heinrich, VT Student](mailto:jonas.heinrich@sap.com)
- [Florian Drescher, VT Student](mailto:florian.drescher@sap.com)



<h3 style="color:red"><b>Disclaimer</b></h3>

This demo is executed on a live development environment currently in use by the ISLM team.
Since there are several colleagues working on it frequently, **this demo does not claim stability**.

**Additionally, the deployment part of this demo has not been released, and no release date is currently targeted.**

### Scenario

In this notebook, we use a well-known dataset on the onset of diabetes in the Pima Indian population. It is a single table whose columns hold features such as age, BMI, and insulin level, together with the final outcome (diabetes or not).

We chose it for no particular reason other than that it is in the [public domain](https://www.kaggle.com/uciml/pima-indians-diabetes-database), is widely used in the data science community, and is easy to import into an SAP system.

### Structure

This notebook mixes the typical workflow of a data scientist (with Python code) with explanations and some screenshots from the Fiori frontend. You do not need to understand the Python code or the particular scenario we have chosen to follow this notebook.

Without further ado, let's get started!

## Demo

### 1. Install requirements

Like any other machine learning library in the Python ecosystem, the **hana_ml** package (here, a development build) must be installed before we can import anything from it.
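
Before running the remaining cells, it can help to confirm that the package is importable in the active kernel. The following is a minimal sketch that uses only the standard library; the helper name `is_installed` is hypothetical and not part of **hana_ml**, and the `pip` hint refers to the released PyPI build rather than the development build used here.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located in this environment."""
    return importlib.util.find_spec(module_name) is not None


# 'hana_ml' is the import name of the hana-ml distribution.
if not is_installed("hana_ml"):
    print("hana_ml was not found in this kernel; install it first, "
          "for example with `pip install hana-ml`.")
```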

### 2. Importing Requirements

Let's import the libraries for our use case: a machine learning algorithm, the dataframe module for data access and manipulation, and the AMDP artifact generator and deployer. Configuration is read with `configparser` in the next step.

In [None]:
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification
import hana_ml.dataframe as dataframe

from hana_ml.artifacts.generators import AMDPGenerator
from hana_ml.artifacts.deployers import AMDPDeployer

### 3. Create connection context

In the following code block we just load our credentials from disk in order to not leak them into this notebook or the underlying git repository:

In [None]:
from hana_ml.algorithms.pal.utility import DataSets, Settings

In [None]:
try:
    import configparser
except ImportError:
    import ConfigParser as configparser
Settings.settings = configparser.ConfigParser()
Settings.settings.read("../../config/e2edata.ini")
url = Settings.settings.get("hana", "url")
port = Settings.settings.get("hana", "port")
user = Settings.settings.get("hana", "user")
passwd = Settings.settings.get("hana", "passwd")
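
For readers who do not have the configuration file, the following sketch shows the layout that the cell above expects. The section and key names (`hana`, `url`, `port`, `user`, `passwd`) come from the code above; every value is a placeholder, and the `example_` variable names are chosen only to avoid overwriting the notebook's own variables.

```python
import configparser

# Placeholder values only; the real file is kept outside the repository.
EXAMPLE_INI = """
[hana]
url = hana.example.com
port = 30015
user = MY_USER
passwd = MY_PASSWORD
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_INI)

hana_section = example_config["hana"]
example_url = hana_section.get("url")
example_port = hana_section.getint("port")  # parsed as int, no separate cast needed
print(example_url, example_port)
```

Note that `ConfigParser.read` silently skips files it cannot find, so a wrong path only shows up later as a `NoSectionError` when the first value is requested.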

Now it's time to create a connection context for our HANA system. This allows us to access the required data, as well as the PAL procedures we need to call in order to train our model.

In [None]:
connection_context = dataframe.ConnectionContext(
    url, int(port), user, passwd)


In [None]:
connection_context.hana_version()

In [None]:
connection_context.get_current_schema()

SQL tracing, which is needed later to reuse the model in the deployment, is enabled at the start of the training cell in the Data Science Loop section.

### 4. Prepare the data

This block is part of the utilities for this demo: it makes sure the dataset is in the system and creates it if necessary. In a real production use case this step would be unnecessary, since the data would already be in the system.

In [None]:
diabetes_full, diabetes_train_valid, diabetes_test, _ = DataSets.load_diabetes_data(connection_context)

diabetes_train_valid = diabetes_train_valid.save("DIABETES_TRAIN", force=True)
diabetes_test = diabetes_test.save("DIABETES_TEST", force=True)

### 5. Data Science Loop

In this section the real work of a data scientist happens. They manipulate the data, preprocess columns, choose a model, and try different combinations of hyperparameters.

Since we just want to demonstrate the deployment, let's keep this short and use a basic Random Decision Tree classifier.

In [None]:
# Trace the SQL issued from here on, for later reuse in the deployment.
connection_context.sql_tracer.enable_sql_trace(True)
connection_context.sql_tracer.enable_trace_history(True)

# Keep the model small; this demo is about deployment, not tuning.
rfc_params = dict(n_estimators=5, split_threshold=0, max_depth=10)
rfc = UnifiedClassification(func="RandomDecisionTree", **rfc_params)
rfc.fit(diabetes_train_valid,
        key='ID',
        label='CLASS',
        categorical_variable=['CLASS'],
        partition_method='stratified',
        stratified_column='CLASS')
cm = rfc.confusion_matrix_.collect()
rfc.predict(diabetes_test.deselect('CLASS'), key="ID")
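
The `partition_method='stratified'` and `stratified_column='CLASS'` arguments ask for the training data to be partitioned so that the class ratio is preserved. The following is a conceptual sketch of stratified splitting in plain Python. It is not PAL's implementation; the function name, the 80/20 fraction, and the toy rows are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, train_fraction=0.8, seed=0):
    """Split rows so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    rng = random.Random(seed)
    train, valid = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not shuffled
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


# Toy data: 10 rows of class 0 and 5 rows of class 1.
rows = ([{"ID": i, "CLASS": 0} for i in range(10)]
        + [{"ID": 10 + i, "CLASS": 1} for i in range(5)])
train, valid = stratified_split(rows, "CLASS")
print(len(train), len(valid))
```

Both partitions keep the 2:1 ratio between the two classes, which a purely random split of such a small table would not guarantee.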

We can also view the confusion matrix and accuracy:

In [None]:
print(cm)
print(float(cm['COUNT'][cm['ACTUAL_CLASS'] == cm['PREDICTED_CLASS']].sum()) / cm['COUNT'].sum())
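
The accuracy expression above relies on the confusion matrix being in long form, with one row per (actual, predicted) pair and a `COUNT` column. The following sketch spells out the same calculation in plain Python. The column names are taken from the cell above; the counts are made up for illustration and are not results of the diabetes model.

```python
# Illustrative counts only; these are not results from this notebook.
example_cm = [
    {"ACTUAL_CLASS": "0", "PREDICTED_CLASS": "0", "COUNT": 40},
    {"ACTUAL_CLASS": "0", "PREDICTED_CLASS": "1", "COUNT": 10},
    {"ACTUAL_CLASS": "1", "PREDICTED_CLASS": "0", "COUNT": 5},
    {"ACTUAL_CLASS": "1", "PREDICTED_CLASS": "1", "COUNT": 45},
]


def accuracy_from_long_cm(rows):
    """Accuracy = correctly classified rows / all rows."""
    correct = sum(r["COUNT"] for r in rows
                  if r["ACTUAL_CLASS"] == r["PREDICTED_CLASS"])
    total = sum(r["COUNT"] for r in rows)
    return correct / total


example_accuracy = accuracy_from_long_cm(example_cm)
print(example_accuracy)
```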

### 5. Generate ABAP Managed Database Procedures (AMDP) artifact

At this point in the workflow, our data scientist has iterated on the model many times and found a satisfactory solution. They now decide that it's time to deploy it to an ABAP system so that an application developer can easily work with it.

We start the process by creating some `.abap` files on our local machine based on the work that was done previously. These files contain the SQL logic, wrapped in AMDPs, that the data scientist created by interacting with the **hana_ml** package. You can also manually inspect the code at this point and make adaptations where you see fit.

In [None]:
generator = AMDPGenerator(project_name="PIMA_DIAB", version="1", connection_context=connection_context, outputdir="out/")
generator.generate()

At this point in time there is no intelligent scenario present on the system:

![pre.png](img/pre.PNG)

In [None]:
!zip -r out.zip out
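
The `!zip` shell escape above depends on a `zip` binary being available on the machine running the kernel. As a portable alternative, the following sketch builds the same kind of archive with the standard library; the helper name `zip_directory` is hypothetical.

```python
import shutil
from pathlib import Path


def zip_directory(source_dir: str, archive_stem: str) -> str:
    """Create <archive_stem>.zip with source_dir as its top-level folder."""
    source = Path(source_dir).resolve()
    return shutil.make_archive(
        archive_stem,
        "zip",
        root_dir=str(source.parent),  # paths in the archive are relative to this
        base_dir=source.name,         # only this folder is included
    )


# Only archive the generator output if it exists in the working directory.
if Path("out").is_dir():
    print(zip_directory("out", "out"))
```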

### 6. Deploy the generated artifact to S/4

We can now deploy the generated code to S/4 or, for that matter, to any ABAP stack with ISLM. We only need to provide the `.abap` file and some basic parameters for the ISLM registration.

In [None]:
model_name = "ZJMODEL_01"
model_description = "ZJ Hello S/4 demo!"
scenario_name = "ZJ_DEMO_CUSTOM01"
scenario_description = "ZJ Hello S/4 Demo!"
scenario_type = "CLASSIFICATION"

backend_url = Settings.settings.get("abap", "backend_url")
backend_user = Settings.settings.get("abap", "backend_user")
backend_password = Settings.settings.get("abap", "backend_password")
frontend_url = Settings.settings.get("abap", "frontend_url")
frontend_user = Settings.settings.get("abap", "frontend_user")
frontend_password = Settings.settings.get("abap", "frontend_password")
deployer = AMDPDeployer(backend_url=backend_url,
                        backend_auth=(backend_user,
                                      backend_password),
                        frontend_url=frontend_url,
                        frontend_auth=(frontend_user,
                                       frontend_password))

guid = deployer.deploy(fp="out/PIMADIAB/abap/Z_CL_PIMADIAB_1.abap",
                       model_name=model_name,
                       catalog="$TMP",
                       scenario_name=scenario_name,
                       scenario_description=scenario_description,
                       scenario_type=scenario_type,
                       force_overwrite=True)
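
The path passed as `fp` above is hard-coded. The following sketch lists the `.abap` files that were actually written below the output directory, so the path can be checked before deploying; `find_abap_files` is a hypothetical helper and not part of **hana_ml**.

```python
from pathlib import Path


def find_abap_files(output_dir: str) -> list:
    """Return all .abap files below output_dir, sorted by path."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.abap"))


# 'out' is the output directory passed to the generator earlier.
for abap_path in find_abap_files("out"):
    print(abap_path)
```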

If we now look into our Fiori app, we can see that the scenario was created:

![created.png](img/created.PNG)

Looking into the scenario details, we can see that no CDS binding was supplied and it is currently kept as a draft:

![created-details.png](img/created-details.PNG)

### 7. Publish the scenario to ISLM (optional)

Although this is not strictly necessary and should probably be done manually, we can also go ahead and publish the scenario with a training and apply dataset.

In [None]:
deployer.publish_islm(scenario_name, train_cds='ZPIMA_DIABETES_TRAIN', apply_cds='ZPIMA_DIABETES_TRAIN', sap_client='000')

Looking at the Fiori app, we can now see that the scenario was published correctly.

![published.png](img/published.PNG)

### 8. Train the scenario in ISLM (optional)

Taking this logic further, we can also start training a model programmatically. The screenshot below shows the state before execution.

![pretrain.png](img/pretrain.png)

In [None]:
deployer.train_islm(model_name, model_description, scenario_name, sap_client='000')

After a few minutes, the training has been started and has completed successfully.

![posttrain.png](img/posttrain.png)

We can also inspect model-specific debriefing information:

![debrief.png](img/debrief.png)