# Business Entity Recognition Demo

This notebook demonstrates how to use the SAP AI Business Services - Business Entity Recognition service for entity recognition tasks. In this demo, we train a model on a small example dataset, deploy it, and run inference with it.

The notebook uses the Python client library to invoke the most important functions of the Business Entity Recognition REST API.

## Fetch Python module and repo containing example dataset

This notebook requires the Python package containing the client and a dataset to train a model on. Both are fetched in the cell below.

An example dataset is provided in the repo. You can explore the required dataset structure [here](https://github.wdf.sap.corp/i329525/BER-Client/tree/master/examples/data).

## Settings

The settings under `Environment specific configuration` require a valid service key for the Business Entity Recognition service on SAP Cloud Platform.

The required entries of the service key are named exactly like the variables, specifically:
- url: The URL of the service deployment, provided at the outermost level of the service key JSON file
- uaa_url: The URL of the UAA server used for authentication, provided in the __uaa__ section of the service key JSON file
- uaa_clientid: The client ID used for authentication to the UAA server, provided in the __uaa__ section of the service key JSON file
- uaa_clientsecret: The client secret used for authentication to the UAA server, provided in the __uaa__ section of the service key JSON file
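
The mapping described in the list above can be sketched as a small helper that reads the service key file and returns the four settings. This is only an illustration: the function name `load_service_key` is hypothetical, and the key names inside the `uaa` section (`url`, `clientid`, `clientsecret`) are assumptions based on the description above, so check them against your own service key.

```python
import json


def load_service_key(path):
    """Read a service key JSON file and return the four notebook settings.

    Assumes the layout described above: `url` at the top level and
    `url`, `clientid`, `clientsecret` inside the `uaa` section.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    uaa = key["uaa"]
    return {
        "url": key["url"],
        "uaa_url": uaa["url"],
        "uaa_clientid": uaa["clientid"],
        "uaa_clientsecret": uaa["clientsecret"],
    }
```

Loading the settings from a file this way keeps the client secret out of the notebook itself. For example, `settings = load_service_key("service_key.json")` (the file name is a placeholder) returns a dict whose entries can be assigned to the variables below.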

For the `Model specific configuration`, each parameter is explained by an inline comment below.

# Environment specific configuration
url = ""
uaa_url = ""
uaa_clientid = ""
uaa_clientsecret = ""

# Model specific configuration
model_name = "" # choose an arbitrary model name for the model trained here, will be assigned to the trained model for identification purposes
dataset_folder = "data" # should point to (relative or absolute) path containing dataset

In [1]:
# update working directory path

import os

os.chdir('../')

print(os.getcwd())

import pathlib
pathlib.Path().absolute()

/Users/i308965/Documents/Github/business-entity-recognition-client-library


PosixPath('/Users/i308965/Documents/Github/business-entity-recognition-client-library')

## Initialize Demo

In [3]:
from sap_ber_client import ber_api_client
from pprint import pprint

In [4]:
import importlib
# import sap_ber_client.ber_api_client

importlib.reload(ber_api_client)

<module 'sap_ber_client.ber_api_client' from '/Users/i308965/Documents/Github/business-entity-recognition-client-library/sap_ber_client/ber_api_client.py'>

In [5]:
# Instantiate the object used to communicate with the BER REST API
url = 'https://ner-api-integration.cfapps.sap.hana.ondemand.com'
uaa_clientid = 'sb-13782270-6d82-4a44-9fa1-f8d8b691b22e!b9271|na-32a67d88-b0a4-449d-bba8-d86383508741!b9271'
uaa_clientsecret = 'V7eavDNE7XqtgHzECojyKBYy6JA='
uaa_url = 'https://ml-ner-test.authentication.sap.hana.ondemand.com'

my_ber_client = ber_api_client.BER_API_Client(url, uaa_clientid, uaa_clientsecret, uaa_url)

print(my_ber_client.base_url)

https://ner-api-integration.cfapps.sap.hana.ondemand.com/api/v1/
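
The client takes the UAA URL, client ID, and client secret, which suggests that it obtains its access token through an OAuth 2.0 client credentials request. The sketch below builds such a request with the standard library without sending it. It is an assumption about what happens under the hood, not the client library's actual implementation; the helper name `build_token_request` is hypothetical, and `/oauth/token` is the usual UAA token endpoint.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(uaa_url, client_id, client_secret):
    """Build (but do not send) a client credentials token request."""
    token_url = uaa_url.rstrip("/") + "/oauth/token"
    # HTTP Basic authentication: base64("<client id>:<client secret>")
    credentials = "{}:{}".format(client_id, client_secret).encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(token_url, data=body, headers=headers, method="POST")
```

Passing the returned request to `urllib.request.urlopen` would send it; the JSON response of a successful call normally contains the token in its `access_token` field.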


## Display access token

In [6]:
# The token can be used to authorize in, e.g., the Swagger UI to explore the BER API
print(my_ber_client.session.headers)
print("\nYou can use this token to authorize here and explore the API via Swagger UI: \n{}".format(my_ber_client.base_url))

{'Authorization': 'Bearer eyJhbGciOiJSUzI1NiIsImprdSI6Imh0dHBzOi8vbWwtbmVyLXRlc3QuYXV0aGVudGljYXRpb24uc2FwLmhhbmEub25kZW1hbmQuY29tL3Rva2VuX2tleXMiLCJraWQiOiJrZXktaWQtMSIsInR5cCI6IkpXVCJ9.eyJqdGkiOiI1YTU3YTZkMDc0NmU0NmM4YWRlYmY2OGRmOGRkN2YxYyIsImV4dF9hdHRyIjp7ImVuaGFuY2VyIjoiWFNVQUEiLCJzdWJhY2NvdW50aWQiOiJjNjkyNzFkYy03ZTU3LTQwMWQtYjkzNS1jOTIxYTA3ZTFkMTQiLCJ6ZG4iOiJtbC1uZXItdGVzdCIsInNlcnZpY2VpbnN0YW5jZWlkIjoiMTM3ODIyNzAtNmQ4Mi00YTQ0LTlmYTEtZjhkOGI2OTFiMjJlIn0sInN1YiI6InNiLTEzNzgyMjcwLTZkODItNGE0NC05ZmExLWY4ZDhiNjkxYjIyZSFiOTI3MXxuYS0zMmE2N2Q4OC1iMGE0LTQ0OWQtYmJhOC1kODYzODM1MDg3NDEhYjkyNzEiLCJhdXRob3JpdGllcyI6WyJ1YWEucmVzb3VyY2UiXSwic2NvcGUiOlsidWFhLnJlc291cmNlIl0sImNsaWVudF9pZCI6InNiLTEzNzgyMjcwLTZkODItNGE0NC05ZmExLWY4ZDhiNjkxYjIyZSFiOTI3MXxuYS0zMmE2N2Q4OC1iMGE0LTQ0OWQtYmJhOC1kODYzODM1MDg3NDEhYjkyNzEiLCJjaWQiOiJzYi0xMzc4MjI3MC02ZDgyLTRhNDQtOWZhMS1mOGQ4YjY5MWIyMmUhYjkyNzF8bmEtMzJhNjdkODgtYjBhNC00NDlkLWJiYTgtZDg2MzgzNTA4NzQxIWI5MjcxIiwiYXpwIjoic2ItMTM3ODIyNzAtNmQ4Mi00YTQ0LTlmYTEtZjhkOGI2O
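
The bearer token printed above is a JWT, so its claims (for example the client ID and scopes) can be inspected locally. The following sketch decodes the payload segment without verifying the signature, so it is suitable for debugging only. The helper name `decode_jwt_payload` is hypothetical.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims of a JWT as a dict, WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments in a JWT omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

Assuming the header has the form shown above, the token itself can be obtained with `my_ber_client.session.headers["Authorization"].split(" ", 1)[1]` and passed to the helper.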

## Create Dataset for training of a new model

In [7]:
# Create Training dataset
response = my_ber_client.create_dataset()
pprint(response)

<Response [201]>


In [8]:
import json
print(str(response.json()))

{'data': {'datasetId': 'c379969e-0077-4790-86a4-7bc58dd24101', 'datasetType': 'training', 'message': 'Dataset has been created successfully'}}


In [12]:
training_dataset_id = response.json()["data"]["datasetId"]
print(training_dataset_id)

c379969e-0077-4790-86a4-7bc58dd24101
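
Indexing into `response.json()` directly raises a bare `KeyError` if the request failed and the payload has a different shape. A small helper, sketched here under the assumption that successful responses wrap their fields in a `data` object as shown above, makes such failures easier to diagnose. The name `extract_data_field` is hypothetical.

```python
def extract_data_field(payload, field):
    """Return payload["data"][field], or raise a descriptive KeyError."""
    data = payload.get("data")
    if not isinstance(data, dict) or field not in data:
        raise KeyError(
            "field {!r} not found in response payload: {!r}".format(field, payload)
        )
    return data[field]
```

With this helper, the cell above could read `training_dataset_id = extract_data_field(response.json(), "datasetId")`.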


In [18]:
# Upload training documents to the dataset from the training directory
import os
training_file = os.path.join(os.getcwd(), 'examples', 'data', 'english_training_dataset_annotated.json')
print("Uploading training documents to the dataset")
response = my_ber_client.upload_document_to_dataset(training_dataset_id, training_file)
print("Finished uploading training documents to the dataset")
pprint(response)

Uploading training documents to the dataset
Finished uploading training documents to the dataset
<Response [201]>
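
If the training data is split across several annotated JSON files, the upload call above can be repeated for each file. The helper below only collects the file paths; its name `find_dataset_files` is hypothetical, and whether the service accepts more than one document per dataset is an assumption that should be checked against the service documentation.

```python
from pathlib import Path


def find_dataset_files(folder, pattern="*.json"):
    """Return the sorted paths of all files in `folder` matching `pattern`."""
    return sorted(str(p) for p in Path(folder).glob(pattern) if p.is_file())
```

A loop such as `for path in find_dataset_files("examples/data"): my_ber_client.upload_document_to_dataset(training_dataset_id, path)` would then upload each file in turn.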


In [19]:
# Fetch the dataset statistics (call dataset_stats.json() to see the payload)
print("Dataset statistics")
dataset_stats = my_ber_client.get_dataset(training_dataset_id)
pprint(dataset_stats)

Dataset statistics
<Response [200]>


## Training

In [20]:
model_name = "for-client-lib"

In [21]:
# Train the model
    
print("Start training job from model with modelName {}".format(model_name))
response = my_ber_client.train_model(model_name, training_dataset_id)
print(response)

Start training job from model with modelName for-client-lib
<Response [202]>


In [29]:
print(response.json())
jobid = response.json()["data"]["jobId"]
print(jobid)

{'data': {'jobId': '854dda19-c329-4436-a12a-868c2879b9e9', 'datasetId': 'c379969e-0077-4790-86a4-7bc58dd24101', 'status': 'RUNNING', 'message': 'Training job has been submitted successfully'}}
854dda19-c329-4436-a12a-868c2879b9e9


In [35]:
# Get the status of the training job

response = my_ber_client.get_training_status(jobid)
pprint(response.json())
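
Training runs asynchronously, so the status has to be checked repeatedly until the job is no longer `RUNNING`. The loop below is a minimal polling sketch. The name `wait_for_job` is hypothetical, and since `RUNNING` is the only status value shown in this notebook, any other value is treated as final.

```python
import time


def wait_for_job(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                 in_progress=("RUNNING",), sleep=time.sleep):
    """Call `fetch_status()` until it returns a status not in `in_progress`.

    `fetch_status` is a zero-argument callable returning the status string.
    Raises TimeoutError if the job is still in progress after the timeout.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status not in in_progress:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError("job still {} after {} seconds".format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

Assuming the status response has the same `data`/`status` layout as the submission response, it could be used as `wait_for_job(lambda: my_ber_client.get_training_status(jobid).json()["data"]["status"])`.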


In [34]:
#Get recently submitted jobs

response_recent = my_ber_client.get_recently_submitted_training_jobs_list()
pprint(response_recent.json())

{'count': 2,
 'jobs': [{'createdAt': '2021-05-31T19:55:12Z',
           'datasetId': '066235d1-3231-41fe-bdab-4d1829b63230',
           'jobId': 'c54e407f-6026-4930-8efc-b9e577274662',
           'modelName': 'string'},
          {'createdAt': '2021-06-01T04:16:25Z',
           'datasetId': 'c379969e-0077-4790-86a4-7bc58dd24101',
           'jobId': '854dda19-c329-4436-a12a-868c2879b9e9',
           'modelName': 'for-client-lib'}]}
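
The job list above can be filtered to find the most recent training job for a given model, for example to recover a job ID after a notebook restart. This sketch assumes the payload layout and timestamp format shown in the output above; the helper name `latest_job_for_model` is hypothetical.

```python
from datetime import datetime


def latest_job_for_model(jobs_payload, model_name):
    """Return the newest job dict for `model_name`, or None if there is none."""
    jobs = [j for j in jobs_payload.get("jobs", []) if j.get("modelName") == model_name]
    if not jobs:
        return None
    # Timestamps look like '2021-06-01T04:16:25Z'
    return max(jobs, key=lambda j: datetime.strptime(j["createdAt"], "%Y-%m-%dT%H:%M:%SZ"))
```

For example, `latest_job_for_model(response_recent.json(), model_name)` returns the job entry whose `jobId` can be passed to the status call.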


## Model

In [None]:
# List the trained versions of the model
response = my_ber_client.get_trained_model_versions(model_name)
pprint(response.json())

## Deployment

In [None]:
# Deploy model
model_version = 1  # version of the trained model to deploy
response = my_ber_client.deploy_model(model_name, model_version)
pprint(response)

## Inference


<!-- This runs inference on all documents in the test set (stratification is done inside DC service and reproduced here).  
We are working on exposing the stratification results so that this cell can be shortened. -->

In [None]:
# Post an inference job for a single text
text = 'Hello, I would like to know the status of the invoice 456789. Regards, John'
model_version = 1  # version of the deployed model
response = my_ber_client.post_inference_job(text, model_name, model_version)
pprint(response.json())

In [None]:
# Get the result of an inference job
# TODO: set inference_jobid to the job ID returned by the inference request above
inference_jobid = '6411de19-9812-4008-b47a-4127bd2a6a6e'
response = my_ber_client.get_inference_job(inference_jobid)
pprint(response.json())

In [None]:
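# Post a batch inference job
dataset_id = ""  # ID of the dataset to run batch inference on
model_name = ""  # name of the deployed model
model_version = 1  # version of the deployed model, as used in the cells above
response = my_ber_client.post_batch_inference_job(dataset_id, model_name, model_version)
pprint(response.json())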