# Business Entity Recognition Demo

This notebook demonstrates how to use the SAP AI Business Services - Business Entity Recognition service to extract entities from text. In this demo, we train a model on a small example dataset, deploy it, and run inference with it.

The notebook uses the Python client library to invoke the most important functions of the Business Entity Recognition REST API.

## Fetch Python module and repo containing example dataset

This notebook requires the Python package containing the client and a dataset to train a model on. Both are fetched in the cell below.

An example dataset is provided in the repo. You can explore the required dataset structure [here](https://github.wdf.sap.corp/i329525/BER-Client/tree/master/examples/data).

## Settings

The settings under `Environment specific configuration` require a valid service key for the Business Entity Recognition service on SAP Cloud Platform.

The following values from the service key are needed; each is assigned to the variable of the same name:
- url: The URL of the service deployment, provided at the outermost level of the service key JSON file
- uaa_url: The URL of the UAA server used for authentication, provided in the __uaa__ section of the service key JSON file
- uaa_clientid: The client ID used for authentication to the UAA server, provided in the __uaa__ section of the service key JSON file
- uaa_clientsecret: The client secret used for authentication to the UAA server, provided in the __uaa__ section of the service key JSON file
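
The mapping described in the list above can also be done programmatically instead of by copy and paste. The following is a minimal sketch, assuming the service key is a JSON document with a top-level `url` and a nested `uaa` object holding `url`, `clientid` and `clientsecret`. The helper name `read_service_key` and all sample values are hypothetical; adjust the key names if your service key is structured differently.

```python
import json


def read_service_key(raw_json):
    """Map a service key JSON string onto the four settings used in this notebook.

    Assumes a top-level "url" plus a nested "uaa" object with "url",
    "clientid" and "clientsecret" (an assumption, not confirmed by the notebook).
    """
    key = json.loads(raw_json)
    uaa = key["uaa"]
    return {
        "url": key["url"],
        "uaa_url": uaa["url"],
        "uaa_clientid": uaa["clientid"],
        "uaa_clientsecret": uaa["clientsecret"],
    }


# Placeholder service key, for illustration only
sample_key = json.dumps({
    "url": "https://ber.example.com",
    "uaa": {
        "url": "https://auth.example.com",
        "clientid": "my-client-id",
        "clientsecret": "my-client-secret",
    },
})

settings = read_service_key(sample_key)
print(settings["url"], settings["uaa_url"])
```

Reading the values from a file kept outside the repository also avoids pasting the client secret into a notebook cell.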

For the `Model specific configuration`, each parameter is explained by an inline comment in the cell below.

# Environment specific configuration
url = ""
uaa_url = ""
uaa_clientid = ""
uaa_clientsecret = ""

# Model specific configuration
model_name = "" # arbitrary name assigned to the model trained here, used to identify it later
dataset_folder = "data" # relative or absolute path to the folder containing the dataset

In [1]:
# update working directory path

import os

os.chdir('../')

print(os.getcwd())

import pathlib
pathlib.Path().absolute()

/Users/i329525/OneDrive - SAP SE/AI-BUS/git-clones/pyber


PosixPath('/Users/i329525/OneDrive - SAP SE/AI-BUS/git-clones/pyber')

## Initialize Demo

In [10]:
from sap_ber_client import ber_api_client
from pprint import pprint

In [12]:
import importlib
# import sap_ber_client.ber_api_client

importlib.reload(ber_api_client)

<module 'sap_ber_client.pyber' from '/Users/i329525/OneDrive - SAP SE/AI-BUS/git-clones/pyber/pyber/pyber.py'>

In [11]:
# Instantiate the client object used to communicate with the BER REST API
# my_ber_client = pyber.Pyber(url, uaa_clientid, uaa_clientsecret, uaa_url)
url = 'https://ner-api-hardik.cfapps.sap.hana.ondemand.com/'
uaa_clientid = 'sb-179547c6-591a-44be-98e6-dc8e914971a5!b12302|na-9de0a1f0-0475-4798-bcf5-06158dae107d!b12302'
uaa_clientsecret = 'G9hYEpw94TQ7B3FwWZX6CguJp88='
uaa_url = 'https://ner-fire.authentication.sap.hana.ondemand.com'

my_ber_client = ber_api_client.BER_API_Client(url, uaa_clientid, uaa_clientsecret, uaa_url)

print(my_ber_client.base_url)

https://ner-api-hardik.cfapps.sap.hana.ondemand.com/api/v1/


## Display access token

In [5]:
# Token can be used to interact with e.g. swagger UI to explore BER API
print(my_ber_client.session.headers)
print("\nYou can use this token to Authorize here and explore the API via Swagger UI: \n{}api/v1/".format(url))

{'Authorization': 'Bearer eyJhbGciOiJSUzI1NiIsImprdSI6Imh0dHBzOi8vbmVyLWZpcmUuYXV0aGVudGljYXRpb24uc2FwLmhhbmEub25kZW1hbmQuY29tL3Rva2VuX2tleXMiLCJraWQiOiJkZWZhdWx0LWp3dC1rZXktMTA1OTAwOTg1NCIsInR5cCI6IkpXVCJ9.eyJqdGkiOiI2ZmM2ZDljZGUzMWY0MTY4OTQ0ZGY5NjllYjJkMjhiZiIsImV4dF9hdHRyIjp7ImVuaGFuY2VyIjoiWFNVQUEiLCJzdWJhY2NvdW50aWQiOiIxMzRlNzlmYS1hOTMzLTQ2ZjMtYjk5YS03NmM5NWY3YzdkNjQiLCJ6ZG4iOiJuZXItZmlyZSIsInNlcnZpY2VpbnN0YW5jZWlkIjoiMTc5NTQ3YzYtNTkxYS00NGJlLTk4ZTYtZGM4ZTkxNDk3MWE1In0sInN1YiI6InNiLTE3OTU0N2M2LTU5MWEtNDRiZS05OGU2LWRjOGU5MTQ5NzFhNSFiMTIzMDJ8bmEtOWRlMGExZjAtMDQ3NS00Nzk4LWJjZjUtMDYxNThkYWUxMDdkIWIxMjMwMiIsImF1dGhvcml0aWVzIjpbInVhYS5yZXNvdXJjZSJdLCJzY29wZSI6WyJ1YWEucmVzb3VyY2UiXSwiY2xpZW50X2lkIjoic2ItMTc5NTQ3YzYtNTkxYS00NGJlLTk4ZTYtZGM4ZTkxNDk3MWE1IWIxMjMwMnxuYS05ZGUwYTFmMC0wNDc1LTQ3OTgtYmNmNS0wNjE1OGRhZTEwN2QhYjEyMzAyIiwiY2lkIjoic2ItMTc5NTQ3YzYtNTkxYS00NGJlLTk4ZTYtZGM4ZTkxNDk3MWE1IWIxMjMwMnxuYS05ZGUwYTFmMC0wNDc1LTQ3OTgtYmNmNS0wNjE1OGRhZTEwN2QhYjEyMzAyIiwiYXpwIjoic2ItMTc5NTQ3YzYtNTkxY

## Create Dataset for training of a new model

In [13]:
# Create Training dataset
response = my_ber_client.create_dataset()
pprint(response)

JSONDecodeError: Extra data: line 1 column 5 (char 4)

In [12]:
print(str(response))

<Response [404]>
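
The `JSONDecodeError` above is the message Python's `json` module produces when it is given a plain-text body such as `404 page not found`, which suggests the request reached a wrong path and the error page was parsed as JSON. A defensive pattern is to check the HTTP status before decoding. The sketch below assumes only that the response object exposes `status_code` and `text`; `FakeResponse` and `parse_api_response` are hypothetical names, not part of the client library.

```python
import json


class FakeResponse:
    """Stand-in for an HTTP response object with `status_code` and `text`."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text


def parse_api_response(response):
    """Return the decoded JSON body, or raise a readable error for non-2xx or non-JSON replies."""
    if not 200 <= response.status_code < 300:
        raise RuntimeError("HTTP {}: {}".format(response.status_code, response.text[:200]))
    try:
        return json.loads(response.text)
    except json.JSONDecodeError as err:
        raise RuntimeError("Response body is not JSON: {}".format(err)) from err


try:
    parse_api_response(FakeResponse(404, "404 page not found"))
except RuntimeError as err:
    message = str(err)
    print(message)
```

With a check like this, a wrong base URL or endpoint shows up as an HTTP error rather than as a parsing error.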


In [24]:
training_dataset_id = 'ecf5b4c6-9a81-476d-896b-c94f03d5809f'
print(training_dataset_id)

ecf5b4c6-9a81-476d-896b-c94f03d5809f


In [34]:
# Upload the annotated training document to the dataset
# Note: dataset_folder points to a single annotated JSON file here, not to a folder
dataset_folder = '/Users/i329525/OneDrive - SAP SE/AI-BUS/git-clones/pyber/examples/data/english_training_dataset_annotated.json'
print("Uploading training documents to the dataset")
response = my_ber_client.upload_document_to_dataset(training_dataset_id, dataset_folder)
print("Finished uploading training documents to the dataset")
pprint(response)

Uploading training documents to the dataset
Finished uploading training documents to the dataset


In [25]:
# Pretty print the dataset statistics
print("Dataset statistics")
dataset_stats = my_ber_client.get_dataset(training_dataset_id)
pprint(dataset_stats)

Dataset statistics
{'data': {'createdAt': '2020-09-13T06:56:45',
          'datasetId': 'ecf5b4c6-9a81-476d-896b-c94f03d5809f',
          'description': 'local test',
          'documentCount': 1}}
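
Before starting a training job it can be useful to confirm that the upload actually arrived. The sketch below reads `documentCount` from a dictionary shaped like the statistics printed above and refuses to continue if the dataset is empty. The helper name `require_documents` is hypothetical.

```python
def require_documents(dataset_stats, minimum=1):
    """Return the dataset id if the dataset holds at least `minimum` documents."""
    data = dataset_stats["data"]
    if data["documentCount"] < minimum:
        raise ValueError(
            "Dataset {} has {} document(s), expected at least {}".format(
                data["datasetId"], data["documentCount"], minimum
            )
        )
    return data["datasetId"]


# Same structure as the statistics printed above
dataset_stats = {
    "data": {
        "createdAt": "2020-09-13T06:56:45",
        "datasetId": "ecf5b4c6-9a81-476d-896b-c94f03d5809f",
        "description": "local test",
        "documentCount": 1,
    }
}

checked_dataset_id = require_documents(dataset_stats)
print(checked_dataset_id)
```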


## Training

In [6]:
model_name = "for-client-lib"

In [None]:
# Train the model

print("Start training job for model with modelName {}".format(model_name))
response = my_ber_client.train_model(model_name, training_dataset_id)
print(response)

In [None]:
jobid = '5526b15f-c8c6-4690-8f1d-409353f2730c'
print(jobid)

In [None]:
# Fetch the status of the training job and print it
response = my_ber_client.get_training_status(jobid)
pprint(response)

## Model

In [44]:
response = my_ber_client.get_trained_model_versions(model_name)
pprint(response)

{'data': {'modelName': 'for-client-lib', 'modelType': 'customModel', 'count': 1, 'versions': [{'modelVersion': 1, 'metadata': {'capabilities': [{'entity': 'Amount'}, {'entity': 'Payment Due date'}, {'entity': 'Invoice Reference No'}, {'entity': 'Account No'}], 'accuracy': 0.9786}, 'createdAt': '2020-09-13T13:20:12.927Z', 'updatedAt': '2020-09-13T13:20:22.651Z'}]}}
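
The version number passed to later steps does not have to be typed in by hand; it can be read from this response. The sketch below picks the highest `modelVersion` and lists the entities the model can extract, using a dictionary with the same structure as the output above. The helper name `summarize_model` is hypothetical.

```python
def summarize_model(response):
    """Return (latest version number, entity names, accuracy) from a model versions response."""
    data = response["data"]
    latest = max(data["versions"], key=lambda version: version["modelVersion"])
    entities = [capability["entity"] for capability in latest["metadata"]["capabilities"]]
    return latest["modelVersion"], entities, latest["metadata"]["accuracy"]


# Same structure as the response printed above
versions_response = {
    "data": {
        "modelName": "for-client-lib",
        "modelType": "customModel",
        "count": 1,
        "versions": [
            {
                "modelVersion": 1,
                "metadata": {
                    "capabilities": [
                        {"entity": "Amount"},
                        {"entity": "Payment Due date"},
                        {"entity": "Invoice Reference No"},
                        {"entity": "Account No"},
                    ],
                    "accuracy": 0.9786,
                },
                "createdAt": "2020-09-13T13:20:12.927Z",
                "updatedAt": "2020-09-13T13:20:22.651Z",
            }
        ],
    }
}

model_version, entities, accuracy = summarize_model(versions_response)
print(model_version, entities, accuracy)
```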


## Deployment

In [None]:
# Deploy model
response = my_ber_client.deploy_model(model_name, 1)  # second argument is the model version
pprint(response)

## Inference


<!-- This runs inference on all documents in the test set (stratification is done inside DC service and reproduced here).  
We are working on exposing the stratification results so that this cell can be shortened. -->

In [9]:
# post inference job
text = 'Hello, I would like to know the status of the invoice 456789. Regards, John'
# modelName = 'test'
modelVersion = 1
response = my_ber_client.post_inference_job(text, model_name, modelVersion)
pprint(response.json())

{'data': {'id': '6411de19-9812-4008-b47a-4127bd2a6a6e',
          'message': 'Inference job has been submitted',
          'modelName': 'for-client-lib',
          'modelVersion': 1,
          'status': 'PENDING'}}
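
The job id needed to fetch the result is part of the submission response, so it can be read from there rather than copied manually. The sketch below works on the decoded JSON body (what `response.json()` returned above); the helper name `job_id_from_submission` is hypothetical.

```python
def job_id_from_submission(body):
    """Return the job id from the decoded body of a submitted inference job."""
    data = body["data"]
    if "id" not in data:
        raise KeyError("Submission response contains no job id: {}".format(data))
    return data["id"]


# Same structure as the submission response printed above
submission = {
    "data": {
        "id": "6411de19-9812-4008-b47a-4127bd2a6a6e",
        "message": "Inference job has been submitted",
        "modelName": "for-client-lib",
        "modelVersion": 1,
        "status": "PENDING",
    }
}

submitted_job_id = job_id_from_submission(submission)
print(submitted_job_id)
```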


In [15]:
# Get the inference job result
# Set inference_jobid to the id returned when the job was submitted
from pprint import pprint
inference_jobid = '6411de19-9812-4008-b47a-4127bd2a6a6e'
response = my_ber_client.get_inference_job(inference_jobid)
pprint(response)

{'data': {'errors': {'error_type': 'custom',
                     'message': 'Check all the files and model assets are '
                                'exist in the model directory'},
          'id': '6411de19-9812-4008-b47a-4127bd2a6a6e',
          'status': 'FAILED'}}
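
Because inference jobs are asynchronous, the result usually has to be requested more than once. The sketch below polls until the job leaves the `PENDING` state and turns a `FAILED` job into an exception carrying the service's error message. It is written against a generic `fetch_job` callable that returns a dictionary shaped like the output above; the helper name `wait_for_job`, the polling parameters, and the `SUCCESS` status used in the note below are assumptions, since only `PENDING` and `FAILED` appear in this notebook.

```python
import time


def wait_for_job(fetch_job, job_id, pending=("PENDING",), interval=2.0, max_attempts=30):
    """Poll `fetch_job(job_id)` until the status leaves `pending`, then return the job data.

    Raises RuntimeError if the job ends as FAILED and TimeoutError if it never finishes.
    """
    for _ in range(max_attempts):
        data = fetch_job(job_id)["data"]
        if data["status"] == "FAILED":
            raise RuntimeError(data.get("errors", {}).get("message", "job failed"))
        if data["status"] not in pending:
            return data
        time.sleep(interval)
    raise TimeoutError("Job {} still pending after {} attempts".format(job_id, max_attempts))


# Simulated replies: one PENDING response followed by the FAILED response shown above
job_id = "6411de19-9812-4008-b47a-4127bd2a6a6e"
replies = iter([
    {"data": {"id": job_id, "status": "PENDING"}},
    {"data": {"id": job_id,
              "status": "FAILED",
              "errors": {"error_type": "custom",
                         "message": "Check all the files and model assets are exist in the model directory"}}},
])

try:
    wait_for_job(lambda _job_id: next(replies), job_id, interval=0)
except RuntimeError as err:
    failure = str(err)
    print(failure)
```

With the client from this notebook, `fetch_job` would presumably be `my_ber_client.get_inference_job`. If the service reports further in-progress states, pass them in `pending`.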
