Data security and privacy laws and regulations have become more stringent, while businesses are expected to open up to ecosystem partners. This makes data governance critical to avoid litigation and the loss of competitive position and trust.
Business transactions involving sensitive information are exchanged within the enterprise, with customers, and with ecosystem partners. Data such as name, location, contact details, date of birth, credit card number, and financial details, among others, must be handled with care. A data governance framework plays a critical role in enforcing security and privacy while also enabling the business to achieve its strategy.
There are multiple business scenarios in which sensitive data needs to be captured or retrieved. A few important ones:
- In a customer support application, customer orders are queried, and the customer's mobile number, credit card number, and location need to be obscured or masked.
- A machine learning application needs to perform analysis on sensitive information such as credit card numbers. In such cases, the application should be given obfuscated data for analysis and model building.
This code pattern demonstrates a methodology for providing an application with a read-only view of data in which sensitive information is masked.
To illustrate this, the code pattern uses an insurance business scenario with two applications:
- An insurance portal application
- A chatbot application
A customer registers on the insurance portal. During registration, the customer provides a mobile number, address, and e-mail address. After registration, the customer can log in and purchase insurance policies, supplying credit card details to pay for a policy. After purchasing a policy, the customer can query policy details, including when the next premium is due, on the chatbot. Because the user can query this information without logging in, sensitive information in the policy must be masked in the display. In this code pattern, the first 12 digits of the credit card used to purchase the policy are masked and displayed with the other details in the chatbot application.
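In this pattern the masking is done by the data protection rule in Watson Knowledge Catalog, not by application code. To make the expected chatbot output concrete, here is a minimal Java sketch of the same transformation. It assumes a 16-digit card number and `X` as the mask character, and both are assumptions because the real rule's output may differ. The class name `CardMask` is hypothetical.

```java
public final class CardMask {

    private CardMask() {
    }

    /**
     * Hides the first 12 digits of a card number and keeps the remaining digits.
     * Spaces and dashes are stripped first (an assumption of this sketch).
     */
    public static String maskFirst12(String cardNumber) {
        String digits = cardNumber.replaceAll("\\D", "");
        if (digits.length() <= 12) {
            // Too short to keep any trailing digits: mask everything.
            return "X".repeat(digits.length());
        }
        return "X".repeat(12) + digits.substring(12);
    }

    public static void main(String[] args) {
        System.out.println(maskFirst12("4111 1111 1111 1234"));
    }
}
```

For a 16-digit card, this leaves only the last four digits visible.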
The Insurance Portal Application owns the policy data. The Chatbot Application has read-only access to the data, with the data protection policies specified in the data governance framework applied to it.
In this code pattern, you will learn how to:
- Set up data assets for governance in the Watson Knowledge Catalog
- Create data categories, classes, business terms and data protection rules for the data assets
- Create a virtualized view of the data on Watson Query with data policies enforced
- Create a chatbot application using Watson Assistant that consumes the read-only data, with sensitive information masked, from Watson Query
IBM Security Verify is used to implement authentication for the insurance application.
- Create tables in Db2. The Db2 connection and the tables (as data assets) are added to Watson Knowledge Catalog (WKC). The data policies are configured for the data assets in WKC.
- Db2 is added as a data source in Watson Query. The required tables are virtualized, and a view is created by joining the virtualized tables.
- The Watson Query virtualized tables and view are published to WKC. The data policies are configured for the data assets in WKC.
- The user registers on the Insurance Portal. This creates a user profile on Security Verify. The user logs in to the Insurance Portal with the newly created credentials.
- The credentials are validated by Security Verify, and the request is redirected to the application.
- The user purchases an insurance policy. The policy information is stored in the Db2 database.
- The user accesses the chatbot on the Insurance Portal to query policy details.
- The request is sent to Watson Assistant. Watson Assistant invokes an API on the Query App to get the policy details.
- The Query App accesses Watson Query with collaborator credentials. Watson Query returns the policy details with the data policies applied. The returned results are displayed to the user on the chatbot.
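The last step can be sketched with plain JDBC. This is a hypothetical illustration, not the repository's Query App. The class and method names, and filtering on `CUST_ID`, are assumptions; only the view name comes from this guide. The code does no masking itself. With the collaborator credentials, Watson Query returns rows that already have the data protection rules applied.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public final class PolicyLookup {

    // View name from this guide; filtering on CUST_ID is an assumption.
    public static final String POLICY_SQL =
            "SELECT * FROM INSSCHEMA.CUSTOMER_ORDERS_VIEW WHERE CUST_ID = ?";

    private PolicyLookup() {
    }

    /**
     * Reads policy rows as the Data Collaborator. Masked values arrive already
     * masked because Watson Query enforces the WKC data protection rules.
     */
    public static List<Map<String, Object>> fetchPolicies(
            String jdbcUrl, Properties collaboratorCredentials, Object custId) throws SQLException {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl, collaboratorCredentials);
             PreparedStatement stmt = conn.prepareStatement(POLICY_SQL)) {
            stmt.setObject(1, custId);
            try (ResultSet rs = stmt.executeQuery()) {
                ResultSetMetaData meta = rs.getMetaData();
                while (rs.next()) {
                    // Keep column order so the chatbot can display fields predictably.
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        row.put(meta.getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
            }
        }
        return rows;
    }
}
```

A Db2 JDBC driver must be on the classpath for `DriverManager` to resolve the `jdbc:db2:` URL.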
- IBM Cloud account
- IBM Cloud CLI
- Red Hat OpenShift instance
- Git client
- The OpenShift CLI (oc)
- Cloud Pak For Data
- IBM Security Verify
- Clone the repository
- Create IBM Cloud Services instances
- Configure Security Verify
- Provide access for collaborators to Cloud Pak for Data
- Set up and configure chatbot application
- Deploy Insurance Portal Application
- Configure Watson Query
- Configure Watson Knowledge Catalog
- Access the Application
From a command terminal, run the following command to clone the repo:
git clone https://github.com/IBM/data-governance-mask-sensitive-data
This code pattern uses Cloud Pak for Data.
Cloud Pak for Data is available in two modes:
2.1.1 For the fully managed service, click here and follow the steps.
2.1.2 For the self-managed software, click here and follow the steps.
Go to the Watson Knowledge Catalog console. Select View All Catalogs from the hamburger menu at the top left.
Click Create Catalog.
Enter a name for the catalog (for example, InsClCatalog). Enter a description. Select Enforce data policies. Click Create.
Click Security Verify to sign up for Security Verify. After you sign up for an account, the account URL (https://[tenant name].verify.ibm.com/ui/admin) and password are sent to you in an email.
Note: If you are using Cloud Pak for Data as self-managed software, the same cluster can be used to deploy the application.
Go to this link to create an OpenShift cluster instance.
Make a note of the Ingress Subdomain URL.
Follow the instructions here to configure Security Verify.
For the fully managed service, click here and follow the steps.
For the self-managed software, click here and follow the steps.
As shown in the architecture diagram, the chatbot uses Cloud Functions (serverless functions) to call external APIs. The chatbot side therefore has three components: the chatbot itself, the Cloud Functions that call the external APIs, and the application that hosts those APIs.
First, deploy the application that hosts the external APIs. It connects to Watson Query and reads users' insurance details.
- From a terminal, log in to your cluster using the oc login command.
- Change directory to <cloned repo parent folder>/sources/chatbot/db-rest-app/src/main/resources.
- Open the file env.props in a file editor.
- Replace HOSTNAME, PORT and DB_NAME with the host, port and database name that you noted when creating Watson Query (in this section for the fully managed service mode, and here for the self-managed software mode).
- For a fully managed Cloud Pak for Data service, update the value of API_KEY as noted in this section. After the update, the file should look like this:
HOSTNAME=xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx.xxxxxxxxxxxxxxxxxxx.databases.appdomain.cloud
PORT=XXXXX
API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
DB_NAME=XXXXX
Note: If you are using a self-managed cluster, enter the user ID and password of the Data Collaborator user as the credentials for accessing Watson Query. The database connection string must then be changed to use the username and password instead of the API key.
- Ensure that you are in the governance project. If not, switch to it with oc project governance. If the project does not exist yet, create it with oc new-project governance.
- Change directory to <cloned repo parent folder>/sources/chatbot/db-rest-app.
- Run the following commands to deploy the application to the cluster:
oc new-app . --name=dbconnection --strategy=docker
oc start-build dbconnection --from-dir=.
Monitor the logs using the following command. The build may take a few minutes.
oc logs -f bc/dbconnection
When the build is done, check the status of the pods using the oc get pods command.
Expose the application so that it can be accessed, and list its route:
oc expose svc/dbconnection
oc get routes
Make a note of the HOST value. It will be used in the Cloud Functions action.
- Log in to the IBM Cloud dashboard.
- In the Navigation Menu, click Functions -> Actions.
- Click the Create button.
- Click the Action tile.
- In the Action Name text field, provide a name for the action, such as Make DB Calls. Leave Enclosing Package as Default Package and Runtime as Node.js 16. Click Create.
- Replace the default code with the code provided in /chatbot/cloud-functions/Cloud Function.js.
- Line 14 of the code requires the hostname of the REST application that you deployed in step 5.1. Replace the hostname with the route that you noted in step 5.1.
- Click Save.
- Click the Endpoints link on the left side of the screen.
- Under Web Action, select the Enable as Web Action checkbox. Click Save.
- Under REST API, make a note of the link under the URL heading. You will need it for the chatbot webhook settings.
- Click the Watson Assistant service instance link in your cloud resources and click Launch Watson Assistant.
- On the Watson Assistant home page, click the skills option in the left menu.
If you do not see the skills icon, Watson Assistant may be showing the new experience UI. This code pattern uses the classic view, so switch to it by navigating to manage (user icon at the top right corner) and clicking Switch to classic experience.
- Click the Create skill button, then click the Dialog skill tile. Click Next.
- Select the Upload skill tab. Drag and drop or browse to select the file /sources/chatbot/chatbot resources/ecomm-skill-dialog.json. Click Upload.
- In the left navigation, click Options -> Webhooks.
- In the URL text field, enter the REST API endpoint noted in section 5.2, with .json appended. It should look something like this:
https://eu-gb.functions.appdomain.cloud/api/v1/web/.../default/Make%20DB%20Calls.json
- Click the Assistants icon at the top left corner of the Watson Assistant screen.
- Click Create assistant.
- Give your assistant a name, optionally enter a description, and click Create assistant.
- On the newly created assistant screen, click the Preview button. Make a note of the integrationID, serviceInstanceID and region from the link provided under the Share this link section.
- Close the window using the x button just below the user icon at the top right corner.
- On the Assistants page, under the Integrations section (bottom right corner of the screen), click Integrate web chat.
- Click the Create button.
- Click the Embed tab. Copy the script and save it in a text file. In this script, update integrationID, serviceInstanceID and region with the values noted from the Preview link earlier.
- This code snippet will be used in the Insurance Portal UI.
Log in to your OpenShift cluster from the command line
Access the IBM Cloud Dashboard > Clusters (under Resource Summary) > click your OpenShift cluster > OpenShift web console. Click the dropdown next to your username at the top of the OpenShift web console and select Copy Login Command. Select Display Token, copy the oc login command, and paste it into the terminal on your workstation. Run the command to log in to the cluster with the oc command line.
6.1.1 Changes to server.xml
In the cloned repo folder, go to src/main/liberty/config and open server.xml.
Make the following changes to the openidConnectClient element and save the file:
- Replace {{ingress-sub-domain}} with the Ingress subdomain of the OpenShift cluster.
- Replace {{clientId}} and {{clientSecret}} with the Client ID and Client secret noted on the Sign-on tab of Security Verify.
- Replace {{tenantId}} with the tenant ID of Security Verify noted at the time of creation.
<openidConnectClient id="home"
signatureAlgorithm="RS256"
httpsRequired="false"
redirectToRPHostAndPort="http://ins-portal-app-governance.{{ingress-sub-domain}}/insportal/app"
clientId="{{clientId}}"
clientSecret="{{clientSecret}}"
authorizationEndpointUrl="https://{{tenantId}}.verify.ibm.com/v1.0/endpoint/default/authorize"
tokenEndpointUrl="https://{{tenantId}}.verify.ibm.com/v1.0/endpoint/default/token"></openidConnectClient>
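Every edit in this step is a `{{name}}` substitution, so it can also be scripted. Here is a minimal Java sketch, assuming a hypothetical `TemplateFiller` helper that is not part of the repository; `acme` is a made-up tenant name. The same approach fits the `{{tenant-id}}` and `{{ingress-sub-domain}}` placeholders used later in this guide.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class TemplateFiller {

    // Matches placeholders such as {{tenantId}}, {{clientSecret}} or {{ingress-sub-domain}}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{([^}]+)\\}\\}");

    private TemplateFiller() {
    }

    /** Replaces every {{name}} with values.get(name); fails if a value is missing. */
    public static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        return matcher.replaceAll(match -> {
            String value = values.get(match.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for {{" + match.group(1) + "}}");
            }
            // quoteReplacement stops '$' or '\' in a secret from being read as a group reference.
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        String line = "https://{{tenantId}}.verify.ibm.com/v1.0/endpoint/default/token";
        System.out.println(fill(line, Map.of("tenantId", "acme")));
    }
}
```

Values are inserted verbatim. A client secret containing XML special characters such as `&` or `<` would still need escaping before it goes into server.xml.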
6.1.2 Changes to db.config
In the cloned repo folder, go to src/main/resources and open db.config.
Replace {{host}} and {{port}} with the host and port you noted when creating the Db2 credentials. Set userid, password and schema to the username, the password, and the username in uppercase, respectively. Save the file.
Note: The schema must be the username noted in the Db2 credentials, in uppercase.
jdbcurl=jdbc:db2://{{host}}:{{port}}/bludb:sslConnection=true;
userid=
password=
schema=
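Here is a small Java sketch of the rule above, assuming a hypothetical `DbConfigRenderer` helper and placeholder host, port and credentials. The schema is derived from the username with `toUpperCase(Locale.ROOT)`, so the result does not depend on the JVM's default locale. Values are written without any escaping, which assumes the password contains no characters that the file format treats specially.

```java
import java.util.Locale;

public final class DbConfigRenderer {

    private DbConfigRenderer() {
    }

    /** Renders db.config content; the schema is the Db2 username in uppercase. */
    public static String render(String host, int port, String username, String password) {
        // Locale.ROOT avoids locale-specific case rules such as the Turkish dotless i.
        String schema = username.toUpperCase(Locale.ROOT);
        return "jdbcurl=jdbc:db2://" + host + ":" + port + "/bludb:sslConnection=true;\n"
                + "userid=" + username + "\n"
                + "password=" + password + "\n"
                + "schema=" + schema + "\n";
    }

    public static void main(String[] args) {
        System.out.print(render("db2host.example", 30000, "abc12345", "secret"));
    }
}
```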
6.1.3 Changes to verify.config
In the cloned repo folder, go to src/main/resources and open verify.config.
Make the following changes and save the file:
- Replace {{tenant-id}} with the tenant ID of Security Verify noted at the time of creation.
- For clientId and clientSecret, enter the Client ID and Client secret noted on the Sign-on tab of Security Verify.
- For apiClientId and apiClientSecret, enter the Client ID and Client secret noted on the API Access tab of Security Verify.
introspectionUrl=https://{{tenant-id}}.verify.ibm.com/v1.0/endpoint/default/introspect
tokenUrl=https://{{tenant-id}}.verify.ibm.com/v1.0/endpoint/default/token
userInfoUrl=https://{{tenant-id}}.verify.ibm.com/v1.0/endpoint/default/userinfo
clientId=
clientSecret=
usersUrl=https://{{tenant-id}}.verify.ibm.com/v2.0/Users
apiClientId=
apiClientSecret=
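All the verify.config URLs share the tenant host. They can therefore be derived from the account URL that Security Verify emails you at sign-up (https://[tenant name].verify.ibm.com/ui/admin). Here is a minimal Java sketch. The class `VerifyEndpoints` and the tenant name `acme` are hypothetical, and the endpoint paths are copied from the file above.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public final class VerifyEndpoints {

    private VerifyEndpoints() {
    }

    /** Extracts the tenant name from an admin URL such as https://acme.verify.ibm.com/ui/admin. */
    public static String tenantFromAdminUrl(String adminUrl) {
        String host = URI.create(adminUrl).getHost();
        if (host == null || !host.endsWith(".verify.ibm.com")) {
            throw new IllegalArgumentException("Not a Security Verify URL: " + adminUrl);
        }
        return host.substring(0, host.indexOf('.'));
    }

    /** Builds the verify.config endpoint URLs for one tenant, in file order. */
    public static Map<String, String> forTenant(String tenantId) {
        String base = "https://" + tenantId + ".verify.ibm.com";
        Map<String, String> urls = new LinkedHashMap<>();
        urls.put("introspectionUrl", base + "/v1.0/endpoint/default/introspect");
        urls.put("tokenUrl", base + "/v1.0/endpoint/default/token");
        urls.put("userInfoUrl", base + "/v1.0/endpoint/default/userinfo");
        urls.put("usersUrl", base + "/v2.0/Users");
        return urls;
    }

    public static void main(String[] args) {
        String tenant = tenantFromAdminUrl("https://acme.verify.ibm.com/ui/admin");
        forTenant(tenant).forEach((key, url) -> System.out.println(key + "=" + url));
    }
}
```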
6.1.4 Embed chatbot on the home page of the Insurance Portal Application
In the cloned repo folder, go to src/main/resources and open home.html.
Embed the chatbot script element just before the closing body tag.
Note: Replace the integration ID, region and instance ID with those of the Watson Assistant deployed in the previous section.
<script>
window.watsonAssistantChatOptions = {
integrationID : "fxxxxeb", // The ID of this integration.
region : "eu-gb", // The region your integration is hosted in.
serviceInstanceID : "bxxxxx4", // The ID of your service instance.
onLoad : function(instance) {
instance.render();
}
};
setTimeout(function() {
const t = document.createElement('script');
t.src = "https://web-chat.global.assistant.watson.appdomain.cloud/versions/"
+ (window.watsonAssistantChatOptions.clientVersion || 'latest')
+ "/WatsonAssistantChatEntry.js"
document.head.appendChild(t);
});
</script>
In the terminal window, go to the repository folder that you cloned earlier.
Go to the directory sources/ins-portal-app/src/main/java/com/example/legacy/insportal/.
Open the file InsuranceAppEndpoint.java.
Replace the placeholder {{ingress-sub-domain}} with the ingress subdomain of the OpenShift cluster that you noted earlier. Save the file.
private static String ingressSubDomain = "ins-portal-app-governance.{{ingress-sub-domain}}/";
Now change directory to /sources/ins-portal-app in the cloned repo folder.
Run the following commands to deploy the Insurance Portal application:
oc new-project governance
mvn clean install
oc new-app . --name=ins-portal-app --strategy=docker
oc start-build ins-portal-app --from-dir=.
oc logs -f bc/ins-portal-app
oc expose svc/ins-portal-app
Use the oc get pods command to ensure that the application has started successfully. Also make a note of the route using the oc get routes command.
In this step, you create two tables in the Db2 database: CUSTOMER and ORDERS.
Invoke the URL - http://ins-portal-app-governance.{{IngressSubdomainURL}}/insportal/app/setupdb
Note: Replace {{IngressSubdomainURL}} with the Ingress subdomain of the OpenShift cluster.
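If you prefer to trigger the table creation from code rather than a browser, here is a minimal Java 11+ sketch using `java.net.http.HttpClient`. It assumes the endpoint accepts a plain GET, as a browser visit would send. It also assumes the route host follows the `ins-portal-app-governance.<ingress subdomain>` pattern used throughout this guide. The class name is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class SetupDbCall {

    private SetupDbCall() {
    }

    /** Builds the setupdb URL for the route created by oc expose in the governance project. */
    public static URI setupDbUri(String ingressSubdomain) {
        return URI.create("http://ins-portal-app-governance." + ingressSubdomain
                + "/insportal/app/setupdb");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: SetupDbCall <ingress-subdomain>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(setupDbUri(args[0])).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Print the status and body so a failed setup is visible immediately.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```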
Log in to Cloud Pak for Data with the Data Owner credentials. Go to the Watson Query console.
Select Service settings in the dropdown menu. Click the Governance tab. Enable Enforce policies within Data Virtualization and Enforce publishing to a governed catalog.
Select Data Sources in the dropdown menu. Click Add Connection. Select Db2 on Cloud if the instance is on IBM Cloud. Enter the Db2 credentials that you noted earlier, and create the connection.
Select Schemas in the dropdown menu. Click New schema and create a schema with a name such as INSSCHEMA.
Select Schemas in the dropdown menu. Select the CUSTOMER and ORDERS tables and add them to the cart. Go to the cart, select the Virtualized data option and click Virtualize, as shown.
Select Virtualized data in the dropdown menu. Select the CUSTOMER and ORDERS tables. Click Join. On the next page, create a join key from CUST_ID of the CUSTOMER table to CUST_ID of the ORDERS table.
On the next page, select the Virtualized data option. Click Create View.
Select Virtualized data in the dropdown menu. For CUSTOMER_ORDERS_VIEW, select Manage Access. On the access page, click Grant Access and provide access to the Data Collaborator user.
Log in to Cloud Pak for Data with the Data Owner credentials. Go to the Watson Query console.
Click View All Catalogs in the left hamburger menu. Click the catalog that you created earlier. All the Watson Query data assets should appear, as shown.
Click the INSSCHEMA.CUSTOMER data asset. Click the Asset tab.
Enter the connection details of Watson Query noted earlier.
If it is a fully managed Cloud Pak for Data service:
- On the IBM Cloud Dashboard, go to Manage and select Access (IAM). Create an IBM Cloud API key and note it.
- On the Asset tab, select API Key as the mode of authentication.
- Enter the API key noted in the earlier step, and click Connect.
If it is self-managed Cloud Pak for Data software:
- Enter the Data Owner credentials for Cloud Pak for Data.
The data should now be visible on the Asset tab.
For each of the assets INSSCHEMA.CUSTOMER, INSSCHEMA.ORDERS and INSSCHEMA.CUSTOMER_ORDERS_VIEW, go to the Profile tab and click Create Profile.
Click View All Catalogs in the left hamburger menu. Click Add category and select New category.
Create a category for personal financial information. Enter a name and click Create.
Click Data classes in the left hamburger menu. Click Add data class and select New data class.
Enter the details as shown and click Create.
The data class is saved as a draft. Click Publish to publish it.
Click Business terms in the left hamburger menu. Click Add business term and select New business term.
Enter the details as shown and click Create.
The business term is saved as a draft. Click Publish to publish it.
Open the Asset tab for the ORDERS table. Assign the data class CC_NUM_CLASS created earlier to the credit card information columns.
Open the Asset tab for the CUSTOMER table. Verify the data class assignment for the mobile and email columns.
Click Rules in the left hamburger menu. Click Add rule and select New rule.
Next, select Data protection rule. Configure the rule as shown. This rule masks the credit card data for collaborators. Click Create.
Similarly, you can add rules for masking mobile, email and credit card expiry information.
Log in to Watson Query with the Data Owner credentials and preview the CUSTOMER_ORDERS_VIEW.
Log in to Watson Query with the Data Collaborator credentials and preview the CUSTOMER_ORDERS_VIEW.
In the next section, you access the application and see the data privacy policies enforced for the chatbot.
Note: Specify a valid email address on the Registration page. Security Verify sends the initial password to this address after registration.