<h1>Use RCAC to Create Test Data</h1>

<p><strong>Welcome!</strong> This notebook contains all of the commands I discuss in the db2Dean.com article about using RCAC to create test data, plus a few others. The idea is to create column masks in the production system so that, when any user ID used for extracting test data queries the tables, the values in certain columns are changed automatically. The user does not have to do anything other than query the tables.</p>

<h2 id="dataset">About Db2 Magic Commands</h2>

Db2 Magic commands make it easy to run SQL as is in the cells of your Jupyter notebook, without writing any Python code.  This notebook shows how to enable this feature in Cloud Pak for Data (CPD).  These examples will probably work in Watson Studio outside of CPD as well.  I only show some basic Magic Commands here.  For more information on Db2 Magic Commands and lots of examples, see https://github.com/IBM/db2-jupyter/

<h2 id="pandas">Get the db2 magic commands notebook from github: <code>db2.ipynb</code></h2>

You must get and run the db2.ipynb notebook before using the magic commands needed for these examples.

In [1]:
import wget
url = 'https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb'
filename = wget.download(url)
filename

'db2 (5).ipynb'

### Run the db2 magic commands notebook

In [2]:
%run "$filename"

Db2 Extensions Loaded.


In [None]:
# If the wget commands above don't work for you, uncomment and try these instead.
# !wget https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb
# %run db2.ipynb

### Connect to the Db2 Database
User bob's data will not be masked, while user testdata's data will be masked.

In [None]:
# Create users and set passwords with these commands as root in the linux container
# For Db2 on other operating systems, use that OS's facilities to create users. 

# useradd bob
# echo "password" | passwd --stdin "bob"
# useradd testdata
# echo "password" | passwd --stdin "testdata"

In [88]:
%sql CONNECT RESET
%sql CONNECT TO sample USER db2inst1 USING ibmdb2aa HOST localhost PORT 50000;

Connection closed.
Connection successful.


##### Run a select command

In [None]:
%sql select dbms_utility.get_hash_value(userid, 1,50), dbms_utility.get_hash_value(ssn, 1,50) from db2inst1.patient

##### Create a table of names 
Use the employee table from the default sample database to create a table of first and last names, with a sequential integer column used to reference them for substitutions.

In [40]:
%%sql 
drop table if exists sub_name_list;
create table sub_name_list as
      (select row_number() over() as seq
            , firstnme as firstname
            , lastname
        from employee
      ) with data;

Command completed.


In [53]:
%sql describe table sub_name_list

Unnamed: 0,COLNAME,TYPESCHEMA,TYPENAME,LENGTH,SCALE,NULLABLE
0,SEQ,SYSIBM,BIGINT,8,0,N
1,FIRSTNAME,SYSIBM,VARCHAR,12,0,N
2,LASTNAME,SYSIBM,VARCHAR,15,0,N


In [7]:
%sql select * from sub_name_list fetch first 5 rows only;

Unnamed: 0,SEQ,FIRSTNAME,LASTNAME
0,1,CHRISTINE,HAAS
1,2,MICHAEL,THOMPSON
2,3,SALLY,KWAN
3,4,JOHN,GEYER
4,5,IRVING,STERN


### Show the GOSALESHR.EMPLOYEE TABLE
This table comes with the sample GOSALES tables provided by IBM.
https://www.ibm.com/support/knowledgecenter/SS62YD_4.1.1/com.ibm.sampledata.go.doc/topics/download.html

In [20]:
%sql -a describe table gosaleshr.employee

Unnamed: 0,COLNAME,TYPESCHEMA,TYPENAME,LENGTH,SCALE,NULLABLE
0,EMPLOYEE_CODE,SYSIBM,INTEGER,4,0,N
1,FIRST_NAME,SYSIBM,VARCHAR,75,0,N
2,FIRST_NAME_MB,SYSIBM,VARCHAR,75,0,Y
3,LAST_NAME,SYSIBM,VARCHAR,90,0,Y
4,LAST_NAME_MB,SYSIBM,VARCHAR,90,0,Y
5,DATE_HIRED,SYSIBM,TIMESTAMP,10,6,Y
6,TERMINATION_DATE,SYSIBM,TIMESTAMP,10,6,Y
7,TERMINATION_CODE,SYSIBM,INTEGER,4,0,Y
8,BIRTH_DATE,SYSIBM,TIMESTAMP,10,6,Y
9,GENDER_CODE,SYSIBM,SMALLINT,2,0,N


In [138]:
%sql select * from db2inst1.sub_name_list where seq = 36;

Unnamed: 0,SEQ,FIRSTNAME,LASTNAME
0,36,KIYOSHI,YAMAMOTO


##### Show sample data from the GSDB sample gosaleshr.employee table.


In [16]:
%sql select * from gosaleshr.employee e  fetch first 5 rows only

Unnamed: 0,EMPLOYEE_CODE,FIRST_NAME,FIRST_NAME_MB,LAST_NAME,LAST_NAME_MB,DATE_HIRED,TERMINATION_DATE,TERMINATION_CODE,BIRTH_DATE,GENDER_CODE,WORK_PHONE,EXTENSION,FAX,EMAIL
0,10004,Denis,Denis,SMITH,Pagé,2001-12-11,,150,1960-11-02,0,+33 1 68 94 52 20,3995,+33 1 68 94 56 60,DPage@grtd123.com
1,10005,Élizabeth,Élizabeth,YAMAMOTO,Michel,2003-11-24,,150,1974-03-02,1,+33 1 68 94 52 20,3994,+33 1 68 94 56 60,EMichel@grtd123.com
2,10006,Émile,Émile,MONTEVERDE,Clermont,2006-05-10,,150,1980-07-12,0,+33 1 68 94 52 20,3993,+33 1 68 94 56 60,EClermont@grtd123.com
3,10007,Étienne,Étienne,JONES,Jauvin,2003-10-09,,150,1973-02-16,0,+33 1 68 94 52 20,3992,+33 1 68 94 56 60,EJauvin@grtd123.com
4,10012,Elsbeth,Elsbeth,SCHWARTZ,Wiesinger,2005-03-22,,150,1968-11-05,1,+(49) 40 663 1990,3987,+(49) 40 663 4571,EWiesinger@grtd123.com


##### Try out Last Name substitution using the built-in <code>get_hash_value</code> function. 
Before building the mask, I wanted to try out the built-in hash function, see the values it returns, and use them to pick a substitute name.  Because the hash function always returns the same number for the same input string, hashing the last_name value to be masked always selects the same substitute.  I use this number to select a row from the sub_name_list table based on the SEQ column.  There are 42 rows in the table, so I tell the hash function to always pick a number between 1 and 42.  
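The substitution idea can be sketched outside the database in a few lines of Python. This is only an illustration under stated assumptions: `hash_to_seq` and `substitute_last_name` are hypothetical helpers, SHA-256 stands in for whatever algorithm `dbms_utility.get_hash_value` uses internally, and the name list holds just the first five rows of sub_name_list, so the numbers it produces will not match the Db2 output.

```python
import hashlib

def hash_to_seq(value, base, size):
    """Map a string to a stable integer in [base, base + size - 1].

    SHA-256 is a stand-in here; it is NOT the algorithm Db2 uses.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return base + int.from_bytes(digest[:8], "big") % size

# First five rows of sub_name_list, in SEQ order (SEQ starts at 1).
names = ["HAAS", "THOMPSON", "KWAN", "GEYER", "STERN"]

def substitute_last_name(last_name, names):
    """Pick the substitute whose SEQ equals the hash of the real name."""
    seq = hash_to_seq(last_name, 1, len(names))
    return names[seq - 1]

# The same input always selects the same substitute.
first = substitute_last_name("Pagé", names)
second = substitute_last_name("Pagé", names)
print(first, second)
```

Python's built-in `hash()` is salted per process for strings, so it would not give repeatable results between runs; that is why the sketch uses `hashlib` instead.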


In [148]:
%%sql 
select last_name 
     , (select lastname from db2inst1.sub_name_list s
         where seq = dbms_utility.get_hash_value(e.last_name, 1,42))
     , (select dbms_utility.get_hash_value(e.last_name, 1,42) from db2inst1.sub_name_list s2
          where seq = dbms_utility.get_hash_value(e.last_name, 1,42))
     , birth_date 
  from gosaleshr.employee e
  fetch first 5 rows only

Unnamed: 0,LAST_NAME,LASTNAME,3,BIRTH_DATE
0,Pagé,SMITH,28,1960-11-02
1,Michel,YAMAMOTO,36,1974-03-02
2,Clermont,MONTEVERDE,38,1980-07-12
3,Jauvin,JONES,19,1973-02-16
4,Wiesinger,SCHWARTZ,39,1968-11-05


### Create Roles and grant authorities

 Create the <code>EXTRACT_FOR_TESTDB</code> role. 

In [11]:
%sql create role EXTRACT_FOR_TESTDB

Command completed.


Add the user testdata to the role.

In [19]:
%sql grant role EXTRACT_FOR_TESTDB to user testdata

Command completed.


Grant the authority to select data to the EXTRACT_FOR_TESTDB role so that any user in it, including testdata, can select the data.

In [13]:
%%sql 
grant select on table gosaleshr.employee to role EXTRACT_FOR_TESTDB;
grant select on table GOSALESCT.CUST_CRDT_CARD to role EXTRACT_FOR_TESTDB;

Command completed.


Create some other roles that will be able to see any data in the tables, and do the appropriate grants.  

In [34]:
%sql create role SUPER_USER

Command completed.


In [38]:
%sql create role FAIR_USER

Command completed.


In [35]:
%sql grant role SUPER_USER to user bob

Command completed.


In [36]:
%%sql 
grant select, update, delete on table gosaleshr.employee to role SUPER_USER;
grant select, update, delete on table GOSALESCT.CUST_CRDT_CARD to role SUPER_USER;

Command completed.


##### Create a  <code>mask</code> to change the value of last name when anyone in the test data extraction role queries the table.
See the notes about the get_hash_value built-in function above.

In [None]:
%sql DROP MASK LAST_NAME;
%sql DROP MASK BIRTHDAY;

In [92]:
%%sql
CREATE MASK LAST_NAME ON GOSALESHR.EMPLOYEE FOR
   COLUMN LAST_NAME_MB RETURN
          CASE WHEN 
               VERIFY_ROLE_FOR_USER(SESSION_USER,'EXTRACT_FOR_TESTDB') = 1 
          THEN (select lastname from db2inst1.sub_name_list 
                      where seq = dbms_utility.get_hash_value(LAST_NAME_MB, 1,42)
               )
          ELSE LAST_NAME_MB
          END
ENABLE;

Command completed.


##### Create a  <code>mask</code> to change the value of birth_date when anyone in the test data extraction role queries the table.
This mask changes the birth_date value in the query results by adding two days, subtracting one year, and adding one month. 
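As a cross-check on the arithmetic, here is a minimal Python sketch of the same shift. The helpers `add_months` and `mask_birth_date` are hypothetical names, and the end-of-month clamping is my assumption about how an invalid day (for example February 29 moved into a non-leap year) should be handled; none of the sample rows exercise that case.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def mask_birth_date(d):
    """Apply the shifts in the same left-to-right order as the mask."""
    shifted = d + timedelta(days=2)    # + 2 DAYS
    shifted = add_months(shifted, -12) # - 1 YEARS
    shifted = add_months(shifted, 1)   # + 1 MONTH
    return shifted

print(mask_birth_date(date(1960, 11, 2)))  # 1959-12-04
```

The printed value matches the masked BIRTH_DATE that the testdata user sees for the first employee row later in this notebook.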

In [93]:
%%sql
CREATE MASK BIRTHDAY ON GOSALESHR.EMPLOYEE FOR
   COLUMN BIRTH_DATE RETURN
          CASE WHEN 
               VERIFY_ROLE_FOR_USER(SESSION_USER,'EXTRACT_FOR_TESTDB') = 1 
          THEN BIRTH_DATE + 2 DAYS - 1 YEARS + 1 MONTH
          ELSE BIRTH_DATE
          END
ENABLE;

Command completed.


##### Make the masks active for the employee table.
Until you activate column access control on the table, the masks will not be invoked.  

In [94]:
%sql ALTER TABLE GOSALESHR.EMPLOYEE ACTIVATE COLUMN ACCESS CONTROL;

Command completed.


##### Alternate  <code>mask</code> for the Last_name column.
If you wanted everyone who wasn't explicitly allowed to see the last name to get a masked value you would code the mask something like this:

In [10]:
%%sql
CREATE MASK LAST_NAME ON GOSALESHR.EMPLOYEE FOR
   COLUMN LAST_NAME_MB RETURN
          CASE WHEN 
               VERIFY_ROLE_FOR_USER(SESSION_USER,'SUPER_USER') = 1 OR
               VERIFY_ROLE_FOR_USER(SESSION_USER,'FAIR_USER') = 1 
          THEN LAST_NAME_MB
               ELSE (select lastname from db2inst1.sub_name_list s
                      where seq = dbms_utility.get_hash_value(LAST_NAME_MB, 1,42)  
                    )
          END
ENABLE;

Command completed.
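The two versions of the LAST_NAME mask differ only in who gets the real value by default. A small Python sketch of that decision logic, with hypothetical function names and plain sets standing in for the role membership that `VERIFY_ROLE_FOR_USER` checks in the database:

```python
EXTRACT_ROLE = "EXTRACT_FOR_TESTDB"
TRUSTED_ROLES = {"SUPER_USER", "FAIR_USER"}

def mask_for_extract_role(user_roles, real, masked):
    """First mask: only members of the extract role see the substitute."""
    return masked if EXTRACT_ROLE in user_roles else real

def mask_unless_trusted(user_roles, real, masked):
    """Alternate mask: everyone sees the substitute unless in a trusted role."""
    return real if TRUSTED_ROLES & set(user_roles) else masked

# A user with no roles at all is the case where the two policies disagree.
no_roles = set()
print(mask_for_extract_role(no_roles, "Pagé", "SMITH"))  # Pagé
print(mask_unless_trusted(no_roles, "Pagé", "SMITH"))    # SMITH
```

The first form exposes real data to anyone who is not in the extract role, while the alternate form masks by default, which is the safer choice when new users may be added without any role.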


##### Use a  <code>User Defined Function</code> to obfuscate last_name.
Create a UDF and call it in the mask instead of coding the substitution logic in the mask itself.  In this case, this is not a good practice.  To get the function to work in the mask I had to declare it "DETERMINISTIC", but it is not deterministic according to the strict definition.  If the contents of the sub_name_list table changed, the return value could be different for the same input.  
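To see why a lookup-based function is not strictly deterministic, consider this Python sketch. It assumes a hypothetical `sub_last_name` helper that hashes with SHA-256 (not the Db2 algorithm) and two made-up versions of the name table; the second list is invented purely for illustration.

```python
import hashlib

def sub_last_name(last_name, name_table):
    """Pick a substitute by hashing the input into a row of name_table."""
    digest = hashlib.sha256(last_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(name_table)
    return name_table[index]

table_v1 = ["HAAS", "THOMPSON", "KWAN"]
table_v2 = ["ADAMS", "BAKER", "CLARK"]  # hypothetical reloaded table

# Same input, same table: the result is repeatable.
before = sub_last_name("Pagé", table_v1)
# Same input, changed table: the result is different.
after = sub_last_name("Pagé", table_v2)
print(before, after)
```

The output depends on the table contents as well as the argument, which is exactly what a strict DETERMINISTIC declaration promises will not happen.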

In [152]:
%%sql -d
CREATE OR REPLACE FUNCTION SUB_LAST_NAME(LN VARCHAR(90))
      RETURNS VARCHAR(90)
      READS SQL DATA  NO EXTERNAL ACTION DETERMINISTIC SECURED
  BEGIN
    DECLARE SUBNAME VARCHAR(90);
    DECLARE EXIT HANDLER FOR SQLEXCEPTION RETURN(FALSE);
    SET SUBNAME = (select lastname from db2inst1.sub_name_list s
                    where seq = dbms_utility.get_hash_value(LN, 1,42));
    SET SUBNAME = CASE WHEN SUBNAME IS NOT NULL THEN SUBNAME ELSE 'DB2DEAN' END;
    RETURN(SUBNAME);
END

Command completed.


In [153]:
%%sql
CREATE MASK LAST_NAME ON GOSALESHR.EMPLOYEE FOR
   COLUMN LAST_NAME_MB RETURN
          CASE WHEN 
               VERIFY_ROLE_FOR_USER(SESSION_USER,'SUPER_USER') = 1 OR
               VERIFY_ROLE_FOR_USER(SESSION_USER,'FAIR_USER') = 1 
          THEN LAST_NAME_MB
          ELSE sub_last_name(LAST_NAME_MB)
          END
ENABLE;


Command completed.


In [None]:
%sql drop mask last_name;
%sql DROP FUNCTION SUB_LAST_NAME

In [None]:
%sql select db2inst1.sub_last_name('DB2DEAN') from sysibm.sysdummy1

### Super user, Bob, selects the employee table and sees the unmasked Last Name MB and birthday

In [95]:
%sql CONNECT RESET
%sql CONNECT TO sample USER bob USING password HOST localhost PORT 50000;
%sql select LAST_NAME, LAST_NAME_MB, BIRTH_DATE from gosaleshr.employee   fetch first 5 rows only

Connection closed.
Connection successful.


Unnamed: 0,LAST_NAME,LAST_NAME_MB,BIRTH_DATE
0,Pagé,Pagé,1960-11-02
1,Michel,Michel,1974-03-02
2,Clermont,Clermont,1980-07-12
3,Jauvin,Jauvin,1973-02-16
4,Wiesinger,Wiesinger,1968-11-05


### Test data user, testdata, selects the employee table and sees the masked Last Name MB and birthday

In [96]:
%sql CONNECT RESET
%sql CONNECT TO sample USER testdata USING password HOST localhost PORT 50000;
%sql select LAST_NAME, LAST_NAME_MB, BIRTH_DATE from gosaleshr.employee   fetch first 5 rows only

Connection closed.
Connection successful.


Unnamed: 0,LAST_NAME,LAST_NAME_MB,BIRTH_DATE
0,Pagé,SMITH,1959-12-04
1,Michel,YAMAMOTO,1973-04-04
2,Clermont,MONTEVERDE,1979-08-14
3,Jauvin,JONES,1972-03-18
4,Wiesinger,SCHWARTZ,1967-12-07


In [19]:
%sql select birth_date, birth_date + 2 DAYS - 1 YEARS from gosaleshr.employee   fetch first 5 rows only

Unnamed: 0,BIRTH_DATE,2
0,1960-11-02,1959-11-04
1,1974-03-02,1973-03-04
2,1980-07-12,1979-07-14
3,1973-02-16,1972-02-18
4,1968-11-05,1967-11-07


In [41]:
%sql -a describe table GOSALESCT.CUST_CRDT_CARD

Unnamed: 0,COLNAME,TYPESCHEMA,TYPENAME,LENGTH,SCALE,NULLABLE
0,CUST_CC_ID,SYSIBM,INTEGER,4,0,N
1,CUST_CODE,SYSIBM,INTEGER,4,0,Y
2,CRDT_METHOD_CODE,SYSIBM,INTEGER,4,0,Y
3,CUST_CC_NUMBER,SYSIBM,CHARACTER,57,0,Y
4,CUST_CC_SERV_CODE,SYSIBM,INTEGER,4,0,Y
5,CUST_CC_EXP_DATE,SYSIBM,TIMESTAMP,10,6,Y


In [40]:
%sql select * from GOSALESCT.CUST_CRDT_CARD fetch first 5 rows only;

Unnamed: 0,CUST_CC_ID,CUST_CODE,CRDT_METHOD_CODE,CUST_CC_NUMBER,CUST_CC_SERV_CODE,CUST_CC_EXP_DATE
0,10000,131072,29,5298765461884536 ...,3100,2013-01-01
1,10001,131073,28,5598765426519067 ...,3627,2011-08-01
2,10002,131075,29,5498765452595818 ...,8662,2013-03-01
3,10003,131076,25,4998765444282028 ...,636,2009-08-01
4,10004,131077,29,9998765460293064 ...,784,2012-06-01


##### Create a  <code>mask</code> to change the value of credit card number when anyone in the test data extraction role queries the table.
In this case we create a function to do the work and call it in the mask.  The logic is to add 4 to the ninth digit of the credit card number.  If that addition produces a two-digit number, subtract 9 from it.  
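The digit shift is easy to check by hand with a short Python sketch. The function name `mask_card` mirrors the UDF, but this version is only an illustration: it works on a plain string and has no error handling for non-digit input.

```python
def mask_card(card):
    """Add 4 to the ninth digit; if the result exceeds 9, subtract 9."""
    digit = int(card[8])          # ninth character, zero-based index 8
    shifted = digit + 4
    if shifted > 9:
        shifted -= 9
    return card[:8] + str(shifted) + card[9:]

print(mask_card("5298765461884536"))  # 5298765411884536
```

Here the ninth digit 6 becomes 10, which is reduced to 1, matching the masked value shown for the first credit card row later in this notebook. Note that this rule maps both 0 and 9 to 4, so the original digit cannot always be recovered from the masked one.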

In [61]:
%%sql -d
CREATE OR REPLACE FUNCTION MASK_CARD(IN_CARD CHAR(57))
      RETURNS CHAR(57)
      CONTAINS SQL NO EXTERNAL ACTION DETERMINISTIC SECURED
  BEGIN
    DECLARE OUT_CARD CHAR(57);
    DECLARE CD CHAR(1);
    DECLARE CI INTEGER;
    DECLARE EXIT HANDLER FOR SQLEXCEPTION RETURN(FALSE);
    SET CD = SUBSTR(IN_CARD,9,1);
    SET CI = INT(CD);
    SET CI = CI + 4;
    IF CI > 9 THEN SET CI=CI-9; END IF;
    SET CD = CHAR(CI);
    SET OUT_CARD = SUBSTR(IN_CARD,1,8) || CD || SUBSTR(IN_CARD,10,48) ;
    RETURN(OUT_CARD);
END


Command completed.


In [63]:
%%sql
CREATE MASK CRED_CARD ON GOSALESCT.CUST_CRDT_CARD FOR
   COLUMN CUST_CC_NUMBER RETURN
          CASE WHEN 
               VERIFY_ROLE_FOR_USER(SESSION_USER,'EXTRACT_FOR_TESTDB') = 1 
          THEN MASK_CARD(CUST_CC_NUMBER)
          ELSE CUST_CC_NUMBER
          END
ENABLE;

Command completed.


In [70]:
%sql ALTER TABLE GOSALESCT.CUST_CRDT_CARD ACTIVATE COLUMN ACCESS CONTROL;

Command completed.


### Testdata tries to call the function explicitly to mask the credit card number
He doesn't have the authority to do that, but notice that the authority isn't needed when the mask calls the function.

In [8]:
%sql CONNECT RESET
%sql CONNECT TO sample USER testdata USING password HOST localhost PORT 50000;
%sql select cust_cc_number, db2inst1.MASK_CARD(cust_cc_number) from GOSALESCT.CUST_CRDT_CARD fetch first 5 rows only

Connection closed.
Connection successful.


Command completed.


In [104]:
%sql CONNECT RESET
%sql CONNECT TO sample USER db2inst1 USING ibmdb2aa HOST localhost PORT 50000;
%sql select cust_cc_number, db2inst1.MASK_CARD(cust_cc_number) from GOSALESCT.CUST_CRDT_CARD fetch first 5 rows only

Connection closed.
Connection successful.


Unnamed: 0,CUST_CC_NUMBER,2
0,5298765461884536 ...,5298765411884536 ...
1,5598765426519067 ...,5598765466519067 ...
2,5498765452595818 ...,5498765492595818 ...
3,4998765444282028 ...,4998765484282028 ...
4,9998765460293064 ...,9998765410293064 ...


In [9]:
%sql CONNECT RESET
%sql CONNECT TO sample USER bob USING password HOST localhost PORT 50000;

Connection closed.
Connection successful.


In [12]:
%sql -a select CUST_CC_ID, cust_cc_number from GOSALESCT.CUST_CRDT_CARD fetch first 5 rows only

Unnamed: 0,CUST_CC_ID,CUST_CC_NUMBER
0,10000,5298765461884536 ...
1,10001,5598765426519067 ...
2,10002,5498765452595818 ...
3,10003,4998765444282028 ...
4,10004,9998765460293064 ...


In [102]:
%sql CONNECT RESET
%sql CONNECT TO sample USER testdata USING password HOST localhost PORT 50000;

Connection closed.
Connection successful.


In [103]:
%%sql 
select CUST_CC_ID, cust_cc_number
  from GOSALESCT.CUST_CRDT_CARD 
    fetch first 5 rows only

Unnamed: 0,CUST_CC_ID,CUST_CC_NUMBER
0,10000,5298765411884536 ...
1,10001,5598765466519067 ...
2,10002,5498765492595818 ...
3,10003,4998765484282028 ...
4,10004,9998765410293064 ...


In [59]:
%sql select MASK_CARD('123453789012345                                          ') from sysibm.sysdummy1

Unnamed: 0,1
0,123457789012345 ...


In [13]:
%sql connect reset;

Connection successful.
