# Query Data Virtualization using the REST API

The following notebook shows some examples of the same queries that I list in the September 2021 db2dean.com article on how to use the Data Virtualization REST API for querying virtual tables. The difference is that the article calls the REST API using cURL and this notebook uses Python.


You can find more information about the Db2/DV REST API at: https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.admin.rest.doc/doc/c_rest.html.

This notebook is based on the one Peter Kohlmann provides in his Jupyter notebook on GitHub and uses much of the descriptive text used there.  He also provides many more examples that you can see here:  https://github.com/Db2-DTE-POC/CPDDVLAB/blob/master/Bonus%20Lab%20-%20Db2%20RESTful%20Endpoint%20Service.ipynb

### Import the required programming libraries
The `requests` library is all that Python needs to make the RESTful service calls. The pandas library is used to format and manipulate JSON result sets as tables.

In [1]:
import requests
import pandas as pd

## Authenticate to the End Point and Get a Token used for Querying Data
So that you don't have to send your user ID and password with every query, the REST endpoint lets you call it once and gives you a token to use in the queries instead. The next set of steps configures and calls an API that gives the endpoint your Data Virtualization user ID, password and other information to get a token that can be used to authenticate when you run the API to query the database.

### Create the Header required for getting an authentication token
The call to the RESTful Endpoint service is constructed and transmitted as JSON. The first part of the request is the headers, which define the content type of the request.

Please note that you must configure the REST Endpoint as described in my Db2 Rest Endpoint article (http://www.db2dean.com/Previous/Db2RestEndpoint.html) unless you are using the DV Endpoint deployed in your Cloud Pak for Data cluster. In this case I am using my own endpoint and not the one in the CPD cluster.

In [2]:
headers = {
  "content-type": "application/json"
}

### RESTful Host
Define the host and port of the endpoint in a URL. In this case I'm using my own endpoint.

In [3]:
Db2RESTful = "http://192.168.0.14:50050"  

### API Authentication Service
Each service has its own path in the RESTful call. For authentication we need to point to the `v1/auth` service.

In [4]:
API_Auth = "/v1/auth"

### Authentication
To authenticate to the RESTful service you must provide the connection information for the Data Virtualization database you want to query along with the userid and password that you are using to authenticate to that database. You can also provide an expiry time so that the access token that gets returned will be invalidated after that time period. In this example the token is good for 30 minutes.  Note that the database name in DV is always "BIGSQL".

In [5]:
body = {
  "dbParms": {
    "dbHost": "cpdmkt-cpd-cpdmkt.apps.cpd.170-224-72-131.nip.io",
    "dbName": "BIGSQL",
    "dbPort": 30753,
    "isSSLConnection": False,
    "username": "db2dean",
    "password": "db2dean_password"
  },
  "expiryTime": "30m"
}
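
Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the same authentication body could be assembled from environment variables instead. The variable names `DV_USER` and `DV_PASSWORD` and the helper `build_auth_body` are hypothetical, chosen only for illustration; the dictionary keys are the ones used in the cell above.

```python
import os

def build_auth_body(db_host, db_port, expiry="30m", env=os.environ):
    """Build the /v1/auth request body, reading credentials from the environment.

    DV_USER and DV_PASSWORD are hypothetical variable names, not part of the API.
    """
    missing = [name for name in ("DV_USER", "DV_PASSWORD") if name not in env]
    if missing:
        raise KeyError("Missing environment variables: {}".format(", ".join(missing)))
    return {
        "dbParms": {
            "dbHost": db_host,
            "dbName": "BIGSQL",  # the database name in DV is always BIGSQL
            "dbPort": db_port,
            "isSSLConnection": False,
            "username": env["DV_USER"],
            "password": env["DV_PASSWORD"],
        },
        "expiryTime": expiry,
    }
```

With both variables exported before starting Jupyter, `body = build_auth_body("cpdmkt-cpd-cpdmkt.apps.cpd.170-224-72-131.nip.io", 30753)` would produce the same structure as the literal dictionary.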

### API Service
When communicating with the RESTful service, you must provide the name of the service that you want to interact with. In this case the authentication service is */v1/auth*.   When the cell below is run, the server will establish a connection to the database server.


In [6]:
# Use with HTTP call (when using endpoint container on my mac)
# Use when querying my local sample db or the DV database in the CPD cluster.
try:
    response = requests.post("{}{}".format(Db2RESTful,API_Auth), headers=headers, json=body)
except Exception as e:
    print("Unable to call RESTful service. Error={}".format(repr(e)))
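
If the call above raises an exception, `response` is never assigned and the following cells fail with a `NameError`. A minimal sketch of a helper that avoids this by returning `None` on failure, and that also sets a timeout, might look like the following. The name `safe_post` and the 30 second default are assumptions of this sketch; the `post` argument is meant to be `requests.post`, whose exceptions derive from `OSError`.

```python
def safe_post(post, url, headers, body, timeout=30):
    """Call post(url, ...) and return the response, or None if the call failed.

    `post` is expected to be requests.post; it is passed in so the helper
    can be exercised without a live endpoint.
    """
    try:
        return post(url, headers=headers, json=body, timeout=timeout)
    except OSError as e:  # requests' exceptions are subclasses of OSError
        print("Unable to call RESTful service. Error={}".format(repr(e)))
        return None
```

In this notebook it could be called as `response = safe_post(requests.post, Db2RESTful + API_Auth, headers, body)`, followed by a check that `response` is not `None` before reading its status code.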

### cURL equivalent of the above call:
```
token=`curl --header "Content-Type: application/json" \
             -d '{"dbParms":{"dbHost": "cpdmkt-cpd-cpdmkt.apps.cpd.170-224-51-161.nip.io",
                  "dbName": "BIGSQL","dbPort": 30753,
                  "isSSLConnection":false,"username": "db2dean","password": "password"},"expiryTime": "30m"}' \
             http://192.168.0.14:50050/v1/auth`

# The token is the fourth field when the JSON response is split on double quotes
token2=`echo $token | awk 'BEGIN {FS = "\""} ; {print $4}'`
```

A response code of 200 means that the authentication worked properly; any other code indicates an error, which is printed in the step that retrieves the token.

In [7]:
print(response)
print(response.status_code)

<Response [200]>
200


The response includes a connection token that is reused in the queries below. It ensures a secure connection without requiring that you reenter a userid and password with each request.

In [8]:
if (response.status_code == 200):
  token = response.json()["token"]
  print("Token: {}".format(token))
else: 
  print(response.json()["errors"])

Token: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJhZG1pbiI6dHJ1ZSwiY2xpZW50X2lkIjoiOGVjOGVkMTYtZDVlMC00MzBjLTk2ZWQtZjYxM2RkYzQ5OTdmIiwiZXhwIjoxNjMyNzc3ODk4LCJpc3MiOiJkYjJkZWFuIn0.Cr1s3X9AYQXMXFtd1OpEgRUh5Gs9etcSU9DANIva5sfFv7N9LirVj61FknZ_v4m4lRtDIiBf6x0BN8YhEYdXzwwizewgR71tI9xUjIbwN8Pdw9p35rbqz7Shd3_eRyifWdEw6aKhNEmHb0blL1V43t4sGB8bBtn7NQAl8F20TkaiYoJj0ZrUpGZzzZu90sXd2zGngFKKThqWc7ezqBU6iem-WlR3ylSsFQ1Z5vC5WLMHj995BG5CQCzxa5wvpYKoJvb3oHMCFXb_1Yrmd1o2aAAZCd_HW6eRaDovC7WfmrNG4nH3tYAJiVbuTRGm-FErThIqg1fK7DUeTjroOzCehA
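
The token printed above has the three dot-separated segments of a JSON Web Token, and its first segment decodes to a header that names the `JWT` type. Assuming the endpoint always returns a JWT with an `exp` claim, the following sketch reads the expiry time so you can tell when a new token is needed. The helper name `jwt_expiry` is hypothetical, and the payload is only decoded, not verified.

```python
import base64
import json
from datetime import datetime, timezone

def jwt_expiry(token):
    """Return the expiry time (UTC) stored in a JWT's 'exp' claim.

    The signature is NOT checked; use this for convenience only.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding, so restore it first
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
```

Calling `print(jwt_expiry(token))` after the authentication cell would show when the 30 minute window ends.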


## The remaining steps configure variables and then query the database
The standard header for all subsequent calls will use this format. It includes the access token.

In [9]:
headers = {
  "authorization": token,
  "content-type": "application/json"
}

### API for Querying the virtual database
Executing SQL requires the execsql service endpoint. 

In [10]:
API_execsql = "/v1/services/execsql"

### Format string to query a single virtual table in the database and run it

In [11]:
body = {
  "isQuery": True,
  "sqlStatement": "SELECT * \
    FROM DV.HOSPITAL_HCAHPS_DATA_PG",
  "sync": True
}
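
The backslash continuation above works, but it leaves the indentation inside the SQL string. As a sketch of an alternative, a small helper can take a triple-quoted statement and collapse its whitespace before building the body. The name `build_execsql_body` is hypothetical; the three keys are the ones used in the cell above. Note that collapsing whitespace would also alter any string literal in the SQL that contains runs of spaces.

```python
def build_execsql_body(sql, is_query=True, sync=True):
    """Build the JSON body for /v1/services/execsql from a multi-line SQL string."""
    # Collapse newlines and runs of spaces so the statement is sent as one line
    statement = " ".join(sql.split())
    return {"isQuery": is_query, "sqlStatement": statement, "sync": sync}

body = build_execsql_body("""
    SELECT *
      FROM DV.HOSPITAL_HCAHPS_DATA_PG
""")
```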

In [12]:
try:
    response = requests.post("{}{}".format(Db2RESTful,API_execsql), headers=headers, json=body)
except Exception as e:
    print("Unable to call RESTful service. Error={}".format(repr(e)))

# A response of 200 means success
print(response)

<Response [200]>


#### The equivalent cURL of the above call is shown in this shell script

```
myquery="SELECT * \
    FROM DV.HOSPITAL_HCAHPS_DATA_PG"  
```
```
curl --header "authorization: $token2" --header 'Content-Type: application/json' \
      -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \
     http://192.168.0.14:50050/v1/services/execsql
```

Retrieve the results. The DataFrame class converts the JSON result set into a table. DataFrames can be used to further manipulate results in Python.

In [13]:
display(pd.DataFrame(response.json()['resultSet']))

|  | HCAHPS_MEASURE_ID | PROVIDER_ID | county_name | hcahps_answer_description | hcahps_answer_percent | hcahps_answer_percent_footnote | hcahps_linear_mean_value | hcahps_question | hospital_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | H_COMP_4_LINEAR_SCORE | 90008 | DISTRICT OF COLUMBIA | Pain management - linear mean score | Not Applicable |  | 81 | Pain management - linear mean score | UNITED MEDICAL CENTER |
| 1 | H_COMP_4_SN_P | 90008 | DISTRICT OF COLUMBIA | Pain was "sometimes" or "never" well controlled | 17 |  | Not Applicable | Patients who reported that their pain was "Som... | UNITED MEDICAL CENTER |
| 2 | H_COMP_4_U_P | 100009 | MIAMI-DADE | Pain was "usually" well controlled | 23 |  | Not Applicable | Patients who reported that their pain was "Usu... | UNIVERSITY OF MIAMI HOSPITAL |
| 3 | H_COMP_5_A_P | 100009 | MIAMI-DADE | Staff "always" explained | 60 |  | Not Applicable | Patients who reported that staff "Always" expl... | UNIVERSITY OF MIAMI HOSPITAL |
| 4 | H_COMP_5_LINEAR_SCORE | 100009 | MIAMI-DADE | Communication about medicines - linear mean score | Not Applicable |  | 74 | Communication about medicines - linear mean score | UNIVERSITY OF MIAMI HOSPITAL |
| 5 | H_COMP_5_SN_P | 100009 | MIAMI-DADE | Staff "sometimes" or "never" explained | 23 |  | Not Applicable | Patients who reported that staff "Sometimes" o... | UNIVERSITY OF MIAMI HOSPITAL |
| 6 | H_COMP_5_STAR_RATING | 100009 | MIAMI-DADE | Communication about medicines - star rating | Not Applicable |  | Not Applicable | Communication about medicines - star rating | UNIVERSITY OF MIAMI HOSPITAL |
| 7 | H_COMP_5_U_P | 100009 | MIAMI-DADE | Staff "usually" explained | 17 |  | Not Applicable | Patients who reported that staff "Usually" exp... | UNIVERSITY OF MIAMI HOSPITAL |
| 8 | H_COMP_6_LINEAR_SCORE | 100009 | MIAMI-DADE | Discharge information - linear mean score | Not Applicable |  | 83 | Discharge information - linear mean score | UNIVERSITY OF MIAMI HOSPITAL |
| 9 | H_COMP_6_N_P | 100009 | MIAMI-DADE | No, staff "did not" give patients this informa... | 17 |  | Not Applicable | Patients who reported that NO, they were not g... | UNIVERSITY OF MIAMI HOSPITAL |
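
In the result above, the percent and mean-value columns arrive as strings, with `Not Applicable` mixed in among the numbers. A minimal sketch of cleaning the rows before handing them to pandas follows. The helper names are hypothetical, and the sketch assumes the numeric values are whole numbers, as they are in the rows shown.

```python
def to_number_or_none(value):
    """Convert a numeric string to int; return None for placeholders such as 'Not Applicable'."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

def clean_rows(rows, numeric_columns):
    """Return copies of the result-set rows with the named columns converted to numbers."""
    return [
        {key: to_number_or_none(val) if key in numeric_columns else val
         for key, val in row.items()}
        for row in rows
    ]
```

For example, `pd.DataFrame(clean_rows(response.json()['resultSet'], {"hcahps_answer_percent", "hcahps_linear_mean_value"}))` would give columns that can be averaged or filtered numerically.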


In [14]:
response.json()

{'jobStatus': 4,
 'jobStatusDescription': 'Job is complete',
 'resultSet': [{'HCAHPS_MEASURE_ID': 'H_COMP_4_LINEAR_SCORE',
   'PROVIDER_ID': '90008',
   'county_name': 'DISTRICT OF COLUMBIA',
   'hcahps_answer_description': 'Pain management - linear mean score',
   'hcahps_answer_percent': 'Not Applicable',
   'hcahps_answer_percent_footnote': '',
   'hcahps_linear_mean_value': '81',
   'hcahps_question': 'Pain management - linear mean score',
   'hospital_name': 'UNITED MEDICAL CENTER'},
  {'HCAHPS_MEASURE_ID': 'H_COMP_4_SN_P',
   'PROVIDER_ID': '90008',
   'county_name': 'DISTRICT OF COLUMBIA',
   'hcahps_answer_description': 'Pain was "sometimes" or "never" well controlled',
   'hcahps_answer_percent': '17',
   'hcahps_answer_percent_footnote': '',
   'hcahps_linear_mean_value': 'Not Applicable',
   'hcahps_question': 'Patients who reported that their pain was "Sometimes" or "Never" well controlled',
   'hospital_name': 'UNITED MEDICAL CENTER'},
  {'HCAHPS_MEASURE_ID': 'H_COMP_4_U_P

### Format string to join multiple tables and run it


In [15]:
body = {
  "isQuery": True,
  "sqlStatement": "select postgres.hcahps_measure_id \
              , db2edw.county_name as County \
              , sandbox.number_of_readmissions \
           from DV.HOSPITAL_HCAHPS_DATA_PG postgres \
              , DV.HOSPITAL_INFO_EDW db2edw \
              , DV.HOSPITAL_READMISSION_DB2S sandbox \
          where varchar(postgres.provider_id) = db2edw.provider_id \
            and db2edw.provider_id = sandbox.provider_number \
          fetch first 3 rows only",
  "sync": True
}
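
The query above joins the three virtual tables by listing them in the `from` clause and matching keys in the `where` clause. As a sketch, the same inner join can be written with explicit `JOIN ... ON` clauses, which makes it easier to see which key links which pair of sources. The variable names `join_sql` and `explicit_join_body` are hypothetical, and I have not run this form against the endpoint.

```python
join_sql = """
    SELECT postgres.hcahps_measure_id
         , db2edw.county_name AS County
         , sandbox.number_of_readmissions
      FROM DV.HOSPITAL_HCAHPS_DATA_PG postgres
      JOIN DV.HOSPITAL_INFO_EDW db2edw
        ON varchar(postgres.provider_id) = db2edw.provider_id
      JOIN DV.HOSPITAL_READMISSION_DB2S sandbox
        ON db2edw.provider_id = sandbox.provider_number
     FETCH FIRST 3 ROWS ONLY
"""

explicit_join_body = {
    "isQuery": True,
    # Collapse the layout whitespace so the statement is sent as one line
    "sqlStatement": " ".join(join_sql.split()),
    "sync": True,
}
```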


Send the join query to the execsql service. Once the call returns, the DataFrame class converts the JSON result set into a table, and DataFrames can be used to further manipulate results in Python.

In [16]:
# Use with HTTP call when using endpoint container on my mac
# This endpoint talks to my local DB, DV or any other database defined in the Db2RESTful setup above.
try:
    response = requests.post("{}{}".format(Db2RESTful,API_execsql), headers=headers, json=body)
except Exception as e:
    print("Unable to call RESTful service. Error={}".format(repr(e)))

# A response of 200 means success
print(response)

<Response [200]>


The equivalent cURL call is:

```
myquery="select postgres.hcahps_measure_id \
              , db2edw.county_name as County \
              , sandbox.number_of_readmissions \
           from DV.HOSPITAL_HCAHPS_DATA_PG postgres \
              , DV.HOSPITAL_INFO_EDW db2edw \
              , DV.HOSPITAL_READMISSION_DB2S sandbox \
          where varchar(postgres.provider_id) = db2edw.provider_id \
            and db2edw.provider_id = sandbox.provider_number \
          fetch first 3 rows only"

curl --header "authorization: $token2" --header 'Content-Type: application/json' \
      -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \
     http://192.168.0.14:50050/v1/services/execsql
```

In [17]:
display(pd.DataFrame(response.json()['resultSet']))

|  | COUNTY | HCAHPS_MEASURE_ID | NUMBER_OF_READMISSIONS |
|---|---|---|---|
| 0 | DISTRICT OF COLUMBIA | H_COMP_4_LINEAR_SCORE | 40 |
| 1 | DISTRICT OF COLUMBIA | H_COMP_4_LINEAR_SCORE | 44 |
| 2 | DISTRICT OF COLUMBIA | H_COMP_4_LINEAR_SCORE | 61 |


In [18]:
response.json()

{'jobStatus': 4,
 'jobStatusDescription': 'Job is complete',
 'resultSet': [{'COUNTY': 'DISTRICT OF COLUMBIA',
   'HCAHPS_MEASURE_ID': 'H_COMP_4_LINEAR_SCORE',
   'NUMBER_OF_READMISSIONS': 40},
  {'COUNTY': 'DISTRICT OF COLUMBIA',
   'HCAHPS_MEASURE_ID': 'H_COMP_4_LINEAR_SCORE',
   'NUMBER_OF_READMISSIONS': 44},
  {'COUNTY': 'DISTRICT OF COLUMBIA',
   'HCAHPS_MEASURE_ID': 'H_COMP_4_LINEAR_SCORE',
   'NUMBER_OF_READMISSIONS': 61}],
 'rowCount': 3}
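
The raw payload above carries a `jobStatus` alongside the rows, and an HTTP 200 by itself does not show what that status was. A minimal sketch of a check before using the result set follows. It assumes that status `4` means the job completed, which is what the output above reports; the meaning of other status values is not covered here. The helper name `rows_from_payload` is hypothetical.

```python
JOB_COMPLETE = 4  # shown together with 'Job is complete' in the output above

def rows_from_payload(payload):
    """Return the result-set rows, or raise if the job did not report completion."""
    status = payload.get("jobStatus")
    if status != JOB_COMPLETE:
        raise RuntimeError("Query not complete: status={} ({})".format(
            status, payload.get("jobStatusDescription", "no description")))
    # Fall back to an empty list if the payload carries no result set
    return payload.get("resultSet", [])
```

It could be used as `display(pd.DataFrame(rows_from_payload(response.json())))`.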

### Format string to describe a virtual table



In [19]:
body = {
  "isQuery": True,
  "sqlStatement": "CALL sysproc.admin_cmd('describe table DV.HOSP_INFO_READMIT_VIEW')",
  "sync": True
}
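
Because the describe statement embeds the table name inside a quoted string, building it from user input needs some care. The sketch below wraps the same `CALL sysproc.admin_cmd(...)` statement in a helper that accepts only plain identifiers. The helper name `build_describe_body` and the identifier pattern are assumptions of this sketch; the pattern is deliberately conservative and would reject delimited names.

```python
import re

# Conservative pattern for ordinary (undelimited) identifiers; an assumption of this sketch
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def build_describe_body(schema, table):
    """Build an execsql body that describes schema.table via SYSPROC.ADMIN_CMD."""
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError("Unexpected identifier: {!r}".format(name))
    statement = "CALL sysproc.admin_cmd('describe table {}.{}')".format(schema, table)
    return {"isQuery": True, "sqlStatement": statement, "sync": True}
```

`build_describe_body("DV", "HOSP_INFO_READMIT_VIEW")` returns the same body as the cell above.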


In [20]:
# Use with HTTP call when using endpoint container on my mac
# This endpoint talks to my local DB, DV or any other database defined in the Db2RESTful setup above.
try:
    response = requests.post("{}{}".format(Db2RESTful,API_execsql), headers=headers, json=body)
except Exception as e:
    print("Unable to call RESTful service. Error={}".format(repr(e)))

# A response of 200 means success
print(response)

<Response [200]>


The equivalent cURL call is:
```
myquery="CALL sysproc.admin_cmd('describe table DV.HOSP_INFO_READMIT_VIEW ')"
curl --header "authorization: $token2" --header 'Content-Type: application/json' \
      -d '{"isQuery": true,"sqlStatement": "'"$myquery"'" ,"sync": true}' \
     http://192.168.0.14:50050/v1/services/execsql
```

Retrieve the results. The DataFrame class converts the JSON result set into a table. DataFrames can be used to further manipulate results in Python.

In [21]:
display(pd.DataFrame(response.json()['resultSet']))

|  | COLNAME | LENGTH | NULLABLE | SCALE | TYPENAME | TYPESCHEMA |
|---|---|---|---|---|---|---|
| 0 | PROVIDER_ID | 4 | Y | 0 | INTEGER | SYSIBM |
| 1 | DV_HOSPITAL_INFO_EDW_HOSPITAL_NAME | 50 | Y | 0 | VARCHAR | SYSIBM |
| 2 | ADDRESS | 50 | Y | 0 | VARCHAR | SYSIBM |
| 3 | CITY | 20 | Y | 0 | VARCHAR | SYSIBM |
| 4 | STATE | 2 | Y | 0 | VARCHAR | SYSIBM |
| 5 | ZIP_CODE | 4 | Y | 0 | INTEGER | SYSIBM |
| 6 | COUNTY_NAME | 20 | Y | 0 | VARCHAR | SYSIBM |
| 7 | HOSPITAL_TYPE | 25 | Y | 0 | VARCHAR | SYSIBM |
| 8 | HOSPITAL_OWNERSHIP | 43 | Y | 0 | VARCHAR | SYSIBM |
| 9 | EMERGENCY_SERVICES | 3 | Y | 0 | VARCHAR | SYSIBM |


In [22]:
response.json()

{'jobStatus': 4,
 'jobStatusDescription': 'Job is complete',
 'resultSet': [{'COLNAME': 'PROVIDER_ID',
   'LENGTH': 4,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'INTEGER',
   'TYPESCHEMA': 'SYSIBM  '},
  {'COLNAME': 'DV_HOSPITAL_INFO_EDW_HOSPITAL_NAME',
   'LENGTH': 50,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'VARCHAR',
   'TYPESCHEMA': 'SYSIBM  '},
  {'COLNAME': 'ADDRESS',
   'LENGTH': 50,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'VARCHAR',
   'TYPESCHEMA': 'SYSIBM  '},
  {'COLNAME': 'CITY',
   'LENGTH': 20,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'VARCHAR',
   'TYPESCHEMA': 'SYSIBM  '},
  {'COLNAME': 'STATE',
   'LENGTH': 2,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'VARCHAR',
   'TYPESCHEMA': 'SYSIBM  '},
  {'COLNAME': 'ZIP_CODE',
   'LENGTH': 4,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'INTEGER',
   'TYPESCHEMA': 'SYSIBM  '},
  {'COLNAME': 'COUNTY_NAME',
   'LENGTH': 20,
   'NULLABLE': 'Y',
   'SCALE': 0,
   'TYPENAME': 'VARCHAR