![Top <](./images/watsonxdata.png "watsonxdata")

# Accessing watsonx.data with RESTful Calls
Representational state transfer (REST) is a software architectural style that defines a set of constraints to be used for creating Web services. Web services that conform to the REST architectural style, called RESTful Web services, provide interoperability between computer systems on the internet. RESTful Web services allow the requesting systems to access and manipulate textual representations of Web resources by using a uniform and predefined set of stateless operations.

Watsonx.data provides additional RESTful APIs that give access to the underlying engines, buckets, catalogs, schemas, tables, and services. This notebook explores a number of the RESTful calls, including an example of querying tables through watsonx.data rather than the Presto RESTful API.

The Python requests library provides a simple interface for making RESTful calls. This code also imports the Pandas library, which is used to format the output, and turns off warnings for the RESTful calls (self-signed certificate warnings).

In [None]:
import requests
import pandas as pd
import json
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

## Calling RESTful Services

There are five different types of RESTful calls that are used with watsonx.data:

* GET - Get results from a request or SQL
* DELETE - Delete a resource
* PUT - Update a resource (usually replaces the resource)
* PATCH - Partial update of a resource
* POST - Creates a new resource
  
All RESTful calls require the host IP address and the service URL:

* host - This is the IP address and port of the machine that is hosting watsonx.data
* api - The API library that is being used to communicate with watsonx.data
* service - The service (API) that is being requested.

The API library may be different for some of the watsonx.data calls. The API version can also change between releases of watsonx.data. Using an API library provides for backward compatibility when new API libraries are created.

The full URL used in a RESTful call is made up of a combination of host, port, API, and service.

```bash
https:// + host:port + api + service
```

If your host is `xyz.abc.com:44444`, the API library is `/dbapi/v3`, and the service is `/auth/tokens`, the full URL would be:

```bash
https://xyz.abc.com:44444/dbapi/v3/auth/tokens
```
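
As a minimal sketch, the composition can be wrapped in a small helper. The name `build_url` is hypothetical (it is not part of watsonx.data or the requests library), and the values are the ones used later in this notebook, where the host string already includes the `https://` scheme.

```python
def build_url(host, port, api, service):
    """Join the four parts into one URL. `host` already carries the scheme."""
    return f"{host}:{port}{api}{service}"

# Values taken from the host settings and authentication cells of this notebook
url = build_url("https://watsonxdata", 9443, "/lakehouse/api/v2", "/auth/authenticate")
print(url)
```

Keeping the parts separate makes it easier to switch the API library or the service without retyping the host and port.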

This URL is then placed into a RESTful call that has one of the following formats:

```bash
RESTful.call( host + api + service)
RESTful.call( host + api + service, headers=headers, verify=file)
RESTful.call( host + api + service, headers=headers, json=data, verify=file)
```

The first form of the RESTful call requests a resource that does not require authentication to the server. The second form requires a header dictionary with control information (userid/password/token) and the path to the certificate file used to verify the server. The contents of the header depend on the API call that is being used. The third form sends additional data to the RESTful call through the `json=` parameter.

The information that is returned from a Python RESTful call contains four values that you need. Here is a sample RESTful call:
```python
r = requests.get(f"{host}:{port}{service}", headers=auth_header, verify=certfile)
```
The variable `r` will contain the following:
* r.ok - `True` or `False`
* r.status_code - The HTTP status code returned by the RESTful call (e.g. 404 - Not Found, 200 - OK)
* r.reason - The success or error message from the RESTful call
* r.json() - The returned messages from the RESTful call if it was successful

The return codes from a RESTful call are summarized below.

HTTP Status Code|Description|Recovery
---|---|---
200|Success|The request was successful.
201|Created|The requested resource was successfully created in a synchronous manner.
204|No Content|The server successfully processed the request and is not returning any content.
400|Bad Request|The input parameters in the request body are either incomplete or in the wrong format. Be sure to include all required parameters in your request.
401|Unauthorized|You are not authorized to make this request. Log in to IBM Cloud and try again. If this error persists, contact the account owner to check your permissions.
403|Forbidden|The supplied authentication is not authorized to access '{namespace}'.
404|Not Found|The requested resource could not be found.
409|Conflict|The entity is already in the requested state.
500|Internal Server Error|Your request could not be processed. Wait a few minutes and try again.

The error message returned from the RESTful call is found in the `r.reason` field. If the RESTful call was successful, the r.json() function will return the data from the call. The data is in the format of a Python dictionary. 
```python
results = r.json()
```

Once you have the data in a variable, you can access the fields by using the Python dictionary syntax. For instance, if you want to retrieve the access token from the result set, you would use the following code:
```python
access_token = results['accessToken']
```
The fields that are returned from a RESTful call will be unique for each service, so you will need to refer to the documentation to determine what the returned document will contain.
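
The checks described above can be sketched as a small helper. This is only an illustration: `unpack_response` is a hypothetical name, and the `SimpleNamespace` objects stand in for the response that `requests` would return, so the example runs without a server. The token value is made up.

```python
from types import SimpleNamespace

def unpack_response(r):
    """Return the decoded JSON body on success, or None after printing the error."""
    if not r.ok:
        print(f"Request failed: {r.status_code} {r.reason}")
        return None
    return r.json()

# Stand-ins for the object returned by requests.get() or requests.post()
good = SimpleNamespace(ok=True, status_code=200, reason="OK",
                       json=lambda: {"accessToken": "abc123"})
bad = SimpleNamespace(ok=False, status_code=404, reason="Not Found",
                      json=lambda: {})

results = unpack_response(good)
print(results["accessToken"])
failed = unpack_response(bad)
```

Checking `r.ok` before calling `r.json()` avoids trying to decode an error page as if it were a result document.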

## Watsonx.data RESTful Reference
This notebook uses a few of the RESTful APIs that are available in watsonx.data. The full list of RESTful calls can be found at the following [watsonx.data Cloud API](https://cloud.ibm.com/apidocs/watsonxdata-software#intro) site. The RESTful calls are subdivided into the following sections which include the links into the documentation.

* [Buckets](https://cloud.ibm.com/apidocs/watsonxdata-software#list-bucket-registrations) 
* [Databases](https://cloud.ibm.com/apidocs/watsonxdata-software#create-driver-database-catalog)
* [Engines](https://cloud.ibm.com/apidocs/watsonxdata-software#list-db2-engines)
* [Console](https://cloud.ibm.com/apidocs/watsonxdata-software#test-l-h-console)
* [Catalogs](https://cloud.ibm.com/apidocs/watsonxdata-software#list-catalogs)
* [Services](https://cloud.ibm.com/apidocs/watsonxdata-software#list-milvus-services)

This notebook will examine the `buckets` RESTful calls, along with `engines` and `catalogs`. The information retrieved from these calls will then be used to issue SQL calls to the Presto engine.

## Watsonx.data Host Settings
In our developer edition system, we will use the following HOST and PORT values to communicate with the system, along with our administrative userid.

In [None]:
host              = "https://watsonxdata"
port              = 9443
username          = "ibmlhadmin"
password          = "password"

## Retrieve watsonx.data Certificate
Before we can issue our RESTful calls, a certificate is required so that the requests library can verify the connection to the server (the `verify` parameter). Our watsonx.data system already has certificates available to use, but the following call will extract the information we need into a local file.

In [None]:
%system echo\
        QUIT |\
        openssl s_client -showcerts -connect 127.0.0.1:8443 |\
        awk '/-----BEGIN CERTIFICATE-----/ {p=1}; p; /-----END CERTIFICATE-----/ {p=0}'\
        > /tmp/restful.crt

Double-check that we have our certificate.

In [None]:
%system cat /tmp/restful.crt

## Watsonx.data Instance Information
All of the watsonx.data RESTful calls require information about which instance is being accessed. This information needs to be extracted from the server. In the developer edition, only one instance is running, and it is referred to by the string `0000-0000-0000-0000`. The next command extracts the instance ID and instance name by using docker commands to retrieve the instance environment variables from the system.

In [None]:
r = %system docker exec ibm-lh-presto printenv LH_INSTANCE_ID
lh_instance_id = r[0]
r = %system docker exec ibm-lh-presto printenv LH_INSTANCE_NAME
lh_instance_name = r[0]
print(f"Instance name: {lh_instance_name}\nInstance ID  : {lh_instance_id}")

## Watsonx.data Authentication
At this point we have all of the information we need to initiate a connection to the watsonx.data service. The first step is to authenticate to the watsonx.data system and retrieve a token that will be used for subsequent calls. The RESTful call requires the following information:
* Call Type: `POST`
* API: `/lakehouse/api/v2`
* Service: `/auth/authenticate`
* JSON Data
    * Instance ID: `instance_id`
    * Instance Name: `instance_name`
    * Watsonx.data Userid: `username`
    * Password: `password`
* Certificate: `filename`

The request header is not required in this RESTful call. Instead we are using the `json` keyword to send the control information to the RESTful service. Note that the JSON data is created as a Python dictionary. The next cell creates the `request` dictionary we need. The code also includes the administrative userid and password for the watsonx.data system.

In [None]:
api      = "/lakehouse/api/v2"
service  = "/auth/authenticate"
username = "ibmlhadmin"
password = "password"
certfile = "/tmp/restful.crt"
request = {
    "instance_id"  : lh_instance_id,
    "instance_name": lh_instance_name,
    "password"     : password,
    "username"     : username
}

At this point we are ready to call the RESTful service to authenticate the RESTful user to watsonx.data.

In [None]:
r = requests.post(f"{host}:{port}{api}{service}", json=request, verify=certfile)
r.reason

The information returned from the authentication call includes an access token which will be used for subsequent calls to watsonx.data.

In [None]:
details = r.json()
accesstoken = details['accessToken']
print(accesstoken)

## Token Expiry
Tokens will expire after a period of time. The token we just created will work for the next few commands but will fail if we wait too long. In order to check for token expiry, another authentication call (`/preauth`) can be used to reauthenticate with the previous token and get a timeout value that can be checked for expiry.

In [None]:
api_inst = f"/lakehouse/api/v2/{lh_instance_id}"
service  = "/preauth"
headers = {
    "Content-Type"   : "application/json", 
    "AuthInstanceID" : lh_instance_id,
    "Authorization"  : f"Bearer {accesstoken}"
}

r = requests.post(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
r.reason

The reason code comes back as "Created", which means a new access token has been returned with a time limit. The `token_details` dictionary that is returned contains a field called `exp`, which holds the expiry time of the token as Unix epoch time (the number of seconds since January 1, 1970).

In [None]:
results = r.json()
token_details = results['token_details']
print(token_details['exp'])
expiry = token_details['exp']

Some additional details are found in the token results.

In [None]:
token_details

You can check the current epoch time with the following Python script.

In [None]:
import time    
epoch_time = int(time.time())
print(epoch_time)
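
To make the comparison easier to read, the epoch values can be converted to a timestamp and to a number of seconds remaining. The following is a small sketch using only the standard library; `seconds_remaining`, `as_utc`, and the sample expiry value are hypothetical and not part of the watsonx.data API.

```python
import time
from datetime import datetime, timezone

def seconds_remaining(expiry, now=None):
    """Seconds until the token expires; zero or negative means it has expired."""
    if now is None:
        now = int(time.time())
    return expiry - now

def as_utc(epoch_seconds):
    """Render an epoch value as an ISO 8601 timestamp in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

sample_expiry = 1_700_000_000   # made-up value standing in for token_details['exp']
print(as_utc(sample_expiry))
print(seconds_remaining(sample_expiry, now=sample_expiry - 600))
```

Passing `now` explicitly is only there to make the function easy to try out; in normal use it defaults to the current time.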

If the time exceeds the expiry time, the token will no longer work and you will have to reauthenticate. The following code will reauthenticate if the current time exceeds the expiry time. In order to simplify the passing of parameters, the next cell will create a Python dictionary called `credentials` which will bundle the variables required to authenticate with the server and track the expiry time of the token.

In [None]:
credentials = {
    "host"            : "https://watsonxdata",
    "port"            : 9443,
    "lh_instance_id"  : lh_instance_id,
    "lh_instance_name": lh_instance_name,
    "password"        : password,
    "username"        : username,    
    "expiry"          : expiry,
    "accesstoken"     : accesstoken,
    "certfile"        : certfile
}

Now we can define the routine that will authenticate against the RESTful service and track the token and expiry time.

In [None]:
def authenticate(credentials):

    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken, certfile = \
    [credentials[k] for k in ["host","port","lh_instance_id","lh_instance_name","password","username","expiry","accesstoken","certfile"]]


    epoch_time = int(time.time())
    if (epoch_time < expiry):
        return True
        
    api      = "/lakehouse/api/v2"
    service  = "/auth/authenticate"
    request = {
        "instance_id"  : lh_instance_id,
        "instance_name": lh_instance_name,
        "password"     : password,
        "username"     : username
    }
    r = requests.post(f"{host}:{port}{api}{service}", json=request, verify=certfile)
    
    if (r.ok == False):
        print(r.reason)
        return False
        
    details = r.json()
    accesstoken = details['accessToken']
    credentials['accesstoken'] = accesstoken

    api_inst = f"/lakehouse/api/v2/{lh_instance_id}"
    service  = "/preauth"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }

    r = requests.post(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    if (r.ok == False):
        print(r.reason)
        return False
        
    results = r.json()
    # Read the expiry from this response rather than from the notebook-level token_details
    credentials["expiry"]      = results['token_details']['exp']
    return True

Check that the routine works by running it with the current credentials and checking the new expiry time after running it.

In [None]:
authenticate(credentials)
print(credentials["expiry"])

## Bucket Management
The watsonx.data RESTful calls provide a number of services for bucket management including:
* Get bucket registrations
* Register a bucket
* Get a bucket
* Unregister a bucket
* Update a bucket
* Activate a bucket
* Deactivate a bucket
* Check bucket credentials

One of the more useful RESTful calls is to get a list of buckets that are found in the watsonx.data system. The service name must be modified to include the instance ID for all calls that access resources within the watsonx.data instance. The header is now required in these `GET` requests and the data in the header includes the access token retrieved in the previous call:
* Call Type: `GET`
* API: `/lakehouse/api/v2/instance_id`
* Service: `/bucket_registrations`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* Certificate: `filename`

In [None]:
if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2/{lh_instance_id}"
    service  = "/bucket_registrations"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : credentials["lh_instance_id"],
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    print(r.reason)
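
The same header dictionary is built in each of the cells that follow. One way to avoid repeating it is a small helper, sketched here under the assumption that `credentials` carries the `lh_instance_id` and `accesstoken` keys defined earlier. The name `build_headers` is hypothetical, and the sample values are placeholders.

```python
def build_headers(credentials):
    """Build the request header used by the authenticated watsonx.data calls."""
    return {
        "Content-Type"  : "application/json",
        "AuthInstanceID": credentials["lh_instance_id"],
        "Authorization" : f"Bearer {credentials['accesstoken']}",
    }

# Placeholder values; in the notebook these come from the credentials dictionary
sample = {"lh_instance_id": "0000-0000-0000-0000", "accesstoken": "abc123"}
headers = build_headers(sample)
print(headers["Authorization"])
```

Because the token is read from `credentials` at call time, the helper picks up a refreshed token after `authenticate(credentials)` has run.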

The data in the result set includes all of the buckets that are registered in the system. The bucket_details field contains all of the information on a particular bucket, including the catalog it is associated with and the location of the bucket. 

     "bucket_details": {
        "bucket_name": "iceberg-bucket",
        "endpoint": "http://xyz-minio-svc:9000"
      },

In [None]:
results = r.json()
bucket_registrations = results['bucket_registrations']
bucket_name = []
bucket_endpoint = []
catalog = []
for doc in bucket_registrations:
    bucket_name.append(doc['bucket_id'])
    catalog.append(doc.get('associated_catalog',{}).get('catalog_name',None))
    bucket_endpoint.append(doc.get('bucket_details',{}).get('endpoint', None))
df = pd.DataFrame({'bucket_id': bucket_name, 'catalog': catalog, 'endpoint': bucket_endpoint})
display(HTML(df.to_html()))

### Individual Bucket Details
If you only wanted the details about an individual bucket, you can append the bucket name to the end of the service URL. The following code will only get the details of the `hive-bucket`.

In [None]:
import json
if (authenticate(credentials) == True):
    bucket_name = "hive-bucket"
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2/{lh_instance_id}"
    service  = f"/bucket_registrations/{bucket_name}"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : credentials["lh_instance_id"],
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    print(json.dumps(r.json(),indent=4))

### Bucket Files
We can use another RESTful call to get the contents of a bucket. The RESTful call is similar to the previous one, except that `/objects` is appended after the bucket name in the RESTful path. This call will get the contents of the `hive-bucket` bucket.

In [None]:
if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    bucket   = "hive-bucket"
    api_inst = f"/lakehouse/api/v2/{lh_instance_id}"
    service  = f"/bucket_registrations/{bucket}/objects"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    print(r.reason)
    results = r.json()
    df = pd.DataFrame({'Object': results['objects']})
    display(HTML(df.to_html()))

## Watsonx.data Objects
In order to query the data within watsonx.data, the RESTful calls require details on the objects that the table(s) are tied to. There is a hierarchy to the objects in watsonx.data that is represented in the simple diagram below. 
```
engine - Presto
 \
  buckets - hive-data, iceberg-data
   \
    catalogs - hive_data, iceberg_data
     \
      schemas - gosalesdw, ontime, taxi
       \
        tables 
```
A table is created in a schema, which in turn is created in a catalog which resides in a bucket. The bucket is then tied to a particular query engine. 

### Watsonx.data Engines
In the watsonx.data developer edition, there is only one Presto engine and it is referred to as `presto-01`. We can query the engines in watsonx.data with the following RESTful call.

* Call Type: `GET`
* API: `/lakehouse/api/v2`
* Service: `/engines`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* Certificate: `filename`

In [None]:
if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2"
    service  = f"/engines"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    print(json.dumps(r.json(),indent=4))

### Details on the Presto Engines

The data returned provides details on all of the engines that have been registered in the system. To get the details of the Presto engines, we modify the service to `/presto_engines`.

In [None]:
if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2"
    service  = f"/presto_engines"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    print(json.dumps(r.json(),indent=4))

For subsequent queries, we will need the name of the Presto engine. The value is extracted from the result set with the next command.

In [None]:
results = r.json()
presto_id = results['presto_engines'][0]['engine_id']
presto_id

## Catalogs
The next step in the object hierarchy is Catalogs. To retrieve the catalogs that are found in watsonx.data, we use the following RESTful call.

* Call Type: `GET`
* API: `/lakehouse/api/v2`
* Service: `/catalogs`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* Certificate: `filename`

This call will return all catalog entries. To find only the catalogs associated with the Presto engine, we will need to add some additional logic to the result set that is returned.

In [None]:
if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2"
    service  = f"/catalogs"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    results = r.json()
    catalog_list = []
    catalogs = results['catalogs']
    print("Catalogs")
    print("---------------")    
    for catalog in catalogs:
        if (presto_id in catalog['associated_engines']):
            print(catalog['catalog_name'])
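
The filtering step can also be written as a function that works on any list of catalog documents. The sketch below assumes each document has the `catalog_name` and `associated_engines` fields used above; `catalogs_for_engine` is a hypothetical name and the sample documents are made up for illustration.

```python
def catalogs_for_engine(catalogs, engine_id):
    """Return the names of the catalogs that list engine_id as an associated engine."""
    return [
        c["catalog_name"]
        for c in catalogs
        if engine_id in c.get("associated_engines", [])
    ]

# Made-up documents shaped like the entries in results['catalogs']
sample_catalogs = [
    {"catalog_name": "hive_data",    "associated_engines": ["presto-01"]},
    {"catalog_name": "iceberg_data", "associated_engines": ["presto-01"]},
    {"catalog_name": "unattached",   "associated_engines": []},
]
matching = catalogs_for_engine(sample_catalogs, "presto-01")
print(matching)
```

Using `.get("associated_engines", [])` keeps the function from failing on a catalog document that has no engine list at all.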

## Schemas
Once you have the catalogs associated with an engine, you can request the schemas that are associated with a catalog. The format of the RESTful call is:

* Call Type: `GET`
* API: `/lakehouse/api/v2`
* Service: `/catalogs/{catalog_id}/schemas?engine_id=engine_id`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* Certificate: `filename`

The service name is modified to include the `catalog_id` and the `schemas` service with the addition of the `engine_id` being passed as a parameter in the URL.

To get the schemas associated with the `hive_data` catalog in the `presto-01` engine, the service would be:

```bash
/catalogs/hive_data/schemas?engine_id=presto-01
```


In [None]:
catalog_id = "hive_data"

if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2"
    service  = f"/catalogs/{catalog_id}/schemas?engine_id={presto_id}"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    results = r.json()
    schemas = results["schemas"]
    print("Schemas")
    print("---------------")    
    for schema in schemas:
        print(schema)

## Tables
Once you have the schema list, you can use a RESTful call to return the list of tables that are found in the schema. The call is similar to the SCHEMA example, except that the schema ID must now be added to the service request.

* Call Type: `GET`
* API: `/lakehouse/api/v2`
* Service: `/catalogs/{catalog_id}/schemas/{schema_id}/tables?engine_id=engine_id`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* Certificate: `filename`

The service name is modified to include the `catalog_id`, `schema_id` and the `/tables` service, with the addition of the `engine_id` being passed as a parameter in the URL.

To get the tables associated with the `ontime` schema, in the `hive_data` catalog, running on the `presto-01` engine, the service would be:

```bash
/catalogs/hive_data/schemas/ontime/tables?engine_id=presto-01
```

In [None]:
catalog_id = "hive_data"
schema_id  = "ontime"

if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2"
    service  = f"/catalogs/{catalog_id}/schemas/{schema_id}/tables?engine_id={presto_id}"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    results = r.json()
    tables = results["tables"]
    print("Tables")
    print("---------------")
    for table in tables:
        print(table)

## Table Details
You can get the details of a table by adding the table ID to the previous RESTful call. As you will notice, the service name is extended with additional details (catalog, schema, table, engine) as we get closer to the final object.

To request the details of a table, the call is similar to the TABLES example, except that the Table ID must now be added to the service request.

* Call Type: `GET`
* API: `/lakehouse/api/v2`
* Service: `/catalogs/{catalog_id}/schemas/{schema_id}/tables/{table_id}?engine_id=engine_id`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* Certificate: `filename`

The service name is modified to include the `catalog_id`, `schema_id`, `table_id` and the `/tables` service, with the addition of the `engine_id` being passed as a parameter in the URL.

To get the details of the `ontime` table in the `ontime` schema, in the `hive_data` catalog, running on the `presto-01` engine, the service would be:

```bash
/catalogs/hive_data/schemas/ontime/tables/ontime?engine_id=presto-01
```
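
Since the schema, table, and table-detail services all extend the same path, they can be produced by one function. This is a sketch only: `object_service` is a hypothetical helper, and the quoting from `urllib.parse` is an added precaution for names that contain special characters.

```python
from urllib.parse import quote, urlencode

def object_service(catalog_id, engine_id, schema_id=None, table_id=None):
    """Build the service path for schemas, tables, or one table's details."""
    path = "/catalogs/" + quote(catalog_id, safe="") + "/schemas"
    if schema_id is not None:
        path += "/" + quote(schema_id, safe="") + "/tables"
        if table_id is not None:
            path += "/" + quote(table_id, safe="")
    query = urlencode({"engine_id": engine_id})
    return f"{path}?{query}"

print(object_service("hive_data", "presto-01"))
print(object_service("hive_data", "presto-01", "ontime"))
print(object_service("hive_data", "presto-01", "ontime", "ontime"))
```

The three calls produce the schema, table, and table-detail services in turn.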

In [None]:
catalog_id = "hive_data"
schema_id  = "ontime"
table_id   = "ontime"

if (authenticate(credentials) == True):
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]
    api_inst = f"/lakehouse/api/v2"
    service  = f"/catalogs/{catalog_id}/schemas/{schema_id}/tables/{table_id}?engine_id={presto_id}"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "Authorization"  : f"Bearer {accesstoken}"
    }
    r = requests.get(f"{host}:{port}{api_inst}{service}", headers=headers, verify=certfile)
    results = r.json()
    columns = results['columns']
    names = []
    types = []
    for column in columns:
        names.append(column['column_name'])
        types.append(column['type'])

    df = pd.DataFrame({'Column Name': names, 'Type': types})
    display(HTML(df.to_html()))    

## Querying watsonx.data with a RESTful Call
You can query data in Presto by using a RESTful call. There are a number of steps involved when retrieving answer sets from Presto. First of all, a single RESTful call may not result in an answer set immediately. What this means is that the program must "poll" the server to determine when to retrieve results. 

The initial call to the RESTful service will result in one of a number of possible states:

* WAITING_FOR_PREREQUISITES - Presto is checking that resources are available to run your query
* QUEUED - Your SQL is queued for execution
* RUNNING - The SQL is running
* FINISHED - The SQL has finished
* FAILED - An error was found in your SQL

The RESTful service uses POST to send a request to the server. The RESTful call requires the following information.

* Call Type: `POST`
* API: `/lakehouse/api/v2/{lh_instance_id}/v1`
* Service: `/statement?engine_id={engine_id}`
* Header
    * Content-Type: `application/json`
    * AuthInstanceID: `0000-0000-0000-0000`
    * Authorization:  `Bearer <token>`
* JSON
    * catalog: Catalog name
    * host: Internal Presto host (ibm-lh-presto-svc)
    * port: Internal Presto port (8443)
    * schema: Schema
    * sqlQuery: The SQL that you want to run
* Certificate: `filename`

The program below initiates a POST request with the connection details and the SQL statement. The returned message is found in the `r.json()` field. A field called `stats` contains another field called `state` which indicates what state the RESTful service is in. Based on the current state of execution, the code will continue looping, looking for intermediate results or the final results.

Every RESTful call (after the initial one) may send data back to the client, and the rows from each response need to be appended to the rows already collected. The program may need to make several RESTful calls to retrieve the entire answer set. The returned `r.json()` field will contain the URL (`nextUri`) that identifies the next block of rows; the code below passes it back to the same service in another `POST` request. Once the answer set is exhausted, the final block will have a FINISHED status.

The Presto engine (`presto-01`) is hardcoded in the code below. You could add it to the `credentials` dictionary if you have more than one Presto engine available.

To reduce the load on the Presto service, a short delay is added between calls.
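
The polling pattern itself can be shown without a server. The sketch below is an illustration under stated assumptions: `collect_rows` and `send` are hypothetical names, the canned responses are made up, and each response is assumed to be the already unwrapped result dictionary containing `stats`, `columns`, `data`, and `nextUri`.

```python
from time import sleep

def collect_rows(send, first_request, delay=0.1):
    """Poll until no nextUri is returned, accumulating every block of rows."""
    columns, rows = [], []
    response = send(first_request)
    while True:
        state = response["stats"]["state"]
        if state == "FAILED":
            message = response.get("error", {}).get("message", "query failed")
            raise RuntimeError(message)
        if state in ("RUNNING", "FINISHED"):
            # Keep the last non-empty column list and append this block of rows
            columns = response.get("columns") or columns
            rows.extend(response.get("data") or [])
        next_uri = response.get("nextUri")
        if next_uri is None:
            return columns, rows
        sleep(delay)   # pause between calls to limit the load on the server
        response = send({"nextUri": next_uri})

# Made-up responses standing in for the server
canned = iter([
    {"stats": {"state": "QUEUED"}, "nextUri": "u1"},
    {"stats": {"state": "RUNNING"}, "columns": [{"name": "id"}],
     "data": [[1], [2]], "nextUri": "u2"},
    {"stats": {"state": "FINISHED"}, "columns": [{"name": "id"}], "data": [[3]]},
])
columns, rows = collect_rows(lambda body: next(canned), {"sqlQuery": "select 1"}, delay=0)
print([c["name"] for c in columns], rows)
```

Passing the transport in as `send` keeps the loop separate from the HTTP details, so the same logic can be exercised with canned data or wired to a real `requests.post` call.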

The `restfulSQL` function takes 4 arguments:
* credentials - The userid and password required to connect to the watsonx.data system
* catalog - The default catalog that is being used for the query
* schema - The default schema that is being used for the query
* SQL - The SQL query

If the SQL statement does not qualify a table name with a catalog or schema, the default values supplied in the JSON request will be used.

In [None]:
def restfulSQL(credentials,catalog,schema,sql):
    
    from time import sleep
    import pandas as pd

    if (authenticate(credentials) == False):
        print("Unable to authenticate")
        return

    presto_id = "presto-01"    
        
    host, port, lh_instance_id, lh_instance_name, password, username, expiry, accesstoken = \
    [credentials[k] for k in ["host", "port", "lh_instance_id", "lh_instance_name", "password", "username", "expiry", "accesstoken"]]

    api_inst = f"/lakehouse/api/v2/{lh_instance_id}/v1"
    service  = f"/statement?engine_id={presto_id}"
    headers = {
        "Content-Type"   : "application/json", 
        "AuthInstanceID" : lh_instance_id,
        "LhInstanceId"   : lh_instance_id,        
        "Authorization"  : f"Bearer {accesstoken}"
    }
    data = {
        "catalog"        : catalog,
        "schema"         : schema,
        "host"           : "ibm-lh-presto-svc",
        "port"           : 8443,
        "sqlQuery"       : sql
    }
    
    r = requests.post(f"{host}:{port}{api_inst}{service}", headers=headers, json=data, verify=certfile)

    columns = []
    values  = []

    while True:
        if (r.ok == False):
            # Report the error and stop; a failed call has no result set to read
            try:
                results = r.json()
                if (results.get('error',None) != None):
                    errordata = results['error']
                    errormsg  = errordata['message']
                    print(f"Error: {errormsg}")
                else:
                    print(f"Error: {r.status_code} {r.reason}")
            except Exception as e:
                print(repr(e))
            return None

        data = r.json()
        results = data["data"]
        collect = False
        stats = results.get('stats',None)
        state = stats['state']
        print(state)
        if (state in ["FINISHED","RUNNING"]):
            collect = True
        elif (state == "FAILED"):
            errormsg = results.get('error',None)
            if (errormsg != None):
                print(f"Error: {errormsg.get('message')}")
            error = True
            break
        else:
            collect = False
    
        if (collect == True):
            # Keep the previous column list if this response does not include one
            columns = results.get('columns',None) or columns
            rows    = results.get('data',None)
            if (rows is not None):
                values.append(rows)
    
        URI = results.get('nextUri',None)
        if (URI != None):    
            data = {
                "catalog"        : catalog,
                "schema"         : schema,
                "host"           : "ibm-lh-presto-svc",
                "port"           : 8443,
                "nextUri"        : URI
            }           
            sleep(.1)
            r = requests.post(f"{host}:{port}{api_inst}{service}", headers=headers, json=data, verify=certfile)
        else:
            break

    column_names = []
    if (len(columns) > 0):
        for col in columns:
            column_names.append(col.get("name"))
    
    # Each entry in values is one block of rows; combine all blocks, not just the first
    data_values = []
    for block in values:
        data_values.extend(block)
    
    df = pd.DataFrame(data=data_values, columns=column_names)
    return df


In [None]:
restfulSQL(credentials,"tpch","tiny",'select * from "tpch"."tiny"."customer" limit 10')

### Example of Invalid SQL
If you send invalid SQL to the engine you will receive a FAILED state back from the RESTful call.

In [None]:
restfulSQL(credentials,'tpch','tiny','select * from "tpch"."tiny"."xcustomer" limit 10')

#### Credits: IBM 2025, George Baklarz [baklarz@ca.ibm.com]