# Using IBM Cloud SQL Query

<div class="pull-left"><left><img style="float: right;" src="http://developer.ibm.com/clouddataservices/wp-content/uploads/sites/85/2018/01/ibm-cloud-object-storage-logo-small.png" width="100" margin=50></left></div>
<div style="text-align:center">
IBM Cloud SQL Query is IBM's serverless SQL service on data in Cloud Object Storage. It allows to run ANSI SQL on Parquet, CSV, JSON, ORC and AVRO data sets. You can use it to run your analytic queries, and you can use it to conduct complex transformations and write the result in any desired data format, partitioning and layout. SQL Query is based on Apache Spark SQL as the query engine in the background. This means you do not have to provision any Apache Spark instance or service. A simple Python client (like the IBM Watson Studio Notebook) is sufficient.<br><br></div>
This notebook is meant to be a generic starter to use the SQL Query API in order to run SQL statements in a programmatic way. It uses the <a href="https://github.com/IBM-Cloud/sql-query-clients/tree/master/Python" target="_blank" rel="noopener noreferrer">ibmcloudsql</a> Python library for this purpose. The notebook has been verified to work with Python 3.5. As mentioned above it does not require a Spark service bound to the notebook.

## Table of contents
1. [Setup libraries](#setup)<br>
2. [Configure SQL Query](#configure)<br>
    2.1 [Using the project bucket](#projectbucket)<br>
    2.2 [Setting SQL Query parameters](#parameters)<br>
3. [Your SQL](#sql)<br>
4. [Running Your SQL Statement](#run)<br>
    4.1 [Low level SQL job submission](#lowlevel)<br>
5. [Running ETL SQLs](#etl)<br>
6. [Paginated SQL Results](#pagination)<br>
7. [Working with SQL result objects](#results)<br>
8. [List recent SQL submissions](#joblist)<br>
9. [Next steps](#next)<br>

### <a id="setup"></a> 1. Setup libraries

Run the following cell at least once in your notebook environment in order to install required packages, such as the SQL Query client library:

In [2]:
!conda install pyarrow
!conda install sqlparse

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



In [3]:
!pip install --upgrade --user ibmcloudsql

Requirement already up-to-date: ibmcloudsql in /home/dsxuser/.local/lib/python3.6/site-packages (0.4.9)


In [4]:
import ibmcloudsql
import pandas as pd
targeturl=''

### <a id="configure"></a> 2. Configure SQL Query
1. You need an **API key** for an IBM cloud identity that has access to your Cloud Object Storage bucket for writing SQL results and to your SQL Query instance. To create API keys log on to the IBM Cloud console and go to <a href="https://console.bluemix.net/iam/#/apikeys" target="_blank">Manage->Security->Platform API Keys</a>, click the `Create` button, give the key a custom name and click `Create`. In the next dialog click `Show` and copy the key to your clipboard and paste it below in this notebook.
2. You need the **instance CRN** for the SQL Query instance. You can find it in the <a href="https://console.bluemix.net/dashboard/apps" target="_blank">IBM Cloud console dashboard</a>. Make sure you have `All Resources` selected as resource group. In the section `Services` you can see your instances of SQL Query and Cloud Object Storage. Select the instance of SQL Query that you want to use. In the SQL Query dashboard page that opens up you find a section titled **REST API** with a button labelled **Instance CRN**. Click the button to copy the CRN into your clipboard and paste it here into the notebook. If you don't have an SQL Query instance created yet, <a href="https://console.bluemix.net/catalog/services/sql-query" target="_blank">create one</a> first.
3. You need to specify the location on Cloud Object Storage where your **query results** should be written. This comprises three parts of information that you can find in the Cloud Object Storage UI for your instance in the IBM Cloud console. You need to provide it as a **URL** using the format `cos://<endpoint>/<bucket>/[<prefix>]`. You have the option to use the cloud object storage **bucket that is associated with your project**. In this case, execute the following section before you proceed.  
<br/>
For more background information, check out the SQL Query <a href="https://console.bluemix.net/docs/services/sql-query/getting-started.html#getting-started-tutorial" target="_blank">documentation</a>.

#### <a id="projectbucket"></a> 2.1 Using the project bucket
**Only** follow the instructions in this section when you want to write your SQL query results to the bucket that has been created for the project for which you have created this notebook. In any other case proceed directly with section **2.2**.
<br><br>
__Inserting the project token__:  
Click the `More` option in the toolbar above (the three stacked dots) and select `Insert project token`.
 * If you haven't created an access token for this project before, you will see a dialog that asks you to create one first. Follow the link to open your project settings, scroll down to `Access tokens` and click `New token`. Give the token a custom name and make sure you select `Editor` as `Access role for project`. After you created your access token you can come back to this notebook, select the empty cell below and again select `Insert project token` from the toolbar at the top.
[//]: # 
This will add a new cell at the top of your notebook with content that looks like this:
```
# @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='<some id>', project_access_token='<some access token>')
pc = project.project_context
```
Leave that cell content as inserted and run the cell. Then then proceed with the following cell below:

In [5]:
cos_bucket = project.get_metadata()['entity']['storage']['properties']
targeturl="cos://" + cos_bucket['bucket_region'] + "/" + cos_bucket['bucket_name'] + "/"

#### <a id="parameters"></a> 2.2 Setting the SQL Query parameters

In [6]:
import getpass
apikey=getpass.getpass('Enter IBM Cloud API Key (leave empty to use previous one): ') or apikey
instnacecrn=input('Enter SQL Query Instance CRN (leave empty to use previous one): ') or instnacecrn
if targeturl == '':
    targeturl=input('Enter target URL for SQL results: ')
else:
    targeturl=input('Enter target URL for SQL results (leave empty to use ' + targeturl + '): ') or targeturl
sqlClient = ibmcloudsql.SQLQuery(apikey, instnacecrn, client_info='SQL Query Starter Notebook')
sqlClient.logon()
print('\nYour SQL Query web console link:\n')
sqlClient.sql_ui_link()

Enter IBM Cloud API Key (leave empty to use previous one): ········
Enter SQL Query Instance CRN (leave empty to use previous one): crn:v1:bluemix:public:sql-query:us-south:a/23f1e9853c41f6b566e71689ed8a1363:c49157a5-b308-4f6b-972c-455082a8f47e::
Enter target URL for SQL results (leave empty to use cos://us-geo/notebooks-donotdelete-pr-thri4xhqi5ofdi/): cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud

Your SQL Query web console link:

https://sql-query.cloud.ibm.com/sqlquery/?instance_crn=crn:v1:bluemix:public:sql-query:us-south:a/23f1e9853c41f6b566e71689ed8a1363:c49157a5-b308-4f6b-972c-455082a8f47e::


'https://sql-query.cloud.ibm.com/sqlquery/?instance_crn=crn:v1:bluemix:public:sql-query:us-south:a/23f1e9853c41f6b566e71689ed8a1363:c49157a5-b308-4f6b-972c-455082a8f47e::'

### <a id="sql"></a> 3. Your SQL
To author your own SQL query, use the interactive SQL Query web console (**link above**) of your SQL Query service instance.

In [7]:
import sqlparse
from pygments import highlight
from pygments.lexers import get_lexer_by_name
from pygments.formatters import HtmlFormatter, Terminal256Formatter

sql=input('Enter your SQL statement (leave empty to use a simple sample SQL)')
if sql == '':
    sql='SELECT o.OrderID, c.CompanyName, e.FirstName, e.LastName FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, \
         cos://us-geo/sql/employees.parquet STORED AS PARQUET e, cos://us-geo/sql/customers.parquet STORED AS PARQUET c \
         WHERE e.EmployeeID = o.EmployeeID AND c.CustomerID = o.CustomerID AND o.ShippedDate > o.RequiredDate AND o.OrderDate > "1998-01-01" \
         ORDER BY c.CompanyName'
if " INTO " not in sql:
    sql += ' INTO {} STORED AS CSV'.format(targeturl)
formatted_sql = sqlparse.format(sql, reindent=True, indent_tabs=True, keyword_case='upper')
lexer = get_lexer_by_name("sql", stripall=True)
formatter = Terminal256Formatter(style='tango')
result = highlight(formatted_sql, lexer, formatter)
from IPython.core.display import display, HTML
print('\nYour SQL statement is:\n')
print(result)

Enter your SQL statement (leave empty to use a simple sample SQL)

Your SQL statement is:

[38;5;24;01mSELECT[39;00m [38;5;0mo[39m[38;5;0;01m.[39;00m[38;5;0mOrderID[39m[38;5;0;01m,[39;00m
	[38;5;24;01mc[39;00m[38;5;0;01m.[39;00m[38;5;0mCompanyName[39m[38;5;0;01m,[39;00m
	[38;5;0me[39m[38;5;0;01m.[39;00m[38;5;0mFirstName[39m[38;5;0;01m,[39;00m
	[38;5;0me[39m[38;5;0;01m.[39;00m[38;5;0mLastName[39m
[38;5;24;01mFROM[39;00m [38;5;0mcos[39m[38;5;0;01m:[39;00m[38;5;166;01m/[39;00m[38;5;166;01m/[39;00m[38;5;0mus[39m[38;5;166;01m-[39;00m[38;5;0mgeo[39m[38;5;166;01m/[39;00m[38;5;24;01mSQL[39;00m[38;5;166;01m/[39;00m[38;5;0morders[39m[38;5;0;01m.[39;00m[38;5;0mparquet[39m [38;5;0mSTORED[39m [38;5;24;01mAS[39;00m [38;5;0mPARQUET[39m [38;5;0mo[39m[38;5;0;01m,[39;00m
	[38;5;0mcos[39m[38;5;0;01m:[39;00m[38;5;166;01m/[39;00m[38;5;166;01m/[39;00m[38;5;0mus[39m[38;5;166;01m-[39;00m[38;5;0mgeo[39m[38;5;166;01m/[39;00m[

### <a id="run"></a> 4. Running Your SQL Statement
The following cell submits the above SQL statement and waits for it to finish before printing a sample of the result set.

In [8]:
result_df = sqlClient.run_sql(sql)
if isinstance(result_df, str):
    print(result_df)

In [9]:
result_df.head(10)

Unnamed: 0,OrderID,CompanyName,FirstName,LastName
0,10924,Berglunds snabbköp,Janet,Leverling
1,11058,Blauer See Delikatessen,Anne,Dodsworth
2,10827,Bon app',Nancy,Davolio
3,11076,Bon app',Margaret,Peacock
4,11045,Bottom-Dollar Markets,Michael,Suyama
5,10970,Bólido Comidas preparadas,Anne,Dodsworth
6,11054,Cactus Comidas para llevar,Laura,Callahan
7,11008,Ernst Handel,Robert,King
8,11072,Ernst Handel,Margaret,Peacock
9,10816,Great Lakes Food Market,Margaret,Peacock


#### <a id="lowlevel"></a> 4.1 Low level SQL job submission
Let's run the same SQL again, but this time using the asynchronous submission mechanism and the status check method.

In [10]:
sqlClient.logon()
jobId = sqlClient.submit_sql(sql)
print("SQL query submitted and running in the background. jobId = " + jobId)

SQL query submitted and running in the background. jobId = 16503ee4-a074-4b15-b558-b11513ceabd9


In [11]:
print("Job status for " + jobId + ": " + sqlClient.get_job(jobId)['status'])

Job status for 16503ee4-a074-4b15-b558-b11513ceabd9: running


Use the `wait_for_job()` method as a blocking call until your job has finished:

In [12]:
job_status = sqlClient.wait_for_job(jobId)
print("Job " + jobId + " terminated with status: " + job_status)
if job_status == 'failed':
    details = sqlClient.get_job(jobId)
    print("Error: {}\nError Message: {}".format(details['error'], details['error_message']))

Job 16503ee4-a074-4b15-b558-b11513ceabd9 terminated with status: completed


Use the `get_result()` method to retrieve a dataframe for the SQL result set:

In [13]:
result_df = sqlClient.get_result(jobId)
print("OK, we have a dataframe for the SQL result that has been stored by SQL Query in " + sqlClient.get_job(jobId)['resultset_location'])

OK, we have a dataframe for the SQL result that has been stored by SQL Query in cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/jobid=16503ee4-a074-4b15-b558-b11513ceabd9


You can delete the result set from Cloud Object Storage using the `delete_result()` method:

In [14]:
sqlClient.delete_result(jobId)

Unnamed: 0,Deleted Object
0,s3.us-south.cloud-object-storage.appdomain.clo...
1,s3.us-south.cloud-object-storage.appdomain.clo...
2,s3.us-south.cloud-object-storage.appdomain.clo...


### <a id="etl"></a> 5. Running ETL SQLs
The following ETL SQL statement joins two data sets from COS and writes the result to COS using **hive style partitioning** with two columns. 

In [15]:
etl_sql='SELECT OrderID, c.CustomerID CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax \
         EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, \
         ShipCity, ShipRegion, ShipPostalCode, ShipCountry FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, \
         cos://us-geo/sql/customers.parquet STORED AS PARQUET c \
         WHERE c.CustomerID = o.CustomerID \
         INTO {}customer_orders STORED AS PARQUET PARTITIONED BY (ShipCountry, ShipCity)'.format(targeturl)
formatted_etl_sql = sqlparse.format(etl_sql, reindent=True, indent_tabs=True, keyword_case='upper')
result = highlight(formatted_etl_sql, lexer, formatter)
print('\nExample ETL Statement is:\n')
print(result)


Example ETL Statement is:

[38;5;24;01mSELECT[39;00m [38;5;0mOrderID[39m[38;5;0;01m,[39;00m
	[38;5;24;01mc[39;00m[38;5;0;01m.[39;00m[38;5;0mCustomerID[39m [38;5;0mCustomerID[39m[38;5;0;01m,[39;00m
	[38;5;0mCompanyName[39m[38;5;0;01m,[39;00m
	[38;5;0mContactName[39m[38;5;0;01m,[39;00m
	[38;5;0mContactTitle[39m[38;5;0;01m,[39;00m
	[38;5;0mAddress[39m[38;5;0;01m,[39;00m
	[38;5;0mCity[39m[38;5;0;01m,[39;00m
	[38;5;0mRegion[39m[38;5;0;01m,[39;00m
	[38;5;0mPostalCode[39m[38;5;0;01m,[39;00m
	[38;5;0mCountry[39m[38;5;0;01m,[39;00m
	[38;5;0mPhone[39m[38;5;0;01m,[39;00m
	[38;5;0mFax[39m [38;5;0mEmployeeID[39m[38;5;0;01m,[39;00m
	[38;5;0mOrderDate[39m[38;5;0;01m,[39;00m
	[38;5;0mRequiredDate[39m[38;5;0;01m,[39;00m
	[38;5;0mShippedDate[39m[38;5;0;01m,[39;00m
	[38;5;0mShipVia[39m[38;5;0;01m,[39;00m
	[38;5;0mFreight[39m[38;5;0;01m,[39;00m
	[38;5;0mShipName[39m[38;5;0;01m,[39;00m
	[38;5;0mShipAddress[39m[38;5;0;01m,

In [16]:
jobId = sqlClient.submit_sql(etl_sql)
print("SQL query submitted and running in the background. jobId = " + jobId)
job_status = sqlClient.wait_for_job(jobId)
print("Job " + jobId + " terminated with status: " + job_status)
job_details = sqlClient.get_job(jobId)
if job_status == 'failed':
    print("Error: {}\nError Message: {}".format(job_details['error'], job_details['error_message']))

SQL query submitted and running in the background. jobId = 911ba35e-f979-48a8-99fa-40d7a9c7d816
Job 911ba35e-f979-48a8-99fa-40d7a9c7d816 terminated with status: completed


The following cell uses the `get_cos_summary()` method to print a summary of the objects that have been written by the previous ETL SQL statement. Note the **total_volume** value. We will reference it for comparison in the next steps.

In [17]:
resultset_location = job_details['resultset_location']
sqlClient.get_cos_summary(resultset_location)

{'url': 'cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/',
 'total_objects': 71,
 'total_volume': '401.7 KB',
 'oldest_object_timestamp': 'October 05, 2020, 19H:51M:43S',
 'newest_object_timestamp': 'October 05, 2020, 19H:51M:54S',
 'smallest_object_size': '0.0 B',
 'smallest_object': 's3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/_SUCCESS',
 'largest_object_size': '7.1 KB',
 'largest_object': 's3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=UK/ShipCity=London/part-00015-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195143_0129_m_000015_1416.c000.snappy.parquet'}

The following cell uses the `list_results()` method to print a list of the objects that have been written by the above ETL SQL statement. Note the partition columns and their values being part of the object names now. This naming convention is known as **hive style partitioning**. This type of partitioning is the basis for optimizing SQL queries using predicates that match with the partitioning columns.

In [18]:
pd.set_option('display.max_colwidth', -1)
result_objects_df = sqlClient.list_results(jobId)
print("List of objects written by ETL SQL:")
result_objects_df.head(200)

List of objects written by ETL SQL:


Unnamed: 0,Object,LastModified,Size,StorageClass,Bucket,ObjectURL
0,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816,2020-10-05 19:51:41.846000+00:00,0,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816
1,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Argentina/ShipCity=Buenos Aires/part-00065-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195145_0129_m_000065_1433.c000.snappy.parquet,2020-10-05 19:51:45.336000+00:00,6412,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Argentina/ShipCity=Buenos Aires/part-00065-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195145_0129_m_000065_1433.c000.snappy.parquet
2,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Austria/ShipCity=Graz/part-00096-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195146_0129_m_000096_1442.c000.snappy.parquet,2020-10-05 19:51:46.110000+00:00,6094,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Austria/ShipCity=Graz/part-00096-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195146_0129_m_000096_1442.c000.snappy.parquet
3,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Austria/ShipCity=Salzburg/part-00015-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195143_0129_m_000015_1416.c000.snappy.parquet,2020-10-05 19:51:43.649000+00:00,5636,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Austria/ShipCity=Salzburg/part-00015-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195143_0129_m_000015_1416.c000.snappy.parquet
4,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Belgium/ShipCity=Bruxelles/part-00180-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195147_0129_m_000180_1465.c000.snappy.parquet,2020-10-05 19:51:48.119000+00:00,5660,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Belgium/ShipCity=Bruxelles/part-00180-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195147_0129_m_000180_1465.c000.snappy.parquet
5,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Belgium/ShipCity=Charleroi/part-00006-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195142_0129_m_000006_1412.c000.snappy.parquet,2020-10-05 19:51:43.432000+00:00,5957,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Belgium/ShipCity=Charleroi/part-00006-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195142_0129_m_000006_1412.c000.snappy.parquet
6,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Campinas/part-00069-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195145_0129_m_000069_1434.c000.snappy.parquet,2020-10-05 19:51:45.583000+00:00,5715,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Campinas/part-00069-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195145_0129_m_000069_1434.c000.snappy.parquet
7,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Resende/part-00042-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195144_0129_m_000042_1424.c000.snappy.parquet,2020-10-05 19:51:44.652000+00:00,5749,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Resende/part-00042-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195144_0129_m_000042_1424.c000.snappy.parquet
8,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Rio de Janeiro/part-00198-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195148_0129_m_000198_1475.c000.snappy.parquet,2020-10-05 19:51:48.362000+00:00,6877,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Rio de Janeiro/part-00198-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195148_0129_m_000198_1475.c000.snappy.parquet
9,s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Sao Paulo/part-00102-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195146_0129_m_000102_1445.c000.snappy.parquet,2020-10-05 19:51:46.507000+00:00,6981,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816/ShipCountry=Brazil/ShipCity=Sao Paulo/part-00102-a9f55e31-62cf-4d55-8ef7-579df911cbb3-attempt_20201005195146_0129_m_000102_1445.c000.snappy.parquet


Now let's take a look at the result data with the `get_result()` method. Note that the result dataframe contains the two partitioning columns. The values for these have been put together by get_result() from the object names above because in hive style partitioning the partition column values are not stored in the objects but rather in the object names.

In [19]:
sqlClient.get_result(jobId).head(100)

Unnamed: 0,OrderID,CustomerID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,...,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipRegion,ShipPostalCode,ShipCountry,ShipCity
0,10409,OCEAN,Océano Atlántico Ltda.,Yvonne Moncada,Sales Agent,Ing. Gustavo Moncada 8585 Piso 20-A,Buenos Aires,,1010,Argentina,...,1997-02-06 06:00:00,1997-01-14 00:00:00.000,1,29.83,Océano Atlántico Ltda.,Ing. Gustavo Moncada 8585 Piso 20-A,,1010,Argentina,Buenos Aires
1,10448,RANCH,Rancho grande,Sergio Gutiérrez,Sales Representative,Av. del Libertador 900,Buenos Aires,,1010,Argentina,...,1997-03-17 06:00:00,1997-02-24 00:00:00.000,2,38.82,Rancho grande,Av. del Libertador 900,,1010,Argentina,Buenos Aires
2,10521,CACTU,Cactus Comidas para llevar,Patricio Simpson,Sales Agent,Cerrito 333,Buenos Aires,,1010,Argentina,...,1997-05-27 05:00:00,1997-05-02 00:00:00.000,2,17.22,Cactus Comidas para llevar,Cerrito 333,,1010,Argentina,Buenos Aires
3,10531,OCEAN,Océano Atlántico Ltda.,Yvonne Moncada,Sales Agent,Ing. Gustavo Moncada 8585 Piso 20-A,Buenos Aires,,1010,Argentina,...,1997-06-05 05:00:00,1997-05-19 00:00:00.000,1,8.12,Océano Atlántico Ltda.,Ing. Gustavo Moncada 8585 Piso 20-A,,1010,Argentina,Buenos Aires
4,10716,RANCH,Rancho grande,Sergio Gutiérrez,Sales Representative,Av. del Libertador 900,Buenos Aires,,1010,Argentina,...,1997-11-21 06:00:00,1997-10-27 00:00:00.000,2,22.57,Rancho grande,Av. del Libertador 900,,1010,Argentina,Buenos Aires
5,10782,CACTU,Cactus Comidas para llevar,Patricio Simpson,Sales Agent,Cerrito 333,Buenos Aires,,1010,Argentina,...,1998-01-14 06:00:00,1997-12-22 00:00:00.000,3,1.10,Cactus Comidas para llevar,Cerrito 333,,1010,Argentina,Buenos Aires
6,10819,CACTU,Cactus Comidas para llevar,Patricio Simpson,Sales Agent,Cerrito 333,Buenos Aires,,1010,Argentina,...,1998-02-04 06:00:00,1998-01-16 00:00:00.000,3,19.76,Cactus Comidas para llevar,Cerrito 333,,1010,Argentina,Buenos Aires
7,10828,RANCH,Rancho grande,Sergio Gutiérrez,Sales Representative,Av. del Libertador 900,Buenos Aires,,1010,Argentina,...,1998-01-27 06:00:00,1998-02-04 00:00:00.000,1,90.85,Rancho grande,Av. del Libertador 900,,1010,Argentina,Buenos Aires
8,10881,CACTU,Cactus Comidas para llevar,Patricio Simpson,Sales Agent,Cerrito 333,Buenos Aires,,1010,Argentina,...,1998-03-11 06:00:00,1998-02-18 00:00:00.000,1,2.84,Cactus Comidas para llevar,Cerrito 333,,1010,Argentina,Buenos Aires
9,10898,OCEAN,Océano Atlántico Ltda.,Yvonne Moncada,Sales Agent,Ing. Gustavo Moncada 8585 Piso 20-A,Buenos Aires,,1010,Argentina,...,1998-03-20 06:00:00,1998-03-06 00:00:00.000,2,1.27,Océano Atlántico Ltda.,Ing. Gustavo Moncada 8585 Piso 20-A,,1010,Argentina,Buenos Aires


The following cell runs a new **optimized SQL** query against the **partitioned data** that has been produced by the previous ETL SQL statement. The query uses `WHERE` predicates on the columns that have been used to partition the results in the ETL job. The query will physically only read the objects that match these predicate values.

In [20]:
optimized_sql='SELECT * FROM {} STORED AS PARQUET WHERE ShipCountry = "Austria" AND ShipCity="Graz" \
               INTO {} STORED AS PARQUET'.format(resultset_location, targeturl)
formatted_optimized_sql = sqlparse.format(optimized_sql, reindent=True, indent_tabs=True, keyword_case='upper')
result = highlight(formatted_optimized_sql, lexer, formatter)
print('\nRunning SQL against the previously produced hive style partitioned objects as input:\n')
print(result)

jobId = sqlClient.submit_sql(optimized_sql)
job_status = sqlClient.wait_for_job(jobId)
print("Job " + jobId + " terminated with status: " + job_status)
job_details = sqlClient.get_job(jobId)
if job_status == 'failed':
    print("Error: {}\nError Message: {}".format(job_details['error'], job_details['error_message']))


Running SQL against the previously produced hive style partitioned objects as input:

[38;5;24;01mSELECT[39;00m [38;5;166;01m*[39;00m
[38;5;24;01mFROM[39;00m [38;5;0mcos[39m[38;5;0;01m:[39;00m[38;5;166;01m/[39;00m[38;5;166;01m/[39;00m[38;5;0ms3[39m[38;5;0;01m.[39;00m[38;5;0mus[39m[38;5;166;01m-[39;00m[38;5;0msouth[39m[38;5;0;01m.[39;00m[38;5;0mcloud[39m[38;5;166;01m-[39;00m[38;5;24;01mOBJECT[39;00m[38;5;166;01m-[39;00m[38;5;24;01mstorage[39;00m[38;5;0;01m.[39;00m[38;5;0mappdomain[39m[38;5;0;01m.[39;00m[38;5;0mcloud[39m[38;5;166;01m/[39;00m[38;5;0mcos[39m[38;5;166;01m-[39;00m[38;5;0mstandard[39m[38;5;166;01m-[39;00m[38;5;20;01m6[39;00m[38;5;0mil[39m[38;5;166;01m/[39;00m[38;5;0ms3[39m[38;5;0;01m.[39;00m[38;5;0mus[39m[38;5;166;01m-[39;00m[38;5;0msouth[39m[38;5;0;01m.[39;00m[38;5;0mcloud[39m[38;5;166;01m-[39;00m[38;5;24;01mOBJECT[39;00m[38;5;166;01m-[39;00m[38;5;24;01mstorage[39;00m[38;5;0;01m.[39;00m[38;

The following cell uses the `get_job()` method in order to show the job details of the just run optimized SQL that leverages hive style partitioning. Note the **bytes_read** value that is significantly lower than the **total_volume** value of the data in the queries data set. This does increase query performance and lower the query cost.

In [21]:
sqlClient.get_job(jobId)

{'job_id': 'd17b0e3d-0894-4bc2-8c20-06ad6fda20c2',
 'status': 'completed',
 'statement': 'SELECT * FROM cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816 STORED AS PARQUET WHERE ShipCountry = "Austria" AND ShipCity="Graz"                INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud STORED AS PARQUET',
 'plan_id': 'ead0f7f5-0c96-40c0-9aae-63c4846d8188',
 'submit_time': '2020-10-05T19:52:14.095Z',
 'resultset_location': 'cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/jobid=d17b0e3d-0894-4bc2-8c20-06ad6fda20c2',
 'rows_returned': 30,
 'rows_read': 30,
 'bytes_read': 6090,
 'resultset_format': 'parquet',
 'end_time': '2020-10-05T19:52:19.667Z',
 'user_id': 'ashley.zhao@ibm.com'}

### <a id="pagination"></a> 6. Paginated SQL Results
The next cell runs a simple join SQL. But this time `submit_sql()` is provided the optional **`pagesize`** parameter with a value of **`10`**. This results in multiple objects being written with each having 10 rows of the result in it. Internally this is achieved by using the SQL Query syntax clause of `PARTITIONED EVERY <num> ROWS`. This also means that your query cannot already contain another `PARTITIONED BY` clause.

In [22]:
pagination_sql='SELECT OrderID, c.CustomerID CustomerID, CompanyName, City, Region, PostalCode \
                FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, \
                     cos://us-geo/sql/customers.parquet STORED AS PARQUET c \
                WHERE c.CustomerID = o.CustomerID \
                INTO {}paginated_orders STORED AS PARQUET'.format(targeturl)
formatted_etl_sql = sqlparse.format(etl_sql, reindent=True, indent_tabs=True, keyword_case='upper')
result = highlight(formatted_etl_sql, lexer, formatter)
print('\nExample ETL Statement is:\n')
print(result)

jobId = sqlClient.submit_sql(pagination_sql, pagesize=10)
job_status = sqlClient.wait_for_job(jobId)
print("Job " + jobId + " terminated with status: " + job_status)
job_details = sqlClient.get_job(jobId)
if job_status == 'failed':
    print("Error: {}\nError Message: {}".format(job_details['error'], job_details['error_message']))


Example ETL Statement is:

[38;5;24;01mSELECT[39;00m [38;5;0mOrderID[39m[38;5;0;01m,[39;00m
	[38;5;24;01mc[39;00m[38;5;0;01m.[39;00m[38;5;0mCustomerID[39m [38;5;0mCustomerID[39m[38;5;0;01m,[39;00m
	[38;5;0mCompanyName[39m[38;5;0;01m,[39;00m
	[38;5;0mContactName[39m[38;5;0;01m,[39;00m
	[38;5;0mContactTitle[39m[38;5;0;01m,[39;00m
	[38;5;0mAddress[39m[38;5;0;01m,[39;00m
	[38;5;0mCity[39m[38;5;0;01m,[39;00m
	[38;5;0mRegion[39m[38;5;0;01m,[39;00m
	[38;5;0mPostalCode[39m[38;5;0;01m,[39;00m
	[38;5;0mCountry[39m[38;5;0;01m,[39;00m
	[38;5;0mPhone[39m[38;5;0;01m,[39;00m
	[38;5;0mFax[39m [38;5;0mEmployeeID[39m[38;5;0;01m,[39;00m
	[38;5;0mOrderDate[39m[38;5;0;01m,[39;00m
	[38;5;0mRequiredDate[39m[38;5;0;01m,[39;00m
	[38;5;0mShippedDate[39m[38;5;0;01m,[39;00m
	[38;5;0mShipVia[39m[38;5;0;01m,[39;00m
	[38;5;0mFreight[39m[38;5;0;01m,[39;00m
	[38;5;0mShipName[39m[38;5;0;01m,[39;00m
	[38;5;0mShipAddress[39m[38;5;0;01m,

Let's check how many pages with each 10 rows have been written:

In [23]:
print("Number of pages written by job {}: {}".format(jobId, len(sqlClient.list_results(jobId))))

Number of pages written by job 986bf4b1-54e9-4097-9719-3cd4f3882ad5: 85


The following cell retrieves the first page of the result as a data frame. The desired page is specified as the optional parameter **`pagenumber`** to the `get_result()` method.

In [24]:
pagenumber=1
sqlClient.get_result(jobId, pagenumber=pagenumber).head(100)

Unnamed: 0,OrderID,CustomerID,CompanyName,City,Region,PostalCode
0,11011,ALFKI,Alfreds Futterkiste,Berlin,,12209
1,10952,ALFKI,Alfreds Futterkiste,Berlin,,12209
2,10835,ALFKI,Alfreds Futterkiste,Berlin,,12209
3,10702,ALFKI,Alfreds Futterkiste,Berlin,,12209
4,10692,ALFKI,Alfreds Futterkiste,Berlin,,12209
5,10643,ALFKI,Alfreds Futterkiste,Berlin,,12209
6,10926,ANATR,Ana Trujillo Emparedados y helados,México D.F.,,5021
7,10759,ANATR,Ana Trujillo Emparedados y helados,México D.F.,,5021
8,10625,ANATR,Ana Trujillo Emparedados y helados,México D.F.,,5021
9,10308,ANATR,Ana Trujillo Emparedados y helados,México D.F.,,5021


The following cell gets the next page. Run it multiple times in order to retrieve the subsequent pages, one page after the another.

In [25]:
pagenumber+=1
sqlClient.get_result(jobId, pagenumber).head(100)

Unnamed: 0,OrderID,CustomerID,CompanyName,City,Region,PostalCode
0,10856,ANTON,Antonio Moreno Taquería,México D.F.,,05023
1,10682,ANTON,Antonio Moreno Taquería,México D.F.,,05023
2,10677,ANTON,Antonio Moreno Taquería,México D.F.,,05023
3,10573,ANTON,Antonio Moreno Taquería,México D.F.,,05023
4,10535,ANTON,Antonio Moreno Taquería,México D.F.,,05023
5,10507,ANTON,Antonio Moreno Taquería,México D.F.,,05023
6,10365,ANTON,Antonio Moreno Taquería,México D.F.,,05023
7,11016,AROUT,Around the Horn,London,,WA1 1DP
8,10953,AROUT,Around the Horn,London,,WA1 1DP
9,10920,AROUT,Around the Horn,London,,WA1 1DP


### <a id="results"></a> 7. Working with SQL result objects
This section shows some useful methods to work with result objects of SQLs. The following cell runs a simple SQL that produces a single partitioned parquet result:

In [26]:
sql="SELECT * FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET LIMIT 100 INTO {}first100orders.parquet JOBPREFIX NONE STORED AS PARQUET".format(targeturl)
jobId = sqlClient.submit_sql(sql)
sqlClient.wait_for_job(jobId)

'completed'

The next cell lists the result objects produced by the simple SQL job using the method `list_results()`:

In [27]:
sqlClient.list_results(jobId).head(100)

Unnamed: 0,Object,LastModified,Size,StorageClass,Bucket,ObjectURL
0,s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet,2020-10-05 19:52:44.821000+00:00,0,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet
1,s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet/_SUCCESS,2020-10-05 19:52:46.458000+00:00,0,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet/_SUCCESS
2,s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet/part-00000-a78a4b33-dcb2-4cd7-b674-10d045fd6293-c000-attempt_20201005195245_0013_m_000000_19.snappy.parquet,2020-10-05 19:52:46.306000+00:00,10271,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet/part-00000-a78a4b33-dcb2-4cd7-b674-10d045fd6293-c000-attempt_20201005195245_0013_m_000000_19.snappy.parquet


As can be seen the SQL job always produces a logical hierarchy of object names. The path specified in the URI of the `INTO` clause above is used as a logical root "folder", which is nothing else than an empty object. It will normally include a suffix containing the unique jobId that produced this result. But because the queryabove used the optional clause `JOBPREFIX NONE` this suffix is ommitted, which means each time the same targer path will be used with that clause the previous list of objects in that logical folder will be overwritten.

#### Exact target result
If the result of the SQL job was not written in a partitioned way, you can now optionally restructure the objects to a single object with the exact name that was specified in the `INTO` clause of the SQL job. Thisis achieved with the method `rename_exact_result()`:

In [28]:
sqlClient.rename_exact_result(jobId)
sqlClient.list_results(jobId).head(100)

Unnamed: 0,Object,LastModified,Size,StorageClass,Bucket,ObjectURL
0,s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet,2020-10-05 19:52:51.294000+00:00,10271,STANDARD,cos-standard-6il,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet


As can be seen the result of the previous SQL is now exactly one object with the exact name from the path of the `INTO` clause.

### <a id="joblist"></a> 8. Working with your SQL Job Submission History
The following cell uses the `get_cos_summary()` method to get a statistical overview of the data in the **target location** in COS that has been used by the above queries in this notebook.

In [29]:
sqlClient.get_cos_summary(targeturl)

{'url': 'cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/',
 'total_objects': 100,
 'total_volume': '23.4 MB',
 'oldest_object_timestamp': 'November 26, 2019, 03H:30M:47S',
 'newest_object_timestamp': 'October 05, 2020, 19H:52M:18S',
 'smallest_object_size': '0.0 B',
 'smallest_object': 's3.us-south.cloud-object-storage.appdomain.cloud/jobid=034070c9-5d06-431d-afec-6be1a61809e4',
 'largest_object_size': '5.4 MB',
 'largest_object': 's3.us-south.cloud-object-storage.appdomain.cloud/jobid=241ded35-a36b-40a7-a3ee-90681ea0a5cb/part-00000-8117c764-e8a6-47b3-8da5-bc7cf0399b87-c000-attempt_20191128211109_0050_m_000000_0.json'}

The method `get_jobs()` provides you a dataframe with the **30 most recent SQL submissions** with all details. You can change the value `-1`for `display.max_colwidth` to a positive integer if you want to truncate the cell content to shrink the overall table display size.

In [30]:
pd.set_option('display.max_colwidth', -1)
job_history_df = sqlClient.get_jobs()
job_history_df.head(100)

Unnamed: 0,job_id,status,user_id,statement,resultset_location,submit_time,end_time,rows_read,rows_returned,bytes_read,objects_skipped,objects_qualified,error,error_message
0,62776875-91f0-43fc-a561-7a20c3c4b5c8,completed,ashley.zhao@ibm.com,SELECT * FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET LIMIT 100 INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet JOBPREFIX NONE STORED AS PARQUET,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet,2020-10-05T19:52:41.287Z,2020-10-05T19:52:48.154Z,830.0,100,30606.0,,,,
1,986bf4b1-54e9-4097-9719-3cd4f3882ad5,completed,ashley.zhao@ibm.com,"SELECT OrderID, c.CustomerID CustomerID, CompanyName, City, Region, PostalCode FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, cos://us-geo/sql/customers.parquet STORED AS PARQUET c WHERE c.CustomerID = o.CustomerID INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudpaginated_orders STORED AS PARQUET PARTITIONED EVERY 10 ROWS",cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudpaginated_orders/jobid=986bf4b1-54e9-4097-9719-3cd4f3882ad5,2020-10-05T19:52:21.619Z,2020-10-05T19:52:38.156Z,921.0,830,17111.0,,,,
2,d17b0e3d-0894-4bc2-8c20-06ad6fda20c2,completed,ashley.zhao@ibm.com,"SELECT * FROM cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816 STORED AS PARQUET WHERE ShipCountry = ""Austria"" AND ShipCity=""Graz"" INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud STORED AS PARQUET",cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/jobid=d17b0e3d-0894-4bc2-8c20-06ad6fda20c2,2020-10-05T19:52:14.095Z,2020-10-05T19:52:19.667Z,30.0,30,6090.0,,,,
3,911ba35e-f979-48a8-99fa-40d7a9c7d816,completed,ashley.zhao@ibm.com,"SELECT OrderID, c.CustomerID CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax EmployeeID, OrderDate, RequiredDate, ShippedDate, ShipVia, Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, cos://us-geo/sql/customers.parquet STORED AS PARQUET c WHERE c.CustomerID = o.CustomerID INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders STORED AS PARQUET PARTITIONED BY (ShipCountry, ShipCity)",cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudcustomer_orders/jobid=911ba35e-f979-48a8-99fa-40d7a9c7d816,2020-10-05T19:51:38.318Z,2020-10-05T19:52:04.791Z,921.0,830,43058.0,,,,
4,16503ee4-a074-4b15-b558-b11513ceabd9,completed,ashley.zhao@ibm.com,"SELECT o.OrderID, c.CompanyName, e.FirstName, e.LastName FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, cos://us-geo/sql/employees.parquet STORED AS PARQUET e, cos://us-geo/sql/customers.parquet STORED AS PARQUET c WHERE e.EmployeeID = o.EmployeeID AND c.CustomerID = o.CustomerID AND o.ShippedDate > o.RequiredDate AND o.OrderDate > ""1998-01-01"" ORDER BY c.CompanyName INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud STORED AS CSV",cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/jobid=16503ee4-a074-4b15-b558-b11513ceabd9,2020-10-05T19:51:25.248Z,2020-10-05T19:51:34.918Z,1760.0,29,41499.0,,,,
5,26b82547-ddaa-4853-80ff-d34f053f8fcf,completed,ashley.zhao@ibm.com,"SELECT o.OrderID, c.CompanyName, e.FirstName, e.LastName FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET o, cos://us-geo/sql/employees.parquet STORED AS PARQUET e, cos://us-geo/sql/customers.parquet STORED AS PARQUET c WHERE e.EmployeeID = o.EmployeeID AND c.CustomerID = o.CustomerID AND o.ShippedDate > o.RequiredDate AND o.OrderDate > ""1998-01-01"" ORDER BY c.CompanyName INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud STORED AS CSV",cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/jobid=26b82547-ddaa-4853-80ff-d34f053f8fcf,2020-10-05T19:51:08.221Z,2020-10-05T19:51:22.582Z,1760.0,29,41499.0,,,,
6,3a9e4a81-99f5-4e4b-94b5-b5a4ba2e6cc0,completed,ashley.zhao@ibm.com,"select 'Before Disruption' as period\n ,s.supplierID, s.companyName as supplier_company_name, s.region as supplier_region\n ,p.productID, p.productName\n\t ,o.orderID\n\t ,c.customerID, c.companyName as customer_company_name, c.region as customer_region, c.country as customer_country\n from cos://us-geo/sql/suppliers.parquet stored as parquet s, \n cos://us-geo/sql/products.parquet stored as parquet p,\n cos://us-geo/sql/order_details.parquet stored as parquet od,\n cos://us-geo/sql/orders.parquet stored as parquet o,\n cos://us-geo/sql/customers.parquet stored as parquet c\n where s.supplierID = p.supplierID \n\t and p.productID = od.productID\n and od.orderID = o.orderID \n and o.customerID = c.customerID\n and s.country='USA'\n\t \n\t UNION\n\t \n\t select 'After Disruption' as period\n\t ,s.supplierID, s.companyName as supplier_company_name, s.region as supplier_region\n ,p.productID, p.productName\n\t ,o.orderID\n\t ,c.customerID, c.companyName as customer_company_name, c.region as customer_region, c.country as customer_country\n from cos://us-geo/sql/suppliers.parquet stored as parquet s, \n cos://us-geo/sql/products.parquet stored as parquet p,\n cos://us-geo/sql/order_details.parquet stored as parquet od,\n cos://us-geo/sql/orders.parquet stored as parquet o,\n cos://us-geo/sql/customers.parquet stored as parquet c\n where s.supplierID = p.supplierID \n\t and p.productID = od.productID\n and od.orderID = o.orderID \n and o.customerID = c.customerID\n and s.country='USA' and s.region <> 'LA' INTO cos://us-geo/notebooks-donotdelete-pr-thri4xhqi5ofdi/ STORED AS CSV",cos://s3.us.cloud-object-storage.appdomain.cloud/notebooks-donotdelete-pr-thri4xhqi5ofdi/jobid=3a9e4a81-99f5-4e4b-94b5-b5a4ba2e6cc0,2020-10-05T19:16:02.984Z,2020-10-05T19:16:23.261Z,3288.0,484,45422.0,,,,
7,71f8291a-04f4-4392-abd2-b969cfd1d0c6,completed,ashley.zhao@ibm.com,"select 'Before Disruption' as period\n ,s.supplierID, s.companyName as supplier_company_name, s.region as supplier_region\n ,p.productID, p.productName\n\t ,o.orderID\n\t ,c.customerID, c.companyName as customer_company_name, c.region as customer_region, c.country as customer_country\n from cos://us-geo/sql/suppliers.parquet stored as parquet s, \n cos://us-geo/sql/products.parquet stored as parquet p,\n cos://us-geo/sql/order_details.parquet stored as parquet od,\n cos://us-geo/sql/orders.parquet stored as parquet o,\n cos://us-geo/sql/customers.parquet stored as parquet c\n where s.supplierID = p.supplierID \n\t and p.productID = od.productID\n and od.orderID = o.orderID \n and o.customerID = c.customerID\n and s.country='USA'\n\t \n\t UNION\n\t \n\t select 'After Disruption' as period\n\t ,s.supplierID, s.companyName as supplier_company_name, s.region as supplier_region\n ,p.productID, p.productName\n\t ,o.orderID\n\t ,c.customerID, c.companyName as customer_company_name, c.region as customer_region, c.country as customer_country\n from cos://us-geo/sql/suppliers.parquet stored as parquet s, \n cos://us-geo/sql/products.parquet stored as parquet p,\n cos://us-geo/sql/order_details.parquet stored as parquet od,\n cos://us-geo/sql/orders.parquet stored as parquet o,\n cos://us-geo/sql/customers.parquet stored as parquet c\n where s.supplierID = p.supplierID \n\t and p.productID = od.productID\n and od.orderID = o.orderID \n and o.customerID = c.customerID\n and s.country='USA' and s.region <> 'LA' INTO cos://us-geo/notebooks-donotdelete-pr-thri4xhqi5ofdi/ STORED AS CSV",cos://s3.us.cloud-object-storage.appdomain.cloud/notebooks-donotdelete-pr-thri4xhqi5ofdi/jobid=71f8291a-04f4-4392-abd2-b969cfd1d0c6,2020-07-15T18:05:51.874Z,2020-07-15T18:06:17.962Z,3288.0,484,45422.0,,,,
8,4b399f41-eab5-42d5-ace2-c8ee499042e9,completed,ashley.zhao@ibm.com,SELECT * FROM cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudmy_job_history/ STORED AS PARQUET INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud STORED AS PARQUET,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloud/jobid=4b399f41-eab5-42d5-ace2-c8ee499042e9,2020-02-18T21:56:36.438Z,2020-02-18T21:56:57.719Z,37.0,37,34256.0,,,,
9,83dcc961-b070-4de7-b127-b45ed99b579d,completed,ashley.zhao@ibm.com,SELECT * FROM cos://us-geo/sql/orders.parquet STORED AS PARQUET LIMIT 100 INTO cos://us-south/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet JOBPREFIX NONE STORED AS PARQUET,cos://s3.us-south.cloud-object-storage.appdomain.cloud/cos-standard-6il/s3.us-south.cloud-object-storage.appdomain.cloudfirst100orders.parquet,2020-02-18T21:56:01.024Z,2020-02-18T21:56:20.551Z,830.0,100,30606.0,,,,


In [31]:
sqlClient.export_job_history(targeturl + "my_job_history/")

Exported 9 new jobs


In [32]:
pd.set_option('display.max_colwidth', 20)
sql = "SELECT * FROM {}my_job_history/ STORED AS PARQUET INTO {} STORED AS PARQUET".format(targeturl, targeturl)
sqlClient.run_sql(sql)

Unnamed: 0,job_id,status,user_id,statement,resultset_location,submit_time,end_time,rows_read,rows_returned,bytes_read,error,error_message,__index_level_0__
0,62776875-91f0-43...,completed,ashley.zhao@ibm.com,SELECT * FROM co...,cos://s3.us-sout...,2020-10-05T19:52...,2020-10-05T19:52...,830.0,100.0,30606.0,,,
1,986bf4b1-54e9-40...,completed,ashley.zhao@ibm.com,"SELECT OrderID, ...",cos://s3.us-sout...,2020-10-05T19:52...,2020-10-05T19:52...,921.0,830.0,17111.0,,,
2,d17b0e3d-0894-4b...,completed,ashley.zhao@ibm.com,SELECT * FROM co...,cos://s3.us-sout...,2020-10-05T19:52...,2020-10-05T19:52...,30.0,30.0,6090.0,,,
3,911ba35e-f979-48...,completed,ashley.zhao@ibm.com,"SELECT OrderID, ...",cos://s3.us-sout...,2020-10-05T19:51...,2020-10-05T19:52...,921.0,830.0,43058.0,,,
4,16503ee4-a074-4b...,completed,ashley.zhao@ibm.com,SELECT o.OrderID...,cos://s3.us-sout...,2020-10-05T19:51...,2020-10-05T19:51...,1760.0,29.0,41499.0,,,
5,26b82547-ddaa-48...,completed,ashley.zhao@ibm.com,SELECT o.OrderID...,cos://s3.us-sout...,2020-10-05T19:51...,2020-10-05T19:51...,1760.0,29.0,41499.0,,,
6,3a9e4a81-99f5-4e...,completed,ashley.zhao@ibm.com,select 'Before D...,cos://s3.us.clou...,2020-10-05T19:16...,2020-10-05T19:16...,3288.0,484.0,45422.0,,,
7,71f8291a-04f4-43...,completed,ashley.zhao@ibm.com,select 'Before D...,cos://s3.us.clou...,2020-07-15T18:05...,2020-07-15T18:06...,3288.0,484.0,45422.0,,,
8,4b399f41-eab5-42...,completed,ashley.zhao@ibm.com,SELECT * FROM co...,cos://s3.us-sout...,2020-02-18T21:56...,2020-02-18T21:56...,37.0,37.0,34256.0,,,
9,4b3ac0e4-4f1c-4f...,completed,ashley.zhao@ibm.com,SELECT * FROM co...,cos://s3.us-sout...,2020-02-18T17:05...,2020-02-18T17:05...,830.0,100.0,30606.0,,,0.0


### <a id="next"></a> 9. Next steps
In this notebook you learned how you can use the `ibmcloudsql` library in a Python notebook to submit SQL queries on data in IBM Cloud Object Storage and how you can interact with the query results. If you want to automate such an SQL query execution as part of your cloud solution, you can use the <a href="https://console.bluemix.net/openwhisk/" target="_blank">IBM Cloud Functions</a> framework. There is a dedicated SQL function available that lets you set up a cloud function to run SQL statements with IBM Cloud SQL Query. You can find the documentation for doing this <a href="https://hub.docker.com/r/ibmfunctions/sqlquery/" target="_blank" rel="noopener noreferrer">here</a>.

### <a id="authors"></a>Authors

**Torsten Steinbach**, Torsten is the lead architect for IBM Cloud Data Lake. Previously he has worked as IBM architect for a series of data management products and services, including DB2, PureData for Analytics and Db2 on Cloud.

<hr>
Copyright &copy; IBM Corp. 2020. This notebook and its source code are released under the terms of the MIT License.

<div style="background:#F5F7FA; height:110px; padding: 2em; font-size:14px;">
<span style="font-size:18px;color:#152935;">Love this notebook? </span>
<span style="font-size:15px;color:#152935;float:right;margin-right:40px;">Don't have an account yet?</span><br>
<span style="color:#5A6872;">Share it with your colleagues and help them discover the power of Watson Studio!</span>
<span style="border: 1px solid #3d70b2;padding:8px;float:right;margin-right:40px; color:#3d70b2;"><a href="https://ibm.co/wsnotebooks" target="_blank" style="color: #3d70b2;text-decoration: none;">Sign Up</a></span><br>
</div>