<img align="left" src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Assets&ArchHeader.jpeg?raw=true">

# Introduction
Welcome to the IBM Cloud Pak for Data Multi-Cloud Virtualization Caching Hands-on Lab. 

In this lab you analyze the performance of Data Virtualization using a variety of caching techniques.

This hands-on lab uses live databases, where data is “virtually” available through the IBM Cloud Pak for Data Virtualization Service. It is an advanced Data Virtualization topic and uses the administrator userid, which is required to manage cached data. 

### Where to find this sample online
You can find a copy of this notebook on GitHub at https://github.com/Db2-DTE-POC/CPDDVHOL4.

## The Data and the Business Problem
This hands-on lab analyzes detailed flight time records for all domestic flights in the United States from 2009 to 2015. The dataset contains almost 43 million rows. Additional dimension tables support this large fact table. This lab supplements the ONTIME fact table with the AIRCRAFT dimension table, which includes details on each aircraft that has entries in the ONTIME table. 

From the almost 43 million flights, the lab analyzes flights between New Jersey and California for the Boeing 737. 

## In this lab you learn how to:

* Navigate to the Cloud Pak for Data Data Virtualization Cache Management user interface
* Use the Cloud Pak for Data RESTful API to control the existing ONTIME database caches
* Explore views created to capture and "reduce" the working dataset of ONTIME flight records
* Understand how caches were created based on those views
* Compare performance with and without caches
* Compare performance using complete and targeted caches
* Use the RESTful API to explore and refresh caches

## Getting Started

## Using Jupyter notebooks
You are now officially using a Jupyter notebook! If this is your first time using a Jupyter notebook you might want to go through the Db2 Data Management Console Hands on Lab at www.ibm.biz/DMCDemosPOT. It includes an introduction to using Jupyter notebooks with the Db2 family. The introduction shows you some of the basics of using a notebook, including how to create the cells, run code, and save files for future use. 

Jupyter notebooks are based on IPython. Work on a notebook for IPython started in the 2006/7 timeframe: the existing Python interpreter was limited in functionality, and the goal was a richer development environment. By 2011 these efforts resulted in the IPython Notebook being released (http://blog.fperez.org/2012/01/ipython-notebook-historical.html).

Jupyter notebooks were a spinoff (2014) from the original IPython project. IPython continues to be the kernel that Jupyter runs on, but the notebooks are now a project on their own.

Jupyter notebooks run in a browser and communicate with the backend IPython server, which renders this content. These notebooks are used extensively by data scientists and anyone wanting to document, plot, and execute their code in an interactive environment. The beauty of Jupyter notebooks is that you document what you do as you go along.

## Connecting to IBM Cloud Pak for Data
For this lab you use the admin userid.
* **Engineer:**
    * ID: admin
    * PASSWORD: CP4DDataFabric

If you have this notebook open, you should have already signed in using the admin userid. 
1. To check your userid, click the icon at the very top right of the webpage. It will look something like this:

    <img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/11.06.10 EngineerUserIcon.png?raw=true">

2. Click **Profile and settings**
3. Click **Permissions** and review the user permissions for this user


### Exploring the ONTIME dataset
Let's start by looking at the ONTIME dataset that has already been virtualized. 

You should now have this Hands-on Lab notebook on the left side of your screen and the Cloud Pak for Data Console on the right side of your screen. In the Cloud Pak for Data Console:

1. Click the three bar (hamburger) menu at the top left of the console
2. Click the **Data** menu item if it is not already expanded
3. Right click **Data Virtualization** and select **Open in New Window**
4. Arrange your windows so that the notebook is on one side of your screen and the Cloud Pak for Data Virtualization Console is on the other side. This makes it easier to follow the instructions without having to jump back and forth between the notebook and the console.
    <img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/SidebySide.png?raw=true">
5. If you don't see the page above, click the **Data Virtualization** menu in the Cloud Pak for Data Console and select **Data Sources**.
6. Click **Constellation View**. A spider diagram of the connected data sources opens. 
    <img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/ConstellationView.png?raw=true">

    This displays the Data Source Graph with numerous active data sources. The ONTIME dataset is sourced from:
    * 5 Db2 Databases hosted on premises and accessed through a remote connector. Each database contains one year of ONTIME flight data. 
    * Db2 Warehouse Database on Cloud Pak for Data
    * Netezza Performance Server on the Public Cloud
    * EDB Postgres Database on Premises
    * Virtualized CSV files on Premises

The five ONTIME Db2 databases have been combined ("folded") into a single virtual table called ONTIME.ONTIME1115. It includes flight records for the years 2011 to 2015, one year from each database. The Db2 Warehouse on Cloud Pak for Data includes all the years of flight records from 2009 to 2015. However, the view ONTIME.ONTIME0910 retrieves only the years 2009 and 2010 from the single warehouse database. The ONTIME.ONTIME view combines the data from the ONTIME.ONTIME1115 and ONTIME.ONTIME0910 views. 
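
Conceptually, combining several same-shaped tables into one logical table resembles a view built with `UNION ALL`. The following is a minimal sketch of that idea using Python's built-in `sqlite3` module rather than Data Virtualization itself. The table names, columns, and rows are invented for illustration, and the lab's actual view definitions may differ.

```python
import sqlite3

# In-memory database standing in for separate yearly sources (illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-year tables with a tiny subset of columns.
for year in (2011, 2012):
    cur.execute(
        f"CREATE TABLE ontime_{year} (year INTEGER, tailnum TEXT, depdelay INTEGER)"
    )
cur.execute("INSERT INTO ontime_2011 VALUES (2011, 'N101', 15)")
cur.execute("INSERT INTO ontime_2011 VALUES (2011, 'N102', 240)")
cur.execute("INSERT INTO ontime_2012 VALUES (2012, 'N101', 5)")

# One view presents the yearly tables as a single logical table.
cur.execute("""
    CREATE VIEW ontime_1112 AS
    SELECT * FROM ontime_2011
    UNION ALL
    SELECT * FROM ontime_2012
""")

# Queries against the view see rows from every underlying table.
rows_per_year = cur.execute(
    "SELECT year, COUNT(*) FROM ontime_1112 GROUP BY year ORDER BY year"
).fetchall()
print(rows_per_year)
```

A query against the view never needs to know which underlying table holds a given year, which is the property the ONTIME.ONTIME view provides in this lab.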

# Exploring the Data Virtualization User Interface

<img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/CacheManagement.png?raw=true">
<img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/CacheMenu.png?raw=true">
<img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/CacheDetail.png?raw=true">
<img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/Queries.png?raw=true">
<img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/FullQuery.png?raw=true">
<img src="https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/media/Caching%20HOL/QueryDetail.png?raw=true">


# Running Scripted Performance Benchmarks with and Without Caching

The IBM Cloud Pak for Data Console is only one way you can interact with the Virtualization service. IBM Cloud Pak for Data is built on a set of microservices that communicate with each other and the Console user interface using RESTful APIs. You can use these services to automate anything you can do through the user interface.

This Jupyter Notebook contains examples of how to use the Open APIs to retrieve information from the virtualization service, how to run SQL statements directly against the service through REST, and how to provide authorization to objects. This provides a way to write your own script to automate the setup and configuration of the virtualization service. 

### Load the REST API Class
The next part of the lab relies on a set of base classes to help you interact with the RESTful Services API for IBM Cloud Pak for Data Virtualization. You can access this library on GitHub. The commands below download the library and run it as part of this notebook.
<pre>
&#37;run CPDDVRestClassV402.ipynb
</pre>
The cell below loads the RESTful Service Classes and methods directly from GitHub. Note that it will take a few seconds for the extension to load, so you should generally wait until the "Db2 Extensions Loaded" message is displayed in your notebook. You can click on the following link to browse the RESTful Services class file: https://github.com/Db2-DTE-POC/CPDDVHOL4/blob/main/RESTfulEndpointServiceClass402.ipynb. You are free to download and reuse this sample for your own applications.

1. Click the cell below
2. Click **Run**

In [11]:
!wget -O CPDDVRestClassV402.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVHOL4/main/CPDDVRestAPIClass402.ipynb
%run CPDDVRestClassV402.ipynb

--2021-12-06 22:00:56--  https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVHOL4/main/CPDDVRestAPIClass402.ipynb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15024 (15K) [text/plain]
Saving to: ‘CPDDVRestClassV402.ipynb’


2021-12-06 22:00:57 (56.2 MB/s) - ‘CPDDVRestClassV402.ipynb’ saved [15024/15024]



### Connect to the DV RESTful Service
To connect to the Data Virtualization service you need to provide the URL, the service name (v1), and the console user name and password.

In [16]:
# Set the service URL to connect from inside the ICPD Cluster
Console  = 'https://cpd-cpd-instance.apps.demo.ibmdte.net:31192'

# Credentials for the Cloud Pak for Data console user
user     = 'admin'
password = 'CP4DDataFabric'

# Set up the required connection
databaseAPI = DVRESTAPI(Console)
api = '/v1'
databaseAPI.authenticate(api, user, password)
database = Console

Token Retrieved
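
The "Token Retrieved" message suggests that `authenticate` exchanges the user name and password for a token that later calls reuse. What the `DVRESTAPI` class does internally is not shown here, so the following is only a sketch of a common token pattern. The `/authorize` path, the JSON field names, and the host are hypothetical placeholders, and the request is built but never sent.

```python
import json
import urllib.request


def build_token_request(base_url, api, user, password):
    """Build (but do not send) a JSON POST that asks for a token.

    The '/authorize' path and the body field names are hypothetical
    placeholders, not a documented endpoint.
    """
    body = json.dumps({"username": user, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}{api}/authorize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(token):
    """Headers that later calls would typically carry once a token is held."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


# Placeholder host and credentials; nothing is sent over the network.
request = build_token_request("https://example.invalid:31192", "/v1", "admin", "secret")
print(request.get_method(), request.full_url)
print(bearer_headers("abc123"))
```

Keeping the token in one place and attaching it through a helper such as `bearer_headers` means every later call authenticates the same way.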


### Virtualized Tables and Views
This next call retrieves all the virtualized tables and views available to the userid that you use to connect to the service. In this example the whole call is included in the DVRESTAPI class library and returned as a complete Dataframe ready for display or to be used for analysis or administration.

In [17]:
### Display Virtualized Tables and Views 
display(databaseAPI.getVirtualizedTablesDF())
display(databaseAPI.getVirtualizedViewsDF())

Unnamed: 0,create_time,data_source_table_name,grantable,owner,stats_time,table_name,table_schema
0,1635947542985,CUSTOMER,Y,ADMIN,,CUSTOMER,DB2ONPREMISES
1,1635961450415,ACCOUNTS,Y,ADMIN,,ACCOUNTS,DB2OLTPONCPD
2,1635948060514,stock_symbols,Y,ADMIN,,STOCK_SYMBOLS,MYSQL
3,1635947935938,customer,Y,ADMIN,,CUSTOMER,MYSQL
4,1635947665180,STOCK_TRANSACTIONS,Y,ADMIN,,STOCK_TRANSACTIONS,DB2ONPREMISES
...,...,...,...,...,...,...,...
80,1636385134011,DEMOGRAPHICS,Y,ADMIN,,Demographics,DB2CHURN
81,1636496026378,ONTIME,Y,ADMIN,,ONTIME,DB2WONCPD
82,1636496026471,AIRLINE_ID,Y,ADMIN,,AIRLINE_ID,DB2WONCPD
83,1636496027236,CANCELLATION,Y,ADMIN,,CANCELLATION,DB2WONCPD


Unnamed: 0,create_time,grantable,owner,viewname,viewschema
0,1637594958041,Y,ADMIN,CUSTOMER,MONGO
1,1636562977952,Y,ADMIN,AIRCRAFT,ONTIME
2,1636563269752,Y,ADMIN,BOEING737,ONTIME
3,1636497353224,Y,ADMIN,ONTIME,ONTIME
4,1636497308454,Y,ADMIN,ONTIME0910,ONTIME
5,1636989196845,Y,ADMIN,ONTIME737NJCA,ONTIME
6,1636379241450,Y,ADMIN,DB2LOOK_INFO_V,SYSTOOLS
7,1637961605059,Y,ADMIN,ACCOUNTS,TRADING
8,1637961604312,Y,ADMIN,CUSTOMER,TRADING
9,1637961604464,Y,ADMIN,PORTFOLIO,TRADING
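
The `create_time` values in the output above look like milliseconds since the Unix epoch; that interpretation is an assumption based on their magnitude, not something the output states. Under that assumption, a value can be made readable with the standard library:

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# create_time of DB2ONPREMISES.CUSTOMER, copied from the table listing above.
created = epoch_ms_to_utc(1635947542985)
print(created.isoformat())
```

The result falls in early November 2021, which is consistent with the cache timestamps shown later in this notebook.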


### Get a list of the Data Virtualization Caches
A single call to the DVRESTAPI class returns all the information available about all the caches. 

In [18]:
display(databaseAPI.getCaches())

Unnamed: 0,name,id,query,owner_id,type,created_timestamp,last_modified_timestamp,last_refresh_timestamp,last_used_timestamp,state,size,cardinality,time_taken_for_refresh,refresh_count,hit_count,refresh_schedule,refresh_schedule_desc,status_msg
0,ONTIME0915,DV202111161222469697,SELECT * FROM ONTIME.ONTIME,ADMIN,U,2021-11-16 12:22:46.969151,2021-11-16 15:54:05.80047,2021-11-16 12:59:22.006647,2021-11-16 15:50:31.796,Disabled,1309298,30238851,0,0,3,,,
1,BOEING737,DV20211110165634403313,SELECT * FROM ONTIME.BOEING737,ADMIN,U,2021-11-10 16:56:34.402815,2021-12-03 16:57:55.977122,2021-11-10 18:30:21.411623,2021-12-03 16:58:05.092,Enabled,56967,4172450,0,0,16,,,The current transaction has been rolled back b...
2,ONTIME737NJCA,DV20211115151427714635,SELECT * FROM ONTIME.ONTIME737NJCA,ADMIN,U,2021-11-15 15:14:27.714001,2021-12-03 16:57:55.969392,2021-11-15 15:33:30.254003,2021-12-03 16:58:10.365,Enabled,554,37621,0,0,24,,,The current transaction has been rolled back b...
6,ONTIME0910,DV20211110144113849337,SELECT * FROM ONTIME.ONTIME0910,ADMIN,U,2021-11-10 14:41:13.848923,2021-12-03 16:57:04.432383,2021-11-10 16:32:41.327378,2021-12-03 16:57:54.857,Enabled,591398,12752436,0,0,26,,,The current transaction has been rolled back b...
7,AIRCRAFT,DV20211115221309388288,SELECT * FROM ONTIME.AIRCRAFT,ADMIN,U,2021-11-15 22:13:09.387651,2021-12-03 16:57:04.426489,2021-12-03 16:36:36.670765,2021-12-03 16:58:05.092,Enabled,106,13101,17913,1,44,,,
10,ONTIME1115,DV20211110144049703423,SELECT * FROM ONTIME.ONTIME1115,ADMIN,U,2021-11-10 14:40:49.702724,2021-12-03 16:57:04.807222,2021-11-10 15:01:13.967413,2021-12-03 16:57:54.857,Enabled,1399806,30190408,0,0,46,,,The current transaction has been rolled back b...


The **getCaches** routine can show just the enabled or disabled caches using a single parameter. You can also easily choose just the columns you want to see by defining the columns using the dataframe returned by the call.

In [19]:
df = databaseAPI.getCaches('Enabled')
display(df[['id', 'name','state','size', 'cardinality','last_refresh_timestamp']])

Unnamed: 0,id,name,state,size,cardinality,last_refresh_timestamp
1,DV20211110165634403313,BOEING737,Enabled,56967,4172450,2021-11-10 18:30:21.411623
2,DV20211115151427714635,ONTIME737NJCA,Enabled,554,37621,2021-11-15 15:33:30.254003
6,DV20211110144113849337,ONTIME0910,Enabled,591398,12752436,2021-11-10 16:32:41.327378
7,DV20211115221309388288,AIRCRAFT,Enabled,106,13101,2021-12-03 16:36:36.670765
10,DV20211110144049703423,ONTIME1115,Enabled,1399806,30190408,2021-11-10 15:01:13.967413


In [20]:
df = databaseAPI.getCaches('Disabled')
display(df[['id', 'name','state','size', 'cardinality','last_refresh_timestamp']])

Unnamed: 0,id,name,state,size,cardinality,last_refresh_timestamp
0,DV202111161222469697,ONTIME0915,Disabled,1309298,30238851,2021-11-16 12:59:22.006647


In [None]:
# Enable ONTIME0910
databaseAPI.enableCache('DV20211110144113849337')

In [None]:
# Enable ONTIME1115
databaseAPI.enableCache('DV20211110144049703423')

In [None]:
# Enable AIRCRAFT
databaseAPI.enableCache('DV20211115221309388288')

In [None]:
# Enable BOEING737
databaseAPI.enableCache('DV20211110165634403313')

In [None]:
# Enable ONTIME737NJCA
databaseAPI.enableCache('DV20211115151427714635')
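
The cells above identify each cache by an opaque id, so a comment and an id can easily drift apart. One way to reduce that risk is to look ids up by name. The sketch below uses name-to-id pairs copied from the `getCaches()` output earlier in this notebook; the helper `apply_to_caches` is hypothetical and not part of the `DVRESTAPI` class.

```python
# Name-to-id pairs copied from the getCaches() output shown earlier.
CACHE_IDS = {
    "ONTIME0910": "DV20211110144113849337",
    "ONTIME1115": "DV20211110144049703423",
    "AIRCRAFT": "DV20211115221309388288",
    "BOEING737": "DV20211110165634403313",
    "ONTIME737NJCA": "DV20211115151427714635",
}


def apply_to_caches(action, names, cache_ids=CACHE_IDS):
    """Call action(cache_id) for each named cache and return the ids used.

    `action` is any callable that takes a cache id, for example
    databaseAPI.enableCache or databaseAPI.disableCache from this notebook.
    """
    used = []
    for name in names:
        cache_id = cache_ids[name]  # KeyError flags a mistyped cache name early
        action(cache_id)
        used.append(cache_id)
    return used


# Stand-in action so the sketch runs without a live service.
calls = []
used_ids = apply_to_caches(calls.append, ["ONTIME0910", "ONTIME1115", "AIRCRAFT"])
print(used_ids)
```

In the lab itself, a call such as `apply_to_caches(databaseAPI.enableCache, ["BOEING737", "ONTIME737NJCA"])` would replace two separate cells, assuming the ids have not changed since the listing was produced.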

In [None]:
!wget -O db2.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/Db2-Openshift-11.5.4/master/db2.ipynb
%run db2.ipynb
print('db2.ipynb loaded')

In [None]:
# Connect to the Data Virtualization Database
database = 'bigsql'
user = 'admin'
password = 'CP4DDataFabric'
host = 'cpd-cpd-instance.apps.demo.ibmdte.net'
port = '31193'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

In [None]:
ontime0910count = %sql SELECT COUNT(*) FROM ONTIME.ONTIME0910;

In [None]:
ontime1115count = %sql SELECT COUNT(*) FROM ONTIME.ONTIME1115;

In [None]:
ontime737count = %sql SELECT COUNT(*) FROM ONTIME.BOEING737;

In [None]:
ontimenjca737count = %sql SELECT COUNT(*) FROM ONTIME.ONTIME737NJCA;

In [None]:
aircraftcount = %sql SELECT COUNT(*) FROM ONTIME.AIRCRAFT;

In [None]:
ontime0910 = ontime0910count['1'][0]
ontime1115 = ontime1115count['1'][0]
ontimefull = ontime0910 + ontime1115
ontime737 = ontime737count['1'][0]
ontimenjca737 = ontimenjca737count['1'][0]
aircraft = aircraftcount['1'][0]

In [None]:
print('Aircraft:' + str(aircraft))
print('Number of Domestic Flights in the US')
print('Years 2009 to 2010: ' + str(ontime0910))
print('Years 2011 to 2015: ' + str(ontime1115))
print('Years 2009 to 2015: ' + str(ontimefull))
print('Only 737 from 2009 to 2015: ' + str(ontime737))
print('Only 737 from NJ to CA: ' + str(ontimenjca737))

In [None]:
# Disable ONTIME0910
databaseAPI.disableCache('DV20211110144113849337')

In [None]:
# Disable ONTIME1115
databaseAPI.disableCache('DV20211110144049703423')

In [None]:
# Disable BOEING737
databaseAPI.disableCache('DV20211110165634403313')

In [None]:
# Disable ONTIME737NJCA
databaseAPI.disableCache('DV20211115151427714635')

In [None]:
# Disable AIRCRAFT
databaseAPI.disableCache('DV20211115221309388288')

In [None]:
df = databaseAPI.getCaches('Enabled')
display(df[['id', 'name','state','size', 'cardinality','last_refresh_timestamp']])

In [None]:
%%sql 
SELECT * FROM "ONTIME"."ONTIME" OT, "ONTIME"."AIRCRAFT" AC 
  WHERE AC."TAIL_NUMBER" = OT.TAILNUM
  AND ORIGINSTATE = 'NJ'
  AND DESTSTATE = 'CA'
  AND MANUFACTURER = 'Boeing' 
  AND AC.MODEL LIKE 'B737%'
  AND OT.TAXIOUT > 30
  AND OT.DISTANCE > 2000
  AND OT.DEPDELAY > 200
  ORDER BY OT.ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
%%capture result
%%time
%%sql 
SELECT * FROM "ONTIME"."ONTIME" OT, "ONTIME"."AIRCRAFT" AC 
  WHERE AC."TAIL_NUMBER" = OT.TAILNUM
  AND ORIGINSTATE = 'NJ'
  AND DESTSTATE = 'CA'
  AND MANUFACTURER = 'Boeing' 
  AND AC.MODEL LIKE 'B737%'
  AND OT.TAXIOUT > 30
  AND OT.DISTANCE > 2000
  AND OT.DEPDELAY > 200
  ORDER BY OT.ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
print(result)
sqldvnocachetimer = Timer()
sqldvnocachetimer.timeTotal()
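
The cells above combine the `%%capture` and `%%time` magics with a `Timer` helper whose definition is not visible in this notebook. As a plain-Python alternative, elapsed wall-clock time can be recorded with `time.perf_counter`. This is a minimal sketch: the `stopwatch` helper and the placeholder workload are illustrative and are not what the lab's `Timer` class does.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(results, label):
    """Record elapsed wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed runs are still timed.
        results[label] = time.perf_counter() - start


timings = {}
with stopwatch(timings, "no cache"):
    # Placeholder workload; in the lab this would be the query call.
    total = sum(range(100_000))

print(f"no cache: {timings['no cache']:.4f} s")
```

Collecting every run in one dictionary keeps the later comparison and plotting steps independent of how each run was timed.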

In [None]:
# Enable ONTIME0910
databaseAPI.enableCache('DV20211110144113849337')

In [None]:
# Enable ONTIME1115
databaseAPI.enableCache('DV20211110144049703423')

In [None]:
# Enable AIRCRAFT
databaseAPI.enableCache('DV20211115221309388288')

In [None]:
df = databaseAPI.getCaches('Enabled')
display(df[['id', 'name','state','size', 'cardinality','last_refresh_timestamp']])

In [None]:
%%capture result
%%time
%%sql 
SELECT * FROM "ONTIME"."ONTIME" OT, "ONTIME"."AIRCRAFT" AC 
  WHERE AC."TAIL_NUMBER" = OT.TAILNUM
  AND ORIGINSTATE = 'NJ'
  AND DESTSTATE = 'CA'
  AND MANUFACTURER = 'Boeing' 
  AND AC.MODEL LIKE 'B737%'
  AND OT.TAXIOUT > 30
  AND OT.DISTANCE > 2000
  AND OT.DEPDELAY > 200
  ORDER BY OT.ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
print(result)
sqldvbigcachetimer = Timer()
sqldvbigcachetimer.timeTotal()

In [None]:
# Enable BOEING737
databaseAPI.enableCache('DV20211110165634403313')

In [None]:
# Enable ONTIME737NJCA
databaseAPI.enableCache('DV20211115151427714635')

In [None]:
%%capture result
%%time
%%sql 
SELECT * FROM "ONTIME"."BOEING737" OT, "ONTIME"."AIRCRAFT" AC 
  WHERE ORIGINSTATE = 'NJ'
  AND DESTSTATE = 'CA'
  AND TAXIOUT > 30
  AND DISTANCE > 2000
  AND DEPDELAY > 200
  ORDER BY OT.ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
sqldvfocusedcachetimer = Timer()
sqldvfocusedcachetimer.timeTotal()

In [None]:
%%capture result
%%time
%%sql 
SELECT * FROM ONTIME.ONTIME737NJCA WHERE 
  TAXIOUT > 30
  AND DISTANCE > 2000
  AND DEPDELAY > 200
  ORDER BY ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
sqldvsmallcachetimer = Timer()
sqldvsmallcachetimer.timeTotal()

In [None]:
df = databaseAPI.getCaches('Enabled')
display(df[['id', 'name','size', 'cardinality','last_refresh_timestamp']])

In [None]:
# Connect to the Db2 Warehouse Database from inside of IBM Cloud Pak for Data
database = 'ONTIME'
user = 'admin'
password = 'CP4DDataFabric'
host = 'cpd-cpd-instance.apps.demo.ibmdte.net'
port = '31175'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

In [None]:
%%time
%%sql 
SELECT AC."TAIL_NUMBER", AC."MANUFACTURER", AC."MODEL", OT."UNIQUECARRIER", OT."AIRLINEID", OT."CARRIER", OT."TAILNUM", OT."FLIGHTNUM", OT."ORIGINAIRPORTID", OT."ORIGINAIRPORTSEQID", OT."ORIGINCITYNAME", OT."ORIGINSTATE", OT."DESTAIRPORTID", OT."DESTCITYNAME", OT."DESTSTATE", OT."DEPTIME", OT."DEPDELAY", OT."TAXIOUT", OT."WHEELSOFF", OT."WHEELSON", OT."TAXIIN", OT."ARRTIME", OT."ARRDELAY", OT."ARRDELAYMINUTES", OT."CANCELLED", OT."AIRTIME", OT."DISTANCE"
  FROM "ONTIME"."ONTIME" OT, "ONTIME"."AIRCRAFT" AC 
  WHERE AC."TAIL_NUMBER" = OT.TAILNUM
  AND ORIGINSTATE = 'NJ'
  AND DESTSTATE = 'CA'
  AND AC.MANUFACTURER = 'Boeing' 
  AND AC.MODEL LIKE 'B737%'
  AND OT.TAXIOUT > 30
  AND OT.DISTANCE > 1800
  AND OT.DEPDELAY > 200
  ORDER BY OT.ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
%%time
%%sql 
SELECT COUNT(*)
  FROM "ONTIME"."ONTIME" 

In [None]:
%%time
%%sql 
SELECT UNIQUE(YEAR) AS YEAR
  FROM "ONTIME"."ONTIME" ORDER BY YEAR

In [None]:
%%capture result
%%time
%%sql 
SELECT AC."TAIL_NUMBER", AC."MANUFACTURER", AC."MODEL", OT."UNIQUECARRIER", OT."AIRLINEID", OT."CARRIER", OT."TAILNUM", OT."FLIGHTNUM", OT."ORIGINAIRPORTID", OT."ORIGINAIRPORTSEQID", OT."ORIGINCITYNAME", OT."ORIGINSTATE", OT."DESTAIRPORTID", OT."DESTCITYNAME", OT."DESTSTATE", OT."DEPTIME", OT."DEPDELAY", OT."TAXIOUT", OT."WHEELSOFF", OT."WHEELSON", OT."TAXIIN", OT."ARRTIME", OT."ARRDELAY", OT."ARRDELAYMINUTES", OT."CANCELLED", OT."AIRTIME", OT."DISTANCE"
  FROM "ONTIME"."ONTIME" OT, "ONTIME"."AIRCRAFT" AC 
  WHERE AC."TAIL_NUMBER" = OT.TAILNUM
  AND ORIGINSTATE = 'NJ'
  AND DESTSTATE = 'CA'
  AND AC.MANUFACTURER = 'Boeing' 
  AND AC.MODEL LIKE 'B737%'
  AND OT.TAXIOUT > 30
  AND OT.DISTANCE > 2000
  AND OT.DEPDELAY > 200
  ORDER BY OT.ARRDELAY DESC
  FETCH FIRST 5 ROWS ONLY

In [None]:
print(result)
sqldb2wtimer = Timer()
sqldb2wtimer.timeTotal()

In [None]:
dvnocache = sqldvnocachetimer.getTotalTime()/1000
dvfullcache = sqldvbigcachetimer.getTotalTime()/1000
dv737cache = sqldvfocusedcachetimer.getTotalTime()/1000
dv737njcacache = sqldvsmallcachetimer.getTotalTime()/1000
db2w = sqldb2wtimer.getTotalTime()/1000
print("DV No Cache Query 2009-2015: " + str(dvnocache) + " s")
print("DV Full Cache Query 2009-2015: " + str(dvfullcache) + " s")
print("DV 737 Cache Query 2009-2015: " + str(dv737cache) + " s")
print("DV 737 NJ to CA Cache Query 2009-2015: " + str(dv737njcacache) + " s")
print("Db2 Warehouse Query 2009-2015: " + str(db2w) + " s")
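
Raw seconds are easier to compare when expressed relative to the uncached run. The sketch below computes that ratio; the timings in it are placeholder values chosen for easy arithmetic, not measurements from this lab, and the `speedups` helper is hypothetical.

```python
def speedups(baseline_seconds, runs):
    """Return {label: baseline / seconds}; a larger value means faster than baseline."""
    return {
        label: baseline_seconds / seconds
        for label, seconds in runs.items()
        if seconds > 0  # skip runs that were not timed
    }


# Placeholder timings in seconds, NOT measurements from this lab.
example_runs = {"full cache": 10.0, "737 cache": 4.0, "737 NJ-CA cache": 0.5}
ratios = speedups(40.0, example_runs)
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x faster than no cache")
```

With the lab's variables, the call would look like `speedups(dvnocache, {"full cache": dvfullcache, "737 cache": dv737cache})`.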

In [None]:
import matplotlib.pyplot as plt
# Create a single wide figure (an extra plt.figure() call would leave an empty figure behind)
fig = plt.figure(figsize=(20, 6))
ax = fig.add_axes([0,0,1,1])
runs = ['Live DV No Cache','Fully Cached Dataset','Targeted 737 Cache DV','Highly Focused 737 NJ to CA Cache','Db2 Warehouse']
runtime = [dvnocache, dvfullcache, dv737cache, dv737njcacache, db2w]
ax.bar(runs, runtime)
plt.ylabel("Time in Seconds (Lower is better)", fontsize=16)
plt.xlabel("Performance Run", fontsize=16)
plt.show()

In [None]:
print('Aircraft:' + str(aircraft))
print('Number of Domestic Flights in the US')
print('Years 2009 to 2010: ' + str(ontime0910))
print('Years 2011 to 2015: ' + str(ontime1115))
print('Years 2009 to 2015: ' + str(ontimefull))
print('Only 737 from 2009 to 2015: ' + str(ontime737))
print('Only 737 from NJ to CA: ' + str(ontimenjca737))

In [None]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(20, 6))
ax = fig.add_axes([0,0,1,1])
runs = ['Years 2009 to 2010','Years 2011 to 2015','Full Dataset','737 Dataset','737 NJ to CA Dataset']
# Convert raw row counts to millions so the values match the axis label
rows_in_millions = [count / 1_000_000 for count in (ontime0910, ontime1115, ontimefull, ontime737, ontimenjca737)]
ax.bar(runs, rows_in_millions)
plt.ylabel("Dataset Size in Millions of Rows", fontsize=16)
plt.xlabel("US Domestic Flights", fontsize=16)
plt.show()

## Refreshing a Cache Through a RESTful Service Call

In [None]:
databaseAPI.refreshCache('DV20211115221309388288')
df = databaseAPI.getCaches("Refreshing")
display(df[['id', 'name','state','size', 'cardinality','last_refresh_timestamp']])

In [None]:
df = databaseAPI.getCaches("Available")
display(df[['id', 'name','state','size', 'cardinality','last_refresh_timestamp']])
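
A refresh takes time, so a script usually needs to wait until the cache is no longer refreshing before it continues. The following is a minimal polling sketch. The `wait_while` helper is hypothetical, and the state string "Refreshing" is assumed from the filter value used in the cell above; the sketch runs against a fake state source instead of the live service.

```python
import time


def wait_while(get_state, busy="Refreshing", interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll get_state() until it stops returning `busy` or `timeout` seconds pass.

    Returns the list of states observed, ending with the first non-busy state.
    Raises TimeoutError if the state is still `busy` at the deadline.
    """
    seen = []
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        seen.append(state)
        if state != busy:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state still {state!r} after {timeout} s")
        sleep(interval)


# Fake state source so the sketch runs without a live service.
fake_states = iter(["Refreshing", "Refreshing", "Enabled"])
observed = wait_while(lambda: next(fake_states), interval=0, sleep=lambda seconds: None)
print(observed)
```

In the lab, `get_state` could be a small function that reads the `state` field returned by `databaseAPI.getCacheDetails` for the cache id, assuming that field reports the refresh in progress.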

## Get Cache Details

In [None]:
# Use a descriptive name so the result does not shadow Python's json module
cache_details = databaseAPI.getCacheDetails('DV20211115221309388288')
print(cache_details)

In [None]:
print(cache_details['state'])

In [None]:
display(pd.DataFrame(json_normalize(cache_details))[['name','state']])

In [None]:
pd.set_option('display.max_rows', 20)
display(pd.DataFrame(json_normalize(cache_details)).T)