# Access Db2 Warehouse on Cloud and Db2 with Python

This notebook shows how to access Db2 Warehouse on Cloud or a Db2 database when using Python. The examples use Db2 Warehouse on Cloud, but the instructions apply to both Db2 Warehouse on Cloud and Db2.


This notebook runs on Python.

## Table of contents

1. [Setup](#Setup) 
1. [Import the *ibmdbpy* Python library](#Import-the-ibmdbpy-Python-library)
1. [Identify and enter the database connection credentials](#Identify-and-enter-the-database-connection-credentials)
1. [Create the database connection](#Create-the-database-connection)
1. [Use dataframe to read and manipulate tables](#Use-dataframe-to-read-and-manipulate-tables)
1. [Close the database connection](#Close-the-database-connection)
1. [Summary](#Summary)


## Setup

Before you begin, you need **Db2 Warehouse on Cloud**, a fully managed, enterprise-class cloud data warehouse service built for analytics. It offers massively parallel processing (MPP) scale and compatibility with a wide range of business intelligence (BI) tools.

[Try Db2 Warehouse on Cloud free of charge on IBM Cloud.](https://console.ng.bluemix.net/catalog/services/dashdb)


## Import the *ibmdbpy* Python library

Python support for Db2 Warehouse on Cloud and Db2 is provided by the <a href="https://pypi.python.org/pypi/ibmdbpy" target="_blank" rel="noopener noreferrer">ibmdbpy Python library</a>. Connecting to Db2 Warehouse on Cloud or Db2 is also enabled by a Db2 driver, libdb2.so.

This notebook connects over JDBC, which runs on a Java virtual machine (JVM). The ibmdbpy library can use JDBC to connect to a remote Db2 Warehouse on Cloud or Db2 instance. To use JDBC, we need to import the *JayDeBeApi* package.

Run the following commands to install and load the JayDeBeApi package and the ibmdbpy library into your notebook:

In [None]:
!pip install jaydebeapi --user  
!pip install ibmdbpy --user 

In [2]:
import jaydebeapi
from ibmdbpy import IdaDataBase
from ibmdbpy import IdaDataFrame

In [3]:
import os
# when using Watson Studio Spark service, use
#os.environ['CLASSPATH'] = "/usr/local/src/data-connectors-1.4.1/db2jcc4-10.5.0.6.jar"

# when using Watson Studio Environments, use 
os.environ['CLASSPATH'] = "/opt/ibm/dsdriver/java/db2jcc4.jar"

In [4]:
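import jpype
# Point the JVM at the Db2 JDBC driver jar set in CLASSPATH above.
args = '-Djava.class.path=%s' % os.environ['CLASSPATH']
jvm = jpype.getDefaultJVMPath()
# A JVM can only be started once per process; skip if this cell is re-run.
if not jpype.isJVMStarted():
    jpype.startJVM(jvm, args)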
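
If the driver jar named in `CLASSPATH` does not exist, the problem typically surfaces only later, when the connection is opened. A minimal sketch of a pre-flight check follows, assuming the classpath is a list of paths separated by `os.pathsep`. The helper name `missing_classpath_entries` is hypothetical and not part of ibmdbpy or JPype:

```python
import os

def missing_classpath_entries(classpath):
    """Return the classpath entries that do not exist on disk."""
    entries = [p for p in classpath.split(os.pathsep) if p]
    return [p for p in entries if not os.path.exists(p)]

# Check the value set earlier in the notebook before starting the JVM.
missing = missing_classpath_entries(os.environ.get('CLASSPATH', ''))
if missing:
    print('Driver jar(s) not found:', missing)
```

An empty result means every entry was found; it does not confirm that the jar actually contains the Db2 driver.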


## Identify and enter the database connection credentials

Connecting to Db2 Warehouse on Cloud or a Db2 database requires the following information:
* Database name 
* Host DNS name or IP address 
* Host port
* Connection protocol
* User ID
* User password

All of this information must be captured in a connection string in a subsequent step. Provide the Db2 Warehouse on Cloud or Db2 connection information as shown:

In [5]:
dsn_uid = ""       # e.g. db104434
dsn_pwd = ""       # e.g. xxxx
dsn_hostname = ""  # e.g. awh-yp-small03.services.dal.bluemix.net
dsn_port = ""      # e.g. 50001
dsn_database = ""  # e.g. BLUDB
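
Typing a password into a notebook cell makes it easy to leak when the notebook is shared. One alternative is to read the values from environment variables. The sketch below assumes hypothetical variable names (`DB2_UID`, `DB2_PWD`, `DB2_HOSTNAME`, `DB2_PORT`, `DB2_DATABASE`); neither these names nor the helper `load_db2_credentials` come from ibmdbpy:

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_KEYS = ('DB2_UID', 'DB2_PWD', 'DB2_HOSTNAME', 'DB2_PORT', 'DB2_DATABASE')

def load_db2_credentials(env=None):
    """Collect the five connection values from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError('Missing connection settings: ' + ', '.join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}

# Usage with an explicit mapping (placeholder values only).
creds = load_db2_credentials({
    'DB2_UID': 'db104434',
    'DB2_PWD': 'xxxx',
    'DB2_HOSTNAME': 'awh-yp-small03.services.dal.bluemix.net',
    'DB2_PORT': '50001',
    'DB2_DATABASE': 'BLUDB',
})
dsn_uid, dsn_pwd = creds['DB2_UID'], creds['DB2_PWD']
```

Calling `load_db2_credentials()` with no argument reads from the real environment and raises `KeyError` listing any setting that is absent or empty.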

## Create the database connection

The following code builds a JDBC connection string, `connection_string`, and uses it to create an `IdaDataBase` connection object, `idadb`:


In [6]:
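# Db2 JDBC URL format: jdbc:db2://<host>:<port>/<database>:user=<uid>;password=<pwd>;
connection_string = ('jdbc:db2://' + dsn_hostname + ':' + str(dsn_port) + '/' + dsn_database +
                     ':user=' + dsn_uid + ';password=' + dsn_pwd + ';')
idadb = IdaDataBase(dsn=connection_string)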
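
The pieces of the connection string are easy to mix up when they are concatenated by hand. As a rough sketch, the same URL can be assembled by a small helper. The name `build_db2_jdbc_url` is hypothetical, and the sample values are the placeholders used earlier in this notebook:

```python
def build_db2_jdbc_url(hostname, port, database, uid, pwd):
    """Build a URL of the form jdbc:db2://host:port/database:user=...;password=...;"""
    return 'jdbc:db2://{}:{}/{}:user={};password={};'.format(
        hostname, port, database, uid, pwd)

url = build_db2_jdbc_url('awh-yp-small03.services.dal.bluemix.net', 50001,
                         'BLUDB', 'db104434', 'xxxx')
print(url)
# jdbc:db2://awh-yp-small03.services.dal.bluemix.net:50001/BLUDB:user=db104434;password=xxxx;
```

Because `str.format` converts each argument to text, the port can be passed as an integer or a string. A password that itself contains `;` would need separate handling, which this sketch does not attempt.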

## Use dataframe to read and manipulate tables

You can now use the connection object `idadb` to query the database:

In [7]:
df=idadb.show_tables(show_all = True)
df.head(5)

| | TABSCHEMA | TABNAME | OWNER | TYPE |
|---|---|---|---|---|
| 0 | GOSALES | BRANCH | DB2INST1 | T |
| 1 | GOSALES | CONVERSION_RATE | DB2INST1 | T |
| 2 | GOSALES | COUNTRY | DB2INST1 | T |
| 3 | GOSALES | CURRENCY_LOOKUP | DB2INST1 | T |
| 4 | GOSALES | EURO_CONVERSION | DB2INST1 | T |


In [8]:
idadb.exists_table_or_view('GOSALESDW.EMP_EXPENSE_FACT')

True

Using the previously opened IdaDataBase instance named `idadb`, we can open one or several IdaDataFrame objects. They behave like pointers to remote tables.

Let us open the *EMP_EXPENSE_FACT* data set, assuming it is stored in the database under the name `GOSALESDW.EMP_EXPENSE_FACT`. The cell further below assigns the data set to an IdaDataFrame, whose interface is modeled on the pandas DataFrame.

The [Pandas data analysis library](http://pandas.pydata.org/) provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Pandas allows easy processing and manipulation of tabular data, so it is a perfect fit for data extracted from relational databases.


In [9]:
idadf = IdaDataFrame(idadb, 'GOSALESDW.EMP_EXPENSE_FACT')

You can explore the data in the IdaDataFrame with its built-in methods.

Use IdaDataFrame.head to get the first n records of your data set (default 5):

In [10]:
idadf.head(5)

| | DAY_KEY | ORGANIZATION_KEY | POSITION_KEY | EMPLOYEE_KEY | EXPENSE_TYPE_KEY | ACCOUNT_KEY | EXPENSE_UNIT_QUANTITY | EXPENSE_TOTAL |
|---|---|---|---|---|---|---|---|---|
| 0 | 20100131 | 11167 | 43639 | 4043 | 2104 | 8050 | 7.5 | 187.5 |
| 1 | 20100131 | 11122 | 43614 | 4845 | 2124 | 8056 | 0.03 | 166.54 |
| 2 | 20100131 | 11122 | 43614 | 4845 | 2120 | 8052 | 0.08 | 444.1 |
| 3 | 20100131 | 11122 | 43614 | 4845 | 2122 | 8054 | 0.11 | 610.64 |
| 4 | 20100131 | 11122 | 43614 | 4845 | 2131 | 8049 | 165.0 | 5551.28 |


Use IdaDataFrame.tail to get the last n records of your data set (default 5):

In [11]:
idadf.tail(5)

| | DAY_KEY | ORGANIZATION_KEY | POSITION_KEY | EMPLOYEE_KEY | EXPENSE_TYPE_KEY | ACCOUNT_KEY | EXPENSE_UNIT_QUANTITY | EXPENSE_TOTAL |
|---|---|---|---|---|---|---|---|---|
| 127979 | 20130731 | 11120 | 43609 | 4653 | 2104 | 8050 | 7.5 | 115.68 |
| 127980 | 20130731 | 11120 | 43609 | 4653 | 2114 | 8050 | 8.25 | 185.63 |
| 127981 | 20130731 | 11120 | 43609 | 4653 | 2120 | 8052 | 0.08 | 218.45 |
| 127982 | 20130731 | 11120 | 43609 | 4653 | 2122 | 8054 | 0.11 | 300.37 |
| 127983 | 20130731 | 11120 | 43609 | 4653 | 2124 | 8056 | 0.03 | 81.92 |


__Note__: Because Db2 Warehouse on Cloud operates on a distributed system, the order of rows using IdaDataFrame.head and IdaDataFrame.tail is not guaranteed unless the table is sorted (using an ‘ORDER BY’ clause) or a column is declared as index for the IdaDataFrame (parameter/attribute indexer).
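
One way to make "first n rows" well defined is to put the ordering into the SQL itself. The sketch below only builds the query text. `ordered_head_sql` is a hypothetical helper, and the choice of `DAY_KEY` as the sort column is an assumption for illustration. The resulting string could then be passed to a query method such as ibmdbpy's `IdaDataBase.ida_query`, which is not called here because it needs a live connection:

```python
def ordered_head_sql(table, order_by, n=5):
    """Return a Db2 query for the first n rows of `table` sorted by `order_by`."""
    if int(n) < 1:
        raise ValueError('n must be a positive integer')
    # Identifiers are interpolated as-is, so pass trusted names only.
    return 'SELECT * FROM {} ORDER BY {} FETCH FIRST {} ROWS ONLY'.format(
        table, order_by, int(n))

sql = ordered_head_sql('GOSALESDW.EMP_EXPENSE_FACT', 'DAY_KEY', n=5)
print(sql)
# SELECT * FROM GOSALESDW.EMP_EXPENSE_FACT ORDER BY DAY_KEY FETCH FIRST 5 ROWS ONLY
```

Sorting by a column with repeated values, as `DAY_KEY` has in the output above, still leaves rows within each tie in arbitrary order; a unique sort key removes that ambiguity.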

IdaDataFrame also implements most attributes that are available in a pandas DataFrame:


In [12]:
idadf.shape

(127984, 8)

In [13]:
idadf.columns

Index(['DAY_KEY', 'ORGANIZATION_KEY', 'POSITION_KEY', 'EMPLOYEE_KEY',
       'EXPENSE_TYPE_KEY', 'ACCOUNT_KEY', 'EXPENSE_UNIT_QUANTITY',
       'EXPENSE_TOTAL'],
      dtype='object')

Several standard statistics functions from the pandas interface are also available for IdaDataFrame. For example, let us calculate the covariance matrix for the *EMP_EXPENSE_FACT* data set:

In [14]:
idadf.cov()

| | DAY_KEY | ORGANIZATION_KEY | POSITION_KEY | EMPLOYEE_KEY | EXPENSE_TYPE_KEY | ACCOUNT_KEY | EXPENSE_UNIT_QUANTITY | EXPENSE_TOTAL |
|---|---|---|---|---|---|---|---|---|
| DAY_KEY | 107444500.0 | -1301.774305 | -2699.336397 | -74463.200864 | -2541.104007 | -88.733494 | -2747.250164 | 338749.3 |
| ORGANIZATION_KEY | -1301.774 | 977.978493 | -60.746262 | 2228.417559 | -27.240468 | 0.756326 | 11.18659 | -2999.219 |
| POSITION_KEY | -2699.336 | -60.746262 | 148.234472 | -2070.93463 | 10.28491 | -1.006254 | -13.697657 | 1101.108 |
| EMPLOYEE_KEY | -74463.2 | 2228.417559 | -2070.93463 | 89393.601947 | -237.530049 | 39.144365 | 525.387975 | 47399.03 |
| EXPENSE_TYPE_KEY | -2541.104 | -27.240468 | 10.28491 | -237.530049 | 88.103306 | 4.663223 | 26.490807 | 5577.918 |
| ACCOUNT_KEY | -88.73349 | 0.756326 | -1.006254 | 39.144365 | 4.663223 | 6.414971 | -92.920363 | -2669.485 |
| EXPENSE_UNIT_QUANTITY | -2747.25 | 11.18659 | -13.697657 | 525.387975 | 26.490807 | -92.920363 | 3331.325768 | 76740.54 |
| EXPENSE_TOTAL | 338749.3 | -2999.218552 | 1101.107528 | 47399.031411 | 5577.918013 | -2669.484571 | 76740.540006 | 4321078.0 |
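
For reference, each entry of a covariance matrix is the covariance of one pair of columns, and the diagonal holds each column's variance. The sketch below computes the sample covariance (dividing by n - 1, the pandas default) in plain Python on made-up numbers. That `IdaDataFrame.cov` uses the same normalization is an assumption here, not something the output above confirms:

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two sequences of equal length >= 2')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Made-up values, not taken from EMP_EXPENSE_FACT.
quantity = [1.0, 2.0, 3.0]
total = [2.0, 4.0, 6.0]
print(sample_covariance(quantity, total))     # 2.0
print(sample_covariance(quantity, quantity))  # 1.0 (variance sits on the diagonal)
```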


You can subset the rows of an IdaDataFrame by indexing it with a slice object, or by using the IdaDataFrame.loc attribute, which contains an ibmdbpy.Loc object. However, the row selection might be inaccurate if the current IdaDataFrame is not sorted or does not contain an indexer, because Db2 Warehouse on Cloud stores the data across several nodes when they are available. Moreover, because Db2 Warehouse on Cloud is a column-oriented database, row numbers are undefined:

In [15]:
idadf_new = idadf[0:9] # Select the first 10 rows
idadf_new.head()


| | DAY_KEY | ORGANIZATION_KEY | POSITION_KEY | EMPLOYEE_KEY | EXPENSE_TYPE_KEY | ACCOUNT_KEY | EXPENSE_UNIT_QUANTITY | EXPENSE_TOTAL |
|---|---|---|---|---|---|---|---|---|
| 0 | 20101031 | 11139 | 43619 | 4479 | 2124 | 8056 | 0.03 | 81.44 |
| 1 | 20101031 | 11139 | 43619 | 4479 | 2131 | 8049 | 146.25 | 2520.83 |
| 2 | 20101130 | 11139 | 43619 | 4479 | 2103 | 8050 | 15.0 | 262.24 |
| 3 | 20101130 | 11139 | 43619 | 4479 | 2122 | 8054 | 0.11 | 313.25 |
| 4 | 20101130 | 11139 | 43619 | 4479 | 2120 | 8052 | 0.08 | 227.82 |


## Close the database connection

To ensure expected behavior, IdaDataBase instances need to be closed. Closing the *IdaDataBase* is equivalent to closing the connection: once the connection is closed, it is no longer possible to use the *IdaDataBase* instance or any IdaDataFrame instances that were opened on this connection.

In [16]:
idadb.close()

Connection closed.
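
If a cell raises an exception before `idadb.close()` runs, the connection stays open. `contextlib.closing` from the standard library calls `close()` on an object when the `with` block exits, including on errors. The sketch below uses a stand-in class, `FakeConnection` (hypothetical), so that it runs without a database:

```python
from contextlib import closing

class FakeConnection:
    """Stand-in for IdaDataBase so the sketch runs without a database."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

conn = FakeConnection()
try:
    with closing(conn) as db:
        raise RuntimeError('simulated failure while querying')
except RuntimeError:
    pass

print(conn.closed)  # True
```

With ibmdbpy, the same pattern would read `with closing(IdaDataBase(dsn=connection_string)) as idadb:`. It relies only on the object having a `close()` method, as the cell above shows for `idadb`.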


## Summary

This notebook demonstrated how to establish a connection to a Db2 Warehouse on Cloud / Db2 database from Python using the ibmdbpy library.

## Want to learn more?
### Free courses on <a href="https://bigdatauniversity.com/courses/?utm_source=tutorial-dashdb-python&utm_medium=github&utm_campaign=bdu/" rel="noopener noreferrer" target="_blank">Cognitive Class</a>: <a href="https://bigdatauniversity.com/courses/?utm_source=tutorial-dashdb-python&utm_medium=github&utm_campaign=bdu" rel="noopener noreferrer" target="_blank"><img src = "https://ibm.box.com/shared/static/xomeu7dacwufkoawbg3owc8wzuezltn6.png" width=600px> </a>

### Authors

**Saeed Aghabozorgi**, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods like machine learning and statistical modelling on large data sets.

**Polong Lin** is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through Big Data University. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.

Copyright © 2016, 2018 Cognitive Class. This notebook and its source code are released under the terms of the <a href="https://bigdatauniversity.com/mit-license/" rel="noopener noreferrer" target="_blank">MIT License</a>.
