# Connect to dashDB and DB2 using Python

This notebook shows how to access dashDB Data Warehouse (or a DB2 database) using Python by following the steps below:
1. Import the `ibmdbpy` Python library
1. Identify and enter the database connection credentials
1. Create the database connection
1. Use a dataframe to read and manipulate tables
1. Close the database connection

## What is dashDB?

**dashDB** is a fully managed cloud data warehouse, purpose-built for analytics. It offers massively parallel processing (MPP) scale and compatibility with a wide range of business intelligence (BI) tools.


__Notice:__ Get your own dashDB free of charge: 

<h3 align = "center">
<a href="https://console.ng.bluemix.net/?direct=classic/&amp;cm_mc_uid=&amp;cm_mc_sid_50200000=1453781614#/store/cloudOEPaneId=store&amp;serviceOfferingGuid=7c87c148-e1a4-4cb8-81f8-c5e74be7684b&CampID=DSWB">Launch a dashDB service through Bluemix</a>
</h3>

<a class="ibm-tooltip" href="https://console.ng.bluemix.net/?direct=classic/&amp;cm_mc_uid=&amp;cm_mc_sid_50200000=1453781614#/store/cloudOEPaneId=store&amp;serviceOfferingGuid=7c87c148-e1a4-4cb8-81f8-c5e74be7684b&CampID=DSWB" target="_blank" title="" id="ibm-tooltip-0">
<img alt="IBM Bluemix.Get started now" height="193" width="153" src="https://ibm.box.com/shared/static/42yt39czuksqdi278xpy96txtlw3lfmb.png" >
</a>





## Import the `ibmdbpy` Python library

Python support for dashDB / DB2 is provided by the [ibmdbpy Python library](https://pypi.python.org/pypi/ibmdbpy). The `ibmdbpy` library is pre-installed for you.

Connecting to dashDB / DB2 requires a DB2 driver (libdb2.so), which is also pre-installed for you.

In [1]:
import jaydebeapi
from ibmdbpy import IdaDataBase
from ibmdbpy import IdaDataFrame

When the command above completes, the `ibmdbpy` library is loaded in your notebook. 


## Identify the database connection credentials

Connecting to a dashDB or DB2 database requires the following information:
* Database name 
* Host DNS name or IP address 
* Host port
* Connection protocol
* User ID
* User Password

All of this information must be captured in a connection string in a subsequent step.

__Notice:__ To obtain credentials, follow this [user guide](http://support.datascientistworkbench.com/knowledgebase/articles/826020-getting-credentials-to-access-a-dashdb-data-wareho).


In [2]:
dsn_uid = "<id>"              # e.g. dash104434
dsn_pwd = "<password>"        # e.g. xxxx
dsn_hostname = "<host>"       # e.g. awh-yp-small03.services.dal.bluemix.net
dsn_port = "<port>"           # e.g. 50001
dsn_database = "<database>"   # e.g. BLUDB
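
As an alternative to typing credentials directly into a cell, the values could be read from environment variables. The following is a minimal sketch, not part of ibmdbpy: the function `load_credentials` and the variable names (`DASHDB_UID` and so on) are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical variable names; use whatever naming fits your environment.
REQUIRED_VARS = ("DASHDB_UID", "DASHDB_PWD", "DASHDB_HOST",
                 "DASHDB_PORT", "DASHDB_DATABASE")


def load_credentials(env=os.environ):
    """Return the connection settings as a dict, or raise if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with a plain dict standing in for the real environment.
sample_env = {
    "DASHDB_UID": "dash104434",
    "DASHDB_PWD": "xxxx",
    "DASHDB_HOST": "awh-yp-small03.services.dal.bluemix.net",
    "DASHDB_PORT": "50001",
    "DASHDB_DATABASE": "BLUDB",
}
creds = load_credentials(sample_env)
print(creds["DASHDB_HOST"])
```

Calling `load_credentials()` without an argument reads from the process environment, and the returned values could then be assigned to `dsn_uid`, `dsn_pwd`, and the other variables used in this notebook.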

## Create the DB2 database connection

ibmdbpy connects to a remote dashDB/DB2 instance over JDBC, which runs on a Java Virtual Machine. Connecting through JDBC requires the __JayDeBeApi__ package, which is why it was imported above.

The following code snippet creates a connection string `connection_string`
and uses the `connection_string` to create a DB2 connection object.


In [3]:
connection_string = 'jdbc:db2://' + dsn_hostname + ':' + dsn_port + '/' + dsn_database + ':user=' + dsn_uid + ';password=' + dsn_pwd + ';'
idadb = IdaDataBase(dsn=connection_string)
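
The string concatenation above can also be wrapped in a small helper, which makes the layout of the JDBC URL easier to read and catches a non-numeric port early. This is only a sketch: `build_jdbc_url` is a hypothetical helper, not an ibmdbpy function, and the sample values are the placeholders from the credentials cell.

```python
def build_jdbc_url(hostname, port, database, uid, pwd):
    """Assemble a DB2 JDBC URL in the same layout as the cell above."""
    port = int(port)  # raises ValueError if the port is not numeric
    return "jdbc:db2://{host}:{port}/{db}:user={uid};password={pwd};".format(
        host=hostname, port=port, db=database, uid=uid, pwd=pwd
    )


# Placeholder values taken from the comments in the credentials cell.
example_url = build_jdbc_url(
    "awh-yp-small03.services.dal.bluemix.net", "50001", "BLUDB",
    "dash104434", "xxxx",
)
print(example_url)
```

The result could then be passed as `IdaDataBase(dsn=...)`, exactly as `connection_string` is used above.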

## Use a dataframe to read a table
You can now use the connection object `idadb` to query the database.

In [None]:
df=idadb.show_tables(show_all = True)
df.head(5)

In [None]:
idadb.exists_table_or_view('GOSALESDW.EMP_EXPENSE_FACT')

Using our previously opened IdaDataBase instance named ‘idadb’, we can open one or several IdaDataFrame objects. They behave like pointers to remote tables.

Let us open the __EMP_EXPENSE_FACT__ data set, assuming it is stored in the database under the name ‘GOSALESDW.EMP_EXPENSE_FACT’. The following cell assigns the data set to an IdaDataFrame, which offers a pandas-like interface while the data itself stays in the database.


The [Pandas data analysis library](http://pandas.pydata.org/) provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Pandas allows easy processing and manipulation of tabular data, so it is a perfect fit for data extracted from relational databases.


In [None]:
idadf = IdaDataFrame(idadb, 'GOSALESDW.EMP_EXPENSE_FACT')

You can easily explore the data in the IdaDataFrame by using built-in functions.

Use IdaDataFrame.head to get the first n records of your data set (default 5).

In [None]:
idadf.head(5)

Use IdaDataFrame.tail to get the last n records of your data set (default 5).

In [None]:
idadf.tail(5)

**Note**: Because dashDB operates on a distributed system, the order of rows using IdaDataFrame.head and IdaDataFrame.tail is not guaranteed unless the table is sorted (using an ‘ORDER BY’ clause) or a column is declared as index for the IdaDataFrame (parameter/attribute indexer).
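
The effect of sorting can be seen in a small local sketch. It uses Python's built-in `sqlite3` module as a stand-in for dashDB, an assumption made only so that the example runs without a warehouse. The `expenses` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database standing in for the remote warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (emp_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?)",
    [(3, 30.0), (1, 10.0), (2, 20.0)],  # rows inserted out of order
)

# With an explicit ORDER BY, the "first" and "last" rows are well defined.
first_two = conn.execute(
    "SELECT emp_id FROM expenses ORDER BY emp_id LIMIT 2"
).fetchall()
last_two = conn.execute(
    "SELECT emp_id FROM expenses ORDER BY emp_id DESC LIMIT 2"
).fetchall()
conn.close()

print(first_two)  # [(1,), (2,)]
print(last_two)   # [(3,), (2,)]
```

Without the `ORDER BY` clause, SQL leaves the order of the returned rows unspecified, which is the situation the note above describes for an unsorted IdaDataFrame.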

IdaDataFrame also implements most attributes that are available in a Pandas DataFrame.



In [None]:
idadf.shape

In [None]:
idadf.columns

Several standard statistics functions from the Pandas interface are also available for IdaDataFrame. For example, let us calculate the covariance matrix for the EMP_EXPENSE_FACT data set:

In [None]:
idadf.cov()

You can subset the rows of an IdaDataFrame by accessing it with a slice object. You can also use the IdaDataFrame.loc attribute, which contains an ibmdbpy.Loc object. However, the row selection might be inaccurate if the current IdaDataFrame is not sorted or does not contain an indexer, because dashDB stores the data across several nodes when they are available. Moreover, because dashDB is a column-oriented database, row numbers are undefined.

In [None]:
idadf_new = idadf[0:9] # Select the first 10 rows
idadf_new.head()

## Close the connection
To ensure expected behavior, close every IdaDataBase instance when you are done with it. Closing the __IdaDataBase__ is equivalent to closing the connection: once it is closed, neither the __IdaDataBase__ instance nor any IdaDataFrame instance opened on that connection can be used anymore.

In [None]:
idadb.close()
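
If a cell fails before `close()` is reached, the connection stays open. One way to guard against that is the standard library's `contextlib.closing`, which calls `close()` on exit even when an exception is raised. The sketch below uses a hypothetical `StandInConnection` class in place of a real IdaDataBase so that it runs without a database; `run_with_connection` is likewise an illustrative helper, not part of ibmdbpy.

```python
from contextlib import closing


class StandInConnection:
    """Hypothetical stand-in for an IdaDataBase; it only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_connection(connection, work):
    """Run work(connection) and close the connection even if work raises."""
    with closing(connection) as conn:
        return work(conn)


demo_conn = StandInConnection()
try:
    run_with_connection(demo_conn, lambda c: 1 / 0)  # simulate a failing cell
except ZeroDivisionError:
    pass

print(demo_conn.closed)  # True: close() ran despite the error
```

Because `closing` only requires an object with a `close()` method, the same pattern should apply to an IdaDataBase instance such as `idadb`.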

## Summary

In this tutorial you established a connection to a dashDB / DB2 database from a Python notebook using ibmdbpy and queried sample data. Additional tutorials are available on our [Welcome](https://my.datascientistworkbench.com/tools/jupyter-notebook/) page.

## Want to learn more?

### Free courses on [Big Data University](https://bigdatauniversity.com/courses/?utm_source=tutorial-dashdb-python&utm_medium=dswb&utm_campaign=bdu):
<a href="https://bigdatauniversity.com/courses/?utm_source=tutorial-dashdb-python&utm_medium=dswb&utm_campaign=bdu"><img src = "https://ibm.box.com/shared/static/xomeu7dacwufkoawbg3owc8wzuezltn6.png" width=600px> </a>

<h3>Authors:</h3>
<br>
<a href="https://ca.linkedin.com/in/saeedaghabozorgi">
    <div class="teacher-image" style="    float: left;
        width: 115px;
        height: 115px;
        margin-right: 10px;
        margin-bottom: 10px;
        border: 1px solid #CCC;
        padding: 3px;
        border-radius: 3px;
        text-align: center;"><img class="alignnone wp-image-2258 " src="https://ibm.box.com/shared/static/tyd41rlrnmfrrk78jx521eb73fljwvv0.jpg" alt="Saeed Aghabozorgi" width="178" height="178"/>
    </div>
</a>

<h4>Saeed Aghabozorgi</h4>
<p><a href="https://ca.linkedin.com/in/saeedaghabozorgi">Saeed Aghabozorgi</a>, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods, such as machine learning and statistical modelling, on large datasets.</p>

<br>

<a href="https://ca.linkedin.com/in/polonglin">
    <div class="teacher-image" style="    float: left;
        width: 115px;
        height: 115px;
        margin-right: 10px;
        margin-bottom: 10px;
        border: 1px solid #CCC;
        padding: 3px;
        border-radius: 3px;
        text-align: center;"><img class="alignnone size-medium wp-image-2177" src="https://ibm.box.com/shared/static/2ygdi03ahcr97df2ofrr6cf8knq4kodd.jpg" alt="Polong Lin" width="300" height="300"/>
    </div>
</a>
<h4>Polong Lin</h4>
<p>
<a href="https://ca.linkedin.com/in/polonglin">Polong Lin</a> is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through Big Data University. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.</p>

<hr>
Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).