# Access PostgreSQL with Python

This notebook shows how to access a PostgreSQL database from Python.

## Table of contents

1. [Setup](#Setup)
1. [Import the *psycopg2* Python library](#Import-the-psycopg2-Python-library)
1. [Identify and enter the database connection credentials](#Identify-and-enter-the-database-connection-credentials)
1. [Create the database connection](#Create-the-database-connection)
1. [Create a table](#Create-a-table)
1. [Insert data into a table](#Insert-data-into-a-table)
1. [Query data](#Query-data)
1. [Close the database connection](#Close-the-database-connection)
1. [Summary](#Summary)



## Setup

Before beginning, you will need access to a *PostgreSQL* database. PostgreSQL is a powerful, open-source, object-relational database system. It is a multi-user database management system with sophisticated features such as Multi-Version Concurrency Control (MVCC), point-in-time recovery, and more. To learn more, see the [PostgreSQL website](http://www.postgresql.org/).

When working with large data sets (for example, 50 GB) that may exceed your machine's memory (RAM), it helps to keep the data in a database such as PostgreSQL and query it in smaller, digestible chunks (for instance, 2 GB at a time). This leaves memory free for the computation itself.
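A minimal sketch of chunked reading follows. The hypothetical helper `iter_chunks()` calls the DB-API 2.0 method `fetchmany()` repeatedly, so only `size` rows are held in Python at a time. To keep the example runnable without a server, Python's built-in `sqlite3` module stands in for PostgreSQL, and the `measurements` table is made up. Be aware that a regular psycopg2 cursor transfers the whole result set to the client as soon as `execute()` runs; to actually stream rows from PostgreSQL, you would pass the helper a named (server-side) cursor created with `conn.cursor(name="...")` instead.

```python
import sqlite3
from typing import Iterator, List, Sequence


def iter_chunks(cursor, size: int = 1000) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows from a cursor that has already executed a query."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:  # an empty result means the query is exhausted
            break
        yield rows


# Stand-in data: an in-memory sqlite3 database instead of a PostgreSQL server
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
demo_cur.executemany(
    "INSERT INTO measurements (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(10)],
)
demo_cur.execute("SELECT id, value FROM measurements ORDER BY id")

chunk_sizes = []
total = 0.0
for chunk in iter_chunks(demo_cur, size=4):
    chunk_sizes.append(len(chunk))
    total += sum(value for _, value in chunk)  # process one chunk, then let it go

print(chunk_sizes, total)
demo_conn.close()
```

The chunk size is a trade-off: larger chunks mean fewer round trips, smaller chunks mean a lower peak memory footprint.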

[Try PostgreSQL free of charge on IBM Bluemix.](https://console.ng.bluemix.net/catalog/services/compose-for-postgresql/)




## Import the *psycopg2* Python library

__Psycopg2__ is a driver for working with PostgreSQL from Python. It lets you efficiently perform the full range of SQL operations against PostgreSQL databases. Run the commands below to install and import the psycopg2 library (if building it fails because `pg_config` is missing, the prebuilt `psycopg2-binary` package installs the same `psycopg2` module):

```python
!pip install psycopg2 --user
```



```python
import psycopg2
import sys
```

## Identify and enter the database connection credentials

Connecting to PostgreSQL database requires the following information:
* Host name or IP address
* Host port
* Default database name
* Connection protocol
* User ID
* User password

All of this information must be captured in a connection string in a subsequent step. Provide the PostgreSQL connection information as shown:

```python
# Enter the values for your database connection
dsn_database = "<database name>"   # for example "compose"
dsn_hostname = "<your host name>"  # for example "aws-us-east-1-portal.4.dblayer.com"
dsn_port = "<port>"                # for example "11101" (a string, because it is concatenated into the connection string)
dsn_uid = "<your user id>"         # for example "admin"
dsn_pwd = "<your password>"        # for example "xxx"
```
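The connection string used in the next step is a libpq keyword/value string (`host=... port=... dbname=...`). Plain concatenation works for simple values, but libpq requires a value to be wrapped in single quotes when it is empty or contains spaces, and any single quote or backslash inside it must be escaped with a backslash. The hypothetical helpers `quote_conninfo_value()` and `build_conninfo()` below are a minimal sketch of that rule, with made-up example values. Alternatives worth checking in the psycopg2 documentation are passing keyword arguments directly, as in `psycopg2.connect(host=..., port=..., dbname=..., user=..., password=...)`, or `psycopg2.extensions.make_dsn()`.

```python
def quote_conninfo_value(value) -> str:
    """Quote one value following libpq's keyword/value connection-string rules."""
    text = str(value)
    needs_quotes = (
        not text                              # empty values must be quoted
        or any(ch.isspace() for ch in text)   # so must values containing whitespace
        or "'" in text
        or "\\" in text
    )
    if not needs_quotes:
        return text
    # Escape backslashes first, then single quotes, and wrap in single quotes
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_conninfo(**params) -> str:
    """Join keyword/value pairs into a libpq connection string, skipping None values."""
    return " ".join(
        f"{key}={quote_conninfo_value(value)}"
        for key, value in params.items()
        if value is not None
    )


conninfo = build_conninfo(
    host="localhost",          # hypothetical example values
    port=5432,
    dbname="compose",
    user="admin",
    password="it's a secret",
)
print(conninfo)
```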

## Create the database connection

Set up the connection as follows. If a connection cannot be made, psycopg2 raises an exception, which the code below catches and reports.
Once connected, `conn.cursor()` returns a cursor object that you can use to perform queries:

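```python
try:
    # Connection details without the password, so they are safe to print
    conn_info = ("host=" + dsn_hostname + " port=" + dsn_port + " dbname=" + dsn_database
                 + " user=" + dsn_uid)
    print("Connecting to database\n\t->%s password=********" % conn_info)
    conn = psycopg2.connect(conn_info + " password=" + dsn_pwd)
    print("Connected!\n")
except psycopg2.Error as e:
    # psycopg2.Error is the base class of all psycopg2 exceptions
    # (unreachable host, wrong credentials, unknown database, ...)
    print("Unable to connect to the database:", e)
```

```
Connecting to database
	->host=bluemix-sandbox-dal-9-portal.1.dblayer.com port=28059 dbname=compose user=admin password=********
Connected!
```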



The next step is to define a cursor to work with. Note that, by default, psycopg2 cursors are client-side Python objects, not the server-side cursors that PostgreSQL creates with `DECLARE`. Given the cursor, we can execute a query, for example, to retrieve the list of databases:

```python
cursor = conn.cursor()
cursor.execute("""SELECT datname from pg_database""")
rows = cursor.fetchall()
```
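`fetchall()` returns plain tuples, so columns are accessed by position (`row[0]`). Every DB-API 2.0 cursor, psycopg2's included, also exposes `cursor.description`, a sequence whose items each start with a column name. The hypothetical helper `rows_as_dicts()` below is a sketch that uses it to turn rows into dictionaries keyed by column name. It runs against Python's built-in `sqlite3` module with a made-up `databases` table, so no PostgreSQL server is needed; psycopg2 also provides `psycopg2.extras.RealDictCursor` (passed as `conn.cursor(cursor_factory=...)`) for the same purpose.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Return the remaining rows of an executed cursor as dicts keyed by column name."""
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database so the example runs without PostgreSQL
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE databases (datname TEXT)")
demo_cur.executemany("INSERT INTO databases VALUES (?)", [("postgres",), ("compose",)])
demo_cur.execute("SELECT datname FROM databases ORDER BY datname")
records = rows_as_dicts(demo_cur)
print(records)
demo_conn.close()
```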

We can iterate through _rows_ to print the results:

```python
print("\nShow me the databases:\n")
for row in rows:
    print("   ", row[0])
```

```

Show me the databases:

    template1
    template0
    postgres
    compose
```


## Create a table

Create a test table named Cars. The code below drops the Cars table if it already exists, and then creates the new table:

```python
cursor.execute("DROP TABLE IF EXISTS Cars")
cursor.execute("CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name VARCHAR(20), Price INT)")
```

## Insert data into a table

Run the following commands to create records in the new Cars table:

```python
cursor.execute("INSERT INTO Cars VALUES(1,'Audi',52642)")
cursor.execute("INSERT INTO Cars VALUES(2,'Mercedes',57127)")
cursor.execute("INSERT INTO Cars VALUES(3,'Skoda',9000)")
cursor.execute("INSERT INTO Cars VALUES(4,'Volvo',29000)")
cursor.execute("INSERT INTO Cars VALUES(5,'Bentley',350000)")
cursor.execute("INSERT INTO Cars VALUES(6,'Citroen',21000)")
cursor.execute("INSERT INTO Cars VALUES(7,'Hummer',41400)")
cursor.execute("INSERT INTO Cars VALUES(8,'Volkswagen',21600)")

# Make the table and its rows permanent
conn.commit()
```
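Embedding values directly in the SQL text works for this fixed demo data, but for values that come from users or files it is safer to pass them as query parameters and let the driver handle quoting, which also guards against SQL injection. psycopg2 uses `%s` placeholders and supports `cursor.executemany()` (for large batches, its documentation suggests `psycopg2.extras.execute_values()` as a faster option). The hypothetical `insert_cars()` helper below is a sketch of the pattern; its `placeholder` argument exists only so the example can run against Python's built-in `sqlite3` module, which uses `?` instead of `%s`.

```python
import sqlite3

CARS = [
    (1, "Audi", 52642),
    (2, "Mercedes", 57127),
    (3, "Skoda", 9000),
]


def insert_cars(cursor, rows, placeholder="%s"):
    """Insert (id, name, price) tuples with a parameterized statement; return the SQL used."""
    marks = ", ".join([placeholder] * 3)
    sql = f"INSERT INTO Cars (Id, Name, Price) VALUES ({marks})"
    # The driver quotes the values itself (psycopg2 escapes and merges them
    # client-side, sqlite3 binds them separately), so a quote in a name
    # such as "O'Brien" cannot break the statement.
    cursor.executemany(sql, rows)
    return sql


# With psycopg2 this would be: insert_cars(cursor, CARS); conn.commit()
# Runnable stand-in using sqlite3 and its "?" placeholder:
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY, Name VARCHAR(20), Price INT)")
used_sql = insert_cars(demo_cur, CARS + [(9, "O'Brien Motors", 1)], placeholder="?")
demo_conn.commit()
demo_cur.execute("SELECT COUNT(*) FROM Cars")
count = demo_cur.fetchone()[0]
print(used_sql, count)
demo_conn.close()
```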

## Query data

The following Python code fetches and displays records from the Cars table:

```python
cursor.execute("""SELECT * from Cars""")
rows = cursor.fetchall()
```
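`SELECT *` brings every row back to Python. When only some rows are needed, it is usually better to let PostgreSQL filter and sort them, passing the criteria as query parameters. The hypothetical `cars_cheaper_than()` below is a sketch of that; to keep it runnable without a server it uses Python's built-in `sqlite3` module, and the `placeholder` argument exists only because sqlite3 expects `?` where psycopg2 expects `%s`.

```python
import sqlite3


def cars_cheaper_than(cursor, max_price, placeholder="%s"):
    """Return (name, price) pairs priced below max_price, cheapest first."""
    cursor.execute(
        f"SELECT Name, Price FROM Cars WHERE Price < {placeholder} ORDER BY Price",
        (max_price,),  # parameters are passed as a sequence, even for a single value
    )
    return cursor.fetchall()


demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Cars (Id INTEGER PRIMARY KEY, Name VARCHAR(20), Price INT)")
demo_cur.executemany(
    "INSERT INTO Cars VALUES (?, ?, ?)",
    [(1, "Audi", 52642), (3, "Skoda", 9000), (6, "Citroen", 21000), (8, "Volkswagen", 21600)],
)
cheap = cars_cheaper_than(demo_cur, 25000, placeholder="?")
print(cheap)
demo_conn.close()
```

With psycopg2, the call would be `cars_cheaper_than(cursor, 25000)` using the default `%s` placeholder.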

You can display the records neatly with Python's `pprint` module:

```python
import pprint

print("\nShow me the cars:\n")
pprint.pprint(rows)
```

```

Show me the cars:

[(1, 'Audi', 52642),
 (2, 'Mercedes', 57127),
 (3, 'Skoda', 9000),
 (4, 'Volvo', 29000),
 (5, 'Bentley', 350000),
 (6, 'Citroen', 21000),
 (7, 'Hummer', 41400),
 (8, 'Volkswagen', 21600)]
```


Use a loop to show each row:

```python
for row in rows:
    print(" Number=", row[0], "  Name=", row[1], "  Price", row[2])
```

```
 Number= 1   Name= Audi   Price 52642
 Number= 2   Name= Mercedes   Price 57127
 Number= 3   Name= Skoda   Price 9000
 Number= 4   Name= Volvo   Price 29000
 Number= 5   Name= Bentley   Price 350000
 Number= 6   Name= Citroen   Price 21000
 Number= 7   Name= Hummer   Price 41400
 Number= 8   Name= Volkswagen   Price 21600
```
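Printing with commas produces ragged columns. A small sketch that aligns the output with f-string format specifications instead; `format_cars()` is a hypothetical helper, and it assumes each row is an `(id, name, price)` tuple like those returned above.

```python
def format_cars(rows):
    """Return (id, name, price) rows as a fixed-width text table."""
    lines = [f"{'Id':>3}  {'Name':<12}{'Price':>9}"]
    for car_id, name, price in rows:
        # '<' left-aligns text, '>' right-aligns numbers, ',' adds thousands separators
        lines.append(f"{car_id:>3}  {name:<12}{price:>9,}")
    return "\n".join(lines)


sample_rows = [(1, "Audi", 52642), (5, "Bentley", 350000), (8, "Volkswagen", 21600)]
table = format_cars(sample_rows)
print(table)
```

In the notebook, `print(format_cars(rows))` would format the full query result.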



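Export the table's contents to a file with the cursor's *copy_to()* method:

```python
# Write the Cars table to cars.csv with comma-separated columns.
# The table was created without quotes, so PostgreSQL stores its name as lowercase "cars".
with open('cars.csv', 'w') as fout:
    cursor.copy_to(fout, 'cars', sep=",")
```

Similarly, import data with the *copy_from()* method. Loading the file back into `cars` itself would fail, because every `Id` already exists and `Id` is the primary key, so the example below loads it into a new table with the same structure:

```python
# cars_copy is an example table name; LIKE ... INCLUDING ALL copies the
# column definitions, defaults, constraints and indexes of cars
cursor.execute("DROP TABLE IF EXISTS cars_copy")
cursor.execute("CREATE TABLE cars_copy (LIKE cars INCLUDING ALL)")
with open('cars.csv', 'r') as f:
    cursor.copy_from(f, 'cars_copy', sep=",")
conn.commit()
```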
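`copy_to()` uses PostgreSQL's `COPY` text format, so despite the `.csv` name the file is not standard CSV: a comma inside a value is escaped as `\,` instead of being enclosed in double quotes, and NULL is written as `\N`. That round-trips fine through `copy_from()`, but spreadsheets and Python's `csv` module will misread it. If standard CSV is the goal, `cursor.copy_expert()` with `COPY ... WITH (FORMAT csv)` is an alternative. In the sketch below, `export_csv()` is a hypothetical helper that is only defined, not run, since it needs a live connection; the runnable part uses the `csv` module on a made-up value to show the quoting that standard CSV requires.

```python
import csv
import io


def export_csv(cursor, table, path):
    """Hypothetical helper: export `table` as standard CSV with a header row via psycopg2."""
    # `table` is inserted into the SQL text, so only pass trusted, known table names.
    sql = f"COPY {table} TO STDOUT WITH (FORMAT csv, HEADER)"
    with open(path, "w", newline="") as f:
        cursor.copy_expert(sql, f)


# Standard CSV quotes a field that contains the separator:
buffer = io.StringIO()
csv.writer(buffer).writerow([9, "Rolls, Royce", 300000])
csv_line = buffer.getvalue()
print(repr(csv_line))
```

In the notebook, `export_csv(cursor, "cars", "cars_standard.csv")` would write a header line followed by one quoted-as-needed line per car.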

## Close the database connection

It is good practice to close your database connection after work is done:

```python
cursor.close()
conn.close()
```
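If an exception occurs before `conn.close()` runs, the connection stays open. One way to guarantee cleanup is `contextlib.closing()`. Note that using a psycopg2 connection directly in a `with` block only commits or rolls back the transaction when the block ends; it does not close the connection. The sketch below uses Python's built-in `sqlite3` module as a stand-in (its connections behave the same way in `with` blocks) and a made-up table `t`, so it runs without a server; with psycopg2 you would wrap `psycopg2.connect(...)` the same way.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() when the block exits, even if an exception is raised
with closing(sqlite3.connect(":memory:")) as demo_conn:
    with demo_conn:  # transaction scope only: commit on success, rollback on error
        demo_conn.execute("CREATE TABLE t (x INTEGER)")
        demo_conn.execute("INSERT INTO t VALUES (1)")
    value = demo_conn.execute("SELECT x FROM t").fetchone()[0]

# The connection is closed now, so using it again raises an error
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)  # 1 False
```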

## Summary

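This notebook demonstrated how to connect to a PostgreSQL database from Python with the psycopg2 library, create a table, insert and query data, export and import data with `copy_to()` and `copy_from()`, and close the connection.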

## Want to learn more?
### Free courses on <a href="https://bigdatauniversity.com/courses/?utm_source=tutorial-dashdb-python&utm_medium=github&utm_campaign=bdu/" rel="noopener noreferrer" target="_blank">Big Data University</a>: <a href="https://bigdatauniversity.com/courses/?utm_source=tutorial-dashdb-python&utm_medium=github&utm_campaign=bdu" rel="noopener noreferrer" target="_blank"><img src = "https://ibm.box.com/shared/static/xomeu7dacwufkoawbg3owc8wzuezltn6.png" width=600px> </a>

### Authors

**Saeed Aghabozorgi**, PhD, is a Data Scientist in IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods like machine learning and statistical modelling on large data sets.

**Polong Lin** is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through Big Data University. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.

Copyright © 2017 Big Data University. This notebook and its source code are released under the terms of the <a href="https://bigdatauniversity.com/mit-license/" rel="noopener noreferrer" target="_blank">MIT License</a>.