# Database Access
* Author: Johannes Maucher
* Last Update: 13.07.2017
* References: 
    * http://jgardiner.co.uk/blog/read_sql_pandas
    * PostgreSQL Online Manual: [https://www.postgresql.org/docs/8.4/static/queries-table-expressions.html](https://www.postgresql.org/docs/8.4/static/queries-table-expressions.html).
    * SQL Queries in Pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#sql-queries


## Preliminaries
1. Download and install [PostgreSQL](https://www.postgresql.org/)
2. Download and install pgAdmin. pgAdmin is an open source administration and development platform for PostgreSQL databases.
3. In pgAdmin, create a new database, as described for example in https://www.pgadmin.org/docs/pgadmin4/dev/modifying_tables.html.

In the code below, the database is named *dataScienceExp*. Use your own database and table names instead.

In [1]:
import psycopg2  # PostgreSQL database adapter
import numpy as np
import json  # needed to read the JSON config file
import pandas as pd
from sqlalchemy import create_engine

## Connect to Database
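The database-connection parameters are stored in a JSON file. [configTemplate.json](configTemplate.json) is provided as a template; the code below loads a local file named `configLocalDS.json`, presumably a filled-in copy of that template. Its contents are loaded into a Python dictionary `conf` as follows: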

In [2]:
with open('configLocalDS.json') as f:
    conf = json.load(f)
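
For orientation, the following sketch writes and reloads a config file containing the four keys that this notebook reads later (`host`, `database`, `user`, `passw`). The values and the file name `configExample.json` are placeholders chosen for illustration, not the author's actual settings.

```python
import json
import os
import tempfile

# Placeholder values; only the four key names are taken from this notebook.
example_conf = {
    "host": "localhost",
    "database": "dataScienceExp",
    "user": "postgres",
    "passw": "change-me",
}

# Write the example into a temporary directory so no real config is overwritten.
path = os.path.join(tempfile.mkdtemp(), "configExample.json")
with open(path, "w") as f:
    json.dump(example_conf, f, indent=4)

# Load it the same way as in the cell above.
with open(path) as f:
    loaded = json.load(f)

# Check that every key used later in the notebook is present.
required = ["host", "database", "user", "passw"]
missing = [key for key in required if key not in loaded]
print("missing keys:", missing)
```

Keeping the credentials in a separate file like this means they do not have to appear in the notebook itself.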

In [3]:
print("Name of database is: ", conf['database'])

Name of database is:  dataScienceExp


In [4]:
conn_str = "host={} dbname={} user={} password={}".format(conf["host"], conf["database"], conf["user"], conf["passw"])
conn = psycopg2.connect(conn_str)
#conn = psycopg2.connect(host=awsDB,dbname="hrv_web",user=awsUser,password=awsPw)
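
A formatted connection string can break when a value contains spaces or quotes. As the commented-out line above indicates, `psycopg2.connect` also accepts the parameters as keyword arguments, which avoids that quoting problem. The sketch below only builds the keyword dictionary; `connect_kwargs` and the example values are hypothetical, and the actual `connect` call is left as a comment because it needs a running server.

```python
def connect_kwargs(conf):
    """Map the notebook's config keys to psycopg2.connect keyword names."""
    return {
        "host": conf["host"],
        "dbname": conf["database"],
        "user": conf["user"],
        "password": conf["passw"],
    }

# Placeholder config; the password contains a space on purpose.
conf_example = {
    "host": "localhost",
    "database": "dataScienceExp",
    "user": "postgres",
    "passw": "two words",
}
kwargs = connect_kwargs(conf_example)

# With a running PostgreSQL server one could then call (not executed here):
#   conn = psycopg2.connect(**kwargs)
```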

## Without Pandas

In [5]:
cur = conn.cursor()

In [6]:
cur.execute("""SELECT * FROM cartable""")

In [7]:
rows=cur.fetchall()

In [8]:
for a in rows:
    print(np.array(a))

['Mazda RX4' Decimal('21.0') Decimal('6') Decimal('160.0') Decimal('110')
 Decimal('3.90') Decimal('2.620') Decimal('16.46') Decimal('0')
 Decimal('1') Decimal('4') Decimal('4')]
['Mazda RX4 Wag' Decimal('21.0') Decimal('6') Decimal('160.0')
 Decimal('110') Decimal('3.90') Decimal('2.875') Decimal('17.02')
 Decimal('0') Decimal('1') Decimal('4') Decimal('4')]
['Datsun 710' Decimal('22.8') Decimal('4') Decimal('108.0') Decimal('93')
 Decimal('3.85') Decimal('2.320') Decimal('18.61') Decimal('1')
 Decimal('1') Decimal('4') Decimal('1')]
['Hornet 4 Drive' Decimal('21.4') Decimal('6') Decimal('258.0')
 Decimal('110') Decimal('3.08') Decimal('3.215') Decimal('19.44')
 Decimal('1') Decimal('0') Decimal('3') Decimal('1')]
['Hornet Sportabout' Decimal('18.7') Decimal('8') Decimal('360.0')
 Decimal('175') Decimal('3.15') Decimal('3.440') Decimal('17.02')
 Decimal('0') Decimal('0') Decimal('3') Decimal('2')]
['Valiant' Decimal('18.1') Decimal('6') Decimal('225.0') Decimal('105')
 Decimal('2.76')
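
The output shows that the numeric columns arrive as `decimal.Decimal` objects, so `np.array(a)` yields an object array rather than a numeric one. A minimal sketch of converting the numeric part of a row to floats, using the first row printed above as sample data:

```python
from decimal import Decimal
import numpy as np

# First row of the output above, in the form returned by cur.fetchall().
row = ('Mazda RX4', Decimal('21.0'), Decimal('6'), Decimal('160.0'),
       Decimal('110'), Decimal('3.90'), Decimal('2.620'), Decimal('16.46'),
       Decimal('0'), Decimal('1'), Decimal('4'), Decimal('4'))

as_objects = np.array(row)               # mixed str/Decimal gives dtype=object
name = row[0]                            # the car name stays a string
values = np.array(row[1:], dtype=float)  # Decimal values converted to float64

print(as_objects.dtype, values.dtype)
```

Converting to `float` gives up the exact decimal representation, which is usually acceptable for numerical analysis.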

In [9]:
cur.close()
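
The cursor steps above follow the Python DB-API 2.0, so the same pattern can be tried without a PostgreSQL server. The sketch below uses an in-memory SQLite database from the standard library as a stand-in (an assumption made only so the example is self-contained), with three rows and three columns taken from the output above. It also passes the filter value as a query parameter; note that `sqlite3` uses `?` placeholders, whereas `psycopg2` uses `%s`.

```python
import sqlite3

# Stand-in for the PostgreSQL connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cartable (carname TEXT, mpg REAL, cyl INTEGER)")
cur.executemany(
    "INSERT INTO cartable VALUES (?, ?, ?)",
    [("Mazda RX4", 21.0, 6), ("Datsun 710", 22.8, 4), ("Hornet Sportabout", 18.7, 8)],
)

# Pass values as parameters instead of formatting them into the SQL string.
cur.execute("SELECT * FROM cartable WHERE cyl >= ? ORDER BY mpg", (6,))
columns = [d[0] for d in cur.description]  # column names of the result set
rows = cur.fetchall()                      # list of tuples, one per row

for row in rows:
    print(dict(zip(columns, row)))

cur.close()
conn.close()
```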

## Read entire data into Pandas Dataframe

In [10]:
df = pd.read_sql('select * from cartable', con=conn)

In [11]:
print(df.shape)

(32, 12)


In [12]:
print(df)

                carname   mpg  cyl   disp     hp  drat     wt   qsec   vs  \
0             Mazda RX4  21.0  6.0  160.0  110.0  3.90  2.620  16.46  0.0   
1         Mazda RX4 Wag  21.0  6.0  160.0  110.0  3.90  2.875  17.02  0.0   
2            Datsun 710  22.8  4.0  108.0   93.0  3.85  2.320  18.61  1.0   
3        Hornet 4 Drive  21.4  6.0  258.0  110.0  3.08  3.215  19.44  1.0   
4     Hornet Sportabout  18.7  8.0  360.0  175.0  3.15  3.440  17.02  0.0   
5               Valiant  18.1  6.0  225.0  105.0  2.76  3.460  20.22  1.0   
6            Duster 360  14.3  8.0  360.0  245.0  3.21  3.570  15.84  0.0   
7             Merc 240D  24.4  4.0  146.7   62.0  3.69  3.190  20.00  1.0   
8              Merc 230  22.8  4.0  140.8   95.0  3.92  3.150  22.90  1.0   
9              Merc 280  19.2  6.0  167.6  123.0  3.92  3.440  18.30  1.0   
10            Merc 280C  17.8  6.0  167.6  123.0  3.92  3.440  18.90  1.0   
11           Merc 450SE  16.4  8.0  275.8  180.0  3.07  4.070  17.40  0.0   

In [13]:
conn.close()
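
When given a query string, `pd.read_sql` dispatches to `pd.read_sql_query`. The following sketch reproduces the step on an in-memory SQLite database (a stand-in assumed here so that the example runs without PostgreSQL) and shows the optional `index_col` argument, which turns a result column into the DataFrame index.

```python
import sqlite3
import pandas as pd

# Stand-in database with three rows taken from the table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cartable (carname TEXT, mpg REAL, cyl INTEGER)")
conn.executemany(
    "INSERT INTO cartable VALUES (?, ?, ?)",
    [("Mazda RX4", 21.0, 6), ("Datsun 710", 22.8, 4), ("Hornet Sportabout", 18.7, 8)],
)
conn.commit()

# Default: a fresh integer index, all result columns become DataFrame columns.
df = pd.read_sql_query("SELECT * FROM cartable", con=conn)

# index_col: use the car name as the index instead.
df_indexed = pd.read_sql_query("SELECT * FROM cartable", con=conn, index_col="carname")
conn.close()

print(df.shape)
print(df_indexed.loc["Datsun 710", "mpg"])
```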

## Using SQLAlchemy and Pandas

In [41]:
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres@localhost:5432/dataScienceExp')
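
The engine URL above hard-codes user, host and database name and contains no password. A possible alternative, sketched below, is to assemble the URL from the `conf` dictionary used earlier. The helper `build_engine_url` and the example password are hypothetical; the port 5432 is taken from the URL above. User and password are percent-encoded so that characters such as `@` or `/` do not break the URL.

```python
from urllib.parse import quote

def build_engine_url(conf, port=5432):
    """Build a postgresql:// URL from the config dictionary."""
    return "postgresql://{user}:{passw}@{host}:{port}/{database}".format(
        user=quote(conf["user"], safe=""),    # percent-encode special characters
        passw=quote(conf["passw"], safe=""),
        host=conf["host"],
        port=port,
        database=conf["database"],
    )

# Placeholder config; the password contains '@' and '/' on purpose.
conf_example = {
    "host": "localhost",
    "database": "dataScienceExp",
    "user": "postgres",
    "passw": "p@ss/word",
}
url = build_engine_url(conf_example)
print(url)

# With SQLAlchemy installed and a running server (not executed here):
#   engine = create_engine(url)
```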

### Read entire table

In [42]:
with engine.connect() as conn, conn.begin():
    data = pd.read_sql_table('cartable',conn)

In [43]:
data

|   | carname | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Mazda RX4 | 21.0 | 6.0 | 160.0 | 110.0 | 3.9 | 2.62 | 16.46 | 0.0 | 1.0 | 4.0 | 4.0 |
| 1 | Mazda RX4 Wag | 21.0 | 6.0 | 160.0 | 110.0 | 3.9 | 2.875 | 17.02 | 0.0 | 1.0 | 4.0 | 4.0 |
| 2 | Datsun 710 | 22.8 | 4.0 | 108.0 | 93.0 | 3.85 | 2.32 | 18.61 | 1.0 | 1.0 | 4.0 | 1.0 |
| 3 | Hornet 4 Drive | 21.4 | 6.0 | 258.0 | 110.0 | 3.08 | 3.215 | 19.44 | 1.0 | 0.0 | 3.0 | 1.0 |
| 4 | Hornet Sportabout | 18.7 | 8.0 | 360.0 | 175.0 | 3.15 | 3.44 | 17.02 | 0.0 | 0.0 | 3.0 | 2.0 |
| 5 | Valiant | 18.1 | 6.0 | 225.0 | 105.0 | 2.76 | 3.46 | 20.22 | 1.0 | 0.0 | 3.0 | 1.0 |
| 6 | Duster 360 | 14.3 | 8.0 | 360.0 | 245.0 | 3.21 | 3.57 | 15.84 | 0.0 | 0.0 | 3.0 | 4.0 |
| 7 | Merc 240D | 24.4 | 4.0 | 146.7 | 62.0 | 3.69 | 3.19 | 20.0 | 1.0 | 0.0 | 4.0 | 2.0 |
| 8 | Merc 230 | 22.8 | 4.0 | 140.8 | 95.0 | 3.92 | 3.15 | 22.9 | 1.0 | 0.0 | 4.0 | 2.0 |
| 9 | Merc 280 | 19.2 | 6.0 | 167.6 | 123.0 | 3.92 | 3.44 | 18.3 | 1.0 | 0.0 | 4.0 | 4.0 |


### SQL-Queries

In [44]:
cyl8=pd.read_sql_query("SELECT carname,mpg,cyl,carb FROM cartable WHERE cyl=8 ORDER BY mpg",engine)

In [45]:
print(cyl8)

                carname   mpg  cyl  carb
0    Cadillac Fleetwood  10.4  8.0   4.0
1   Lincoln Continental  10.4  8.0   4.0
2            Camaro Z28  13.3  8.0   4.0
3            Duster 360  14.3  8.0   4.0
4     Chrysler Imperial  14.7  8.0   4.0
5         Maserati Bora  15.0  8.0   8.0
6           Merc 450SLC  15.2  8.0   3.0
7           AMC Javelin  15.2  8.0   2.0
8      Dodge Challenger  15.5  8.0   2.0
9        Ford Pantera L  15.8  8.0   4.0
10           Merc 450SE  16.4  8.0   3.0
11           Merc 450SL  17.3  8.0   3.0
12    Hornet Sportabout  18.7  8.0   2.0
13     Pontiac Firebird  19.2  8.0   2.0
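
If the table has already been loaded into a DataFrame, the same selection can also be expressed in pandas. The sketch below mirrors the `WHERE`, `ORDER BY` and column list of the query above on four rows copied from the table output; the variable name `cyl8_local` is only illustrative.

```python
import pandas as pd

# Four rows copied from the cartable output above.
data = pd.DataFrame({
    "carname": ["Mazda RX4", "Datsun 710", "Hornet Sportabout", "Duster 360"],
    "mpg": [21.0, 22.8, 18.7, 14.3],
    "cyl": [6.0, 4.0, 8.0, 8.0],
    "carb": [4.0, 1.0, 2.0, 4.0],
})

# WHERE cyl=8 -> boolean mask; column list -> second argument of .loc;
# ORDER BY mpg -> sort_values.
cyl8_local = (
    data.loc[data["cyl"] == 8, ["carname", "mpg", "cyl", "carb"]]
        .sort_values("mpg")
        .reset_index(drop=True)
)
print(cyl8_local)
```

Filtering in the database, as in the query above, has the advantage that only the matching rows are transferred to Python.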


### Write Pandas dataframe into database

Read data from a .csv file into a Pandas DataFrame:

In [76]:
insuranceDF=pd.read_csv("../../R/Lecture/data/insurance.csv",sep=",",header=0,index_col=False)

In [78]:
insuranceDF.shape

(1338, 7)

In [90]:
print(insuranceDF.head())

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
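
To illustrate the `read_csv` arguments used above without the original file, the sketch below parses the five rows shown in the output from an in-memory string. It assumes that the file starts with a header line containing exactly these column names, which is inferred from the output and not confirmed by the source.

```python
import io
import pandas as pd

# The five rows printed above, written as CSV text (header line assumed).
csv_text = """age,sex,bmi,children,smoker,region,charges
19,female,27.900,0,yes,southwest,16884.92400
18,male,33.770,1,no,southeast,1725.55230
28,male,33.000,3,no,southeast,4449.46200
33,male,22.705,0,no,northwest,21984.47061
32,male,28.880,0,no,northwest,3866.85520
"""

# sep=","         : fields are comma-separated
# header=0        : the first line holds the column names
# index_col=False : do not use any column as the row index
sample = pd.read_csv(io.StringIO(csv_text), sep=",", header=0, index_col=False)
print(sample.dtypes)
```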


Write the Pandas DataFrame into a new table in the PostgreSQL database:

In [93]:
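from sqlalchemy import inspect

# Engine.has_table() was removed in SQLAlchemy 2.0; the inspector offers the same check (SQLAlchemy 1.4+).
if not inspect(engine).has_table("insurancetable"):
    insuranceDF.to_sql(name='insurancetable', index=True, index_label='index', con=engine)
else:
    print("table already exists")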

table already exists
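
The existence check above guards against the error that `to_sql` raises by default when the target table already exists. The `if_exists` parameter of `to_sql` controls this behaviour (`'fail'`, `'replace'` or `'append'`). The sketch below demonstrates it on an in-memory SQLite database, used here only as a stand-in for the PostgreSQL engine, with a small illustrative DataFrame.

```python
import sqlite3
import pandas as pd

frame = pd.DataFrame({"age": [19, 18], "sex": ["female", "male"], "bmi": [27.9, 33.77]})
conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL engine

# First write creates the table.
frame.to_sql(name="insurancetable", con=conn, index=True, index_label="index")

# The default if_exists="fail" raises ValueError when the table already exists.
try:
    frame.to_sql(name="insurancetable", con=conn, index=True, index_label="index")
    second_write = "written"
except ValueError:
    second_write = "rejected"

# if_exists="replace" drops the existing table and writes it again.
frame.to_sql(name="insurancetable", con=conn, index=True, index_label="index",
             if_exists="replace")
n_rows = conn.execute("SELECT COUNT(*) FROM insurancetable").fetchone()[0]
conn.close()

print(second_write, n_rows)
```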


Check that the data in the new table can be read back:

In [94]:
with engine.connect() as conn, conn.begin():
    data = pd.read_sql_table('insurancetable',conn)

In [96]:
print(data.head())

   index  age     sex     bmi  children smoker     region      charges
0      0   19  female  27.900         0    yes  southwest  16884.92400
1      1   18    male  33.770         1     no  southeast   1725.55230
2      2   28    male  33.000         3     no  southeast   4449.46200
3      3   33    male  22.705         0     no  northwest  21984.47061
4      4   32    male  28.880         0     no  northwest   3866.85520
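
Beyond inspecting the first rows, one could compare the table that was read back with the original DataFrame. A minimal sketch of such a round-trip check follows, again on an in-memory SQLite database as a stand-in and with a three-row excerpt of the insurance data; `original` and `restored` are illustrative names.

```python
import sqlite3
import pandas as pd

original = pd.DataFrame({
    "age": [19, 18, 28],
    "sex": ["female", "male", "male"],
    "bmi": [27.9, 33.77, 33.0],
})

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL engine
original.to_sql(name="insurancetable", con=conn, index=True, index_label="index")

# Read the table back and restore the written index column as the index.
restored = pd.read_sql_query(
    'SELECT * FROM insurancetable ORDER BY "index"', con=conn, index_col="index"
)
conn.close()

# Compare column names and row contents.
same_columns = list(restored.columns) == list(original.columns)
same_values = restored.to_dict("records") == original.to_dict("records")
print(same_columns, same_values)
```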


In [99]:
child3=pd.read_sql_query("SELECT * FROM insurancetable WHERE children > 3 ORDER BY children",engine)

In [100]:
child3

|   | index | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|---|
| 0 | 344 | 49 | female | 41.47 | 4 | no | southeast | 10977.2063 |
| 1 | 390 | 48 | male | 35.625 | 4 | no | northeast | 10736.87075 |
| 2 | 83 | 48 | female | 41.23 | 4 | no | northwest | 11033.6617 |
| 3 | 165 | 47 | male | 28.215 | 4 | no | northeast | 10407.08585 |
| 4 | 1012 | 61 | female | 33.33 | 4 | no | southeast | 36580.28216 |
| 5 | 1064 | 29 | female | 25.6 | 4 | no | southwest | 5708.867 |
| 6 | 61 | 25 | male | 33.66 | 4 | no | southeast | 4504.6624 |
| 7 | 1094 | 50 | female | 33.7 | 4 | no | southwest | 11299.343 |
| 8 | 1095 | 18 | female | 31.35 | 4 | no | northeast | 4561.1885 |
| 9 | 450 | 39 | male | 29.6 | 4 | no | southwest | 7512.267 |
