# DSDJ TWS - Connecting to SQL Databases with Python

* We will connect to a Postgres database hosted on AWS RDS
* We will check our access and read SQL tables
* We will then use pandas to manipulate the resulting dataframes
* Finally, we will write a dataframe to a new SQL table

## Prerequisites

We will use the following libraries:
* Install https://pypi.org/project/ipython-sql/
* Install https://pypi.org/project/SQLAlchemy/
* Install https://www.psycopg.org/docs/

### Resources
    
* https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls
* https://github.com/catherinedevlin/ipython-sql
* https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

### Import libraries

In [1]:
from sqlalchemy import create_engine
import pandas as pd
import getpass

### Create a SQLAlchemy connection to the database

In [2]:
database_host = "dsdj-postgres-db.clpvihbunw2c.ap-southeast-2.rds.amazonaws.com"
database_name = "postgres"
database_user = "postgres"

userpass = getpass.getpass("Password :")

Password :········


In [3]:
connection_str = database_user+":"+userpass+"@"+database_host
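Plain string concatenation works as long as the password contains no characters that have a special meaning in a URL (for example `@`, `:` or `/`). A minimal sketch of one way to guard against that, assuming percent-encoding of the password is acceptable for the driver; the helper name `build_connection_str` and the credentials below are hypothetical and only illustrate the escaping:

```python
from urllib.parse import quote


def build_connection_str(user: str, password: str, host: str) -> str:
    """Return 'user:password@host' with the password percent-encoded."""
    # safe="" makes quote() escape every reserved character, including
    # '@', ':' and '/', which would otherwise be read as URL delimiters.
    return f"{user}:{quote(password, safe='')}@{host}"


# Hypothetical credentials, used only to show the escaping.
example = build_connection_str("postgres", "p@ss/word", "db.example.com")
print(example)  # postgres:p%40ss%2Fword@db.example.com
```

The returned string can be used in the same places as `connection_str` above.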

In [4]:
engine = create_engine("postgresql+psycopg2://"+connection_str, echo=False)
engine.dialect.identifier_preparer.initial_quote = ''
engine.dialect.identifier_preparer.final_quote = ''
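`create_engine` does not open a connection by itself, so a wrong host or password only shows up on the first query. A quick connectivity check can be sketched as follows. To keep the sketch runnable offline it uses an in-memory SQLite engine (`demo_engine`, a stand-in that is not part of the notebook); the same pattern should apply to the Postgres `engine` above.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the Postgres engine (assumption for the demo).
demo_engine = create_engine("sqlite://")

# Opening a connection and running a trivial query confirms the engine works.
with demo_engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print(result)  # 1
```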

### Create an ipython-sql connection to the database

In [5]:
%load_ext sql

In [6]:
%sql postgresql://$connection_str

'Connected: postgres@None'

### We have successfully connected to the AWS Postgres database - let's query it

#### Check the Postgres SQL database tables

In [7]:
%%sql
SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';

 * postgresql://postgres:***@dsdj-postgres-db.clpvihbunw2c.ap-southeast-2.rds.amazonaws.com
0 rows affected.


schemaname,tablename,tableowner,tablespace,hasindexes,hasrules,hastriggers,rowsecurity
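The same table listing can also be obtained from Python without writing catalog SQL, using SQLAlchemy's inspector. This is a sketch under the assumption that an in-memory SQLite engine (`demo_engine`, with a hypothetical `car` table) is an acceptable stand-in for the Postgres engine:

```python
from sqlalchemy import create_engine, inspect, text

# Stand-in engine with one table so the sketch runs without a server.
demo_engine = create_engine("sqlite://")
with demo_engine.begin() as conn:
    conn.execute(text("CREATE TABLE car (vin TEXT, brand TEXT)"))

# get_table_names() lists the tables in the default schema.
table_names = inspect(demo_engine).get_table_names()
print(table_names)  # ['car']
```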


#### Check the CAR table - in two different ways

* First way - in an "interactive" way

In [147]:
%%sql
SELECT * FROM customers

 * postgresql://postgres:***@dsdj-postgres-db.clpvihbunw2c.ap-southeast-2.rds.amazonaws.com
(psycopg2.errors.UndefinedTable) relation "customers" does not exist
LINE 1: SELECT * FROM customers
                      ^

[SQL: SELECT * FROM customers]
(Background on this error at: http://sqlalche.me/e/14/f405)


* Second way - in a scripting way

In [38]:
table_name = "CAR"
query_str = "SELECT * FROM " + table_name
pd.read_sql_query(query_str, engine)

Unnamed: 0,vin,brand,model,price,production_year
0,LJCPCBLCX14500264,Ford,Focus,8000.0,2005
1,WPOZZZ79ZTS372128,Ford,Fusion,12500.0,2008
2,JF1BR93D7BG498281,Toyota,Avensis,11300.0,1999
3,KLATF08Y1VB363636,Volkswagen,Golf,3270.0,1992
4,1M8GDM9AXKP042788,Volkswagen,Golf,13000.0,2010
5,1HGCM82633A004352,Volkswagen,Jetta,6420.0,2003
6,1G1YZ23J9P5800003,Fiat,Punto,5700.0,1999
7,GS723HDSAK2399002,Opel,Corsa,,2007
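Table names cannot be passed as bound parameters, which is why the query above is built by concatenation. If `table_name` ever comes from user input, it may be worth validating it first. A minimal sketch, assuming simple unquoted identifiers are all that is needed; `build_select_all` is a hypothetical helper, not part of pandas or SQLAlchemy:

```python
import re

# Letters, digits and underscores only, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select_all(table_name: str) -> str:
    """Return 'SELECT * FROM <table_name>' after checking the identifier."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    return "SELECT * FROM " + table_name


query_str = build_select_all("CAR")
print(query_str)  # SELECT * FROM CAR
```

A name such as `"CAR; DROP TABLE CAR"` raises `ValueError` instead of reaching the database.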


#### Save the result in a dataframe - also in two different ways

* **First way** - in an "interactive" way

In [24]:
%%sql res <<
SELECT * FROM CAR

 * postgresql://postgres:***@dsdj-postgres-db.clpvihbunw2c.ap-southeast-2.rds.amazonaws.com
8 rows affected.
Returning data to local variable res


In [25]:
car_df = res.DataFrame()
car_df.head()

Unnamed: 0,vin,brand,model,price,production_year
0,LJCPCBLCX14500264,Ford,Focus,8000,2005
1,WPOZZZ79ZTS372128,Ford,Fusion,12500,2008
2,JF1BR93D7BG498281,Toyota,Avensis,11300,1999
3,KLATF08Y1VB363636,Volkswagen,Golf,3270,1992
4,1M8GDM9AXKP042788,Volkswagen,Golf,13000,2010


In [26]:
car_df.shape

(8, 5)

* **Second way** - in a scripting way

In [39]:
car2_df = pd.read_sql_query(query_str, engine)
car2_df.head()

Unnamed: 0,vin,brand,model,price,production_year
0,LJCPCBLCX14500264,Ford,Focus,8000.0,2005
1,WPOZZZ79ZTS372128,Ford,Fusion,12500.0,2008
2,JF1BR93D7BG498281,Toyota,Avensis,11300.0,1999
3,KLATF08Y1VB363636,Volkswagen,Golf,3270.0,1992
4,1M8GDM9AXKP042788,Volkswagen,Golf,13000.0,2010


#### Write a dataframe back to the database

In [75]:
# select only VW cars from the car dataframe 
filt = car_df['brand'] == "Volkswagen"
vw_car_df = car_df[filt]
vw_car_df

Unnamed: 0,vin,brand,model,price,production_year
3,KLATF08Y1VB363636,Volkswagen,Golf,3270,1992
4,1M8GDM9AXKP042788,Volkswagen,Golf,13000,2010
5,1HGCM82633A004352,Volkswagen,Jetta,6420,2003


In [76]:
# write it back to the database
vw_car_df.to_sql("vw_cars", con = engine, index = False)
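By default `to_sql` uses `if_exists="fail"`, so running the cell above a second time raises a `ValueError` because `vw_cars` already exists. Passing `if_exists="replace"` (or `"append"`) makes the write repeatable. The sketch below shows this against an in-memory SQLite connection, which is an assumption made so that it runs without the RDS instance; the rows are taken from the output above.

```python
import sqlite3

import pandas as pd

vw_car_df = pd.DataFrame(
    {
        "vin": ["KLATF08Y1VB363636", "1M8GDM9AXKP042788", "1HGCM82633A004352"],
        "brand": ["Volkswagen"] * 3,
        "model": ["Golf", "Golf", "Jetta"],
        "price": [3270, 13000, 6420],
        "production_year": [1992, 2010, 2003],
    }
)

# Stand-in database; with Postgres you would pass the SQLAlchemy engine.
conn = sqlite3.connect(":memory:")

# First write creates the table.
vw_car_df.to_sql("vw_cars", con=conn, index=False)

# Second write with the default if_exists="fail" is rejected.
try:
    vw_car_df.to_sql("vw_cars", con=conn, index=False)
    second_write_failed = False
except ValueError:
    second_write_failed = True

# if_exists="replace" drops and recreates the table instead.
vw_car_df.to_sql("vw_cars", con=conn, index=False, if_exists="replace")

row_count = int(
    pd.read_sql_query("SELECT COUNT(*) AS n FROM vw_cars", conn)["n"].iloc[0]
)
print(second_write_failed, row_count)  # True 3
```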

#### Check if we successfully created a new table

In [77]:
# look at the table list
query_str = '''SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND schemaname != 'information_schema';'''

pd.read_sql_query(query_str, engine)

Unnamed: 0,schemaname,tablename,tableowner,tablespace,hasindexes,hasrules,hastriggers,rowsecurity
0,public,vw_cars,postgres,,False,False,False,False


In [50]:
# query the vw table

In [74]:
%%sql
SELECT * FROM vw_cars

 * postgresql://postgres:***@dsdj-postgres-db.clpvihbunw2c.ap-southeast-2.rds.amazonaws.com
(psycopg2.errors.UndefinedTable) relation "vw_cars" does not exist
LINE 1: SELECT * FROM vw_cars
                      ^

[SQL: SELECT * FROM vw_cars]
(Background on this error at: http://sqlalche.me/e/14/f405)
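The `UndefinedTable` error above is consistent with the cell having been run before the table was written: its execution count (74) precedes that of the `to_sql` cell (76). One way to avoid such surprises in a script is to check for the table first. A minimal sketch using SQLAlchemy's inspector (`has_table` is available from SQLAlchemy 1.4), again with an in-memory SQLite engine as a stand-in:

```python
from sqlalchemy import create_engine, inspect, text

# Stand-in engine (assumption); replace with the Postgres engine in practice.
demo_engine = create_engine("sqlite://")

# Before the table is created the check returns False.
exists_before = inspect(demo_engine).has_table("vw_cars")

with demo_engine.begin() as conn:
    conn.execute(text("CREATE TABLE vw_cars (vin TEXT)"))

# A fresh inspector sees the newly created table.
exists_after = inspect(demo_engine).has_table("vw_cars")

print(exists_before, exists_after)  # False True
```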


#### Drop a table

In [52]:
%%sql 

DROP TABLE vw_cars

 * postgresql://postgres:***@dsdj-postgres-db.clpvihbunw2c.ap-southeast-2.rds.amazonaws.com
Done.


[]
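The drop can also be done in a scripting way. `DROP TABLE IF EXISTS` is convenient in notebooks because re-running the cell does not fail when the table is already gone. The sketch below assumes an in-memory SQLite engine as a stand-in for the Postgres engine; the statement itself is valid in both databases.

```python
from sqlalchemy import create_engine, inspect, text

# Stand-in engine with a throwaway table (assumption for the demo).
demo_engine = create_engine("sqlite://")

with demo_engine.begin() as conn:
    conn.execute(text("CREATE TABLE vw_cars (vin TEXT)"))
    # The first statement drops the table; the second is a harmless no-op.
    conn.execute(text("DROP TABLE IF EXISTS vw_cars"))
    conn.execute(text("DROP TABLE IF EXISTS vw_cars"))

remaining = inspect(demo_engine).get_table_names()
print(remaining)  # []
```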