# Psycopg2 and SQLAlchemy

We have seen how to make queries using pgAdmin4. From the same platform, we can export the output of our query into a CSV file:

![](images/pgadmin_exportCSV.png)

But this defeats the purpose of keeping everything simple and automated. We would need to 1) run the query, 2) export the result to a .csv file, 3) save it in our directory, and 4) open it in Python with the csv library or pandas. All of these steps are manual. Isn't there a way to build a pipeline instead?

## Psycopg2

_'Psycopg is the most popular PostgreSQL adapter for the Python programming language. Its core is a complete implementation of the Python DB API 2.0 specifications. Several extensions allow access to many of the features offered by PostgreSQL.'_ (Psycopg documentation)

Psycopg is a fairly simple DBAPI (Database API) to use. You just need the details of your server and the name of the database you want to connect to:

![](images/psycopg2.png)

We need two objects to talk to our SQL server: a connection and a cursor.

- The `connect` function establishes the connection to our database and returns a connection object
- The `cursor` object, created from the connection, is what we use to send queries to the database

Apart from that, we will use two methods of the cursor:

- `execute` takes the query that we want to run, written as a string
- `fetchall` retrieves the output of the query (use it when you are reading entries from your database with `SELECT`)
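
The connection and cursor pattern described above is shared by every driver that follows the Python DB API 2.0. As a minimal sketch that runs without a PostgreSQL server, the example below uses the standard library's `sqlite3` module as a stand-in for `psycopg2`. The `actor` table and its two rows are made up for illustration.

```python
import sqlite3

# sqlite3 stands in for psycopg2 here (assumption: no PostgreSQL server is
# available); both modules follow the Python DB API 2.0.
conn = sqlite3.connect(":memory:")  # connect() returns a connection object
cur = conn.cursor()                 # the cursor sends queries through the connection

# execute() takes the SQL statement as a string
cur.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT)")
cur.execute("INSERT INTO actor (first_name) VALUES ('PENELOPE'), ('NICK')")

# fetchall() returns the result of the last SELECT as a list of tuples
cur.execute("SELECT actor_id, first_name FROM actor ORDER BY actor_id")
records = cur.fetchall()
print(records)

cur.close()
conn.close()
```

With `psycopg2`, the same sequence of calls applies. Only the arguments to `connect` change (host, user, password, database name and port).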

In [2]:
import psycopg2
HOST = 'localhost'
USER = 'postgres'
PASSWORD = 'password'  # replace with your own password
DATABASE = 'pagila'
PORT = 5432

with psycopg2.connect(host=HOST, user=USER, password=PASSWORD, dbname=DATABASE, port=PORT) as conn:
    with conn.cursor() as cur:
        cur.execute('''CREATE TABLE actor_2 AS (
                    SELECT * FROM actor
                    LIMIT 10);

                    SELECT * FROM actor_2''')
        print(type(cur))
        records = cur.fetchall()

<class 'psycopg2.extensions.cursor'>


In case you don't know the tables inside your database, you can run the following query:
`SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'`

In [4]:
import psycopg2
with psycopg2.connect(host='localhost', user='postgres', password='password', dbname='pagila', port=5432) as conn:
    with conn.cursor() as cur:
        cur.execute("""SELECT table_name FROM information_schema.tables
       WHERE table_schema = 'public'""")
        for table in cur.fetchall():
            print(table)

('actor',)
('actor_info',)
('customer_list',)
('film_list',)
('nicer_but_slower_film_list',)
('film',)
('payment_p2007_02',)
('payment_p2007_03',)
('payment_p2007_04',)
('payment_p2007_05',)
('payment_p2007_06',)
('sales_by_film_category',)
('payment_p2007_01',)
('address',)
('category',)
('city',)
('country',)
('customer',)
('film_actor',)
('film_category',)
('inventory',)
('language',)
('rental',)
('staff',)
('sales_by_store',)
('staff_list',)
('store',)
('payment',)
('employee_details',)
('actor_2',)


Observe that when you select something, the `fetchall` method returns a list of tuples, which we then have to convert into a pandas DataFrame ourselves. Additionally, to run a SQL statement such as `INSERT INTO`, we would need to write the right loop to insert the proper rows every time new data arrives.
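
To make that manual work concrete, here is a minimal sketch of both steps using only the DB API: inserting several rows with `executemany`, and recovering the column names from `cursor.description` so that the tuples can be labelled. It again assumes `sqlite3` as a stand-in for `psycopg2`, and the rows in `new_actors` are hypothetical.

```python
import sqlite3

# sqlite3 is a stand-in for psycopg2 (assumption). The main difference in
# this snippet is the placeholder style: sqlite3 uses "?", psycopg2 uses "%s".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)

# Hypothetical new rows to insert
new_actors = [("PENELOPE", "GUINESS"), ("NICK", "WAHLBERG"), ("ED", "CHASE")]

# executemany() runs the parameterized INSERT once per row, so we do not
# have to write the loop (or build SQL strings) by hand
cur.executemany("INSERT INTO actor (first_name, last_name) VALUES (?, ?)", new_actors)
conn.commit()

cur.execute("SELECT actor_id, first_name, last_name FROM actor ORDER BY actor_id")
rows = cur.fetchall()                          # list of plain tuples
columns = [col[0] for col in cur.description]  # column names of the last query
records = [dict(zip(columns, row)) for row in rows]
print(records[0])

conn.close()
```

If pandas is available, `pd.DataFrame(rows, columns=columns)` would turn the same two variables into a DataFrame.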

A very useful toolkit for this purpose is SQLAlchemy. Besides simplifying the code enormously, it has two further benefits:

1. It includes an Object Relational Mapper (ORM), which maps representations of objects to database tables. In other words, the ORM translates between Python objects and SQL tables
2. The second important advantage is the Engine object, which holds information about the type of database (PostgreSQL in this case) and a connection pool. This connection pool allows several connections to the database to be open at the same time. Additionally, the engine is lazy: it does not open a connection until we actually ask it to connect or to run a query

The syntax is as follows:

```
from sqlalchemy import create_engine

engine = create_engine("{type of database}+{DBAPI}://{username}:{password}@{host}:{port}/{database_name}")
```
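
One practical detail of this URL format is that characters such as `@` or `/` inside the password would be read as URL separators. A minimal sketch of one way to avoid that, assuming the standard library's `urllib.parse.quote_plus` is used to percent-encode the password (the credentials below are hypothetical):

```python
from urllib.parse import quote_plus

# Hypothetical credentials, for illustration only
DATABASE_TYPE = "postgresql"
DBAPI = "psycopg2"
USER = "postgres"
PASSWORD = "p@ss/word"  # contains characters that are special inside a URL
HOST = "localhost"
PORT = 5432
DATABASE = "pagila"

# Percent-encode the password so "@" and "/" are not read as URL separators
url = f"{DATABASE_TYPE}+{DBAPI}://{USER}:{quote_plus(PASSWORD)}@{HOST}:{PORT}/{DATABASE}"
print(url)
```

The resulting string can be passed to `create_engine` in place of the hand-written URL.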

In [5]:
from sqlalchemy import create_engine
import pandas as pd
DATABASE_TYPE = 'postgresql'
DBAPI = 'psycopg2'
HOST = 'localhost'
USER = 'postgres'
PASSWORD = 'password'  # replace with your own password
DATABASE = 'pagila'
PORT = 5432
engine = create_engine(f"{DATABASE_TYPE}+{DBAPI}://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}")

If everything went well, the next cell should run without errors:

In [None]:
engine.connect()

You can also use other SQLAlchemy tools to inspect the database. In this case, we will use the `inspect` function. It returns an `Inspector` object, a wrapper around the database that lets us retrieve information about its tables and columns.

In [None]:
from sqlalchemy import inspect
inspector = inspect(engine)
inspector.get_table_names()
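
As a sketch of what the `Inspector` gives back, the example below lists both the tables and the columns of one table. It assumes an in-memory SQLite database with a made-up `actor` table in place of pagila, so it can run without a PostgreSQL server.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite is a stand-in for the pagila database (assumption)
engine = create_engine("sqlite://")
with engine.begin() as conn:  # begin() commits when the block exits
    conn.execute(
        text("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT)")
    )

inspector = inspect(engine)
table_names = inspector.get_table_names()  # list of table names
# get_columns() returns one dictionary per column; we keep only the names
column_names = [col["name"] for col in inspector.get_columns("actor")]
print(table_names)
print(column_names)
```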

In [6]:
from sqlalchemy import text  # Engine.execute() was removed in SQLAlchemy 2.0
engine.connect().execute(text('''SELECT * FROM actor''')).fetchall()

[(1, 'PENELOPE', 'GUINESS', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (2, 'NICK', 'WAHLBERG', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (3, 'ED', 'CHASE', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (4, 'JENNIFER', 'DAVIS', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (5, 'JOHNNY', 'LOLLOBRIGIDA', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (6, 'BETTE', 'NICHOLSON', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (7, 'GRACE', 'MOSTEL', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (8, 'MATTHEW', 'JOHANSSON', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (9, 'JOE', 'SWANK', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (10, 'CHRISTIAN', 'GABLE', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (11, 'ZERO', 'CAGE', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (12, 'KARL', 'BERRY', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (13, 'UMA', 'WOOD', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (14, 'VIVIEN', 'BERGEN', datetime.datetime(2006, 2, 15, 9, 34, 33)),
 (15, 'CUBA', 'OLIVIER', datetime
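
Raw SQL can also be run through an explicitly scoped connection, with values passed as bound parameters instead of being written into the query string. The sketch below assumes SQLAlchemy 1.4 or later and uses in-memory SQLite with a made-up two-row `actor` table in place of pagila.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # stand-in for the PostgreSQL engine (assumption)

with engine.begin() as conn:  # the transaction is committed when the block exits
    conn.execute(
        text("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT)")
    )
    conn.execute(
        text("INSERT INTO actor (first_name) VALUES (:first_name)"),
        [{"first_name": "PENELOPE"}, {"first_name": "NICK"}],
    )

with engine.connect() as conn:  # the connection goes back to the pool on exit
    result = conn.execute(
        text(
            "SELECT actor_id, first_name FROM actor "
            "WHERE actor_id <= :max_id ORDER BY actor_id"
        ),
        {"max_id": 10},
    )
    rows = [tuple(row) for row in result.fetchall()]

print(rows)
```

Using `with` blocks means connections are released as soon as the query is done, rather than staying open for the life of the notebook.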

## Making use of the ORM

As mentioned, thanks to the ORM in SQLAlchemy, we can create a table in our database and insert data into it in a simple way.

One way to do so is with pandas. You can read a specific table from the database using pandas and the engine you just created:

In [7]:
actors = pd.read_sql_table('actor', engine)

In [8]:
actors.head(10)

| | actor_id | first_name | last_name | last_update |
|---|---|---|---|---|
| 0 | 1 | PENELOPE | GUINESS | 2006-02-15 09:34:33 |
| 1 | 2 | NICK | WAHLBERG | 2006-02-15 09:34:33 |
| 2 | 3 | ED | CHASE | 2006-02-15 09:34:33 |
| 3 | 4 | JENNIFER | DAVIS | 2006-02-15 09:34:33 |
| 4 | 5 | JOHNNY | LOLLOBRIGIDA | 2006-02-15 09:34:33 |
| 5 | 6 | BETTE | NICHOLSON | 2006-02-15 09:34:33 |
| 6 | 7 | GRACE | MOSTEL | 2006-02-15 09:34:33 |
| 7 | 8 | MATTHEW | JOHANSSON | 2006-02-15 09:34:33 |
| 8 | 9 | JOE | SWANK | 2006-02-15 09:34:33 |
| 9 | 10 | CHRISTIAN | GABLE | 2006-02-15 09:34:33 |


Or you can read from a query if you feel like it!

In [9]:
actors = pd.read_sql_query('''SELECT * FROM actor LIMIT 10''', engine).set_index('actor_id')
actors

| actor_id | first_name | last_name | last_update |
|---|---|---|---|
| 1 | PENELOPE | GUINESS | 2006-02-15 09:34:33 |
| 2 | NICK | WAHLBERG | 2006-02-15 09:34:33 |
| 3 | ED | CHASE | 2006-02-15 09:34:33 |
| 4 | JENNIFER | DAVIS | 2006-02-15 09:34:33 |
| 5 | JOHNNY | LOLLOBRIGIDA | 2006-02-15 09:34:33 |
| 6 | BETTE | NICHOLSON | 2006-02-15 09:34:33 |
| 7 | GRACE | MOSTEL | 2006-02-15 09:34:33 |
| 8 | MATTHEW | JOHANSSON | 2006-02-15 09:34:33 |
| 9 | JOE | SWANK | 2006-02-15 09:34:33 |
| 10 | CHRISTIAN | GABLE | 2006-02-15 09:34:33 |


You can also use pandas to create tables in your database with the `to_sql` method:

In [22]:
from sklearn.datasets import load_iris
data = load_iris()
iris = pd.DataFrame(data['data'], columns=data['feature_names'])
iris.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [23]:
iris.to_sql('iris_dataset', engine, if_exists='replace')
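
To check that a table written with `to_sql` really arrived, it can be read straight back with `read_sql_query`. The round trip below is a minimal sketch under two assumptions: a `sqlite3` in-memory connection replaces the PostgreSQL engine, and `iris_sample` is a hypothetical three-row miniature of the iris DataFrame.

```python
import sqlite3

import pandas as pd

# Hypothetical miniature of the iris table
iris_sample = pd.DataFrame(
    {"sepal_length_cm": [5.1, 4.9, 4.7], "sepal_width_cm": [3.5, 3.0, 3.2]}
)

# sqlite3 replaces the PostgreSQL engine here (assumption)
conn = sqlite3.connect(":memory:")

# index=False keeps the DataFrame index from being written as an extra column;
# if_exists="replace" drops and recreates the table if it already exists
iris_sample.to_sql("iris_dataset", conn, if_exists="replace", index=False)

# Read the table back to confirm that the rows were written
round_trip = pd.read_sql_query("SELECT * FROM iris_dataset", conn)
print(round_trip)

conn.close()
```

Without `index=False`, as in the cell above, pandas also stores the DataFrame index as a column named `index`.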