# PostgreSQL & WRDS
by Dr Liang Jin

- Part of Mini Python Sessions: [github.com/drliangjin/minipy](https://github.com/drliangjin/minipy)

- Official Guide/Documentation of PostgreSQL: [www.postgresql.org/docs](https://www.postgresql.org/docs/current/tutorial.html)

- WRDS Python Connection: [wrds-www.wharton.upenn.edu](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/python-from-your-computer/)

## Connect to WRDS

### Initial Setup

In [None]:
# Install the wrds package (and its dependencies) with pip if it is missing
try:
    import wrds
except ImportError:
    !pip install wrds
    import wrds

In [None]:
# Create a connection to the WRDS server
# You will be prompted for your username and password

conn = wrds.Connection()

### Create pgpass file (OPTIONAL)

In [None]:
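# Optional: store your password locally in a pgpass file
# so that you are not prompted for it on every connection

# conn.create_pgpass_file()

# Test the connection using the stored password
# (replace your_username with your own WRDS username)
# conn = wrds.Connection(wrds_username = "your_username")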
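
For orientation, a PostgreSQL password file holds one line per connection in the form `hostname:port:database:username:password`, and on Unix it must be readable only by its owner (mode 0600). The sketch below formats such a line. `pgpass_entry` is a hypothetical helper and the host, database, user and password are placeholders, not WRDS values; what `create_pgpass_file()` writes for WRDS is not shown here.

```python
def pgpass_entry(host, port, database, user, password):
    """Format one pgpass line: hostname:port:database:username:password."""
    def escape(field):
        # A backslash or colon inside a field must be escaped with a backslash.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")

    return ":".join(escape(f) for f in (host, port, database, user, password))


# Placeholder values only (assumptions for illustration, not WRDS settings).
line = pgpass_entry("db.example.org", 5432, "mydb", "alice", "s3cr:t")
print(line)
```

The colon inside the placeholder password is escaped so that it is not read as a field separator.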

### WRDS Data Workflow

#### Datasets Overview

In [None]:
# list all WRDS libraries (PostgreSQL schemas)
conn.list_libraries()

In [None]:
# list all tables stored in a specific library
conn.list_tables(library = 'crsp')

In [None]:
# determine the column headers within a given dataset
conn.describe_table(library = "crsp", table = "dse")

#### Peek into a dataset

In [None]:
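# get_table approach
# obs limits the number of rows returned
# columns takes a list with one string per column name
conn.get_table(library = "crsp", table = "dsf", columns = ["cusip", "permno", "permco", "date", "prc", "openprc", "bid", "bidlo", "ask", "askhi", "vol", "ret", "retx"], obs = 10)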

## SQL & PostgreSQL

SQL, or Structured Query Language, is a language designed to let both technical and non-technical users:

- query data
- manipulate data
- transform data

in relational databases. PostgreSQL is arguably the best open-source relational database; other widely used relational databases include MySQL (also open source) and the commercial Oracle Database and Microsoft SQL Server.

#### Basic syntax (1):
- `SELECT`: retrieve data
- `FROM`: from a table, written as **library.table**
- `LIMIT`: limit the number of rows returned; without it we would be stuck waiting on a very large table for a very long time

In [None]:
# SQL query statement

stmt = """
SELECT *
FROM crsp.dsf
LIMIT 10
"""

# Run the query on the WRDS PostgreSQL server
conn.raw_sql(stmt)

#### Basic syntax (2):
- `*`: all columns in a table
- alternative: list only the columns we want, separated by commas

In [None]:
# SQL query statement

stmt = """
SELECT cusip, permno, permco, date, prc, openprc, bid, bidlo, ask, askhi, vol, ret, retx
FROM crsp.dsf
LIMIT 10
"""

# Run the query on the WRDS PostgreSQL server and summarise the result
conn.raw_sql(stmt).info()

#### Basic syntax (3):
- `WHERE`: filter the rows returned by a query with constraints
- `AND/OR`: combine several conditions
- `IS NULL` / `IS NOT NULL`: select rows where a value is missing or not missing

*Numerical Operators*
- `=, !=, <, <=, >, >=`: standard comparison operators
- `BETWEEN...AND...`: number is within a range of two values (inclusive)
- `IN`: number exists in a list
- `NOT`: negate a condition (e.g., `NOT IN`, `NOT BETWEEN`)
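
The operators above can be tried offline before touching WRDS. The sketch below is a stand-in, not the WRDS workflow: it uses Python's built-in `sqlite3` module with an in-memory table named `dsf` whose rows are invented for illustration. The `WHERE` syntax shown is shared by SQLite and PostgreSQL; `run` is a hypothetical helper.

```python
import sqlite3

# Toy stand-in for crsp.dsf: every value below is invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dsf (permno INTEGER, date TEXT, prc REAL, ret REAL)")
db.executemany(
    "INSERT INTO dsf VALUES (?, ?, ?, ?)",
    [
        (14593, "2018-01-02", 172.0, 0.02),
        (14593, "2018-01-03", 171.0, None),  # missing return
        (10107, "2018-01-02", 85.0, 0.12),
        (10107, "2019-01-02", 101.0, -0.01),
        (99999, "2018-01-02", 10.0, 0.30),
    ],
)


def run(where_clause):
    """Return the toy rows that satisfy the given WHERE clause."""
    sql = (
        "SELECT permno, date, ret FROM dsf "
        f"WHERE {where_clause} ORDER BY permno, date"
    )
    return db.execute(sql).fetchall()


in_list = run("permno IN (14593, 10107)")
in_2018 = run("date BETWEEN '2018-01-01' AND '2018-12-31'")
missing = run("ret IS NULL")
combined = run("permno IN (14593, 10107) AND ret IS NOT NULL AND ret > 0.1")

print(missing)   # the single row with a missing return
print(combined)  # rows passing all three conditions
```

Dates are stored here as ISO-formatted text, so `BETWEEN` compares them as strings; in PostgreSQL the `date` column is a true date type and the same literals work.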

In [None]:
# Demo

# SQL query statement
stmt = """
SELECT cusip, permno, permco, date, prc, openprc, bid, bidlo, ask, askhi, vol, ret, retx
FROM crsp.dsf
WHERE permno = 14593
"""

# Run the query on the WRDS PostgreSQL server and show the first rows
conn.raw_sql(stmt).head()

### Task Set 1:
1. Find the data for permno 14593 and 10107 (Apple and Microsoft)
2. Restrict the above data to dates between 01/01/2000 and 31/12/2018
3. Restrict the above data to returns higher than 10%

#### Basic syntax (4):
- `ORDER BY`: sort the results by one or more columns (ascending by default, `DESC` for descending)
- `LIMIT` and `OFFSET` (optional): return at most a given number of rows, optionally skipping the first rows
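
How `ORDER BY`, `LIMIT` and `OFFSET` interact is easiest to see on a handful of rows. The sketch below again uses an in-memory SQLite table with invented values as a stand-in for `crsp.dsf`; the clause syntax is the same in PostgreSQL.

```python
import sqlite3

# Toy stand-in for crsp.dsf with five invented daily returns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dsf (permno INTEGER, date TEXT, ret REAL)")
toy_returns = [0.01, 0.05, -0.02, 0.03, 0.04]
db.executemany(
    "INSERT INTO dsf VALUES (?, ?, ?)",
    [(14593, f"2018-01-0{day}", ret) for day, ret in enumerate(toy_returns, start=1)],
)

# Largest two returns: sort descending, keep the first two rows.
top_two = db.execute(
    "SELECT date, ret FROM dsf ORDER BY ret DESC LIMIT 2"
).fetchall()

# Next two returns: same ordering, but skip the first two rows.
next_two = db.execute(
    "SELECT date, ret FROM dsf ORDER BY ret DESC LIMIT 2 OFFSET 2"
).fetchall()

print(top_two)
print(next_two)
```

Without an `ORDER BY`, the rows that `LIMIT` and `OFFSET` select are not guaranteed to be the same from one run to the next.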

In [None]:
# Demo

# SQL query statement
stmt = """
SELECT cusip, permno, permco, date, prc, openprc, bid, bidlo, ask, askhi, vol, ret, retx
FROM crsp.dsf
ORDER BY cusip DESC
LIMIT 100 OFFSET 10
"""

# Run the query on the WRDS PostgreSQL server and show the first rows
conn.raw_sql(stmt).head()

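#### Intermediate Syntax (1)
- `INNER JOIN`: keep only the rows that have a match in both tables
- `LEFT JOIN` and `RIGHT JOIN`: keep every row of the left (or right) table and fill the other table's columns with `NULL` where there is no match
- `FULL OUTER JOIN`: keep every row of both tables and fill with `NULL` where there is no match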
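
The difference between join types shows up in which rows survive. The sketch below contrasts `INNER JOIN` and `LEFT JOIN` on two small in-memory SQLite tables that mimic `crsp.dsf` and `crsp.dse`; all rows, including the event code, are invented. `RIGHT` and `FULL OUTER` joins are left out because older SQLite versions do not support them, although PostgreSQL does.

```python
import sqlite3

# Toy stand-ins for crsp.dsf (daily prices) and crsp.dse (events); invented rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dsf (permno INTEGER, date TEXT, prc REAL)")
db.execute("CREATE TABLE dse (permno INTEGER, date TEXT, event TEXT)")
db.executemany(
    "INSERT INTO dsf VALUES (?, ?, ?)",
    [
        (14593, "2018-01-02", 172.0),
        (14593, "2018-01-03", 171.0),
        (14593, "2018-01-04", 173.0),
    ],
)
db.execute("INSERT INTO dse VALUES (14593, '2018-01-03', 'DIST')")

template = """
SELECT a.date, a.prc, b.event
FROM dsf AS a
{kind} JOIN dse AS b
ON a.permno = b.permno AND a.date = b.date
ORDER BY a.date
"""

# INNER JOIN keeps only the day that appears in both tables.
inner = db.execute(template.format(kind="INNER")).fetchall()

# LEFT JOIN keeps every price row; days without an event get NULL (None).
left = db.execute(template.format(kind="LEFT")).fetchall()

print(inner)
print(left)
```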

In [None]:
# Demo

# SQL query statement
dsf = """
SELECT permno, date, prc, ret
FROM crsp.dsf
WHERE permno = 14593
ORDER BY date DESC
LIMIT 10
"""

# Run the query on the WRDS PostgreSQL server
conn.raw_sql(dsf)

In [None]:
# Demo

# SQL query statement
dse = """
SELECT permno, date, event
FROM crsp.dse
WHERE permno = 14593
ORDER BY date DESC
LIMIT 10
"""

# Run the query on the WRDS PostgreSQL server
conn.raw_sql(dse)

In [None]:
# Demo

# SQL query statement
join = """
SELECT crsp.dsf.permno, crsp.dsf.date, crsp.dsf.prc, crsp.dsf.ret, crsp.dse.event
FROM crsp.dsf
INNER JOIN crsp.dse
ON crsp.dsf.permno = crsp.dse.permno
AND crsp.dsf.date = crsp.dse.date
WHERE crsp.dsf.permno = 14593
ORDER BY crsp.dsf.date DESC
LIMIT 10
"""

# Run the query on the WRDS PostgreSQL server and show the first rows
conn.raw_sql(join).head()

# Try INNER/LEFT/RIGHT/FULL OUTER JOIN

### Task Set 2:
- Try INNER/LEFT/RIGHT/OUTER JOIN

#### Intermediate Syntax (2):
- `AS`: regular columns and even tables can have aliases to make them easier to reference

In [None]:
# Demo

# SQL query statement
join = """
SELECT a.permno, a.date, a.prc AS price, a.ret * 100 AS return, b.event
FROM crsp.dsf AS a
INNER JOIN crsp.dse AS b
ON a.permno = b.permno
AND a.date = b.date
WHERE a.permno = 14593
ORDER BY a.date DESC
LIMIT 10
"""

# Run the query on the WRDS PostgreSQL server and show the first rows
conn.raw_sql(join).head()

# Try INNER/LEFT/RIGHT/FULL OUTER JOIN

#### Intermediate Syntax (3):
- `AGG_FUNC` (e.g., `AVG`, `MAX` and so on): aggregate expressions that allow us to summarize information about a group of rows of data
- `DISTINCT`: remove duplicates
- `GROUP BY`: specify individual groups
- `HAVING`: a `WHERE` condition for the `GROUP BY` clause to filter grouped rows from the result set
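
To see `DISTINCT`, `GROUP BY` and `HAVING` side by side, here is a small sketch on an in-memory SQLite table standing in for `crsp.dsf`. The rows are invented, and the clauses behave the same way in PostgreSQL.

```python
import sqlite3

# Toy stand-in for crsp.dsf; permno 99999 has only one invented observation.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dsf (permno INTEGER, prc REAL)")
db.executemany(
    "INSERT INTO dsf VALUES (?, ?)",
    [(14593, 100.0), (14593, 102.0), (10107, 50.0), (10107, 54.0), (99999, 10.0)],
)

# DISTINCT: each permno appears once, however many rows it has.
permnos = db.execute("SELECT DISTINCT permno FROM dsf ORDER BY permno").fetchall()

# GROUP BY + HAVING: summarise per permno, then keep only groups
# with at least two observations (HAVING filters groups, not rows).
grouped = db.execute(
    """
    SELECT permno, COUNT(*) AS n_obs, AVG(prc) AS avg_prc
    FROM dsf
    GROUP BY permno
    HAVING COUNT(*) >= 2
    ORDER BY permno
    """
).fetchall()

print(permnos)
print(grouped)
```

`WHERE` filters individual rows before grouping, while `HAVING` filters the groups after the aggregates are computed.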

In [None]:
# Demo

# SQL query statement
stmt = """
SELECT permno, MAX(prc) AS max_prc, AVG(ret) AS avg_ret
FROM crsp.dsf
WHERE permno IN (14593, 10107)
GROUP BY permno
"""

# Run the query on the WRDS PostgreSQL server
conn.raw_sql(stmt)

#### Advanced Features:
- Sub-queries: nest queries to allow complicated dataset transformation (for example, `FROM (SELECT * FROM...)`)
- Window Functions: perform a calculation across a set of rows related to the current row (for example, all rows of the same stock) without collapsing them into a single output row
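
A sub-query can produce the same per-stock demeaning that a window function gives. The sketch below computes each stock's average price in a nested `SELECT`, joins it back to the original rows, and subtracts it. It runs on an in-memory SQLite table with invented values as a stand-in for `crsp.dsf`; the statement itself is also valid PostgreSQL.

```python
import sqlite3

# Toy stand-in for crsp.dsf; all prices are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dsf (permno INTEGER, date TEXT, prc REAL)")
db.executemany(
    "INSERT INTO dsf VALUES (?, ?, ?)",
    [
        (14593, "2018-01-02", 100.0),
        (14593, "2018-01-03", 102.0),
        (10107, "2018-01-02", 50.0),
        (10107, "2018-01-03", 54.0),
    ],
)

# The nested SELECT builds one row per permno with its average price;
# the outer query joins that back so that every daily row is kept.
stmt = """
SELECT d.permno, d.date, d.prc - m.avg_prc AS prc_dev
FROM dsf AS d
INNER JOIN (
    SELECT permno, AVG(prc) AS avg_prc
    FROM dsf
    GROUP BY permno
) AS m
ON d.permno = m.permno
ORDER BY d.permno, d.date
"""

deviations = db.execute(stmt).fetchall()
print(deviations)
```

Unlike a plain `GROUP BY`, the result still has one row per stock and day, which is what `AVG(...) OVER (PARTITION BY permno)` also delivers.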

In [None]:
# SQL query statement
stmt = """
SELECT permno, ret,
AVG(ret) OVER (PARTITION BY permno) AS avg_ret,
ret - AVG(ret) OVER (PARTITION BY permno) AS abnorm_ret
FROM crsp.dsf
WHERE permno IN (14593, 10107)
"""

# Run the query on the WRDS PostgreSQL server
conn.raw_sql(stmt)