## Example usage of pg_pandas.py  
[(Click here for instructions on how to install Postgres on your local computer)](https://www.tutorialspoint.com/postgresql/postgresql_environment.htm)

pg_pandas.py is a module that facilitates the use of pandas and SQLAlchemy when accessing Postgres databases. Methods in the class PgPandas let you populate and read large Postgres tables, especially tables with binary data.

#### This ipynb notebook shows the following examples:
1. Remove tables and schemas from a postgres database
2. Create tables
3. Populate different types of tables from pandas DataFrames
4. Populate a table with binary blob data from DataFrames

___
## Section 1.0 -
Import the required modules

In [None]:
import pandas as pd
import numpy as np
import pg_pandas as pg
import os,sys,glob
from icrawler.builtin import GoogleImageCrawler

In [None]:
import importlib
importlib.reload(pg)


___
## Section 1.1 -
Make sure that the Postgres server (daemon) is running on your computer

In [None]:
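# pg_config only reports the installed version; pg_isready checks that the server accepts connections
!pg_config --version
!pg_isready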
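
The same check can be made from Python. The sketch below tests whether anything is listening on the server's TCP port. It assumes the default PostgreSQL port 5432 on localhost, and `port_is_open` is a hypothetical helper that is not part of pg_pandas. An open port only suggests that Postgres is the process listening; it does not prove it.

```python
import socket

def port_is_open(host: str = "localhost", port: int = 5432, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable host: treat all as "not reachable"
        return False

print(f"Postgres port reachable: {port_is_open()}")
```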

___
## Section 2.0 - 
Create an instance of PgPandas

In [None]:
pga2 = pg.PgPandas(databasename='testdb',username='',password='',dburl='localhost')
print(f'The tables are: {pga2.engine.table_names()}')
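
How PgPandas turns these four arguments into a connection is not shown in this notebook. As a rough illustration only, the sketch below assembles a `postgresql://` URL of the kind SQLAlchemy accepts. `build_pg_url` is a hypothetical helper, and the assumption that empty credentials are simply left out of the URL is mine, not the module's.

```python
from urllib.parse import quote_plus

def build_pg_url(databasename: str, username: str = "", password: str = "",
                 dburl: str = "localhost") -> str:
    """Assemble a postgresql:// URL; credentials are omitted when username is empty."""
    credentials = ""
    if username:
        # Percent-encode so characters such as '@' or '/' cannot break the URL
        credentials = quote_plus(username)
        if password:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"postgresql://{credentials}{dburl}/{databasename}"

print(build_pg_url("testdb"))
print(build_pg_url("testdb", username="alice", password="p@ss/word"))
```

Encoding the password matters because an unescaped `@` or `/` would otherwise be read as part of the host or path.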

___
### Section 2.1 - 
Create a schema called test_schema

In [None]:
# drop all the tables from test_schema
pga2.exec_sql_raw("drop table if exists test_schema.craigslist")
pga2.exec_sql_raw("drop table if exists test_schema.ohlc")
pga2.exec_sql_raw("drop table if exists test_schema.expiry")
pga2.exec_sql_raw("drop table if exists test_schema.jpgs")

# recreate test_schema
pga2.exec_sql_raw("DROP SCHEMA IF EXISTS  test_schema;")
pga2.exec_sql_raw("create schema test_schema;")
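
The cell above drops each table by name before dropping the schema. PostgreSQL also accepts `DROP SCHEMA ... CASCADE`, which removes the schema together with every object inside it. The sketch below only builds the statement text; `schema_reset_statements` is a hypothetical helper, not part of pg_pandas.

```python
def schema_reset_statements(schema: str, cascade: bool = True) -> list:
    """Return the SQL statements that drop and recreate a schema."""
    # Only allow simple identifiers, because the name is interpolated into SQL text
    if not schema.isidentifier():
        raise ValueError(f"unsafe schema name: {schema!r}")
    drop = f"DROP SCHEMA IF EXISTS {schema}"
    if cascade:
        drop += " CASCADE"  # also drops every table inside the schema
    return [drop + ";", f"CREATE SCHEMA {schema};"]

for stmt in schema_reset_statements("test_schema"):
    print(stmt)  # each statement could then be passed to pga2.exec_sql_raw(stmt)
```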



___
### Section 2.2 - 
Create some tables in test_schema

In [None]:
# Create a table of craigslist data
sql = '''
create table test_schema.craigslist(
    id serial primary key,
    geo text,
    href text,
    listing text);
'''
pga2.exec_sql_raw(sql)

# Create a table for open,high,low,close bar data
sql = '''
create table test_schema.ohlc(
    symbol text not null,
    year integer not null,
    month integer not null,
    day integer not null,
    hour integer not null,
    minute integer not null,
    open numeric not null,
    high numeric not null,
    low numeric not null,
    trading_year integer not null,
    trading_month integer not null,
    trading_day integer not null,
    close numeric not null,
    adj_close numeric not null,
    volume integer not null,
    primary key(symbol,year,month,day,hour,minute));
'''
pga2.exec_sql_raw(sql)

# Create a table for binary jpg data
sql = '''
create table test_schema.jpgs(
    document_name text not null,
    document_binary bytea,
    primary key(document_name));
'''
pga2.exec_sql_raw(sql)


___
## Section 3.0 - 
Populate the tables that we previously created

___
### Section 3.1 - 
1. Read the craigslist csv file
2. Populate the craigslist table
3. Perform sql selects on the craigslist table

In [None]:
df_craigs = pd.read_csv('craig_20180210.csv')

In [None]:
pga2.write_df_to_postgres_using_metadata(df_craigs,'test_schema.craigslist')
pga2.get_sql("select * from test_schema.craigslist limit 20;")

In [None]:
sql = '''
select c.geo,count(*) from test_schema.craigslist c 
where c.href ~ 'bmw' and 
c.listing ~ '2002' 
group by c.geo 
order by count(*) desc 
limit 3;
'''
df_c = pga2.get_sql(sql)
df_c.head()

In [None]:
sql = '''
select id,geo,listing from test_schema.craigslist c 
where c.href ~ 'bmw' and 
c.listing ~ '2002' and
c.geo = 'REPLACE_COUNTY' ;
'''
sql = sql.replace('REPLACE_COUNTY',str(df_c.iloc[0].geo))
df_c = pga2.get_sql(sql)
df_c
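
The cell above splices the county into the SQL text with `str.replace`, which breaks if the value contains a quote. Binding the value as a query parameter avoids that. Whether `get_sql` accepts bound parameters is not shown here, so the sketch below uses the standard-library `sqlite3` module as a stand-in for Postgres. The rows are made up, `listings_for_geo` is a hypothetical helper, and `like` replaces the Postgres-specific `~` operator. With a Postgres driver such as psycopg2 the placeholder would be `%s` rather than `?`.

```python
import sqlite3

# In-memory SQLite stands in for Postgres so the sketch runs anywhere
conn = sqlite3.connect(":memory:")
conn.execute("create table craigslist(id integer primary key, geo text, href text, listing text)")
# Made-up rows, for illustration only
rows = [
    ("county_a", "/cto/bmw-1", "1974 bmw 2002 tii"),
    ("county_a", "/cto/bmw-2", "bmw 2002 project car"),
    ("county_b", "/cto/bmw-3", "bmw 2002 parts"),
    ("county_b", "/cto/vw-1", "2002 vw golf"),
]
conn.executemany("insert into craigslist(geo, href, listing) values (?, ?, ?)", rows)

def listings_for_geo(connection, geo):
    """Select matching listings, binding geo as a parameter instead of editing the SQL text."""
    sql = """
        select id, geo, listing from craigslist
        where href like '%bmw%' and listing like '%2002%' and geo = ?
        order by id
    """
    return connection.execute(sql, (geo,)).fetchall()

print(listings_for_geo(conn, "county_a"))
```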

___
### Section 3.2 -
1. Read the csv data for the securities USO and SPY
2. Populate the ohlc table
3. Perform sql selects on the ohlc table


In [None]:
SYMBOLS_TO_LOAD = ['USO','SPY']
for sym in SYMBOLS_TO_LOAD:
    df_sym = pd.read_csv(f'{sym}.csv')
    df_sym['symbol'] = sym
    df_sym['year'] = df_sym['timestamp'].str[0:4]
    df_sym['month'] = df_sym['timestamp'].str[5:7]
    df_sym['day'] = df_sym['timestamp'].str[8:10]
    df_sym['hour'] = df_sym['timestamp'].str[11:13]
    df_sym['minute'] = df_sym['timestamp'].str[14:16]
    # All three trading-date components come from tradingDay, not timestamp
    df_sym['trading_year'] = df_sym['tradingDay'].str[0:4]
    df_sym['trading_month'] = df_sym['tradingDay'].str[5:7]
    df_sym['trading_day'] = df_sym['tradingDay'].str[8:10]
    # Use the adjusted-close column if the csv has one, otherwise fall back to close
    adj_close_cols = [c for c in df_sym.columns if 'adj' in c]
    adj_close_col = adj_close_cols[0] if adj_close_cols else 'close'
    df_sym['adj_close'] = df_sym[adj_close_col]
    cols = ['symbol','year','month','day','hour','minute','trading_year','trading_month','trading_day','open','high','low','close','adj_close','volume']
    df_sym = df_sym[cols]
    pga2.write_df_to_postgres_using_metadata(df_sym,'test_schema.ohlc')
pga2.get_sql('select count(*) from test_schema.ohlc;')
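
The string slices in the loop yield text such as `'02'`, while the ohlc columns are declared `integer`. Whether `write_df_to_postgres_using_metadata` casts them is not shown here. As a sketch, the components can be produced as integers up front. `split_timestamp` is a hypothetical helper, and the `YYYY-MM-DD HH:MM` layout is an assumption inferred from the slice positions, so adjust the format string if the csv uses a `T` separator.

```python
from datetime import datetime

def split_timestamp(ts: str) -> dict:
    """Split a 'YYYY-MM-DD HH:MM...' timestamp into integer components."""
    # Parsing validates the value; plain slicing would silently accept malformed text
    parsed = datetime.strptime(ts[:16], "%Y-%m-%d %H:%M")
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "hour": parsed.hour,
        "minute": parsed.minute,
    }

print(split_timestamp("2018-02-09 09:30:00"))
```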


### Do a group by query of the ohlc table

In [None]:
sql = '''
select symbol,trading_year,trading_month,trading_day, avg(close),count(*)
from test_schema.ohlc o
group by symbol,trading_year,trading_month,trading_day
order by symbol,trading_year,trading_month,trading_day;
'''
pga2.get_sql(sql)
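
To see what the query returns without a populated Postgres table, the sketch below runs the same `group by` against an in-memory SQLite table. SQLite is only a stand-in, the bars are made up, and the table is cut down to the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table ohlc(symbol text, trading_year integer, trading_month integer,
                      trading_day integer, minute integer, close real)
""")
# Made-up bars, for illustration only
bars = [
    ("SPY", 2018, 2, 9, 30, 100.0),
    ("SPY", 2018, 2, 9, 31, 102.0),
    ("SPY", 2018, 2, 12, 30, 104.0),
    ("USO", 2018, 2, 9, 30, 12.0),
]
conn.executemany("insert into ohlc values (?, ?, ?, ?, ?, ?)", bars)

sql = """
    select symbol, trading_year, trading_month, trading_day, avg(close), count(*)
    from ohlc
    group by symbol, trading_year, trading_month, trading_day
    order by symbol, trading_year, trading_month, trading_day
"""
# One row per symbol and trading day: the average close and the number of bars
daily = conn.execute(sql).fetchall()
for row in daily:
    print(row)
```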

___
### Section 3.3 - 
Load and retrieve binary jpg data
1. Fetch sample jpgs using GoogleImageCrawler
2. Create a pandas DataFrame of the jpgs and their file paths
3. Load the DataFrame into the Postgres table test_schema.jpgs
4. Use a SQL statement to retrieve the data from test_schema.jpgs

In [None]:
# Remove any jpgs left over from a previous run
for f in glob.glob("./temp_folder/cats/*.jpg"):
    try:
        os.remove(f)
    except OSError:
        pass
google_crawler = GoogleImageCrawler(storage={'root_dir': './temp_folder/cats'})
google_crawler.crawl(keyword='cat', max_num=20)
doc_full_path_list = [os.path.abspath(f'./temp_folder/cats/{s}') for s in  os.listdir('./temp_folder/cats')]
df_doc_binary = pga2.create_df_doc_binary_from_path_list(document_path_list=doc_full_path_list)
pga2.write_df_to_postgres_using_metadata(df_doc_binary,'test_schema.jpgs')
sql = '''
select * from test_schema.jpgs
'''
df_doc_binary_from_pg = pga2.get_sql(sql)
df_doc_binary_from_pg.head()
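
To confirm the round trip, the retrieved binary can be written back to disk and opened as an image. The sketch below assumes each record is a `(document_name, document_binary)` pair matching the table's columns, and that the binary may arrive as `bytes` or as a `memoryview`, depending on the driver. `save_documents` is a hypothetical helper, not part of pg_pandas.

```python
import os
import tempfile

def save_documents(records, out_dir):
    """Write (document_name, document_binary) pairs to out_dir and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, blob in records:
        # Keep only the file name in case document_name holds a full path
        path = os.path.join(out_dir, os.path.basename(name))
        with open(path, "wb") as fh:
            fh.write(bytes(blob))  # bytes() accepts both bytes and memoryview
        written.append(path)
    return written

# Stand-in record: the first four bytes of a typical JPEG file, not a full image
sample = [("/some/dir/000001.jpg", memoryview(b"\xff\xd8\xff\xe0"))]
paths = save_documents(sample, tempfile.mkdtemp())
print(paths)
```

With the DataFrame from the cell above, the call might look like `save_documents(zip(df_doc_binary_from_pg.document_name, df_doc_binary_from_pg.document_binary), './restored')`, assuming the DataFrame columns carry the same names as the table columns.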

## End