# HOW TO: Join & Use Dates in PSQL Using Python: *(PSYCOPG2)*
-------------------------- **<font color=red>With Mr Fugu Data Science</font>** -------------------------

[Github](https://github.com/MrFuguDataScience) | [Youtube](https://www.youtube.com/channel/UCbni-TDI-Ub8VlGaP8HLTNw)

# Purpose & Outcome:

+ `Create New Table with schema as a function`
    + This will be used for joining examples

+ `Perform Inner Join`
    + Show Multiple Examples of what could go wrong
+ Query By Date:
    + Show a few examples



`-------------------------------------------------`

# Create *`Init file`*: used to store credentials, for security.

[Create Init File](https://towardsdatascience.com/python-and-postgresql-how-to-access-a-postgresql-database-like-a-data-scientist-b5a9c5a0ea43) | [PostgreSQL Tutorial](https://www.postgresqltutorial.com/postgresql-python/connect/)

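**1<sup>st</sup>** ) : Create the initialization file:

Use your terminal, Notepad, or any editor that can save a plain-text file with the `.ini` extension.

On a Mac I ran `touch database_file_init.ini`.

*Then open it with `vi database_file_init.ini` and type the following lines inside the file. Use one filename consistently: the `config` function further down defaults to `database.ini`, so pass your own name as `filename` if it differs.*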

`[postgresql]`

`host=localhost`

`database=what_database_you_want_to_access`

`user= some_user_you_created_for_this_user_in_psql`

`password= some_password_you_have_to_this_db`


`__________________________________________________________`

**2<sup>nd</sup>** ) : Now create the `config.py` file, which reads the init file and returns its contents as a dictionary.

For example, the returned dictionary looks like this: `{'host': 'localhost', 'database': 'suppliers', 'user': 'postgres', 'password': 'postgres'}`.
The code is shown further down.


In a real-world scenario you will most likely need to hide your credentials, permissions, login and password. That is where these files shine.

[create init and config files](https://towardsdatascience.com/python-and-postgresql-how-to-access-a-postgresql-database-like-a-data-scientist-b5a9c5a0ea43) | [Python_configparser doc](https://docs.python.org/3/library/configparser.html) | [Postgres_tutorial_Python_PSQL](https://www.postgresqltutorial.com/postgresql-python/connect/)

____________________________________________________

    #!/usr/bin/python
    # (a virtualenv or env interpreter works as well)
    from configparser import ConfigParser


    def config(filename='database.ini', section='postgresql'):
        # create a parser
        parser = ConfigParser()
        # read config file
        parser.read(filename)

        # collect the key/value pairs of the section (defaults to 'postgresql')
        db = {}

        # check that the section (postgresql) exists in the file
        if parser.has_section(section):
            params = parser.items(section)
            for param in params:
                db[param[0]] = param[1]
        # raise an error if the section is not found in the initialization file
        else:
            raise Exception('Section {0} not found in the {1} file'.format(section, filename))
        return db



`---------------------------------------------`

**This code** (*Above*) **was adapted from online material on the Postgres website and is used in the 1st & 3rd links above**
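
To see what the function returns, here is a minimal sketch that writes a throwaway init file and reads it back. `read_db_config` is a hypothetical, more compact variant of the `config` function above, and the credentials are the placeholder values from the example dictionary, not real ones.

```python
import os
import tempfile
from configparser import ConfigParser


def read_db_config(filename, section='postgresql'):
    """Compact variant of the `config` function above (same behaviour)."""
    parser = ConfigParser()
    parser.read(filename)  # a missing file is skipped silently
    if not parser.has_section(section):
        raise Exception('Section {0} not found in the {1} file'.format(section, filename))
    return dict(parser.items(section))


# Write a temporary init file with placeholder credentials, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'database.ini')
    with open(path, 'w') as handle:
        handle.write('[postgresql]\nhost=localhost\ndatabase=suppliers\n'
                     'user=postgres\npassword=postgres\n')
    params = read_db_config(path)

print(params)
```

One thing to watch for: `ConfigParser` treats `%` as an interpolation character, so a password containing `%` needs to be written as `%%` or read with `ConfigParser(interpolation=None)`.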

# Helpful Tips:

+ Superuser (admin): the default psql prompt ends with `database_name=#` vs regular user: `database_name=>`

+ `\h` will give list of SQL commands

+ `\?` describes psql specific commands


`-----------------------------------------------------`

PostgreSQL (abbreviated PSQL in this tutorial) is a `Relational` database and therefore follows this convention:

`tables` are "relations",

`attributes`: columns, 

`tuples`: rows

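+ You always need to create a `schema` (the table definition) beforehand, because you are dealing with a "design first" database.

+ By default psycopg2 runs queries inside a transaction, which protects data integrity: changes only persist after a commit and can be rolled back on error.

+ Not every statement can run inside a transaction, however. Commands such as `CREATE DATABASE` or `VACUUM` need autocommit mode.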
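
As a rough sketch of that transaction behaviour, the helper below commits a batch of statements only if all of them succeed and rolls everything back otherwise. `run_in_transaction` is a name invented for this illustration. It assumes `conn` is an open psycopg2 connection with autocommit switched off, and it only uses the standard `cursor()`, `commit()` and `rollback()` methods.

```python
def run_in_transaction(conn, statements):
    """Execute all statements as one unit: commit on success, roll back on error.

    `conn` is assumed to be an open psycopg2 connection with autocommit off.
    `statements` is a list of (sql, params) pairs; this helper is illustrative only.
    """
    try:
        with conn.cursor() as cur:
            for sql, params in statements:
                cur.execute(sql, params)
        conn.commit()      # make every change permanent at once
    except Exception:
        conn.rollback()    # undo everything executed so far in this transaction
        raise              # let the caller see the original error
```

With `conn.autocommit = True`, as used in this notebook, each statement is committed immediately, so the rollback would have nothing to undo.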

In [1]:
import psycopg2             # python->psql connection
import psycopg2.extras
import pandas as pd         # create dataframes 
# import os                   # fetch files
import numpy as np
from faker import Factory,Faker # Create fake data to use for join-tables

import io

# Import the 'config' function from the config_user_dta.py file:
from config_user_dta import config

In [2]:
# Establish a connection to the database, then create a cursor object

# Get the config params
params_ = config()

# Connect to the Postgres_DB:
conn = psycopg2.connect(**params_)

# Create new_cursor allowing us to write Python to execute PSQL:
cur = conn.cursor()

conn.autocommit = True  # every statement is committed immediately; read the psycopg2 documentation to understand when to use (and NOT use) True

In [3]:
fake_ppl=pd.read_csv('fake_users_R.csv')

# Drop the leftover index column before exporting (fake_ppl itself is unchanged)
n=fake_ppl.drop('Unnamed: 0', axis=1)


n.to_csv('noIndx.csv',index=False)


In [4]:
'''
Creating Fake CPU's that customers purchased, with country of purchase,linking them 
for joining tables later by foreign keys.
'''
fake_data=Faker()

cpus=[]
for _ in range(len(fake_ppl)//2):# len//2 loops x 2 cpu types = same length as the dataframe (assumes an even row count)
    cpus.append(fake_data.numerify(text='Intel Core i%-%%##K'))
    cpus.append(fake_data.numerify(text='AMD Ryzen % %%##X'))
len(cpus)
# len(fake_ppl)
cpus[:5]

['Intel Core i1-7554K',
 'AMD Ryzen 1 5827X',
 'Intel Core i5-9457K',
 'AMD Ryzen 4 3401X',
 'Intel Core i6-7283K']
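
To make the placeholder pattern easier to read, here is a small stand-in written with the standard library only. It imitates the two placeholders used above, where `%` becomes a digit from 1 to 9 and `#` a digit from 0 to 9. `numerify_sketch` is a hypothetical name and this is not Faker's own implementation.

```python
import random


def numerify_sketch(text, rng=random):
    """Replace '%' with a digit 1-9 and '#' with a digit 0-9 (illustration only)."""
    out = []
    for ch in text:
        if ch == '%':
            out.append(str(rng.randint(1, 9)))
        elif ch == '#':
            out.append(str(rng.randint(0, 9)))
        else:
            out.append(ch)   # every other character is copied unchanged
    return ''.join(out)


intel = numerify_sketch('Intel Core i%-%%##K')
amd = numerify_sketch('AMD Ryzen % %%##X')
print(intel)  # shaped like 'Intel Core i5-9457K'
print(amd)    # shaped like 'AMD Ryzen 4 3401X'
```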

In [5]:
# Create List of fake purchase dates:

fake_data.seed(10)  # newer Faker releases reject this instance call; use the class method Faker.seed(10) there

purchase_dates=[]

for _ in range(len(fake_ppl)):
    purchase_dates.append(fake_data.date_between(start_date='-3y', end_date='today'))
    
purchase_dates[:6]

[datetime.date(2019, 10, 31),
 datetime.date(2017, 7, 16),
 datetime.date(2019, 3, 24),
 datetime.date(2019, 6, 15),
 datetime.date(2019, 11, 11),
 datetime.date(2017, 6, 19)]

In [6]:
# TABLE WILL BE USED FOR OUR JOIN:
#Note:Can use np.column_stack instead of zip(),this is mainly for preference and speedup

join_table_=pd.DataFrame(np.column_stack([fake_ppl['credit_card'],cpus,purchase_dates]),
             columns=['credit_card','cpu','purchase_date'])

join_table_.head()

|   | credit_card | cpu | purchase_date |
|---|---|---|---|
| 0 | 5399-3484-4724-7187 | Intel Core i1-7554K | 2019-10-31 |
| 1 | 1630-5261-6108-7631 | AMD Ryzen 1 5827X | 2017-07-16 |
| 2 | 4435-3866-1076-3595 | Intel Core i5-9457K | 2019-03-24 |
| 3 | 3489-7099-9906-8660 | AMD Ryzen 4 3401X | 2019-06-15 |
| 4 | 8631-4500-5666-1510 | Intel Core i6-7283K | 2019-11-11 |


In [7]:
# CREATE TABLE: staging_fake_cpu_purchases

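def create_staging_table(cursor):
    cursor.execute("""
        DROP TABLE IF EXISTS staging_fake_cpu_purchases;
        CREATE UNLOGGED TABLE staging_fake_cpu_purchases (
            credit_card      TEXT PRIMARY KEY,
            cpu              TEXT,
            purchase_date    DATE NOT NULL
        );""")

# Run it, otherwise the table does not exist when the CSV is loaded later
with conn.cursor() as cursor:
    create_staging_table(cursor)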


In [8]:
def create_fake_ppl_table(cursor):
    cursor.execute("""
        DROP TABLE IF EXISTS staging_fake_ppl;
        CREATE UNLOGGED TABLE staging_fake_ppl (
            credit_card         TEXT PRIMARY KEY,
            email               TEXT,
            first_name          TEXT,
            last_name           TEXT,
            primary_phone       TEXT
        );""")

with conn.cursor() as cursor:
    create_fake_ppl_table(cursor)
    

# Sending Fake CPU Purchases to PSQL:
+ First write the dataframe to a `.csv` file, then use the function `send_csv_to_psql` to send the data


In [9]:
join_table_.to_csv('cpu_purchase.csv',index=False)


def send_csv_to_psql(connection,csv,table_):
    # COPY ... FROM STDIN streams the CSV through the client connection.
    # table_ is pasted into the SQL text, so only pass trusted table names.
    sql = "COPY %s FROM STDIN WITH CSV HEADER DELIMITER AS ','"
    with open(csv, "r") as file, connection.cursor() as cur:
        cur.execute("truncate " + table_ + ";")  # avoiding uploading duplicate data!
        cur.copy_expert(sql=sql % table_, file=file)
    connection.commit()
    # The connection is left open on purpose because later cells reuse it


# Sending Fake Purchases to PSQL From Python:
send_csv_to_psql(conn,'cpu_purchase.csv','staging_fake_cpu_purchases')
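
The cell above writes a CSV file to disk before calling `COPY`. As a possible alternative, the sketch below builds the same CSV text in memory with `io.StringIO` (the `io` module is imported at the top of the notebook but not used). `rows_to_csv_buffer` is a hypothetical helper name, and the sample row reuses the first row of the join table. Since `copy_expert` reads from any file-like object, the buffer could be passed in place of an open file.

```python
import csv
import io


def rows_to_csv_buffer(header, rows):
    """Return an in-memory text buffer holding the rows as CSV, header first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


buf = rows_to_csv_buffer(
    ['credit_card', 'cpu', 'purchase_date'],
    [('5399-3484-4724-7187', 'Intel Core i1-7554K', '2019-10-31')],
)
print(buf.getvalue())

# With an open psycopg2 cursor `cur`, the buffer could replace the file (not run here):
# cur.copy_expert("COPY staging_fake_cpu_purchases FROM STDIN WITH CSV HEADER", buf)
```

With pandas, `join_table_.to_csv(buffer, index=False)` would fill such a buffer directly from the dataframe.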



In [10]:
# Sending Fake People to PSQL FROM Python:

send_csv_to_psql(conn,'noIndx.csv','staging_fake_ppl')

# Simple Query with Psycopg2:
When running a SELECT query, retrieve the rows with the `fetchone()`, `fetchall()` or `fetchmany()` methods.
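
Here is a small sketch of how the three fetch methods differ. It uses Python's built-in `sqlite3` purely as a stand-in so it runs without a database server. This is an assumption made for the example: psycopg2 cursors offer the same three methods, but use `%s` instead of `?` as the parameter placeholder. The table `demo` is made up.

```python
import sqlite3

# sqlite3 is used only so the example runs without a server; the three fetch
# methods are part of Python's DB-API, which psycopg2 cursors follow as well.
conn_demo = sqlite3.connect(':memory:')
cur_demo = conn_demo.cursor()
cur_demo.execute("CREATE TABLE demo (n INTEGER)")
cur_demo.executemany("INSERT INTO demo VALUES (?)", [(i,) for i in range(1, 6)])

cur_demo.execute("SELECT n FROM demo ORDER BY n")
first = cur_demo.fetchone()        # one row as a tuple, or None when exhausted
next_two = cur_demo.fetchmany(2)   # list with up to 2 further rows
rest = cur_demo.fetchall()         # list with every remaining row

print(first, next_two, rest)
conn_demo.close()
```

Each call continues where the previous one stopped, so the same row is never returned twice for one `execute`.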



In [11]:
# Query to verify that the fake people were inserted into PSQL
sql_="SELECT * FROM staging_fake_ppl"
cur.execute(sql_)
cur.fetchone()

('5399-3484-4724-7187',
 'gso@qiegan.sqe',
 'Donyell Ann',
 'Ospina',
 '5219459148')

In [12]:
sq_="SELECT * FROM staging_fake_cpu_purchases"
cur.execute(sq_)
cur.fetchone()

('5399-3484-4724-7187', 'Intel Core i1-7554K', datetime.date(2019, 10, 31))

# Inner Join:
+ We will take the CPU purchases table and the fake people table and join them on what they have in common. For us that is the credit card number, which is the Primary Key of both tables.

The query is too long to annotate inline, so here are its parts:
 + First: select all columns from both tables (or only the ones you need)
 + Second: `INNER JOIN` the two tables, keeping only the rows that match in both
 + Third: declare with `ON` which column of each table the join is based on

`sql_= "SELECT staging_fake_cpu_purchases .*,staging_fake_ppl FROM staging_fake_cpu_purchases INNER JOIN staging_fake_ppl ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"
`


`sql_= "SELECT staging_fake_cpu_purchases.cpu,staging_fake_cpu_purchases.purchase_date,staging_fake_ppl .* FROM staging_fake_cpu_purchases INNER JOIN staging_fake_ppl ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"
`

[Explaining How Joins Work](https://dzone.com/articles/how-to-perform-joins-in-apache-hive)
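
As a compact illustration of what `INNER JOIN ... ON` keeps, the sketch below joins two tiny tables. It runs on Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table names `ppl` and `purchases` and their rows are made up for this example. It also shows table aliases (`p`, `c`), which keep long queries like the ones above shorter.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs anywhere; the JOIN syntax
# used here is standard SQL. The rows are made up for illustration.
db = sqlite3.connect(':memory:')
db.executescript("""
    CREATE TABLE ppl (credit_card TEXT PRIMARY KEY, first_name TEXT);
    CREATE TABLE purchases (credit_card TEXT PRIMARY KEY, cpu TEXT);
    INSERT INTO ppl VALUES ('1111', 'Ann'), ('2222', 'Bob');
    INSERT INTO purchases VALUES ('1111', 'CPU A'), ('3333', 'CPU B');
""")

# Only card 1111 exists in both tables, so only that row survives the join.
joined = db.execute("""
    SELECT p.credit_card, p.first_name, c.cpu
    FROM purchases AS c
    INNER JOIN ppl AS p ON p.credit_card = c.credit_card
""").fetchall()

print(joined)
db.close()
```

Cards `2222` and `3333` each appear in only one table, so the inner join drops them.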

# Example 01: Inner Join (*Duplicate*) Column

In [13]:
sql_= "SELECT staging_fake_cpu_purchases .*,staging_fake_ppl .* FROM staging_fake_cpu_purchases INNER JOIN staging_fake_ppl ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"

cur.execute(sql_)
cur.fetchone()

('5399-3484-4724-7187',
 'Intel Core i1-7554K',
 datetime.date(2019, 10, 31),
 '5399-3484-4724-7187',
 'gso@qiegan.sqe',
 'Donyell Ann',
 'Ospina',
 '5219459148')

# Example 02: Your data from one table shows up as a string
+ Selecting the table name without `.*` returns each row of that table as a single string, with the values separated by commas

In [14]:
sql_= "SELECT staging_fake_cpu_purchases .*,staging_fake_ppl FROM staging_fake_cpu_purchases INNER JOIN staging_fake_ppl ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"

cur.execute(sql_)
cur.fetchone()

('5399-3484-4724-7187',
 'Intel Core i1-7554K',
 datetime.date(2019, 10, 31),
 '(5399-3484-4724-7187,gso@qiegan.sqe,"Donyell Ann",Ospina,5219459148)')

# Example 03: Fixing the `INNER JOIN` so the data comes back in the proper format

In [15]:
sql="SELECT staging_fake_ppl .*, staging_fake_cpu_purchases.cpu,staging_fake_cpu_purchases.purchase_date FROM staging_fake_cpu_purchases INNER JOIN staging_fake_ppl ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"
cur.execute(sql)
cur.fetchone()

('5399-3484-4724-7187',
 'gso@qiegan.sqe',
 'Donyell Ann',
 'Ospina',
 '5219459148',
 'Intel Core i1-7554K',
 datetime.date(2019, 10, 31))

# Create Table from a Join:
+ This can be done in two ways, `Create Temp Table` or `Create Table`, depending on your needs.

+ A `Temp` table is dropped automatically when your database session ends. It is useful when you want to run queries against data you access often, but do not need to keep a copy.

+ Creating a new table from joined data is useful when you access the data often enough that you need a permanent copy, stored separately from the original tables.


**Ex.)**

` "CREATE TABLE ppl_cpu_purchases AS SELECT staging_fake_ppl .*,staging_fake_cpu_purchases.cpu,staging_fake_cpu_purchases.purchase_date FROM staging_fake_cpu_purchases INNER JOIN staging_fake_ppl ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"
`

# We are creating a NEW Table from our join.
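+ The columns will be everything from `staging_fake_ppl` (credit_card, email, first_name, last_name, primary_phone) plus `cpu` and `purchase_date` from `staging_fake_cpu_purchases`.

+ The column we `INNER JOIN` on (credit_card) will not be duplicated, because it is selected from only one of the two tables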

[Help](https://www.postgresqltutorial.com/postgresql-create-table-as/)
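
A plain `CREATE TABLE ... AS` fails when the table already exists, for example when the cell is run a second time. The sketch below is one possible way to make the step repeatable and to switch between a permanent and a `TEMP` table. `create_table_from_join` is a hypothetical helper, and `cur` is assumed to be an open psycopg2 cursor.

```python
def create_table_from_join(cur, new_table="ppl_cpu_purchases", temp=False):
    """Rebuild `new_table` from the inner join; `temp=True` makes it session-only.

    `cur` is assumed to be an open psycopg2 cursor. The table name is pasted
    into the SQL text, so pass only trusted, hard-coded names.
    """
    kind = "TEMP TABLE" if temp else "TABLE"
    cur.execute(f"DROP TABLE IF EXISTS {new_table}")  # lets the cell be re-run
    cur.execute(f"""
        CREATE {kind} {new_table} AS
        SELECT p.*, c.cpu, c.purchase_date
        FROM staging_fake_ppl AS p
        INNER JOIN staging_fake_cpu_purchases AS c
            ON p.credit_card = c.credit_card
    """)
```

Calling `create_table_from_join(cur, temp=True)` would give a table that disappears when the session ends. Note that the `DROP` removes any existing table with that name first.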

In [82]:

sql_c= "CREATE TABLE ppl_cpu_purchases AS SELECT staging_fake_ppl .*,staging_fake_cpu_purchases.cpu,staging_fake_cpu_purchases.purchase_date FROM staging_fake_ppl INNER JOIN staging_fake_cpu_purchases ON staging_fake_ppl.credit_card=staging_fake_cpu_purchases.credit_card"

cur.execute(sql_c)
# cur.fetchone()

In [83]:
sq="SELECT * FROM ppl_cpu_purchases LIMIT 3"
cur.execute(sq)
cur.fetchall()

[('5399-3484-4724-7187',
  'gso@qiegan.sqe',
  'Donyell Ann',
  'Ospina',
  '5219459148',
  'Intel Core i1-7554K',
  datetime.date(2019, 10, 31)),
 ('1630-5261-6108-7631',
  'xnji@gfruaxqnvm.fha',
  'Bishop',
  'Siyed',
  '4164254716',
  'AMD Ryzen 1 5827X',
  datetime.date(2017, 7, 16)),
 ('4435-3866-1076-3595',
  'dvyco@tkzhsop.zxg',
  'Connor',
  'Powers',
  '3627413915',
  'Intel Core i5-9457K',
  datetime.date(2019, 3, 24))]

# Query by Date:

In [71]:
# Show the days elapsed for an order, printing name and date:

sq="SELECT first_name, last_name, now() - purchase_date as diff FROM ppl_cpu_purchases"

cur.execute(sq)
cur.fetchone()

('Donyell Ann',
 'Ospina',
 datetime.timedelta(days=209, seconds=53741, microseconds=778610))
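
Because `now()` is a timestamp, the difference comes back as a `datetime.timedelta` that also carries seconds and microseconds. A short sketch of pulling the whole days out of the value shown above follows. The final query string is an untested suggestion that relies on PostgreSQL returning an integer when one `DATE` is subtracted from another.

```python
import datetime

# The value returned by the query above for the first row
diff = datetime.timedelta(days=209, seconds=53741, microseconds=778610)

whole_days = diff.days                  # complete days only
hours_into_day = diff.seconds // 3600   # full hours of the remaining partial day
print(whole_days, hours_into_day)

# In SQL, subtracting two DATE values gives an integer number of days directly:
sq_days = ("SELECT first_name, last_name, current_date - purchase_date AS days "
           "FROM ppl_cpu_purchases")
```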

In [72]:
#Query Date Range:

q="select * from ppl_cpu_purchases WHERE purchase_date BETWEEN '2019-05-01' and now()"
cur.execute(q)
cur.fetchone()

('5399-3484-4724-7187',
 'gso@qiegan.sqe',
 'Donyell Ann',
 'Ospina',
 '5219459148',
 'Intel Core i1-7554K',
 datetime.date(2019, 10, 31))

In [80]:
# Find All Purchases in the Last 10 Days:

d="SELECT * FROM ppl_cpu_purchases WHERE purchase_date > current_date - interval '10' day"

cur.execute(d)
cur.fetchone()

('4581-8717-5316-8278',
 'xcukr@msre.uln',
 'Christina',
 'Trustee',
 '5294387139',
 'AMD Ryzen 7 8559X',
 datetime.date(2020, 5, 21))
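
The date queries above hard-code their bounds in the SQL string. A hedged sketch of the same idea with parameters: psycopg2 accepts `datetime.date` objects for `%s` placeholders, so the window can be computed in Python. `purchases_between` is a hypothetical helper, `cur` is assumed to be an open psycopg2 cursor, and the example dates are illustrative.

```python
import datetime


def purchases_between(cur, start, end):
    """Return rows purchased from `start` to `end` (both inclusive).

    `cur` is assumed to be an open psycopg2 cursor; `start` and `end` are
    datetime.date objects, which psycopg2 converts to SQL dates itself.
    """
    query = (
        "SELECT * FROM ppl_cpu_purchases "
        "WHERE purchase_date BETWEEN %s AND %s "
        "ORDER BY purchase_date"
    )
    cur.execute(query, (start, end))
    return cur.fetchall()


# Example window: the ten days ending on a chosen date (values are illustrative)
end = datetime.date(2020, 5, 27)
start = end - datetime.timedelta(days=10)
# rows = purchases_between(cur, start, end)
```

Passing the values separately from the SQL text also avoids quoting mistakes when the dates come from user input.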

# Thanks for watching and Don't Forget TO<font color=red> SUBscribe</font>

+ Throw a <font color=red>LIKE</font> & Leave a <font color=red>comment</font>

`---------------------------------------------------------------`

# Citations & Help:

https://www.postgresqltutorial.com/postgresql-create-table-as/

https://www.postgresqltutorial.com/postgresql-show-tables/

https://www.postgresqltutorial.com/postgresql-date/