# `Psycopg2 Dict_Hstore`: Take semi-structured data and use it within PostgreSQL

# <font color=red>Mr Fugu Data Science</font>


# (◕‿◕✿)

# Purpose & Outcome:

+ Create a new way of formatting data into dictionaries [or sets] using PostgreSQL
    + Query the newly formatted data
    
    
`------------------------`


**Starting DATA**: 

| Employer      	| Year_begin 	| Year_end 	| Dictionary                                                                 	|
|---------------	|------------	|----------	|----------------------------------------------------------------------------	|
| Mr Han        	| 1977       	| 2019     	| {"first_name":"Chewie", "last_name":"Wookie","occupation":"thrill seeker"} 	|
| Self employed 	| 1983       	| 1985     	| {"first_name":"Ewok","last_name":"Endor",   "occupation":"forest dweller"} 	|
| self          	| 1999       	| 2011     	| {"first_name":"Harry","last_name":"P", "occupation":"magic"}               	|


**End Result PSQL**: Notice the `=>` between each key and value; this is the format we will query against!

| id 	| Employer      	| Year_begin 	| Year_end 	|        H_store                                                     	|
|----	|---------------	|------------	|----------	|-------------------------------------------------------------------------------	|
| 1  	| Mr Han        	| 1977       	| 2019     	| "last_name"=>"Wookie", "first_name"=>"Chewie",  "occupation"=>"thrill seeker" 	|
| 2  	| Self employed 	| 1983       	| 1985     	| "last_name"=>"Endor", "first_name"=>"Ewok",  "occupation"=>"forest dweller"   	|
| 3  	| self          	| 1999       	| 2011     	| "last_name"=>"P", "first_name"=>"Harry",  "occupation"=>"magic"               	|

# What is a practical use for `Hstore` anyway?

+ Assume that you have a bunch of key-value pairs that you want to store in one column, instead of tracking down your data across different columns or tables.
    + You can update, insert, and track everything related to these data, create triggers, etc. You're free to do as you want and have all the data stored in one location.
    + Query operations are also available, only with a shift in syntax.
+ The PostgreSQL documentation also suggests `Hstore` for semi-structured data and for rows with many attributes that are rarely examined.

**Also, each key is stored only once: if you supply duplicate keys, only one of the pairs is kept and there is no guarantee which one.** If you need to map multiple values to the same key, `Hstore` will not be useful.
    
    
+ `Upside`: You don't have to predefine keys
    + stores all of your key:value pairs in a single column, as strings
    + good for semi-structured/unstructured data that normally doesn't fit a relational schema
    + supports different types of indexing, such as GIN and GiST
+ `Downsides`:
    + No nesting of data
    + keys and values are stored as strings, so datatypes are limited.
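To make the two downsides concrete, here is a minimal sketch in plain Python (no database needed). The helper name `to_flat_text_pairs` is hypothetical and not part of psycopg2; it only imitates the shape hstore can hold: flat, text-to-text pairs.

```python
def to_flat_text_pairs(mapping):
    """Hypothetical helper: coerce a dict into the flat text-to-text shape hstore holds."""
    flat = {}
    for key, value in mapping.items():
        # hstore cannot nest, so containers are rejected instead of silently stringified.
        if isinstance(value, (dict, list, tuple, set)):
            raise TypeError(
                f"hstore cannot nest values; key {key!r} holds a {type(value).__name__}"
            )
        # hstore values are text (or NULL), so numbers and booleans lose their type.
        flat[str(key)] = None if value is None else str(value)
    return flat


record = {"first_name": "Chewie", "age": 200, "pilot": True}
flat_record = to_flat_text_pairs(record)
print(flat_record)
```

Notice that `200` and `True` come back as the strings `'200'` and `'True'`; if you need typed or nested values, JSONB is usually the better fit.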
    
`If you are interested in Hstore vs JSON vs JSONB`:
https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/

`------------------------------`

To use it we need to import: `from psycopg2.extras import register_hstore `



In [1]:
from psycopg2.extras import register_hstore # create dictionaries
import psycopg2  # make connection to/from python-psql
import pandas as pd # DF stuff

# Import the 'config' function from the config_user_dta.py file:
from config_user_dta import config  # call my user credentials

`If you do not want to create init or config files, do something similar to this:`

`import psycopg2`

`conn = psycopg2.connect("dbname=test user=postgres")  # Connect to an existing database`

`cur = conn.cursor()  # Open a cursor to perform database operations`
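If you do want a config file, the sketch below shows one common way to write a `config()` function. The author's `config_user_dta.py` is not shown, so this is only an assumption about what it might look like: the file name `database.ini` and the section name `postgresql` are made up for illustration.

```python
from configparser import ConfigParser


def config(filename="database.ini", section="postgresql"):
    """Read connection parameters from an INI file (file and section names are assumptions)."""
    parser = ConfigParser()
    # parser.read returns the list of files it managed to open.
    if not parser.read(filename):
        raise FileNotFoundError(f"Could not read config file: {filename}")
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {filename}")
    # Turn the (name, value) pairs into a dict that can be passed as keyword arguments.
    return dict(parser.items(section))


# Usage, assuming database.ini contains a [postgresql] section with
# host, dbname, user and password entries:
#   params_ = config()
#   conn = psycopg2.connect(**params_)
```

Keeping credentials in a separate file means they stay out of the notebook itself.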

In [2]:
# Establish a connection to the database and create a cursor object

# Get the config params
params_ = config()

# Connect to the Postgres_DB:
conn = psycopg2.connect(**params_)

# Create new_cursor allowing us to write Python to execute PSQL:
cur = conn.cursor()

conn.autocommit = True  # every statement is committed immediately; read the documentation to understand when to use (and NOT use) True

# psycopg2.extras.register_hstore(conn)

In [3]:
# Data:

names_occup = [("Mr Han",1977,2019,{"first_name":"Chewie",
"last_name":"Wookie","occupation":"thrill seeker"},),
("self_empl",1983,1985,{"first_name":"Ewok","last_name":"Endor",
                        "occupation":"forest dweller"},),
("self",1999,2011,{"first_name":"Harry","last_name":"P","occupation":"magic"},)]


# Hstore:

+ To use *Hstore* we first have to enable the `hstore` extension in the database, which the function below does.

In [7]:
# Create a table to store the data (the hstore extension must be enabled first)

def create_staging_table(cursor):
    # IF NOT EXISTS lets this cell be re-run without an "extension already exists" error
    cursor.execute("""
        CREATE EXTENSION IF NOT EXISTS hstore;
        DROP TABLE IF EXISTS h_dct_prac;
        CREATE UNLOGGED TABLE h_dct_prac (
            id serial NOT NULL PRIMARY KEY,
            employer text,
            yr_begin INT,
            yr_end INT,
            fun_col_info hstore);""")

In [8]:
# Send the Schema to PSQL

with conn.cursor() as cursor:
    create_staging_table(cursor)

# Convert Formatting to use with Postgresql `[H_store]`

# INSERT: `One`

+ Pay ATTENTION to the string formatting, it is highly important: every key and value is wrapped in double quotes and joined by `=>`

In [11]:
# Insert a single entry:

sql_="""
INSERT INTO h_dct_prac (employer,yr_begin,yr_end,fun_col_info) VALUES (%s,%s,%s,%s)
"""

nemo=['everyone',
  1677,
  2020,
  '"first_name"=>"Santa","last_name"=>"Clause","occupation"=>"gift giver"']
cur.execute(sql_, nemo)

In [12]:
sq="""select * from h_dct_prac"""
cur.execute(sq)
cur.fetchall()

[(1,
  'everyone',
  1677,
  2020,
  '"last_name"=>"Clause", "first_name"=>"Santa", "occupation"=>"gift giver"')]
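Because `register_hstore(conn)` is commented out above, the hstore column comes back as a plain string. In practice, calling `register_hstore` on the connection makes psycopg2 return Python dicts for you. The sketch below is only a hand-rolled illustration of what that text format contains; the helper name `parse_hstore_text` is hypothetical and it only aims to cover the quoted form that PostgreSQL prints.

```python
import re

# One "key"=>"value" pair; a value may also be an unquoted NULL.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*(?:"((?:[^"\\]|\\.)*)"|(NULL))')


def parse_hstore_text(text):
    """Hypothetical helper: turn hstore output text into a Python dict."""

    def unescape(chunk):
        # A backslash followed by any character stands for that character.
        return re.sub(r"\\(.)", r"\1", chunk)

    result = {}
    for match in _PAIR.finditer(text):
        key, value, is_null = match.groups()
        result[unescape(key)] = None if is_null else unescape(value)
    return result


santa = parse_hstore_text(
    '"last_name"=>"Clause", "first_name"=>"Santa", "occupation"=>"gift giver"'
)
print(santa["first_name"])
```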

# Insert: `MANY`

+ You have to do some tricky string formatting and it will work!
+ I could not find this approach written up anywhere else online.

In [16]:
'''
 First we need to do some tricky string manipulation to get the formatting that hstore
 expects.
 
 ['"=>"'.join(tups) for tups in i[3].items() ]
 
 This list comprehension formats each pair as 'key"=>"value', but we still need one
 more double quote on the outer side of the key and of the value to be correct:
 '"key"=>"value"'
 
 That is done with the second loop and list comprehension:
 
 ['"'+item + '"' for item in i[3]]
 
 
'''

r_p=[] # stores first part format: [employer,yr_begin,yr_end,str_partial_format]
final_format=[]

for i in names_occup:
    first_format_step=['"=>"'.join(tups) for tups in i[3].items() ]
    r_p.append([i[0],i[1],i[2],first_format_step])


# finish formatting the hstore attributes

for i in r_p:
#     print(i[3])
    g=['"'+item + '"' for item in i[3]] # finish formatting: add the outer double quotes to each pair
    
    gg=','.join(g) # join the pairs into one comma-separated string
    
    final_format.append([i[0],i[1],i[2],gg])
final_format[1]

['self_empl',
 1983,
 1985,
 '"first_name"=>"Ewok","last_name"=>"Endor","occupation"=>"forest dweller"']
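The two loops above work for this data, but they would break if a key or value contained a double quote or a backslash. As a hedged alternative, here is a one-pass sketch that also escapes those characters. The helper name `hstore_literal` is hypothetical, not a psycopg2 function. (Also worth knowing: once `register_hstore` has been called on the connection, psycopg2 can adapt a Python dict to hstore directly, so no string building is needed.)

```python
def hstore_literal(mapping):
    """Hypothetical helper: render a flat dict in hstore text format."""

    def quote(value):
        # Escape backslashes first, then double quotes, then wrap in double quotes.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{text}"'

    pairs = []
    for key, value in mapping.items():
        # An unquoted NULL is how hstore text represents a missing value.
        rendered = "NULL" if value is None else quote(value)
        pairs.append(f"{quote(key)}=>{rendered}")
    return ",".join(pairs)


ewok = {"first_name": "Ewok", "last_name": "Endor", "occupation": "forest dweller"}
ewok_literal = hstore_literal(ewok)
print(ewok_literal)
```

For the Ewok record this produces the same string as the two-step formatting above.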

In [14]:
sql_="""
INSERT INTO h_dct_prac (employer,yr_begin,yr_end,fun_col_info) VALUES (%s,%s,%s,%s)
"""

for i in final_format:
#     print(i)
    emp, yr_b,yr_e,hstore=i
    cur.execute(sql_, [emp, yr_b, yr_e, hstore])
#     cur.execute(sql_,i)
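The loop above sends one `execute` call per row. psycopg2 cursors also have `executemany`, which takes the whole list at once. A minimal sketch, assuming the same table and the same row layout as `final_format`; the function name `insert_all` and the sample `rows` are only for illustration:

```python
INSERT_SQL = """
INSERT INTO h_dct_prac (employer, yr_begin, yr_end, fun_col_info)
VALUES (%s, %s, %s, %s)
"""

# Sample rows shaped like final_format: [employer, yr_begin, yr_end, hstore_text]
rows = [
    ["Mr Han", 1977, 2019,
     '"first_name"=>"Chewie","last_name"=>"Wookie","occupation"=>"thrill seeker"'],
    ["self", 1999, 2011,
     '"first_name"=>"Harry","last_name"=>"P","occupation"=>"magic"'],
]


def insert_all(cursor, rows):
    """Insert every row with a single executemany call on the given cursor."""
    # Each row becomes one parameter tuple; psycopg2 fills in the %s placeholders.
    cursor.executemany(INSERT_SQL, [tuple(row) for row in rows])


# Usage with a real cursor from psycopg2:
#   insert_all(cur, final_format)
```

Keep in mind that `executemany` is mostly a convenience; it is not necessarily faster than the loop.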

In [33]:
sql="""
select * from h_dct_prac
"""
cur.execute(sql)
cur.fetchall()

[(2,
  'Mr Han',
  1977,
  2019,
  '"last_name"=>"Wookie", "first_name"=>"Chewie", "occupation"=>"thrill seeker"'),
 (3,
  'self_empl',
  1983,
  1985,
  '"last_name"=>"Endor", "first_name"=>"Ewok", "occupation"=>"forest dweller"'),
 (4,
  'self',
  1999,
  2011,
  '"last_name"=>"P", "first_name"=>"Harry", "occupation"=>"magic"'),
 (1,
  'everyone',
  1677,
  2020,
  '"last_name"=>"Clause", "first_name"=>"Mr_Santa", "occupation"=>"gift giver"')]

+ Note: this output was captured after the `UPDATE` in the next section had already been run once, which is why row 1 already shows `Mr_Santa` and is listed last.

# UPDATE: existing key-value pair

In [35]:
sq_santa="""
UPDATE h_dct_prac SET fun_col_info = fun_col_info || 
'"first_name"=>"Mr_Santa"' 
WHERE id = 1;  
"""

cur.execute(sq_santa)

In [36]:
cur.execute('''select * from h_dct_prac''')
cur.fetchall()

[(2,
  'Mr Han',
  1977,
  2019,
  '"last_name"=>"Wookie", "first_name"=>"Chewie", "occupation"=>"thrill seeker"'),
 (3,
  'self_empl',
  1983,
  1985,
  '"last_name"=>"Endor", "first_name"=>"Ewok", "occupation"=>"forest dweller"'),
 (4,
  'self',
  1999,
  2011,
  '"last_name"=>"P", "first_name"=>"Harry", "occupation"=>"magic"'),
 (1,
  'everyone',
  1677,
  2020,
  '"last_name"=>"Clause", "first_name"=>"Mr_Santa", "occupation"=>"gift giver"')]
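The `||` operator concatenates two hstores, and when both sides contain the same key the right-hand side wins. That is the same behaviour as merging two Python dicts, which the sketch below uses as an analogy. It also shows a parameterized version of the update so the new pair and the `id` are not hard-coded; the names `merge_pairs` and `update_pairs` are hypothetical.

```python
UPDATE_SQL = """
UPDATE h_dct_prac
SET fun_col_info = fun_col_info || %s::hstore
WHERE id = %s
"""


def merge_pairs(existing, new_pairs):
    """Python analogy of hstore ||: keys on the right overwrite keys on the left."""
    return {**existing, **new_pairs}


def update_pairs(cursor, row_id, hstore_text):
    """Merge the given hstore text into one row, passing both values as parameters."""
    cursor.execute(UPDATE_SQL, (hstore_text, row_id))


santa = {"first_name": "Santa", "last_name": "Clause", "occupation": "gift giver"}
updated = merge_pairs(santa, {"first_name": "Mr_Santa"})
print(updated)

# Usage with a real cursor from psycopg2:
#   update_pairs(cur, 1, '"first_name"=>"Mr_Santa"')
```

If the key does not exist yet, the same statement simply adds it as a new pair.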

# Query:

+ Specific Key
+ h_store column
+ WHERE clause
+ Look for a key-value pair
+ multiple keys at once
+ Return either all keys or values from a column

In [17]:
# Select specific key:

s='''
SELECT
fun_col_info -> 'first_name' AS f_n
FROM
h_dct_prac;
'''
cur.execute(s)
cur.fetchall()

[('Santa',), ('Chewie',), ('Ewok',), ('Harry',)]

In [18]:
# h_store column:

s_='''
SELECT
fun_col_info
FROM
h_dct_prac;
'''
cur.execute(s_)
cur.fetchall()

[('"last_name"=>"Clause", "first_name"=>"Santa", "occupation"=>"gift giver"',),
 ('"last_name"=>"Wookie", "first_name"=>"Chewie", "occupation"=>"thrill seeker"',),
 ('"last_name"=>"Endor", "first_name"=>"Ewok", "occupation"=>"forest dweller"',),
 ('"last_name"=>"P", "first_name"=>"Harry", "occupation"=>"magic"',)]

In [19]:
# Where clause:

seq='''
SELECT
yr_end,  fun_col_info -> 'first_name' AS f_n,fun_col_info ->'last_name' as l_n
FROM
h_dct_prac
WHERE
fun_col_info -> 'occupation' = 'magic';
'''
cur.execute(seq)
cur.fetchall()

[(2011, 'Harry', 'P')]

In [28]:
# Look for a key-value pair with this (@>) operator

s_kv='''
SELECT
employer
FROM
h_dct_prac
WHERE
fun_col_info @> '"last_name"=>"Wookie"' :: hstore;
'''
cur.execute(s_kv)
cur.fetchall()

[('Mr Han',)]

In [20]:
# Check whether a key exists with the (?) operator:

cur.execute( "select * from h_dct_prac where fun_col_info ? %s;",('first_name', ))
cur.fetchall()

[(1,
  'everyone',
  1677,
  2020,
  '"last_name"=>"Clause", "first_name"=>"Santa", "occupation"=>"gift giver"'),
 (2,
  'Mr Han',
  1977,
  2019,
  '"last_name"=>"Wookie", "first_name"=>"Chewie", "occupation"=>"thrill seeker"'),
 (3,
  'self_empl',
  1983,
  1985,
  '"last_name"=>"Endor", "first_name"=>"Ewok", "occupation"=>"forest dweller"'),
 (4,
  'self',
  1999,
  2011,
  '"last_name"=>"P", "first_name"=>"Harry", "occupation"=>"magic"')]
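If the operators feel unfamiliar, it may help to think of each one as a Python dict operation. The sketch below is only an analogy in plain Python (nothing here touches the database), and the function names are made up for illustration.

```python
row = {"first_name": "Chewie", "last_name": "Wookie", "occupation": "thrill seeker"}


def get_value(store, key):
    """Like  store -> 'key' : the value for the key, or None (NULL) if it is missing."""
    return store.get(key)


def contains(store, subset):
    """Like  store @> subset : every pair in subset must also be in store."""
    return subset.items() <= store.items()


def has_key(store, key):
    """Like  store ? 'key' : True when the key exists."""
    return key in store


print(get_value(row, "first_name"))
print(contains(row, {"last_name": "Wookie"}))
print(has_key(row, "middle_name"))
```

The one difference to remember: in SQL these run inside PostgreSQL, so the filtering happens before any rows are sent back to Python.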

In [26]:
# All keys:

s='''
SELECT
akeys (fun_col_info)
FROM
h_dct_prac;
'''
cur.execute(s)
cur.fetchall()

[(['last_name', 'first_name', 'occupation'],),
 (['last_name', 'first_name', 'occupation'],),
 (['last_name', 'first_name', 'occupation'],),
 (['last_name', 'first_name', 'occupation'],)]

In [27]:
# Return all keys as a set:

s='''
SELECT
skeys (fun_col_info)
FROM
h_dct_prac;
'''
cur.execute(s)
cur.fetchall()

[('last_name',),
 ('first_name',),
 ('occupation',),
 ('last_name',),
 ('first_name',),
 ('occupation',),
 ('last_name',),
 ('first_name',),
 ('occupation',),
 ('last_name',),
 ('first_name',),
 ('occupation',)]
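The difference between `akeys` and `skeys` is only the shape of the result: `akeys` gives one array per row, while `skeys` gives one key per result row. A small plain-Python sketch of the two shapes, using two sample dicts in place of the table rows (the variable names are made up for illustration):

```python
rows = [
    {"last_name": "Clause", "first_name": "Santa", "occupation": "gift giver"},
    {"last_name": "Wookie", "first_name": "Chewie", "occupation": "thrill seeker"},
]

# akeys: one list of keys per row
akeys_shape = [list(row) for row in rows]

# skeys: every key becomes its own result row
skeys_shape = [key for row in rows for key in row]

# Removing the repeats leaves the distinct keys used anywhere in the column
distinct_keys = sorted(set(skeys_shape))

print(akeys_shape)
print(skeys_shape)
print(distinct_keys)
```

On the database side, something like `SELECT DISTINCT skeys(fun_col_info) FROM h_dct_prac;` should give the distinct keys, though I have not run that query in this notebook.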

In [23]:
# All values:
s_v='''
SELECT
avals (fun_col_info)
FROM
h_dct_prac;
'''
cur.execute(s_v)
cur.fetchall()

[(['Clause', 'Santa', 'gift giver'],),
 (['Wookie', 'Chewie', 'thrill seeker'],),
 (['Endor', 'Ewok', 'forest dweller'],),
 (['P', 'Harry', 'magic'],)]

# Convert to JSON:

In [21]:
yay='''
SELECT
  employer,
  hstore_to_json(fun_col_info) json
FROM
  h_dct_prac ;
'''

cur.execute(yay)
cur.fetchall()

[('everyone',
  {'last_name': 'Clause', 'first_name': 'Santa', 'occupation': 'gift giver'}),
 ('Mr Han',
  {'last_name': 'Wookie',
   'first_name': 'Chewie',
   'occupation': 'thrill seeker'}),
 ('self_empl',
  {'last_name': 'Endor',
   'first_name': 'Ewok',
   'occupation': 'forest dweller'}),
 ('self', {'last_name': 'P', 'first_name': 'Harry', 'occupation': 'magic'})]
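Since psycopg2 hands the JSON column back as Python dicts, the rows are easy to flatten into one record per employee. A minimal sketch, using two of the rows from the output above as sample data; the helper name `to_records` is hypothetical.

```python
import json

# Sample rows shaped like the fetchall() output above: (employer, info_dict)
rows = [
    ("everyone", {"last_name": "Clause", "first_name": "Santa", "occupation": "gift giver"}),
    ("self", {"last_name": "P", "first_name": "Harry", "occupation": "magic"}),
]


def to_records(rows):
    """Flatten (employer, info_dict) tuples into one flat dict per row."""
    return [{"employer": employer, **info} for employer, info in rows]


records = to_records(rows)
print(json.dumps(records, indent=2))

# The same list of dicts can be handed to pandas if you want a DataFrame:
#   df = pd.DataFrame(records)
```

Remember that every value is still a string, because hstore only stores text.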


# Citations & Help:

# ◔̯◔

https://silo.tips/download/advanced-access-to-postgresql-from-python-with-psycopg2
    
https://www.postgresqltutorial.com/postgresql-hstore/

https://www.postgresql.org/docs/9.1/hstore.html

https://fladriss.wordpress.com/2013/07/25/python-and-postgres-hstores/

https://www.ibm.com/cloud/blog/an-introduction-to-postgresqls-hstore

https://www.citusdata.com/blog/2016/07/14/choosing-nosql-hstore-json-jsonb/

https://www.geeksforgeeks.org/python-join-tuple-elements-in-a-list/