### JSON Objects in PostgreSQL

**Author**: Aaron Liu & Ron Volkovinsky

**Date**: 5/10/2023

**Objective**: Practice basic `INSERT` queries in SQL, and automate them from Python with psycopg2

## Setup

Some Python helper functions are already provided in the cells below:

**DOI JSON retrieval (doi2dict)**
(Credit Ron Volkovinsky)

In [5]:
import requests
import json
import pandas as pd
import bibtexparser
import pprint

def doi2dict(doi):
    """Return the BibTeX metadata for a DOI as a dictionary."""
    # Build the DOI resolver URL
    url = "http://dx.doi.org/" + doi

    # Ask the resolver for a BibTeX record instead of the article landing page
    headers = {"accept": "application/x-bibtex"}

    # Request the BibTeX record from the URL as text
    r = requests.get(url, headers=headers).text

    # Parse the returned BibTeX text into a dictionary
    # NOTE: use bibtexparser.customization to split strings into lists, etc. (https://bibtexparser.readthedocs.io/en/master/bibtexparser.html?highlight=bparser#module-bibtexparser.bparser)
    bibdata = bibtexparser.bparser.BibTexParser().parse(r)

    # Return the metadata dict of the first entry
    return bibdata.entries[0]

doi = '10.1021/acsami.1c20994'
doi2 = '10.1021/acscentsci.9b00476'

doidict = doi2dict(doi2)


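The note inside `doi2dict` mentions splitting strings into lists. As a minimal sketch of that idea in plain Python (not the `bibtexparser.customization` route the note points to), the hypothetical helper `split_authors` below splits the `author` field on the BibTeX separator `and`. It assumes the author string sits on one line; the example entry is a shortened copy of the metadata returned for `doi2`.

```python
def split_authors(entry):
    """Return a copy of a BibTeX entry dict with 'author' split into a list."""
    cleaned = dict(entry)  # shallow copy so the original dict is left untouched
    authors = cleaned.get("author")
    if isinstance(authors, str):
        # BibTeX separates author names with the literal word "and"
        cleaned["author"] = [name.strip() for name in authors.split(" and ")]
    return cleaned


# Shortened example entry (hypothetical subset of the fields doi2dict returns)
example = {
    "ID": "Lin_2019",
    "author": "Tzyy-Shyang Lin and Connor W. Coley and Hidenobu Mochigase",
}
print(split_authors(example)["author"])
```

A list of authors is often easier to query once the metadata is stored as JSONB, since each name becomes its own array element.
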
**Connection Details**

Fill in your connection details here. Note that `127.0.0.1` and `localhost` both refer to your own computer acting as the server. Your machine's **local IP address** (found using the `ipconfig` command on Windows) also points to it, provided PostgreSQL is configured to listen on that address. If you are connecting to an external server, use the connection details of that server instead.

I recommend creating your own database as a test environment. Create it outside Python, through either psql or pgAdmin. Call the database whatever you want, like `pg_practice` or `ofetdb_testenv`. Whichever name you pick, the username and password you chose when installing PostgreSQL go into the connection details (the default user is `postgres`). The default PostgreSQL port is `5432`, unless a different one was specified during installation.
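
If you would rather not type a password into the notebook, one option is to read the connection details from environment variables. This is only a sketch: the helper name `conn_kwargs_from_env` is hypothetical, and the fallback values simply mirror the defaults used in this notebook. The variable names (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`) are the ones PostgreSQL client tools conventionally read.

```python
import os


def conn_kwargs_from_env(env=None):
    """Build psycopg2-style connection kwargs from environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PGHOST", "localhost"),
        "database": env.get("PGDATABASE", "test"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", ""),  # no password is baked in
        "port": env.get("PGPORT", "5432"),
    }


conn_kwargs = conn_kwargs_from_env()
# Print everything except the password
print({key: value for key, value in conn_kwargs.items() if key != "password"})
```

The resulting dictionary can be passed to `pg.connect(**conn_kwargs)` exactly like the hand-written one.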

In [None]:
import psycopg2 as pg  # imported here so this cell runs on its own

conn_kwargs = {
    "host"      : "localhost",
    "database"  : "test", ## FILL IN CONNECTION DETAILS HERE
    "user"      : "postgres",
    "password"  : "password",
    "port"      : "5432",
}

conn = pg.connect(**conn_kwargs)
print("Connection Successful")

conn.close()
print("Connection Closed")

In [16]:
# Postgres python
import psycopg2 as pg
import numpy as np
from psycopg2.extensions import AsIs

import sys  # used by connect() to exit when the connection fails

# Adapters that convert NumPy/Python data types to PostgreSQL-compatible values
def adapt_numpy_float64(numpy_float64):
    return AsIs(numpy_float64)

def adapt_numpy_int64(numpy_int64):
    return AsIs(numpy_int64)

def nan_to_null(f,
        _NULL=AsIs('NULL'),
        _Float=pg.extensions.Float):
    # Store NaN as SQL NULL; pass every other float through unchanged
    if not np.isnan(f):
        return _Float(f)
    return _NULL

pg.extensions.register_adapter(np.float64, adapt_numpy_float64)
pg.extensions.register_adapter(np.int64, adapt_numpy_int64)
pg.extensions.register_adapter(float, nan_to_null)

param_dict = {
    "host"      : "127.0.0.1",
    "database"  : "ofetdb_testenv",
    "user"      : "postgres",
    "password"  : "password",
    "port"      : "5432",
}

def connect(params_dict):
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = pg.connect(**params_dict)
    except (Exception, pg.DatabaseError) as error:
        print(error)
        sys.exit(1) 
    print("Connection successful")
    return conn

In [22]:
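# Open a connection and cursor using the helper defined above
conn = connect(param_dict)
cur = conn.cursor()

# Create a table that holds journal article information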
sql = '''
    CREATE TABLE IF NOT EXISTS EXPERIMENT_INFO (
        exp_id              SERIAL          PRIMARY KEY,
        citation_type       VARCHAR(20),
        meta                JSONB,
        UNIQUE(citation_type, meta)
    );
'''

cur.execute(sql)

print("Table(s) created successfully")
conn.commit()

print("Operation successful")
conn.close()


In [None]:
from psycopg2.extras import Json 

doi = '10.1021/acsami.1c20994'
doi2 = '10.1021/acscentsci.9b00476'

doidict = doi2dict(doi)

a = Json(doidict)
# type(a)
print(a)

In [7]:
import psycopg2

kwargs = {
    'database': 'test',
    'user': 'postgres',
    'password': 'password',
    'host': '127.0.0.1',
    'port': '5432'
}

# %% Create Tables for EXPERIMENT_INFO

conn = psycopg2.connect(**kwargs)

print("Connection Successful")

cur = conn.cursor()
cur.execute(
    '''
    CREATE TABLE IF NOT EXISTS EXPERIMENT_INFO (
        exp_id              SERIAL          PRIMARY KEY,
        citation_type       VARCHAR(20),
        meta                JSONB,
        UNIQUE(citation_type, meta)
    );
    '''
)

print("Table(s) created successfully")
conn.commit()

print("Operation successful")
conn.close()

Connection Successful
Table(s) created successfully
Operation successful


In [26]:
Json(doidict)

<psycopg2._json.Json at 0x221a91df520>

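The `Json` wrapper serializes the dictionary to JSON text when the query runs. Once the value is stored as JSONB, PostgreSQL does not preserve key order or insignificant whitespace, so two documents that differ only in those respects should count as the same value for the `UNIQUE(citation_type, meta)` constraint. The sketch below imitates that normalization in plain Python to predict a collision before inserting. The helper name `canonical_json` is hypothetical, and this is only an approximation: JSONB also compares numbers by value, which a string comparison does not capture.

```python
import json


def canonical_json(d):
    """Serialize a dict with sorted keys and no extra whitespace."""
    return json.dumps(d, sort_keys=True, separators=(",", ":"))


# Same metadata, different key order
a = {"doi": "10.1021/acscentsci.9b00476", "year": "2019"}
b = {"year": "2019", "doi": "10.1021/acscentsci.9b00476"}

print(canonical_json(a) == canonical_json(b))  # True: key order does not matter
```
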
In [44]:
sql = "INSERT INTO experiment_info(%s) VALUES %s"

columns = ['citation_type', 'meta']
values = ['literature', Json(doidict)]

tup = (AsIs(','.join(columns)), tuple(values))

conn = psycopg2.connect(**kwargs)

print("Connection Successful")

cur = conn.cursor()
cur.execute(sql, tup)

print("Row inserted successfully")
conn.commit()

print("Operation successful")
conn.close()


Connection Successful


UniqueViolation: duplicate key value violates unique constraint "experiment_info_citation_type_meta_key"
DETAIL:  Key (citation_type, meta)=(literature, {"ID": "Lin_2019", "doi": "10.1021/acscentsci.9b00476", "url": "https://doi.org/10.1021%2Facscentsci.9b00476", "year": "2019", "month": "sep", "pages": "1523--1531", "title": "{BigSMILES}: A Structurally-Based Line Notation for Describing Macromolecules", "author": "Tzyy-Shyang Lin and Connor W. Coley and Hidenobu Mochigase and Haley K. Beech and Wencong Wang and Zi Wang and Eliot Woods and Stephen L. Craig and Jeremiah A. Johnson and Julia A. Kalow and Klavs F. Jensen and Bradley D. Olsen", "number": "9", "volume": "5", "journal": "{ACS} Central Science", "ENTRYTYPE": "article", "publisher": "American Chemical Society ({ACS})"}) already exists.
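
The error above appears because the same DOI metadata was inserted twice. One way to make the insert safe to re-run is PostgreSQL's `ON CONFLICT DO NOTHING` clause (available since PostgreSQL 9.5), which skips rows that would violate a unique constraint instead of raising. The sketch below builds such a statement; the helper name `build_insert` and its identifier check are assumptions for illustration, not part of psycopg2. It also avoids passing column names through `AsIs`, which would let any string into the query.

```python
import re

# Plain, unquoted SQL identifiers only: letters, digits, underscores
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_insert(table, columns):
    """Return an INSERT statement that skips rows violating a unique constraint."""
    for name in [table, *columns]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {table} ({column_list}) "
        f"VALUES ({placeholders}) ON CONFLICT DO NOTHING"
    )


insert_sql = build_insert("experiment_info", ["citation_type", "meta"])
print(insert_sql)
```

With an open cursor, this would be run as `cur.execute(insert_sql, ('literature', Json(doidict)))`, and `cur.rowcount` should be `0` when the row was skipped. psycopg2 also ships a `psycopg2.sql` module for composing identifiers, which is likely the more robust choice in real code.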


In [None]:
## I left off at inserting tuples that already exist and violate the unique constraint. What about sequencing?
## Let's insert about five DOIs and see what happens
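
A possible starting point for that experiment is sketched below. The function name `insert_rows` is hypothetical; it takes any open psycopg2 connection plus a list of `(citation_type, meta)` pairs, where `meta` is already wrapped (for example `Json(doi2dict(doi))`). It relies on `ON CONFLICT DO NOTHING`, so duplicates are counted rather than raising `UniqueViolation`. As far as I know, a skipped row still consumes a value from the `exp_id` sequence, so gaps in `exp_id` should be expected after duplicates.

```python
INSERT_SQL = (
    "INSERT INTO experiment_info (citation_type, meta) "
    "VALUES (%s, %s) ON CONFLICT DO NOTHING"
)


def insert_rows(conn, rows):
    """Insert (citation_type, meta) pairs; return (inserted, skipped) counts."""
    inserted = skipped = 0
    cur = conn.cursor()
    for citation_type, meta in rows:
        cur.execute(INSERT_SQL, (citation_type, meta))
        # rowcount is 1 for a new row and 0 when the conflict clause skipped it
        if cur.rowcount == 1:
            inserted += 1
        else:
            skipped += 1
    conn.commit()
    return inserted, skipped


# Example call (assumes an open connection `conn` and a list of DOI strings `dois`):
# insert_rows(conn, [("literature", Json(doi2dict(d))) for d in dois])
```

Running `SELECT exp_id FROM experiment_info ORDER BY exp_id;` afterwards shows whether the sequence skipped any values.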