## <span style=color:blue>Illustrations of using Python to load and manipulate Postgres data</span>

In [1]:
# These are boilerplate imports that are often useful.
# It would be cleaner to delete or comment out the ones this script does not use.

import sys
import json
import csv
import yaml

import pandas as pd
import numpy as np

import matplotlib as mpl

## <span style=color:blue>Setting up the Postgres connection. Note that the database name is "Small_Examples"</span>

In [2]:
import psycopg2

In [3]:
# following https://earthly.dev/blog/psycopg2-postgres-python/

db_conn = psycopg2.connect(dbname='Small_Examples',
                           user='postgres',
                           password='postgres',
                           host='localhost',
                           port=5432)

print("Successfully connected to the database.")

# Rather than putting the user and password directly into the code, it is better to
#   keep them in a configuration file, e.g., "db_info.ini" (see the URL given above).

Successfully connected to the database.
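
<span style=color:blue>As a minimal sketch of the configuration-file idea mentioned in the comment above: the standard-library `configparser` module can read an INI file and return its settings as a dictionary. The helper name `load_db_params`, the section name `postgresql`, and the file layout are assumptions made for illustration; they are not taken from the linked article.</span>

```python
import configparser


def load_db_params(path, section="postgresql"):
    """Return the settings in `section` of an INI file as a dict of strings.

    Assumed (hypothetical) file layout:

        [postgresql]
        dbname = Small_Examples
        user = postgres
        password = ...
        host = localhost
        port = 5432
    """
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse
    if not parser.read(path):
        raise FileNotFoundError(f"could not read config file: {path}")
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] not found in {path}")
    return dict(parser.items(section))


# Hypothetical usage (needs psycopg2 and a running server):
# db_conn = psycopg2.connect(**load_db_params("db_info.ini"))
```

<span style=color:blue>All values come back as strings. Keeping the file out of version control is the point of the exercise, so that the password never appears in the notebook.</span>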


<span style=color:blue>One way to interact with the database is to use a "cursor". Think of the cursor as being able to run something that you highlight in a DBeaver SQL script window.  </span>

In [4]:
db_cursor = db_conn.cursor()

In [5]:
q1 = ''' 
SELECT table_name
FROM information_schema.tables
WHERE table_schema='company'
  AND table_type='BASE TABLE';
'''
db_cursor.execute(q1)

print(db_cursor.fetchone())


('works_on',)


In [6]:
print(db_cursor.fetchmany(10))

[('dependent',), ('dept_locations',), ('project',), ('department',), ('employee',)]


<span style=color:blue>We can also loop over the contents of the cursor</span>

In [7]:
db_cursor.execute(q1)

for record in db_cursor:
    print(record)

('works_on',)
('dependent',)
('dept_locations',)
('project',)
('department',)
('employee',)
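
<span style=color:blue>Each record printed above is a one-element tuple. A small sketch of how the loop could unpack those tuples into a plain list of names is given here. The helper name `list_tables` is hypothetical, and the schema is passed as a query parameter (`%s`) rather than written into the SQL text.</span>

```python
def list_tables(cursor, schema):
    """Return the base-table names in `schema` as a flat list of strings."""
    query = """
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = %s
          AND table_type = 'BASE TABLE'
    """
    # The second argument supplies the value for the %s placeholder
    cursor.execute(query, (schema,))
    # Iterating over the cursor yields one tuple per row; unpack the single column
    return [table_name for (table_name,) in cursor]


# Hypothetical usage with the cursor created earlier:
# print(list_tables(db_cursor, "company"))
```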


<span style=color:blue>I want my program to run against the company schema in particular, so I use the SET search_path command. I also "commit" that change, so that it persists for the rest of this database session instead of being undone if the surrounding transaction is rolled back. Commits are also needed whenever I create new tables or update the data. (psycopg2 also has an "autocommit" setting on the connection, but in my experience it does not always do a commit, so I avoid it.)</span>

In [8]:
q2 = '''set search_path to company'''
db_cursor.execute(q2)
db_conn.commit()
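
<span style=color:blue>One way to avoid forgetting the commit is to use the connection as a context manager: in psycopg2, a `with` block on a connection commits if the block finishes normally and rolls back if it raises. The sketch below assumes a psycopg2-style connection; the helper name `use_schema` is hypothetical.</span>

```python
def use_schema(conn, schema):
    """Set the search path on `conn` and commit the change."""
    with conn:                      # commit on success, rollback on error
        with conn.cursor() as cur:  # the cursor is closed when the block ends
            # psycopg2 fills in %s as a quoted string, which SET accepts
            cur.execute("SET search_path TO %s", (schema,))


# Hypothetical usage with the connection created earlier:
# use_schema(db_conn, "company")
```

<span style=color:blue>Note that in psycopg2 the `with conn:` block ends the transaction but does not close the connection, so `db_conn` can still be used afterwards.</span>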

In [9]:
q3 = '''
SELECT *
FROM department
'''
db_cursor.execute(q3)
print(db_cursor.fetchmany(20))

[('Research', 5, '333445555', '5/22/88'), ('Administration', 4, '987654321', '1/1/95'), ('Headquarters', 1, '888665555', '6/19/81')]


<span style=color:blue>I can introduce parameters into my queries</span>

In [10]:
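# Goal: get the employees whose first names start with a specified initial.
#   The WHERE clause of the query will look like: WHERE fname LIKE 'J%'  (where J is the parameter)
# Caution: building SQL by string concatenation is only acceptable when the value is trusted;
#   with user-supplied input it opens the door to SQL injection.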

q4start = """
SELECT *
FROM employee
WHERE fname LIKE '""" 

q4end = """%'"""

finit = 'J'

q4 = q4start + finit + q4end

print(q4)
print()

db_cursor.execute(q4)
print(db_cursor.fetchmany(20))


SELECT *
FROM employee
WHERE fname LIKE 'J%'

[('Jennifer', 'S', 'Wallace', '987654321', '6/20/41', '291-Berry-Bellaire-TX', 'F', 43000, '888665555', 4), ('James', 'E', 'Borg', '888665555', '11/10/37', '450-Stone-Houston-TX', 'M', 55000, None, 1), ('John', 'B', 'Smith', '123456789', '1/9/65', '731-Fondren-Houston-TX', 'M', 33000, '333445555', 5), ('Joyce', 'A', 'English', '453453453', '7/31/72', '5631-Rice-Houston-TX', 'F', 27500, '333445555', 5), ('John', 'B', 'Smith', '998877665', None, None, None, None, None, None)]
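
<span style=color:blue>The query above is assembled by string concatenation. A safer variant, sketched here, leaves a `%s` placeholder in the SQL and hands the value to `execute()` separately, so psycopg2 does the quoting. The helper name `employees_by_initial` is hypothetical.</span>

```python
def employees_by_initial(cursor, initial):
    """Return the employee rows whose first name starts with `initial`."""
    query = """
        SELECT *
        FROM employee
        WHERE fname LIKE %s
    """
    # The LIKE wildcard goes into the parameter value, not into the SQL text
    cursor.execute(query, (initial + "%",))
    return cursor.fetchall()


# Hypothetical usage with the cursor created earlier:
# rows = employees_by_initial(db_cursor, "J")
```

<span style=color:blue>The parameters are always passed as a tuple (or list), even when there is only one of them, hence the trailing comma.</span>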


In [11]:
import pprint
pprint.pp(db_cursor.fetchmany(20))

[]


<span style=color:blue>Why does the command above return an empty list? Because fetchone() and fetchmany() advance through the result set, and the earlier fetchmany(20) had already consumed all five rows. To get the full answer again, we have to execute the query again.</span>

In [13]:
db_cursor.execute(q4)
pprint.pp(db_cursor.fetchmany(20), width=120)

[('Jennifer', 'S', 'Wallace', '987654321', '6/20/41', '291-Berry-Bellaire-TX', 'F', 43000, '888665555', 4),
 ('James', 'E', 'Borg', '888665555', '11/10/37', '450-Stone-Houston-TX', 'M', 55000, None, 1),
 ('John', 'B', 'Smith', '123456789', '1/9/65', '731-Fondren-Houston-TX', 'M', 33000, '333445555', 5),
 ('Joyce', 'A', 'English', '453453453', '7/31/72', '5631-Rice-Houston-TX', 'F', 27500, '333445555', 5),
 ('John', 'B', 'Smith', '998877665', None, None, None, None, None, None)]
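
<span style=color:blue>The same "consume and finish" behavior can be put to use: calling fetchmany() repeatedly walks through the answer in chunks, and the empty list signals that nothing is left. A small sketch follows; the helper name `iter_batches` and the default batch size are assumptions for illustration.</span>

```python
def iter_batches(cursor, size=2):
    """Yield lists of at most `size` rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:   # an empty list means every row has been fetched
            break
        yield batch


# Hypothetical usage with the cursor and query from above:
# db_cursor.execute(q4)
# for batch in iter_batches(db_cursor, size=2):
#     print(len(batch), "rows")
```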


<span style=color:blue>Soon we will introduce pandas DataFrames, which provide a natural way to import the contents of database tables and query answers, to import and export CSV files, and to load CSV files into the database.</span>


<span style=color:blue>It is a good practice to "close" your connection to a database before exiting</span>

In [14]:
db_conn.close()

In [15]:
db_cursor.execute(q3)

InterfaceError: cursor already closed
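
<span style=color:blue>The InterfaceError above is expected: once the connection is closed, its cursors can no longer be used. To make sure the close happens even when a query fails, one option is `contextlib.closing` from the standard library. In the sketch below, the helper name `run_and_close` is hypothetical, and `connect` stands for any zero-argument function that returns a new psycopg2-style connection.</span>

```python
from contextlib import closing


def run_and_close(connect, query):
    """Open a connection, run one query, and always close the connection."""
    # closing() calls conn.close() on exit, whether or not an error occurred
    with closing(connect()) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()


# Hypothetical usage (needs psycopg2 and a running server):
# rows = run_and_close(
#     lambda: psycopg2.connect(dbname="Small_Examples", user="postgres",
#                              password="postgres", host="localhost", port=5432),
#     "SELECT * FROM company.department",
# )
```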