# Accessing DC2 data in PostgreSQL at NERSC part 2

This notebook demonstrates additional features of the PostgreSQL database at NERSC. Currently the only recommended datasets are the object catalogs for Run1.2i and Run1.2p.

You may skip to the first code cell if you have already accessed PostgreSQL from another notebook.  The prerequisites are the same.

### Prerequisites
* A file ~/.pgpass containing a line like this:

`nerscdb03.nersc.gov:54432:desc_dc2_drp:desc_dc2_drp_user:`_password_

This line allows you to use the desc_dc2_drp_user account, which has *SELECT* privileges on the database, without entering a password in plain text. There is a separate account for adding to or modifying the database. `.pgpass` must be protected so that only its owner can read and write it.
 
You can obtain the file by running the script `/global/common/software/lsst/dbaccess/postgres_reader.sh`.  It will copy a suitable file to your home directory and set permissions.  

If you already have a `.pgpass` file in your home directory the script will stop without doing anything to avoid clobbering your file.  In that case, see the file `reader.pgpass` in the same directory.  You can merge it into your `.pgpass` file by hand.  

* Access to the psycopg2 package, which provides a Python interface to PostgreSQL. The recommended way to get it is to use the desc-python kernel on jupyter-dev.
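
The permission requirement on `.pgpass` can be checked from Python before connecting. The sketch below uses only the standard library; the helper name `pgpass_is_private` is made up for this illustration. It relies on the documented behavior of the PostgreSQL client library on Unix, which ignores a password file that is accessible to group or others.

```python
import os
import stat

def pgpass_is_private(path):
    """Return True if `path` is a regular file with no group/other permissions."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False
    # Any bit set for group or others means the file is too permissive.
    return stat.S_ISREG(mode) and (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

print(pgpass_is_private(os.path.expanduser('~/.pgpass')))
```

If this prints `False` for an existing file, `chmod 600 ~/.pgpass` restores owner-only access.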


In [None]:
import psycopg2

import numpy as np

%matplotlib inline 
import matplotlib.pyplot as plt

import pandas as pd

Make the db connection

In [None]:
dbname = 'desc_dc2_drp'
dbuser = 'desc_dc2_drp_user'
dbhost = 'nerscdb03.nersc.gov'
dbconfig = {'dbname' : dbname, 'user' : dbuser, 'host' : dbhost}
dbconn = psycopg2.connect(**dbconfig)

### Finding datasets
The tables for the Run1.2i data, along with a view that makes DPDD quantities more easily accessible, are in the schema `run12i` (a schema acts like a namespace). To reference, say, a table called `position` for Run1.2i, use `run12i.position`. For Run1.2p the schema name is `run12p_v4`.

To find out which datasets are available and by what schema names, query the table `run_provenance`. It's in a special schema known as `public` which does not normally need to be specified.

In [None]:
cols = ['schema_name', 'run_designation','simulation_program', 'db_ingest', 'remarks']
# Additional columns in run_provenance store software and input data versions
prov_query = 'SELECT '  + ','.join(cols) + ' from run_provenance'
with dbconn.cursor() as cursor:
    cursor.execute(prov_query)
    fmt = '{0!s:13} {1!s:16} {2!s:18} {3!s:12} {4!s}'
    print(fmt.format(*cols))
    for record in cursor:
        print(fmt.format(*record))
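
Once a schema name is known, it can help to see which tables and views it contains. The following is a minimal sketch, not part of the original notebook: the helper `list_relations` is hypothetical, and it queries the standard PostgreSQL catalog view `information_schema.tables`, which only lists relations the current account is allowed to access.

```python
def list_relations(cursor, schema):
    """Return (table_name, table_type) rows for the tables and views in `schema`."""
    query = (
        "SELECT table_name, table_type "
        "FROM information_schema.tables "
        "WHERE table_schema = %s "
        "ORDER BY table_name"
    )
    # The schema name is passed as a query parameter, not pasted into the SQL.
    cursor.execute(query, (schema,))
    return cursor.fetchall()

# Usage with the connection opened earlier in this notebook:
# with dbconn.cursor() as cursor:
#     for name, kind in list_relations(cursor, 'run12p_v4'):
#         print(name, kind)
```

Views such as `dpdd` should appear with `table_type` equal to `VIEW`, assuming the account has access to them.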

In [None]:
# Pick one of the supported datasets
schema = 'run12p_v4'

### Looking at Stars
Pick a band and get an idea of the magnitude distribution with a histogram. With Run1.2p this query takes at least a couple of minutes.

In [None]:
global_cuts = 'clean and (extendedness < 1.0) '
pop = 'Stars'
min_SNR = 25
max_err = 1/min_SNR
band = 'i'
mag_col = 'mag_' + band
band_cuts = ' (magerr_{band} < {max_err}) '.format(**locals())
where = ' WHERE ' + global_cuts + ' AND ' + band_cuts  
q5 = "SELECT mag_{band}, ra, dec FROM {schema}.dpdd ".format(**locals()) + where
print(q5)
records = []
with dbconn.cursor() as cursor:
    %time cursor.execute(q5)
    records = cursor.fetchall()
    nObj = len(records)
    print('{} objects found '.format(nObj))
    
mags = pd.DataFrame(records, columns=[mag_col, 'ra', 'dec'])
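
The cut `max_err = 1/min_SNR` treats the magnitude error as the inverse of the signal-to-noise ratio. To first order, for $m = -2.5\log_{10} F$, the error is $\sigma_m = \frac{2.5}{\ln 10}\,\frac{\sigma_F}{F} \approx 1.0857/\mathrm{SNR}$, so `1/min_SNR` is a slightly stricter threshold. The sketch below compares the two; whether the catalog's `magerr` columns were derived this way is an assumption, and the function name is made up for this illustration.

```python
import math

def mag_err_from_snr(snr):
    """First-order magnitude uncertainty for a given flux signal-to-noise ratio."""
    return 2.5 / math.log(10) / snr

min_snr = 25
approx_cut = 1 / min_snr                # threshold used in the cell above
first_order = mag_err_from_snr(min_snr)
# SNR that the 1/min_snr threshold corresponds to under the first-order formula
implied_snr = 2.5 / math.log(10) / approx_cut

print(approx_cut, first_order, implied_snr)
```

Under this relation, the threshold used above corresponds to a signal-to-noise ratio of about 27 rather than 25.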

In [None]:
plt.figure(figsize=(8, 8))
plt.xlabel(mag_col)
plt.ylabel('count')
plt.suptitle(pop, size='xx-large', y=0.92)
plt.hist(mags[mag_col], bins=20, color='y')

Make a cut on magnitude to get a more visually pleasing scatter plot.

In [None]:
mag_max = 15.5
# Keep only rows brighter than mag_max; a boolean mask replaces the explicit loops
mags_filt = mags[mags[mag_col] < mag_max]
print('After filtering left with {} objects'.format(len(mags_filt)))
plt.figure(figsize=(8, 8))
plt.xlabel('ra')
plt.ylabel('dec')
plt.suptitle(pop, size='xx-large', y=0.92)
p = plt.scatter(mags_filt['ra'], mags_filt['dec'], color='y')

## Using coord
The **dpdd** view has one extra column, `coord`, which is not formally a DPDD quantity. `coord` is an alternate way (other than `ra` and `dec`) to express location. A `coord` value is a triple of doubles representing a position on a sphere in units of arcseconds. This column is indexed, which can make certain calculations faster. In particular, using the functions `coneSearch` and `boxSearch` (which take a `coord` as input) rather than filtering on `ra` and `dec` makes queries much faster. There are also functions to translate between `coord` and `(ra, dec)`.



### Cone search
Find all stars satisfying quality cuts within a fixed radius of a particular coordinate. The function `coneSearch` returns true if `coord` lies within the cone centered at (ra, dec) with the specified radius, measured in arcseconds.

In [None]:

schema = 'run12p_v4'
ra = 54.5
decl = -31.4
radius = 240.0
where = ' where (magerr_{band} < {max_err}) and clean and (extendedness < 1.0) and coneSearch(coord, {ra}, {decl}, {radius})'
qcone = ('SELECT ra, dec, mag_{band} from {schema}.dpdd ' + where).format(**locals())
print(qcone)
with dbconn.cursor() as cursor:
    %time cursor.execute(qcone)
    records = cursor.fetchall()
    nObj = len(records)
    print('{} objects found '.format(nObj))

In [None]:
cmags = pd.DataFrame(records, columns=['ra', 'dec', mag_col])

plt.figure(figsize=(8, 8))
plt.xlabel('ra')
plt.ylabel('dec')
plt.suptitle(pop + ' Cone search', size='xx-large', y=0.92)
p = plt.scatter(cmags['ra'], cmags['dec'], color='y')
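
As a client-side sanity check on the cone search, the angular distance of each returned object from the cone center can be recomputed. This is a minimal sketch using the haversine formula and the standard library only; the function name is made up here, and it is independent of however `coneSearch` is implemented in the database.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds; all inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600

# Usage with the cone-search results above:
# worst = max(angular_separation_arcsec(ra, decl, r, d) for r, d, _ in records)
# print(worst <= radius)
```

Every separation should come out at or below `radius`, up to floating-point rounding.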

### Box search
Find all stars, subject to quality cuts, within the specified ra and dec bounds.

In [None]:
ra1 = 54.4
ra2 = 54.8
decl1 = -31.6
decl2 = -31.3

where = ' where (magerr_{band} < {max_err}) and clean and (extendedness < 1.0) and boxSearch(coord, {ra1}, {ra2},{decl1}, {decl2})'
qbox = ('SELECT ra, dec, mag_{band} from {schema}.dpdd ' + where).format(**locals())
print(qbox)
with dbconn.cursor() as cursor:
    %time cursor.execute(qbox)
    records = cursor.fetchall()
    nObj = len(records)
    print('{} objects found '.format(nObj))

In [None]:
bmags = pd.DataFrame(records, columns=['ra', 'dec', mag_col])

plt.figure(figsize=(8, 8))
plt.xlabel('ra')
plt.ylabel('dec')
plt.suptitle(pop + ' Box search', size='xx-large', y=0.92)
p = plt.scatter(bmags['ra'], bmags['dec'], color='y')
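
The queries in this notebook are built with `str.format`, which is convenient for exploration. When the numeric bounds come from user input, psycopg2's query parameters (`%s` placeholders passed as the second argument of `cursor.execute`) are a safer alternative. The sketch below is one way to do that for the box search; `box_query` is a hypothetical helper, and because identifiers such as the schema and band cannot be sent as parameters, it validates them with a simple pattern instead.

```python
import re

# Conservative pattern for names that are interpolated into the SQL text.
_IDENT = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

def box_query(schema, band, max_err, ra1, ra2, decl1, decl2):
    """Return (query, params) for a box search with the notebook's quality cuts."""
    for name in (schema, band):
        if not _IDENT.match(name):
            raise ValueError('unsafe identifier: {!r}'.format(name))
    query = (
        'SELECT ra, dec, mag_{band} FROM {schema}.dpdd '
        'WHERE (magerr_{band} < %s) AND clean AND (extendedness < 1.0) '
        'AND boxSearch(coord, %s, %s, %s, %s)'
    ).format(schema=schema, band=band)
    return query, (max_err, ra1, ra2, decl1, decl2)

# Usage with the connection opened earlier in this notebook:
# with dbconn.cursor() as cursor:
#     cursor.execute(*box_query(schema, band, max_err, ra1, ra2, decl1, decl2))
#     records = cursor.fetchall()
```

The argument order for `boxSearch` follows the cell above: both ra bounds first, then both dec bounds.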
