# Emulating the LSST DRP Source Catalog with CosmoDC2Realizer

__Author:__ Ji Won Park (@jiwoncpark), __Last Run:__ 2018-12-14 (by @jiwoncpark)

__Goals:__
- In Part 1 (this notebook), learn how CosmoDC2Realizer emulates the LSST DRP Source Catalog of lensed quasars and contaminants from the CosmoDC2 extragalactic catalog and the truth catalog
- In Part 2, visualize sample light curves from the CosmoDC2Realized catalog

The following notebook was referenced to access and query the truth catalog:

    Scott Daniel's DC2 Tutorial truth_gcr_intro.ipynb

In [1]:
import os, sys
import numpy as np
import pandas as pd 
pd.options.display.max_columns = None
import matplotlib.pyplot as plt
# Make the local helper modules (units, moments, cosmodc2realizer_helper) importable
sys.path.insert(0, 'hackurdc2_utils')
import units
import moments
import cosmodc2realizer_helper as helper
# For reading in the OpSim database
import sqlite3
import healpy
# For accessing and querying the CosmoDC2 extragalactic catalog and truth catalog
# (Sections 2 and 3 need these; uncomment where GCRCatalogs is installed)
#import GCRCatalogs
#from GCR import GCRQuery
%matplotlib inline
%load_ext autoreload
%autoreload 2

## About CosmoDC2Realizer

CosmoDC2Realizer is a framework that emulates the LSST DRP Source Catalog. It takes in two DC2 catalogs, the extragalactic catalog and the truth catalog, which provide properties of extended galaxy sources and point sources (e.g. stars and AGNs), respectively. It also takes in the OpSim database, which provides the per-visit observing conditions.

__Assumptions:__
- Emulation is made fast by bypassing image generation; we model each object as a mixture of Gaussians and the point-spread function (PSF) as a circular Gaussian so that we can _analytically_ compute the first and second moments required to populate the Source Catalog. 
- We also assume a fairly good deblender with a fixed deblending scale of 0.5", chosen because it roughly corresponds to the full width at half maximum (FWHM) of the best LSST seeing. All sources located within the deblending scale of an object for a given visit contribute to the moments of that object.
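
To make the first assumption concrete, here is a minimal sketch of how the moments of a Gaussian mixture can be computed in closed form. It is an illustration only, not the code in `moments.py`; the names `fwhm_to_sigma` and `blended_moments` are hypothetical. It uses two standard facts: the second moment of a mixture is the flux-weighted sum of each component's covariance plus the scatter of the component centers about the blended centroid, and convolving with a circular Gaussian PSF of width `psf_sigma` adds `psf_sigma**2` to the diagonal.

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def blended_moments(fluxes, centers, covs, psf_sigma=0.0):
    """Moments of a Gaussian mixture seen through a circular Gaussian PSF.

    fluxes  : (K,) component fluxes
    centers : (K, 2) component centroids (x, y)
    covs    : (K, 2, 2) component covariance matrices
    Returns the total flux, the (2,) centroid and the (2, 2) second-moment matrix.
    """
    fluxes = np.asarray(fluxes, dtype=float)
    centers = np.asarray(centers, dtype=float)
    covs = np.asarray(covs, dtype=float)
    total_flux = fluxes.sum()
    weights = fluxes / total_flux
    # First moments: flux-weighted mean position
    centroid = weights @ centers
    # Second moments: within-component spread plus between-component scatter
    offsets = centers - centroid
    scatter = np.einsum('k,ki,kj->ij', weights, offsets, offsets)
    second = np.einsum('k,kij->ij', weights, covs) + scatter
    # Convolution with a circular Gaussian adds psf_sigma**2 to Ixx and Iyy
    second = second + psf_sigma**2 * np.eye(2)
    return total_flux, centroid, second

# Two equal-flux point-like components one unit either side of the origin
flux, centroid, second = blended_moments(
    fluxes=[1.0, 1.0],
    centers=[[-1.0, 0.0], [1.0, 0.0]],
    covs=np.zeros((2, 2, 2)),
    psf_sigma=0.5,
)
print(flux, centroid, second)
```

Because every step is a weighted sum, the same calculation vectorizes over visits without rendering any image, which is what makes the emulation fast.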

## 1. Choosing the OpSim fields
The OpSim database is organized in terms of 5292 viewing fields generated from a tessellation of the sky ([OpSim catalog schema documentation](https://www.lsst.org/scientists/simulations/opsim/summary-table-column-descriptions-v335)). The observing schedule and conditions are the same within each field, so for computational efficiency CosmoDC2Realizer first identifies the set of fields whose objects it will realize.

In [2]:
# Read in the minion_1016 opsim database
opsim_v3 = os.path.join('..', 'data', 'minion_1016_sqlite.db')
conn = sqlite3.connect(opsim_v3)

# See which tables the db file has
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor.fetchall())

[(u'Session',), (u'Config',), (u'Field',), (u'ObsHistory',), (u'Proposal',), (u'SeqHistory',), (u'SlewHistory',), (u'SlewActivities',), (u'SlewState',), (u'SlewMaxSpeeds',), (u'TimeHistory',), (u'ObsHistory_Proposal',), (u'Cloud',), (u'Seeing',), (u'Log',), (u'Config_File',), (u'Proposal_Field',), (u'SeqHistory_ObsHistory',), (u'MissedHistory',), (u'SeqHistory_MissedHistory',), (u'Summary',)]


We are primarily interested in two tables of the `minion_1016` database: `ObsHistory`, which contains the observing conditions, and `Field`, which contains the field positions.

In [3]:
%%time 
# ~ 25s
# Save the tables ObsHistory and Field as Pandas DataFrames
obs_history = pd.read_sql(sql='SELECT * from ObsHistory', con=conn)
field = pd.read_sql(sql='SELECT * from Field', con=conn)

CPU times: user 24.8 s, sys: 3.35 s, total: 28.1 s
Wall time: 28.1 s
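
Reading whole tables with `SELECT *` is simple but accounts for most of the ~25 s above. When only one field is needed, the selection can be pushed into the SQL query instead. The sketch below runs against a toy in-memory database with made-up rows, so that it is self-contained. The column names `Field_fieldID` and `expMJD` are the ones used later in this notebook; `conn_demo` and `obs_demo` are hypothetical names.

```python
import sqlite3
import pandas as pd

# Toy in-memory stand-in for the OpSim database (made-up rows)
conn_demo = sqlite3.connect(':memory:')
conn_demo.executescript("""
    CREATE TABLE ObsHistory (obsHistID INTEGER, Field_fieldID INTEGER, expMJD REAL);
    INSERT INTO ObsHistory VALUES (1, 1188, 59580.1), (2, 1188, 59583.2), (3, 42, 59580.3);
""")

# Select only the needed columns and rows; the ? placeholder binds the field ID
query = 'SELECT obsHistID, expMJD FROM ObsHistory WHERE Field_fieldID = ?'
obs_demo = pd.read_sql(sql=query, con=conn_demo, params=(1188,))
conn_demo.close()
print(obs_demo)
```

The trade-off is that each new field requires another query, whereas the full DataFrame can be sliced repeatedly in memory.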


For speed, we will only work with galaxies in `cosmoDC2_v1.0_9556`, a version of the extragalactic catalog restricted to one healpixel. This healpixel roughly coincides with the OpSim field with ID 1188, so we pre-save a subset of the `ObsHistory` table with the columns we'll need.

In [4]:
obs_history = helper._format_obs_history(obs_history)
field = helper._format_field(field)

## 2. Getting galaxies
We query the extragalactic catalog for objects that lie in this field. As mentioned earlier, we load `cosmoDC2_v1.0_9556` rather than the full cosmoDC2 catalog in this notebook for fast demonstration.

In [None]:
%%time
catalog = GCRCatalogs.load_catalog('cosmoDC2_v1.0_9556')
#catalog = GCRCatalogs.load_catalog('cosmoDC2_v1.0_image')
# 'cosmoDC2_v1.0_image' takes ~14 sec
quantities = ['galaxy_id', 'ra_true', 'dec_true', 'redshift_true', 
              'size_bulge_true', 'size_minor_bulge_true', 'sersic_bulge', 'ellipticity_1_bulge_true',
              'ellipticity_2_bulge_true', 'ellipticity_bulge_true',
              'size_disk_true', 'size_minor_disk_true', 'sersic_disk', 'ellipticity_1_disk_true',
              'ellipticity_2_disk_true', 'ellipticity_disk_true',
              #'ellipticity_1_true', 'ellipticity_2_true',
              #'position_angle_true', 'ellipticity_true',
              #'size_true', 'size_minor_true', 'sersic',
              'bulge_to_total_ratio_i',
              'mag_true_u_lsst',
              'mag_true_g_lsst',
              'mag_true_r_lsst',
              'mag_true_i_lsst',
              'mag_true_z_lsst',
              'mag_true_Y_lsst',
              'is_central', 'halo_mass',]

cuts = [# A loose magnitude cut
        #GCRQuery('mag_true_g_lsst < 27'), 
        # Query halo masses likely to host an AGN
        GCRQuery('halo_mass > 1.e13'),
        # Query sources belonging to Field 1188
        # (field_ra, field_dec and field_radius must already be defined, in degrees like ra_true/dec_true)
        GCRQuery('abs(ra_true - %f) < %f' %(field_ra, field_radius)),
        GCRQuery('abs(dec_true - %f) < %f' %(field_dec, field_radius)),]
# Add filters as necessary!
galaxies = catalog.get_quantities(quantities, filters=cuts)
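
The two position cuts above select a box with the same half-width in RA and in Dec. Away from the celestial equator a degree of RA spans less sky than a degree of Dec, and RA wraps around at 360 degrees. The following NumPy sketch shows one way to account for both effects. It is only an illustration with hypothetical names (`box_mask`) and made-up coordinates, not part of CosmoDC2Realizer.

```python
import numpy as np

def box_mask(ra, dec, ra0, dec0, half_width):
    """Boolean mask for a box of +/- half_width degrees on the sky around (ra0, dec0)."""
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    # Wrap RA differences into [-180, 180) so the cut works across RA = 0
    dra = (ra - ra0 + 180.0) % 360.0 - 180.0
    # Scale by cos(dec0) to turn an RA difference into an on-sky angle
    dra_sky = dra * np.cos(np.radians(dec0))
    return (np.abs(dra_sky) < half_width) & (np.abs(dec - dec0) < half_width)

# Made-up positions around a field centered at (0.5, -60) degrees
ra_demo = np.array([359.5, 2.0, 4.5])
dec_demo = np.array([-60.0, -60.0, -60.0])
in_box = box_mask(ra_demo, dec_demo, ra0=0.5, dec0=-60.0, half_width=1.75)
print(in_box)
```

A box is still a superset of the circular field, so a final angular-separation cut is needed if an exact circular footprint matters.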

We take a small subset of 1000 galaxies to realize. These galaxies will be at the center of our viewing window.

In [None]:
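small_galaxy_df = pd.DataFrame(galaxies).head(1000)  # assumption: first 1000 rows; the original selection is not shown
small_galaxy_df.to_csv('small_galaxy_df.csv', index=False)  # galaxy_id is already a column; index expects a bool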

## 3. Getting line-of-sight neighbors
For each extended galaxy source, any other galaxy or point source (unlensed AGN or star) that lies within its deblending scale is its line-of-sight neighbor. Galaxy neighbors are simply taken from the extragalactic catalog, which we've already fetched. Point-source neighbors are taken from the truth catalog as below.
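
As a rough sketch of what such a neighbor search involves, the snippet below computes great-circle separations with the haversine formula and keeps the sources closer than a given scale. The function names are hypothetical and the coordinates are made up; this is not the implementation of `helper.get_neighbors`, whose internals are not shown in this notebook.

```python
import numpy as np

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin(0.5 * (dec2 - dec1))
    sin_dra = np.sin(0.5 * (ra2 - ra1))
    # Haversine formula, numerically stable for small separations
    h = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(h))) * 3600.0

def neighbors_within(ra, dec, ra_center, dec_center, scale_arcsec):
    """Indices of sources closer than scale_arcsec to the center."""
    sep = angular_separation_arcsec(np.asarray(ra), np.asarray(dec), ra_center, dec_center)
    return np.flatnonzero(sep < scale_arcsec)

# Made-up sources offset in Dec by 0", 0.3" and 1.0" from the center
ra_src = np.array([53.0, 53.0, 53.0])
dec_src = -28.0 + np.array([0.0, 0.3, 1.0]) / 3600.0
idx = neighbors_within(ra_src, dec_src, ra_center=53.0, dec_center=-28.0, scale_arcsec=0.5)
print(idx)
```

A source at zero separation, such as the central galaxy itself, is included in the result, so the neighbor count is the number of matches minus one.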

### Getting the neighbors (unlensed AGNs and stars) from the truth catalog

In [None]:
# truth_catalog = GCRCatalogs.load_catalog('dc2_truth_run1.1_static')
truth_catalog = GCRCatalogs.load_catalog('dc2_truth_run1.1', {'md5': None})

truth_catalog_columns = ['object_id', 'ra', 'dec', 'star', 'agn', 'sprinkled', 'healpix_2048',
                        'u', 'g', 'r', 'i', 'z', 'y',]

ra_min, ra_max = field_ra - field_radius, field_ra + field_radius
dec_min, dec_max = field_dec - field_radius, field_dec + field_radius

field_ra_rad = np.radians(field_ra)
field_dec_rad = np.radians(field_dec)

center_vec = np.array([np.cos(field_dec_rad)*np.cos(field_ra_rad),
                       np.cos(field_dec_rad)*np.sin(field_ra_rad),
                       np.sin(field_dec_rad)])

# query_disc expects the radius in radians; field_radius is in degrees here
list_of_healpix = healpy.query_disc(2048, center_vec, np.radians(field_radius), nest=True, inclusive=True)

def filter_on_healpix(hp):
    return np.array([hh in list_of_healpix for hh in hp])

coord_filters = [
    'ra >= {}'.format(ra_min),
    'ra < {}'.format(ra_max),
    'dec >= {}'.format(dec_min),
    'dec < {}'.format(dec_max),
]
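
`filter_on_healpix` above tests membership one element at a time in a Python loop, which can become slow for a large catalog and a long pixel list. A vectorized alternative is `np.isin`. The sketch below uses made-up pixel IDs in place of the `healpy.query_disc` output, and the `_demo`, `_loop` and `_isin` names are hypothetical.

```python
import numpy as np

# Made-up pixel IDs standing in for the output of healpy.query_disc
list_of_healpix_demo = np.array([10, 11, 12, 40])
# Made-up healpix_2048 values of catalog rows
hp_demo = np.array([9, 10, 40, 41, 12])

def filter_on_healpix_loop(hp, pixels):
    # Element-by-element membership test, as in the cell above
    return np.array([hh in pixels for hh in hp])

def filter_on_healpix_isin(hp, pixels):
    # np.isin tests every element of hp against pixels in one vectorized call
    return np.isin(hp, pixels)

mask = filter_on_healpix_isin(hp_demo, list_of_healpix_demo)
print(mask)
```

Both functions return the same Boolean mask, so the vectorized version can be swapped in without changing downstream code.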

In [None]:
# Temporary shortcut (delete later): load the pre-saved catalogs from disk
data_dir = os.path.join('..', 'data')
galaxies = pd.read_csv(os.path.join(data_dir, 'small_galaxy_df.csv'))
point_neighbors = pd.read_csv(os.path.join(data_dir, 'neighbors.csv'))
#field = pd.read_csv(os.path.join(data_dir, 'field.csv'))

deblending_scale = 0.5 # arcsec
# Information about the field we will work with
field_ids = [1188,]
field_radius = units.deg_to_arcsec(0.5*3.5) # arcsec; half of the 3.5 deg field-of-view diameter

In [None]:
# Some unit conversion and column renaming
galaxies = helper._format_extragal_catalog(galaxies)
point_neighbors = helper._format_truth_catalog(point_neighbors)

In [None]:
%%time

mog_pre_observed_cols = ['objectId', 'ra', 'dec', 'gauss_sigma', 'e', 'phi',
                         'num_star_neighbors', 'num_agn_neighbors', 'num_sprinkled_neighbors']
mog_pre_observed_cols += ['flux_%s' %bp for bp in 'ugrizy']

source_cols = ['objectId', 'ccdVisitId', 
               'apFlux', 'Ix', 'Iy', 'Ixx', 'Iyy', 'Ixy', 
               'Ixx_PSF', 'sky', 'apFluxErr', 'expMJD',
               'num_star_neighbors', 'num_agn_neighbors', 'num_sprinkled_neighbors']

source = pd.DataFrame(columns=source_cols)

for field_id in field_ids:
    # Query obs_history for the field of interest
    obs_history_in_field = obs_history.loc[obs_history['Field_fieldID']==field_id]
    # Find field center for field id by querying the field table of OpSim db
    field_info = field.loc[field['fieldID']==field_id]
    field_ra, field_dec = field_info['fieldRA'].item(), field_info['fieldDec'].item()
    # Query extragalactic catalog for galaxies within field
    galaxies_in_field, galaxies_in_field_idx = helper.get_neighbors(galaxies, field_ra, field_dec, field_radius)
    num_galaxies = len(galaxies_in_field_idx)
    # Query truth catalog for stars/AGNs within field
    points_in_field, _ = helper.get_neighbors(point_neighbors, field_ra, field_dec, field_radius)
    # Initialize DataFrame to populate before joining with obs_history_in_field
    source_in_field = pd.DataFrame(columns=source_cols)
    
    for gal_idx in range(num_galaxies):
        # Central galaxy
        central_gal = galaxies_in_field.iloc[gal_idx]
        ra_center, dec_center = central_gal['ra'], central_gal['dec'] # pos of central galaxy
        
        ##########################
        # Find blended neighbors #
        ##########################
        # Galaxy neighbors (extended) : includes the central galaxy, not just neighbors
        all_gal, all_gal_idx = helper.get_neighbors(galaxies_in_field, ra_center, dec_center, deblending_scale) 
        num_gal_neighbors = len(all_gal_idx) - 1 # subtract central galaxy itself
        # Stars/AGN neighbors (point)
        point, point_idx = helper.get_neighbors(points_in_field, ra_center, dec_center, deblending_scale) 
        num_star_neighbors = point['star'].sum()
        num_agn_neighbors = point['agn'].sum()
        num_sprinkled_neighbors = point['sprinkled'].sum()

        #################
        # Sersic to MoG #
        #################
        # Separate galaxy catalog into bulge and disk
        bulge, disk, all_gal = helper.separate_bulge_disk(all_gal)
        # Deconstruct bulge/disk into MoG
        bulge_mog = helper.sersic_to_mog(sersic_df=bulge, bulge_or_disk='bulge')
        disk_mog = helper.sersic_to_mog(sersic_df=disk, bulge_or_disk='disk')
        point_mog = helper.point_to_mog(point_df=point)
        # Concat the three
        full_mog = pd.concat([bulge_mog, disk_mog, point_mog], axis=0)
        
        # Add some metadata
        full_mog['objectId'] = central_gal['galaxy_id'] # identifier for blended system
        full_mog['num_gal_neighbors'] = num_gal_neighbors
        full_mog['num_star_neighbors'] = num_star_neighbors
        full_mog['num_agn_neighbors'] = num_agn_neighbors
        full_mog['num_sprinkled_neighbors'] = num_sprinkled_neighbors
        
        # Join with observations
        mog_observed = helper.join_with_observation(full_mog, obs_history_in_field)
        # Collapse unobserved fluxes
        mog_observed = helper.collapse_unobserved_fluxes(mog_observed)
        # Calculate moments of blended system
        mog_observed = moments.calculate_total_flux(mog_observed)
        mog_observed = moments.calculate_1st_moments(mog_observed)
        mog_observed = moments.calculate_2nd_moments(mog_observed)
        # Collapse MoGs of the blended system
        blended = moments.collapse_mog(mog_observed)
        blended = moments.apply_psf(blended)
        
        source_in_field = pd.concat([source_in_field, blended], axis=0)
        
    source = pd.concat([source, source_in_field], axis=0)
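
The loop above grows `source_in_field` and `source` with `pd.concat` on every iteration, which copies the accumulated rows each time. A common alternative is to collect the per-object DataFrames in a list and concatenate once at the end. The sketch below illustrates the pattern with a hypothetical `make_rows` stand-in for the per-galaxy `blended` DataFrame and made-up object IDs.

```python
import pandas as pd

def make_rows(object_id, n_visits):
    """Hypothetical stand-in for one blended system's per-visit rows."""
    return pd.DataFrame({'objectId': [object_id] * n_visits,
                         'ccdVisitId': list(range(n_visits))})

# Collect the per-object frames in a list, then concatenate once
frames = []
for object_id in (101, 102, 103):
    frames.append(make_rows(object_id, n_visits=3))
source_demo = pd.concat(frames, axis=0, ignore_index=True)
print(source_demo.shape)
```

Note that `ignore_index=True` renumbers the rows from zero, whereas the loop above keeps each DataFrame's own index; drop that argument if the original index is needed.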