# Inject Sources in v23 for DIA Improvement

Michael Wood-Vasey and Shu Liu

Based heavily on
https://github.com/lsst/source_injection/blob/tickets/DM-34253/examples/si_demo_dc2_visit.ipynb

Uses a custom Jupyter kernel that loads the `tickets/DM-34253` version of the `source_injection` package via `setup -j -r ${HOME}/local/lsst/source_injection`.

This is most convenient to do in the `eups` environment (and thus before the notebook starts) rather than from within the Jupyter notebook itself.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.ndimage import gaussian_filter

In [None]:
from lsst.daf.butler import Butler, DimensionUniverse, DatasetType, CollectionType
from lsst.daf.butler.registry import MissingCollectionError
import lsst.afw.display as afwDisplay
from lsst.geom import SpherePoint, degrees

afwDisplay.setDefaultBackend("matplotlib")

In [None]:
from lsst.source.injection.inject_visit import VisitInjectConfig, VisitInjectTask

You need a `~/.lsst/db-auth.yaml` file containing the database URL, username, and password in order to load the Butler:

In [None]:
repo = "/global/cfs/cdirs/lsst/production/gen3/DC2/Run2.2i/repo"
butler = Butler(repo)

In [None]:
collections = sorted(set(butler.registry.queryCollections()))

In [None]:
display(collections)

In [None]:
# Pick the input collection that holds the calexps we will inject into
input_collection = "u/descdm/coadds_Y1_4639"

In [None]:
# Find calexps for a single detector that overlap the tract
tract = 4639
detector = 1
calexp_DatasetRefs = sorted(set(
    butler.registry.queryDatasets(
        "calexp",
        collections=input_collection,
        where=f"instrument='LSSTCam-imSim' AND skymap='DC2' AND tract={tract} AND detector={detector}",
    )
))

In [None]:
# Alternatively, find calexps that overlap a single patch of the tract.
# Note: this overwrites calexp_DatasetRefs from the previous cell.
tract = 4639
patch = 0
calexp_DatasetRefs = sorted(set(
    butler.registry.queryDatasets(
        "calexp",
        collections=input_collection,
        where=f"instrument='LSSTCam-imSim' AND skymap='DC2' AND tract={tract} AND patch={patch}",
    )
))

In [None]:
print(f"Identified {len(calexp_DatasetRefs)} calexp DatasetRefs")

# Let's just pick one and look at it.
display(calexp_DatasetRefs[5])

dataId = calexp_DatasetRefs[5].dataId

print(f"{dataId = }")

In [None]:
calexp = butler.get("calexp", dataId=dataId, collections=input_collection)
display(calexp)

## Plot the input calexp

Let's generate a plot of this `calexp` and its associated `mask`.

First, we smooth the `calexp` image (only to aid its display in this notebook).

Then, we use `afwDisplay` to display these data.
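Smoothing helps the display because it suppresses pixel-to-pixel noise. The sketch below uses pure synthetic white noise as a stand-in for a sky-dominated image (an assumption; a real calexp also has sources and correlated noise) and checks that `gaussian_filter` with `sigma=3` reduces the per-pixel scatter by roughly the factor expected for a 2-D Gaussian kernel, $1/(2\sigma\sqrt{\pi})$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(42)

# Unit-variance white noise standing in for a sky-dominated image.
noise_image = rng.normal(loc=0.0, scale=1.0, size=(400, 400))

# Smooth with the same kernel width used for display above (sigma = 3 pixels).
sigma = 3
smoothed = gaussian_filter(noise_image, sigma=sigma)

# Measure the scatter away from the edges, where boundary handling would bias it.
interior = (slice(50, 350), slice(50, 350))
ratio = smoothed[interior].std() / noise_image[interior].std()

# For a 2-D Gaussian kernel the expected noise reduction factor is 1 / (2 * sigma * sqrt(pi)).
expected = 1 / (2 * sigma * np.sqrt(np.pi))
print(f"measured noise ratio: {ratio:.3f}, expected: {expected:.3f}")
```

The smoothed image is used only for plotting; the injection itself works on the unsmoothed `calexp`.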

In [None]:
Q = 10

plot_calexp = calexp.clone()
plot_calexp.image.array = gaussian_filter(calexp.image.array, sigma=3)

fig, ax = plt.subplots(1, 2, figsize=(8, 6), dpi=150)

plt.sca(ax[0])
display0 = afwDisplay.Display(frame=fig)
display0.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display0.mtv(plot_calexp.image)
plt.title("calexp image")

plt.sca(ax[1])
display1 = afwDisplay.Display(frame=fig)
display1.scale("linear", min=1, max=2)
display1.mtv(plot_calexp.mask)
plt.title("calexp mask")

plt.suptitle(str(dataId), y=0.8)
plt.tight_layout()
plt.show()

## Set up a synthetic source input catalogue

We now have a calexp image that we want to inject into. Next we need to set up a simple synthetic source catalogue.

In this notebook, we opt to inject 100 synthetic _point_ sources into the detector.


In [None]:
np.random.seed(0)

nsource = 100

x = np.random.uniform(0, calexp.getBBox().endX, nsource)
y = np.random.uniform(0, calexp.getBBox().endY, nsource)
ra, dec = calexp.wcs.pixelToSkyArray(x, y, degrees=True)

In [None]:
si_cat = pd.DataFrame(dict(
    ra=ra,
    dec=dec,
    mag=np.random.uniform(15, 25, nsource),
    source_type="DeltaFunction",
))

display(si_cat[:5])
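The `mag` column above is a magnitude, which the injection task turns into image counts using the exposure's photometric calibration. LSST calibrated fluxes are expressed in nanojanskys, so it can help to see what the chosen range of 15 to 25 mag means as an AB flux. This is a minimal sketch of the AB magnitude relation only, not the code path `photo_calib` uses internally; the helper names are hypothetical.

```python
import numpy as np

# AB zero point expressed in nJy: 3631 Jy = 3.631e12 nJy and 2.5 * log10(3.631e12) ≈ 31.4.
AB_ZEROPOINT_NJY = 31.4


def ab_mag_to_njy(mag):
    """Convert AB magnitudes to flux densities in nJy (hypothetical helper)."""
    return 10 ** (-0.4 * (np.asarray(mag, dtype=float) - AB_ZEROPOINT_NJY))


def njy_to_ab_mag(flux_njy):
    """Convert flux densities in nJy back to AB magnitudes (hypothetical helper)."""
    return AB_ZEROPOINT_NJY - 2.5 * np.log10(np.asarray(flux_njy, dtype=float))


mags = np.array([15.0, 20.0, 25.0])
fluxes = ab_mag_to_njy(mags)
print(fluxes)  # roughly 3.6e6, 3.6e4 and 3.6e2 nJy
```

Each 2.5 mag step is a factor of 10 in flux, so the injected sources span four orders of magnitude in brightness.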

## Register the source injection collection

The input `si_cat` will be ingested into a RUN collection in the `repo`. Here we register this collection for subsequent use below.

To begin, we instantiate a writeable `butler`. Butlers are instantiated in read-only mode by default; setting the argument `writeable` to `True` makes a butler writeable.

> Warning: take care when working with a writeable butler, as data on-disk has the potential to be permanently removed or corrupted.

As a precaution, we attempt to remove our chosen RUN collection, if it exists, before continuing. If we attempt to write synthetic source data into a collection that already contains them, we will get an error that the output data already exist on disk.

Finally, the source injection collection is registered in the `repo`.

In [None]:
writeable_butler = Butler(repo, writeable=True)

si_input_collection = "u/wmwv/si_demo"

try:
    writeable_butler.removeRuns([si_input_collection])
except MissingCollectionError:
    print("Writing into a new RUN collection")
else:
    print("Prior RUN collection located and successfully removed")

# Register the collection
_ = writeable_butler.registry.registerCollection(si_input_collection, type=CollectionType.RUN)

## Register the input catalogue dataset type

Here we define the `si_cat` dataset type, which lets the `repo` know about the dimensions and storage class for these synthetic source data.

Using this definition, the new dataset type is registered in the `repo` using `registerDatasetType`. This method returns `True` if the dataset type was inserted, and `False` if an identical existing `DatasetType` was found.

In [None]:
si_dataset_type = DatasetType(
    "si_cat",
    dimensions=["skymap", "tract"],
    storageClass="DataFrame",
    # Use the repo's own dimension universe so the definition matches its registry.
    universe=writeable_butler.registry.dimensions,
)

writeable_butler.registry.registerDatasetType(si_dataset_type)
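The `True`/`False` return value is what makes this cell safe to re-run. The toy sketch below mimics those semantics with hypothetical `ToyDatasetType` and `ToyRegistry` classes (not the `daf_butler` API). It also assumes that a *different* definition under the same name is rejected; the `ToyConflictError` exception used for that is a stand-in, not the real exception class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDatasetType:
    """Hypothetical dataset type: equality compares name, dimensions and storage class."""
    name: str
    dimensions: tuple
    storage_class: str


class ToyConflictError(Exception):
    """Stand-in for the error raised when a conflicting definition already exists."""


class ToyRegistry:
    """Hypothetical registry illustrating idempotent dataset type registration."""

    def __init__(self):
        self._types = {}

    def register_dataset_type(self, dataset_type):
        existing = self._types.get(dataset_type.name)
        if existing is None:
            self._types[dataset_type.name] = dataset_type
            return True   # newly inserted
        if existing == dataset_type:
            return False  # identical definition already present: nothing to do
        raise ToyConflictError(f"{dataset_type.name!r} is already registered differently")


registry = ToyRegistry()
si_type = ToyDatasetType("si_cat", ("skymap", "tract"), "DataFrame")
first = registry.register_dataset_type(si_type)   # True
second = registry.register_dataset_type(si_type)  # False on a re-run

try:
    registry.register_dataset_type(ToyDatasetType("si_cat", ("skymap",), "DataFrame"))
    conflict_raised = False
except ToyConflictError:
    conflict_raised = True
print(first, second, conflict_raised)
```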

## Ingest the input catalogue into the repo

Finally, we ingest the input catalogue into the `repo`.

In [None]:
si_dataId = dict(tract=tract, skymap="DC2")

writeable_butler.put(si_cat, si_dataset_type, si_dataId, run=si_input_collection)

## Instantiate the injection classes

At this stage, we have an input image and a fully ingested synthetic source input catalogue. We're now ready to inject synthetic sources into the image using the tools available in the `source_injection` package.

First, we instantiate the `VisitInjectConfig` class. This class holds the configuration of the injection task; modifying it changes how the task operates.

We then instantiate the `VisitInjectTask`, passing `inject_config` as its configuration.

In [None]:
inject_config = VisitInjectConfig()

display(inject_config)

inject_task = VisitInjectTask(config=inject_config)

## Generate a deferred dataset handle

The visit inject task requires the input catalogue to be in the form of a 'deferred dataset handle'.

A deferred dataset handle is an object which performs an immediate registry lookup but does not immediately retrieve the data, allowing for that data to be subsequently accessed as needed.

> Note: Prior to running `getDeferred` below, we first re-instantiate the `butler`. This updates the registry, allowing us to make use of our newly constructed source injection RUN collection.


In [None]:
butler = Butler(repo)

si_cat_deferred = butler.getDeferred("si_cat", dataId=si_dataId, collections=si_input_collection)

display(si_cat_deferred)
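To illustrate the "look up now, read later" behaviour of a deferred handle, here is a toy sketch. `ToyDeferredHandle` and `fake_loader` are hypothetical and are not the `daf_butler` `DeferredDatasetHandle` implementation; the catalogue row returned is a placeholder.

```python
class ToyDeferredHandle:
    """Hypothetical stand-in for a deferred dataset handle: resolve now, read later."""

    def __init__(self, data_id, loader):
        # The "lookup" happens at construction time: we keep the resolved data ID.
        self.data_id = dict(data_id)
        self._loader = loader

    def get(self):
        # The actual read only happens when get() is called.
        return self._loader(self.data_id)


calls = []


def fake_loader(data_id):
    """Hypothetical loader that records each read and returns a placeholder catalogue."""
    calls.append(data_id)
    return [{"mag": 20.0, "source_type": "DeltaFunction"}]


handle = ToyDeferredHandle({"tract": 4639, "skymap": "DC2"}, fake_loader)
print(len(calls))  # 0: nothing has been read yet
catalog = handle.get()
print(len(calls))  # 1: the data were read on demand
```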

## Run the source injection task

Finally, we call the `run` method of the inject task.

As input, the `run` method needs:

- the input injection catalogue
- the input exposure
- the WCS information
- the photometric calibration information
- the sky map (the `skyMap` dataset).

The sky map is easily loaded using `butler.get`; all other inputs are ready for use at this stage.

As output, the inject task provides:

- the output exposure with sources injected
- the output source injection catalogue
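To make the roles of the WCS and the photometric calibration concrete, here is a deliberately simplified numpy sketch of point-source injection. It is not the `VisitInjectTask` algorithm: pixel positions are given directly (standing in for the WCS step), a single hypothetical `zero_point` stands in for the photometric calibration, and the circular Gaussian PSF with width `psf_sigma` is an assumption.

```python
import numpy as np


def inject_point_sources(image, x, y, mags, zero_point, psf_sigma):
    """Return a copy of ``image`` with circular Gaussian point sources added.

    Simplified model (assumption): counts = 10**(-0.4 * (mag - zero_point)), and each
    source is spread with a Gaussian PSF normalised to unit integral, so any flux
    that falls off the image edge is lost.
    """
    out = np.array(image, dtype=float, copy=True)
    yy, xx = np.indices(out.shape)
    for xc, yc, mag in zip(x, y, mags):
        counts = 10 ** (-0.4 * (mag - zero_point))
        r2 = (xx - xc) ** 2 + (yy - yc) ** 2
        psf = np.exp(-r2 / (2 * psf_sigma**2)) / (2 * np.pi * psf_sigma**2)
        out += counts * psf
    return out


blank = np.zeros((64, 64))
injected = inject_point_sources(blank, x=[32.0], y=[32.0], mags=[20.0],
                                zero_point=27.0, psf_sigma=2.0)
print(f"{injected.sum():.2f}")  # 630.96 counts, i.e. 10**(0.4 * 7)
```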

> Note: here we pass a clone of the input `calexp`. The input exposure is edited in place, so passing a clone lets us keep using the original `calexp` later in this notebook.
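The same in-place pitfall is easy to reproduce with plain numpy arrays, where `.copy()` plays the role of `calexp.clone()`. The function `add_sources_in_place` below is a hypothetical stand-in for any step that modifies its input.

```python
import numpy as np


def add_sources_in_place(image, value):
    """Hypothetical stand-in for an injection step that modifies its input array."""
    image += value
    return image


# Passing the original array: the original is changed as a side effect.
original = np.zeros((4, 4))
add_sources_in_place(original, 1.0)
print(original.max())  # 1.0

# Passing a copy (the numpy analogue of calexp.clone()): the original is untouched.
original = np.zeros((4, 4))
injected = add_sources_in_place(original.copy(), 1.0)
print(original.max(), injected.max())  # 0.0 1.0
```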

In [None]:
skyMap = butler.get("skyMap", collections=input_collection, skymap="DC2")

In [None]:
inject_output = inject_task.run(
    injection_catalogs=[si_cat_deferred],
    input_exposure=calexp.clone(),
    sky_map=skyMap,
    wcs=calexp.getWcs(),
    photo_calib=calexp.getPhotoCalib(),
)
si_calexp = inject_output.output_exposure
si_cat_out = inject_output.output_catalog

In [None]:
display(si_cat_out[:5])

## Plot the output si_calexp

As before, let's display an image of our newly constructed `si_calexp`.

We similarly smooth the new image first, and then display the `calexp` alongside the `si_calexp` using `afwDisplay`.

In [None]:
Q = 10

plot_si_calexp = si_calexp.clone()
plot_si_calexp.image.array = gaussian_filter(si_calexp.image.array, sigma=3)

fig, ax = plt.subplots(1, 2, figsize=(8, 6), dpi=150)

plt.sca(ax[0])
display0 = afwDisplay.Display(frame=fig)
display0.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display0.mtv(plot_calexp.image)
plt.title("calexp image")

plt.sca(ax[1])
display1 = afwDisplay.Display(frame=fig)
display1.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display1.mtv(plot_si_calexp.image)
plt.title("si_calexp image")

plt.suptitle(str(dataId), y=0.8)
plt.tight_layout()
plt.show()

## Plot a zoomed-in view of the si_calexp

Here is a zoomed-in section of the images above.

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(8, 6), dpi=150)

plt.sca(ax[0])
display0 = afwDisplay.Display(frame=fig)
display0.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display0.mtv(plot_calexp.image)
plt.title("calexp image (zoom)")
plt.xlim(1000, 2500)
plt.ylim(300, 1800)

plt.sca(ax[1])
display1 = afwDisplay.Display(frame=fig)
display1.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display1.mtv(plot_si_calexp.image)
plt.title("si_calexp image (zoom)")
plt.xlim(1000, 2500)
plt.ylim(300, 1800)

plt.suptitle(str(dataId), y=0.8)
plt.tight_layout()
plt.show()

## Plot the differences between the images

It is reassuring to look at a difference image to see the sources we injected.

In [None]:
# Recover pixel x, y from the ra, dec in si_cat_out (because that's all that's saved).
# This feels a little silly; there should surely be a one-line version of this.
# Bracket indexing works for both pandas DataFrames and astropy Tables.
sky = [SpherePoint(ra, dec, degrees) for ra, dec in zip(si_cat_out["ra"], si_cat_out["dec"])]
xy = calexp.wcs.skyToPixel(sky)

x = [i.x for i in xy]
y = [i.y for i in xy]

In [None]:
plot_diff_calexp = calexp.clone()
plot_diff_calexp.image.array = si_calexp.image.array - calexp.image.array

fig, ax = plt.subplots(1, 2, figsize=(8, 6), dpi=150)

plt.sca(ax[0])
display0 = afwDisplay.Display(frame=fig)
display0.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display0.mtv(plot_diff_calexp.image)
plt.title("diff image (si_calexp - calexp)")

plt.sca(ax[1])
display1 = afwDisplay.Display(frame=fig)
display1.scale("asinh", min=-5/Q, max=25/Q, Q=Q)
display1.mtv(plot_diff_calexp.image)
plt.title("diff image with markers")
plt.scatter(x, y, marker="o", s=50, fc="none", ec="orange", lw=1.5)

plt.suptitle(str(dataId), y=0.8)
plt.tight_layout()
plt.show()
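Why does the difference image show only the injected sources? Both images share the same background and the same noise realisation, so subtracting them cancels everything except what was added. The toy numpy sketch below assumes a flat background, Gaussian noise and a Gaussian PSF, and that injection adds no extra noise (all assumptions, not a statement about what `VisitInjectTask` does). In this idealised case the difference equals the injected model, and an aperture sum recovers the injected counts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Flat background plus Gaussian noise, standing in for the calexp.
calexp_like = 100.0 + rng.normal(0.0, 5.0, size=(80, 80))

# Inject one Gaussian point source: unit-integral PSF times the total counts.
yy, xx = np.indices(calexp_like.shape)
xc, yc, counts, psf_sigma = 40.0, 40.0, 1000.0, 2.0
r2 = (xx - xc) ** 2 + (yy - yc) ** 2
psf = np.exp(-r2 / (2 * psf_sigma**2)) / (2 * np.pi * psf_sigma**2)
injected_like = calexp_like + counts * psf

# Background and noise cancel, leaving only the injected model.
diff = injected_like - calexp_like

# An aperture of radius 5 * psf_sigma captures essentially all of the injected flux.
aperture = r2 <= (5 * psf_sigma) ** 2
recovered = diff[aperture].sum()
print(f"injected {counts:.1f}, recovered {recovered:.1f}")
```

In real difference imaging against a template the noise does not cancel this way; here it does only because both images come from the same exposure.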