# Notebook 1: Detecting Sources from Images

## Astronomical images

To measure light from objects, we first need an image that contains them. Here, we read in an image from the JWST NGDEEP survey and plot it. Astronomical images are usually stored in the FITS format. A FITS file can contain both the image itself and information about the image, stored in the file "header".

In [None]:
from astropy.io import fits
import numpy as np

import matplotlib.pyplot as plt
from astropy.visualization import ImageNormalize, ZScaleInterval
from matplotlib.patches import Ellipse
from astropy.wcs import WCS
from astropy.table import Table
from matplotlib.colors import LogNorm
from astropy.table import join

# %matplotlib notebook 
# The line above makes matplotlib plot interactive
# If you are having a problem with javascript, just comment out this line

### 1. Science image

In [None]:
# Reading in the image and header from the fits file
sci, hdr = fits.getdata('Photometry_module_data/ngdeep_nircam_f277w_bkgsub_sci_cutout.fits.gz', header=True)

In [None]:
# Making the plot
plt.figure(figsize=(6, 6))
plt.title('Science Image')
plt.imshow(sci, cmap='Greys', origin='lower', interpolation='none')
plt.colorbar()

plt.tight_layout()
plt.show()

Oops! We can't see anything.

The reason is that we haven't set the normalization scaling of the image. The normalization maps the values in the image to different colors. 

Here, we will create a normalization using the "z scale". This scaling is useful for displaying pixel values close to the median of the image, where most of the faint structure lies. Note the difference in the color bar.
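To see why the normalization matters, here is a minimal sketch that uses a simple percentile stretch as a stand-in for the z scale (it is not the ZScale algorithm itself). The array `demo` and the function `percentile_normalize` are hypothetical names made up for this illustration.

```python
import numpy as np

def percentile_normalize(image, lower=1.0, upper=99.0):
    """Linearly map pixel values to [0, 1] between two percentiles."""
    vmin, vmax = np.percentile(image, [lower, upper])
    scaled = (image - vmin) / (vmax - vmin)
    # Values outside the chosen range saturate at 0 or 1
    return np.clip(scaled, 0.0, 1.0)

# Hypothetical test image: unit Gaussian noise plus one very bright pixel
rng = np.random.default_rng(0)
demo = rng.normal(0.0, 1.0, size=(50, 50))
demo[25, 25] = 1000.0

# Naive min-max scaling: the bright pixel squeezes the noise near 0
naive = (demo - demo.min()) / (demo.max() - demo.min())

# Percentile scaling: the noise fills most of the color range
stretched = percentile_normalize(demo)

print(np.median(naive), np.median(stretched))
```

With the naive scaling, almost every pixel maps to nearly the same color, which is why the first plot looked empty.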

In [None]:
# First, we create a normalization using the z scale to help visualize the image better
# Note that the area outside our pointing has values=0, so we exclude these pixels when
# calculating the normalization
norm = ImageNormalize(sci[sci!=0], interval=ZScaleInterval())

# Making the plot
plt.figure(figsize=(6, 6))
plt.title('Science Image')
plt.imshow(sci, cmap='Greys', norm=norm, origin='lower', interpolation='none')
plt.colorbar()

plt.tight_layout()
plt.show()

Congratulations! This is the most basic way to visualize an image. Feel free to zoom in and pan around to inspect different features in there.


### 2. World Coordinate System
You may wonder what the values on the axis labels mean. They are simply pixel positions along the axes of the image, also known as "pixel coordinates". For example, $(x, y) = (100, 200)$ refers to the pixel in column 100 and row 200, counting from zero, which is $\texttt{sci[200, 100]}$ in the image array.

Each pixel on the image corresponds to a different position on the sky. The transformation for how to convert pixels to coordinates on the sky is described in the World Coordinate System (WCS). Next, we read in the WCS from the header.

In [None]:
wcs = WCS(hdr)
wcs

CRVAL is the "sky coordinates" of the reference pixel in the image.

CRPIX is the "pixel coordinates" of the reference pixel in the image.

CDELT is the angular size of each pixel in the image in degrees.

NAXIS is the number of pixels in the horizontal and vertical axes of the image.
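To make these keywords concrete, here is a rough sketch of how a pixel position maps to the sky when the field is small and the image is not rotated. It is a simplified linear approximation, not the full projection that $\texttt{astropy.wcs}$ applies, and the function name and all numerical values are made up for illustration.

```python
import numpy as np

def pixel_to_sky_linear(x, y, crval, crpix, cdelt):
    """Approximate (RA, Dec) in degrees for a 0-based pixel position (x, y).

    Assumes no rotation and a small field of view (illustration only).
    """
    # FITS reference pixels are 1-based, hence the +1 on 0-based indices
    dx = (x + 1 - crpix[0]) * cdelt[0]
    dy = (y + 1 - crpix[1]) * cdelt[1]
    dec = crval[1] + dy
    # An offset in RA corresponds to a larger change in RA away from the equator
    ra = crval[0] + dx / np.cos(np.deg2rad(crval[1]))
    return ra, dec

# Hypothetical header values (not read from the NGDEEP file)
crval = (53.25, -27.85)               # sky position of the reference pixel [deg]
crpix = (501.0, 501.0)                # reference pixel, 1-based
cdelt = (-0.03 / 3600, 0.03 / 3600)   # pixel scale [deg]; RA increases to the left

ra0, dec0 = pixel_to_sky_linear(500, 500, crval, crpix, cdelt)
ra1, dec1 = pixel_to_sky_linear(600, 500, crval, crpix, cdelt)
print(ra0, dec0)
print(ra1, dec1)
```

In practice, use the $\texttt{WCS}$ object for this conversion, since it handles rotation, projection and distortion.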

Next, we can plot the image in its sky coordinates using WCS projection.

In [None]:
plt.figure(figsize=(6, 6))
plt.subplot(projection=wcs) # Turn on WCS projection
plt.title('Science Image')
plt.imshow(sci, cmap='Greys', norm=norm, origin='lower', interpolation='none')
plt.show()


### 3. Error and weight maps

The science image we just plotted contained the measured fluxes. There are other data products that contain other important information about the science image. These include the error image and weight image, a.k.a. the error map and weight map.

The error map contains the uncertainty of the science image at each pixel. The sources of error include fluctuations of the light from the "background", as well as fluctuations of the light from the sources themselves. This distinction becomes important in the following steps.

The weight map tells you the "weight" of each pixel. The weight is mostly proportional to the exposure time spent on that pixel. A longer exposure time will give a larger weight.

In [None]:
wht = fits.getdata('Photometry_module_data/ngdeep_nircam_f277w_bkgsub_wht_cutout.fits.gz')
err = fits.getdata('Photometry_module_data/ngdeep_nircam_f277w_bkgsub_err_cutout.fits.gz')

We can create a "mask" to denote areas we do NOT want to use in our photometry. This includes regions where we have no exposure time, or zero weights in the weight map.

In [None]:
mask = wht==0

Generally speaking, the weight map is proportional to the inverse variance of the background noise of the image, i.e. $w \propto 1/\sigma^2$.

The reason is as follows.

Assume the exposure time is $t$. Over that time, the number of background photons collected, or counts, is $N \propto t$.

Photon counts follow Poisson statistics, which means the error in the counts is $\Delta N = \sqrt N$, which is $\propto \sqrt t$.

The "count rate" or "flux" is defined as $f = N/t$.

The error in the flux is then $\Delta f = \Delta N / t = \sqrt N / t$. As $N \propto t$, we can rewrite this as $\Delta f \propto \sqrt t / t \propto 1/\sqrt t$.

Now, recall the weight is proportional to exposure time, i.e. $w \propto t$.

Voila! The flux error is proportional to $1/ \sqrt w$, or $w \propto 1/ \Delta f ^2$
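The scaling derived above can be checked with a quick simulation. This is a minimal sketch with made-up numbers: the count rate and the exposure times are arbitrary assumptions, not values from the NGDEEP data.

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 50.0          # assumed background count rate (counts per second)
n_trials = 200_000   # number of simulated pixels per exposure time

def flux_scatter(t):
    """Scatter of the measured flux f = N / t for exposure time t."""
    counts = rng.poisson(rate * t, size=n_trials)  # N follows Poisson statistics
    flux = counts / t
    return flux.std()

scatter_short = flux_scatter(10.0)
scatter_long = flux_scatter(40.0)

# Exposing 4 times longer should reduce the flux error by sqrt(4) = 2
ratio = scatter_short / scatter_long
print(scatter_short, scatter_long, ratio)
```

The ratio should come out close to 2, consistent with $\Delta f \propto 1/\sqrt t$.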

Next, we create a background error map using the weight map. Note that this background error map is different from the general error map we just read in. The background error map only contains the error from the fluctuation of the background light. The general error map contains both the former and the error from the fluctuation of the source light.

In [None]:
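# Create a background error map from the weight. Zero-weight pixels get an
# infinite error; np.errstate silences the divide-by-zero warning.
with np.errstate(divide='ignore'):
    wht2rms = 1 / np.sqrt(wht)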

We have created a background error map using $\Delta f \propto 1/\sqrt w$. We now need to find the proportionality constant, i.e. the value of $C$ in $\Delta f = C / \sqrt w$. We do this by finding the typical ratio of the general error map to the $1 / \sqrt w$ map, then scaling $1 / \sqrt w$ by this number to reach the correct error level.

Recall that the general error map contains both fluctuations from sources and the background, and we only want the latter. In pixels where there are no sources, it will only contain the background error. Since the majority of the sky is empty space, taking a median will remove the contribution from source pixels.

We can therefore take the median of the ratio of the general error map to the $1 / \sqrt w$ map to estimate the correct proportionality constant for the error of the sky background.
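Here is a small synthetic check of why the median is used rather than the mean. All arrays and numbers below are invented for the illustration; `true_c` plays the role of $C$.

```python
import numpy as np

rng = np.random.default_rng(1)
true_c = 0.02                                    # assumed proportionality constant
wht_demo = rng.uniform(1.0, 4.0, size=(100, 100))

# Background-only error, following err = C / sqrt(w)
err_demo = true_c / np.sqrt(wht_demo)

# Pretend 10% of the pixels sit on sources and have 3x larger errors
source_pixels = rng.random(wht_demo.shape) < 0.1
err_demo[source_pixels] *= 3.0

ratio = err_demo / (1.0 / np.sqrt(wht_demo))
c_median = np.median(ratio)   # ignores the minority of source pixels
c_mean = np.mean(ratio)       # biased upward by the source pixels
print(c_median, c_mean)
```

The median recovers the input constant as long as source pixels are a minority, while the mean is pulled upward by them.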

In [None]:
# Find the proportionality constant by taking the median of the ratio between
# the general error map and the unscaled background error map
err_fac = np.median(err[~mask] / wht2rms[~mask])
print(err_fac)

In [None]:
# Apply the proportionality constant
wht2rms *= err_fac

Let's plot out the images to see what they look like. You can see that the general error map contains higher values at the positions of sources, but the background error map doesn't.

In [None]:
norm = ImageNormalize(sci[mask==0], interval=ZScaleInterval())

plt.figure(figsize=(9, 9))
plt.subplot(221)
plt.title('Science Image')
plt.imshow(sci, cmap='Greys', norm=norm, origin='lower', interpolation='none')

plt.subplot(222)
plt.title('General Error Image')
plt.imshow(err, cmap='Greys', norm=norm, origin='lower', interpolation='none')

plt.subplot(223)
plt.title('Weight Image')
plt.imshow(wht, cmap='Greys', origin='lower', interpolation='none')

plt.subplot(224)
plt.title('Background Error Image')
plt.imshow(wht2rms, cmap='Greys', norm=norm, origin='lower', interpolation='none')

plt.tight_layout()
plt.show()

The background error map is what we need when we move on to detecting sources in the next step.

To detect sources, we look for pixels where the flux is substantially higher than the background error. If we use the general error map instead, positions with sources will have a higher error, making it more difficult to detect the sources.

## Source detection and photometry

Once we have all the science, error and weight images, we can work on detecting sources and measuring fluxes from them.

In [None]:
from astropy.convolution import convolve
from photutils.segmentation import detect_sources
from photutils.segmentation import deblend_sources
from photutils.segmentation import SourceFinder
from photutils.segmentation import SourceCatalog
from photutils.aperture import CircularAperture
from photutils.aperture import aperture_photometry


### 1. Preprocessing the image

Usually, we will slightly smooth the science image to improve source detection. This will smooth out noise fluctuations and make real sources stand out.

To do this, we apply a convolution kernel to the image. This means that the flux in each pixel is "spread out" to the neighboring pixels following the kernel, making the image smoother.
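As a toy illustration of this "spreading out", the sketch below smooths a single bright pixel with a 3x3 box kernel using plain NumPy. The function `smooth` and the test image are hypothetical; the notebook itself uses $\texttt{astropy.convolution.convolve}$, which also normalizes the kernel by default.

```python
import numpy as np

def smooth(image, kernel):
    """Smooth an image with a small kernel, normalized to unit sum.

    For a symmetric kernel this is identical to a convolution.
    Pixels beyond the image edge are treated as zero.
    """
    kernel = kernel / kernel.sum()
    kh, kw = kernel.shape
    ny, nx = image.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((ny, nx), dtype=float)
    # Add up shifted copies of the image, weighted by the kernel values
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + ny, dx:dx + nx]
    return out

point = np.zeros((7, 7))
point[3, 3] = 9.0                 # a single bright pixel
box = np.ones((3, 3))             # simple 3x3 box kernel

smoothed = smooth(point, box)
print(smoothed)
```

The flux of the bright pixel is shared equally among its 3x3 neighborhood, while the total flux in the image is unchanged.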

In [None]:
# Create a convolution kernel
conv = np.array([[0.000000,0.220000,0.480000,0.220000,0.000000],
                 [0.220000,0.990000,1.000000,0.990000,0.220000],
                 [0.480000,1.000000,1.000000,1.000000,0.480000],
                 [0.220000,0.990000,1.000000,0.990000,0.220000],
                 [0.000000,0.220000,0.480000,0.220000,0.000000]])
conv

In [None]:
# Convolve science image with convolution kernel
sci_conv = convolve(sci, conv)

Let's plot out the raw and convolved science images to see the difference.

In [None]:
# Plot out raw and convolved science images
plt.figure(figsize=(9, 4.5))
plt.subplot(121)
plt.title('Science Image')
plt.imshow(sci, cmap='Greys', norm=norm, origin='lower', interpolation='none')

plt.subplot(122)
plt.title('Convolved Science Image')
plt.imshow(sci_conv, cmap='Greys', norm=norm, origin='lower', interpolation='none')

plt.tight_layout()
plt.show()

### 2. Source detection

Now we can detect sources in the convolved image! Source detection is the process of finding positions in the image where there is significant emission from objects. The basic principles of detecting sources are as follows.

(1) Compare the science image with the background error map. <br>
(2) Select pixels where the flux is higher than the background error by a factor of $\texttt{thresh}$. We'll call them "significant pixels".  <br>
(3) Find areas where at least $\texttt{minarea}$ significant pixels are connected together.  <br>

This gives us regions of significant fluxes where at least $\texttt{minarea}$ connected pixels are higher than the background by a factor of $\texttt{thresh}$. 
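The three steps above can be sketched in a few lines of NumPy. This is a simplified illustration of the principle, not how $\texttt{photutils}$ implements it; the function `detect_connected` and the test image are made up, and pixels are treated as connected if they touch along an edge or a corner.

```python
import numpy as np
from collections import deque

def detect_connected(image, background_rms, thresh, minarea):
    """Label groups of at least `minarea` connected significant pixels."""
    significant = image > thresh * background_rms     # steps (1) and (2)
    visited = np.zeros(image.shape, dtype=bool)
    labels = np.zeros(image.shape, dtype=int)
    ny, nx = image.shape
    next_label = 0
    for y0, x0 in zip(*np.nonzero(significant)):
        if visited[y0, x0]:
            continue
        # Flood fill to collect all pixels connected to this one
        group = [(y0, x0)]
        queue = deque(group)
        visited[y0, x0] = True
        while queue:
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx
                            and significant[yy, xx] and not visited[yy, xx]):
                        visited[yy, xx] = True
                        group.append((yy, xx))
                        queue.append((yy, xx))
        if len(group) >= minarea:                      # step (3)
            next_label += 1
            for y, x in group:
                labels[y, x] = next_label
    return labels

# Hypothetical image: one 6-pixel source and one isolated bright pixel
img = np.zeros((8, 8))
img[1:4, 1:3] = 10.0
img[6, 6] = 10.0
labels = detect_connected(img, np.ones_like(img), thresh=2.0, minarea=5)
print(labels)
```

Only the 6-pixel group is labeled as a source; the isolated bright pixel is rejected because it fails the $\texttt{minarea}$ requirement.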

We can change $\texttt{thresh}$ and $\texttt{minarea}$ when working with different data sets, as different images will have different properties. Higher $\texttt{thresh}$ and $\texttt{minarea}$ will yield fewer but more significant sources, while lower values will yield a larger number of less significant sources.

Another step in source detection is "deblending". If two sources are close to each other, their light will overlap in the image, creating a large group of connected significant pixels spanning both sources. Deblending splits such a group back into the individual sources.

Deblending is controlled by two parameters: **deblend_cont** and **deblend_nthresh**. **deblend_cont** is a number between 0 and 1. The smaller the number, the more aggressive the deblending will be, i.e., the source will be separated into more components. **deblend_cont** = 1 means no deblending is performed. **deblend_nthresh** is the number of threshold levels used when searching for separate peaks, and is somewhat less important. A value of 32 will usually work.

Again, these parameters should be changed when working with different images. There are no optimal parameters that will work for all images. Some experimenting is needed to find the best values that work for your data set.

We will use the **photutils** package to perform source detection here. Some other source detection and/or photometry packages or software include $\texttt{SourceExtractor}$ and $\texttt{SEP}$.

In [None]:
# We perform source detection using photutils

thresh = 1.6
minarea = 5
deblend_cont = 0.01
deblend_nthresh = 32

# Function that sets up the criteria for selecting sources in our image
finder = SourceFinder(npixels=minarea, nlevels=deblend_nthresh, contrast=deblend_cont)

# Run the source finder on the convolved image. We keep sources whose pixels
# are brighter than thresh (= 1.6) times the background error
segment_map = finder(sci_conv, thresh*wht2rms)


The source finder returns a "segmentation map", which labels the regions of connected significant pixels attributed to each detected source.

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9,4.5))

ax1.imshow(sci_conv, origin='lower', cmap='Greys', norm=norm, interpolation='none')
ax1.set_title('Data')

ax2.imshow(segment_map, origin='lower', cmap=segment_map.cmap, interpolation='none')
ax2.set_title('Segmentation Image')

plt.tight_layout()
plt.show()

### 3. Photometry

Source detection gives us a list of positions where there are sources. We then perform photometry on these sources, which is the process of measuring the amount of light emitted by these sources.

A common method to do so is aperture photometry. The basic concept of aperture photometry is to draw a circular or elliptical aperture around the source position, and sum up the flux contained therein.

The most basic form of aperture photometry uses circular apertures of fixed radii. While this is simple, a drawback is that different sources have different sizes and shapes, so circular apertures of fixed radii will capture different fractions of light from different sources.
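The idea of a fixed circular aperture can be sketched as follows. This simplified version counts a pixel as inside the aperture if its center lies within the radius, whereas $\texttt{photutils}$ by default computes the exact fractional overlap of each pixel with the circle. The function name and the test image are made up for the illustration.

```python
import numpy as np

def circular_aperture_sum(image, x0, y0, radius):
    """Sum the pixels whose centers lie within `radius` of (x0, y0)."""
    yy, xx = np.indices(image.shape)
    inside = (xx - x0) ** 2 + (yy - y0) ** 2 <= radius ** 2
    return image[inside].sum()

# Hypothetical flat image where every pixel has a flux of 1
flat = np.ones((21, 21))
flux = circular_aperture_sum(flat, 10, 10, 3)
print(flux, np.pi * 3 ** 2)
```

On the flat image the sum is 29, close to but not exactly the true circle area of about 28.3, which is the kind of pixelation error that exact overlap weighting avoids.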

A more advanced form of aperture photometry uses the Kron aperture. Kron (1980) outlines a method to compute elliptical apertures depending on the actual size and shape of the source, aiming to capture a more uniform fraction of light in the aperture for a wide range of sources.

Kron photometry is easily done using $\texttt{photutils}$.
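Before handing the work to $\texttt{photutils}$, it may help to see the quantity at the heart of the method. The Kron radius is a flux-weighted average radius of the source. The sketch below computes a circular version of it for a made-up Gaussian source; $\texttt{photutils}$ works with elliptical radii instead, and the final aperture is the Kron radius multiplied by a scale factor (commonly 2.5). The function name and test values are assumptions for this illustration.

```python
import numpy as np

def kron_radius_circular(image, x0, y0):
    """Flux-weighted mean distance of the light from the position (x0, y0)."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - x0, yy - y0)
    return (r * image).sum() / image.sum()

# Hypothetical round Gaussian source with a width of sigma pixels
sigma = 4.0
yy, xx = np.indices((101, 101))
gauss = np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / (2 * sigma ** 2))

r_kron = kron_radius_circular(gauss, 50, 50)

# For a 2D Gaussian, the analytic value is sigma * sqrt(pi / 2)
print(r_kron, sigma * np.sqrt(np.pi / 2))
```

Because the radius scales with the extent of the source, larger sources automatically get larger apertures.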

In [None]:
cat = SourceCatalog(data=sci, segment_img=segment_map, convolved_data=sci_conv, error=err, wcs=wcs)
print(cat)

In [None]:
# Make an astropy table from the results and list the columns
### This may take a few seconds to run ###

tbl = cat.to_table()
print(tbl.columns)


Next, let's plot out the sources and see where they are in the image. It will look a bit crowded. When you zoom in, it will make a lot more sense.

At this step, we want to check two things. (1) Are there sources in the image that are visible by eye, but not detected? (2) Are there places where a source is detected but you don't see anything in the image?

If the former is true, go back and try decreasing $\texttt{thresh}$ and/or $\texttt{minarea}$. If the latter is true, try increasing the values.

Also, check the quality of the deblending. If a source is divided into too many components, try increasing **deblend_cont**. If sources are not deblended enough, try lowering the value instead.

In [None]:
### This may take a few seconds to run ###

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4.5))

ax1.imshow(sci, origin='lower', cmap='Greys', norm=norm, interpolation='none')
ax1.set_title('Data')

ax2.imshow(segment_map, origin='lower', cmap=segment_map.cmap, interpolation='none')
ax2.set_title('Segmentation Image')

cat.plot_kron_apertures(ax=ax1, color='C3', lw=1.5)
cat.plot_kron_apertures(ax=ax2, color='C3', lw=1.5)

plt.tight_layout()
plt.show()

Let's print out the fluxes extracted using the Kron apertures.

In [None]:
tbl['kron_flux']

We can also use $\texttt{photutils}$ to perform circular aperture photometry.

In [None]:
# The CircularAperture class in photutils takes positions as a
# numpy array of (x, y) pairs, so we create an array of pairs here.

positions = np.array([tbl['xcentroid'], tbl['ycentroid']]).T

# Note the .T at the end to transpose [[x1, x2, ..., xn], [y1, y2, ..., yn]]
# into [[x1, y1], [x2, y2], ..., [xn, yn]]


In [None]:
# This creates a single aperture for each source with radius of 5 pixels
aperture = CircularAperture(positions, r=5)


In [None]:
# This creates a list of apertures for each source with radii of 2, 4, 6, 8, 10 pixels
apertures = [CircularAperture(positions, r=r) for r in [2, 4, 6, 8, 10]]


In [None]:
# Perform aperture photometry. It will return the output in an astropy table
### This may take a few seconds to run ###

phot_table = aperture_photometry(sci, apertures)
phot_table


### 4. Half-light radius

An important quantity in photometry is the "half-light radius". It is the radius that contains half of the light of a source. This quantifies how extended a source is.

The $\texttt{photutils}$ source catalog has a method **fluxfrac_radius** to do this. It calculates the circular radius that contains a given fraction of the light of the source, measured relative to its Kron flux. When set to 0.5, it gives the half-light radius.
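The concept can be sketched with plain NumPy as follows. This is a simplified illustration that measures the fraction relative to the total flux in a small test image, not the $\texttt{photutils}$ implementation; the function name and the Gaussian test source are made up.

```python
import numpy as np

def half_light_radius(image, x0, y0, fraction=0.5):
    """Smallest radius around (x0, y0) enclosing `fraction` of the total flux."""
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - x0, yy - y0).ravel()
    order = np.argsort(r)                        # pixels sorted from center outward
    enclosed = np.cumsum(image.ravel()[order])   # flux enclosed within each radius
    idx = np.searchsorted(enclosed, fraction * enclosed[-1])
    return r[order][idx]

# Hypothetical round Gaussian source with a width of sigma pixels
sigma = 5.0
yy, xx = np.indices((101, 101))
gauss = np.exp(-((xx - 50) ** 2 + (yy - 50) ** 2) / (2 * sigma ** 2))

rh = half_light_radius(gauss, 50, 50)

# For a 2D Gaussian, the analytic value is sigma * sqrt(2 ln 2)
print(rh, sigma * np.sqrt(2 * np.log(2)))
```

A more extended source has its light spread over more pixels, so the enclosed flux reaches half of the total at a larger radius.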

In [None]:
# Add the half-light radius to the catalog table
tbl['rh'] = cat.fluxfrac_radius(0.5)

### 5. Photometric catalog

After we have our photometry, we will write the results in a catalog.

Before doing so, we make some minor formatting changes to the catalog. By default, $\texttt{photutils}$ gives sky coordinates in the format of $\texttt{astropy}$ $\texttt{SkyCoord}$ objects.

In [None]:
tbl['sky_centroid']

However, if this is written to a file in text or fits format, it will simply be converted to a string, losing its $\texttt{SkyCoord}$ properties. So we will just extract the RA and Dec and write them as floats in two columns.

In [None]:
# Store RA and Dec as plain floats in degrees
tbl['ra'] = tbl['sky_centroid'].ra.deg
tbl['dec'] = tbl['sky_centroid'].dec.deg
tbl.remove_column('sky_centroid')

We will also add the circular aperture results in the catalog using the $\texttt{astropy.table.join}$ function.

In [None]:
tbl = join(tbl, phot_table, keys_left='label', keys_right='id')

# Remove some redundant columns
tbl.remove_columns(['xcenter', 'ycenter', 'label'])

This is our final photometric catalog!

In [None]:
tbl

If everything looks right, we will write it out!

In [None]:
# Write catalog in fits format
tbl.write('source_catalog.fits', overwrite=True)

In [None]:
# We can also write it in ascii format
tbl.write('source_catalog.txt', format='ascii', overwrite=True)