# Detecting sources in the XDF

## Data for this notebook 

We will be manipulating Hubble eXtreme Deep Field (XDF) data, which was collected using the Advanced Camera for Surveys (ACS) on Hubble between 2002 and 2012. The image we use here is the result of 1.8 million seconds (500 hours!) of exposure time, and includes some of the faintest and most distant galaxies that had ever been observed. 

*The methods demonstrated here are also available in narrative form within the [`photutils.detection` documentation](http://photutils.readthedocs.io/en/stable/detection.html).*

*The original authors of this notebook were Lauren Chambers, Erik Tollerud and Tom Wilson.*


## Import necessary packages

First, let's import packages that we will use to perform arithmetic functions and visualize data:

In [None]:
from astropy.io import fits
import astropy.units as u
from astropy.nddata import CCDData
from astropy.stats import sigma_clipped_stats, SigmaClip
from astropy.visualization import ImageNormalize, LogStretch
import matplotlib.pyplot as plt
from matplotlib.ticker import LogLocator
import numpy as np
from photutils.background import Background2D, MeanBackground

# Show plots in the notebook
%matplotlib inline

Let's also define some `matplotlib` parameters, such as title font size and the dpi, to make sure our plots look nice. To make it quick, we'll do this by loading a [style file shared with the other photutils tutorials](../photutils_notebook_style.mplstyle) into `pyplot`. We will use this style file for all the notebook tutorials. (See [here](https://matplotlib.org/users/customizing.html) to learn more about customizing `matplotlib`.)

In [None]:
plt.style.use('../photutils_notebook_style.mplstyle')

## Data representation

Throughout this notebook, we are going to store our images in Python using a `CCDData` object (see [Astropy documentation](http://docs.astropy.org/en/stable/nddata/index.html#ccddata-class-for-images)), which contains a `numpy` array in addition to metadata such as uncertainty, masks, or units. In this case, the image has units of electrons (counts) per second.

Note that you could create the `CCDData` object directly from the URL where this image is hosted:

```python
url = 'https://archive.stsci.edu/pub/hlsp/xdf/hlsp_xdf_hst_acswfc-60mas_hudf_f435w_v1_sci.fits'
xdf_image = CCDData.read(url)
```

Since the data for the guide is meant to be downloaded in bulk before going through the guide, we read the image from disk here.


In [None]:
xdf_image = CCDData.read('hlsp_xdf_hst_acswfc-60mas_hudf_f435w_v1_sci.fits')

As explained in a [previous notebook](../01_background_estimation/01_background_estimation.ipynb) on background estimation, it is important to **mask** these data, as a large portion of the values are equal to zero. We will mask out the non-data portions of the image array, so all of those pixels that have a value of zero don't interfere with our statistics and analyses of the data. 

In [None]:
# Define the mask
mask = xdf_image.data == 0
xdf_image.mask = mask
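
To see why the mask matters, here is a minimal sketch on a synthetic array (the array and its values are made up for illustration and are not the XDF data). Zero-valued padding pulls the mean down unless it is masked out:

```python
import numpy as np

# Synthetic stand-in for an image: a 4x4 block of "real" data
# surrounded by zero-valued padding (values are made up)
demo_image = np.zeros((8, 8))
demo_image[2:6, 2:6] = 2.0

# Same masking rule as above: pixels exactly equal to zero are non-data
demo_mask = demo_image == 0
demo_masked = np.ma.masked_array(demo_image, mask=demo_mask)

unmasked_mean = demo_image.mean()   # padding included in the average
masked_mean = demo_masked.mean()    # padding ignored

print(f'Mean with padding:    {unmasked_mean:.2f}')
print(f'Mean without padding: {masked_mean:.2f}')
```

The same effect applies to the median and standard deviation, which is why the mask is passed along to the statistics and detection steps later on.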

Let's look at the data:

In [None]:
# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Set up the normalization and colormap
norm_image = ImageNormalize(vmin=1e-4, vmax=5e-2, stretch=LogStretch(), clip=False)
cmap = plt.get_cmap('viridis')
cmap.set_bad('white') # Show masked data as white

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)

# Define the colorbar
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())

def format_colorbar(bar):
    # Add minor tickmarks
    bar.ax.yaxis.set_minor_locator(LogLocator(subs=range(1, 10)))

    # Force the labels to be displayed as powers of ten and only at exact powers of ten
    bar.ax.set_yticks([1e-4, 1e-3, 1e-2])
    labels = [f'$10^{{{pow:.0f}}}$' for pow in np.log10(bar.ax.get_yticks())]
    bar.ax.set_yticklabels(labels)

format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.grid()
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)');

*Tip: Double-click on any inline plot to zoom in.*

## Source Detection with `DAOStarFinder`

With the `DAOStarFinder` [class](http://photutils.readthedocs.io/en/stable/api/photutils.DAOStarFinder.html), `photutils` provides users with an easy-to-use implementation of the popular [DAOFIND](http://stsdas.stsci.edu/cgi-bin/gethelp.cgi?daofind) algorithm ([Stetson 1987, PASP 99, 191](http://adsabs.harvard.edu/abs/1987PASP...99..191S)), originally developed at the Dominion Astrophysical Observatory.

This algorithm detects sources by:
* Searching for local maxima
* Selecting only sources with peak amplitude above a defined threshold
* Selecting sources with sizes and shapes that match a 2-D Gaussian kernel (circular or elliptical)

It returns:
* Location of the source centroid
* Parameters reflecting the source's sharpness and roundness
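
The first two steps listed above (local maxima above a threshold) can be sketched with plain `numpy`. This is a deliberately simplified illustration, not the DAOFIND algorithm itself: it skips the Gaussian-kernel matching and the sharpness and roundness tests, and `find_local_peaks` is a hypothetical helper name.

```python
import numpy as np

def find_local_peaks(data, threshold):
    """Return (y, x) indices of pixels that exceed `threshold` and are
    strictly greater than all 8 neighbours (hypothetical helper)."""
    # Pad with -inf so that edge pixels always "win" against missing neighbours
    padded = np.pad(data, 1, mode='constant', constant_values=-np.inf)
    is_peak = data > threshold
    ny, nx = data.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + ny, 1 + dx:1 + dx + nx]
            is_peak &= data > neighbour
    return np.argwhere(is_peak)

# Synthetic image: two Gaussian blobs on a zero background (made-up values)
yy, xx = np.mgrid[0:40, 0:40]
demo_image = np.zeros((40, 40))
for y0, x0, amp in [(10, 12, 5.0), (28, 30, 0.5)]:
    demo_image += amp * np.exp(-((yy - y0)**2 + (xx - x0)**2) / (2 * 2.0**2))

peaks = find_local_peaks(demo_image, threshold=1.0)
print(peaks)
```

Only the brighter blob, at `(y, x) = (10, 12)`, is reported; the fainter one is a local maximum but falls below the threshold.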

Generally, the threshold that source detection algorithms use is defined as a multiple of the standard deviation. So first, we need to calculate statistics for the data:

In [None]:
mean, median, std = sigma_clipped_stats(xdf_image.data, sigma=3.0, maxiters=5, mask=xdf_image.mask)
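
To make the idea behind sigma-clipped statistics concrete, here is a simplified `numpy` sketch on made-up data. `simple_sigma_clip_stats` is a hypothetical helper and not the `astropy` implementation; it only illustrates how iteratively discarding outliers keeps bright sources from inflating the noise estimate.

```python
import numpy as np

def simple_sigma_clip_stats(values, sigma=3.0, maxiters=5):
    """Simplified iterative sigma clipping (hypothetical helper):
    repeatedly discard values more than `sigma` standard deviations
    from the median, then return (mean, median, std) of what is left."""
    values = np.asarray(values, dtype=float).ravel()
    for _ in range(maxiters):
        med, sd = np.median(values), np.std(values)
        keep = np.abs(values - med) <= sigma * sd
        if keep.all():
            break
        values = values[keep]
    return np.mean(values), np.median(values), np.std(values)

rng = np.random.default_rng(seed=0)
# Made-up data: Gaussian "background" noise plus a few bright "source" pixels
demo_pixels = np.concatenate([rng.normal(0.0, 1.0, size=10_000),
                              np.full(50, 100.0)])

raw_std = np.std(demo_pixels)
_, _, clipped_std = simple_sigma_clip_stats(demo_pixels)

# A detection threshold expressed as a multiple of the clipped noise level
demo_threshold = 20.0 * clipped_std

print(f'Raw std: {raw_std:.2f}, clipped std: {clipped_std:.2f}')
```

The raw standard deviation is dominated by the handful of bright pixels, while the clipped value stays close to the true noise level of 1, so a threshold built from it reflects the background rather than the sources.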

Now, let's run the `DAOStarFinder` algorithm on our data and see what it finds. 

In [None]:
from photutils.detection import DAOStarFinder

In [None]:
daofind = DAOStarFinder(fwhm=5.0, threshold=20. * std)
sources_dao = daofind(np.ma.masked_where(xdf_image.mask, xdf_image))
print(sources_dao)

In [None]:
# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')

# Define the colorbar and fix the labels with the helper defined earlier
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())
format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('DAOFind Sources')

Let's randomly pull out some of these sources to get a closer look at them.

In [None]:
fig, axs = plt.subplots(3, 3, figsize=(3, 3))
plt.subplots_adjust(wspace=0.1, hspace=0.1)

cutout_size = 20

srcs = np.random.permutation(sources_dao)[:axs.size]
for ax, src in zip(axs.ravel(), srcs):
    slc = (slice(int(src['ycentroid'] - cutout_size), int(src['ycentroid'] + cutout_size)),
           slice(int(src['xcentroid'] - cutout_size), int(src['xcentroid'] + cutout_size)))
    ax.imshow(xdf_image[slc], norm=norm_image)
    ax.text(2, 2, str(src['id']), color='w', va='top')
    ax.set_xticks([])
    ax.set_yticks([])
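
One caveat with slicing like this: for a source closer than `cutout_size` pixels to the lower image edge, the start index becomes negative, and Python's negative indexing then yields an empty or misplaced cutout. A minimal sketch of a bounds-checked alternative follows; `safe_cutout` is a hypothetical helper, shown here on a made-up array.

```python
import numpy as np

def safe_cutout(data, x_center, y_center, half_size):
    """Return a square cutout around (x_center, y_center), clipped to the
    array bounds (hypothetical helper for illustration)."""
    ny, nx = data.shape
    y_lo = max(int(y_center - half_size), 0)
    y_hi = min(int(y_center + half_size), ny)
    x_lo = max(int(x_center - half_size), 0)
    x_hi = min(int(x_center + half_size), nx)
    return data[y_lo:y_hi, x_lo:x_hi]

demo_image = np.arange(100 * 100).reshape(100, 100)

# A source well inside the image gives the full 40x40 cutout
inner = safe_cutout(demo_image, x_center=50.3, y_center=60.7, half_size=20)
# A source near the corner gives a smaller, but still valid, cutout
corner = safe_cutout(demo_image, x_center=5.2, y_center=3.9, half_size=20)

print(inner.shape, corner.shape)
```

For production work, `astropy.nddata.Cutout2D` offers similar edge handling; the helper above is only meant to show the idea.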

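We can also zoom in on one region of the image using an inset axes, following the [matplotlib zoom inset example](https://matplotlib.org/stable/gallery/subplots_axes_and_figures/zoom_inset_axes.html#sphx-glr-gallery-subplots-axes-and-figures-zoom-inset-axes-py):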

In [None]:
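fig, ax = plt.subplots(figsize=(8, 8))

# Corner (in pixels) and side length of the region to zoom in on
top = 1125
left = 2350
size = 500

# Plot the full image, faded, with all of the detected sources
ax.imshow(xdf_image, cmap=cmap, norm=norm_image, alpha=0.5)
ax.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
           lw=1, alpha=0.5, facecolor='None', edgecolor='r')

# Add an inset axes, positioned in data coordinates, to hold the zoomed view
ax2 = ax.inset_axes([top, left, 5 * size, 5 * size], transform=ax.transData)
snippet = xdf_image[top:top + size, left:left + size]

# Use `extent` so the inset keeps the pixel coordinates of the full image
ax2.imshow(snippet, cmap=cmap, norm=norm_image,
           extent=(left - 0.5, left + size - 0.5, top + size - 0.5, top - 0.5))

# Select and plot only the sources that fall inside the zoomed region
in_region = (
    (left < sources_dao['xcentroid']) &
    (sources_dao['xcentroid'] < (left + size)) &
    (top < sources_dao['ycentroid']) &
    (sources_dao['ycentroid'] < (top + size))
)
sources_dao_to_plot = sources_dao[in_region]
ax2.scatter(sources_dao_to_plot['xcentroid'], sources_dao_to_plot['ycentroid'],
            s=30, marker='o', lw=1, alpha=0.7, facecolor='None', edgecolor='r')

# Mark the zoomed region on the main plot, now that the inset limits are set
ax.indicate_inset_zoom(ax2)

ax.set_xlabel('X (pixels)')
ax.set_ylabel('Y (pixels)')
ax.set_title('DAOFind Sources')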

## Source Detection with `IRAFStarFinder`

Similarly to `DAOStarFinder`, `IRAFStarFinder` is a class that implements a pre-existing algorithm that is widely used within the astronomical community. This class uses the `starfind` [algorithm](http://stsdas.stsci.edu/cgi-bin/gethelp.cgi?starfind)  that was originally part of IRAF.

`IRAFStarFinder` is fundamentally similar to `DAOStarFinder` in that it detects sources by finding local maxima above a certain threshold that match a Gaussian kernel. However, `IRAFStarFinder` differs in the following ways:
* Does not allow users to specify an elliptical Gaussian kernel
* Uses image moments to calculate the centroids, roundness, and sharpness of objects
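
As a rough illustration of what a moment-based centroid means, the sketch below computes the flux-weighted mean position (the first-order image moments) of a synthetic source. `moment_centroid` is a hypothetical helper written for this example, not the `photutils` implementation, and it assumes the background has already been subtracted.

```python
import numpy as np

def moment_centroid(cutout):
    """Flux-weighted centroid (first-order image moments) of a
    background-subtracted cutout; returns (x, y) in pixel coordinates.
    Hypothetical helper, not the photutils implementation."""
    yy, xx = np.indices(cutout.shape)
    total = cutout.sum()
    return (xx * cutout).sum() / total, (yy * cutout).sum() / total

# Synthetic symmetric Gaussian source centred at x=12.0, y=8.0 (made-up values)
grid_y, grid_x = np.mgrid[0:21, 0:25]
demo_source = np.exp(-((grid_x - 12.0)**2 + (grid_y - 8.0)**2) / (2 * 1.5**2))

x_cen, y_cen = moment_centroid(demo_source)
print(f'x = {x_cen:.3f}, y = {y_cen:.3f}')
```

Higher-order moments of the same flux distribution describe how extended and how elongated a source is, which is the kind of information the roundness and sharpness statistics are built from.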

Let's run the `IRAFStarFinder` algorithm on our data, with the same FWHM and threshold, and see how its results differ from `DAOStarFinder`:

In [None]:
from photutils.detection import IRAFStarFinder

In [None]:
iraffind = IRAFStarFinder(fwhm=5.0, threshold=20. * std)
sources_iraf = iraffind(np.ma.masked_where(xdf_image.mask, xdf_image))
print(sources_iraf)

In [None]:
# Set up the figure with subplots
fig, ax1 = plt.subplots(1, 1, figsize=(8, 8))

# Plot the data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_iraf['xcentroid'], sources_iraf['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')

# Define the colorbar and fix the labels with the helper defined earlier
cbar = plt.colorbar(fitsplot, fraction=0.046, pad=0.04, ticks=LogLocator())
format_colorbar(cbar)

# Define labels
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('IRAFFind Sources')

Again, let's randomly select some sources for a closer look:

In [None]:
fig, axs = plt.subplots(3, 3, figsize=(3, 3))
plt.subplots_adjust(wspace=0.1, hspace=0.1)

cutout_size = 20

srcs = np.random.permutation(sources_iraf)[:axs.size]
for ax, src in zip(axs.ravel(), srcs):
    slc = (slice(int(src['ycentroid'] - cutout_size), int(src['ycentroid'] + cutout_size)),
           slice(int(src['xcentroid'] - cutout_size), int(src['xcentroid'] + cutout_size)))
    ax.imshow(xdf_image[slc], norm=norm_image)
    ax.text(2, 2, str(src['id']), color='w', va='top')
    ax.set_xticks([])
    ax.set_yticks([])

## Note: Comparing `DAOStarFinder` and `IRAFStarFinder`

You might have noticed that the `IRAFStarFinder` algorithm found only 211 sources in our data, about 14% of what `DAOStarFinder` found. Why is this?

The answer comes down to the default settings for the two algorithms: (1) there are differences in the upper and lower bounds on the requirements for source roundness and sharpness, and (2) `IRAFStarFinder` includes a minimum separation between sources that `DAOStarFinder` does not have:

| Parameter     | `IRAFStarFinder` | `DAOStarFinder` |
|---------------|------------------|-----------------|
| `sharplo`     | 0.5              | 0.2             |
| `sharphi`     | 2.0              | 1.0             |
| `roundlo`     | 0.0              | -1.0            |
| `roundhi`     | 0.2              | 1.0             |
| `minsep_fwhm` | 1.5 * FWHM       | N/A             |

Thinking about this, *it then makes sense* that `IRAFStarFinder` would find fewer sources. Its defaults allow a narrower range of roundness and set a higher `sharplo`, meaning that it eliminates more of the elliptical galactic sources (this is the eXtreme Deep Field, after all!), and the minimum separation requirement further rules out sources that are too close to one another.
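
The effect of narrower parameter ranges can be sketched with made-up numbers. The sharpness and roundness values below are random and purely hypothetical, and the two finders compute these statistics differently, so this only illustrates how tighter bounds shrink a catalog; it does not reproduce either algorithm.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Made-up sharpness and roundness values for 1000 hypothetical detections
demo_sharpness = rng.uniform(0.0, 2.0, size=1000)
demo_roundness = rng.uniform(-1.0, 1.0, size=1000)

def count_within_bounds(sharpness, roundness, sharplo, sharphi, roundlo, roundhi):
    """Count detections whose sharpness and roundness fall inside the bounds
    (hypothetical helper for illustration)."""
    keep = ((sharpness >= sharplo) & (sharpness <= sharphi) &
            (roundness >= roundlo) & (roundness <= roundhi))
    return int(keep.sum())

# Bounds taken from the table above
n_loose = count_within_bounds(demo_sharpness, demo_roundness, 0.2, 1.0, -1.0, 1.0)
n_tight = count_within_bounds(demo_sharpness, demo_roundness, 0.5, 2.0, 0.0, 0.2)

print(f'Looser bounds keep {n_loose}; tighter roundness bounds keep {n_tight}')
```

Even though the second set of bounds accepts a wider range of sharpness, its narrow roundness window removes most of the sample.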

If we set all these parameters to be equivalent, though, we should find much better agreement between the two methods:

In [None]:
iraffind_match = IRAFStarFinder(fwhm=5.0, threshold=20. * std,
                                sharplo=0.2, sharphi=1.0,
                                roundlo=-1.0, roundhi=1.0,
                                minsep_fwhm=0.0)
sources_iraf_match = iraffind_match(np.ma.masked_where(xdf_image.mask, xdf_image))
print(sources_iraf_match)

The numbers of detected sources are in much better agreement now (1415 versus 1470), and the improved agreement can also be seen by plotting the locations of these sources:

In [None]:
# Set up the figure with subplots
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(12, 6))
plt.tight_layout()

# Plot the DAOStarFinder data
fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax1.scatter(sources_dao['xcentroid'], sources_dao['ycentroid'], s=30, marker='o',
            lw=1, alpha=0.7, facecolor='None', edgecolor='r')
ax1.set_xlabel('X (pixels)')
ax1.set_ylabel('Y (pixels)')
ax1.set_title('DAOStarFinder Sources')

# Plot the IRAFStarFinder data
fitsplot = ax2.imshow(np.ma.masked_where(xdf_image.mask, xdf_image),
                      norm=norm_image, cmap=cmap)
ax2.scatter(sources_iraf_match['xcentroid'], sources_iraf_match['ycentroid'],
            s=30, marker='o', lw=1, alpha=0.7, facecolor='None', edgecolor='r')
ax2.set_xlabel('X (pixels)')
ax2.set_title('IRAFStarFinder Sources')

# Define the colorbar and fix the labels with the helper defined earlier
cbar_ax = fig.add_axes([1, 0.09, 0.03, 0.87])
cbar = plt.colorbar(fitsplot, cbar_ax, ticks=LogLocator())
format_colorbar(cbar)
cbar.set_label(r'Flux Count Rate ({})'.format(xdf_image.unit.to_string('latex')),
               rotation=270, labelpad=30)
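
Beyond comparing the plots by eye, the agreement between two catalogs can be quantified by cross-matching positions. The sketch below is one simple way to do that, assuming a brute-force nearest-neighbour search and an arbitrary 2-pixel matching radius; `fraction_matched` and the coordinates are hypothetical and made up for illustration.

```python
import numpy as np

def fraction_matched(x1, y1, x2, y2, max_sep=2.0):
    """Fraction of sources in catalog 1 that have a counterpart in
    catalog 2 within `max_sep` pixels (brute-force, hypothetical helper)."""
    dx = np.asarray(x1)[:, np.newaxis] - np.asarray(x2)[np.newaxis, :]
    dy = np.asarray(y1)[:, np.newaxis] - np.asarray(y2)[np.newaxis, :]
    nearest = np.sqrt(dx**2 + dy**2).min(axis=1)
    return float(np.mean(nearest <= max_sep))

# Made-up positions: catalog B recovers the first three sources of catalog A
# with small centroid offsets, and misses the fourth
xa, ya = np.array([10.0, 50.0, 80.0, 120.0]), np.array([20.0, 60.0, 15.0, 90.0])
xb, yb = np.array([10.3, 49.8, 80.5]), np.array([19.9, 60.4, 15.2])

frac_a_in_b = fraction_matched(xa, ya, xb, yb)
frac_b_in_a = fraction_matched(xb, yb, xa, ya)
print(frac_a_in_b, frac_b_in_a)
```

With the real catalogs, the `xcentroid` and `ycentroid` columns of `sources_dao` and `sources_iraf_match` could be passed in place of the made-up arrays.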

Take this example as a reminder to be mindful when selecting a source detection algorithm and when defining its parameters! Don't be afraid to play around with the parameters and investigate how they affect your results.
