<h1 align="center"> Photons != Science, <br>and the Challenges of Turning the Former into the Latter</h1>

<br>

<div align="center">
<font size="+10"> Adam A Miller </font>
<br>
(CIERA/Northwestern/Adler)
<br>
<br>
LSSTC DSFP Session 11
<br>
<br> 
18 Aug 2020</div>

## Introduction

<br>
Session 11 is focused on image processing and, informally, attempts to answer the question: what happens between the glass and the database? 
<br>

When most people imagine the Vera C. Rubin Observatory, they envision this: 

<img style="display: block; margin-left: auto; margin-right: auto" src="images/2015-SL_LSST_LSSTIllus.jpg" width="600" align="middle">

<div align="right"> <font size="-3">(credit: Kavli foundation) </font></div>

You will (almost certainly) never visit the Rubin Observatory.

Instead, you will interact with it like this: 

<img style="display: block; margin-left: auto; margin-right: auto" width="900" src="images/ps1_casjobs.png">

<div align="right"> <font size="-3">(credit: PS1 casjobs) </font></div>

Given the impending deluge of data from the Rubin Observatory's Legacy Survey of Space and Time (LSST; also Euclid, the Nancy Grace Roman Space Telescope, etc.), one might argue that only two skills are necessary for success in the LSST era:

  1. Advanced programming skills (`python`, SQL, etc.)
  2. Statistical knowledge (machine learning, Bayes, etc.)

Finding an actual strawperson to make this specific argument may be hard, but if you looked hard enough I bet you could find someone who would defend the above statement. 

Indeed, these two skills are the precise focus of the DSFP.

Master them, and you will be a full-fledged data scientist.

But! This conclusion is missing something: 

<img style="display: block; margin-left: auto; margin-right: auto" src="images/Data_Science_VD.png" width="600">

<div align="right"> <font size="-3">(credit: Drew Conway) </font></div>

**Domain Knowledge is an essential ingredient for the data science practitioner.** 

To "prove" this is the case, let's consider some conclusions that would be derived from the Rubin Observatory database without a working knowledge of astronomy (and the DOE LSST detectors):

### Incorrect Conclusion #1

There are no galaxies fainter than $i \approx 27.5 \, \mathrm{mag}$.

[Perhaps this signals the edge of the universe...]

This incorrect conclusion follows from the inverse-square law: $$\mathrm{flux} \propto r^{-2}$$ combined with the sensitivity limit of the LSST detector. We know fainter galaxies do exist, but they are either too distant or intrinsically too dim to be detected by LSST.
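To make the selection effect concrete, here is a minimal sketch that rewrites the inverse-square law in magnitudes via the distance modulus, $m = M + 5\log_{10}(d / 10\,\mathrm{pc})$. The function names and the example absolute magnitudes are illustrative assumptions, and the calculation is purely Euclidean (no redshift, k-corrections, or surface-brightness dimming), so the distances are only indicative.

```python
import math

M_LIM = 27.5  # limiting magnitude quoted above


def apparent_magnitude(abs_mag, distance_pc):
    """Distance modulus: m = M + 5 log10(d / 10 pc)."""
    return abs_mag + 5.0 * math.log10(distance_pc / 10.0)


def max_distance_pc(abs_mag, m_lim=M_LIM):
    """Largest (Euclidean) distance at which a source is brighter than m_lim."""
    return 10.0 ** ((m_lim - abs_mag) / 5.0 + 1.0)


# Hypothetical absolute magnitudes for intrinsically faint galaxies.
for abs_mag in (-8.0, -10.0, -12.0):
    d_mpc = max_distance_pc(abs_mag) / 1e6
    print(f"M = {abs_mag:+.0f} mag: detectable out to ~{d_mpc:.0f} Mpc")
```

A galaxy of a given luminosity beyond that distance is still there; it simply never enters the catalog.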

### Incorrect Conclusion #2

Two stars cannot be closer than $\sim$0.35 arcsec in the sky.

[Perhaps there is some repulsive force between stars that keeps them separated...]

This apparent conclusion reflects the typical seeing at Cerro Pachón ($\sim$0.7 arcsec). Stars with very small angular separations ($\theta \lesssim 0.35$ arcsec) cannot be resolved by LSST.
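A one-dimensional toy model shows how seeing merges close pairs. The sketch below assumes two equal-brightness stars and a Gaussian PSF with a 0.7 arcsec FWHM, then counts the local maxima in the blended profile. The function name, the grid, and the "is there a dip between the peaks" criterion are all assumptions made for illustration. The true limit depends on the PSF shape, the flux ratio, and the deblending algorithm, so this toy should not be read as reproducing the specific separation quoted above.

```python
import numpy as np


def count_peaks(separation, fwhm=0.7, half_width=3.0, n=3001):
    """Count local maxima in the blended 1D profile of two equal stars.

    All angles are in arcsec. The PSF is assumed to be Gaussian.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    x = np.linspace(-half_width, half_width, n)
    profile = (np.exp(-0.5 * ((x - separation / 2.0) / sigma) ** 2)
               + np.exp(-0.5 * ((x + separation / 2.0) / sigma) ** 2))
    interior = profile[1:-1]
    # A peak is strictly above its left neighbor and not below its right one.
    is_peak = (interior > profile[:-2]) & (interior >= profile[2:])
    return int(is_peak.sum())


for sep in (0.3, 1.5):
    print(f"separation = {sep} arcsec -> {count_peaks(sep)} peak(s)")
```

At the smaller separation the two stars blend into a single peak, and a catalog built from peaks alone would record one source where there are two.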

### Incorrect Conclusion #3

The Universe emits more light in the $r$-band than the $y$-band <br> (i.e., $\sum r_\mathrm{flux} > \sum y_\mathrm{flux}$).

[Red is the color of the Chicago Bulls, who had the greatest basketball player ever, Michael Jordan, so perhaps the Universe is trying to confirm something we already know...] 

This apparent conclusion is a bit more subtle than the previous two, and there are multiple factors contributing to this incorrect assertion.\* LSST will be far more sensitive in the $r$-band than the $y$-band (lower sky backgrounds and higher detector efficiency are the primary reasons). 

Blue sources naturally emit more light in the $r$-band than the $y$-band, but this imbalance should be countered by red sources (due to reddening and redshift there *should* be a lot more sources with observed red colors). Many red sources ($m_r - m_y > 0$) will only be detected in the $r$-band, however, due to the relative sensitivity in each filter. 


\* *Note* - If you have a convincing argument that there is more $r$-band flux than $y$-band flux in the Universe let me know.

### Upshot

Domain knowledge (of both astrophysics *and* the full telescope system) will be an essential ingredient for success once the Rubin Observatory begins LSST. LSST will push the boundaries for the 3 Vs (volume, variety, and velocity) of data science for astronomy. Success in this era will require substantial working knowledge of both "hacking" and "stats/mathematical analysis", but progress will be impeded without a corresponding expertise in how the data were acquired and why the Universe produced those data in the first place.

## Telescopes

<br><br><br><br><br>






Here's a true story...

In grad school a couple of friends and I designed and developed the Imaginary Telescope. 

And IT is awesome!

IT has a diameter of 1 AU and it detects *all* wavelengths of the EM spectrum with 100% efficiency. It is revolutionary in its design, and, as you might imagine, it will serve as a complete game changer for the field of astronomy.

(pssst – as you also might imagine it is completely imaginary)

Fundamentally, the thing we care about is measuring fluxes (and positions - though these two are related).

In principle, flux measurements are straightforward: count the number of photons per unit energy per unit time. 

If you want to be more sensitive to faint fluxes, increase the size of your telescope (just one reason why IT is so powerful...)
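The payoff from a bigger telescope can be sketched with idealized counting statistics: the number of collected photons scales with collecting area and exposure time, and for pure Poisson noise the signal-to-noise ratio is $\sqrt{N}$. The photon rate, mirror diameters, and function names below are hypothetical, and the sketch ignores sky background and detector noise.

```python
import math


def photon_counts(rate_per_m2_per_s, diameter_m, exposure_s):
    """Photons collected by an unobstructed circular aperture."""
    area_m2 = math.pi * (diameter_m / 2.0) ** 2
    return rate_per_m2_per_s * area_m2 * exposure_s


def poisson_snr(n_photons):
    """Signal-to-noise ratio when the only noise is Poisson: N / sqrt(N)."""
    return math.sqrt(n_photons)


# Hypothetical source delivering 1 photon per square meter per second.
snr_small = poisson_snr(photon_counts(1.0, diameter_m=4.0, exposure_s=30.0))
snr_large = poisson_snr(photon_counts(1.0, diameter_m=8.0, exposure_s=30.0))
print(f"SNR gain from doubling the diameter: {snr_large / snr_small:.2f}x")
```

In this idealized limit, doubling the diameter buys the same gain as quadrupling the exposure time.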

In practice, things are not this simple: 
  -  telescope's optical elements are not 100% efficient <br>
      (we *can* measure inefficiencies and correct them $\rightarrow$ complicates the uncertainties beyond Poisson)
  -  our detectors introduce noise to our measurements <br>
  -  detectors eventually stop counting photons <br>
      (saturation)
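The three effects listed above can be folded into a toy detector model. This is a sketch under stated assumptions, not a description of the LSST camera: the efficiency, read noise, and full-well values are invented, read noise is treated as Gaussian, and saturation is applied as a simple cap on the final counts.

```python
import numpy as np


def observe(true_photons, efficiency=0.8, read_noise=5.0,
            full_well=100_000, rng=None):
    """Toy detector: lossy optics, Poisson counting, read noise, saturation."""
    rng = np.random.default_rng() if rng is None else rng
    true_photons = np.asarray(true_photons, dtype=float)
    # Only a fraction of the incident photons is detected (Poisson counting).
    counts = rng.poisson(efficiency * true_photons).astype(float)
    # The detector adds its own noise to every measurement.
    counts += rng.normal(0.0, read_noise, size=counts.shape)
    # Beyond full well the detector stops counting.
    return np.minimum(counts, full_well)


rng = np.random.default_rng(11)
faint = observe(np.full(10_000, 1_000.0), rng=rng)
bright = observe(np.full(10, 1_000_000.0), rng=rng)

# Correcting for the known efficiency recovers the mean flux, but the
# scatter is larger than the sqrt(N) expected for ideal photon counting.
corrected = faint / 0.8
print(f"mean = {corrected.mean():.1f}, scatter = {corrected.std():.1f}, "
      f"ideal Poisson scatter = {np.sqrt(1_000.0):.1f}")
print(f"bright sources all read {bright.max():.0f} counts (saturated)")
```

The efficiency correction fixes the mean, but the uncertainty on the corrected flux is no longer the Poisson value, and saturated sources carry no flux information beyond a lower limit.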

In practice, things are not this simple (cont'd): 
  -  cannot measure absolute position of photons <br>
      (Heisenberg)
  -  further complicated by pixelated detectors  <br>
      (cannot measure continuous distribution)
  -  shutter opening and closing produces a variable exposure time across focal plane <br><br>
 

In summary, while our basic task, counting, is in principle quite simple, measuring the flux/position of an astronomical source is somewhat complicated. We control all the elements of the system, however, and a variety of different measurements can correct for these issues (though this results in more challenging uncertainty estimates).

There is an important element that we cannot control:

<img style="display: block; margin-left: auto; margin-right: auto" src="images/clouds.png" width="600">


The atmosphere really really complicates everything, making calibration a *nightmare*.

Turbulence distorts the signal, but clouds are the real pain. It's very difficult to measure the absolute attenuation of incident photons due to clouds. 

Briefly, we can calibrate the number of photons that we have counted to an actual measure of flux by 

(i) agreeing that there is a small handful of stars that are *not variable*, with *precisely known flux*. Then 

(ii) on nights that are "photometric" we observe these "standard stars" and the sources we care about, make some assumptions about atmospheric attenuation, and finally, 

(iii) we compare the relative counts in the detector for the standard stars and the sources we care about to determine the absolute flux for the sources we care about.
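Steps (i)-(iii) amount to differential photometry, which can be sketched as follows. The function names and all numbers are hypothetical. The optional extinction term assumes a simple linear dependence on airmass with coefficient `k`, one common way to encode the "assumptions about atmospheric attenuation", not necessarily the one used in practice for LSST.

```python
import math


def instrumental_mag(counts, exposure_s):
    """Magnitude from the raw count rate, with an arbitrary zero point."""
    return -2.5 * math.log10(counts / exposure_s)


def calibrated_mag(counts_src, counts_std, mag_std,
                   exposure_src_s=1.0, exposure_std_s=1.0,
                   k=0.0, airmass_src=1.0, airmass_std=1.0):
    """Tie a source to a standard star of known magnitude mag_std.

    k is an assumed extinction coefficient in mag per unit airmass.
    """
    delta_inst = (instrumental_mag(counts_src, exposure_src_s)
                  - instrumental_mag(counts_std, exposure_std_s))
    # A source seen through more atmosphere was dimmed more, so its
    # true magnitude is brighter than the raw comparison suggests.
    return mag_std + delta_inst - k * (airmass_src - airmass_std)


# Hypothetical example: a 15.0 mag standard yields 100x the counts of the source.
print(calibrated_mag(counts_src=1e3, counts_std=1e5, mag_std=15.0))
```

Everything here hinges on the standard star and the source being attenuated in a predictable way, which is exactly what clouds break.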

We are lucky that Dr. Colin Slater will be joining us later this week to address this question in greater detail. 

So far we have focused on the question – "How do we measure a flux?" (the answer: counting)

However, before we can measure the flux of some source, we have to identify it on the image.

How many stars are in this image?

<img style="display: block; margin-left: auto; margin-right: auto" src="images/star_field.png" width="500">


I bet you identified 22 stars in that image. 

You likely "eyeballed" an estimated background level in the image, and then searched for "peaks" above this background. 

Algorithmically that is precisely how we find sources in images.

(the correct answer is 23... 

the answer is always 23...

see the solutions notebook)
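The "estimate a background, then look for peaks above it" recipe can be sketched in a few lines. This is a toy under stated assumptions, not the LSST pipeline: the simulated star field, the Gaussian PSF, the Gaussian approximation to the noise, the 5-sigma threshold, and the function names are all invented for illustration, and `scipy.ndimage.maximum_filter` serves as a simple local-maximum finder.

```python
import numpy as np
from scipy import ndimage


def make_star_field(shape, positions, amplitude, psf_sigma, sky,
                    noise_sigma, rng):
    """Simulate identical Gaussian stars on a flat, noisy sky."""
    yy, xx = np.indices(shape)
    image = np.full(shape, float(sky))
    for y0, x0 in positions:
        r2 = (yy - y0) ** 2 + (xx - x0) ** 2
        image += amplitude * np.exp(-r2 / (2.0 * psf_sigma ** 2))
    return image + rng.normal(0.0, noise_sigma, size=shape)


def find_peaks(image, n_sigma=5.0, window=9):
    """Return (row, col) of local maxima that rise above the background."""
    background = np.median(image)  # robust when sources are sparse
    noise = 1.4826 * np.median(np.abs(image - background))  # MAD -> sigma
    threshold = background + n_sigma * noise
    is_local_max = image == ndimage.maximum_filter(image, size=window)
    return np.argwhere(is_local_max & (image > threshold))


rng = np.random.default_rng(23)
true_positions = [(20, 20), (20, 75), (50, 50), (80, 25), (80, 80)]
image = make_star_field((100, 100), true_positions, amplitude=500.0,
                        psf_sigma=2.0, sky=100.0, noise_sigma=10.0, rng=rng)
peaks = find_peaks(image)
print(f"found {len(peaks)} sources")
```

The median is a reasonable background estimate here only because the stars cover a tiny fraction of the pixels.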

How many stars (or more precisely sources) are in this image?

<img style="display: block; margin-left: auto; margin-right: auto" src="images/cosmos-PSF-matched.png" width="600">


Let's make it easier and zoom in:

<img style="display: block; margin-left: auto; margin-right: auto" src="images/cosmos-zoomed.png" width="500">


This image from the [Hyper Suprime-Cam](https://www.naoj.org/Projects/HSC/) is more or less at the same depth as the final stacked LSST images.

How do you define a "background" when there are literally galaxies ~everywhere?

Similarly, do all the photons we count in a single pixel "belong" to a single source?

(These are capital H hard questions. Luckily we have Dr. Yusra AlSayyad to answer them later in the session.)

Let's try an easier problem: measuring the flux of a galaxy. 

On the next slide I will show an image of a galaxy.
  -  make a quick mental map of which pixels should be included as we "count" to measure the flux.

<br>
<img style="display: block; margin-left: auto; margin-right: auto" src="images/galaxy1.png" width="700">


Now repeat this exercise for this galaxy:

<br>
<img style="display: block; margin-left: auto; margin-right: auto" src="images/galaxy2.png" width="700">


And, finally, one more time for this galaxy:

<br>
<img style="display: block; margin-left: auto; margin-right: auto" src="images/galaxy3.png" width="700">


Each of those images displayed the same galaxy, the only thing that changed was the image "stretch":

<img style="display: block; margin-left: auto; margin-right: auto" src="images/all_galaxies.png" width="900">

Despite seeing the same galaxy 3 times, I would bet your mental map of the pixels belonging to the galaxy increased with each image.

So: how do we define the "edge" of a galaxy?

(in order to count photons for our flux measurement we need to know which pixels "belong" to the galaxy)

(if your answer to the above question was along the lines of: use the point-spread function OR model the profile of the galaxy, then let me ask – HOW do you know the intrinsic profile of the galaxy in order to build your model?)
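To put numbers on how much the choice of "edge" matters, the sketch below assumes, purely for illustration, a circular exponential profile $I(r) = I_0 e^{-r/h}$. This is exactly the kind of assumed intrinsic profile the question above challenges. It defines the edge as the radius where the surface brightness falls below some fraction of $I_0$ and computes the fraction of the total flux inside that radius, $1 - (1 + R/h)\,e^{-R/h}$. The function names and thresholds are hypothetical.

```python
import math


def isophotal_radius(threshold_fraction, scale_length=1.0):
    """Radius where I(r) = I0 * exp(-r / h) drops to threshold_fraction * I0."""
    return scale_length * math.log(1.0 / threshold_fraction)


def enclosed_flux_fraction(radius, scale_length=1.0):
    """Fraction of the total flux of a circular exponential profile within radius."""
    x = radius / scale_length
    return 1.0 - (1.0 + x) * math.exp(-x)


for frac in (0.1, 0.01, 0.001):
    r_edge = isophotal_radius(frac)
    print(f"edge at {frac:g} x I0: r = {r_edge:.2f} h, "
          f"flux inside = {enclosed_flux_fraction(r_edge):.1%}")
```

Under this assumed profile, moving the edge from 10% to 0.1% of the central surface brightness raises the enclosed flux from roughly two thirds to more than 99% of the total, the same effect the changing image stretch had on your mental map.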

Professor Gary Bernstein will tell us a lot more about measuring the flux from galaxies later this week.

Without getting into all the gory details we have already seen that our simple problem: 

count photons $\longrightarrow$ measure flux, 

is actually quite complicated.




While some of these challenges can be well understood (sensitivity of the detector), others are uncertain ($N_\mathrm{clouds}$). 

**Break Out Problem 1**

Given all these complications, how can one actually make any (informed) inferences about the universe?

*Hint* - think back to the previous session.

**Solution to Break Out 1**

*Pause the lecture – take a few minutes to think about the answer*

<font size="-1"> But,  </font> <font size="+3"> but,  </font> <font size="+5"> but...  </font>

## Speed Matters

<br><br><br><br><br>







As previously noted, the velocity and volume of LSST observations are going to be enormous. There isn't enough computing power in the world to sample a posterior that accounts for every photon detected by LSST. 

**Break Out Problem 2**

How long would it take to perform basic processing of all of LSST on your laptop? 

The bare minimum for image processing includes bias (subtraction) and flat-field (division) corrections. Assume your laptop has a single 3 GHz processor that requires 1 tick to perform a single addition operation and 4 ticks to perform a single multiplication operation.

**Solution to Break Out 2**

*Pause the lecture – take a few minutes to think about the answer*

**A more realistic solution to Break Out 2**

Based on PTF, it takes $\sim$30 s to fully process (bias, flat-field, astrometry, photometry, image subtraction...) 1M pixels (much of this is tied up in I/O). Using the same numbers from the previous example, LSST will take $\sim$200 yr to process.
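The two cost models can be compared for a single exposure with a short back-of-the-envelope script; scaling to the full survey is then a matter of multiplying by the number of exposures you assume. Two things here are assumptions not stated in this lecture: the $\sim$3.2 gigapixel size of the LSST camera, and the choice to count bias subtraction as one addition (1 tick) and flat-field division as one multiplication (4 ticks). The function names are hypothetical.

```python
CLOCK_HZ = 3e9               # 3 GHz processor from the problem statement
TICKS_PER_PIXEL = 1 + 4      # assumed: 1 subtraction + 1 division (~multiplication)
PIXELS_PER_EXPOSURE = 3.2e9  # assumed: ~3.2 gigapixel camera


def naive_time_s(n_pixels, ticks_per_pixel=TICKS_PER_PIXEL, clock_hz=CLOCK_HZ):
    """Bias + flat-field only, with no I/O or memory overhead."""
    return n_pixels * ticks_per_pixel / clock_hz


def empirical_time_s(n_pixels, seconds_per_megapixel=30.0):
    """Full processing at the PTF-based rate of ~30 s per 1M pixels."""
    return n_pixels / 1e6 * seconds_per_megapixel


naive = naive_time_s(PIXELS_PER_EXPOSURE)
full = empirical_time_s(PIXELS_PER_EXPOSURE)
print(f"naive: {naive:.1f} s per exposure")
print(f"PTF-like: {full / 3600.0:.1f} hr per exposure")
print(f"ratio: {full / naive:.0f}x")
```

Under these assumptions the arithmetic alone takes a few seconds per exposure, while realistic processing takes more than a day, which is why counting clock ticks badly underestimates the real cost.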

Do not, I repeat, DO NOT, attempt to process all the images from the Rubin Observatory on your laptop. 

Some of you may be doing astronomy long enough that Moore's Law will save you and this will be possible on your laptop, but this definitely isn't happening in the 2020s.

## Conclusions

Most astronomers will only interact with the Rubin Observatory via the LSST database.

Domain knowledge will nevertheless be vitally important.

Lots and lots of complicated analysis happens between the glass and the database.