# A Short Tour of an Astronomical Inference

Goals:

* Understand what is meant by "data", "noise", and "models"
* Gain some appreciation for what astronomical data is like, and what astronomers are typically trying to do

## Data analysis

What is data? How should we think about data in science?
<br>
<br>
* Data are *constants* (usually numbers) 

* ... that we are *handed* (typically in a data file) 
       
* ... that *we hope to learn something from.*

## Scientific data analysis

* Propose observations
* Observe sky, collect and "reduce" data
* **Explore and summarize the data**
* **Hypothesize**, and **test**
* **Interpret,** conclude, speculate
* Report

> This course primarily concerns the parts of the investigation listed in **bold**.

## Learning from data

* Data analysis is _central_ to the scientific process: statistical inference is the mathematical formalization of _learning_.

* The formalism is important: hypothesizing, testing, and interpreting are all potentially _very messy._


Let's take a short tour through a simple example of an astronomical data analysis that will briefly introduce many of the key concepts involved in learning from data.

## An example image dataset

* In optical, X-ray and gamma-ray astronomy, the most basic datasets are *images*

* Images can be 2D, from cameras, or 1D, from spectrographs, or 3D, from IFUs (integral field units). 

* Image data come packaged as an *array* of numbers, which we can visualize, and do calculations with.
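
As a minimal sketch of what "an array of numbers" means in practice, here is a made-up 4x4 image of photon counts held in a NumPy array. The pixel values are invented for illustration and are not taken from any real observation.

```python
import numpy as np

# Hypothetical stand-in for an image: a small 2D array of photon counts
image = np.array([[0, 1, 0, 2],
                  [1, 5, 3, 0],
                  [0, 4, 9, 1],
                  [2, 0, 1, 0]])

# Basic calculations we can do with the array
total_counts = image.sum()                                   # all photons in the image
brightest = np.unravel_index(np.argmax(image), image.shape)  # (row, column) of the peak
profile_1d = image.sum(axis=0)                               # sum over rows: a 1D "image"

print(image.shape, total_counts, brightest, profile_1d)
```

A real image would typically be read from a data file rather than typed in, but once loaded it behaves like this array.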

Let's look at some X-ray image data from the XMM satellite, for the galaxy cluster A1835.

<img src="../graphics/tour_cluster_image.png" width=70%>

<table><tr width=90%>
<td><img src="../graphics/tour_cluster_image.png" height=300></td>
<td><img src="../graphics/tour_cluster_image_zoom.png" height=300></td>
</tr></table>

#### In pairs:
Identify some features of this image (note the zoomed-in view), and be prepared to speculate on what causes them. What can you say about the uncertainty associated with them?

## Sources of uncertainty

* Noise: "statistical" uncertainty, random error

* Astrophysical sources: "signal", from the target and otherwise

* Instrumental effects: variable sensitivity, point spread function blurring, vignetting, artifacts, etc.

* Calibration: units of pixel values

## Coping with uncertainty

* Coping with statistical uncertainty means acknowledging that things could have been different: if we took the observation again we would get different pixel values.

* This thought leads us to the notion of a _probability distribution for the data_

* Learning from data ("statistical inference") is about being able to make mock datasets that match, or "fit", the observed one, _to within the statistical uncertainties_ - as if they had all been drawn from the same probability distribution.


## Noise

<img src="../graphics/tour_cluster_image_zoom.png" height=200>

## Noise

* The photons arriving in our pixels seem to have been emitted, and to have arrived, "at random", giving rise to a "noisy" image. (You may have seen such images on TV.)

* Ultimately, the source of this randomness is quantum mechanics: atoms do not emit photons at regular intervals

* We expect the total number of photons $N_k$ arriving in pixel $k$ during the exposure time to be well described as a draw from a _probability distribution_ $P(N_k|\theta,H)$, where $\theta$ denotes the model parameters and $H$ our assumptions. We can hope to first guess the functional form of this distribution, and then refine it.
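
One possible first guess for this distribution is the Poisson distribution, which is commonly used for counting independent random events. The sketch below assumes Poisson noise and an invented expected count of 3 photons per pixel, and "repeats" the observation of the same patch of sky twice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumption: every pixel in an 8x8 patch expects 3 photons during the exposure
expected_counts = np.full((8, 8), 3.0)

# Two mock "observations" of the same patch: same expectation, different noise
obs_a = rng.poisson(expected_counts)
obs_b = rng.poisson(expected_counts)

print(obs_a)
print("pixels that differ between the two draws:", np.sum(obs_a != obs_b))
```

The two arrays share the same underlying expectation but differ pixel by pixel, which is what "things could have been different" looks like in a simulation.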

## Systematics
<img src="../graphics/tour_cluster_image.png" width=60%>

## Signals and Systematics

* Despite the noise, we can see a variety of "signals" in the image

* The feature we care about most is the cluster of galaxies in the center of the field

* Failure to account for the other features will introduce _systematic errors_ in our inferences

* Understanding these features means being able to "predict" them: that is, to _generate_ mock images that have the same types of feature

## Modeling data

* In order to generate mock data for comparison with our observations we need a _mathematical model_ 

* In practice, this model needs to be implemented in computer code.

* Writing this model involves making assumptions $H$ - about both the noise and the various signals in the data

* _These assumptions are unavoidable_
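
To make the role of $H$ concrete, here is a minimal sketch of a mock image generator. Every ingredient is an assumption chosen for illustration: a circularly symmetric "beta model" surface brightness profile for the cluster, a flat background, an exposure map with one masked region, and Poisson noise. The function name `expected_counts` and all parameter values are hypothetical.

```python
import numpy as np

def expected_counts(shape, x0, y0, s0, r_core, beta, background, exposure):
    """Expected photon counts per pixel: circular cluster plus flat background."""
    y, x = np.indices(shape)
    r = np.hypot(x - x0, y - y0)  # distance of each pixel from the cluster center
    # Assumed "beta model" surface brightness profile for the cluster
    cluster = s0 * (1.0 + (r / r_core) ** 2) ** (0.5 - 3.0 * beta)
    # The exposure map scales the expectation; masked pixels have exposure 0
    return (cluster + background) * exposure

rng = np.random.default_rng(0)
shape = (64, 64)
exposure = np.ones(shape)
exposure[10:14, 40:44] = 0.0  # hypothetical masked point source

mu = expected_counts(shape, x0=32, y0=32, s0=20.0, r_core=5.0, beta=0.67,
                     background=0.5, exposure=exposure)
mock = rng.poisson(mu)  # one mock image drawn from the assumed noise model
```

Changing any of these choices (the profile, the background, the noise model) changes the mock data, which is why the assumptions need to be stated and tested.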

## You cannot do inference without making assumptions

## An example analysis

To see how some of these ideas crop up in real data analysis, let's take a quick tour through the [following paper](https://arxiv.org/abs/1509.01322), from 2016:

<img src="../graphics/tour_title.png" width=80%>


<img src="../graphics/tour_abstract.png" width=80%>

## Program

* **Observe** 40 clusters, producing X-ray images and spectra, binned in annuli

* **Model** the variation in gas density and temperature with radius in each cluster, assuming spherical symmetry

* **Check** how well these simple models fit the image data

* **Summarize** each cluster with "measurements" of e.g. gas mass, total mass (assuming hydrostatic equilibrium), etc

* **Model** the population of clusters, using simple "scaling relations" between their measured total masses and gas masses etc. 
* **Check** how well this simple model fits the measurements

## Modeling the X-ray images

Predictions of X-ray images (left) need to include variable effective exposure time and point source masking (right).

<table>
<tr>
<td><img src="../graphics/tour_cluster_image.png"></td>
<td><img src="../graphics/tour_cluster_expmap_masked.png"></td>
</tr>
</table>



## Modeling the X-ray Images

* Assuming spherically symmetric clusters provides a significant shortcut

* After choosing a cluster center, the image pixels can be summarized in annuli, providing a high signal-to-noise spectrum in each one

* These spectra can be predicted given a cluster gas model
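
The summarizing step can be sketched as follows, assuming a chosen center and a set of annulus edges in pixel units. This simplified version only sums the counts in each annulus, so it ignores photon energies and is not a spectral extraction. The function name `radial_profile` is hypothetical.

```python
import numpy as np

def radial_profile(image, x0, y0, bin_edges):
    """Sum the pixel values, and count the pixels, in each annulus around (x0, y0)."""
    y, x = np.indices(image.shape)
    r = np.hypot(x - x0, y - y0).ravel()
    which = np.digitize(r, bin_edges) - 1       # annulus index for each pixel
    n_bins = len(bin_edges) - 1
    inside = (which >= 0) & (which < n_bins)    # drop pixels outside the edges
    counts = np.bincount(which[inside], weights=image.ravel()[inside],
                         minlength=n_bins)
    npix = np.bincount(which[inside], minlength=n_bins)
    return counts, npix

# Usage on a flat test image: the counts in each annulus equal its number of pixels
flat = np.ones((21, 21))
counts, npix = radial_profile(flat, 10, 10, np.array([0.0, 2.0, 5.0, 10.0]))
print(counts, npix)
```

Dividing `counts` by `npix` gives a mean surface brightness per annulus, with many pixels combined to beat down the noise.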


## Modeling the cluster gas

Spherically-symmetric, radially piecewise constant gas density and temperature: predict spectra in annuli and fit to the summarized data.

<img src="../graphics/tour_cluster_profiles.png" width=90%>

## Checking the cluster models

<img src="../graphics/tour_cluster_spec_residuals.png" width=50%>

* Does the _residual_ (difference) between predicted and observed data "look like" noise?
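
One simple way to ask this question quantitatively, assuming Poisson noise with reasonably large counts per bin, is to scale each residual by its expected standard deviation. The predicted counts below are invented, and the "observed" data are simulated from them, so this sketch shows what a good fit should look like.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model prediction: counts per spectral bin
predicted = np.linspace(200.0, 20.0, 50)
# Stand-in for data that really do follow the model
observed = rng.poisson(predicted)

# Standardized residuals: roughly zero mean and unit variance if the model fits
z = (observed - predicted) / np.sqrt(predicted)
chi2 = np.sum(z ** 2)

print("mean residual:", z.mean())
print("chi-squared:", chi2, "for", len(z), "bins")
```

If the model were wrong, we would expect structure in `z` (runs of positive or negative values) or a `chi2` much larger than the number of bins.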

## Summarizing/measuring each cluster

* The measured gas mass is the integral of the model gas density: $M_{\rm gas} = \int_0^{r_{500}} 4\pi r^2 \rho_{\rm gas} dr$ where $\rho_{\rm gas} = \rho_{\rm gas}(r ; \theta)$

* The measured total gravitating mass $M(r)$ can be calculated from the model gas density and temperature once hydrostatic equilibrium is assumed: $\frac{dP_{\rm gas}}{dr} = -\frac{G M(r) \rho_{\rm gas}}{r^2}$

* In practice, Mantz et al. assumed a model for $M(r)$ and used it to predict $\rho_{\rm gas}(r)$ given a piecewise constant model for $T(r)$

* Since $P_{\rm gas} \propto \rho_{\rm gas} T_{\rm gas}$, we expect our uncertainty in $M_{\rm gas}$ to be correlated with that in $M$
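
For a radially piecewise constant density, the gas mass integral reduces to a sum over spherical shells. The sketch below assumes arbitrary but consistent units, with the outermost shell edge playing the role of $r_{500}$. The function name `gas_mass` and the numbers are hypothetical.

```python
import numpy as np

def gas_mass(shell_edges, shell_density):
    """Integrate 4 pi r^2 rho(r) dr for a density that is constant within each shell."""
    edges = np.asarray(shell_edges, dtype=float)
    rho = np.asarray(shell_density, dtype=float)
    # Volume of each shell: (4/3) pi (r_outer^3 - r_inner^3)
    shell_volumes = 4.0 / 3.0 * np.pi * np.diff(edges ** 3)
    return np.sum(rho * shell_volumes)

# Usage: three shells with a density that falls with radius
m_gas = gas_mass([0.0, 0.5, 1.0, 2.0], [10.0, 3.0, 0.5])
print(m_gas)
```

Because the density values are model parameters $\theta$, any uncertainty in them propagates directly into the uncertainty on $M_{\rm gas}$.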

## Aside: "measurements" usually come with assumptions 

* "Measurement" is used to describe both collecting data and doing inference. Reduced or summarized data, and inferences, all come with assumptions.
<br>
<br>
#### Thought Experiment:
With a tape measure, you measure your height 100 times and combine the results. What assumptions does your final measurement depend on? Are you using a model? Discuss this with your neighbor for a few minutes and be prepared to share your thoughts with the class.

## Measurements of cluster mass

<img src="../graphics/tour_cluster_mgas-vs-m500.png" width=70%> 

## Modeling the population

<img src="../graphics/tour_cluster_population_slopes.png" width=90%>

Power-law scaling relation slopes and intercepts, and "intrinsic scatter", are all "hyper-parameters" that describe the cluster population, not individual clusters.
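
As a rough illustration of what these hyper-parameters mean, the sketch below generates a mock population of 40 clusters from an assumed power law with Gaussian intrinsic scatter in log space, and then recovers the hyper-parameters with a straight-line fit. All numerical values are invented. The fit also ignores measurement uncertainties and their correlations, so it is a deliberately simplified stand-in and not the analysis used in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed hyper-parameters of the population (in log10 space)
intercept, slope, scatter = -1.0, 1.0, 0.05
n_clusters = 40

# Mock population: log10 total mass, and log10 gas mass with intrinsic scatter
log_m = rng.uniform(14.5, 15.5, n_clusters)
log_mgas = (intercept + slope * (log_m - 15.0)
            + rng.normal(0.0, scatter, n_clusters))

# A power law is a straight line in log space, so fit a first-order polynomial
fit_slope, fit_intercept = np.polyfit(log_m - 15.0, log_mgas, 1)
residuals = log_mgas - (fit_intercept + fit_slope * (log_m - 15.0))
fit_scatter = np.std(residuals, ddof=2)  # two fitted parameters

print(fit_slope, fit_intercept, fit_scatter)
```

The recovered values describe the population as a whole: no single cluster has a "slope" or a "scatter".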

## Checking the population model

<img src="../graphics/tour_cluster_population_check_text.png" width=60%>

* Do all the individual objects look like they were drawn from the assumed relation?
* Are there sub-populations that behave differently, pointing towards a more complex model?

## Take-home messages

* Data are constants, which we need to interpret

* We need a model in order to be able to learn from data

* Matching models to observations allows us to cope with uncertainty 

* You cannot do inference, or make measurements, without making assumptions

* Assumptions can, and should, be tested, with the data

* The result of one inference can be (summarized and) used as the data for a subsequent one