# Application in Gamma-ray Astronomy: 

## Searching for Dark Matter Signals in Fermi-LAT data 

### Lecture outline

1. [Scientific Context](#Scientific-Context:-Indirect-Detection-Dark-Matter-Searches): Indirect (i.e., astronomical) dark matter searches with Fermi-LAT data
 1. Fermi Large Area Telescope and gamma-ray sky
 1. Dark matter signals from dwarf spheroidal (dSph) galaxies
 1. Frequentist (as opposed to Bayesian) methodology

1. <a href="FermiAnalysisChain.ipynb">Analysis Context: Fermi-LAT data analysis chain</a>
 1. Event data and binned data 
 1. Instrument response, livetime and exposure
 1. Flux model templates
 1. Likelihood fitting

1. <a href="LikelihoodFittingExample.ipynb">Example 1:  Fitting the flux normalization of a target source (the dwarf galaxy Draco)</a>

1. <a href="SEDFittingExample.ipynb">Example 2:  Extracting upper limits on the thermally averaged dark matter annihilation cross section</a>


### Scientific Context: Indirect Detection Dark Matter Searches

One sentence summary: if dark matter is a massive particle with weak-force-scale interactions (as in many supersymmetry scenarios), then signals may be detectable in astronomical gamma-ray data.


The most important equation gives the expected gamma-ray flux from annihilations in a dark matter halo:

<img src="figures/FluxEquation.png" alt="Flux Equation" width="500px">

The left hand side of the equation is the observed flux as a function of energy and direction in the sky.

$ \langle \sigma_{\chi} v \rangle $ is the "thermally averaged cross section", i.e., the cross section times the relative velocity, averaged over the velocity distribution. It gives the interaction rate per unit density and has units of ${\rm cm}^{3}\,{\rm s}^{-1}$.

The summation is over annihilation final states (e.g., $b \bar{b}$ or $\tau^+\tau^-$, labeled by $f$), $dN_{f}/dE$ is the average spectrum of gamma rays for the final state $f$, and $B_{f}$ is the branching ratio to that final state.

Together the first two terms on the right hand side of the equation contain all of our knowledge of the particle physics of the dark matter annihilation.   In particular, we don't know the cross section, the velocity distribution, or the branching fractions.   For a given final state we can use numerical codes to estimate the resulting gamma-ray spectrum $dN_{f}/dE$.

Here are some examples of the gamma-ray spectra for different final states:

<img src="figures/DM_Channels.png" alt="Dark Matter Spectra" width="400px">

The final term in the flux equation contains all of the astrophysical information and is usually called the "J factor". (Strictly, the common definition of the J factor doesn't include the factors of $1/4\pi$ and $1/m_{\chi}^2$; it is just the integral along the line of sight of the square of the dark matter density.)

Note that the line-of-sight integration intrinsically handles the distance-squared projection effects, i.e., when you compare J factors you don't have to account for distance; that has already been done.
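
As a rough illustration of the line-of-sight integral, the sketch below numerically integrates the squared density along a sight line through a Milky Way halo. The NFW profile shape, the scale radius (20 kpc), the local density (0.4 GeV cm$^{-3}$ at 8.5 kpc), and all function names are assumptions chosen for illustration; they are not taken from the figures in this lecture.

```python
import math

KPC_IN_CM = 3.0857e21  # centimetres per kiloparsec

# Assumed, illustrative halo parameters (not taken from this lecture's figures)
R_SUN_KPC = 8.5        # distance from the Sun to the Galactic center
R_S_KPC = 20.0         # NFW scale radius
RHO_LOCAL = 0.4        # dark matter density at the Sun, GeV cm^-3


def nfw_density(r_kpc):
    """NFW density in GeV cm^-3, normalized to RHO_LOCAL at R_SUN_KPC."""
    def shape(r):
        x = r / R_S_KPC
        return 1.0 / (x * (1.0 + x) ** 2)
    return RHO_LOCAL * shape(r_kpc) / shape(R_SUN_KPC)


def j_factor_per_sr(psi_deg, l_max_kpc=100.0, n_steps=20000):
    """Integrate rho^2 along the sight line at angle psi from the Galactic center.

    Returns dJ/dOmega in GeV^2 cm^-5 sr^-1, using a simple midpoint rule.
    """
    psi = math.radians(psi_deg)
    dl = l_max_kpc / n_steps
    total = 0.0
    for i in range(n_steps):
        l = (i + 0.5) * dl
        # Galactocentric radius of the point at distance l along the sight line
        r = math.sqrt(R_SUN_KPC ** 2 + l ** 2 - 2.0 * R_SUN_KPC * l * math.cos(psi))
        total += nfw_density(r) ** 2 * dl
    return total * KPC_IN_CM  # convert the kpc path length to cm


for psi_deg in (1.0, 10.0, 45.0, 90.0):
    print(f"psi = {psi_deg:5.1f} deg   dJ/dOmega = {j_factor_per_sr(psi_deg):.3e} GeV^2 cm^-5 sr^-1")
```

Because the distance to each point along the sight line is already built into the integral, two targets can be compared directly through their J factors without any further distance correction.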

Here are some examples of the radial density distribution of dark matter for different models of the Milky Way:

<img src="figures/gc_profiles_rho.png" alt="Dark Matter Density Profiles" width="400px">

and here are the resulting J factors integrated out from the Galactic center:

<img src="figures/gc_profiles_J_int.png" alt="Dark Matter J Factors" width="400px">

The J factors given here for the center of the Milky Way are much higher than for other targets.   Typical J factors for a dwarf galaxy such as Draco, which we will consider later, are more like $10^{19}$ GeV$^2$ cm$^{-5}$. 
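
To get a feel for the numbers, the sketch below plugs a Draco-like J factor into the flux formula, integrated over energy. It assumes self-conjugate (Majorana) dark matter, for which the prefactor is commonly written as $1/(8\pi m_{\chi}^2)$; the mass, the cross section, and the photon yield per annihilation are illustrative placeholders rather than results from this lecture.

```python
import math


def integrated_flux(sigma_v, mass_gev, n_gamma, j_factor):
    """Photon flux in cm^-2 s^-1, assuming self-conjugate dark matter.

    sigma_v   : <sigma v> in cm^3 s^-1
    mass_gev  : dark matter particle mass in GeV
    n_gamma   : photons per annihilation in the energy range of interest
    j_factor  : J factor in GeV^2 cm^-5
    """
    return sigma_v / (8.0 * math.pi * mass_gev ** 2) * n_gamma * j_factor


# Illustrative inputs (assumptions, apart from the Draco-like J factor of ~10^19)
sigma_v = 3e-26      # cm^3 s^-1, a commonly quoted thermal-relic benchmark
mass_gev = 100.0     # GeV
n_gamma = 10.0       # placeholder photon yield per annihilation
j_draco = 1e19       # GeV^2 cm^-5

flux = integrated_flux(sigma_v, mass_gev, n_gamma, j_draco)
print(f"Expected flux: {flux:.2e} photons cm^-2 s^-1")
```

With these placeholder inputs the flux comes out near $10^{-11}$ photons cm$^{-2}$ s$^{-1}$. The result scales linearly with the J factor and the cross section, and inversely with the square of the mass.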

### The Fermi Large Area Telescope and the Gamma-ray Sky

Here is a schematic view of the Fermi Large Area Telescope.   Basically it is a particle physics detector in space.

<img src="figures/LAT.png" alt="LAT Schematic" width="500px">


### Dark matter signals from dwarf spheroidal (dSph) galaxies

The gamma-ray sky is dominated by three types of emission:

1. Galactic diffuse emission.  These are gamma rays produced by the interactions between high-energy cosmic rays and dust, gas and radiation fields.  
1. Point sources.  The most recent LAT source catalog (3FGL) identifies over 3000 gamma-ray point sources. The largest class of extragalactic sources is Active Galactic Nuclei (AGN), and the largest class of Galactic sources is pulsars.
1. Isotropic emission.  This is largely (~85%) attributable to unresolved emission from AGN.  

Here is an image of the gamma-ray sky corresponding to 6 years of data, shown in Galactic coordinates in a Hammer-Aitoff projection:

<img src="figures/LAT_Sky.png" alt="LAT Sky" width="400px">

Dark matter signals might be a small contribution to the gamma-ray sky.  We can significantly improve our sensitivity to such signals if we know where to look.  

Dwarf spheroidal galaxies are known to be very dark-matter dominated.

<img src="figures/dSphs.png" alt="Dwarf Spheroidal Galaxies" width="400px">

So, we can set up a search to look for *excess* gamma-ray emission in the direction of the known dwarf galaxies.   Today we are going to do that for one particular dwarf, Draco.  (Not Segue I, which is shown in this slide: )

<img src="figures/LAT_dSphs.png" alt="LAT Dark Matter Search in Dwarf Galaxies" width="400px">

Draco is located at (l,b) = (86.4°, 34.7°), is ~76 kpc from us, has a J factor of $\log_{10} (J/{\rm GeV}^{2}\,{\rm cm}^{-5}) = 18.8$, and has an angular extent of less than half a degree (it would appear almost as a point source to the Fermi-LAT).

### Frequentist (as opposed to Bayesian) methodology

In gamma-ray astronomy we tend to use frequentist formulations.   There is a very simple reason for this: we don't really know a lot about the gamma-ray sky, so we don't have very much prior knowledge of what to expect in our measurements.   

While it is true that Bayesian methods can be used with un-informative priors, the fact is that un-informative priors are often (or in part) designed to give the same answers as frequentist methods.   These can be useful if you really want to put things in a Bayesian context, (i.e., if you need a posterior distribution to sample).  However, by themselves they don't really tell you about the science, so in Fermi we generally don't bother with them.  (We get the same results, and reduce our carbon footprint to boot.)

In the frequentist formulation the main lens through which we interpret results is *Wilks' and Chernoff's theorems.*   They are both statements about the distribution of the "Test Statistic" in the case that the null hypothesis is true.


Frequentist test statistics depend on the data, $N$, and our estimates of the model parameters, $\hat{\alpha}$. The distribution of the test statistic $TS$ is a transformation of the sampling distribution: ${\rm Pr}(TS(N,\hat{\alpha})|H_0)$. In frequentism there is no "probability distribution for the model parameters," only *estimates of the model parameters.*

In our case the test statistic is the likelihood ratio:

<img src="figures/TS_Definition.png" alt="Test Statistic Definition" width="200px">

*Wilks’ theorem:* as the sample size approaches infinity, the test statistic for a nested model will be asymptotically $\chi^2$-distributed, with degrees of freedom equal to the difference in dimensionality between the test hypothesis and the null hypothesis.

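*Chernoff’s theorem:* as above, but if the null hypothesis lies on a boundary of the parameter space, then 1/2 of the trials will have TS=0 (for a single bounded parameter, the other half follow the $\chi^2$ distribution with one degree of freedom).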

*Caveat:* there are some subtleties in what is considered the null hypothesis in standard usage (more later)

### Poisson Likelihood

We will be performing binned likelihood fitting on counts data.  The function that we will minimize is the negative log of the Poisson likelihood to observe a particular number of counts $n_i$ in each pixel / bin, given that our model predicts $m_i$ counts.

<img src="figures/Likelihood_Poisson.png" alt="Poisson Likelihood" width="200px">

The negative log likelihood is:

<img src="figures/LogLikelihood_Poisson.png" alt="Poisson Negative Log-Likelihood" width="400px">

(We can neglect the last term because it does not depend on the model.)
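
As a minimal sketch of this quantity, the function below sums $m_i - n_i \log m_i$ over bins and optionally adds the model-independent $\log n_i!$ term. The counts and model values are made up for illustration, and the function name is an assumption, not part of any Fermi-LAT tool.

```python
import math


def poisson_nll(counts, model, include_constant=False):
    """Negative log of the binned Poisson likelihood, summed over bins.

    counts : observed counts n_i
    model  : predicted counts m_i (must be positive)
    """
    nll = 0.0
    for n_i, m_i in zip(counts, model):
        nll += m_i - n_i * math.log(m_i)
        if include_constant:
            nll += math.lgamma(n_i + 1.0)   # log(n_i!), independent of the model
    return nll


# Made-up counts and two candidate models, purely for illustration
counts = [12, 7, 3, 1]
model_a = [10.0, 8.0, 4.0, 1.5]
model_b = [11.0, 8.8, 4.4, 1.65]   # model_a scaled up by 10%

delta_without = poisson_nll(counts, model_b) - poisson_nll(counts, model_a)
delta_with = poisson_nll(counts, model_b, True) - poisson_nll(counts, model_a, True)
print(f"Delta NLL without constant: {delta_without:.6f}")
print(f"Delta NLL with constant:    {delta_with:.6f}")
```

The two differences agree, which is why the constant term can be dropped: only differences in the negative log likelihood between models enter the fit and the test statistic.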


### Procedure:

* Fit the data with the model, by finding the parameters $\hat{\alpha}$ that maximize the likelihood (and minimize the negative log likelihood).


* Compute the test statistic $TS$ for the data and these parameter estimates.


* Inspect the $\chi^2$ distribution, our approximation for ${\rm Pr}(TS|H_0)$, and compute the $p$-value, the probability of getting a test statistic larger than that observed (in a long sequence of repeated trials). 
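
The three steps above can be sketched end to end on a toy problem. This is a minimal illustration, assuming a model of the form background + norm × signal template with norm ≥ 0, a golden-section search in place of a real minimizer, and the asymptotic $\chi^2$ approximation with one degree of freedom. The templates, counts, and function names are made up for this example.

```python
import math


def nll(norm, counts, background, signal):
    """Poisson negative log-likelihood (model-independent constant dropped)."""
    total = 0.0
    for n_i, b_i, s_i in zip(counts, background, signal):
        m_i = b_i + norm * s_i
        total += m_i - n_i * math.log(m_i)
    return total


def fit_norm(counts, background, signal, upper=100.0, tol=1e-8):
    """Golden-section search for the norm in [0, upper] that minimizes the NLL.

    This works because the NLL is convex in norm.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if nll(a, counts, background, signal) < nll(b, counts, background, signal):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


def p_value(ts, boundary=True):
    """Asymptotic p-value for one extra parameter.

    The chi^2 (1 dof) survival function is erfc(sqrt(ts / 2)). If the null
    hypothesis sits on a parameter boundary (Chernoff), it is halved for ts > 0.
    """
    if ts <= 0.0:
        return 1.0
    sf = math.erfc(math.sqrt(ts / 2.0))
    return 0.5 * sf if boundary else sf


# Made-up templates and counts, purely for illustration
background = [50.0, 30.0, 20.0, 10.0]
signal = [1.0, 3.0, 4.0, 2.0]       # expected signal counts per unit norm
counts = [52, 38, 31, 13]

# Step 1: fit; step 2: compute TS; step 3: convert TS to a p-value
norm_hat = fit_norm(counts, background, signal)
ts = max(0.0, 2.0 * (nll(0.0, counts, background, signal)
                     - nll(norm_hat, counts, background, signal)))
print(f"best-fit norm = {norm_hat:.3f}, TS = {ts:.2f}, p-value = {p_value(ts):.3g}")
```

Passing `boundary=False` to `p_value` gives the plain Wilks' theorem result, which is appropriate when the fitted parameter is allowed to take either sign.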

** *[Back to the outline](#Lecture-outline)* **