# An overview of `astropy.uncertainty`:

A relatively recent addition to `astropy` is the `astropy.uncertainty` sub-package.  Its primary purpose is to represent the *uncertainties* of Quantities in a way that allows relatively straightforward application of error-propagation rules.

Some important caveats: `astropy.uncertainty` is *not* intended to be a fully-featured replacement for thorough statistical analysis. For that, you will want to combine statistically oriented fitting tools with `astropy` pieces for just the astronomy-specific parts. For example, you might use the `emcee` MCMC sampler with astronomy-specific `astropy.modeling` models implementing the likelihood function.  `astropy.uncertainty` is instead meant to provide a vehicle for storing uncertainties and following the basic error propagation rules when your science case does not require full statistical modeling.

Moreover, it is a newer sub-package.  While we do not anticipate major changes, it is possible some of the interface will evolve in future versions of astropy.

### *Note: This notebook is a copy of the tutorial notebook with some redundant cells omitted and with exercise solutions filled in*

### Preliminary imports

We start by importing some general packages we will need below:

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np

from astropy.visualization import quantity_support
quantity_support()

from IPython import display

In [None]:
from astropy import units as u

from astropy import uncertainty

# Recording uncertainties of data

The first use case for `uncertainty` is simply storing uncertainties *with* a quantity of interest. Let's start with a concrete and fairly typical use case: apparent magnitude measurements from the literature. As an example, consider the two galaxies highlighted in this paper: https://doi.org/10.1088/2041-8205/798/1/L21, which have apparent $r$-band magnitudes with standard symmetric error bars. The typical assumption is that these values are to be thought of as having Gaussian/Normal uncertainties with the uncertainty being the $\sigma$ of the Gaussian. Let's start by trying to represent just one of these (Pisces A) using `uncertainty`:

In [None]:
piscA_mr = uncertainty.normal(17.35*u.mag, std=0.05*u.mag, n_samples=10000)
piscA_mr

In [None]:
type(piscA_mr)

It is immediately apparent that something has happened here beyond just recording the value and its $\sigma$.  `astropy.uncertainty` uses a Monte Carlo representation of the quantities it stores, so when we called `uncertainty.normal`, we drew random samples from a normal distribution with the given parameters.  This is why `n_samples` is required: only you, the user, know how careful you want to be in modeling the uncertainties.  The standard choice of 10000 is reasonable, as the fractional error in your uncertainty-related parameters generally goes like $1/\sqrt{N_{\rm samples}}$, so with 10000 samples you can trust your estimates *of the uncertainties* to about 1%.
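
As a rough check of the sample-size scaling, the following NumPy-only sketch re-draws the Pisces A samples many times and measures how much the sample mean scatters between draws, in units of $\sigma$. It does not use `astropy.uncertainty`, and the seed, the number of repeats, and the variable names are arbitrary choices made here:

```python
import numpy as np

rng = np.random.default_rng(42)      # seeded so the sketch is repeatable

true_mean, true_std = 17.35, 0.05    # the Pisces A values used above
n_repeats = 200                      # independent re-draws per sample size

scatter = {}
for n_samples in (100, 10_000):
    draws = rng.normal(true_mean, true_std, size=(n_repeats, n_samples))
    # scatter of the sample mean between re-draws, in units of sigma
    scatter[n_samples] = draws.mean(axis=1).std() / true_std
    print(f"N={n_samples:>6}: scatter of mean = {scatter[n_samples]:.4f} sigma, "
          f"1/sqrt(N) = {1 / np.sqrt(n_samples):.4f}")
```

The measured scatter should track $1/\sqrt{N_{\rm samples}}$ closely, so going from 100 to 10000 samples buys about a factor of 10 in precision.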

To illustrate what has happened, let's use some of the convenience methods `QuantityDistribution` provides:

In [None]:
piscA_mr.pdf_mean()

In [None]:
piscA_mr.pdf_std()

In [None]:
piscA_mr.pdf_median()

The above compute the mean, standard deviation, and median of the *samples* from the distribution. These reproduce the input we gave, but only to within about 1% of $\sigma$, as expected from the number of samples.  We can also access the samples directly, as needed to produce, for example, a plot of the distribution:

In [None]:
plt.hist(piscA_mr.distribution, bins='auto', density=True, histtype='step')

plt.axvline(piscA_mr.pdf_mean(), color='k')
plt.axvline(piscA_mr.pdf_mean() + piscA_mr.pdf_std(), color='k', ls=':')
plt.axvline(piscA_mr.pdf_mean() - piscA_mr.pdf_std(), color='k', ls=':');

Now we can represent both galaxies from this paper the same way,  letting us do quick convenient operations over them:

In [None]:
piscB_mr = uncertainty.normal(17.18*u.mag, std=0.07*u.mag, n_samples=10000)

for distr in [piscA_mr, piscB_mr]:
    plt.hist(distr.distribution, bins='auto', density=True, histtype='step')
    
    plt.axvline(distr.pdf_mean(), color='k')
    plt.axvline(distr.pdf_mean() + distr.pdf_std(), color='k', ls=':')
    plt.axvline(distr.pdf_mean() - distr.pdf_std(), color='k', ls=':')


This on its own is enough to ask simple statistical questions using just the samples.  For example: what is the probability that Pisces A is brighter than Pisces B?

In [None]:
gridA, gridB = np.meshgrid(piscA_mr.distribution, piscB_mr.distribution)
np.sum(gridA < gridB) / gridA.size

Depending on your computer's speed, you may have seen a noticeable amount of time passing for that computation. This is because the `meshgrid` function created an array element for every possible pair in the two distributions, which is $10000^2 = 10^8$, meaning it had to create two 100-million element arrays.  At 8 bytes per element that is roughly 800 MB per array, which is likely a significant fraction of your computer's memory. This is one of the "gotchas" to be aware of with the Monte Carlo method used in `uncertainty`: when doing operations that involve multiple distributions you often need to compare all combinations, which quickly becomes expensive and can overwhelm your computer.

Just in case, we delete the large variables we created above, so that they do not become a problem later if your computer has limited memory:

In [None]:
del gridA, gridB

A reasonable compromise to get around this is to just use the fact that the two distributions are independent and compare them element-wise:

In [None]:
np.sum(piscA_mr.distribution < piscB_mr.distribution) / piscA_mr.n_samples

This gives an answer similar to the more complete version, although it differs at the percent level, as expected for $n_{\rm samples} = 10^4$.
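
If the all-pairs answer is needed without the memory cost, one possible approach is to sort one set of samples and count with `np.searchsorted`. This is only a sketch and not something `astropy.uncertainty` provides. The helper `prob_less_all_pairs` and the stand-in arrays `a` and `b` are hypothetical names introduced here; with the real objects you would pass the `.distribution.value` arrays instead:

```python
import numpy as np

def prob_less_all_pairs(a, b):
    """Fraction of all (a_i, b_j) pairs with a_i < b_j, without building a grid."""
    b_sorted = np.sort(b)
    # for each a_i, the number of b_j strictly greater than a_i
    n_greater = b.size - np.searchsorted(b_sorted, a, side="right")
    return n_greater.sum() / (a.size * b.size)

rng = np.random.default_rng(0)
a = rng.normal(17.35, 0.05, size=10_000)  # stand-in for the Pisces A samples
b = rng.normal(17.18, 0.07, size=10_000)  # stand-in for the Pisces B samples

p_all_pairs = prob_less_all_pairs(a, b)
p_elementwise = np.mean(a < b)
print(p_all_pairs, p_elementwise)
```

The sort costs $O(N \log N)$ time and only $O(N)$ memory, so it stays cheap where the `meshgrid` version does not.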

### Exercises

What is the probability that Pisces A and Pisces B are within 0.1 mag of each other?

In [None]:
gridA, gridB = np.meshgrid(piscA_mr.distribution, piscB_mr.distribution)
np.sum(np.abs(gridA - gridB) < 0.1*u.mag) / gridA.size

In [None]:
# or at lower precision but a lot faster/less memory-intensive
np.sum(np.abs(piscA_mr.distribution - piscB_mr.distribution) < 0.1*u.mag) / piscA_mr.n_samples

Think of some quantity with error bars you find interesting and try to represent it using `astropy.uncertainty`.  It can be from a paper you recently read/wrote, an important astrophysical quantity, or really anything that piques your interest.  Or if you hate open-ended questions when trying to learn... ignore this (and the related follow-up questions)!

In [None]:
# The answer depends on the reader so there is no "correct" solution provided here.

## Array distributions

Thus far, you could have done the same operations by drawing from Gaussians on your own. But this module can also do more complex operations that are much more awkward to do on your own. One example is wrapping up multiple values as an array in which each element is a distribution:

In [None]:
galaxies_mr = uncertainty.normal([17.35, 17.18]*u.mag, std=[.05, .07]*u.mag, n_samples=10000)
galaxies_mr

In [None]:
galaxies_mr.shape

Despite holding a large number of samples, this looks like a single quantity of length 2.  We can still get at the samples though, using the `.distribution` property:

In [None]:
galaxies_mr.distribution.shape

In [None]:
for dist in galaxies_mr.distribution:
    plt.hist(dist, bins='auto', density=True, histtype='step')

With a bit of string-processing and Jupyter notebook tricks, we can also use this to produce some nicer-looking quantities:

In [None]:
for mr in galaxies_mr:
    mean = mr.pdf_mean()
    std = mr.pdf_std()
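    # raw string so that \pm reaches LaTeX unchanged; .format fills in the values
    lstr = r'${mean:.2f} \pm {std:.2f}$'.format(mean=mean, std=std)
    
    # or equivalently, a one-liner using Python f-strings:
    lstr = rf'${mr.pdf_mean():.2f} \pm {mr.pdf_std():.2f}$'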

    display.display(display.Latex(lstr))

## Using Distributions as Quantities

But the real power of `Distribution`s is their ability to be treated just like ordinary quantities. (To refresh yourself on `Quantities` you can have a look at the [units and quantities notebook](../03-UnitsQuantities/Astropy_Units.ipynb).)

While the array representation above provides some conveniences, more utility comes from treating these as quantities the way you would any other quantity.  For example, suppose we wanted to convert these magnitudes to fluxes following the standard Pogson formulation of magnitudes:

$m = -2.5 \log_{10}(f)$

(Note there are some more convenient ways to handle this conversion in `astropy`, see the [docs section on this in astropy.units](https://docs.astropy.org/en/stable/units/logarithmic_units.html), but here we do it by hand to illustrate how to use `uncertainty` in a more general way.)

In [None]:
galaxies_rflux = 10**(galaxies_mr/(-2.5*u.mag)) * u.ABflux
galaxies_rflux

In [None]:
galaxies_rflux.pdf_mean()

In [None]:
galaxies_rflux.pdf_std()

In [None]:
for dist in galaxies_rflux.distribution:
    plt.hist(dist, bins='auto', density=True, histtype='step')

Close inspection of this distribution shows that it is no longer quite Gaussian, as there is an extended tail to higher fluxes.  This is more apparent if we artificially inflate the magnitude uncertainty by a factor of 10:

In [None]:
galaxies_mr_inflated_uncertainty = uncertainty.normal([17.35, 17.18]*u.mag, std=[.5, .7]*u.mag, n_samples=10000)
galaxies_rflux_inflated_uncertainty = 10**(galaxies_mr_inflated_uncertainty/(-2.5*u.mag)) * u.ABflux
for dist in galaxies_rflux_inflated_uncertainty.distribution:
    plt.hist(dist, bins='auto', density=True, histtype='step',)

And similarly, the error bars now clearly need to be asymmetric, as demonstrated by comparing the standard deviation to the 16% / 84% tails of the distribution:

In [None]:
galaxies_rflux_inflated_uncertainty.pdf_std()

In [None]:
galaxies_rflux_inflated_uncertainty.pdf_percentiles(16) - galaxies_rflux_inflated_uncertainty.pdf_median()

In [None]:
galaxies_rflux_inflated_uncertainty.pdf_percentiles(84) - galaxies_rflux_inflated_uncertainty.pdf_median()

In [None]:
galaxies_rflux_inflated_uncertainty

In [None]:
for f in galaxies_rflux_inflated_uncertainty:
    lower, mid, upper = f.pdf_percentiles([16, 50, 84]).value/1e-7
    # doubled braces around -7 so LaTeX receives 10^{-7} rather than 10^-7
    lstr = f'${mid:.2} ^ {{ +{upper-mid:.2} }} _ {{ {lower-mid:.2} }} \\times 10^{{-7}}$'

    display.display(display.Latex(lstr))
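
The asymmetry can also be estimated by hand. Because the magnitude-to-flux conversion is monotonic, percentiles of the magnitude distribution map onto percentiles of the flux distribution, so the roughly $\pm 1\sigma$ points in magnitude give the flux error bars directly. The sketch below uses only the standard library, and `flux_error_bars` is a hypothetical helper name introduced here, not an `astropy` function:

```python
import math

def flux_error_bars(sigma_mag):
    """Fractional lower/upper flux offsets from the median for a Gaussian magnitude error."""
    upper = 10 ** (0.4 * sigma_mag) - 1    # magnitude lower by sigma -> brighter
    lower = 10 ** (-0.4 * sigma_mag) - 1   # magnitude higher by sigma -> fainter
    return lower, upper

for sigma_mag in (0.05, 0.5):
    lower, upper = flux_error_bars(sigma_mag)
    print(f"sigma_m = {sigma_mag}: {lower:+.3f} / {upper:+.3f} of the median flux")
```

For the small uncertainty the two offsets are nearly equal in size, while for the inflated one the upper error bar is clearly larger than the lower one, in line with the percentile comparison above.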

### Exercises

Using distributions, verify the standard error propagation rules: that the uncertainty of a sum or difference of Gaussian variates is the quadrature sum of the individual uncertainties, and that the fractional uncertainty of a product is the quadrature sum of the fractional uncertainties.

In [None]:
a = uncertainty.normal(5, std=.22, n_samples=10000)
b = uncertainty.normal(7, std=.13, n_samples=10000)

In [None]:
s = a + b

s.pdf_std(), (.22**2 + .13**2)**0.5

In [None]:
s = a * b

s.pdf_std(), 5*7*((.22/5)**2 + (.13/7)**2)**0.5
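
For reference, the two rules checked in the cells above can be written as follows, assuming independent variables and, for the product, uncertainties that are small compared to the values:

$\sigma_{a \pm b}^2 = \sigma_a^2 + \sigma_b^2$

$\left(\frac{\sigma_{ab}}{ab}\right)^2 \approx \left(\frac{\sigma_a}{a}\right)^2 + \left(\frac{\sigma_b}{b}\right)^2$

With $a = 5 \pm 0.22$ and $b = 7 \pm 0.13$ these give $\sigma_{a+b} \approx 0.256$ and $\sigma_{ab} \approx 1.67$, which the sampled values should match to within about a percent.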

Starting from the quantity of interest to you that you identified in the previous exercises, compute some derived quantity of interest to you, and plot up its distribution.

In [None]:
# The answer depends on the reader so there is no "correct" solution provided here.

## More complex manipulations with other Astropy functionality

While there is plenty to be done with quantities, `uncertainty` is also useful for more complex `astropy` objects.  We will illustrate this by using the `astropy.coordinates.SkyCoord` object.  This section assumes at least some familiarity with `coordinates`, so if you are confused by some of the coordinates-related operations, you may want to look at the [coordinates notebook](../04-Coordinates/astropy_coordinates.ipynb).

We need to import functionality from the other parts of astropy we will use:

In [None]:
from astropy.coordinates import SkyCoord, EarthLocation
from astropy.time import Time

Now let's assume you have ground-based observations of a star.  You have measured the centroid of the star in equatorial coordinates (ICRS) and believe you can trust the pointing of your telescope quite well. However, the weather conditions were not stellar (get it?), and the seeing was significantly greater than one arcsecond. You estimate that your centroid's uncertainty is about an arcsecond. We can encode that by creating a relevant `SkyCoord`, but providing the star's coordinates as distributions instead of plain `Quantity` objects:

In [None]:
ra = uncertainty.normal(137*u.deg, std=1*u.arcsec, n_samples=10000)
dec = uncertainty.normal(-75*u.deg, std=1*u.arcsec, n_samples=10000)
star_icrs = SkyCoord(ra=ra, dec=dec, frame='icrs')
ra, dec, star_icrs

Note `uncertainty` was perfectly happy to accept different units for the value and its `std`, and took care of the conversion for you.

We can visualize this uncertainty on-sky by just plotting the distribution and letting the density of points indicate to us the probability distribution:

In [None]:
plt.subplot(aspect='equal')
plt.scatter(ra.distribution, dec.distribution, s=1, alpha=.25)

plt.xlabel('RA [deg]')
plt.ylabel('Dec [deg]')

Now you want to try to match your observations to those of a collaborator from a previous night.  Your collaborator has sent you a list of two possible stars in the field... but their list has the stars in Galactic coordinates instead of Equatorial:


In [None]:
comparison_star_gal = SkyCoord(l=[289.90508, 289.90483]*u.deg, 
                               b=[-18.12996, -18.12953]*u.deg, 
                               frame='galactic') 
comparison_star_gal


Astropy, fortunately, has the Galactic system built-in, so you can simply convert your own coordinates like any `SkyCoord`:

In [None]:
star_galactic = star_icrs.galactic
star_galactic

This at first glance appears to be a regular `SkyCoord`, but the elements in fact were transformed with their full distribution:

In [None]:
star_galactic.l, star_galactic.b

This then seems straightforward: compute which comparison star is closest to your observation, and the closest one should be the match.  Let's try that:

In [None]:
star_galactic.separation(comparison_star_gal).pdf_mean().arcsec

Looks like the first is the match!  Just in case, let's plot this distribution just as we did before, with our comparison stars over top of it (the match in green, the non-match in red):

In [None]:
plt.subplot(aspect='equal')
plt.scatter(star_galactic.l.distribution, star_galactic.b.distribution, s=1, alpha=.25)


plt.scatter([star_galactic.l.pdf_mean()], [star_galactic.b.pdf_mean()], marker='s', color='k')
plt.scatter(comparison_star_gal.l, comparison_star_gal.b, marker='x', color=['g', 'r'])  # green is the one that appears to be a match, red is non-match

plt.xlabel('Gal long (l) [deg]')
plt.ylabel('Gal lat (b) [deg]');

Uh-oh... An important "gotcha" now becomes apparent: our uncertainties were given as a circular region in ra/dec.  But that turns out to be a highly non-circular region in Galactic coordinates, because spherical coordinates are not linear! Hence, with this particular pair of comparison stars, the one visibly closer to the *center* of the distribution is in a much lower-probability region of the distribution, because the uncertainty is clearly not a circular Gaussian.

While this should also lead you to question the underlying assumption that the uncertainty is Gaussian in equatorial space, the point this illustrates is that coordinate transformations can have non-linear effects, so tracking uncertainties can be critical to proper interpretation of the results.
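
One contributor to the elongation can be checked by hand: a given offset in the RA *coordinate* spans a smaller angle on the sky away from the celestial equator, by a factor of $\cos({\rm Dec})$. The standard-library sketch below is a back-of-the-envelope estimate rather than part of the `astropy` workflow, and the variable names are chosen here for illustration:

```python
import math

dec_deg = -75.0          # declination of the star used above
sigma_ra_coord = 1.0     # arcsec, std of the RA coordinate
sigma_dec = 1.0          # arcsec, std of the Dec coordinate

# an offset in RA spans a smaller on-sky angle away from the equator
sigma_ra_on_sky = sigma_ra_coord * math.cos(math.radians(dec_deg))

print(f"on-sky sigma along RA:  {sigma_ra_on_sky:.2f} arcsec")
print(f"on-sky sigma along Dec: {sigma_dec:.2f} arcsec")
print(f"axis ratio: {sigma_ra_on_sky / sigma_dec:.2f}")
```

So a distribution that looks circular when plotted in RA/Dec degrees is already an elongated ellipse on the sky at this declination, and the transformation to Galactic coordinates then rotates that ellipse.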

To illustrate this problem one step further, let's see what happens if you take your observations and report them with typical symmetric $\pm$-style error bars:

In [None]:
avg_uncertainty = (star_galactic.l.pdf_std().arcsec + star_galactic.b.pdf_std().arcsec)/2

# raw f-strings keep the LaTeX backslashes intact; .6f gives the same six decimals as before
lstr = rf'l = ${star_galactic.l.pdf_mean().deg:.6f}^\circ \pm {star_galactic.l.pdf_std().arcsec :.2f}"$'
bstr = rf'b = ${star_galactic.b.pdf_mean().deg:.6f}^\circ \pm {star_galactic.b.pdf_std().arcsec :.2f}"$'
display.Latex(lstr + ', ' + bstr)

In [None]:
plt.subplot(aspect='equal')

# this is the correct answer
plt.scatter(star_galactic.l.distribution, star_galactic.b.distribution, s=1, alpha=.25, c='k') 

# this is what someone would assume from the above in a paper
l_circular = uncertainty.normal(289.905194*u.deg, std=.84*u.arcsec, n_samples=10000)
b_circular = uncertainty.normal(-18.1298*u.deg, std=.67*u.arcsec, n_samples=10000)
star_circular = SkyCoord(l=l_circular, b=b_circular, frame='galactic')
plt.scatter(star_circular.l.distribution, star_circular.b.distribution, s=1, alpha=.15, c='r')

plt.xlabel('Gal long (l) [deg]')
plt.ylabel('Gal lat (b) [deg]');

These are clearly very different uncertainty distributions, illustrating the importance of care in recording faithfully the coordinate system your uncertainty is in.

### Exercises

While easy to make, the "dot plots" above are sometimes hard to interpret when the dots overlap a lot.  Plot the star's probability distribution in `l`/`b` in a way that better reflects the probability - e.g. brighter or a different color where the density of points is high.

In [None]:
# There are plenty of alternative approaches, like seaborn.kdeplot or corner.corner. Or color the points according to nearest-neighbor distance. 
# But the matplotlib hexbin will work fine for this and is a one-liner, so:

plt.hexbin(star_galactic.l.distribution, star_galactic.b.distribution, gridsize=30)


plt.scatter(comparison_star_gal.l, comparison_star_gal.b, marker='x', color=['g', 'r']);

Starting from the quantity of interest to you that you identified in the previous exercises, try transforming it into some other form using astropy functionality - e.g. a coordinate transformation or a `.to` unit transformation.

In [None]:
# The answer depends on the reader so there is no "correct" solution provided here.

# Beware the ghost of covariance past

As one more point of how `astropy.uncertainty` can help with subtle statistical effects, we turn last to a subtle but tricky subject: intrinsically correlated uncertainties.

To illustrate this, we consider two data points that are frequently encountered in Galactic and some forms of Extragalactic astronomy: the $[{\rm Fe}/{\rm H}]$ and $[\alpha / {\rm Fe}]$ of an object (could be either a star or a galaxy).  If you see these written up in a paper for a particular object, it's not uncommon to see this as something like:

$[{\rm Fe}/{\rm H}] = -0.7 \pm 0.1$ dex

$[{\rm \alpha}/{\rm Fe}] = 0.5 \pm 0.14$ dex

This seems very straightforward to interpret like this:

In [None]:
FeH = uncertainty.normal(-0.7, std=0.1, n_samples=10000)
aFe = uncertainty.normal(0.5, std=0.14, n_samples=10000)

plt.scatter(FeH.distribution, aFe.distribution, s=1, alpha=.25)

plt.xlabel(r'$[{\rm Fe}/{\rm H}]$')
plt.ylabel(r'$[{\rm \alpha}/{\rm Fe}]$')

astr = f'$[{{\\rm Fe}}/{{\\rm H}}] =  {FeH.pdf_mean():.2f} \\pm {FeH.pdf_std():.2f}$'
fstr = f'$[{{\\rm \\alpha}}/{{\\rm Fe}}] =  {aFe.pdf_mean():.2f} \\pm {aFe.pdf_std():.2f}$'
display.Latex(fstr + ' ,  '+ astr)

This seems quite simple, but in practice it is quite often subtly wrong.  In reality, it is plausible that the paper actually measured $[{\rm Fe}/{\rm H}]$ and $[\alpha / {\rm H}]$ separately from spectral lines, and then computed $[\alpha / {\rm Fe}]$ by combining those two.  We can model that process in `astropy.uncertainty` by a simple operation:

In [None]:
FeH = uncertainty.normal(-0.7, std=0.1, n_samples=10000)
alH = uncertainty.normal(-0.2, std=0.1, n_samples=10000)  # centered so that alH - FeH gives the quoted [alpha/Fe] of 0.5

aFe = alH - FeH

plt.scatter(FeH.distribution, aFe.distribution, s=1, alpha=.25)

plt.xlabel(r'$[{\rm Fe}/{\rm H}]$')
plt.ylabel(r'$[{\rm \alpha}/{\rm Fe}]$')

astr = f'$[{{\\rm Fe}}/{{\\rm H}}] =  {FeH.pdf_mean():.2f} \\pm {FeH.pdf_std():.2f}$'
fstr = f'$[{{\\rm \\alpha}}/{{\\rm Fe}}] =  {aFe.pdf_mean():.2f} \\pm {aFe.pdf_std():.2f}$'
display.Latex(fstr + ' ,  '+ astr)

Despite having the same appearance written in a paper, these distributions are clearly very different in character, with a strong covariance in the second case that is intrinsic to the measurement but not easy to write down in a simple form as two data points each with independent error bars.
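
The size of that covariance follows from the subtraction itself: if $[\alpha/{\rm Fe}] = [\alpha/{\rm H}] - [{\rm Fe}/{\rm H}]$ with independent inputs, then the covariance between $[{\rm Fe}/{\rm H}]$ and $[\alpha/{\rm Fe}]$ is minus the variance of $[{\rm Fe}/{\rm H}]$. The NumPy-only sketch below illustrates this. The samples are centered on zero because only the scatter matters here, and the variable names are chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 100_000

d_feh = rng.normal(0.0, 0.1, size=n_samples)  # scatter in [Fe/H], measured directly
d_alh = rng.normal(0.0, 0.1, size=n_samples)  # scatter in [alpha/H], measured directly
d_afe = d_alh - d_feh                         # scatter in the derived [alpha/Fe]

cov = np.cov(d_feh, d_afe)
corr = np.corrcoef(d_feh, d_afe)[0, 1]
print(cov)   # analytic expectation: [[0.01, -0.01], [-0.01, 0.02]]
print(corr)  # analytic expectation: -1/sqrt(2)
```

The derived quantity inherits the full scatter of $[{\rm Fe}/{\rm H}]$ with the opposite sign, which is why the correlation is strong and negative.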

While the above is more natural to model given you can reasonably guess the *intent* of the paper author, an author might be careful themselves and provide you with an actual covariance between the variables.  While you cannot create a distribution directly in this form, you can create the multivariate Gaussian yourself and use that to create distribution objects that then allow you to continue with an `uncertainty`-oriented workflow:

In [None]:
# provided by the author of the paper
covariance_matrix = np.array([[.01, -.01],
                              [-.01, .02]])

# means set to the quoted values: [Fe/H] = -0.7 and [alpha/Fe] = 0.5
FeH_values, aFe_values = np.random.multivariate_normal([-.7, .5], covariance_matrix, size=10000).T

FeH = uncertainty.Distribution(FeH_values)
aFe = uncertainty.Distribution(aFe_values)

In [None]:
plt.scatter(FeH.distribution, aFe.distribution, s=1, alpha=.25)

plt.xlabel(r'$[{\rm Fe}/{\rm H}]$')
plt.ylabel(r'$[{\rm \alpha}/{\rm Fe}]$')

astr = f'$[{{\\rm Fe}}/{{\\rm H}}] =  {FeH.pdf_mean():.2f} \\pm {FeH.pdf_std():.2f}$'
fstr = f'$[{{\\rm \\alpha}}/{{\\rm Fe}}] =  {aFe.pdf_mean():.2f} \\pm {aFe.pdf_std():.2f}$'
display.Latex(fstr + ' ,  '+ astr)

While `astropy.uncertainty` gives you straightforward tools to back out the intended answer as illustrated above, it is important to understand that this has limits.  You might think you have accounted for the covariance, but not realize the importance of using the *same* variables throughout.  For example, you might later do something like this:

In [None]:
# oops, correcting a typo, the paper actually said -0.75:
FeH = uncertainty.normal(-0.75, std=0.1, n_samples=10000)

# but I can keep the aFe the same because I did that right
plt.scatter(FeH.distribution, aFe.distribution, s=1, alpha=.25, c='r')

plt.xlabel(r'$[{\rm Fe}/{\rm H}]$')
plt.ylabel(r'$[{\rm \alpha}/{\rm Fe}]$');


By doing this you've inadvertently removed the covariance!  So you must be very careful to keep sets of variables for the same object that share a covariance together, and not shuffle or re-draw their samples in any way: that will break the covariance and effectively treat your data points as independent.
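
The effect is easy to quantify with a NumPy-only sketch. It draws zero-centered samples with the covariance matrix used above, since only the covariance matters here, and then compares the correlation coefficient before and after shuffling one of the two variables. The seed and variable names are arbitrary choices made here:

```python
import numpy as np

rng = np.random.default_rng(2)
cov = np.array([[0.01, -0.01],
                [-0.01, 0.02]])  # same matrix as in the covariance cell above
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=50_000).T

corr_paired = np.corrcoef(x, y)[0, 1]
# shuffling keeps the marginal distribution of x but destroys the pairing with y
corr_shuffled = np.corrcoef(rng.permutation(x), y)[0, 1]

print(f"paired:   r = {corr_paired:+.2f}")
print(f"shuffled: r = {corr_shuffled:+.2f}, std(x) = {x.std():.3f}")
```

The mean and standard deviation of each variable are unchanged by the shuffle, so nothing in the one-dimensional summaries warns you that the covariance is gone.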

### Exercises

Use `uncertainty` to model covariance in resampling of spectra: create a "spectrum" composed of 3 Poisson-distributed pixels.  Then produce two resampled pixels in between them by averaging the 1st/2nd and 2nd/3rd pixel. What kind of covariance structure do you get between those two pixels?

In [None]:
pixels_mid = [8, 11, 5]

pixels = uncertainty.poisson(pixels_mid, n_samples=100000)

resampled_pixel_1 = (pixels[0] + pixels[1])/2
resampled_pixel_2 = (pixels[1] + pixels[2])/2

px1_mid = (pixels_mid[0] + pixels_mid[1])/2
px2_mid = (pixels_mid[1] + pixels_mid[2])/2

In [None]:
plt.scatter(resampled_pixel_1.distribution, resampled_pixel_2.distribution, s=1, alpha=.02)
plt.xlabel('Pixel 1')
plt.ylabel('Pixel 2')

plt.axvline(px1_mid, c='k', ls=':', alpha=.5)
plt.axhline(px2_mid, c='k', ls=':', alpha=.5);

The above is good enough, but we can take advantage of the discrete nature of the Poisson distribution to do a slightly more readable version:

In [None]:
grid1 = np.arange(36)/2 # averaging two integers always gives integers or half-integers
xg, yg = np.meshgrid(grid1, grid1)

counts = np.zeros((len(grid1), len(grid1)))

# nested for loops are usually slow in Python, but that is OK when they are small and the alternatives would be harder to read
for i in range(len(grid1)):
    for j in range(len(grid1)):
        counts[j, i] = np.sum((resampled_pixel_1.distribution == grid1[i]) & (resampled_pixel_2.distribution == grid1[j]))

In [None]:
plt.pcolor(xg, yg, counts)
plt.xlabel('Pixel 1')
plt.ylabel('Pixel 2')

plt.axvline(px1_mid, c='r', ls=':', alpha=.5)
plt.axhline(px2_mid, c='r', ls=':', alpha=.5);
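
The tilt seen in these plots can be compared with a short analytic estimate. It relies on the three pixels being independent and on the variance of a Poisson variable equalling its mean. The sketch uses only the standard library, and the variable names are chosen here for illustration:

```python
import math

means = [8, 11, 5]              # Poisson means of the three pixels, as above
var = means                     # for a Poisson variable the variance equals the mean

# resampled pixels: r1 = (p0 + p1)/2, r2 = (p1 + p2)/2, with independent p0, p1, p2
var_r1 = (var[0] + var[1]) / 4
var_r2 = (var[1] + var[2]) / 4
cov_r1_r2 = var[1] / 4          # only the shared middle pixel contributes

corr = cov_r1_r2 / math.sqrt(var_r1 * var_r2)
print(var_r1, var_r2, cov_r1_r2, round(corr, 3))
```

The positive correlation comes entirely from the shared middle pixel, so the brighter that pixel is relative to its neighbours, the more strongly the two resampled pixels move together.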

## Wrap-up

This tutorial covers a fair amount of material, but `astropy.uncertainty` has even more functionality that we were unable to cover in this workshop. For documentation on other features of `astropy.uncertainty`, check out [the astropy.uncertainty section of the Astropy documentation](http://astropy.readthedocs.org/en/stable/uncertainty/index.html).