# Working with geomedians

Now that we have removed cloudy data points using the cloud mask, we need to piece together the remaining good data points to give us one complete image. We can do this by combining the data points using a geomedian.

## Overview

The goal of this section is to better understand geomedians, which are a kind of compositing method. Compositing is the process of creating one image for an area from several images of that area taken over time. Compositing creates one value for each band for each pixel based on the time series data for that pixel.

As we have seen in the previous section on [cloud masking](01_cloud_masking.ipynb) and in the [data loading exercise](../session_2/04_Load_data_exercise.ipynb) from Session 2, clouds often cover terrain. So, to create cloud-free images of areas, we composite our data.

In this section, we will compare median and geomedian composites, and explain why geomedians are better.

## Median composites

Median composites set the value for each band for each pixel in the output image to the median of that pixel's values for that band over time.

The benefit of a median composite is that it is very fast to compute, so it can be used to quickly create cloud-free images for areas.

To find the median of an `xarray` dataset over time, we can call the `median` method on the dataset. An example is shown in the code below. `median('time')` means we are finding the median over the `time` dimension.

<img align="middle" src="../_static/session_3/02_intro_geomedian_median.png" alt="Code for calculating medians." width="600">

Because the median is calculated over all the available timesteps to give us one value, the output does not have a time dimension.

<img align="middle" src="../_static/session_3/02_intro_geomedian_median2.png" alt="Output when calculating medians." width="600">
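For readers who want to try this outside the screenshots, here is a minimal sketch on a small made-up dataset. The band names, values and dates are hypothetical and stand in for real satellite data; `NaN` plays the role of a pixel removed by the cloud mask.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 3 timesteps, 2 x 2 pixels.
# NaN stands in for a pixel that was removed by the cloud mask.
times = np.array(["2020-01-01", "2020-01-11", "2020-01-21"], dtype="datetime64[ns]")
red = np.array([
    [[600.0, 700.0], [650.0, np.nan]],
    [[640.0, 720.0], [655.0, 800.0]],
    [[900.0, 710.0], [np.nan, 820.0]],
])
nir = red * 2.0  # a second made-up band, only to show that each band is handled separately

ds = xr.Dataset(
    {"red": (("time", "y", "x"), red), "nir": (("time", "y", "x"), nir)},
    coords={"time": times, "y": [0, 1], "x": [0, 1]},
)

# Median over time: one value per band per pixel.
# For floating-point data, xarray skips NaN values by default.
median_composite = ds.median("time")

print(median_composite)             # the dataset no longer has a time dimension
print(median_composite.red.values)
```

Each band is reduced independently, so the `red` result does not depend on the `nir` values at all.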

## Geomedian composites

Geomedian &mdash; 'geometric median' &mdash; composites are multi-band generalisations of median composites. Instead of finding a median value for each band for each pixel **individually**, like a median composite does, a geomedian composite finds the median values of the bands for each pixel when considered **together**. 

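Because the bands are treated together, a geomedian preserves the relationships between the bands of each pixel, whereas the band values in a median composite may each come from a different date. This means geomedians represent the data **better** than median composites.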
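To make the "individually" versus "together" distinction concrete, the sketch below compares a band-by-band median with a geometric median for one hypothetical pixel. The helper `geometric_median` is an illustrative Weiszfeld iteration written for this example; it is not the implementation used by `odc.algo`, and the pixel values are made up.

```python
import numpy as np

def geometric_median(points, eps=1e-7, max_iters=1000):
    """Illustrative Weiszfeld iteration (hypothetical helper).

    Approximates the point that minimises the sum of Euclidean
    distances to the rows of `points`.
    """
    estimate = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(max_iters):
        distances = np.linalg.norm(points - estimate, axis=1)
        distances = np.maximum(distances, 1e-12)  # guard against division by zero
        weights = 1.0 / distances
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < eps:
            return new_estimate
        estimate = new_estimate
    return estimate

# Hypothetical time series for ONE pixel: rows are dates, columns are (red, nir).
pixel = np.array([
    [600.0, 2500.0],
    [650.0, 2400.0],
    [620.0, 2600.0],
    [900.0, 1200.0],  # an unusual observation, e.g. leftover haze
    [640.0, 2300.0],
])

# Median composite: each band is reduced on its own.
band_median = np.median(pixel, axis=0)

# Geomedian: all bands of the pixel are reduced together.
geomedian = geometric_median(pixel)

print("band-by-band median:", band_median)
print("geometric median:   ", geomedian)
```

In this toy case the band-by-band median pairs a `red` value from one date with a `nir` value from another, so the resulting combination was never actually observed. The geometric median is instead chosen as a single point in (red, nir) space that is as close as possible to all observations at once.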

## Comparing medians and geomedians

The difference between medians and geomedians can often be subtle, especially if you are looking at the overall composite image. For example, the RGB images for these median and geomedian composites look almost identical.

<img align="middle" src="../_static/session_3/02_intro_geomedian_rgb.png" alt="RGBs of median and geomedian" width="600">

However, on a pixel-by-pixel basis, it is possible to visualise the difference between median and geomedian.

<img align="middle" src="../_static/session_3/02_intro_geomedian_geomedian_median_scatter.png" alt="Dataset scatter plot." width="800">

Inspect the above scatter plot of a single pixel. The values for median and geomedian are **not** the same: you can see the green and red crosses do not quite overlap. Now imagine this difference repeated over millions of pixels. The composite results will certainly be affected.

Geomedians take more processing time to calculate than median composites. However, unless you are only doing a quick visualisation, you should use the geomedian method when creating composites. This is because the geomedian value is more scientifically rigorous as it accounts for all the bands in the dataset.

The geomedian function is called `xr_geomedian` and is imported from `odc.algo`. An example of its use is shown in the code snippet below.

<img align="middle" src="../_static/session_3/02_intro_geomedian_xrgeo.png" alt="xr_geomedian code example." width="600">

The output is also an `xarray` dataset where the time dimension has been collapsed to produce the geomedian statistic.

<img align="middle" src="../_static/session_3/02_intro_geomedian_xrgeo2.png" alt="Output of xr_geomedian." width="600">

If you compare the `Data variables` values from the geomedian to the median, you will be able to see they do in fact have different values for the same band and pixel. For example, the first pixel in the `red` band reads a geomedian value of `626.52185` while the corresponding value in the median shows `648.0`.

## Conclusion

You now know what geomedian composites are, and why we use them. 

To learn more about composites in general, including more kinds of composites, see [this notebook on generating composites](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks/blob/master/Frequently_used_code/Generating_composites.ipynb).

To learn more about geomedian composites specifically, see [this notebook on geomedians](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks/blob/master/Frequently_used_code/Generating_geomedian_composites.ipynb).

In the next section, we will walk through calculating a geomedian and creating a composite image using the techniques we have learned about.