# Cloud masking

Now we've successfully removed cloudy data points using the cloud mask, we need to piece together all the good data points to give us one complete image. We can do this by collating data points using a geomedian.

## Exercise: Cloud masking with `load_ard()`

In this exercise, we will apply a cloud mask to some Sentinel-2 data.

### Make new notebook

Like in the last exercise, we will begin by making a new, blank Jupyter notebook. If you want more detailed instructions on making a new notebook, see [this section in the exercise on loading data in the Sandbox](../session_2/04_Load_data_exercise.ipynb#Make-a-new-notebook) from the previous session. Otherwise, follow the steps below.

1. Navigate to the **Training** folder.
2. Click the **+** button and click **Python 3** under the **Notebook** section.
3. Rename your file so you know it is from this exercise. For example, call it `cloud_mask.ipynb`.
4. Open the notebook.

### Load packages and functions

We can now load the packages and functions we want to use. In the first cell of your new notebook `cloud_mask.ipynb`, enter the following code and run the cell.


<img align="middle" src="../_static/session_3/01_cloud_masking_imports.PNG" alt="Set up imports for the cloud mask notebook." width="500">

We used most of these packages and functions in the previous exercise on loading data in the Sandbox. The only new function is `load_ard`. 

### Connect to the datacube

Enter the following code and run the cell to create our `dc` object, which provides access to the datacube.

<img align="middle" src="../_static/session_3/01_cloud_masking_datacube.PNG" alt="Set up datacube connection for the cloud mask notebook." width="500">

### Load data with `load_ard()`

In this section, we will step through loading data with `load_ard()`. Because `load_ard()` masks out cloudy pixels, the images we load will have gaps where cloud was removed.

Let us take a look at an area near the coast of Guinea Bissau. Enter the following code and run it to display a map of the area. As before, `x` denotes longitude and `y` latitude.

<img align="middle" src="../_static/session_3/01_cloud_masking_displaymap.PNG" alt="Example of display_map input and output." width="700">

In a new cell below, enter the following code and run it to load Sentinel-2 data. It will generate the output text `Using pixel quality parameters for Sentinel 2...`. The output text also tells us we have loaded 4 timesteps.

<img align="middle" src="../_static/session_3/01_cloud_masking_loadard.PNG" alt="Using load_ard." width="700">

Take note of some of the differences between `dc.load()` and `load_ard()`.

* `dc=dc` is a required parameter for `load_ard()`. This links the data search to the datacube.
* The parameter for loading products is `products` (plural), not `product` as it is in `dc.load()`.
* Product items must be listed inside square brackets `[]`, which is not required for `dc.load()`.
* `min_gooddata` stands for 'minimum good data'. It is a fraction between 0 and 1, and `load_ard()` discards any observation whose fraction of good-quality pixels falls below it.
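To make the effect of `min_gooddata` more concrete, here is a minimal sketch of the idea using plain Python lists, so it runs without the Sandbox. It is not the `load_ard()` source code. The timestep names, the tiny 2x2 masks, the helper functions, and the threshold of `0.7` are all made up for illustration, and the sketch assumes an observation is kept when its good-pixel fraction is at least the threshold.

```python
# Illustrative sketch only: this is NOT how load_ard() is implemented.
# Each timestep is a small grid of flags: True = good pixel, False = masked (e.g. cloud).
timesteps = {
    "t0": [[True, True], [True, False]],    # 3 of 4 pixels good -> 0.75
    "t1": [[False, False], [True, False]],  # 1 of 4 pixels good -> 0.25
    "t2": [[True, True], [True, True]],     # 4 of 4 pixels good -> 1.0
}


def good_fraction(mask):
    """Return the fraction of pixels flagged as good in a 2D mask."""
    pixels = [flag for row in mask for flag in row]
    return sum(pixels) / len(pixels)


def filter_by_gooddata(masks, min_gooddata):
    """Keep only the timesteps whose good-pixel fraction is at least min_gooddata."""
    return {
        name: mask
        for name, mask in masks.items()
        if good_fraction(mask) >= min_gooddata
    }


# Hypothetical threshold: keep observations with at least 70% good pixels.
kept = filter_by_gooddata(timesteps, min_gooddata=0.7)
print(sorted(kept))  # ['t0', 't2']
```

In this toy example, the mostly cloudy timestep `t1` is dropped entirely, while `t0` is kept even though one of its pixels is masked. This is why images loaded with `load_ard()` can still contain gaps.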

We can use the same `rgb` plotting code as in the last session to show an RGB image of one of the timesteps. Let's start with the first timestep, which has an `index` of `0`.

<img align="middle" src="../_static/session_3/01_cloud_masking_rgbin.PNG" alt="Plotting an RGB of the first timestep." width="700">

This should produce a single RGB image. What happens if you try changing the `index` number?

<img align="middle" src="../_static/session_3/01_cloud_masking_rgbout.PNG" alt="Output RGB of the first timestep." width="550">

If we want to see RGB images of all the timesteps at once, we can replace the `index` parameter with the `col` parameter. `col` stands for 'column', so `col='time'` draws one column per timestep, with each column showing the image for that timestep.

<img align="middle" src="../_static/session_3/01_cloud_masking_colin.PNG" alt="Plotting RGBs of all timesteps." width="700">

The output should look like this.

<img align="middle" src="../_static/session_3/01_cloud_masking_colout.PNG" alt="Output RGBs from col=time." width="800">



## Conclusion

Good work! You have now loaded data using `load_ard()`, which applies a cloud mask automatically. We can see that the images at different timesteps have different cloud cover, so they have been masked in different places. This is why having data at different timesteps can allow us to create a composite image without any cloud.
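The sketch below illustrates why several masked timesteps can cover an area that no single timestep shows completely. It uses made-up values in plain Python lists, with `None` standing in for a masked pixel (in data loaded from the datacube, masked pixels typically appear as `NaN` instead). The `count_clear` helper is hypothetical and only counts valid observations; it does not build a composite.

```python
# Illustrative sketch with made-up values; None marks a pixel removed by the cloud mask.
masked_timesteps = [
    [[0.2, None], [0.3, 0.4]],   # timestep 0: cloud over the top-right pixel
    [[None, 0.5], [0.3, None]],  # timestep 1: cloud over two pixels
    [[0.2, 0.5], [None, 0.4]],   # timestep 2: cloud over the bottom-left pixel
]


def count_clear(timesteps):
    """Count, for each pixel, how many timesteps hold a valid (unmasked) value."""
    n_rows = len(timesteps[0])
    n_cols = len(timesteps[0][0])
    return [
        [sum(t[r][c] is not None for t in timesteps) for c in range(n_cols)]
        for r in range(n_rows)
    ]


clear_counts = count_clear(masked_timesteps)
print(clear_counts)  # [[2, 2], [2, 2]]
```

Every timestep here has at least one gap, yet each pixel is clear in two of the three timesteps, so there is enough information across time to fill every gap.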

Next, we will introduce the geomedian, an important statistic used when making a composite image.