# Analysing forest recovery after logging events - comparison with satellite data

During the previous practical, you learned about the data that the Victorian Department of Energy, Environment and Climate Action keeps on the management of logging coupes over time, and exported all the coupes that can be easily compared with satellite data from the Sentinel-2 constellation.

Your team has reviewed your file and selected three known logging coupes, which they would like you to investigate. Each coupe uses a different type of silvicultural system for harvesting, and your team is curious about how these events show up in satellite data.

In this practical, you'll load and view the satellite data associated with each logging event, both as an image, and using the normalised difference vegetation index (NDVI). You'll then view the time series of how NDVI is changing over time for each event. 

## Overview

During this activity, you will learn to:

* select specific rows from a table of geospatial data
* load satellite data for the time and location corresponding to a given logging event
* review satellite data, both as images and as timeseries

Most of the Python code for loading and analysing satellite data has been provided for you, but there will be a number of opportunities to write your own code, as well as customise existing code to explore different results. The focus of this session is to review the results and think about what you can learn from satellite data.

### Guiding text

This practical contains a number of headings to help guide you. 

* <span style="color:blue;font-weight:bold">Your task</span>: This indicates there is a task you must complete before proceeding. It will usually require you to add code or text before you can move on.
* <span style="color:green;font-weight:bold">Need some help?</span>: Your demonstrators are here to help -- this text is there to remind you to ask for help if you're not sure what you need to do. You can ask for help at any time.
* <span style="color:orange;font-weight:bold">Going further</span>: This indicates that there is an *optional* extension you can try if you've already completed the tasks.
* **Code explanation**: The text following this header will provide you with more information about how the code works -- you only need to read this if you're interested.

### Errors and warnings

It is normal to encounter errors and warnings when developing and running Python code. If you see a red box containing text appear, it is an error or a warning, which is the computer's way of giving you feedback that something isn't quite right. Read the [common errors guide](error_guide.ipynb) to learn more about what might be causing the issue, and then try to resolve it on your own, or with help from your demonstrator.

### Terminology

In the previous practical, your colleagues provided some useful definitions:

* [Silviculture](https://www.forestrycorporation.com.au/operations/silviculture) is the science of forestry.
* A [coupe](https://www.vicforests.com.au/vicforest-forest-management/ops-planning/where-vicforests-operates/timber-release-plan) is a defined area in a forest that timber can be harvested from.
* A [silvicultural system](https://www.fs.usda.gov/Internet/FSE_DOCUMENTS/fseprd530429.pdf) is the planned strategy for managing a coupe, including the harvesting and regeneration of timber.

## Notebook setup

In addition to `pandas` and `geopandas`, you'll now also need the `datacube` package; this is what lets you load satellite data. The other analyst on your team has also provided a number of extra functions that you'll use throughout the notebook.

To run the code, click on the next cell, and press `Shift`+`Enter` on your keyboard.

In [None]:
# import key packages
import datacube
import geopandas
import pandas
import matplotlib.pyplot as plt

# import extra functions
from datacube.utils.geometry import Geometry
from dea_tools.datahandling import load_ard
from dea_tools.spatial import xr_rasterize
from odc.algo import xr_geomedian
from plotting_functions import plot_rgb_ndvi

# Change a pandas setting to view all columns and all rows of loaded data
pandas.set_option("display.max_columns", None)
pandas.set_option("display.max_rows", None)

## Load the data

In the last practical, you exported a file called "LOG_SEASON_FILTERED.gpkg". This contains all the logging events that can be matched with Sentinel-2 data. The **path** to the data is `"LOG_SEASON_FILTERED.gpkg"`.

From the first practical, recall that you can use the `read_file` function from GeoPandas to load the data. You can read more about this function in the [geopandas documentation](https://geopandas.org/en/stable/docs/reference/api/geopandas.read_file.html#geopandas.read_file).

### <span style="color:blue;font-weight:bold">Your task</span>

> Load the logging data using the `read_file` function from `geopandas` and assign the loaded data to a variable called `logging_season_data`. Then, use the empty cell below it to view the **first 5 rows** of the data.
>
> What's provided:
> * The variable you'll assign the data to: `logging_season_data`.
> * The `=` sign that will assign the results of any code that comes after it to the variable.
> * An empty cell you can use to view the first 5 rows of the data.
>
> What you'll need to add:
> * After the `=` sign, type `geopandas.read_file()` to call the function.
> * Inside the `()` for the function, type the path to the file: `"LOG_SEASON_FILTERED.gpkg"`
> * In the empty cell, type the geopandas function for viewing the first 5 rows of the data. If you're not sure what the function is, open the previous practical and review the second task.
>
>After adding the required code to the cell below, run the cells by clicking on each cell and pressing `Shift`+`Enter` on your keyboard.

### <span style="color:green;font-weight:bold">Need some help?</span>
> If you see a <span style="color:red">NameError</span>, make sure you have included double quotes (`" "`) around the name of the file.
> 
>If you're not sure what to do, get in touch with a demonstrator (in the room or online) and show them your screen to talk through what you've tried and what the next step might be.

In [None]:
logging_season_data =

In [None]:
# View the first five rows using the head() method


## Select logging coupes for analysis

Your team has identified three logging coupes for you to investigate, with each coupe demonstrating a different silvicultural system. They have provided you with the event LOGHISTID values so that you can select them from the filtered data you created during the previous practical:

* **Clearfelling system**: "14/770/507/0011/201920/00"
* **Regrowth retention harvesting system**: "08/286/505/0029/201920/00"
* **Variable retention 1 system**: "16/686/510/0026/201920/00"

Your colleague has provided these LOGHISTID values as a Python [dictionary](https://www.w3schools.com/python/python_dictionaries.asp), along with some code that selects only the rows in your dataset with these LOGHISTID values. Run the cells below, and then complete the exercise.

In [None]:
# Store the events of interest in a Python dictionary
events_of_interest = {
    "Clearfelling": "14/770/507/0011/201920/00",
    "Regrowth retention harvesting": "08/286/505/0029/201920/00",
    "Variable retention 1": "16/686/510/0026/201920/00",
}

# Get a list containing the IDs from the dictionary
event_ids = list(events_of_interest.values())

# Identify the rows that correspond to the IDs in the list
rows_of_interest = logging_season_data.loc[:, "LOGHISTID"].isin(event_ids)

# Make a new table only containing the rows of interest
logging_events_to_analyse = logging_season_data.loc[rows_of_interest, :].reset_index(drop=True)

# View the new table
logging_events_to_analyse

**Code explanation**

> The above code uses two `GeoDataFrame` methods: `loc[]` and `isin()`. You can review all methods in the [GeoPandas documentation](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html).
> 
> * `loc[row, column]` selects the data from the provided `row` and `column` values. Providing a value of `:` selects everything.
> * `isin(list)` produces a value of `True` or `False` for each row, depending on whether the value in that row appears in the `list`.
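
The following is a minimal sketch of how `isin()` and `loc[]` work together, using a small made-up table. The ID values and the `AREA_HA` column are hypothetical and are not taken from the logging dataset.

```python
import pandas

# Toy table with made-up IDs (not real LOGHISTID values);
# the AREA_HA column is hypothetical and only here for illustration
table = pandas.DataFrame(
    {
        "LOGHISTID": ["A/01", "B/02", "C/03", "D/04"],
        "AREA_HA": [12.5, 3.0, 8.1, 20.4],
    }
)

ids_to_keep = ["B/02", "D/04"]

# isin() gives one True/False value per row
rows_of_interest = table.loc[:, "LOGHISTID"].isin(ids_to_keep)
print(rows_of_interest.tolist())

# loc[rows, columns] keeps the rows marked True and, with ":", every column
selected = table.loc[rows_of_interest, :].reset_index(drop=True)
print(selected)
```

Only the second and fourth rows are kept, and `reset_index(drop=True)` renumbers them from 0.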

### Exercise: Reviewing the events

Your team is interested in the answers to the following questions:

* **Question 1**: What is the **earliest start date** across the three events?
* **Question 2**: What is the **latest end date** across the three events?
* **Question 3**: Which **forest type** was harvested in all three events?

### <span style="color:blue;font-weight:bold">Your task</span>
> Review the table of selected events above, and answer the questions below.
>
>Double click the text below to add your answers.

### Your answers

**Question 1**: 

**Question 2**:

**Question 3**:

## Loading satellite data for each event

### Connecting to the datacube and constructing the query

In the next cell, your colleague has provided you with the code to connect to the datacube, and some recommended settings for loading the data. The recommended settings are as follows:

* **time_range**: The date range to search for data over. Your colleague suggests including some time before and after the actual events.
* **products**: the satellite data to load from. `"ga_s2am_ard_3"` is the analysis-ready data product for Sentinel-2A, and `"ga_s2bm_ard_3"` is the analysis-ready data product for Sentinel-2B.
* **measurements**: the Sentinel-2 bands to load. The `nbart_` prefix has to do with how the data have been processed. For visual interpretation, you need the red, green and blue bands. For calculating NDVI, you need the red and near-infrared (`nir_1`) bands.
* **resolution**: the resolution (in metres) for each pixel in the image. The first number (`10`) specifies 10m in the horizontal direction, and the second number (`-10`) specifies 10m in the vertical direction.
* **output_crs**: the coordinate reference system (CRS) to use for the loaded data. [EPSG:3577](https://epsg.io/3577) is the Australian Albers equal-area projection.
* **min_gooddata**: The proportion of pixels in the image that must be good quality (i.e. not cloudy) for the whole image to be loaded. This makes sure we only load data with minimal cloud coverage. 
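
To make the idea behind `min_gooddata` concrete, here is a small sketch that works out the proportion of good pixels in a made-up quality mask and compares it with the threshold. It only illustrates the concept: the mask values are invented, and the exact comparison that `load_ard` performs internally is an assumption.

```python
import numpy as np

# Hypothetical quality mask for one small image:
# True = good pixel, False = cloudy pixel
good_pixels = np.array(
    [
        [True, True, True, True],
        [True, True, False, True],
        [True, True, True, True],
    ]
)

min_gooddata = 0.99

# Proportion of good pixels in the image (11 of the 12 pixels are good)
good_fraction = good_pixels.mean()

# Assumed rule for illustration: keep the image only if enough pixels are good
keep_image = good_fraction >= min_gooddata

print(f"Good pixels: {good_fraction:.2%}, keep image: {keep_image}")
```

With a threshold of 0.99, a single cloudy pixel in this tiny image is enough to exclude it. Real images contain many more pixels, so the threshold is less strict in practice than it looks here.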

In [None]:
# Connect to the datacube
dc = datacube.Datacube(app="Logging_analysis")

# Recommended settings
time_range = ("2019-06-01", "2020-12-31")
products = ["ga_s2am_ard_3", "ga_s2bm_ard_3"]
measurements = ["nbart_red", "nbart_green", "nbart_blue", "nbart_nir_1"]
resolution = (10, -10)
output_crs = "EPSG:3577"
min_gooddata = 0.99

### Loading data

The next cell contains multiple steps that are used to load the data for each event. This is done by using a Python [for loop](https://www.w3schools.com/python/python_for_loops.asp), which loads data for each event in turn, and stores the outputs.

> **The data loading step will take 5 minutes!** 

Please be patient and keep an eye on the output. While you are waiting, you can read more about the steps involved. The steps are described below the next cell. You will know the code has finished running when you see the message "All data loading is complete! You can progress to the next step." in the output.

> **When running the code, the following warning may appear**: `/env/lib/python3.8/site-packages/rasterio/warp.py:344: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  _reproject(`. This occasionally happens when loading data and you don't need to worry.

Run the code cell below, then scroll down to read about the steps while the data loads.

In [None]:
# Add the settings into a dictionary, which can be used for all three events.
query = {
    "time": time_range,
    "products": products,
    "measurements": measurements,
    "resolution": resolution,
    "output_crs": output_crs,
    "min_gooddata": min_gooddata,
    "group_by": "solar_day",
}

# Create empty dictionary to store results in
event_data = {}

# Run the for loop, applying the code to each pair of event_type and event_id in the events_of_interest dictionary
for (event_type, event_id) in events_of_interest.items():

    print(f"Analysing LOGHISTID {event_id}; Event type: {event_type}")

    # Select the row corresponding to the LOGHISTID value
    event = logging_events_to_analyse.loc[logging_events_to_analyse.LOGHISTID == event_id]

    # Get the polygon geometry for the event and add it to the query
    geometry = Geometry(geom=event.geometry.values[0], crs=logging_events_to_analyse.crs)
    query.update({"geopolygon": geometry})

    # Load the data using the query
    ds = load_ard(dc=dc, **query)

    # Generate a polygon mask to keep only data within the polygon and apply the mask
    mask = xr_rasterize(event, ds)
    ds = ds.where(mask)

    # Group by 3 month intervals and calculate a composite dataset using the geomedian
    grouped = ds.resample(time="3MS")
    composite = grouped.map(xr_geomedian)

    # Calculate NDVI using (NIR - Red)/(NIR + Red)
    composite["NDVI"] = (composite.nbart_nir_1 - composite.nbart_red) / (composite.nbart_nir_1 + composite.nbart_red)

    # Store the results in the event_data dictionary
    event_data[event_type] = composite
    
    print("\n")
    
print("All data loading is complete! You can progress to the next step.")

**Code explanation**

> The Python [for loop](https://www.w3schools.com/python/python_for_loops.asp) allows us to repeat the same set of actions for each event. The actions are:
>
> 1. **Select the row corresponding to the LOGHISTID value.**
>
> 2. **Get the polygon geometry for the event and add it to the query.** This means the datacube will only return data relevant to the event area.
>
> 3. **Load the data using the query.** The `load_ard()` function takes the datacube connection and the query, and loads the relevant data. You can read more about this function in the [Digital Earth Australia documentation](https://docs.dea.ga.gov.au/notebooks/Tools/gen/dea_tools.datahandling.html#dea_tools.datahandling.load_ard).
> 
> 4. **Generate a polygon mask to keep only data within the polygon and apply the mask.** This maps the geometry to a raster of ones and zeros, where ones correspond to pixels within the area of interest and zeros correspond to pixels outside it. This allows us to isolate the pixels corresponding to the logging event.
> 
> 5. **Group by three-month intervals and calculate a composite dataset using the geomedian.** This step allows us to create a representative dataset over a three-month period, by selecting the median pixel value from all values loaded for that period. In this case, we use a special type of median, called the geomedian. You can read more about it in the [Digital Earth Africa documentation](https://docs.digitalearthafrica.org/en/latest/data_specs/GeoMAD_specs.html#Geomedian). Because the data are grouped using `time="3MS"`, each composite is labelled with the start date of its three-month period.
> 
> 6. **Calculate NDVI using (NIR - Red)/(NIR + Red)**. This step takes the loaded red and near-infrared bands from the composite and calculates the corresponding NDVI values. NDVI is a satellite band index that indicates the presence of vegetation, with values ranging from -1 to 1. Higher values typically correspond to dense, green vegetation.
> 
> 7. **Store the results in the event_data dictionary**. This step allows us to store the data for each event and use it for our analysis.
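
As a rough illustration of steps 4 and 6, the sketch below applies a polygon mask and calculates NDVI on a tiny made-up image using `numpy`. The reflectance values and the mask are invented for this example. The notebook itself performs these steps on `xarray` datasets with `xr_rasterize` and `ds.where(mask)`.

```python
import numpy as np

# Made-up reflectance values for a 2 x 3 pixel image
red = np.array([[0.05, 0.06, 0.20], [0.04, 0.25, 0.30]])
nir = np.array([[0.45, 0.54, 0.30], [0.36, 0.25, 0.30]])

# Made-up polygon mask: True = inside the coupe, False = outside
inside = np.array([[True, True, False], [True, True, False]])

# Step 4: keep values inside the polygon; everything else becomes NaN
red_masked = np.where(inside, red, np.nan)
nir_masked = np.where(inside, nir, np.nan)

# Step 6: NDVI = (NIR - Red) / (NIR + Red)
ndvi = (nir_masked - red_masked) / (nir_masked + red_masked)
print(np.round(ndvi, 2))
```

Pixels where the near-infrared value is much higher than the red value give an NDVI close to 0.8, which is typical of dense vegetation. The pixel where the two bands are equal gives an NDVI of 0, and pixels outside the mask stay as NaN so they are ignored in later steps.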

## Visualising loaded data

### Spatial time series

Writing code to plot data in Python can be time-consuming to develop, so your colleague has dug out an old function they once made to help you. The function is called `plot_rgb_ndvi()` and it takes two arguments: the first is the event data and the second is the name of the event type to use as a title. It will show the visual image of the logging area (the combination of red, green and blue bands) on the left-hand side, and the corresponding NDVI values on the right-hand side. It will plot each composite that was generated, one after the other.

Run the following cell to view the RGB and NDVI images for each logging event.

### <span style="color:orange;font-weight:bold">Going further</span>
> This is an optional exercise to further your own understanding. There are no questions to answer for this component.
> 
> What are the different steps involved in creating these visualisations? Review the plotting function created by your colleague in the [plotting_functions.py file](./plotting_functions.py)

In [None]:
# Plot RGB and NDVI for the event areas
for event_type, event_ds in event_data.items():
    plot_rgb_ndvi(event_ds, event_type)

### Summary time series

Your colleague has provided additional code to calculate and plot the average NDVI value for each composite, which will allow you to see the general trend in the presence of vegetation for the three events.

Run the cell below to view the average NDVI over time.

In [None]:
# Store the plot labels to add at the end
labels = []

# Create the figure
fig, ax = plt.subplots(figsize=(12, 6))

# Add a title
fig.suptitle("Change in NDVI over time for different logging events", fontsize=16)

# Plot the mean (average) NDVI for each event
for event_type, event_ds in event_data.items():
    event_ds.NDVI.mean(dim=["x", "y"]).plot(add_legend=False, ax=ax)
    labels.append(event_type)

# Add the labels and display the plot
plt.legend(labels, ncol=1, fontsize=12)
plt.show()
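
As a rough illustration of what `mean(dim=["x", "y"])` does in the cell above, the sketch below averages a small made-up NDVI array over its two spatial axes with `numpy`, producing one value per time step. The values are invented, and NaN stands in for pixels outside the coupe polygon.

```python
import numpy as np

# Made-up NDVI composites: 3 time steps, each a 2 x 2 image.
# NaN marks pixels outside the coupe polygon.
ndvi = np.array(
    [
        [[0.8, 0.7], [0.9, np.nan]],
        [[0.2, 0.3], [0.1, np.nan]],
        [[0.4, 0.5], [0.3, np.nan]],
    ]
)

# Average over the two spatial axes, ignoring NaN pixels,
# to get a single NDVI value for each time step
mean_ndvi = np.nanmean(ndvi, axis=(1, 2))
print(np.round(mean_ndvi, 2))
```

In this toy example the average drops sharply at the second time step and partly recovers at the third, which is the kind of pattern to look for in the summary plot.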

## Analysis

Congratulations! You have successfully loaded and visualised satellite data for the three logging events! Now, your colleagues are curious to know what you found.

Your team asks you to report back on the following:

* **Question 1**: In 1-2 sentences, describe one similarity you noticed between the three events when looking at the RGB and NDVI spatial time series.
* **Question 2**: In 1-2 sentences, describe one difference you noticed between the three events when looking at the RGB and NDVI spatial time series.
* **Question 3**: When looking at the summary time series for the Clearfelling event, around what month and year would you say the Clearfelling began? Is this consistent with the start date in the event table you created when [selecting logging coupes for analysis](#Select-logging-coupes-for-analysis)? Why or why not?

### <span style="color:blue;font-weight:bold">Your task</span>
> Review the spatial and summary time series, and answer the analysis questions below.
> Double click the text below to add your answers.

### <span style="color:green;font-weight:bold">Need some help?</span>
>If you have any questions about interpreting the plots, get in touch with a demonstrator (in the room or online) and show them your screen to talk through what you're thinking.

### Your answers

**Question 1**: 

**Question 2**:

**Question 3**:

## Submit your work

1. Ensure you have added answers to all the questions and save your file (in the menu bar, click File > Save Notebook).

2. In the file browser, right-click the `prac2_logging_site_monitoring.ipynb` file, and press "Download".

3. Email the downloaded file to caitlinisabeladams@swin.edu.au