# Multispectral data for deforestation detection

## The electromagnetic spectrum

What we perceive as light is just a small portion of the full electromagnetic spectrum. It is organized by wavelength ($λ$), which controls the interaction with matter. Different sensors use different parts of the spectrum for specific applications: microwave remote sensing uses wavelengths on a centimeter scale to analyze soil water content, and thermal remote sensing estimates land surface temperature with wavelengths on the micrometer scale $(\text{µm} = 10^{-6} \text{m})$. Multi-spectral sensors register the energy in frequency bands whose wavelengths are described on a micrometer or a nanometer scale $(\text{nm} = 10^{-9} \text{m})$.

The source of the energy received by the sensor depends on the wavelength it records. Microwave sensors can generate their own illumination (active; e.g., scatterometers, synthetic aperture radar), or register the energy emitted by the Earth (passive; e.g., radiometers). Thermal and multi-spectral sensors are passive: the former register the energy emitted by the Earth, whereas the latter register how much of the sun’s energy is reflected by the Earth.

The regions most used in multi-spectral remote sensing are the visible and the infrared. The visible region (0.4 – 0.7 μm) is the part of the spectrum our eyes can perceive. As wavelength increases we move into the infrared region, which is usually separated into near-infrared (NIR, 0.7 – 1.2 μm) and middle- or shortwave infrared (MIR/SWIR, 1.2 – 3 μm). In the next sections we will see what these bands are useful for.

<img src="../images/electromagnetic_spectrum.png" width="700"
         alt="The electromagnetic spectrum">

_The electromagnetic spectrum (Modified from [Wikipedia commons](https://en.wikipedia.org/wiki/Electromagnetic_spectrum#/media/File:EM_Spectrum_Properties_edit.svg))_

## An intuitive view of multi-spectral remote sensing
### The human eye as a sensor

The human eye contains two types of light-sensitive cells: rods and cones. The former are concerned with lightness and motion, whereas the latter enable color vision. Each type of cone is sensitive to a specific portion of the spectrum, with areas where it has no sensitivity, areas where it has some, and a point where sensitivity is at its maximum (peak sensitivity). Our three types of cone cells allow us to perceive red, green, blue and their mixtures. Other animals can see fewer (e.g., dogs, 2) or more colors (e.g., birds such as the European starling, 4). Color sensitivity as a concept extends to the next closest example: cameras.

<img src="../images/eye_response.png" alt="Retinal response" width="700">

_Relative probability of absorption for the different rods of the human and the European Starling (bird) eye (From [Batchelor et al 2012](https://doi.org/10.1007/978-1-84996-169-1_3))_

## Common Cameras

Cameras capture object reflections by using a lens to gather (focus) light onto a light-sensitive medium. The amount of energy recorded is controlled by the shutter aperture and speed. The medium can be a sensor (digital cameras) or film (analog cameras). Early cameras could not separate the three colors we perceive, which is why the first photographs were always in black and white. This was solved by applying red, green and blue filters, which made it possible to capture each color separately. This idea will help us later, when we try to visualize and interpret multi-spectral images.

<img src="../images/color_photography.png" alt="Color synthesis" width="700">

_Three pictures taken by Mikhailovich Prokudin-Gorskii representing the red, green and blue color channels (left), and color composite generated from them (right). From [Wikipedia commons](https://commons.wikimedia.org/w/index.php?curid=464234)_

## Multi-spectral sensors

Every band of a multi-spectral sensor measures the amount of energy received within a “bracket” or “band” of the electromagnetic spectrum. The number of these bands and how wide they are depend on the specific sensor. Cameras have three bands (blue, green and red), whereas commonly used multi-spectral sensors usually have 3-10 bands. The sensors we will be using for these notebooks are the **Operational Land Imager** (OLI on Landsat 8 and 9) and the **MultiSpectral Instrument** (MSI on Sentinel-2 A, B and C). Both are widely used in the field of multi-spectral remote sensing. [Here you can find a list of their bands](https://www.earthdata.nasa.gov/data/projects/hls/spectral-bands).

The energy recorded depends on the sensor aperture size, the integration time (think of shutter aperture and speed in a common camera), and its spectral sensitivity (it would look similar to the relative probability of absorption). Most of these parameters are known instrument characteristics, which are accounted for, allowing us to retrieve an estimate of the energy received: the spectral irradiance. [More details can be found here](https://www.cesbio.cnrs.fr/multitemp/radiometric-quantities-irradiance-radiance-reflectance/)

## Preparing the images for usage

Going from raw satellite images to the images we analyze is a process with many steps, whose purpose is to ensure the images are as comparable as possible. This section is not meant to go into full depth, only to briefly mention some concepts that make it easier to understand which products to use and what they account for. For starters, the images need to be geo-referenced, adding data that indicates their position in the context of a coordinate reference system (CRS).

The irradiance could be understood as something like the “Blue Marble” that astronauts saw from orbit. However, for practical use the irradiance recorded by the sensor needs to be normalized by how much energy the sun emits at each specific wavelength. The result is the **reflectance**, the fraction of the incoming radiation that is reflected (0-1 range). Because it depicts what the Earth and its atmosphere reflect at each wavelength as seen from the sensor, this quantity is called **top-of-atmosphere (TOA) reflectance**, the standard used for **Level-1 products**.

TOA reflectance would not suffice for our analyses, as the image is affected not only by the Earth's surface, but also by the column of air that light needs to traverse twice to reach the sensor (sun-atmosphere-ground, ground-atmosphere-sensor). Atmospheric correction uses physical modeling to account for the impact of atmospheric gases (ozone, water vapor, etc.). This process allows us to go from top- to **bottom-of-atmosphere (BOA) reflectance**, where the impact of the atmosphere should be reduced. Furthermore, multi-spectral sensors work on wavelengths similar to those of our eyes. We can see the clouds, and so can they! For land applications, clouds and their shadows need to be removed, as they are extremely bright or dark compared with most land surfaces. **Level-2 products** have received atmospheric correction and are accompanied by cloud and shadow masks.

Level-2 products are considered [**analysis-ready**](https://ceos.org/ard/files/PFS/SR/v5.0/CARD4L_Product_Family_Specification_Surface_Reflectance-v5.0.pdf). The product we use for our notebooks goes even further. The Harmonized Landsat and Sentinel-2 (HLS) product has been designed to ensure that images from Landsat OLI and Sentinel-2 MSI can be used together as if they came from a single sensor. Using satellites from different programs increases the observation frequency, which in turn increases our chances to “catch” cloudless observations. To make the observations more compatible, the image grids are resampled to match, and reflectance is refined further to suppress the combined effect of the illumination and observation angles. Finally, band-pass adjustment is applied to Sentinel-2 (MSI) images so their reflectances match those from Landsat 8/9 (OLI).

<img src="../images/spectral_response.png" alt="Spectral response" width="700">

_Landsat 8/9 and Sentinel-2 A/B relative spectral responses (From [Lima et al, 2025](https://doi.org/10.1016/j.srs.2025.100225))_


## Interpreting the Images
Common digital images are displayed based on the three values (ranging from 0 to 255) contained in every pixel, which represent how much red, green and blue light should be emitted by the three channels of the display. Multi-spectral reflectance images (0-1 range) can be displayed in the same way, in what we call a true color composite. However, if we remain within the confines of the wavelengths perceived by our eye, we would not see what the other bands have to offer. What we can do is swap our usual red-green-blue with other bands, putting wavelengths invisible to our eye where it can perceive them. For example, in the following figure we can see two photos of the same area. The first is in natural color, but the second places the near-infrared (NIR) on the red channel, whereas the red is displaced to the green channel, and green to the blue channel. This is what is called a [false or synthetic color composite](https://earthobservatory.nasa.gov/features/FalseColor). The most striking change is that trees swap their familiar green for red, indicating that they are far more reflective in the NIR than in the red. Over the next sections we will see what information we can get from every band.

True color composite from August 2005.<br><br>Red, green and blue are assigned to their namesake channels.<br><br>Credit: [Mike Murphy / Wikipedia commons](https://commons.wikimedia.org/wiki/File:Half_dome_yosemite_national_park.jpg).|MSS (Landsat 1-3 sensor) false color composite from May 1972.<br><br>NIR on the red channel, red on the green, and green on the blue.<br><br>Credit: [Hughes Aircraft Company / NASA](https://landsat.gsfc.nasa.gov/article/1972-mss-image-of-half-dome/)
:-------------------------:|:-------------------------:
![](../images/halfDome_natural.png)  |  ![](../images/halfDome_false.png)

_Two photographs of Half Dome in Yosemite National Park. Both were taken from Glacier Point._
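
The channel swap described above can be sketched with plain NumPy arrays. The reflectance values and the `compose` helper below are invented for illustration and are not part of the dataset used later; the only point is that a composite is three bands stacked in the order red, green, blue.

```python
import numpy as np

# Hypothetical 2x2 reflectance bands (0-1 range), invented for illustration
bands = {
    "NIR": np.array([[0.45, 0.40], [0.05, 0.30]]),
    "Red": np.array([[0.04, 0.05], [0.03, 0.20]]),
    "Green": np.array([[0.08, 0.07], [0.04, 0.15]]),
    "Blue": np.array([[0.03, 0.04], [0.05, 0.10]]),
}


def compose(bands, r, g, b):
    """Stack three named bands as the red, green and blue display channels."""
    return np.stack([bands[r], bands[g], bands[b]], axis=-1)


true_color = compose(bands, r="Red", g="Green", b="Blue")
false_color = compose(bands, r="NIR", g="Red", b="Green")

# The top-left pixel is vegetation-like: its brightest display channel is
# green (index 1) in true color and red (index 0) in false color
print(true_color[0, 0].argmax(), false_color[0, 0].argmax())
```

The same pixel therefore looks green in one composite and red in the other, even though the underlying reflectances have not changed.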

## What is shown by every band: the spectral signature

The interaction of light with any kind of surface depends on the wavelength of the light, and the physical and chemical properties of the surface. Dry soils tend to be very bright, darkening with increasing water content. Water reflects very little in the visible spectrum (red, green, blue), with clearer waters having lower reflectances. Towards the near and shortwave infrared its reflectance is nearly 0. The case of vegetation is very interesting: the chlorophyll in the plants absorbs blue and red, whereas it reflects green, the color we perceive. In the near infrared (NIR) plants are very reflective/bright due to leaf structure, whereas in the shortwave infrared (SWIR1) the leaves are significantly less reflective due to water absorption: the moister the leaves are, the darker they get.

<img src="../images/spectral_signature.png" alt="Spectral signatures" width="700">

_Reflectance of some targets across different wavelengths. Modified from “Introduction to hyperspectral imaging” ([Smith, 2012, ©MicroImages, Inc.](https://www.microimages.com/documentation/Tutorials/hyprspec.pdf))_

## Imports

In [None]:
import geopandas as gpd
import hvplot.pandas
import hvplot.xarray  # noqa
import numpy as np
import pandas as pd
import xarray as xr

from envrs.download_path import make_url

## Read and understand the data

Our first step is to read a series of images covering an area affected by deforestation, and two points: one with deforestation and another without. Please note these files have been placed in a special repository for this specific example. Data from other areas/times would need to be loaded in a different way.

In [None]:
# Read the cube
cube_uri = make_url("HLS_clip4plots_both_b30_v20.zarr.zip", is_zip=True)

full_cube = xr.open_dataset(
    cube_uri,
    engine="zarr",
    consolidated=False,
).compute()

# Read the points (reprojection and indexing are done in later cells)
points_uri = make_url("timeline_points.geojson")
points_raw = (
    gpd.read_file(points_uri)
    # .to_crs(full_cube["spatial_ref"].attrs["crs_wkt"])
    # .set_index("intact")
)

## The image cube

`full_cube` contains the images, which have coordinates `x` (horizontal position, the counterpart of longitude), `y` (vertical position, the counterpart of latitude) and `time` (moment of acquisition). The first two describe the spatial position of each pixel in the context of a coordinate reference system (CRS).

In [None]:
full_cube.coords

The information regarding the coordinate reference system (CRS) is contained in the attributes of the `spatial_ref` variable. `crs_wkt` holds a standard representation of the CRS ([well-known-text, WKT](https://en.wikipedia.org/wiki/Well-known_text_representation_of_coordinate_reference_systems)), and [`GeoTransform`](https://gdal.org/en/stable/tutorials/geotransforms_tut.html) details the placement of the spatial grid within the CRS: the coordinates of the top-left pixel, the pixel size (30 m in this case) and the rotation relative to the CRS axes.

In [None]:
full_cube["spatial_ref"].attrs
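
As a rough sketch of how a GDAL-style geotransform places the grid, the six coefficients can be turned into coordinates with the affine formula from the GDAL tutorial linked above. The coefficients used here are hypothetical (they are not read from `full_cube`), and `pixel_to_coords` is a helper invented for this illustration. If the attribute is stored as text, it would first need to be split and converted to numbers.

```python
def pixel_to_coords(geotransform, col, row):
    """Return the CRS coordinates of the top-left corner of pixel (col, row)."""
    x_origin, x_size, x_rot, y_origin, y_rot, y_size = geotransform
    x = x_origin + col * x_size + row * x_rot
    y = y_origin + col * y_rot + row * y_size
    return x, y


# Hypothetical north-up grid: 30 m pixels, no rotation, origin at (500000, 9000000)
gt = (500000.0, 30.0, 0.0, 9000000.0, 0.0, -30.0)

print(pixel_to_coords(gt, col=0, row=0))  # (500000.0, 9000000.0)
print(pixel_to_coords(gt, col=2, row=3))  # (500060.0, 8999910.0)
```

The negative pixel height reflects that row numbers grow downwards while `y` coordinates grow upwards.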

The rest of the variables (`data_vars`) contain the reflectances of the bands (`dtype`, or data type, `float64`) or the auxiliary masks (`dtype` `bool`) indicating the presence (`True`) of some objects we may want to exclude (clouds and shadows, snow or water).

In [None]:
for variable_name, variable_data in full_cube.data_vars.items():
    print(f"{variable_data.dtype!s:<10}{variable_name}")

We will name the masks we are interested in and gather them in a list detailing which phenomena we intend to mask. This will simplify the following code cells.

In [None]:
cirrus_mask = "cirrus cloud"
cloud_mask = "cloud"
adjacent_mask = "adjacent to cloud"
shadow_mask = "cloud shadow"
tainted_masks = [adjacent_mask, cloud_mask, cirrus_mask, shadow_mask]

### The points

The point dataset is just a geospatial table ([GeoDataFrame](https://geopandas.org/en/stable/docs/user_guide/data_structures.html#geodataframe)) with a couple of columns. `geometry` contains the longitude and latitude of every point. `intact` is a boolean (`bool`) indicating whether that point is considered intact (`True`) or deforested (`False`).

In [None]:
points_raw

The CRS that gives context to these coordinates is stored in the `crs` attribute of `points_raw`.

In [None]:
points_raw.crs

### Harmonize the dataset CRSs

To use both datasets together we need to ensure they use the same CRS. Otherwise we may end up looking in the wrong places. If we compare the WKT representations of the two CRSs, they do not match.

In [None]:
full_cube["spatial_ref"].attrs["crs_wkt"] == points_raw.crs.to_wkt()

Thus, we need to reproject `points_raw` so it uses the same CRS as `full_cube`. Note that the point coordinates are now different.

In [None]:
points_matched = points_raw.to_crs(full_cube["spatial_ref"].attrs["crs_wkt"])
points_matched

## Visualizing the images over time

In this section we will prepare a couple of widgets showing some of the images contained in `full_cube`. We will create two:

* One with images that can go from fully clear to fully covered by clouds, just to show how much of a problem clouds can be.
* One containing only images with no detected clouds or shadows.

### Selecting a subset of images

To select the data for these visualizations we need to know how many pixels have been considered tainted. For this we stack the variables indicative of cloud/shadow (`tainted_masks`) along a new dimension called `mask`, and check if `any` of them was `True`. Then we `sum` to know how many pixels were tainted in every time step (`x` and `y` are lost in the process). The last two lines convert the result into a table ([`dataframe`](https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe)) and sort it so that "clearer" observations appear earlier.

In [None]:
tainted_frame = (
    full_cube[tainted_masks]
    .to_dataarray(dim="mask")
    .any(dim="mask")
    .sum(dim=["x", "y"])
    .to_dataframe(name="count_tainted")
    .sort_values(by="count_tainted")
)
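
The same "any mask, then count" logic can be sketched on tiny NumPy arrays. The two masks below are invented for illustration and use a `(time, y, x)` layout, which is an assumption made only for this example.

```python
import numpy as np

# Hypothetical masks with shape (time, y, x) = (2, 2, 2); True marks a flagged pixel
cloud = np.array(
    [[[True, False], [False, False]], [[False, False], [False, False]]]
)
shadow = np.array(
    [[[True, True], [False, False]], [[False, False], [False, False]]]
)

stacked = np.stack([cloud, shadow], axis=0)  # new leading "mask" axis
tainted = stacked.any(axis=0)  # flagged by at least one mask
count_tainted = tainted.sum(axis=(1, 2))  # collapse y and x, keep time

print(count_tainted.tolist())  # [2, 0]
```

A pixel flagged by both masks is counted once, which is why `any` is applied before `sum`.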

To make our "cloudy" visualization we [group by](https://pandas.pydata.org/docs/user_guide/groupby.html) month ([`ME` = month's end](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)) and take the clearest observation. Note that clear**EST** does not necessarily mean **clear**. If all the images for a month are cloudy, so will be the "clearest".

In [None]:
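# tainted_frame is sorted by count, so the first row of each month is the clearest;
# sort_index restores the chronological order afterwards
monthly_clearest = tainted_frame.groupby(pd.Grouper(freq="ME")).head(1).sort_index()
monthly_cube = full_cube.sel(time=monthly_clearest.index.values)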
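
To see why the "clearest" image of a month may still be cloudy, here is a minimal sketch on a hypothetical table of counts. It groups with `to_period("M")` instead of `pd.Grouper`, a substitution made only to keep the example independent of the frequency aliases accepted by a given pandas version.

```python
import pandas as pd

# Hypothetical counts of tainted pixels for five acquisition dates
counts = pd.DataFrame(
    {"count_tainted": [120, 0, 300, 45, 45]},
    index=pd.to_datetime(
        ["2020-01-05", "2020-01-21", "2020-02-06", "2020-02-14", "2020-02-22"]
    ),
)

# Sort so the clearest come first, then keep the first row of every month
ordered = counts.sort_values(by="count_tainted", kind="stable")
clearest = ordered.groupby(ordered.index.to_period("M")).head(1).sort_index()

print(clearest)
```

January keeps a fully clear image (0 tainted pixels), whereas the best February can offer still has 45 tainted pixels.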

To make our cloudless visualization we just take the observations where there are no cloudy/shaded pixels.

In [None]:
fully_clear = tainted_frame[tainted_frame["count_tainted"] == 0]
clear_cube = full_cube.sel(time=fully_clear.index.values)

### Reflectances to common RGB images

To make a consistent visualization we need to take the reflectances of three bands, which will be displayed on the red, green and blue channels. For this purpose it is necessary to [**scale**](https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-scaler) our data from the usual interval of reflectance (0-1) to the standard used in common images (0-255, 8-bit RGB). The function `to_rgb8` takes three bands of our reflectance cube and converts them into a visualization cube in 8-bit RGB. For every value we subtract the target minimum, the value that will become 0 (`vmin`, default is 0), divide by the difference between the target maximum (`vmax`, the value that will become 255) and the target minimum, and multiply the result by 255.

$$
x_{8bit} = 255 * \frac{x - x_{vmin}}{x_{vmax} - x_{vmin}}
$$

The choice of target `vmin/vmax` will depend on the specific bands to display. For example, visible bands are readable when the target maximum is 0.15, whereas many land targets are more reflective in the infrared bands, which need a larger target maximum such as 0.40. Note that not all pixels may fall within this interval. For example, clouds are so reflective that they will show even larger values. To avoid numerical issues we `clip` the data: any stretched value lower than 0 becomes 0 (pure black), and any value larger than 1 becomes 1 (pure white), with clouds and shadows often going to one or the other. Pixels with abnormal values (not-a-number, `NaN`; infinity, `inf`) are replaced as well, displaying them in black.
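
A small numeric sketch of the stretch and clip may help. `stretch_to_8bit` is a simplified helper written for this illustration (it is not the notebook's `to_rgb8`), and the reflectance values are made up.

```python
import numpy as np


def stretch_to_8bit(values, vmax, vmin=0.0):
    """Rescale reflectances so vmin maps to 0 and vmax to 255, then clip."""
    scaled = (np.asarray(values, dtype=float) - vmin) / (vmax - vmin)
    scaled = np.where(np.isfinite(scaled), scaled, 0.0)  # NaN/inf shown as black
    return np.clip(255 * scaled, 0, 255).astype(np.uint8)


# Hypothetical infrared reflectances: negative noise, dark, mid, at vmax, cloud, missing
reflectance = np.array([-0.01, 0.0, 0.04, 0.40, 0.60, np.nan])
print(stretch_to_8bit(reflectance, vmax=0.40).tolist())  # [0, 0, 25, 255, 255, 0]
```

The value at `vmax` and the brighter "cloud" value both saturate at 255, which is why the choice of `vmax` controls how much detail survives in bright targets. The conversion to `uint8` truncates the decimals, so 25.5 becomes 25.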

The last steps are to format the image as 8-bit integers, add the `composite` dimension with the names of the spectral bands we used, and rename the contents of the `band` dimension as `r`, `g` and `b`. The intent of these last two steps is to make the cubes easier to handle during visualization. `plot_rgb` takes any number of visualization cubes (`*args`) and makes one display per value of `composite`, showing the contents of `band` as RGB whilst allowing us to scroll through `time`.

In [None]:
def to_rgb8(cube, r, g, b, vmax, vmin=0):
    selected = cube[[r, g, b]].to_dataarray("band")
    stretched = (selected - vmin) / (vmax - vmin)

    is_valid = np.isfinite(stretched)
    positive = stretched.where(is_valid.all(dim="band"), 0)
    clipped = (
        np.clip(255 * positive, 0, 255)
        .astype(np.uint8)
        .expand_dims({"composite": [f"{r=}, {g=}, {b=}"]})
    )
    clipped["band"] = np.array(["r", "g", "b"], dtype="unicode")

    return clipped


def plot_rgb(*args, dimname="composite"):
    return xr.concat(args, dim=dimname).hvplot.rgb(
        x="x",
        y="y",
        bands="band",
        by=dimname,
        groupby="time",
        subplots=True,
        rasterize=True,
        data_aspect=1,
        xaxis=False,
        yaxis=None,
        widget_location="bottom",
    )

### An image at last!

But there are clouds!

In [None]:
# with clouds
plot_rgb(
    to_rgb8(monthly_cube, r="Red", g="Green", b="Blue", vmax=0.15),
    to_rgb8(monthly_cube, r="NIRnarrow", g="SWIR1", b="Red", vmax=0.40),
)

Both panels depict the same forest area in true color (red, green, blue; on the right) and false color (on the left; NIR on the red channel, SWIR1 on the green and red on the blue). At the start of the year the false color composite looks orange/red, whereas the true color composite looks green. The former is caused by the intense NIR reflection produced by leaf structure, whereas the latter is driven by leaf pigmentation (chlorophyll).

As the year advances the false color composite loses its reddish coloration and turns greener and greener, indicating leaf structure is not so reflective anymore, and a decrease of leaf water content. The true color composite shows a similar pattern, with most areas going from green to brown. Only the valleys retain the same color, possibly because these areas tend to gather water from the surrounding slopes, which would allow vegetation to better withstand a dry season. Around September/October the redness of the false color composite and the greenness of the true color composite start to increase again. This cycle could be explained by the stress induced by a dry season. Another interesting pattern is how the shading created by the terrain varies following the sun declination, an effect that may need to be corrected for some applications.

In [None]:
plot_rgb(
    to_rgb8(clear_cube, r="Red", g="Green", b="Blue", vmax=0.15),
    to_rgb8(clear_cube, r="NIRnarrow", g="SWIR1", b="Red", vmax=0.40),
)

The animation shows not only the natural patterns, but also tree cover losses. At the end of November 2019 we see the appearance of a patch with an unusual color for this time of the year. Even though by March/April of 2020 it has regained its red/green color, it remains paler, which may indicate that shorter vegetation has replaced the trees. This process reappears in May 2021, when the deforested area grows considerably. At first these areas are dissimilar to the surrounding vegetation, and later some regain a reddish/orange (false color) or pale green (true color) tone on the composites. Combined with the spatial pattern of the patches, we could interpret that these areas are being cultivated.

## Satellite pixel time series as tables

### Extract pixel values from our image dataset

Now we will display two points: one that has been deforested, and another that is intact. For this we need to use the coordinates in our point dataset to select the pixels with matching spatial coordinates from the image dataset (that's why it is so important that they are in the same CRS).

To pick the pixel values we first need to ensure the records of the point table have a meaningful identifier ([index](https://pandas.pydata.org/docs/user_guide/indexing.html)), because it will be used to associate the image data with the point that was used during the extraction. In our case we will use the `intact` column (`True/False`) as the point identifier.

In [None]:
points_named = points_matched.set_index("intact")

What we want to do is called ["point-wise indexing"](https://tutorial.xarray.dev/intermediate/indexing/advanced-indexing.html#vectorized-or-pointwise-indexing-in-xarray). It requires that we create one [DataArray](https://docs.xarray.dev/en/latest/user-guide/data-structures.html#dataarray) per dimension (`x` and `y` in this case, both obtained from every point geometry), specifying a shared dimension name (`intact`). Then we use the `.sel` method to pick the points.

In [None]:
points_x = xr.DataArray(points_named.geometry.x, dims="intact")
points_y = xr.DataArray(points_named.geometry.y, dims="intact")

sel_cube = full_cube.sel(x=points_x, y=points_y, method="nearest")

If we display the dimension sizes of our original cube (`full_cube`) and those of our selection (`sel_cube`), we see that `time` has retained the same size, but `intact`, our identifier, has replaced `x` and `y` as dimensions. `intact` associates the pixel values at that point with the identifier of the original point (intact, `True`; deforested, `False`).

In [None]:
dict(full_cube.sizes)

In [None]:
dict(sel_cube.sizes)
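
Conceptually, the nearest-neighbor selection used above looks up, for every point, the grid coordinate closest to it along each axis. A minimal NumPy sketch of that idea follows; the grid and the point are hypothetical, and `nearest_index` is a helper invented for this illustration rather than what xarray runs internally.

```python
import numpy as np

# Hypothetical pixel-center coordinates of a tiny grid with 30 m spacing
x_coords = np.array([500015.0, 500045.0, 500075.0])
y_coords = np.array([8999985.0, 8999955.0, 8999925.0])


def nearest_index(coords, target):
    """Position of the coordinate closest to the target value."""
    return int(np.abs(coords - target).argmin())


point_x, point_y = 500050.0, 8999930.0
col = nearest_index(x_coords, point_x)
row = nearest_index(y_coords, point_y)

print(row, col)  # 2 1
```

If the point and the grid were in different CRSs, the distances would be meaningless and the "nearest" pixel could be anywhere, typically at the edge of the grid.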

Now we can "flatten" our data into a table to simplify data access, and add some auxiliary columns that we will be using during the plotting (`flag`, indicating whether the pixel had any sort of issue; `NDVI` and `NDMI`, normalized difference vegetation and moisture indices, respectively; `DOY`, day of the year).

### Flatten (xarray dataset/"cube" to pandas dataframe/table)

In [None]:
sel_frame = (
    sel_cube.to_dataframe().drop(columns=["spatial_ref", "x", "y"]).reset_index()
)

sel_frame.loc[:, "flag"] = "clear"
sel_frame.loc[sel_frame[tainted_masks[1:]].any(axis=1), "flag"] = "cloud/shadow"
sel_frame.loc[sel_frame[tainted_masks[0]], "flag"] = "adjacent"


def normalized_difference(frame, positive, negative):
    numerator = frame[positive] - frame[negative]
    denominator = frame[positive] + frame[negative]
    return numerator / denominator


sel_frame["NDVI"] = normalized_difference(sel_frame, "NIRnarrow", "Red")
sel_frame["NDMI"] = normalized_difference(sel_frame, "NIRnarrow", "SWIR1")
sel_frame["DOY"] = sel_frame["time"].dt.dayofyear

As a last step we split the table records into two other tables: one for the intact pixel, and another for the deforested one.

In [None]:
deforested_frame = sel_frame[~sel_frame["intact"]]
intact_frame = sel_frame[sel_frame["intact"]]

### A cloudy pixel over time

The widget below shows the behavior of the intact forest pixel over `time` (`Green` band). The color groups the observations `by` `flag`, indicating whether the observation was considered `clear` (green), `cloud/shadow` (black) or close to a cloud/shadow (`adjacent`, orange). If we take a close look at the clear observations (green), we can see that the cycle we mentioned is still there, but is dwarfed by the impact of clouds and shadows. This plot is also meant to underline why most of the time it may be reasonable to mask pixels within a certain distance of a cloud. These could be fine (they were not detected as cloud after all), but even when their value is not extreme it may still be abnormally bright/dark, which is why these pixels may need to be masked as well.

In [None]:
intact_frame.hvplot.scatter(
    x="time", y="Green", by="flag", color=["green", "black", "orange"]
)

### Filter out flagged observations

In [None]:
intact_masked = intact_frame[~intact_frame[tainted_masks].any(axis=1)].copy()
deforested_masked = deforested_frame[
    ~deforested_frame[tainted_masks].any(axis=1)
].copy()

### Missed clouds as spikes

If we plot the band values for the “clear” observations over our forest pixel, we now have a much clearer picture of the temporal behavior. However, that does not necessarily mean that all cloudy or shaded observations have been removed. For example, on the 13th of July 2018 we can see a massive downward spike that is very likely to be a shaded observation.

In [None]:
tetracolor_kwargs = {
    "x": "time",
    "y": ["Blue", "Green", "Red", "NIRnarrow"],
    "color": ["blue", "green", "red", "darkgray"],
}

intact_masked.hvplot(**tetracolor_kwargs)  # .legend(loc="upper left", ncols=4);

A possible solution ([Timesat manual, pages 14-15](https://web.nateko.lu.se/timesat/docs/TIMESAT33_SoftwareManual.pdf)) to de-spike our time series is to calculate the difference between every observation (`central`, time t) and the mean of its immediate neighbors (`prior`, t-1, and `posterior`, t+1, both obtained with the `.shift` method). Because this workflow is based on a single value per observation, our function first sums up all the reflectances. It then discards the observations that deviate the most from their temporal neighbors: the 5% with the lowest (most negative) differences and the 5% with the highest.

In [None]:
def despike(frame, columns, min_spike, max_spike):
    # Collapse all bands into a single value per observation
    summed = frame[columns].sum(axis=1)

    # Align every observation with its neighbors; the first and last
    # observations lack one neighbor and are dropped
    central = summed.iloc[1:-1]
    prior = summed.shift(1).iloc[1:-1]  # value at t-1
    posterior = summed.shift(-1).iloc[1:-1]  # value at t+1

    # Signed distance between each observation and the mean of its neighbors
    spikyness = central - (prior + posterior) / 2

    # Keep only the observations between the two quantile thresholds
    floor, ceiling = spikyness.quantile([min_spike, max_spike])
    selected = central[spikyness.between(floor, ceiling)]

    return frame.loc[selected.index]


cutoff = 0.05
band_names = ["Blue", "Green", "Red", "NIRnarrow", "SWIR1", "SWIR2"]

intact_despiked = despike(intact_masked, band_names, cutoff, 1 - cutoff)
deforested_despiked = despike(deforested_masked, band_names, cutoff, 1 - cutoff)
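
To get a feeling for the score used by the de-spiking step, here is a sketch on a made-up series with one downward and one upward spike. It only computes the signed difference to the neighbor mean; the quantile thresholds are left out.

```python
import pandas as pd

# Hypothetical summed reflectances: index 2 is a "shadow", index 5 a "cloud"
series = pd.Series([0.30, 0.31, 0.10, 0.32, 0.33, 0.60, 0.34, 0.35])

prior = series.shift(1)  # value at t-1
posterior = series.shift(-1)  # value at t+1

# Signed difference to the neighbor mean; the two ends lack a neighbor
spikiness = (series - (prior + posterior) / 2).iloc[1:-1]

print(spikiness.round(3))
print(spikiness.idxmin(), spikiness.idxmax())  # 2 5
```

The spikes get the most extreme scores, but their direct neighbors also receive sizeable values of the opposite sign, because the spike enters their neighbor mean. This is one reason why clear observations next to a spike can be discarded too.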

If we plot the original time series (black) and the de-spiked one (gray) we can see that large spikes have been removed. Note the thresholds should be chosen with great care, as some clear points could be removed as well, which may cause us to miss the peak of the season.

In [None]:
(
    intact_masked.hvplot(x="time", y="NIRnarrow", color="k")
    * intact_despiked.hvplot(x="time", y="NIRnarrow", color="darkgray")
)

### Another view of the temporal behavior

So far we have been plotting using `time` as `x`, and the reflectances as `y`. However, another possibility is to assign `x` and `y` to specific bands and use the point color to represent the position within the yearly cycle (`DOY`, day of the year; 1-365). On the plot below we can see that at the start of the year the NIR value is high (leaf structure reflectance) and red is low (chlorophyll absorption). Over the year NIR decreases and red increases, which could indicate that leaves have been shed if the trees are deciduous (loss of leaf structure and chlorophyll absorption). Past day 250 NIR increases and red decreases, a trend that continues until looping back to the initial position. This plot also shows that some outliers (crossed-out observations) lie very far away from the rest of the distribution, whereas others only seemed abnormal relative to their specific temporal neighborhood.

In [None]:
spike_frame = intact_masked.iloc[1:-1].drop(index=intact_despiked.index)

In [None]:
# DOY = day of the year
(
    intact_masked.hvplot.scatter(x="Red", y="NIRnarrow", c="DOY", colormap="twilight")
    * spike_frame.hvplot.scatter(x="Red", y="NIRnarrow", marker="x", s=90, color="red")
)

### How deforestation may look band by band

Once we have double-checked clouds have been removed, we can compare the time series of a plot where trees have not been disturbed, and another where the trees have been removed. In the case of “intact” (difficult to know if that’s actually the case) we see this nice periodic cycle that we have been describing so far.

In [None]:
intact_despiked.hvplot(**tetracolor_kwargs)

In the deforested plot we see a similar pattern until April 2022, when the behavior starts to change. The NIR reflectance decreases sooner than expected and keeps dropping until reaching the minimum of the time series, possibly due to the removal of the tree canopies (mostly the leaves they hold). Past the minimum, blue, green and red seem to oscillate around higher values, which would be explained by a higher exposure of the soil. Past the downward spike the NIR reflectance seems to follow a new cycle with two peaks per year instead of a single one.

In [None]:
deforested_despiked.hvplot(**tetracolor_kwargs)

### On the indices

Several spectral bands can be combined to highlight the presence of specific targets or behaviors. There is a [massive variety of spectral indices](https://doi.org/10.1038/s41597-023-02096-0) specialized to highlight a wide array of phenomena: [water presence, bare soil, burned area, mineral identification, etc.](https://appliedsciences.nasa.gov/sites/default/files/2025-02/Spectral_Indices_QGIS.pdf). The normalized difference vegetation index (NDVI) is one of the earliest and most famous examples. It uses the reflectances of the NIR and red bands to describe plant health.

$$
NDVI = \frac{NIR - red}{NIR + red}
$$
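
As a quick worked example of the formula, the sketch below evaluates the index for two made-up targets. The reflectances are hypothetical round numbers chosen to resemble dense vegetation and bare soil; they are not taken from the dataset.

```python
def ndvi(nir, red):
    """Normalized difference between NIR and red reflectance."""
    return (nir - red) / (nir + red)


# Hypothetical reflectances
dense_vegetation = ndvi(nir=0.45, red=0.05)  # 0.40 / 0.50
bare_soil = ndvi(nir=0.30, red=0.20)  # 0.10 / 0.50

print(round(dense_vegetation, 2), round(bare_soil, 2))  # 0.8 0.2
```

Because the difference is divided by the sum, the index stays between -1 and 1 for positive reflectances, which makes values comparable across dates with different overall brightness.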

In [None]:
compare_frame = (
    pd.concat({"deforested": deforested_despiked, "intact": intact_despiked}, axis=0)
    .reset_index(names=["history", "index"])
    .drop(columns="index")
)
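
The concatenation above stacks both tables and turns the dictionary keys into a column that labels the origin of every row. A minimal sketch of that idea on two hypothetical tables, using the `names` argument of `pd.concat` as a variation on the same technique:

```python
import pandas as pd

# Hypothetical NDVI values for two pixels
deforested = pd.DataFrame({"NDVI": [0.4, 0.3]})
intact = pd.DataFrame({"NDVI": [0.8, 0.7]})

# The dictionary keys become the outer index level, named "history"
stacked = pd.concat(
    {"deforested": deforested, "intact": intact}, names=["history", "row"]
)

# Move "history" into a regular column and discard the old row numbers
labelled = stacked.reset_index(level="history").reset_index(drop=True)

print(labelled)
```

Having the label as a column is what lets a plotting call split the lines with `by="history"`.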

In the context of our deforestation example we can see the drop in vegetation health and, even if it rises again, it remains at lower values than in the case of the intact plot. The shape of the cycle is also different, with two sharp spikes instead of the plateau of the intact forest, which could be indicative of crop presence with two yearly cropping cycles.

One important takeaway from these plots is that deforestation may alter the spectro-temporal signature of a pixel, but that does not necessarily mean that plant health or phenology will not show up again. And when they do show up again, it is important to understand that this does not mean that the forest has recovered.

In [None]:
compare_frame.hvplot.line(
    x="time", y="NDVI", by="history", color=["black", "limegreen"]
)