# Monitoring loss of tropical forest cover from Sentinel-1 time-series: A CuSum-based approach

* **Products used:** 
[s1_rtc](https://explorer.digitalearth.africa/products/s1_rtc)

## Background

The Cumulative Sum (CuSum) algorithm is a change point detection method based on time-series analysis. It analyses the temporal stability of a signal through the deviation of each observation from the mean of the time series.

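Written out for a single pixel with backscatter values $x_1, \dots, x_N$, the statistic (as computed by the `cumsum_algorithm` function further down) is:

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
S_t = \sum_{i=1}^{t} \left(x_i - \bar{x}\right), \quad t = 1, \dots, N,
$$

$$
A = \max_{t} S_t - \min_{t} S_t .
$$

Because the residuals sum to zero, $S_N = 0$. A persistent shift in the mean, such as a drop in backscatter after forest cutting, makes $S_t$ drift away from zero before returning to it, which produces a large amplitude $A$.
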
The CuSum method can detect any type of variation (slow or abrupt) as long as it affects the trend of the time series. It has been found to be less affected by the seasonal variability of vegetation, and therefore better at detecting abrupt changes in vegetation structure caused by forest cutting [(Ruiz-Ramos et al., 2020)](https://doi.org/10.3390/RS12183061).


## Description
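This notebook demonstrates how to monitor the loss of tropical forest cover from a time series of Sentinel-1 radar backscatter, using the Sentinel-1 Radiometrically Terrain Corrected (`s1_rtc`) product provided by Digital Earth Africa. The notebook shows how to:

1. Load Sentinel-1 VV and VH backscatter over a forest concession in the Democratic Republic of the Congo
2. Apply the CuSum algorithm to compute, for each pixel, the amplitude of the cumulative sum of the residuals
3. Run a bootstrap analysis to estimate a per-pixel Confidence Level for each detected change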

***

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load packages

Import Python packages that are used for the analysis.

In [1]:
#!python -m pip install  mahotas

In [2]:
%matplotlib inline

import math

import datacube
import numpy as np
import pandas as pd
import xarray as xr
#import mahotas
import matplotlib.pyplot as plt
from deafrica_tools.datahandling import load_ard
from deafrica_tools.plotting import display_map

### Connect to the datacube

Connect to the datacube so we can access DE Africa data.
The `app` parameter is a unique name for the analysis which is based on the notebook file name.

In [3]:
dc = datacube.Datacube(app='Forest Monitoring')

### Analysis parameters

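The following parameters define the study area, the analysis period and the data to load:

* `central_lat`, `central_lon`: Centre of the study area in decimal degrees (e.g. `0.92045`, `25.43895`).
* `lat_buffer`, `lon_buffer`: Number of degrees to load on either side of the centre point. Larger values cover a larger area but increase loading time and memory use.
* `measurements`: Sentinel-1 polarisation bands to load (e.g. `["vh", "vv"]`).
* `orbit_direction`: Orbit direction of the acquisitions to load (e.g. `"descending"`). Using a single orbit direction keeps the viewing geometry consistent through the time series.
* `time_range`: Start and end dates of the analysis period.
* `output_crs`, `resolution`: Output projection and pixel size (e.g. `"EPSG:6933"` with 10 m pixels).
* `dask_chunks`: Chunk sizes for lazy loading with Dask (currently not passed to the query).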


In [4]:
# Default study area is the Industrie Forestière du Congo (IFCO) COD 018/11 forest concession (Alibuku).

# Define the central latitude and longitude of the study area.
central_lat = 0.92045
central_lon = 25.43895

# Define the number of degrees to load around the central latitude and longitude. 
lat_buffer = 0.01715
lon_buffer = 0.02675

# Define the study area.
lat_range = (central_lat - lat_buffer, central_lat + lat_buffer)
lon_range = (central_lon - lon_buffer, central_lon + lon_buffer)

measurements = ["vh", "vv"]
orbit_direction = "descending"
time_range = ("2018-01-01", "2020-01-01")
output_crs = "EPSG:6933"
resolution = (-10, 10)
dask_chunks = dict(x=1000, y=1000)

In [5]:
# View the study area
display_map(x=lon_range, y=lat_range)

In [6]:
# Create a datacube query using the analysis parameters.
query = {
    "measurements": measurements,
    'sat_orbit_state': orbit_direction,
    "y": lat_range,
    "x": lon_range,
    "time": time_range,
    "output_crs": output_crs,
    "resolution": resolution,
    #"dask_chunks": dask_chunks, 
}

In [7]:
ds_s1 = load_ard(dc=dc,
                 products=["s1_rtc"],
                 group_by="solar_day",
                 **query)

ds_s1

Using pixel quality parameters for Sentinel 1
Finding datasets
    s1_rtc
Applying pixel quality/cloud mask
Loading 60 time steps


In [8]:
# Pre-processing steps not implemented in this notebook:
# 1. Convert the backscatter values to decibels (dB).
# 2. Speckle filtering using the bilateral filter available in the
#    Python PyRAT toolbox (Reigber et al., 2019) with a 7 × 7 kernel window.
# 3. Haralick texture computation and image selection.

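Step 1 of the list above can be sketched as follows, assuming the `s1_rtc` values are backscatter in linear power units, so that the dB value is 10·log10 of the linear value. The helper name `linear_to_db` is hypothetical. Zero or negative values have no logarithm, so this sketch returns them as NaN.

```python
import numpy as np


def linear_to_db(backscatter):
    """Convert linear-power backscatter to decibels (hypothetical helper).

    Values <= 0 have no logarithm and are returned as NaN.
    """
    backscatter = np.asarray(backscatter, dtype=float)
    # Silence the divide-by-zero / invalid-value warnings raised by log10(0) and log10(<0).
    with np.errstate(divide="ignore", invalid="ignore"):
        db = 10.0 * np.log10(backscatter)
    # Replace the -inf / NaN results from non-positive inputs with NaN.
    return np.where(backscatter > 0, db, np.nan)


print(linear_to_db([1.0, 0.1, 0.01, 0.0]))  # 0, -10 and -20 dB, then nan
```

With the loaded data, the same conversion can be applied directly to an `xarray.DataArray`, for example `10 * np.log10(ds_s1.vh.where(ds_s1.vh > 0))`, which keeps the coordinates.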
## Change detection algorithm

### CuSum algorithm

In [137]:
def cumsum_algorithm(time_series):
    """
    Takes a numpy array time series and applies the Cumulative
    Sum algorithm described in B. Ygorra et al. 2021.

    Last Modified: July 2022

    Parameters
    ----------
    time_series : numpy array
        A 3-dimensional numpy array with dimensions (time, y, x).

    Returns
    -------
    amplitude : numpy array
        A 2-dimensional (y, x) numpy array holding, for each pixel, the
        difference between the maximum and the minimum of the cumulative
        sum of the residuals.
    """
    # Get the mean of the time series over each pixel.
    time_series_mean = np.mean(time_series, axis=0)
    
    # Get the time series residuals.
    residuals = time_series - time_series_mean
    
    # Cumulative sum of the residuals. 
    cumsum_residuals = np.cumsum(residuals, axis=0)
    
    # Determine the maximum and minimum value of the cumulative sum of the residuals.
    max_cumsum_residuals = np.max(cumsum_residuals, axis=0)
    min_cumsum_residuals = np.min(cumsum_residuals, axis=0)
    
    # Compute the amplitude of the time series.
    amplitude = max_cumsum_residuals - min_cumsum_residuals
    
    return amplitude


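def change_detection_cumsum(ds, band):
    """
    Takes an xarray.Dataset time series and applies the Cumulative
    Sum algorithm and bootstrapping analysis described in B. Ygorra et al. 2021.

    Last Modified: July 2022

    Parameters
    ----------
    ds : xarray Dataset
        A multi-dimensional dataset with dimensions (time, y, x).
    band : str
        Band (polarisation) on which to apply the CuSum algorithm, e.g. "vh".

    Returns
    -------
    original_amplitude : numpy array
        2-dimensional (y, x) array of the CuSum amplitude of the original time series.
    confidence_level : numpy array
        2-dimensional (y, x) array of the fraction of bootstraps in which the
        original amplitude was greater than the amplitude of the reordered time series.
    """
    # Check if the xarray.DataArray is a valid time series.
    dimensions = ds[band].sizes

    if "time" not in dimensions or dimensions["time"] == 1:
        raise ValueError('Please pass a valid time series to the "ds" parameter.')

    # Convert the xarray.DataArray to a numpy array.
    # np.array() returns a copy, so the input dataset is never modified.
    time_series = np.array(ds[band].data)

    # Get the number of images in the time series.
    number_of_images = dimensions["time"]

    # Compute the amplitude of the original time series.
    original_amplitude = cumsum_algorithm(time_series)

    # Get the number of bootstraps: there are n! possible orderings of n images,
    # so never run more than that, and cap the total at 1500.
    n_bootstraps = min(math.factorial(number_of_images), 1500)

    # Per-pixel count (n_gj) of the bootstraps in which original_amplitude > amplitude.
    n_gj = np.zeros(original_amplitude.shape, dtype=int)

    rng = np.random.default_rng()
    for _ in range(n_bootstraps):
        # Reorder whole images along the time axis and recompute the amplitude.
        shuffled_time_series = time_series[rng.permutation(number_of_images)]
        amplitude = cumsum_algorithm(shuffled_time_series)
        # The comparison is done pixel by pixel on the (y, x) arrays.
        n_gj += original_amplitude > amplitude

    # Confidence Level: fraction of bootstraps in which original_amplitude > amplitude.
    confidence_level = n_gj / n_bootstraps

    return original_amplitude, confidence_level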

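To build intuition for the amplitude returned by `cumsum_algorithm`, here is a minimal sketch on a toy (time, y, x) cube. `cusum_amplitude` is a hypothetical stand-alone copy of the same steps so that the example runs on its own, and the values are synthetic. Both pixels have the same mean, but the one with a single abrupt drop accumulates residuals of the same sign for longer and therefore has the larger amplitude.

```python
import numpy as np


def cusum_amplitude(time_series):
    """CuSum amplitude along the time axis (axis 0), same steps as cumsum_algorithm."""
    residuals = time_series - np.mean(time_series, axis=0)
    cumsum_residuals = np.cumsum(residuals, axis=0)
    return np.max(cumsum_residuals, axis=0) - np.min(cumsum_residuals, axis=0)


# Toy (time, y, x) cube: 4 dates over a 1 x 2 pixel grid.
# Pixel 0 reads 1, 1, 0, 0 (abrupt drop); pixel 1 reads 1, 0, 1, 0 (alternating).
toy = np.array([
    [[1.0, 1.0]],
    [[1.0, 0.0]],
    [[0.0, 1.0]],
    [[0.0, 0.0]],
])

print(cusum_amplitude(toy))  # amplitude 1.0 for the drop, 0.5 for the alternating pixel
```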
In [203]:
band = "vh"
ds = ds_s1

In [204]:
# Check if the xarray.DataArray is a valid time series.
dimensions = ds[band].sizes

if "time" not in dimensions or dimensions["time"] == 1:
    raise ValueError('Please pass a valid time series to the "ds" parameter.')

# Convert the xarray.DataArray to a numpy array.
# np.array() returns a copy, so reordering the images below does not modify ds_s1.
time_series = np.array(ds[band].data)

# Get the number of images in the time series.
number_of_images = dimensions["time"]

# Compute the original amplitude of the time series.
original_amplitude = cumsum_algorithm(time_series)

# The bootstrap consists of applying CuSum to a randomly reordered backscatter time series n_bootstraps times.
# There are n! possible orderings of n images, so the number of bootstraps is min(n!, 1500).
n_bootstraps = min(math.factorial(number_of_images), 1500)

# Per-pixel count of the bootstraps in which amplitude_difference > 0.
n_gj = np.zeros(original_amplitude.shape, dtype=int)

rng = np.random.default_rng()

for n in range(n_bootstraps):
    # Randomly reorder the original backscatter time series, thus modifying the temporal order.
    # Indexing with a permutation of the time indices reorders whole images along the first (time) axis.
    shuffled_time_series = time_series[rng.permutation(number_of_images)]
    # Apply the CuSum method to the reordered time series.
    amplitude = cumsum_algorithm(shuffled_time_series)
    # Compute the difference in amplitude between the original and the reordered time series.
    # amplitude_difference is a (y, x) array, so the comparison is done pixel by pixel.
    amplitude_difference = original_amplitude - amplitude

    # Where amplitude_difference > 0, original_amplitude > amplitude:
    # the original amplitude is affected by the temporal order of the images.
    # Where amplitude_difference <= 0, it is not.
    n_gj += amplitude_difference > 0

# n_gj, the number of times amplitude_difference > 0, is an indirect measure of the sequence effect
# in the backscatter time series and a sensitivity parameter that intervenes in the computation
# of the Confidence Level.

# The Confidence Level is the ratio of bootstraps in which the original backscatter time series
# presents original_amplitude > amplitude to the total number of bootstraps.
# A critical threshold value (Tc) can be set as a Confidence Level over which the change point
# is considered valid by the bootstrap analysis.
cl = n_gj / n_bootstraps

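The Confidence Level comment above mentions a critical threshold Tc. A minimal sketch of turning the per-pixel `cl` array into a labelled change map, assuming a hypothetical Tc of 0.95 (no value is given here). The toy `band_da` and `cl` arrays stand in for `ds[band]` and the `cl` computed above; with the real data the map would be `ds[band].isel(time=0, drop=True).copy(data=cl)`.

```python
import numpy as np
import xarray as xr

# Toy stand-ins for ds[band] (time, y, x) and the per-pixel `cl` array.
band_da = xr.DataArray(
    np.zeros((3, 2, 2)),
    dims=("time", "y", "x"),
    coords={"time": [0, 1, 2], "y": [10.0, 0.0], "x": [0.0, 10.0]},
)
cl = np.array([[0.99, 0.40], [0.96, 0.10]])

# Reuse the spatial coordinates of one time step for the confidence-level map.
cl_da = band_da.isel(time=0, drop=True).copy(data=cl).rename("confidence_level")

tc = 0.95  # hypothetical critical threshold Tc
change = cl_da >= tc  # True where the change point is considered valid

print(int(change.sum()))  # 2 pixels flagged as change
```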
In [206]:
amplitude_difference.shape

(438, 517)

In [188]:
a = np.asarray([[[1,2,3],
                 [4,5,6]],
               [[7,8,9],
                 [10,11,12]],
               [[13,14,15],
                 [16,17,18]]])

a

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]],

       [[13, 14, 15],
        [16, 17, 18]]])

In [189]:
a.shape

(3, 2, 3)

In [190]:
np.random.shuffle(a)

In [191]:
a

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[13, 14, 15],
        [16, 17, 18]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [184]:
arr = np.arange(9).reshape((3, 3))
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [185]:
arr.shape

(3, 3)

In [187]:
np.random.shuffle(arr)
arr

array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])
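The cells above show that `np.random.shuffle` reorders a multi-dimensional array along its first axis only, in place. Because `ds[band].data` returns the array backing the dataset, shuffling it in place also reorders the loaded data. A minimal sketch of an alternative that reorders the time axis without touching the input, using `numpy.random.Generator.permutation` and integer indexing on a toy (time, y, x) cube:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (time, y, x) cube standing in for ds[band].data.
original = np.arange(12).reshape(3, 2, 2)
backup = original.copy()

# rng.permutation(n) returns a shuffled copy of range(n); indexing with it
# reorders whole images along the time axis and returns a new array.
order = rng.permutation(original.shape[0])
shuffled = original[order]

print(order)
print(np.array_equal(original, backup))  # True: the input is left untouched
```

Each time slice moves as a whole, so every pixel keeps its own set of values and only their temporal order changes, which is what the bootstrap requires.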

In [None]:
# A bootstrap analysis based on the amplitude is conducted to validate or invalidate each detected change.
# There is no global threshold on the magnitude of the change:
# the threshold is computed individually for each pixel, as the mean value of the time series may change from one pixel to another.
# The bootstrap analysis is a means to check the validity of the detected change through
# an indirect measure of the impact of the order sequence on the time series.

# The bootstrap consists of applying CuSum to a randomly reordered backscatter time series
# n_bootstraps times and comparing the amplitude of the residuals of each reordered series
# with the original amplitude of the residuals.

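The bootstrap described in the comments above can be illustrated on a single synthetic pixel. This is a minimal sketch under stated assumptions: the helper names `cusum_amplitude` and `bootstrap_confidence_level`, the seeds and the synthetic backscatter values are hypothetical; only the 1500-bootstrap cap and the `original_amplitude > amplitude` test mirror the notebook. A series with an abrupt drop should give a Confidence Level close to 1, while for a stable, noise-only series the original ordering is no more special than a random one, so its Confidence Level has no consistent tendency towards 1.

```python
import math

import numpy as np


def cusum_amplitude(series):
    """CuSum amplitude of a 1D series: max minus min of the cumulative residuals."""
    cumsum_residuals = np.cumsum(series - series.mean())
    return cumsum_residuals.max() - cumsum_residuals.min()


def bootstrap_confidence_level(series, max_bootstraps=1500, seed=0):
    """Fraction of random reorderings whose amplitude is below the original one."""
    rng = np.random.default_rng(seed)
    # There are n! distinct orderings of n images; never run more bootstraps than that.
    n_bootstraps = min(math.factorial(len(series)), max_bootstraps)
    original_amplitude = cusum_amplitude(series)
    n_gj = 0
    for _ in range(n_bootstraps):
        # rng.permutation returns a reordered copy; `series` itself is unchanged.
        if original_amplitude > cusum_amplitude(rng.permutation(series)):
            n_gj += 1
    return n_gj / n_bootstraps


# Synthetic single-pixel backscatter series (30 dates, linear scale).
noise = np.random.default_rng(42).normal(0.0, 0.005, size=30)
stable = 0.06 + noise
cleared = np.where(np.arange(30) < 15, 0.06, 0.03) + noise  # drop half-way through

print(bootstrap_confidence_level(cleared))  # close to 1
print(bootstrap_confidence_level(stable))   # no consistent tendency towards 1
```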


***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Compatible datacube version:** 

In [None]:
print(datacube.__version__)

**Last Tested:**

In [None]:
from datetime import datetime
datetime.today().strftime('%Y-%m-%d')