![HydroSAR Banner](./NotebookAddOns/HydroSARbanner.jpg)

# Exploring SAR Time Series Data for Flood Monitoring

**Franz J Meyer; University of Alaska Fairbanks**

<img style="padding:7px;" src="../Master/NotebookAddons/UAFLogo_A_647.png" width="170" align="right" />

This notebook introduces you to the time series signatures associated with flooding. The data analysis is done in the framework of *Jupyter Notebooks*. The Jupyter Notebook environment is easy to launch in any web browser for interactive data exploration with provided or new training data. Notebooks combine text written in Markdown (including LaTeX-style mathematical equations) with executable Python code. Another advantage of Jupyter Notebooks is that they can easily be expanded, changed, and shared as new data sets or new time series steps become available. Therefore, they provide an excellent basis for collaborative and repeatable data analysis.

**This notebook covers the following data analysis concepts:**

- How to load time series stacks into Jupyter Notebooks and how to explore image content using basic functions such as mean value calculation and histogram analysis.
- How to extract time series information for individual pixels of an image.
- Typical time series signatures over flooded areas.

**Important Notes about JupyterHub**

Your JupyterHub server will automatically shut down when left idle for more than 1 hour. Your notebooks will not be lost, but you will have to restart their kernels and re-run them from the beginning. You will not be able to seamlessly continue running a partially run notebook.

In [None]:
import url_widget as url_w
notebookUrl = url_w.URLWidget()
display(notebookUrl)

In [None]:
from IPython.display import Markdown
from IPython.display import display

notebookUrl = notebookUrl.value
user = !echo $JUPYTERHUB_USER
env = !echo $CONDA_PREFIX
if env[0] == '':
    env[0] = 'Python 3 (base)'
if env[0] != '/home/jovyan/.local/envs/rtc_analysis':
    display(Markdown(f'<text style=color:red><strong>WARNING:</strong></text>'))
    display(Markdown(f'<text style=color:red>This notebook should be run using the "rtc_analysis" conda environment.</text>'))
    display(Markdown(f'<text style=color:red>It is currently using the "{env[0].split("/")[-1]}" environment.</text>'))
    display(Markdown(f'<text style=color:red>Select the "rtc_analysis" from the "Change Kernel" submenu of the "Kernel" menu.</text>'))
    display(Markdown(f'<text style=color:red>If the "rtc_analysis" environment is not present, use <a href="{notebookUrl.split("/user")[0]}/user/{user[0]}/notebooks/conda_environments/Create_OSL_Conda_Environments.ipynb"> Create_OSL_Conda_Environments.ipynb </a> to create it.</text>'))
    display(Markdown(f'<text style=color:red>Note that you must restart your server after creating a new environment before it is usable by notebooks.</text>'))

## Importing Relevant Python Packages

In this notebook we will use the following scientific libraries:

- [GDAL](https://www.gdal.org/) is a software library for reading and writing raster and vector geospatial data formats. It includes a collection of programs tailored for geospatial data processing. Most modern GIS systems (such as ArcGIS or QGIS) use GDAL in the background.
- [NumPy](http://www.numpy.org/) is one of the principal packages for scientific applications of Python. It is intended for processing large multidimensional arrays and matrices, and an extensive collection of high-level mathematical functions and implemented methods makes it possible to perform various operations with these objects.
- [Matplotlib](https://matplotlib.org/index.html) is a low-level library for creating two-dimensional diagrams and graphs. With its help, you can build diverse charts, from histograms and scatterplots to non-Cartesian coordinates graphs. Moreover, many popular plotting libraries are designed to work in conjunction with matplotlib.

In [None]:
%%capture
from pathlib import Path
from math import ceil

from osgeo import gdal # for GetRasterBand, Open, ReadAsArray
gdal.UseExceptions()
import numpy as np #for log10, mean, percentile, power
import pyproj # for Proj, transform

%matplotlib widget
import matplotlib.pyplot as plt # for add_subplot, axis, figure, imshow, legend, plot, set_axis_off, set_data,
                                # set_title, set_xlabel, set_ylabel, set_ylim, subplots, title, twinx
import matplotlib.patches as patches  # for Rectangle
import matplotlib.animation as an # for FuncAnimation

from matplotlib import rc 
from IPython.display import HTML
plt.rcParams.update({'font.size': 12})

import opensarlab_lib as asfn
asfn.jupytertheme_matplotlib_format()

## Load Data Stack

<img src="https://cdnuploads.aa.com.tr/uploads/Contents/2019/07/23/thumbs_b_c_c3d0986dd192f88d3a3289793ed4d3e6.jpg" width="450" style="padding:5px;" align="right" /> 

This notebook allows for the analysis of two recent flooding events: 
    
1. **2020 South Asia Monsoon Floods:** If run without changes, the notebook uses a Sentinel-1 data stack (VV only) acquired throughout a 2020 flooding event affecting Eastern India, Nepal, and Bangladesh. The data set is a subset of a Sentinel-1 SAR time series acquired near the city of Malda, West Bengal, India. The time series covers June to August of 2020 and combines ascending and descending RTC imagery into a joint and consistent time series to monitor this rapidly developing event.
    
1. **2016/2017 Flooding in Ecuador:** Alternatively, you can change the event flag in the code cell below to load a SAR data stack for an area north of Guayaquil, Ecuador. Ecuador and other countries in western South America experienced widespread **flooding** during the 2016-2017 winter. [Guayaquil in Guayas](https://www.eluniverso.com/noticias/2017/03/01/nota/6068059/arboles-se-caen-medio-torrencial-lluvia-ayer) was among the affected regions, as precipitation in March 2017 was well above average. The increased precipitation was associated with a Coastal Niño event. The data provided allow you to study the extent and progression of the flooding in 2016-2017 just north of Guayaquil. To analyze this event, please change the `flevent` flag from `flevent = 1` to `flevent = 2` in the code cell below.

If you want, **you can change the data set to be analyzed** in the code cell below by changing the `flevent` flag from `flevent = 1` (2020 South Asia Monsoon Floods) to `flevent = 2` (2016/17 Ecuador Flooding).

In [None]:
# Pick Dataset to Analyze
flevent = 1      # Options: 1 - 2020 Bangladesh & Eastern India Event   |    2 - Guayaquil flood of 2016-2017

Before we get started, let's first **create a working directory for this analysis and change into it:**

In [None]:
name = "Bangladesh" if flevent == 1 else "flood"
path = Path(f"/home/jovyan/notebooks/SAR_Training/English/HydroSAR/{name}")

# Create the directory (and any missing parents); do nothing if it already exists
path.mkdir(parents=True, exist_ok=True)

We will **retrieve the relevant data** from an [Amazon Web Services (AWS)](https://aws.amazon.com/) cloud storage bucket, **unzip the files** (overwriting previous extractions), and **clean up after ourselves:**

In [None]:
time_series_path = f"s3://asf-jupyter-data-west/{name}.tar.gz"
time_series = Path(time_series_path).name
!aws --region=us-west-2 --no-sign-request s3 cp $time_series_path $time_series
!tar -xvzf {name}.tar.gz -C {path}

if Path(f'{name}.tar.gz').exists():
    Path(f'{name}.tar.gz').unlink()

## Define Data Directory and Path to VRT

The following code cells **create a variable containing the VRT filename and the image acquisition dates.** To do that, we first **define some functions** we will use in the following code cells.

In [None]:
def get_dates(path):
    pths = list(path.parent.rglob(path.name))
    pths.sort()
    dates = []
    for pth in pths:
        date = pth.name.split('T')[1].split('_')[1]
        date = f"{date[:4]}-{date[4:6]}-{date[6:]}"
        dates.append(date)
    return dates

def get_dates_sub(path):
    pths = list(path.parent.rglob(path.name))
    pths.sort()
    dates = []
    for pth in pths:
        date = pth.name.split('_')[0]
        date = f"{date[:4]}-{date[4:6]}-{date[6:]}"
        dates.append(date)
    return dates

Now we **print the image acquisition dates** to give you an idea of the time span that is covered by our data and of the temporal sampling the SAR sensor achieved.

In [None]:
tiff_paths = path/f"tiffsflood/*.tif*"

dates = get_dates_sub(tiff_paths) if flevent == 1 else get_dates(tiff_paths)
print(dates)
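
To put a number on the temporal sampling, the gaps between consecutive acquisitions can be computed from the date strings. The sketch below is only an illustration: the `example_dates` list is hypothetical (not taken from the real stack), and `sampling_intervals` is a helper name introduced here. It assumes the same `YYYY-MM-DD` format that `get_dates` and `get_dates_sub` return.

```python
from datetime import date

# Hypothetical acquisition dates in the 'YYYY-MM-DD' format returned by
# get_dates() and get_dates_sub(); not taken from the real stack.
example_dates = ["2020-06-03", "2020-06-09", "2020-06-15", "2020-06-27"]

def sampling_intervals(date_strings):
    """Return the gaps, in days, between consecutive acquisitions."""
    parsed = sorted(date.fromisoformat(d) for d in date_strings)
    return [(later - earlier).days for earlier, later in zip(parsed, parsed[1:])]

intervals = sampling_intervals(example_dates)
print(intervals)  # → [6, 6, 12]
```

In the notebook itself, `sampling_intervals(dates)` would give the same kind of list for the loaded stack.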

Finally, we create a GDAL **virtual raster (VRT)** file, which holds the paths to all images in our stack. VRTs are exceptionally useful for handling and managing deep stacks of image data.

In [None]:
polarization = 'VV'
if flevent == 1:
    vrtcommand = f"gdalbuildvrt -separate {path}/stack{name}_{polarization}.vrt {tiff_paths}"
    !{vrtcommand}
image_file = path/f"stack{name}_{polarization}.vrt"

---

## Data Exploration with an Animation

To create an animation of all images in the time series stack, we first have to **read all image data** into memory:

In [None]:
img = gdal.Open(str(image_file))
band = img.GetRasterBand(1)
raster0 = band.ReadAsArray()
band_number = 0 # Needed for updates
rasterstack = img.ReadAsArray()

For visualization, we often **transform the data into a decibel (dB) space.** This is useful due to the enormous brightness difference between the darkest and brightest pixels in a SAR image. This high dynamic range is often difficult to visualize in a linear amplitude or power scale. A dB transformation compresses the dynamic range and improves the appearance of the images for visualization purposes.

In [None]:
use_dB = True

def convert(raster, use_dB=use_dB):
    # some Python trickery: 
    # if you call the convert function later, you can set the keyword 
    # argument use_dB to True or False
    # if you do not provide a keyword argument, the value that you set
    # above (when defining the function) is used
    if use_dB:
        return 10 * np.log10(raster)
    else:
        return raster
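
To see how strongly the dB transformation compresses the dynamic range, consider the small sketch below. The three backscatter values are made up for illustration (they are not read from the stack); the conversion itself is the same `10 * log10` formula used in `convert`.

```python
import numpy as np

# Hypothetical linear-power backscatter values (not from the real stack):
# a very dark, a moderate, and a very bright pixel.
power = np.array([0.001, 0.1, 10.0])

# Same formula as convert() with use_dB=True
power_db = 10 * np.log10(power)

linear_ratio = power.max() / power.min()      # spread on the linear scale
db_spread = power_db.max() - power_db.min()   # spread on the dB scale

print(power_db)
print(linear_ratio, db_spread)
```

Values that differ by a factor of 10,000 in linear power are only 40 dB apart, which is why dB images are much easier to display with a single gray scale.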

Let's create an **animation** to get an idea of where and when flooding might have occurred.

In [None]:
%%capture 
figani = plt.figure(figsize=(12, 7))
axani = figani.subplots()
axani.axis('off')

rasterstack_ = convert(rasterstack)

mask = np.isnan(rasterstack_)
rasterstack_[mask] = np.interp(np.flatnonzero(mask), np.flatnonzero(~mask), rasterstack_[~mask])

imani = axani.imshow(rasterstack_[0,...], cmap='gray', vmin=np.nanpercentile(rasterstack_, 1), 
               vmax=np.nanpercentile(rasterstack_, 99))
axani.set_title(dates[0])  # title of the first frame; animate() updates it per frame

def animate(i):
    axani.set_title(dates[i])
    imani.set_data(rasterstack_[i,...])

# Interval is given in milliseconds
ani = an.FuncAnimation(figani, animate, frames=rasterstack_.shape[0], interval=300)
rc('animation', embed_limit=40971520.0)  # embed_limit is given in MB; raise it so the entire animation can be embedded

**Render**

In [None]:
HTML(ani.to_jshtml())

<div class="alert alert-success">
<font face="Calibri" size="5"> <b> <font color='rgba(200,0,0,0.2)'> <u>EXERCISE</u>:  </font> Backscatter dynamics</b> </font>

<font face="Calibri" size="3"> What is the most striking change during the flood period (for example, February 2017 in the Ecuador stack)? Can you explain why the backscatter changes the way it does during that period?
</font>
</div>

## Create Minimum Image to Identify Inundated Areas

As flooding is often associated with very low backscatter, we first compute the minimum backscatter for each pixel to get a first impression of areas that could have been flooded at some point during the observation period.

The following cell **calculates the minimum backscatter per pixel** across the time series:

In [None]:
np.seterr(divide='ignore')
np.seterr(invalid='ignore')
rasterstack_masked = np.ma.masked_where(rasterstack==0, rasterstack)
temporal_min = np.nanmin(convert(rasterstack_masked), axis=0)
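
The idea behind the minimum image can be illustrated on a tiny synthetic stack. All values below are made up, and the sketch uses masked-array methods (`np.ma.log10`, `.min`, `.argmin`) rather than the exact calls in the cell above, so treat it as a variation under those assumptions. It also records on which date each pixel was darkest, which the notebook cell does not do.

```python
import numpy as np

# Synthetic stack of 3 dates x 2 rows x 2 columns in linear power.
# All values are invented for illustration; 0 marks no-data.
stack = np.array([
    [[0.10, 0.12], [0.09, 0.0]],
    [[0.10, 0.01], [0.08, 0.0]],   # pixel (0, 1) is dark on the second date
    [[0.11, 0.13], [0.01, 0.0]],   # pixel (1, 0) is dark on the third date
])

masked = np.ma.masked_where(stack == 0, stack)   # hide no-data pixels
stack_db = 10 * np.ma.log10(masked)              # dB conversion, mask preserved
min_db = stack_db.min(axis=0)                    # darkest value per pixel
min_date_index = stack_db.argmin(axis=0)         # index of the darkest date

print(min_db)
print(min_date_index)
```

A pixel that is dark on only one date still shows up as dark in `min_db`, which is what makes the minimum image a useful first look at possible inundation.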

Now we **write a class to create an interactive plot** from which we can select interesting image locations for time series analysis.

In [None]:
class pixelPicker:
    def __init__(self, image, width, height):
        self.x = None
        self.y = None
        self.fig = plt.figure(figsize=(width, height))
        self.ax = self.fig.add_subplot(111, visible=False)
        self.rect = patches.Rectangle(
            (0.0, 0.0), width, height, 
            fill=False, clip_on=False, visible=False)
        self.rect_patch = self.ax.add_patch(self.rect)
        self.cid = self.rect_patch.figure.canvas.mpl_connect('button_press_event', 
                                                             self)
        self.image = image
        self.plot = self.gray_plot(self.image, fig=self.fig, return_ax=True)
        self.plot.set_title('Select a Point of Interest')
        
        
    def gray_plot(self, image, vmin=None, vmax=None, fig=None, return_ax=False):
        '''
        Plots an image in grayscale.
        Parameters:
        - image: 2D array of raster values
        - vmin: Minimum value for colormap
        - vmax: Maximum value for colormap
        - return_ax: Option to return plot axis
        '''
        if vmin is None:
            vmin = np.nanpercentile(self.image, 1)
        if vmax is None:
            vmax = np.nanpercentile(self.image, 99)
        ax = fig.add_axes([0.1,0.1,0.8,0.8])
        ax.imshow(image, cmap=plt.cm.gist_gray, vmin=vmin, vmax=vmax)
        if return_ax:
            return(ax)
        
    
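    def __call__(self, event):
        print('click', event)
        # Clicks outside the image axes carry no data coordinates; ignore them
        if event.xdata is None or event.ydata is None:
            return
        self.x = event.xdata
        self.y = event.ydata
        # Remove the previous marker, then draw the new one on the image axes
        for pnt in self.plot.get_lines():
            pnt.remove()
        self.plot.plot(self.x, self.y, 'ro')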

Now we are ready to plot the minimum image. 

**Click a point of interest for which you want to analyze radar brightness over time.**

In [None]:
fig_xsize = 7.5
fig_ysize = 7.5
my_plot = pixelPicker(temporal_min.data, fig_xsize, fig_ysize)

**Save the selected coordinates:**

In [None]:
sarloc = (ceil(my_plot.x), ceil(my_plot.y))
print(sarloc)

## Plot SAR Brightness Time Series at Point Locations

We will pick a pixel location identified in the SAR image above and plot the time series for this identified point. By focusing on image locations that were inundated, we should see the changes in radar backscatter related to the flooding event.

First, we open an image handle for the stack and retrieve its projection and georeferencing information. We also define a function for mapping image pixels to a geographic projection.

In [None]:
img_handle = gdal.Open(str(image_file))
geotrans = img_handle.GetGeoTransform()
proj = img_handle.GetProjection()
xsize = img_handle.RasterXSize
ysize = img_handle.RasterYSize
bands = img_handle.RasterCount
projlatlon = pyproj.Proj('EPSG:4326') # WGS84
projstring = proj.split('[')[-1][:-2].split(',')[-1][1:-1]
projimg = pyproj.Proj(f'EPSG:{projstring}')

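def geolocation(x, y=None, latlon=True):
    # Accept either geolocation((col, row)) or geolocation(col, row)
    if y is None:
        x, y = x
    # Map pixel indices to projected coordinates (assumes a north-up image,
    # i.e. the rotation terms geotrans[2] and geotrans[4] are zero)
    ref_x = geotrans[0] + x * geotrans[1]
    ref_y = geotrans[3] + y * geotrans[5]
    if latlon:
        transformer = pyproj.Transformer.from_crs(int(projstring), 4326, always_xy=True)
        lon, lat = transformer.transform(ref_x, ref_y)
        return (lat, lon)  # same (latitude, longitude) order as before
    return (ref_x, ref_y)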
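
The pixel-to-map conversion relies on GDAL's six-element geotransform. The sketch below works through it with a hypothetical geotransform (the numbers are invented, not read from the VRT), and `pixel_to_map` is a helper name introduced only for this illustration. Unlike the function above, it also includes the two rotation terms, which are zero for north-up images.

```python
# Hypothetical GDAL-style geotransform for a north-up image:
# (x of upper-left corner, pixel width, row rotation,
#  y of upper-left corner, column rotation, pixel height (negative))
example_geotrans = (500000.0, 30.0, 0.0, 2800000.0, 0.0, -30.0)

def pixel_to_map(col, row, gt):
    """Map a (col, row) pixel index to projected x/y using all six terms."""
    map_x = gt[0] + col * gt[1] + row * gt[2]
    map_y = gt[3] + col * gt[4] + row * gt[5]
    return map_x, map_y

print(pixel_to_map(0, 0, example_geotrans))      # → (500000.0, 2800000.0)
print(pixel_to_map(100, 200, example_geotrans))  # → (503000.0, 2794000.0)
```

Moving 100 columns to the right adds 3000 m to x, and moving 200 rows down subtracts 6000 m from y because the pixel height is negative.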

Now, let's **pick a rectangle around a center pixel defined in variable *sarloc...***

In [None]:
extent = (5, 5) # choose a 5 by 5 rectangle
latlon = True # if False: return utm coordinates

refsarloc = geolocation(sarloc, latlon=latlon)
projsymbol = '°' if latlon else 'm'

... and **extract the time series** for this small area around the selected center pixel in a memory-efficient way (needed for larger stacks):

In [None]:
plt.rcParams.update({'font.size': 9})
bs_aggregated = []
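# Center the window on sarloc, clamped so that it stays inside the image
xoff = min(max(sarloc[0] - extent[0] // 2, 0), xsize - extent[0])
yoff = min(max(sarloc[1] - extent[1] // 2, 0), ysize - extent[1])
for band in range(bands):
    # Read only the small window from each band instead of the full image
    rs = img_handle.GetRasterBand(band+1).ReadAsArray(xoff, yoff, extent[0], extent[1])
    rs_mean = convert(np.nanmean(rs))  # average in linear power, then convert
    bs_aggregated.append(rs_mean)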

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
labeldB = 'dB' if use_dB else 'linear'
ax.plot(dates, bs_aggregated, color='k', marker='o', markersize=3)
if use_dB:
    ax.set_ylim((-28, 6))  # fixed range in dB; linear values keep autoscaling
ax.set_xlabel('Date')
plt.xticks(rotation=90)
plt.gcf().subplots_adjust(bottom=0.25)
ax.set_ylabel(rf'Sentinel-1 $\gamma^0$ [{labeldB}]')  # raw string keeps the LaTeX backslash intact

plt.grid()
_ = fig.suptitle(f'Location: {refsarloc[0]:.3f}{projsymbol} {refsarloc[1]:.3f}{projsymbol}')
# fig.tight_layout() 
figname = "RCSTimeSeries-" + f'{refsarloc[0]:.3f}{projsymbol} {refsarloc[1]:.3f}{projsymbol}' + '.png'
plt.savefig(path/figname, dpi=300, transparent=True)
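
The cell above reads a small window from disk for each band. When a stack is already in memory, the same kind of time series can be sketched with NumPy slicing. The example below is a minimal illustration under stated assumptions: the stack is synthetic, `window_mean_db` is a hypothetical helper, and the image is assumed to be at least as large as the window. As in the notebook, values are averaged in linear power before conversion to dB.

```python
import numpy as np

def window_mean_db(stack, col, row, size=5):
    """Mean backscatter in dB per date for a size x size window centered on (col, row)."""
    half = size // 2
    n_rows, n_cols = stack.shape[1:]
    # Clamp the window so that it stays inside the image
    r0 = min(max(row - half, 0), n_rows - size)
    c0 = min(max(col - half, 0), n_cols - size)
    window = stack[:, r0:r0 + size, c0:c0 + size]
    # Average in linear power first, then convert to dB
    mean_power = np.nanmean(window, axis=(1, 2))
    return 10 * np.log10(mean_power)

# Synthetic stack (4 dates, 20 x 20 pixels) in linear power; values are invented.
stack = np.full((4, 20, 20), 0.1)
stack[2] = 0.01   # a uniformly dark third date, as open water would appear

series = window_mean_db(stack, col=10, row=10)
print(series)
```

The third date drops by 10 dB relative to the others, the kind of sharp decrease that open water typically produces in a backscatter time series.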

<div class="alert alert-success">
<font face="Calibri" size="5"> <b> <font color='rgba(200,0,0,0.2)'> <u>EXERCISE</u>:  </font> Explore Time Series at Different Point Locations </b> </font>

<font face="Calibri" size="3"> Can you interpret and attribute the changes at various locations? Apart from the flooding, what other patterns do you observe?
</font>
</div>

---
**Version Log**

*Lab1-ExploreSARTimeSeries.ipynb - Version 1.7.1 - November 2021*

*Version Changes*

- *`os` modules and obsolete `asfn` methods replaced with `pathlib` counterparts*
- *Removed several redundancies in regards to flevent*
- *html -> markdown*
- *url_widget*
- *asf_notebook -> opensarlab_lib*