<img src="NotebookAddons/blackboard-banner.jpg" width="100%" />
<font face="Calibri">
<br>
<font size="7"> <b> GEOS 657: Microwave Remote Sensing</b> </font>

<font size="5"> <b>Lab 8: Change Detection in SAR Amplitude Time Series Data <font color='rgba(200,0,0,0.2)'> -- [20 Points] </font> </b> </font>

<br>
<font size="4"> <b> Franz J Meyer; University of Alaska Fairbanks & Josef Kellndorfer, <a href="http://earthbigdata.com/" target="_blank">Earth Big Data, LLC</a> </b> <br>
<img src="NotebookAddons/UAFLogo_A_647.png" width="170" align="right" /><font color='rgba(200,0,0,0.2)'> <b>Due Date: </b> April 23, 2019 </font>
</font>

<font size="3"> This Lab is part of the UAF course <a href="https://radar.community.uaf.edu/" target="_blank">GEOS 657: Microwave Remote Sensing</a>. It introduces you to the methods of change detection in deep multi-temporal SAR image data stacks. Specifically, the lab applies the method of <i>Cumulative Sums</i> to perform change detection in a 60-image-deep Sentinel-1 data stack over Niamey, Niger.  

As previously, the work will be done within the framework of a Jupyter Notebook. <br>

<b>In this chapter we introduce the following data analysis concepts:</b>

- The concepts of time series slicing by month, year, and date.
- The concepts and workflow of Cumulative Sum-based change point detection.
- The identification of change dates for each identified change point.
</font>

<font size="4"> <font color='rgba(200,0,0,0.2)'> <b>THIS NOTEBOOK INCLUDES FOUR HOMEWORK ASSIGNMENTS.</b></font> Complete all four assignments to achieve full score. </font> <br>
<font size="3"> To submit your homework, please download your Jupyter Notebook from the server both as PDF (*.pdf) and Notebook file (*.ipynb) and submit them as a ZIP bundle via Blackboard or email (to fjmeyer@alaska.edu). To download, please select the following options in the main menu of the notebook interface:

<ol type="1">
  <li><font color='rgba(200,0,0,0.2)'> <b> Save your notebook with all of its content</b></font> by selecting <i> File / Save and Checkpoint </i> </li>
  <li><font color='rgba(200,0,0,0.2)'> <b>To export in Notebook format</b></font>, click on <i>File / Download as / Notebook (.ipynb)</i>  <font color='gray'>--- Downloading your file may take a bit depending on its size.</font></li>
  <li>The best option to <font color='rgba(200,0,0,0.2)'> <b>export your notebook in PDF format</b></font> is to print the content of the browser window to a PDF. To do so, <i>right click</i> in your browser window and select the <i>print</i> option in the pop-up menu.</li>
</ol>

Contact me at fjmeyer@alaska.edu should you run into any problems.
</font>

</font>

<hr>
<font face="Calibri">

<font size="5"> <b> 0. Importing Relevant Python Packages </b> </font>

<font size="3"> The first step of this lab exercise on SAR image time series analysis is to import the necessary Python libraries into your Jupyter Notebook. See the code cell below for information on which libraries are needed. Information on these libraries is provided in the instructions to a previous lab of this course (Lab 3). 

</font>

In [None]:
import pandas as pd
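from osgeo import gdal  # GDAL's Python bindings live in the osgeo package
import numpy as np
import glob,os,time  # glob is needed for the optional file listing in Section 1.1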

# For plotting
%matplotlib inline
import matplotlib.pylab as plt
import matplotlib.patches as patches

font = {'family' : 'monospace',
          'weight' : 'bold',
          'size'   : 18}
plt.rc('font',**font)

<hr>
<font face="Calibri">

<font size="5"> <b> 1. Load Data Stack for this Lab </b> </font> <img src="NotebookAddons/Lab8-Agrhymet.JPG" width="400" align="right" /> 

<font size="3"> This Lab will be using a 60-image deep C-band Sentinel-1 SAR data stack over Niamey, Niger to demonstrate the concepts of time series change detection. The data are available to us through the services of the <a href="https://www.asf.alaska.edu/" target="_blank">Alaska Satellite Facility</a>. 

Specifically, we will use a small image segment over the campus of <a href="http://www.agrhymet.ne/eng/" target="_blank">AGRHYMET Regional Centre</a>, a regional organization supporting West Africa in the use of remote sensing.  

This site was picked as we had information about construction going on at this site sometime in the 2015 - 2017 time frame. Land was cleared and a building was erected. In this exercise we will see if we can detect the construction activity and if we are able to determine when construction began and when it ended.

In this case, we will <b>retrieve the relevant data</b> from an <a href="https://aws.amazon.com/" target="_blank">Amazon Web Service (AWS)</a> cloud storage bucket, <b>using the following command</b>:</font> 

</font>

In [None]:
!aws s3 cp s3://asf-jupyter-data/Niamey.zip Niamey.zip

<font face="Calibri" size="3"> Now, let's <b>unzip the file and clean up after ourselves:</b> </font>

In [None]:
!unzip Niamey.zip
!rm Niamey.zip

<font face="Calibri" size="3"> The following lines set variables that capture path variables needed for data processing. <b>We define variables for the main data directory as well as the main variables containing data and image information:</b> </font>

In [None]:
datadirectory='/home/jovyan/notebooks/ASF/GEOS_657_Labs/cra/'
datefile='S32631X402380Y1491460sS1_A_vv_0001_A_mtfil.dates'
imagefile='S32631X402380Y1491460sS1_A_vv_0001_A_mtfil.vrt'

In [None]:
os.getcwd()  # Display the present working directory

<hr>
<font face="Calibri" size="4"> <b> 1.1 Switch to the Data Directory: </b> 

<font size="3"> We now switch into the data directory and briefly verify that we ended up in the correct directory.</font>

</font> 

In [None]:
os.chdir(datadirectory)

In [None]:
os.getcwd()  # Display the present working directory

In [None]:
# glob.glob("*.vrt")   # Uncomment this line to see a List of the files 

<hr>
<font face="Calibri" size="4"> <b> 1.2 Assess Image Acquisition Dates </b> </font> 

<font face="Calibri" size="3"> Before we start analyzing the available image data, we want to examine the content of our data stack. <b>To do so, we read the image acquisition dates for all files in the time series and create a *pandas* date index:</b> </font>

In [None]:
dates=open(datefile).readlines()
tindex=pd.DatetimeIndex(dates)
j=1
print('Bands and dates for',imagefile)
for i in tindex:
    print("{:4d} {}".format(j, i.date()),end=' ')
    j+=1
    if j%5==1: print()

<hr>
<font face="Calibri" size="4"> <b> 1.3 Read in the Data Stack </b> </font> 

<font face="Calibri" size="3"> We read in the time series raster stack for the entire data set. </font>

In [None]:
rasterstack=gdal.Open(imagefile).ReadAsArray()

<br>
<hr>
<font face="Calibri" size="5"> <b> 2. Plot the Global Means of the Time Series </b> </font> 

<font face="Calibri" size="3"> To accomplish this task, the following processing steps are needed:
<ol>
    <li>Conversion to power</li>
    <li>Compute mean values</li>
    <li>Convert to dB</li>
    <li>Create time series of means using Pandas</li>
    <li>Plot time series of means</li>
</ol>

</font> 
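
<font face="Calibri" size="3"> The order of these steps matters: the mean is formed on power values and only then converted to dB. The following minimal sketch uses two made-up backscatter samples (<i>samples_db</i> is a hypothetical array, not part of the lab data) to show that averaging dB values directly gives a different, lower result: </font>

```python
import numpy as np

# Two hypothetical backscatter samples in dB: one dark, one bright
samples_db = np.array([-20.0, -5.0])

# Option A: average in the power domain, then convert the mean to dB
samples_pwr = np.power(10.0, samples_db / 10.0)
mean_via_power_db = 10.0 * np.log10(samples_pwr.mean())

# Option B: average the dB values directly (a geometric mean in power)
mean_of_db = samples_db.mean()

print(f"mean via power:    {mean_via_power_db:.2f} dB")
print(f"mean of dB values: {mean_of_db:.2f} dB")
```

<font face="Calibri" size="3"> Averaging in dB weights dark pixels more strongly than bright ones, which is why the cells below keep the stack in power units until the very last step. </font>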

In [None]:
# 1. Conversion to Power
caldB=-83
calPwr = np.power(10.,caldB/10.)
rasterstack_pwr = np.power(rasterstack,2.)*calPwr
# 2. Compute Means
rs_means_pwr = np.mean(rasterstack_pwr,axis=(1,2))
# 3. Convert to dB
rs_means_dB = 10.*np.log10(rs_means_pwr)
# 4. Make a pandas time series object
ts = pd.Series(rs_means_dB,index=tindex)

In [None]:
# 5. Use the pandas plot function of the time series object to plot
# Put band numbers as data point labels
plt.rcParams.update({'font.size': 14})
plt.figure(figsize=(16,8))
ts.plot()
xl = plt.xlabel('Date')
yl = plt.ylabel(r'$\overline{\gamma^o}$ [dB]')
for xyb in zip(ts.index,rs_means_dB,range(1,len(ts)+1)):
    plt.annotate(xyb[2],xy=xyb[0:2])
plt.grid()

<br>
<hr>
<div class="alert alert-success">
<font face="Calibri" size="5"> <b> <font color='rgba(200,0,0,0.2)'> <u>ASSIGNMENT #1</u>:  </font> Analyze Global Means Time Series Plot</b> <font color='rgba(200,0,0,0.2)'> -- [3 Points] </font> </font>

<font face="Calibri" size="3"> Look at the global means time series plot above and determine from the <i>tindex</i> array at which dates you see  maximum and minimum values. Are relative peaks associated with seasons?
<br><br>
PROVIDE ANSWER HERE:

</font>
</div>

<br>
<hr>
<font face="Calibri" size="5"> <b> 3. Generate Time Series for Point Locations or Subsets</b> </font> 

<font face="Calibri" size="3"> In Python we can use matrix slicing rules (as in MATLAB) to obtain subsets of the data. For example, to pick one pixel at a line/pixel location and obtain all band values, use:

>  [:,line,pixel] notation. 

Or, if we are interested in a subset at an offset location, we can use:

> [:,yoffset:(yoffset+yrange),xoffset:(xoffset+xrange)]

In the section below we will learn how to generate time series plots for point locations (pixels) or areas (e.g. a 5x5 window region). To show  individual bands, we define a <i>showImage</i> function which incorporates the matrix slicing from above.

</font> 
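
<font face="Calibri" size="3"> As a minimal sketch of the two slicing notations, the following example applies them to a small synthetic stack. The array <i>stack</i> and all offsets are made up for illustration and do not refer to the lab data: </font>

```python
import numpy as np

# Hypothetical stack: 4 bands (time steps), 6 lines, 8 pixels
stack = np.arange(4 * 6 * 8, dtype=float).reshape(4, 6, 8)

# All band values for the single pixel at line 2, pixel 3
line, pixel = 2, 3
pixel_series = stack[:, line, pixel]

# All band values for a 3x3 window whose upper-left corner sits at
# pixel offset 5 (x) and line offset 1 (y)
xoffset, yoffset, xsize, ysize = 5, 1, 3, 3
window = stack[:, yoffset:(yoffset + ysize), xoffset:(xoffset + xsize)]

# One mean value per band: average over lines (axis 1) and pixels (axis 2)
window_means = window.mean(axis=(1, 2))

print(pixel_series.shape, window.shape, window_means.shape)
```

<font face="Calibri" size="3"> Note that the line (y) index comes before the pixel (x) index, while the subsets used in this lab are given as (xoffset, yoffset, xrange, yrange). </font>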

<hr>
<font face="Calibri" size="4"> <b> 3.1 Plotting Time Series for Subset </b> </font> 

<font face="Calibri" size="3"> The following function displays one image band and plots the calibrated time series for a pre-defined subset. </font>

In [None]:
def showImage(rasterstack,tindex,bandnbr,subset=None,vmin=None,vmax=None):
    '''Input: 
    rasterstack stack of images in SAR power units
    tindex time series date index
    bandnbr band number of the rasterstack to display'''
    fig = plt.figure(figsize=(16,8))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122)
    
    # If vmin or vmax are None we use percentiles as limits:
    if vmin is None: vmin=np.percentile(rasterstack[bandnbr-1].flatten(),5)
    if vmax is None: vmax=np.percentile(rasterstack[bandnbr-1].flatten(),95)

    ax1.imshow(rasterstack[bandnbr-1],cmap='gray',vmin=vmin,vmax=vmax)
    ax1.set_title('Image Band {} {}'.format(bandnbr,tindex[bandnbr-1].date()))
    if subset is None:
        bands,ydim,xdim=rasterstack.shape
        subset=(0,0,xdim,ydim)
        
    ax1.add_patch(patches.Rectangle((subset[0],subset[1]),subset[2],subset[3],fill=False,edgecolor='red'))
    ax1.xaxis.set_label_text('Pixel')
    ax1.yaxis.set_label_text('Line')
    
    ts_pwr=np.mean(rasterstack[:,subset[1]:(subset[1]+subset[3]),
                       subset[0]:(subset[0]+subset[2])],axis=(1,2))
    ts_dB=10.*np.log10(ts_pwr)
    ax2.plot(tindex,ts_dB)
    ax2.yaxis.set_label_text(r'$\gamma^o$ [dB]')
    ax2.set_title(r'$\gamma^o$ Backscatter Time Series')
    # Add a vertical line for the date where the image is displayed
    ax2.axvline(tindex[bandnbr-1],color='red')
    plt.grid()

    fig.autofmt_xdate()

<font face="Calibri" size="3"> Now we can use the function to <b>compare</b> different time steps for their information content in our area of interest: </font>

In [None]:
bandnbr=24  # 
subset=[5,20,3,3]
showImage(rasterstack_pwr,tindex,bandnbr,subset)

In [None]:
bandnbr=43
showImage(rasterstack_pwr,tindex,bandnbr,subset)

<hr>
<font face="Calibri" size="4"> <b> 3.2 Helper Function to Generate a Time Series Object </b> </font> 

<font face="Calibri" size="3"> The following function creates an object representing the time series for an image subset: </font>

In [None]:
def timeSeries(rasterstack_pwr,tindex,subset,ndv=0.):
    # Extract the means along the time series axes
    # raster shape is time steps, lines, pixels. 
    # With axis=1,2, we average lines and pixels for each time 
    # step (axis 0)
    raster=rasterstack_pwr.copy()
    # np.nan never compares equal to anything, so test with np.isnan
    if not np.isnan(ndv): raster[np.equal(raster,ndv)]=np.nan
    ts_pwr=np.nanmean(raster[:,subset[1]:(subset[1]+subset[3]),
                       subset[0]:(subset[0]+subset[2])],axis=(1,2))
    # convert the means to dB
    ts_dB=10.*np.log10(ts_pwr)
    # make the pandas time series object
    ts = pd.Series(ts_dB,index=tindex)
    # return it
    return ts

<font face="Calibri" size="3"> Using the timeSeries(...) function to make a time series object for the chosen subset: </font>

In [None]:
ts = timeSeries(rasterstack_pwr,tindex,subset)

<font face="Calibri" size="3"> Now <b>plot</b> the time series object: </font>

In [None]:
_=ts.plot(figsize=(16,4))  # _= is a trick to suppress more output.
plt.grid()

<br>
<hr>
<font face="Calibri" size="5"> <b> 4. Create Seasonal Subsets of Time Series Records</b> </font> 

<font face="Calibri" size="3"> Let's expand upon SAR time series analysis. Often it is desirable to subset time series by season or month to compare data acquired under similar weather/growth/vegetation cover conditions. For example, in analyzing C-Band backscatter data, it might be useful to limit comparative analysis to dry season observations only, as soil moisture might confuse signals during the wet seasons. To subset time series along the time axis we will make use of the following <i>Pandas</i> datetime index tools:
<ul>
    <li>month</li>
    <li>day of year</li> 
</ul>
First we extract a hectare-sized area around our subset location (5,20,5,5). We then convert the time series to a pandas DataFrame to allow for more processing options. We also label the data value column as 'g0' for $\gamma^0$:

</font> 

In [None]:
subset=(5,20,5,5)
ts = timeSeries(rasterstack_pwr,tindex,subset)
tsdf = pd.DataFrame(ts,index=ts.index,columns=['g0'])

# Plot
ylim=(-15,-5)
tsdf.plot(figsize=(16,4))
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'.format(subset))
plt.ylabel(r'$\gamma^o$ [dB]')
plt.ylim(ylim)
_=plt.legend(["C-VV Time Series"])
plt.grid()

<hr>
<font face="Calibri" size="4"> <b> 4.1 Change Start Date of Time Series to November 2015 </b> </font> 

<font face="Calibri" size="3"> We can filter on the pandas date index to <b>keep only observations acquired after November 1, 2015</b>. Index attributes such as month let us make seasonal subsets in the next step: </font>

In [None]:
tsdf_sub1=tsdf[tsdf.index>'2015-11-01']

# Plot
tsdf_sub1.plot(figsize=(16,4))
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'.format(subset))
plt.ylabel(r'$\gamma^o$ [dB]')
plt.ylim(ylim)
_=plt.legend(["C-VV Time Series"])
plt.grid()

<hr>
<font face="Calibri" size="4"> <b> 4.2 Subset Time Series by Months </b> </font> 

<font face="Calibri" size="3"> We can make use of the **index.month** attribute of the Pandas <i>DatetimeIndex</i> object and numpy's **logical_and** function to subset a time series easily by month: </font>

In [None]:
# The following line extracts only data points between March and May from the full time series 
tsdf_sub2=tsdf_sub1[np.logical_and(tsdf_sub1.index.month>=3,tsdf_sub1.index.month<=5)]

# Plot
fig, ax = plt.subplots(figsize=(16,4))
tsdf_sub2.plot(ax=ax)
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'
          .format(subset))
plt.ylabel(r'$\gamma^o$ [dB]')
plt.ylim(ylim)
_=plt.legend(["March-May"])
plt.grid()

<font face="Calibri" size="3"> Using numpy's **invert** function, we can invert a selection. In this example, we <b>extract all other months from the time series</b>: </font>

In [None]:
tsdf_sub3=tsdf_sub1[np.invert(np.logical_and(tsdf_sub1.index.month>=3,tsdf_sub1.index.month<=5))]

# Plot
fig, ax = plt.subplots(figsize=(16,4))
tsdf_sub3.plot(ax=ax)
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'
          .format(subset))
plt.ylabel(r'$\gamma^o$ [dB]')
plt.ylim(ylim)
_=plt.legend(["June-February"])
plt.grid()
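
<font face="Calibri" size="3"> The month-based selection and its inversion can be checked on a small synthetic series. This is a minimal sketch; the series <i>demo</i> is made up and simply holds one value per calendar month: </font>

```python
import numpy as np
import pandas as pd

# Hypothetical series with one sample at the start of each month of 2016
idx = pd.date_range("2016-01-01", periods=12, freq="MS")
demo = pd.Series(np.arange(12, dtype=float), index=idx)

# Boolean mask that is True for March, April and May
in_season = np.logical_and(demo.index.month >= 3, demo.index.month <= 5)

spring = demo[in_season]            # selected months
rest = demo[np.invert(in_season)]   # all other months

# isin() gives the same selection and also works for non-contiguous months
spring_alt = demo[demo.index.month.isin([3, 4, 5])]

print(len(spring), len(rest))
```

<font face="Calibri" size="3"> Storing the mask in a variable avoids repeating the selection logic when both the subset and its complement are needed. </font>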

<hr>
<font face="Calibri" size="4"> <b> 4.3 Split Time Series by Year to Compare Year-to-Year Patterns </b> </font> 

<font face="Calibri" size="3"> Sometimes it is useful to compare year-to-year $\gamma^0$ values to identify changes in backscatter characteristics. This helps to distinguish true change from seasonal variability. </font>

In [None]:
# Split time series into different years:
ts_sub_by_year = tsdf_sub1.groupby(pd.Grouper(freq="Y"))

In [None]:
fig, ax = plt.subplots(figsize=(16,4))
for label, df in ts_sub_by_year:
    df.g0.plot(ax=ax, label=label.year)
plt.legend()
# ts_sub_by_year.plot(ax=ax)
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'
          .format(subset))
plt.ylabel(r'$\gamma^o$ [dB]')
plt.ylim(ylim)
plt.grid()

<hr>
<font face="Calibri" size="4"> <b> 4.4 Create a Pivot Table to Group Years and Sort Data for Plotting Overlapping Time Series </b> </font> 

<font face="Calibri" size="3"> Pivot tables are a common technique in data processing. They enable a person to arrange and rearrange (or "pivot") statistics in order to draw attention to useful information. To do so, we first add two columns to the data frame:
<ul>
    <li>Day of year (doy)</li>
    <li>year</li>
</ul>

</font>

In [None]:
# Add doy
tsdf_sub1 = tsdf_sub1.assign(doy=tsdf_sub1.index.dayofyear)
# Add year
tsdf_sub1 = tsdf_sub1.assign(year=tsdf_sub1.index.year)

<font face="Calibri" size="3"> Then a pivot table is created which has day of year as the index and years as columns: </font>

In [None]:
piv=pd.pivot_table(tsdf_sub1,index=['doy'],columns=['year'],values=['g0'])
# Set the names for the column indices
piv.columns.set_names(['g0','year'],inplace=True) 
print(piv.head(10))
print('...\n',piv.tail(10))
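
<font face="Calibri" size="3"> To see what the pivot operation does, here is a minimal sketch on a tiny, made-up table (<i>df</i> and its values are hypothetical). Unlike the cell above, it passes <i>values</i> as a plain string, which yields simple rather than two-level column labels: </font>

```python
import pandas as pd

# Hypothetical long-format records: one row per acquisition
df = pd.DataFrame({
    "year": [2016, 2016, 2017, 2017],
    "doy":  [10, 22, 10, 34],
    "g0":   [-9.0, -8.5, -7.0, -6.5],
})

# Rows become day of year, columns become years;
# year/doy combinations without an acquisition are filled with NaN
piv_demo = pd.pivot_table(df, index="doy", columns="year", values="g0")
print(piv_demo)
```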

<font face="Calibri" size="3"> As we can see, there are NaN values on the days in a year where no acquisition took place. Now we use time weighted interpolation to fill the dates for all the observations in any given year. For **time weighted interpolation** to work we need to create a dummy year as a date index, perform the interpolation, and reset the index to the day of year. This is accomplished with the following steps: </font>

In [None]:
# Add fake dates for the dummy year 2100 to enable time sensitive interpolation 
# of missing values in the pivot table
year_doy = ['2100-{}'.format(x) for x in piv.index]
y100_doy=pd.DatetimeIndex(pd.to_datetime(year_doy,format='%Y-%j'))

# make a copy of the piv table and add two columns
piv2=piv.copy()
piv2=piv2.assign(d100=y100_doy) # add the fake year dates
piv2=piv2.assign(doy=piv2.index) # add doy as a column to replace as index later again

# Set the index to the dummy year
piv2.set_index('d100',inplace=True,drop=True)

# PERFORM THE TIME WEIGHTED INTERPOLATION
piv2 = piv2.interpolate(method='time')  # TIME WEIGHTED INTERPOLATION!

# Set the index back to day of year.
piv2.set_index('doy',inplace=True,drop=True)

<font face="Calibri" size="3"> Let's inspect the new pivot table and see whether we interpolated the NaN values where it made sense: </font>

In [None]:
print(piv2.head(10))
print('...\n',piv2.tail(10))
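
<font face="Calibri" size="3"> The difference between time weighted and plain linear interpolation is easiest to see on three samples with unequal spacing. The following is a minimal sketch with made-up values; the dummy year 2100 is used as in the cell above: </font>

```python
import numpy as np
import pandas as pd

# Hypothetical observations on days 1, 5 and 21 of a dummy year;
# the value on day 5 is missing
idx = pd.to_datetime(["2100-001", "2100-005", "2100-021"], format="%Y-%j")
s = pd.Series([0.0, np.nan, 10.0], index=idx)

# 'linear' ignores the index and treats the samples as equally spaced
filled_linear = s.interpolate(method="linear")

# 'time' weights by the actual gaps: day 5 lies 4/20 of the way
# between day 1 and day 21
filled_time = s.interpolate(method="time")

print(filled_linear.iloc[1], filled_time.iloc[1])
```

<font face="Calibri" size="3"> With irregular acquisition dates, the time weighted result stays closer to the nearer observation, which is the behavior we want for the pivot table. </font>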

<hr>
<font face="Calibri" size="3"> Now we can plot the time series data with overlapping years: </font>

In [None]:
piv2.plot(figsize=(16,8))
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'
          .format(subset))
plt.ylabel(r'$\gamma^o$ [dB]')
plt.xlabel('Day of Year')
_=plt.ylim(ylim)
plt.grid()

<br>
<hr>
<div class="alert alert-success">
<font face="Calibri" size="5"> <b> <font color='rgba(200,0,0,0.2)'> <u>ASSIGNMENT #2</u>:  </font> Interpret the Year-to-Year Time Series Plot</b> <font color='rgba(200,0,0,0.2)'> -- [4 Points] </font> </font>

<font face="Calibri" size="3"> Answer the following questions related to the year-to-year time series plot shown above:
<Ol>
    <li>Describe the $\gamma^0$ time series for year 2016. What kind of seasonal patterns do you see? Based on the observed seasonal patterns, what type of surface cover do you think was present at this area in 2016? <font color='rgba(200,0,0,0.2)'> -- [2 Points] </font></li><br>
    <li>Describe the $\gamma^0$ time series for year 2017. What kind of seasonal patterns do you see and how do they differ from the previous year? <font color='rgba(200,0,0,0.2)'> -- [2 Points] </font></li><br>
</Ol>
<br>

PROVIDE YOUR ANSWERS BELOW:

</font>
</div>

<br>
<hr>
<font face="Calibri" size="5"> <b> 5. Time Series Change Detection</b> </font> 

<font face="Calibri" size="3"> Now we are ready to perform efficient change detection on the time series data. We will discuss two approaches:
<ol>
    <li>Year-to-year differencing of the subsetted time series</li>
    <li>Cumulative Sum-based change detection</li>
</ol>

</font> 

In [None]:
# Difference between years
# Set a dB change threshold
thres=3

In [None]:
diff1716 = (piv2.g0[2017]-piv2.g0[2016])

<hr>
<font face="Calibri" size="4"> <b> 5.1 Change Detection based on Year-to-Year Differencing </b> </font> 

<font face="Calibri" size="3"> We compute the differences between the interpolated time series and look for change using a threshold value.

</font>

In [None]:
_=diff1716.plot(kind='line',figsize=(16,8))  # recent pandas versions require keyword arguments here
plt.title('Year-to-Year Difference Time Series')
plt.ylabel(r'$\Delta\gamma^o$ [dB]')
plt.xlabel('Day of Year')
plt.grid()

In [None]:
thres_exceeded = diff1716[abs(diff1716) > thres]
thres_exceeded

<font face="Calibri" size="3"> From the <i>thres_exceeded</i> series we can infer the first date at which the threshold was exceeded. We would label this date as a **change point**. As an additional criterion for labeling a change point, one can also consider the number of observations after an identified change point that also exceeded the threshold. If only one or two observations differed from the year before, this could be considered an outlier. Additional smoothing of the time series may sometimes be useful to avoid false detections. </font>
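
<font face="Calibri" size="3"> One possible way to encode such a persistence criterion is sketched below. The helper <i>first_persistent_exceedance</i>, its <i>min_run</i> parameter, and the sample values are assumptions made for illustration and are not part of the lab code: </font>

```python
import pandas as pd

def first_persistent_exceedance(diff, thres, min_run=3):
    """Return the first index label that starts a run of at least `min_run`
    consecutive values with |diff| > thres, or None if no such run exists."""
    exceeded = (diff.abs() > thres).to_numpy()
    run = 0
    for pos, flag in enumerate(exceeded):
        run = run + 1 if flag else 0      # length of the current run
        if run == min_run:
            return diff.index[pos - min_run + 1]
    return None

# Hypothetical year-to-year differences [dB], indexed by day of year
diff_demo = pd.Series([0.5, 3.5, 0.2, 3.2, 4.1, 3.8, 4.0],
                      index=[10, 22, 34, 46, 58, 70, 82])

print(first_persistent_exceedance(diff_demo, thres=3))
```

<font face="Calibri" size="3"> In this example the isolated exceedance on day 22 is skipped as a likely outlier, and the run that begins on day 46 is reported instead. The choice of <i>min_run</i> trades detection delay against robustness and would need tuning for real data. </font>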

<br>
<hr>
<div class="alert alert-success">
<font face="Calibri" size="5"> <b> <font color='rgba(200,0,0,0.2)'> <u>ASSIGNMENT #3</u>:  </font> Perform Year-to-Year Differencing-based Change Detection for a Different Subset</b> <font color='rgba(200,0,0,0.2)'> -- [5 Points] </font> </font>

<font face="Calibri" size="3"> Go back to the beginning of Section 4 and change the subset coordinates to a different subset (i.e., modify to <i>subset=($X$,$Y$,5,5)</i> with $X$ and $Y$ being the pixel and line offsets of the upper-left corner of your modified subset). Work through the workbook from the beginning of Section 4 until the end of Section 5.1 with your modified subset. Discuss whether or not your new subset shows change according to the 3 dB change threshold.
<br><br>

DISCUSS BELOW WHETHER YOUR NEW SUBSET SHOWS CHANGE:

</font>
</div>

<hr>
<font face="Calibri" size="4"> <b> 5.2 Cumulative Sums for Change Detection</b> </font> 

<font face="Calibri" size="3"> Another approach to detect change in regularly acquired data is employing the method of **cumulative sums**. Changes are determined by comparing the time series data against its mean. A full explanation and examples from the financial sector can be found at [http://www.variation.com/cpa/tech/changepoint.html](http://www.variation.com/cpa/tech/changepoint.html)
<br><br><hr>
<u><b>5.2.A First, let's consider a time series and its mean observation</b></u>:<br> 
We look at two full years of observations from Sentinel-1 data for an area where we suspect change. In the following, we define $X$ as our time series
<br><br>
\begin{equation}
X = (X_1,X_2,...,X_n)
\end{equation}

with $X_i$ being the SAR backscatter values at times $i=1,...,n$ and $n$ being the number of observations in the time series.
</font>

In [None]:
subset=(5,20,3,3)
#subset=(12,5,3,3)
ts1 = timeSeries(rasterstack_pwr,tindex,subset)
X = ts1[ts1.index>'2015-10-31']

<hr>
<font face="Calibri" size="3"> <b><u>5.2.B Filtering the time series for outliers</u></b>:<br>
It is advantageous in noisy SAR time series like those from C-Band Sentinel-1 data to reduce noise by <b>applying a filter along the time axis</b>. Pandas offers a <i>"rolling"</i> function for these purposes. Using the <i>rolling</i> function, we will apply a **median filter** to our data.  </font>

In [None]:
Xr=X.rolling(5,center=True).median()
Xr.plot(figsize=(16,4))
_=X.plot()
plt.title('Original vs. Filtered Time Series')
plt.ylabel(r'$\gamma^o$ [dB]')
plt.xlabel('Time')
plt.grid()
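
<font face="Calibri" size="3"> The effect of the centered rolling median can be checked on a short, made-up series with a single outlier. This is a minimal sketch; <i>raw</i> is hypothetical data: </font>

```python
import pandas as pd

# Hypothetical backscatter values [dB] with one outlier at position 3
raw = pd.Series([-10.0, -10.2, -9.9, -3.0, -10.1, -10.0, -9.8])

# Same filter as above: window of 5 samples, centered on each sample
filtered = raw.rolling(5, center=True).median()

print(filtered.to_list())
```

<font face="Calibri" size="3"> The outlier is replaced by a value typical of its neighborhood. Note that the first two and last two samples become NaN because a full 5-sample window is not available there, so the filtered series is effectively four observations shorter. </font>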

<font face="Calibri" size="3">Let's plot the filtered time series and its respective mean value:</font>

In [None]:
X=Xr  # Use the median-filtered series from here on (comment out to keep the unfiltered data)
Xmean = X.mean()

In [None]:
fig,ax=plt.subplots(figsize=(16,4))
X.plot()
plt.ylabel(r'$\gamma^o$ [dB]')
ax.axhline(Xmean,color='red')
_=plt.legend([r'$\gamma^o$',r'$\overline{\gamma^o}$'])
plt.grid()

<hr>
<font face="Calibri" size="3"> <b><u>5.2.C Calculate the Residuals of the Time Series Against the Mean $\overline{\gamma^o}$</u></b>:<br>
To get to the residual, we calculate 

\begin{equation}
R_i = X_i - \overline{X}
\end{equation}

</font>

In [None]:
R = X - Xmean

<hr>
<font face="Calibri" size="3"> <b><u>5.2.D Calculate Cumulative Sum of the Residuals</u></b>:<br>
The cumulative sum is defined as: 

\begin{equation}
S_i = \displaystyle\sum_{j=1}^{i}{R_j}
\end{equation}

</font>

In [None]:
S = R.cumsum()

_=S.plot(figsize=(16,6))
plt.ylabel('CumSum $S$ [dB]')
plt.xlabel('Time')
plt.grid()

<font face="Calibri" size="3"> The **cumulative sum** is a good indicator of change in the time series. An estimator for the magnitude of change is given as the difference between the maximum and minimum value of the cumulative sum $S$: 

\begin{equation}
S_{DIFF} = S_{MAX} - S_{MIN}
\end{equation}

</font>

In [None]:
Sdiff=S.max() - S.min()
print('Change magnitude: %5.3f dB' % (Sdiff))

<hr>
<font face="Calibri" size="3"> <b><u>5.2.E Identify Change Point in the Time Series</u></b>:<br>
A candidate change point is identified from $S$ at the time where $S_{MAX}$ is found:

\begin{equation}
T_{{CP}_{before}} = T(S_i = S_{MAX})
\end{equation}

with $T_{{CP}_{before}}$ being the timestamp of the last observation <i>before</i> the identified change point, $S_i$ being the cumulative sum of $R$ with $i=1,...n$, and $n$ being the number of observations in the time series. 

The first observation <i>after</i> a change occurred ($T_{{CP}_{after}}$) is then found as the first observation in the time series following $T_{{CP}_{before}}$.

For our example time series $X$ these points are:
</font>

In [None]:
t_cp_before = S[S==S.max()].index[0]
print('Last date before change occurred: {}'.format(t_cp_before.date()))

In [None]:
t_cp_after = S[S.index > t_cp_before].index[0]
print('First date after change occurred: {}'.format(t_cp_after.date()))
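
<font face="Calibri" size="3"> Steps 5.2.C to 5.2.E can be verified on a synthetic series with a known step. The following is a minimal sketch; the dates and backscatter values are made up, and lower-case names are used to avoid overwriting the lab variables: </font>

```python
import pandas as pd

dates = pd.date_range("2016-01-01", periods=10, freq="12D")
# Hypothetical backscatter [dB]: six high values followed by four low values
x = pd.Series([-8.0] * 6 + [-12.0] * 4, index=dates)

r = x - x.mean()             # residuals against the mean
s = r.cumsum()               # cumulative sum of the residuals
s_diff = s.max() - s.min()   # estimator of the change magnitude

t_before = s.idxmax()                     # last observation before the change
t_after = s.index[s.index > t_before][0]  # first observation after the change

print(t_before.date(), t_after.date())
```

<font face="Calibri" size="3"> Here the backscatter drops, so $S$ rises while the values are above the mean and peaks at the last observation before the step. For a series whose level increases instead, $S$ would reach a minimum at the change, so the extreme value to look for depends on the sign of the change. </font>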

<hr>
<font face="Calibri" size="3"> <b><u>5.2.F Determine our Confidence in the Identified Change Point using Bootstrapping</u></b>:<br>
We can determine if an identified change point is indeed a valid detection by <b>randomly reordering the time series</b> and <b>comparing the various $S$ curves</b>. During this <b>"bootstrapping"</b> approach, we count how many times $S_{DIFF}$ of the identified change point is greater than the $S_{{DIFF}_{random}}$ value of a randomly reordered series. 
    
After bootstrapping, we define the <b>confidence level $CL$</b> in a detected change point according to:

\begin{equation}
CL = \frac{N_{GT}}{N_{bootstraps}}
\end{equation}

where $N_{GT}$ is the number of times $S_{DIFF}$ > $S_{{DIFF}_{random}}$ and $N_{bootstraps}$ is the number of bootstraps randomizing $R$.
<br><br><br>
As another quality metric we can also calculate the <b>significance $CP_{significance}$</b> of a change point according to: 

\begin{equation}
CP_{significance} = 1 - \left( \frac{\sum_{b=1}^{N_{bootstraps}}{S_{{DIFF}_{{random}_b}}}}{N_{bootstraps}} \middle/ S_{DIFF} \right)
\end{equation}

The closer $CP_{significance}$ is to 1, the more significant the change point.

The python code below performs the **bootstrapping algorithm**:
</font>

In [None]:
n_bootstraps=500  # bootstrap sample size
fig,ax = plt.subplots(figsize=(16,6))
S.plot(ax=ax,linewidth=3)
ax.set_ylabel('Cumulative Sums of the Residuals')
fig.legend(['S Curve for Candidate Change Point'])
Sdiff_random_sum=0
Sdiff_random_max=0  # to keep track of the maximum Sdiff of the 
               # bootstrapped sample
n_Sdiff_gt_Sdiff_random=0  # to count how often Sdiff exceeds the Sdiff 
               # of a bootstrapped sample
print("Running Bootstrapping for %d iterations ..." % (n_bootstraps))
for i in range(n_bootstraps):
    Rrandom = R.sample(frac=1)  # Randomize the time steps of the residuals
    Srandom = Rrandom.cumsum()
    Sdiff_random=Srandom.max()-Srandom.min()
    Sdiff_random_sum += Sdiff_random
    if Sdiff_random > Sdiff_random_max:
        Sdiff_random_max = Sdiff_random
    if Sdiff > Sdiff_random:
        n_Sdiff_gt_Sdiff_random += 1
    Srandom.plot(ax=ax)
    if ((i+1)/n_bootstraps*100)%10 == 0:
        print("%4.1f percent completed ..." % ((i+1)/n_bootstraps*100))
_=ax.axhline(Sdiff_random_sum/n_bootstraps)
plt.grid()

<hr>
<font face="Calibri" size="3"> Based on the bootstrapping results, we can now calculate <b>Confidence Level $CL$</b> and <b>Significance $CP_{significance}$</b> for our candidate change point:
</font>

In [None]:
CL = 1.*n_Sdiff_gt_Sdiff_random/n_bootstraps
print('Confidence Level for change point {} percent'.format(CL*100.))

In [None]:
CP_significance = 1. - (Sdiff_random_sum/n_bootstraps)/Sdiff 
print('Change point significance metric: {}'.format(CP_significance))
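
<font face="Calibri" size="3"> For reuse on other subsets, the bootstrapping loop and both metrics can be wrapped into one function without plotting. The following is a minimal sketch that swaps in numpy's random permutation for the pandas <i>sample</i> call used above; the function name <i>bootstrap_cusum</i>, the <i>seed</i> parameter, and the test residuals are assumptions for illustration: </font>

```python
import numpy as np

def bootstrap_cusum(residuals, n_bootstraps=500, seed=0):
    """Return (confidence level, significance) for a 1-D array of residuals."""
    r = np.asarray(residuals, dtype=float)
    r = r[~np.isnan(r)]                  # drop gaps left by the rolling filter
    rng = np.random.default_rng(seed)    # seeded so results are repeatable
    s = np.cumsum(r)
    s_diff = s.max() - s.min()
    s_diff_random = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        s_rand = np.cumsum(rng.permutation(r))   # reorder, then accumulate
        s_diff_random[b] = s_rand.max() - s_rand.min()
    cl = np.mean(s_diff > s_diff_random)         # fraction of N_GT cases
    significance = 1.0 - s_diff_random.mean() / s_diff
    return cl, significance

# Hypothetical residuals with a clear step after the sixth sample
r_demo = np.array([1.6] * 6 + [-2.4] * 4)
cl, sig = bootstrap_cusum(r_demo)
print("CL: {:.3f}, significance: {:.3f}".format(cl, sig))
```

<font face="Calibri" size="3"> Fixing the seed makes the confidence level reproducible between runs, which helps when comparing different subsets. Without a seed, the metrics vary slightly from run to run. </font>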

<hr>
<font face="Calibri" size="3"> <b><u>5.2.G TRICK: Detrending of Time Series Before Change Detection to Improve Robustness</u></b>:<br>
De-trending the time series with global image means improves the robustness of change point detection as global image time series anomalies stemming from calibration or seasonal trends are removed prior to time series analysis. This de-trending needs to be performed with large subsets so real change is not influencing the image statistics. 

NOTE: Due to the small size of our subset, we will see some distortions when we detrend the time series.

<b>Let's start by building a global image means time series</b>:
</font>

In [None]:
means_pwr = np.mean(rasterstack_pwr,axis=(1,2))
means_dB = 10.*np.log10(means_pwr)
gm_ts = pd.Series(means_dB,index=tindex)
gm_ts=gm_ts[gm_ts.index > '2015-10-31']  # filter dates
gm_ts=gm_ts.rolling(5,center=True).median()

In [None]:
gm_ts.plot(figsize=(16,6))
plt.title('Time Series of Global Means')
plt.ylabel('[dB]')
plt.xlabel('Time')
plt.grid()

<font face="Calibri" size="3"> Now we can **compare** the time series of global means (above) to the time series of our small subset (below):
</font>

In [None]:
X.plot(figsize=(16,6))
plt.title('Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'
          .format(subset))
plt.ylabel('[dB]')
plt.xlabel('Time')
plt.grid()

<font face="Calibri" size="3"> There are some signatures of the global seasonal trend in our subset time series. To remove these signatures and get a cleaner time series of change, we subtract the global mean time series from our subset time series:
</font>

In [None]:
Xd=X-gm_ts
Xmean=Xd.mean()
Xd.plot(figsize=(16,6))
plt.title('Detrended Sentinel-1 C-VV Time Series Backscatter Profile, Subset: {}'
          .format(subset))
plt.ylabel('[dB]')
plt.xlabel('Time')
plt.grid()

In [None]:
R = Xd - Xmean
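
<font face="Calibri" size="3"> Why detrending helps can be illustrated with synthetic data in which the seasonal signal is known. The following is a minimal sketch; the sinusoidal season, the 3 dB step, and all variable names are made up for illustration: </font>

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2016-01-01", periods=24, freq="30D")

# Hypothetical seasonal signal shared by the whole image [dB]
seasonal = 2.0 * np.sin(2.0 * np.pi * np.arange(24) / 12.0)
global_mean = pd.Series(-10.0 + seasonal, index=dates)

# Hypothetical subset: same seasonal signal plus a 3 dB drop from sample 12 on
step = np.where(np.arange(24) >= 12, -3.0, 0.0)
subset_ts = pd.Series(-9.0 + seasonal + step, index=dates)

# Subtracting the global mean leaves only a constant offset and the step
detrended = subset_ts - global_mean

# Cumulative sum of the residuals of the detrended series
s_detrended = (detrended - detrended.mean()).cumsum()
print(s_detrended.idxmax().date())
```

<font face="Calibri" size="3"> After the subtraction, the series is flat apart from the step, so the cumulative sum peaks at the last sample before the drop. In real data the subset and the global mean never share exactly the same seasonal signal, so some residual seasonality should be expected. </font>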

<font face="Calibri" size="3">Now we compute the <b>cumulative sum $S$ of the detrended time series</b> and plot it:
</font>

In [None]:
S = R.cumsum()
Sdiff = S.max() - S.min()  # update the change magnitude for the detrended series

_=S.plot(figsize=(16,6))
plt.ylabel('CumSum $S$ [dB]')
plt.xlabel('Time')
plt.grid()

<font face="Calibri" size="3"><b>Detect Change Point and extract related change dates</b>:
</font>

In [None]:
t_cp_before = S[S==S.max()].index[0]
print('Last date before change occurred: {}'.format(t_cp_before.date()))

In [None]:
t_cp_after = S[S.index > t_cp_before].index[0]
print('First date after change occurred: {}'.format(t_cp_after.date()))

<font face="Calibri" size="3">Perform <b>bootstrapping</b> and calculate <b>Confidence Level $CL$</b> and <b>Significance $CP_{significance}$</b> for our change point candidate:
</font>

In [None]:
n_bootstraps=500  # bootstrap sample size
fig,ax = plt.subplots(figsize=(16,6))
S.plot(ax=ax,linewidth=3)
ax.set_ylabel('Cumulative Sums of the Residuals')
fig.legend(['S Curve for Candidate Change Point'])
Sdiff_random_sum=0
Sdiff_random_max=0  # to keep track of the maximum Sdiff of the 
               # bootstrapped sample
n_Sdiff_gt_Sdiff_random=0  # to count how often Sdiff exceeds the Sdiff 
               # of a bootstrapped sample
print("Running Bootstrapping for %d iterations ..." % (n_bootstraps))
for i in range(n_bootstraps):
    Rrandom = R.sample(frac=1)  # Randomize the time steps of the residuals
    Srandom = Rrandom.cumsum()
    Sdiff_random=Srandom.max()-Srandom.min()
    Sdiff_random_sum += Sdiff_random
    if Sdiff_random > Sdiff_random_max:
        Sdiff_random_max = Sdiff_random
    if Sdiff > Sdiff_random:
        n_Sdiff_gt_Sdiff_random += 1
    Srandom.plot(ax=ax)
    if ((i+1)/n_bootstraps*100)%10 == 0:
        print("%4.1f percent completed ..." % ((i+1)/n_bootstraps*100))
_=ax.axhline(Sdiff_random_sum/n_bootstraps)
plt.grid()

In [None]:
CL = n_Sdiff_gt_Sdiff_random/n_bootstraps
print('Confidence Level for change point {} percent'.format(CL*100.))

<hr>
<font face="Calibri" size="3">Note how the <b>change point significance $CP_{significance}$</b> has increased in the detrended time series:
</font>

In [None]:
CP_significance = 1. - (Sdiff_random_sum/n_bootstraps)/Sdiff 
print('Change point significance metric: {}'.format(CP_significance))
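
<font face="Calibri" size="3"> Written out, the two metrics computed in the cells above are $CL = N_{S_{diff} > S_{diff,random}} / N_{bootstraps}$ and $CP_{significance} = 1 - \overline{S_{diff,random}} / S_{diff}$. The following is a compact sketch of the same bootstrapping without plotting, using NumPy only and a synthetic step series. The function name <i>bootstrap_cusum</i>, its <i>seed</i> parameter, and the test values are assumptions made for illustration:
</font>

```python
import numpy as np

def bootstrap_cusum(x, n_bootstraps=500, seed=0):
    """Return (CL, CP_significance) for a 1-D series x (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    R = x - x.mean()
    S = np.cumsum(R)
    Sdiff = S.max() - S.min()

    n_gt = 0                 # bootstraps for which Sdiff > Sdiff_random
    Sdiff_random_sum = 0.0   # running sum of Sdiff_random
    for _ in range(n_bootstraps):
        Srandom = np.cumsum(rng.permutation(R))   # shuffle the time order
        Sdiff_random = Srandom.max() - Srandom.min()
        Sdiff_random_sum += Sdiff_random
        n_gt += int(Sdiff > Sdiff_random)

    CL = n_gt / n_bootstraps
    CP_significance = 1.0 - (Sdiff_random_sum / n_bootstraps) / Sdiff
    return CL, CP_significance

# Ten samples at -8 dB followed by ten samples at -11 dB (synthetic)
x_demo = np.array([-8.0] * 10 + [-11.0] * 10)
CL_demo, CPsig_demo = bootstrap_cusum(x_demo)
print(CL_demo, CPsig_demo)
```

<font face="Calibri" size="3"> For a clean step, almost every shuffled series has a smaller $S_{diff}$ than the ordered one, so $CL$ is close to 1. A series without a change should give a noticeably lower $CL$.
</font>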

<br>
<hr>
<div class="alert alert-success">
<font face="Calibri" size="5"> <b> <font color='rgba(200,0,0,0.2)'> <u>ASSIGNMENT #4</u>:  </font> Perform Cumulative Sum-based Change Detection for a Different Subset</b> <font color='rgba(200,0,0,0.2)'> -- [6 Points] </font> </font>

<font face="Calibri" size="3"> Go back to the beginning of Section 5.2 and change the subset coordinates to a different subset (i.e., modify to <i>subset=($X$,$Y$,5,5)</i> with $X$ and $Y$ being the center of your selected subset). Work through the workbook from the beginning to the end of Section 5.2 with your selected subset. Discuss whether or not your new subset shows change according to the <i>Cumulative Sum</i> approach.
<br><br>

DISCUSS BELOW WHETHER YOUR NEW SUBSET SHOWS CHANGE:

</font>
</div>

<br>
<hr>
<font face="Calibri" size="5"> <b> 6. Cumulative Sum-based Change Detection Across an Entire Image</b> </font> 

<font face="Calibri" size="3"> With numpy arrays we can apply the concept of **cumulative sum change detection** efficiently to the entire image stack. We take advantage of array slicing and axis-based computing in numpy. Axis 0 is the time domain in our raster stacks.
    
<hr>
<font size="4"><b>6.1 We first create our time series stack:</b>
</font> 

In [None]:
X = rasterstack_pwr
# Drop the first layer (keep dates >= '2015-11-1')
X_sub=X[1:,:,:]
tindex_sub=tindex[1:]
# Convert to dB scale (use X = X_sub instead to work in power scale)
X= 10.*np.log10(X_sub)
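
<font face="Calibri" size="3"> If a stack contains zero-valued (no-data) power samples, the logarithm returns <i>-inf</i> and these values then propagate into the temporal mean. One possible safeguard is sketched below, assuming that non-positive power values should be treated as missing. The helper <i>power_to_db</i> is a hypothetical name, and downstream statistics would then need NaN-aware functions such as <i>np.nanmean</i>:
</font>

```python
import numpy as np

def power_to_db(pwr):
    """Convert power to dB, returning NaN where the power is not positive."""
    pwr = np.asarray(pwr, dtype=float)
    db = np.full(pwr.shape, np.nan)
    valid = pwr > 0
    db[valid] = 10.0 * np.log10(pwr[valid])
    return db

db_demo = power_to_db([1.0, 0.1, 0.0])   # synthetic power values
print(db_demo)
```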

In [None]:
plt.figure(figsize=(12, 8))
bandnbr=0
vmin=np.percentile(X[bandnbr],5)
vmax=np.percentile(X[bandnbr],95)
plt.title('Band  {} {}'.format(bandnbr+1,tindex_sub[bandnbr].date()))
# Show the band selected by bandnbr so that image, stretch, and title agree
plt.imshow(X[bandnbr],cmap='gray',vmin=vmin,vmax=vmax)
_=plt.colorbar()

<br>
<hr>
<font face="Calibri" size="4"> <b> 6.2 Calculate Mean Across Time Series to Prepare for Calculation of Cumulative Sum $S$:</b> </font> 

In [None]:
Xmean=np.mean(X,axis=0)
plt.figure(figsize=(12, 8))
plt.imshow(Xmean,cmap='gray')
_=plt.colorbar()

In [None]:
R=X-Xmean

In [None]:
plt.figure(figsize=(12, 8))
plt.imshow(R[bandnbr])
plt.title('Residuals for Band  {} {}'.format(bandnbr+1,tindex_sub[bandnbr].date()))
_=plt.colorbar()

<br>
<hr>
<font face="Calibri" size="4"> <b> 6.3 Calculate Cumulative Sum $S$ as well as Change Magnitude $S_{diff}$:</b> </font> 

In [None]:
S = np.cumsum(R,axis=0)
Smax= np.max(S,axis=0)
Smin= np.min(S,axis=0)
Sdiff=Smax-Smin
fig,ax=plt.subplots(1,3,figsize=(16,4))
vmin=Smin.min()
vmax=Smax.max()
p=ax[0].imshow(Smax,vmin=vmin,vmax=vmax)
ax[0].set_title('$S_{max}$')
ax[1].imshow(Smin,vmin=vmin,vmax=vmax)
ax[1].set_title('$S_{min}$')
ax[2].imshow(Sdiff,vmin=vmin,vmax=vmax)
ax[2].set_title('$S_{diff}$')
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.15, 0.02, 0.7])
_=fig.colorbar(p,cax=cbar_ax)
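
<font face="Calibri" size="3"> The axis-based computation can be verified on a tiny stack in which only one pixel changes. This is a sketch with made-up values; the names ending in <i>_demo</i> are hypothetical:
</font>

```python
import numpy as np

# Synthetic stack with shape (time, rows, cols) = (6, 2, 2), constant -10 dB
stack_demo = np.full((6, 2, 2), -10.0)
stack_demo[3:, 0, 1] = -13.0            # pixel (0, 1) drops by 3 dB from time step 3 on

Xmean_demo = stack_demo.mean(axis=0)    # temporal mean per pixel, shape (2, 2)
R_demo = stack_demo - Xmean_demo        # (2, 2) is broadcast against (6, 2, 2)
S_demo = np.cumsum(R_demo, axis=0)      # cumulative sum along the time axis
Sdiff_demo = S_demo.max(axis=0) - S_demo.min(axis=0)

print(Sdiff_demo)
```

<font face="Calibri" size="3"> Only the pixel with the step receives a non-zero $S_{diff}$; the unchanged pixels have residuals of zero throughout.
</font>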

<br>
<hr>
<font face="Calibri" size="4"> <b> 6.4 Mask $S_{diff}$ With a-priori Threshold To Identify Change Candidates:</b> </font>

<font face="Calibri" size="3">To identify change candidate pixels, we threshold $S_{diff}$, which also reduces the computational cost of the bootstrapping. For land cover change, we would not expect more than 5-10% of the pixels in a landscape to change. So, if the test region is reasonably large, setting the threshold for expected change to 10% is appropriate. In our example, we start out with a very conservative threshold of 20%, i.e., we keep the 20% of pixels with the largest $S_{diff}$ as change candidates.

The histogram for $S_{diff}$ is shown below.
</font>

In [None]:
plt.rcParams.update({'font.size': 14})
fig = plt.figure(figsize=(14,6)) # Initialize figure with a size
ax1 = fig.add_subplot(121)  # 121 means: 1 row, 2 columns, first plot
ax2 = fig.add_subplot(122)
# First plot: Histogram
# IMPORTANT: To get a histogram, we first need to *flatten* 
# the two-dimensional image into a one-dimensional vector.
h = ax1.hist(Sdiff.flatten(),bins=200,range=(0,np.max(Sdiff)))
ax1.xaxis.set_label_text('Change Magnitude')
ax1.set_title('Change Magnitude Histogram')
ax1.grid(True)  # plt.grid() would only act on the last axes created
# Second plot: Cumulative distribution function (CDF)
n, bins, patches = ax2.hist(Sdiff.flatten(), bins=200, range=(0,np.max(Sdiff)), cumulative=True, density=True, histtype='step', label='Empirical')
ax2.xaxis.set_label_text('Change Magnitude')
ax2.set_title('Change Magnitude CDF')
ax2.grid(True)

<font face="Calibri" size="3">Using this threshold, we can <b>visualize our change candidate areas</b>:
</font>

In [None]:
percentile=0.8
# First CDF bin that exceeds the chosen percentile
outind = np.where(n > percentile)
threshind = np.min(outind)
thres = bins[threshind]
print('At the {}% percentile, the threshold value is {:2.2f}'.format(percentile*100,thres))

# True marks pixels BELOW the threshold, i.e. pixels that are not change candidates
Sdiffmask=Sdiff<thres
plt.figure(figsize=(12, 8))
plt.title('Change Candidate Areas (black)')
_=plt.imshow(Sdiffmask,cmap='gray')
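
<font face="Calibri" size="3"> A threshold at a given percentile can also be computed directly with <i>np.percentile</i>, without going through the histogram bins. The sketch below is an alternative to the cell above, not the method used in this lab, and it runs on synthetic change magnitudes (the gamma-distributed values and all <i>_demo</i> names are assumptions). Because the histogram approach returns a bin edge, the two thresholds will generally differ slightly:
</font>

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic change magnitudes standing in for Sdiff
Sdiff_demo = rng.gamma(shape=2.0, scale=5.0, size=(50, 50))

thres_demo = np.percentile(Sdiff_demo, 80)   # value below which 80% of the pixels fall
mask_demo = Sdiff_demo < thres_demo          # True = not a change candidate
frac_candidates = 1.0 - mask_demo.mean()     # fraction of pixels kept as candidates

print("threshold: {:.2f}, candidate fraction: {:.3f}".format(thres_demo, frac_candidates))
```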

<br>
<hr>
<font face="Calibri" size="4"> <b> 6.5 Bootstrapping to Prepare for Change Point Selection:</b> </font>

<font face="Calibri" size="3">We can now perform bootstrapping over the candidate pixels. The workflow is as follows:
<ul>
    <li>Filter our residuals to the change candidate pixels</li>
    <li>Perform bootstrapping over candidate pixels</li>
</ul>
For efficient computing, we permute the index of the time axis instead of shuffling the data pixel by pixel.
</font>

In [None]:
Rmask = np.broadcast_to(Sdiffmask,R.shape)
Rmasked = np.ma.array(R,mask=Rmask)

<font face="Calibri" size="3">On the masked time series stack of residuals, we can re-compute the cumulative sums:
</font>

In [None]:
Smasked = np.ma.cumsum(Rmasked,axis=0)
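
<font face="Calibri" size="3"> The interplay of <i>np.broadcast_to</i>, masked arrays, and the masked cumulative sum can be traced on a very small example. This is a sketch with made-up numbers; the names ending in <i>_demo</i> are hypothetical:
</font>

```python
import numpy as np

# Synthetic residual stack with shape (time, rows, cols) = (3, 2, 2)
R_demo = np.arange(12, dtype=float).reshape(3, 2, 2)
# 2-D mask: True = pixel is excluded from the analysis
mask2d_demo = np.array([[True, False],
                        [False, True]])

# Repeat the 2-D mask along the time axis without copying it
Rmask_demo = np.broadcast_to(mask2d_demo, R_demo.shape)
Rmasked_demo = np.ma.array(R_demo, mask=Rmask_demo)

# Cumulative sum along time; masked pixels stay masked
Smasked_demo = np.ma.cumsum(Rmasked_demo, axis=0)
print(Smasked_demo[:, 0, 1])
```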

In [None]:
Smasked_max= np.ma.max(Smasked,axis=0)
Smasked_min= np.ma.min(Smasked,axis=0)
Smasked_diff=Smasked_max-Smasked_min
fig,ax=plt.subplots(1,3,figsize=(16,4))
vmin=Smasked_min.min()
vmax=Smasked_max.max()
p=ax[0].imshow(Smasked_max,vmin=vmin,vmax=vmax)
ax[0].set_title('$S_{max}$')
ax[1].imshow(Smasked_min,vmin=vmin,vmax=vmax)
ax[1].set_title('$S_{min}$')
ax[2].imshow(Smasked_diff,vmin=vmin,vmax=vmax)
ax[2].set_title('$S_{diff}$')
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.15, 0.02, 0.7])
_=fig.colorbar(p,cax=cbar_ax)

<font face="Calibri" size="3">Now let's perform <b>bootstrapping</b>:
</font>

In [None]:
random_index=np.random.permutation(Rmasked.shape[0])
Rrandom=Rmasked[random_index,:,:]

In [None]:
n_bootstraps=1000  # number of bootstrap iterations

# to keep track of the maximum Sdiff of the bootstrapped sample:
Sdiff_random_max = np.ma.copy(Smasked_diff) 
Sdiff_random_max[~Sdiff_random_max.mask]=0
# to compute the Sdiff sums of the bootstrapped sample:
Sdiff_random_sum = np.ma.copy(Smasked_diff) 
Sdiff_random_sum[~Sdiff_random_sum.mask]=0
# to count how often Sdiff exceeds Sdiff_random in the bootstrapped sample:
n_Sdiff_gt_Sdiff_random = np.ma.copy(Smasked_diff) 
n_Sdiff_gt_Sdiff_random[~n_Sdiff_gt_Sdiff_random.mask]=0
print("Running Bootstrapping for %d iterations ..." % (n_bootstraps))
for i in range(n_bootstraps):
    # For efficiency, we shuffle the time axis index and use that 
    # to randomize the masked array
    random_index=np.random.permutation(Rmasked.shape[0])
    # Randomize the time step of the residuals
    Rrandom = Rmasked[random_index,:,:]  
    Srandom = np.ma.cumsum(Rrandom,axis=0)
    Srandom_max=np.ma.max(Srandom,axis=0)
    Srandom_min=np.ma.min(Srandom,axis=0)
    Sdiff_random=Srandom_max-Srandom_min
    Sdiff_random_sum += Sdiff_random
    Sdiff_random_max[np.ma.greater(Sdiff_random,Sdiff_random_max)]=\
    Sdiff_random[np.ma.greater(Sdiff_random,Sdiff_random_max)]
    n_Sdiff_gt_Sdiff_random[np.ma.greater(Smasked_diff,Sdiff_random)] += 1
    # Integer arithmetic avoids floating-point issues in the progress check
    if (i+1) % max(1, n_bootstraps//10) == 0:
        print("%4.1f percent completed ..." % ((i+1)/n_bootstraps*100))
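
<font face="Calibri" size="3"> The core of the loop above can be reduced to a few lines when masking is left out. The following is a simplified sketch on a small synthetic stack with plain (unmasked) arrays. The function name <i>bootstrap_stack</i>, its <i>seed</i> parameter, and the noise and step values are assumptions for illustration. As in the loop above, one permutation of the time axis is shared by all pixels in each iteration:
</font>

```python
import numpy as np

def bootstrap_stack(R, n_bootstraps=200, seed=0):
    """Per-pixel CL and CP_significance for residuals R of shape (time, rows, cols)."""
    rng = np.random.default_rng(seed)
    S = np.cumsum(R, axis=0)
    Sdiff = S.max(axis=0) - S.min(axis=0)

    n_gt = np.zeros(Sdiff.shape)              # count of Sdiff > Sdiff_random
    Sdiff_random_sum = np.zeros(Sdiff.shape)  # running sum of Sdiff_random
    for _ in range(n_bootstraps):
        order = rng.permutation(R.shape[0])   # one shuffled time order for all pixels
        Srandom = np.cumsum(R[order], axis=0)
        Sdiff_random = Srandom.max(axis=0) - Srandom.min(axis=0)
        Sdiff_random_sum += Sdiff_random
        n_gt += Sdiff > Sdiff_random

    CL = n_gt / n_bootstraps
    CP_significance = 1.0 - (Sdiff_random_sum / n_bootstraps) / Sdiff
    return CL, CP_significance

# Synthetic stack: noise around -10 dB, with a 4 dB drop in pixel (0, 0)
rng_demo = np.random.default_rng(1)
stack_demo = rng_demo.normal(-10.0, 0.5, size=(20, 2, 2))
stack_demo[10:, 0, 0] -= 4.0
R_demo = stack_demo - stack_demo.mean(axis=0)

CL_demo, CPsig_demo = bootstrap_stack(R_demo)
print(CL_demo)
```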

<br>
<hr>
<font face="Calibri" size="4"> <b> 6.6 Extract Confidence Metrics and Select Final Change Points:</b> </font>

<font face="Calibri" size="3">We first compute, for all pixels, the confidence level $CL$, the change point significance metric $CP_{significance}$, and the product of the two, which serves as our confidence metric for identified change points:
</font>

In [None]:
CL = n_Sdiff_gt_Sdiff_random/n_bootstraps
CP_significance = 1.- (Sdiff_random_sum/n_bootstraps)/Sdiff 
#Plot
fig,ax=plt.subplots(1,3,figsize=(16,4))
a = ax[0].imshow(CL*100)
fig.colorbar(a,ax=ax[0])
ax[0].set_title('Confidence Level %')
a = ax[1].imshow(CP_significance)
fig.colorbar(a,ax=ax[1])
ax[1].set_title('Significance')
a = ax[2].imshow(CL*CP_significance)
fig.colorbar(a,ax=ax[2])
_=ax[2].set_title('CL x Significance')

<font face="Calibri" size="3">If we set a threshold of 0.5 on this combined metric for identifying true change, our change map looks like the following figure:
</font>

In [None]:
cp_thres=0.5

In [None]:
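fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(1,1,1)
plt.title('Detected Change Pixels based on Threshold %2.1f' % (cp_thres))
# The comparison is True where the metric stays BELOW the threshold, so the
# detected change pixels are the False (0) values in this map
a = ax.imshow(CL*CP_significance <  cp_thres,cmap='cool')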
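
<font face="Calibri" size="3"> When an explicit change map is preferred, in which True marks a detected change, the comparison can be flipped and masked pixels filled with False. This is a small sketch on made-up values; all names ending in <i>_demo</i> are hypothetical:
</font>

```python
import numpy as np

# Synthetic metrics for a 2 x 2 image; pixel (1, 0) was masked out as a non-candidate
mask_demo = np.array([[False, False],
                      [True, False]])
CL_demo = np.ma.array([[0.99, 0.60],
                       [0.95, 0.10]], mask=mask_demo)
CPsig_demo = np.ma.array([[0.70, 0.40],
                          [0.80, 0.20]], mask=mask_demo)
cp_thres_demo = 0.5

score_demo = CL_demo * CPsig_demo
# True = detected change; masked (non-candidate) pixels count as no change
is_change_demo = (score_demo >= cp_thres_demo).filled(False)
print(is_change_demo)
```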

<br>
<hr>
<font face="Calibri" size="4"> <b> 6.7 Derive Timing of Change for Each Change Pixel:</b> </font>

<font face="Calibri" size="3">Our last step in the identification of the change points is to extract the timing of the change. We will produce a raster layer that shows, for each change pixel, the band number at which the cumulative sum $S$ reaches its maximum. As in the single-subset example above, this is the last acquisition before the change; the change itself occurred between this band and the next one. We will make use of the numpy indexing scheme. First, we create a combined mask from the first threshold and the change points identified after the bootstrapping. For this we use the numpy "mask_or" operation.
</font>

In [None]:
# make a mask of our change points from the new threshold and the previous mask
cp_mask=np.ma.mask_or(CL*CP_significance<cp_thres,CL.mask)
# Broadcast the mask to the shape of the masked S curves
cp_mask2 = np.broadcast_to(cp_mask,Smasked.shape)
# Make a numpy masked array with this mask
CPraster = np.ma.array(Smasked.data,mask=cp_mask2)

<font face="Calibri" size="3">To retrieve the dates of the change points we find the band indices in the time series along the time axis where the maximum of the cumulative sums was located. Numpy offers the "argmax" function for this purpose.
</font>

In [None]:
CP_index= np.ma.argmax(CPraster,axis=0)
change_indices = list(np.unique(CP_index))
# argmax returns 0 for fully masked (non-change) pixels, so drop index 0;
# the check avoids a ValueError if no pixel has index 0
if 0 in change_indices:
    change_indices.remove(0)
print(change_indices)
# Look up the dates from the indices to get the change dates
alldates=tindex[tindex>'2015-10-31']
change_dates=[str(alldates[x].date()) for x in change_indices]
print(change_dates)
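
<font face="Calibri" size="3"> The index lookup can be followed on a tiny example. The sketch below uses a synthetic stack of $S$ curves and made-up dates (all <i>_demo</i> names are hypothetical). Instead of removing index 0 afterwards, it selects the valid pixels with the mask, which is one possible way to keep a genuine change at index 0:
</font>

```python
import numpy as np
import pandas as pd

dates_demo = pd.date_range("2016-01-01", periods=5, freq="12D")

# Synthetic S curves with shape (time, rows, cols) = (5, 1, 3)
S_demo = np.zeros((5, 1, 3))
S_demo[:, 0, 0] = [1, 3, 2, 1, 0]    # maximum at index 1
S_demo[:, 0, 1] = [1, 2, 3, 4, 0]    # maximum at index 3
mask_demo = np.zeros(S_demo.shape, dtype=bool)
mask_demo[:, 0, 2] = True            # pixel (0, 2) is not a change pixel

CPraster_demo = np.ma.array(S_demo, mask=mask_demo)
CP_index_demo = np.ma.argmax(CPraster_demo, axis=0)   # fully masked pixels yield 0

# Keep only indices from pixels that are not fully masked
valid_demo = ~mask_demo.all(axis=0)
change_indices_demo = np.unique(CP_index_demo[valid_demo])
change_dates_demo = [str(dates_demo[i].date()) for i in change_indices_demo]
print(change_indices_demo, change_dates_demo)
```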

<font face="Calibri" size="3">Lastly, we visualize the change dates by showing the $CP_{index}$ raster and labeling the colorbar with the change dates.
</font>

In [None]:
ticks=change_indices
ticklabels=change_dates

# plt.cm.get_cmap was removed in recent matplotlib releases; plt.get_cmap accepts the same arguments
cmap=plt.get_cmap('tab20',ticks[-1])
fig, ax = plt.subplots(figsize=(12,12))
cax = ax.imshow(CP_index,interpolation='nearest',cmap=cmap)
# fig.subplots_adjust(right=0.8)
# cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
# fig.colorbar(p,cax=cbar_ax)

ax.set_title('Dates of Change')
# cbar = fig.colorbar(cax,ticks=ticks)
cbar=fig.colorbar(cax,ticks=ticks,orientation='horizontal')
_=cbar.ax.set_xticklabels(ticklabels,size=10,rotation=45,ha='right')  

<font face="Calibri" size="2"> <i>GEOS 657 Microwave Remote Sensing - Version 1.0 - Feb 2019 </i>
</font>