# Forest Degradation in Vietnam Using Regression Analysis on Landsat Imagery


### This notebook:  

- applies a forest-degradation monitoring method across a time series of Landsat imagery  
- displays a rendering of the computed products  
- saves the computed products to disk for further validation  
  
------  

### Motivation   

This notebook is inspired by the publication **Assessment of Forest Degradation in Vietnam Using Landsat Time Series Data** by Vogelmann et al. A copy is available from MDPI at this [link](http://www.mdpi.com/1999-4907/8/7/238).
  
-------

### Algorithmic Profile  
  
- This algorithm generates a forest degradation product.
- The product is derived from Landsat 7 Collection 1 Tier 2 SR imagery taken from USGS data holdings.
- Linear regression is run on an NDVI product; the linear coefficient (slope) is used as a proxy for forest degradation (a decrease in NDVI).

-------  
  
<br>  

### Process  
For a selected year:

- Compute NDVI across all Landsat acquisitions  
- Select a time frame of **n** contiguous acquisitions  
- Run linear regression on the time-series stack of each NDVI pixel  
- Capture the slope in a lat/lon-referenced grid
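
The steps above can be sketched on synthetic data with plain NumPy. This is a minimal illustration, not the notebook's implementation: the reflectance values are hypothetical, and `slope_per_pixel` is an assumed helper that uses the closed-form least-squares slope for all pixels at once instead of fitting each pixel separately. It also assumes the stack contains no `nan` values.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED), computed element-wise."""
    return (nir - red) / (nir + red)

def slope_per_pixel(stack):
    """Least-squares slope along axis 0 of a (time, lat, lon) stack.

    The acquisition index 0..n-1 is used as the x axis.
    Assumes the stack has no NaN values.
    """
    n = stack.shape[0]
    x = np.arange(n, dtype=float)
    x_centered = x - x.mean()
    y_centered = stack - stack.mean(axis=0)
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    return np.tensordot(x_centered, y_centered, axes=(0, 0)) / (x_centered ** 2).sum()

# Synthetic reflectance: 6 acquisitions over a 2 x 3 grid (hypothetical values).
n_times, n_lat, n_lon = 6, 2, 3
red = np.full((n_times, n_lat, n_lon), 0.1)
nir = np.full((n_times, n_lat, n_lon), 0.5)
# One pixel loses NIR reflectance over time, which gives a declining NDVI.
nir[:, 0, 0] = np.linspace(0.5, 0.3, n_times)

slopes = slope_per_pixel(ndvi(nir, red))
```

Here `slopes` has one value per grid cell: the declining pixel gets a negative slope, and the unchanged pixels get a slope of zero.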
  
-------  
  
<br>  

>#### Flow Diagram  
> ![](./diagrams/vogelmann/ndvi_trend.png)  

  

-------  


In [None]:
import sys
import os
sys.path.append(os.environ.get('NOTEBOOK_ROOT'))

  
------
  

# Calculations 

### NDVI  
NDVI is a derived index that correlates well with the existence of vegetation.  
  
<br>

$$ NDVI =  \frac{(NIR - RED)}{(NIR + RED)}$$  

<br>
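
As a worked example of the formula, the sketch below evaluates NDVI for two pairs of reflectances. The values are hypothetical, and `ndvi_value` is an illustrative helper that is not part of the notebook. Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so it scores closer to 1.

```python
def ndvi_value(nir, red):
    """Return NDVI for a single pair of reflectances, or None if undefined."""
    denominator = nir + red
    if denominator == 0:
        return None  # NDVI is undefined when both bands are zero
    return (nir - red) / denominator

# (0.5 - 0.1) / (0.5 + 0.1) = 0.4 / 0.6, roughly 0.67
dense_vegetation = ndvi_value(nir=0.5, red=0.1)
# (0.3 - 0.25) / (0.3 + 0.25) = 0.05 / 0.55, roughly 0.09
bare_ground = ndvi_value(nir=0.3, red=0.25)
```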


In [None]:
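def NDVI(dataset):
    # Parenthesize the whole ratio so that rename() applies to the result,
    # not only to the denominator.
    return ((dataset.nir - dataset.red) / (dataset.nir + dataset.red)).rename("NDVI")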

<br>  

## Linear Regression

The following code runs regression analysis on every pixel.  


If it looks messy, that is because the regression must drop `nan` values from each pixel's time series before fitting.    

In [None]:
import xarray as xr  
import numpy as np  

def _where_not_nan(arr):
    return np.where(np.isfinite(arr))

def _flatten_shallow(arr):
    return arr.reshape(arr.shape[0] * arr.shape[1])  

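def per_pixel_linear_trend(pixel_da: xr.DataArray) -> xr.DataArray:
    """Return the least-squares slope of one pixel's time series."""
    time_index_length = len(pixel_da.time)  
    
    ys = _flatten_shallow(pixel_da.values)
    # Use the acquisition index (0, 1, 2, ...) as the x axis, not the timestamps.
    xs = np.arange(time_index_length)

    # Drop non-finite observations before fitting.
    not_nan = _where_not_nan(ys)[0].astype(int)

    xs = xs[not_nan]
    ys = ys[not_nan] 

    # polyfit returns coefficients highest degree first, so pf[0] is the slope.
    pf = np.polyfit(xs,ys, 1)
    return xr.DataArray(pf[0])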

def trend_product(da: xr.DataArray) -> xr.DataArray:
    stacked = da.stack(allpoints = ['latitude', 'longitude'])
    trend = stacked.groupby('allpoints').apply(per_pixel_linear_trend)
    unstacked = trend.unstack('allpoints')
    return unstacked.rename(dict(allpoints_level_0 = "latitude", allpoints_level_1 = "longitude"))
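
To illustrate the `nan` handling in `per_pixel_linear_trend`, the sketch below fits a single hypothetical pixel series with two missing observations. `np.polyfit` cannot produce a meaningful fit when `y` contains `nan`, so the series is filtered with a finite-value mask first. The values are made up for illustration.

```python
import numpy as np

# One pixel's NDVI over six acquisitions; two are missing (hypothetical values).
ys = np.array([0.80, np.nan, 0.70, 0.65, np.nan, 0.55])
xs = np.arange(len(ys))

# Keep only the acquisitions with a finite NDVI value.
valid = np.isfinite(ys)

# Degree-1 fit: coefficients come back highest degree first.
slope, intercept = np.polyfit(xs[valid], ys[valid], 1)
```

Because the x values keep their original acquisition indices, dropping observations does not compress the time axis. The remaining points still lie on the same line, with a slope of about -0.05 per acquisition.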


# Case study  

### Spatial Extents 

In [None]:
# Tano-Offin Forest Reserve, Ghana
latitude = (6.5991, 6.6823)
longitude = (-2.3071, -2.1712)
date_range = ('2000-01-01', '2000-12-31')

# Zanzibar, Tanzania
# latitude = (-6.2238, -6.1267)
# longitude = (39.2298, 39.2909)
# date_range = ('2000-01-01', '2000-12-31')

### Display basemap of area  

In [None]:
from utils.data_cube_utilities.dc_display_map import display_map
display_map(latitude = latitude, longitude = longitude)

### Load Data  

> #### Import datacube

In [None]:
from datacube.utils.aws import configure_s3_access
configure_s3_access(requester_pays=True)

import datacube
dc = datacube.Datacube()

> #### Load data  

In [None]:
product = 'ls7_usgs_sr_scene'
platform = 'LANDSAT_7'
collection = 'c1'
level = 'l2'

# Keep every `stride`-th pixel along each spatial axis (resolution is reduced to 1/stride).
stride = 2

In [None]:
data = dc.load(latitude = latitude, 
               longitude = longitude,
               product = product,
               time=date_range,
               measurements = ['red', 'nir', 'pixel_qa'],
               dask_chunks={'time':5, 'longitude':1000, 'latitude':1000})
data = data.isel(longitude=slice(0,len(data.longitude),stride), 
                 latitude=slice(0,len(data.latitude),stride))
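
The effect of the stride can be seen on a small array. This is a sketch with a hypothetical 6 x 8 raster, using the same `slice(0, n, stride)` pattern on a NumPy array rather than an xarray dataset.

```python
import numpy as np

stride = 2
grid = np.arange(6 * 8).reshape(6, 8)  # hypothetical 6 x 8 raster

# Same slicing pattern as isel(latitude=slice(0, n, stride), longitude=...).
subsampled = grid[slice(0, grid.shape[0], stride), slice(0, grid.shape[1], stride)]
```

With `stride = 2`, each axis keeps half of its pixels, so the pixel count drops to a quarter. The skipped pixels are discarded, not averaged.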

### Create a cloud mask
Pixels that are not clear will be masked with a `nan` value. We drop `pixel_qa` from the dataset to save memory. 

In [None]:
from utils.data_cube_utilities.clean_mask import landsat_clean_mask_full

mask = landsat_clean_mask_full(dc, data, product=product, platform=platform, 
                               collection=collection, level=level)
data = data.drop(['pixel_qa'])
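
For intuition, a clean mask can be derived from the `pixel_qa` band by testing bit flags. The sketch below assumes the Landsat Collection 1 surface reflectance bit layout (bit 1 = clear, bit 2 = water). Whether `landsat_clean_mask_full` keeps exactly these classes is an assumption, and `clear_or_water` is a hypothetical helper, not part of the utilities imported above.

```python
import numpy as np

# Assumed Landsat Collection 1 pixel_qa bit positions.
CLEAR_BIT = 1
WATER_BIT = 2

def clear_or_water(pixel_qa):
    """True where the clear or water bit is set in a pixel_qa array."""
    flags = (1 << CLEAR_BIT) | (1 << WATER_BIT)
    return (pixel_qa & flags) != 0

# Example values: 66 = clear, 68 = water, 72 = cloud shadow, 96 = cloud, 1 = fill.
qa = np.array([66, 68, 72, 96, 1], dtype=np.uint16)
mask = clear_or_water(qa)
```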

### Calculate NDVI  

In [None]:
data = NDVI(data)

### Filter clouded/occluded NDVI readings  

In [None]:
data = data.where(mask).persist()
del mask

In [None]:
non_nan_mask = ~np.isnan(data).persist()
# Fill NaN pixels in each pixel stack with the temporal mean of that stack.
filled_data = data.where(non_nan_mask, data.mean('time'))
# Stacks with no valid observations are still NaN; fill them with the overall mean.
filled_data = filled_data.where(non_nan_mask.sum('time') > 0, data.mean())
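
The two-stage fill can be sketched in plain NumPy. This is an illustration under assumptions, not the notebook's code: `fill_missing` is a hypothetical helper and the input values are made up. A missing value takes its pixel's temporal mean, and a pixel with no valid observation at all takes the mean of every valid value in the stack.

```python
import numpy as np

def fill_missing(stack):
    """Fill NaNs in a (time, lat, lon) stack.

    Each NaN is replaced by its pixel's temporal mean; pixels with no valid
    observation fall back to the mean of all valid values in the stack.
    """
    valid = np.isfinite(stack)
    counts = valid.sum(axis=0)
    sums = np.where(valid, stack, 0.0).sum(axis=0)
    global_mean = sums.sum() / counts.sum()
    # Per-pixel mean where at least one observation exists, else the global mean.
    pixel_mean = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    return np.where(valid, stack, pixel_mean)

# Three acquisitions over a 1 x 3 grid (hypothetical values).
stack = np.array([
    [[0.2, np.nan, 0.8]],
    [[np.nan, np.nan, 0.8]],
    [[0.4, np.nan, 0.8]],
])
filled = fill_missing(stack)
```

A pixel filled entirely with one constant has a regression slope of zero, so it shows no trend instead of breaking the fit.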

### Run regression  

In [None]:
from time import time 

t1 = time()
data = trend_product(filled_data)
t2 = time()  

In [None]:
print(t2 - t1)

### Plot trends below threshold 

In [None]:
%matplotlib inline 
# Plot the regression slopes (the trend product), keeping only negative trends.
data.where(data<0).plot(figsize = (16,11))

In [None]:
# Plot the negated slopes so that stronger NDVI decline appears as higher values.
(-data).plot(figsize = (16,11))
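
The thresholding used for the first plot can also be summarized numerically. The sketch below uses a hypothetical 2 x 2 grid of slopes and an assumed cutoff of zero. It keeps only the declining pixels and reports the share of the grid they cover.

```python
import numpy as np

threshold = 0.0  # slopes below this count as declining NDVI (assumed cutoff)
slopes = np.array([[-0.02, 0.01], [0.00, -0.05]])  # hypothetical trend values

# Keep slopes below the threshold and blank out everything else.
declining = np.where(slopes < threshold, slopes, np.nan)

# Share of pixels whose NDVI trend is below the threshold.
fraction_declining = np.isfinite(declining).mean()
```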

