# <a id="top">What to expect from this notebook</a>

- an example of using satellites to detect changes in plant life
- very basic xarray manipulations
- removing clouds and patching scanlines
- building a composite image

<br>  

# Algorithmic process  

- [get the maximum extents of the datacube](#extents)
- [define extents you require that fall within the maximum extents](#define_extents)
    - selecting too much can make the acquisition process slow
- [filter out cloud data and scan lines](#clean_mask)
- [select the dates you wish to form a baseline measure from and the target date for comparison](#baseline)
- [compare the target date's NDVI values against the baseline composite image](#compare)
- [plot the results](#plot)

<hr>

# How It Works

To detect changes in plant life, we use a measure called NDVI. 
* <font color=green>NDVI</font> is the ratio of the difference between near-infrared light <font color=red>(NIR)</font> and red light <font color=red>(RED)</font> to their sum
<br>

$$ NDVI =  \frac{(NIR - RED)}{(NIR + RED)}$$  

<br>
<div class="alert-info">
The idea is to observe how much red light is being absorbed versus reflected. Healthy photosynthetic plants absorb most visible light, including red, while strongly reflecting near-infrared light.  When they aren't healthy, more of the red light is reflected.  This shrinks the difference between <font color=red>NIR</font> and <font color=red>RED</font>, which lowers the <font color=green>NDVI</font>.  Computing this value for every pixel lets us visualize changes in the amount of photosynthetic vegetation over large areas.
</div>
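
As a rough illustration of the formula above, the sketch below computes NDVI for two single pixels. The reflectance values and the helper name `ndvi` are made up for the example and are not taken from the datacube.

```python
import math

def ndvi(nir, red):
    """Return (NIR - RED) / (NIR + RED), or NaN when both bands are zero."""
    total = nir + red
    if total == 0:
        return math.nan
    return (nir - red) / total

# Hypothetical surface reflectance values for two pixels
healthy = ndvi(nir=0.50, red=0.08)   # strong red absorption
stressed = ndvi(nir=0.40, red=0.30)  # more red light reflected

print(round(healthy, 3), round(stressed, 3))  # prints 0.724 0.143
```

The stressed pixel reflects more red light, so its NDVI is markedly lower even though its NIR value changed only slightly.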

In [None]:
#Import the datacube and the API
import datacube
from utils.data_cube_utilities.data_access_api import DataAccessApi

#Create an instance of the datacube and API
dc = datacube.Datacube(config="/home/localuser/.datacube.conf")
api = DataAccessApi(config="/home/localuser/.datacube.conf")

In [None]:
#Get all the current datacube products
products = dc.list_products()

print(products[["platform", "name"]])

In [None]:
#This is the platform (satellite) and product (datacube set) used for this demonstration
platform = ["LANDSAT_7"]
product = ["ls7_ledaps_kenya"]

The magnitudes of the different wavelengths of light can be quantized and stored on a per-pixel basis.  <font color=green>NDVI</font> only requires <font color=red>NIR</font> and <font color=red>RED</font> light, but many more wavelengths and some additional measures are available.  One such additional measure is called <font color=darkblue>pixel_qa</font>.  It describes the quality of the pixel for analysis. A breakdown of the values stored in <font color=darkblue>pixel_qa</font> is beyond the scope of this notebook, but we encourage you to check our GitHub for more information on their meaning.
![](diagrams/rainy_demo/ls7_xarray.png)  
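
For orientation only, the sketch below shows how a bit-packed quality value can be decoded. The bit positions are an assumption based on the Landsat Collection 1 `pixel_qa` layout and should be checked against the documentation for your product. The helper names are hypothetical and this is not the logic used by `landsat_qa_clean_mask`.

```python
# Assumed bit positions (Landsat Collection 1 pixel_qa layout); verify for your product.
QA_BITS = {
    "fill": 0,
    "clear": 1,
    "water": 2,
    "cloud_shadow": 3,
    "snow": 4,
    "cloud": 5,
}

def decode_pixel_qa(value):
    """Return the names of all flags whose bit is set in a pixel_qa value."""
    return [name for name, bit in QA_BITS.items() if value & (1 << bit)]

def is_clean(value):
    """Treat a pixel as usable when it is flagged clear or water and is not fill."""
    flags = decode_pixel_qa(value)
    return "fill" not in flags and ("clear" in flags or "water" in flags)

# A few example values: each line shows the value, its flags, and whether it is usable
for qa in (66, 68, 96, 1):
    print(qa, decode_pixel_qa(qa), is_clean(qa))
```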

## <a id="extents">Getting the Extents of the Cube</a>

In [None]:
# Get the extents of the cube
descriptor = api.get_query_metadata(platform=platform, product=product[0])

#store the latitudinal and longitudinal resolution of the product
lat, lon = products.resolution[products.platform == platform[0]].any()

In [None]:
from utils.data_cube_utilities.dc_display_map import display_map

#save extents
min_date, max_date = descriptor['time_extents']
min_lat, max_lat = descriptor['lat_extents']
min_lon, max_lon = descriptor['lon_extents']

#Adjust date string
min_date_str = str(min_date.year) + '-' + str(min_date.month) + '-' + str(min_date.day)
max_date_str = str(max_date.year) + '-' + str(max_date.month) + '-' + str(max_date.day)

#Round GPS coordinates to 3 decimal places
min_lat_rounded = round(min_lat, 3)
min_lon_rounded =  round(min_lon, 3)
max_lat_rounded = round(max_lat, 3)
max_lon_rounded = round(max_lon, 3) 

#display area
display_map(latitude = (min_lat_rounded, max_lat_rounded),longitude = (min_lon_rounded, max_lon_rounded))

In [None]:
from dc_notebook_utilities import generate_metadata_report

# Display the ranges of the metadata in a table
generate_metadata_report(min_date_str, max_date_str, 
                         min_lon_rounded, max_lon_rounded, lon,
                         min_lat_rounded, max_lat_rounded, lat)

## <a id="define_extents">Defining the Extents of the Analysis</a>

In [None]:
from dc_notebook_utilities import create_extents_gui 

#Create the GUI for the extents derived
extent_values = create_extents_gui(min_date_str, max_date_str,
                                   min_lon_rounded, max_lon_rounded,
                                   min_lat_rounded, max_lat_rounded)

While Latitude and Longitude are pretty straightforward, time slices must be chosen carefully so that you do not accidentally obscure useful information.  In the diagram below you can see that the rainy season is omitted so the composites can be representative of the dry seasons on either side of the rainy season.  The inclusion of the rainy season data would obscure the analysis results.
![img](diagrams/rainy_demo/alg_jn2_02.png)
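
One way to keep rainy-season scenes out of a composite is to filter the acquisition dates by month before selecting them. The sketch below assumes, purely for illustration, a rainy season running from March through May. Both the month range and the sample dates are hypothetical.

```python
import datetime

# Hypothetical rainy-season months (March-May); adjust for the region being studied.
RAINY_MONTHS = {3, 4, 5}

def drop_rainy_season(dates, rainy_months=RAINY_MONTHS):
    """Keep only the acquisition dates that fall outside the rainy months."""
    return [d for d in dates if d.month not in rainy_months]

# Made-up acquisition dates standing in for a real acquisition list
sample_dates = [
    datetime.datetime(2014, 1, 14),
    datetime.datetime(2014, 4, 4),
    datetime.datetime(2014, 5, 22),
    datetime.datetime(2014, 8, 10),
]

dry_dates = drop_rainy_season(sample_dates)
print([d.strftime("%Y-%m-%d") for d in dry_dates])
```

The same filter could be applied to a list of real acquisition dates before choosing baseline scenes.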


In [None]:
import datetime

# Save form values
start_date = datetime.datetime.strptime(extent_values[0].value, '%Y-%m-%d')
end_date = datetime.datetime.strptime(extent_values[1].value, '%Y-%m-%d')
min_lon = extent_values[2].value
max_lon = extent_values[3].value
min_lat = extent_values[4].value
max_lat = extent_values[5].value

#get a list of available image acquisition dates
acquisitions_list = api.list_acquisition_dates(product[0], longitude=(min_lon, max_lon), latitude=(min_lat, max_lat))
print(len(acquisitions_list))

In [None]:
#define query parameters
#restrict the load to the dates chosen in the form above
params = dict(platform=platform[0],
              product=product[0],
              time=(start_date, end_date),
              lon=(min_lon, max_lon),
              lat=(min_lat, max_lat),
              measurements=["red", "nir", "pixel_qa"])

# Query the Data Cube
dataset = dc.load(**params)

## <a id="clean_mask">Making a Clean Mask</a>

#### Clouds:
Clouds can obscure satellite imagery, making the analysis harder to perform.  Fortunately, clouds can be filtered out rather easily using images from other dates close to the target date.  The small illustration below shows how clouds can obscure a satellite image:
  ![](diagrams/rainy_demo/cloud_clip_01.PNG)

#### Scan Lines:
Scan lines are an artifact of Landsat 7 imagery.  They are the result of a malfunction in the system responsible for ensuring full coverage (the scan line corrector).  As a result, strips of imagery are missing from most Landsat 7 images.  The illustration below shows what scan lines might look like on a satellite image:
![](diagrams/rainy_demo/slc_error_02.PNG)
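
Both problems are handled the same way in a composite: a pixel that is unusable on one date is filled from another date on which it is clean. The sketch below illustrates the idea on a one-dimensional row of pixels using plain Python lists. It is a simplified stand-in, not the implementation used by `create_mosaic`, and the function name and data are hypothetical.

```python
def most_recent_clean_mosaic(scenes, masks):
    """Build a composite from scenes ordered oldest to newest.

    scenes: list of pixel rows, one row per acquisition date
    masks:  matching rows of booleans, True where the pixel is clean
    Pixels that are never clean stay None.
    """
    composite = [None] * len(scenes[0])
    for scene, mask in zip(scenes, masks):
        for i, (value, clean) in enumerate(zip(scene, mask)):
            if clean:
                composite[i] = value  # newer clean pixels overwrite older ones
    return composite

# Three made-up acquisitions of the same four pixels (oldest first)
scenes = [[10, 11, 12, 13],
          [20, 21, 22, 23],
          [30, 31, 32, 33]]
masks = [[True, True, True, False],
         [True, False, True, False],   # pixel 1 hidden by cloud
         [False, False, True, False]]  # pixels 0 and 1 lost to a scan-line gap

print(most_recent_clean_mosaic(scenes, masks))  # prints [20, 11, 32, None]
```

Pixel 3 is never clean in any scene, so it stays empty. This is why choosing enough scenes for the baseline matters.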

In [None]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask

#Get the clean mask for the LANDSAT satellite platform
clean_mask = landsat_qa_clean_mask(dataset, platform[0])

In [None]:
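from utils.data_cube_utilities.dc_mosaic import create_mosaic

#Build a mosaic of the dataset, using the clean mask to skip cloudy and missing pixels
cleaned_dataset = create_mosaic(dataset, reverse_time=False, clean_mask=clean_mask)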

## <a id="baseline">Selecting a Target Date and Specifying a Baseline For Comparison</a>

In [None]:
from ipywidgets import widgets
import collections
import operator
from utils.data_cube_utilities.dc_mosaic import (create_mosaic, create_median_mosaic,
                                                 create_max_ndvi_mosaic, create_min_ndvi_mosaic)


#create the widget for the scene selection dropdown
#This will be used to select the scene we are comparing to the baseline NDVI amounts
scene_sel = widgets.Dropdown(options=acquisitions_list, values=acquisitions_list)

#This dropdown widget will allow us to select multiple scenes with which to compose a baseline
baseline_sel = widgets.SelectMultiple(options=acquisitions_list, values=acquisitions_list)

#Set the threshold increment and create the dropdown for it
threshold_sel_options = {str(x)+'%': x/100 for x in range(5, 101, 5)}
threshold_sel = widgets.Dropdown(options=collections.OrderedDict(sorted(threshold_sel_options.items(), key=operator.itemgetter(1))))

#Create a dictionary of the different mosaic method options
mosaic_methods = {'Most Recent':create_mosaic, 'Least Recent':create_mosaic,'Median':create_median_mosaic,
                  'Max NDVI':create_max_ndvi_mosaic, 'Min NDVI':create_min_ndvi_mosaic}

#create the widget for the mosaic options
mosaic_options_sel = widgets.Dropdown(options=list(mosaic_methods.keys()))
    
# Display form
display(widgets.Label('Select a scene to check for anomalies: '), scene_sel)
display(widgets.Label('Select scenes to form a baseline: '), baseline_sel)
display(widgets.Label('Select a mosaic method for the baseline:  '), mosaic_options_sel)
display(widgets.Label('Select a percentage threshold for anomalies: '), threshold_sel)

In [None]:
#Initialize baseline mosaic for comparison
baseline_mosaic = None

#need to reverse the direction of the mosaicing over time if "Most Recent" selected
reverse_time = mosaic_options_sel.value == 'Most Recent'
baseline_dates = baseline_sel.value
for index, scene_start in enumerate(baseline_dates):
    #each scene's time window ends at the next selected date; the last scene gets a one-second window
    if index != len(baseline_dates) - 1:
        scene_end = baseline_dates[index + 1]
    else:
        scene_end = scene_start + datetime.timedelta(seconds=1)
    data = api.get_dataset_by_extent(product[0], latitude=(min_lat, max_lat), longitude=(min_lon, max_lon),
                                     time=(scene_start, scene_end),
                                     measurements=['red', 'nir', 'pixel_qa'])
    clean_mask = landsat_qa_clean_mask(data, platform[0])
    baseline_mosaic = mosaic_methods[mosaic_options_sel.value](data, intermediate_product=baseline_mosaic, reverse_time=reverse_time, clean_mask=clean_mask)

## <a id="compare">Calculating the NDVI for the Baseline and Target Scene</a>

In [None]:
import sys #required for epsilon (in case the baseline is zero)

#Calculate the NDVI baseline values
ndvi_baseline = (baseline_mosaic.nir - baseline_mosaic.red) / (baseline_mosaic.nir + baseline_mosaic.red)

#Calculate the NDVI values in the target scene
ndvi_scene = (cleaned_dataset.nir - cleaned_dataset.red) / (cleaned_dataset.nir + cleaned_dataset.red)

#Determine the change relative to the baseline, as a fraction (1.0 means 100%)
percentage_change = abs((ndvi_baseline - ndvi_scene) / (ndvi_baseline+sys.float_info.epsilon))

## <a id="plot">Plotting the NDVI Anomalies</a>

In [None]:
import matplotlib.pyplot as plt

#Set plot size
plt.figure(figsize = (15,12))

#plot the raw percent changes
percentage_change.plot(cmap='seismic')

In the plot above you can clearly see an NDVI anomaly.  This indicates a measurable difference in the amount of photosynthetic vegetation in the area compared to the baseline composite image.
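
To make the plotted quantity concrete, the sketch below repeats the same relative-change calculation on three hand-picked pixels and applies a cutoff. The NDVI values, the function names, and the 100% threshold are illustrative assumptions.

```python
import sys

def fractional_change(baseline, scene):
    """Absolute NDVI change relative to the baseline (1.0 means 100%)."""
    # Epsilon keeps the division defined when the baseline NDVI is exactly zero
    return abs((baseline - scene) / (baseline + sys.float_info.epsilon))

def flag_anomalies(baselines, scenes, threshold=1.0):
    """Return True for every pixel whose change exceeds the threshold."""
    return [fractional_change(b, s) > threshold for b, s in zip(baselines, scenes)]

# Made-up NDVI values for three pixels
baseline_ndvi = [0.60, 0.50, 0.10]
scene_ndvi = [0.20, 0.48, -0.20]

print([round(fractional_change(b, s), 2) for b, s in zip(baseline_ndvi, scene_ndvi)])
print(flag_anomalies(baseline_ndvi, scene_ndvi))
```

Only the third pixel is flagged at the 100% threshold. Because its baseline NDVI is small, even a modest absolute drop produces a large relative change, which is worth keeping in mind when reading the map.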

In [None]:
#Flag pixels whose NDVI changed by more than 100% of the baseline (hard cutoff)
anom = percentage_change > 1 

#display plot
anom.plot(cmap='seismic')

In [None]:
#Let's see what that area is
display_map(latitude = (.5, .7),longitude = (35.5, 35.7))

It turns out that the anomaly we detected was Lake Kamnarok in Kenya.  So why did Lake Kamnarok show up as an anomaly?  Lake Kamnarok, named after all the <i>Narok</i> plants in it, dried up in 2015. When it did, a significant number of the plants in and around it died.  Without any prior knowledge about the lake, we were able to determine from satellite imagery that there was a significant change in the plant life there.

 
[return](#top)