<a id="ndvi_std_top"></a>
# NDVI STD

Deviations of NDVI from an established monthly baseline average, expressed as z-scores.

<hr>  
  
# Notebook Summary

* A baseline for each month is determined by measuring NDVI over a set time period.
* The data cube is used to visualize NDVI anomalies over time.
* Anomalous times are explored further and visualization solutions are proposed.

<hr>  

# Index  

* [Import Dependencies and Connect to the Data Cube](#ndvi_std_import)
* [Choose Platform and Product](#ndvi_std_plat_prod)
* [Get the Extents of the Cube](#ndvi_std_extents)
* [Define the Extents of the Analysis](#ndvi_std_define_extents)
* [Load Data from the Data Cube](#ndvi_std_load_data)
* [Create and Use a Clean Mask](#ndvi_std_clean_mask)
* [Calculate the NDVI](#ndvi_std_calculate)
* [Convert the Xarray to a Dataframe](#ndvi_std_pandas)
* [Define a Function to Visualize Values Over the Region](#ndvi_std_visualization_function)
* [Visualize the Baseline Average NDVI by Month](#ndvi_std_baseline_mean_ndvi)
* [Visualize the Baseline Distributions Binned by Month](#ndvi_std_boxplot_analysis)
* [Visualize the Baseline Kernel Distributions Binned by Month](#ndvi_std_violinplot_analysis)
* [Plot Z-Scores by Month and Year](#ndvi_std_pixelplot_analysis)
* [Further Examine Times Of Interest](#ndvi_std_heatmap_analysis)

<hr>  

# How It Works

To detect changes in plant life, we use a measure called NDVI (Normalized Difference Vegetation Index). 
* <font color=green>NDVI</font> is the difference between the amount of near-infrared light <font color=red>(NIR)</font> and red light <font color=red>(RED)</font>, divided by their sum.
<br>

$$ NDVI =  \frac{(NIR - RED)}{(NIR + RED)}$$  

<br>
<div class="alert-info">
The idea is to observe how much red light is being absorbed versus reflected. Healthy photosynthetic plants absorb most visible light, red in particular, while reflecting much of the near-infrared light. When plants are unhealthy, they reflect more red light. This makes the difference between <font color=red>NIR</font> and <font color=red>RED</font> much smaller, which lowers the <font color=green>NDVI</font>. Computing this value for every pixel lets us visualize changes in the amount of photosynthetic vegetation over large areas.
</div>
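
As a minimal sketch of the formula above, the snippet below computes NDVI for two pixels. The reflectance values are hypothetical and chosen only for illustration; the helper name `ndvi_from_bands` is not part of the notebook or its utilities.

```python
import numpy as np

def ndvi_from_bands(nir, red):
    """Return (NIR - RED) / (NIR + RED), computed element-wise."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

# Hypothetical reflectances: a healthy canopy absorbs red and reflects NIR,
# while a stressed canopy reflects more red and less NIR.
healthy = ndvi_from_bands(nir=0.50, red=0.05)
stressed = ndvi_from_bands(nir=0.30, red=0.15)

print("healthy NDVI: ", float(healthy))
print("stressed NDVI:", float(stressed))
```

The stressed pixel has a smaller gap between NIR and RED, so its NDVI comes out lower than that of the healthy pixel.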

## <span id="ndvi_std_import">Import Dependencies and Connect to the Data Cube [&#9652;](#ndvi_std_top) </span>  

In [None]:
import sys
import os
sys.path.append(os.environ.get('NOTEBOOK_ROOT'))

import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.ticker import FuncFormatter
import seaborn as sns

from utils.data_cube_utilities.dc_load import get_product_extents
from utils.data_cube_utilities.dc_display_map import display_map
from utils.data_cube_utilities.clean_mask import landsat_clean_mask_full

from datacube.utils.aws import configure_s3_access
configure_s3_access(requester_pays=True)

import datacube
from utils.data_cube_utilities.data_access_api import DataAccessApi
api = DataAccessApi()
dc = api.dc

## <span id="ndvi_std_plat_prod">Choose Platform and Product [&#9652;](#ndvi_std_top)</span>

In [None]:
# Change the data platform and data cube here

product = 'ls7_usgs_sr_scene'
platform = 'LANDSAT_7'
collection = 'c1'
level = 'l2'

# product = 'ls8_usgs_sr_scene'
# platform = 'LANDSAT_8'
# collection = 'c1'
# level = 'l2'

## <span id="ndvi_std_extents">Get the Extents of the Cube [&#9652;](#ndvi_std_top)</span>

In [None]:
full_lat, full_lon, min_max_dates = get_product_extents(api, platform, product)

print("{}:".format(platform))
print("Lat bounds:", full_lat)
print("Lon bounds:", full_lon)
print("Time bounds:", min_max_dates)

## <span id="ndvi_std_define_extents">Define the Extents of the Analysis [&#9652;](#ndvi_std_top)</span>

In [None]:
display_map(full_lat, full_lon)

In [None]:
params = {'latitude': (0.55, 0.6),
 'longitude': (35.5, 35.55),
 'time': ( '2005-01-01', '2010-12-31')}

In [None]:
display_map(params["latitude"], params["longitude"])

## <span id="ndvi_std_load_data">Load Data from the Data Cube [&#9652;](#ndvi_std_top)</span>

In [None]:
dataset = dc.load(**params,
                  platform = platform,
                  product = product,
                  measurements = ['red', 'green', 'blue', 'swir1', 'swir2', 'nir', 'pixel_qa'],
                  dask_chunks={'time':1, 'latitude':1000, 'longitude':1000}).persist()
dataset

## <span id="ndvi_std_clean_mask">Create and Use a Clean Mask [&#9652;](#ndvi_std_top)</span>

In [None]:
# Make a clean mask to remove clouds and scanlines.
clean_mask = landsat_clean_mask_full(dc, dataset, product=product, platform=platform, 
                                     collection=collection, level=level)


# Filter the scenes with that clean mask
dataset = dataset.where(clean_mask)
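
To illustrate what masking does to later calculations, here is a small sketch that uses plain NumPy as a stand-in for the xarray `where` call above. The arrays and the cloud flag are made up for illustration. Pixels that fail the mask become `NaN`, so they drop out of NaN-aware statistics instead of skewing them.

```python
import numpy as np

# Hypothetical 2x2 scene; the top-right pixel is cloudy (bright in red)
red = np.array([[0.05, 0.40], [0.06, 0.07]])
nir = np.array([[0.50, 0.45], [0.48, 0.52]])
clean = np.array([[True, False], [True, True]])  # False marks a cloudy pixel

# Keep clean pixels, replace the rest with NaN
red_masked = np.where(clean, red, np.nan)
nir_masked = np.where(clean, nir, np.nan)

ndvi_masked = (nir_masked - red_masked) / (nir_masked + red_masked)
ndvi_unmasked = (nir - red) / (nir + red)

# nanmean skips the masked pixel; the unmasked mean is pulled down by the cloud
mean_clean = np.nanmean(ndvi_masked)
mean_all = ndvi_unmasked.mean()
print(mean_clean, mean_all)
```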

## <span id="ndvi_std_calculate">Calculate the NDVI [&#9652;](#ndvi_std_top)</span>

In [None]:
#Calculate NDVI
ndvi = (dataset.nir - dataset.red)/(dataset.nir + dataset.red)

## <span id="ndvi_std_pandas">Convert the Xarray to a Dataframe [&#9652;](#ndvi_std_top)</span>

In [None]:
#Cast to pandas dataframe
df = ndvi.to_dataframe("NDVI")

#flatten the dimensions since it is a compound hierarchical dataframe
df = df.stack().reset_index()

#Drop the junk column that was generated for NDVI
df = df.drop(["level_3"], axis=1)

#Preview first 5 rows to make sure everything looks as it should
df.head()

In [None]:
#Rename the NDVI column to the appropriate name
df = df.rename(index=str, columns={0: "ndvi"})

#Clamp negative NDVI values to 0 (no upper bound is applied)
df.ndvi = df.ndvi.clip(lower=0)

#Add columns for Month and Year for convenience
df["Month"] = df.time.dt.month
df["Year"] = df.time.dt.year

#Preview changes
df.head()

## <span id="ndvi_std_visualization_function">Define a Function to Visualize Values Over the Region [&#9652;](#ndvi_std_top)</span>

In [None]:
#Create a function for formatting our axes
def format_axis(axis, digits = None, suffix = ""):
    
    #Get Labels
    labels = axis.get_majorticklabels()
    
    #Exit if empty
    if len(labels) == 0: return
    
    #Create formatting function
    format_func = lambda x, pos: "{0}{1}".format(labels[pos].get_text()[:digits], suffix)
    
    #Use formatting function
    axis.set_major_formatter(FuncFormatter(format_func))
    

#Create a function for examining the z-score and NDVI of the region graphically
def examine(month = None, year = None, value_name = "z_score"):
    
    #Default to all months and years present in the data
    if month is None: month = list(df["Month"].unique())
    if year is None: year = list(df["Year"].unique())
    
    #This allows the user to pass a single month or year instead of a list
    if type(month) is not list: month = [month]
    if type(year) is not list: year = [year]
          
    #pivoting the table to the appropriate layout
    piv = pd.pivot_table(df[df["time"].dt.year.isin(year) & df["time"].dt.month.isin(month)],
                         values=value_name,index=["latitude"], columns=["longitude"])
   
    #Sizing
    plt.rcParams["figure.figsize"] = [11,11]
    
    #Plot pivot table as heatmap using seaborn
    val_range = (-1.96,1.96) if value_name == "z_score" else (df[value_name].unique().min(),df[value_name].unique().max())
    ax = sns.heatmap(piv, square=False, cmap="RdYlGn",vmin=val_range[0],vmax=val_range[1], center=0)

    #Formatting        
    format_axis(ax.yaxis, 6)
    format_axis(ax.xaxis, 7) 
    plt.setp(ax.xaxis.get_majorticklabels(), rotation=90 )
    plt.gca().invert_yaxis()

Let's examine the average <font color=green>NDVI</font> across all months and years to get a look at the region.

In [None]:
#It defaults to binning the entire range of months and years so we can just leave those parameters out
examine(value_name="ndvi")

This gives us an idea of the healthier areas of the region before we start looking at specific months and years.

## <span id="ndvi_std_baseline_mean_ndvi">Visualize the Baseline Average NDVI by Month [&#9652;](#ndvi_std_top)</span>

In [None]:
#Make labels for convenience
labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

#Initialize the z_score column with NaN values
df["z_score"] = np.nan

#declare list for population
binned_data = list()

#Calculate monthly binned z-scores from the composited monthly NDVI mean and store them
for i in range(12):
    
    #grab z_score and NDVI for the appropriate month
    temp  = df[["z_score", "ndvi"]][df["Month"] == i+1]
    
    #populate z_score
    df.loc[df["Month"] == i+1,"z_score"] = (temp["ndvi"] - temp["ndvi"].mean())/temp["ndvi"].std(ddof=0)
    
    #record the month with its mean NDVI and standard deviation (ddof=0, matching the z-score above)
    binned_data.append((labels[i], temp["ndvi"].mean(), temp["ndvi"].std(ddof=0)))

#Create dataframe for binned values
binned_data = pd.DataFrame.from_records(binned_data, columns=["Month","Mean", "Std_Dev"])
    
#print description for clarification
print("Monthly Average NDVI over Baseline Period")

#display binned data
binned_data
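
The loop above can also be expressed with a pandas `groupby`. The sketch below is an alternative formulation on a small made-up table, not the notebook's own method. It computes each value's z-score relative to the mean and population standard deviation (`ddof=0`) of its own month.

```python
import pandas as pd

# Hypothetical NDVI samples for two months
toy = pd.DataFrame({
    "Month": [1, 1, 1, 7, 7, 7],
    "ndvi":  [0.2, 0.3, 0.4, 0.6, 0.7, 0.8],
})

# z-score of each value relative to its own month
toy["z_score"] = toy.groupby("Month")["ndvi"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)

print(toy)
```

Within each month the z-scores average to 0 and have a standard deviation of 1, so a value of 0.4 in January and a value of 0.8 in July are equally far above their respective baselines.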

## <span id="ndvi_std_boxplot_analysis">Visualize the Baseline Distributions Binned by Month [&#9652;](#ndvi_std_top)</span>

In [None]:
#Set figure size to a larger size
plt.rcParams["figure.figsize"] = [16,9]

#Create the boxplot
df.boxplot(by="Month",column="ndvi")

#Create the mean line
plt.plot(binned_data.index+1, binned_data.Mean, 'r-')

#Create the one standard deviation away lines
plt.plot(binned_data.index+1, binned_data.Mean-binned_data.Std_Dev, 'b--')
plt.plot(binned_data.index+1, binned_data.Mean+binned_data.Std_Dev, 'b--')

#Create the two standard deviations away lines
plt.plot(binned_data.index+1, binned_data.Mean-(2*binned_data.Std_Dev), 'g-.', alpha=.3)
plt.plot(binned_data.index+1, binned_data.Mean+(2*binned_data.Std_Dev), 'g-.', alpha=.3)

The plot above shows the distributions for each individual month over the baseline period.
<br>
- The <b><font color=red>red</font></b> line connects the <b><em>mean values</em></b> for each month.  
    <br>
- The dashed <b><font color=blue>blue</font></b> lines are <b><em>one standard deviation away</em></b> from the mean. For approximately normally distributed data, about 68% of the NDVI values fall between them, according to the Empirical Rule.  
    <br>
- The dash-dot <b><font color=green>green</font></b> lines are <b><em>two standard deviations away</em></b> from the mean and contain an estimated 95% of the NDVI values for that month, under the same assumption.
<br>
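
The 68% and 95% figures come from the normal distribution. As a quick check, the sketch below draws synthetic, normally distributed values (the mean of 0.6 and spread of 0.1 are arbitrary assumptions) and measures the fraction within one and two standard deviations. Real NDVI values, especially after clipping at 0, may be skewed, so the fractions for actual data can differ.

```python
import numpy as np

# Synthetic, normally distributed values standing in for one month of NDVI
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.6, scale=0.1, size=100_000)

mean, std = samples.mean(), samples.std()

# Fraction of samples within one and two standard deviations of the mean
within_1 = np.mean(np.abs(samples - mean) <= std)
within_2 = np.mean(np.abs(samples - mean) <= 2 * std)

print(f"within 1 std: {within_1:.3f}, within 2 std: {within_2:.3f}")
```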

<div class="alert-info"><font color=black> <em><b>NOTE: </b>You will notice a seasonal trend in the plot above.  If we had averaged the NDVI without binning, this trend data would be lost and we would end up comparing specific months to the average derived from all the months combined, instead of individually.</em></font>
</div>
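
To make the note above concrete, the following sketch compares z-scores computed against a single global average with z-scores computed per month. The table is made up for illustration: January is a low-NDVI season and July a high-NDVI season.

```python
import pandas as pd

# Hypothetical NDVI samples with a strong seasonal difference
toy = pd.DataFrame({
    "Month": [1, 1, 1, 7, 7, 7],
    "ndvi":  [0.2, 0.3, 0.4, 0.6, 0.7, 0.8],
})

# z-score against the average of all months combined
global_z = (toy["ndvi"] - toy["ndvi"].mean()) / toy["ndvi"].std(ddof=0)

# z-score against each month's own average
monthly_z = toy.groupby("Month")["ndvi"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)

comparison = toy.assign(global_z=global_z, monthly_z=monthly_z)
print(comparison)
```

The July value of 0.6 looks above average when compared with all months combined, but it is the lowest July value and therefore gets a negative z-score once binned by month. Without binning, the seasonal cycle would mask this kind of anomaly.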

## <span id="ndvi_std_violinplot_analysis">Visualize the Baseline Kernel Distributions Binned by Month [&#9652;](#ndvi_std_top)</span>
The violin plot lets us visualize kernel density estimates of the monthly distributions, but comes at a higher computational cost.

In [None]:
sns.violinplot(x=df.Month, y="ndvi", data=df)

<hr>  

## <span id="ndvi_std_pixelplot_analysis">Plot Z-Scores by Month and Year [&#9652;](#ndvi_std_top)</span>

### Pixel Plot Visualization

In [None]:
#Create heatmap layout from dataframe
img = pd.pivot_table(df, values="z_score",index=["Month"], columns=["Year"], fill_value=None)

#pass the layout to seaborn heatmap
ax = sns.heatmap(img, cmap="RdYlGn", annot=True, fmt="f", center = 0)

#set the title for aesthetics
ax.set_title('Z-Score\n Regional Selection Averages by Month and Year')

Each block in the visualization above represents the average deviation from the monthly baseline for the selected region in a specific month and year.  The omitted blocks are times when there was no satellite imagery available.  Their values must either be inferred, ignored, or interpolated.

You may notice long vertical strips of red.  These can be strong indications of drought, since NDVI stays below the baseline consistently over a long period of time. 
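
As a small sketch of how the Month-by-Year grid is built, the example below pivots a made-up table of z-scores. `pd.pivot_table` averages all rows that share a month and year (its default aggregation is the mean) and leaves `NaN` where a combination has no data, which is what produces the omitted blocks in the heatmap.

```python
import pandas as pd

# Hypothetical per-pixel z-scores; there is no imagery for September 2008
toy = pd.DataFrame({
    "Year":    [2008, 2009, 2009, 2009],
    "Month":   [8, 8, 8, 9],
    "z_score": [0.5, -1.0, -2.0, -0.8],
})

# Rows become months, columns become years, cells hold the mean z-score
grid = pd.pivot_table(toy, values="z_score", index="Month", columns="Year")
print(grid)
```

The two August 2009 pixels are averaged into one cell, and the September 2008 cell stays empty.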

## <span id="ndvi_std_heatmap_analysis">Further Examine Times Of Interest [&#9652;](#ndvi_std_top)</span>

### Use the function we created to examine times of interest

In [None]:
#Let's look at that drought in 2009 during the months of Aug-Oct

#This will generate a composite of the z-scores for the months and years selected
examine(month = [8, 9, 10], year = 2009, value_name="z_score")

Note:
This graphical representation of the region shows how far each pixel deviates from the mean of the month it was binned into, measured in standard deviations.

### Grid Layout of Selected Times

In [None]:
#Restrict input to about 12 grids at most (months * years) to limit memory use
def grid_examine(month = None, year = None, value_name = "z_score"):
    
    #default to all months then cast to list, if not already
    if month is None: month = list(df["Month"].unique())
    elif type(month) is int: month = [month]

    #default to all years then cast to list, if not already
    if year is None: year = list(df["Year"].unique())
    elif type(year) is int: year = [year]

    #get data within the bounds specified
    data = df[np.logical_and(df["Month"].isin(month) , df["Year"].isin(year))]
    
    #Set the val_range to be used as the vertical limit (vmin and vmax)
    val_range = (-1.96,1.96) if value_name == "z_score" else (df[value_name].unique().min(),df[value_name].unique().max())
    
    #create colorbar to export and use on grid
    Z = [[val_range[0],0],[0,val_range[1]]]
    CS3 = plt.contourf(Z, 200, cmap="RdYlGn")
    plt.clf()    
    
    
    #Define facet function to use for each tile in grid
    def heatmap_facet(*args, **kwargs):
        data = kwargs.pop('data')
        img = pd.pivot_table(data, values=value_name,index=["latitude"], columns=["longitude"], fill_value=None)
                
        ax = sns.heatmap(img, cmap="RdYlGn",vmin=val_range[0],vmax=val_range[1],
                         center = 0, square=True, cbar=False, mask = img.isnull())

        plt.setp(ax.xaxis.get_majorticklabels(), rotation=90 )
        plt.gca().invert_yaxis()
    
    
    #Create grid using the facet function above
    with sns.plotting_context(font_scale=5.5):
        g = sns.FacetGrid(data, col="Year", row="Month", height=5,sharey=True, sharex=True) 
        mega_g = g.map_dataframe(heatmap_facet, "longitude", "latitude")      
        g.set_titles(col_template="Yr= {col_name}", fontweight='bold', fontsize=18)                         
       
        #Truncate axis tick labels using the format_axis function defined earlier
        for ax in g.axes:
            format_axis(ax[0]._axes.yaxis, 6)
            format_axis(ax[0]._axes.xaxis, 7)
                
        #create a colorbox and apply the exported colorbar
        cbar_ax = g.fig.add_axes([1.015,0.09, 0.015, 0.90])
        cbar = plt.colorbar(cax=cbar_ax, mappable=CS3)

In [None]:
grid_examine(month=[8,9,10], year=[2008,2009,2010])