# Tidal regression

**What does this notebook do?** 

This notebook uses the [OSU Tidal Prediction Software (OTPS)](http://volkov.oce.orst.edu/tides/otps.html) to tag each image in a time series of Landsat imagery with its modelled tidal height, and then computes a pixel-wise regression of MNDWI values against tidal height.

**Requirements:** 

Run the following commands from the command line before launching Jupyter from the same terminal, so that the required libraries and paths are set:

`module use /g/data/v10/public/modules/modulefiles` 

`module load dea`

`module load otps`

If you find an error or bug in this notebook, please either create an 'Issue' in the GitHub repository, or fix it yourself and create a 'Pull' request to contribute the updated notebook back into the repository (see the repository [README](https://github.com/GeoscienceAustralia/dea-notebooks/blob/master/README.rst) for instructions on creating a Pull request).

**Date:** August 2018

**Authors:** Robbi Bishop-Taylor, Bex Dunn

## Import modules

In [3]:
import os
import sys
import datacube
import numpy as np
import pandas as pd
import xarray as xr
from otps import TimePoint
from otps import predict_tide
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
from datacube.utils import geometry
from datacube.utils.geometry import CRS

# Import external functions from dea-notebooks using relative link to Scripts
sys.path.append('../10_Scripts')
import DEAPlotting
import DEADataHandling

%load_ext autoreload
%autoreload 2

# Create datacube instance
dc = datacube.Datacube(app='Tidal regression')




## Import remotely-sensed time series data
Imports a time series of Landsat observations as a DEA `xarray` dataset.

In [None]:
# Set up analysis data query using a buffer (in metres) around a lat-lon point
lat, lon, buffer = -12.463, 130.885, 3500
x, y = geometry.point(lon, lat, CRS('WGS84')).to_crs(CRS('EPSG:3577')).points[0]
query = {'x': (x - buffer, x + buffer),
         'y': (y - buffer, y + buffer),         
         'crs': 'EPSG:3577',
         'time': ('1987-01-01', '2018-06-30')}

# Mask used to identify bad pixels
mask_dict = {'cloud_acca': 'no_cloud', 
             'cloud_fmask': 'no_cloud', 
             'cloud_shadow_acca':'no_cloud_shadow',
             'cloud_shadow_fmask':'no_cloud_shadow',
             'blue_saturated':False,
             'green_saturated':False,
             'red_saturated':False,
             'nir_saturated':False,
             'swir1_saturated':False,
             'swir2_saturated':False,
             'contiguous': True}

# Import data
data = DEADataHandling.load_clearlandsat(dc=dc, query=query, sensors=['ls5', 'ls7', 'ls8'],
                                         bands_of_interest=['swir1', 'nir', 'green'],
                                         mask_dict=mask_dict, masked_prop=0.8, apply_mask=True)

# Plot data
data[['swir1', 'nir', 'green']].isel(time=6).to_array().plot.imshow(robust=True)


Loading ls5 PQ


## Tidal modelling using OTPS
Extracts a list of timestamps based on the time and date of acquisition for each Landsat observation. These timestamps are then used as one of the inputs to the [OSU Tidal Prediction Software (OTPS) tidal model](http://volkov.oce.orst.edu/tides/otps.html) to compute tidal heights at the time of acquisition of each Landsat observation.

In [None]:
# Extract list of datetimes based on Landsat time of acquisition for each image
observed_datetimes = data.time.data.astype('M8[s]').astype('O').tolist()

# Set a tide post: this is the location the OTPS model uses to compute tides for the supplied datetimes
tidepost_lat, tidepost_lon = -12.48315, 130.85540

# The OTPS model requires inputs as 'TimePoint' objects, which are combinations of lon-lat coordinates 
# and a datetime object. You can create a list of these with a list comprehension:
observed_timepoints = [TimePoint(tidepost_lon, tidepost_lat, dt) for dt in observed_datetimes]

# Feed the entire list of timepoints to the OTPS `predict_tide` function:
observed_predictedtides = predict_tide(observed_timepoints)

# For each of the predicted tide objects, extract a list of tidal heights in `m` units relative to mean 
# sea level (the `tide_m` attribute should not be confused with the `depth_m` attribute, which gives you 
# the ocean depth at the tide post location that is used by the OTPS model to predict tides)
observed_tideheights = [predictedtide.tide_m for predictedtide in observed_predictedtides]

# Create a dataframe of tidal heights for each Landsat observation
observed_df = pd.DataFrame({'tide_height': observed_tideheights}, 
                           index=pd.DatetimeIndex(observed_datetimes))

# Plot tidal heights against Landsat observation date
fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(observed_df.index, observed_df.tide_height, linewidth=0.6, zorder=1, label='Modelled')
ax.set_title('Landsat observations by tidal height (m)')
ax.set_ylabel('Tide height (m)');


### Tagging, filtering and compositing Landsat observations by tidal height/stage
Adds tidal height data back into our original `xarray` dataset so that each Landsat observation is correctly tagged with its corresponding tidal height. Tagged images can then be filtered or composited to study characteristics of the coastline at various tidal stages.
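
The tagging step described above can be sketched on a small synthetic dataset. This is only an illustration under stated assumptions: `demo` and `demo_tideheights` are hypothetical stand-ins for the notebook's `data` and `observed_tideheights`, and the 0 m threshold used to select low-tide observations is arbitrary.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the Landsat dataset: 5 timesteps on a 2 x 3 pixel grid
times = pd.date_range('2018-01-01', periods=5, freq='16D')
rng = np.random.default_rng(0)
demo = xr.Dataset(
    {'mndwi': (('time', 'y', 'x'), rng.uniform(-1, 1, size=(5, 2, 3)))},
    coords={'time': times, 'y': [0.0, 25.0], 'x': [0.0, 25.0, 50.0]})

# Hypothetical tide heights (m), one per timestep, standing in for `observed_tideheights`
demo_tideheights = [-1.2, 0.4, 2.1, -0.3, 1.5]

# Tag each observation by adding the tide heights as a variable along the `time` dimension
demo['tide_heights'] = xr.DataArray(demo_tideheights, coords=[demo.time], dims=['time'])

# Filter: keep only observations taken below 0 m (an arbitrary low-tide threshold)
low_tide = demo.where(demo.tide_heights < 0, drop=True)

# Composite: per-pixel median MNDWI of the low-tide observations
low_tide_median = low_tide.mndwi.median(dim='time')
```

Because `tide_heights` shares the `time` dimension with the imagery, any filter or sort on tide height carries the matching images along with it.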

## Compute MNDWI for all timesteps

In [None]:
data['mndwi'] = (data.green - data.swir1) / (data.green + data.swir1)
data

In [None]:
data['mndwi'].plot(col='time', col_wrap=6, robust=True)

### Set up linear regression


In [1]:
len(data.y)

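In [2]:
# Array of ones on the same (y, x) grid as each Landsat image
ones = xr.DataArray(np.ones((len(data.y), len(data.x)), dtype=np.float64),
                    coords=[data.y, data.x], dims=['y', 'x'])

In [None]:
ones

In [None]:
# Broadcast the per-timestep tide heights across every pixel so that they match the shape of
# `data.mndwi`; `data.tide_heights` itself is left as a 1D variable along `time`
tide_heights_3d = (data.tide_heights * ones).transpose(*data.mndwi.dims)

In [None]:
data.mndwi

In [None]:
# Plot MNDWI against tide height for every pixel and timestep
plt.scatter(data.mndwi.values.ravel(), tide_heights_3d.values.ravel())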

In [None]:
from scipy import stats
import numpy as np

In [None]:
def linear_trend(ds):
    pf = np.polyfit(ds.tide_heights, ds, 1)
    # we need to return a dataarray or else xarray's groupby won't be happy
    return xr.DataArray(pf[0])

In [None]:
data=data.drop(['data_perc','swir1','nir','green'])

In [None]:
stacked = data.stack(space=['x','y'])

In [None]:
stacked

In [None]:
data

In [None]:
stacked.groupby('space').apply(stats.linregress)

In [None]:
stacked.tide_heights

In [None]:
stacked.mndwi

In [None]:
stats.linregress(stacked.tide_heights,stacked.mndwi)

In [None]:
data.tide_heights

In [None]:
plt.scatter(data_sorted.tide_heights, data_sorted.mndwi.mean(dim=['x','y']))

In [None]:
data_mask=np.isfinite(data)

In [None]:
data['tide_heights2']=data['mndwi']*

In [None]:
data_sorted

In [None]:
# This function applies a linear regression of MNDWI against tide height to each pixel of a grid
def linear_regression_grid(input_array, mask_no_trend=False):
    '''
    Applies a linear regression of MNDWI against tide height to each pixel by looping through 
    y and x, and returns the slope and p-value of each regression as xarray DataArrays.
    '''
    ylen = len(input_array.y)
    xlen = len(input_array.x)
    from itertools import product
    coordinates = product(range(ylen), range(xlen))

    # Start from NaN so that pixels without a valid regression stay empty
    slopes = np.full((ylen, xlen), np.nan)
    p_values = np.full((ylen, xlen), np.nan)

    for y, x in coordinates:
        val = input_array.isel(x=x, y=y)
        tide = np.asarray(val.tide_heights, dtype=np.float64)
        mndwi = np.asarray(val.mndwi, dtype=np.float64)
        # Only regress on timesteps where both values are valid, and require more than
        # three observations covering more than one tide height
        valid = np.isfinite(tide) & np.isfinite(mndwi)
        if valid.sum() > 3 and np.ptp(tide[valid]) > 0:
            slopes[y, x], intercept, r_value, p_values[y, x], std_err = stats.linregress(tide[valid], mndwi[valid])

    # Get coordinates from the original xarray
    lat = input_array.coords['y']
    long = input_array.coords['x']
    # Mask out values with insignificant trends (i.e. p-value > 0.05) if the user wants
    if mask_no_trend:
        slopes[p_values > 0.05] = np.nan
    # Write arrays into xarray DataArrays
    slope_xr = xr.DataArray(slopes, coords=[lat, long], dims=['y', 'x'])
    p_val_xr = xr.DataArray(p_values, coords=[lat, long], dims=['y', 'x'])
    return slope_xr, p_val_xr

In [None]:
data_sorted = data.sortby(data.tide_heights)

In [None]:
slope_xr, p_val_xr = linear_regression_grid(data_sorted)

In [None]:
slope_xr, p_val_xr

In [None]:
p_val_xr.plot()
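
Looping over every pixel can be slow on large grids. As a hedged alternative, the slopes alone can be fitted for all pixels in one call, because `np.polyfit` accepts a 2D `y` and fits each column independently. The sketch below uses synthetic data; `pixelwise_slope` is a hypothetical helper, it assumes the MNDWI array contains no NaNs, and it does not return p-values.

```python
import numpy as np

def pixelwise_slope(tide_heights, mndwi):
    """Least-squares slope and intercept of MNDWI against tide height for every pixel.

    tide_heights: 1D array of shape (time,)
    mndwi: 3D array of shape (time, y, x), assumed to be free of NaNs
    """
    n_time, ylen, xlen = mndwi.shape
    # Flatten the pixels into columns so that one polyfit call fits them all
    coeffs = np.polyfit(tide_heights, mndwi.reshape(n_time, -1), deg=1)
    slopes = coeffs[0].reshape(ylen, xlen)
    intercepts = coeffs[1].reshape(ylen, xlen)
    return slopes, intercepts

# Synthetic example: each pixel responds linearly to tide height with a known slope
tide = np.array([-1.5, -0.5, 0.0, 1.0, 2.0])
true_slopes = np.array([[0.1, 0.2, 0.3],
                        [0.0, -0.1, 0.5]])
mndwi = tide[:, None, None] * true_slopes + 0.05  # shape (5, 2, 3)

slopes, intercepts = pixelwise_slope(tide, mndwi)
```

Real Landsat stacks contain masked (NaN) pixels, so this approach would need gaps filled or handled per pixel before it could replace the loop, and `stats.linregress` is still needed wherever p-values are required.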

In [None]:
# #This function applies a linear regression to a grid over a set time interval
# def linear_regression_grid(input_array, mask_no_trend = False, NDVI = False):
#     '''
#     This function applies a linear regression to a grid over a set time interval by looping through lat and lon 
#     and calculating the linear regression through time for each pixel.
#     '''
#     print(input_array.year)
#     ylen = len(input_array.y)
#     xlen = len(input_array.x)
#     from itertools import product
#     coordinates = product(range(ylen), range(xlen))

#     slopes = np.zeros((ylen, xlen))
#     p_values = np.zeros((ylen, xlen))
#     print('Slope shape is ', slopes.shape)

#     for y, x in coordinates:
#         val = input_array.isel(x = x, y = y)
#         # If analysing NDVI data replace negative numbers which are spurious for NDVI with nans
#         if NDVI == True:
#             val[val<0] = np.nan

#             # Check that we have at least three values to perform our linear regression on
#             if np.count_nonzero(~np.isnan(val)) > 3:
#                 if str(val.dims[0]) == 'month':
#                     slopes[y, x], intercept, r_sq, p_values[y, x], std_err = stats.linregress(val.month,val)
#                 elif str(val.dims[0]) == 'year':
#                     slopes[y, x], intercept, r_sq, p_values[y, x], std_err = stats.linregress(val.year,val)
#             else:
#                 slopes[y, x] = np.nan
#                 intercept = np.nan
#                 r_sq = np.nan
#                 p_values[y, x] = np.nan
#         else:
#             if str(val.dims[0]) == 'month':
#                 slopes[y, x], intercept, r_sq, p_values[y, x], std_err = stats.linregress(val.month,val)
#             elif str(val.dims[0]) == 'year':
#                 slopes[y, x], intercept, r_sq, p_values[y, x], std_err = stats.linregress(val.year,val)

#     #Get coordinates from the original xarray
#     lat  = input_array.coords['y']
#     long = input_array.coords['x']
#     #Mask out values with insignificant trends (ie. p-value > 0.05) if user wants
#     if mask_no_trend == True:
#         slopes[p_values>0.05]=np.nan        
#     # Write arrays into a x-array
#     slope_xr = xr.DataArray(slopes, coords = [lat, long], dims = ['y', 'x'])
#     p_val_xr = xr.DataArray(p_values, coords = [lat, long], dims = ['y', 'x']) 
#     return slope_xr, p_val_xr