This notebook loads and/or generates vegetation-related products (NDVI, FC, TC, WOfS) for a specific area
and exports them as GeoTIFFs for further analysis/viewing in ArcGIS

In [1]:
%matplotlib inline
import sys
import warnings
import matplotlib.pyplot as plt
import calendar
import numpy as np
import xarray as xr

import dask
from dask.utils import parse_bytes
from dask.distributed import Client, LocalCluster

import datacube
from datacube.storage import masking
from datacube.helpers import write_geotiff
from datacube.utils.rio import configure_s3_access
from datacube.utils.dask import start_local_dask

from psutil import virtual_memory, cpu_count

# Load custom DEA notebook functions
sys.path.append('../dea-notebooks/Scripts')
import dea_datahandling
import dea_plotting
import DEADataHandling
from dea_bandindices import calculate_indices

Set up a dask cluster

This will help keep our memory use down and conduct the analysis in parallel. If you'd like to view the dask dashboard, click on the hyperlink that prints below the cell.

The parameters for generating the local dask cluster are automatically generated, but if you wish to alter them use the documentation here - https://distributed.dask.org/en/stable/local-cluster.html. Put simply, the code below identifies how many cpus and how much RAM the computer has and generates a local cluster using those variables.
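
The sizing logic described above can be sketched in isolation, without starting a cluster. This is a minimal sketch, not the notebook's code: the function name `local_cluster_settings` and its parameters are hypothetical, the 3 GB reserve is taken here as 3 x 10**9 bytes, and the clamp at zero is an added safeguard for machines with very little RAM.

```python
def local_cluster_settings(cpu_count, total_memory_bytes,
                           reserve_bytes=3 * 10**9,
                           fallback_cpus=4,
                           fallback_memory_bytes=8 * 10**9):
    """Return (threads_per_worker, memory_limit_bytes) for a single-worker cluster.

    Hypothetical helper for illustration only.
    """
    # Fall back to defaults when the CPU count or memory size is unknown (None or 0)
    threads = int(cpu_count) if cpu_count else fallback_cpus
    memory = total_memory_bytes if total_memory_bytes else fallback_memory_bytes
    # Keep some memory back for the notebook kernel, but never go below zero
    memory_limit = max(memory - reserve_bytes, 0)
    return threads, memory_limit


threads, memory_limit = local_cluster_settings(cpu_count=8,
                                               total_memory_bytes=32 * 10**9)
print(threads, memory_limit)
```

The two returned values correspond to the `threads_per_worker` and `memory_limit` arguments used when the cluster is started.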


In [2]:
# Figure out how much memory/cpu we really have (those are set by jupyterhub)
cpu_limit = cpu_count()
# psutil.cpu_count() returns None if the count cannot be determined
cpu_limit = int(cpu_limit) if cpu_limit else 4

mem_limit = virtual_memory()
mem_limit = mem_limit.total
mem_limit = mem_limit if mem_limit > 0 else parse_bytes('8Gb')

# leave 3Gb for notebook itself
mem_limit -= parse_bytes('3Gb')

# close previous client if any, so we can re-run this cell without issues
client = locals().get('client', None)
if client is not None:
    client.close()
    del client

# start up a local cluster
client = start_local_dask(n_workers=1,
                          threads_per_worker=cpu_limit,
                          memory_limit=mem_limit)

# show the dask cluster settings
display(client)

Port 8787 is already in use. 
Perhaps you already have a cluster running?
Hosting the diagnostics dashboard on a random port instead.


Client:  Scheduler: tcp://127.0.0.1:33879  Dashboard: http://127.0.0.1:34800/status
Cluster:  Workers: 1  Cores: 8  Memory: 30.67 GB




Initialise the data cube. The 'app' argument is used to identify this app; it does not influence the analysis.
Note: Fractional Cover is not in DEA Collection 3 yet, so for the time being it will be loaded from
Collection 2 using the older functions.

In [3]:
try:
    dc_landsat3 = datacube.Datacube(app='VegAnalysis-WD', env='c3-samples')
except Exception:
    # Fall back to the default environment if 'c3-samples' is not available
    dc_landsat3 = datacube.Datacube(app='VegAnalysis-WD')


Create spatial and temporal query. This is used for both collections (Collection 3 - query_3; Collection 2 - query_2). 
If running this notebook locally, use the smaller spatial extent and a subset of the time series. If running on Gadi, use the larger extent, which covers the full Western Davenport study area, and the full time series.
Note: the Fractional Cover products (defined in query_2) are from Collection 2 and only go up to 2018. This can be updated once FC is added to Collection 3. FC and WOfS also start from 1987 rather than 1984.

In [4]:
query_3 = {
#        'lon': (134.24, 134.34),             # small test area - this works via vdi
#        'lat': (-20.79, -20.86),             # small test area - this works via vdi
        'lon': (132.07, 132.50),             # small test area - current test extent on vdi
        'lat': (-20.31, -21.00),             # small test area - current test extent on vdi
#        'lon': (132.07, 135.36),             # full study area
#        'lat': (-20.31, -22.11),             # full study area
#        'time':('2015-01', '2018-12'),       # subset of time-series
        'time':('1987-01', '2018-12'),       # full time-series
        'output_crs': 'EPSG:28352',
         'resolution': (30, 30),
         'group_by': 'solar_day'
}


Specify the months that represent the dry season for the area of interest, using zero-based indices (0-11, so 5 is June and 7 is August)

In [5]:
dry_months = [5,6,7]
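
As a quick check of the zero-based convention, the indices can be translated into month names with the standard-library `calendar` module. This is a small sketch; the variable name `dry_month_names` is introduced here for illustration only.

```python
import calendar

dry_months = [5, 6, 7]  # zero-based month indices, as in the cell above

# calendar.month_name is one-based (index 0 is an empty string), so shift by one
dry_month_names = [calendar.month_name[m + 1] for m in dry_months]
print(dry_month_names)
```

The same shift by one is needed wherever the indices are compared with calendar month numbers (1-12), such as the `time.month` values of an xarray time coordinate.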

Load Landsat surface reflectance bands (red, NIR, green, blue, SWIR 1 and SWIR 2) from Collection 3 using `load_ard`. These bands are used to calculate NDVI and the other indices below

In [18]:
ds = dea_datahandling.load_ard(dc=dc_landsat3,
        mask_dtype = np.float16,
        products=['ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'], 
        measurements=['nbart_red','nbart_nir','nbart_green',
                      'nbart_blue','nbart_swir_1','nbart_swir_2'],
        mask_contiguity='nbart_contiguity',
        dask_chunks = {'x':500, 'y':500},
        **query_3)

Loading ga_ls5t_ard_3 data
    Applying pixel quality/cloud mask
    Applying invalid data mask
    Applying contiguity mask
Loading ga_ls7e_ard_3 data
    Applying pixel quality/cloud mask
    Applying invalid data mask
    Applying contiguity mask
Loading ga_ls8c_ard_3 data
    Applying pixel quality/cloud mask
    Applying invalid data mask
    Applying contiguity mask
Combining and sorting data
    Returning 1780 observations as a dask array


Calculate various indices using calculate_indices function.

In [19]:
# Tasselled Cap Wetness
calculate_indices(ds,index = 'TCW', collection = 'ga_ls_3',
        normalise = True, deep_copy = False)

# Tasselled Cap Brightness
calculate_indices(ds,index = 'TCB', collection = 'ga_ls_3',
        normalise = True, deep_copy = False)

# Tasselled Cap Greenness
calculate_indices(ds,index = 'TCG', collection = 'ga_ls_3',
        normalise = True, deep_copy = False)

# Normalised Difference Vegetation Index
calculate_indices(ds,index = 'NDVI', collection = 'ga_ls_3',
        normalise = True, deep_copy = False)

# Leaf Area Index
calculate_indices(ds,index = 'LAI', collection = 'ga_ls_3',
        normalise = True, deep_copy = False)

# Normalised Difference Wetness Index
calculate_indices(ds,index = 'NDWI', collection = 'ga_ls_3',
        normalise = True, deep_copy = False)
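
For reference, the standard NDVI definition, (NIR - Red) / (NIR + Red), can be sketched with plain NumPy. This is an illustration of the formula only and is not the `calculate_indices` implementation; the function name `ndvi` and the sample reflectance values are made up for the example.

```python
import numpy as np


def ndvi(red, nir):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = np.asarray(red, dtype="float32")
    nir = np.asarray(nir, dtype="float32")
    total = nir + red
    # Avoid division by zero where both bands are zero; those pixels become NaN
    safe_total = np.where(total == 0, 1, total)
    return np.where(total == 0, np.nan, (nir - red) / safe_total)


# Made-up reflectance values for a 2 x 2 pixel window
red = np.array([[500, 1000], [0, 2000]])
nir = np.array([[1500, 1000], [0, 6000]])
result = ndvi(red, nir)
print(result)
```

Values close to 1 indicate dense green vegetation, while values near or below 0 indicate bare ground or water.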

Calculate the standard deviations and means for the dry season

In [39]:
# 'time.month' is numbered 1-12 but dry_months is zero-based (0-11), so shift by one
dry_month_numbers = [m + 1 for m in dry_months]

# Select the dry-season observations first, then average them.
# (Previously the month mask was computed and then overwritten, so these
# "dry" means were calculated over all months.)
mean_TCW_dry = ds.TCW[ds.TCW['time.month'].isin(dry_month_numbers)]
mean_TCW_dry = mean_TCW_dry.groupby('time.month').mean(dim = 'time')
mean_TCW_dry = mean_TCW_dry.mean(dim = 'month')

mean_TCB_dry = ds.TCB[ds.TCB['time.month'].isin(dry_month_numbers)]
mean_TCB_dry = mean_TCB_dry.groupby('time.month').mean(dim = 'time')
mean_TCB_dry = mean_TCB_dry.mean(dim = 'month')

mean_TCG_dry = ds.TCG[ds.TCG['time.month'].isin(dry_month_numbers)]
mean_TCG_dry = mean_TCG_dry.groupby('time.month').mean(dim = 'time')
mean_TCG_dry = mean_TCG_dry.mean(dim = 'month')

# NDVI statistics over all months
ndvi = ds.NDVI
mean_ndvi = ndvi.groupby('time.month').mean(dim = 'time')
mean_ndvi = mean_ndvi.mean(dim = 'month')

std_ndvi = ndvi.groupby('time.month').std(dim = 'time')
std_ndvi = std_ndvi.std(dim = 'month')

# NDVI standard deviation over the dry season only
std_ndvi_dry = ndvi[ndvi['time.month'].isin(dry_month_numbers)]
std_ndvi_dry = std_ndvi_dry.groupby('time.month').std(dim = 'time')
std_ndvi_dry = std_ndvi_dry.std(dim = 'month')

# Difference between the January (index 0) and August (index 7) standard deviations
std_ndvi_diff1 = ndvi.groupby('time.month').std(dim = 'time').isel(month = 0)
std_ndvi_diff2 = ndvi.groupby('time.month').std(dim = 'time').isel(month = 7)
std_ndvi_diff = std_ndvi_diff1 - std_ndvi_diff2

mean_ndvi_dry = ndvi[ndvi['time.month'].isin(dry_month_numbers)]
mean_ndvi_dry = mean_ndvi_dry.groupby('time.month').mean(dim = 'time')
mean_ndvi_dry = mean_ndvi_dry.mean(dim = 'month')

mean_LAI_dry = ds.LAI[ds.LAI['time.month'].isin(dry_month_numbers)]
mean_LAI_dry = mean_LAI_dry.groupby('time.month').mean(dim = 'time')
mean_LAI_dry = mean_LAI_dry.mean(dim = 'month')
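
The dry-season mean can be checked on a small synthetic array where the answer is known in advance. This is a minimal sketch, assuming xarray, pandas and NumPy are available; the array `demo` and its values are invented for the test and do not come from the datacube. Each pixel value equals its calendar month, so the mean over June to August must be 7.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly data on a 2 x 2 grid; each value equals its calendar month
time = pd.date_range("2000-01-01", periods=24, freq="MS")
values = np.broadcast_to(time.month.values[:, None, None], (24, 2, 2)).astype("float32")
demo = xr.DataArray(values, coords={"time": time}, dims=("time", "y", "x"))

dry_months = [5, 6, 7]                           # zero-based, as in the notebook
dry_month_numbers = [m + 1 for m in dry_months]  # the month accessor is numbered 1-12

# Select the dry-season time steps first, then average per month and across months
is_dry = demo.time.dt.month.isin(dry_month_numbers)
dry_mean = (demo.sel(time=is_dry)
                .groupby("time.month")
                .mean(dim="time")
                .mean(dim="month"))
print(dry_mean.values)
```

Averaging per month before averaging across months gives each month equal weight, even when some months have more cloud-free observations than others.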



Exporting data

In order to use the datacube.helpers write_geotiff function to export a simple single-band, single time-slice geotiff, the above xarray DataArrays need to be converted to xarray Datasets. We do this by using the xarray function .to_dataset. If you don't do this, the write_geotiff function will return an error.
We also need to reassign the coordinate reference system before the write_geotiff function will work. This is done by setting the .attrs attribute. We take the crs from the original imported data (ds).
Each file will be exported as a geotiff and saved in the same directory as this notebook. It can be downloaded from this location to the GA network using FileZilla.
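
The conversion steps can be sketched on a toy array without touching the datacube or writing a file. This is a minimal sketch under stated assumptions: the names `source`, `mean_demo` and `out` are invented, and the plain string 'EPSG:28352' stands in for the CRS object that a real datacube Dataset carries in its attributes. The cast to float32 is included because float16 data was rejected at export time in this notebook.

```python
import numpy as np
import xarray as xr

# Stand-in for the loaded dataset: only its attributes matter for this sketch
source = xr.Dataset(attrs={"crs": "EPSG:28352"})

# Stand-in for a computed product stored as float16
mean_demo = xr.DataArray(np.ones((2, 2), dtype="float16"), dims=("y", "x"))

# 1. Cast to a dtype the GeoTIFF writer accepts
# 2. Wrap the DataArray in a Dataset with a named variable
# 3. Copy the attributes (including the CRS) from the source dataset
out = mean_demo.astype("float32").to_dataset(name="mean_demo")
out.attrs = dict(source.attrs)
print(out)
```

The resulting Dataset has a single named variable and carries the CRS in its attributes, which is the shape of input described above for write_geotiff.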

In [26]:
#set variable for path to save files
savefilepath = '/g/data/zk34/ljg547/Outputs/'

# Set project naming convention. Start and end dates are reformatted to remove '-'.
Proj = 'SSC_WD_'

# Take the YYYY-MM-DD part of the first and last timestamps and strip the hyphens.
# (Slicing single characters out of the string dropped the first digit of the
# month and day, e.g. '1987-11-22' became '19870102'.)
ds_startDate = str(ds.isel(time=0).time.values)[0:10]
ds_startDate = ds_startDate.replace('-', '')

ds_endDate = str(ds.isel(time=-1).time.values)[0:10]
ds_endDate = ds_endDate.replace('-', '')


Generating naming convention for dry season files based on Project area (Proj), specified dry season and time series start and end dates. 
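
The naming convention can be captured in one small helper so that the .tif and .txt files for a product always share the same stem. This is a sketch only: the function `build_output_name` is hypothetical and the dates in the usage lines are made-up examples.

```python
def build_output_name(prefix, product, start_date, end_date, dry_months=None):
    """Build an output file stem from project prefix, product name and date range.

    Hypothetical helper for illustration; dates are 'YYYY-MM-DD' strings.
    """
    start = start_date.replace('-', '')
    end = end_date.replace('-', '')
    season = ''
    if dry_months:
        # dry_months is zero-based, so add one to report calendar month numbers
        season = f'_DrySeason{dry_months[0] + 1}to{dry_months[-1] + 1}'
    return f'{prefix}{product}{season}_{start}_{end}'


# Example dates are placeholders, not the actual extent of the time series
dry_name = build_output_name('SSC_WD_', 'meanTCW', '1987-05-22', '2018-12-30', [5, 6, 7])
all_name = build_output_name('SSC_WD_', 'meanNDVI', '1987-05-22', '2018-12-30')
print(dry_name + '.tif')
print(all_name + '.txt')
```

Building the stem once and appending '.tif' or '.txt' avoids the two names drifting apart when a cell is copied and edited for another product.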

In [65]:
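# Select dry-season observations ('time.month' is 1-12, dry_months is 0-11), then average
mean_TCW_dry = ds.TCW[ds.TCW['time.month'].isin([m + 1 for m in dry_months])]
mean_TCW_dry = mean_TCW_dry.groupby('time.month').mean(dim = 'time')
mean_TCW_dry = mean_TCW_dry.mean(dim = 'month')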

mean_TCW_dry

In [66]:
arr = mean_TCW_dry.astype(dtype='float32')
arr

In [67]:
arr = arr.to_dataset(name='mean_TCW_dry')
arr


In [68]:
arr.attrs = ds.attrs
arr

In [69]:
fname = str(savefilepath + Proj + 'meanTCW_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')


In [70]:
write_geotiff(dataset = arr, filename = fname)



KeyboardInterrupt: 



In [42]:
# Export data
# write_geotiff raised "TypeError: invalid dtype: 'float16'" on the float16 data,
# so cast to float32 before converting to a Dataset
arr = mean_TCW_dry.astype('float32').to_dataset(name='mean_TCW_dry')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'meanTCW_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset = arr, filename = fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'meanTCW_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Tasselled Cap Wetness for the dry season (" + 
        str(dry_months[0]+1) + "-" + str(dry_months[-1]+1) + " month)" +  
        " from " + ds_startDate + "-" + ds_endDate + "." + "\n" +
        "TCW_dry_mean is the mean value of TCW over the dry months."+ "\n" +
        "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()

In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = mean_TCB_dry.astype('float32').to_dataset(name='mean_TCB_dry')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'meanTCB_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'meanTCB_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Tasselled Cap Brightness for the dry season (" + 
        str(dry_months[0]+1) + "-" + str(dry_months[-1]+1) + " month)" +  
        " from " + ds_startDate + "-" + ds_endDate + "." + "\n" +
        "TCB_dry_mean is the mean value of TCB over the dry months."+ "\n" +
        "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()

In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = mean_TCG_dry.astype('float32').to_dataset(name='mean_TCG_dry')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'meanTCG_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'meanTCG_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Tasselled Cap Greenness for the dry season (" + 
        str(dry_months[0]+1) + "-" + str(dry_months[-1]+1) + " month)" +  
        " from " + ds_startDate + "-" + ds_endDate + "." + "\n" +
        "TCG_dry_mean is the mean value of TCG over the dry months."+ "\n" +
        "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()

In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = mean_ndvi.astype('float32').to_dataset(name='mean_ndvi')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'meanNDVI_' +
              ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'meanNDVI_' +
              ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Mean NDVI for all months" + " from " + ds_startDate + 
      "-" + ds_endDate + "." + "\n" +
      "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()

In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = std_ndvi.astype('float32').to_dataset(name='std_ndvi')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'stdNDVI_' +
              ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'stdNDVI_' +
              ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Standard deviation of NDVI for all months" + " from " + ds_startDate + 
      "-" + ds_endDate + "." + "\n" + 
      "Higher standard deviation suggests greater variation in vegetation greenness and therefore inferred water supply." + "\n" +
      "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()


In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = std_ndvi_dry.astype('float32').to_dataset(name='std_ndvi_dry')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'stdNDVI_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'stdNDVI_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Standard deviation of NDVI for the dry season (" + 
        str(dry_months[0]+1) + "-" + str(dry_months[-1]+1) + " month)" +  
        " from " + ds_startDate + "-" + ds_endDate + "." + "\n" + 
      "Higher standard deviation suggests greater variation in vegetation greenness and therefore inferred water supply." + "\n" +
      "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()


In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = std_ndvi_diff.astype('float32').to_dataset(name='std_ndvi_diff')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'stdNDVI_DiffJanAug_' +
              ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'stdNDVI_DiffJanAug_' +
              ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Comparison between NDVI standard deviation during the wet season (January) and at the end of the dry season (August)." + "\n" 
      "Time series includes imagery from " + ds_startDate + "-" + ds_endDate + "." + "\n" + 
      "Where vegetation is accessing more reliable water sources (e.g. groundwater), residual standard deviation is low." + "\n"
      "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()


In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = mean_ndvi_dry.astype('float32').to_dataset(name='mean_ndvi_dry')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'meanNDVI_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'meanNDVI_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("NDVI of dry period (" + str(dry_months[0]+1) + "-" + str(dry_months[-1]+1) + " month)" +  
      " from " + ds_startDate + "-" + ds_endDate + "." + "\n" +
      "NDVI_dry_mean is the mean value of NDVI over the dry months."+ "\n"
      "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()


In [None]:
# Export data
# Cast to float32 first: write_geotiff rejects float16 data
arr = mean_LAI_dry.astype('float32').to_dataset(name='mean_LAI_dry')
arr.attrs = ds.attrs
fname = str(savefilepath + Proj + 'meanLAI_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.tif')
write_geotiff(dataset=arr, filename=fname)

# Create metadata file. w - writes, r - reads, a- appends
f = open(savefilepath + Proj + 'meanLAI_DrySeason' +
              str(dry_months[0]+1) + 'to' + str(dry_months[-1]+1) +
              '_' + ds_startDate + '_' + ds_endDate + '.txt','w')  

f.write("Leaf Area Index for the dry season (" + 
        str(dry_months[0]+1) + "-" + str(dry_months[-1]+1) + " month)" +  
        " from " + ds_startDate + "-" + ds_endDate + "." + "\n" +
        "LAI_dry_mean is the mean value of LAI over the dry months."+ "\n" +
        "This product was derived from VegProducts_Export_Coll3_Dask.ipynb"
    )

f.close()
