# Make a geomedian composite, plot an RGB, save a netCDF and GeoTiff

This notebook takes data from one Landsat sensor, makes a geomedian composite, plots it as an RGB image, and then saves the output to netCDF and to GeoTIFF.

Before you open this notebook, run the bash script on the command line (by running `source jload_stats.sh`),
otherwise it won't work.

This notebook is set up to use an input polygon, but you can specify lat/long bounds instead if you prefer.
You also need to update the input and output paths so that you read from and save to your own directories.

Note about Landsat data availability:

- Landsat 5 - 1986 to April 1999, followed by a gap until May 2003, then May 2003 to November 2011 (data from 2009 onwards become less reliable in southern Australia)
- Landsat 7 - April 1999 to present. The scan line corrector (SLC) failed in May 2003, so data after that date are referred to as SLC-off and have a venetian-blind appearance with wedges of missing data. These data are not well suited for inclusion in composites, but are fine to use in time series analysis
- Landsat 8 - April 2013 onwards
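
The availability notes above can be turned into a quick check of which sensors are worth loading for a given epoch. This is only a sketch: the helper name `sensors_for_epoch` and the `USABLE_WINDOWS` table are hypothetical, the notes give months rather than exact days (so the day values are assumptions), and Landsat 7 is limited here to its pre-SLC-failure period because SLC-off data are not recommended for composites.

```python
from datetime import date

# Approximate usable windows taken from the notes above.
# The exact start and end days are assumptions: the notes only give months.
USABLE_WINDOWS = {
    'ls5': [(date(1986, 1, 1), date(1999, 4, 30)),
            (date(2003, 5, 1), date(2011, 11, 30))],
    'ls7': [(date(1999, 4, 1), date(2003, 5, 31))],  # SLC-on period only
    'ls8': [(date(2013, 4, 1), date.max)],
}


def sensors_for_epoch(start, end):
    '''Return the sensors whose usable window overlaps [start, end].'''
    found = []
    for sensor, windows in USABLE_WINDOWS.items():
        # two closed intervals overlap when each starts before the other ends
        if any(w_start <= end and start <= w_end for w_start, w_end in windows):
            found.append(sensor)
    return sorted(found)


print(sensors_for_epoch(date(2014, 1, 1), date(2017, 12, 31)))
```

For the 2014 to 2017 epoch used later in this notebook, only `ls8` overlaps, which matches the single sensor loaded below.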


Notebook created by Bex Dunn, Vanessa Newey, Josh Sixsmith and Erin Telfer, February 2018

### check that the correct modules are loaded into your terminal session

In [202]:
!module list

print(
'''your output above should look like this:
Currently Loaded Modulefiles:
  1) /agdc-py3-env/20171214   4) /agdc_statistics/0.9a7
  2) /agdc-py3/1.5.4          5) udunits/2.1.24
  3) /agdc-py3-prod/1.5.4     6) /dea-prod/20171219''')

Currently Loaded Modulefiles:
  1) pbs                      5) /agdc_statistics/0.9a7
  2) /agdc-py3-env/20171214   6) udunits/2.1.24
  3) /agdc-py3/1.5.4          7) /dea-prod/20171219
  4) /agdc-py3-prod/1.5.4
your output above should look like this:
Currently Loaded Modulefiles:
  1) /agdc-py3-env/20171214   4) /agdc_statistics/0.9a7
  2) /agdc-py3/1.5.4          5) udunits/2.1.24
  3) /agdc-py3-prod/1.5.4     6) /dea-prod/20171219


### import some modules

In [203]:
import pandas as pd
import xarray as xr
from datetime import date, timedelta
import gdal
from gdal import *
import numpy

import datacube
from datacube.helpers import ga_pq_fuser
from datacube.storage import masking
from datacube.storage.masking import mask_to_dict
from datacube_stats.statistics import GeoMedian

from matplotlib.backends.backend_pdf import PdfPages
from matplotlib import pyplot as plt
import matplotlib.dates

dc = datacube.Datacube(app='dc-tcw and geomedian')

#libraries for polygon and polygon mask
import fiona
import shapely.geometry
import rasterio.features
import rasterio
from datacube.utils import geometry

#for drawing rgb composite plots
from skimage import exposure
import numpy as np
from matplotlib.pyplot import imshow

#for writing to netcdf
from datacube.storage.storage import write_dataset_to_netcdf

### Define a function to deal with polygon inputs:

In [206]:
def open_polygon_from_shapefile(shapefile, index_of_polygon_within_shapefile=0):
    '''hides the messy process of getting a polygon input: opens the shapefile using fiona and gets the geopolygon
    out of it for the datacube query. It also makes sure you have the correct crs object for the DEA'''

    # open all the shapes within the shape file
    shapes = fiona.open(shapefile)
    i = index_of_polygon_within_shapefile
    print('shapefile index is '+str(i))
    # valid indices run from 0 to len(shapes)-1
    if i >= len(shapes):
        raise IndexError('index '+str(i)+' is not in the range for the shapefile, which has '+str(len(shapes))+' shapes')
    #copy attributes from shapefile and define shape_name
    geom_crs = geometry.CRS(shapes.crs_wkt)
    geo = shapes[i]['geometry']
    geom = geometry.Geometry(geo, crs=geom_crs)
    geom_bs = shapely.geometry.shape(shapes[i]['geometry'])
    shape_name = shapefile.split('/')[-1].split('.')[0]+'_'+str(i)
    print('the name of your shape is '+shape_name)
    #get your polygon out as a geom to go into the query, and the shape name for file names later
    return geom, shape_name

### Define a function to load nbart and pixel quality

In [207]:
def load_nbart(sensor,query, bands_of_interest): 
    '''loads nbart data for a sensor, masks it using pq, then filters out terrain -999s.
    function written 23-08-2017 based on dc v1.5.1'''
    product_name = '{}_{}_albers'.format(sensor, 'nbart')
    print('loading {}'.format(product_name))
    ds = dc.load(product=product_name, measurements=bands_of_interest,
                 group_by='solar_day', **query)
    #grab crs defs from loaded ds if ds exists
    if ds:
        crs = ds.crs
        affine = ds.affine
        print('loaded {}'.format(product_name))
        mask_product = '{}_{}_albers'.format(sensor, 'pq')
        sensor_pq = dc.load(product=mask_product, fuse_func=ga_pq_fuser,
                            group_by='solar_day', **query)
        if sensor_pq:
            print('making mask {}'.format(mask_product))
            cloud_free = masking.make_mask(sensor_pq.pixelquality,
                                           cloud_acca='no_cloud',
                                           cloud_shadow_acca = 'no_cloud_shadow',                           
                                           cloud_shadow_fmask = 'no_cloud_shadow',
                                           cloud_fmask='no_cloud',
                                           blue_saturated = False,
                                           green_saturated = False,
                                           red_saturated = False,
                                           nir_saturated = False,
                                           swir1_saturated = False,
                                           swir2_saturated = False,
                                           contiguous=True)
            ds = ds.where(cloud_free)
            ds.attrs['crs'] = crs
            ds.attrs['affine'] = affine
            # nbart flags terrain-affected nodata pixels as -999.0; replace them with nan
            ds=ds.where(ds!=-999.0)
            print('masked {} with {} and filtered terrain'.format(product_name,mask_product))
        else: 
            print('did not mask {} with {}'.format(product_name,mask_product))
    else:
        print ('did not load {}'.format(product_name)) 

    if len(ds)>0:
        return ds
    else:
        return None
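
To make the masking steps in `load_nbart` concrete, here is a small numpy-only sketch of the same idea applied to one band: keep the pixels flagged as clear, then replace the -999 nodata value with NaN. The array values and the `cloud_free` mask are made up for illustration; the real function applies `ds.where(...)` to every band of an xarray dataset.

```python
import numpy as np

# Hypothetical 2x3 reflectance band; -999 marks terrain nodata.
band = np.array([[120.0, -999.0, 340.0],
                 [560.0, 780.0, 910.0]])

# Hypothetical pixel-quality mask: True means the pixel is clear.
cloud_free = np.array([[True, True, False],
                       [True, False, True]])

# Keep clear pixels and set everything else to NaN.
masked = np.where(cloud_free, band, np.nan)

# Then replace the -999 nodata values with NaN as well.
masked = np.where(masked != -999.0, masked, np.nan)

print(masked)
```

Because NaN is a floating point value, the masked data end up as floats even if the input was stored as integers.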

### define a function to draw true color plots

In [7]:
def drawTrueColour(ds, time = 0):
    '''code by Mike Barnes Feb 2018 draws true color plots. 
    altered by bex for drawing composites with no time dimension'''
    #t, y, x = ds['red'].shape
    y, x = ds['red'].shape
    rawimg = np.zeros((y,x,3), dtype = np.float32)
    for i, colour in enumerate(['red','green','blue']):
        #rawimg[:,:,i] = ds[colour][time].values
        rawimg[:,:,i] = ds[colour].values
    rawimg[rawimg == -999] = np.nan
    img_toshow = exposure.equalize_hist(rawimg, mask = np.isfinite(rawimg))
    fig = plt.figure(figsize=[10,10])
    imshow(img_toshow)
    ax = plt.gca()
    #ax.set_title(str(ds.time[time].values))


### define a function to write to netcdf based on the datacube 'write_dataset_to_netcdf' function

In [209]:
def write_your_netcdf(data, dataset_name, filename, crs):
    '''this function turns an xarray dataarray into a dataset so we can write it to netcdf. It adds on a crs definition
    from the original array. data = your xarray dataset, dataset_name is a string describing your variable'''    
    #turn array into dataset so we can write the netcdf
    if isinstance(data,xr.DataArray):
        dataset= data.to_dataset(name=dataset_name)
    elif isinstance(data,xr.Dataset):
        dataset = data
    else:
        # stop here: without a dataset there is nothing to write
        raise TypeError('your data is the wrong type, it is: '+str(type(data)))
    #grab our crs attributes to write a spatially-referenced netcdf
    dataset.attrs['crs'] = crs
    #dataset.attrs['affine'] =affine
    #dataset.dataset_name.attrs['crs'] = crs
    try:
        write_dataset_to_netcdf(dataset, filename)
    except RuntimeError as err:
        print("RuntimeError: {0}".format(err))        

### define a function to write to geotiff based on advice from Josh Sixsmith

In [212]:
def write_your_geotiff(filename, data):
    '''this function uses rasterio and numpy to write a multi-band geotiff for one timeslice, or for
    a single composite image. It assumes the input data is an xarray dataset (note, dataset not dataarray)
    and that you have crs and affine objects attached, and that you are using float data. future users
    may wish to assert that these assumptions are correct. Bex Dunn+ Josh Sixsmith 260218'''
    
    kwargs = {'driver': 'GTiff',
              'count': len(data.data_vars),  # geomedian has no time dim
              'width': data.sizes['x'], 'height': data.sizes['y'],
              'crs': data.crs.crs_str,
              'transform': data.affine,
              'dtype': data.blue.values.dtype,
              'nodata': 0,
              'compress': 'deflate', 'zlevel': 4,
              'predictor': 3}  # predictor: use 2 for ints, 3 for floats

    # write one band at a time; rasterio band indices start at 1
    with rasterio.open(filename, 'w', **kwargs) as dst:
        for i, band in enumerate(data.data_vars):
            dst.write(data[band].data, i+1)
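
The docstring of `write_your_geotiff` suggests that future users may wish to assert its assumptions. A minimal sketch of such a check is below. The helper name `check_geotiff_inputs` is hypothetical, and the example uses a `SimpleNamespace` stand-in rather than a real xarray dataset so that it runs without the datacube environment; it only relies on the `crs`, `affine` and `data_vars` attributes that the function above already uses.

```python
import numpy as np
from types import SimpleNamespace


def check_geotiff_inputs(data):
    '''Raise if `data` lacks crs/affine attributes or holds non-float bands.'''
    for attr in ('crs', 'affine'):
        if getattr(data, attr, None) is None:
            raise ValueError('data has no {} attached'.format(attr))
    for name, band in data.data_vars.items():
        # dtype kind 'f' covers float16, float32 and float64
        if np.dtype(band.dtype).kind != 'f':
            raise TypeError('band {} is {}, expected float'.format(name, band.dtype))
    return True


# Stand-in object with the same attribute names as the dataset used above.
fake_data = SimpleNamespace(
    crs='EPSG:3577',
    affine='placeholder affine',
    data_vars={'blue': np.zeros((2, 2), dtype='float32')},
)
print(check_geotiff_inputs(fake_data))
```

Calling a check like this at the top of the writer would fail early with a readable message instead of partway through writing the file.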


### *Change these to specify input and output directories*

In [214]:
###input folder is where your data is coming from, so you don't have to type it all the time:
input_folder = '/g/data/r78/rjd547/groundwater_activities/Whole_NA/WholeNA_shapes/small_shapes/'

###Where you want your data to go when it's saved
output_folder = '/g/data/r78/rjd547/groundwater_activities/Arafura_sw/'

### Define input area

#### get geometry and name your shape by getting a polygon from a shapefile

In [215]:
geom, shape_name = open_polygon_from_shapefile(input_folder+'arafura_sml.shp')

shapefile index is 0
the name of your shape is arafura_sml_0


#### Uncomment for lat/long specs instead of polygon:

In [216]:
# #Specify lat and long max and minimum corners
# lat_min = -20.375 #down
# lat_max = -20.340 #up
# lon_min = 148.757 #left
# lon_max = 148.806 #right
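
If you go the lat/long route, the corners need to end up in the query in place of the `geopolygon` entry. The sketch below shows one way to do that. The helper name `build_latlon_query` is hypothetical, and it assumes that `dc.load` accepts `lat` and `lon` range keys in your datacube version, so check that before relying on it.

```python
def build_latlon_query(lat_min, lat_max, lon_min, lon_max, start, end):
    '''Build a datacube-style query dict from bounding-box corners.'''
    if lat_min >= lat_max or lon_min >= lon_max:
        raise ValueError('each min corner must be smaller than its max corner')
    return {
        'time': (start, end),
        'lat': (lat_min, lat_max),  # assumed key name, see note above
        'lon': (lon_min, lon_max),  # assumed key name, see note above
    }


latlon_query = build_latlon_query(-20.375, -20.340, 148.757, 148.806,
                                  '2014-01-01', '2017-12-31')
print(latlon_query)
```

The resulting dict would be passed to `load_nbart` in place of the polygon-based `query`.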

### Set up the datacube query by specifying polygon area, spectral bands, epoch of interest and which Landsat sensors you require

In [217]:
#pick a time range
start_of_epoch = '2014-01-01'
end_of_epoch =  '2017-12-31'

#Define wavelengths/bands of interest, remove this kwarg to retrieve all bands
bands_of_interest = ['blue',
                     'green',
                     'red',
                     'nir',
                     'swir1',
                     'swir2'
                     ]

query = {
    'time': (start_of_epoch, end_of_epoch), 'geopolygon': geom,
}

### load the data

In [218]:
#this is done separately instead of in a loop because the datasets can be quite large.
#currently this is a way of memory handling -there is probably a better way of doing it.
#sensor1_nbart=load_nbart('ls5',query, bands_of_interest)
#sensor2_nbart=load_nbart('ls7',query,bands_of_interest)
sensor3_nbart=load_nbart('ls8',query,bands_of_interest)

loading ls8_nbart_albers


loaded ls8_nbart_albers




making mask ls8_pq_albers
masked ls8_nbart_albers with ls8_pq_albers and filtered terrain




### sort the data to make sure the time dimension is sorted properly

In [None]:
nbart=sensor3_nbart.sortby('time')

### Run the geomedian

In [None]:
#geomedian transform
nbart_gm=GeoMedian().compute(nbart)
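
For intuition about what a geomedian is, the sketch below computes the geometric median of one pixel's observations: the point in band space that minimises the summed Euclidean distance to all observations. It uses the Weiszfeld iteration in plain numpy. This is an illustration only, not the `datacube_stats` implementation, and the pixel values are made up.

```python
import numpy as np


def geometric_median(points, max_iter=200, tol=1e-7):
    '''Weiszfeld iteration for the geometric median of the rows of `points`.'''
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(max_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        weights = 1.0 / dist
        new_estimate = (points * weights[:, None]).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


# Hypothetical observations of one pixel: rows are dates, columns are (red, green, blue).
obs = np.array([[100.0, 200.0, 150.0],
                [110.0, 190.0, 160.0],
                [105.0, 195.0, 155.0],
                [900.0, 900.0, 900.0]])  # one bright outlier, e.g. unmasked cloud

print('mean:     ', obs.mean(axis=0))
print('geomedian:', geometric_median(obs))
```

The mean is dragged towards the outlier, while the geometric median stays close to the cluster of clear observations. Because all bands are treated together as one point, the result keeps the relationship between bands, which a band-by-band median does not guarantee.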

In [None]:
#check that the geomedian is in there
print(nbart_gm)

### draw a true color plot just to prove we can

In [None]:
drawTrueColour(nbart_gm)

### *set up a fancy filename for the netCDF*:

In [None]:
filename = output_folder+shape_name+'_'+start_of_epoch+'_'+end_of_epoch+'_ls8_gmcomposite3'+'.nc'
print(filename)
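
The filename above is built by string concatenation, which depends on `output_folder` ending with a slash. A slightly more forgiving sketch uses `os.path.join` instead. The helper name `make_output_path` and its argument names are hypothetical, and the folder in the example is a placeholder rather than a real output directory.

```python
import os


def make_output_path(output_folder, shape_name, start, end, suffix, extension):
    '''Join the pieces of an output filename into a full path.'''
    basename = '_'.join([shape_name, start, end, suffix]) + extension
    # os.path.join copes with output_folder with or without a trailing slash
    return os.path.join(output_folder, basename)


example_path = make_output_path('/tmp/out', 'arafura_sml_0',
                                '2014-01-01', '2017-12-31',
                                'ls8_gmcomposite3', '.nc')
print(example_path)
```

Passing `'.tif'` as the extension gives the matching GeoTIFF name, so both outputs share one naming pattern.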

###  Write out to netcdf

In [None]:
write_your_netcdf(nbart_gm,'nbart_gm',filename=filename, crs=nbart_gm.crs)

### *set up a fancy filename for the GeoTIFF*:

In [None]:
filename = output_folder+shape_name+'_'+start_of_epoch+'_'+end_of_epoch+'_ls8_gmcomposite3'+'.tif'
print(filename)

###  Write out to geotiff!!!

In [None]:
write_your_geotiff(filename, nbart_gm)

## You are done. Now go check your files to make sure everything worked!!!