# Comparing current and collection upgrade Landsat data

**What does this notebook do?** 

This notebook demonstrates how to load matching data from both the current collection and the collection upgrade databases, make both datasets consistent, then conduct some very basic comparisons of values for each band. This is intended to serve as a starting point for more advanced comparisons of the two collections. 
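Here is a minimal sketch of a "very basic comparison" for one band. It assumes the two collections have already been loaded onto the same pixel grid and converted to NumPy arrays. The function name `compare_band` and the example values are hypothetical and not part of this notebook. The sketch reports the number of valid pixel pairs, the mean difference (bias), the RMSE and the Pearson correlation, ignoring pixels that are masked (NaN) in either collection.

```python
import numpy as np


def compare_band(current, upgrade):
    """Summarise differences between two co-registered arrays for one band.

    Pixels that are NaN in either array (e.g. masked as cloud) are ignored.
    """
    current = np.asarray(current, dtype=float)
    upgrade = np.asarray(upgrade, dtype=float)

    # Only compare pixels that are valid in both collections
    valid = np.isfinite(current) & np.isfinite(upgrade)
    if valid.sum() < 2:
        raise ValueError('need at least two valid pixel pairs to compare')
    a, b = current[valid], upgrade[valid]

    diff = b - a  # collection upgrade minus current collection
    return {
        'n_pixels': int(valid.sum()),
        'mean_difference': float(diff.mean()),
        'rmse': float(np.sqrt(np.mean(diff ** 2))),
        'pearson_r': float(np.corrcoef(a, b)[0, 1]),
    }


# Made-up reflectance values for one band; one pixel is masked (NaN) in the current collection
current_red = np.array([[500.0, 520.0], [np.nan, 800.0]])
upgrade_red = np.array([[510.0, 515.0], [600.0, 820.0]])
print(compare_band(current_red, upgrade_red))
```

With two aligned xarray Datasets, the same function could be applied to each band in turn, e.g. `compare_band(ds_current[band].values, ds_upgrade[band].values)`, where `ds_current` and `ds_upgrade` are hypothetical names.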

**Requirements:**

You need to run the following commands from the command line before launching Jupyter notebooks from the same terminal, so that the required libraries and paths are set:

`module use /g/data/v10/public/modules/modulefiles` 

`module load dea` 

This notebook uses the external functions `rgb` and `display_map`. These functions are available in the `10_Scripts` folder of the [dea-notebooks GitHub repository](https://github.com/GeoscienceAustralia/dea-notebooks/tree/master/10_Scripts). Note that these functions have been developed by DEA users, not the DEA development team, and so are provided without warranty. If you find an error or bug in the functions, please either create an 'Issue' in the GitHub repository, or fix it yourself and create a 'Pull' request to contribute the updated function back into the repository (see the repository [README](https://github.com/GeoscienceAustralia/dea-notebooks/blob/master/README.rst) for instructions on creating a Pull request).

**Date:** February 2019

**Author:** Robbi Bishop-Taylor

## Database access
To access the collection upgrade database, create a config file in your home directory named `.ard-interoperability_tmp.conf` containing the following information:

```
[datacube]
db_hostname: agdcstaging-db.nci.org.au
db_port:     6432
db_database: ard_interop
```    
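If you would rather create this file from Python than in a text editor, the following is a minimal sketch using the standard-library `configparser` module. The helper names `write_datacube_config` and `read_datacube_config` are hypothetical and are not part of datacube. `configparser` writes `key = value` rather than `key: value`; both are valid INI syntax when read back by `configparser`, which, to our understanding, is also what datacube's own config loader is built on.

```python
import configparser


def write_datacube_config(path, hostname='agdcstaging-db.nci.org.au',
                          port=6432, database='ard_interop'):
    """Write a minimal datacube config file with a single [datacube] section."""
    config = configparser.ConfigParser()
    config['datacube'] = {
        'db_hostname': hostname,
        'db_port': str(port),  # configparser stores every value as a string
        'db_database': database,
    }
    with open(path, 'w') as f:
        config.write(f)
    return path


def read_datacube_config(path):
    """Read the [datacube] section back as a plain dict of strings."""
    config = configparser.ConfigParser()
    config.read(path)
    return dict(config['datacube'])
```

For example, `write_datacube_config(os.path.expanduser('~/.ard-interoperability_tmp.conf'))` would create the file described above. The resulting path can then be passed to `datacube.Datacube(config=...)`, which in the datacube versions we are aware of accepts a path to a config file.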

## Load modules

In [14]:
import os
import sys

import datacube
import fiona
import xarray as xr
from datacube.helpers import ga_pq_fuser
from datacube.storage import masking
from datacube.storage.storage import write_dataset_to_netcdf
from datacube.utils import geometry

# Make the DEA helper scripts importable
sys.path.append('../10_Scripts')
import DEAPlotting, SpatialTools

# Connect using the default datacube configuration (current collection products)
dc = datacube.Datacube()

# Output folder, and input shapefile containing the polygons to load data for
output_dir = '/g/data/r78/vmn547/delwp/output'
shape_file = '/g/data/r78/vmn547/delwp/SHPs_CKrause/GHCMA_Lower G3_region.shp'

# Short, space-free version of the shapefile name, used in output file names
shp_name = shape_file.split('/')[-1].split('.')[0].replace(' ', '_')

# Read the shapefile's CRS and all of its features
with fiona.open(shape_file) as shapes:
    crs = geometry.CRS(shapes.crs_wkt)
    print(crs)
    shapes_list = list(shapes)

%load_ext autoreload
%autoreload 2


PROJCS["GDA_1994_MGA_Zone_54",GEOGCS["GCS_GDA_1994",DATUM["Geocentric_Datum_of_Australia_1994",SPHEROID["GRS_1980",6378137.0,298.257222101],TOWGS84[0,0,0,0,0,0,0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",10000000.0],PARAMETER["Central_Meridian",141.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0],AUTHORITY["EPSG","28354"]]
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
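The cell above derives `shp_name` by splitting the path string on `/` and `.`. The sketch below is an alternative using `os.path`; `shapefile_stem` is a hypothetical helper name. It gives the same result for this shapefile, but differs for names containing extra dots. `split('.')[0]` cuts at the *first* dot, whereas `os.path.splitext` removes only the final extension.

```python
import os


def shapefile_stem(path):
    """Return the file name without directory or extension, with spaces as underscores."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem.replace(' ', '_')


print(shapefile_stem('/g/data/r78/vmn547/delwp/SHPs_CKrause/GHCMA_Lower G3_region.shp'))
```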


## Set up query and analysis parameters
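Here we set a centroid for the area we want to compare, and set up the CRS, resolution and resampling methods that will be applied to both collections' datasets. The values below extract both collections to match the collection upgrade CRS and resolution (UTM zone 56 S and 30 m pixels). Continuous data such as surface reflectance are resampled with `bilinear` interpolation, while categorical data such as pixel quality flags use `nearest` so that no new, invalid category values are created. Note that the loading cell below queries each shapefile polygon rather than this centroid, overrides `time_period`, and does not pass an output CRS, so its results are returned in the current collection's Australian Albers projection (EPSG:3577).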
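Why categorical layers need `nearest` resampling can be illustrated with a tiny bit-flag example. This is a sketch with made-up flag values, not real pixel quality data. A simple average of two pixels stands in for continuous resampling (it is what linear interpolation gives halfway between two pixels). The averaged value, once stored back as an integer, has a bit set that neither original pixel had. Nearest-neighbour keeps one of the genuine values.

```python
import numpy as np


def bits_set(value):
    """Return the indices of the bits set in a non-negative integer."""
    value = int(value)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Two neighbouring pixels from a hypothetical bit-flag (categorical) layer:
# pixel A has only bit 0 set, pixel B has only bit 2 set (illustrative values)
flags = np.array([0b0001, 0b0100], dtype=np.uint16)

# Continuous-style resampling: averaging gives 2.5, which truncates to 2 as uint16
averaged = np.uint16(flags.mean())

# Nearest-neighbour resampling simply picks one of the original values
nearest = flags[0]

print(bits_set(averaged))  # bit 1 only: a flag neither input pixel had
print(bits_set(nearest))   # bit 0: a genuine flag from pixel A
```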

In [15]:
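# Centre point of spatial query
lat, lon = -33.324, 149.09
time_period = ('1987-01-01', '2019-04-01')

# Desired output resolution as (y, x) in metres; datacube expects a negative y value
output_resolution = (-30, 30)

# Resampling for continuous (e.g. reflectance) and categorical (e.g. PQ flag) data
output_resamp_continuous = 'bilinear'
output_resamp_categorical = 'nearest'

# Surface reflectance product to load
product = 'nbart'
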
# Bands/measurements to load

currentcollection_bands = ['red', 'blue', 'green', 'nir', 'swir1', 'swir2']
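The parameters above could be gathered into keyword arguments for `dc.load` as in the sketch below. This is not how the loading cell that follows works: it queries shapefile polygons instead. The helper name `build_point_query`, the 0.05 degree buffer and the `EPSG:32756` code (WGS 84 / UTM zone 56S, assumed here to correspond to the "UTM zone 56 S" mentioned above) are all assumptions. The keys `x`, `y`, `time`, `output_crs`, `resolution` and `resampling` mirror `dc.load` arguments as commonly used in DEA notebooks; check them against your datacube version. When no CRS is supplied for `x`/`y`, they are interpreted as longitude/latitude.

```python
def build_point_query(lat, lon, buffer_deg, time_period,
                      output_crs, resolution, resampling):
    """Build dc.load-style keyword arguments for a square box around a point.

    x/y ranges are in degrees of longitude/latitude; resolution is (y, x) in
    the units of output_crs, with a negative y value for north-up grids.
    """
    if resolution[0] >= 0:
        raise ValueError('y resolution should be negative, e.g. (-30, 30)')
    return {
        'x': (lon - buffer_deg, lon + buffer_deg),
        'y': (lat + buffer_deg, lat - buffer_deg),
        'time': time_period,
        'output_crs': output_crs,
        'resolution': resolution,
        'resampling': resampling,
    }


query = build_point_query(lat=-33.324, lon=149.09, buffer_deg=0.05,
                          time_period=('1987-01-01', '2019-04-01'),
                          output_crs='EPSG:32756',  # assumed CRS, see above
                          resolution=(-30, 30),
                          resampling='bilinear')
print(query['x'], query['y'])
```

The resulting dict could then be unpacked into a load call, e.g. `dc.load(product=..., **query)`. The check on the sign of the y resolution catches the common mistake of passing `(30, 30)`, which would produce an upside-down grid.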


## Load in current collection data

In [16]:
from datetime import datetime

# Query from 1 January 2019 up to today (ISO dates avoid month/day ambiguity)
time_period = ('2019-01-01', datetime.now().strftime('%Y-%m-%d'))

for shape in shapes_list:

    # Set up query for this polygon
    shp_id = shape['properties']['ID']
    geom = geometry.Geometry(shape['geometry'], crs=crs)
    query = {'geopolygon': geom, 'time': time_period}

    xarray_dict = {}
    for sensor in ['ls5', 'ls7', 'ls8']:

        # Load surface reflectance data
        landsat_ds = dc.load(product=f'{sensor}_{product}_albers',
                             measurements=currentcollection_bands,
                             group_by='solar_day',
                             **query)

        # dc.load returns an empty Dataset (with no attributes) if nothing matches
        if len(landsat_ds.attrs) > 0:

            # Load PQ data
            landsat_pq = dc.load(product=f'{sensor}_pq_albers',
                                 measurements=['pixelquality'],
                                 group_by='solar_day',
                                 **query)

            # Keep only observations with matching PQ data: xarray arithmetic aligns
            # on the intersection of the two time coordinates
            time = (landsat_ds.time - landsat_pq.time).time
            landsat_ds = landsat_ds.sel(time=time)
            landsat_pq = landsat_pq.sel(time=time)

            # Create PQ mask
            good_quality = masking.make_mask(landsat_pq.pixelquality,
                                             cloud_acca='no_cloud',
                                             cloud_shadow_acca='no_cloud_shadow',
                                             cloud_shadow_fmask='no_cloud_shadow',
                                             cloud_fmask='no_cloud',
                                             blue_saturated=False,
                                             green_saturated=False,
                                             red_saturated=False,
                                             nir_saturated=False,
                                             swir1_saturated=False,
                                             swir2_saturated=False,
                                             contiguous=True)

            # Set nodata values to NaN, then set all PQ-affected pixels to NaN
            landsat_ds = masking.mask_invalid_data(landsat_ds).where(good_quality)

            # Add result to dict
            xarray_dict[sensor] = landsat_ds

    # Skip polygons with no data from any sensor (xr.concat needs at least one dataset)
    if not xarray_dict:
        continue

    # Concatenate multiple sensors into one dataset, sorted by time
    landsat_currentcollection = xr.concat(list(xarray_dict.values()), dim='time')
    landsat_currentcollection = landsat_currentcollection.sortby('time')

    # Write this polygon's data to NetCDF
    fname = os.path.join(output_dir, f'c2_ls_{product}_albers_{shp_name}_{shp_id}_test.nc')
    write_dataset_to_netcdf(landsat_currentcollection, fname)

landsat_currentcollection

<xarray.Dataset>
Dimensions:  (time: 6, x: 1456, y: 1115)
Coordinates:
  * y        (y) float64 -4.117e+06 -4.117e+06 ... -4.145e+06 -4.145e+06
  * x        (x) float64 8.248e+05 8.248e+05 8.248e+05 ... 8.611e+05 8.612e+05
  * time     (time) datetime64[ns] 2019-01-01T00:21:52 ... 2019-02-26T00:15:29.500000
Data variables:
    red      (time, y, x) float64 1.083e+03 1.097e+03 ... 2.007e+03 1.899e+03
    blue     (time, y, x) float64 503.0 518.0 516.0 ... nan 1.229e+03 996.0
    green    (time, y, x) float64 784.0 785.0 795.0 ... nan 1.554e+03 1.374e+03
    nir      (time, y, x) float64 2.229e+03 2.263e+03 ... 3.092e+03 3.043e+03
    swir1    (time, y, x) float64 3.238e+03 3.33e+03 ... 4.252e+03 4.267e+03
    swir2    (time, y, x) float64 2.11e+03 2.225e+03 ... 3.148e+03 2.941e+03
Attributes:
    crs:      EPSG:3577
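The matching and masking steps in the cell above can be sketched with plain NumPy to show what they do. In the cell, `(landsat_ds.time - landsat_pq.time).time` works because xarray arithmetic aligns on the intersection of coordinates, which is equivalent to `np.intersect1d` on the timestamps. `masking.make_mask` then keeps pixels whose quality bits match the requested flags. The bit positions in `GOOD_BITS` are hypothetical, for illustration only. The real ones come from the PQ product's flag definitions, which `make_mask` reads for you. The helper names are also hypothetical.

```python
import numpy as np

# Hypothetical bit positions where a set bit means "good"; NOT the real PQ layout
GOOD_BITS = {'contiguous': 8, 'no_cloud': 10, 'no_cloud_shadow': 12}


def match_times(surface_times, pq_times):
    """Return the (sorted) timestamps present in both arrays."""
    return np.intersect1d(surface_times, pq_times)


def good_quality_mask(pq, good_bits=GOOD_BITS):
    """True where every required 'good' bit is set in the PQ value."""
    required = 0
    for bit in good_bits.values():
        required |= 1 << bit
    return (pq & required) == required


def apply_mask(values, mask):
    """Set pixels that fail the mask to NaN, similar to xarray's .where()."""
    return np.where(mask, values.astype(float), np.nan)


# Three surface reflectance observations, but PQ data for only two of them
surface_times = np.array(['2019-01-01', '2019-01-09', '2019-01-17'], dtype='datetime64[D]')
pq_times = np.array(['2019-01-01', '2019-01-17'], dtype='datetime64[D]')
print(match_times(surface_times, pq_times))

# Two pixels: the first passes every check, the second has its 'no_cloud' bit cleared
all_good = (1 << 8) | (1 << 10) | (1 << 12)
pq = np.array([all_good, all_good & ~(1 << 10)], dtype=np.uint16)
red = np.array([1083, 1097], dtype=np.int16)
print(apply_mask(red, good_quality_mask(pq)))
```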