<a id="top"></a>
# Export Notebook  

<hr>

# Notebook Summary

This notebook subsets a data cube, selects a specific set of measurements, derives additional products from them (masks, a water classification, spectral indices, and TSM), and exports the result to GeoTIFF and NetCDF files. The goal is to enable external analysis of this data with other data analysis or GIS tools. Because the export is restricted to a small region and a chosen set of variables, the output files stay reasonably small.

<hr>

# Index

* [Import Dependencies and Connect to the Data Cube](#import)
* [Choose Platforms and Products](#plat_prod)
* [Get the Extents of the Cube](#extents)
* [Define the Extents of the Analysis](#define_extents)
* [Load Data from the Data Cube](#retrieve_data)
* [Derive Products](#derive_products)
* [Combine Data](#combine_data)
* [Export Data](#export)
    * [Export to GeoTIFF](#export_geotiff)
    * [Export to NetCDF](#export_netcdf)
    
<hr>

## <span id="import">Import Dependencies and Connect to the Data Cube [&#9652;](#top)</span> 

In [1]:
import xarray as xr  
import numpy as np
import datacube
from utils.data_cube_utilities.data_access_api import DataAccessApi  

In [2]:
api = DataAccessApi()
dc = api.dc

## <span id="plat_prod">Choose Platforms and Products [&#9652;](#top)</span>

**List available products for each platform**

In [3]:
list_of_products = dc.list_products()
netCDF_products = list_of_products[list_of_products['format'] == 'NetCDF']
netCDF_products

id,name,description,time,format,lon,label,product_type,creation_time,lat,instrument,platform,crs,resolution,tile_size,spatial_dimensions
13,ls7_ledaps_ghana,Landsat 7 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LEDAPS,,,ETM,LANDSAT_7,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
17,ls7_ledaps_kenya,Landsat 7 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LEDAPS,,,ETM,LANDSAT_7,EPSG:4326,"(-0.000269493, 0.000269493)","(0.99981903, 0.99981903)","(latitude, longitude)"
18,ls7_ledaps_senegal,Landsat 7 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LEDAPS,,,ETM,LANDSAT_7,EPSG:4326,"(-0.000271152, 0.00027769)","(0.813456, 0.83307)","(latitude, longitude)"
16,ls7_ledaps_sierra_leone,Landsat 7 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LEDAPS,,,ETM,LANDSAT_7,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
19,ls7_ledaps_tanzania,Landsat 7 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LEDAPS,,,ETM,LANDSAT_7,EPSG:4326,"(-0.000271277688070265, 0.000271139577954979)","(0.999929558226998, 0.999962763497961)","(latitude, longitude)"
31,ls7_ledaps_vietnam,Landsat 7 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LEDAPS,,,ETM,LANDSAT_7,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
9,ls8_lasrc_ghana,Landsat 8 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LaSRC,,,OLI_TIRS,LANDSAT_8,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
10,ls8_lasrc_kenya,Landsat 8 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LaSRC,,,OLI_TIRS,LANDSAT_8,EPSG:4326,"(-0.000271309115317046, 0.00026957992707863)","(0.999502780827996, 0.999602369607559)","(latitude, longitude)"
11,ls8_lasrc_senegal,Landsat 8 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LaSRC,,,OLI_TIRS,LANDSAT_8,EPSG:4326,"(-0.000271152, 0.00027769)","(0.813456, 0.83307)","(latitude, longitude)"
8,ls8_lasrc_sierra_leone,Landsat 8 USGS Collection 1 Higher Level SR sc...,,NetCDF,,,LaSRC,,,OLI_TIRS,LANDSAT_8,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"


**Choose product**

In [4]:
platform = "LANDSAT_7"
product = "ls7_ledaps_vietnam"

# platform = "LANDSAT_8"
# product = "ls8_lasrc_vietnam"

## <span id="extents">Get the Extents of the Cube [&#9652;](#top)</span>

In [5]:
from utils.data_cube_utilities.dc_load import get_product_extents
from utils.data_cube_utilities.dc_time import dt_to_str

full_lat, full_lon, min_max_dates = get_product_extents(api, platform, product)

# Print the extents of the combined data.
print("Latitude Extents:", full_lat)
print("Longitude Extents:", full_lon)
print("Time Extents:", list(map(dt_to_str, min_max_dates)))

Latitude Extents: (9.1762906272858, 13.964939912344285)
Longitude Extents: (102.4041694654867, 108.9310588253174)
Time Extents: ['1999-09-08', '2016-12-29']


In [6]:
## The code below renders a map that can be used to orient yourself with the region.
from utils.data_cube_utilities.dc_display_map import display_map
display_map(full_lat, full_lon)

## <span id="define_extents">Define the Extents of the Analysis [&#9652;](#top)</span>

In [7]:
######### Vietnam - Buon Tua Srah Lake ##################
lon = (108.02, 108.15)
lat = (12.18, 12.30)

time_extents = ('2015-01-01', '2015-12-31')

In [8]:
from utils.data_cube_utilities.dc_display_map import display_map
display_map(lat, lon)
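
Before loading, it can help to confirm that the chosen analysis extents fall inside the product extents printed earlier. The following is a minimal sketch using only plain tuples; `extent_within` is a hypothetical helper (not part of the Data Cube utilities), and the full extents are copied from the printed output for `ls7_ledaps_vietnam`.

```python
def extent_within(sub, full):
    """Return True if the (min, max) pair `sub` lies inside the (min, max) pair `full`."""
    sub_min, sub_max = sorted(sub)
    full_min, full_max = sorted(full)
    return full_min <= sub_min and sub_max <= full_max

# Values copied from the printed extents of ls7_ledaps_vietnam.
full_lat = (9.1762906272858, 13.964939912344285)
full_lon = (102.4041694654867, 108.9310588253174)
full_time = ('1999-09-08', '2016-12-29')

# Analysis extents chosen in this section.
lat = (12.18, 12.30)
lon = (108.02, 108.15)
time_extents = ('2015-01-01', '2015-12-31')

# ISO 8601 date strings sort chronologically, so they can be compared as strings.
checks = {
    'latitude': extent_within(lat, full_lat),
    'longitude': extent_within(lon, full_lon),
    'time': extent_within(time_extents, full_time),
}
print(checks)
```

If any entry is `False`, the load would likely return a partially or completely empty dataset for that dimension.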

## <span id="retrieve_data">Load Data from the Data Cube [&#9652;](#top)</span>

In [9]:
landsat_dataset = dc.load(latitude = lat,
                          longitude = lon,
                          platform = platform,
                          time = time_extents,
                          product = product,
                          measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']) 

In [10]:
landsat_dataset
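
A rough size estimate shows why restricting the region keeps the exports small. The sketch below assumes the pixel resolution listed for `ls7_ledaps_vietnam` in the product table and 16-bit integer storage for each measurement; `estimate_pixels` is a hypothetical helper, and the exact pixel count returned by the Data Cube depends on how the request aligns with the product's grid.

```python
import math

def estimate_pixels(extent, resolution):
    """Approximate number of pixels spanning `extent` (min, max) at `resolution` degrees per pixel."""
    return math.ceil((max(extent) - min(extent)) / abs(resolution))

# Resolution in degrees per pixel, taken from the product table (assumed square pixels).
resolution_deg = 0.000269494585236

n_lat = estimate_pixels((12.18, 12.30), resolution_deg)
n_lon = estimate_pixels((108.02, 108.15), resolution_deg)

n_measurements = 7       # red, green, blue, nir, swir1, swir2, pixel_qa
bytes_per_value = 2      # assumption: 16-bit integers
bytes_per_acquisition = n_lat * n_lon * n_measurements * bytes_per_value

print(n_lat, n_lon, round(bytes_per_acquisition / 1e6, 1), "MB per acquisition")
```

For this region the estimate agrees with the 483 x 446 raster size that `gdalinfo` reports for the exported GeoTIFFs. The derived products added later increase the per-acquisition size beyond this figure.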

## <span id="derive_products">Derive Products [&#9652;](#top)</span> 

> ### Masks

In [11]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask

clear_xarray = landsat_qa_clean_mask(landsat_dataset, platform, cover_types=['clear'])
water_xarray = landsat_qa_clean_mask(landsat_dataset, platform, cover_types=['water'])
shadow_xarray = landsat_qa_clean_mask(landsat_dataset, platform, cover_types=['shadow'])

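# A pixel is "clean" if it is flagged as clear land or as water.
# `np.logical_or` works directly on xarray objects and returns a DataArray
# (`xr.ufuncs` is not available in every xarray release).
clean_xarray = np.logical_or(clear_xarray, water_xarray).rename("clean_mask")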
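
To make the mask logic concrete, here is a per-pixel sketch of how a `pixel_qa` value can be decoded. It assumes the bit layout of the Landsat Collection 1 surface reflectance `pixel_qa` band (bit 1 clear, bit 2 water, bit 3 cloud shadow, bit 5 cloud). The helper names are hypothetical, and this is not necessarily how `landsat_qa_clean_mask` is implemented.

```python
# Assumed bit positions in the Landsat Collection 1 `pixel_qa` band.
QA_BITS = {'fill': 0, 'clear': 1, 'water': 2, 'shadow': 3, 'snow': 4, 'cloud': 5}

def has_cover_type(qa_value, cover_type):
    """Return True if the bit for `cover_type` is set in `qa_value`."""
    return bool((qa_value >> QA_BITS[cover_type]) & 1)

def is_clean(qa_value):
    """A pixel counts as clean when it is flagged as clear or as water."""
    return has_cover_type(qa_value, 'clear') or has_cover_type(qa_value, 'water')

# 66 = clear, 68 = water, 72 = cloud shadow, 96 = cloud (each with bit 6 also set).
for qa_value in (66, 68, 72, 96):
    print(qa_value, is_clean(qa_value))
```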

> ### Water Classification

In [12]:
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

water_classification = wofs_classify(landsat_dataset,
                                     clean_mask = clean_xarray.values, 
                                     mosaic = False) 


In [13]:
wofs_xarray = water_classification.wofs

> ###  Normalized Indices  

In [14]:
def NDVI(dataset):
    return ((dataset.nir - dataset.red)/(dataset.nir + dataset.red)).rename("NDVI")

In [15]:
def NDWI(dataset):
    return ((dataset.green - dataset.nir)/(dataset.green + dataset.nir)).rename("NDWI")

In [16]:
def NDBI(dataset):
    return ((dataset.swir2 - dataset.nir)/(dataset.swir2 + dataset.nir)).rename("NDBI")

In [17]:
ndbi_xarray = NDBI(landsat_dataset)  # Urbanization - Reds
ndvi_xarray = NDVI(landsat_dataset)  # Dense Vegetation - Greens
ndwi_xarray = NDWI(landsat_dataset)  # High Concentrations of Water - Blues  
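
All three indices share the same normalized-difference form, `(a - b) / (a + b)`, which is bounded between -1 and 1 for non-negative inputs and undefined where both bands are zero. The scalar sketch below illustrates what the array expressions compute per pixel; `normalized_difference` is a hypothetical helper and the sample band values are made up for illustration.

```python
import math

def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN where the denominator is zero."""
    total = a + b
    if total == 0:
        return math.nan
    return (a - b) / total

# Made-up surface reflectance values for a single vegetated pixel.
red, green, nir = 1000, 800, 3000

ndvi = normalized_difference(nir, red)    # high for dense vegetation
ndwi = normalized_difference(green, nir)  # high for open water
print(ndvi, ndwi)
```

With xarray, the division by zero happens silently per pixel and produces NaN or infinite values along with a runtime warning, so those pixels are worth masking before further analysis.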

> ### TSM

In [18]:
from utils.data_cube_utilities.dc_water_quality import tsm

tsm_xarray = tsm(landsat_dataset, clean_mask = wofs_xarray.values.astype(bool) ).tsm

> ### EVI  

In [19]:
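def EVI(dataset, c1 = 6, c2 = 7.5, L = 1):
    # c1 and c2 are the aerosol resistance coefficients; L is the canopy background adjustment.
    # Note: the standard EVI formulation also multiplies this ratio by a gain factor of 2.5,
    # which is omitted here.
    return ((dataset.nir - dataset.red)/(dataset.nir + (c1 * dataset.red) - (c2 * dataset.blue) + L)).rename("EVI")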

In [20]:
evi_xarray = EVI(landsat_dataset, c1 = 6, c2 = 7.5, L = 1 )
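
For comparison, the commonly cited EVI formulation is `G * (nir - red) / (nir + c1 * red - c2 * blue + L)` with `G = 2.5`, applied to reflectance in the 0 to 1 range. The sketch below is a scalar illustration under two assumptions: that the bands are stored as integers scaled by 10000 (the usual Landsat Collection 1 surface reflectance convention), and that the gain factor is wanted. It is not a drop-in replacement for the `EVI` function above, and its values differ from those exported by this notebook.

```python
SCALE = 0.0001  # assumption: stored values are reflectance * 10000

def evi(nir, red, blue, G=2.5, c1=6.0, c2=7.5, L=1.0, scale=SCALE):
    """Enhanced Vegetation Index computed from scaled integer band values."""
    nir, red, blue = nir * scale, red * scale, blue * scale
    return G * (nir - red) / (nir + c1 * red - c2 * blue + L)

# Made-up band values for a single vegetated pixel.
value = evi(nir=3000, red=1000, blue=500)
print(round(value, 4))
```

The function also accepts array-like inputs that support elementwise arithmetic, so the same expression could be applied to whole bands.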

## <span id="combine_data">Combine Data [&#9652;](#top)</span>  

In [21]:
combined_dataset = xr.merge([landsat_dataset,
          clean_xarray,
          clear_xarray,
          water_xarray,
          shadow_xarray,
          evi_xarray,
          ndbi_xarray,
          ndvi_xarray,
          ndwi_xarray,
          wofs_xarray,
          tsm_xarray])

# Copy original crs to merged dataset 
combined_dataset = combined_dataset.assign_attrs(landsat_dataset.attrs)

combined_dataset

## <span id="export">Export Data [&#9652;](#top)</span>  

### <span id="export_geotiff">Export to GeoTIFF [&#9652;](#top)</span>  

Export each acquisition as a GeoTIFF.

In [22]:
from utils.data_cube_utilities.import_export import export_xarray_to_multiple_geotiffs

# Ensure the output directory exists before writing to it.
if platform == 'LANDSAT_7':
    !mkdir -p output/geotiffs/landsat7
else:
    !mkdir -p output/geotiffs/landsat8

output_path = "output/geotiffs/landsat{0}/landsat{0}".format(7 if platform=='LANDSAT_7' else 8)

export_xarray_to_multiple_geotiffs(combined_dataset, output_path)
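
Judging by the file names this export produces, `output_path` acts as a prefix and each acquisition's timestamp is appended to it. The sketch below reproduces that naming pattern with a hypothetical helper, `geotiff_path`. It illustrates the pattern only and is not the implementation inside `export_xarray_to_multiple_geotiffs`.

```python
from datetime import datetime

def geotiff_path(prefix, acquisition_time):
    """Build a per-acquisition file name of the form <prefix>_YYYY_MM_DD_HH_MM_SS.tif."""
    return f"{prefix}_{acquisition_time:%Y_%m_%d_%H_%M_%S}.tif"

path = geotiff_path("output/geotiffs/landsat7/landsat7", datetime(2015, 1, 9, 3, 6, 13))
print(path)
```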

Check to see what files were exported. The size of these files is also shown.

In [23]:
if platform == 'LANDSAT_7':
    !ls -lah output/geotiffs/landsat7/*.tif
else:
    !ls -lah output/geotiffs/landsat8/*.tif

-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_01_25_03_06_16.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_02_10_03_06_21.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_02_26_03_06_30.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_03_14_03_06_36.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_03_30_03_06_43.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_04_15_03_06_52.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_05_01_03_06_58.tif
-rw-rw-r-- 1 localuser localuser 14M Jan 12 04:13 output/geotiffs/landsat7/landsat7_2015_05_17_03_07_06.tif
-rw-rw-r-- 1 localu

Sanity check using `gdalinfo` to make sure that all of our bands exist.

In [24]:
if platform == 'LANDSAT_7':
    !gdalinfo output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif
else:
    !gdalinfo output/geotiffs/landsat8/landsat8_2015_01_01_03_07_41.tif

Driver: GTiff/GeoTIFF
Files: output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif
Size is 483, 446
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["World"],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (108.020032379927088,12.299867617463660)
Pixel Size = (0.000268936625432,-0.000268890337287)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (

Bundle all GeoTIFFs into a single gzip-compressed tar archive.

In [25]:
if platform == 'LANDSAT_7':
    !tar -cvzf output/geotiffs/landsat7/landsat_7.tar.gz output/geotiffs/landsat7/*.tif
else:
    !tar -cvzf output/geotiffs/landsat8/landsat_8.tar.gz output/geotiffs/landsat8/*.tif

output/geotiffs/landsat7/landsat7_2015_01_09_03_06_13.tif
output/geotiffs/landsat7/landsat7_2015_01_25_03_06_16.tif
output/geotiffs/landsat7/landsat7_2015_02_10_03_06_21.tif
output/geotiffs/landsat7/landsat7_2015_02_26_03_06_30.tif
output/geotiffs/landsat7/landsat7_2015_03_14_03_06_36.tif
output/geotiffs/landsat7/landsat7_2015_03_30_03_06_43.tif
output/geotiffs/landsat7/landsat7_2015_04_15_03_06_52.tif
output/geotiffs/landsat7/landsat7_2015_05_01_03_06_58.tif
output/geotiffs/landsat7/landsat7_2015_05_17_03_07_06.tif
output/geotiffs/landsat7/landsat7_2015_06_02_03_07_11.tif
output/geotiffs/landsat7/landsat7_2015_08_05_03_07_28.tif
output/geotiffs/landsat7/landsat7_2015_08_21_03_07_30.tif
output/geotiffs/landsat7/landsat7_2015_09_06_03_07_32.tif
output/geotiffs/landsat7/landsat7_2015_10_08_03_07_43.tif
output/geotiffs/landsat7/landsat7_2015_10_24_03_08_01.tif
output/geotiffs/landsat7/landsat7_2015_11_09_03_08_15.tif
output/geotiffs/landsat7/landsat7_2015_11_25_03_08_31.tif
output/geotiff
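
Where shell commands are unavailable (for example, on Windows), the same archive can be built with Python's standard `tarfile` module. This is a minimal sketch; `archive_geotiffs` is a hypothetical helper. Unlike the `tar` command above, it stores only the file names inside the archive rather than the full relative paths.

```python
import glob
import os
import tarfile

def archive_geotiffs(directory, archive_name):
    """Bundle every .tif file in `directory` into a gzip-compressed tarball."""
    archive_path = os.path.join(directory, archive_name)
    tif_paths = sorted(glob.glob(os.path.join(directory, "*.tif")))
    with tarfile.open(archive_path, "w:gz") as archive:
        for tif_path in tif_paths:
            # Store only the file name so the archive unpacks without nested directories.
            archive.add(tif_path, arcname=os.path.basename(tif_path))
    return archive_path, tif_paths
```

For the Landsat 7 output in this notebook, the call would be `archive_geotiffs("output/geotiffs/landsat7", "landsat_7.tar.gz")`.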

### <span id="export_netcdf">Export to NetCDF [&#9652;](#top)</span>  

Export all acquisitions together as a single NetCDF.

In [26]:
combined_dataset

In [27]:
def export_xarray_to_netcdf(data, path):
    """
    Exports an xarray object as a single NetCDF file.

    Parameters
    ----------
    data: xarray.Dataset or xarray.DataArray
        The Dataset or DataArray to export.
    path: str
        The path to store the exported NetCDF file at.
        Must include the filename and ".nc" extension.
    """
    # Record original attributes to restore after export.
    orig_data_attrs = data.attrs.copy()
    orig_data_var_attrs = {}
    if isinstance(data, xr.Dataset):
        for data_var in data.data_vars:
            orig_data_var_attrs[data_var] = data[data_var].attrs.copy()

    # If present, convert the CRS object from the Data Cube to a string.
    # String and numeric attributes are retained.
    # All other attributes are removed.
    def handle_attr(data, attr):
        if attr == 'crs' and not isinstance(data.attrs[attr], str):
            data.attrs[attr] = data.crs.crs_str
        elif not isinstance(data.attrs[attr], (str, int, float)):
            del data.attrs[attr]

    # Sanitize the attributes so that `xarray.Dataset.to_netcdf()` can serialize them.
    # Iterate over a copy of the keys because `handle_attr` may delete entries.
    for attr in list(data.attrs):
        handle_attr(data, attr)
    if isinstance(data, xr.Dataset):
        for data_var in data.data_vars:
            for attr in list(data[data_var].attrs):
                handle_attr(data[data_var], attr)
    # Move units from the time coord attributes to its encoding.
    if 'time' in data.coords:
        orig_time_attrs = data.time.attrs.copy()
        if 'units' in data.time.attrs:
            time_units = data.time.attrs['units']
            del data.time.attrs['units']
            data.time.encoding['units'] = time_units
    # Export to NetCDF.
    data.to_netcdf(path)
    # Restore original attributes.
    data.attrs = orig_data_attrs
    if 'time' in data.coords:
        data.time.attrs = orig_time_attrs
    if isinstance(data, xr.Dataset):
        for data_var in data.data_vars:
            data[data_var].attrs = orig_data_var_attrs[data_var]
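
The attribute handling can also be expressed without mutating anything in place, which avoids the `RuntimeError` Python raises when keys are deleted from a dictionary during iteration over it. The sketch below works on plain dictionaries; `sanitize_attrs` and `FakeCRS` are hypothetical names, with `FakeCRS` standing in for the Data Cube CRS object and assumed to convert to a string such as `EPSG:4326`.

```python
def sanitize_attrs(attrs):
    """Return a copy of `attrs` that keeps only str, int, and float values.

    A non-string 'crs' entry is converted to its string form instead of being dropped.
    """
    clean = {}
    for key, value in attrs.items():
        if isinstance(value, (str, int, float)):
            clean[key] = value
        elif key == 'crs':
            clean[key] = str(value)
    return clean

class FakeCRS:
    """Hypothetical stand-in for a CRS object that is not NetCDF-serializable."""
    def __str__(self):
        return "EPSG:4326"

attrs = {'crs': FakeCRS(), 'platform': 'LANDSAT_7', 'nodata': -9999, 'flags': {'a': 1}}
clean_attrs = sanitize_attrs(attrs)
print(clean_attrs)
```

Because the original dictionary is left untouched, there is nothing to restore after the export.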

In [28]:
import os
# from utils.data_cube_utilities.import_export import export_xarray_to_netcdf

# Ensure the output directory exists before writing to it.
ls_num = 7 if platform=='LANDSAT_7' else 8
output_dir = f"output/netcdfs/landsat{ls_num}"
# `os.makedirs` also creates missing parent directories (e.g. "output/netcdfs").
os.makedirs(output_dir, exist_ok=True)

output_file_path = output_dir + f"/ls{ls_num}_netcdf_example.nc"
# Remove the file if it exists to avoid an error.
# if os.path.isfile(output_file_path):
#     os.remove(output_file_path)
export_xarray_to_netcdf(combined_dataset, output_file_path)

Sanity check using `gdalinfo` to make sure that all of our bands exist.

In [29]:
if platform == 'LANDSAT_7':
    !gdalinfo output/netcdfs/landsat7/ls7_netcdf_example.nc
else:
    !gdalinfo output/netcdfs/landsat8/ls8_netcdf_example.nc

Driver: netCDF/Network Common Data Format
Files: output/netcdfs/landsat7/ls7_netcdf_example.nc
Size is 512, 512
Metadata:
  NC_GLOBAL#crs=EPSG:4326
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":red
  SUBDATASET_1_DESC=[19x446x483] red (16-bit integer)
  SUBDATASET_2_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":green
  SUBDATASET_2_DESC=[19x446x483] green (16-bit integer)
  SUBDATASET_3_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":blue
  SUBDATASET_3_DESC=[19x446x483] blue (16-bit integer)
  SUBDATASET_4_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":nir
  SUBDATASET_4_DESC=[19x446x483] nir (16-bit integer)
  SUBDATASET_5_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":swir1
  SUBDATASET_5_DESC=[19x446x483] swir1 (16-bit integer)
  SUBDATASET_6_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":swir2
  SUBDATASET_6_DESC=[19x446x483] swir2 (16-bit integer)
  SUBDATA
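
Rather than reading the `gdalinfo` listing by eye, the variable names can be extracted from the `SUBDATASET_n_NAME` lines and compared against the expected set. The sketch below parses a short sample of that output with a regular expression; `subdataset_variables` is a hypothetical helper, and the sample text contains only the first three entries shown above.

```python
import re

# Matches lines such as:  SUBDATASET_1_NAME=NETCDF:"path/to/file.nc":red
SUBDATASET_NAME = re.compile(r'^\s*SUBDATASET_\d+_NAME=NETCDF:"[^"]+":(?P<var>\S+)\s*$')

def subdataset_variables(gdalinfo_text):
    """Return the variable names listed as NetCDF subdatasets in `gdalinfo` output."""
    names = []
    for line in gdalinfo_text.splitlines():
        match = SUBDATASET_NAME.match(line)
        if match:
            names.append(match.group('var'))
    return names

sample = '''Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":red
  SUBDATASET_1_DESC=[19x446x483] red (16-bit integer)
  SUBDATASET_2_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":green
  SUBDATASET_2_DESC=[19x446x483] green (16-bit integer)
  SUBDATASET_3_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":blue
  SUBDATASET_3_DESC=[19x446x483] blue (16-bit integer)
'''

found = subdataset_variables(sample)
expected = ['red', 'green', 'blue', 'nir']
missing = [name for name in expected if name not in found]
print(found, missing)
```

In this sample `nir` is reported as missing only because the sample text was cut short; run against the full `gdalinfo` output, an empty `missing` list would indicate that every expected variable was exported.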