# Exporting to NetCDF format  

> **Description**  
> This notebook subsets a data cube, selects a specific set of variables, and writes the result to a NetCDF or GeoTIFF file. The goal is to enable external analysis of the data with other data analysis tools or GIS tools. Because the region and the variables are restricted, the output files stay reasonably small.

----  

# Boilerplate, Loading Data

> ### Import the Datacube

In [1]:
import datacube
dc = datacube.Datacube(app = 'my_app', config = '/home/localuser/.datacube.conf')



>### Browse the available Data Cubes on the storage platform    
> You might want to learn more about what data is stored and how it is stored.


In [2]:
list_of_products = dc.list_products()
netCDF_products = list_of_products[list_of_products['format'] == 'NetCDF']
netCDF_products

id,name,description,platform,creation_time,lon,lat,label,instrument,product_type,time,format,crs,resolution,tile_size,spatial_dimensions
13,ls7_ledaps_ghana,Landsat 7 USGS Collection 1 Higher Level SR sc...,LANDSAT_7,,,,,ETM,LEDAPS,,NetCDF,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
17,ls7_ledaps_kenya,Landsat 7 USGS Collection 1 Higher Level SR sc...,LANDSAT_7,,,,,ETM,LEDAPS,,NetCDF,EPSG:4326,"(-0.000269493, 0.000269493)","(0.99981903, 0.99981903)","(latitude, longitude)"
18,ls7_ledaps_senegal,Landsat 7 USGS Collection 1 Higher Level SR sc...,LANDSAT_7,,,,,ETM,LEDAPS,,NetCDF,EPSG:4326,"(-0.000271152, 0.00027769)","(0.813456, 0.83307)","(latitude, longitude)"
16,ls7_ledaps_sierra_leone,Landsat 7 USGS Collection 1 Higher Level SR sc...,LANDSAT_7,,,,,ETM,LEDAPS,,NetCDF,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
19,ls7_ledaps_tanzania,Landsat 7 USGS Collection 1 Higher Level SR sc...,LANDSAT_7,,,,,ETM,LEDAPS,,NetCDF,EPSG:4326,"(-0.000271277688070265, 0.000271139577954979)","(0.999929558226998, 0.999962763497961)","(latitude, longitude)"
31,ls7_ledaps_vietnam,Landsat 7 USGS Collection 1 Higher Level SR sc...,LANDSAT_7,,,,,ETM,LEDAPS,,NetCDF,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
9,ls8_lasrc_ghana,Landsat 8 USGS Collection 1 Higher Level SR sc...,LANDSAT_8,,,,,OLI_TIRS,LaSRC,,NetCDF,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"
10,ls8_lasrc_kenya,Landsat 8 USGS Collection 1 Higher Level SR sc...,LANDSAT_8,,,,,OLI_TIRS,LaSRC,,NetCDF,EPSG:4326,"(-0.000271309115317046, 0.00026957992707863)","(0.999502780827996, 0.999602369607559)","(latitude, longitude)"
11,ls8_lasrc_senegal,Landsat 8 USGS Collection 1 Higher Level SR sc...,LANDSAT_8,,,,,OLI_TIRS,LaSRC,,NetCDF,EPSG:4326,"(-0.000271152, 0.00027769)","(0.813456, 0.83307)","(latitude, longitude)"
8,ls8_lasrc_sierra_leone,Landsat 8 USGS Collection 1 Higher Level SR sc...,LANDSAT_8,,,,,OLI_TIRS,LaSRC,,NetCDF,EPSG:4326,"(-0.000269494585236, 0.000269494585236)","(0.943231048326, 0.943231048326)","(latitude, longitude)"


>### Pick a product  
>Use a product name and its matching platform from the previous block to select a small Data Cube. The data_access_api utility will give you the lat, lon, and time bounds of your Data Cube.   

In [3]:
import utils.data_cube_utilities.data_access_api as dc_api  
api = dc_api.DataAccessApi(config = '/home/localuser/.datacube.conf')

platform = "LANDSAT_7"
# product = "ls7_ledaps_vietnam"
product = "ls7_ledaps_ghana"

# Get product extents
prod_extents = api.get_query_metadata(platform=platform, product=product, measurements=[])

latitude_extents = prod_extents['lat_extents']
print("Lat bounds:", latitude_extents)
longitude_extents = prod_extents['lon_extents']
print("Lon bounds:", longitude_extents)
time_extents = list(map(lambda time: time.strftime('%Y-%m-%d'), prod_extents['time_extents']))
print("Time bounds:", time_extents)

Lat bounds: (3.772924193304, 11.318772579912)
Lon bounds: (-3.772924193304, 1.886462096652)
Time bounds: ['2000-01-01', '2017-12-28']
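
Before choosing a smaller analysis region, it can help to confirm that the region lies inside the bounds printed above. The following is a minimal sketch; `within_extents` is a hypothetical helper, not part of the data cube utilities, and the bounds are copied by hand from the output.

```python
def within_extents(requested, available):
    """Return True if the (min, max) pair `requested` lies inside `available`."""
    req_min, req_max = sorted(requested)
    avail_min, avail_max = sorted(available)
    return avail_min <= req_min and req_max <= avail_max

# Product bounds copied from the printed output for ls7_ledaps_ghana.
product_lat = (3.772924193304, 11.318772579912)
product_lon = (-3.772924193304, 1.886462096652)

# A candidate sub-region in Ghana.
region_lat = (7.0, 7.1)
region_lon = (0.1, 0.2)

region_ok = (within_extents(region_lat, product_lat)
             and within_extents(region_lon, product_lon))
print("Region inside product bounds:", region_ok)
```

A region taken from a different product (for example the Vietnam coordinates) would fail this check against the Ghana bounds, which is a quick way to catch a product and region mismatch before calling `dc.load`.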




# Visualize Data Cube Region

In [4]:
# The code below renders a map that can be used to orient yourself in the region.
from utils.data_cube_utilities.dc_display_map import display_map
display_map(latitude = latitude_extents, longitude = longitude_extents)

> #### Picking a smaller analysis region

In [5]:
# ######### Colombia - Cartagena ##################
# longitude_extents = (-74.863, -74.823)
# latitude_extents = (1.326, 1.357)

######### Vietnam - Buan Tua Srah Lake ################## 
# longitude_extents = (108.02, 108.15)
# latitude_extents  = (12.18 , 12.30)

######### Vietnam - Central Lam Dong Province ################## 
# longitude_extents = (107.8118, 108.0314)
# latitude_extents  = (11.7408, 11.8990)

######## Kenya - Lake Nakuru ##################
# longitude_extents = (36.02, 36.13)
# latitude_extents = (-0.42, -0.28) 

# Ghana
longitude_extents = (0.1, 0.2)
latitude_extents = (7.0, 7.1)

time_extents = ('2015-01-01', '2016-01-01')
print(time_extents)

('2015-01-01', '2016-01-01')


In [6]:
display_map(latitude = latitude_extents, longitude = longitude_extents)

In [7]:
landsat_dataset = dc.load(latitude = latitude_extents,
                          longitude = longitude_extents,
                          platform = platform,
                          time = time_extents,
                          product = product,
                          measurements = ['red', 'green', 'blue', 'nir', 'swir1', 'swir2', 'pixel_qa']) 

In [8]:
landsat_dataset

<xarray.Dataset>
Dimensions:    (latitude: 372, longitude: 372, time: 13)
Coordinates:
  * time       (time) datetime64[ns] 2015-01-12T10:13:52 ... 2015-12-30T10:16:40
  * latitude   (latitude) float64 7.1 7.1 7.099 7.099 ... 7.001 7.001 7.0 7.0
  * longitude  (longitude) float64 0.1001 0.1004 0.1007 ... 0.1996 0.1998 0.2001
Data variables:
    red        (time, latitude, longitude) int16 1122 1152 1122 ... 608 579 551
    green      (time, latitude, longitude) int16 1093 1092 1157 ... 658 657 657
    blue       (time, latitude, longitude) int16 1109 1171 1109 ... 686 686 717
    nir        (time, latitude, longitude) int16 1063 1105 1105 ... 448 448 448
    swir1      (time, latitude, longitude) int16 785 823 823 823 ... 259 222 259
    swir2      (time, latitude, longitude) int16 592 630 552 592 ... 169 208 169
    pixel_qa   (time, latitude, longitude) int32 224 224 224 224 ... 68 68 68 68
Attributes:
    crs:      EPSG:4326
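
Since the aim is to keep exported files reasonably small, a rough size estimate can be derived from the dimensions and dtypes in the summary above. This is a back-of-the-envelope sketch that counts only the data variables, uncompressed; the numbers are copied by hand from the printed dataset.

```python
# Dimension sizes copied from the dataset summary.
n_time, n_lat, n_lon = 13, 372, 372

# Bytes per value for the dtypes that appear in the summary.
bytes_per_value = {"int16": 2, "int32": 4}

# Data variables and their dtypes, as printed.
variables = {
    "red": "int16", "green": "int16", "blue": "int16",
    "nir": "int16", "swir1": "int16", "swir2": "int16",
    "pixel_qa": "int32",
}

bytes_per_pixel = sum(bytes_per_value[dtype] for dtype in variables.values())
total_bytes = n_time * n_lat * n_lon * bytes_per_pixel
print(f"Approximate uncompressed size: {total_bytes / 1e6:.1f} MB")
```

The derived products added later increase this figure, so the estimate is a lower bound for the merged dataset. xarray datasets also expose an `nbytes` attribute that reports a comparable in-memory figure directly.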

# Derive Several Products

>### Unpack pixel_qa

In [9]:
import xarray as xr  
import numpy as np

def ls7_unpack_qa(data_array, cover_type):
    """Return a 0/1 mask marking pixels whose pixel_qa value matches `cover_type`."""
    # pixel_qa values associated with each cover type.
    land_cover_encoding = dict( fill     =  [1], 
                                clear    =  [66,  130], 
                                water    =  [68,  132],
                                shadow   =  [72,  136],
                                snow     =  [80,  112, 144, 176],
                                cloud    =  [96,  112, 160, 176, 224],
                                low_conf =  [66,  68,  72,  80,  96,  112],
                                med_conf =  [130, 132, 136, 144, 160, 176],
                                high_conf=  [224]
                              ) 
    # True wherever the pixel_qa value is one of the codes for the requested cover type.
    boolean_mask = np.isin(data_array.values, land_cover_encoding[cover_type]) 
    return xr.DataArray(boolean_mask.astype(int),
                        coords = data_array.coords,
                        dims = data_array.dims,
                        name = cover_type + "_mask",
                        attrs = data_array.attrs)  
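
To see what the lookup does, here is a small standalone sketch on a made-up `pixel_qa` array. The array values are illustrative only; the codes for clear and water are the ones used in the function above.

```python
import numpy as np

# Made-up pixel_qa values for a 2x3 scene (illustrative, not real data).
pixel_qa = np.array([[66, 68, 224],
                     [130, 72, 1]])

clear_codes = [66, 130]
water_codes = [68, 132]

# np.isin returns True wherever the value is one of the listed codes.
clear = np.isin(pixel_qa, clear_codes)
water = np.isin(pixel_qa, water_codes)

# A pixel counts as "clean" when it is either clear land or water.
clean = np.logical_or(clear, water)
print(clean.astype(int))
```

The cloud (224), shadow (72), and fill (1) pixels come out as 0, while clear and water pixels come out as 1.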

In [10]:
clear_xarray  = ls7_unpack_qa(landsat_dataset.pixel_qa, "clear")  
water_xarray  = ls7_unpack_qa(landsat_dataset.pixel_qa, "water")

shadow_xarray = ls7_unpack_qa(landsat_dataset.pixel_qa, "shadow")  

In [11]:
# np.logical_or operates directly on DataArrays and keeps their coordinates.
clean_xarray = np.logical_or(clear_xarray, water_xarray).astype(np.int8).rename("clean_mask")

clean_mask = np.logical_or(clear_xarray.values.astype(bool),
                           water_xarray.values.astype(bool)) 

> ### Water

In [12]:
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

water_classification = wofs_classify(landsat_dataset,
                                     clean_mask = clean_mask, 
                                     mosaic = False) 

In [13]:
wofs_xarray = water_classification.wofs

> ###  Normalized Indices  

In [14]:
def NDVI(dataset):
    return ((dataset.nir - dataset.red)/(dataset.nir + dataset.red)).rename("NDVI")

In [15]:
def NDWI(dataset):
    return ((dataset.green - dataset.nir)/(dataset.green + dataset.nir)).rename("NDWI")

In [16]:
def NDBI(dataset):
    return ((dataset.swir2 - dataset.nir)/(dataset.swir2 + dataset.nir)).rename("NDBI")

In [17]:
ndbi_xarray = NDBI(landsat_dataset)  # Urbanization - Reds
ndvi_xarray = NDVI(landsat_dataset)  # Dense Vegetation - Greens
ndwi_xarray = NDWI(landsat_dataset)  # High Concentrations of Water - Blues  
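
All three indices share the same normalized-difference form, (a - b) / (a + b). The sketch below shows that form on plain NumPy arrays. `normalized_difference` is a hypothetical helper, not part of the notebook's utilities. It casts to float before the arithmetic, which avoids int16 overflow in the sum, and it returns NaN where the denominator is zero. The first two sample pairs are taken from the dataset summary; the zero pair is made up.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning NaN where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    total = a + b
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other entries stay NaN.
    np.divide(a - b, total, out=out, where=total != 0)
    return out

nir = np.array([1063, 448, 0], dtype=np.int16)
red = np.array([1122, 551, 0], dtype=np.int16)

ndvi = normalized_difference(nir, red)
print(ndvi)
```

Passing `(green, nir)` or `(swir2, nir)` instead gives the NDWI and NDBI variants defined above.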

>### TSM  

In [18]:
from utils.data_cube_utilities.dc_water_quality import tsm

tsm_xarray = tsm(landsat_dataset, clean_mask = wofs_xarray.values.astype(bool) ).tsm

> ### EVI  

In [19]:
def EVI(dataset, c1 = 6, c2 = 7.5, L = 1):
    # Note: the standard EVI formulation also multiplies by a gain factor G = 2.5, which is omitted here.
    return ((dataset.nir - dataset.red)/(dataset.nir + (c1 * dataset.red) - (c2 * dataset.blue) + L)).rename("EVI")

In [20]:
evi_xarray = EVI(landsat_dataset, c1 = 6, c2 = 7.5, L = 1 )
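
For comparison, the standard EVI formulation is G * (NIR - RED) / (NIR + C1 * RED - C2 * BLUE + L) with G = 2.5, and it is defined on reflectance in the 0 to 1 range. The sketch below works one pixel through that formula. It assumes the stored integers are Landsat Collection 1 surface reflectance scaled by 0.0001; the function name `evi_from_scaled` and the sample pixel values are hypothetical.

```python
def evi_from_scaled(nir, red, blue, scale=0.0001, G=2.5, c1=6.0, c2=7.5, L=1.0):
    """Standard EVI on integer-scaled reflectance (scale factor is an assumption)."""
    # Convert stored integers to reflectance in the 0-1 range.
    nir_r, red_r, blue_r = nir * scale, red * scale, blue * scale
    denominator = nir_r + c1 * red_r - c2 * blue_r + L
    return G * (nir_r - red_r) / denominator

# Made-up vegetated pixel: reflectance 0.30 (nir), 0.10 (red), 0.05 (blue).
value = evi_from_scaled(nir=3000, red=1000, blue=500)
print(round(value, 4))
```

Here the denominator is 0.3 + 0.6 - 0.375 + 1 = 1.525 and the numerator is 2.5 * 0.2 = 0.5, giving roughly 0.328. Without the scaling step, the constant L = 1 is negligible next to raw values in the thousands, so the two versions are not interchangeable.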

# Combine Everything  

In [21]:
combined_dataset = xr.merge([landsat_dataset,
                             clean_xarray,
                             clear_xarray,
                             water_xarray,
                             shadow_xarray,
                             evi_xarray,
                             ndbi_xarray,
                             ndvi_xarray,
                             ndwi_xarray,
                             wofs_xarray,
                             tsm_xarray])

# Copy original crs to merged dataset 
combined_dataset = combined_dataset.assign_attrs(landsat_dataset.attrs)

combined_dataset

<xarray.Dataset>
Dimensions:      (latitude: 372, longitude: 372, time: 13)
Coordinates:
  * time         (time) datetime64[ns] 2015-01-12T10:13:52 ... 2015-12-30T10:16:40
  * latitude     (latitude) float64 7.1 7.1 7.099 7.099 ... 7.001 7.001 7.0 7.0
  * longitude    (longitude) float64 0.1001 0.1004 0.1007 ... 0.1998 0.2001
Data variables:
    red          (time, latitude, longitude) float32 1122.0 1152.0 ... 551.0
    green        (time, latitude, longitude) float32 1093.0 1092.0 ... 657.0
    blue         (time, latitude, longitude) float32 1109.0 1171.0 ... 717.0
    nir          (time, latitude, longitude) float32 1063.0 1105.0 ... 448.0
    swir1        (time, latitude, longitude) float32 785.0 823.0 ... 222.0 259.0
    swir2        (time, latitude, longitude) float32 592.0 630.0 ... 208.0 169.0
    pixel_qa     (time, latitude, longitude) int32 224 224 224 224 ... 68 68 68
    clean_mask   (time, latitude, longitude) int8 0 0 0 0 0 0 0 ... 1 1 1 1 1 1
    clear_mask   (time, la

# Export NetCDF

In [22]:
import os

# Ensure the output directory exists before writing to it.
!mkdir -p output/netcdfs/landsat7
output_file_name = "output/netcdfs/landsat7/ls7_netcdf_example.nc"
# Remove the file if it already exists to avoid an error on write.
if os.path.isfile(output_file_name):
    os.remove(output_file_name)
# Make sure the dataset being written carries the original attributes (including crs).
dataset_to_output = combined_dataset.assign_attrs(landsat_dataset.attrs)
datacube.storage.storage.write_dataset_to_netcdf(dataset_to_output, output_file_name)
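
The directory setup and stale-file removal above can also be done without the shell, which is useful when the code runs outside a notebook. This is a minimal sketch using only the standard library; `prepare_output_path` is a hypothetical helper, and the demonstration runs in a temporary directory with a placeholder file rather than a real NetCDF.

```python
from pathlib import Path
import tempfile

def prepare_output_path(path):
    """Create the parent directory and remove any existing file at `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.is_file():
        path.unlink()
    return path

# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_output_path(
        Path(tmp) / "output/netcdfs/landsat7/ls7_netcdf_example.nc")
    target.write_text("placeholder")       # simulate a stale output file
    target = prepare_output_path(target)   # second call removes it
    dir_exists = target.parent.is_dir()
    file_exists = target.exists()

print(dir_exists, file_exists)
```

After the second call the directory is still present and the stale file is gone, so the writer starts from a clean path.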

----  

Sanity check using `gdalinfo` to make sure that all of our bands exist.

In [23]:
!gdalinfo output/netcdfs/landsat7/ls7_netcdf_example.nc

Driver: netCDF/Network Common Data Format
Files: output/netcdfs/landsat7/ls7_netcdf_example.nc
Size is 512, 512
Coordinate System is `'
Metadata:
  NC_GLOBAL#Conventions=CF-1.6, ACDD-1.3
  NC_GLOBAL#date_created=2018-11-02T20:03:04.163835
  NC_GLOBAL#geospatial_bounds=POLYGON ((0.099982491122556 7.10010434262766,0.099982491122556 6.99985235691986,0.200234476830348 6.99985235691986,0.200234476830348 7.10010434262766,0.099982491122556 7.10010434262766))
  NC_GLOBAL#geospatial_bounds_crs=EPSG:4326
  NC_GLOBAL#geospatial_lat_max=7.100104342627657
  NC_GLOBAL#geospatial_lat_min=6.999852356919865
  NC_GLOBAL#geospatial_lat_units=degrees_north
  NC_GLOBAL#geospatial_lon_max=0.200234476830348
  NC_GLOBAL#geospatial_lon_min=0.09998249112255601
  NC_GLOBAL#geospatial_lon_units=degrees_east
  NC_GLOBAL#history=NetCDF-CF file created by datacube version '1.6.1+139.g9509423c' at 20181102.
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"output/netcdfs/landsat7/ls7_netcdf_example.nc":red
 