# Getting Started

**If you can read this you must have completed the CSIRO EASI Data Cube training environment on PC (easi-pc) installation. AWESOME!**

In this notebook we'll show you how to initialise your local install of the easi-pc and populate it with the sample data. Almost exactly the same approach can be used for any Open Data Cube (ODC) installation. If you are using a hosted version (e.g. CSIRO Data Cube on AWS), however, data management will be controlled by the central authority, which will almost certainly provide other methods for managing user and shared data. For now, though, you are the authority for your local installation.

## What we are about to do

1. Learn some tips about using the easi-pc notebooks
2. Initialise the database
3. Add an Earth Observation (EO) data product description to the database
4. Index some data in place without transformation
5. Ingest some data: make a copy of the data and transform it to a compute-ready form to save on repeated calculations (e.g. reprojection, tiling, different file layout)

Along the way we will also learn some things about Docker and how to use it so you can save yourself from mistakes or save yourself some time. Keep an eye out for ___Docker tip:___. We'll also include ___Jupyter tip:___ and ___Play tip:___ along the way so you can have a better learning experience.

___Play tip:___ _The sample data is relatively small, and it's quite simple (and fast) to rebuild the easi-pc environment if you make a mistake, or if you want to experiment with data of your own and start again._


# Tips on using the easi-pc notebooks

___Jupyter tip:___ _You will see some common cells in all the training notebooks, particularly at the start. These usually set up notebook-related environment information that affects how things are displayed. The next one, starting with %, tells Jupyter we'd like all matplotlib graphics to be placed inline in the notebook, not in a separate window. We won't describe these over and over, and an internet search will find most of them very easily._

In [1]:
%matplotlib inline

Here's another example of some common code. This time it's plain Python (no special characters at the start). We use `pandas.DataFrame` objects to display our tables, so we adjust a few pandas display settings so they look nice in the notebook.

In [2]:
import pandas
pandas.set_option('display.max_colwidth', 200)
pandas.set_option('display.max_rows', None)

One more example: by default Python displays warnings as red text in the output areas of the notebook. Most of these warnings are harmless unless you are a developer (e.g. they let developers know that a certain function is going to be removed in the future and should be replaced by its new version). Whilst you can mostly ignore them, they can be repeated many times and clutter up the notebook display. Sometimes, though, things don't work and you want to turn the warnings on so you can see what the error is and fix it.
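As a minimal sketch of switching warnings off and on, the standard `warnings` module can be used as follows. The helper names `hide_warnings`, `show_warnings` and `noisy_function` are hypothetical and exist only for this illustration.

```python
import warnings


def hide_warnings():
    """Suppress all warnings to keep the notebook output tidy."""
    warnings.filterwarnings("ignore")


def show_warnings():
    """Show warnings again (once per code location) when debugging."""
    warnings.filterwarnings("default")


def noisy_function():
    """Hypothetical function that emits a deprecation-style warning."""
    warnings.warn("noisy_function will be removed in a future release", FutureWarning)
    return 42


hide_warnings()
quiet_result = noisy_function()  # runs without printing a warning
```

Calling `show_warnings()` before re-running a misbehaving cell brings the messages back.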

# Initialise the database

When you first install the ODC docker images the database is completely blank and requires:
1. An ODC database schema to be initialised
1. EO product information (metadata) to be added that describes the EO data attributes. There are several of these, depending on our data sources
1. An index of the actual EO data

The ODC contains a set of command line utilities for initialising the database. First, let's check what state the database is in and whether we can connect to it:

___Jupyter tip:___ _You can execute a command line program from a Jupyter cell by preceding it with a ! character. To run the same command on an actual command line, remove the ! character._
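Outside a notebook, the same effect can be had from plain Python with the standard `subprocess` module. The sketch below uses a hypothetical helper, `run_command`, and runs the Python interpreter itself as a stand-in command so that it works on any machine.

```python
import subprocess
import sys


def run_command(args):
    """Run a command line program and return its standard output as text."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# The Python interpreter is used here only because it is guaranteed to exist
output = run_command([sys.executable, "-c", "print('hello from the command line')"])
print(output)
```

Assuming the `datacube` program is on the path, `run_command(["datacube", "system", "check"])` would be the rough equivalent of a `!datacube system check` cell.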


In [5]:
!datacube system check

Version:       1.7+0.g98cf9ba3.dirty
Config files:  /home/jovyan/.datacube.conf
Host:          postgres:5432
Database:      odc
User:          odc
Environment:   None
Index Driver:  default

Valid connection:	YES
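The connection details reported above come from the configuration file. As a rough sketch of how such a file can be read, the example below parses INI-style text with the standard `configparser` module. The section name `datacube`, the `db_*` keys and the sample values are assumptions made for illustration; the real `/home/jovyan/.datacube.conf` may differ.

```python
import configparser

# Hypothetical file contents mirroring the check output;
# the password is a placeholder, not a real credential
SAMPLE_CONFIG = """
[datacube]
db_hostname: postgres
db_port: 5432
db_database: odc
db_username: odc
db_password: example-password
"""


def read_connection_settings(config_text, section="datacube"):
    """Return the host, database and user named in an INI-style config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    values = parser[section]
    return {
        "host": "{}:{}".format(values["db_hostname"], values.get("db_port", "5432")),
        "database": values["db_database"],
        "user": values["db_username"],
    }


settings = read_connection_settings(SAMPLE_CONFIG)
print(settings)
```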


Now let's initialise the database with the ODC schema:

In [6]:
!datacube system init

Initialising database...
Updated.
Checking indexes/views.
Done.


# Add a product definition for Landsat data from USGS

In [7]:
!datacube product add ~/work/data-pipelines/landsat-usgs/ls875_usgs_sr_scene.yaml

Added "ls8_usgs_sr_scene"
Added "ls7_usgs_sr_scene"
Added "ls5_usgs_sr_scene"
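To give a feel for what a product definition holds, here is a simplified sketch expressed as a Python dict rather than YAML. The name, description, platform, instrument and measurement attributes are taken from the product and measurement listings later in this notebook; `metadata_type` and `dtype` are assumptions, and the real `ls875_usgs_sr_scene.yaml` contains more detail than this.

```python
# Simplified, illustrative product definition (not the contents of the real YAML file)
product_definition = {
    "name": "ls5_usgs_sr_scene",
    "description": "Landsat 5 USGS Collection 1 Level2 Surface Reflectance LEDAPS. "
                   "30m UTM based projection.",
    "metadata_type": "eo",  # assumption
    "metadata": {
        "platform": {"code": "LANDSAT_5"},
        "instrument": {"name": "TM"},
    },
    "measurements": [
        {"name": "blue", "dtype": "int16",  # dtype is an assumption
         "nodata": -9999, "units": "reflectance", "aliases": ["band_1", "sr_band1"]},
        {"name": "green", "dtype": "int16",
         "nodata": -9999, "units": "reflectance", "aliases": ["band_2", "sr_band2"]},
    ],
}

# Keys this sketch expects to find; chosen for illustration only
EXPECTED_KEYS = ("name", "description", "metadata_type", "metadata", "measurements")


def missing_keys(definition):
    """List the expected top-level keys that a definition lacks."""
    return [key for key in EXPECTED_KEYS if key not in definition]


print(missing_keys(product_definition))
```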


Verify that the product definition loaded correctly. We'll look into what this code does later, but for now, when it is run you should see a neat little table containing the names of the products we just added. The cell after that displays the measurements each product supports.

In [8]:
# A Jupyter magic to ensure our matplotlib displays are inline in the notebook
%matplotlib inline
# Import pandas and set some parameters so the cells display nicely in our notebook
import pandas
pandas.set_option('display.max_colwidth', 200)
pandas.set_option('display.max_rows', None)

import datacube
dc = datacube.Datacube()
products = dc.list_products()

display_columns = ['name', 'description', 'platform', 'instrument', 'crs', 'resolution']

products[display_columns]

id,name,description,platform,instrument,crs,resolution
3,ls5_usgs_sr_scene,Landsat 5 USGS Collection 1 Level2 Surface Reflectance LEDAPS. 30m UTM based projection.,LANDSAT_5,TM,,
2,ls7_usgs_sr_scene,Landsat 7 USGS Collection 1 Level2 Surface Reflectance LEDAPS. 30m UTM based projection.,LANDSAT_7,ETM,,
1,ls8_usgs_sr_scene,Landsat 8 USGS Collection 1 Higher Level SR scene proessed using LaSRC. 30m UTM based projection.,LANDSAT_8,OLI_TIRS,,
4,modis_mcd43a1_tile,MODIS 500 metre MCD43A1 Collection 006,AQUA_TERRA,MODIS,"PROJCS[""unnamed"",GEOGCS[""Unknown datum based upon the custom spheroid"",DATUM[""Not specified (based on custom spheroid)"",SPHEROID[""Custom spheroid"",6371007.181,0]],PRIMEM[""Greenwich"",0],UNIT[""degre...",
5,modis_mcd43a2_tile,MODIS 500 metre MCD43A2 Collection 006,AQUA_TERRA,MODIS,"PROJCS[""unnamed"",GEOGCS[""Unknown datum based upon the custom spheroid"",DATUM[""Not specified (based on custom spheroid)"",SPHEROID[""Custom spheroid"",6371007.181,0]],PRIMEM[""Greenwich"",0],UNIT[""degre...",
6,modis_mcd43a3_tile,MODIS 500 metre MCD43A3 Collection 006,AQUA_TERRA,MODIS,"PROJCS[""unnamed"",GEOGCS[""Unknown datum based upon the custom spheroid"",DATUM[""Not specified (based on custom spheroid)"",SPHEROID[""Custom spheroid"",6371007.181,0]],PRIMEM[""Greenwich"",0],UNIT[""degre...",
7,modis_mcd43a4_tile,MODIS 500 metre MCD43A4 Collection 006,AQUA_TERRA,MODIS,"PROJCS[""unnamed"",GEOGCS[""Unknown datum based upon the custom spheroid"",DATUM[""Not specified (based on custom spheroid)"",SPHEROID[""Custom spheroid"",6371007.181,0]],PRIMEM[""Greenwich"",0],UNIT[""degre...",


In [9]:
# Get the measurements
measurements = dc.list_measurements()
# We can restrict which measurement attributes are displayed to reduce clutter
display_columns = ['units', 'nodata', 'aliases']
measurements[display_columns]

product,measurement,units,nodata,aliases
ls5_usgs_sr_scene,blue,reflectance,-9999,"[band_1, sr_band1]"
ls5_usgs_sr_scene,green,reflectance,-9999,"[band_2, sr_band2]"
ls5_usgs_sr_scene,red,reflectance,-9999,"[band_3, sr_band3]"
ls5_usgs_sr_scene,nir,reflectance,-9999,"[band_4, sr_band4]"
ls5_usgs_sr_scene,swir1,reflectance,-9999,"[band_5, sr_band5]"
ls5_usgs_sr_scene,swir2,reflectance,-9999,"[band_7, sr_band7]"
ls5_usgs_sr_scene,lwir,reflectance,-9999,"[band_6, bt_band6]"
ls5_usgs_sr_scene,pixel_qa,bit_index,1,[pixel_qa]
ls7_usgs_sr_scene,blue,reflectance,-9999,"[band_1, sr_band1]"
ls7_usgs_sr_scene,green,reflectance,-9999,"[band_2, sr_band2]"
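The `aliases` and `nodata` columns are worth a closer look: an alias is an alternative name for a measurement, and `nodata` is the value that marks pixels with no valid observation. The sketch below illustrates both ideas in plain Python using a few of the `ls5_usgs_sr_scene` rows above. The helper names `canonical_name` and `mask_nodata` are hypothetical, not part of the datacube API.

```python
# Attributes copied from the ls5_usgs_sr_scene rows in the table above
MEASUREMENTS = {
    "blue": {"nodata": -9999, "aliases": ["band_1", "sr_band1"]},
    "green": {"nodata": -9999, "aliases": ["band_2", "sr_band2"]},
    "red": {"nodata": -9999, "aliases": ["band_3", "sr_band3"]},
    "nir": {"nodata": -9999, "aliases": ["band_4", "sr_band4"]},
}


def canonical_name(requested, measurements=MEASUREMENTS):
    """Return the measurement name for a name or alias, or None if unknown."""
    for name, attributes in measurements.items():
        if requested == name or requested in attributes["aliases"]:
            return name
    return None


def mask_nodata(values, nodata):
    """Replace nodata pixels with None so later calculations can skip them."""
    return [None if value == nodata else value for value in values]


print(canonical_name("sr_band4"))
print(mask_nodata([120, -9999, 340], MEASUREMENTS["nir"]["nodata"]))
```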


# Index some Landsat 8 data

First, let's check that you have the data in the right place. If the data is already unpacked you should see a list of directories (each line begins with drwx...).



In [10]:
!ls -al /data/ls8_USGS_ESPA_data/


total 380
drwxrwxrwx 2 root root  81920 Mar 21 02:54 .
drwxrwxrwx 2 root root   4096 Jan 17 02:05 ..
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017090401T1-SC20180921064929
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017092001T1-SC20180921064913
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017100601T1-SC20180921064103
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017102201T1-SC20180921063749
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017110701T1-SC20180921070114
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017112301T1-SC20180921063818
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017120901T1-SC20180921063946
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842017122501T1-SC20180921065232
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842018011001T1-SC20180921063935
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842018012601T1-SC20180921083645
drwxrwxrwx 2 root root      0 Oct  9  2018 LC080900842018021101T1-SC2
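The directory names encode useful information about each scene. As a sketch, and assuming the names follow the pattern sensor, satellite, path, row, acquisition date, collection number and collection category, followed by `-SC` and an order timestamp, they can be unpacked with a regular expression. The function name `parse_scene_name` is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: LC08 | 090 | 084 | 20170904 | 01 | T1 | -SC<14-digit order timestamp>
SCENE_PATTERN = re.compile(
    r"^(?P<sensor>L[A-Z])(?P<satellite>\d{2})"
    r"(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<acquired>\d{8})"
    r"(?P<collection>\d{2})(?P<category>[A-Z0-9]{2})"
    r"-SC(?P<order_stamp>\d{14})$"
)


def parse_scene_name(name):
    """Split a scene directory name into its assumed components."""
    match = SCENE_PATTERN.match(name)
    if match is None:
        raise ValueError("Unrecognised scene directory name: {}".format(name))
    return {
        "satellite": int(match["satellite"]),
        "path": int(match["path"]),
        "row": int(match["row"]),
        "acquired": datetime.strptime(match["acquired"], "%Y%m%d").date(),
        "category": match["category"],
    }


info = parse_scene_name("LC080900842017090401T1-SC20180921064929")
print(info)
```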

Now we run the prepare script, which goes through all the directories and their contents, gathers the metadata required for the datacube index, and verifies that everything is as it should be.

In [11]:
!rm -f /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
!touch /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml && python3 ~/work/data-pipelines/landsat-usgs/easi_prepare_ls_usgs_sr.py --output /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml /data/ls8_USGS_ESPA_data/LC*/

2019-07-18 05:45:40,166 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017090401T1-SC20180921064929
2019-07-18 05:45:40,290 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2019-07-18 05:45:40,290 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017092001T1-SC20180921064913
2019-07-18 05:45:40,347 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2019-07-18 05:45:40,347 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017100601T1-SC20180921064103
2019-07-18 05:45:40,403 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2019-07-18 05:45:40,403 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017102201T1-SC20180921063749
2019-07-18 05:45:40,448 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2019-07-18 05:45:40,448 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017110701T1-SC20180921070114
2019-07-18 05:45:40,483 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2019-07-18 05:45:40,483 INFO Processing /data/ls8_USGS_ESPA_data/LC0809008420171
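The general shape of a prepare step (visit each scene directory, collect information about it) can be sketched in a few lines. This is not the `easi_prepare_ls_usgs_sr.py` script: the function names, the record layout and the file names are all hypothetical, and the demonstration uses a temporary directory rather than the real `/data` path.

```python
import tempfile
from pathlib import Path


def find_scene_directories(root, pattern="LC*"):
    """Return the directories under root that match the glob pattern, sorted by name."""
    return sorted(path for path in Path(root).glob(pattern) if path.is_dir())


def summarise_scene(scene_dir):
    """Build a minimal, hypothetical record describing one scene directory."""
    files = sorted(item.name for item in scene_dir.iterdir() if item.is_file())
    return {"scene": scene_dir.name, "files": files}


# Demonstrate on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    scene = Path(root) / "LC080900842017090401T1-SC20180921064929"
    scene.mkdir()
    (scene / "example_band.tif").touch()   # hypothetical file name
    (Path(root) / "LC_notes.txt").touch()  # matches the pattern but is not a directory
    records = [summarise_scene(path) for path in find_scene_directories(root)]

print(records)
```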

In [None]:
!datacube dataset add /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
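Once the datasets have been added, it can be reassuring to confirm that the index now holds them. The sketch below counts the datasets for a product through `find_datasets`. The helper `count_indexed_datasets` and the `FakeDatacube` class are hypothetical; the fake is there only so the example runs without a database.

```python
def count_indexed_datasets(dc, product):
    """Count the datasets the index holds for one product."""
    return len(dc.find_datasets(product=product))


class FakeDatacube:
    """Hypothetical stand-in for datacube.Datacube so the sketch runs anywhere."""

    def __init__(self, datasets_by_product):
        self._datasets = datasets_by_product

    def find_datasets(self, product=None, **search_terms):
        # A real Datacube searches the index; this fake just looks up a dict
        return list(self._datasets.get(product, []))


fake_dc = FakeDatacube({"ls8_usgs_sr_scene": ["scene-a", "scene-b"]})
ls8_count = count_indexed_datasets(fake_dc, "ls8_usgs_sr_scene")
print(ls8_count)
```

In the notebook itself you would pass a real `datacube.Datacube()` instance in place of `fake_dc`.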

# Landsat 7

A single Landsat 7 image (one time slice) is provided in the sample data. The indexing process is exactly the same as above, just with a different set of directories.

In [11]:
!rm -f /data/ls7_USGS_data/ls7_usgs_sr.yaml
!touch /data/ls7_USGS_data/ls7_usgs_sr.yaml && python3 ~/work/data-pipelines/landsat-usgs/easi_prepare_ls_usgs_sr.py --output /data/ls7_USGS_data/ls7_usgs_sr.yaml /data/ls7_USGS_data/LE*/

2018-11-29 03:26:22,469 INFO Processing /data/ls7_USGS_data/LE071950542015121201T1-SC20170427222707
2018-11-29 03:26:22,528 INFO Writing /data/ls7_USGS_data/ls7_usgs_sr.yaml


In [12]:
!datacube dataset add /data/ls7_USGS_data/ls7_usgs_sr.yaml

fatal: not a git repository: /home/jovyan/odc/../.git/modules/datacube-core
