# Getting Started

**If you can read this you must have completed the CSIRO EASI Data Cube training environment on PC (easi-pc) installation. AWESOME!**

In this notebook we'll show you how to initialise your local install of the easi-pc and populate it with the sample data. Almost exactly the same approach can be used for any Open Data Cube (ODC) installation. If you are using a hosted version (e.g. CSIRO Data Cube on AWS), however, data management will be controlled by a central authority, which will almost certainly provide other methods for managing user and shared data. For now, though, you are the authority for your local installation.

## What we are about to do

1. Learn some tips about using the easi-pc notebooks
2. Initialise the database
3. Add an Earth Observation (EO) data product description to the database
4. Index some data in place without transformation
5. Ingest some data - make a copy of the data and transform it to a compute-ready form to save on repeated calculations (e.g. reprojection, tiling, different file layout)

Along the way we will also learn some things about Docker and how to use it so you can save yourself from mistakes or save yourself some time. Keep an eye out for ___Docker tip:___. We'll also include ___Jupyter tip:___ and ___Play tip:___ along the way so you can have a better learning experience.

___Play tip:___ _The sample data is relatively small, and it's quite simple (and fast) to rebuild the easi-pc environment if you make a mistake, or if you want to experiment with data of your own and start again._


# Tips on using the easi-pc notebooks

___Jupyter tip:___ _You will see some common cells in all the training notebooks, particularly at the start. These usually set up notebook-related environment information that affects how things are displayed. The next one, starting with %, tells Jupyter we'd like all matplotlib graphics to be placed inline in the notebook, not in a separate window. We won't describe these over and over, and of course an internet search will find most of them very easily._

In [2]:
%matplotlib inline

Here's another example of some common code. This time it's straight Python (no special characters at the start). We use `pandas.DataFrame` objects to display our tables, so we will tweak some pandas display settings so they look nice in the notebook.

In [None]:
import pandas
pandas.set_option('display.max_colwidth', 200)
pandas.set_option('display.max_rows', None)

One more example: by default Python displays warnings, which appear as red text in the output areas of the notebook. Most of these warnings are harmless unless you are a developer (e.g. they let developers know that a certain function is going to be removed in the future and should be replaced by its new version). Whilst you can mostly ignore them, they can be repeated many times and clutter up the notebook display. Sometimes, though, things don't work and you want the messages visible so you can see what the error is and fix it.
Thankfully you can show and hide warnings and errors in the notebook. In the tutorial notebooks you will find the cell that follows placed at the end of the notebook. If you execute the cell it will create a button you can press to toggle the error display for the entire notebook. You only need to run the cell once, but there is no harm in running it multiple times.

In [5]:
from IPython.display import HTML

HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_error').hide();
 $('div.output_stderr').hide();
 } else {
 $('div.output_error').show();
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
<form action="javascript:code_toggle_err()"><input type="submit" value="Click here to toggle on/off the error output."></form>''')


The cell below produces an error as the module isn't found. Execute the cell above so you can see the toggle button in the output then run the cell below. Click the Toggle button and you should see the error section show/hide.

In [6]:
import SomethingThatDoesntExistSoThereIsAnError

ModuleNotFoundError: No module named 'SomethingThatDoesntExistSoThereIsAnError'

___Play tip:___ _This error suppression code has nothing to do with the easi-pc. You can copy and paste that cell into any notebook you might have and reuse it._
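
If you would rather silence warnings in Python itself than hide them in the browser, the standard library's `warnings` module can filter them. The following is a minimal sketch and not part of the easi-pc notebooks; `noisy_function` is a hypothetical stand-in for a library call that emits a deprecation warning.

```python
import warnings


def noisy_function():
    """Hypothetical stand-in for a library call that warns about deprecation."""
    warnings.warn("noisy_function is deprecated", DeprecationWarning)
    return 42


# Record warnings instead of printing them, so we can see what the filters let through.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always")  # start from "show everything"
    noisy_function()
    # From here on, hide this one category of warning.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    noisy_function()

# Only the warning from the first call was recorded; the second was filtered out.
print(len(shown))
```

Because the filter is set inside `warnings.catch_warnings()`, the previous warning settings are restored when the `with` block ends, so the rest of the notebook is unaffected.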

# Initialise the database

When you first install the ODC Docker images the database is completely blank and requires:
1. An ODC database schema to be initialised
1. EO product information (metadata) to be added that describes the EO data attributes. There are several of these, depending on our data sources
1. An index of the actual EO data

The ODC contains a set of command line utilities for initialising the database. First let's check what state the database is in and whether we can connect to it:

___Jupyter tip:___ _You can execute a command line program from a Jupyter cell by preceding it with a ! mark. To do this on an actual command line you would remove the ! mark._


In [2]:
!datacube system check

fatal: not a git repository: /home/jovyan/odc/../.git/modules/datacube-core
Version:       0+unknown
Config files:  /home/jovyan/.datacube.conf
Host:          postgres:5432
Database:      odc
User:          odc
Environment:   None
Index Driver:  default

Valid connection:	Database not initialised: 

No DB schema exists. Have you run init?
	datacube system init
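
Outside a notebook there is no `!` shortcut, but the same command can be run from plain Python. The sketch below uses the standard `subprocess` module. `run_cli` and `needs_init` are hypothetical helper names, and `needs_init` simply looks for the "Database not initialised" text shown in the output above, which assumes that message stays the same between ODC versions.

```python
import shutil
import subprocess
import sys


def run_cli(args):
    """Run a command-line program and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.returncode, result.stdout


def needs_init(check_output):
    """Guess from the `datacube system check` text whether the schema is missing."""
    return "Database not initialised" in check_output


# Demonstrate the helper with a program that is always available: Python itself.
code, out = run_cli([sys.executable, "--version"])
print(code, out.strip())

# Only attempt the real check if the datacube CLI is actually on the PATH.
if shutil.which("datacube"):
    _, check_text = run_cli(["datacube", "system", "check"])
    print("Run 'datacube system init' first:", needs_init(check_text))
```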


Now let's initialise the database with the ODC schema:

In [4]:
!datacube system init

fatal: not a git repository: /home/jovyan/odc/../.git/modules/datacube-core
Initialising database...
Updated.
Checking indexes/views.
Done.


# Add a product definition for Landsat data from USGS

In [5]:
!datacube product add ~/work/data-pipelines/landsat-usgs/ls875_usgs_sr_scene.yaml

fatal: not a git repository: /home/jovyan/odc/../.git/modules/datacube-core
Added "ls8_usgs_sr_scene"
Added "ls7_usgs_sr_scene"
Added "ls5_usgs_sr_scene"


Verify that the product definitions loaded correctly. We'll look into what this code does later, but for now, when it is run you should see a neat little table listing the names of the products we just added. The next cell will then display the measurements that they support.

In [6]:
# A Jupyter magic to ensure our matplotlib displays are inline in the notebook
%matplotlib inline
# Import pandas and set some parameters so the cells display nicely in our notebook
import pandas
pandas.set_option('display.max_colwidth', 200)
pandas.set_option('display.max_rows', None)

import datacube
dc = datacube.Datacube()
products = dc.list_products()

display_columns = ['name', 'description', 'platform', 'instrument', 'crs', 'resolution']

products[display_columns]

| id | name | description | platform | instrument | crs | resolution |
| --- | --- | --- | --- | --- | --- | --- |
| 3 | ls5_usgs_sr_scene | Landsat 5 USGS Collection 1 Level2 Surface Reflectance LEDAPS. 30m UTM based projection. | LANDSAT_5 | TM | | |
| 2 | ls7_usgs_sr_scene | Landsat 7 USGS Collection 1 Level2 Surface Reflectance LEDAPS. 30m UTM based projection. | LANDSAT_7 | ETM | | |
| 1 | ls8_usgs_sr_scene | Landsat 8 USGS Collection 1 Higher Level SR scene proessed using LaSRC. 30m UTM based projection. | LANDSAT_8 | OLI_TIRS | | |


In [7]:
# Get the measurements
measurements = dc.list_measurements()
# We can restrict which measurement attributes are displayed to reduce clutter
display_columns = ['units', 'nodata', 'aliases']
measurements[display_columns]

| product | measurement | units | nodata | aliases |
| --- | --- | --- | --- | --- |
| ls5_usgs_sr_scene | blue | reflectance | -9999 | [band_1, sr_band1] |
| ls5_usgs_sr_scene | green | reflectance | -9999 | [band_2, sr_band2] |
| ls5_usgs_sr_scene | red | reflectance | -9999 | [band_3, sr_band3] |
| ls5_usgs_sr_scene | nir | reflectance | -9999 | [band_4, sr_band4] |
| ls5_usgs_sr_scene | swir1 | reflectance | -9999 | [band_5, sr_band5] |
| ls5_usgs_sr_scene | swir2 | reflectance | -9999 | [band_7, sr_band7] |
| ls5_usgs_sr_scene | lwir | reflectance | -9999 | [band_6, bt_band6] |
| ls5_usgs_sr_scene | pixel_qa | bit_index | 1 | [pixel_qa] |
| ls7_usgs_sr_scene | blue | reflectance | -9999 | [band_1, sr_band1] |
| ls7_usgs_sr_scene | green | reflectance | -9999 | [band_2, sr_band2] |
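
To make the `nodata` and `aliases` columns concrete, here is a small standard-library sketch. It hard-codes two rows copied from the measurements output above rather than querying the datacube, and `resolve` and `valid_mean` are hypothetical helpers, not ODC functions. It assumes that an alias is simply an alternative name for a measurement and that `nodata` marks pixels to leave out of calculations. The pixel values are made up for illustration.

```python
# Two rows copied from the measurements output above (ls5_usgs_sr_scene).
MEASUREMENTS = {
    "blue": {"nodata": -9999, "aliases": ["band_1", "sr_band1"]},
    "green": {"nodata": -9999, "aliases": ["band_2", "sr_band2"]},
}


def resolve(name):
    """Return the canonical measurement name for a name or one of its aliases."""
    for canonical, info in MEASUREMENTS.items():
        if name == canonical or name in info["aliases"]:
            return canonical
    raise KeyError(name)


def valid_mean(values, nodata):
    """Average the values, skipping any that equal the nodata marker."""
    valid = [v for v in values if v != nodata]
    return sum(valid) / len(valid) if valid else None


print(resolve("sr_band2"))  # alias looked up to its canonical name
# Made-up pixel values; the middle one is the nodata marker and is skipped.
print(valid_mean([100, -9999, 300], MEASUREMENTS["blue"]["nodata"]))
```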


# Index some Landsat 8 data

First, let's check that you have the data in the right place. If the data is already unpacked you should see a list of directories (each line begins with drwx...).



In [8]:
!ls -al /data/ls8_USGS_ESPA_data/


total 364
drwxrwxrwx 2 root root  81920 Nov 29 00:47 .
drwxrwxrwx 2 root root   4096 Nov 27 05:36 ..
drwxrwxrwx 2 root root      0 Oct  9 01:37 LC080900842017090401T1-SC20180921064929
drwxrwxrwx 2 root root      0 Oct  9 01:37 LC080900842017092001T1-SC20180921064913
drwxrwxrwx 2 root root      0 Oct  9 01:38 LC080900842017100601T1-SC20180921064103
drwxrwxrwx 2 root root      0 Oct  9 01:38 LC080900842017102201T1-SC20180921063749
drwxrwxrwx 2 root root      0 Oct  9 01:38 LC080900842017110701T1-SC20180921070114
drwxrwxrwx 2 root root      0 Oct  9 01:38 LC080900842017112301T1-SC20180921063818
drwxrwxrwx 2 root root      0 Oct  9 01:38 LC080900842017120901T1-SC20180921063946
drwxrwxrwx 2 root root      0 Oct  9 01:39 LC080900842017122501T1-SC20180921065232
drwxrwxrwx 2 root root      0 Oct  9 01:39 LC080900842018011001T1-SC20180921063935
drwxrwxrwx 2 root root      0 Oct  9 01:39 LC080900842018012601T1-SC20180921083645
drwxrwxrwx 2 root root      0 Oct  9 01:39 LC08090084201
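
The directory names encode information about each scene. The sketch below splits one into fields. The field layout (sensor and satellite, WRS path and row, acquisition date, collection number, tier, then an `SC` order timestamp) is an assumption based on the names listed above and on USGS naming conventions, and `parse_scene_dir` is a hypothetical helper, not part of the easi-pc.

```python
import re
from datetime import datetime

# Assumed layout: LXSS PPP RRR YYYYMMDD CC TX -SC YYYYMMDDHHMMSS
SCENE_DIR = re.compile(
    r"^(?P<sensor>L[A-Z])(?P<satellite>\d{2})"
    r"(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<acquired>\d{8})(?P<collection>\d{2})(?P<tier>[A-Z0-9]{2})"
    r"-SC(?P<ordered>\d{14})$"
)


def parse_scene_dir(name):
    """Split a scene directory name into its (assumed) component fields."""
    match = SCENE_DIR.match(name)
    if match is None:
        raise ValueError("unrecognised scene directory name: " + name)
    fields = match.groupdict()
    return {
        "sensor": fields["sensor"],
        "satellite": int(fields["satellite"]),
        "path": int(fields["path"]),
        "row": int(fields["row"]),
        "acquired": datetime.strptime(fields["acquired"], "%Y%m%d").date(),
        "collection": int(fields["collection"]),
        "tier": fields["tier"],
        "ordered": datetime.strptime(fields["ordered"], "%Y%m%d%H%M%S"),
    }


scene = parse_scene_dir("LC080900842017090401T1-SC20180921064929")
print(scene["path"], scene["row"], scene["acquired"], scene["tier"])
```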

Now we run the prepare script, which goes through all the directories and their contents, gathering the metadata required for the datacube index and verifying that everything is as it should be.

In [9]:
!rm -f /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
!touch /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml && python3 ~/work/data-pipelines/landsat-usgs/easi_prepare_ls_usgs_sr.py --output /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml /data/ls8_USGS_ESPA_data/LC*/

2018-11-29 03:26:11,681 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017090401T1-SC20180921064929
2018-11-29 03:26:11,738 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:11,738 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017092001T1-SC20180921064913
2018-11-29 03:26:11,783 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:11,783 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017100601T1-SC20180921064103
2018-11-29 03:26:11,848 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:11,848 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017102201T1-SC20180921063749
2018-11-29 03:26:11,902 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:11,902 INFO Processing /data/ls8_USGS_ESPA_data/LC080900842017110701T1-SC20180921070114
2018-11-29 03:26:11,952 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:11,952 INFO Processing /data/ls8_USGS_ESPA_data/LC0809008420171

2018-11-29 03:26:14,036 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:14,036 INFO Processing /data/ls8_USGS_ESPA_data/LC080900852018082201T1-SC20180921063636
2018-11-29 03:26:14,084 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:14,084 INFO Processing /data/ls8_USGS_ESPA_data/LC080900852018090701T1-SC20180921063706
2018-11-29 03:26:14,130 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:14,130 INFO Processing /data/ls8_USGS_ESPA_data/LC080910842017091101T1-SC20180921064953
2018-11-29 03:26:14,175 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:14,176 INFO Processing /data/ls8_USGS_ESPA_data/LC080910842017092701T1-SC20180921082808
2018-11-29 03:26:14,222 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:14,222 INFO Processing /data/ls8_USGS_ESPA_data/LC080910842017101301T1-SC20180921064131
2018-11-29 03:26:14,267 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2

2018-11-29 03:26:16,219 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:16,219 INFO Processing /data/ls8_USGS_ESPA_data/LC080910852018071201T1-SC20180921064154
2018-11-29 03:26:16,264 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:16,265 INFO Processing /data/ls8_USGS_ESPA_data/LC080910852018072801T1-SC20180921064927
2018-11-29 03:26:16,309 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:16,309 INFO Processing /data/ls8_USGS_ESPA_data/LC080910852018081301T1-SC20180921063806
2018-11-29 03:26:16,357 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:16,358 INFO Processing /data/ls8_USGS_ESPA_data/LC080910852018082901T1-SC20180921064858
2018-11-29 03:26:16,408 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
2018-11-29 03:26:16,409 INFO Processing /data/ls8_USGS_ESPA_data/LC080910852018091401RT-SC20180921063728
2018-11-29 03:26:16,460 INFO Writing /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml
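
The trailing `LC*/` in the prepare command is a shell glob that expands to every matching scene directory before the script runs. A rough Python equivalent, using only the standard library, is sketched below. `find_scene_dirs` is a hypothetical helper, and the demonstration uses a temporary directory with stand-in contents rather than the real `/data` tree.

```python
import tempfile
from pathlib import Path


def find_scene_dirs(root, pattern="LC*"):
    """Return the sorted directories under root whose names match the glob pattern."""
    return sorted(p for p in Path(root).glob(pattern) if p.is_dir())


# Demonstrate on a throwaway directory tree instead of /data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LC080900842017090401T1-SC20180921064929").mkdir()
    (Path(tmp) / "LE071950542015121201T1-SC20170427222707").mkdir()
    (Path(tmp) / "ls8_usgs_sr.yaml").touch()  # a file, so it is never returned
    found = [p.name for p in find_scene_dirs(tmp)]

print(found)
```

The `is_dir()` check plays the role of the trailing slash in the shell glob, which restricts the match to directories.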


In [10]:
!datacube dataset add /data/ls8_USGS_ESPA_data/ls8_usgs_sr.yaml

fatal: not a git repository: /home/jovyan/odc/../.git/modules/datacube-core
Indexing datasets  [####################################]  100%


# Landsat 7

A single Landsat 7 image (one time slice) is provided in the sample data. The indexing process is exactly the same as above, just with a different set of directories.

In [11]:
!rm -f /data/ls7_USGS_data/ls7_usgs_sr.yaml
!touch /data/ls7_USGS_data/ls7_usgs_sr.yaml && python3 ~/work/data-pipelines/landsat-usgs/easi_prepare_ls_usgs_sr.py --output /data/ls7_USGS_data/ls7_usgs_sr.yaml /data/ls7_USGS_data/LE*/

2018-11-29 03:26:22,469 INFO Processing /data/ls7_USGS_data/LE071950542015121201T1-SC20170427222707
2018-11-29 03:26:22,528 INFO Writing /data/ls7_USGS_data/ls7_usgs_sr.yaml


In [12]:
!datacube dataset add /data/ls7_USGS_data/ls7_usgs_sr.yaml

fatal: not a git repository: /home/jovyan/odc/../.git/modules/datacube-core

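
Once indexing has finished, it can be reassuring to count what the index now holds. The sketch below assumes `Datacube.find_datasets` accepts a `product` search term and returns a list of datasets. `count_datasets` is a hypothetical helper, and the `try` block lets the cell run (and do nothing further) where the `datacube` package is not installed.

```python
def count_datasets(dc, product):
    """Count the datasets indexed for one product."""
    return len(dc.find_datasets(product=product))


try:
    import datacube
except ImportError:
    datacube = None  # not running inside the easi-pc environment

if datacube is not None:
    dc = datacube.Datacube()
    # Product names as reported by `datacube product add` earlier in this notebook.
    for product in ("ls8_usgs_sr_scene", "ls7_usgs_sr_scene"):
        print(product, count_datasets(dc, product))
```

A count of zero for a product you have just indexed would suggest that the `datacube dataset add` step did not pick up the prepared YAML file.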