## Introduction to querying

**Notebook currently compatible with the `NCI`|`DEA Sandbox` environment only**

## Background
All DEA analyses require the basic construction of a data query that specifies the what, where and when of the data request.
Each query returns an xarray dataset containing the contents of your request.
It is essential to understand the xarray dataset as it is fundamental to the structure of the datacube.
Manipulations, transformations and visualisation of the xarray contents provide datacube users with the ability to explore DEA datasets and pose and answer scientific questions.
This notebook introduces how to construct and customise datacube queries in addition to introducing the xarray dataset.

## Prerequisites
Users of this notebook should have a basic understanding of:
* how to run a [Jupyter notebook](future link to Intro_to_Jupyter)
* the basic structure of the DEA [satellite datasets](future link to Intro_to_DEA)
* how to identify [DEA products and measurements](future link to Intro_to_products_and_measurements)

## Description
This notebook will introduce how to load data from the datacube through the construction of a query and use of the *load* function.
Topics covered include:
* Loading data
* Reading the resulting xarray dataset
* Customising the load function
  * measurements
  * crs reprojection 

## Technical details
* **Products used:** `ls7_nbart_geomedian_annual`
* **Analyses used:** load data

## Getting started
To run this introduction to querying, run all the cells in the notebook, starting with the "Connect to the datacube" cell. 

### Connect to the datacube
Give your datacube app a unique name that is consistent with the purpose of the notebook.

In [1]:
import datacube

# Temporary solution to account for Collection 3 data being in a different
# database on the NCI
try:
    dc = datacube.Datacube(app="Introduction_to_querying", env="c3-samples")
except Exception:
    dc = datacube.Datacube(app="Introduction_to_querying")

## Loading data

Loading data from the datacube uses the *load* function.

The function requires the following minimum arguments:

* **product**: A specific product to load. To revise DEA products, see the [Introduction_to_products_and_measurements](future link to Intro_to_products_and_measurements) notebook.
* **x**: Defines the spatial region in the *x* dimension. By default, the *x* and *y* arguments accept queries in the WGS84 geographic coordinate system, identified by the EPSG code *4326*.
* **y**: Defines the spatial region in the *y* dimension. The dimensions ``longitude``/``latitude`` and ``x``/``y`` can be used interchangeably.
* **time**: Defines the temporal extent. The time dimension can be specified using a tuple of datetime objects or strings in YYYY-MM-DD hh:mm:ss format.
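
As a minimal sketch of the arguments above, the query can be collected in a dictionary and unpacked into *load*. The dictionary names `query` and `query_with_strings` are illustrative choices rather than anything the datacube requires, and the final call is left as a comment because it needs a live datacube connection (`dc`).

```python
from datetime import datetime

# Hypothetical variable name; any dictionary of keyword arguments works
query = {
    "product": "ls7_nbart_geomedian_annual",
    "x": (153.3, 153.4),   # longitude range, WGS84 by default
    "y": (-27.5, -27.6),   # latitude range, WGS84 by default
    # The temporal extent given as a tuple of datetime objects
    "time": (datetime(2015, 1, 1), datetime(2016, 1, 1)),
}

# The same query with the temporal extent given as a tuple of strings
query_with_strings = dict(query, time=("2015-01-01", "2016-01-01"))

# With a datacube connection available, the dictionary is unpacked into load:
# data = dc.load(**query)
```

Keeping the query in one dictionary makes it easy to reuse the same spatial and temporal extent across several *load* calls.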

Let's run a query to load all datasets within the Landsat 7 NBART annual geomedian product for Moreton Bay in QLD.
The query uses the following values:

* **product**: ls7_nbart_geomedian_annual
* **location**: x=(153.3, 153.4), y=(-27.5, -27.6)
* **time period**: 2015-01-01 to 2016-01-01

Run the following cell to load all matching datasets.

In [2]:
data = dc.load(
    product="ls7_nbart_geomedian_annual",
    x=(153.3, 153.4),
    y=(-27.5, -27.6),
    time=("2015-01-01", "2016-01-01"),
)

In [3]:
print(data)

<xarray.Dataset>
Dimensions:  (time: 2, x: 461, y: 508)
Coordinates:
  * time     (time) datetime64[ns] 2015-01-01 2016-01-01
  * y        (y) float64 -3.156e+06 -3.156e+06 ... -3.168e+06 -3.168e+06
  * x        (x) float64 2.067e+06 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
Data variables:
    blue     (time, y, x) int16 519 496 480 499 503 506 ... 366 316 287 289 300
    green    (time, y, x) int16 563 555 545 558 552 553 ... 565 487 456 415 460
    red      (time, y, x) int16 308 306 312 314 307 308 ... 490 419 400 365 390
    nir      (time, y, x) int16 207 183 183 189 187 ... 2866 2650 2505 2440 2538
    swir1    (time, y, x) int16 89 88 88 99 112 117 ... 1752 1368 1127 1120 1229
    swir2    (time, y, x) int16 75 98 87 82 91 94 96 ... 894 898 657 573 495 553
Attributes:
    crs:      EPSG:3577


### Reading the resulting xarray.Dataset
The variable *data* now holds an xarray Dataset containing all matching datasets.

*Dimensions* 
* identify the number of time steps returned by the search, as well as the number of pixels in the x and y directions of the data query.

*Coordinates* 
* *time* identifies the date attributed to each returned dataset
* *x* and *y* are the coordinates for the pixels within the spatial bounds of your query

*Data variables*
* These are the measurements available for the nominated product. For every date (time) returned by the query, the spectral response for each pixel (y, x) is returned as an array for each measurement.

*Attributes*
* *crs* identifies the coordinate reference system. 
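
The dimensions also indicate how much data was loaded. As a rough sketch using only the numbers printed above (the 2-byte size follows from the `int16` type, and the variable names are illustrative), the number of values per measurement and the approximate in-memory size can be estimated as:

```python
# Dimension sizes and measurement count copied from the printed Dataset
dims = {"time": 2, "y": 508, "x": 461}
n_measurements = 6          # blue, green, red, nir, swir1, swir2
bytes_per_value = 2         # int16 values occupy 2 bytes each

# One value per (time, y, x) combination for every measurement
values_per_measurement = dims["time"] * dims["y"] * dims["x"]

# Approximate size of the raw arrays, ignoring coordinates and metadata
total_bytes = values_per_measurement * n_measurements * bytes_per_value

print(values_per_measurement)             # 468376
print(round(total_bytes / 1e6, 1), "MB")  # 5.6 MB
```

Running the same arithmetic before widening the spatial or temporal extent of a query gives a quick feel for how large the result is likely to be.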

## Customising the *load* function

The *load* function can be tailored to refine a query.

Customisation options include:

*measurements*

* The measurements argument is a list of measurement names, as listed in `dc.list_measurements()`. If not provided, all measurements for the product will be returned.

*crs*

* The crs of the query is assumed to be WGS84/EPSG:4326 unless the `crs` field is supplied, even if the stored data is in another projection or the `output_crs` is specified.

*group_by*

* For EO-specific datasets that are based around scenes, the time dimension can be reduced to the day level, using solar day to keep scenes together.
        E.g. `group_by='solar_day'`

*reproject/resample*

* To reproject or resample the data, supply the ``output_crs`` and ``resolution`` fields. 
        E.g. to reproject data to 25 m resolution for EPSG:3577:
        dc.load(
            product='ls5_nbar_albers',
            x=(148.15, 148.2),
            y=(-35.15, -35.2),
            time=('1990', '1991'),
            output_crs='EPSG:3577',
            resolution=(-25, 25))
             

For help or more customisation options, run `help(dc.load)` in an empty cell.
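
To illustrate what grouping by solar day means, the sketch below shifts a UTC acquisition time by the longitude of the scene (15 degrees of longitude per hour) and takes the date of the result. It only approximates the idea and is not claimed to be the datacube's own implementation; the function name `approximate_solar_day` and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def approximate_solar_day(utc_time, longitude):
    """Return the local solar date for a UTC time at a given longitude.

    Hypothetical helper for illustration: the Earth rotates 15 degrees
    per hour, so each degree of longitude shifts solar time by 4 minutes.
    """
    solar_offset = timedelta(minutes=4 * longitude)
    return (utc_time + solar_offset).date()


# Made-up example: a morning overpass near Moreton Bay (about 153.35 E)
# is recorded late on the previous day in UTC
acquired_utc = datetime(2015, 6, 30, 23, 45)
print(approximate_solar_day(acquired_utc, 153.35))  # 2015-07-01
```

Scenes that fall on the same solar day are merged into a single time step when `group_by='solar_day'` is passed to *load*.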

Example syntax on the use of these options follows in the cells below.

### measurements

In [4]:
# Note the optional inclusion of the measurements list

data_rgb = dc.load(
    product="ls7_nbart_geomedian_annual",
    x=(153.3, 153.4),
    y=(-27.5, -27.6),
    time=("2015-01-01", "2016-01-01"),
    measurements=["red", "blue", "green"],
)

print(data_rgb)

<xarray.Dataset>
Dimensions:  (time: 2, x: 461, y: 508)
Coordinates:
  * time     (time) datetime64[ns] 2015-01-01 2016-01-01
  * y        (y) float64 -3.156e+06 -3.156e+06 ... -3.168e+06 -3.168e+06
  * x        (x) float64 2.067e+06 2.067e+06 2.067e+06 ... 2.079e+06 2.079e+06
Data variables:
    red      (time, y, x) int16 308 306 312 314 307 308 ... 490 419 400 365 390
    blue     (time, y, x) int16 519 496 480 499 503 506 ... 366 316 287 289 300
    green    (time, y, x) int16 563 555 545 558 552 553 ... 565 487 456 415 460
Attributes:
    crs:      EPSG:3577


Note that the *Data variables* component of the xarray Dataset now includes only the measurements specified in the query.

### crs reprojection
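Certain applications may require that you output your data into a specific crs.
You can reproject your output data by specifying the new `output_crs` and identifying the `resolution` required.
The `resolution` is a pixel size expressed in the units of `output_crs`: metres for a projected crs such as EPSG:3577, degrees for EPSG:4326.
Note that the cell below combines `resolution=(30, 30)` with EPSG:4326, so it requests 30 x 30 degree pixels rather than 30 x 30 m pixels; this is why the result collapses to a single pixel holding the no-data value -999.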

In [5]:
# This is the same query as initially appears in the notebook. Note the different crs attribute in the xarray.Dataset

data_crs_reprojected = dc.load(
    product="ls7_nbart_geomedian_annual",
    x=(153.3, 153.4),
    y=(-27.5, -27.6),
    time=("2015-01-01", "2016-01-01"),
    output_crs="EPSG: 4326",
    resolution=(30, 30),
)
print(data_crs_reprojected)

<xarray.Dataset>
Dimensions:    (latitude: 1, longitude: 1, time: 2)
Coordinates:
  * time       (time) datetime64[ns] 2015-01-01 2016-01-01
  * latitude   (latitude) float64 -15.0
  * longitude  (longitude) float64 165.0
Data variables:
    blue       (time, latitude, longitude) int16 -999 -999
    green      (time, latitude, longitude) int16 -999 -999
    red        (time, latitude, longitude) int16 -999 -999
    nir        (time, latitude, longitude) int16 -999 -999
    swir1      (time, latitude, longitude) int16 -999 -999
    swir2      (time, latitude, longitude) int16 -999 -999
Attributes:
    crs:      EPSG: 4326
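
Because `resolution` is interpreted in the units of `output_crs`, the `resolution=(30, 30)` used above with EPSG:4326 describes 30-degree pixels, which is consistent with the single pixel centred on latitude -15.0 and longitude 165.0 in the output. The following is a minimal sketch of converting a pixel size in metres to degrees, assuming roughly 111,320 m per degree (an approximation that ignores how longitude spacing shrinks away from the equator). The constant and variable names are illustrative, and the load call is left as a comment because it needs a datacube connection.

```python
# Assumption: about 111,320 m per degree of latitude (approximate)
METRES_PER_DEGREE = 111_320

pixel_size_m = 30
pixel_size_deg = pixel_size_m / METRES_PER_DEGREE

# Resolution is a (y, x) tuple; y is negative so rows run north to south
resolution_deg = (-pixel_size_deg, pixel_size_deg)
print(round(pixel_size_deg, 5))  # 0.00027

# With a datacube connection, the reprojection query could then look like:
# data_crs_reprojected = dc.load(
#     product="ls7_nbart_geomedian_annual",
#     x=(153.3, 153.4),
#     y=(-27.5, -27.6),
#     time=("2015-01-01", "2016-01-01"),
#     output_crs="EPSG:4326",
#     resolution=resolution_deg,
# )
```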


## Recommended next steps

To continue working through the introductory notebooks in the beginner's guide, the recommended next notebooks are:

- [Introduction_to_plotting](link to notebook)
- [Run_a_basic_analysis](link to notebook)

Advanced users are recommended to explore:
- [Using `load_ard`](https://github.com/GeoscienceAustralia/dea-notebooks/blob/Intro_to_products/Frequently_used_code/Using_load_ard.ipynb). This function allows the importing of cloud-free observations from multiple sensors into an xarray dataset. Furthermore, you can query for observations with a user-specified minimum proportion of good quality, non-cloudy or shadowed pixels.
- The [DEA_datasets](https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/DEA_datasets) part of the repository, where you can explore DEA products in depth.
- Some [real world applications](https://github.com/GeoscienceAustralia/dea-notebooks/tree/develop/Real_world_examples) of DEA data.

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Australia data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [GitHub](https://github.com/GeoscienceAustralia/dea-notebooks).

**Last modified:** October 2019

**Compatible `datacube` version:** 

In [6]:
print(datacube.__version__)

1.7+43.gc873f3ea.dirty


## Tags
Browse all available tags on the DEA User Guide's [Tags Index](https://docs.dea.ga.gov.au/genindex.html)