# Accessing Fix6's CMIP6 dataset

The climatological data from [NASA's downscaled dataset](https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp-cmip6) have been processed so that they are suited to area of interest (AOI) investigation.
This means that the data can be queried for the entire available time series for an individual location.

## Prerequisites

1. Install `xarray`
2. Have [set up your AWS credentials](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) provided by Fix6. These will give you *read-only* access to the dataset. Importantly, you should make sure that your AWS region is set to `us-east-1`.

If you navigate to your terminal and paste the code below, you should see your AWS credentials.

```bash
echo $AWS_ACCESS_KEY_ID
echo $AWS_SECRET_ACCESS_KEY
echo $AWS_DEFAULT_REGION
```
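The same check can also be run from Python, which is handy inside a notebook. The following is a minimal sketch rather than part of the Fix6 tooling: the helper name `missing_aws_settings` is hypothetical, and it only reports whether the standard AWS environment variables are present, without printing any secret values.

```python
import os

# Standard environment variable names read by the AWS CLI and SDKs.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_settings(env=os.environ, expected_region="us-east-1"):
    """Return a list of problems found; an empty list means everything is set.

    Hypothetical helper for illustration only.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    region = env.get("AWS_DEFAULT_REGION")
    if region and region != expected_region:
        problems.append(
            f"AWS_DEFAULT_REGION is {region!r}, expected {expected_region!r}"
        )
    return problems


# Report problems, or confirm that the environment looks complete.
print(missing_aws_settings() or "AWS environment looks complete")
```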


## Loading the dataset metadata

In [1]:
import sys 
import os

rel_path = "../src"
directory_path = os.path.abspath(os.path.join(os.getcwd(), rel_path))
sys.path.append(directory_path)

from nex_gddp_cmip6 import get_nex_dataset, TIME_OPTIMIZED_SCENARIOS, AVAILABLE_VARIABLES

The "parent" function here is `get_nex_dataset`, which fetches the dataset metadata from S3.
Notice the *metadata* distinction in the previous sentence.
The entire dataset is around 30TB, but we can read the metadata describing what is stored in the dataset and then choose which data we want to use.

In [2]:
ds = get_nex_dataset(AVAILABLE_VARIABLES,TIME_OPTIMIZED_SCENARIOS)

In [3]:
ds

Output (identical for every data variable in `ds`):

| | Array | Chunk |
|---|---|---|
| Bytes | 17.34 TiB | 1.05 GiB |
| Shape | (20, 5, 55152, 600, 1440) | (8, 5, 31411, 15, 15) |
| Dask graph | 23040 chunks in 38 graph layers | |
| Data type | float32 numpy.ndarray | |


Now, there was something actually pretty inefficient that we did in the call above.
If you look at the `time` dimension you'll see that we have data from 1950 to 2100.
That's great, except not all the `scenario`s apply to all dates.

For instance, the `historical` scenario is only valid from 1950 to 2014.
And, the other scenarios are only valid from 2015 to 2100.
What that means is that extra arrays end up getting allocated for each of the scenarios in order to join the datasets together.
This is memory inefficient and also slows things down.
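The size of that overhead can be estimated from the array shapes reported in the dataset summaries in this notebook. The sketch below is plain arithmetic on those shapes, assuming 4 bytes per `float32` value; the helper names are made up for illustration, and the figures are nominal uncompressed sizes per data variable, not what is actually held in memory.

```python
from math import ceil, prod

BYTES_PER_FLOAT32 = 4
TIB = 2**40


def array_tib(shape):
    """Nominal uncompressed size of a float32 array, in TiB."""
    return prod(shape) * BYTES_PER_FLOAT32 / TIB


def n_chunks(shape, chunks):
    """Number of Dask chunks needed to tile `shape` with blocks of size `chunks`."""
    return prod(ceil(n / c) for n, c in zip(shape, chunks))


# Shapes as reported by the dataset summaries in this notebook.
combined = (20, 5, 55152, 600, 1440)    # all scenarios joined on one time axis
historical = (20, 1, 23741, 600, 1440)  # historical scenario only
projection = (20, 4, 31411, 600, 1440)  # projection scenarios only

combined_tib = array_tib(combined)
split_tib = array_tib(historical) + array_tib(projection)

print(f"combined:   {combined_tib:.2f} TiB per variable")
print(f"split:      {split_tib:.2f} TiB per variable")
print(f"difference: {combined_tib - split_tib:.2f} TiB per variable")
print(f"chunks in the combined array: {n_chunks(combined, (8, 5, 31411, 15, 15))}")
```

The difference is the space taken up by scenario and date combinations that hold no real data, which is why loading the two groups separately is cheaper.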

For that reason, you should **always** query `get_nex_dataset()` with either the `historical` or the `projection` scenarios rather than both at once, as in the cell below.

If, for instance, we know that we only need the precipitation and temperature variables from the dataset, then we can choose to load the information for those two variables only.
This has the added benefit of loading the information faster, so whenever we know *a priori* which variables we need, we should load the smallest possible set of variables.

In [4]:
ds_historical = get_nex_dataset(AVAILABLE_VARIABLES,["historical"])
ds_projection = get_nex_dataset(AVAILABLE_VARIABLES,["projection"])

In [5]:
ds_historical

Output (identical for every data variable in `ds_historical`):

| | Array | Chunk |
|---|---|---|
| Bytes | 1.49 TiB | 407.54 MiB |
| Shape | (20, 1, 23741, 600, 1440) | (20, 1, 23741, 15, 15) |
| Dask graph | 3840 chunks in 3 graph layers | |
| Data type | float32 numpy.ndarray | |


In [6]:
ds_projection

Output (identical for every data variable in `ds_projection`):

| | Array | Chunk |
|---|---|---|
| Bytes | 7.90 TiB | 862.73 MiB |
| Shape | (20, 4, 31411, 600, 1440) | (8, 4, 31411, 15, 15) |
| Dask graph | 11520 chunks in 3 graph layers | |
| Data type | float32 numpy.ndarray | |


## Dask arrays and loading data

Up until this point we haven't actually pulled any of the data records from S3, only the metadata so we can see the shape of the data.
That's why, when you look at the `xr.Dataset` outputs above you will see the data is of type `dask.array`.
What this means is that we now have a representation of where the data live on S3 and we can now manipulate the data we would be getting *before* actually loading any data from S3.
This is powerful because you can create a graph of tasks/computations to be run and Dask only pulls down the data it needs, when it needs it.
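To make the lazy behaviour concrete, here is a small sketch on a synthetic Dask-backed dataset. It does not touch S3 or `get_nex_dataset`; the variable name `pr`, the dimension names, and the sizes are all assumptions chosen for illustration, and it needs `dask` and `numpy` installed alongside `xarray`.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for the real dataset: one year of daily values on a
# 20 x 20 grid, chunked along the spatial dimensions only.
lat = np.arange(40.0, 45.0, 0.25)
lon = np.arange(270.0, 275.0, 0.25)
data = da.random.random((365, lat.size, lon.size), chunks=(365, 5, 5))
ds_demo = xr.Dataset(
    {"pr": (("time", "lat", "lon"), data)},
    coords={"time": np.arange(365), "lat": lat, "lon": lon},
)

# Selecting and reducing only extends the task graph; nothing is computed yet.
subset = ds_demo["pr"].sel(lat=slice(42.0, 43.0), lon=slice(271.0, 272.0))
time_mean = subset.mean(dim="time")
is_lazy = isinstance(time_mean.data, da.Array)

# .compute() runs the graph and returns a NumPy-backed result.
result = time_mean.compute()
print(is_lazy, result.shape)
```

The same pattern applies to the real dataset: narrow the selection first, and call `.compute()` (or `.load()`) only on the small piece you actually need.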


**WARNING**: You should **never** do a computation on the whole dataset as it's close to 30TB.
We will be using our laptops for testing and development using examples covering small areas, but any heavy computations will be done on cloud infrastructure.
More on that at a later date.

**Note**:
`Dask` is a parallel computing library for Python.
Its strength is running computations on very large datasets across distributed systems.
To see Dask and xarray in use, watch this [short video from Coiled](https://www.youtube.com/watch?v=blxvfGt9av8).
Dask is an open source project, and Coiled is a managed service created by the Dask maintainers to allow people to more easily run and manage Dask workflows.
We may end up using Coiled in this project, depending on the interests of the group.



### Selecting data

The function provided above just gets you the metadata for the entire dataset.
What we want to do in practice is sub-select a region of that data for analysis, e.g., in the case a user uploads a GeoJSON into a microservice.

As an exercise, create a function or set of functions that subselects `ds_historical` by a GeoJSON, such that the latitude and longitude coordinates in the `xr.Dataset` are bounded by the bounding box of the GeoJSON.
For the sake of simplicity, use [this GeoJSON of the wine areas of Michigan](https://github.com/UCDavisLibrary/ava/blob/master/avas_by_state/MI_avas.geojson).

In addition to needing `xarray`, it would also be useful to have `geopandas`, `shapely`, and `rasterio`.
Since GeoJSON is a vector geospatial format, you will need to "rasterize" it in order to use it to sub-select `ds_historical`.
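As a possible starting point for the bounding-box part of the exercise, the sketch below computes the bounds of a GeoJSON `FeatureCollection` using only the standard library. The helper names and the tiny example polygon are made up for illustration, and geometries of type `GeometryCollection` are not handled. Whether the dataset stores longitudes on a 0 to 360 grid is an assumption worth checking against the coordinates of `ds_historical` before using `wrap_longitude`.

```python
def iter_positions(coords):
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bounds(geojson):
    """Return (min_lon, min_lat, max_lon, max_lat) for a FeatureCollection."""
    lons, lats = [], []
    for feature in geojson["features"]:
        for lon, lat in iter_positions(feature["geometry"]["coordinates"]):
            lons.append(lon)
            lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


def wrap_longitude(lon):
    """Convert a longitude in [-180, 180] to the 0 to 360 convention."""
    return lon % 360


# Made-up rectangle for illustration; not taken from the Michigan AVA file.
example = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[-86.5, 42.0], [-85.5, 42.0], [-85.5, 43.0],
                     [-86.5, 43.0], [-86.5, 42.0]]
                ],
            },
        }
    ],
}

min_lon, min_lat, max_lon, max_lat = geojson_bounds(example)
print(min_lon, min_lat, max_lon, max_lat)
print(wrap_longitude(min_lon), wrap_longitude(max_lon))
```

The resulting bounds could then be passed as slices to `.sel()` on the dataset's latitude and longitude coordinates; the rasterization step is left as part of the exercise.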