# Interactive data access and visualization
This notebook lets you explore the Analytics Engine (AE) data catalog through a graphical user interface (GUI). Using a simple, interactive panel, you can toggle between the available data options, including spatial and temporal subsetting and model types. After retrieving data, the notebook shows you how to generate an interactive map of the data. Finally, you will learn how to export the data in various file formats.

This notebook uses the Python library [climakitae](https://github.com/cal-adapt/climakitae), our AE toolkit for climate data analysis, and the Python library [climakitaegui](https://github.com/cal-adapt/climakitaegui), which facilitates the generation of interactive panels and plotting.

In [1]:
import climakitae as ck
import climakitaegui as ckg

## Step 1: Select data
We can call `Select` to display an interface from which to select the data to examine. Execute the cell, and read on for more explanation. To learn more about the data available on the Analytics Engine, [see our data catalog](https://analytics.cal-adapt.org/data/). 

In [2]:
selections = ckg.Select()
selections.show()



In [3]:
selections



You do not need to do anything to confirm these selections; simply move on to Step 2. Your selections are stored behind the scenes as you make them.

To preview what has been selected, type `selections` alone in a new cell, as in the cell above.

(Clicking $+$ creates a new cell below the currently selected one.)

## Step 2: Retrieve data
Call `selections.retrieve()` to assign the subset or combination of data you specified to a variable name of your choosing. The data is returned in [xarray DataArray or Dataset](https://docs.xarray.dev/en/stable/user-guide/data-structures.html) format.

In [4]:
data_to_use = selections.retrieve()
data_to_use

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Returned data array is large. Operations could take up to 5x longer than 1GB of data!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!



| | Array | Chunk |
|---|---|---|
| Bytes | 0.97 GiB | 117.80 MiB |
| Shape | (1, 8, 408, 320, 250) | (1, 1, 386, 320, 250) |
| Dask graph | 16 chunks in 34 graph layers | 16 chunks in 34 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |
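
The sizes in this summary follow from the array's shape and data type. As a rough check, the sketch below recomputes them using only the Python standard library. The shapes are copied from the output above, and the chunk count assumes each dimension is split into equal-sized chunks with one smaller chunk at the end where needed.

```python
import math

# Values copied from the array summary above.
shape = (1, 8, 408, 320, 250)
chunk_shape = (1, 1, 386, 320, 250)
bytes_per_value = 4  # float32 uses 4 bytes per value


def size_in_bytes(dims, itemsize):
    """Number of elements times the size of one element."""
    return math.prod(dims) * itemsize


array_bytes = size_in_bytes(shape, bytes_per_value)
chunk_bytes = size_in_bytes(chunk_shape, bytes_per_value)

# Chunks per dimension, rounding up so a partial chunk at the end is counted.
n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, chunk_shape))

print(f"Array: {array_bytes / 2**30:.2f} GiB ({array_bytes / 2**20:.2f} MiB)")
print(f"Chunk: {chunk_bytes / 2**20:.2f} MiB")
print(f"Chunks: {n_chunks}")
```

The results agree with the summary: about 0.97 GiB for the full array, 117.80 MiB for the largest chunk, and 16 chunks in total. The same total expressed in MiB is 996.09, which matches the figure reported when the data is loaded into memory.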


You can preview the data in the retrieved, aggregated dataset when this is complete.

Next, load the data into memory. Up to this point the data has only been loaded "lazily": nothing is actually read until the values are needed, for example when you visualize or export. This lets the previous steps run faster, but it means this step may take a few minutes to compute.

In [5]:
data_to_use = ck.load(data_to_use)

Processing data to read 996.09 MB of data into memory... Complete!
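
To illustrate what "lazy" loading means, here is a conceptual sketch in plain Python. It is not how climakitae, xarray, or Dask are implemented: `LazyValue`, `read_from_storage`, and the temperature values are all hypothetical names and data made up for this example. The point is only that defining a step does no work, and the read happens once, when the result is requested.

```python
read_log = []  # one entry is appended each time the slow read actually runs


class LazyValue:
    """Hypothetical stand-in for lazily loaded data (not a climakitae or xarray class)."""

    def __init__(self, compute):
        self._compute = compute  # function that produces the data when called
        self._cache = None
        self.loaded = False

    def map(self, func):
        # Record the operation without running it; nothing is read yet.
        return LazyValue(lambda: func(self.load()))

    def load(self):
        # Run the deferred work once and keep the result in memory.
        if not self.loaded:
            self._cache = self._compute()
            self.loaded = True
        return self._cache


def read_from_storage():
    """Stands in for a slow read of made-up temperatures in kelvin."""
    read_log.append("read")
    return [280.0, 285.5, 290.25]


kelvin = LazyValue(read_from_storage)
celsius = kelvin.map(lambda values: [v - 273.15 for v in values])
reads_before_load = len(read_log)  # still 0: defining the steps read nothing

result = celsius.load()  # the read and the conversion both happen here
celsius.load()           # a second call reuses the cached result
print(reads_before_load, len(read_log), len(result))
```

In the notebook, `ck.load` plays the role of the `load` call here: it is the point at which the deferred reads are actually carried out.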


## Step 3: Visualize data
Preview the data before doing further calculations. 

In [6]:
ckg.view(data_to_use)

The data previewer is also customizable. The example below modifies the display colors and coordinates for gridded data. If you selected station data above, uncomment the second line in the cell below and comment out the first using the `#` character.

In [7]:
ckg.view(data_to_use, lat_lon=False, cmap='viridis') # gridded data (with x-y coordinates)
# ckg.view(data_to_use, lat_lon=False, cmap='green') # station, or area-averaged data selection

More plotting helper-functions will be forthcoming.

See other notebooks for example analyses, or add your own.

In [None]:
# [insert your own code here]

You can load another variable or resolution by modifying your selections and calling `next_data = selections.retrieve()`.

If you do this often and things start to get slow, you might want to try `data_to_use.close()`.

## Step 4: Export data

To save data as a file, call `export` and provide:
1) data to export – an [xarray DataArray or Dataset](https://docs.xarray.dev/en/stable/user-guide/data-structures.html), as output by e.g. `selections.retrieve()`
2) output file name (without file extension)
3) file format ("NetCDF", "Zarr", or "CSV")

We recommend NetCDF or Zarr, which suit data and outputs from the Analytics Engine well – they efficiently store large data containing multiple variables and dimensions. Metadata will be retained in these files.

NetCDF or Zarr can be exported locally (such as onto the JupyterHub user partition). Optionally, Zarr can be exported to an AWS S3 scratch bucket for storing very large exports.

CSV can also store Analytics Engine data with any number of variables and dimensions. It works best for smaller data with fewer dimensions. The output file will be compressed to ensure efficient storage. Metadata will be preserved in a separate file.

CSV stores data in tabular format. Rows will be indexed by the index coordinate(s) of the DataArray or Dataset (e.g. scenario, simulation, time). Columns will be formed by the data variable(s) and non-index coordinate(s).
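
To make the tabular layout concrete, the sketch below flattens a tiny multi-dimensional selection into CSV rows using only the Python standard library. It illustrates the row and column structure described above and is not the code `export` runs; the coordinate values, the `temperature` variable name, and the numbers are all made up for this example.

```python
import csv
import io
import itertools

# Hypothetical index coordinates and values, for illustration only.
coords = {
    "scenario": ["Historical Climate"],
    "simulation": ["sim_a", "sim_b"],
    "time": ["2000-01", "2000-02"],
}
variable_name = "temperature"  # made-up data variable name
values = [281.2, 282.9, 280.7, 283.4]  # one value per coordinate combination

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([*coords, variable_name])  # header: index coordinates, then the variable

# Each combination of index coordinates becomes one row.
for index, value in zip(itertools.product(*coords.values()), values):
    writer.writerow([*index, value])

print(buffer.getvalue())
```

Every added dimension multiplies the number of rows, which is one reason CSV is better suited to smaller selections with fewer dimensions.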

In [8]:
ck.export(data_to_use, filename="my_filename1", format="NetCDF") # NetCDF4 export locally

Exporting specified data to NetCDF...
Saving file locally as NetCDF4...
Saved! You can find your file in the panel to the left and download to your local machine from there.


In [None]:
ck.export(data_to_use, filename="my_filename2", format="Zarr") # Zarr export locally
#ck.remove_zarr("my_filename2") # helper function to delete Zarr directory tree

In [None]:
ck.export(data_to_use, filename="my_filename3", format="Zarr", mode="s3") # Zarr export to S3

In [None]:
ck.export(data_to_use, filename="my_filename4", format="CSV") # CSV export locally