# Getting started with the Analytics Engine (AE)
This notebook covers:
1. selecting data to work with
2. retrieving a dataset from the catalog
3. a simple plot to preview the data
4. how to export that data

To execute a given 'cell' of this notebook, place the cursor in the cell and click the 'play' icon, or press Shift+Enter. Some cells take longer to run, and you will see a [$\ast$] to the left of the cell while AE is still working.

## Step 0: Setup
This cell imports the Python library [climakitae](https://github.com/cal-adapt/climakitae), our AE toolkit for climate data analysis, and any other specialized libraries needed for a given notebook.

In [None]:
import panel as pn
pn.extension()

import climakitae as ck

To use climakitae, load a new application:

In [None]:
app = ck.Application()

## Step 1: Select data
Now we can call 'select' to display an interface from which to select the data to examine. Execute the cell, and read on for more explanation.

Currently, you can select from [dynamically-downscaled](https://dept.atmos.ucla.edu/alexhall/downscaling-cmip6) data produced at hourly intervals. If you select 'daily' or 'monthly' for 'Timescale', you will receive an average of the hourly data. The spatial resolution options, on the other hand, are each the output of a different simulation, nesting to higher resolution over smaller areas.
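As a rough illustration of what a 'daily' timescale means, the sketch below averages a synthetic hourly series into daily values with xarray. The array, its values, and the variable names are made up for illustration, and this is not necessarily how AE computes its averages internally.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two days of synthetic hourly values (hypothetical data, not from the AE catalog)
times = pd.date_range("2000-01-01", periods=48, freq="h")
hourly = xr.DataArray(
    np.arange(48, dtype=float),
    coords={"time": times},
    dims="time",
    name="hourly_values",
)

# Average the hourly values within each calendar day
daily = hourly.resample(time="1D").mean()
print(daily.values)
```

The result has one value per day, each the mean of that day's 24 hourly values.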

Future projections through 2100 are available for one [greenhouse gas emission scenario (Shared Socioeconomic Pathway, or SSP)](https://climatescenarios.org/primer/socioeconomic-development), SSP 3-7.0, from 4 General Circulation Models (GCMs).

At 45 km and 9 km, more GCMs are to come, and one GCM was also downscaled for a higher and a lower SSP. (Later, statistical downscaling will also be available at 3 km for more GCMs.)

“Historical Climate” includes data from 1980-2014 simulated from the same GCMs used to produce the SSPs. These data can be appended to an SSP time series using the option “Append historical.” Because this historical data is obtained through simulations, it represents average weather during the historical period and is not meant to capture the historical time series as it actually occurred.
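Conceptually, appending historical data to an SSP series amounts to joining two time series end to end. The sketch below shows that idea with xarray on synthetic annual data. It is not the climakitae implementation; the arrays are placeholders, and the 2015 start year for the projection is an assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical annual time axes: historical simulation, then SSP projection
hist_years = pd.date_range("1980-01-01", "2014-01-01", freq="YS")
ssp_years = pd.date_range("2015-01-01", "2100-01-01", freq="YS")  # assumed start year

# Placeholder values (zeros and ones) stand in for real model output
historical = xr.DataArray(
    np.zeros(hist_years.size), coords={"time": hist_years}, dims="time"
)
projection = xr.DataArray(
    np.ones(ssp_years.size), coords={"time": ssp_years}, dims="time"
)

# Join the two series along the time dimension
combined = xr.concat([historical, projection], dim="time")
print(combined.sizes["time"])
```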

“Historical Reconstruction” provides a reference downscaled [reanalysis](https://www.ecmwf.int/en/about/media-centre/focus/2020/fact-sheet-reanalysis) dataset based on atmospheric models fit to satellite and station observations, and as a result will reflect observed historical time-evolution of the weather.

To learn more about the data available on the Analytics Engine, [see our data catalog](https://analytics.cal-adapt.org/data/). 

In [None]:
app.select()

You do not need to do anything to confirm these selections; simply move on to Step 2.

However, if you want to preview what has been selected, you can type "app.selections" or "app.location" alone in a new cell. These store your selections behind the scenes.

(The $+$ button creates a new cell below the currently selected one.)

## Step 2: Retrieve data
Call app.retrieve() to assign the subset or combination of data you specified to a variable name of your choosing. The data is returned as an xarray [DataArray](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html).

In [None]:
data_to_use = app.retrieve()

Once retrieval is complete, you can preview the retrieved, aggregated dataset.

In [None]:
data_to_use
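Beyond the default preview, a DataArray can be inspected through a few standard xarray attributes. The sketch below uses a small synthetic array; the dimension names, variable name, and units are hypothetical and may differ from what app.retrieve() returns.

```python
import numpy as np
import xarray as xr

# Small synthetic stand-in for a retrieved DataArray (names are hypothetical)
example = xr.DataArray(
    np.zeros((3, 2, 4)),
    dims=("time", "y", "x"),
    coords={"time": [0, 1, 2]},
    name="example_variable",
    attrs={"units": "K"},
)

print(example.dims)         # names of the dimensions
print(dict(example.sizes))  # length of each dimension
print(example.attrs)        # metadata attached to the array

# Select the first time step by integer position
first_step = example.isel(time=0)
print(first_step.shape)
```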

Next, load the data into memory. This step may take a few minutes, because until now the data has only been loaded "lazily": nothing is actually read until the values are needed (for example, to visualize or export them). This allows the previous steps to run faster.

In [None]:
data_to_use = data_to_use.compute()
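To judge how much memory a load like this needs, the `nbytes` attribute of a DataArray reports the size of the array without computing its values. A minimal sketch on a synthetic array (the shape and dimension names are made up for illustration):

```python
import numpy as np
import xarray as xr

# Synthetic array: 24 time steps on a 10 x 10 grid of 8-byte floats
example = xr.DataArray(
    np.zeros((24, 10, 10), dtype="float64"), dims=("time", "y", "x")
)

# nbytes is the number of elements times the bytes per element
size_mb = example.nbytes / 1e6
print(f"{size_mb:.4f} MB")
```

If the reported size approaches the memory available in your session, selecting a smaller area or a coarser timescale before retrieving may help.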

## Step 3: Visualize data
Preview the data before doing further calculations. 

In [None]:
app.view(data_to_use)

The data previewer is also customizable: Check out an example where the display colors and coordinates are modified.

In [None]:
app.view(data_to_use, lat_lon=False, cmap="viridis")

More plotting helper functions are forthcoming.

See other notebooks for example analyses, or add your own.

In [None]:
# [insert your own code here]

You can load up another variable or resolution by modifying your selections and calling: next_data = app.retrieve()

If you do this a lot, and things are starting to get slow, you might want to try: data_to_use.close()

## Step 4: Export data

To export, first pick a format from the dropdown menu.
- We recommend NetCDF, which will work with any number of variables and dimensions in your dataset
- CSV and GeoTIFF can only be used for data arrays with one variable
- CSV works best for up to 2-dimensional data (e.g., lon x lat), and will be compressed and exported with a separate metadata file
- GeoTIFF can accept 3 dimensions total: 
    - X and Y dimensions are required
    - The third dimension is flexible and will be a "band" in the file: time, simulation, or scenario could go here
    - Metadata will be accessible as "tags" in the .tif
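The guidance above can be summarized as a small decision rule. The helper below is hypothetical (it is not part of climakitae) and assumes the spatial dimensions are named "x" and "y"; it treats the "up to 2 dimensions" advice for CSV as a rule of thumb rather than a hard limit.

```python
def suggest_formats(dims, n_variables=1):
    """Return export formats that fit the guidance above (hypothetical helper)."""
    formats = ["NetCDF"]  # works with any number of variables and dimensions
    if n_variables == 1:  # CSV and GeoTIFF need a single variable
        if len(dims) <= 2:
            formats.append("CSV")  # works best for up to 2-dimensional data
        if len(dims) <= 3 and "x" in dims and "y" in dims:
            formats.append("GeoTIFF")  # X and Y required, third dimension becomes bands
    return formats


print(suggest_formats(("time", "y", "x")))
print(suggest_formats(("y", "x")))
print(suggest_formats(("time", "simulation", "y", "x")))
```

With a real DataArray, the dimension names could be passed in as `suggest_formats(data_to_use.dims)`, provided they match the names assumed here.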

In [None]:
app.export_as()

Next, pass the object you wish to export and your desired filename (in single or double quotation marks).

In [None]:
app.export_dataset(data_to_use, 'my_filename')