Pythonic data access using climakitae 
--------------------------------------
This notebook showcases helper functions from `climakitae` that let you access the AE catalog data **without** using a GUI, while still allowing you to perform spatial subsetting and view the data options in an easy-to-use fashion. These functions can be used just as easily in a Python script. <br>

As a reminder, you can access the data using one of the following methods: 
1) the climakitae Selections GUI ([getting_started.ipynb](getting_started.ipynb))
2) using helper functions in the `climakitae` library (this notebook!) 
3) the Python library `intake` ([intake_direct_data_download.ipynb](intake_direct_data_download.ipynb))
<br>

This notebook showcases option 2.

In [None]:
from climakitae.core.data_interface import (
    get_data_options, 
    get_subsetting_options, 
    get_data
)

## See all the data options in the catalog 
These options will match those in our AE selections GUI. 

In [None]:
get_data_options()

## See the data options for a particular subset of inputs
The `get_data_options` function accepts a number of optional arguments, each corresponding to a column in the table above, that subset the table. Calling it with no arguments, as we did above, returns the entire range of options.<br><br>First, let's print the function documentation to see the inputs and outputs of the function. If an argument (or "parameter", as listed in the documentation) is listed as "optional", you don't have to supply a value for it. For this function, none of the arguments are required, so you can simply call the function. 

In [None]:
print(get_data_options.__doc__)

If you call the function with **no inputs**, it will simply return the entire catalog! But, let's say you want to see all the data options for statistically downscaled data at 3 km resolution. You'll want to provide inputs for the `downscaling_method` and `resolution` arguments. 

In [None]:
get_data_options(
    downscaling_method = "Statistical", 
    resolution = "3 km"
)
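
To make the subsetting behavior concrete, here is a minimal sketch of how keyword arguments can narrow a table of options. It is not climakitae's implementation: `filter_options` and the toy rows in `TOY_OPTIONS` are hypothetical and only mimic the column names used above.

```python
# Toy rows standing in for the catalog table; these are NOT the real catalog contents.
TOY_OPTIONS = [
    {"downscaling_method": "Statistical", "resolution": "3 km",
     "timescale": "daily", "variable": "Precipitation (total)"},
    {"downscaling_method": "Dynamical", "resolution": "3 km",
     "timescale": "hourly", "variable": "Precipitation (total)"},
    {"downscaling_method": "Dynamical", "resolution": "45 km",
     "timescale": "daily", "variable": "Precipitation (total)"},
]


def filter_options(rows, **criteria):
    """Keep rows whose columns equal every supplied criterion.

    A criterion set to None is ignored, so calling the function with
    no arguments returns every row (hypothetical helper).
    """
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in active.items())
    ]


everything = filter_options(TOY_OPTIONS)
subset = filter_options(
    TOY_OPTIONS, downscaling_method="Statistical", resolution="3 km"
)
print(len(everything), len(subset))
```

Each extra argument can only remove rows, which is why supplying more arguments to `get_data_options` gives a shorter table.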

Perhaps you want to see all the data options for daily precipitation. We have several precipitation options in the catalog. You don't need to know the exact names of these variables; simply use "precipitation" as your input for the `variable` argument.<br><br>The function prefers inputs that exactly match an option in the catalog, including capitalization and spelling, and will print a warning if your input is not a direct match ("precipitation" is not an option, but "Precipitation (total)" is). It will then try to guess what you actually meant. 

In [None]:
get_data_options(
    variable = "precipitation", 
    timescale = "daily"
) 
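
The guessing step can be approximated with the standard library. The sketch below is an assumption about how such matching might work, not climakitae's actual logic: `guess_options` is a hypothetical helper, and apart from "Precipitation (total)" the option names are illustrative.

```python
import difflib

# Illustrative option names; only "Precipitation (total)" is taken from the text above.
CATALOG_VARIABLES = [
    "Precipitation (total)",
    "Air Temperature at 2m",
    "Relative humidity",
]


def guess_options(user_input, valid_options):
    """Return an exact match, else substring matches, else close spellings."""
    if user_input in valid_options:
        return [user_input]

    # Case-insensitive substring match, e.g. "precipitation" -> "Precipitation (total)"
    lowered = user_input.lower()
    substring_hits = [opt for opt in valid_options if lowered in opt.lower()]
    if substring_hits:
        return substring_hits

    # Fall back to fuzzy matching to catch misspellings
    lower_to_original = {opt.lower(): opt for opt in valid_options}
    close = difflib.get_close_matches(
        lowered, list(lower_to_original), n=3, cutoff=0.6
    )
    return [lower_to_original[match] for match in close]


print(guess_options("precipitation", CATALOG_VARIABLES))
print(guess_options("relative humidty", CATALOG_VARIABLES))
```

Trying the substring match before the fuzzy match means a short keyword such as "precipitation" can return every option that contains it, while the fuzzy step only runs for probable typos.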

The function can also return a simple pandas DataFrame without the complex MultiIndex. Just set `tidy = False`.

In [None]:
get_data_options(
    variable = "precipitation", 
    timescale = "daily", 
    tidy = False
) 

## See all the geometry options for spatially subsetting the data during retrieval
These options will match those in our AE selections GUI. This will enable you to retrieve a subset for a specific region.

In [None]:
get_subsetting_options()

This shows a lot of options! Say you're only interested in California counties. Simply set the argument `area_subset` to "CA counties" to see all the options for counties. The function documentation shows the other options, which also match the values in the column "area_subset" in the table above. 

In [None]:
print(get_subsetting_options.__doc__)

In [None]:
get_subsetting_options(area_subset = "CA counties")

You can see all the options for subsetting and their corresponding geometries, but you don't actually need to use the geometries yourself if you use climakitae's data retrieval function, `get_data`, explained in the next section. 

## Retrieve the data 
You can easily retrieve the data using the following function. This function requires you to input values for the following arguments: 
- variable (required)
- downscaling_method (required)
- resolution (required)
- timescale (required)


The options for each can be found using the `get_data_options` function. If desired, you can also specify a unit conversion using the argument `units`.<br><br>By default, the function will return the entire spatial domain of the data. If you wish to spatially subset the data, you can supply the following arguments to the function: 
- area_subset (optional) 
- cached_area (required when subsetting) 

You can also opt to perform an area average by setting `area_average = "Yes"`. The default is `"No"`. 

Details for the function are in its docstring, printed below. 

In [None]:
print(get_data.__doc__)

In [None]:
get_data(
    variable = "Precipitation (total)", 
    downscaling_method = "Statistical", 
    resolution = "3 km", 
    timescale = "daily", 
    scenario = "Historical Climate",
)
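
When calling the function from a script, it can be convenient to keep the arguments in a dictionary and check the required ones before the (potentially slow) retrieval. The sketch below assumes only the four required argument names listed above; `build_request` is a hypothetical helper and not part of climakitae. The `get_data` call is left as a comment so the block runs without the library installed.

```python
# The four arguments described as required in the list above
REQUIRED = ("variable", "downscaling_method", "resolution", "timescale")


def build_request(**kwargs):
    """Return a copy of the keyword arguments after checking the required ones."""
    missing = [name for name in REQUIRED if kwargs.get(name) is None]
    if missing:
        raise ValueError(f"Missing required argument(s): {', '.join(missing)}")
    return dict(kwargs)


request = build_request(
    variable="Precipitation (total)",
    downscaling_method="Statistical",
    resolution="3 km",
    timescale="daily",
    scenario="Historical Climate",
)

# In an environment with climakitae installed, the request could be passed on as:
# data = get_data(**request)
print(sorted(request))
```

Keeping the request in one dictionary also makes it easy to log the exact selection alongside any output you save.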

Now say you're only interested in data for San Bernardino County, and you want to compute an area average over the entire county. 

In [None]:
get_data(
    variable = "Precipitation (total)", 
    downscaling_method = "Statistical", 
    resolution = "3 km", 
    timescale = "daily", 
    scenario = "Historical Climate",
    cached_area = "San Bernardino County", 
    area_average = "Yes"
)
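
Conceptually, subsetting to a region and then area averaging reduces each time step's grid to a single value. The following is a simplified, unweighted sketch on made-up numbers; climakitae may weight grid cells or treat cells on the region boundary differently, so read it as an illustration of the idea rather than the library's method. The helper `area_average` and the mask are hypothetical.

```python
from statistics import fmean


def area_average(grid, mask):
    """Average the values of `grid` where `mask` is True (unweighted)."""
    selected = [
        value
        for grid_row, mask_row in zip(grid, mask)
        for value, inside in zip(grid_row, mask_row)
        if inside
    ]
    if not selected:
        raise ValueError("mask selects no grid cells")
    return fmean(selected)


# Two daily time steps on a made-up 2 x 3 grid (illustrative values)
daily_precip = [
    [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]],
    [[2.0, 2.0, 2.0], [6.0, 8.0, 10.0]],
]
# True marks grid cells that fall inside the region of interest
region_mask = [[False, True, True], [False, True, True]]

# One averaged value per time step
series = [area_average(step, region_mask) for step in daily_precip]
print(series)  # [3.0, 5.5]
```

The result keeps the time dimension but drops the spatial ones, which is the shape you should expect back when requesting an area average.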