# Accessing data using climakitae

To start with, we need to import climakitae and climakitaegui.

In [1]:
import climakitae as ck
import climakitaegui as ckg

Then we want to import the **get_data()** and **get_data_options()** functions. **get_data()** is the main function that will retrieve the data from the S3 bucket. **get_data_options()** is a helper function that will help us to discover what data is in the data catalog.

In [2]:
from climakitae.core.data_interface import get_data
from climakitae.core.data_interface import get_data_options

Running **get_data_options()** with no arguments returns a pandas DataFrame listing everything in the catalog.

In [3]:
get_data_options()

downscaling_method,scenario,timescale,variable,resolution
Statistical,Historical Climate,daily,Maximum relative humidity,3 km
Statistical,Historical Climate,daily,Minimum relative humidity,3 km
Statistical,Historical Climate,daily,Specific humidity at 2m,3 km
Statistical,Historical Climate,daily,Precipitation (total),3 km
Statistical,Historical Climate,daily,Shortwave flux at the surface,3 km
...,...,...,...,...
Dynamical,SSP 5-8.5,hourly,NOAA Heat Index,45 km
Dynamical,SSP 5-8.5,hourly,NOAA Heat Index,9 km
Dynamical,Historical Reconstruction,hourly,NOAA Heat Index,45 km
Dynamical,Historical Reconstruction,hourly,NOAA Heat Index,9 km


This all sits on top of intake, so it should look familiar, but the presentation is cleaner. The more obscure intake names have been replaced with friendlier ones, such as **downscaling_method** for **activity_id**. Statistical means LOCA2, and Dynamical means WRF. Because we are at the top of the catalog, this DataFrame has 1200 rows.
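
Because the result is an ordinary pandas DataFrame, standard pandas tools can summarize it. The sketch below does not query the real catalog: it builds a small stand-in table from a few rows copied from the output above, and it assumes (based on the printed header) that **downscaling_method**, **scenario**, and **timescale** form the index while **variable** and **resolution** are regular columns.

```python
import pandas as pd

# Stand-in for the get_data_options() output: rows copied from the table above.
rows = [
    ("Statistical", "Historical Climate", "daily", "Maximum relative humidity", "3 km"),
    ("Statistical", "Historical Climate", "daily", "Precipitation (total)", "3 km"),
    ("Dynamical", "SSP 5-8.5", "hourly", "NOAA Heat Index", "45 km"),
    ("Dynamical", "SSP 5-8.5", "hourly", "NOAA Heat Index", "9 km"),
    ("Dynamical", "Historical Reconstruction", "hourly", "NOAA Heat Index", "45 km"),
    ("Dynamical", "Historical Reconstruction", "hourly", "NOAA Heat Index", "9 km"),
]
columns = ["downscaling_method", "scenario", "timescale", "variable", "resolution"]
# Assumption: the first three fields are index levels, as the header suggests.
options = pd.DataFrame(rows, columns=columns).set_index(columns[:3])

# Distinct values found in two of the index levels
methods = sorted(options.index.get_level_values("downscaling_method").unique())
scenarios = sorted(options.index.get_level_values("scenario").unique())

# Resolutions offered for one variable
is_heat_index = options["variable"] == "NOAA Heat Index"
heat_index_res = sorted(options.loc[is_heat_index, "resolution"].unique())

print(methods)
print(scenarios)
print(heat_index_res)
```

The same calls should work on the real DataFrame if its index is laid out as assumed here.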

Let’s narrow the search to LOCA2 daily maximum air temperature at 2 m, which is the tasmax variable.

In [4]:
get_data_options(downscaling_method="Statistical", timescale='daily', variable="Maximum air temperature at 2m")

downscaling_method,scenario,timescale,variable,resolution
Statistical,Historical Climate,daily,Maximum air temperature at 2m,3 km
Statistical,SSP 2-4.5,daily,Maximum air temperature at 2m,3 km
Statistical,SSP 3-7.0,daily,Maximum air temperature at 2m,3 km
Statistical,SSP 5-8.5,daily,Maximum air temperature at 2m,3 km


For this query, four scenarios are available (Historical Climate plus three SSPs), and the data are at 3 km resolution.

There is also a helper function for finding spatial subsetting options, **get_subsetting_options()**. We can use it to see that all 58 California counties are available and can be referenced by name.

In [5]:
from climakitae.core.data_interface import get_subsetting_options
get_subsetting_options(area_subset="CA counties")

cached_area,geometry
Alameda County,"POLYGON ((-122.37312 37.88388, -122.37378 37.8..."
Alpine County,"POLYGON ((-120.07333 38.70109, -120.07332 38.7..."
Amador County,"POLYGON ((-121.02771 38.50011, -121.02771 38.5..."
Butte County,"POLYGON ((-122.06943 39.84053, -122.06886 39.8..."
Calaveras County,"POLYGON ((-120.6318 38.34603, -120.6318 38.345..."
Colusa County,"POLYGON ((-121.91512 38.92535, -121.91491 38.9..."
Contra Costa County,"POLYGON ((-121.69732 37.78244, -121.69084 37.7..."
Del Norte County,"POLYGON ((-124.31611 41.72839, -124.3137 41.72..."
El Dorado County,"POLYGON ((-120.18443 39.03101, -120.18838 39.0..."
Fresno County,"POLYGON ((-119.57319 36.48884, -119.57305 36.4..."


These predefined geometries can be used to spatially filter the data, and the software handles the WRF projection for you. Several kinds of geometry are available for filtering, including states, counties, and watersheds.

Now let’s get some data. We will request LOCA2 daily maximum air temperature at 3 km resolution for SSP 3-7.0, for Sacramento County only, and for the time slice 2070 to 2100. We will also ask for the data in degrees Celsius instead of Kelvin.

In [6]:
data = get_data(downscaling_method="Statistical",
                timescale="daily",
                variable="Maximum air temperature at 2m",
                resolution="3 km",
                scenario="SSP 3-7.0",
                cached_area="Sacramento County",
                time_slice=(2070,2100),
                units="degC")
data

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Returned data array is large. Operations could take up to 5x longer than 1GB of data!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!



,Array,Chunk
Bytes,1.56 GiB,4.45 MiB
Shape,"(1, 62, 11322, 23, 26)","(1, 1, 1952, 23, 26)"
Dask graph,434 chunks in 244 graph layers,434 chunks in 244 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


We can see here that we have maximum air temperature at 2 m for one scenario and 62 simulations. The query has returned every simulation available for this data.

Simulations are the combination of model and scenario runs.
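
To see which labels exist along the simulation dimension before selecting one, the coordinate values can be listed and filtered. The sketch below runs on a tiny synthetic xarray DataArray rather than the real data. Only the label LOCA2_MIROC6_r1i1p1f1 comes from this notebook; the other two labels are hypothetical placeholders.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the returned data. Only the MIROC6 label appears in
# this notebook; the MODEL-A and MODEL-B labels are hypothetical.
sims = [
    "LOCA2_MIROC6_r1i1p1f1",
    "LOCA2_MODEL-A_r1i1p1f1",
    "LOCA2_MODEL-B_r2i1p1f1",
]
demo = xr.DataArray(
    np.zeros((len(sims), 4)),
    dims=("simulation", "time"),
    coords={"simulation": sims},
)

# All labels along the simulation dimension
labels = [str(s) for s in demo["simulation"].values]

# Labels that mention a given model name
miroc = [s for s in labels if "MIROC6" in s]

print(labels)
print(miroc)
```

On the real array, the same coordinate lookup on **data** would list all 62 labels.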

Let’s select one particular simulation, MIROC6.

In [7]:
data1 = data.sel(simulation='LOCA2_MIROC6_r1i1p1f1')
data1

,Array,Chunk
Bytes,25.83 MiB,4.45 MiB
Shape,"(1, 11322, 23, 26)","(1, 1952, 23, 26)"
Dask graph,7 chunks in 245 graph layers,7 chunks in 245 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


As an example, let’s calculate annual averages. First we assign the year of each time step to the data as an additional coordinate.

In [8]:
year = data1['time'].dt.year
data1 = data1.assign_coords({'year':year})
data1

,Array,Chunk
Bytes,25.83 MiB,4.45 MiB
Shape,"(1, 11322, 23, 26)","(1, 1952, 23, 26)"
Dask graph,7 chunks in 245 graph layers,7 chunks in 245 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


Then we average over each year and look up the values at a point in Sacramento County.

In [9]:
# Average over time within each year
data2 = data1.groupby('year').mean('time')
# Annual means at the grid cell nearest the chosen point
data2.sel(lat=38.33, lon=-121.23, method='nearest').values

array([[25.846588, 28.078344, 27.824913, 26.972887, 25.480297, 25.632202,
        27.492113, 26.924417, 25.63273 , 25.713873, 27.275734, 26.574087,
        26.59856 , 25.070007, 26.417114, 25.579666, 26.417738, 26.580412,
        26.620224, 28.057072, 27.104786, 27.232965, 26.107195, 27.548908,
        26.638048, 26.901876, 26.098385, 28.09692 , 27.610828, 26.479551,
        26.462399]], dtype=float32)

We can see here the annual mean temperature in degrees Celsius for each of the 31 years from 2070 to 2100, at the grid cell nearest the chosen point in Sacramento County.
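
The assign-then-group pattern used above can be checked on synthetic data where the answer is known in advance. The sketch below is an illustration, not the notebook's data: each daily value is set to its own calendar year, so every annual mean must equal that year. It also shows xarray's "time.year" shorthand, which groups by year without adding a coordinate first.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series whose value is the calendar year of each time step,
# so the expected annual means are simply 2070, 2071 and 2072.
time = pd.date_range("2070-01-01", "2072-12-31", freq="D")
values = time.year.to_numpy().astype(float)
demo = xr.DataArray(values, dims="time", coords={"time": time})

# Pattern used in this notebook: add a 'year' coordinate, then group on it
with_year = demo.assign_coords({"year": demo["time"].dt.year})
annual_a = with_year.groupby("year").mean("time")

# Shorthand: group directly on the year component of the time coordinate
annual_b = demo.groupby("time.year").mean("time")

print(annual_a.values)
print(annual_b.values)
```

Both forms give one value per year. The explicit coordinate is convenient when the year is needed again later.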

We can now use **ck.load()** to load the entire dataset we created into memory so we can visualize it.

In [10]:
data2 = ck.load(data2)

Processing data to read 72.41 KB of data into memory... Complete!
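
The sizes reported in this notebook can be reproduced from the array shapes, which shows how much the annual averaging shrinks the data. This is a back-of-the-envelope check that assumes 4 bytes per float32 value, a standard calendar for 2070 to 2100, and that the "KB" printed by the loader means 1024 bytes.

```python
import calendar

BYTES_PER_FLOAT32 = 4

# Days from 2070 through 2100 inclusive, assuming a standard calendar
n_days = sum(366 if calendar.isleap(y) else 365 for y in range(2070, 2101))
n_years = 2100 - 2070 + 1


def nbytes(shape):
    """Bytes needed for a float32 array of the given shape."""
    total = BYTES_PER_FLOAT32
    for n in shape:
        total *= n
    return total


full = nbytes((1, 62, n_days, 23, 26))   # all 62 simulations, daily
one_sim = nbytes((1, n_days, 23, 26))    # one simulation, daily
annual = nbytes((1, n_years, 23, 26))    # one simulation, annual means

print(n_days)
print(f"{full / 2**30:.2f} GiB")
print(f"{one_sim / 2**20:.2f} MiB")
print(f"{annual / 2**10:.2f} KiB")
```

These figures match the 11322 time steps, 1.56 GiB, 25.83 MiB, and 72.41 KB shown in the outputs above.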


Now that we have the data loaded into memory, we can use the climakitaegui function called **view()**. This creates a simple visualization of the data.

In [11]:
ckg.view(data2)

The map shows Sacramento County. Hovering over the map shows the values of individual cells, and you can scroll through the time series to see how the data change.

We can export the data using climakitae. It can export to netCDF, CSV, and Zarr.

In [12]:
ck.export(data2, 'tasmax_avg_yr.nc', 'netCDF')

Exporting specified data to NetCDF...
Saving file locally as NetCDF4...
Saved! You can find your file in the panel to the left and download to your local machine from there.


Those are the basics of how to download data from the S3 bucket using climakitae.