# Using the Analytics Engine (AE) to access station data
This notebook shows how to access historical meteorological station data at stations of interest throughout the Western Electricity Coordinating Council (WECC) region.

To execute a 'cell' of this notebook, place the cursor in the cell and either click the 'play' icon or press Shift+Enter. Some cells take longer to run; while AE is still working, you will see [$\ast$] to the left of the cell.

**Intended Application**: As a user, I want to **<span style="color:#FF0000">access the historical meteorological station data</span>** that is used to localize the dynamically downscaled WRF data in demand forecast modeling through:
1. Accessing and visualizing historical data at a station of interest
2. Exporting the historical data at a station of interest in a reader-friendly format

**Runtime**: With the default settings, this notebook takes **less than 1 minute** to run from start to finish. Modifying the selections may increase the runtime.

### Step 0: Set-up

In [1]:
import climakitae as ck
import climakitaegui as ckg
import pandas as pd
import xarray as xr
import panel as pn
pn.extension()

from climakitae.util.utils import read_csv_file

### Step 1: Select a station of interest

First, we'll read in a station file of all the HadISD stations currently hooked up to the Analytics Engine. 

In [2]:
# Import stations names and coordinates file
stations = "data/hadisd_stations.csv"
stations_df = read_csv_file(stations)
stations_df

| Unnamed: 0.1 | Unnamed: 0 | state | station | city | ID | LAT_Y | LON_X | station id | elevation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CA | Bakersfield Meadows Field (KBFL) | Bakersfield | KBFL | 35.43424 | -119.05524 | 72384023155 | 149.3 |
| 1 | 1 | CA | Blythe Asos (KBLH) | Blythe | KBLH | 33.61876 | -114.71451 | 74718823158 | 120.4 |
| 2 | 2 | CA | Burbank-Glendale-Pasadena Airport (KBUR) | Burbank | KBUR | 34.19966 | -118.36543 | 72288023152 | 222.7 |
| 3 | 3 | CA | Needles Airport (KEED) | Needles | KEED | 34.76783 | -114.61842 | 72380523179 | 270.6 |
| 4 | 4 | CA | Fresno Yosemite International Airport (KFAT) | Fresno | KFAT | 36.77999 | -119.72016 | 72389093193 | 101.9 |
| 5 | 5 | CA | Imperial County Airport (KIPL) | Imperial | KIPL | 32.83464 | -115.57656 | 74718599999 | -16.0 |
| 6 | 6 | CA | Los Angeles International Airport (KLAX) | Lax | KLAX | 33.93816 | -118.3866 | 72295023174 | 29.7 |
| 7 | 7 | CA | Long Beach Daugherty Field (KLGB) | Long Beach | KLGB | 33.81177 | -118.14718 | 72297023129 | 10.2 |
| 8 | 8 | CA | Modesto City-County Airport (KMOD) | Modesto | KMOD | 37.62544 | -120.95492 | 72492623258 | 26.6 |
| 9 | 9 | CA | San Diego Miramar Wscmo (KNKX) | Miramar | KNKX | 32.866667 | -117.133333 | 99999993107 | 145.7 |
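If you don't know a station's `ID`, you can narrow down `stations_df` by its `state` or `city` column first. The sketch below rebuilds a few rows of the table above by hand so it runs on its own; in the notebook you would apply the same filters directly to `stations_df`.

```python
import pandas as pd

# A few rows copied from the station table above, so this sketch runs standalone.
stations_df = pd.DataFrame(
    {
        "state": ["CA", "CA", "CA"],
        "city": ["Bakersfield", "Burbank", "Long Beach"],
        "ID": ["KBFL", "KBUR", "KLGB"],
        "station id": [72384023155, 72288023152, 72297023129],
    }
)

# Case-insensitive substring match on the city name.
matches = stations_df[stations_df["city"].str.contains("beach", case=False)]
print(matches[["city", "ID", "station id"]])

# Collect every station ID in a given state.
ca_ids = stations_df.loc[stations_df["state"] == "CA", "ID"].tolist()
print(ca_ids)
```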


The easiest way to select a station is by its `ID`, the code shown in parentheses in the `station` column. We'll demonstrate with "KSAC" for Sacramento Executive Airport. The following cell is the only one you need to modify in this notebook to export a different station; to export several stations, change it and re-run the notebook once per station.

In [3]:
my_station = 'KSAC' # set to any station of interest

In [4]:
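# Look up the numeric HadISD station id for the chosen station ID
station_id = str(stations_df[stations_df['ID'] == my_station]['station id'].values[0])
# Build the path to that station's Zarr store in the s3 bucket
filepaths = ["s3://cadcat/hadisd/HadISD_{}.zarr".format(s_id) for s_id in [station_id]]
filepaths # check the filepath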

['s3://cadcat/hadisd/HadISD_72483023232.zarr']
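The lookup above raises a bare `IndexError` if `my_station` is misspelled or missing from the station file. A small wrapper can give a clearer message while building the same `s3://cadcat/hadisd/HadISD_<station id>.zarr` path. `hadisd_filepath` is a hypothetical helper, not part of climakitae, and it assumes the `ID` and `station id` columns shown above.

```python
import pandas as pd


def hadisd_filepath(stations_df: pd.DataFrame, station_code: str) -> str:
    """Return the HadISD Zarr path for a station ID such as 'KSAC'.

    Hypothetical helper: assumes stations_df has 'ID' and 'station id'
    columns, as in data/hadisd_stations.csv.
    """
    rows = stations_df[stations_df["ID"] == station_code]
    if rows.empty:
        raise ValueError(f"Station {station_code!r} not found in the station list")
    station_id = str(rows["station id"].values[0])
    return f"s3://cadcat/hadisd/HadISD_{station_id}.zarr"


# Standalone example using the KSAC station id from the output above.
demo_df = pd.DataFrame({"ID": ["KSAC"], "station id": [72483023232]})
print(hadisd_filepath(demo_df, "KSAC"))
```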

### Step 2: Retrieve the station data from the s3 bucket
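Here we use `xarray.open_mfdataset` to open the station's Zarr store with anonymous access. The data are opened lazily as Dask arrays, so nothing is read into memory yet. You don't have to change anything; this step uses the filepath identified above.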

In [5]:
obs_ds = xr.open_mfdataset(
    filepaths,
    engine="zarr",
    consolidated=False,
    parallel=True,
    backend_kwargs=dict(storage_options={"anon": True}),
)

obs_ds

| | Array | Chunk |
|---|---|---|
| Bytes | 2.93 MiB | 2.93 MiB |
| Shape | (384240,) | (384240,) |
| Dask graph | 1 chunks in 2 graph layers | 1 chunks in 2 graph layers |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


Next, we load the data into memory, which speeds up both visualizing the data below and exporting it in Step 3.

In [6]:
obs_ds = ck.load(obs_ds)

Processing data to read 5.86 MB of data into memory... Complete!
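The sizes reported above follow from the array shape: each array holds 384,240 `float64` values at 8 bytes each. The arithmetic below reproduces the 2.93 MiB per array shown in the dataset summary; the 5.86 figure in the load message matches two such arrays, assuming the message counts binary megabytes (2**20 bytes).

```python
import numpy as np

n_values = 384_240                          # shape (384240,) from the dataset summary
itemsize = np.dtype("float64").itemsize     # 8 bytes per float64 value
bytes_per_array = n_values * itemsize       # 3,073,920 bytes

mib_per_array = bytes_per_array / 2**20
print(f"{mib_per_array:.2f} MiB per array")
print(f"{2 * mib_per_array:.2f} MiB for two arrays")
```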


In [7]:
ckg.view(obs_ds)

### Step 3: Export the station data to CSV format for ease of use
To export the data in a reader-friendly format, run the next cell. You don't need to modify anything here; it is set up to use the station selected above (KSAC by default).

In [8]:
filename = 'HadISD_{}'.format(station_id)
ck.export(obs_ds, filename, "csv")



Exporting specified data to CSV...
NOTE: File metadata will be written in /home/jovyan/cae-notebooks/collaborative/DFU/HadISD_72483023232_metadata.txt. We recommend you download this along with the CSV for your records.
Saved! You can find your file(s) in the panel to the left and download to your local machine from there.
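`ck.export` handles the conversion for you. To see roughly what a CSV export with a separate metadata file involves, the sketch below does something similar with plain xarray and pandas on a tiny synthetic dataset. It is an illustration only: the variable name `tas`, its values, and the attributes are made up, and this is not how `ck.export` is implemented.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for obs_ds: three daily values of a made-up variable.
times = pd.date_range("2000-01-01", periods=3, freq="D")
ds = xr.Dataset(
    {"tas": ("time", np.array([280.1, 280.4, 281.0]))},
    coords={"time": times},
    attrs={"station": "KSAC", "source": "synthetic example"},
)

outdir = tempfile.mkdtemp()
filename = "HadISD_72483023232"

# Flatten to a table indexed by time and write the CSV.
csv_path = os.path.join(outdir, filename + ".csv")
ds.to_dataframe().to_csv(csv_path)

# Write the dataset attributes to a separate metadata text file.
meta_path = os.path.join(outdir, filename + "_metadata.txt")
with open(meta_path, "w") as f:
    for key, value in ds.attrs.items():
        f.write(f"{key}: {value}\n")

print(csv_path)
print(meta_path)
```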


### Step 4: Optional "fast access" mode
This step condenses all of the cells above into a single cell for faster access. To demonstrate, `my_station` is set to a different station this time.

In [None]:
# read in station data
stations = "data/hadisd_stations.csv"
stations_df = read_csv_file(stations)

# identify station to export
my_station = 'KACV' # set to any station of interest

# retrieves data
station_id = str(stations_df[stations_df['ID'] == my_station]['station id'].values[0])
filepaths = ["s3://cadcat/hadisd/HadISD_{}.zarr".format(s_id) for s_id in [station_id]]
obs_ds = xr.open_mfdataset(
    filepaths,
    engine="zarr",
    consolidated=False,
    parallel=True,
    backend_kwargs=dict(storage_options={"anon": True}),
)

# load data in for export
obs_ds = ck.load(obs_ds)

# export
filename = 'HadISD_{}'.format(station_id)
ck.export(obs_ds, filename, "csv")
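The fast-access cell handles one station per run. To export several stations, one option is to build the list of paths and output names first and then loop over it. `plan_exports` is a hypothetical helper; the commented loop body reuses the same calls as the cell above, and the station ids in the example come from the station table in Step 1.

```python
import pandas as pd


def plan_exports(stations_df: pd.DataFrame, codes: list) -> list:
    """Return (ID, zarr path, output filename) tuples for each requested station ID."""
    plan = []
    for code in codes:
        rows = stations_df[stations_df["ID"] == code]
        if rows.empty:
            raise ValueError(f"Station {code!r} not found")
        station_id = str(rows["station id"].values[0])
        plan.append(
            (code, f"s3://cadcat/hadisd/HadISD_{station_id}.zarr", f"HadISD_{station_id}")
        )
    return plan


demo_df = pd.DataFrame({"ID": ["KBFL", "KBLH"], "station id": [72384023155, 74718823158]})
plan = plan_exports(demo_df, ["KBFL", "KBLH"])
for code, path, filename in plan:
    print(code, path, filename)
    # In the notebook, each iteration would then run:
    #   obs_ds = xr.open_mfdataset([path], engine="zarr", consolidated=False, parallel=True,
    #                              backend_kwargs=dict(storage_options={"anon": True}))
    #   obs_ds = ck.load(obs_ds)
    #   ck.export(obs_ds, filename, "csv")
```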