# Accessing Historic Observation Data Platform data

You can access the finalized, cloud-optimized data with the Python library `intake`, which interfaces with our data catalog. This notebook demonstrates how to query the catalog and download data, and includes some simple plotting code to generate figures of the data.

In [None]:
import intake 

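First, open the catalog using `intake`. The `open_esm_datastore` method is provided by the `intake-esm` plugin, so that package must be installed alongside `intake`.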

In [None]:
cat = intake.open_esm_datastore("https://cadcat.s3.amazonaws.com/histwxstns/era-hdp-collection.json")

Next, view the catalog in table format. You can inspect the first few rows by calling `.head()` on the table.

In [None]:
# Access catalog as dataframe and inspect the first few rows
cat_df = cat.df
cat_df.head()

View all the weather station networks with the following code:

In [None]:
# See all network options 
cat_df["network_id"].unique()

You can also filter the catalog to see all stations within a network:

In [None]:
my_network = "ASOSAWOS"
cat_df[cat_df["network_id"] == my_network]

You can subset the catalog and read the cloud-optimized data into `xarray.Dataset` objects using the method shown below. To change which data are downloaded, modify the entries in the `query` dictionary. Each entry must correspond to a valid option in the catalog.

In [None]:
# Set your query here
query = {
    "network_id": "ASOSAWOS",  # Name of the network
    "station_id": [  # List of stations to get data for
        "ASOSAWOS_A0002694297",
        "ASOSAWOS_A0704900320",
        "ASOSAWOS_72020200118",
    ],
}

# Subset catalog
cat_subset = cat.search(**query)

# View the data you've selected before downloading
cat_subset.df

In [None]:
cat_subset
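
Because every entry in `query` must match an option that exists in the catalog, it can help to check a query before searching. The following is a minimal sketch, not part of `intake`: the helper `find_invalid_query_values`, the `available` dictionary, and the network name `OTHERNET` are hypothetical and used only for illustration.

```python
def find_invalid_query_values(query, available):
    """Return {column: [values not found]} for a catalog query.

    query: dict mapping a column name to a single value or a list of values
    available: dict mapping a column name to an iterable of valid values
    """
    invalid = {}
    for column, wanted in query.items():
        # Accept either a single string or a list of strings
        values = [wanted] if isinstance(wanted, str) else list(wanted)
        valid = set(available.get(column, ()))
        missing = [v for v in values if v not in valid]
        if missing:
            invalid[column] = missing
    return invalid


# Hypothetical catalog options, for illustration only
available = {
    "network_id": ["ASOSAWOS", "OTHERNET"],
    "station_id": ["ASOSAWOS_72020200118", "ASOSAWOS_A0002694297"],
}
example_query = {
    "network_id": "ASOSAWOS",
    "station_id": ["ASOSAWOS_72020200118", "ASOSAWOS_DOES_NOT_EXIST"],
}
print(find_invalid_query_values(example_query, available))
# -> {'station_id': ['ASOSAWOS_DOES_NOT_EXIST']}
```

With the real catalog, one way to build `available` would be `{col: cat_df[col].unique() for col in query}`, assuming every key in `query` is a column of `cat_df`. An empty result means every requested value was found.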

Then, you can download all the files. The data are returned as a dictionary in which each key is a string ID describing a dataset and each value is the corresponding `xarray.Dataset` object.

In [None]:
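# Get dataset dictionary
dsets = cat_subset.to_dataset_dict(
    xarray_open_kwargs={"consolidated": False},  # Passed through to xarray when opening each dataset
    storage_options={"anon": True},  # Anonymous (unauthenticated) access to the bucket
)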

To see all the string IDs for the Datasets in the dictionary, you can print them with the following code: 

In [None]:
list(dsets.keys())

You can access an individual dataset in the dictionary using the following format:
```
dsets[<string ID of data>]
```
The string ID of the data is constructed using both the network ID and the station ID for each individual weather station. 

In [None]:
# Retrieve a single file
ds = dsets["ASOSAWOS.ASOSAWOS_72020200118"]
ds
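
If you need to look up several stations, it can be convenient to build the string IDs programmatically. The sketch below assumes the keys always follow the `<network ID>.<station ID>` pattern seen in the example key above, joined by a single dot; the helper names `dataset_key` and `split_dataset_key` are hypothetical.

```python
def dataset_key(network_id, station_id):
    """Build the dictionary key for one station (assumed '<network>.<station>' format)."""
    return f"{network_id}.{station_id}"


def split_dataset_key(key):
    """Split a key back into (network_id, station_id) at the first dot."""
    network_id, station_id = key.split(".", 1)
    return network_id, station_id


key = dataset_key("ASOSAWOS", "ASOSAWOS_72020200118")
print(key)                     # -> ASOSAWOS.ASOSAWOS_72020200118
print(split_dataset_key(key))  # -> ('ASOSAWOS', 'ASOSAWOS_72020200118')
```

You could then retrieve a dataset with `dsets[dataset_key(network, station)]`. If the keys in your dictionary look different, check `list(dsets.keys())` and adjust the separator accordingly.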

## Make a quick plot of the data 
`xarray` has built-in plotting features that let you quickly generate a time series plot of a single variable. This gives you a sense of the data you read in.

In [None]:
variable_to_plot = "tas"
ds.squeeze()[variable_to_plot].plot(x="time");