# Searching the Catalog

This notebook walks through a quick way to get a list of where each file you're interested in is stored on the Globus endpoint.

The package [intake-esm](https://github.com/intake/intake-esm) is required to run this notebook.

In [4]:
import intake
import intake_esm
cat_url = "https://raw.githubusercontent.com/NOAA-GFDL/spear-flp/refs/heads/main/catalog_blue.json"

The first step is to load the catalog:

In [5]:
cat = intake.open_esm_datastore(cat_url)

Then we can run a search. Each argument has the form `<column>="<value>"` or `<column>=["<value1>", "<value2>"]`.

In [13]:
subcat = cat.search(
    variable_id="snow",
    experiment_id="SPEAR_c192_o1_Hist_AllForc_IC1921_K50",
    time_range=["19510101-19601231", "19410101-19501231"],
)
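To make the matching rule concrete, here is a minimal plain-Python sketch of how such a query selects rows: a row is kept only if every queried column matches, and a list of values matches if the row holds any one of them. This is an illustration only, not intake-esm's implementation; the `simple_search` helper and the sample rows are hypothetical.

```python
# Hypothetical catalog rows; the real catalog has many more columns and entries.
rows = [
    {"variable_id": "snow", "time_range": "19410101-19501231"},
    {"variable_id": "snow", "time_range": "19510101-19601231"},
    {"variable_id": "snow", "time_range": "19610101-19701231"},
    {"variable_id": "precip", "time_range": "19510101-19601231"},
]


def simple_search(rows, **query):
    """Keep rows where every queried column holds one of the allowed values."""
    matches = []
    for row in rows:
        keep = True
        for column, wanted in query.items():
            # A single string is treated as a one-element list of allowed values.
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            if row[column] not in allowed:
                keep = False
                break
        if keep:
            matches.append(row)
    return matches


hits = simple_search(
    rows,
    variable_id="snow",
    time_range=["19510101-19601231", "19410101-19501231"],
)
for row in hits:
    print(row)
```

Only the two `snow` rows whose `time_range` appears in the list are kept; the third `snow` row and the `precip` row are filtered out.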

A full list of column names can be obtained from the catalog's dataframe:

In [19]:
cat.df.columns

Index(['activity_id', 'institution_id', 'source_id', 'experiment_id',
       'frequency', 'realm', 'table_id', 'member_id', 'grid_label',
       'variable_id', 'time_range', 'chunk_freq', 'platform', 'dimensions',
       'cell_methods', 'standard_name', 'pass_qc', 'who_qc', 'path'],
      dtype='object')

The unique values in a given column can be listed as well:

In [22]:
set(cat.df['realm'])

{'atmos',
 'atmos_4xdaily',
 'atmos_4xdaily_avg',
 'atmos_daily',
 'land_daily',
 'ocean'}

To get the paths, we can access the `path` column of the search result's dataframe directly:

In [14]:
subcat.df.path

0     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
1     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
2     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
3     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
4     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
5     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
6     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
7     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
8     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
9     /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
10    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
11    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
12    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
13    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
14    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
15    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
16    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_c192...
17    /data/2/GFDL-LARGE-ENSEMBLES/TFTEST/SPEAR_

We can save this list of paths to a file with:

In [18]:
subcat.df.path.to_csv("my_search.txt", index=False, header=False)
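If the saved list is needed again later, it can be read back without pandas. The sketch below assumes the file holds one path per line, as written by the call above with `index=False, header=False`, and that no path contains characters that would make `to_csv` quote it. The `read_paths` helper is a hypothetical name introduced here for illustration.

```python
from pathlib import Path


def read_paths(filename):
    """Return the non-empty lines of a text file as a list of path strings."""
    text = Path(filename).read_text()
    # Strip surrounding whitespace and skip blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `paths = read_paths("my_search.txt")` would give a plain Python list with one entry per file found by the search.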