# Intake Catalog
Similar to the shopping catalog at your favorite online bookstore, the intake catalog contains information about each dataset (e.g. model, variables, and time range), much as a bookstore catalog lists the title, author, and number of pages of each book. You can access this information before loading the data. Thanks to the catalog, you can find where a dataset is stored just by using some keywords, in the same way that you do not need to hold a book in your hand to know how many pages it has.

In [None]:
import intake
import pandas as pd

## Load the Intake Catalog
We load the catalog descriptor with the intake package. The catalog is updated daily. The catalog descriptor is created by the DKRZ developers who manage the catalog, so you do not need to worry about its details; knowing where it is and how to load it is enough:

In [None]:
# Path to catalog descriptor on the DKRZ server
col_url = "/work/ik1017/Catalogs/mistral-cmip6.json"

# Open the catalog with the intake package and name it "col" as short for "collection"
col = intake.open_esm_datastore(col_url)

Let's see what is inside the intake catalog. The underlying database is provided as a pandas dataframe, which we can access with `col.df`. Calling `col.df.head()` then shows the first rows of the catalog table.

This catalog contains all datasets of the CMIP6 archive at DKRZ. In the next step we narrow the results down by choosing a model and a variable.
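To illustrate what browsing the dataframe looks like without access to the DKRZ file system, here is a minimal sketch on a toy table. The rows are invented for illustration and `toy_df` is a hypothetical stand-in for `col.df`; only the column names follow the CMIP6 vocabulary used in this notebook.

```python
import pandas as pd

# Hypothetical three-row stand-in for col.df (invented rows, CMIP6-style column names)
toy_df = pd.DataFrame(
    {
        "source_id": ["MPI-ESM1-2-LR", "MPI-ESM1-2-LR", "MPI-ESM1-2-HR"],
        "experiment_id": ["historical", "ssp245", "historical"],
        "variable_id": ["tasmax", "tasmax", "tas"],
        "table_id": ["day", "day", "Amon"],
    }
)

# First rows of the table, in the same way col.df.head() is used on the real catalog
first_rows = toy_df.head(2)
print(first_rows)

# The distinct values of a column are the candidates for a later search
models = sorted(toy_df["source_id"].unique())
print(models)
```

On the real catalog the same two calls would be applied to `col.df` instead of `toy_df`.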

## Browse the Intake Catalog
In this example we choose the Max-Planck Earth System Model in Low Resolution Mode ("MPI-ESM1-2-LR") and the maximum near-surface air temperature ("tasmax") as the variable. We also choose an experiment. CMIP6 comprises several kinds of experiments, and each experiment has several simulation members. You can find more information in the [CMIP6 Model and Experiment Documentation](https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#5-model-and-experiment-documentation).

In [None]:
# This is how we tell intake what data we want

query = dict(
    source_id      = "MPI-ESM1-2-LR", # the model
    variable_id    = "tasmax", # maximum near-surface air temperature
    table_id       = "day", # daily data, so "tasmax" is the daily maximum
    experiment_id  = "historical", # the experiment (e.g. "ssp245" would select SSP2-4.5, 2015-2100)
    member_id      = "r10i1p1f1", # "r" realization, "i" initialization, "p" physics, "f" forcing
)

# Intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ
cat = col.search(**query)

# Show query results
cat.df
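The `member_id` used in the query packs four indices into one label. As a minimal sketch, such a label can be split with a regular expression. The helper name `parse_member_id` is hypothetical and not part of intake, and the sketch assumes a plain label of the form `r<N>i<N>p<N>f<N>` without any prefix.

```python
import re

# Pattern for labels such as "r10i1p1f1":
# realization, initialization, physics and forcing indices
MEMBER_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_member_id(member_id):
    """Split a member label into its four integer indices (hypothetical helper)."""
    match = MEMBER_PATTERN.fullmatch(member_id)
    if match is None:
        raise ValueError(f"not a plain r/i/p/f label: {member_id!r}")
    realization, initialization, physics, forcing = (int(g) for g in match.groups())
    return {
        "realization": realization,
        "initialization": initialization,
        "physics": physics,
        "forcing": forcing,
    }


member = parse_member_id("r10i1p1f1")
print(member)
# {'realization': 10, 'initialization': 1, 'physics': 1, 'forcing': 1}
```

Applied to the `member_id` column of `cat.df`, a helper like this could be used to sort or group the results by realization number.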

## Data Analysis
The next step would be to analyze the data. As this notebook focuses only on intake, we recommend reading `use-case_frost_days_intake_xarray_cmip6.ipynb`, `use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb`, `use-case_summer_days_intake_xarray_cmip6.ipynb` or `use-case_tropical_nights_intake_xarray_cmip6.ipynb` for more information on data analysis.

## Save Selection
To use exactly the same data collection at a later point, you can save your collection. For this, you need to specify a location and a file name. The collection will be saved as a human-readable `.csv` file.

In [None]:
# File Location
file_loc = '.'
file_name = 'my_col'

In [None]:
cat.df.to_csv(f"{file_loc}/{file_name}.csv", index=False)

## Open Saved Selection
You can access your saved collection by reading the `.csv` file with `pd.read_csv("<file location>/<file name>.csv")`.

In [None]:
my_col = pd.read_csv(f"{file_loc}/{file_name}.csv")

Below you can view your saved collection.

In [None]:
my_col
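As a quick sanity check of the save-and-reload steps, the following sketch writes a toy table to a temporary directory and reads it back. The rows are invented and `selection` is a hypothetical stand-in for `cat.df`; the temporary directory is used only so the sketch leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-row stand-in for cat.df (invented rows)
selection = pd.DataFrame(
    {
        "source_id": ["MPI-ESM1-2-LR", "MPI-ESM1-2-LR"],
        "variable_id": ["tasmax", "tasmax"],
        "member_id": ["r10i1p1f1", "r10i1p1f1"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build the file path from a location and a file name
    csv_path = Path(tmp_dir) / "my_col.csv"

    # Save without the row index, then read the file back
    selection.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)

# Compare column names and cell values of the original and the reloaded table
same_columns = list(selection.columns) == list(reloaded.columns)
same_rows = selection.values.tolist() == reloaded.values.tolist()
print(same_columns, same_rows)
```

Passing `index=False` when saving matters here: without it, the row index would be written as an extra unnamed column and the reloaded table would no longer match the original.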