# Hands On: Find Data with Intake
In this notebook you will:
- Search for an appropriate list of data files (the datasets should contain the variable `tasmin` at daily frequency)
- Save your selection as a `.csv` file so it can be used by another notebook

In [None]:
import intake
import xarray as xr
import pandas as pd

## <font color='green'>Exercise: Load the Intake Catalog</font> 
- The catalog descriptor is located in `/work/ik1017/Catalogs/mistral-cmip6.json`
- The catalog will be loaded with `intake.open_esm_datastore(<path to catalog descriptor>)`

In [None]:
url = "/work/ik1017/Catalogs/mistral-cmip6.json"
col = intake.open_esm_datastore(url)

## <font color='green'>Exercise: Get to Know the Intake Catalog</font>
- How many activities does the catalog contain?
- How many unique variables does the CMIP activity alone contain?

In [None]:
# Distinct activity IDs in the catalog; col.df["activity_id"].nunique() gives just the count
col.df["activity_id"].unique()

In [None]:
# Keep only the CMIP rows, then count their distinct variable IDs
col.df[col.df["activity_id"] == "CMIP"]["variable_id"].unique().size
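The two cells above answer the questions one activity at a time. A `groupby` on the catalog's DataFrame can give the number of distinct variables for every activity in one step. The sketch below uses a few hypothetical toy rows in place of `col.df`, and the helper name `variables_per_activity` is illustrative only; in the notebook you would call `variables_per_activity(col.df)`.

```python
import pandas as pd


def variables_per_activity(df: pd.DataFrame) -> pd.Series:
    """Number of distinct variable_id values for each activity_id."""
    return df.groupby("activity_id")["variable_id"].nunique()


# Hypothetical toy rows standing in for col.df (not real catalog entries).
toy = pd.DataFrame({
    "activity_id": ["CMIP", "CMIP", "CMIP", "ScenarioMIP", "ScenarioMIP"],
    "variable_id": ["tas", "tasmin", "tas", "tasmin", "pr"],
})

counts = variables_per_activity(toy)
print(counts)                          # one row per activity
print(counts["CMIP"])                  # 2 distinct variables: tas, tasmin
print(toy["activity_id"].nunique())    # number of activities: 2
```

With the real catalog, `counts["CMIP"]` should agree with the CMIP-only cell above.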

## <font color='green'>Exercise: Browse the Catalog</font>
- Modify the search dictionary

In [None]:
# This is how we tell intake what data we want

query = dict(
    source_id      = "MPI-ESM1-2-LR",   # choose a climate model
    variable_id    = "tasmin",          # minimum temperature
    table_id       = "day",             # daily frequency
    experiment_id  = "ssp585",          # choose an experiment
    member_id      = "r10i1p1f1",       # choose a member ("r" realization, "i" initialization, "p" physics, "f" forcing)
)

# Search the catalog of the CMIP6 data pool at DKRZ for entries matching the query
cat = col.search(**query)

# Show query results
cat.df
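To build intuition for what `cat.df` contains, it helps to picture a search with several keys as a row filter on the catalog table: a row is kept only if it matches every key in the query. The sketch below is a simplified, exact-match picture written in plain pandas, not intake-esm's actual implementation (its matching may be more flexible). The helper `filter_like_search` and the toy rows are hypothetical.

```python
import pandas as pd


def filter_like_search(df: pd.DataFrame, **query) -> pd.DataFrame:
    """Keep rows whose columns match every key/value pair in query.

    Simplified exact-match sketch; a list value keeps rows matching any item.
    """
    mask = pd.Series(True, index=df.index)
    for column, value in query.items():
        values = value if isinstance(value, (list, tuple, set)) else [value]
        mask &= df[column].isin(values)   # all conditions must hold (logical AND)
    return df[mask]


# Hypothetical toy catalog rows (not real DKRZ entries).
toy = pd.DataFrame({
    "source_id": ["MPI-ESM1-2-LR", "MPI-ESM1-2-LR", "MPI-ESM1-2-HR"],
    "variable_id": ["tasmin", "tasmin", "tasmin"],
    "table_id": ["day", "day", "day"],
    "experiment_id": ["ssp585", "historical", "ssp585"],
    "member_id": ["r10i1p1f1", "r10i1p1f1", "r1i1p1f1"],
})

query = dict(source_id="MPI-ESM1-2-LR", variable_id="tasmin", table_id="day",
             experiment_id="ssp585", member_id="r10i1p1f1")
print(filter_like_search(toy, **query))   # only the first toy row survives
```

Every extra key in the query can only shrink the result, which is why an empty `cat.df` usually means one of the values does not exist for the chosen model.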

## <font color='green'>Exercise: Save Selection</font>
To use the exact same data collection later, you can save it. For this, you need to specify a location and a file name. The collection is saved as a human-readable `.csv` file.
- Save your collection as a `.csv` file

In [None]:
cat.df.to_csv('my_tasmin_collection.csv', index=False)
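The cell above writes the file into the current working directory. If you want to choose the location explicitly, a small helper can create the target folder and return the full path. This is a sketch: `save_collection` is a hypothetical helper, and the demo writes a hypothetical one-row table into a throwaway temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_collection(df: pd.DataFrame, directory, filename: str = "my_tasmin_collection.csv") -> Path:
    """Write the selection to directory/filename and return the full path."""
    out_dir = Path(directory).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it does not exist yet
    path = out_dir / filename
    df.to_csv(path, index=False)  # index=False avoids an extra unnamed column when re-reading
    return path


# Demo with a hypothetical one-row selection written to a throwaway directory.
demo = pd.DataFrame({"variable_id": ["tasmin"], "table_id": ["day"]})
with tempfile.TemporaryDirectory() as tmp:
    saved = save_collection(demo, tmp)
    exists_during = saved.exists()
print(saved.name, exists_during)  # my_tasmin_collection.csv True
```

In the notebook this could be called as `save_collection(cat.df, "~/intake_selections")`, where the directory name is only an example.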

## <font color='green'>Exercise: Open Saved Selection</font>
- Double-check the file by reading it and displaying its content

You can access your saved collection by reading the `.csv` file with `pd.read_csv(<file location>/<file name>)`.

In [None]:
my_col = pd.read_csv('my_tasmin_collection.csv')
my_col
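Displaying `my_col` lets you inspect the file by eye. To check programmatically that nothing was lost, you can compare the reloaded table with the original selection. The sketch below compares values as text so that minor dtype differences introduced by CSV parsing do not cause false alarms; `same_collection` is a hypothetical helper, and the demo uses an in-memory CSV with hypothetical rows instead of a real file.

```python
import io

import pandas as pd


def same_collection(original: pd.DataFrame, reloaded: pd.DataFrame) -> bool:
    """True if both tables have the same columns, row count, and values (compared as text)."""
    if list(original.columns) != list(reloaded.columns) or len(original) != len(reloaded):
        return False
    left = original.astype(str).reset_index(drop=True)
    right = reloaded.astype(str).reset_index(drop=True)
    return left.equals(right)


# Demo: round-trip hypothetical rows through an in-memory CSV.
toy = pd.DataFrame({
    "variable_id": ["tasmin", "tasmin"],
    "member_id": ["r10i1p1f1", "r10i1p1f1"],
    "version": [20190710, 20190710],
})
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(same_collection(toy, reloaded))  # True
```

In the notebook the equivalent check would be `same_collection(cat.df, my_col)`.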

## <font color='green'>Bonus Exercise: Where Is the Intake Catalog File Located?</font>
`mistral-cmip6.json` is the catalog descriptor. Where is the actual catalog located?

In [None]:
!cat $url

### <font color='green'>"catalog_file": "/mnt/lustre02/work/ik1017/Catalogs/mistral-cmip6.csv.gz"</font>
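Instead of printing the whole descriptor with `!cat`, you can read it as JSON and pick out the `catalog_file` entry directly. The sketch below assumes the descriptor is plain JSON with a top-level `catalog_file` key, as the output above suggests. The helper `catalog_file_from_descriptor` is hypothetical, and the demo uses a hypothetical, heavily trimmed descriptor rather than the real one.

```python
import json
import tempfile
from pathlib import Path


def catalog_file_from_descriptor(descriptor_path) -> str:
    """Return the "catalog_file" entry of a JSON catalog descriptor."""
    with open(descriptor_path) as f:
        descriptor = json.load(f)
    return descriptor["catalog_file"]


# Demo with a hypothetical, heavily trimmed descriptor (real ones contain more keys).
minimal = {"catalog_file": "/path/to/catalog.csv.gz"}
with tempfile.TemporaryDirectory() as tmp:
    descriptor_path = Path(tmp) / "descriptor.json"
    descriptor_path.write_text(json.dumps(minimal))
    found = catalog_file_from_descriptor(descriptor_path)
print(found)  # /path/to/catalog.csv.gz

# In the notebook: catalog_file_from_descriptor(url)
# The gzipped CSV it points to could then be opened with pandas.read_csv,
# which infers gzip compression from the ".csv.gz" suffix by default.
```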