# Hands On: Find Data with Intake

In [17]:
import intake
import xarray as xr
import pandas as pd

## <font color='green'>Exercise: Load the Intake Catalog</font>
- The catalog descriptor is located at `/work/ik1017/Catalogs/mistral-cmip6.json`.
- Load the catalog with `intake.open_esm_datastore(<path to catalog descriptor>)`.

In [4]:
url = "/work/ik1017/Catalogs/mistral-cmip6.json"
col = intake.open_esm_datastore(url)

## <font color='green'>Exercise: Get to Know the Intake Catalog</font>
- How many activities does the catalog contain?
- How many unique variables does the CMIP activity alone contain?

In [5]:
col.unique(['activity_id'])

{'activity_id': {'count': 19,
  'values': ['AerChemMIP',
   'C4MIP',
   'CDRMIP',
   'CFMIP',
   'CMIP',
   'DAMIP',
   'DCPP',
   'FAFMIP',
   'GMMIP',
   'GeoMIP',
   'HighResMIP',
   'ISMIP6',
   'LS3MIP',
   'LUMIP',
   'OMIP',
   'PAMIP',
   'PMIP',
   'RFMIP',
   'ScenarioMIP']}}

In [11]:
col.df[col.df["activity_id"] == "CMIP"]['variable_id'].unique().size

1124
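To answer the second question for every activity at once, you could group the catalog table by `activity_id` and count the distinct `variable_id` values in each group. The sketch below runs on a small hypothetical DataFrame standing in for `col.df`; on the real catalog you would pass `col.df` instead. The helper name `unique_variables_per_activity` is illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for col.df; the real catalog has many more rows and columns
df = pd.DataFrame({
    "activity_id": ["CMIP", "CMIP", "CMIP", "ScenarioMIP", "ScenarioMIP"],
    "variable_id": ["tas", "tasmin", "tas", "tas", "pr"],
})

def unique_variables_per_activity(catalog_df: pd.DataFrame) -> pd.Series:
    """Return the number of distinct variable_id values for each activity_id."""
    return catalog_df.groupby("activity_id")["variable_id"].nunique()

counts = unique_variables_per_activity(df)
print(counts)

# Same filter-then-count approach as the notebook cell, for comparison
cmip_only = df[df["activity_id"] == "CMIP"]["variable_id"].unique().size
print(cmip_only, counts["CMIP"])
```

One difference to keep in mind: `nunique()` ignores missing values, whereas `.unique().size` counts a NaN entry as one extra value.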

## <font color='green'>Exercise: Where Is the Intake Catalog File Located?</font>
`mistral-cmip6.json` is only the catalog descriptor. Where is the actual catalog file located?

In [29]:
!cat $url

{
  "esmcat_version": "0.1.0",
  "id": "mistral-cmip6",
  "description": "This is an ESM collection for CMIP6 data accessible on the DKRZ's MISTRAL disk storage system in /work/ik1017/CMIP6/data/CMIP6",
  "catalog_file": "/mnt/lustre02/work/ik1017/Catalogs/mistral-cmip6.csv.gz",
  "attributes": [
    {
      "column_name": "activity_id",
      "vocabulary": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_activity_id.json"
    },
    {
      "column_name": "source_id",
      "vocabulary": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_source_id.json"
    },
    {
      "column_name": "institution_id",
      "vocabulary": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_institution_id.json"
    },
    {
      "column_name": "experiment_id",
      "vocabulary": "https://raw.githubusercontent.com/WCRP-CMIP/CMIP6_CVs/master/CMIP6_experiment_id.json"
    },
    { "column_name": "member_id", "vocabulary": "" },

### <font color='green'>"catalog_file": "/mnt/lustre02/work/ik1017/Catalogs/mistral-cmip6.csv.gz"</font>
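Instead of printing the whole descriptor with `!cat`, you can read it as JSON and pick out the `catalog_file` entry directly. The sketch below writes a trimmed-down, hypothetical copy of the descriptor to a temporary file so that it runs anywhere; on the real system you would call the illustrative helper `catalog_file_of` on `/work/ik1017/Catalogs/mistral-cmip6.json`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, trimmed-down descriptor containing keys shown in the output above
descriptor = {
    "esmcat_version": "0.1.0",
    "id": "mistral-cmip6",
    "catalog_file": "/mnt/lustre02/work/ik1017/Catalogs/mistral-cmip6.csv.gz",
}

def catalog_file_of(descriptor_path) -> str:
    """Return the 'catalog_file' entry of an ESM collection descriptor (JSON)."""
    with open(descriptor_path) as f:
        return json.load(f)["catalog_file"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mistral-cmip6.json"
    path.write_text(json.dumps(descriptor))
    catalog_file = catalog_file_of(path)

print(catalog_file)
```

Because the catalog file is a gzip-compressed CSV, `pd.read_csv` should be able to open it directly: pandas infers the compression from the `.gz` extension by default.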

## <font color='green'>Exercise: Browse the Catalog</font>
- Modify the search dictionary

In [31]:
# This is how we tell intake what data we want

query = dict(
#    source_id      = "",       # choose a climate model
    variable_id    = "tasmin", # minimum temperature
    table_id       = "day",    # daily frequency
#    experiment_id  = "",       # choose an experiment
#    member_id      = "",       # "r" realization, "i" initialization, "p" physics, "f" forcing
)

# Intake looks for the query we just defined in the catalog of the CMIP6 data pool at DKRZ
cat = col.search(**query)

# Show query results
cat.df

| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | dcpp_init_year | version | time_range | path | opendap_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AerChemMIP | BCC | BCC-ESM1 | hist-piAer | r2i1p1f1 | day | tasmin | gn | | v20200430 | 18500101-20141231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Aer... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/AerCh... |
| 1 | AerChemMIP | BCC | BCC-ESM1 | hist-piAer | r3i1p1f1 | day | tasmin | gn | | v20200430 | 18500101-20141231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Aer... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/AerCh... |
| 2 | AerChemMIP | BCC | BCC-ESM1 | hist-piNTCF | r1i1p1f1 | day | tasmin | gn | | v20190621 | 18500101-20141231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Aer... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/AerCh... |
| 3 | AerChemMIP | BCC | BCC-ESM1 | hist-piNTCF | r2i1p1f1 | day | tasmin | gn | | v20190621 | 18500101-20141231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Aer... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/AerCh... |
| 4 | AerChemMIP | BCC | BCC-ESM1 | hist-piNTCF | r3i1p1f1 | day | tasmin | gn | | v20190621 | 18500101-20141231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Aer... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/AerCh... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 31747 | ScenarioMIP | NUIST | NESM3 | ssp585 | r1i1p1f1 | day | tasmin | gn | | v20190811 | 20150101-20401231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Sce... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/Scena... |
| 31748 | ScenarioMIP | NUIST | NESM3 | ssp585 | r1i1p1f1 | day | tasmin | gn | | v20190811 | 20410101-20701231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Sce... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/Scena... |
| 31749 | ScenarioMIP | NUIST | NESM3 | ssp585 | r1i1p1f1 | day | tasmin | gn | | v20190811 | 20710101-21001231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Sce... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/Scena... |
| 31750 | ScenarioMIP | NUIST | NESM3 | ssp585 | r2i1p1f1 | day | tasmin | gn | | v20190811 | 20410101-20701231 | /mnt/lustre02/work/ik1017/CMIP6/data/CMIP6/Sce... | http://esgf3.dkrz.de/thredds/dodsC/cmip6/Scena... |
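As a rough mental model, `col.search(**query)` keeps the catalog rows whose columns match the query values. The sketch below reproduces that idea with plain pandas on a small hypothetical table; it is not intake-esm's implementation, and the helper `filter_catalog` is illustrative only. Like intake-esm's search, it accepts either a single value or a list of values per column.

```python
import pandas as pd

# Hypothetical miniature catalog with a few of the columns shown above
df = pd.DataFrame({
    "source_id":     ["BCC-ESM1", "BCC-ESM1", "NESM3", "NESM3"],
    "experiment_id": ["hist-piAer", "hist-piAer", "ssp585", "ssp585"],
    "table_id":      ["day", "Amon", "day", "day"],
    "variable_id":   ["tasmin", "tasmin", "tasmin", "tas"],
})

def filter_catalog(catalog_df: pd.DataFrame, **query) -> pd.DataFrame:
    """Keep rows whose columns match every query entry.

    A scalar value requires an exact match; a list, tuple, or set
    accepts any of its items.
    """
    mask = pd.Series(True, index=catalog_df.index)
    for column, value in query.items():
        values = list(value) if isinstance(value, (list, tuple, set)) else [value]
        mask &= catalog_df[column].isin(values)
    return catalog_df[mask]

# Same query as in the notebook cell above
query = dict(variable_id="tasmin", table_id="day")
result = filter_catalog(df, **query)
print(result)

# Which models provide the variable? Useful before setting source_id
print(result["source_id"].unique())
```

Checking the distinct values of a column in the search result, as in the last line, is a quick way to decide which `source_id` or `experiment_id` to uncomment in the query dictionary next.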


## <font color='green'>Exercise: Save Selection</font>
To use exactly the same data collection later, you can save your collection. For this, specify a location and a file name. The collection will be saved as a human-readable `.csv` file.
- Save your collection as a `.csv` file.
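A minimal way to save a selection is to write its DataFrame with `DataFrame.to_csv`. The sketch below assumes a small hypothetical DataFrame in place of `cat.df`, placeholder file paths, and a temporary directory as the save location; on the real data you would call `cat.df.to_csv(<file location>/<file name>, index=False)` with a directory you can write to.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for cat.df (the search result); paths are placeholders
selection = pd.DataFrame({
    "source_id": ["BCC-ESM1", "NESM3"],
    "experiment_id": ["hist-piAer", "ssp585"],
    "variable_id": ["tasmin", "tasmin"],
    "path": ["/path/to/file1.nc", "/path/to/file2.nc"],
})

# Hypothetical location and file name
out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "tasmin_day_selection.csv"

# index=False stops pandas from writing the row index as an extra unnamed column
selection.to_csv(out_file, index=False)
print(out_file.read_text())
```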

## <font color='green'>Exercise: Open Saved Selection</font>
- Double check the file by reading it and displaying the content

You can access your saved collection by reading the `.csv` file with `pd.read_csv(<file location>/<file name>)`.
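To double-check a saved selection, you can read it back and compare it with the original table. The sketch below uses a hypothetical DataFrame in place of the saved `cat.df` and a temporary directory in place of your chosen location. It also shows a common pitfall: if the file was written with the default `index=True`, `pd.read_csv` returns the old index as an extra column named `Unnamed: 0`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the selection saved in the previous exercise
selection = pd.DataFrame({
    "source_id": ["BCC-ESM1", "NESM3"],
    "experiment_id": ["hist-piAer", "ssp585"],
    "variable_id": ["tasmin", "tasmin"],
})

with tempfile.TemporaryDirectory() as tmp_name:
    tmp = Path(tmp_name)

    # Saved without the index: reading it back reproduces the table
    selection.to_csv(tmp / "selection.csv", index=False)
    reloaded = pd.read_csv(tmp / "selection.csv")
    round_trip_ok = reloaded.equals(selection)

    # Saved with the default index=True: the index comes back as "Unnamed: 0"
    selection.to_csv(tmp / "selection_with_index.csv")
    reloaded_with_index = pd.read_csv(tmp / "selection_with_index.csv")

print(reloaded)
print("Round trip identical:", round_trip_ok)
print(list(reloaded_with_index.columns))
```

If you already have a file with an `Unnamed: 0` column, `pd.read_csv(<file location>/<file name>, index_col=0)` reads that column back as the index instead.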