### CMIP6 Data from Google Cloud Storage

Details on CMIP data can be found here: https://docs.google.com/document/d/1yUx6jr9EdedCOLd--CPdTfGDwEwzPpCF6p1jRmqx-0Q/edit

This notebook is based on the following article and example notebook:

https://medium.com/pangeo/cmip6-in-the-cloud-five-ways-96b177abe396

https://github.com/pangeo-data/pangeo-cmip6-examples/blob/master/intake_ESM_example.ipynb

### Import statements

We need `intake`, `intake-esm`, `gcsfs`, and `zarr` (which `xarray` uses to read the Zarr stores on Google Cloud Storage). Install them in the `clim680` environment.
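Before running the imports, it helps to check which packages are missing, because a missing one causes the cell to fail. The sketch below uses only the standard library. `REQUIRED` and `missing_packages` are illustrative names. The list assumes `zarr` is needed alongside the other packages, since `xarray` relies on it to read Zarr stores. Note that the `intake-esm` package is imported as `intake_esm`.

```python
import importlib.util

# Import names (not pip/conda names) of the packages this notebook relies on.
REQUIRED = ["xarray", "pandas", "zarr", "intake", "intake_esm", "gcsfs", "matplotlib"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```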

In [24]:
import xarray as xr
import pandas as pd
import zarr
import intake
import matplotlib.pyplot as plt

import gcsfs


### Open the intake catalog

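The catalog is a table that can be loaded as a pandas DataFrame. It describes every available dataset with a standard set of attributes (activity, institution, model, experiment, ensemble member, variable, grid, and storage location).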

We can then search on that information to find what datasets we want.
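Conceptually, a catalog search keeps only the rows whose columns match every requested value. The sketch below imitates that with pandas boolean masks on a few rows copied from the catalog outputs in this notebook. `search_catalog` is a hypothetical helper written for illustration. It is not how intake-esm implements `search`, and it handles only exact matches.

```python
import pandas as pd

# A few rows copied from the catalog outputs in this notebook (zstore omitted).
catalog = pd.DataFrame(
    {
        "activity_id": ["AerChemMIP", "AerChemMIP", "AerChemMIP", "CMIP"],
        "institution_id": ["AS-RCEC", "BCC", "BCC", "CCCma"],
        "source_id": ["TaiESM1", "BCC-ESM1", "BCC-ESM1", "CanESM5"],
        "experiment_id": ["histSST", "histSST", "histSST", "historical"],
        "member_id": ["r1i1p1f1"] * 4,
        "table_id": ["AERmon", "AERmon", "AERmon", "Oyr"],
        "variable_id": ["od550aer", "mmrbc", "mmrdust", "o2"],
        "grid_label": ["gn"] * 4,
    }
)


def search_catalog(df, **query):
    """Hypothetical helper: keep rows whose columns equal the requested values.

    Each value may be a single string or a list of acceptable strings.
    """
    mask = pd.Series(True, index=df.index)
    for column, value in query.items():
        allowed = value if isinstance(value, list) else [value]
        mask &= df[column].isin(allowed)  # AND together one condition per facet
    return df[mask]


print(search_catalog(catalog, source_id="BCC-ESM1"))
print(search_catalog(catalog, experiment_id="historical", variable_id="o2"))
```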

In [9]:
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col



| column | unique |
|---|---|
| activity_id | 15 |
| institution_id | 34 |
| source_id | 79 |
| experiment_id | 107 |
| member_id | 213 |
| table_id | 30 |
| variable_id | 392 |
| grid_label | 10 |
| zstore | 294669 |
| dcpp_init_year | 60 |
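The summary counts the distinct values in each catalog column. You can compute a comparable summary for any pandas DataFrame with `DataFrame.nunique()`. The sketch below applies it to the five preview rows shown by `col.df.head()`, so its counts are far smaller than the full catalog's. This is an illustration only and not necessarily how intake-esm builds its own summary.

```python
import pandas as pd

# Rows copied from the col.df.head() preview in this notebook.
preview = pd.DataFrame(
    {
        "activity_id": ["AerChemMIP"] * 5,
        "institution_id": ["AS-RCEC", "BCC", "BCC", "BCC", "BCC"],
        "source_id": ["TaiESM1", "BCC-ESM1", "BCC-ESM1", "BCC-ESM1", "BCC-ESM1"],
        "experiment_id": ["histSST"] * 5,
        "variable_id": ["od550aer", "mmrbc", "mmrdust", "mmroa", "mmrso4"],
    }
)

# Number of distinct values per column, laid out like the "unique" table.
summary = preview.nunique().rename("unique").to_frame()
print(summary)
```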


In [10]:
col.df.head()

| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AerChemMIP | AS-RCEC | TaiESM1 | histSST | r1i1p1f1 | AERmon | od550aer | gn | gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/... | | 20200310 |
| 1 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmrbc | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | | 20190718 |
| 2 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmrdust | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | | 20191127 |
| 3 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmroa | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | | 20190809 |
| 4 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmrso4 | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | | 20191127 |


### What are the possible experiments I can choose from?

In [11]:
print(col.df.experiment_id.unique())

['histSST' 'piClim-CH4' 'piClim-NTCF' 'piClim-control' 'ssp370'
 'hist-1950HC' 'piClim-2xDMS' 'piClim-2xdust' 'piClim-2xfire'
 'piClim-2xss' 'piClim-BC' 'piClim-HC' 'piClim-N2O' 'piClim-OC'
 'piClim-SO2' 'piClim-aer' '1pctCO2-bgc' '1pctCO2-rad' 'esm-ssp585'
 'hist-bgc' 'amip-4xCO2' 'amip-future4K' 'amip-m4K' 'amip-p4K' 'amip'
 'abrupt-2xCO2' 'abrupt-solp4p' 'abrupt-0p5xCO2' 'amip-lwoff'
 'amip-p4K-lwoff' 'aqua-4xCO2' 'abrupt-solm4p' 'aqua-control-lwoff'
 'aqua-control' 'aqua-p4K-lwoff' 'aqua-p4K' '1pctCO2' 'abrupt-4xCO2'
 'historical' 'piControl' 'esm-hist' 'esm-piControl' 'ssp126' 'ssp245'
 'ssp585' 'esm-piControl-spinup' 'piControl-spinup' 'hist-GHG' 'hist-aer'
 'hist-nat' 'hist-CO2' 'hist-sol' 'hist-stratO3' 'hist-volc' 'ssp245-GHG'
 'ssp245-aer' 'ssp245-nat' 'ssp245-stratO3' 'dcppA-hindcast' 'dcppA-assim'
 'dcppC-hindcast-noAgung' 'dcppC-hindcast-noElChichon'
 'dcppC-hindcast-noPinatubo' 'dcppC-amv-neg' 'dcppC-amv-pos'
 'dcppC-amv-ExTrop-neg' 'dcppC-amv-ExTrop-pos' 'dcppC-amv-Trop-
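The experiment list is long. One simple way to narrow it is to filter by name prefix, for example `ssp` for the Shared Socioeconomic Pathway scenarios or `hist` for the historical runs. The sketch below uses a hand-copied subset of the printed values. `experiments_with_prefix` is a hypothetical helper. In the notebook you could pass `col.df.experiment_id.unique()` instead of the list.

```python
# A subset of the experiment_id values printed above.
experiments = [
    "historical", "piControl", "ssp126", "ssp245", "ssp370", "ssp585",
    "ssp245-GHG", "piClim-control", "piClim-aer", "amip", "amip-p4K",
    "abrupt-4xCO2", "hist-GHG", "hist-aer", "hist-nat",
]


def experiments_with_prefix(names, prefix):
    """Return the experiment names that start with `prefix`, sorted alphabetically."""
    return sorted(name for name in names if name.startswith(prefix))


print(experiments_with_prefix(experiments, "ssp"))
print(experiments_with_prefix(experiments, "hist"))
```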

### Find the data for a specific experiment and model

In [21]:
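cat = col.search(experiment_id='historical',  # historical simulation
                 table_id='Oyr',              # annual ocean output
                 variable_id='o2',            # dissolved oxygen concentration
                 grid_label='gn',             # native model grid
                 institution_id='CCCma',
                 source_id='CanESM5',
                 member_id='r1i1p1f1')        # a single ensemble member
cat.df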

| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CMIP | CCCma | CanESM5 | historical | r1i1p1f1 | Oyr | o2 | gn | gs://cmip6/CMIP/CCCma/CanESM5/historical/r1i1p... | | 20190429 |
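Each catalog row points to one Zarr store through its `zstore` column. The visible part of the CanESM5 path suggests the store path follows the same facet order as the catalog columns: `activity_id/institution_id/source_id/experiment_id/member_id/table_id/variable_id/grid_label` under the `cmip6` bucket. The sketch below assembles such a path from the search facets. `zstore_path` and `FACET_ORDER` are hypothetical names. The layout is an assumption, and other catalog versions may differ (for example, by adding a version directory). Prefer the catalog's own `zstore` value whenever it is available.

```python
# Assumed order of facets in the bucket path (not guaranteed by the catalog).
FACET_ORDER = [
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
]


def zstore_path(bucket="cmip6", **facets):
    """Hypothetical helper: join facet values into a gs:// store path."""
    missing = [name for name in FACET_ORDER if name not in facets]
    if missing:
        raise ValueError(f"missing facets: {missing}")
    return "gs://" + "/".join([bucket] + [facets[name] for name in FACET_ORDER])


path = zstore_path(
    activity_id="CMIP", institution_id="CCCma", source_id="CanESM5",
    experiment_id="historical", member_id="r1i1p1f1", table_id="Oyr",
    variable_id="o2", grid_label="gn",
)
print(path)
```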


### Load the selected datasets from Google Cloud Storage

In [26]:
# Open every store matched by the search as an xarray Dataset, keyed by name
dset_dict = cat.to_dataset_dict()
list(dset_dict.keys())


--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
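The message above gives the pattern for the dictionary keys. The sketch below builds a key from one catalog row written as a plain dict. `dataset_key` and `KEY_FIELDS` are hypothetical names. `member_id` and `variable_id` are not part of the key. As a result, several ensemble members or variables that match a search can end up combined in a single dataset under one key.

```python
# Key fields, in the order given by the intake-esm message above.
KEY_FIELDS = [
    "activity_id", "institution_id", "source_id",
    "experiment_id", "table_id", "grid_label",
]


def dataset_key(row):
    """Build the dataset-dictionary key from one catalog row (a dict of facets)."""
    return ".".join(row[field] for field in KEY_FIELDS)


row = {
    "activity_id": "CMIP", "institution_id": "CCCma", "source_id": "CanESM5",
    "experiment_id": "historical", "member_id": "r1i1p1f1", "table_id": "Oyr",
    "variable_id": "o2", "grid_label": "gn",
}
print(dataset_key(row))
```

For the search in this notebook, `list(dset_dict.keys())` should therefore contain `'CMIP.CCCma.CanESM5.historical.Oyr.gn'`, the key used to pull out `ds`.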



In [25]:
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds

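If `to_dataset_dict` fails to open a store, one way to narrow down the problem is to open a single store directly with `gcsfs` and `xarray`, bypassing intake-esm, in the spirit of the "five ways" article linked at the top. This is a minimal sketch with these assumptions:

- `open_cmip6_store` is a hypothetical name.
- The bucket allows anonymous reads (`token="anon"`).
- Each store has consolidated metadata.

Any underlying error, such as a missing package, then surfaces directly instead of being wrapped in intake-esm's message.

```python
def open_cmip6_store(zstore):
    """Open one public CMIP6 Zarr store directly, bypassing intake-esm.

    Needs gcsfs, zarr, and xarray at call time; they are imported inside the
    function so the definition itself works before they are installed.
    """
    import gcsfs
    import xarray as xr

    # Assumption: the public cmip6 bucket can be read anonymously.
    fs = gcsfs.GCSFileSystem(token="anon")
    # Map the gs:// path to a key-value store that xarray/zarr can read.
    mapper = fs.get_mapper(zstore)
    # Assumption: the store has consolidated metadata (.zmetadata).
    return xr.open_zarr(mapper, consolidated=True)
```

For example, `ds = open_cmip6_store(cat.df.zstore.values[0])` would open the CanESM5 `o2` store found by the search.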