### CMIP6 Data from Google Cloud Storage

Details on CMIP6 data can be found at https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html and https://docs.google.com/document/d/1yUx6jr9EdedCOLd--CPdTfGDwEwzPpCF6p1jRmqx-0Q/edit

This notebook builds on the following article and example notebook:

https://medium.com/pangeo/cmip6-in-the-cloud-five-ways-96b177abe396

https://github.com/pangeo-data/pangeo-cmip6-examples/blob/master/intake_ESM_example.ipynb

In [None]:
import xarray as xr
import pandas as pd
import intake
import matplotlib.pyplot as plt

### Open the intake catalog

The catalog is a table that can be turned into a pandas DataFrame. It gives us a standard set of information about the available data.

We can then search on that information to find the datasets we want.

In [None]:
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col

In [None]:
col.df.head()

### What are the possible experiments I can choose from?

In [None]:
col.df.experiment_id.unique()
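
Beyond listing the experiment names, it can help to see how much data each experiment has. The sketch below shows one way to do that with ordinary pandas operations. It runs on a small made-up table (`toy_catalog` is a hypothetical stand-in for `col.df`, and `OtherModel` is not a real model name), so the counts are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for col.df; the rows are made up for illustration.
toy_catalog = pd.DataFrame(
    {
        "experiment_id": ["historical", "historical", "ssp585", "historical", "piControl"],
        "source_id": ["CanESM5", "CanESM5", "CanESM5", "OtherModel", "CanESM5"],
    }
)

# Number of catalog rows per experiment, most common first.
rows_per_experiment = toy_catalog["experiment_id"].value_counts()

# Number of distinct models contributing to each experiment.
models_per_experiment = toy_catalog.groupby("experiment_id")["source_id"].nunique()

print(rows_per_experiment)
print(models_per_experiment)
```

Replacing `toy_catalog` with `col.df` should give the same summaries for the real catalog, assuming it has `experiment_id` and `source_id` columns as used in the search in this notebook.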

### Find the data for a specific experiment and model

In [None]:
cat = col.search(experiment_id='historical',
                 table_id='Oyr', variable_id='o2',
                 grid_label='gn', institution_id='CCCma',
                 source_id='CanESM5',
                 member_id='r1i1p1f1')
cat.df
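
Conceptually, a keyword search like this amounts to keeping the catalog rows whose columns match every requested value. The sketch below illustrates that idea with plain pandas. It is not how intake-esm is implemented: `search_catalog` is a hypothetical helper, and the rows and `zstore` paths in `toy_catalog` are made-up placeholders.

```python
import pandas as pd

# Hypothetical stand-in for col.df: made-up rows with catalog-style column names.
toy_catalog = pd.DataFrame(
    {
        "experiment_id": ["historical", "historical", "ssp585", "historical"],
        "table_id": ["Oyr", "Amon", "Oyr", "Oyr"],
        "variable_id": ["o2", "tas", "o2", "o2"],
        "source_id": ["CanESM5", "CanESM5", "CanESM5", "OtherModel"],
        "zstore": ["path/a", "path/b", "path/c", "path/d"],  # placeholder paths
    }
)


def search_catalog(df, **criteria):
    """Keep the rows whose columns equal every keyword value (exact match)."""
    mask = pd.Series(True, index=df.index)
    for column, value in criteria.items():
        mask &= df[column] == value
    return df[mask]


subset = search_catalog(
    toy_catalog,
    experiment_id="historical",
    table_id="Oyr",
    variable_id="o2",
    source_id="CanESM5",
)
print(subset)
```

Each additional keyword narrows the selection, which is why the search above specifies the experiment, table, variable, grid, institution, model, and ensemble member to pin down a single dataset.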

### Open the selected datasets from Google Cloud Storage and list their keys

In [None]:
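# Open each matching dataset; the result is a dictionary of xarray Datasets
datasets = cat.to_dataset_dict()
# Show the dictionary keys, which identify the selected datasets
list(datasets.keys())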

In [None]:
ds = datasets['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds

In [None]:
ds['o2']

In [None]:
plt.contourf(ds['o2'][0, 0, 0, :, :])
plt.title('Dissolved Oxygen Concentration [mol m-3]')
plt.colorbar();

In [None]:
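# Plot illustrating the irregular grid
# Color shows the last-index minus first-index o2 along the second dimension
o2_change = ds['o2'][0, -1, 0, :, :] - ds['o2'][0, 0, 0, :, :]
fig = plt.figure(figsize=(16, 9))
plt.scatter(ds['longitude'], ds['latitude'], c=o2_change, s=0.3)
plt.colorbar()
# Raw string so that the LaTeX spacing command \; is not treated as an escape
plt.title(r'Change in Dissolved Oxygen Concentrations [$mol \; m^{-3}$]');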
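
The positional indexing used in the plots above depends on the order of the array's dimensions. A more explicit option is to select by dimension name with xarray's `isel`. The sketch below uses a small synthetic array rather than the real data. The dimension names (`member_id`, `time`, `lev`, `j`, `i`) and sizes are assumptions chosen for illustration, so check `ds['o2'].dims` before adapting it.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for ds['o2']; the dimension names and sizes are assumptions.
dims = ("member_id", "time", "lev", "j", "i")
values = np.arange(1 * 3 * 2 * 4 * 5, dtype=float).reshape(1, 3, 2, 4, 5)
o2 = xr.DataArray(values, dims=dims, name="o2")

# Select by dimension name instead of by axis position.
first = o2.isel(member_id=0, time=0, lev=0)
last = o2.isel(member_id=0, time=-1, lev=0)

# Difference between the last and first time steps at the first level.
change = last - first
print(change.dims)
```

Selecting by name keeps working if the dimensions come in a different order, and it makes clear which axis each index refers to.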

## Alternatives
There are a number of ways to access data in the _cloud_, and the details will vary depending on the service (Amazon AWS, Microsoft Azure, Google Cloud, IBM, Oracle, etc.).

In this case (Google Cloud), there is an alternative called Google Colaboratory (or Google Colab), where you can run in a Jupyter-like environment on Google's own servers. If you have a Gmail account, [this is actually part](https://colab.research.google.com) of your "Googlesphere" of services. [Here is an interactive example](https://colab.research.google.com/drive/19iEVxE_9QoTeg4st7MmucHJUmO93NXHp#scrollTo=z51j4O2nO754) using Google Colab for CMIP6 data analysis. Colab is a popular platform for machine learning applications.

Specific to geoscience applications, there is also [Pangeo](https://pangeo.io/index.html), a community platform for Big Data geoscience. Pangeo includes the [Pangeo Cloud](https://pangeo.io/cloud.html).