# Google Cloud CMIP6 Public Data: Historical Global Mean Temperature GCM

*Adapted from:* https://github.com/pangeo-gallery/cmip6/blob/master/basic_search_and_load.ipynb

This notebook shows how to query the catalog and load the data using Python.

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import fsspec
import hvplot.xarray
import hvplot.pandas

%matplotlib inline
%config InlineBackend.figure_format = 'retina' 
plt.rcParams['figure.figsize'] = 12, 6

## Browse Catalog

The data catalog is stored as a CSV file. Here we read it with Pandas.

In [None]:
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()

The columns of the dataframe correspond to the CMIP6 controlled vocabulary. A beginners' guide to these terms is available in [this document](https://docs.google.com/document/d/1yUx6jr9EdedCOLd--CPdTfGDwEwzPpCF6p1jRmqx-0Q).

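Here we filter the catalog to find monthly surface air temperature (`table_id == 'Amon'`, `variable_id == 'tas'`) from the `historical` experiment, keeping only the ensemble member `r1i1p1f1`.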

In [None]:
df_ta = df.query("activity_id == 'CMIP' & table_id == 'Amon' & variable_id == 'tas'"
                 " & experiment_id == 'historical' & member_id == 'r1i1p1f1'")
df_ta
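The catalog filters above can also be written as a small helper that takes the facets as keyword arguments, which avoids hand-writing long query strings. This is only a sketch: `search_catalog` is a hypothetical helper (not part of pandas or of the catalog), and the rows in `toy` are made up; only the column names follow the catalog CSV.

```python
import pandas as pd


def search_catalog(df: pd.DataFrame, **facets) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose columns equal every given facet value.

    A facet value may also be a list, tuple or set, in which case any of its
    members is accepted.
    """
    mask = pd.Series(True, index=df.index)
    for column, value in facets.items():
        if isinstance(value, (list, tuple, set)):
            mask &= df[column].isin(list(value))
        else:
            mask &= df[column] == value
    return df[mask]


# Made-up rows for illustration only; the column names mirror the catalog CSV.
toy = pd.DataFrame({
    "activity_id": ["CMIP", "CMIP", "CMIP", "ScenarioMIP"],
    "institution_id": ["NCAR", "NCAR", "NOAA-GFDL", "NCAR"],
    "source_id": ["CESM2", "CESM2", "GFDL-CM4", "CESM2"],
    "experiment_id": ["historical", "historical", "historical", "ssp585"],
    "member_id": ["r1i1p1f1"] * 4,
    "table_id": ["Amon", "fx", "Amon", "Amon"],
    "variable_id": ["tas", "areacella", "tas", "tas"],
    "zstore": ["gs://toy/a", "gs://toy/b", "gs://toy/c", "gs://toy/d"],
})

hits = search_catalog(toy, activity_id="CMIP", table_id="Amon", variable_id="tas",
                      experiment_id="historical", member_id="r1i1p1f1")
print(hits.zstore.tolist())  # ['gs://toy/a', 'gs://toy/c']
```

Under the same assumptions, `search_catalog(df_ta, institution_id='NCAR')` would return the same rows as the NCAR query in the next cell.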

Now we do further filtering to find just the models from NCAR.

In [None]:
df_ta_noa = df_ta.query('institution_id == "NCAR"')
df_ta_noa

## Load Data

Now we will load a single store using fsspec, zarr, and xarray.

In [None]:
df_ta_noa.zstore.values[1]

In [None]:
# get the path to a specific zarr store (the last one from the dataframe above)
zstore = df_ta_noa.zstore.values[-1]
print(zstore)

# create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)

# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
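# optional: convert tas from kelvin to degrees Celsius (only tas, not the bounds variables)
# ds['tas'] = ds.tas - 273.15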
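The `mapper` created above behaves like a Python dictionary: each key is an object path inside the store and each value is that object's raw bytes, and zarr reads metadata and chunks through this interface. The sketch below illustrates this with fsspec's in-memory filesystem so that it runs offline; the `memory://demo-store` URL and the key names are arbitrary choices for illustration, not the layout of a real CMIP6 store.

```python
import fsspec

# Dict-like view (an fsspec FSMap) onto fsspec's in-memory filesystem.
demo = fsspec.get_mapper("memory://demo-store")

# Keys are object paths, values are raw bytes (names chosen for illustration).
demo["group/.zattrs"] = b'{"title": "toy"}'
demo["group/tas/0.0.0"] = b"\x00\x01\x02"

print(sorted(demo))              # all keys currently in the store
print(demo["group/.zattrs"])     # b'{"title": "toy"}'

del demo["group/tas/0.0.0"]      # MutableMapping semantics: keys can be deleted
print("group/tas/0.0.0" in demo) # False
```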

Plot a map for a single month (January 1950).

In [None]:
ds.tas.sel(time='1950-01').squeeze().plot()
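Selecting with a partial date string such as `'1950-01'` returns every time step inside that month, so the result keeps a `time` dimension of length one; `.squeeze()` drops it so that `.plot()` draws a 2-D map. The toy array below is a sketch with synthetic values on a made-up 3 × 4 grid, and it uses a pandas monthly time index rather than the cftime calendar the CMIP6 stores may use.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1950-01-01", periods=24, freq="MS")  # monthly, month start
lat = np.array([-45.0, 0.0, 45.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])

# Synthetic temperature field in kelvin on the made-up grid.
tas = xr.DataArray(
    np.random.default_rng(0).normal(288.0, 5.0, size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

jan_1950 = tas.sel(time="1950-01")    # partial-string selection keeps the time dim
print(jan_1950.dims, jan_1950.shape)  # ('time', 'lat', 'lon') (1, 3, 4)

jan_map = jan_1950.squeeze()          # drop the length-1 time dimension
print(jan_map.dims, jan_map.shape)    # ('lat', 'lon') (3, 4)
```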

Create a time series of global-average surface air temperature. Because grid cells differ in size, each grid point must be weighted by its cell area, which CMIP6 provides as the `areacella` variable.
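Written out, the area-weighted global mean computed in the next cells is the following, where (notation introduced here) $A_{ij}$ is the cell area from `areacella` and $T_{ij}(t)$ is `tas` at latitude index $i$, longitude index $j$ and time $t$:

$$
\overline{T}(t) = \frac{\sum_{i}\sum_{j} A_{ij}\, T_{ij}(t)}{\sum_{i}\sum_{j} A_{ij}}
$$

On a latitude-longitude grid the cells shrink toward the poles, so an unweighted mean over grid points would give high-latitude points too much influence.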

In [None]:
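# areacella must come from the same model (source_id) as the tas store, otherwise the grids will not match
source_id = df_ta_noa.source_id.values[-1]
df_area = df.query("variable_id == 'areacella' & institution_id == 'NCAR' & activity_id == 'CMIP'"
                   " & experiment_id == 'historical' & member_id == 'r1i1p1f1'"
                   " & source_id == @source_id")
ds_area = xr.open_zarr(fsspec.get_mapper(df_area.zstore.values[0]), consolidated=True)
ds_area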
ds_area

In [None]:
total_area = ds_area.areacella.sum(dim=['lon', 'lat'])
ta_timeseries = (ds.tas * ds_area.areacella).sum(dim=['lon', 'lat']) / total_area
ta_timeseries
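xarray also provides `DataArray.weighted` (available since xarray 0.15.1), which computes the same area-weighted mean in one call. The self-contained sketch below checks, on synthetic data with a made-up 3 × 4 grid and cos(latitude) standing in for `areacella`, that the manual formula used above and `weighted(...).mean(...)` agree.

```python
import numpy as np
import xarray as xr

lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
rng = np.random.default_rng(42)

# Synthetic temperature field (time, lat, lon) in kelvin.
tas = xr.DataArray(rng.normal(288.0, 10.0, size=(5, lat.size, lon.size)),
                   dims=("time", "lat", "lon"), coords={"lat": lat, "lon": lon})

# Stand-in for areacella: cell area proportional to cos(latitude).
area = xr.DataArray(np.cos(np.deg2rad(lat))[:, None] * np.ones(lon.size),
                    dims=("lat", "lon"), coords={"lat": lat, "lon": lon})

# Manual area-weighted mean, as in the notebook cell above.
manual = (tas * area).sum(dim=["lon", "lat"]) / area.sum(dim=["lon", "lat"])

# Same quantity with xarray's weighted reductions.
builtin = tas.weighted(area).mean(dim=["lon", "lat"])

print(np.allclose(manual, builtin))  # True
```

One difference: `weighted` leaves cells with missing data out of the denominator, while the manual version always divides by the total area of all cells. For a field without missing values, such as `tas`, the two agree.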

By default `open_zarr` loads the data lazily as Dask arrays, so `ta_timeseries` above is only a description of the computation. Here we trigger the computation explicitly with `.load()`.

In [None]:
%time ta_timeseries.load()
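To see what "lazy" means here, the sketch below (synthetic data; it assumes `dask` is installed) builds a chunked DataArray, applies a reduction, and shows that the result stays a Dask array until `.load()` is called.

```python
import dask.array
import numpy as np
import xarray as xr

# A small chunked array: operations on it only build a task graph.
arr = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("lat", "lon")).chunk({"lat": 1})
total = arr.sum(dim="lon")                        # nothing computed yet

print(isinstance(total.data, dask.array.Array))  # True

total.load()                                      # compute and keep the result in memory
print(isinstance(total.data, np.ndarray))         # True
print(total.values.tolist())                      # [6.0, 22.0, 38.0]
```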

Plot the time series of global mean surface air temperature together with its 12-month rolling mean.

In [None]:
tas_global = ta_timeseries.rename('tas')
tas_global.plot(label='monthly')
tas_global.rolling(time=12).mean().plot(label='12-month rolling mean')
plt.legend()
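`rolling(time=12).mean()` averages each month with the 11 months before it, so the first 11 values of the smoothed series are NaN because the window is not yet full. The toy monthly series below (made-up values 0, 1, 2, ...) makes this visible, and also shows `coarsen`, which averages non-overlapping 12-month blocks instead; both are sketches on synthetic data, not part of the original analysis.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=24, freq="MS")
series = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time", name="tas")

# Trailing 12-month window: each value is labelled with the last month in its window.
smooth = series.rolling(time=12).mean()
print(int(smooth.isnull().sum()))   # 11 leading NaNs
print(float(smooth[11]))            # mean of 0..11 -> 5.5

# Non-overlapping 12-month blocks, i.e. one value per year.
annual = series.coarsen(time=12).mean()
print(annual.values.tolist())       # [5.5, 17.5]
```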

In [None]:
ta_timeseries.rename('tas').hvplot() * ta_timeseries.rename('tas').rolling({'time':12}).mean().hvplot()