<a href="https://colab.research.google.com/github/benmsanderson/tutorial/blob/main/CMIP6_workshop_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/benmsanderson/tutorial.git/HEAD?labpath=CMIP6_workshop_example.ipynb)


# CMIP6 Google Cloud example for Python workshop


Install xarray and the Google Cloud modules on the virtual machine


In [None]:
!pip install xarray[viz] gcsfs zarr



Import the modules we'll need

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import gcsfs
import cftime
import time

## Browse Catalog

The data catalog is stored as a CSV file. Here we read it with pandas.

In [None]:
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv', low_memory=False)
#let's look at the first few items
df.head()

List the variables and experiments available in the catalog

In [None]:
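# make a sorted array of the unique variable names
# (named `variables` to avoid shadowing Python's built-in `vars`)
variables = df.variable_id.unique()
variables.sort()
# look for variables containing the substring 'tas'
[v for v in variables if 'tas' in v]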

In [None]:
#make a list of unique experiments
expts_full=df.experiment_id.unique()
expts=pd.Series(expts_full)
#look for all the simulations containing 'ssp5'
expts[expts.str.contains('ssp5')]

Now let's find all instances of SSP5-8.5 (experiment `ssp585`) with monthly near-surface air temperature (`tas`) output from NorESM. Note that there are two model versions at different resolutions: LM and MM.

In [None]:
df_tmp=df[(df["experiment_id"] == 'ssp585') & (df["variable_id"]=='tas') & (df["table_id"]=='Amon') & (df["source_id"].str.contains('Nor'))]
df_tmp
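
The same kind of selection can be tried offline on a toy table. The sketch below is illustrative only: the rows in `toy` are made up (including the `gs://example/...` paths and the `OtherModel` entry) and merely reuse the catalog's column names. It suggests that combining boolean masks with `&` and writing the filter as a `DataFrame.query` string select the same rows.

```python
import pandas as pd

# Hypothetical miniature catalog: made-up rows that reuse the real column names
toy = pd.DataFrame(
    {
        "source_id": ["NorESM2-LM", "NorESM2-MM", "NorESM2-LM", "OtherModel"],
        "experiment_id": ["ssp585", "ssp585", "historical", "ssp585"],
        "table_id": ["Amon", "Amon", "Amon", "Amon"],
        "variable_id": ["tas", "tas", "tas", "pr"],
        "zstore": ["gs://example/a", "gs://example/b", "gs://example/c", "gs://example/d"],
    }
)

# Boolean masks combined with & (each comparison needs its own parentheses)
mask = (
    (toy["experiment_id"] == "ssp585")
    & (toy["variable_id"] == "tas")
    & (toy["table_id"] == "Amon")
    & toy["source_id"].str.contains("Nor")
)
by_mask = toy[mask]

# Equivalent selection written as a query string
# (engine="python" so that the .str accessor is evaluated by pandas itself)
by_query = toy.query(
    "experiment_id == 'ssp585' and variable_id == 'tas' "
    "and table_id == 'Amon' and source_id.str.contains('Nor')",
    engine="python",
)

print(by_mask["source_id"].tolist())
```

The mask form is the one used in this notebook; the query form is simply an alternative that some find easier to read when many conditions are chained.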

## Load Data

Connect to the Google Cloud Storage file system. Anonymous access is enough to read the data.


In [None]:
# load Google cloud storage
gcs = gcsfs.GCSFileSystem(token='anon')

Let's make a list of two xarray datasets, corresponding to the lower- and higher-resolution models

In [None]:
# make an empty list
dsall = []
# loop through all of the 'zstore' values in the dataframe, which are the links to the stored data
for index, item in enumerate(df_tmp.zstore.values):
    # 'item' is now the zstore link
    print('Link ' + str(index) + ': ' + item)
    # the mapper is a key-value interface to the store at that link
    mapper = gcs.get_mapper(item)
    # now we call xarray to open the mapper and make a new dataset
    # (the catalog lists consolidated stores, so we read the consolidated metadata)
    dstmp = xr.open_zarr(mapper, consolidated=True)
    # and we add this to a list of xarray datasets
    dsall.append(dstmp)
# let's print out the metadata for the first dataset in the list
dsall[0]

The dataset has one main data variable, `tas`, so let's look at its dimensions




In [None]:
dsall[0].tas

Let's plot the temperature for the last month

In [None]:
dsall[0].tas.isel(time=-1).plot()

We can also plot the zonal mean temperature (the average over longitude) for the same month

In [None]:
dsall[0].tas.isel(time=-1).mean(dim='lon').plot()

Now let's calculate the global mean, and combine the two simulations into a single dataset

In [None]:
for i, ds in enumerate(dsall):
    # get the latitude coordinate
    lat = ds.tas.lat
    # define the weights as the cosine of latitude (an xarray DataArray),
    # which is proportional to grid-cell area on a regular lat-lon grid
    weights = np.cos(np.deg2rad(lat))
    # give the weights a name
    weights.name = "weights"
    # apply the weights, then average along latitude (weighted) and longitude (not weighted)
    tmp_gm = ds.weighted(weights).mean(dim='lat').mean(dim='lon')
    # add an ensemble dimension and label it with the name of the model
    tmp_gm = tmp_gm.expand_dims({'ens': [ds.attrs['source_id']]})
    # now concatenate along the ensemble dimension
    if i == 0:
        dac = tmp_gm
    else:
        dac = xr.concat([dac, tmp_gm], 'ens')
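
To see why the cosine-of-latitude weights matter, here is a small self-contained check in plain NumPy. It is only a sketch: the 10-degree grid and the test field cos²(lat) are assumptions chosen for illustration, not model output. On a sphere the exact area mean of cos²(lat) is 2/3, whereas a mean that treats every grid cell as equal gives too much influence to the small cells near the poles.

```python
import numpy as np

# Synthetic regular grid: 10-degree cell centres (an assumption for illustration)
lat = np.arange(-85.0, 90.0, 10.0)
lon = np.arange(0.0, 360.0, 10.0)

# Test field that depends on latitude only: cos^2(lat), repeated along longitude
field = np.cos(np.deg2rad(lat))[:, None] ** 2 * np.ones(lon.size)

# Cosine-of-latitude weights, proportional to grid-cell area
weights = np.cos(np.deg2rad(lat))

# Average over longitude first (unweighted), then over latitude (weighted)
zonal_mean = field.mean(axis=1)
weighted_mean = np.sum(weights * zonal_mean) / np.sum(weights)

# Naive mean that treats every grid cell as equal in area
unweighted_mean = field.mean()

print(f"weighted:   {weighted_mean:.3f}")
print(f"unweighted: {unweighted_mean:.3f}")
```

The weighted result lands close to the analytic value of 2/3, while the unweighted mean comes out near 0.5 because the cold-biased polar rows count as much as the tropical ones.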



Let's plot the annual means for the two models. Interestingly, there is an offset between them.

In [None]:
dac.tas.groupby('time.year').mean().plot.line(x='year')
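
For readers new to `groupby('time.year')`, the sketch below shows what it does on a tiny synthetic series. The values are made up (two years of constant monthly data), and the names `da` and `annual` are illustrative only. Each month in a year gets equal weight, so differences in month length are ignored; that is usually fine for a quick look.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series: two years of made-up values (not model output)
time = pd.date_range("2015-01-01", periods=24, freq="MS")
values = np.concatenate([np.full(12, 1.0), np.full(12, 3.0)])
da = xr.DataArray(values, coords={"time": time}, dims="time", name="tas")

# Group the monthly values by calendar year and average within each group
annual = da.groupby("time.year").mean()

print(annual.year.values)
print(annual.values)
```

The result has a new `year` dimension in place of `time`, which is why the plot call in this notebook passes `x='year'`.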

Now let's save the global means to a new NetCDF file to use later.

In [None]:
dac.to_netcdf('noresm_gm.nc')
