## CESM2 - LARGE ENSEMBLE (LENS2)

#### by Mauricio Rocha and Dr. Gustavo Marques

#### This notebook reads CESM2 Large Ensemble (LENS2) output for different model components and variables.

#### Before we begin, here are some valuable notes about this large ensemble:
#### * ssp370: the Shared Socioeconomic Pathway (SSP) scenario that provides the external forcing through 2100. The choice of SSP3-7.0 follows the CMIP6 recommendations (O’Neill et al., 2016), which emphasize the value of this relatively high forcing level for quantifying changes in natural variability (Rodgers et al., 2021). SSP3-7.0 lies between RCP6.0 and RCP8.5 and represents the medium-to-high end of the range of future forcing pathways.
#### Note: RCP8.5 stands for Representative Concentration Pathway 8.5, the pathway whose radiative forcing reaches 8.5 W/m² by 2100.

#### * historical (1850–2014): this period is not a projection, so it is ideal for validating LENS2 against reanalysis or in situ data; because of its spatial resolution and temporal extent, ERA5 can be used for this validation. Keep in mind that the memory of the initial conditions lasts for decades. Taking the AMOC strength at 26.5°N as an example (Fig. S2 in Rodgers et al., 2021), the ensemble-mean AMOC strength of the micro-perturbation clusters initialized from years 1231 (strong), 1251 (decreasing), 1281 (increasing), and 1301 (weak) of the preindustrial control run (each averaged across its 20 members) converges only after several decades, which indicates how long the initial-condition memory persists for the AMOC. For this reason, Rodgers et al. (2021) restrict their analysis of internal variability to the period after 1960, more than a century after initialization. How can the forced response be separated from natural (internal) variability? One way is to compare members of the same experiment: because they share the same forcing, the ensemble mean approximates the forced response, and each member's deviation from that mean reflects internal variability.
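##### A minimal sketch of that idea with xarray, on synthetic data (the member count, years, trend and noise level are made up for illustration): the ensemble mean estimates the forced response and the member-by-member deviations estimate internal variability. With the LENS2 data loaded below, the same operations would apply along the `member_id` dimension.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an ensemble: 20 members x 50 years.
# Every member shares the same forced signal (a linear trend) plus its own noise.
rng = np.random.default_rng(0)
years = np.arange(1960, 2010)
forced = 0.02 * (years - years[0])                    # common forced response
noise = rng.normal(0.0, 0.3, size=(20, years.size))   # member-specific internal variability
ts = xr.DataArray(forced + noise,
                  dims=("member_id", "year"),
                  coords={"member_id": np.arange(1, 21), "year": years},
                  name="index")

# Ensemble mean: estimate of the forced response (the noise averages out).
forced_est = ts.mean("member_id")

# Deviation of each member from the ensemble mean: estimate of internal variability.
internal = ts - forced_est

# Spread across members at each year, a common measure of internal variability.
spread = ts.std("member_id")
```

##### With more members the ensemble mean gets closer to the true forced signal, which is one reason large ensembles such as LENS2 are useful for this separation.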

#### * cmip6: The first 50 members of the large ensemble follow the CMIP6 protocols (van Marle et al., 2017), with biomass burning as described in the CESM2 overview paper (Danabasoglu et al., 2020); this forcing is referred to as CMIP6 (Rodgers et al., 2021).

#### * smbb: For the second set of 50 members, which Rodgers et al. (2021) refer to as SMBB (smoothed biomass burning fluxes), the CMIP6 biomass burning emissions of all relevant species for the Community Atmosphere Model version 6 (CAM6) were smoothed in time with an 11-year running mean filter.
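##### For illustration only, an 11-year centered running mean applied to a made-up annual emission series with xarray. This is not the exact procedure used to build the SMBB forcing (for instance, how the ends of the record were handled is not shown here).

```python
import numpy as np
import xarray as xr

# Hypothetical annual emission series with large year-to-year spikes
years = np.arange(1997, 2015)
rng = np.random.default_rng(1)
emis = xr.DataArray(1.0 + rng.gamma(shape=1.0, scale=0.5, size=years.size),
                    dims="year", coords={"year": years})

# Centered 11-year running mean: each value becomes the mean of itself and
# the 5 years on either side. The first and last 5 years are NaN here
# because the window is incomplete there.
smoothed = emis.rolling(year=11, center=True).mean()
```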

#### References: 
#### * Rodgers et al. (2021): https://esd.copernicus.org/articles/12/1393/2021/
#### * Fasullo et al. (2021): https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021GL093841
#### * DeRepentigny et al. (2020): https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020JC016133
#### * Danabasoglu et al. (2020): https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2019MS001916
#### * van Marle et al. (2017): https://gmd.copernicus.org/articles/10/3329/2017/
#### * O’Neill et al. (2016): https://gmd.copernicus.org/articles/9/3461/2016/


## Imports

In [None]:
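import intake                    # catalog access (intake.open_esm_datastore)
import intake_esm                # ESM datastore plugin used by intake
import xarray as xr
import numpy as np
import fsspec
import matplotlib.pyplot as plt
import cmocean                   # oceanographic colormaps (cmocean.cm.thermal)
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
from cartopy.util import add_cyclic_point
import pop_tools                 # POP grid utilities (pop_tools.get_grid)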

## Data Ingest

#### Path

##### Option 1: read the time-series files directly with xarray

In [None]:
#variable = ['SST','SHF','VVEL','UVEL','WVEL','RHO','SALT','Q','TAUX','TAUY','SSH','MOC']                         # Variable
#serie = ['day_1', 'month_1', 'year_1']                                                                           # Temporal frequency
#period = ['HIST', 'SSP370']                                                                                      # Historical or projection
#year = ['1231', '1251', '1281', '1301']                                                                          # Start years of the micro-perturbation (MOC) clusters
#biomass = ['cmip6','smbb']                                                                                       # Biomass burning forcing
#member = ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20']   # Ensemble member

# 'nday1' in the file name is the daily stream, so this pattern matches day_1 files only;
# check the file names for the stream tag used by other frequencies
#path = (f'/glade/campaign/cgd/cesm/CESM2-LE/timeseries/ocn/proc/tseries/{serie[0]}/{variable[0]}/b.e21.B{period[1]}{biomass[0]}.f09_g17.LE2-{year[0]}.0{member[0]}.pop.h.nday1.{variable[0]}.*.nc')

#ds = xr.open_mfdataset(path)
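##### The f-string above can be wrapped in a small helper (the name `lens2_pop_path` is hypothetical, not part of any library) that returns the glob pattern for one file set. It reproduces the same path layout; the default `stream='nday1'` only fits daily files.

```python
def lens2_pop_path(variable="SST", serie="day_1", period="HIST",
                   year="1231", biomass="cmip6", member="01", stream="nday1"):
    """Return the glob pattern used in Option 1 (hypothetical helper).

    Mirrors the commented f-string above: '0' + member gives the
    three-digit member number (e.g. '01' -> '001').
    """
    root = "/glade/campaign/cgd/cesm/CESM2-LE/timeseries/ocn/proc/tseries"
    return (f"{root}/{serie}/{variable}/"
            f"b.e21.B{period}{biomass}.f09_g17.LE2-{year}.0{member}"
            f".pop.h.{stream}.{variable}.*.nc")


# One pattern per member of a micro-perturbation cluster
patterns = [lens2_pop_path(member=m) for m in ("01", "02", "03")]
```

##### Each pattern can then be passed to `xr.open_mfdataset`, for example `xr.open_mfdataset(lens2_pop_path(period='SSP370', member='05'))`.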


##### Option 2: search the intake-esm catalog

In [None]:
catalog = intake.open_esm_datastore(
    '/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cesm2-le.json'
)

In [None]:
catalog.df

#### What does the variable look like?

In [None]:
#ds.SST.isel(time = 0).plot()

In [None]:
cat_subset = catalog.search(component='ocn',
                            variable='SST',
                            frequency='day_1')
# Tip: catalog.search(component='ocn', frequency='day_1').df.variable.unique()
# lists every variable available for this component and frequency.
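##### The filter in the tip above is ordinary pandas on `catalog.df`. The sketch below repeats it on a tiny made-up DataFrame that has only the `component`, `variable` and `frequency` columns used in the search, so it runs without access to the catalog.

```python
import pandas as pd

# Tiny made-up stand-in for catalog.df (only the columns used by catalog.search above)
df = pd.DataFrame({
    "component": ["ocn", "ocn", "ocn", "atm"],
    "variable":  ["SST", "SSH", "SST", "TS"],
    "frequency": ["day_1", "day_1", "month_1", "day_1"],
})

# Same filter as component='ocn', frequency='day_1', applied to the DataFrame,
# then the distinct variable names
sel = df[(df["component"] == "ocn") & (df["frequency"] == "day_1")]
available = sorted(sel["variable"].unique())
```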

In [None]:
dset_dict_raw = cat_subset.to_dataset_dict()

In [None]:
ds = dset_dict_raw['ocn.historical.pop.h.nday1.cmip6.SST']

##### If you choose the ocean component, you will need to import the POP grid (below). For the other components, you can use the ensemble's own grid.

#### Import the POP grid
##### In ds, TLONG and TLAT contain missing values (NaNs), so we overwrite them with the values from pop_grid, which has no missing values.

In [None]:
# read the POP 1-degree grid (gx1v7) from pop_tools;
# we only need the 2-D coordinates TLONG and TLAT
pop_grid = pop_tools.get_grid('POP_gx1v7')
ds['TLONG'] = pop_grid.TLONG  # longitudes (degrees east)
ds['TLAT'] = pop_grid.TLAT    # latitudes (degrees north)
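##### The override relies on `TLAT`/`TLONG` from `pop_grid` having exactly the same `(nlat, nlon)` shape as the model output. A small synthetic sketch of the same idea, written with `assign_coords` (an alternative to the item assignment above that keeps the arrays as coordinates); the grid values are made up:

```python
import numpy as np
import xarray as xr

# Synthetic 3x4 "model output" whose 2-D TLAT coordinate has NaNs
# (standing in for the missing coordinate values in the LENS2 files)
nlat, nlon = 3, 4
lat_full = np.repeat(np.array([-30.0, 0.0, 30.0])[:, None], nlon, axis=1)
lat_bad = lat_full.copy()
lat_bad[0, :2] = np.nan

ds_demo = xr.Dataset(
    {"SST": (("nlat", "nlon"), np.ones((nlat, nlon)))},
    coords={"TLAT": (("nlat", "nlon"), lat_bad)},
)

# Complete grid (the role played by pop_tools.get_grid in the real case)
grid = xr.Dataset({"TLAT": (("nlat", "nlon"), lat_full)})

# Overwrite the coordinate; the (nlat, nlon) dimensions must match exactly
ds_demo = ds_demo.assign_coords(TLAT=grid.TLAT)
```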

In [None]:
plt.figure(figsize=(9, 5))
ax = plt.axes(projection=ccrs.Robinson())
# First time step of the first member, plotted on the 2-D POP coordinates
pc = ds.SST.isel(time=0, member_id=0).plot.pcolormesh(
    ax=ax,
    transform=ccrs.PlateCarree(),  # TLONG/TLAT are plain longitude/latitude
    cmap=cmocean.cm.thermal,
    x='TLONG',
    y='TLAT',
    vmin=-2,
    vmax=32,
    cbar_kwargs={"orientation": "horizontal"},
)
ax.coastlines()
ax.gridlines(draw_labels=True)  # a single gridlines call is enough

#### Center the map on the South Atlantic

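##### To do this, we split each 2-D array along the longitude (nlon) dimension and join the two pieces in the order 178–359° then 0–178°, so that the Atlantic sits in the middle of the map; the latitudes are kept as they are. On the POP grid a column index is not a longitude in degrees, so the split point has to be read from TLONG.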
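##### A sketch of the reordering as a function (hypothetical name `recenter_lon`, synthetic test grid). Instead of splitting at a fixed column index, it wraps longitudes above 180° to negative values and sorts the columns by longitude along one reference row, which also works when the first column of the grid does not sit at 0°E. The choice of reference row is an assumption; on the POP grid a Southern Hemisphere row, away from the displaced North Pole, is a reasonable pick.

```python
import numpy as np


def recenter_lon(field, lon2d, ref_row=0):
    """Reorder columns so longitudes run from about -180 to 180 (Atlantic in the middle).

    field, lon2d: 2-D arrays shaped (nlat, nlon), longitudes in 0-360.
    ref_row: row used to decide the column order (an assumption, see text).
    """
    # Express longitudes above 180 as negative values (-180 to 0)
    wrapped = np.where(lon2d > 180.0, lon2d - 360.0, lon2d)
    # Column order that makes the reference row increase monotonically
    order = np.argsort(wrapped[ref_row], kind="stable")
    return field[:, order], wrapped[:, order]


# Synthetic grid whose first column starts at 320E instead of 0E
lon1d = (320.0 + 45.0 * np.arange(8)) % 360.0
lon2d = np.tile(lon1d, (3, 1))
field = np.tile(np.arange(8), (3, 1))  # value = original column index
new_field, new_lon = recenter_lon(field, lon2d)
```

##### If the goal is only the map, cartopy can also center the projection without reordering the data, e.g. `ccrs.Robinson(central_longitude=-30)`.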

In [None]:
ds

In [None]:
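lats = -60    # southern limit (degrees north)
latn = None   # northern limit: value missing in the original, set it before running
lonw = 1      # western limit (degrees east)
lone = None   # eastern limit: value missing in the original, set it before running
# TLAT/TLONG are 2-D (nlat, nlon) coordinates, so .sel() with slices cannot be
# used here (and method='nearest' does not work with slices anyway);
# mask the box with .where() and drop the rows/columns outside it
sst0 = ds.SST.isel(time=0, member_id=0)
a_aux = sst0.where((ds.TLAT >= lats) & (ds.TLAT <= latn) &
                   (ds.TLONG >= lonw) & (ds.TLONG <= lone), drop=True)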
#[0,0:260,290:359]
#b_aux = ds.SST[0,0:260,0:60]
#new_var = np.concatenate((a_aux, b_aux), axis=1)
#plt.pcolor(new_var)
a_aux.shape
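##### Because `TLAT` and `TLONG` are 2-D, a lat/lon box has to be selected with a boolean mask rather than `.sel`. A self-contained check on a synthetic grid (values made up) shows what `where(..., drop=True)` returns: only the rows and columns that touch the box are kept.

```python
import numpy as np
import xarray as xr

# Synthetic grid with 2-D TLAT/TLONG on (nlat, nlon), like POP
lat1d = np.arange(-80.0, 81.0, 20.0)         # -80, -60, ..., 80
lon1d = np.arange(0.0, 360.0, 45.0)          # 0, 45, ..., 315
tlong, tlat = np.meshgrid(lon1d, lat1d)      # both shaped (9, 8)
sst = xr.DataArray(np.arange(tlat.size, dtype=float).reshape(tlat.shape),
                   dims=("nlat", "nlon"),
                   coords={"TLAT": (("nlat", "nlon"), tlat),
                           "TLONG": (("nlat", "nlon"), tlong)})

# Boolean mask of the box, built from the 2-D coordinates
box = ((sst.TLAT >= -60) & (sst.TLAT <= 0) &
       (sst.TLONG >= 1) & (sst.TLONG <= 180))
# drop=True removes rows/columns that lie entirely outside the box
sub = sst.where(box, drop=True)
```

##### On a truly curvilinear grid the box is not rectangular in index space, so the trimmed array can still contain NaNs at cells outside the box.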

In [None]:
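# Draft of the reordering described above. 196 and 359 are column indices, not degrees:
# the gx1v7 grid has 320 columns, so [:, 196:359] simply stops at the last column.
#a_aux_lon = ds['TLONG'][:,196:359]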
#b_aux_lon = ds['TLONG'][:,0:196]
#new_var_lon = np.concatenate((a_aux_lon, b_aux_lon), axis=1)
#a_aux_lat = ds['TLAT'][:,196:359]
#b_aux_lat = ds['TLAT'][:,0:196]
#new_var_lat = np.concatenate((a_aux_lat, b_aux_lat), axis=1)
#new_var_lon[new_var_lon>180]-=360
# plt.plot(new_var_lon[50,:], marker='.')
#plt.pcolor(new_var_lon, new_var_lat, new_var)
#plt.xlim(-180,180)
#plt.ylim(-90,90)