<img src="../images/esgf.png" width=250 alt="ESGF logo"></img>
<img src="../images/arm_logo.png" width=250 alt="ARM logo"></img>

# Compare Data from ESGF and ARM

## Overview

This notebook details how to compare CMIP6 data hosted through the Earth System Grid Federation (ESGF) to observations collected and hosted through the Department of Energy's Atmospheric Radiation Measurement (ARM) user facility.

The measurement of focus is 2-meter air temperature, collected at the Southern Great Plains (SGP) site in northern Oklahoma. This climate observatory has collected state-of-the-art observations since 1993.

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Xarray](https://foundations.projectpythia.org/core/xarray/xarray-intro.html) | Necessary | |
| [Search and Load CMIP6 Data via ESGF/OPeNDAP](https://projectpythia.org/cmip6-cookbook/notebooks/foundations/esgf-opendap.html) | Necessary | Familiarity with data access patterns |
| [Understanding of NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf.html) | Helpful | Familiarity with metadata structure |
| [Dask Arrays with Xarray](https://foundations.projectpythia.org/core/xarray/dask-arrays-xarray.html) | Helpful | Familiarity with lazy-loading |

- **Time to learn**: 25 minutes

## Imports

In [None]:
import os
import warnings

import act
from distributed import Client
import holoviews as hv
import hvplot.xarray
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import cf_xarray
import metpy
from pyesgf.search import SearchConnection
import xarray as xr

xr.set_options(display_style='html')
warnings.filterwarnings("ignore")
hv.extension('bokeh')

## Spin up a Dask Cluster
We will use a Dask local cluster to compute in parallel and distribute our data, which lets us work with these large datasets.

In [None]:
client = Client()
client

## Access Data
Our first step is to access data from the ESGF data servers and from the Atmospheric Radiation Measurement (ARM) user facility, which operates a long-term site in northern Oklahoma.

### Access ESGF Data
A tutorial on how to access ESGF-hosted CMIP6 data is included in the Foundations section of this cookbook:
- [ESGF OPeNDAP Tutorial](https://projectpythia.org/cmip6-cookbook/notebooks/foundations/esgf-opendap.html)

We use the following block of code to search for output from a single Earth system model, the Energy Exascale Earth System Model (E3SM), which is the Department of Energy's flagship coupled Earth system model.

In [None]:
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search',
                        distrib=False)
ctx = conn.new_context(
    facets='project,experiment_id',
    project='CMIP6',
    table_id='Amon',
    institution_id = 'E3SM-Project',
    experiment_id='historical',
    source_id='E3SM-1-0',
    variable='tas',
    variant_label='r1i1p1f1',
)
result = ctx.search()[1]
files = result.file_context().search()
opendap_urls = [file.opendap_url for file in files]

In [None]:
esgf_ds = xr.open_mfdataset(opendap_urls,
                            combine='by_coords',
                            chunks={'time': 480})
esgf_ds

### Clean up the dataset
We need to convert longitude from the 0 to 360 degree convention to -180 to 180 degrees. We can do this generically by looking up the longitude coordinate through its Climate and Forecast (CF) metadata.

In [None]:
lon_coord = esgf_ds.cf['X'].name
esgf_ds[lon_coord] = (esgf_ds[lon_coord] + 180) % 360 - 180
esgf_ds = esgf_ds.sortby(lon_coord)
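
The longitude arithmetic above can be checked on a few hand-picked values. The following is a minimal NumPy sketch, separate from the notebook workflow; the sample longitudes are arbitrary illustration values.

```python
import numpy as np

# Sample longitudes in the 0 to 360 degree convention (illustrative values only)
lon_0_360 = np.array([0.0, 90.0, 180.0, 262.5, 359.0])

# Same expression as applied to the dataset: shift, wrap, shift back
lon_180 = (lon_0_360 + 180) % 360 - 180

# Sorting restores a monotonically increasing coordinate, as sortby does
lon_sorted = np.sort(lon_180)

print(lon_180)
print(lon_sorted)
```

Note that 180 maps to -180 under this expression, so the resulting range is [-180, 180).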

## Access ARM Data
We use the ARM data API, which is included in the Atmospheric Data Community Toolkit (ACT), to access the data.

### Set Up the Search

Before downloading our data, we need an ARM data account and an ARM Live token. Both are available through this link:
- [ARM Live Signup](https://adc.arm.gov/armlive/livedata/home)

Once you sign up, you will see your token. The code below reads your username and token from the `ARM_USERNAME` and `ARM_PASSWORD` environment variables, so set those before running it (or assign your values directly to `arm_username` and `arm_password`).

In [None]:
arm_username = os.getenv("ARM_USERNAME")
arm_password = os.getenv("ARM_PASSWORD")

# Meteorological observations at the Southern Great Plains site
datastream = "sgpmetE13.b1"

start_date = "2013-01-01"
end_date = "2013-02-28"
files = act.discovery.download_data(arm_username,
                                    arm_password,
                                    datastream,
                                    start_date,
                                    end_date
                                   )
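
Because `os.getenv` returns `None` when a variable is unset, a missing credential may only surface later as a failed download. A small guard can catch this earlier. The sketch below is an assumption-laden illustration: `require_env` is a hypothetical helper (not part of ACT), and the variable name and value in the demonstration are placeholders.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if unset or empty.

    Hypothetical helper for illustration; not part of ACT.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demonstration with a placeholder variable name and value
os.environ["EXAMPLE_ARM_USERNAME"] = "placeholder_user"
print(require_env("EXAMPLE_ARM_USERNAME"))
```

In the notebook, the same helper could wrap the lookups of `ARM_USERNAME` and `ARM_PASSWORD` in place of the bare `os.getenv` calls.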

### Load the Data Using Xarray

In [None]:
arm_ds = xr.open_mfdataset(files,
                           combine='nested',
                           concat_dim='time',
                           chunks={'time':86400})

## Subset and Prepare Data to be Compared
We need to subset the climate model output to the grid point nearest the SGP site.

In [None]:
lat = arm_ds.lat.values[0]
lon = arm_ds.lon.values[0]
lat, lon

Xarray offers this subsetting functionality; we pass `method='nearest'` to select the **nearest** grid point to the site.

In [None]:
cmip6_nearest = esgf_ds.cf.sel(lat=lat,
                               lon=lon,
                               method='nearest')
cmip6_nearest
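
To illustrate what a nearest-neighbor lookup does on one-dimensional coordinates, here is a minimal NumPy sketch. The 1-degree grid and the site coordinates are assumptions chosen for illustration; the actual E3SM grid and the coordinates stored in the ARM files may differ.

```python
import numpy as np

# Hypothetical 1-degree grid of cell centers (the real model grid may differ)
grid_lat = np.arange(-89.5, 90.0, 1.0)
grid_lon = np.arange(-179.5, 180.0, 1.0)

# Approximate SGP location, for illustration only
site_lat, site_lon = 36.61, -97.49

# Nearest neighbor: index of the smallest absolute coordinate difference
i_lat = np.abs(grid_lat - site_lat).argmin()
i_lon = np.abs(grid_lon - site_lon).argmin()

nearest_lat = grid_lat[i_lat]
nearest_lon = grid_lon[i_lon]
print(nearest_lat, nearest_lon)
```

Each dimension is searched independently, which is why the longitude convention of the model grid has to match that of the site coordinates before selecting.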

We need to convert the model's time coordinate from a cftime index to a standard datetime index to make it easier to compare with the observations.

In [None]:
cmip6_nearest['time'] = cmip6_nearest.indexes['time'].to_datetimeindex()

Next, we select the period for which we have data from the SGP site, using the start and end dates specified earlier in the notebook, and resample to monthly means.

In [None]:
cmip6_nearest = cmip6_nearest.sel(time=slice(start_date,
                                             end_date)).resample(time='1M').mean()

### Calculate Monthly Mean Temperature at SGP
We can calculate the monthly average temperature at the SGP site using the `resample` method in `Xarray`.

In [None]:
arm_ds = arm_ds.sortby('time')
sgp_monthly_mean_temperature = arm_ds.temp_mean.resample(time='1M').mean().compute().rename('tas (ARM)')
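
The grouping behind a monthly mean can be sketched with plain NumPy. The values below are synthetic stand-ins (a simple counter), not ARM observations, and the sketch labels each result by month rather than by a month-end timestamp.

```python
import numpy as np

# Daily timestamps covering the same two months as the ARM download
times = np.arange("2013-01-01", "2013-03-01", dtype="datetime64[D]")

# Synthetic values (0, 1, 2, ...) standing in for temperature; not real data
values = np.arange(times.size, dtype=float)

# Truncate each timestamp to its month, then average within each month
months = times.astype("datetime64[M]")
unique_months = np.unique(months)
monthly_mean = np.array([values[months == m].mean() for m in unique_months])

for month, mean in zip(unique_months, monthly_mean):
    print(month, mean)
```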

We need to apply some data cleaning here too: we convert the CMIP6 temperature units to degrees Celsius using MetPy's unit-aware accessor.

In [None]:
cmip6_monthly_mean_temperature = cmip6_nearest.tas.compute().metpy.quantify()

In [None]:
cmip6_monthly_mean_temperature = cmip6_monthly_mean_temperature.metpy.convert_units('degC').rename("tas (CMIP6)")
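
CMIP6 `tas` is stored in kelvin, so the conversion amounts to subtracting a fixed offset. A minimal NumPy sketch with placeholder values (not model output) shows the arithmetic:

```python
import numpy as np

# Placeholder temperatures in kelvin (illustrative values, not model output)
tas_kelvin = np.array([273.15, 283.15, 300.0])

# Celsius is offset from kelvin by exactly 273.15
tas_celsius = tas_kelvin - 273.15
print(tas_celsius)
```

Using MetPy for the conversion, as the notebook does, has the advantage that the units are read from the data's metadata rather than assumed.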

## Visualize the Output
Once we have our comparisons ready, we can visualize using `hvPlot`, which produces an interactive visualization!

In [None]:
esgf_plot = cmip6_monthly_mean_temperature.hvplot.bar(title='Average Surface Temperature \n near the Southern Great Plains Field Site',
                                                       xlabel='Time')
arm_plot = sgp_monthly_mean_temperature.hvplot.bar(ylabel='Average Temperature (degC)',
                                                    xlabel='Time')

esgf_plot * arm_plot

## Summary
In this notebook, we searched for and opened a CMIP6 E3SM dataset using the ESGF API and OPeNDAP, and compared it to an ARM dataset collected at the Southern Great Plains climate observatory.

### What's next?
We will see some more advanced examples of using the CMIP6 and observational data.

## Resources and references
- [ARM Surface Meteorological Handbook](https://www.arm.gov/publications/tech_reports/handbooks/met_handbook.pdf)