<img src="../images/esgf.png" width=250 alt="ESGF logo"></img>
<img src="../images/arm_logo.png" width=250 alt="ARM logo"></img>

# Compare Data from ESGF and ARM

## Overview

This notebook details how to compare CMIP6 data hosted through the Earth System Grid Federation (ESGF) to observations collected and hosted through the Department of Energy's Atmospheric Radiation Measurement (ARM) user facility.

The measurement of focus is 2-meter air temperature, collected at the Southern Great Plains (SGP) site in northern Oklahoma. This climate observatory has collected state-of-the-art observations since 1993.

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Xarray](https://foundations.projectpythia.org/core/xarray/xarray-intro.html) | Necessary | |
| [Search and Load CMIP6 Data via ESGF/OPeNDAP](https://projectpythia.org/cmip6-cookbook/notebooks/foundations/esgf-opendap.html) | Necessary | Familiarity with data access patterns |
| [Understanding of NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf.html) | Helpful | Familiarity with metadata structure |
| [Dask Arrays with Xarray](https://foundations.projectpythia.org/core/xarray/dask-arrays-xarray.html) | Helpful | Familiarity with lazy-loading |

- **Time to learn**: 25 minutes

## Imports

In [1]:
import os
import warnings

import act
from distributed import Client
import holoviews as hv
import hvplot.xarray
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import cf_xarray
import metpy
from pyesgf.search import SearchConnection
import xarray as xr

xr.set_options(display_style='html')
warnings.filterwarnings("ignore")
hv.extension('bokeh')

## Spin up a Dask Cluster
We will use a Dask Local Cluster to compute in parallel and distribute our data, enabling us to work with these large datasets.

In [2]:
client = Client()
client

Client output (summarized): a `distributed.LocalCluster` with 8 workers (4 threads and 15.35 GiB of memory each), for a total of 32 threads and 122.83 GiB of memory. Dashboard: http://127.0.0.1:8787/status


## Access Data
Our first step is to access data from the ESGF data servers and from the Atmospheric Radiation Measurement (ARM) user facility, which has a long-term site in northern Oklahoma.

### Access ESGF Data
A tutorial on how to access ESGF-hosted CMIP6 data is included in the Foundations section of this cookbook:
- [ESGF OpenDAP Tutorial](https://projectpythia.org/cmip6-cookbook/notebooks/foundations/esgf-opendap.html)

We use the following block of code to search for a single simulation from the Energy Exascale Earth System Model (E3SM), the Department of Energy's flagship coupled Earth system model.

In [3]:
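# Connect to the LLNL ESGF index node; distrib=False limits the search to that node
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search',
                        distrib=False)
# Narrow the search to monthly near-surface air temperature (tas)
# from the E3SM-1-0 historical simulation
ctx = conn.new_context(
    facets='project,experiment_id',
    project='CMIP6',
    table_id='Amon',
    institution_id='E3SM-Project',
    experiment_id='historical',
    source_id='E3SM-1-0',
    variable='tas',
    variant_label='r1i1p1f1',
)
# Pick one matching dataset (index 1 of the results) and list its files
result = ctx.search()[1]
files = result.file_context().search()
# Collect the OPeNDAP endpoint of every file
opendap_urls = [file.opendap_url for file in files]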

In [4]:
esgf_ds = xr.open_mfdataset(opendap_urls,
                            combine='by_coords',
                            chunks={'time':480})
esgf_ds

Output (summarized): an `xarray.Dataset` whose variables are Dask arrays, each split into 7 chunks.

| Shape | Chunk shape | Data type | Size | Chunk size |
| --- | --- | --- | --- | --- |
| (1980, 2) | (300, 2) | object | 30.94 kiB | 4.69 kiB |
| (1980, 180, 2) | (300, 180, 2) | float64 | 5.44 MiB | 843.75 kiB |
| (1980, 360, 2) | (300, 360, 2) | float64 | 10.88 MiB | 1.65 MiB |
| (1980, 180, 360) | (300, 180, 360) | float32 | 489.44 MiB | 74.16 MiB |
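The sizes reported for the largest array can be checked by hand. The sketch below uses plain Python to recompute the total size, the per-chunk size, and the chunk count from the reported shapes. It assumes the dimensions are ordered (time, lat, lon) and that each `float32` value takes 4 bytes; the helper name `mib` is made up for illustration.

```python
import math


def mib(shape, itemsize):
    """Size in MiB of an array with the given shape and bytes per element."""
    return math.prod(shape) * itemsize / 2**20


full_shape = (1980, 180, 360)   # assumed (time, lat, lon), as reported above
chunk_shape = (300, 180, 360)   # chunk shape, as reported above
itemsize = 4                    # float32 uses 4 bytes per value

total_mib = mib(full_shape, itemsize)
chunk_mib = mib(chunk_shape, itemsize)
# Chunks only split the first (time) axis, so count them along that axis
n_chunks = math.ceil(full_shape[0] / chunk_shape[0])

print(f"{total_mib:.2f} MiB total, {chunk_mib:.2f} MiB per chunk, {n_chunks} chunks")
# prints "489.44 MiB total, 74.16 MiB per chunk, 7 chunks"
```

Each chunk is small enough to fit comfortably in one worker's memory, which is what lets Dask process the chunks in parallel.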


### Clean up the dataset
We need to convert the longitudes from the 0 to 360 degree convention to -180 to 180. We can do this generically by using the Climate and Forecast (CF) conventions to look up the longitude coordinate, whatever its name is in the file.

In [5]:
lon_coord = esgf_ds.cf['X'].name
esgf_ds[lon_coord] = (esgf_ds[lon_coord] + 180) % 360 - 180
esgf_ds = esgf_ds.sortby(lon_coord)
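To see what the longitude formula does, here is a minimal sketch with NumPy on a handful of made-up longitudes. The function name `wrap_longitude` and the sample values are illustrative assumptions, not part of the notebook.

```python
import numpy as np


def wrap_longitude(lon_deg):
    """Map longitudes from the 0..360 convention onto -180..180."""
    return (np.asarray(lon_deg, dtype=float) + 180.0) % 360.0 - 180.0


lons = np.array([0.0, 90.0, 180.0, 262.5, 359.5])
wrapped = wrap_longitude(lons)
print(wrapped)            # values are 0, 90, -180, -97.5, -0.5

# The wrapped values are no longer increasing, so they need to be re-sorted
order = np.argsort(wrapped)
print(wrapped[order])     # values are -180, -97.5, -0.5, 0, 90
```

The second step mirrors the `sortby` call above: after wrapping, the eastern-hemisphere values come first, so the coordinate has to be sorted again to stay monotonic.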

## Access ARM Data
We use the ARM data API, which is included in the Atmospheric Data Community Toolkit (ACT), to access the data.

### Setup the Search

Before downloading our data, we need to make sure we have an ARM Data Account and an ARM Live token. Both can be obtained using this link:
- [ARM Live Signup](https://adc.arm.gov/armlive/livedata/home)

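Once you sign up, you will see your token. Either store your ARM username and token in the `ARM_USERNAME` and `ARM_PASSWORD` environment variables, which the cell below reads, or replace the `os.getenv` calls for `arm_username` and `arm_password` with your own values.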

In [6]:
arm_username = os.getenv("ARM_USERNAME")
arm_password = os.getenv("ARM_PASSWORD")

# Meteorological observations at the Southern Great Plains site
datastream = "sgpmetE13.b1"

start_date = "2013-01-01"
end_date = "2013-02-28"
files = act.discovery.download_data(arm_username,
                                    arm_password,
                                    datastream,
                                    start_date,
                                    end_date
                                   )

[DOWNLOADING] sgpmetE13.b1.20130101.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130102.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130103.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130104.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130105.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130106.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130107.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130108.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130109.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130110.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130111.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130112.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130113.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130114.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130115.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130116.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130117.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130118.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130119.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130120.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130121.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130122.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130123.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130124.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130125.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130126.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130127.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130128.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130129.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130130.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130131.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130201.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130202.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130203.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130204.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130205.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130206.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130207.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130208.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130209.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130210.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130211.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130212.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130213.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130214.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130215.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130216.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130217.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130218.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130218.170700.cdf
[DOWNLOADING] sgpmetE13.b1.20130219.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130220.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130221.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130222.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130223.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130224.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130225.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130226.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130227.000000.cdf
[DOWNLOADING] sgpmetE13.b1.20130228.000000.cdf



If you use these data to prepare a publication, please cite:

Kyrouac, J., Shi, Y., & Tuftedal, M. Surface Meteorological Instrumentation
(MET). Atmospheric Radiation Measurement (ARM) User Facility.
https://doi.org/10.5439/1786358
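The download cell above reads the credentials with `os.getenv`, which returns `None` without complaint when a variable is unset. As a sketch of a stricter alternative, the hypothetical helper below (`get_arm_credentials` is not part of ACT) fails early with a readable message. It takes the environment as a mapping so it can be demonstrated without real credentials.

```python
import os


def get_arm_credentials(env=os.environ):
    """Return (username, token), or raise a clear error if either is unset."""
    username = env.get("ARM_USERNAME")
    token = env.get("ARM_PASSWORD")
    missing = [name
               for name, value in (("ARM_USERNAME", username),
                                   ("ARM_PASSWORD", token))
               if not value]
    if missing:
        raise RuntimeError(
            "Set these environment variables first: " + ", ".join(missing)
        )
    return username, token


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"ARM_USERNAME": "example_user", "ARM_PASSWORD": "example_token"}
print(get_arm_credentials(demo_env))
```

In the notebook, calling `get_arm_credentials()` with no argument would read the real environment and could replace the two `os.getenv` lines.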



### Load the Data Using Xarray

In [7]:
arm_ds = xr.open_mfdataset(files,
                           combine='nested',
                           concat_dim='time',
                           chunks={'time':86400})

## Subset and Prepare Data to be Compared
We need to subset the climate model output to the grid point nearest the SGP site.

In [8]:
lat = arm_ds.lat.values[0]
lon = arm_ds.lon.values[0]
lat, lon

(36.605, -97.485)

Xarray offers this subsetting functionality, and we specify that we want the **nearest** grid point to the site.

In [9]:
cmip6_nearest = esgf_ds.cf.sel(lat=lat,
                               lon=lon,
                               method='nearest')
cmip6_nearest

Output (summarized): an `xarray.Dataset` reduced to a single grid point, whose variables are Dask arrays, each split into 7 chunks.

| Shape | Chunk shape | Data type | Size | Chunk size |
| --- | --- | --- | --- | --- |
| (1980, 2) | (300, 2) | object | 30.94 kiB | 4.69 kiB |
| (1980, 2) | (300, 2) | float64 | 30.94 kiB | 4.69 kiB |
| (1980, 2) | (300, 2) | float64 | 30.94 kiB | 4.69 kiB |
| (1980,) | (300,) | float32 | 7.73 kiB | 1.17 kiB |
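For one-dimensional latitude and longitude coordinates, nearest-neighbor selection amounts to finding, on each axis separately, the coordinate value with the smallest absolute difference from the target. The sketch below shows that idea with NumPy. The 1-degree grid with cell centers at half degrees is an assumption made for illustration (the real model grid may differ), and `nearest_index` is a hypothetical helper.

```python
import numpy as np

# Hypothetical 1-degree grid with cell centers at half degrees
grid_lat = np.arange(-89.5, 90.0, 1.0)
grid_lon = np.arange(-179.5, 180.0, 1.0)

site_lat, site_lon = 36.605, -97.485  # SGP coordinates printed above


def nearest_index(coords, value):
    """Index of the coordinate closest to value along one axis."""
    return int(np.abs(coords - value).argmin())


i_lat = nearest_index(grid_lat, site_lat)
i_lon = nearest_index(grid_lon, site_lon)
print(grid_lat[i_lat], grid_lon[i_lon])  # prints "36.5 -97.5"
```

Note that this compares differences in degrees on each axis independently rather than a true distance on the sphere, which is adequate for picking a grid cell around a single site.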


We need to convert the model's time coordinate to standard datetimes to make it easier to compare with the observations.

In [10]:
cmip6_nearest['time'] = cmip6_nearest.indexes['time'].to_datetimeindex()

Next, we select the period for which we have data from the SGP site, specified earlier in the notebook, and take monthly means.

In [11]:
cmip6_nearest = cmip6_nearest.sel(time=slice(start_date,
                                             end_date)).resample(time='1M').mean()

### Calculate Monthly Mean Temperature at SGP
We can calculate the monthly average temperature at the SGP site using the `resample` method in `Xarray`.

In [12]:
arm_ds = arm_ds.sortby('time')
sgp_monthly_mean_temperature = arm_ds.temp_mean.resample(time='1M').mean().compute().rename('tas (ARM)')
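To make the resampling step concrete, here is a minimal sketch on synthetic data. It uses pandas as a stand-in for Xarray, on the assumption that resampling a datetime-indexed series groups the samples by calendar month in the same way. The one-minute series and its values (5.0 in January, 7.0 in February) are made up.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute series covering the same two months as the download
times = pd.date_range("2013-01-01", "2013-02-28 23:59", freq="min")
values = np.where(times.month == 1, 5.0, 7.0)
series = pd.Series(values, index=times, name="temp_mean")

# Group the samples by calendar month and average each group.
# "MS" labels each group by the month's first day.
monthly = series.resample("MS").mean()
print(monthly)
```

The result has one value per month, 5.0 for January and 7.0 for February. The notebook's `'1M'` alias labels each month by its last day instead; the grouping is the same.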

We need to apply some data cleaning here too: converting the CMIP6 temperature units to degrees Celsius.

In [13]:
cmip6_monthly_mean_temperature = cmip6_nearest.tas.compute().metpy.quantify()

In [14]:
cmip6_monthly_mean_temperature = cmip6_monthly_mean_temperature.metpy.convert_units('degC').rename("tas (CMIP6)")
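MetPy handles the conversion by reading the units attached to the data. As a point of reference, the arithmetic behind a kelvin-to-Celsius conversion is a fixed offset, sketched below in plain Python. This assumes the CMIP6 `tas` values are stored in kelvin, which is typical; the function name and sample values are illustrative.

```python
KELVIN_OFFSET = 273.15  # 0 degC expressed in kelvin


def kelvin_to_celsius(temp_k):
    """Convert a temperature from kelvin to degrees Celsius."""
    return temp_k - KELVIN_OFFSET


for temp_k in (273.15, 280.0):
    print(f"{temp_k:.2f} K = {kelvin_to_celsius(temp_k):.2f} degC")
# prints "273.15 K = 0.00 degC" and "280.00 K = 6.85 degC"
```

Using a units-aware library rather than subtracting the offset by hand avoids converting data that is already in degrees Celsius.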

## Visualize the Output
Once we have our comparisons ready, we can visualize using `hvPlot`, which produces an interactive visualization!

In [15]:
esgf_plot = cmip6_monthly_mean_temperature.hvplot.bar(title='Average Surface Temperature \n near the Southern Great Plains Field Site',
                                                       xlabel='Time')
arm_plot = sgp_monthly_mean_temperature.hvplot.bar(ylabel='Average Temperature (degC)',
                                                    xlabel='Time')

esgf_plot * arm_plot

## Summary
In this notebook, we searched for and opened a CMIP6 E3SM dataset using the ESGF API and OPeNDAP, and compared it to an ARM dataset collected at the Southern Great Plains climate observatory.

### What's next?
We will see some more advanced examples of using the CMIP6 and observational data.

## Resources and references
- [ARM Surface Meteorological Handbook](https://www.arm.gov/publications/tech_reports/handbooks/met_handbook.pdf)