# Visualizing emission datasets in Python

The goal of this Notebook is to download VOC emission datasets and compare them by plotting. We will focus on datasets from the EU SEEDS project (https://www.seedsproject.eu/) and on datasets available from the ECCAD data portal (https://eccad.aeris-data.fr/). We start with biogenic VOC emission datasets, focusing on isoprene.

## Initial setup

To start, we will install and import some libraries that we will need throughout this Notebook.

In [None]:
!pip install cartopy xarray[complete] zarr==2.18.3 numcodecs==0.15.1

In [None]:
# Core scientific stack
import numpy as np
import xarray as xr
import matplotlib as mpl
import matplotlib.pyplot as plt

# Mapping
import cartopy.crs as ccrs
import cartopy.feature as cfeature

# Utilities
from mpl_toolkits.axes_grid1 import make_axes_locatable
import fsspec  # for cloud data access
import netCDF4  # netCDF support

# increases the standard font size in figures
plt.rcParams.update({'font.size': 18})

## Download SEEDS data

Next, we will download SEEDS datasets from the seedsproject.eu website, whose data portal explains how to access the data in Python. The data are stored in Zarr format (.zarr), which xarray can open directly through its Zarr backend. `xr.open_zarr` opens the store lazily, so data are only read when they are actually needed (for example, when a subset is saved to disk). Using the URL provided on the website, we can access the SEEDS top-down isoprene emission dataset.

In [None]:
# Open remote dataset
url = "https://data.seedsproject.eu/seeds_top-down-isoprene-emissions_bira-iasb_20180101-20221231_magritte_v2/slices.zarr"
store = fsspec.get_mapper(url)
ds = xr.open_zarr(store = store)
ds

The dataset contains latitude, longitude, and time coordinates, as well as the isoprene flux. We can select subsets of the data and save them locally in netCDF format.

In [None]:
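CONFIG = {
    "data_dir": "",  # output directory ("" = current working directory)
    "time_range": slice("2019-01-01", "2019-12-31"),
    "lat_bounds": slice(40, 45),  # if latitude is stored north to south, use slice(45, 40) instead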
    "lon_bounds": slice(0, 5),
}
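Label-based slicing with xarray's `.sel` follows the stored order of the coordinate, so `slice(40, 45)` on a latitude axis that runs from north to south returns an empty selection. The `borders` definition further down uses `lat_seeds[-1]` as the southern edge, which hints (we have not verified this) that the SEEDS latitudes may be stored in descending order. A minimal sketch of a hypothetical helper, `ordered_slice`, that orders the bounds to match the coordinate:

```python
import numpy as np

def ordered_slice(coord, lower, upper):
    """Return slice(lower, upper) ordered to match the direction of `coord`.

    xarray's label-based `.sel(...=slice(a, b))` follows the stored order of
    the coordinate, so a descending axis needs the bounds reversed.
    """
    coord = np.asarray(coord)
    if coord.size > 1 and coord[0] > coord[-1]:
        return slice(upper, lower)  # descending coordinate, e.g. north to south
    return slice(lower, upper)      # ascending coordinate

# Hypothetical latitude axis running from north to south
lat_desc = np.arange(70.0, 30.0, -0.5)
print(ordered_slice(lat_desc, 40, 45))        # slice(45, 40, None)
print(ordered_slice(lat_desc[::-1], 40, 45))  # slice(40, 45, None)
```

With the remote dataset open, this could be used as `CONFIG["lat_bounds"] = ordered_slice(ds.latitude.values, 40, 45)`.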

In [None]:
# Subset a time range, save as NetCDF
time_subset = ds.isoprene_flux.sel(time=CONFIG["time_range"])
time_subset.to_netcdf(CONFIG["data_dir"]+'isoprene_flux_time_subset.nc')

# Subset a time and geographical range, save as NetCDF
time_geo_subset = ds.isoprene_flux.sel(time=CONFIG["time_range"], latitude=CONFIG["lat_bounds"], longitude=CONFIG["lon_bounds"])
time_geo_subset.to_netcdf(CONFIG["data_dir"]+'isoprene_flux_time_geo_subset.nc')


Let's open the saved files and look at their contents.

In [None]:
nc = netCDF4.Dataset(CONFIG["data_dir"]+'isoprene_flux_time_geo_subset.nc','r')
nc

In [None]:
nc = netCDF4.Dataset(CONFIG["data_dir"]+'isoprene_flux_time_subset.nc','r')
nc.variables['isoprene_flux']

## Plot SEEDS data

The isoprene flux dataset contains daily emissions over the European domain, in units of 1e10 molec. cm-2 s-1. Let's plot the 2019 mean of this dataset on a map.

In [None]:
lat_seeds = nc.variables['latitude'][:]
lon_seeds = nc.variables['longitude'][:]
time = nc.variables['time'][:]
flux_seeds = nc.variables['isoprene_flux'][:,:,:]

Next, we define the plotting routine.

In [None]:
def plotmap_cartopy(lonplot, latplot, borders, plotdata, label, colormap, vmin, vmax, steps, lognorm):
    # Plot a 2D field on a regular longitude/latitude grid onto a cartopy map.
    # borders: map extent [lon_min, lon_max, lat_min, lat_max]; label: colorbar label;
    # colormap: matplotlib colormap name; vmin/vmax: colorbar limits;
    # steps: number of discrete colour levels; lognorm: use a logarithmic colour scale.
    
    fig, ax = plt.subplots(1, 1, figsize=(10,10),
                    subplot_kw={'projection': ccrs.PlateCarree()})

    ax.set_extent(borders, crs=ccrs.PlateCarree())
    ax.coastlines()
    gls = ax.gridlines(draw_labels = True)
    gls.top_labels = False
    gls.right_labels = False
    ax.add_feature(cfeature.BORDERS)

    # Discretize the colormap into `steps` levels; for turbo, skip the lowest 10% of the colormap
    cmap = plt.get_cmap(colormap)
    if colormap == 'turbo':
        lb = 0.1
    else:
        lb = 0
    cmap = cmap(np.linspace(lb,1,steps))
    cmap = mpl.colors.ListedColormap(cmap)

    # plot the data
    lonplotmap, latplotmap = np.meshgrid(lonplot, latplot)
    if lognorm:
        h = ax.pcolormesh(lonplotmap,latplotmap,plotdata,cmap = cmap, transform=ccrs.PlateCarree(), norm=mpl.colors.LogNorm(vmin=vmin, vmax=vmax))
    else:
        h = ax.pcolormesh(lonplotmap,latplotmap,plotdata,cmap = cmap, transform=ccrs.PlateCarree(), vmin=vmin, vmax=vmax)

    # plot the colorbar
    divider = make_axes_locatable(ax)
    ax_cb = divider.new_horizontal(size="3%", pad=0.1, axes_class=plt.Axes)
    cbar = plt.colorbar(h, label=label, cax=ax_cb, orientation ='vertical')
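    # Label the decades on the log colorbar; ticks are hard-coded for vmax = 10 (0.1-10), otherwise 1-100
    if lognorm: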
        if vmax == 10:
            ax_cb.set_yticks([0.1,1,10], ['0.1','1','10'], minor = False)
        else:
            ax_cb.set_yticks([1,10,100], ['1','10','100'], minor = False)
    fig.add_axes(ax_cb)
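The colorbar ticks in `plotmap_cartopy` are hard-coded for two ranges. A minimal sketch of a hypothetical helper, `log_decade_ticks`, that derives the decade ticks from `vmin` and `vmax` instead:

```python
import numpy as np

def log_decade_ticks(vmin, vmax):
    """Powers of ten within [vmin, vmax] and matching tick labels."""
    # Small tolerance guards against log10 round-off at exact powers of ten
    lo = int(np.ceil(np.log10(vmin) - 1e-9))
    hi = int(np.floor(np.log10(vmax) + 1e-9))
    ticks = 10.0 ** np.arange(lo, hi + 1)
    labels = [f"{t:g}" for t in ticks]
    return ticks, labels

print(log_decade_ticks(1, 100))   # ticks 1, 10, 100 with labels '1', '10', '100'
print(log_decade_ticks(0.1, 10))  # ticks 0.1, 1, 10 with labels '0.1', '1', '10'
```

Inside the `if lognorm:` branch of `plotmap_cartopy`, the two hard-coded cases could then be replaced by `ticks, labels = log_decade_ticks(vmin, vmax)` followed by `ax_cb.set_yticks(ticks, labels, minor=False)`.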

In [None]:
# average the isoprene flux over the time dimension (all days of 2019)
flux_seeds_av = np.nanmean(flux_seeds, axis = 0)

vmin = 1 # lower edge of colorbar
vmax = 100 # upper edge of colorbar
steps = 20 # number of steps in colorbar
lognorm = True # logarithmic colorbar
colormap = 'turbo' # colormap
borders = [lon_seeds[0], lon_seeds[-1], lat_seeds[-1], lat_seeds[0]] # edges of the plot domain
figlabel = 'Isoprene emissions \n 1e10 molec. cm-2 s-1'

# this function plots the map
plotmap_cartopy(lon_seeds, lat_seeds, borders, flux_seeds_av, figlabel, colormap, vmin, vmax, steps, lognorm)
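The map above averages `flux_seeds` over all days of 2019, whereas the CAMS map further down shows June 2019 only. For a like-for-like comparison, one can average the SEEDS data over June alone. A minimal sketch, assuming the daily series starts on 1 January 2019 with no missing days; `month_slice` is a hypothetical helper:

```python
import numpy as np

# Days per month in a non-leap year such as 2019
DAYS_PER_MONTH = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

def month_slice(month_index, days_per_month=DAYS_PER_MONTH):
    """Slice of daily time indices for a zero-based month index (5 = June).

    Assumes the daily series starts on 1 January and has no missing days.
    """
    start = int(np.sum(days_per_month[:month_index]))
    return slice(start, start + int(days_per_month[month_index]))

june = month_slice(5)
print(june)  # slice(151, 181, None)

# Synthetic daily field whose value equals the day index (365 days x 2 lat x 3 lon)
daily = np.arange(365, dtype=float)[:, None, None] * np.ones((1, 2, 3))
june_mean = np.nanmean(daily[june], axis=0)
print(june_mean[0, 0])  # 165.5, the mean of day indices 151..180
```

Applied here, this would be `np.nanmean(flux_seeds[month_slice(5)], axis=0)`, passed to `plotmap_cartopy` in place of `flux_seeds_av`. If the time coordinate is decoded as datetimes, xarray's partial date-string selection (e.g. `ds.isoprene_flux.sel(time="2019-06").mean("time")`) should give the same result without counting days.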

## Download CAMS data

At this point we have downloaded and plotted the SEEDS top-down isoprene emission dataset. As a next step, it is interesting to see how it compares to bottom-up emissions from CAMS. Biogenic VOC emissions are available from the CAMS-GLOB-BIO dataset. The latest published version of this dataset is v3.1 (Sindelarova et al. 2022), which can be accessed through the ECCAD data portal (https://eccad.aeris-data.fr/). After logging in, go to the data download section and select CAMS --> CAMS-GLOB-BIO --> v3.1 --> isoprene. Obtain a download URL for the zip file, then download and extract it:

In [None]:
import requests, zipfile, io

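# Once the download URL is obtained, insert it below
zip_file_url = 'https://api.sedoo.fr/eccad-download-rest/public/links/##########'

r = requests.get(zip_file_url, timeout=300)  # timeout in seconds
r.raise_for_status()  # stop with an error if the download failed (e.g. an invalid link)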
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(CONFIG["data_dir"])

Now the netCDF dataset has been downloaded and extracted to the target folder. Let's check its contents.

In [None]:
nc = netCDF4.Dataset(CONFIG["data_dir"]+'CAMS-GLOB-BIO_Glb_0.25x0.25_bio_isoprene_v3.1_monthly/CAMS-GLOB-BIO_Glb_0.25x0.25_bio_isoprene_v3.1_monthly_2019.nc','r')
print(nc)
print(nc.variables['emiss_bio'])
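The file path above is hard-coded. As a hedged alternative, one could search the extracted folder for the file of a given year. This sketch assumes the extracted file names end in `_<year>.nc`, as the 2019 file above does; `find_cams_file` is a hypothetical helper, demonstrated offline with empty placeholder files:

```python
import tempfile
from pathlib import Path

def find_cams_file(data_dir, year):
    """Return the single netCDF file below data_dir whose name ends in '_<year>.nc'."""
    matches = sorted(Path(data_dir or ".").rglob(f"*_{year}.nc"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one file for {year}, found {len(matches)}: {matches}")
    return matches[0]

# Offline demonstration with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "demo_monthly"
    folder.mkdir()
    for yr in (2018, 2019):
        (folder / f"demo_isoprene_{yr}.nc").touch()
    found_name = find_cams_file(tmp, 2019).name
print(found_name)  # demo_isoprene_2019.nc
```

In this Notebook it could be used as `nc = netCDF4.Dataset(str(find_cams_file(CONFIG["data_dir"], 2019)), 'r')`.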

The dataset consists of monthly mean isoprene emissions, so each yearly file contains 12 time steps. To select a single month (e.g. June), we take time index 5 (indices start at 0). Additionally, the units are kg m-2 s-1, so we convert them to 1e10 molec. cm-2 s-1 to match the SEEDS data.

In [None]:
lat_cams = nc.variables['lat'][:]
lon_cams = nc.variables['lon'][:]
flux_cams = nc.variables['emiss_bio']

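molec_weight = 68.0 # molecular weight of isoprene (g/mol)
Nav = 6.022e23 # molecules per mole
# kg -> g (x1e3), g -> mol (/molec_weight), mol -> molec (xNav),
# m-2 -> cm-2 (/1e4), molec -> 1e10 molec (/1e10)
unit_conversion = (1e3*Nav)/(1e4*1e10*molec_weight)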

flux_cams_mon = unit_conversion*flux_cams[5,:,:]
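The conversion factor above chains several unit steps. A small sketch that spells out each step and checks it on a single value (1e-10 kg m-2 s-1, an illustrative number, not taken from the data); the function name is hypothetical and 68.0 g/mol is the same rounded molecular weight used above:

```python
import numpy as np

AVOGADRO = 6.022e23  # molecules per mole
M_ISOPRENE = 68.0    # g/mol, same rounded value as molec_weight above (C5H8 is about 68.1 g/mol)

def kg_m2_s_to_1e10_molec_cm2_s(flux):
    """Convert a flux from kg m-2 s-1 to units of 1e10 molec. cm-2 s-1."""
    g_per_kg = 1e3
    cm2_per_m2 = 1e4
    grams = np.asarray(flux) * g_per_kg          # g m-2 s-1
    molecules = grams / M_ISOPRENE * AVOGADRO    # molec m-2 s-1
    return molecules / cm2_per_m2 / 1e10         # 1e10 molec cm-2 s-1

print(kg_m2_s_to_1e10_molec_cm2_s(1e-10))  # about 8.86
```

The result matches multiplying by `unit_conversion`, i.e. one kg m-2 s-1 corresponds to roughly 8.86e10 units of 1e10 molec. cm-2 s-1.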

## Plot CAMS data

We plot this dataset in the same way as the SEEDS dataset before.

In [None]:
plotmap_cartopy(lon_cams, lat_cams, borders, flux_cams_mon, figlabel, colormap, vmin, vmax, steps, lognorm)

Visual inspection suggests that the CAMS-GLOB-BIO emissions for June 2019 are much lower than the SEEDS top-down emissions. Note that the SEEDS map above shows the 2019 annual mean rather than June, so the two maps are not a strict like-for-like comparison. To quantify the discrepancy, we can calculate the total emissions over a specific domain and compare them in a time series. Let's take the emissions over the Iberian Peninsula, between 35 and 45 degrees north and between 0 and 10 degrees west.
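Computing a regional total requires the surface area of each grid cell, which on a regular latitude-longitude grid is R^2 (sin(lat_north) - sin(lat_south)) dlon, with angles in radians. A vectorized sketch of that formula (the name `cell_areas` is hypothetical), with a sanity check that the cells of a global 0.25 degree grid add up to the area of the sphere, 4 pi R^2:

```python
import numpy as np

R_EARTH_M = 6.371e6  # mean Earth radius in metres

def cell_areas(lat, lon, radius=R_EARTH_M):
    """Area of each cell (lat x lon array) of a regular grid given by cell centres in degrees."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    dlat = abs(lat[1] - lat[0])
    dlon = abs(lon[1] - lon[0])
    # area of one cell per latitude band: R^2 * (sin(north) - sin(south)) * dlon
    band = radius**2 * (np.sin(np.deg2rad(lat + dlat / 2))
                        - np.sin(np.deg2rad(lat - dlat / 2))) * np.deg2rad(dlon)
    # the area only depends on latitude, so repeat it along longitude
    return np.broadcast_to(band[:, None], (lat.size, lon.size)).copy()

# Sanity check on a global 0.25 degree grid: the cells should tile the sphere
lat_g = np.arange(-89.875, 90, 0.25)
lon_g = np.arange(-179.875, 180, 0.25)
areas = cell_areas(lat_g, lon_g)
print(areas.sum() / (4 * np.pi * R_EARTH_M**2))  # close to 1.0
```

With the radius in metres this gives m2 (as needed for `flux_cams`); passing `radius=6.371e8` gives cm2 for the SEEDS fluxes. Because `dlat` is taken as an absolute value, the function also works for latitudes stored north to south.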

In [None]:
###### CAMS-GLOB-BIO ######
# indices of lats and lons in Iberian box
ilats = np.argwhere((35<=lat_cams)&(45>lat_cams)).flatten()
ilons = np.argwhere((-10<=lon_cams)&(0>lon_cams)).flatten()


# It is more interesting to derive the monthly total emissions in units of Tg per month over the domain. 
# To do this, we need to multiply each gridcell with its surface area and with seconds per month.
# flux_cams is in kg m-2 s-1, so use the appropriate units.
Rearth2 = 6.371e6**2 # squared Earth radius in m2 (flux_cams is per m2)
deg_to_rad = np.pi/180.
ndaymon = np.array([31,28,31,30,31,30,31,31,30,31,30,31]) # days per month (2019 is not a leap year)
secday = 3600*24 # seconds per day
constant = 1e-9 # to go from kg to Tg

# grid-cell area = R^2 * (sin(lat_north) - sin(lat_south)) * dlon, with angles in radians
surface_area_cams = np.empty((len(lat_cams), len(lon_cams)))
delta_lon = np.abs(lon_cams[1]-lon_cams[0])
delta_lat = np.abs(lat_cams[1]-lat_cams[0])
for ilat in range(len(lat_cams)):
    area = Rearth2*(np.sin((lat_cams[ilat]+delta_lat*0.5)*deg_to_rad)-np.sin((lat_cams[ilat]-delta_lat*0.5)*deg_to_rad))*delta_lon*deg_to_rad
    surface_area_cams[ilat,:] = area # the area only depends on latitude


# now we can calculate the total fluxes (read all 12 months into memory first)
flux_cams = flux_cams[:,:,:]

total_flux_cams = secday * ndaymon * constant * np.nansum(flux_cams[np.ix_(np.arange(flux_cams.shape[0]),ilats,ilons)]*surface_area_cams[np.ix_(ilats,ilons)],axis = (1,2))

###### SEEDS #####
# do the same for the SEEDS coordinate system
# flux_seeds is in 1e10 molec. cm-2 s-1, so adapt the units and constants
Rearth2 = 6.371e8**2 # squared Earth radius in cm2 (flux_seeds is per cm2)
constant = molec_weight*1e10*1e-12/Nav # 1e10 molec -> molec -> mol -> g -> Tg


ilats = np.argwhere((35<=lat_seeds)&(45>lat_seeds)).flatten()
ilons = np.argwhere((-10<=lon_seeds)&(0>lon_seeds)).flatten()

surface_area_seeds = np.empty((len(lat_seeds), len(lon_seeds)))
delta_lon = np.abs(lon_seeds[1]-lon_seeds[0])
delta_lat = np.abs(lat_seeds[1]-lat_seeds[0])
for ilat in range(len(lat_seeds)):
    area = Rearth2*(np.sin((lat_seeds[ilat]+delta_lat*0.5)*deg_to_rad)-np.sin((lat_seeds[ilat]-delta_lat*0.5)*deg_to_rad))*delta_lon*deg_to_rad
    surface_area_seeds[ilat,:] = area # the area only depends on latitude


# Sum the daily fluxes within each month. This assumes flux_seeds starts on
# 1 January 2019 and contains every day of the year.
# np.ix_ selects the full lat x lon box (indexing with ilats and ilons directly
# would pair them element by element), and the flux is multiplied elementwise
# by the cell area before summing.
total_flux_seeds = np.empty(12)
day_in_year = 0
for im in range(len(ndaymon)):
    days = np.arange(day_in_year, day_in_year+ndaymon[im])
    flux_box = flux_seeds[np.ix_(days, ilats, ilons)]
    total_flux_seeds[im] = secday * constant * np.nansum(flux_box*surface_area_seeds[np.ix_(ilats, ilons)])
    day_in_year = day_in_year + ndaymon[im]
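Two NumPy details matter when summing a flux over a box. Indexing with two integer arrays, as in `a[:, ilats, ilons]`, pairs them element by element instead of selecting the full box, whereas `np.ix_` builds the outer product of the indices. And `np.tensordot(a, b, axes=0)` is an outer product, not an elementwise multiplication. A minimal sketch on toy arrays, with a hypothetical helper `monthly_totals` that leaves unit constants to the caller:

```python
import numpy as np

def monthly_totals(daily_flux, cell_area, days_per_month, seconds_per_day=86400.0):
    """Per-month totals of a daily flux field (time x lat x lon).

    Each day contributes flux * cell_area * seconds_per_day; unit-conversion
    constants (e.g. to Tg) are left to the caller.
    """
    totals = np.empty(len(days_per_month))
    start = 0
    for m, ndays in enumerate(days_per_month):
        chunk = daily_flux[start:start + ndays]  # days belonging to month m
        # elementwise product; cell_area (lat x lon) broadcasts over the day axis
        totals[m] = seconds_per_day * np.nansum(chunk * cell_area)
        start += ndays
    return totals

# Toy example: 4 days on a 3 x 5 grid, box = rows 0-1 and columns 1-3
ilats_demo = np.array([0, 1])
ilons_demo = np.array([1, 2, 3])
flux_demo = np.ones((4, 3, 5))
area_demo = np.full((3, 5), 2.0)

# np.ix_ builds the outer product of the index arrays, i.e. the full box
box_flux = flux_demo[np.ix_(np.arange(4), ilats_demo, ilons_demo)]  # shape (4, 2, 3)
box_area = area_demo[np.ix_(ilats_demo, ilons_demo)]                # shape (2, 3)
print(monthly_totals(box_flux, box_area, [1, 3], seconds_per_day=1.0))  # [12. 36.]

# By contrast, tensordot with axes=0 is an outer product, not an elementwise one
print(np.tensordot(np.ones(2), np.ones(3), axes=0).shape)  # (2, 3)
```

For the SEEDS data, a call such as `secday`-based totals would look like `constant * monthly_totals(flux_seeds[:, ilats[:, None], ilons], surface_area_seeds[np.ix_(ilats, ilons)], ndaymon)`, under the same assumption that the series starts on 1 January with no gaps.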


## Plot time series

Next, we define a plotting routine for the time series.

In [None]:
def plot_timeseries(xvals, data, datalabels, colorlist, xlabel, ylabel, xticklabels, figname):
    # Plot several datasets (the rows of `data`) as time series in one figure.
    # xvals sets the x-coordinates; xticklabels replaces them with month names.
    # figname is only used if the savefig line below is uncommented.

    fig, ax = plt.subplots(1,1,figsize =(len(xvals),8))

    for n in range(len(datalabels)):
        ax.plot(xvals, data[n,:], color = colorlist[n], marker = 'o', markersize = 16, linestyle ='-', lw = 4, label = datalabels[n])

    ax.set_xlim(-0.8, xvals[-1]+0.8)
    ax.set_ylim(0.,1.1*(np.nanmax(data)))

    ax.set_xticks(ticks = xvals, labels = xticklabels)

    ax.tick_params(axis = 'x', which='major', direction = 'in', length = 5, width = 2, top = True)
    ax.tick_params(axis = 'y', which='major', direction = 'in', length = 5, width = 2, right = True)

    # plot gridlines to separate the months
    minor_locator = mpl.ticker.AutoMinorLocator(2)
    ax.xaxis.set_minor_locator(minor_locator)
    ax.grid(which='minor')

    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

    ax.legend(loc=1,fontsize=24)

    # fig.savefig(figname,dpi=300,format='png',bbox_inches='tight') # to save the figure as png in high resolution
    plt.show()
    plt.close()


In [None]:
# definitions for the time series plot
xvals = np.arange(12)
data = np.array([total_flux_cams, total_flux_seeds])
datalabels = ['CAMS-GLOB-BIO v3.1', 'SEEDS top-down']
colorlist = ['C0', 'C1']
xlabel = 'Months'
ylabel = 'Isoprene emissions (Tg month-1)'
xticklabels = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D']
figname = 'isoprene_timeseries.png'

plot_timeseries(xvals, data, datalabels, colorlist, xlabel, ylabel, xticklabels, figname)
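To put a number on the gap seen in the time series, one could compare annual totals and the month-by-month ratio of the two series. A minimal sketch with a hypothetical helper, `compare_totals`, shown here on made-up numbers rather than the Notebook's results:

```python
import numpy as np

def compare_totals(reference, other):
    """Annual sums of two monthly series and their month-by-month ratio other/reference."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # NaN wherever the reference month is not positive
        ratio = np.where(reference > 0, other / reference, np.nan)
    return reference.sum(), other.sum(), ratio

# Made-up numbers, for illustration only
ref_total, other_total, ratio = compare_totals([1.0, 2.0, 0.0], [2.0, 3.0, 1.0])
print(ref_total, other_total, ratio)
```

Applied to this Notebook, the call would be `compare_totals(total_flux_cams, total_flux_seeds)`, giving the CAMS and SEEDS annual totals over the Iberian box and the SEEDS/CAMS ratio for each month.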