Skip to content

GEUS-Glaciology-and-Climate/freshwater

Repository files navigation

Table of contents

Greenland liquid water discharge from 1950 through 2021

This is the source for “Greenland liquid water discharge from 1958 through 2019” and subsequent versions.

The source for this work is hosted on GitHub at https://github.com/GEUS-PROMICE/freshwater. GitHub issues are used to collect suggested improvements to the paper or problems that made it through review. The work may be under be under active development, including updating data (and therefore tables) within the source document.

Updates since last publication

v 2023

release_2023 has the following changes from v2022-08. See 2023 Milestone, https://github.com/GEUS-Glaciology-and-Climate/freshwater/compare/release_2022…release_2023, and git log for more details

  • Added 2022 RACMO data
  • Added 2022 MAR data
  • Updated MAR data from 3.12 to 3.13
  • Updated BedMachine from v4 to v5
  • Updated ArcticDEM from v3.0 to v4.1

v 2022-10

v 2022-10 has the following changes:

  • No change to the data values
  • Data product has been reformatted to 4 NetCDF files: One per RCM (2) and domain (2), each containing all years of data
    • E.g., MAR_ice, MAR_land, RACMO_ice, RACMO_Land
  • Data files are now part of the dataset that contains the Geopackages of streams, outlets, and basins, rather than their own dataset on the dataverse
    • DOI: 10.22008/FK2/AA6MTB has been Deaccesioned
    • DOI: 10.22008/FK2/XKQVL7 now contains the discharge data
  • The README has been updated to show some additional examples using the metadata added in v3

v 2022-08

v 2022-08 has the following changes (see GitHub diff above for more details):

  • Update from BedMachine v3 to v4
  • Data now spans 1950 through 2021, instead of 1958 through 2019
  • Internal NetCDF variable renamed from ‘runoff’ to ‘discharge’
  • Significant improvement in metadata supporting better query by basin, region, or sector
  • Recognition that land runoff with depth << 0 is valid

WARNING

Related Work

Citation

@article{mankoff_2020_liquid,
  author    = {Mankoff, Kenneth D. and Noël, Brice and Fettweis, Xavier and Ahlstrøm, Andreas P. and
                  Colgan, William and Kondo, Ken and Langley, Kirsty and Sugiyama, Shin and van As,
                  Dirk and Fausto, Robert S.},
  title     = {{G}reenland liquid water discharge from 1958 through 2019},
  journal   = {Earth System Science Data},
  year 	    = 2020,
  volume    = 12,
  number    = 4,
  pages     = {2811–2841},
  month     = 11,
  DOI 	    = {10.5194/essd-12-2811-2020},
  publisher = {Copernicus GmbH}
}

Funding

DatesOrganizationProgramEffort
2023 –NASA GISSModeling Analysis and Prediction program.Maintenance
2022 –GEUSPROMICEDistribution (data hosting)
2018 – 2022GEUSPROMICEDevelopment; publication; distribution




Accessing this data

Introduction

NOTE: Data can be accessed directly from the NetCDF files. Querying the NetCDF files directly allows more advanced queries on the metadata, for example, `all outlets with Jakobshavn Isbræ as the nearest discharge gate, excluding outlets more than 5 km away`. The `5 km` filter removes stream discharge from Disko Island which has Jakobshavn Isbræ as the nearest discharge gate, but should not be counted as discharge from that basin.

As an example, it is easiest to begin working with the outlets, save subsetted data, visually check in QGIS, and then when your algorithm appears to work, apply the same query to the discharge NetCDF files. Example:

import pandas as pd
import geopandas as gp

df = pd.read_csv('./freshwater/ice/outlets.csv', index_col=0)
gdf = gp.GeoDataFrame(df, geometry=gp.points_from_xy(df['lon'],df['lat']))

# select subglacial discharge within 2.5 km of basins
gdf = gdf[(gdf['elev'] < -10) &
          (gdf['M2019_ID_dist'] < 2500)]

gdf.to_file("foo.gpkg", driver="GPKG")

Similar queries might include:

  • Pandas groupby to combine outlets per gate, basin, sector, or region
  • Examining the ice outlet location, and the downstream coastal outlet location. If the two are the same, then the outlet is marine terminating. This may give better results than querying based on the BedMachine provided elev metadata.

If you prefer to not access the NetCDF files directly, after the data have been downloaded the discharge.py script allows access to outlets, basins, and their discharge within a region of interest (ROI). The ROI can be a point, a list describing a polygon, or a file. Optionally, upstream outlets, basins, and discharge from any land outlet(s) can be included. The script can be called from the command line (CLI) or within Python.

The ROI coordinate units can be either EPSG:4326 (lon,lat) or EPSG:3413. The units for the coordinates are guessed using the range of values. If the ROI is a point, basins that contain that point are selected. Either 1 (if the point is on land) or two (ice and the downstream land, if the point is on the ice) basins are selected, and optionally, all ice basins upstream from the one land basin. If the ROI is a polygon, all outlets within the polygon are selected. The polygon does not have to be closed - a convex hull is wrapped around it. If the argument is a file (e.g. KML file) then the first polygon is selected and used.

When the script is run from the command line, CSV data is written to stdout and can be redirected to a file. When the API is accessed from within Python, if the script is used to access outlets, a GeoPandas GeoDataFrame is returned and can be used for further analysis within Python, or written to any file format supported by GeoPandas or Pandas, for example CSV, or GeoPackage for QGIS. If the script is used to access discharge, an xarray Dataset is returned, and can be used for further analysis within Python, or written to any file format supported by xarray, for example CSV or NetCDF.

Database Format

  • The cat column in the CSVs file links to the station vector in the NetCDF.

This script queries two database:

land
The land coast outlets and land basins.
ice
ice margin outlets and ice basins.

The folder structure required is a root folder (named freshwater in the examples below, but can be anything) and then a land and ice sub-folder. The geospatial files for land and ice must be in these folders (i.e. the k=1.0 Streams, Outlets, and Basins dataset from https://dataverse.geus.dk/dataverse/freshwater), along with a MAR.nc and RACMO.nc in each of the land and ice folders.

Example:

./freshwater/ice/
./freshwater/ice/basins.csv
./freshwater/ice/basins_filled.gpkg
./freshwater/ice/basins.gpkg
./freshwater/ice/MAR.nc
./freshwater/ice/outlets.csv
./freshwater/ice/outlets.gpkg
./freshwater/ice/RACMO.nc
./freshwater/ice/streams.csv
./freshwater/ice/streams.gpkg
./freshwater/land/
./freshwater/land/basins.csv
./freshwater/land/basins_filled.gpkg
./freshwater/land/basins.gpkg
./freshwater/land/MAR.nc
./freshwater/land/outlets.csv
./freshwater/land/outlets.gpkg
./freshwater/land/RACMO.nc
./freshwater/land/streams.csv
./freshwater/land/streams.gpkg

Warnings

  • The script takes a few seconds to query the outlets and basins. The script takes ~10s of seconds to query each of the discharge time series datasets. Because there may be up to 6 discharge queries (2 RCMs for each of 1 land domain + ice domain + upstream ice), it can several minutes on a fast laptop to extract the data. To track progress, do not set the quiet flag to True.
  • If a polygon includes ice outlets, and the upstream flag is set, some ice outlets, basins, and discharge may be included twice, once as a “direct” selection within the polygon and once as an upstream outlet and basin from the land polygon. Further processing by the user can remove duplicates (see examples below).
  • The id column may not be unique for multiple reasons:
    • As above, the same outlet may be included twice.
    • id’s are unique within a dataset (i.e. land, and ice), but not between datasets.
  • Due to bash command-line parsing behavior, the syntax --roi -60,60 does not work. Use --roi=-60,06.
  • Longitude is expected in degrees East, and should therefore probably be negative.
  • The cat column in the CSVs file links to the station vector in the NetCDF.
  • If possible, avoid using index-based lookups, and query based on location or station.

Requirements

See environment.yml file in Git repository, or

mamba create -n freshwater_user python=3.7 xarray=0.20.2 fiona=1.8.21 shapely=1.8.2 geopandas=0.7.0 netcdf4=1.6.0 dask=2.15.0
mamba activate freshwater_user

Examples

Command line interface

Usage Instructions

python ./discharge.py -h
usage: discharge.py [-h] --base BASE --roi ROI [-u] (-o | -d) [-q]

Discharge data access

optional arguments:
  -h, --help       show this help message and exit
  --base BASE      Folder containing freshwater data
  --roi ROI        x,y OR lon,lat OR x0,y0 x1,y1 ... xn,yn OR lon0,lat0 lon1,lat1 ... lon_n,lat_n. [lon: degrees E]
  -u, --upstream   Include upstream ice outlets draining into land basins
  -o, --outlets    Return outlet IDs (same as basin IDs)
  -d, --discharge  Return RCM discharge for each domain (outlets merged)
  -q, --quiet      Be quiet

Outlets and basins

One point

The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, only one row should be returned.

python ./discharge.py --base ./freshwater --roi=-50.5,67.2 -o -q
indexidlonlatxyelevZ2012_sectorZ2012_sector_distM2019_IDM2019_ID_distM2019_basinM2019_regionM2020_gateM2020_gate_distB2015_nameB2015_distdomainupstreamcoast_idcoast_loncoast_latcoast_xcoast_y
0121108-51.21967.153-271550-2492150462383207138035ISUNNGUATA-RUSSELLSW195193828Isunnguata Sermia45930landFalse-1-1-1

If we move 10° east to somewhere over the ice, there should be four rows: one for the land outlet and basin, and three more for the three ice scenario:

python ./discharge.py --base ./freshwater --roi=-40.5,67.2 -o -q
indexidlonlatxyelevZ2012_sectorZ2012_sector_distM2019_IDM2019_ID_distM2019_basinM2019_regionM2020_gateM2020_gate_distB2015_nameB2015_distdomainupstreamcoast_idcoast_loncoast_latcoast_xcoast_y
0126875-38.07166.330313650-2580750-187415796630HELHEIMGLETSCHERSE2319650Helheim Gletsjer11776landFalse-1-1-1
167985-38.11066.333311850-2580650-244414177630HELHEIMGLETSCHERSE2317850Helheim Gletsjer10042iceFalse126875-38.07166.330313650-2580750
Polygon covering multiple land and ice outlets

Here a polygon covers several land outlets near the end of a fjord, and several ice outlets of the nearby ice margin. In addition, we request all ice outlets upstream of all selected land basins.

We use the following simple KML file for our ROI (this can be copied-and-pasted into the Google Earth side-bar to see it). Rather than use this file with --roi=/path/to/file.kml, we use the coordinates directly, and demonstrate dropping the last coordinate because the code will wrap any polygon in a convex hull.

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
  <name>Ice and Land Sample</name>
  <Placemark>
    <name>ice and land</name>
    <LineString>
      <tessellate>1</tessellate>
      <coordinates>-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18 -51.50,66.93</coordinates>
    </LineString>
  </Placemark>
</Document>
</kml>

In this example, we query for upstream outlets, and for brevity show just the first three and last three lines.

python ./discharge.py --base ./freshwater --roi="-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18" -q -u -o | (head -n3 ;tail -n4)
indexidlonlatxyelevZ2012_sectorZ2012_sector_distM2019_IDM2019_ID_distM2019_basinM2019_regionM2020_gateM2020_gate_distB2015_nameB2015_distdomainupstreamcoast_idcoast_loncoast_latcoast_xcoast_y
0122055-50.71367.002-251250-25114502062221847122906ISUNNGUATA-RUSSELLSW195207779Isunnguata Sermia31644landFalse-1-1-1
1122222-50.73566.988-252350-2512850762236837124427ISUNNGUATA-RUSSELLSW195209355Isunnguata Sermia33360landFalse-1-1-1
20367946-49.52166.438-203950-2579550767620400SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262199999Quantum Gletsjer80065iceTrue123466-50.65266.868-250050-2526750
20468014-49.54466.419-205150-258155082562040184SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262197830Quantum Gletsjer78386iceTrue123466-50.65266.868-250050-2526750
20568056-49.53566.407-204850-2582950859620400SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262196497Quantum Gletsjer78340iceTrue123466-50.65266.868-250050-2526750

Discharge

The discharge examples here use the same code as the “outlets and basins” examples above, except we use --discharge rather than --outlet.

One point

The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, two time series should be returned: MAR_land and RACMO_land. Rather than showing results for every day from 1958 through 2019, we limit to the header and the first 10 days of June, 2012.

python ./discharge.py --base ./freshwater --roi=-50.5,67.2 -q -d | (head -n1; grep -A9 "^2012-06-01")
timeMAR_landRACMO_land
2012-06-0111.8937550.029936
2012-06-0210.1269990.001237
2012-06-038.1147530.001323
2012-06-043.9705800.000000
2012-06-050.313908-0.001191
2012-06-060.4785920.303289
2012-06-070.3301840.007452
2012-06-082.8577320.193424
2012-06-090.3084890.087070
2012-06-100.3087550.024483
  • If we move 10° east to somewhere over the ice we add two columns: One for each of the two RCMs over the ice domain.
  • If the --upstream flag is set, we add two columns: One for each of the RCMs over the upstream ice domains. Results are summed across outlets per domain.
  • Results are therefore one of the following
    • Two columns: 2 RCM * 1 land domain
    • Four columns: 2 RCM * (1 land + 1 ice domain)
    • Four columns: 2 RCM * (1 land + 1 upstream ice domain)
    • Six columns: 2 RCM * (1 land + 1 ice + 1 upstream ice domain)
Polygon covering multiple land and ice outlets

When querying using an ROI that covers multiple outlets, discharge is summed by domain. Therefore, even if 100s of outlets are within the ROI, either two columns, eight, eight, or fourteen columns are returned depending on the options.

Python API

The python API is similar to the command line interface, but rather than printing results to stdout, returns a GeoPandas GeoDataFrame of outlets, an xarray Dataset of discharge. The discharge is not summed by domain, but instead contains discharge for each outlet.

Outlets and basins

One point

The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, only one row should be returned.

from discharge import discharge 
df = discharge(base="./freshwater", roi="-50.5,67.2", quiet=True).outlets()

The df variable is a Pandas GeoDataFrame.

It includes two geometry columns

outlet
A point for the location of the outlet (also available as the x and y columns)
basin
A polygon describing this basin

Because the geometry columns do not display well in tabular form, we drop them.

df.drop(columns=["outlet","basin"])
indexidlonlatxyelevZ2012_sectorZ2012_sector_distM2019_IDM2019_ID_distM2019_basinM2019_regionM2020_gateM2020_gate_distB2015_nameB2015_distdomainupstreamcoast_idcoast_loncoast_latcoast_xcoast_y
0121108-51.218567.1535-271550-2492150462383207138035ISUNNGUATA-RUSSELLSW195193828Isunnguata Sermia45930landFalse-1nannan-1-1
Polygon covering multiple land and ice outlets

Here a polygon covers several land outlets near the end of a fjord, and several ice outlets of the nearby ice margin. In addition, we request all ice outlets upstream of all selected land basins. Results are shown in tabular form and written to geospatial file formats.

from discharge import discharge
df = discharge(base="./freshwater", roi="-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18", quiet=True, upstream=True).outlets()

View the first few rows, excluding the geometry columns

df.drop(columns=["outlet","basin"]).head()
indexidlonlatxyelevZ2012_sectorZ2012_sector_distM2019_IDM2019_ID_distM2019_basinM2019_regionM2020_gateM2020_gate_distB2015_nameB2015_distdomainupstreamcoast_idcoast_loncoast_latcoast_xcoast_y
0122055-50.71367.0017-251250-25114502062221847122906ISUNNGUATA-RUSSELLSW195207779Isunnguata Sermia31644landFalse-1nannan-1-1
1122222-50.734666.9884-252350-2512850762236837124427ISUNNGUATA-RUSSELLSW195209355Isunnguata Sermia33360landFalse-1nannan-1-1
2122251-50.774866.985-254150-2513050-162254447126179ISUNNGUATA-RUSSELLSW195209887Isunnguata Sermia34934landFalse-1nannan-1-1
3122275-50.870766.9767-258450-2513550462296827130397ISUNNGUATA-RUSSELLSW195211236Isunnguata Sermia38789landFalse-1nannan-1-1
4122285-50.856966.9764-257850-25136501562291417129862ISUNNGUATA-RUSSELLSW195211209Isunnguata Sermia38336landFalse-1nannan-1-1

View the last few rows:

Note that the domain and upstream columns can be used to subset the table.

df.drop(columns=["outlet","basin"]).tail()
indexidlonlatxyelevZ2012_sectorZ2012_sector_distM2019_IDM2019_ID_distM2019_basinM2019_regionM2020_gateM2020_gate_distB2015_nameB2015_distdomainupstreamcoast_idcoast_loncoast_latcoast_xcoast_y
20167919-49.499666.4435-202950-2578950791620406SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262200758Quantum Gletsjer81191iceTrue123466-50.651766.8677-250050-2526750
20267935-49.538566.4378-204750-2579450764620400SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262199967Quantum Gletsjer79323iceTrue123466-50.651766.8677-250050-2526750
20367946-49.520666.4375-203950-2579550767620400SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262199999Quantum Gletsjer80065iceTrue123466-50.651766.8677-250050-2526750
20468014-49.543666.419-205150-258155082562040184SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262197830Quantum Gletsjer78386iceTrue123466-50.651766.8677-250050-2526750
20568056-49.534666.4068-204850-2582950859620400SAQQAP-MAJORQAQ-SOUTHTERRUSSEL_SOUTHQUARUSSELSW262196497Quantum Gletsjer78340iceTrue123466-50.651766.8677-250050-2526750

Finally, write data to various file formats. GeoPandas DataFrames can only have one geometry, so we must select one and drop the other before writing the file.

df.drop(columns=["outlet","basin"]).to_csv("outlets.csv")
df.set_geometry("outlet").drop(columns="basin").to_file("outlets.gpkg", driver="GPKG")
df.set_geometry("basin").drop(columns="outlet").to_file("basins.gpkg", driver="GPKG")

Discharge

The code here is the same as above from the “Outlets and basins” section, but we call discharge() rather than outlets().

One point

The simplest example is a point, in this case near the Watson River outlet. Because we select one point over land and do not request upstream outlets and basins, only one row should be returned.

from discharge import discharge
ds = discharge(base="./freshwater", roi="-50.5,67.2").discharge()

Print the xarray Dataset:

print(ds)
<xarray.Dataset>
Dimensions:     (land: 1, time: 26663)
Coordinates:
  * time        (time) datetime64[ns] 1950-01-01 1950-01-02 ... 2022-12-31
  * land        (land) uint64 121108
Data variables:
    MAR_land    (time, land) float64 0.0007218 0.0007235 ... 0.6995 0.7007
    RACMO_land  (time, land) float64 nan nan nan nan ... 0.1555 0.1591 0.1549

Display the time series. Unlike the command line interface, here the outlets are not merged.

ds.sel(time=slice('2012-06-01','2012-06-10')).to_dataframe()
MAR_landRACMO_land
(121108, Timestamp(‘2012-06-01 00:00:00’, freq=’D’))11.89380.029936
(121108, Timestamp(‘2012-06-02 00:00:00’, freq=’D’))10.1270.00123702
(121108, Timestamp(‘2012-06-03 00:00:00’, freq=’D’))8.114750.00132286
(121108, Timestamp(‘2012-06-04 00:00:00’, freq=’D’))3.970580
(121108, Timestamp(‘2012-06-05 00:00:00’, freq=’D’))0.313908-0.0011907
(121108, Timestamp(‘2012-06-06 00:00:00’, freq=’D’))0.4785920.303289
(121108, Timestamp(‘2012-06-07 00:00:00’, freq=’D’))0.3301840.00745243
(121108, Timestamp(‘2012-06-08 00:00:00’, freq=’D’))2.857730.193424
(121108, Timestamp(‘2012-06-09 00:00:00’, freq=’D’))0.3084890.0870701
(121108, Timestamp(‘2012-06-10 00:00:00’, freq=’D’))0.3087550.0244829

In order to merge the outlets, select all coordinates that are not time and merge them. Also, apply a rolling mean:

dims = [_ for _ in ds.dims.keys() if _ != 'time']  # get all dimensions except the time dimension
ds.sum(dim=dims)\
  .rolling(time=7)\
  .mean()\
  .sel(time=slice('2012-06-01','2012-06-10'))\
  .to_dataframe()
timeMAR_landRACMO_land
2012-06-01 00:00:0030.6441.39377
2012-06-02 00:00:0031.10311.2407
2012-06-03 00:00:0027.59090.458691
2012-06-04 00:00:0021.04250.157925
2012-06-05 00:00:0014.34860.0893565
2012-06-06 00:00:008.402020.0880673
2012-06-07 00:00:005.032680.0488637
2012-06-08 00:00:003.741820.0722192
2012-06-09 00:00:002.339180.084481
2012-06-10 00:00:001.224030.0877896
Polygon covering multiple land and ice outlets

Here a polygon covers several land outlets near the end of a fjord, and several ice outlets of the nearby ice margin. In addition, we request all ice outlets upstream of all selected land basins.

from discharge import discharge
ds = discharge(base="./freshwater", roi="-51.50,66.93 -51.21,66.74 -49.44,66.91 -49.84,67.18", quiet=True, upstream=True).discharge()

What are the dimensions (i.e. how many outlets in each domain?)

print(ds)
<xarray.Dataset>
Dimensions:             (ice: 33, ice_upstream: 85, land: 88, time: 26663)
Coordinates:
  * ice_upstream        (ice_upstream) uint64 66407 66414 66416 ... 68014 68056
  * time                (time) datetime64[ns] 1950-01-01 ... 2022-12-31
  * land                (land) uint64 122055 122222 122251 ... 123897 123926
  * ice                 (ice) uint64 66425 66427 66444 ... 66595 66596 66639
Data variables:
    MAR_land            (time, land) float64 0.0002109 1.244e-06 ... 0.005236
    MAR_ice             (time, ice) float64 2.94e-16 2.026e-17 ... 2.785e-18
    RACMO_land          (time, land) float64 nan nan nan ... 0.001346 0.1365
    RACMO_ice           (time, ice) float64 nan nan nan ... 0.0001123 0.004071
    MAR_ice_upstream    (time, ice_upstream) float64 1.261e-17 ... 1.855e-17
    RACMO_ice_upstream  (time, ice_upstream) float64 nan nan ... 5.79e-05

With these results:

  • Sum all outlets within each domain
  • Drop the land discharge and the upstream domains (keep only ice discharge explicitly within our ROI)
  • Apply a 5-day rolling mean
  • Plot 2012 discharge season
d = [_ for _ in ds.dims.keys() if _ != 'time'] # dims for summing (don't sum time dimension)
v = [_ for _ in ds.data_vars if ('land' in _) | ('_u' in _)] # vars containing '_u'

r = ds.sum(dim=d)\
      .drop_vars(v)\
      .rolling(time=5).mean()

import matplotlib.pyplot as plt
plt.style.use('seaborn')

for d in r.data_vars: r[d].sel(time=slice('2012-04-01','2012-11-15')).plot(drawstyle='steps', label=d)
_ = plt.legend()
plt.savefig("./fig/api_example.png", bbox_inches='tight')

./fig/api_example.png