Add the Pan-Arctic Catchment Database layer #29

robyngit · 2022-09-20T17:54:32Z

Paper: The Pan-Arctic Catchment Database (ARCADE) (pre-print)
Data: via dataverse - currently "under review" & not available to download
DOI: https://doi.org/10.5194/essd-2022-269

robyngit · 2022-09-30T16:18:37Z

Note: Anna has contacted Niek and Gustaf to request access to the data

robyngit · 2023-02-03T20:35:42Z

@julietcohen is this one the same as #36?

julietcohen · 2023-02-06T17:40:27Z

@robyngit Yes, it is, oops! Let's retain this issue, and I'll close the other one. Now that the paper has been published and the datasets have been archived, here is more information (copied over from the duplicate issue):

Anna suggested adding the pan-ARctic CAtchment DatabasE (ARCADE) which represents watershed boundaries. This dataset is publicly available on DataverseNL. The accompanying paper is here.
The metadata for the dataset is not very specific about the differences between the files ARCADE_V1_36_1KM.ZIP and ARCADE_V1_37_1KM.ZIP. Reading in these shapefiles should reveal which is more applicable for the PDG, if not both.
ARCADE_V1_36_1KM.ZIP is 123.9 MB and described as:
"Shapefile of catchments draining into the Arctic Ocean, Strahler order 5 or up and at least 1km^2 area size. Now with mean and Sen-slope of annual maximum MOD13A1 v006 NDVI (2000-2021) instead of August mean."
ARCADE_V1_37_1KM.ZIP is 125.8 MB and described as:
"ARCADE_v1_37_1km, now includes CGLS-LC100v3 land cover product data."
Each has an excel sheet that describes the variables.

julietcohen · 2023-02-08T22:24:57Z

Initial Dataset Exploration

See datateam: /home/jcohen/PDG_ARCADE_layer/data_explore.ipynb

ARCADE_v1_36_1km.shp

large shapefile: took 19 minutes to read in with geopandas
~47,000 rows, 261 columns
columns are described in the metadata excel file associated with this file, in tabs
geometry column contains two geometry types: some are POLYGON and some are MULTIPOLYGON

Plot:

ARCADE_v1_37_1km.shp

took 23 minutes to read in with geopandas
columns are described in the metadata excel file associated with this file, in tabs
geometry column also contains two geometry types

Plot:

So cool!

Using geopandas difference() reveals their geometry columns are the same:

diff1 = data_36['geometry'].difference(data_37['geometry'])
diff2 = data_37['geometry'].difference(data_36['geometry'])
# both output GeoSeries contain only one unique value: POLYGON EMPTY
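
The same check can be sketched on toy data. This is a minimal illustration, not the notebook code: the two small GeoSeries below are hypothetical stand-ins for data_36['geometry'] and data_37['geometry'], and the variable names are made up. It assumes both series share the same row order, because difference() compares geometries row by row.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy stand-ins for data_36['geometry'] and data_37['geometry'] (hypothetical data).
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
triangle = Polygon([(2, 0), (3, 0), (2, 1)])
geoms_a = gpd.GeoSeries([square, triangle])
geoms_b = gpd.GeoSeries([square, triangle])

# difference() is applied row by row (aligned on the index), so an empty result
# in both directions means each pair of rows covers the same area.
diff_ab = geoms_a.difference(geoms_b)
diff_ba = geoms_b.difference(geoms_a)
same_by_difference = bool(diff_ab.is_empty.all() and diff_ba.is_empty.all())

# geom_equals() runs the same row-wise comparison directly.
same_by_equals = bool(geoms_a.geom_equals(geoms_b).all())

print(same_by_difference, same_by_equals)
```

Checking is_empty on every row makes the result explicit instead of relying on reading the printed output, and geom_equals() gives a second, independent confirmation.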

robyngit · 2024-03-11T15:56:14Z

Niek Jesse Speetjens reached out today to ensure that we're aware that the paper (originally pre-print) is now published and the data are publicly available. He also mentioned that he's considering coming up with a second version of this dataset at some point.

julietcohen · 2024-03-11T18:03:20Z

@robyngit Thanks for responding to Niek's email and connecting us

I took another quick look at this dataset to see if I have any questions before I email Niek back.

There are 261 attributes for this data. Some are cryptically named, while for others we can kinda guess what they represent (examples: pf_frac, area_km2, iwp_frac, soilt_23, soilt_24, terrain_0). Unfortunately, I do not see any attribute descriptions in the metadata. I'll reach out to Niek to let him know I don't know where to find them.
some of these attributes have NA values like the terrain_# ones
we can start with visualizing pf_frac or area_km2 since both seem to be relevant to the PDG and we know the units of both.
the range of pf_frac is [0, 1], has 1 NA value
the range of area_km2 is [1.004, 3117158.519], has no NA values
as I mentioned in an earlier comment, there are mostly polygons (84%) but some rows are multipolygons, so before processing this data I will need to clean it by exploding those multipolygons
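
A minimal sketch of these checks and the multipolygon cleanup, using a three-row toy GeoDataFrame rather than the ARCADE shapefile. The column names pf_frac and area_km2 come from the dataset; the values, the unit_square helper, and the variable names are made up for illustration.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import MultiPolygon, Polygon


def unit_square(x0, y0):
    """Hypothetical helper: a 1 x 1 square with its lower-left corner at (x0, y0)."""
    return Polygon([(x0, y0), (x0 + 1, y0), (x0 + 1, y0 + 1), (x0, y0 + 1)])


# Toy data: two polygons and one multipolygon, with one missing pf_frac value.
gdf = gpd.GeoDataFrame(
    {"pf_frac": [0.2, np.nan, 1.0], "area_km2": [1.0, 1.0, 2.0]},
    geometry=[
        unit_square(0, 0),
        unit_square(2, 0),
        MultiPolygon([unit_square(4, 0), unit_square(6, 0)]),
    ],
)

# Share of each geometry type, NA count per attribute, and the pf_frac range.
type_share = gdf.geom_type.value_counts(normalize=True)
na_counts = gdf[["pf_frac", "area_km2"]].isna().sum()
pf_range = (float(gdf["pf_frac"].min()), float(gdf["pf_frac"].max()))

# Split every multipolygon into one row per polygon part.
exploded = gdf.explode(index_parts=False).reset_index(drop=True)

# Drop rows that lack a value for the attribute to be visualized.
cleaned = exploded.dropna(subset=["pf_frac"])

print(type_share, na_counts, pf_range, len(exploded), len(cleaned), sep="\n")
```

One caveat: explode() copies the attribute values of a multipolygon row onto each of its parts, so a per-row value such as area_km2 then describes the original multipolygon and not the individual part.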

julietcohen · 2024-03-11T18:55:03Z

I did find the metadata that describes the attributes, in excel files S1_ARCADE_v1_37_1km.xlsx and S1_ARCADE_v1_36_1km.xlsx that can be downloaded from the metadata page linked above. The metadata includes the label, unit, and a short description and all the attributes are separated into various categories that are tabs in the excel sheet. I like the clear way the ADC documents attributes better.

julietcohen · 2024-03-12T00:08:09Z

This dataset has the same issue as the Circum-Arctic permafrost and ground ice dataset: there are polygons that cross the antimeridian, and they become distorted in the exact same way (wrap the opposite way around the earth) when the data is transformed from its original CRS (in this case, it is EPSG:6931) into EPSG:4326 which is the CRS of the TMS of the viz workflow. I confirmed that when these polygons are removed before transforming, the remaining polygons are transformed smoothly. As part of my email response to Niek, I said:

So before we can visualize your data, I will do some preprocessing:

splitting all multipolygon geometries into singular geometries (this is easy with GeoPandas explode)

removing any rows with NaN values for attributes of interest

splitting the polygons that intersect the antimeridian

These first two preprocessing steps are common with many datasets we visualize, so I'm working towards integrating these steps into our generalized workflow. The last step is trickier. I have been working on the code to effectively split the polygons at the antimeridian while the data is still in its original CRS to avoid distorting any of them in the CRS transformation. I will test my code on these polygons to see if that allows a clean transformation. Alternatively, removing these polygons altogether (instead of splitting them) allows all other polygons to be processed and visualized, but Anna expressed her preference is to retain all data from the input dataset before visualization. This makes sense to me as well! However, if you have a version of this dataset in another CRS, such as EPSG:4326, please let me know as that would be a simpler solution to getting your data on the portal.

In the other dataset issue, I documented code that I wrote to split the polygons at the antimeridian, but it did not work for that dataset. I can see if modifying that code will work on this dataset, and hopefully I can eventually integrate a generalized fix for this into viz-staging.
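
As a rough illustration of the first part of that problem (finding which polygons cross the antimeridian while the data is still in its original CRS), here is a minimal pure-Python sketch. It is not the code from the other issue. It assumes EPSG:6931 is a north-polar azimuthal projection centred on the pole with a central meridian of 0°, so that the 180° meridian falls on the positive y axis (x = 0, y > 0); that assumption should be confirmed against the CRS definition before relying on it. The function name and the test rings are hypothetical.

```python
def crosses_antimeridian(ring):
    """Return True if any edge of a closed ring of (x, y) vertices crosses
    the ray x = 0, y > 0 (assumed location of the 180 degree meridian)."""
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # An edge can only cross the y axis if its endpoints lie on opposite sides.
        if (x1 < 0) != (x2 < 0):
            # Interpolate the y value where the edge meets x = 0.
            t = x1 / (x1 - x2)
            y_at_zero = y1 + t * (y2 - y1)
            if y_at_zero > 0:
                return True
    return False


# Hypothetical rings in projected coordinates (arbitrary units).
straddles_180 = [(-1, 1), (1, 1), (1, 2), (-1, 2), (-1, 1)]
straddles_0 = [(-1, -2), (1, -2), (1, -1), (-1, -1), (-1, -2)]
east_only = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
contains_pole = [(-1, -1), (1, -1), (1, 1), (-1, 1), (-1, -1)]

for name, ring in [
    ("straddles_180", straddles_180),
    ("straddles_0", straddles_0),
    ("east_only", east_only),
    ("contains_pole", contains_pole),
]:
    print(name, crosses_antimeridian(ring))
```

This only flags candidate polygons; it does not split them. A ring that merely touches x = 0 from the negative side is also flagged, so the result is best treated as a conservative filter ahead of the actual split.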

robyngit added the layer (Displaying a specific data product in the PDG portal) and pdg (Permafrost Discovery Gateway) labels Sep 20, 2022

julietcohen mentioned this issue Feb 6, 2023

Add the pan-ARctic CAtchment DatabasE (ARCADE) dataset #36

Closed

julietcohen added the data available (The complete dataset is on the datateam server and ready to process) label Apr 18, 2023

mbjones added this to Data Layers Jan 11, 2024

elongano added the Water (data layer category: water) label Jan 29, 2024

julietcohen mentioned this issue Feb 6, 2024

Check for invalid data values before staging and integrate removal of observation or fix for observation PermafrostDiscoveryGateway/viz-staging#21

Open

mbjones moved this to Ready in Data Layers Mar 19, 2024
