Add the Pan-Arctic Catchment Database layer #29

Open
robyngit opened this issue Sep 20, 2022 · 8 comments
Labels
  • data available: The complete dataset is on the datateam server and ready to process
  • layer: Displaying a specific data product in the PDG portal
  • pdg: Permafrost Discovery Gateway
  • Water: data layer category: water

Comments

@robyngit
Member

Paper: The Pan-Arctic Catchment Database (ARCADE) (pre-print)
Data: via dataverse - currently "under review" & not available to download
DOI: https://doi.org/10.5194/essd-2022-269

@robyngit added the layer and pdg labels Sep 20, 2022
@robyngit
Member Author

Note: Anna has contacted Niek and Gustaf to request access to the data

@robyngit
Member Author

robyngit commented Feb 3, 2023

@julietcohen is this one the same as #36?

@julietcohen

@robyngit Yes, it is, oops! Let's retain this issue, and I'll close the other one. Now that the paper has been published and the datasets have been archived, here is more information (copied over from the duplicate issue):

  • Anna suggested adding the pan-ARctic CAtchment DatabasE (ARCADE) which represents watershed boundaries. This dataset is publicly available on DataverseNL. The accompanying paper is here.
  • The metadata for the dataset is not very specific about the differences between the files ARCADE_V1_36_1KM.ZIP and ARCADE_V1_37_1KM.ZIP. Reading in these shapefiles should reveal which is more applicable for the PDG, if not both.
  • ARCADE_V1_36_1KM.ZIP is 123.9 MB and described as:
    "Shapefile of catchments draining into the Arctic Ocean, Strahler order 5 or up and at least 1km^2 area size. Now with mean and Sen-slope of annual maximum MOD13A1 v006 NDVI (2000-2021) instead of August mean."
  • ARCADE_V1_37_1KM.ZIP is 125.8 MB and described as:
    "ARCADE_v1_37_1km, now includes CGLS-LC100v3 land cover product data."
  • Each has an excel sheet that describes the variables.

@julietcohen

julietcohen commented Feb 8, 2023

Initial Dataset Exploration

See datateam: /home/jcohen/PDG_ARCADE_layer/data_explore.ipynb

ARCADE_v1_36_1km.shp

  • large shapefile: took 19 minutes to read in with geopandas
  • ~47,000 rows, 261 columns
  • columns are described in the metadata excel file associated with this file, in tabs
  • geometry column contains two geometry types: some are POLYGON and some are MULTIPOLYGON
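
The mix of geometry types noted above can be quantified with a few lines of geopandas. This is a minimal sketch on a tiny made-up GeoDataFrame rather than the real shapefile; `geometry_type_shares` and `demo` are hypothetical names introduced for illustration.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box


def geometry_type_shares(gdf):
    """Return the fraction of rows for each geometry type as a dict."""
    return gdf.geom_type.value_counts(normalize=True).to_dict()


# Tiny stand-in for the ARCADE shapefile (made-up data, not the real file).
demo = gpd.GeoDataFrame(
    {"area_km2": [1.0, 2.0, 3.0, 4.0]},
    geometry=[
        box(0, 0, 1, 1),
        box(2, 0, 3, 1),
        box(4, 0, 5, 1),
        MultiPolygon([box(6, 0, 7, 1), box(8, 0, 9, 1)]),
    ],
)

shares = geometry_type_shares(demo)
print(shares)
```

On the real data, the same function could be called on the GeoDataFrame returned by `gpd.read_file(...)`.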

[Image: plot of ARCADE_v1_36_1km.shp]

ARCADE_v1_37_1km.shp

  • took 23 minutes to read in with geopandas
  • columns are described in the metadata excel file associated with this file, in tabs
  • geometry column also contains two geometry types

[Image: plot of ARCADE_v1_37_1km.shp]

So cool!

Using geopandas difference() reveals their geometry columns are the same:

diff1 = data_36['geometry'].difference(data_37['geometry'])
diff2 = data_37['geometry'].difference(data_36['geometry'])
# both output GeoSeries contain only 1 unique value: POLYGON EMPTY
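
`difference()` works row by row, so two empty results (A minus B and B minus A) together indicate matching geometries. A more direct check, sketched here under the assumption that both series have the same length and index, is the row-wise `geom_equals` method. The helper name `same_geometries` and the sample boxes are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import box


def same_geometries(a, b):
    """True when both series have equal length and row-wise equal geometries.

    Assumes the two series share the same index, since geopandas pairs
    rows by index label.
    """
    if len(a) != len(b):
        return False
    return bool(a.geom_equals(b).all())


left = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 2, 3, 3)])
right = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 2, 3, 3)])
shifted = gpd.GeoSeries([box(0, 0, 1, 1), box(2, 2, 3, 4)])

print(same_geometries(left, right))
print(same_geometries(left, shifted))
```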

@robyngit
Member Author

Niek Jesse Speetjens reached out today to ensure that we're aware that the paper (originally pre-print) is now published and the data are publicly available. He also mentioned that he's considering coming up with a second version of this dataset at some point.

@julietcohen

@robyngit Thanks for responding to Niek's email and connecting us

I took another quick look at this dataset to see if I have any questions before I email Niek back.

  • There are 261 attributes for this data; some are cryptically named, while for others we can kinda guess what they represent (examples: pf_frac, area_km2, iwp_frac, soilt_23, soilt_24, terrain_0). Unfortunately, I do not see any attribute descriptions in the metadata. I'll reach out to Niek to let him know I don't know where to find them.
  • some of these attributes have NA values, such as the terrain_# ones
  • we can start with visualizing pf_frac or area_km2 since both seem to be relevant to the PDG and we know the units of both.
  • the range of pf_frac is [0, 1], has 1 NA value
  • the range of area_km2 is [1.004, 3117158.519], has no NA values
  • as I mentioned in an earlier comment, there are mostly polygons (84%) but some rows are multipolygons, so before processing this data I will need to clean it by exploding those multipolygons
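
The cleaning described in the list above (exploding multipolygons, then dropping rows with a missing attribute of interest) could look roughly like this. It is a sketch on made-up data, not the workflow's actual code; `clean_catchments` and `demo` are hypothetical names.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box


def clean_catchments(gdf, attribute):
    """Split multipolygons into single polygons, then drop rows missing `attribute`."""
    singles = gdf.explode(ignore_index=True)
    return singles.dropna(subset=[attribute]).reset_index(drop=True)


# Made-up rows: one polygon, one two-part multipolygon, one polygon with a missing value.
demo = gpd.GeoDataFrame(
    {"pf_frac": [0.2, 0.9, float("nan")]},
    geometry=[
        box(0, 0, 1, 1),
        MultiPolygon([box(2, 0, 3, 1), box(4, 0, 5, 1)]),
        box(6, 0, 7, 1),
    ],
)

cleaned = clean_catchments(demo, "pf_frac")
print(cleaned.geom_type.value_counts())
```

Note that exploding copies each attribute value onto every part, so a column such as area_km2 would still describe the whole original catchment rather than the individual piece.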

@julietcohen

I did find the metadata that describes the attributes, in excel files S1_ARCADE_v1_37_1km.xlsx and S1_ARCADE_v1_36_1km.xlsx that can be downloaded from the metadata page linked above. The metadata includes the label, unit, and a short description and all the attributes are separated into various categories that are tabs in the excel sheet. I like the clear way the ADC documents attributes better.

@julietcohen

This dataset has the same issue as the Circum-Arctic permafrost and ground ice dataset: there are polygons that cross the antimeridian, and they become distorted in the exact same way (wrap the opposite way around the earth) when the data is transformed from its original CRS (in this case, it is EPSG:6931) into EPSG:4326 which is the CRS of the TMS of the viz workflow. I confirmed that when these polygons are removed before transforming, the remaining polygons are transformed smoothly. As part of my email response to Niek, I said:

So before we can visualize your data, I will do some preprocessing:

  • splitting all multipolygon geometries into singular geometries (this is easy with GeoPandas explode)
  • removing any rows with NaN values for attributes of interest
  • split the polygons that intersect the antimeridian

These first two preprocessing steps are common with many datasets we visualize, so I'm working towards integrating these steps into our generalized workflow. The last step is trickier. I have been working on the code to effectively split the polygons at the antimeridian while the data is still in its original CRS to avoid distorting any of them in the CRS transformation. I will test my code on these polygons to see if that allows a clean transformation. Alternatively, removing these polygons altogether (instead of splitting them) allows all other polygons to be processed and visualized, but Anna expressed a preference to retain all data from the input dataset before visualization. This makes sense to me as well! However, if you have a version of this dataset in another CRS, such as EPSG:4326, please let me know as that would be a simpler solution to getting your data on the portal.

In the other dataset issue, I did document code that I wrote to split the polygons at the antimeridian. But it did not work for that dataset. I can see if modifying that code will work on this dataset, and hopefully I can eventually integrate a generalized fix for this into viz-staging
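
One possible way to split polygons at the antimeridian while still in the projected CRS is sketched below with `shapely.ops.split`. This is not the code referred to in the thread. It assumes that in a north-polar azimuthal projection with central meridian 0 (which EPSG:6931 is understood to be) the 180 degree meridian is the half-line x = 0, y >= 0; that assumption, the `FAR` constant, and the function name are illustrative and should be verified before use.

```python
from shapely.geometry import LineString, box
from shapely.ops import split

# Assumption: the antimeridian is the half-line x = 0, y >= 0 in the source
# projection. Check this for your CRS (for example with pyproj) before use.
FAR = 2.0e7  # metres; assumed to exceed the extent of the data
ANTIMERIDIAN = LineString([(0.0, 0.0), (0.0, FAR)])


def split_at_antimeridian(polygon):
    """Return the pieces of a single Polygon cut along the antimeridian line.

    Polygons that do not intersect the line are returned unchanged in a
    one-element list.
    """
    if not polygon.intersects(ANTIMERIDIAN):
        return [polygon]
    return list(split(polygon, ANTIMERIDIAN).geoms)


# Toy shapes in arbitrary units: one straddles the line, one lies below the pole.
crossing = box(-1, 1, 1, 2)
not_crossing = box(-1, -2, 1, -1)

pieces = split_at_antimeridian(crossing)
print(len(pieces), [p.area for p in pieces])
print(len(split_at_antimeridian(not_crossing)))
```

Two caveats apply. A polygon that contains the pole is not cut by this half-line, because the line ends inside it. Also, vertices lying exactly on the cut may come out at either +180 or -180 after reprojection, so the pieces may still need a small gap or a longitude fix-up before transforming to EPSG:4326.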

Projects
Status: Ready

3 participants