
Add aggregation datasets for lake time series data: Permafrost Database for Northern Alaska and Circumpolar Thermokarst Landscapes #37

Open
initze opened this issue Apr 5, 2023 · 18 comments

initze commented Apr 5, 2023

Datasets for lake statistics aggregation

Regional Dataset

Jorgenson: Ecological Mapping and Permafrost Database for Northern Alaska (Final Ecological Mapping Update (2014))

Data Download Link (zipped shp)

Archive/Dataset Link: https://catalog.northslopescience.org/dataset/2236

For data aggregation I would propose to use the field "ECOREGION" and "LITHOLOGY" to start with, i guess once set up we could add others

ECOREGION

LITHOLOGY

Pan-Arctic Dataset

Olefeldt: Circumpolar Thermokarst Landscapes, 2015, Circum-Arctic

Data Download Link (GPKG), with fixed geometries (I had some issues with the original file): https://1drv.ms/u/s!AobXXrP933xWh8lqz5It06Zf8AT9JA?e=yBQaDY

Archive/Dataset Link: https://apgc.awi.de/dataset/ctl-arctic

For data aggregation I would propose to use the field "TKThLP"

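A minimal sketch of how such a regional aggregation might look, assuming the lake polygons carry an area column (the file names and the "area_ha" column below are hypothetical; the TKThLP field is from the Olefeldt dataset above):

# Sketch: aggregate a lake statistic per thermokarst landscape class
# (file names and the "area_ha" column are hypothetical)
import geopandas as gpd

lakes = gpd.read_file("lakes.gpkg")                # hypothetical lake polygons with an "area_ha" column
regions = gpd.read_file("ctl_arctic_fixed.gpkg")   # hypothetical path to the Olefeldt GPKG

# reproject the lakes to the CRS of the aggregation polygons, then
# assign each lake to the landscape polygon it intersects
lakes = lakes.to_crs(regions.crs)
joined = gpd.sjoin(lakes, regions[["TKThLP", "geometry"]], how="inner", predicate="intersects")

# summarize lake count and area per TKThLP class
summary = joined.groupby("TKThLP")["area_ha"].agg(["count", "sum", "mean"])
print(summary)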


initze commented Apr 5, 2023

Google slides with some simple sketches for data visualizations.

In progress --> will be updated from time to time

https://docs.google.com/presentation/d/1egijizjw9boYxuuMBpfGwE7tX4M4KmYiJq0wbocO3Zc/edit#slide=id.g22b5ba41013_0_0


initze commented Apr 5, 2023

Lake area time-series

Here we have 2 datasets:
GDrive Link: https://drive.google.com/drive/folders/1RH3G-u6mKRSZ82Fok9I1CPUpK8vj8EtH?usp=share_link

Both datasets can be joined via the ID_merged attribute

  1. INitze_Lakes_merged_v3_PDG_Testset.gpkg
  • GeoPackage file with polygon geometries and static lake change statistics for the period 1999-2014 from Nitze et al. 2018
  • published dataset with some cleaning; 4 transects merged
  2. Lakes_IngmarPDG_annual_area.nc

@julietcohen

Thank you for this dataset, metadata, and suggestions for processing, Ingmar! Very much looking forward to visualizing this on the PDG.

@julietcohen

The 2 files for lake area time series have been uploaded to the NCEAS datateam server: /home/pdg/data/nitze_lake_change/time_series_2023-04-05

@julietcohen

Update: After rearranging Datateam directories, the data is now on Datateam at: /var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05


julietcohen commented Jul 25, 2023

Initial exploration

GeoPackage:

  • geometries that represent lake data in non-adjacent UTM zones:
[screenshot]
  • projected CRS: EPSG 3995, WGS 84 Arctic Polar Stereographic
  • 601,486 unique values for ID_merged
  • has 114,821 NA values in each of the columns: raten_cm_y & rateg_cm_y, across 114,821 unique ID values

NetCDF:

  • 601,286 unique values for ID_merged
  • no geometries
  • several years of data for each ID (1984-1988, 2017-2021, ...) with 38 unique years represented
  • data for annual permanent water and seasonal water
  • has 3,800 NA values for both attributes
    • the NA values span across all 38 years, but are only present for 100 ID values (38*100=3,800)
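For reference, a minimal sketch of how these counts could be reproduced (paths are the ones above; the column names are taken from the files as described in this thread):

# Sketch: reproduce the exploration counts above
import geopandas as gpd
import xarray as xar

base = "/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05"

gdf = gpd.read_file(f"{base}/INitze_Lakes_merged_v3_PDG_Testset.gpkg")
print(gdf.crs)                                          # projected CRS
print(gdf["ID_merged"].nunique())                       # unique lake IDs
print(gdf[["raten_cm_y", "rateg_cm_y"]].isna().sum())   # NA counts per column

area = xar.open_dataset(f"{base}/Lakes_IngmarPDG_annual_area.nc")
area_df = area.to_dataframe().reset_index()
print(area_df["ID_merged"].nunique())                   # unique lake IDs in the NetCDF
print(area_df["year"].nunique())                        # number of unique years
print(area_df[["permanent_water", "seasonal_water"]].isna().sum())  # NA counts per attribute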

@julietcohen

Combined data: Lake Area Time Series

Converted the NetCDF file to a dataframe:

[screenshot]

Combined it with the GeoPackage via an inner join, retaining only the ID_merged values that are present in both files. The merged data lives in:
/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05/merged_lakes_area.gpkg

merge.py
# Merge the two lake change datasets from Ingmar Nitze
# Author: Juliet Cohen
# Date: 07/26/23
# use conda environment "geospatial"

import geopandas as gpd
import xarray as xar
import pandas as pd

# Lake area time series data GeoPackage
gdf = gpd.read_file("/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05/INitze_Lakes_merged_v3_PDG_Testset.gpkg")
# Lake area time series data NetCDF
area = xar.open_dataset("/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05/Lakes_IngmarPDG_annual_area.nc")

# convert the NetCDF file into a dataframe, with columns for:
# ID_merged, year, permanent_water, seasonal_water
area_df = area.to_dataframe().reset_index()

# merge the dataframe, retaining only ID_merged values (lakes) that exist in both files
# the spatial dataframe must be the left argument to retain the geometries
merged_data = gdf.merge(right = area_df,
                        how = 'inner', # only retain lakes with data for the measurements for permanent water and seasonal water 
                        on = 'ID_merged')

# Save as a gpkg for input into the viz-workflow
merged_data.to_file("/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05/merged_lakes_area.gpkg", driver = "GPKG")

Next step:
Set up a parsl run to process this file for initial testing, with no deduplication and max z-level 11. The gpkg is quite large, so reading it in takes a long time. When working with the entire lake change time series dataset (rather than just these few UTM zones), it would be a good idea to separate the merged gpkg file by UTM zone so the pieces are easier to parallelize in batches during staging.


julietcohen commented Dec 8, 2023

Update on lake area time series visualization

  • subset the merged gpkg data by transect (4 files), so each file can be saved and read in on Datateam in a reasonable amount of time for initial workflow testing
  • since the data is annual and there are 38 unique years, the first 38 rows have the exact same geometries and much of the same attribute values
  • T2 is the smallest of the 4 subset files, but it still took 40 min to read in (!), so I subset it further to a file in my home dir for workflow testing with an even smaller subset (760 rows covering 20 unique polygons, with 0 NA values in all columns, woohoo)
  • multipolygon geometries were exploded into a single row per polygon for this small subset (see the sketch after this list). This needs to be done for all merged data prior to input into the workflow for the full run. However, it looks like (based on just the sample of rows I'm working with) the polygons were already single polygons but labeled as multipolygons. I haven't seen this before, but it's not important at the moment because explode does not error in this case and works as expected.
  • attribute descriptions would be helpful, since most attribute names are acronyms and are hard to decipher; we'll need them eventually for the metadata, even if I don't need them all for visualizing
  • the nodata_val must be 0 until I merge the branch that allows nodata_val to be a different value, so it's best to choose an attribute to visualize that has all positive values for now
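A minimal sketch of the explode step mentioned above (the input and output paths here are hypothetical):

# Sketch: explode multipolygon rows into one row per polygon before staging
# (paths are hypothetical)
import geopandas as gpd

gdf = gpd.read_file("T2_subset.gpkg")          # hypothetical transect subset
exploded = gdf.explode(index_parts=False)      # one row per polygon, attribute values duplicated
exploded = exploded.reset_index(drop=True)
exploded.to_file("T2_subset_exploded.gpkg", driver="GPKG")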

@julietcohen

First draft of lake size time series dataset

[screenshot]
  • attribute: permanent_water (need units from Ingmar)
  • only transect 2 (there are 4 in total)
  • polygon colors are retained as user zooms in ✅
  • need to choose more intuitive palette (discuss with Ingmar)
  • consider using higher z-level than max 11 (discuss with Ingmar)
  • need an explanation of how the years are represented: the polygons are identical across years, but the values of the attribute(s) of interest change, which changes the color
config
{
  "deduplicate_clip_to_footprint": false,
  "deduplicate_method": "None",
  "deduplicate_at": "None",
  "dir_output": "/home/jcohen/lake_change_time_series/parsl_workflow/", 
  "dir_input": "/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05/exploded/", 
  "ext_input": ".gpkg",
  "dir_staged": "/home/jcohen/lake_change_time_series/parsl_workflow/staged/", 
  "dir_geotiff": "/home/jcohen/lake_change_time_series/parsl_workflow/geotiff/",  
  "dir_web_tiles": "/home/jcohen/lake_change_time_series/parsl_workflow/web_tiles/", 
  "filename_staging_summary": "/home/jcohen/lake_change_time_series/parsl_workflow/staging_summary.csv",
  "filename_rasterization_events": "/home/jcohen/lake_change_time_series/parsl_workflow/raster_events.csv",
  "filename_rasters_summary": "/home/jcohen/lake_change_time_series/parsl_workflow/raster_summary.csv",
  "simplify_tolerance": 0.1,
  "tms_id": "WGS1984Quad",
  "z_range": [
    0,
    11
  ],
  "geometricError": 57,
  "z_coord": 0,
  "statistics": [
    {
      "name": "permanent_water", 
      "weight_by": "area",
      "property": "permanent_water", 
      "aggregation_method": "max",
      "resampling_method": "mode", 
      "val_range": [
        0,
        4000
      ],
      "palette": [
        "#fae500",
        "#00d8db",
        "#2100db",
        "#9a00fa"
      ],
      "nodata_val": 0,
      "nodata_color": "#ffffff00" 
    }
  ]
}
parsl workflow
# Processing the merged lake change time series data sample from Ingmar
# see issue: https://github.com/PermafrostDiscoveryGateway/pdg-portal/issues/37
# staging through web tiling with parsl
# conda environment: viz-local, with local installs
# for viz-staging and viz-raster, main branch
# and pip install for parsl==2023.11.27

from datetime import datetime
import json
import logging
import logging.handlers
import os

import pdgstaging
import pdgraster

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.channels import LocalChannel
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

import subprocess
from subprocess import Popen
user = subprocess.check_output("whoami").strip().decode("ascii")

# configure logger
logger = logging.getLogger("logger")
# Remove any existing handlers from the logger
for handler in logger.handlers[:]:
    logger.removeHandler(handler)
# prevent logging statements from being printed to terminal
logger.propagate = False
# set up new handler
handler = logging.FileHandler("/tmp/log.log")
formatter = logging.Formatter(logging.BASIC_FORMAT)
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.INFO)


activate_conda = 'source /home/jcohen/.bashrc; conda activate viz-local'
htex_local = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_local",
            worker_debug=False,
            #cores_per_worker=1,
            max_workers=10,
            provider=LocalProvider(
                #channel=LocalChannel(),
                init_blocks=1,
                max_blocks=1,
                worker_init=activate_conda
            ),
        )
    ],
)
parsl.clear()
parsl.load(htex_local)

def run_pdg_workflow(
    workflow_config,
    batch_size = 300
):
    """
    Run the main PDG workflow for the following steps:
    1. staging
    2. raster highest
    3. raster lower
    4. web tiling

    Parameters
    ----------
    workflow_config : dict or str
        Configuration for the PDG workflow (a dict or a path to a JSON config file),
        used for staging, rasterization, and web tiling.
    batch_size: int
        How many staged files, geotiffs, or web tiles should be included in a single creation
        task? (each task is run in parallel) Default: 300
    """

    start_time = datetime.now()

    logging.info("Staging initiated.")

    stager = pdgstaging.TileStager(workflow_config)
    #tile_manager = rasterizer.tiles
    tile_manager = stager.tiles
    config_manager = stager.config

    input_paths = stager.tiles.get_filenames_from_dir('input')
    input_batches = make_batch(input_paths, batch_size)

    # Stage all the input files (each batch in parallel)
    app_futures = []
    for i, batch in enumerate(input_batches):
        app_future = stage(batch, workflow_config)
        app_futures.append(app_future)
        logging.info(f'Started job for batch {i} of {len(input_batches)}')

    # Don't continue to next step until all files have been staged
    [a.result() for a in app_futures]

    logging.info("Staging complete.")

    # ----------------------------------------------------------------

    # Create highest geotiffs 
    rasterizer = pdgraster.RasterTiler(workflow_config)

    # Process staged files in batches
    logging.info(f'Collecting staged file paths to process...')
    staged_paths = tile_manager.get_filenames_from_dir('staged')
    logging.info(f'Found {len(staged_paths)} staged files to process.')
    staged_batches = make_batch(staged_paths, batch_size)
    logging.info(f'Processing staged files in {len(staged_batches)} batches.')

    app_futures = []
    for i, batch in enumerate(staged_batches):
        app_future = create_highest_geotiffs(batch, workflow_config)
        app_futures.append(app_future)
        logging.info(f'Started job for batch {i} of {len(staged_batches)}')

    # Don't move on to next step until all geotiffs have been created
    [a.result() for a in app_futures]

    logging.info("Rasterization highest complete. Rasterizing lower z-levels.")

    # ----------------------------------------------------------------

    # Rasterize composite geotiffs
    min_z = config_manager.get_min_z()
    max_z = config_manager.get_max_z()
    parent_zs = range(max_z - 1, min_z - 1, -1)

    # Can't start lower z-level until higher z-level is complete.
    for z in parent_zs:

        # Determine which tiles we need to make for the next z-level based on the
        # path names of the geotiffs just created
        logging.info(f'Collecting highest geotiff paths to process...')
        child_paths = tile_manager.get_filenames_from_dir('geotiff', z = z + 1)
        logging.info(f'Found {len(child_paths)} highest geotiffs to process.')
        # create empty set for the following loop
        parent_tiles = set()
        for child_path in child_paths:
            parent_tile = tile_manager.get_parent_tile(child_path)
            parent_tiles.add(parent_tile)
        # convert the set into a list
        parent_tiles = list(parent_tiles)

        # Break all parent tiles at level z into batches
        parent_tile_batches = make_batch(parent_tiles, batch_size)
        logging.info(f'Processing highest geotiffs in {len(parent_tile_batches)} batches.')

        # Make the next level of parent tiles
        app_futures = []
        for parent_tile_batch in parent_tile_batches:
            app_future = create_composite_geotiffs(
                parent_tile_batch, workflow_config)
            app_futures.append(app_future)

        # Don't start the next z-level, and don't move to web tiling, until the
        # current z-level is complete
        [a.result() for a in app_futures]

    logging.info("Composite rasterization complete. Creating web tiles.")

    # ----------------------------------------------------------------

    # Process web tiles in batches
    logging.info(f'Collecting file paths of geotiffs to process...')
    geotiff_paths = tile_manager.get_filenames_from_dir('geotiff')
    logging.info(f'Found {len(geotiff_paths)} geotiffs to process.')
    geotiff_batches = make_batch(geotiff_paths, batch_size)
    logging.info(f'Processing geotiffs in {len(geotiff_batches)} batches.')

    app_futures = []
    for i, batch in enumerate(geotiff_batches):
        app_future = create_web_tiles(batch, workflow_config)
        app_futures.append(app_future)
        logging.info(f'Started job for batch {i} of {len(geotiff_batches)}')

    # Don't record end time until all web tiles have been created
    [a.result() for a in app_futures]

    end_time = datetime.now()
    logging.info(f'⏰ Total time to create all z-level geotiffs and web tiles: '
                 f'{end_time - start_time}')

# ----------------------------------------------------------------

# Define the parsl functions used in the workflow:

@python_app
def stage(paths, config):
    """
    Stage a batch of input files
    """
    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgstaging

    # configure logger:
    logger = logging.getLogger("logger")
    # Remove any existing handlers from the logger
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)
    # prevent logging statements from being printed to terminal
    logger.propagate = False
    # set up new handler
    handler = logging.FileHandler("/tmp/log.log")
    formatter = logging.Formatter(logging.BASIC_FORMAT)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    stager = pdgstaging.TileStager(config)
    for path in paths:
        stager.stage(path)
    return True

# Create highest z-level geotiffs from staged files
@python_app
def create_highest_geotiffs(staged_paths, config):
    """
    Create a batch of geotiffs from staged files
    """
    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgraster

    # configure logger:
    logger = logging.getLogger("logger")
    # Remove any existing handlers from the logger
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)
    # prevent logging statements from being printed to terminal
    logger.propagate = False
    # set up new handler
    handler = logging.FileHandler("/tmp/log.log")
    formatter = logging.Formatter(logging.BASIC_FORMAT)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # rasterize the vectors, highest z-level only
    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.rasterize_vectors(
        staged_paths, make_parents = False)
    # no need to update ranges because the val_range is fixed in the config

# ----------------------------------------------------------------

# Create composite geotiffs from highest z-level geotiffs 
@python_app
def create_composite_geotiffs(tiles, config):
    """
    Create a batch of composite geotiffs from highest geotiffs
    """
    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgraster

    # configure logger:
    logger = logging.getLogger("logger")
    # Remove any existing handlers from the logger
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)
    # prevent logging statements from being printed to terminal
    logger.propagate = False
    # set up new handler
    handler = logging.FileHandler("/tmp/log.log")
    formatter = logging.Formatter(logging.BASIC_FORMAT)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.parent_geotiffs_from_children(
        tiles, recursive = False)

# ----------------------------------------------------------------

# Create a batch of webtiles from geotiffs
@python_app
def create_web_tiles(geotiff_paths, config):
    """
    Create a batch of webtiles from geotiffs
    """

    from datetime import datetime
    import json
    import logging
    import logging.handlers
    import os
    import pdgraster

    # configure logger:
    logger = logging.getLogger("logger")
    # Remove any existing handlers from the logger
    for handler in logger.handlers[:]:
        logger.removeHandler(handler)
    # prevent logging statements from being printed to terminal
    logger.propagate = False
    # set up new handler
    handler = logging.FileHandler("/tmp/log.log")
    formatter = logging.Formatter(logging.BASIC_FORMAT)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    rasterizer = pdgraster.RasterTiler(config)
    return rasterizer.webtiles_from_geotiffs(
        geotiff_paths, update_ranges = False)
        # no need to update ranges because the val_range is fixed in the config


def make_batch(items, batch_size):
    """
    Create batches of a given size from a list of items.
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# ----------------------------------------------------------------

# run the workflow
config_file = '/home/jcohen/lake_change_time_series/parsl_workflow/config.json'
logger.info(f'🗂 Workflow configuration loaded from {config_file}')
print("Loaded config. Running workflow.")
logger.info(f'Starting PDG workflow: staging, rasterization, and web tiling')
run_pdg_workflow(config_file)
# Shutdown and clear the parsl executor
htex_local.executors[0].shutdown()
parsl.clear()

# transfer log from /tmp to user dir
cmd = ['mv', '/tmp/log.log', f'/home/{user}/lake_change_time_series/parsl_workflow/']
# initiate the process to run that command
process = Popen(cmd)

print("Script complete.")

@julietcohen

Producing annual layers for static lake size: permanent_water, 2017-2021

In anticipation of the Google fellows starting on the team in January, Anna requested that our top priority be creating annual layers for lake size, for at least permanent_water and potentially also seasonal_water. Ingmar suggested doing this for the past 5 years of data, because those years have the best quality observations and the fewest NA values for these attributes. I merged the data just as before, adding a step to remove NA values for these attributes that was not in the previous merge. I then subset the merged data (all transects) into 5 annual files.

At our meeting, Ingmar helpfully suggested using a mathematical way to distribute the color palette so that all colors are represented, even with skewed data. The way we have done it previously was to plot the input attribute values in a binned histogram and set the range in the config to the min and max values that encompass the vast majority of the data; outliers are then assigned the same color as the min or max value (depending on which side of the range they fall on), which makes the data much easier to interpret on a map. To introduce a more mathematical approach, I calculated the 95th percentile for each of the annual files (see the sketch after the list below). The values are similar for each attribute across the 5 years.

annual percentiles
2017 permanent water 95th percentile: 48.29246735806339
2017 seasonal water 95th percentile: 2.6606105447148294

2018 permanent water 95th percentile: 48.476413012662455
2018 seasonal water 95th percentile: 2.471091755819858

2019 permanent water 95th percentile: 48.25663980988141
2019 seasonal water 95th percentile: 2.6432749690066695

2020 permanent water 95th percentile: 47.909318479470706
2020 seasonal water 95th percentile: 3.0058684271837874

2021 permanent water 95th percentile: 47.84887662863641
2021 seasonal water 95th percentile: 2.8591087319313626
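A minimal sketch of how these percentiles could be computed, assuming one merged file per year (the file naming pattern is hypothetical):

# Sketch: 95th percentile of the water-area attributes for each annual file
# (file naming pattern is hypothetical)
import geopandas as gpd

for year in range(2017, 2022):
    gdf = gpd.read_file(f"merged_lakes_area_{year}.gpkg")
    print(f"{year} permanent water 95th percentile: {gdf['permanent_water'].quantile(0.95)}")
    print(f"{year} seasonal water 95th percentile: {gdf['seasonal_water'].quantile(0.95)}")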


julietcohen commented Dec 13, 2023

First version of the 2017 lake time series data

There are 2 new layers. One is the "permanent water", and the other is "seasonal water". Units are hectares.

  • recall that this dataset has 4 transect regions, meaning the data is distributed across several noncontiguous regions of the Arctic. This matters because the "zoom to entire dataset" button is not as helpful for this dataset as it is for smaller datasets or datasets with adjacent regions. I did experiment with specifying lower zoom-level tile bounds for the zoom feature: when I specified any one of the regions with polygons, the zoom feature worked well in the sense that it zoomed the user to that region, but unfortunately it made the tiles in the other regions invisible. Luckily, zooming into the portal and panning around a little lets you spot some of the bigger blue lakes pretty easily, and you can zoom into each region individually. Please let me know if I can clarify this issue; I certainly want to resolve the "zoom to entire dataset" problem.
  • if you're having trouble viewing the tiles, please clear your browsing data and reload the page
  • for Ingmar: I did not use deduplication because of the lack of overlap I observed when plotting the data. Please correct me if I misinterpreted this, and I actually should use deduplication.
  • the same blue palette was used for both statistics, but the numerical ranges of the data are different, since generally the permanent water values are larger than the seasonal water values. Should these 2 layers have totally different palettes so users do not have to hover over the legend to understand that the ranges are different?
  • as we get into the more recent years, I expect that more polygons will be present
[screenshot]


initze commented Dec 14, 2023

@julietcohen
Could you add the link to the dev version? I just lost it

for Ingmar: I did not use deduplication because of the lack of overlap I observed when plotting the data. Please correct me if I misinterpreted this, and I actually should use deduplication.

In your dataset, the four regions are separate from each other. Technically they were merged and deduplicated in a previous step. So here there should be no need for that.

the same blue palette was used for both statistics, but the numerical ranges of the data are different, since generally the permanent water values are larger than the seasonal water values

Maybe we can increase the max value (upper limit) of the visualization. For my taste the differences are not very pronounced.

A cool idea

we could calculate the diff of each year versus the previous (or some kind of aggregated number of years), e.g. 2019-2018. Then we should be able to visualize major changes such as lake drainage 😄

@julietcohen

@initze Thank you for your feedback. Here is the link to the PDG demo portal: https://demo.arcticdata.io/portals/permafrost

I'll increase the upper limit of the range in my config file and see if that helps create more diversity in colors associated with polygons of different sizes. For example, I could increase the quantile that I am using to set the max value. I was using the 95th quantile, which resulted in a max range value for permanent water of ~49. Increasing to the 99.99th quantile would be a max value of ~6,088.

Calculating the diff of each year would be very interesting! I do not think it would be difficult either, since I have already produced the annual files. I can ensure that each file has the same geometries, and we can take the difference for the variables of interest for each geometry. I can work on that after I visualize the 5 years independently.
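A sketch of how that annual difference could be computed, assuming the annual files share the same geometries and have one row per ID_merged (file names are hypothetical):

# Sketch: per-lake change in permanent water between two consecutive years
# (file names are hypothetical; assumes one row per ID_merged in each annual file)
import geopandas as gpd

gdf_2018 = gpd.read_file("merged_lakes_area_2018.gpkg")
gdf_2019 = gpd.read_file("merged_lakes_area_2019.gpkg")

# join the 2018 value onto the 2019 geometries by lake ID
perm_2018 = gdf_2018.set_index("ID_merged")["permanent_water"].rename("permanent_water_2018")
diff = gdf_2019.join(perm_2018, on="ID_merged", how="inner")

# negative values indicate loss of permanent water, e.g. lake drainage
diff["permanent_water_diff"] = diff["permanent_water"] - diff["permanent_water_2018"]
diff.to_file("lake_diff_2019_minus_2018.gpkg", driver="GPKG")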

Also, I think a purple palette would be nice for seasonal_water, to contrast with the blue that we can keep for permanent_water, considering that the IWP layer is yellow and the Arctic communities layer is green. Using red for a layer that represents a static attribute such as permanent or seasonal water would seem to imply decrease, so we should reserve red for the end of the spectrum of a variable that represents loss of size.

@julietcohen

Another note: I have increased the maximum z-level for this data to z-12 (just 1 level higher) after consulting the metadata for the TMS, because we want the resolution of this data to be ~30m (correct me if that's wrong)


julietcohen commented Dec 18, 2023

2017 Lake Data Update

I created web tiles for 2017 with different palettes for permanent water and seasonal water, and used the 99.99th percentile for the highest value in the palette range to show more diversity of colors in the lower values. These are now on the demo portal.

[screenshots]

Polygon color diversity

Regarding the polygon colors, we see that changing the range in the workflow config to use the 99.99th percentile for the max value did succeed in showing more diversity for permanent water. In my opinion, the pink & purple palette for seasonal water does not show enough diversity in the smaller lakes, so I should re-process those web tiles with a different percentile for the max value. Maybe @initze has feedback on this. We should also keep in mind that the more tailoring we have to do to find the config range that shows the most color diversity, the further this workflow gets from being automated. Ideally, we would mathematically determine the best range of values for the config to optimize the color diversity of the polygons, without having to guess and check each time. I would appreciate input from anyone who has a good way to achieve this.
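One possible way to make this less manual (a sketch, not part of the current workflow; the paths and the chosen percentile are assumptions): compute the percentile from the input attribute and write it into the config's val_range before launching the run, so the range is derived from the data rather than guessed:

# Sketch: derive the config val_range from a chosen percentile of the input data
# (paths and the 99.99th percentile choice are assumptions)
import json
import geopandas as gpd

gdf = gpd.read_file("merged_lakes_area_2017.gpkg")        # hypothetical input
max_val = float(gdf["permanent_water"].quantile(0.9999))

with open("config.json") as f:
    config = json.load(f)

# update the range for the matching statistic, then write the config back out
for stat in config["statistics"]:
    if stat["property"] == "permanent_water":
        stat["val_range"] = [0, max_val]

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)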

Legends

The legends for these 2 layers are accurate: the max value shown for each layer is indeed the max value for that attribute, not the 99.99th percentile. Additionally, both layers have a min value of 0, which is the min value shown in the legend. However, the value range for permanent water is so large that we encounter the same issue we ran into with the Arctic Communities layer: hovering over the legend shows scientific notation, which is accurate but not ideal. This was acceptable for the communities layer, so I assume it is acceptable for this layer too, for now.

Processing other lake data years

I have moved forward to process 2018-2021 and have already completed 2018 (update: the 2018 geotiff data was corrupted when a directory transfer was cancelled midway after VS Code lost connection; the geotiffs need to be re-created). However, there is no point in processing the web tiles for these years until we determine the best way to set the range for each attribute, based on our guess-and-check work for 2017. I will continue to process staged tiles and geotiffs for all years, but it would be very time consuming to guess and check the optimal range of values to best represent the polygons.


julietcohen commented Dec 20, 2023

2017 Lake Data Update

We agreed that the 99.99th percentile for the blue palette for permanent water looks good. Increasing the percentile used for the max value of the range for the config reduced the amount of darkest blue color, and increased the amount of lighter shades in the polygons.

For the pink & purple palette for the seasonal water, we had the opposite problem: too much of the lighter shades and not enough of the darker shades, so the approach is to reduce the percentile. I tried the 92nd percentile, but that was too much of a reduction: it resulted in too many polygons with the darkest purple color. See the same region as in the last comment for comparison:

[screenshot]

I then created web tiles with the 95th percentile as the highest value in the palette range. There is still too much purple:

[screenshot]

But importantly, if we zoom in more to that same area pictured above, or to a different area such as the North Slope of Alaska, we can see that the 95th percentile works pretty well:

[screenshot]

Comparing percentiles with the seasonal water data distribution

We can visualize 5 different percentiles with the data distribution here:

[screenshot]
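A sketch of how such a distribution plot could be generated (the file name and the exact set of percentiles shown are assumptions):

# Sketch: histogram of seasonal_water with candidate percentiles overlaid
# (file name and the percentile choices are assumptions)
import geopandas as gpd
import matplotlib.pyplot as plt

values = gpd.read_file("merged_lakes_area_2017.gpkg")["seasonal_water"].dropna()

fig, ax = plt.subplots()
ax.hist(values, bins=100)
ax.set_yscale("log")   # the distribution is heavily skewed toward small lakes
for p in [0.90, 0.92, 0.95, 0.99, 0.9999]:
    ax.axvline(values.quantile(p), linestyle="--", label=f"{p * 100:g}th percentile")
ax.set_xlabel("seasonal water area (ha)")
ax.set_ylabel("lake count")
ax.legend()
fig.savefig("seasonal_water_percentiles.png", dpi=200)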

@julietcohen

I have simplified and added documentation to the script used to clean, merge, and parse Ingmar's 2 input files into the 5 most recent years (merge_explode_annual.py). I ran it to reproduce the annual input files for the visualization workflow. The script has been uploaded to the new ADC package I created for this data. Now that the input files for the viz workflow have been reproduced, I will start again with 2017, running staging through web tiling in one sweep, with the max value of the range for permanent water set to the 99.99th percentile and the max value for seasonal water set to the 95th percentile. I will do the same for the following 4 years. Keeping these percentiles the same for all years would be ideal.
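For reference, a minimal sketch of the steps described above (this is not the archived merge_explode_annual.py; the output file names are assumptions):

# Sketch of the clean/merge/explode/subset steps (not the archived merge_explode_annual.py)
import geopandas as gpd
import xarray as xar

base = "/var/data/submission/pdg/nitze_lake_change/time_series_2023-04-05"

gdf = gpd.read_file(f"{base}/INitze_Lakes_merged_v3_PDG_Testset.gpkg")
area_df = xar.open_dataset(f"{base}/Lakes_IngmarPDG_annual_area.nc").to_dataframe().reset_index()

# join the annual areas onto the lake geometries, drop NA measurements,
# and explode multipolygons into one row per polygon
merged = gdf.merge(area_df, how="inner", on="ID_merged")
merged = merged.dropna(subset=["permanent_water", "seasonal_water"])
merged = merged.explode(index_parts=False).reset_index(drop=True)

# write one input file per year (2017-2021) for the viz workflow
for year in range(2017, 2022):
    merged[merged["year"] == year].to_file(f"{base}/exploded/lakes_{year}.gpkg", driver="GPKG")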

@julietcohen

All years 2017-2021 have been processed into staged tiles, geotiffs, and web tiles for permanent water and seasonal water. All processing was done on Delta with the ray workflow. The 2017-2018 files have already completed the transfer to Datateam with the pre-issued DOI A28G8FK10, and they are visualized on the demo portal. Years 2019-2021 have not yet fully transferred; Globus has been experiencing NCSA endpoint authentication issues for the past week, making large transfers take >3 days and sometimes fail altogether. They will be up on the demo portal when those transfers make it through!
