ENH: Recovery Target API Updates #91

Closed · Fixed by #112
szwiep opened this issue Mar 19, 2024 · 7 comments
Labels: enhancement (New feature or request), question (Further information is requested)

szwiep (Contributor) commented Mar 19, 2024

Opening this issue to discuss API changes for the recovery target methods. Hoping to get some input before development starts.

Current

Currently, users import a class from the targets module, parameterize it, and then pass that object to compute_metrics:

from spectral_recovery.targets import MedianTarget, WindowedTarget

mpixel = MedianTarget(scale="pixel")
mpoly = MedianTarget(scale="polygon")
win = WindowedTarget(N=3, rm_na=False)

metrics = compute_metrics(
     ....
     recovery_target_method=mpixel # or mpoly or win
)

Then, inside of RestorationArea, the compute_recovery_targets function (see #84) is called, which clips an appropriate reference stack and passes it to the provided recovery_target_method.
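For orientation, that internal flow is roughly the following (a minimal sketch with hypothetical signatures, not the package's actual code; the clipping step is assumed to use rioxarray):

import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects

def compute_recovery_targets(timeseries, reference_polygons, recovery_target_method):
    # Clip the full timeseries down to the reference polygons...
    reference_stack = timeseries.rio.clip(reference_polygons.geometry.values)
    # ...then hand the clipped stack to the parameterized method object.
    return recovery_target_method(reference_stack)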

Known downsides of this implementation?

  • Impossible for users to pass custom methods.
  • Does not allow users to investigate the recovery target.
  • timeseries DataArray input must contain both reference and restoration polygons.

Positives?

  • Users don't have to compute recovery targets themselves.

Proposed

Have users compute the targets using provided (or custom) functions, and then pass the targets to the compute_metrics function. The shape of the recovery_targets xr.DataArray would need to be broadcastable against the timeseries clipped to the restoration polygon. e.g.:

from spectral_recovery.targets import median_target, windowed_target

mpixel_array = median_target(timeseries, polygons, scale="pixel")
mpoly_array = median_target(timeseries, polygons, scale="polygon")
win_array = windowed_target(timeseries, polygons, N=3, rm_na=False)
custom_array = some_custom_func(timeseries, polygons, ...)

metrics = compute_metrics(
     ....
     recovery_targets=mpixel_array # or mpoly_array, win_array, or custom_array
)

This would 1) allow users to use custom target functions, 2) provide users with direct access to the recovery target values, and 3) allow users to read in a smaller array just over the reference polygons (if different than restoration polygons).
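As a concrete example of point 1, a custom target function under this contract could look like the following (a sketch only; mean_window_target, its arguments, and the (band, y, x) dim names are illustrative assumptions, not part of the package):

import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects

def mean_window_target(timeseries, polygons, years=slice("2010", "2012")):
    # Clip to the reference polygons, then average over a fixed window of years.
    clipped = timeseries.rio.clip(polygons.geometry.values)
    # The time dimension is reduced away, leaving (band, y, x), so the result
    # broadcasts against a timeseries clipped to the restoration polygon.
    return clipped.sel(time=years).mean(dim="time")

custom_array = mean_window_target(timeseries, polygons)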

Potential downsides?

  • More complicated. Requires users to compute targets themselves rather than having RestorationArea coordinate the computation.
  • Opens up room for error. If the timeseries given to the target function and the timeseries given to compute_metrics are different and/or from different data sources, the recovery target would be invalid (one possible guard is sketched below this list).
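One possible guard against that mismatch (a sketch only; _check_targets_align is a hypothetical helper, assuming both inputs are xr.DataArrays on the same grid):

import xarray as xr

def _check_targets_align(restoration_stack, recovery_targets):
    # Raise early if the target grid does not exactly match the clipped
    # restoration timeseries (the time dimension is excluded from the check).
    try:
        xr.align(restoration_stack, recovery_targets, join="exact", exclude=["time"])
    except ValueError as err:
        raise ValueError(
            "recovery_targets do not align with the restoration timeseries; "
            "were they computed from the same data source?"
        ) from err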

Potential variations:

  • Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None.

Please share your thoughts! @melissabirch @KavlinClein @wdkang

szwiep added the enhancement (New feature or request) and question (Further information is requested) labels on Mar 19, 2024
szwiep self-assigned this on Mar 19, 2024
melissabirch (Contributor) commented:
The proposed new method is where I get a bit confused, as it seems very similar to me? One difference being that there are more arguments in the target-creation step:

mpixel = MedianTarget(scale="pixel")                                # current
mpixel_array = median_target(timeseries, polygons, scale="pixel")   # proposed

Is this what you mean by users computing targets themselves? Because instead of simply defining the target computation as a method, which is then used inside of RestorationArea, the user supplies the inputs/arguments and it becomes its own function, separate from RestorationArea?

If so, that seems minimally different from a user's perspective and might actually help their understanding of what is going on in the tool.

melissabirch (Contributor) commented:

> mpixel_array = median_target(timeseries, polygons, scale="pixel")

One question: in this case, would you default the timeseries to be the one that is already read in/computed? The indices stack?

melissabirch (Contributor) commented:

Also, could you please expand on this part:

> Potential variations: Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None.

szwiep (Contributor, Author) commented Mar 20, 2024

> Is this what you mean by users computing targets themselves? Because instead of simply defining the target computation as a method, which is then used inside of RestorationArea, the user supplies the inputs/arguments and it becomes its own function, separate from RestorationArea?

Basically, yes. In the current version users pass methods, i.e. specific ways that a target should be computed, and the magic of using that method to produce targets happens inside RestorationArea, without any further involvement from users. The new version would still have users choose the method, but asks them to also call the method themselves to compute the targets and then pass those to compute_metrics. No magic would happen inside anymore.

> One question: in this case, would you default the timeseries to be the one that is already read in/computed? The indices stack?

This is a good point. It should be the indices stack. Which raises the question: should we pull compute_indices out of compute_metrics, or just have compute_indices implemented inside each target method? e.g.

# your stack of annual images. 
timeseries = sr.read_timeseries("dir/", band_names={0: "red", 1: "nir", 2: "swir16"})

# option 1. compute indices inside each function
mpixel_array = median_target(timeseries, polygons, indices=["NBR", "NDVI"], scale="pixel")
metrics = sr.compute_metrics(
    timeseries,
    indices=["NBR","NDVI"],
    restoration_polygon=restoration_polygons, 
    recovery_targets=mpixel_array,
)

# option 2. compute indices separately
indices = sr.compute_indices(timeseries, indices=["NBR", "NDVI"])
mpixel_array = median_target(indices, polygons, scale="pixel")
metrics = sr.compute_metrics(
    indices,
    restoration_polygon=restoration_polygons, 
    recovery_targets=mpixel_array,
)

> Also, could you please expand on this part: "Potential variations: Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None."

This should say "if recovery_targets is None". For example:

timeseries = sr.read_timeseries("dir/", band_names={0: "red", 1: "nir", 2: "swir16"})
rp = sr.read_restoration_polygons("polygons.gpkg")
refp = sr.read_reference_polygons("ref_polygons.gpkg")

# Don't explicitly choose a recovery target method?
# If we let `recovery_targets=` be optional, we would use a default method 
# of median_target(..., scale="polygon"). 
metrics = sr.compute_metrics(
    timeseries,
    indices=["NBR","NDVI"],
    restoration_polygon=rp,
    reference_polygons=refp
)
# -----> what happens inside? 
#  median_target(timeseries, restoration_polygons, reference_polygons, scale="polygon") is
# called, using what is passed to compute_metrics. If the images in timeseries do not contain
# the reference_polygons, this will fail! 

# Do choose a recovery target method?
# You could read in and/or clip to a smaller subset of data that covers individual polygons
rest_timeseries = sr.read_timeseries("dir_for_restoration_polygon_clips/", band_names={0: "red", 1: "nir", 2: "swir16"})
ref_timeseries = sr.read_timeseries("dir_for_ref_polygon_clips/", band_names={0: "red", 1: "nir", 2: "swir16"})

# You only have to deal with images/polygons related to your reference system!
mpixel_array = median_target(ref_timeseries, refp, indices=["NBR", "NDVI"], scale="pixel")

# You only have to deal with images/polygons related to your restoration site!
metrics = sr.compute_metrics(
    rest_timeseries,
    indices=["NBR","NDVI"],
    restoration_polygon=rp, 
    recovery_targets=mpixel_array,
)

Basically, if we use a default recovery target method and users want to use a reference system, their images would have to contain both the restoration and reference polygons, or the default call to the method would fail. This is the same behaviour we see now: the function fails if all of the polygons are not contained.
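Inside compute_metrics, this default variation might look roughly like the following (hypothetical internals, reusing the proposed function names and the call shape from the comments above):

def compute_metrics(timeseries, restoration_polygon, reference_polygons=None,
                    recovery_targets=None, **kwargs):
    if recovery_targets is None:
        # Fall back to the default polygon-scale median. This requires
        # `timeseries` to cover the reference polygons (when given) as well
        # as the restoration polygon, and fails otherwise.
        recovery_targets = median_target(
            timeseries, restoration_polygon, reference_polygons, scale="polygon"
        )
    ...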

melissabirch (Contributor) commented:

> Is this what you mean by users computing targets themselves? Because instead of simply defining the target computation as a method, which is then used inside of RestorationArea, the user supplies the inputs/arguments and it becomes its own function, separate from RestorationArea?
>
> Basically, yes. In the current version users pass methods, i.e. specific ways that a target should be computed, and the magic of using that method to produce targets happens inside RestorationArea, without any further involvement from users. The new version would still have users choose the method, but asks them to also call the method themselves to compute the targets and then pass those to compute_metrics. No magic would happen inside anymore.

I think, based on how I am envisioning it, that this is alright. I know we want the tool workflow to be as simple as possible for users, but in a strange way this might help explain the step-by-step workflow and force users to consider what the tool is doing. As long as it remains easy and requires as few lines of code as possible, I believe the increased flexibility outweighs the cons of asking users to compute the recovery target as a separate step.

> One question: in this case, would you default the timeseries to be the one that is already read in/computed? The indices stack?
>
> This is a good point. It should be the indices stack. Which raises the question: should we pull compute_indices out of compute_metrics, or just have compute_indices implemented inside each target method? [options 1 and 2 above]

I think the indices computation should be separate. For one thing, it prevents unnecessarily computing the indices multiple times when re-running with different recovery target methods (as the analysis has done). It also allows users to extract the indices alone, instead of having outputs be tied to metrics of the indices.
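For illustration, with option 2 the indices can be computed once and reused across methods (a sketch using the proposed function names):

# Compute the indices a single time...
indices = sr.compute_indices(timeseries, indices=["NBR", "NDVI"])

# ...then compare several recovery target methods without re-deriving them.
targets = {
    "median_pixel": median_target(indices, polygons, scale="pixel"),
    "median_polygon": median_target(indices, polygons, scale="polygon"),
    "windowed": windowed_target(indices, polygons, N=3, rm_na=False),
}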

> Also, could you please expand on this part: "Potential variations: Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None."
>
> This should say "if recovery_targets is None". [examples above] Basically, if we use a default recovery target method and users want to use a reference system, their images would have to contain both the restoration and reference polygons, or the default call to the method would fail. This is the same behaviour we see now: the function fails if all of the polygons are not contained.

Gotcha - I think this is a good plan of action.

A follow-up question, which I may have missed with the new implementation of using polygon attributes to select dates: if sr.compute_metrics is given reference polygons, what are the reference years assumed to be?

szwiep (Contributor, Author) commented Mar 21, 2024

@wdkang @KavlinClein, do you have any thoughts on where compute_indices should be called re: my comment above? Or about any of the other points?

c2shang commented Mar 23, 2024

I had to think it through multiple times before I got a good grasp of the whole process. IMO, it certainly increases the complexity on the user's part to run the code, where confusion and errors may emerge, but the added flexibility and the ability to closely inspect the recovery target are nice to have. From the discussion above, it is clear that it is totally doable.

A detailed and carefully phrased documentation/tutorial will be crucial to help users understand what purpose each block of code serves, especially with the interactive notebook to be launched. One caveat is that the developers know all the behind-the-scenes details too well, and some important details might be left out of the documentation as a result. If someone wrote it from a user's perspective, it would steer users in the right direction, both in terms of running the code and understanding what the tool is doing. For example, placing reference polygon and restoration polygon examples next to each other, with visuals of the resulting recovery target arrays, would highlight the key premise this tool operates on: comparing the "current" conditions of the restoration area to the "desired" conditions (values), which can be derived in a number of different ways.

Speaking of which, I want to double-check that time series data of different lengths is supported by the tool, e.g. using a shorter time series (or even just a single year) over the reference polygon to represent the desired vegetation conditions. This would save users time on generating unnecessary composites if they know which years are good to use as reference.

Another point I want to make: if users can identify which indices to use early in the workflow, they could just run compute_indices up front, such that they are working with, for example, an annual NDVI time series. This is similar to how it is taught in introductory remote sensing classes, and it could make the overall process easier to follow and clear up some confusion that might surface later.

Hope that makes sense.
