ENH: Recovery Target API Updates #91

Closed · Fixed by #112
szwiep opened this issue Mar 19, 2024 · 7 comments
Labels: enhancement (New feature or request), question (Further information is requested)

szwiep (Contributor) commented Mar 19, 2024

Opening this issue to discuss API changes for the recovery target methods. Hoping to get some input before development starts.

Current

Currently, users import a class from the targets module, parameterize it, and then pass that object to compute_metrics:

from spectral_recovery.targets import MedianTarget, WindowedTarget

mpixel = MedianTarget(scale="pixel")
mpoly = MedianTarget(scale="polygon")
win = WindowedTarget(N=3, rm_na=False)

metrics = compute_metrics(
     ....
     recovery_target_method=mpixel # or mpoly or win
)

Then, inside of RestorationArea, the compute_recovery_targets function (see #84) is called, which clips an appropriate reference stack and passes it to the provided recovery_target_method.
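For orientation, that internal flow is roughly the following (a minimal sketch with hypothetical signatures, not the package's actual code; the clipping step is assumed to use rioxarray):

import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects

def compute_recovery_targets(timeseries, reference_polygons, recovery_target_method):
    # Clip the full timeseries down to the reference polygons...
    reference_stack = timeseries.rio.clip(reference_polygons.geometry.values)
    # ...then hand the clipped stack to the parameterized method object.
    return recovery_target_method(reference_stack)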

Known downsides of this implementation?

  • Impossible for users to pass custom methods.
  • Does not allow users to investigate the recovery target.
  • timeseries DataArray input must contain both reference and restoration polygons.

Positives?

  • Users don't have to compute recovery targets themselves.

Proposed

Have users compute the targets using provided (or custom) functions, and then pass the targets to the compute_metrics function. The shape of the recovery_targets xr.DataArray would need to be broadcastable against the timeseries clipped to the restoration polygon. e.g.:

from spectral_recovery.targets import median_target, windowed_target

mpixel_array = median_target(timeseries, polygons, scale="pixel")
mpoly_array = median_target(timeseries, polygons, scale="polygon")
win_array = windowed_target(timeseries, polygons, N=3, rm_na=False)
custom_array = some_custom_func(timeseries, polygons, ...)

metrics = compute_metrics(
     ....
     recovery_targets=mpixel_array # or mpoly_array, win_array, or custom_array
)

This would 1) allow users to use custom target functions, 2) provide users with direct access to the recovery target values, and 3) allow users to read in a smaller array just over the reference polygons (if different than restoration polygons).
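As a concrete example of point 1, a custom target function under this contract could look like the following (a sketch only; mean_window_target, its arguments, and the (band, y, x) dim names are illustrative assumptions, not part of the package):

import rioxarray  # noqa: F401 -- registers the .rio accessor on xarray objects

def mean_window_target(timeseries, polygons, years=slice("2010", "2012")):
    # Clip to the reference polygons, then average over a fixed window of years.
    clipped = timeseries.rio.clip(polygons.geometry.values)
    # The time dimension is reduced away, leaving (band, y, x), so the result
    # broadcasts against a timeseries clipped to the restoration polygon.
    return clipped.sel(time=years).mean(dim="time")

custom_array = mean_window_target(timeseries, polygons)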

Potential downsides?

  • More complicated. Requires users to compute targets themselves rather than having RestorationArea coordinate the computation.
  • Opens up room for error. If the timeseries given to the target function and the timeseries given to compute_metrics are different and/or from different data sources, the recovery target would be invalid (one possible guard is sketched below this list).
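One possible guard against that mismatch (a sketch only; _check_targets_align is a hypothetical helper, assuming both inputs are xr.DataArrays on the same grid):

import xarray as xr

def _check_targets_align(restoration_stack, recovery_targets):
    # Raise early if the target grid does not exactly match the clipped
    # restoration timeseries (the time dimension is excluded from the check).
    try:
        xr.align(restoration_stack, recovery_targets, join="exact", exclude=["time"])
    except ValueError as err:
        raise ValueError(
            "recovery_targets do not align with the restoration timeseries; "
            "were they computed from the same data source?"
        ) from err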

Potential variations:

  • Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None.

Please share your thoughts! @melissabirch @KavlinClein @wdkang

szwiep added the enhancement (New feature or request) and question (Further information is requested) labels on Mar 19, 2024
szwiep self-assigned this on Mar 19, 2024
melissabirch (Contributor) commented:
The proposed new method is where I get a bit confused, as it seems very similar to me? One difference being that there are more arguments in the target-creation step:

mpixel = MedianTarget(scale="pixel")                                # current
mpixel_array = median_target(timeseries, polygons, scale="pixel")   # proposed

Is this what you mean by users computing targets themselves? Because instead of simply defining the target computation as a method, which is then used inside of RestorationArea, the user supplies the inputs/arguments and it becomes its own function, separate from RestorationArea?

If so, that seems minimally different from a user's perspective and might actually help their understanding of what is going on in the tool.

melissabirch (Contributor) commented:

> mpixel_array = median_target(timeseries, polygons, scale="pixel")

One question: in this case, would you default the timeseries to be the one that is already read in/computed? The indices stack?

melissabirch (Contributor) commented:

Also, could you please expand on this part:

> Potential variations: Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None.

szwiep (Contributor, Author) commented Mar 20, 2024

> Is this what you mean by users computing targets themselves? Because instead of simply defining the target computation as a method, which is then used inside of RestorationArea, the user supplies the inputs/arguments and it becomes its own function, separate from RestorationArea?

Basically, yes. In the current version users pass methods, i.e. specific ways that a target should be computed, and the magic of using that method to produce targets happens inside RestorationArea, without any further involvement from users. The new version would still have users choose the method, but asks them to also call the method themselves to compute the targets and then pass those to compute_metrics. No magic would happen inside anymore.

> One question: in this case, would you default the timeseries to be the one that is already read in/computed? The indices stack?

This is a good point. It should be the indices stack. Which raises the question: should we pull compute_indices out of compute_metrics, or just have compute_indices implemented inside each target method? e.g.

# your stack of annual images. 
timeseries = sr.read_timeseries("dir/", band_names={0: "red", 1: "nir", 2: "swir16"})

# option 1. compute indices inside each function
mpixel_array = median_target(timeseries, polygons, indices=["NBR", "NDVI"], scale="pixel")
metrics = sr.compute_metrics(
    timeseries,
    indices=["NBR","NDVI"],
    restoration_polygon=restoration_polygons, 
    recovery_targets=mpixel_array,
)

# option 2. compute indices separately
indices = sr.compute_indices(timeseries, indices=["NBR", "NDVI"])
mpixel_array = median_target(indices, polygons, scale="pixel")
metrics = sr.compute_metrics(
    indices,
    restoration_polygon=restoration_polygons, 
    recovery_targets=mpixel_array,
)

> Also, could you please expand on this part: "Potential variations: Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None."

This should say "if recovery_targets is None". For example:

timeseries = sr.read_timeseries("dir/", band_names={0: "red", 1: "nir", 2: "swir16"})
rp = sr.read_restoration_polygons("polygons.gpkg")
refp = sr.read_reference_polygons("ref_polygons.gpkg")

# Don't explicitly choose a recovery target method?
# If we let `recovery_targets=` be optional, we would use a default method 
# of median_target(..., scale="polygon"). 
metrics = sr.compute_metrics(
    timeseries,
    indices=["NBR","NDVI"],
    restoration_polygon=rp,
    reference_polygons=refp
)
# -----> what happens inside? 
#  median_target(timeseries, restoration_polygons, reference_polygons, scale="polygon") is
# called, using what is passed to compute_metrics. If the images in timeseries do not contain
# the reference_polygons, this will fail! 

# Do choose a recovery target method?
# You could read in and/or clip to a smaller subset of data that covers individual polygons
rest_timeseries = sr.read_timeseries("dir_for_restoration_polygon_clips/", band_names={0: "red", 1: "nir", 2: "swir16"})
ref_timeseries = sr.read_timeseries("dir_for_ref_polygon_clips/", band_names={0: "red", 1: "nir", 2: "swir16"})

# You only have to deal with images/polygons related to your reference system!
mpixel_array = median_target(ref_timeseries, refp, indices=["NBR", "NDVI"], scale="pixel")

# You only have to deal with images/polygons related to your restoration site!
metrics = sr.compute_metrics(
    rest_timeseries,
    indices=["NBR","NDVI"],
    restoration_polygon=rp, 
    recovery_targets=mpixel_array,
)

Basically, if we use a default recovery target method and users want to use a reference system, their images would have to contain both the restoration and reference polygons, or the default call to the method would fail. This is the same behaviour we see now: the function fails if all of the polygons are not contained.
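Inside compute_metrics, this default variation might look roughly like the following (hypothetical internals, reusing the proposed function names and the call shape from the comments above):

def compute_metrics(timeseries, restoration_polygon, reference_polygons=None,
                    recovery_targets=None, **kwargs):
    if recovery_targets is None:
        # Fall back to the default polygon-scale median. This requires
        # `timeseries` to cover the reference polygons (when given) as well
        # as the restoration polygon, and fails otherwise.
        recovery_targets = median_target(
            timeseries, restoration_polygon, reference_polygons, scale="polygon"
        )
    ...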

melissabirch (Contributor) commented:

> Is this what you mean by users computing targets themselves? Because instead of simply defining the target computation as a method, which is then used inside of RestorationArea, the user supplies the inputs/arguments and it becomes its own function, separate from RestorationArea?
>
> Basically, yes. In the current version users pass methods, i.e. specific ways that a target should be computed, and the magic of using that method to produce targets happens inside RestorationArea, without any further involvement from users. The new version would still have users choose the method, but asks them to also call the method themselves to compute the targets and then pass those to compute_metrics. No magic would happen inside anymore.

I think, based on how I am envisioning it, that this is alright. I know we want the tool workflow to be as simple as possible for users, but in a strange way this might help explain the step-by-step workflow and force users to consider what the tool is doing. As long as it remains easy and requires as few lines of code as possible, I believe the increased flexibility outweighs the cons of asking users to compute the recovery target as a separate step.

> One question: in this case, would you default the timeseries to be the one that is already read in/computed? The indices stack?
>
> This is a good point. It should be the indices stack. Which raises the question: should we pull compute_indices out of compute_metrics, or just have compute_indices implemented inside each target method? [options 1 and 2 above]

I think the indices computation should be separate. For one thing, it prevents unnecessarily computing the indices multiple times when re-running with different recovery target methods (as the analysis has done). It also allows users to extract the indices alone, instead of having outputs be tied to metrics of the indices.
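For illustration, with option 2 the indices can be computed once and reused across methods (a sketch using the proposed function names):

# Compute the indices a single time...
indices = sr.compute_indices(timeseries, indices=["NBR", "NDVI"])

# ...then compare several recovery target methods without re-deriving them.
targets = {
    "median_pixel": median_target(indices, polygons, scale="pixel"),
    "median_polygon": median_target(indices, polygons, scale="polygon"),
    "windowed": windowed_target(indices, polygons, N=3, rm_na=False),
}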

> Also, could you please expand on this part: "Potential variations: Still allow for a default recovery target computation. This would force users to make sure their input timeseries contains both reference and restoration polygons, if recovery_targets is not None."
>
> This should say "if recovery_targets is None". [examples above] Basically, if we use a default recovery target method and users want to use a reference system, their images would have to contain both the restoration and reference polygons, or the default call to the method would fail. This is the same behaviour we see now: the function fails if all of the polygons are not contained.

Gotcha - I think this is a good plan of action.

A follow-up question, which I may have missed with the new implementation of using polygon attributes to select dates: if sr.compute_metrics is given reference polygons, what are the reference years assumed to be?

szwiep (Contributor, Author) commented Mar 21, 2024

@wdkang @KavlinClein, do you have any thoughts on where compute_indices should be called re: my comment above? Or about any of the other points?

c2shang commented Mar 23, 2024

I had to think it through multiple times before I got a good grasp of the whole process. IMO, it certainly increases the complexity on the user's part to run the code, where confusion and errors may emerge, but the added flexibility and the ability to closely inspect the recovery target are nice to have. From the discussion above, it is clear that it is totally doable.

A detailed and carefully phrased documentation/tutorial will be crucial to help users understand what purpose each block of code serves, especially with the interactive notebook to be launched. One caveat is that the developers know all the behind-the-scenes details too well, and some important details might be left out of the documentation as a result. If someone wrote it from a user's perspective, it would steer users in the right direction, both in terms of running the code and understanding what the tool is doing. For example, placing reference polygon and restoration polygon examples next to each other, with visuals of the resulting recovery target arrays, would highlight the key premise this tool operates on: comparing the "current" conditions of the restoration area to the "desired" conditions (values), which can be derived in a number of different ways.

Speaking of which, I want to double-check that time series data of different lengths is supported by the tool, e.g. using a shorter time series (or even just a single year) over the reference polygon to represent the desired vegetation conditions. This would save users time on generating unnecessary composites if they know which years are good to use as reference.

Another point I want to make: if users can identify which indices to use early in the workflow, they could just run compute_indices up front, such that they are working with, for example, an annual NDVI time series. This is similar to how it is taught in introductory remote sensing classes, and it could make the overall process easier to follow and clear up some confusion that might surface later.

Hope that makes sense.
