Feature Request: Automatic Good/Bad Image Filter #171

2320sharon · 2023-08-15T16:48:21Z

Good Bad Image Filter

Description:
Users need a way to automatically get rid of images that are not usable.

Proposed Solution:
Using rioxarray, we can load all the downloaded images into a single dataset. We can then compute the time-averaged image and, for each image, its RMSE (Root Mean Squared Error) and PSNR (Peak Signal-to-Noise Ratio) relative to that average. Good images are characterized by a low RMSE (the pixel values don't differ much from the time-averaged image) and a high PSNR (a higher value means the image is closer to the time-averaged image and is therefore likely to be of better quality).
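
A minimal sketch of the scoring step described above, assuming the images are already loaded into one array with time as the first axis. It uses plain NumPy rather than rioxarray to stay self-contained. The function name `score_against_mean` and the `max_value=255` default (8-bit imagery) are assumptions for illustration, not part of CoastSeg.

```python
import numpy as np


def score_against_mean(stack, max_value=255.0):
    """Return per-image RMSE and PSNR relative to the time-averaged image.

    stack: array shaped (time, rows, cols) or (time, rows, cols, bands).
    max_value: assumed peak pixel value (255 for 8-bit imagery).
    """
    stack = np.asarray(stack, dtype=np.float64)
    mean_image = stack.mean(axis=0)        # time-averaged image
    diff = stack - mean_image              # broadcasts over the time axis
    axes = tuple(range(1, stack.ndim))     # every axis except time
    rmse = np.sqrt((diff ** 2).mean(axis=axes))
    # An image identical to the mean has rmse == 0, which gives psnr == inf.
    with np.errstate(divide="ignore"):
        psnr = 20.0 * np.log10(max_value / rmse)
    return rmse, psnr


# Toy example: three dark images and one bright outlier.
stack = np.zeros((4, 2, 2))
stack[3] = 100.0
rmse, psnr = score_against_mean(stack)
```

In this toy stack the outlier gets the largest RMSE and the smallest PSNR. Choosing a cutoff on either score is a separate problem that this sketch does not attempt.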

Benefits:

Automatically separates good and bad imagery, improving the user experience and ensuring quality.
With the bad imagery removed automatically, better-quality shorelines can be extracted.
With the bad imagery removed automatically, fewer images need to be segmented, which saves time.

Drawbacks:

Adds xarray as a dependency.
Adds rioxarray as a dependency.
Could slow down the process of extracting shorelines.

Additional Context:

Builds upon the good-bad filter concept discussed in issue Processing imagery with dask/xarray: example application for identifying outlier imagery #154.

Checklist:

Implement the RMSE and PSNR calculations using rioxarray.
Add new dependencies to the pyproject.toml
Decide where to implement automatic good/bad filtering
Build a prototype on a separate branch
Test the filter on a sample set of images.
Update documentation to explain the new feature and its dependencies.

Peak Signal to Noise Ratio Explanation

PSNR stands for Peak Signal-to-Noise Ratio. It's a metric used primarily in the fields of image and video processing to measure the quality of a reconstructed or compressed image (or video) as compared to the original one. Essentially, it quantifies how much the reconstructed image differs from the original image. The higher the PSNR, the closer the reconstructed image is to the original, and hence the better the quality.
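
For reference, the standard definitions can be written as follows. Here $x_i$ are the pixel values of the image being scored, $y_i$ those of the reference image (in this proposal, the time-averaged image), $N$ the number of pixels, and $\mathrm{MAX}$ the largest possible pixel value (255 for 8-bit imagery, which is an assumption about the data):

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
             = 20\log_{10}\!\left(\frac{\mathrm{MAX}}{\mathrm{RMSE}}\right)
```

Because PSNR is a monotonically decreasing function of RMSE for a fixed $\mathrm{MAX}$, the two scores rank images in exactly opposite order, so a threshold on one is equivalent to a threshold on the other.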


dbuscombe-usgs · 2023-08-17T01:56:04Z

I had some comments that relate to this issue here #168 (comment)

In summary, I think I prefer a less aggressive approach to filtering images than what you (we) propose here. Even though I did suggest it as a potential solution, on reflection, and having looked at the PSNR and RMSE scores of a lot of imagery from different locations this week, I think it would be very hard to determine a threshold that works well. I think we would end up throwing out a lot of good images and keeping a lot of bad images, unless we tweaked the PSNR (or RMSE) threshold quite a bit. Additionally, it would not be useful for short time-series, because it relies on a stable average image that would ideally be drawn from at least several tens of relatively good-quality images. Those can sometimes be hard to come by, for example when limited to Landsat only, or to short time periods.

Instead, I think I would prefer the following:

a simpler criterion for filtering out the worst images, such as the % black pixel filter set to a high threshold. Or we can continue to research how best to do this... ultimately I would like to use ML for this problem and have some ideas we can discuss
the method I outlined in the filter_good_labels.py script in this zipped folder https://github.com/Doodleverse/CoastSeg/files/12297423/new_shoreline_detect_workflow.zip, which works by comparing each model output to the average of model outputs (I elaborate more here Exploring ideas for new shoreline extraction routines for application on label outputs of 4-class segmentation models #168 (comment)). So, it does need several good outputs, but those outputs are much lower dimensional, so stable averages require far fewer samples overall.
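
To make the first option concrete, here is a rough sketch of a black-pixel filter with a high threshold. It assumes "black" means exactly zero in every band, and the names `black_pixel_fraction`, `keep_image` and the 0.9 default are hypothetical, chosen only for illustration. It is not the existing CoastSeg filter.

```python
import numpy as np


def black_pixel_fraction(image):
    """Fraction of pixels that are exactly zero in every band.

    image: array shaped (rows, cols) or (rows, cols, bands).
    """
    image = np.asarray(image)
    if image.ndim == 2:
        black = image == 0
    else:
        black = (image == 0).all(axis=-1)  # zero in all bands
    return float(black.mean())


def keep_image(image, max_black_fraction=0.9):
    """Keep the image unless it is almost entirely black.

    max_black_fraction is a hypothetical threshold; a high value only
    rejects the worst images, as suggested above.
    """
    return black_pixel_fraction(image) <= max_black_fraction


# Toy example: the top half of the image has data, the bottom half is black.
half_black = np.zeros((10, 10, 3), dtype=np.uint8)
half_black[:5] = 200
all_black = np.zeros((10, 10, 3), dtype=np.uint8)
```

With the default threshold, `keep_image(half_black)` keeps the half-covered image and `keep_image(all_black)` rejects the empty one.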

The downside is that we have to compute the label for a lot of bad images, but we can devote some time to speeding up the model calls, which is Doodleverse/doodleverse_utils#31

Also, it still adds xarray and rioxarray as dependencies.

We can discuss.

2320sharon · 2023-08-17T02:07:09Z

Hi Dan, I forgot to update this issue after I tested the filter_good_labels.py script. I was planning to use the logic outlined in that script to perform the good/bad model output filtering. I'll make sure to update this issue tomorrow.

dbuscombe-usgs · 2023-08-17T02:07:26Z

However, I do think there are some good ideas in the original workflow #154 (comment) for filtering out glitchy images. I use the term 'glitch' to refer to sensor errors. They typically involve a completely different colorspace....

Some examples from a site I was looking at today (I am pulling together lots of examples of different types of noise to form the basis of an ML training data set - yes, a new attempt!)

I think I will work on researching a new type of filter that uses ideas in the original workflow #154 (comment) for filtering out glitchy images. Then, after some testing, we can see whether it should be included in coastseg. So I am proposing we still implement this idea, but as a low-key filter that detects the really rare glitches. I would do this by finding the dominant colors in each image and throwing the image out if they fall in a certain range.

Another idea I had to adapt this workflow was to throw out images smaller than the requested ROI. In this scope, xarray would be useful with dask to speed up reading the shape of each image. That's a common thing - yesterday I had 178 partial images out of a total of 920, or about one in five!
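
A small sketch of the size check, assuming the (rows, cols) shape of each image has already been read (for example with xarray and dask, which is not shown) and that the expected shape for the requested ROI is known. The function name `find_partial_images`, the `tolerance` parameter and the file names are hypothetical.

```python
def find_partial_images(shapes, expected_shape, tolerance=0):
    """Return the names of images smaller than the expected ROI shape.

    shapes: dict mapping image name -> (rows, cols).
    expected_shape: (rows, cols) expected for the requested ROI.
    tolerance: number of pixels an image may fall short by and still pass
        (hypothetical, to allow for small rounding differences).
    """
    expected_rows, expected_cols = expected_shape
    return sorted(
        name
        for name, (rows, cols) in shapes.items()
        if rows < expected_rows - tolerance or cols < expected_cols - tolerance
    )


# Toy example with made-up file names and shapes.
shapes = {
    "full.tif": (100, 100),
    "cut_rows.tif": (60, 100),
    "one_short.tif": (100, 99),
}
partial = find_partial_images(shapes, expected_shape=(100, 100))
partial_lenient = find_partial_images(shapes, expected_shape=(100, 100), tolerance=1)
```

The tolerance is there because an image that is a pixel or two short may still be usable; whether that is true for a given sensor is something to test.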


2320sharon added the enhancement New feature or request label Aug 15, 2023

2320sharon self-assigned this Aug 15, 2023

dbuscombe-usgs added V2 for version 2 of coastseg and removed V2 for version 2 of coastseg labels Aug 17, 2023

2320sharon added a commit that referenced this issue Aug 17, 2023

Automatic Good/Bad Image Filter #171 V1

0a53da3

2320sharon added the V2 for version 2 of coastseg label Aug 24, 2023

2320sharon mentioned this issue Oct 5, 2023

Zoo Classifier Workflow Update #197

Open

3 tasks

2320sharon added the Optional An optional feature that's not necessary to be built. label Dec 22, 2023

2320sharon added a commit that referenced this issue May 9, 2024

#171 add code to support a good bad classifier for imagery it runs wh…

230780c

…en shorelines are extracted
