Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add firm (semi-soft) and non-negative garotte thresholding #354

Merged
merged 8 commits into from
Jul 2, 2018

Conversation

grlee77
Copy link
Contributor

@grlee77 grlee77 commented Feb 28, 2018

This PR adds two additional thresholding options that are common in the literature. Each of these is intermediate between soft and hard thresholding as shown in a new demo:

new_thresholds

At the moment I left the firm (semi-soft) thresholding in a separate function because unlike all of the other thresholds available, it requires a second value in the definition of the threshold (The two values define the width of the transition region between soft and hard thresholding behavior as shown in the right panel of the figure above). If preferred, this could instead be a mode within the existing threshold function, but this would require adding a new argument for the upper threshold that is not used by any of the other functions.

Here are citation counts for two references related to the non-negative Garrote::

  • Breiman et. al. Better subset regression using the nonnegative garrote:
    citations: 1012 (Google), 436 (Web of Science)
  • Gao et. al. Wavelet Shrinkage Denoising Using the Non-Negative Garrote:
    citations: 433 (Google), 126 (Web of Science)

Here are citation counts for the two references related to firm shrinkage in the docstring::

  • Gao et. al. WaveShrink with firm shrinkage.
    citations: 432 (Google), 142 (Web of Science)
  • Bruce et. al. WaveShrink: Shrinkage functions and thresholds
    citations: 74 (Google)

@codecov-io
Copy link

codecov-io commented Feb 28, 2018

Codecov Report

Merging #354 into master will decrease coverage by 0.1%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
- Coverage   84.36%   84.26%   -0.11%     
==========================================
  Files          22       22              
  Lines        3512     3539      +27     
  Branches      596      600       +4     
==========================================
+ Hits         2963     2982      +19     
- Misses        485      489       +4     
- Partials       64       68       +4
Impacted Files Coverage Δ
pywt/_thresholding.py 87.3% <71.42%> (-12.7%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfd4031...43952c7. Read the comment docs.

@rgommers
Copy link
Member

Nice. Makes sense to me to add these thresholding modes, they seem fairly popular. I don't have a strong opinion between mode and threshold_firm, both are slightly non-ideal in different ways.

@rgommers
Copy link
Member

Did a quick readthrough of the code, all looks fine. Will have a more detailed look later.

@grlee77
Copy link
Contributor Author

grlee77 commented Feb 28, 2018

One subtle difference from the published equations is that the forms as written here also work correctly with complex numbers. These are equivalent to the modifications made for soft thresholding in #322. The tests also include complex-valued inputs to validate the expected behavior.

For the non-negative garrote, this just corresponds to adding an abs so that the equation from the publication:
x * (1 - (value/x)**2)
becomes:
x * (1 - (value / abs(x))**2)

For the firm threshold, the equation in the publication for values between value_low and value_high is:
sgn(x) * (value_high * (abs(x) - value_low)) / (value_high - value_low)

but sgn as defined in numpy will not work properly for complex-valued data in the above form. An expression that is equivalent for real-valued data and will work as expected for complex-valued inputs is:
x * (value_high * (1 - value_low / abs(x))) / (value_high - value_low)

@rgommers rgommers added this to the v1.0 milestone Jun 29, 2018
@rgommers
Copy link
Member

I see that I completely forgot about this. Will have another read through in the next few days.


.. image:: ../_static/threshold_types.png
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems okay to me if your rationale here is that the plotting code was overly long. We shouldn't make it a habit though, better to generate plots on the fly.

@@ -72,6 +90,11 @@ def threshold(data, value, mode='soft', substitute=0):
less than the value param are replaced with `substitute`. Data values with
absolute value greater or equal to the thresholding value stay untouched.

``garotte`` corresponds to the Non-negative garrote threshold [2]_, [3]_.
It is intermediate between ``hard`` and ``soft`` thresholding. It behaves
like soft thresholdding for small data values and approaches hard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo thresholdding

data : array-like
The data to threshold. This can be either real or complex-valued.
value_low : float
Any values smaller `value_low` will be set to zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: smaller than

@@ -123,3 +163,82 @@ def threshold(data, value, mode='soft', substitute=0):
sorted(thresholding_options.keys()))
raise ValueError("The mode parameter only takes values from: {0}."
.format(', '.join(keys)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add a garotte example, all the other modes have one (http://pywavelets.readthedocs.io/en/latest/ref/thresholding-functions.html)

@rgommers
Copy link
Member

Overall LGTM, make a couple of minor comments.

@rgommers
Copy link
Member

Looking at the figures in the description, I do have the feeling that there really should be a smooth thresholding function; the discontinuity in the first derivative of these thresholding curves must be problematic for some kinds of spectral analyses (don't ask me which ones - it's just a hunch).

@grlee77
Copy link
Contributor Author

grlee77 commented Jul 2, 2018

Thanks for the review. I have made the requested changes.

I moved plot_thresholds.py into a pyplots subfolder of the documentation so we don't have to include the code inline in the documentation itself. This just required adding an extension to conf.py for sphinx.

Locally it looks like:
docs_screenshot

It looked like we already have matplotlib in our CI configurations so hopefully it will work there as well.

@grlee77
Copy link
Contributor Author

grlee77 commented Jul 2, 2018

I do have the feeling that there really should be a smooth thresholding function; the discontinuity in the first derivative of these thresholding curves must be problematic for some kinds of spectral analyses (don't ask me which ones - it's just a hunch).

I don't know of one off the top of my head, but if you find a reference for one you would like to have I can take a look.

@rgommers rgommers merged commit 5b5fd77 into PyWavelets:master Jul 2, 2018
@rgommers
Copy link
Member

rgommers commented Jul 2, 2018

That change to pyplots subfolder looks useful, thanks. LGTM now, merged.

@grlee77 grlee77 deleted the firm_thresh branch July 20, 2018 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants