Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed SR begining/ending spikes #396

Merged
merged 12 commits into from Jan 6, 2022
Merged

Conversation

RobertSamoilescu
Copy link
Contributor

This PR addresses the spikes at the beginning and ending of the scores reported in #290.
Previous implementation was performing the convolution over the entire spectrum returned by the FFT, thus including the initial bias term. Besides the asymmetry introduced by the convolution operation, which resulted in complex numbers when applying the IFFT, the convolution operation was introducing a considerable bias since the numpy implementation pads the signal with 0 before convolving. The same bias is also valid in time domain. To address the bias, I introduced an option to chose the padding strategy:

  • constant - pads the signal with 0
  • replicate - replicates the most extreme value
  • reflect - reflects the sginal

After benchmarking the various padding strategy, I concluded that the reflect strategy works best, with the window_amp centered in the current value (i.e., side=bilateral) .

All the issues listed above were significant when the synthetic signal from the example notebook was lifted (i.e., add an offset of 100).

In addition to the previous fixes, I modified the implementation to performe the averaging in time domain over the previous window_local data points (i.e., is a local average of the preceding window_local points for the current index) as suggested in the paper.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Comment on lines 11 to 21
# padding values
PADDING_CONSTANT = 'constant'
PADDING_REPLICATE = 'replicate'
PADDING_REFLECT = 'reflect'
PADDINGS = [PADDING_CONSTANT, PADDING_REFLECT, PADDING_REPLICATE]

# padding sides
SIDE_BILATERAL = 'bilateral'
SIDE_LEFT = 'left'
SIDE_RIGHT = 'right'
SIDES = [SIDE_BILATERAL, SIDE_RIGHT, SIDE_LEFT]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be slightly nicer to make these enums, inheriting from both str and enum.Enum, e.g. class Padding(str, Enum) so then can refer to e.g. Padding.CONSTANT and you could validate user passed strings against allowed values using one of the methods here: https://stackoverflow.com/questions/29503339/how-to-get-all-values-from-python-enum-class/29503414

@ascillitoe
Copy link
Contributor

I haven't looked through in detail yet, but two minor thoughts; given the relative complexity of the new kwargs, it might be worth updating doc/source/od/methods/sr.ipynb with more details, and also adding some basic tests for the different padding strategies and padding sides?

Comment on lines 30 to 32
padding_amp_method: str = PADDING_REFLECT,
padding_local_method: str = PADDING_REFLECT,
padding_amp_side: str = SIDE_BILATERAL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is user facing, I think we should be more explicit, e.g. types as Literal['constant', 'replicate', 'reflect'] = 'reflect' which unfortunately introduces some name duplication (unless there's a way to define Literal types given a list of string constant from the padding enum?) but is likely worth it for clarity.

Comment on lines 129 to 130
method: str = "replicate",
side: str = 'bilateral'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more explicit typing via Literal as above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, consistent use of ' over " please.

@RobertSamoilescu RobertSamoilescu changed the title fixed begining/ending spikes fixed SR begining/ending spikes Nov 30, 2021
import logging
import numpy as np
from typing import Dict
from typing import Dict, Literal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small thinkg I forgot, Literal is not in the standard lib for Python 3.6 so for support would need to do something like this:
https://github.com/SeldonIO/alibi/blob/0ccf1b726b6a865b5a94d727847653ffc7e68a4f/alibi/explainers/ale.py#L11-L14

We will drop Python 3.6 support very soon though, though probably not for the next release so best to put this in. @ascillitoe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is only present in standard lib for Python 3.8+ so we won't be able to get rid of the version check for a while as 3.7 will be supported for some time.

import logging
import numpy as np
from typing import Dict
from alibi_detect.base import BaseDetector, ThresholdMixin, outlier_prediction_dict

if sys.version_info >= (3, 8):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to be adding Literal in lots of places in a future PR, so it might be useful to move this to alibi_detect._types or alibi_detect.utils._types. Although I can refactor this in the future if its easier...

@ascillitoe
Copy link
Contributor

@RobertSamoilescu I've opened this #398 to remind us to uncomment data.meta['attack_type'] == attack once the dataset is fixed.

Comment on lines 221 to 222
assert X.shape[0] > self.conv_amp.shape[0], "The length of the input signal should be greater " \
"than the amplitude window"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like we should raise an error instead if this is triggered by user error? Or is this something that should never really happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will raise an error

Copy link
Contributor

@ascillitoe ascillitoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me now, although still think it's worth updating doc/source/od/methods/sr.ipynb to include the new kwarg's.

Copy link
Member

@jklaise jklaise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll defer to @arnaudvl to have a look at the maths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants