New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixed SR begining/ending spikes #396
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
alibi_detect/od/sr.py
Outdated
# padding values | ||
PADDING_CONSTANT = 'constant' | ||
PADDING_REPLICATE = 'replicate' | ||
PADDING_REFLECT = 'reflect' | ||
PADDINGS = [PADDING_CONSTANT, PADDING_REFLECT, PADDING_REPLICATE] | ||
|
||
# padding sides | ||
SIDE_BILATERAL = 'bilateral' | ||
SIDE_LEFT = 'left' | ||
SIDE_RIGHT = 'right' | ||
SIDES = [SIDE_BILATERAL, SIDE_RIGHT, SIDE_LEFT] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be slightly nicer to make these enums, inheriting from both str
and enum.Enum
, e.g. class Padding(str, Enum)
so then can refer to e.g. Padding.CONSTANT
and you could validate user passed strings against allowed values using one of the methods here: https://stackoverflow.com/questions/29503339/how-to-get-all-values-from-python-enum-class/29503414
I haven't looked through in detail yet, but two minor thoughts; given the relative complexity of the new kwargs, it might be worth updating |
alibi_detect/od/sr.py
Outdated
padding_amp_method: str = PADDING_REFLECT, | ||
padding_local_method: str = PADDING_REFLECT, | ||
padding_amp_side: str = SIDE_BILATERAL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is user facing, I think we should be more explicit, e.g. types as Literal['constant', 'replicate', 'reflect'] = 'reflect'
which unfortunately introduces some name duplication (unless there's a way to define Literal
types given a list of string constant from the padding enum?) but is likely worth it for clarity.
alibi_detect/od/sr.py
Outdated
method: str = "replicate", | ||
side: str = 'bilateral' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more explicit typing via Literal
as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, consistent use of '
over "
please.
alibi_detect/od/sr.py
Outdated
import logging | ||
import numpy as np | ||
from typing import Dict | ||
from typing import Dict, Literal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thinkg I forgot, Literal
is not in the standard lib for Python 3.6 so for support would need to do something like this:
https://github.com/SeldonIO/alibi/blob/0ccf1b726b6a865b5a94d727847653ffc7e68a4f/alibi/explainers/ale.py#L11-L14
We will drop Python 3.6 support very soon though, though probably not for the next release so best to put this in. @ascillitoe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it is only present in standard lib for Python 3.8+ so we won't be able to get rid of the version check for a while as 3.7 will be supported for some time.
alibi_detect/od/sr.py
Outdated
import logging | ||
import numpy as np | ||
from typing import Dict | ||
from alibi_detect.base import BaseDetector, ThresholdMixin, outlier_prediction_dict | ||
|
||
if sys.version_info >= (3, 8): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to be adding Literal
in lots of places in a future PR, so it might be useful to move this to alibi_detect._types
or alibi_detect.utils._types
. Although I can refactor this in the future if its easier...
@RobertSamoilescu I've opened this #398 to remind us to uncomment |
8248f86
to
e71c36c
Compare
alibi_detect/od/sr.py
Outdated
assert X.shape[0] > self.conv_amp.shape[0], "The length of the input signal should be greater " \ | ||
"than the amplitude window" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like we should raise an error instead if this is triggered by user error? Or is this something that should never really happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will raise an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good to me now, although still think it's worth updating doc/source/od/methods/sr.ipynb
to include the new kwarg's.
8768da6
to
cc36d78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'll defer to @arnaudvl to have a look at the maths.
cc36d78
to
972191e
Compare
This PR addresses the spikes at the beginning and ending of the scores reported in #290.
Previous implementation was performing the convolution over the entire spectrum returned by the FFT, thus including the initial bias term. Besides the asymmetry introduced by the convolution operation, which resulted in complex numbers when applying the IFFT, the convolution operation was introducing a considerable bias since the numpy implementation pads the signal with 0 before convolving. The same bias is also valid in time domain. To address the bias, I introduced an option to chose the padding strategy:
constant
- pads the signal with 0replicate
- replicates the most extreme valuereflect
- reflects the sginalAfter benchmarking the various padding strategy, I concluded that the
reflect
strategy works best, with thewindow_amp
centered in the current value (i.e.,side=bilateral
) .All the issues listed above were significant when the synthetic signal from the example notebook was lifted (i.e., add an offset of 100).
In addition to the previous fixes, I modified the implementation to performe the averaging in time domain over the previous
window_local
data points (i.e., is a local average of the precedingwindow_local
points for the current index) as suggested in the paper.