Conversation
I'm not sure why the tests aren't being run here.
- def _get_first_mstump_profile(start, T, m, excl_zone, M_T, Σ_T):
+ def _get_first_mstump_profile(start, T_A, T_B, m, excl_zone, M_T, Σ_T):
There was a problem hiding this comment.
This implies that we are prepared to perform AB-joins in addition to self-joins?
There was a problem hiding this comment.
For the first mstump profile, yes; then we can use this function directly.
I implemented it this way so that nan/inf handling works the same as in stump. There, for the first subsequence, all nans in T_A are replaced with zeros while T_B still contains them, because otherwise there is no way to pass this information to core.mass.
To actually support AB-joins, some changes would also have to be made in _stump itself, which is a bit out of scope at the moment.
There was a problem hiding this comment.
Frankly, I'm not sure what an AB-join for a multi-dimensional matrix profile actually means. I don't remember the original authors discussing it, and I am concerned about supporting this for users if we haven't thought it through.
> To actually support AB-joins, some changes would also have to be made in _stump itself, which is a bit out of scope at the moment.
I'm not sure I understand. Technically, _stump already supports AB-joins. Or do you mean AB-joins that contain nan/inf in both sequences?
There was a problem hiding this comment.
Alright, I think I am okay with this since it is a private function.
zone_start = max(0, start - excl_zone)
zone_stop = min(n - m + 1, start + excl_zone)
There was a problem hiding this comment.
For AB-joins, this is not necessary. It is only needed for self-joins.
There was a problem hiding this comment.
Yes, but I did not plan to support AB-joins at the moment.
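For readers following this thread, the exclusion zone under discussion can be sketched roughly as follows. This is only an illustration, not STUMPY's code: `apply_exclusion_zone` and its `is_self_join` flag are hypothetical names, and `D` is assumed to be a 1-D distance profile of length `n - m + 1`, so `len(D)` stands in for `n - m + 1` from the diff.

```python
import numpy as np


def apply_exclusion_zone(D, start, excl_zone, is_self_join=True):
    """Hypothetical helper: mask trivial matches around `start` in a distance profile."""
    D = D.copy()
    if is_self_join:
        # Same bounds as in the diff above; len(D) plays the role of n - m + 1
        zone_start = max(0, start - excl_zone)
        zone_stop = min(len(D), start + excl_zone)
        D[zone_start:zone_stop] = np.inf
    # For an AB-join the query comes from a different series, so nothing is masked
    return D


D = np.array([0.0, 0.5, 2.0, 3.0, 1.0, 4.0])
masked = apply_exclusion_zone(D, start=0, excl_zone=2)
nearest = int(np.argmin(masked))  # best match outside the exclusion zone
```

Without the mask, a self-join would always report the subsequence itself (distance 0) as its own nearest neighbor, which is why the step only matters for self-joins.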
hosts = list(dask_client.ncores().keys())
nworkers = len(hosts)
T_A = np.asarray(core.transpose_dataframe(T)).copy()
T_B = T_A.copy()
There was a problem hiding this comment.
I am a bit confused as to why we need T_B. It doesn't look like it's being used for anything.
There was a problem hiding this comment.
I wanted to implement nan/inf handling the same way as we did in stump. There, you first calculate the mean/stddev, and for each subsequence that should be ignored you set the mean to inf. Then you set T_A to zero everywhere it was nan. Next you calculate the first matrix profile value; at this point T_B can still contain nan, because this information has to be passed to core.mass. Afterwards you set T_B to zero everywhere it was nan and pass the two altered time series to _stump, where the decision whether a subsequence should be ignored is made only by checking if its mean is inf.
So, to actually support AB-joins, we would only need to pass T_B to _mstump and handle it from there, but I skipped this since it is a completely different issue.
Basically, at the moment T_B is only used to correctly calculate the first matrix profile value.
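The sequence of steps described above can be sketched with plain NumPy. This is a simplified, 1-D illustration under stated assumptions, not the code in this PR: `mean_with_inf_marker` is a hypothetical stand-in for the mean computation, the standard deviation is omitted, and the actual distance calculation (done by `core.mass` in STUMPY) is left out.

```python
import numpy as np


def mean_with_inf_marker(T, m):
    """Hypothetical helper: sliding mean, set to inf for windows containing nan/inf."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    bad = ~np.isfinite(windows).all(axis=1)
    # Zero-fill non-finite values so the mean of the remaining windows is well defined
    clean = np.where(np.isfinite(windows), windows, 0.0)
    M_T = clean.mean(axis=1)
    M_T[bad] = np.inf  # marker: "ignore this subsequence"
    return M_T


T = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
m = 3
M_T = mean_with_inf_marker(T, m)

T_A = T.copy()
T_B = T.copy()
T_A[~np.isfinite(T_A)] = 0.0  # the query side is zeroed first
# ... the first profile value would be computed here, while T_B still holds its nan ...
T_B[~np.isfinite(T_B)] = 0.0  # afterwards T_B is zeroed as well

# From here on, only the mean decides whether a subsequence is skipped
ignore = np.isinf(M_T)
```

In this toy input, the first three windows each contain the nan and are marked for skipping, while the last window `[4, 5, 6]` keeps its ordinary mean.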
  M_T, Σ_T = core.compute_mean_std(T, m)
- right_P, right_I = _get_first_mstump_profile(start, T, m, excl_zone, M_T, Σ_T)
+ right_P, right_I = _get_first_mstump_profile(start, T, T, m, excl_zone, M_T, Σ_T)
There was a problem hiding this comment.
This seems weird. I feel like _get_first_mstump_profile should detect whether one or two time series are being passed to it.
There was a problem hiding this comment.
stump._get_first_stump_profile doesn't, so I kept it this way in mstump. The detection is done in the main stump function.
There was a problem hiding this comment.
Got it! I overlooked the fact that:
- This is a private function
- This sets us up in case we decide to allow AB-joins in the future (currently, this is blocked/omitted in the public function)
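The split of responsibilities agreed on here can be sketched as below. All names are hypothetical, and a plain (non-normalized) Euclidean distance is used purely to keep the example short; it is not how STUMPY computes distances.

```python
import numpy as np


def _first_profile(start, T_A, T_B, m):
    """Hypothetical private helper: always takes two series and never guesses the join type."""
    Q = T_A[start : start + m]
    windows = np.lib.stride_tricks.sliding_window_view(T_B, m)
    return np.linalg.norm(windows - Q, axis=1)


def first_profile(T, m, start=0):
    """Hypothetical public entry point: self-join only, so the same series is passed twice."""
    T = np.asarray(T, dtype=np.float64)
    return _first_profile(start, T, T, m)


D = first_profile([1.0, 2.0, 3.0, 2.0, 1.0], m=2)
```

Keeping the join-type decision in the public function means the private helper would not need to change if AB-joins were allowed later.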

When I run the tests locally with the master branch (prior to this PR), I notice that the tests are consistently getting stuck on

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mexxexx Thank you for completing this PR! This is another great step in getting STUMPY to the next level. I really appreciate it. Additionally, given the current global situation, I hope that you and your loved ones are staying safe out there.

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

Thank you very much, I really appreciate it. I wish you all the best too!

Btw, there was indeed an issue with Azure Pipelines (they changed some things that caused CI to not get triggered). I ended up adding a couple of new lines to the

Codecov Report
@@            Coverage Diff            @@
##            master     #142    +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           12       12
  Lines          842      836     -6
=========================================
- Hits           842      836     -6
This PR fixes the handling of nans in mstump and mstumped (issues #117 and #130). It also contains a bit of refactoring to clean up the code (issue #135).
The refactoring was done to improve readability and to make the internals more or less identical to stump.