This repository has been archived by the owner on Sep 20, 2022. It is now read-only.

[WIP] Implement SST-based change-point detector #11

Closed
myui wants to merge 8 commits into apache:master from myui:JIRA-22/pr-356


myui
Member

@myui myui commented Dec 6, 2016

This PR is based on a pending PR by @takuti that was submitted before Hivemall entered the Apache Incubator.

See JIRA for tracking the status of this issue.


Sample table

time x
1 182.478
2 176.231
3 183.917
4 177.798
5 165.469
... ...

(14398 points from Twitter data)

Usage

create temporary function sst as 'hivemall.anomaly.SingularSpectrumTransformUDF';
SELECT
  time,
  -- x is double or array<double>
  -- sst(x) AS res
  sst(x, "-th 0.005") AS res
FROM
  twitter_timeseries
ORDER BY time ASC
;

Results

7551    {"changepoint_score":0.00453049288071683,"is_changepoint":false}
7552    {"changepoint_score":0.004711244102524104,"is_changepoint":false}
7553    {"changepoint_score":0.004814871928978115,"is_changepoint":false}
7554    {"changepoint_score":0.004968089640799422,"is_changepoint":false}
7555    {"changepoint_score":0.005709056330104878,"is_changepoint":true}
7556    {"changepoint_score":0.0044279766655132,"is_changepoint":false}
7557    {"changepoint_score":0.0034694956722586268,"is_changepoint":false}
7558    {"changepoint_score":0.002549056569322694,"is_changepoint":false}
7559    {"changepoint_score":0.0017395109108403473,"is_changepoint":false}
7560    {"changepoint_score":0.0010629833145070489,"is_changepoint":false}
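
To act on rows like these outside Hive, one option is to parse each row and keep only the flagged timestamps. The sketch below is a minimal illustration, not part of this PR: it assumes the query output was saved as text with whitespace between the time and the JSON-like rendering of the result, and the variable names are hypothetical. The three rows are copied from the results above.

```python
import json

# Three rows copied from the results above: time, then the result as JSON.
rows = (
    '7554\t{"changepoint_score":0.004968089640799422,"is_changepoint":false}\n'
    '7555\t{"changepoint_score":0.005709056330104878,"is_changepoint":true}\n'
    '7556\t{"changepoint_score":0.0044279766655132,"is_changepoint":false}\n'
)

parsed = []
for line in rows.splitlines():
    # Split on the first run of whitespace; the JSON part contains none.
    time_str, payload = line.split(None, 1)
    parsed.append((int(time_str), json.loads(payload)))

# Keep only the timestamps that the detector flagged.
changepoints = [t for t, res in parsed if res["is_changepoint"]]
print(changepoints)
```

With the "-th 0.005" option used in the query, only the row whose score exceeds 0.005 is flagged.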

With the naive SVD-based implementation, the elapsed time was about 20 sec for the 14398 samples (vs. 10 sec with ChangeFinder).

Observations

The change-point scores are much more stable than those of ChangeFinder, and they always fall in [0, 1]. However, the scores are still quite noisy, so too many change-points are detected. Smoothing the scores, as ChangeFinder does, is important in practice.
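
As a rough illustration of why smoothing helps, the sketch below applies a trailing moving average before thresholding. This is only one possible smoothing choice and is not the code in this PR; the function name, window size, threshold, and the score values are all made up for the example.

```python
def moving_average(scores, window=3):
    """Trailing moving average; early values average over fewer points."""
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical scores: one isolated spike (index 1) and one sustained rise (4-6).
scores = [0.10, 0.80, 0.10, 0.10, 0.70, 0.80, 0.90, 0.10]
threshold = 0.5

raw_hits = [i for i, s in enumerate(scores) if s > threshold]
smooth_hits = [i for i, s in enumerate(moving_average(scores)) if s > threshold]

print(raw_hits)     # indices flagged without smoothing
print(smooth_hits)  # indices flagged after smoothing
```

In this toy input the isolated spike no longer crosses the threshold after smoothing, while the sustained rise still does, at the cost of a short detection delay.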

In terms of running time, the naive SVD-based implementation is clearly inefficient, so the more efficient Lanczos-based variant should be supported.
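
To make the cost concrete, here is a minimal sketch of an SST-style score at a single time index. It is not Hivemall's implementation: it assumes rank 1 for both the past and future subspaces, and it swaps in plain power iteration for the full SVD (it is not the Lanczos variant either). All function names and the defaults w and n are hypothetical. The score uses a common rank-1 formulation, 1 - (u . beta)^2, where u and beta are the dominant left singular vectors of the past and future trajectory matrices.

```python
import math

def top_left_singular_vector(columns, iterations=100):
    """Dominant left singular vector of the matrix with the given columns,
    approximated by power iteration on the Gram matrix A A^T."""
    w = len(columns[0])
    gram = [[sum(col[i] * col[j] for col in columns) for j in range(w)]
            for i in range(w)]
    # Deterministic, non-symmetric start vector; assumed not to be
    # orthogonal to the dominant singular vector.
    v = [float(i + 1) for i in range(w)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * v[j] for j in range(w)) for i in range(w)]
        norm = math.sqrt(sum(c * c for c in nxt))
        if norm == 0.0:  # all-zero matrix: keep the current vector
            break
        v = [c / norm for c in nxt]
    return v

def windows(series, start, count, w):
    """`count` consecutive length-`w` subsequences beginning at `start`."""
    return [series[start + k: start + k + w] for k in range(count)]

def sst_score(series, t, w=4, n=4):
    """Rank-1 change-point score at index t, clamped to [0, 1]."""
    if t - w - n + 1 < 0 or t + n + w - 1 > len(series):
        raise ValueError("not enough samples around t")
    past = windows(series, t - w - n + 1, n, w)  # windows ending before t
    future = windows(series, t, n, w)            # windows starting at t
    u = top_left_singular_vector(past)
    beta = top_left_singular_vector(future)
    cos = sum(a * b for a, b in zip(u, beta))
    return min(1.0, max(0.0, 1.0 - cos * cos))

flat = [1.0] * 20                       # no change in shape
shifted = [1.0] * 10 + [1.0, -1.0] * 5  # flat, then oscillating from index 10
print(sst_score(flat, 10), sst_score(shifted, 10))
```

Each score needs two decompositions, one for the past and one for the future windows, so a per-sample full SVD adds up quickly on a series of this size. Note also that only the direction of the dominant vector is compared, so a pure rescaling of a constant signal leaves this rank-1 score near 0.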

@myui myui closed this Dec 6, 2016
@myui myui reopened this Dec 6, 2016
@myui myui changed the title from "[WIP] Support Feature Selection UDFs" to "[WIP] Implement SST-based change-point detector" Dec 6, 2016
@myui
Member Author

myui commented Dec 13, 2016

@takuti Could you provide markdown documentation at docs/gitbook/anomaly/sst.md for the user guide?

Run `cd docs/gitbook; gitbook serve` to check the user guide on a local machine.

@myui
Member Author

myui commented Dec 13, 2016

@takuti Could you review my changes?
e867b45

Applied refactoring based on my review comments.

@myui
Member Author

myui commented Dec 13, 2016

@maropu @amaya382

Support for the sst UDF in the Hivemall Spark module would be very welcome.

@coveralls

coveralls commented Dec 13, 2016


Coverage increased (+0.4%) to 35.951% when pulling c4de063 on myui:JIRA-22/pr-356 into 9422977 on apache:master.

@maropu
Member

maropu commented Dec 14, 2016

@myui Okay, I'll do it.

@myui
Member Author

myui commented Dec 14, 2016

@maropu Thanks. anomaly/changefinder is another feature to add in v0.5.
myui/hivemall#333

The interface/functionality is similar to sst.

@takuti
Member

takuti commented Dec 14, 2016

@myui Sorry for my late response to your review, and thanks for your backup. I'll move on to reviewing your modifications and the gitbook documentation now.

@coveralls

coveralls commented Dec 14, 2016


Coverage increased (+0.4%) to 35.951% when pulling 0fd64e6 on myui:JIRA-22/pr-356 into 9422977 on apache:master.

@asfgit asfgit closed this in 9c0aae3 Dec 19, 2016
@myui myui deleted the JIRA-22/pr-356 branch December 19, 2016 05:06
@myui
Member Author

myui commented Dec 19, 2016

@takuti I merged this PR. Once you have written the markdown documents, please send a PR.

Thanks.
