
Multi-Model Change Point Detection #5

Open
Pete-York opened this issue Mar 11, 2019 · 3 comments
@Pete-York

Hi!

I am hoping to use Ruptures for a project where I have a multi-dimensional or multivariate signal (I'm not 100% sure on the terminology) in which I expect different dimensions to follow different models (i.e. different cost functions). From what I've read of the survey and from looking at the current Ruptures implementation, it seems like this hasn't been done yet.

I have some ideas for how to do it and have started implementing it. Is this something you would be interested in having a pull request for? Do you have any suggestions of existing work in this area I may have overlooked?

Thanks for the package, by the way, it's great!

@deepcharles
Owner

deepcharles commented Mar 15, 2019

Hello,

Thanks for your message.
That would be a great addition to ruptures, but there are several possible implementations. For instance, do you specify beforehand which cost function is applied to which dimensions? Something equivalent to

import ruptures as rpt

# say we have a 4D signal in a numpy array called "signal"

# split the signal into two 2D signals
signal_subdim1 = signal[:, :2]
signal_subdim2 = signal[:, 2:]
# detect changes in each 2D signal, with a different cost for each
breakpoints1 = rpt.Dynp(model="rbf").fit(signal_subdim1).predict(n_bkps=3)
breakpoints2 = rpt.Dynp(model="l2").fit(signal_subdim2).predict(n_bkps=5)
# merge the change points from both sub-signals
breakpoints = sorted(set(breakpoints1 + breakpoints2))

How do you imagine it?

Thanks,

Charles

@Pete-York
Author

The way I currently have it implemented is pretty straightforward: there is a cost type called CombinatorialCost, which takes a list of costs whose length equals the dimensionality of your signal.

E.g.

import ruptures as rpt
from ruptures.costs import CostAR, CostL1, CostLinear

# say we have a 4D signal in a numpy array called "signal"
# CombinatorialCost is the proposed new class: one cost per dimension
combinatorial_cost = CombinatorialCost([CostAR(), CostAR(), CostL1(), CostLinear()])

algo = rpt.Pelt(min_size=5, custom_cost=combinatorial_cost).fit(signal)
result = algo.predict(pen=penalty)

To calculate the cost of a segment it simply sums the cost of each dimension according to that dimension's cost.
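For concreteness, that summing scheme could be sketched standalone roughly like this (`CombinedCost`, `l2_cost`, and `l1_cost` are illustrative names I've made up for this sketch, not the actual ruptures API or the code in my branch):

```python
class CombinedCost:
    """Segment cost = sum over dimensions of that dimension's own cost."""

    def __init__(self, costs):
        # one callable per dimension: cost(values) -> float
        self.costs = costs

    def fit(self, signal):
        # signal: sequence of rows, each with len(self.costs) entries;
        # store it column-wise so each dimension's cost sees 1D data
        self.columns = list(zip(*signal))
        return self

    def error(self, start, end):
        # sum each dimension's cost on the segment [start, end)
        return sum(cost(col[start:end])
                   for cost, col in zip(self.costs, self.columns))


def l2_cost(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def l1_cost(values):
    """Sum of absolute deviations from a segment median."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return sum(abs(v - median) for v in values)
```

For example, `CombinedCost([l2_cost, l1_cost]).fit(signal).error(0, 4)` evaluates the L2 cost on the first dimension and the L1 cost on the second, and returns their sum.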

The main issue I immediately see with this approach is normalising cost magnitudes: different costs may have very different scales and variances, e.g. CostL1 may be very different in magnitude from CostAR depending on the dataset.
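One common way to mitigate this (my own suggestion here, not something settled in the thread) is to standardise each dimension to zero mean and unit variance before fitting, so the per-dimension costs operate on comparable scales:

```python
def standardize_columns(signal):
    """Rescale each dimension of a row-major signal to zero mean and
    unit variance, so per-dimension costs have comparable magnitudes."""
    out_cols = []
    for col in zip(*signal):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        # constant dimensions get std 0; fall back to 1.0 to avoid
        # dividing by zero (they become all-zero columns)
        std = var ** 0.5 or 1.0
        out_cols.append([(v - mean) / std for v in col])
    # transpose back to row-major layout
    return [list(row) for row in zip(*out_cols)]
```

This only addresses scale, not variance structure within segments, so a learned or cost-specific weighting per dimension might still be needed.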

I can make a pull request so you can see what I have so far if you like.

@deepcharles
Owner

That sounds great!
