Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skeletonization of RPs and border effect correction. #98

Closed
hkraemer opened this issue Oct 12, 2020 · 14 comments
Closed

Skeletonization of RPs and border effect correction. #98

hkraemer opened this issue Oct 12, 2020 · 14 comments
Assignees
Labels
help wanted wanted feature Feature we would like to have!

Comments

@hkraemer
Copy link
Contributor

hkraemer commented Oct 12, 2020

Hi there,

the finite size of a recurrence plot (RP) can cause border effects in the RQA-quantifiers. Also the samplingrate of the data and the chosen recurrence threshold selection method (fixed, fixedrate, fan) plays a crucial role. They can cause the thickening of diagonal lines in the RP. Both problems lead to biased line-based RQA-quantifiers and is discussed in Kraemer & Marwan, Phys. Lett. A (2019). The proposed methods are implemented in Matlab and are open source.

According to the above mentioned two major problems, there are two methods needed, which will fix them. In the first case one simply needs to supress lines in the diagonal line length histogram, which cross the borders of the RP in an appropriate way (in the paper we compared four different ways, but the kelo=keep longest line-approach will do - see attachement). The second problem is way trickier to tackle. The problem corresponds to applying a Theiler window, but not only for the fiducial point on the trajectory, but also for its nearest neighbours falling within the recurrence threshold. We proposed a skeletonization scheme, which seems to work well (rp_diagonal, attached).

Basically, we

  • first transform the RP by rotating it by 90°. This way the diagonal lines become horizontal lines. That might also speed up computation.

  • We pick the longest line from the diagonal line length histogram and iterate over each point of this line. At each point, we check if there is a neighbouring point/line. If yes, we repeat the same operation on this line until we do not find any neighbouring points and recursively delete all these adjacent lines from the histogram and the RP.

  • Then we pick the second longest line from the histogram and repeat the mentioned procedure.

  • We do this until there is no more line to pick from the histogram.

This leaves us with a RP only consisting of diagonal lines of thickness 1. Since the algorithm is not so complicated and basically consists of iterating over a histogram over and over agai,n I assume it could be way more performant than the Matlab implementation, what do you think?

I would be very grateful to get any help, while implementing this in Julia.

Cheers,
Hauke


Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@hkraemer
Copy link
Contributor Author

Here the rp_diagonal skeletonization scheme and here the dl_kelo border line correction Matlab code.

@hkraemer hkraemer added help wanted wanted feature Feature we would like to have! labels Oct 12, 2020
@Datseris Datseris removed their assignment Oct 12, 2020
@Datseris
Copy link
Member

(I don't think I will have much time to contribute here, besides leaving some helpful tips on the PR...)

@heliosdrm
Copy link
Collaborator

Thanks. A small but important detail: if the plan is writing a Julia implementation of this method based on the Matlab code that you linked, I think that it is not possible to do it in this package, because the Matlab code is GPL-licensed, and this package is published with MIT license.

That being said, I don't think that this is a big problem - probably it's a better idea to devise a fresh new Julia code based on the description of the algorithm in the paper, rather than on Matlab's particular implementation.

@hkraemer
Copy link
Contributor Author

Hey,

That being said, I don't think that this is a big problem - probably it's a better idea to devise a fresh new Julia code based on the description of the algorithm in the paper, rather than on Matlab's particular implementation.

Sure! My intention was only to illustrate the main procedure. Of course it will be totally different in Julia, structurewise - way more auxiliary functions. I only posted it, in order to see if you agree on the main concept. Maybe you have a totally different idea of the implementation?

@heliosdrm
Copy link
Collaborator

I only posted it, in order to see if you agree on the main concept. Maybe you have a totally different idea of the implementation?

No, I have not had time to look at it yet, sorry.

@hkraemer
Copy link
Contributor Author

hkraemer commented Mar 1, 2021

Hey @heliosdrm ,

it is time now to implement this :) Of course, I could simply start hacking something, probably sticking to my Matlab-implementation, but as you mentioned before: it might be beneficial to make a plan beforehand, using the structure you implemented for the RQA.

In my Matlab-implementation I first transformed the Recurrence Matrix into a close returns map, i.e. some special rotation by 45 degrees, so that all diagonal lines become horizontal lines. Then I wrote a function, which extracts all the horizontal (diagonal) lines from the rotated Recurrence Matrix and stores them in a (3-by-N)-Matrix, where N is the total number of horizontal (diagonal) lines found in the rotated Recurrence Matrix. In the first line I stored the length of each diagonal line and in the consecutive second and third line I stored its starting indices in the rotated Recurrence Matix, i.e. row- and column-index.

Then I iterate over this sorted (3-by-N)-Matrix, starting with the longest line, "go along" each line and check for smaller, neighbouring lines. If there are any, I delete them from the sorted (3-by-N)-Matrix and keep going until I can not find any more adjacent lines. Then I proceed with the next line in the sorted (3-by-N)-Matrix until I am done. The final rotated Recurrence Matrix gets then constructed from the (3-by-N)-Matrix, since it stores all lines left in the rotated Recurrence Matrix and finally gets transformed back into a regular matrix-shape (Recurrence Plot).

I think I would be fine to implement it like I outlined it, but I kindly ask for your comment if this makes sense at all. Specifically, the Recurrence Matrices are in a sparse-format, which will probably cause problems, when I want to check for recurrence points at an index-tuple (i,j) and "track" diagonal lines in them? As far as I can see I could simply use rowvals() and colvals()-methods, but is there an easy way of extracting whole diagonals from a (sparse) Matrix? Something like the LinearAlgebra.diag()-function, but here working on the sparse Recurrence Matrices?

I appreciate any help.
Best,
Hauke

@heliosdrm
Copy link
Collaborator

Hi, the extraction of lines (both vertical and diagonal) from sparse matrices is coded in histograms.jl. It's not very intuitive, but let me explain:

  • _linehistograms is the core function. It takes the indices given by rowvals() and colvals(), and traverses them looking for discontinuities in the row values, interruptions by the Theiler window, or changes of column to measure the lengths of the lines.
  • verticalhistograms just checks the value of the Theiler window and call rowvals and colvals to pass them to _linehistograms.
  • diagonalhistogram uses colvals(x) .- rowvals(x) instead of colvals(x) as trick to get the diagonal values, and sorts them properly before calling _linehistograms -- that may be equivalent to your "rotation" of the matrix, I think. The code of this function is longer because it checks if the matrix is symmetric to look into the diagonals of only one side if suitable. And it also treats the Theiler window in a different way: it is not used to "break" the lines as in the counting of vertical lines, but to omit certain diagonals.

Maybe that code might serve as starting point. _linehistograms doesn't keep the positions of the individual lines; it just counts them according to their length. Nearly everything that is not a trivial operation is documented by commentaries alongside the code, so adapting it to the new purpose might be possible -- if maybe a bit annoying too, because what has to be changed is in the innermost part of the loops and if blocks (replace the lines where bins etc. are updated by the code needed to update the newly defined matrix).

If this is done, we might look for a smart way for re-using those operations. I mean: once you have that (3-by-N)-Matrix with the data of the individual diagonal lines, it would be faster to make the histogram of diagonal lines from it, instead of from the re-transformed recurrence matrix. I have this idea:

  • That skeletonization might be an option of the function recurrencestructures that is used to extract the diagonal and vertical structures. That function uses those calculations to calculate the histograms, but does not reconstruct the new matrix.
  • At the same time, we add a function recurrencestructures! that would replace the older matrix by the "skeletonized" one.
  • Also, we make a new method of rqa that takes the value returned by recurrencestructures (or recurrencestructures!), instead of the matrix. (That should be very straightforward, actually I wonder why I didn't do it before.)

@heliosdrm
Copy link
Collaborator

An alternative to transforming the matrix with recurrencestructures!, if we are still on time for breaking changes (@Datseris), is that recurrencestructures does not return a dictionary, but its own type. One of the fields of that type might be the "source" recurrence matrix. Thus, if the matrix is skeletonized, we could store it in that field without transforming the original one.

@heliosdrm
Copy link
Collaborator

An additional question for @hkraemer: will the skeletonization preserve symmetry? I think that this is a desirable property, specially if we want to return a transformed matrix.

@hkraemer
Copy link
Contributor Author

hkraemer commented Mar 2, 2021

An additional question for @hkraemer: will the skeletonization preserve symmetry? I think that this is a desirable property, specially if we want to return a transformed matrix.

Yepp, it does. In case of symmetry, it also checks and renderes the lines of one triangle of the RP.

@hkraemer
Copy link
Contributor Author

hkraemer commented Mar 2, 2021

Hi, the extraction of lines (both vertical and diagonal) from sparse matrices is coded in histograms.jl. It's not very intuitive, but let me explain:

* `_linehistograms` is the core function. It takes the indices given by `rowvals()` and `colvals()`, and traverses them looking for discontinuities in the row values, interruptions by the Theiler window, or changes of column to measure the lengths of the lines.

* `verticalhistograms` just checks the value of the Theiler window and call `rowvals` and `colvals` to pass them to `_linehistograms`.

* `diagonalhistogram` uses `colvals(x) .- rowvals(x)` instead of `colvals(x)` as trick to get the diagonal values, and sorts them properly before calling `_linehistograms` -- that may be equivalent to your "rotation" of the matrix, I think. The code of this function is longer because it checks if the matrix is symmetric to look into the diagonals of only one side if suitable. And it also treats the Theiler window in a different way: it is not used to "break" the lines as in the counting of vertical lines, but to omit certain diagonals.

Maybe that code might serve as starting point. _linehistograms doesn't keep the positions of the individual lines; it just counts them according to their length. Nearly everything that is not a trivial operation is documented by commentaries alongside the code, so adapting it to the new purpose might be possible -- if maybe a bit annoying too, because what has to be changed is in the innermost part of the loops and if blocks (replace the lines where bins etc. are updated by the code needed to update the newly defined matrix).

If this is done, we might look for a smart way for re-using those operations. I mean: once you have that (3-by-N)-Matrix with the data of the individual diagonal lines, it would be faster to make the histogram of diagonal lines from it, instead of from the re-transformed recurrence matrix. I have this idea:

* That skeletonization might be an option of the function `recurrencestructures` that is used to extract the diagonal and vertical structures. That function uses those calculations to calculate the histograms, but does not reconstruct the new matrix.

* At the same time, we add a function `recurrencestructures!` that would replace the older matrix by the "skeletonized" one.

* Also, we make a new method of `rqa` that takes the value returned by `recurrencestructures` (or `recurrencestructures!`), instead of the matrix. (That should be very straightforward, actually I wonder why I didn't do it before.)

I am very much in favour of the idea to extend recurrencestructures. You and @Datseris shall decide whether this function eventually returns its own type or a dict and if there is the need to construct the skeletonized RP via recurrencestructures! or not.

A few comments:

  • Note that the Theiler window does not play any role for the skeletonization, since the whole idea is kind of a Theiler window, but not only applied to the neighbours of the ficucial point, but also for a neighbourhood of nearest neighbours of the fiducial point. Translated to RPs: Not only lines adjacent to the LOI are excluded, but also lines around other lines in the plot are.
  • recurrencestructures takes as kwarg border. I am not quite sure what this is meant to be. From my point of view it would be beneficial if we add another keyword regarding the treatmeant of border-lines. As pointed out in Kraemer & Marwan, Phys. Lett. A (2019), there are several ideas to treat lines, which cut the border of the RP. In conclusion there were two main strategies, which are also not too complicated to implement: 1) assign all border diagonals the length of the LOI (Censi correction) or 2) all (semi-)border diagonals are discarded, but the longest one (Kelo correction). Since these corrections can be crucial for an unbiased RQA and they have to be incorporated in diagonalhistogram (as far as I have understood) anyway, these keyword arguments should also be applicable in the rqa-method. And as you pointed out earlier: It makes sense to write a rqa-method that takes the value returned by recurrencestructures (or recurrencestructures!) anyway.

I will start with trying to extract the (3-by-N)-Matrix from a RP, which would also include to incorporate the border line treatments as pointed out. Furthermore I have to think about, if it is neccessary to work on a close returns map, instead of the original RP. One of the main reason why I did that was, that it is way easier to to find "adjacent" lines from a line, since when it is a horizontal line instead of a diagonal one, one only needs to scan the line-value above and underneeth the point, where one is actually at.

Let me know what you think. Cheers

@pucicu
Copy link
Contributor

pucicu commented Mar 16, 2021 via email

@hkraemer
Copy link
Contributor Author

Censi et al. is actually replacing border lines with the LOI. "The first correction we introduced concerns the border effect: we considered a border diagonal to be as long as the main diagonal (N samples), by introducing a diagonal length correction (corrected diagonal length, CDL) [...]" (p. 857). Anyway, it does not matter much, but for semi-border lines, which only cut one border for the RP, I guess.

@Datseris
Copy link
Member

This was implemented in #128 if I understood that correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted wanted feature Feature we would like to have!
Projects
None yet
Development

No branches or pull requests

4 participants