Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warped series is not of the same length as the reference and query #7

Closed
JulietteVTrigt opened this issue Nov 10, 2020 · 9 comments
Closed

Comments

@JulietteVTrigt
Copy link

JulietteVTrigt commented Nov 10, 2020

  • Python port of R's Comprehensive Dynamic Time Warp algorithm package version:
  • Python version: 3.7.3
  • Operating System: OS X

Description

If the query and reference series are of size N, then the warped series is always of size N-1. This can be explained by how the warp function is set up:

jset = alignment.index2
jmax = numpy.max(jset)
ii = function(numpy.arange(jmax))
return ii

As Python starts counting from 0, jmax is always N-1. I was wondering whether this is the desired output or whether the length of the warped series should also be N.

What I Did

alignment = dtw(
+        y_pred,
+        y_true,
+        keep_internals=True,
+        step_pattern=step_pattern,
+        window_type="sakoechiba",
+        window_args={"window_size": 4},
+    )
+    warped_indices = warp(alignment, index_reference=False)
+    aligned_series = y_pred[warped_indices]
@tonigi
Copy link
Collaborator

tonigi commented Nov 10, 2020

It is intended. jmax is the index to the element containing the last aligned point. It may differ from N-1 for open_end=True alignments.

@tonigi tonigi closed this as completed Nov 10, 2020
@davidmwatson
Copy link

I realise this issue has been closed, however I've been experiencing the same problem. There is an off-by-one difference, where the warped series is always one sample short of the length of the series it is being warped to.

I can see that jmax should be the index of the last element, but I was wondering if the issue is more in the generation of the indices to be fed to the interpolation: ii = ifun(numpy.arange(jmax)) . arange will generate values up to but not including the given value, so the length will be one short of jmax. For example, if jmax = 9 (corresponding to the 10th position), then arange(jmax) will only return 9 numbers (0 through 8 inclusive), and so will miss the final index of jmax. Might it make more sense to do arange(jmax + 1) instead, which would give the full length sequence ending on the value of jmax?

I've tried making that change in my own local copy, and it doesn't seem to have broken anything. If nothing else, maybe an extra argument could be added to the function giving the user the option to choose?

@tonigi
Copy link
Collaborator

tonigi commented Feb 11, 2021

I think you rather want either the .index1 or .index2 properties in the result. jmax is just an index.
They will be of the same length and be correct for any step pattern chosen.

@davidmwatson
Copy link

Okay, here's an example adapted from the docs to show what I mean. First, modify the warp function in warp.py to allow incrementing jmax if needed.

def warp(d, index_reference=False, full_length=False):
    if not index_reference:
        iset = d.index1
        jset = d.index2
    else:
        iset = d.index2
        jset = d.index1

    jmax = numpy.max(jset)

    # Scipy interp1d is buggy. it does not deal with leading
    # duplicated values of x. It returns different values
    # depending on the dtypes of arguments.
    ifun = _interp(x=jset, y=iset)

    ## Change here ##
    n = jmax + 1 if full_length else jmax 
    ii = ifun(numpy.arange(n))
    ## End change ##

    # Quick fix for bug
    if numpy.isnan(ii[0]):
        ii[numpy.isnan(ii)] = iset[0]
    return ii.astype(int)

Then run the example from the docs

idx = np.linspace(0, 2 * np.pi, num=100)
query = np.sin(idx) + np.random.uniform(size=100)/10.0
reference = np.cos(idx)

print(len(reference))  # -> 100

alignment = dtw(query, reference)

# Existing behaviour
wq1 = warp(alignment)
warped_query1 = query[wq1]
print(len(wq1), len(warped_query1))  # -> 99

# Modified behaviour
wq2 = warp(alignment, full_length=True)
warped_query2 = query[wq2]
print(len(wq2), len(warped_query2))  # -> 100

The existing warp behaviour makes wq (and hence also warped_query) one short of the length of the reference. I was expecting that if I warp something to a given reference, the warped series should be the same length as that reference, not one short of it. Using jmax+1 produces the behaviour I was expecting, at least. The .index1/2 attributes are both the same length as each other, but not the same length as the reference, so wouldn't really do what I want either.

Using the R implementation, the same example produces the behaviour I was expecting (the warped query is the same length as the reference):

idx <- seq(0, 2 * pi, length.out=100)
query <- sin(idx) + runif(100)/10.0
reference <- cos(idx)

print(length(reference))  # -> 100

alignment <- dtw(query, reference)

wq <- warp(alignment)
warped_query <- query[wq]
print(c(length(wq), length(warped_query)))  # -> 100

So the python and R implementations are slightly inconsistent here. The difference arises because python and R use different indexing (numpy's arange goes up to but not including jmax, while R's colon operator goes up to and including jmax). If I understand correctly, using arange(jmax+1) in python should reproduce the behaviour of the R implementation.

@tonigi
Copy link
Collaborator

tonigi commented Feb 11, 2021

Thanks for the added details. I'll have a look. The R warp function should be correct.

@tonigi tonigi reopened this Feb 15, 2021
@tonigi
Copy link
Collaborator

tonigi commented Jun 30, 2021

Sorry for the delay. I still haven't had the time to carefully check the fix and think about the corner cases, but in the meantime, it sounds good for people stumbling upon the bug.

@steelec
Copy link

steelec commented Sep 16, 2022

I have also recently come across this issue and decided on the same fix for the code. Is there any confirmation that this change will then provide the same output as the R version?

@tonigi
Copy link
Collaborator

tonigi commented Sep 16, 2022

Python version's warp function relies on scipy interpolation, which is different from R's. This should be fixed in the future. So, even with the fix there still can be discrepancies.

@tonigi
Copy link
Collaborator

tonigi commented Mar 18, 2024

I incorporated the change in 9ad9bf9 and released 1.4.0 . Sorry for taking it so long.

@tonigi tonigi closed this as completed Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants