
resample: ValueError: x and y arrays must be equal in length along interpolation axis. #361

Closed · adelavega opened this issue Jan 28, 2019 · 27 comments · Fixed by #568

@adelavega (Collaborator)

When trying to resample a dense variable (physio) with a sampling rate of 0.49999984750004656, the length of the x array given to interp1d is one longer than values. My guess is that this has to do with a rounding error from the funky sampling_rate.

@adelavega added the bug label Jan 28, 2019
@adelavega (Collaborator, Author)

Actually, looks like the issue is len(self.index) != len(self.values) for that variable. Not sure why though.

@adelavega (Collaborator, Author)

This traces back to _build_entity_index.
When calculating the index, it uses the following formula:
reps = int(math.ceil(run.duration * sampling_rate))

In this dataset, with duration=876.00031328201294 and sampling_rate=500, that works out to
438000.15664100647, which math.ceil rounds up to 438001.

The question is whether such aggressive ceiling rounding is fair here. I can imagine why only 438000 samples are provided. Maybe regular np.round is more appropriate, or perhaps this function could be made sensitive to the actual number of samples?
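
For reference, a minimal standalone sketch of the off-by-one (just the arithmetic above; not pybids code):

import math

duration = 876.00031328201294   # run duration in seconds
sampling_rate = 500             # Hz

n_samples = duration * sampling_rate       # 438000.15664100647
print(int(math.ceil(n_samples)))           # 438001 -> index one sample longer than the data
print(int(round(n_samples)))               # 438000 -> matches the number of values provided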

@tyarkoni (Collaborator)

I think changing it to round is probably fine. But either way, what we should probably do is add a check (called after every update or transformation) that makes sure that all of the internal arrays have the same length (i.e., duration, onset, amplitude, and values) for sparse variables, and that the index has the same length as the values for dense ones.
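
A minimal sketch of what such a check might look like (the function and attribute names here are hypothetical, not the actual pybids API):

def check_lengths(var):
    """Raise if a variable's internal arrays are inconsistent in length."""
    if hasattr(var, 'index'):
        # Dense-like variable: the index must match the values.
        if len(var.index) != len(var.values):
            raise ValueError('Index length (%d) does not match values length (%d).'
                             % (len(var.index), len(var.values)))
    else:
        # Sparse-like variable: onset, duration, amplitude, and values must agree.
        lengths = {len(var.onset), len(var.duration), len(var.amplitude), len(var.values)}
        if len(lengths) > 1:
            raise ValueError('onset, duration, amplitude, and values differ in length.')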

@adelavega (Collaborator, Author) commented Jan 29, 2019

Agreed. But what if it's off by one? Should this check adjust it? I can see either number of samples being valid (although round seems like a saner default to me if we had to choose).

@tyarkoni (Collaborator)

Yes, the idea would be to trim or pad one or the other (or at the very least, raise a more comprehensible exception). Probably the values should take priority, with the index adjusted to match their dimensions.

@effigies (Collaborator)

Possibly related to #355, #358?

@adswa (Contributor) commented Jan 29, 2019

I believe I ran into this (and referenced it incoherently in the lower end of #355)

@adelavega (Collaborator, Author) commented Jan 29, 2019

@effigies try changing the rounding from math.ceil to round in your PR (in _build_entity_index, but I think it occurs in at least one other place) to see if that fixes your issue. For issues such as mine (where the actual number of volumes is undetermined and falls between whole numbers), we'd need to implement a function that reconciles the number of volumes.

@effigies (Collaborator)

f936aed

@adelavega (Collaborator, Author)

This was fixed by #365

@adelavega (Collaborator, Author)

This is coming up for me again, in the context of densifying a sparse variable after HRF convolution (in #411), for a dataset with a TR of 2.00000061.

Has anyone else seen this issue crop up again? I thought #365 added a check to align the values even if there was a rounding mismatch.

@adelavega reopened this Mar 6, 2019
@adelavega (Collaborator, Author)

Looks like it's because, for me, this issue is arising at __init__ (the index is generated incorrectly there given a run_info) rather than in resample.

Since this happens in conjunction with Convolve, it's possible that the inaccuracy is actually occurring upstream. Either way, it seems that there should be a check here.

@tyarkoni (Collaborator) commented Mar 6, 2019

I was looking at the resampling code earlier today and spotted some potential problems. I'm not sure if this issue is coming from the same place (probably not), but FWIW, this line is probably broken:

# This...
x_new = np.linspace(0, n - 1, num=num)
# Should be something like...
x_new = np.linspace(0, n, num=num, endpoint=False)
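
A toy illustration of the difference, outside of pybids (hypothetical n and num, just to show the two grids):

import numpy as np

n, num = 10, 5   # downsample 10 samples to 5 points

old_grid = np.linspace(0, n - 1, num=num)              # [0.   2.25 4.5  6.75 9.  ]
new_grid = np.linspace(0, n, num=num, endpoint=False)  # [0. 2. 4. 6. 8.]

# With endpoint=False the spacing is a constant n / num, matching how sample
# times are laid out for a regularly sampled signal; the original call instead
# stretches the grid so that it lands exactly on the last sample.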

That would however probably require tweaking some things elsewhere in the code (e.g., in _build_entity_index()). So probably the whole resampling logic needs a thorough re-evaluation. Given that we need to update it to handle sparse acquisition (see here), it might be worth just re-implementing from scratch—I wrote this fairly hastily, and didn't give much thought to various edge cases.

If we really want to do this right, we may also want to consider selecting resampling methods intelligently based on data and target sampling rate (e.g., in cases where we're downsampling by a relatively small factor, we should maybe use scipy.signal.decimate instead of interp1d).
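
To illustrate the motivation (a rough sketch with made-up signal parameters, not a proposed implementation): decimate applies an anti-aliasing filter before downsampling, whereas plain interpolation just reads values off the interpolant and lets high-frequency content alias.

import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import decimate

sr_old, sr_new = 100, 10                     # downsample by a factor of 10
t_old = np.arange(0, 10, 1 / sr_old)
signal = np.sin(2 * np.pi * 0.5 * t_old) + np.sin(2 * np.pi * 22 * t_old)

t_new = np.arange(0, 10, 1 / sr_new)
via_interp = interp1d(t_old, signal)(t_new)         # the 22 Hz component aliases into the output
via_decimate = decimate(signal, sr_old // sr_new)   # low-pass filtered first, then downsampled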

@adelavega (Collaborator, Author)

I think for me it was simply a rounding error, so I changed my code to match, but it's not clear to me which is "correct". Probably worth looking at, and worth abstracting that logic into a single shared function for consistency (if/when resampling is overhauled).

@adelavega (Collaborator, Author)

@tyarkoni This resampling issue just happened to come up for me. In a run with 1033 binary "speech" events, resampling to 10 Hz results in an average value of around 0.3. But below 5 Hz (and especially at TR), lots of these values seem to get lost/averaged out. In the end, at sampling_rate = TR, only a single non-zero value remained. I'll look into it in more detail tomorrow.

@tyarkoni (Collaborator) commented Mar 7, 2019

How long are the events, and how widely are they spaced? This doesn't necessarily sound problematic—possibly just an indication that the sampling rate is too low (even at 10 Hz).

Also, are you talking about the convolved values, or just the raw values run through just the Resample transformation? The 0.3 thing would be weird if the latter, but seems plausible if the former.

@adelavega (Collaborator, Author) commented Mar 7, 2019

It only happens on raw binary values with very short durations (0.01-0.5 s). Usually they come in chunks. I don't think I'd noticed before because I always convolve this regressor.

I suppose the code for to_dense will omit values shorter than the sampling interval, whereas I expected some interpolation to occur.

Edit: The issue is actually that if an event's duration * sampling_rate is less than 0.5, it gets rounded down to 0 and the event is omitted:

durations = np.round(self.duration * sampling_rate).astype(int)
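
A small numeric illustration of that line (toy durations):

import numpy as np

sampling_rate = 0.5                                # Hz, i.e. one sample every 2 s
duration = np.array([0.01, 0.1, 0.5, 1.0, 2.0])    # event durations in seconds

print(np.round(duration * sampling_rate).astype(int))   # [0 0 0 0 1]
# Everything but the 2 s event rounds to zero samples and silently disappears.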

@tyarkoni (Collaborator) commented Mar 7, 2019

Yeah, that makes sense. The resampling/densification code needs a complete overhaul. I have a plan for it, but need to find the time. Bunch of things to review in the next week or so, but probably after that.

@adelavega (Collaborator, Author)

No rush, convolution "fixes" this for me for now.

@adelavega (Collaborator, Author) commented Jan 3, 2020

Proof of concept of the seemingly faulty interpolation in resample (this has diverged a lot from the original issue, so maybe a new issue is best).

Given the following SparseRunVariable:

   onset  amplitude  duration
0      1          1         1
1      3          1         1
2      5          1         1
3      7          1         1
4      9          0         1
5     11          1         1

applying to_dense(1) gives you the values that you'd expect:

0   0
1   1
2   0
3   1
4   0
5   1
6   0
7   1
8   0
9   0
10  0
11  1

However, with a sampling_rate that is sparser than the events (e.g. 0.5), it does not seem to interpolate:

0  0
1  0
2  0
3  0
4  0
5  0

@satra (Contributor) commented Jan 3, 2020

@adelavega - have you tested this without using the antialiasing filter? I suspect that is what is causing this to return what you have.

Just from looking at the dense variable (which looks to be high frequency), the output looks reasonably correct.

@adelavega (Collaborator, Author)

Thanks, although I don't think that's the issue.

Looks like to_dense does not do any resampling at all. That happens in the resample function, which is only available for Dense variables.

However, when requesting get_design_matrix(sampling_rate=0.5), it uses to_dense to turn any Sparse variables into Dense ones, rather than resample.

Looks like we should be doing a to_dense (at a high frequency) and then downsampling using resample?

@satra (Contributor) commented Jan 3, 2020

Looks like we should be doing a to_dense (at a high frequency) and then downsampling using resample?

Which for this particular vector could still likely return zeros. Upsampling does not change frequency characteristics, downsampling does (via aliasing), hence there is an asymmetry between the two processes.

@adelavega (Collaborator, Author)

Trying with slightly different inputs, the output of resample looks more like what I'd expect:

In [127]: onsets = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]                                                                                                                                                       

In [128]: amplitude = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1] 

In [139]: test_var.to_dense(1).values                                                                                                                                                                        
Out[139]: 
    0
0   0
1   1
2   0
3   1
4   0
5   1
6   0
7   1
8   0
9   1
10  0
11  1
12  0
13  0
14  0
15  0
16  0
17  1
18  0
19  1
In [141]: test_var.to_dense(1).resample(0.5).values                                                                                                                                                          
Out[141]: 
          0
0 -0.000382
1  0.614899
2  0.450140
3  0.528530
4  0.475532
5  0.534520
6  0.070803
7  0.018681
8  0.361497
9  1.000018

@adelavega (Collaborator, Author)

I think you make a valid point @satra, thanks, but I think there's a more basic problem here with how we handle automatic densification in get_design_matrix. Arguably, the user should specify their own Resample transformation to avoid this, which is fair. But I think the default behavior is pretty confusing.

@effigies (Collaborator) commented Jan 6, 2020

To ensure a faithful representation of sparse variables with dense variables, I think the sampling rate needs to be 1/interval where interval = gcd(*onset_times, *durations). onset_times and durations should probably be in milliseconds, not seconds.

If we're going to allow a sampling rate with ToDense, then we can check whether it's larger than this minimum, and insert a resampling step if needed.
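
A quick sketch of that calculation (hypothetical onsets and durations, expressed in integer milliseconds):

from functools import reduce
from math import gcd

onsets_ms = [1000, 3000, 5000, 7000, 9000, 11000]
durations_ms = [1000] * 6

interval_ms = reduce(gcd, onsets_ms + durations_ms)    # 1000 ms
min_sampling_rate = 1000 / interval_ms                 # 1 Hz suffices for this example
print(interval_ms, min_sampling_rate)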

@adelavega (Collaborator, Author)

Tal and I just discussed this, and I think the take-home is that ToDense is dumb on purpose, as it's a cheap alternative to Resample. It's really meant for situations where you have a dense variable that happens to be stored as sparse, and you want to convert it while ensuring no resampling occurs.

However, the problem is that get_design_matrix, via BIDSVariable.resample, was actually calling to_dense when it should be calling resample. That is basically a bug, as it's inconsistent with resample in BIDSCollection.resample. I'll try fixing this, and hopefully that is enough to solve the issue; if not, we can add something like what you suggest.

It would be worthwhile to think about smarter ways to apply resampling that inspect the variable itself to ensure the right thing is done, such as what you suggested, @effigies, so I may open an issue on that, although it does not seem critical.
