
Fix time range finding issues #231

Merged: 7 commits merged into master from fix_time_range on Feb 11, 2020

Conversation

JelleAalbers (Member)

This is a fix for #230, i.e. the treatment of the time_range / seconds_range / time_within options.

Background info:

  • Strax supports (a) data without time information, such as peaklet_classification or corrected_areas, and (b) data that is never stored, such as peaks, or the temporary datatypes that are made to merge e.g. peaks and peak_basics if you load them together.
  • Strax even allows time selection on a no-time datatype, as long as you load it alongside data of the same kind that does have time info (like peaks). It will then use the latter to map the time range to row numbers, which can be selected in the no-time datatypes. (A usage sketch follows this list.)
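
A usage sketch of these selections (not from the PR itself; the context constructor, run id, and datatype names are illustrative straxen/placeholder choices, while the get_df / get_array selection keywords are the ones this PR deals with):

```python
import straxen  # assumption: the straxen plugin suite; any strax context works

st = straxen.contexts.demo()   # placeholder context; substitute your own strax.Context
run_id = '000000'              # placeholder run id

# Time selection on a time-bearing datatype:
df = st.get_df(run_id, 'peak_basics', seconds_range=(0, 10))

# A time selection involving a no-time datatype (peaklet_classification) works
# as long as a time-bearing datatype of the same kind (peaklets) is loaded
# alongside it; strax uses the latter to map the time range to row numbers:
arr = st.get_array(run_id, ('peaklets', 'peaklet_classification'),
                   seconds_range=(0, 10))
```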

There were two bugs:

  • Strax was too coarse when converting time ranges to row numbers (a sketch of this follows the list). We picked whichever time-bearing datatype was listed first, found out which chunks contained the start and stop of the range, then loaded the row numbers corresponding to those entire chunks. If this data type happened to be very slim (few columns), a chunk would correspond to way more time than you selected (since we try to save in fixed-size files), possibly the entire run. If the simultaneously loaded data types were not slim, your RAM might explode.
  • The time range argument, and the associated conversion to row ranges, could be spuriously skipped. When loading a never-stored datatype such as peaks together with a time-bearing datatype of the same kind, such as peak_basics, peaks still has to be reconstructed from peaklets and merged_s2s. However, the time range was not passed on to the latter, so strax attempted to load the full data of the run, starting with the first chunk, which would usually cause a merge failure. (But not always: due to the very coarse chunk mapping, you were fine if the run only had one chunk, or perhaps if you were just loading the first chunk.)
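
To make the first bug concrete, here is a minimal sketch of the old, chunk-granularity mapping (not the actual strax code; the chunk metadata arrays are invented for illustration):

```python
import numpy as np

def coarse_time_range_to_row_range(chunk_start_times, chunk_n_rows, time_range):
    """Sketch of the old behaviour: map a time range to row numbers by
    returning every row of the *whole* chunks containing the range boundaries.

    chunk_start_times: sorted chunk start times (ns), one per chunk
    chunk_n_rows: number of rows stored in each chunk
    time_range: (start, stop) in ns
    """
    start, stop = time_range
    # Indices of the chunks containing the start and stop of the range
    first_chunk = max(np.searchsorted(chunk_start_times, start, side='right') - 1, 0)
    last_chunk = max(np.searchsorted(chunk_start_times, stop, side='right') - 1, first_chunk)

    # Global row number at which each chunk begins
    row_offsets = np.concatenate([[0], np.cumsum(chunk_n_rows)])
    # Every row of the boundary chunks is returned, even if those chunks
    # span far more time than was requested:
    return row_offsets[first_chunk], row_offsets[last_chunk + 1]
```

If the time-bearing datatype is slim, each of its chunks can cover a large fraction of the run, so the returned row range (and everything loaded along with it) is far wider than the requested time range.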

Now:

  • Strax now actually loads data to find the exact row numbers that correspond to the time range (only for the chunks containing the start and stop time, so possibly just one chunk); see the sketch after this list. This does mean that some data is loaded twice when doing time_range selections.
  • If a time_range is specified, it will apply to all data types that are loaded. There are likely still cases where the current logic fails, e.g. when you make some deep chain of no-time-bearing and never-stored plugins. In those cases you will now get a neater error: it will say which data type strax tried in vain to find time information for (rather than a weird merge failure or blowing up your memory).
  • _time_range_to_n_range has been renamed to time_range_to_row_range for clarity.
  • An additional option _chunk_number has been added to get_array, get_df, etc., to load just a single chunk. This is used internally in time_range_to_row_range.
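
A minimal sketch of the new lookup (again not the real time_range_to_row_range; the chunk indices and row offsets are assumed to come from chunk metadata as in the previous sketch). Only the chunks containing the range boundaries are loaded, via the new _chunk_number option, and the exact rows are found by searching the time field:

```python
import numpy as np

def exact_time_range_to_row_range(st, run_id, target, time_range,
                                  first_chunk, last_chunk, row_offsets):
    """Sketch of the new behaviour: load only the boundary chunks and look up
    the exact row numbers of the requested time range inside them."""
    start, stop = time_range

    # _chunk_number (added in this PR) loads a single chunk of `target`
    first = st.get_array(run_id, target, _chunk_number=first_chunk)
    last = (first if last_chunk == first_chunk
            else st.get_array(run_id, target, _chunk_number=last_chunk))

    # Exact offsets of the requested times within the boundary chunks
    start_row = row_offsets[first_chunk] + np.searchsorted(first['time'], start)
    end_row = row_offsets[last_chunk] + np.searchsorted(last['time'], stop)
    return start_row, end_row
```

This is also why some data is loaded twice with time_range selections: the boundary chunks are read once for the row lookup and again when the selected data itself is loaded.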

Taking a step back, this is all additional complexity that comes from supporting data types without time information (see also #79). If I had known how much trouble this would bring, I would never have supported it -- it saves a bit of space, but not that much. However, by now it is probably too disruptive a downstream change to stop supporting this.

@JelleAalbers (Member, Author)

So now codefactor thinks that using an f-string in an error message is a possible SQL injection vulnerability... right.

@JelleAalbers merged commit c68891a into master on Feb 11, 2020
@JelleAalbers deleted the fix_time_range branch on February 11, 2020 18:06