Skip to content

Conversation

@mchestnut91
Copy link
Contributor

No description provided.

@mchestnut91
Copy link
Contributor Author

mchestnut91 commented Dec 3, 2020

The DS team brought it to my attention that StreamSet.earliest() and StreamSet.latest() queries were taking ~30 minutes to get all of the points for a streamset containing 183 streams.

I took a look and the issue was that StreamSet.versions() was issuing a call to StreamSet._latest_versions() each time, and never called StreamSet.pin_versions() so the versions could actually get pinned for later calls.

I wasn't exactly sure what to return in StreamSet.pin_versions(), I decided to go with self._pinned_versions() rather than self, but I am open to other suggestions for how to handle this.

btrdb/stream.py Outdated

self._pinned_versions = self._latest_versions() if not versions else versions
return self
return self._pinned_versions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the concept of method chaining. Streamset methods like this should always return self.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to adjust my brain to think about the problem you are running into... will return with comments but I dont think this is the correct fix in this case.

btrdb/stream.py Outdated
"""
return self._pinned_versions if self._pinned_versions else self._latest_versions()
return self._pinned_versions if self._pinned_versions else self.pin_versions()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to above.

@mchestnut91
Copy link
Contributor Author

@looselycoupled I took another look at this after our conversation last week, and I believe that the approach that you mentioned during our call is the best fix - in the StreamSet.earliest() and StreamSet.latest() functions, we should be calling self.versions() once. This will return a dict of {UUID: version}, which we can use to look up the correct version for each stream by UUID.

Let me know if you think I'm missing something. I tested it out and the earliest/latest calls are down to ~30 seconds for a StreamSet with 183 streams

Copy link
Member

@looselycoupled looselycoupled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@looselycoupled looselycoupled merged commit 9c22285 into master Dec 16, 2020
@looselycoupled looselycoupled deleted the ch9628-streamset-earliest/latest-fix branch December 16, 2020 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants