-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast track strategy for searching single recent beliefs #179
base: main
Are you sure you want to change the base?
Conversation
…strategy for fast access to single recent beliefs Signed-off-by: Nicolas Höning <nicolas@seita.nl>
Signed-off-by: Nicolas Höning <nicolas@seita.nl>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good feature!
@@ -487,6 +504,17 @@ def apply_belief_timing_filters(q): | |||
sources: list = [source] if not isinstance(source, list) else source | |||
q = q.join(source_class).filter(cls.source_id.in_([s.id for s in sources])) | |||
|
|||
if most_recent_belief_only: | |||
# most recent belief only makes sense on one defined event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, the most recent belief is not necessarily about the most recent event.
I do like the simplifying assumption, so let's just provide clarity through documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it doesn't seem immediately clear to me why we couldn't cover this use case with a combination of most_recent_beliefs_only=True
and most_recent_event_only=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still get multiple events if you don't limit to one source.
But foremost, it's really quite slow to use a subquery or even two subqueries. (also with our recent improvement of how the subqueries work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I was not suggesting to just use the two old arguments, but to use only 1 of the new arguments, which already limits the query to a single event (and actually, also already limits it to a single belief). It's not obvious to me why the other new argument is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't see you used a new one here.
Two reasons come to mind:
- old argument uses a subquery which is slow
- I added the
most_recent_belief_only
argument also for consistency in arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old argument uses a subquery which is slow
We are free to internally disconnect that logic and switch over to the fast track in case of a specific combination of filters, if it results in identical search results.
I added the
most_recent_belief_only
argument also for consistency in arguments.
That would make sense, but it doesn't really do what its name suggests. My thinking about this is still evolving. Since my previous comment I realized that picking the most recent event but not the most recent belief about that event doesn't seem like a practical use case.
Current situation:
- The
most_recent_belief_only
filter takes the most recent event and returns the most recent belief about that event that it finds. -> A practical use case. - The
most_recent_event_only
filter takes the most recent event and returns the first belief about that event that it finds. -> Not a practical use case imo.
My preferred situation involves switching to introducing only one new filter, and renaming it:
- The
most_recent_only
filter takes the most recent event and returns the most recent belief about that event that it finds.
Here I'm disregarding any thoughts on n_most_recent_events
and probabilistic beliefs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically your current thoughts are to
- only add
most_recent_only
filter - internally switch over to its logic when the user selects one source together with
most_recent_events_only=True
- document the above point, but also the current state of non-deterministic beliefs being poorly supported here.
Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, with one correction:
- internally switch over to its logic when the user selects one source together with
most_recent_events_only=True
andmost_recent_beliefs_only=True
.
In conjunction, these filters should yield a single row (but only for a sensor holding only deterministic beliefs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a simple & fast way to tell if a sensor has only deterministic beliefs? Otherwise it's not a good idea to do this, maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way would be to count the unique cumulative_probability
values. But since that would also be new functionality, I agree it is not a good idea to do this within this PR.
# from sqlalchemy.dialects import postgresql | ||
# print(q.compile(dialect=postgresql.dialect())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to keep these lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am considering it - they can come in quite handy whenever we work in this code.
timely_beliefs/beliefs/classes.py
Outdated
# apply fast-track most recent event|belief only approach | ||
if most_recent_event_only: | ||
if most_recent_belief_only: | ||
q = q.order_by(cls.event_start.desc(), cls.belief_horizon.desc()).limit( | ||
1 | ||
) | ||
else: | ||
q = q.order_by(cls.event_start.desc()).limit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document that this is only meant to retrieve a deterministic belief. A probabilistic belief would have multiple rows (sharing the same event start and belief horizon).
@@ -311,6 +316,8 @@ def search_session( # noqa: C901 | |||
source: BeliefSource | list[BeliefSource] | None = None, | |||
most_recent_beliefs_only: bool = False, | |||
most_recent_events_only: bool = False, | |||
most_recent_belief_only: bool = False, | |||
most_recent_event_only: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether we should (also) use this fast track in case of a most_recent_events_only=True
and a source
filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean because that combination also leads to only one returned event?
:param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start) | ||
:param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start) for each source | ||
:param most_recent_belief_only: only return the most recent belief of the most recent event which fits all filter criteria (will also apply most_recent_event_only) | ||
:param most_recent_event_only: only return the most recent event which fits all filter criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which -> that
@@ -336,7 +351,9 @@ def search_session( # noqa: C901 | |||
:param horizons_at_most: only return beliefs with a belief horizon equal or less than this timedelta (for example, use timedelta(0) to get post knowledge time beliefs) | |||
:param source: only return beliefs formed by the given source or list of sources | |||
:param most_recent_beliefs_only: only return the most recent beliefs for each event from each source (minimum belief horizon) | |||
:param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start) | |||
:param most_recent_events_only: only return (post knowledge time) beliefs for the most recent event (maximum event start) for each source | |||
:param most_recent_belief_only: only return the most recent belief of the most recent event which fits all filter criteria (will also apply most_recent_event_only) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which -> that
Co-authored-by: Felix Claessen <30658763+Flix6x@users.noreply.github.com>
This PR adds a "ORDER_BY and LIMIT 1" strategy to
search_session()
, which drastically shortens the execution time for queries which really only want one (most recent) belief, e.g. to check connectivity status or show the latest measured value.In this PR; we add two search filters
most_recent_event_only
andmost_revent_belief_only
(notice the missing "s").Their addition retains backwards compatibility, so packages using timely beliefs should have an easy time starting to use them.
Also, this PR adds an index to make sure the search time is constant, no matter how far in the past the most recent event might be. The index only adds fields used in the ORDER BY which is the recommended practice.
The psql session below demonstrates its effect in such a case (most recent event quite a while in the past - for sensors where the most recent even is really recent, execution time is fast anyway). I repeated each search twice to make sure we see a consistent behavior.
I tested the case with
most_recent_belief_only
- the index doesn't apply for it, but the performance was decent already.We also need to look into the space we are using with indices.