Chunk session recordings #3566
I seem to have flaky tests. Once they pass, this PR will be ready for review. I'm especially looking for scrutiny of the shape of the chunk objects. Before merging any session recording chunking, and to keep the dragons I don't know about away, I think we also need @macobo's blessing on this approach.
Code looks good to me, though TestClickhouseSessionsList and TestClickhouseSessions seem to be exceptionally flaky for this PR… Maybe there's a condition under which merging doesn't occur properly, or it could be a false alarm.
I wanted to test playing a session recording, but unfortunately, when trying out this PR's Postgres review app, no session in the list has "Play recording" available. :(
I haven't dug deep into this yet; however, the missing piece here is compressing the JSON payload before chunking. With the data we're working with, gzip can shrink the payload by 10-100x.
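A minimal sketch of compressing before chunking, assuming a hypothetical chunk shape (`chunk_id`, `chunk_index`, `chunk_count`, `data`) that is not necessarily the one used in this PR. The snapshot list is serialized to JSON, gzip-compressed, base64-encoded so it can travel as a JSON string property, and only then split into fixed-size pieces. `CHUNK_SIZE` and `chunk_snapshot_data` are illustrative names, not existing PostHog code.

```python
import base64
import gzip
import json
import uuid
from typing import Any, Dict, List

# Hypothetical maximum length (in base64 characters) of one chunk's payload.
CHUNK_SIZE = 512 * 1024


def chunk_snapshot_data(snapshots: List[Dict[str, Any]], chunk_size: int = CHUNK_SIZE) -> List[Dict[str, Any]]:
    """Compress a list of snapshot events, then split it into chunk objects (hypothetical shape)."""
    raw = json.dumps(snapshots, separators=(",", ":")).encode("utf-8")
    # Compress the whole payload first: repetitive DOM snapshot JSON shrinks a lot under gzip.
    encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
    # Split the compressed, base64-encoded text into fixed-size pieces.
    pieces = [encoded[i : i + chunk_size] for i in range(0, len(encoded), chunk_size)]
    chunk_id = str(uuid.uuid4())  # shared by all pieces so they can be regrouped later
    return [
        {"chunk_id": chunk_id, "chunk_index": index, "chunk_count": len(pieces), "data": piece}
        for index, piece in enumerate(pieces)
    ]
```

Compressing the whole payload before splitting, rather than compressing each chunk on its own, lets gzip exploit redundancy across the entire recording; base64 is only there because the chunks are sent as JSON strings.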
```diff
     def run(self, team: Team, session_recording_id: str, *args, **kwargs) -> Dict[str, Any]:
         from posthog.api.person import PersonSerializer
 
-        distinct_id, start_time, snapshots = self.query_recording_snapshots(team, session_recording_id)
+        distinct_id, start_time, unmerged_snapshots = self.query_recording_snapshots(team, session_recording_id)
```
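The rename to `unmerged_snapshots` suggests that the query may return chunk pieces that still have to be merged before playback. A minimal sketch of such a merge step, assuming hypothetical chunk events carrying `chunk_id`, `chunk_index`, `chunk_count` and base64-encoded gzip `data` fields (not necessarily the shape in this PR); `merge_snapshot_chunks` is an illustrative name. One failure mode it guards against is a chunk set that is only partially present, for example because some pieces never arrived.

```python
import base64
import gzip
import json
from collections import defaultdict
from typing import Any, Dict, List


def merge_snapshot_chunks(unmerged_snapshots: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reassemble chunked snapshot events and pass through unchunked ones (hypothetical shape)."""
    merged: List[Dict[str, Any]] = []
    chunks_by_id: Dict[str, Dict[int, Dict[str, Any]]] = defaultdict(dict)

    for event in unmerged_snapshots:
        if "chunk_id" not in event:
            merged.append(event)  # already a plain snapshot event
            continue
        # Keyed by index, so duplicate deliveries of the same chunk collapse into one entry.
        chunks_by_id[event["chunk_id"]][event["chunk_index"]] = event

    for chunks in chunks_by_id.values():
        expected = next(iter(chunks.values()))["chunk_count"]
        if set(chunks) != set(range(expected)):
            continue  # incomplete set: skip it instead of failing to decode
        # Concatenate pieces in index order, regardless of the order they arrived in.
        encoded = "".join(chunks[i]["data"] for i in range(expected))
        merged.extend(json.loads(gzip.decompress(base64.b64decode(encoded))))

    return merged
```

This sketch appends merged events after the plain ones; a real implementation would likely also need to keep the combined events in timestamp order.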
Will SESSIONS_IN_RANGE_QUERY above continue to work with this setup? From a cursory glance at the code, it seems not.
There will always be sessions with incomplete data, e.g. a person leaves the page before posthog.js finishes sending the full payload event, or the larger payload gets blocked. We should not show sessions that have no full snapshot event, which the line below accomplishes:
```sql
COUNT(*) FILTER(where snapshot_data->>'type' = '2') as full_snapshots
```
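If chunks are merged in application code, the same rule may need to be reapplied after merging. A minimal sketch, assuming a hypothetical mapping from session ID to a list of snapshot event dicts with a `type` key; type 2 is rrweb's full snapshot event type, matching the `'2'` in the SQL line above. `sessions_with_full_snapshot` is an illustrative name.

```python
from typing import Any, Dict, List

FULL_SNAPSHOT_TYPE = 2  # rrweb EventType.FullSnapshot


def sessions_with_full_snapshot(sessions: Dict[str, List[Dict[str, Any]]]) -> List[str]:
    """Return IDs of sessions with at least one full snapshot, i.e. full_snapshots > 0."""
    return [
        session_id
        for session_id, snapshots in sessions.items()
        # Sessions without any full snapshot cannot be replayed, so they are dropped.
        if any(snapshot.get("type") == FULL_SNAPSHOT_TYPE for snapshot in snapshots)
    ]
```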
Superseded by another PR that is already live on Cloud and coming with the next release!