Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunk session recordings #3566

Closed
wants to merge 13 commits into from
Closed

Chunk session recordings #3566

wants to merge 13 commits into from

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Mar 3, 2021

Changes

  • Chunks session recordings into blocks of 512KB when ingesting
  • Combines the blocks when reading back from the database

Checklist

  • All querysets/queries filter by Organization, by Team, and by User
  • Django backend tests
  • Jest frontend tests
  • Cypress end-to-end tests

@timgl timgl temporarily deployed to posthog-chunk-session-r-q5romk March 3, 2021 11:48 Inactive
@mariusandra mariusandra temporarily deployed to posthog-chunk-session-r-q5romk March 3, 2021 17:04 Inactive
@mariusandra mariusandra temporarily deployed to posthog-chunk-session-r-q5romk March 3, 2021 22:23 Inactive
@mariusandra mariusandra temporarily deployed to posthog-chunk-session-r-q5romk March 3, 2021 23:57 Inactive
@mariusandra
Copy link
Collaborator Author

I seem to have gotten flaky tests. If they'd pass, this PR would be ready to be reviewed.

I'm especially looking for scrutiny of the shape of the chunk objects.

Before merging any session recording chunking, and to keep the unknown-to-me dragons away, I think we also need @macobo 's blessing on this approach.

@mariusandra mariusandra marked this pull request as ready for review March 4, 2021 00:19
@Twixes Twixes temporarily deployed to posthog-chunk-session-r-q5romk March 4, 2021 10:48 Inactive
@Twixes Twixes temporarily deployed to posthog-chunk-session-r-q5romk March 4, 2021 10:57 Inactive
Copy link
Collaborator

@Twixes Twixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me, though TestClickhouseSessionsList and TestClickhouseSessions seem to be exceptionally flaky for this PR… Maybe there's a condition when merging doesn't occur properly, or it can be a false alarm.
Wanted to test playing session recording, but unfortunately when trying out this PR's Postgres review app, no session in the list has "Play recording" available. :(

@mariusandra
Copy link
Collaborator Author

To be fair, that test seems flaky everywhere now :/

Also in This PR for example:

image

image

Also in this PR: #3579

@macobo
Copy link
Contributor

macobo commented Mar 8, 2021

I did not yet dig deep in here, however the missing piece here is compressing the json payload before chunking. With the data we're working with gzip can make a difference of 10-100x of the data.

def run(self, team: Team, session_recording_id: str, *args, **kwargs) -> Dict[str, Any]:
from posthog.api.person import PersonSerializer

distinct_id, start_time, snapshots = self.query_recording_snapshots(team, session_recording_id)
distinct_id, start_time, unmerged_snapshots = self.query_recording_snapshots(team, session_recording_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will SESSIONS_IN_RANGE_QUERY above still continue working with this setup? From a cursory glance at the code it seems not.

Basically there will always be sessions with incomplete data - e.g. person leaves the page before posthog.js finishes sending the full payload event or the larger payload gets blocked. We should not show sessions where there's no full snapshot event which the below line accomplishes:

COUNT(*) FILTER(where snapshot_data->>'type' = '2') as full_snapshots

@macobo macobo closed this Mar 19, 2021
@macobo
Copy link
Contributor

macobo commented Mar 25, 2021

Superceded by another PR that is already live on cloud and coming up with the enxt release!

@Twixes Twixes deleted the chunk-session-recordings branch April 30, 2021 00:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants