Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(insights): Speed up loading properties #11037

Merged
merged 13 commits into from
Aug 17, 2022

Conversation

timgl
Copy link
Collaborator

@timgl timgl commented Jul 28, 2022

Problem

Loading properties is painful, takes about 7 seconds on our team. Reason is that we didn't set an offset/limit on the query, so it was always going through 100Ks of properties for each query.

In general the way we do properties needs work. We now spend ~5 minutes loading all properties for our team into browser memory, which you really start to notice. We should probably send property definitions along with properties wherever applicable instead

Changes

set limit/offset
also bump up the breakpoint for loading properties (and other stuff), as you very often end up loading lots in parallel.

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

How did you test this code?

@timgl timgl requested a review from EDsCODE July 28, 2022 20:28
@@ -115,6 +115,9 @@ def get_queryset(self):
**search_kwargs,
}

limit = self.paginator.get_limit(self.request)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these params being passed already?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

@@ -130,6 +133,7 @@ def get_queryset(self):
WHERE team_id = %(team_id)s AND name NOT IN %(excluded_properties)s
{name_filter} {numerical_filter} {search_query} {event_property_filter}
ORDER BY is_event_property DESC, query_usage_30_day DESC NULLS LAST, name ASC
LIMIT {limit} OFFSET {offset}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flyby....

I think by setting limit and offset manually here we only ever get one page.

DRF LimitOffsetPagination uses the queryset to count the results here https://github.com/encode/django-rest-framework/blob/master/rest_framework/pagination.py#L387

And then uses that count to decide whether there are more pages.

It slices the queryset before loading the results.

So if we add limit and offset to the queryset the count is always equal to one page.

The TTFB on cloud for this endpoint is wild though even so

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@@ -4,7 +4,7 @@ describe('a11y', () => {
it('home should have no accessibility violations', () => {
cy.get('[data-attr="menu-item-projecthomepage"]').click()
cy.injectAxe()
reportA11y({ includedImpacts: ['critical'] }, 'home-page-critical', false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This silences an accessibility failure that I thought was resolved.

I'll follow this up in separate work.

The change to false here still runs the accessibility test but doesn't fail the test run when violations are found

@pauldambra
Copy link
Member

@neilkakkar I think this covers the conversation here https://posthog.slack.com/archives/C0368RPHLQH/p1660727404221419

I'd be interested in releasing this so we can measure it without trying to cache the count

Copy link
Collaborator

@neilkakkar neilkakkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused over the Pagination class, but the querying code looks right to me!

Good call on not caching right now

"type": "string",
"nullable": True,
"format": "uri",
"example": "http://api.example.org/accounts/?{offset_param}=400&{limit_param}=100".format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, same below!

Suggested change
"example": "http://api.example.org/accounts/?{offset_param}=400&{limit_param}=100".format(
"example": f"http://api.example.org/accounts/?{self.offset_query_param}=400&{self.limit_query_param}=100"

},
}

def get_next_link(self) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll probably need to override get_previous_link as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not quite sure what this is adding over the default implementation? - as long as count is set, the default seems to be fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! I'm not removing count any more so can override way less of this.

The default implementation asks the queryset for the count and slices the results. But that's being done externally.

Copy link
Collaborator

@neilkakkar neilkakkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner, nice work!

@pauldambra pauldambra merged commit 5780a28 into master Aug 17, 2022
@pauldambra pauldambra deleted the speed-up-loading-properties branch August 17, 2022 15:19
@pauldambra pauldambra added the highlight ⭐ Release highlight label Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
highlight ⭐ Release highlight stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants