New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capture Endpoint Refactor #6230
Conversation
@@ -53,7 +57,6 @@ def _to_arguments(self, patch_process_event_with_plugins: Any) -> dict: | |||
"sent_at": sent_at, | |||
} | |||
|
|||
@patch("posthog.models.team.TEAM_CACHE", {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since I was updating a lot of tests, felt like this change could be bundled up here.
We got rid of TEAM_CACHE
a long time ago
Codecov Report
@@ Coverage Diff @@
## master #6230 +/- ##
=========================================
Coverage ? 91.86%
=========================================
Files ? 400
Lines ? 30294
Branches ? 2566
=========================================
Hits ? 27830
Misses ? 1935
Partials ? 529 Continue to review full report at Codecov.
|
"You need to set user distinct ID field `distinct_id`.", code="required", attr="distinct_id" | ||
), | ||
) | ||
# An invalid distinct ID will not return an error code, instead we will capture an exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see. Is this a change that might need coordination with SDKs, docs and so on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't need coordination with SDKs, docs might be a good addition.
While this sounds like a big change (our most important API behaves differently after this), it's not that bad when you consider that this was actually a bug.
Currently, if you send us 999 valid events, and the last event is an invalid one, we'll send you back a 4xx response. That might lead a client to retry sending these (think platforms like RudderStack or Segment), leading to a lot of duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, none of our SDKs rely on these particular errors. But indeed it should be made clear on the docs that if you send us a batch of events we will not tell you if some events in there are invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a start on docs: PostHog/posthog.com#2178
The whole platform team is tagged to review just so people have context. But tagged you @Twixes for quickly batting an eye just because you've worked a lot with ingestion/our API in general and would be important to have this change be well-reviewed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't review thoroughly yet
A lot of tests/improvements incoming |
You mean manually testing it? @guidoiaquinti |
@guidoiaquinti @tiina303 I made this much easier to review by splitting the DLQ code into another PR with this as its base (#6311). That should be merged before this. This PR now is very limited in behavior changes and is mostly a refactor with some tests. The key behavioral change is not sending errors to the client on an invalid event, and instead processing the rest of the batch. That's like a few lines. The rest is moving stuff around and adding more tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work & thanks for breaking it up & making the PRs smaller. Much easier to review this way =)
all addressed, thanks a lot! Will check that tests pass and resolve the conflicts with #6311 so we can merge that here and get this in early next week (when we can keep a close eye on it) |
* Add dead letter queue to capture endpoint This reverts commit 34e7441. * try except kafka produce * format * statsd + log for dlq produce error * fix types * fix * fix * legit fix * ignore type
Ok so we're back to one PR now. @tiina303 maybe give this a final look focusing only on any potential serious bugs/holes. Anything stylistic we can fix elsewhere - the scope is already much larger than what we started with. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, added two nits, but feel free to ignore or consider doing in a follow-up.
if not is_clickhouse_enabled(): | ||
error_response = cors_response( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd drop this condition (since it's not this function's job to decide if it's used or not, later we might do something different for single capture vs batch for example) & in the usage part check
if not team and not send_to_dead_letter_queue:
return error_response
In the future I'd recommend doing it the other way - land this one & then rebase the other pr on top of master & then land it ... smaller changes ... easier to revert. |
Important
This is currently not up for review, but when it is, it should be reviewed with care as it's a refactor of the entry point for our ingestion pipeline!
Changes
Refactoring the capture endpoint as outlined in #6227.
All in all, this:
It's recommended to merge #6193 first, although it doesn't really matter.
There are certainly other things that could be done, but this PR is already large enough for ingestion code, so would like to keep the scope at this for now.
How did you test this code?
Updated Postgres tests, added a new Postgres test, added EE tests.