Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture Endpoint Refactor #6230

Merged
merged 25 commits into from Oct 11, 2021
Merged

Capture Endpoint Refactor #6230

merged 25 commits into from Oct 11, 2021

Conversation

yakkomajuri
Copy link
Collaborator

@yakkomajuri yakkomajuri commented Oct 4, 2021

Important

This is currently not up for review, but when it is, it should be reviewed with care as it's a refactor of the entry point for our ingestion pipeline!

Changes

Refactoring the capture endpoint as outlined in #6227.

All in all, this:

  • Ensures we process other events if one event in a batch is invalid
  • Sends events to a dead letter queue if there's a DB error e.g. Postgres is down (+ metrics)
  • Adds tests for the capture endpoint on EE deployments (they didn't exist)

It's recommended to merge #6193 first, although it doesn't really matter.

There are certainly other things that could be done, but this PR is already large enough for ingestion code, so would like to keep the scope at this for now.

How did you test this code?

Updated Postgres tests, added a new Postgres test, added EE tests.

@@ -53,7 +57,6 @@ def _to_arguments(self, patch_process_event_with_plugins: Any) -> dict:
"sent_at": sent_at,
}

@patch("posthog.models.team.TEAM_CACHE", {})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since I was updating a lot of tests, felt like this change could be bundled up here.

We got rid of TEAM_CACHE a long time ago

@timgl timgl temporarily deployed to posthog-pr-6230 October 4, 2021 15:22 Inactive
@codecov-commenter
Copy link

codecov-commenter commented Oct 4, 2021

Codecov Report

❗ No coverage uploaded for pull request base (master@823b3e7). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #6230   +/-   ##
=========================================
  Coverage          ?   91.86%           
=========================================
  Files             ?      400           
  Lines             ?    30294           
  Branches          ?     2566           
=========================================
  Hits              ?    27830           
  Misses            ?     1935           
  Partials          ?      529           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 823b3e7...34e7441. Read the comment docs.

@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 4, 2021 17:09 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 5, 2021 10:26 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 5, 2021 10:30 Inactive
posthog/api/capture.py Outdated Show resolved Hide resolved
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 5, 2021 10:39 Inactive
@yakkomajuri yakkomajuri changed the title WIP Capture Endpoint Refactor + Dead Letter Queue Capture Endpoint Refactor + Dead Letter Queue Oct 5, 2021
"You need to set user distinct ID field `distinct_id`.", code="required", attr="distinct_id"
),
)
# An invalid distinct ID will not return an error code, instead we will capture an exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see. Is this a change that might need coordination with SDKs, docs and so on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't need coordination with SDKs, docs might be a good addition.

While this sounds like a big change (our most important API behaves differently after this), it's not that bad when you consider that this was actually a bug.

Currently, if you send us 999 valid events, and the last event is an invalid one, we'll send you back a 4xx response. That might lead a client to retry sending these (think platforms like RudderStack or Segment), leading to a lot of duplicates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, none of our SDKs rely on these particular errors. But indeed it should be made clear on the docs that if you send us a batch of events we will not tell you if some events in there are invalid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a start on docs: PostHog/posthog.com#2178

@yakkomajuri
Copy link
Collaborator Author

The whole platform team is tagged to review just so people have context. But tagged you @Twixes for quickly batting an eye just because you've worked a lot with ingestion/our API in general and would be important to have this change be well-reviewed

Copy link
Contributor

@tiina303 tiina303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't review thoroughly yet

posthog/api/capture.py Show resolved Hide resolved
posthog/api/capture.py Outdated Show resolved Hide resolved
posthog/api/capture.py Outdated Show resolved Hide resolved
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 7, 2021 11:27 Inactive
@yakkomajuri
Copy link
Collaborator Author

A lot of tests/improvements incoming

@yakkomajuri yakkomajuri requested a deployment to posthog-pr-6230 October 7, 2021 14:08 Abandoned
@yakkomajuri
Copy link
Collaborator Author

You mean manually testing it? @guidoiaquinti

@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 7, 2021 14:37 Inactive
@yakkomajuri yakkomajuri changed the title Capture Endpoint Refactor + Dead Letter Queue Capture Endpoint Refactor Oct 7, 2021
@yakkomajuri
Copy link
Collaborator Author

@guidoiaquinti @tiina303 I made this much easier to review by splitting the DLQ code into another PR with this as its base (#6311). That should be merged before this.

This PR now is very limited in behavior changes and is mostly a refactor with some tests.

The key behavioral change is not sending errors to the client on an invalid event, and instead processing the rest of the batch. That's like a few lines. The rest is moving stuff around and adding more tests.

Copy link
Contributor

@tiina303 tiina303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work & thanks for breaking it up & making the PRs smaller. Much easier to review this way =)

posthog/api/utils.py Outdated Show resolved Hide resolved
posthog/api/utils.py Outdated Show resolved Hide resolved
posthog/api/utils.py Outdated Show resolved Hide resolved
posthog/api/utils.py Outdated Show resolved Hide resolved
posthog/api/decide.py Outdated Show resolved Hide resolved
posthog/api/utils.py Outdated Show resolved Hide resolved
posthog/api/utils.py Outdated Show resolved Hide resolved
posthog/api/capture.py Outdated Show resolved Hide resolved
posthog/api/test/test_capture.py Show resolved Hide resolved
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 8, 2021 10:45 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 8, 2021 10:54 Inactive
@yakkomajuri
Copy link
Collaborator Author

all addressed, thanks a lot!

Will check that tests pass and resolve the conflicts with #6311 so we can merge that here and get this in early next week (when we can keep a close eye on it)

* Add dead letter queue to capture endpoint

This reverts commit 34e7441.

* try except kafka produce

* format

* statsd + log for dlq produce error

* fix types

* fix

* fix

* legit fix

* ignore type
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 8, 2021 15:48 Inactive
@yakkomajuri
Copy link
Collaborator Author

Ok so we're back to one PR now. @tiina303 maybe give this a final look focusing only on any potential serious bugs/holes. Anything stylistic we can fix elsewhere - the scope is already much larger than what we started with.

Copy link
Contributor

@tiina303 tiina303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added two nits, but feel free to ignore or consider doing in a follow-up.

posthog/api/capture.py Show resolved Hide resolved
Comment on lines +154 to +155
if not is_clickhouse_enabled():
error_response = cors_response(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd drop this condition (since it's not this function's job to decide if it's used or not, later we might do something different for single capture vs batch for example) & in the usage part check

if not team and not send_to_dead_letter_queue:
  return error_response

@tiina303
Copy link
Contributor

tiina303 commented Oct 8, 2021

In the future I'd recommend doing it the other way - land this one & then rebase the other pr on top of master & then land it ... smaller changes ... easier to revert.

@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 11, 2021 09:10 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 11, 2021 10:02 Inactive
@yakkomajuri yakkomajuri temporarily deployed to posthog-pr-6230 October 11, 2021 10:06 Inactive
@yakkomajuri yakkomajuri merged commit 3ec5cc0 into master Oct 11, 2021
@yakkomajuri yakkomajuri deleted the capture-refactor-plus-dlq branch October 11, 2021 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants