
feat(oauth): log scope-ceiling rejections at /authorize #61216

Open
MattBro wants to merge 3 commits into master from matt/oauth-ceiling-reject-log

Conversation

@MattBro MattBro commented Jun 2, 2026

Problem

The per-app OAuth scope ceiling (#60477) enforces, at /authorize, that a client may only be granted scopes within its OAuthApplication.scopes set. When a request asks for a scope outside that ceiling, OAuthValidator.validate_scopes returns False, and oauthlib turns that into a 302 redirect carrying error=invalid_scope.
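The reject mechanics described above can be sketched roughly as follows. This is not the PostHog or oauthlib implementation: `out_of_ceiling` and `invalid_scope_redirect` are hypothetical helper names, the ceiling is treated as a plain set of exact scope strings (the real check may also handle wildcard or privileged scopes), and the redirect follows the general OAuth 2.0 convention of returning `error` (and `state`, if present) as query parameters on the client's redirect URI.

```python
from typing import List, Optional
from urllib.parse import urlencode, urlsplit


def out_of_ceiling(requested: List[str], ceiling: List[str]) -> List[str]:
    """Return the requested scopes that fall outside the ceiling, in request order."""
    allowed = set(ceiling)
    return [scope for scope in requested if scope not in allowed]


def invalid_scope_redirect(redirect_uri: str, state: Optional[str] = None) -> str:
    """Build the redirect target a client would see on an invalid_scope rejection."""
    params = {"error": "invalid_scope"}
    if state is not None:
        params["state"] = state
    # Append to an existing query string if the redirect URI already has one.
    separator = "&" if urlsplit(redirect_uri).query else "?"
    return f"{redirect_uri}{separator}{urlencode(params)}"
```

Because the client only ever sees a redirect with `error=invalid_scope`, the server has to record the rejected scopes itself if anyone is to learn which ones were refused.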

That reject path is silent: no logger call, no capture_exception, and a 302 is not a 4xx/5xx, so error-rate and APM dashboards miss it too. The failure mode this hides is a first-party app whose scopes ceiling is empty or only partially seeded. Its OAuth clients (for example the setup wizard, which requests llm_gateway:read among others) begin failing /authorize with invalid_scope, users can't complete login, and nothing server-side records which client requested which scope.

This is the observability gap behind a real regression: after the ceiling enforcement shipped, a first-party client's logins broke with invalid_scope and there was no server-side trace of the rejection. The only signals were the client's own exception telemetry (fragmented across install paths, unalerted) and users noticing.

Changes

  • posthog/api/oauth/views.py: in validate_scopes, emit a structured logger.warning("oauth_scope_ceiling_rejected", ...) on the reject branch, carrying client_id, is_first_party, the requested out-of-ceiling scopes, and the app's effective ceiling. The boolean resolution is byte-for-byte unchanged; this only adds a log side-effect on the existing False path.
  • posthog/api/oauth/test_views.py: asserts the event fires once with the expected fields when a scope is rejected.
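A minimal sketch of the shape of this change, assuming a simplified validator: `validate_scopes_sketch` is a hypothetical stand-in rather than the actual `OAuthValidator.validate_scopes`, the ceiling is modelled as exact-match strings, and the logger is any object exposing a structlog-style `warning(event, **fields)` method. The event name and field names are taken from the description and test in this PR.

```python
from typing import Any, List
from unittest.mock import MagicMock


def validate_scopes_sketch(
    client_id: str,
    is_first_party: bool,
    requested: List[str],
    ceiling: List[str],
    logger: Any,
) -> bool:
    """Return True if every requested scope is within the ceiling, else log and return False."""
    allowed = set(ceiling)
    rejected = [scope for scope in requested if scope not in allowed]
    if not rejected:
        return True
    # The log is a side effect only: the boolean result is the same as before.
    logger.warning(
        "oauth_scope_ceiling_rejected",
        client_id=client_id,
        is_first_party=is_first_party,
        requested=rejected,
        ceiling=sorted(allowed),
    )
    return False


# Usage with a mock logger standing in for the module-level logger.
demo_logger = MagicMock()
demo_result = validate_scopes_sketch(
    "test_confidential_client_id", True, ["experiment:write"], ["experiment:read"], demo_logger
)
```

Keeping the log call on the existing `False` branch means the accept path is untouched and the rejection still surfaces to the client as the same 302.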

is_first_party is included so a downstream alert can page only on first-party rejections, which have a near-zero baseline and almost always indicate a misconfiguration (an unseeded or partially seeded ceiling). Third-party clients over-asking is expected and is logged with is_first_party=false.

Follow-ups

  • A log alert on oauth_scope_ceiling_rejected filtered to is_first_party=true, created once this deploys (the event has to exist before there's anything to alert on). That alert is what turns this from "users report it" into "we're paged at deploy time."
  • Client-side telemetry enrichment + issue grouping for the wizard: feat: enrich oauth login failure telemetry for diagnosis wizard#501
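The filter such an alert would apply can be illustrated with a small sketch. It assumes the logs are rendered as one JSON object per line with the event name under an `event` key, which is an assumption about the logging setup, not something this PR states. `first_party_rejections` is a hypothetical helper and not part of any alerting tool.

```python
import json
from typing import Dict, Iterable, List


def first_party_rejections(log_lines: Iterable[str]) -> List[Dict]:
    """Return parsed records for first-party scope-ceiling rejections."""
    hits = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if not isinstance(record, dict):
            continue
        if (
            record.get("event") == "oauth_scope_ceiling_rejected"
            and record.get("is_first_party") is True
        ):
            hits.append(record)
    return hits
```

Any non-empty result would be the paging condition, since first-party rejections are expected to be near zero.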

How did you test this code?

I'm an agent (Claude). Automated only:

  • test_authorize_rejection_emits_ceiling_log patches the module logger and asserts a single oauth_scope_ceiling_rejected warning with the expected client_id / is_first_party / requested / ceiling. It ran green in the Django CI matrix.
  • ruff check and ruff format clean on both files.
  • I could not run the Django test harness locally in this environment; the backend suite (including the new test) ran in CI.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Docs update

skip-inkeep-docs — no user-facing docs change.

🤖 Agent context

Authored by Claude Code (Opus 4.8). Follow-up observability on the per-app scope ceiling (#60477): the reject path in validate_scopes was silent, so a misconfigured first-party ceiling produced invalid_scope with no server-side signal. Chose a single logger.warning on the existing False branch (structlog, matching the other logger.warning calls in this file) over capture_exception, since the rejection is a normal 302 redirect rather than a 4xx/5xx error path. Boolean resolution is unchanged; only the log was added.

The per-app scope ceiling rejects out-of-ceiling requests by returning
False, which oauthlib turns into a 302 with error=invalid_scope. That
path emitted no log or capture, so a misconfigured first-party app
(empty/partial ceiling) failed silently. Emit a structured warning
carrying client_id, is_first_party, and the requested vs allowed sets so
a log alert can page on first-party rejections.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MattBro MattBro added the skip-inkeep-docs Use this label to skip an Inkeep docs PR in posthog.com label Jun 2, 2026
@MattBro MattBro requested review from a team, fercgomes and rafaeelaudibert and removed request for a team June 2, 2026 17:42
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@MattBro MattBro marked this pull request as ready for review June 2, 2026 18:10
greptile-apps Bot commented Jun 2, 2026

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
posthog/api/oauth/test_views.py:2290-2305
**Prefer a parameterised test to cover both rejection paths**

The new test only exercises the `has_ceiling=True` branch. The `has_ceiling=False` branch (no app ceiling, client requests a privileged or wildcard scope) also reaches the `logger.warning` call and is worth asserting in the same spirit. Following the team's preference for parameterised tests, both paths could live under a single `@pytest.mark.parametrize` (or `subTest`): one case with a ceiling set where the requested scope falls outside it, and one with no ceiling where a privileged scope is requested.

Reviews (1): Last reviewed commit: "chore: ruff format the ceiling-rejection..." | Re-trigger Greptile

github-actions Bot commented Jun 2, 2026

🎭 Playwright report · View test results →

⚠️ 1 flaky test:

  • Change date range and toggle comparison (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

Comment thread posthog/api/oauth/test_views.py Outdated
Comment on lines +2290 to +2305
    def test_authorize_rejection_emits_ceiling_log(self):
        self._set_ceiling("experiment:read")
        with patch("posthog.api.oauth.views.logger") as mock_logger:
            response = self.client.get(f"{self.base_authorization_url}&scope=experiment:write")
        self.assertEqual(response.status_code, status.HTTP_302_FOUND)
        rejection_calls = [
            call
            for call in mock_logger.warning.call_args_list
            if call.args and call.args[0] == "oauth_scope_ceiling_rejected"
        ]
        self.assertEqual(len(rejection_calls), 1)
        kwargs = rejection_calls[0].kwargs
        self.assertEqual(kwargs["client_id"], "test_confidential_client_id")
        self.assertEqual(kwargs["is_first_party"], self.confidential_application.is_first_party)
        self.assertEqual(kwargs["requested"], ["experiment:write"])
        self.assertEqual(kwargs["ceiling"], ["experiment:read"])

P2 Prefer a parameterised test to cover both rejection paths

The new test only exercises the has_ceiling=True branch. The has_ceiling=False branch (no app ceiling, client requests a privileged or wildcard scope) also reaches the logger.warning call and is worth asserting in the same spirit. Following the team's preference for parameterised tests, both paths could live under a single @pytest.mark.parametrize (or subTest): one case with a ceiling set where the requested scope falls outside it, and one with no ceiling where a privileged scope is requested.
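One way the suggested parameterisation could look, sketched with `subTest` against a stand-in validator rather than the real Django view: `check_scopes`, `DEFAULT_ALLOWED`, and the `hypothetical:privileged` scope are illustrative assumptions, and the real test would go through `/authorize` and patch the module logger instead.

```python
import unittest
from unittest.mock import MagicMock

# Hypothetical broad default applied when an app has no ceiling; privileged scopes are excluded.
DEFAULT_ALLOWED = {"experiment:read", "experiment:write"}


def check_scopes(requested, ceiling, logger):
    """Stand-in validator: use the ceiling if set, else the broad default."""
    allowed = set(ceiling) if ceiling else DEFAULT_ALLOWED
    rejected = [scope for scope in requested if scope not in allowed]
    if rejected:
        logger.warning("oauth_scope_ceiling_rejected", requested=rejected, ceiling=sorted(ceiling))
        return False
    return True


class CeilingRejectionLogTest(unittest.TestCase):
    CASES = [
        # (case name, app ceiling, requested scopes)
        ("ceiling_set", ["experiment:read"], ["experiment:write"]),
        ("no_ceiling_privileged", [], ["hypothetical:privileged"]),
    ]

    def test_rejection_emits_log(self):
        for name, ceiling, requested in self.CASES:
            with self.subTest(case=name):
                logger = MagicMock()
                self.assertFalse(check_scopes(requested, ceiling, logger))
                logger.warning.assert_called_once()
                call = logger.warning.call_args
                self.assertEqual(call.args[0], "oauth_scope_ceiling_rejected")
                self.assertEqual(call.kwargs["requested"], requested)
                self.assertEqual(call.kwargs["ceiling"], ceiling)
```

Each case gets a fresh mock logger, so the "fires exactly once" assertion holds per branch rather than across the whole test.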

Contributor

+1, but it's a nit

Contributor Author

Parameterized it. Covers the empty-ceiling reject path now too: 7d74e35


Parameterize the test so it asserts the oauth_scope_ceiling_rejected log
fires on both branches: a set ceiling with an out-of-ceiling scope, and an
empty ceiling with a privileged scope excluded by the broad default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot commented Jun 3, 2026

ClickHouse migration SQL per cloud environment

  • unset
    • all
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION
              ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW events_batch_export ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              COALESCE(inserted_at, _timestamp) AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          PREWHERE
              COALESCE(events.inserted_at, events._timestamp) >= {interval_start:DateTime64}
              AND COALESCE(events.inserted_at, events._timestamp) < {interval_end:DateTime64}
          WHERE
              team_id = {team_id:Int64}
              AND events.timestamp >= {interval_start:DateTime64} - INTERVAL {lookback_days:Int32} DAY
              AND events.timestamp < {interval_end:DateTime64} + INTERVAL 1 DAY
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW events_batch_export_unbounded ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              COALESCE(inserted_at, _timestamp) AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          PREWHERE
              COALESCE(events.inserted_at, events._timestamp) >= {interval_start:DateTime64}
              AND COALESCE(events.inserted_at, events._timestamp) < {interval_end:DateTime64}
          WHERE
              team_id = {team_id:Int64}
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW events_batch_export_backfill ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              timestamp AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          WHERE
              team_id = {team_id:Int64}
              AND events.timestamp >= {interval_start:DateTime64}
              AND events.timestamp < {interval_end:DateTime64}
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW events_batch_export_backfill ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              timestamp AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          WHERE
              team_id = {team_id:Int64}
              AND events.timestamp >= {interval_start:DateTime64}
              AND events.timestamp < {interval_end:DateTime64}
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW persons_batch_export_backfill ON CLUSTER posthog AS (
          SELECT
              pd.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  pd._timestamp < {interval_end:DateTime64}
                      AND NOT p._timestamp < {interval_end:DateTime64},
                  pd._timestamp,
                  p._timestamp < {interval_end:DateTime64}
                      AND NOT pd._timestamp < {interval_end:DateTime64},
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          FROM (
              SELECT
                  team_id,
                  distinct_id,
                  max(version) AS version,
                  argMax(person_id, person_distinct_id2.version) AS person_id,
                  argMax(_timestamp, person_distinct_id2.version) AS _timestamp
              FROM
                  person_distinct_id2
              PREWHERE
                  team_id = {team_id:Int64}
              GROUP BY
                  team_id,
                  distinct_id
          ) AS pd
          INNER JOIN (
              SELECT
                  team_id,
                  id,
                  max(version) AS version,
                  argMax(properties, person.version) AS properties,
                  argMax(created_at, person.version) AS created_at,
                  argMax(_timestamp, person.version) AS _timestamp
              FROM
                  person
              PREWHERE
                  team_id = {team_id:Int64}
              GROUP BY
                  team_id,
                  id
          ) AS p ON p.id = pd.person_id AND p.team_id = pd.team_id
          WHERE
              pd.team_id = {team_id:Int64}
              AND p.team_id = {team_id:Int64}
              AND (
                  pd._timestamp < {interval_end:DateTime64}
                  OR p._timestamp < {interval_end:DateTime64}
              )
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW persons_batch_export_backfill ON CLUSTER posthog AS (
          SELECT
              pd.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  pd._timestamp < {interval_end:DateTime64}
                      AND NOT p._timestamp < {interval_end:DateTime64},
                  pd._timestamp,
                  p._timestamp < {interval_end:DateTime64}
                      AND NOT pd._timestamp < {interval_end:DateTime64},
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          FROM (
              SELECT
                  team_id,
                  distinct_id,
                  max(version) AS version,
                  argMax(person_id, person_distinct_id2.version) AS person_id,
                  argMax(_timestamp, person_distinct_id2.version) AS _timestamp
              FROM
                  person_distinct_id2
              PREWHERE
                  team_id = {team_id:Int64}
              GROUP BY
                  team_id,
                  distinct_id
          ) AS pd
          INNER JOIN (
              SELECT
                  team_id,
                  id,
                  max(version) AS version,
                  argMax(properties, person.version) AS properties,
                  argMax(created_at, person.version) AS created_at,
                  argMax(_timestamp, person.version) AS _timestamp
              FROM
                  person
              PREWHERE
                  team_id = {team_id:Int64}
              GROUP BY
                  team_id,
                  id
          ) AS p ON p.id = pd.person_id AND p.team_id = pd.team_id
          WHERE
              pd.team_id = {team_id:Int64}
              AND p.team_id = {team_id:Int64}
              AND (
                  pd._timestamp < {interval_end:DateTime64}
                  OR p._timestamp < {interval_end:DateTime64}
              )
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW events_batch_export_recent ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events_recent.distinct_id), cityHash64(events_recent.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              inserted_at AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events_recent
          PREWHERE
              events_recent.inserted_at >= {interval_start:DateTime64}
              AND events_recent.inserted_at < {interval_end:DateTime64}
          WHERE
              team_id = {team_id:Int64}
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW persons_batch_export_backfill ON CLUSTER posthog AS (
          SELECT
              pd.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  pd._timestamp < {interval_end:DateTime64}
                      AND NOT p._timestamp < {interval_end:DateTime64},
                  pd._timestamp,
                  p._timestamp < {interval_end:DateTime64}
                      AND NOT pd._timestamp < {interval_end:DateTime64},
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          FROM (
              SELECT
                  team_id,
                  distinct_id,
                  max(version) AS version,
                  argMax(person_id, person_distinct_id2.version) AS person_id,
                  argMax(_timestamp, person_distinct_id2.version) AS _timestamp
              FROM
                  person_distinct_id2
              PREWHERE
                  team_id = {team_id:Int64}
              GROUP BY
                  team_id,
                  distinct_id
          ) AS pd
          INNER JOIN (
              SELECT
                  team_id,
                  id,
                  max(version) AS version,
                  argMax(properties, person.version) AS properties,
                  argMax(created_at, person.version) AS created_at,
                  argMax(_timestamp, person.version) AS _timestamp
              FROM
                  person
              PREWHERE
                  team_id = {team_id:Int64}
              GROUP BY
                  team_id,
                  id
          ) AS p ON p.id = pd.person_id AND p.team_id = pd.team_id
          WHERE
              pd.team_id = {team_id:Int64}
              AND p.team_id = {team_id:Int64}
              AND (
                  pd._timestamp < {interval_end:DateTime64}
                  OR p._timestamp < {interval_end:DateTime64}
              )
          ORDER BY
              _inserted_at
      )
  • US, EU, DEV
    • data
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW events_batch_export ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              COALESCE(inserted_at, _timestamp) AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          PREWHERE
              COALESCE(events.inserted_at, events._timestamp) >= {interval_start:DateTime64}
              AND COALESCE(events.inserted_at, events._timestamp) < {interval_end:DateTime64}
          WHERE
              team_id = {team_id:Int64}
              AND events.timestamp >= {interval_start:DateTime64} - INTERVAL {lookback_days:Int32} DAY
              AND events.timestamp < {interval_end:DateTime64} + INTERVAL 1 DAY
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW events_batch_export_unbounded ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              COALESCE(inserted_at, _timestamp) AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          PREWHERE
              COALESCE(events.inserted_at, events._timestamp) >= {interval_start:DateTime64}
              AND COALESCE(events.inserted_at, events._timestamp) < {interval_end:DateTime64}
          WHERE
              team_id = {team_id:Int64}
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW events_batch_export_backfill ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              timestamp AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          WHERE
              team_id = {team_id:Int64}
              AND events.timestamp >= {interval_start:DateTime64}
              AND events.timestamp < {interval_end:DateTime64}
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW persons_batch_export ON CLUSTER posthog AS (
          with new_persons as (
              select
                  id,
                  max(version) as version,
                  argMax(_timestamp, person.version) AS _timestamp2
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in (
                      select
                          id
                      from
                          person
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  id
              having
                  (
                      _timestamp2 >= {interval_start:DateTime64}
                      AND _timestamp2 < {interval_end:DateTime64}
                  )
          ),
          new_distinct_ids as (
              SELECT
                  argMax(person_id, person_distinct_id2.version) as person_id
              from
                  person_distinct_id2
              where
                  team_id = {team_id:Int64}
                  and distinct_id in (
                      select
                          distinct_id
                      from
                          person_distinct_id2
                      where
                          team_id = {team_id:Int64}
                          and _timestamp >= {interval_start:DateTime64}
                          AND _timestamp < {interval_end:DateTime64}
                  )
              group by
                  distinct_id
              having
                  (
                      argMax(_timestamp, person_distinct_id2.version) >= {interval_start:DateTime64}
                      AND argMax(_timestamp, person_distinct_id2.version) < {interval_end:DateTime64}
                  )
          ),
          all_new_persons as (
              select
                  id,
                  version
              from
                  new_persons
              UNION
              ALL
              select
                  id,
                  max(version)
              from
                  person
              where
                  team_id = {team_id:Int64}
                  and id in new_distinct_ids
              group by
                  id
          )
          select
              p.team_id AS team_id,
              pd.distinct_id AS distinct_id,
              toString(p.id) AS person_id,
              p.properties AS properties,
              pd.version AS person_distinct_id_version,
              p.version AS person_version,
              p.created_at AS created_at,
              multiIf(
                  (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  ),
                  pd._timestamp,
                  (
                      p._timestamp >= {interval_start:DateTime64}
                      AND p._timestamp < {interval_end:DateTime64}
                  )
                  AND NOT (
                      pd._timestamp >= {interval_start:DateTime64}
                      AND pd._timestamp < {interval_end:DateTime64}
                  ),
                  p._timestamp,
                  least(p._timestamp, pd._timestamp)
              ) AS _inserted_at
          from
              person p
              INNER JOIN (
                  SELECT
                      distinct_id,
                      max(version) AS version,
                      argMax(person_id, person_distinct_id2.version) AS person_id2,
                      argMax(_timestamp, person_distinct_id2.version) AS _timestamp
                  FROM
                      person_distinct_id2
                  WHERE
                      team_id = {team_id:Int64}
                      and person_id IN (
                          select
                              id
                          from
                              all_new_persons
                      )
                  GROUP BY
                      distinct_id
              ) AS pd ON p.id = pd.person_id2
          where
              team_id = {team_id:Int64}
              and (id, version) in all_new_persons
          ORDER BY
              _inserted_at
      )
      CREATE OR REPLACE VIEW events_batch_export ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              COALESCE(inserted_at, _timestamp) AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          PREWHERE
              COALESCE(events.inserted_at, events._timestamp) >= {interval_start:DateTime64}
              AND COALESCE(events.inserted_at, events._timestamp) < {interval_end:DateTime64}
          WHERE
              team_id = {team_id:Int64}
              AND events.timestamp >= {interval_start:DateTime64} - INTERVAL {lookback_days:Int32} DAY
              AND events.timestamp < {interval_end:DateTime64} + INTERVAL 1 DAY
              AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
              AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
          ORDER BY
              _inserted_at, event
          SETTINGS optimize_aggregation_in_order=1
      )
      CREATE OR REPLACE VIEW events_batch_export_unbounded ON CLUSTER posthog AS (
          SELECT DISTINCT ON (team_id, event, cityHash64(events.distinct_id), cityHash64(events.uuid))
              team_id AS team_id,
              timestamp AS timestamp,
              event AS event,
              distinct_id AS distinct_id,
              toString(uuid) AS uuid,
              COALESCE(inserted_at, _timestamp) AS _inserted_at,
              created_at AS created_at,
              elements_chain AS elements_chain,
              toString(person_id) AS person_id,
              nullIf(properties, '') AS properties,
              nullIf(person_properties, '') AS person_properties,
              nullIf(JSONExtractString(properties, '$set'), '') AS set,
              nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
          FROM
              events
          PREWHERE
              COALESCE(events.inserted_at, events._timestamp) >= {
      

…truncated. See the full SQL in the workflow logs.

Labels

skip-inkeep-docs Use this label to skip an Inkeep docs PR in posthog.com

2 participants