chore(docs): add monorepo-specific docs moved from posthog.com#48292
14 handbook/engineering docs that pertain directly to this codebase (project structure, local dev setup, coding conventions, database guides, etc.) now live here under docs/published/ and get pulled into posthog.com via gatsby-source-git.
| RETURNING is_merged | ||
| ``` | ||
|
|
||
| #### 3.3 Person processing step |
Incorrect section numbering
This is section 2.3 (under "2. Ingestion pipeline"), but it's numbered 3.3. The preceding sections are 2.1 Prefetch step and 2.2 Personless batch step, so this should be 2.3.
| #### 3.3 Person processing step | |
| #### 2.3 Person processing step |
Path: docs/published/handbook/engineering/person-processing.md
Line: 273
| - If your component is in the `lib/` folder, and has some interactivity, write a [react testing library](https://testing-library.com/docs/react-testing-library/intro/) test for it. | ||
| - Add all new presentational elements and scenes to [our storybook](https://storybook.dev.posthog.dev/). Run `pnpm storybook` locally. | ||
|
|
||
| > Sync note: This file is also copied to posthog/posthog/.claude/commands/conventions.md for Claude Code. When updating this file, please also update the copy there. --> |
Stray HTML comment closing tag
There's a trailing --> at the end of this line. This looks like a leftover from an HTML comment wrapper in the original source. Since this doc is now published standalone, it should be removed.
| > Sync note: This file is also copied to posthog/posthog/.claude/commands/conventions.md for Claude Code. When updating this file, please also update the copy there. --> | |
| > Sync note: This file is also copied to posthog/posthog/.claude/commands/conventions.md for Claude Code. When updating this file, please also update the copy there. |
Path: docs/published/handbook/engineering/conventions/frontend-coding.md
Line: 64
| ```bash | ||
| export WEBPACK_HOT_RELOAD_HOST=0.0.0.0 | ||
| export LOCAL_HTTPS=1 | ||
| export JS_URL=https://68f83839843a.ngrok.io |
Inconsistent ngrok domain
Step 4 (line 41) and step 6 (line 61) use the current ngrok-free.dev domain, but this line in step 5 still uses the old ngrok.io domain. This is confusing since the instructions say "use the same URL" in step 6. All examples should use the same domain.
| export JS_URL=https://68f83839843a.ngrok.io | |
| export JS_URL=https://68f83839843a.ngrok-free.dev |
Path: docs/published/handbook/engineering/setup-ssl-locally.md
Line: 52
| │ └── toolbar # PostHog Toolbar code | ||
| ├── livestream # Golang service for live events API | ||
| ├── playwright # End-to-end tests using Playwright | ||
| ├── plugin-server # Node.js service for event ingestion and plugins |
Outdated directory name
The plugin-server directory has been renamed to nodejs in the repository. This reference (and the corresponding "Key directories" section at line 65) should be updated to reflect the current name.
Path: docs/published/handbook/engineering/project-structure.md
Line: 25
|
|
||
| ### Testing | ||
|
|
||
| - Frontend E2E tests: [Cypress](https://www.cypress.io/) |
E2E testing framework is outdated
The project uses Playwright, not Cypress. Cypress isn't in the project's package.json, while Playwright is (and the repo has a playwright/ directory). This should be updated.
| - Frontend E2E tests: [Cypress](https://www.cypress.io/) | |
| - Frontend E2E tests: [Playwright](https://playwright.dev/) |
Path: docs/published/handbook/engineering/stack.md
Line: 25
Pull request overview
This PR moves 14 internal engineering handbook documents from the posthog.com repository to the main PostHog monorepo under docs/published/handbook/engineering/. These docs cover development setup, coding conventions, database operations, and internal architecture. The files will be pulled into posthog.com via gatsby-source-git as documented in the repository's docs README.
Changes:
- Added 14 documentation files covering project structure, tech stack, local development setup, coding conventions, database guides, and person processing internals
- Added missing frontmatter to person-processing.md
- Added language specifiers to code blocks across multiple files
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| stack.md | Documents PostHog's tech stack including frontend (React/Kea), backend (Django/Rust), databases, and workflow orchestration tools |
| setup-ssl-locally.md | Guide for setting up HTTPS locally using ngrok or NGINX with SSL certificates |
| project-structure.md | Overview of the monorepo directory structure and key components |
| person-processing.md | Detailed internal documentation on PostHog's person identity system, merging, and query engine |
| databases/schema-changes.md | Best practices for making safe database schema changes |
| databases/query-performance-optimization.md | Guide for optimizing PostgreSQL and ClickHouse query performance |
| databases/materialized-columns.md | Documentation on using and managing ClickHouse materialized columns |
| databases/hogql-python.md | Developer guide for writing HogQL queries in Python |
| databases/clickhouse-event-table-migrations.md | Detailed walkthrough of running large-scale ClickHouse migrations on PostHog Cloud |
| databases/async-migrations.md | Guide for writing async migrations with workflow and architecture details |
| data-warehouse.md | Internal guide for PostHog engineers working with the data warehouse feature |
| conventions/frontend-coding.md | Frontend coding conventions covering Kea/React patterns, naming, and testing |
| conventions/backend-coding.md | Backend coding conventions covering logging, testing, and HogQL usage |
| clickhouse/replication.md | Comprehensive guide on ClickHouse data replication and distributed queries |
| showTitle: true | ||
| --- | ||
|
|
||
| This document outlines how to do large-scale data migrations on PostHog Cloud without using [Async Migrations](/handbook/engineering/databases/async-migrations). |
The link references "/handbook/engineering/databases/async-migrations" using an absolute path. Since this file is being moved to the same directory structure in this PR, verify that the link resolves correctly once both files are in place.
| This document outlines how to do large-scale data migrations on PostHog Cloud without using [Async Migrations](/handbook/engineering/databases/async-migrations). | |
| This document outlines how to do large-scale data migrations on PostHog Cloud without using [Async Migrations](./async-migrations). |
| - Browse to the [Diagnose](https://data.heroku.com/datastores/56166304-6297-4dce-af64-a1536ea2197c#diagnose) tab in Heroku Data's dashboard. You can break queries down by: | ||
| - Most time consuming | ||
| - Most frequently invoked | ||
| - Slowest execution time | ||
| - Slowest I/O | ||
| - You can also use Heroku's [Diagnose](https://blog.heroku.com/pg-diagnose) feature by running `heroku pg:diagnose` to get a breakdown of long running queries, long transactions, among other diagnostics. | ||
| - For a more raw approach you can access real time logs from Heroku by executing `heroku logs --app posthog --ps postgres` |
The Heroku references in this section are outdated. PostHog Cloud no longer uses Heroku for hosting. This documentation should be updated or removed since it refers to infrastructure that's no longer in use.
| - Browse to the [Diagnose](https://data.heroku.com/datastores/56166304-6297-4dce-af64-a1536ea2197c#diagnose) tab in Heroku Data's dashboard. You can break queries down by: | |
| - Most time consuming | |
| - Most frequently invoked | |
| - Slowest execution time | |
| - Slowest I/O | |
| - You can also use Heroku's [Diagnose](https://blog.heroku.com/pg-diagnose) feature by running `heroku pg:diagnose` to get a breakdown of long running queries, long transactions, among other diagnostics. | |
| - For a more raw approach you can access real time logs from Heroku by executing `heroku logs --app posthog --ps postgres` | |
| - Enable and inspect PostgreSQL's [`pg_stat_statements`](https://www.postgresql.org/docs/current/pgstatstatements.html) view to identify queries that are: | |
| - Most time consuming overall | |
| - Most frequently invoked | |
| - Having the highest mean or max execution time | |
| - Use your database hosting provider's PostgreSQL monitoring or diagnostics tools (for example, a managed database dashboard) to break queries down by execution time, frequency, and resource usage. | |
| - Access real-time PostgreSQL logs via your infrastructure's logging pipeline or your managed database provider's log streaming interface. |
|
|
||
| Making sure PostHog operates fast at scale is key to our success. | ||
|
|
||
| This document outlines some best practices to archive good query performance at scale, as well as describing tools and procedures to discover and fix performance issues. |
Typo: "archive" should be "achieve".
| This document outlines some best practices to archive good query performance at scale, as well as describing tools and procedures to discover and fix performance issues. | |
| This document outlines some best practices to achieve good query performance at scale, as well as describing tools and procedures to discover and fix performance issues. |
| ssl_certificate /Users/timglaser/dev/localhost.crt; | ||
| ssl_certificate_key /Users/timglaser/dev/localhost.key ; |
This path contains a hardcoded username "/Users/timglaser/dev/". This should be replaced with a generic placeholder like "/path/to/your/certs/" or use environment variables to avoid confusion for developers following this guide.
| ssl_certificate /Users/timglaser/dev/localhost.crt; | |
| ssl_certificate_key /Users/timglaser/dev/localhost.key ; | |
| ssl_certificate /path/to/your/certs/localhost.crt; | |
| ssl_certificate_key /path/to/your/certs/localhost.key ; |
| RETURNING is_merged | ||
| ``` | ||
|
|
||
| #### 3.3 Person processing step |
This section header is numbered "3.3" when it should be "2.3" to follow the proper sequence (after 2.2 Personless batch step).
| #### 3.3 Person processing step | |
| #### 2.3 Person processing step |
| ```bash | ||
| export WEBPACK_HOT_RELOAD_HOST=0.0.0.0 | ||
| export LOCAL_HTTPS=1 | ||
| export JS_URL=https://68f83839843a.ngrok.io |
The URL in this example uses "ngrok.io" but line 61 uses "ngrok-free.dev". These should be consistent to avoid confusion. The ngrok-free.dev domain is the current standard for free ngrok accounts.
| export JS_URL=https://68f83839843a.ngrok.io | |
| export JS_URL=https://68f83839843a.ngrok-free.dev |
| ### 5. Kafka (person updates) | ||
|
|
||
| After person processing, updates are produced to Kafka: | ||
|
|
||
| - `KAFKA_PERSON`: Person creates/updates/deletes | ||
| - `KAFKA_PERSON_DISTINCT_ID`: Distinct ID mapping changes | ||
|
|
||
| ### 6. ClickHouse tables |
Inconsistent section numbering. The numbering jumps from "2.4" to "4" skipping "3". The sections should be renumbered for consistency (either 3, 4, 5, 6, 7 or 2.5, 2.6, 2.7, 2.8, 2.9).
| ### 5. Kafka (person updates) | |
| After person processing, updates are produced to Kafka: | |
| - `KAFKA_PERSON`: Person creates/updates/deletes | |
| - `KAFKA_PERSON_DISTINCT_ID`: Distinct ID mapping changes | |
| ### 6. ClickHouse tables | |
| ### 4. Kafka (person updates) | |
| After person processing, updates are produced to Kafka: | |
| - `KAFKA_PERSON`: Person creates/updates/deletes | |
| - `KAFKA_PERSON_DISTINCT_ID`: Distinct ID mapping changes | |
| ### 5. ClickHouse tables |
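The numbering problems raised in these review comments (a `3.3` heading under section 2, and a top-level number that skips ahead) could in principle be caught mechanically. The following is a minimal Python sketch under stated assumptions, not something this PR contains: the function names are hypothetical, it only understands headings of the form `## 2. Title` or `#### 2.3 Title`, and any heading whose text merely starts with a number would also be picked up.

```python
import re

# Matches markdown headings that start with a dotted number, e.g. "## 2. Title"
# or "#### 2.3 Title". The captured group is the number without a trailing dot.
HEADING_RE = re.compile(r"^#{1,6}\s+(\d+(?:\.\d+)*)\.?\s+\S", re.MULTILINE)


def is_valid_successor(prev, cur):
    """True if cur may directly follow prev in a numbered outline."""
    # Descending one level: 2 -> 2.1
    if cur == prev + (1,):
        return True
    # Staying at the same level or climbing back up: 2.2 -> 2.3, 2.4 -> 3
    for depth in range(1, len(prev) + 1):
        if cur == prev[: depth - 1] + (prev[depth - 1] + 1,):
            return True
    return False


def find_numbering_problems(markdown):
    """Return the heading numbers that do not follow from the previous heading."""
    problems = []
    prev = None
    for match in HEADING_RE.finditer(markdown):
        cur = tuple(int(part) for part in match.group(1).split("."))
        if prev is not None and not is_valid_successor(prev, cur):
            problems.append(match.group(1))
        prev = cur
    return problems
```

Because each heading is compared only with the one before it, a single misnumbered heading can also cause the heading after it to be reported; the sketch is meant to point a reviewer at the area, not to produce an exact diagnosis.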
| PostHog's database schema evolves constantly along with the app. | ||
| Each schema change requires deliberation though, as a badly designed migration can cause pain for users and require extra effort from the engineering team. | ||
|
|
||
| For detailed patterns on writing safe Django migrations, see the [Safe Django Migrations guide](/handbook/engineering/safe-django-migrations). |
The link text says "Safe Django Migrations guide" but the URL path is "/handbook/engineering/safe-django-migrations". This link will be broken if the target document doesn't exist at that location in the monorepo. Since this PR is moving docs from posthog.com, verify that this linked document also exists or will be moved.
|
|
||
| ### How-to fix slow queries | ||
|
|
||
| See [ClickHouse manual](/handbook/engineering/clickhouse/) for tips and tricks. |
The link references "/handbook/engineering/clickhouse/" but this guide discusses ClickHouse operations. Verify that this linked document exists or will be moved as part of documentation consolidation.
| See [ClickHouse manual](/handbook/engineering/clickhouse/) for tips and tricks. | |
| See ClickHouse manual for tips and tricks. |
|
|
||
| Materialized columns allow us to "store" specific properties stored in JSON as separate columns that are there on disk, making reading these columns up to 25x faster than normal properties. | ||
|
|
||
| Also check out our [ClickHouse manual](/handbook/engineering/clickhouse/working-with-json) and [blog post](/blog/clickhouse-materialized-columns) for more information. |
The link references "/handbook/engineering/clickhouse/working-with-json" which may not exist in the monorepo. Verify this linked document exists or will be moved as part of documentation consolidation.
| Also check out our [ClickHouse manual](/handbook/engineering/clickhouse/working-with-json) and [blog post](/blog/clickhouse-materialized-columns) for more information. | |
| Also check out our ClickHouse manual and [blog post](/blog/clickhouse-materialized-columns) for more information. |
- Fix section numbering in person-processing (3.3 → 2.3)
- Remove stray HTML comment closing tag in frontend-coding
- Fix ngrok domain inconsistency in setup-ssl-locally (ngrok.io → ngrok-free.dev)
- Replace hardcoded /Users/timglaser path with placeholder
- Rename plugin-server → nodejs in project-structure (directory was renamed)
- Update Cypress → Playwright in stack (current E2E framework)
- Fix typos: "archive" → "achieve", "santise" → "sanitize"
- Monorepo-internal links → relative (./sibling, ../parent-level)
- Links to posthog.com content outside monorepo → absolute https://posthog.com/...
- Fix stale /docs/contribute/type-system → ../type-system (file is in monorepo)
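As an illustration of the first rule in this commit (monorepo-internal links become relative), here is a small Python sketch. It is not code from the PR: the helper name `to_relative_link` is hypothetical, and it assumes a doc's site path mirrors its file path under `docs/published/` with the `.md` extension dropped, which the PR does not spell out.

```python
import posixpath


def to_relative_link(source_doc, target_url_path):
    """Rewrite an absolute site path as a link relative to source_doc.

    source_doc is the linking file's path relative to docs/published/,
    e.g. "handbook/engineering/databases/schema-changes.md".
    target_url_path is an absolute site path such as
    "/handbook/engineering/databases/async-migrations".
    """
    source_dir = posixpath.dirname(source_doc)
    target = target_url_path.strip("/")
    relative = posixpath.relpath(target, start=source_dir)
    # Prefix same-directory and child targets with "./" so they read as ./sibling.
    if not relative.startswith("."):
        relative = "./" + relative
    return relative
```

Under that assumption, the `/handbook/engineering/databases/async-migrations` link in clickhouse-event-table-migrations.md comes out as `./async-migrations`, which matches the suggestion made in the review.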
Add .github/scripts/check-docs-links.js that validates:
- relative links resolve to existing files
- published/ docs don't link outside their boundary
- posthog.com absolute links return 200 (with retries + timeouts)

Also fix all links in moved docs:
- monorepo-internal links → relative (./sibling, ../parent)
- posthog.com-only content → absolute https://posthog.com/...
- stale /docs/contribute/type-system → ../type-system
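The first two checks in this commit message can be sketched as follows. This is a Python illustration only; the actual script is JavaScript and its contents are not shown here. The function name, the regular expression, and the convention that a link may omit the `.md` extension are all assumptions. File existence is passed in as a set so the sketch does not touch the disk, and the HTTP 200 check is left out entirely.

```python
import posixpath
import re

# Inline markdown links: [text](target). Images and reference-style links are ignored.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def check_relative_links(doc_path, markdown, existing_files, root="docs/published"):
    """Return a list of (target, problem) tuples for one markdown file.

    doc_path and every entry in existing_files are repo-relative POSIX paths.
    """
    problems = []
    doc_dir = posixpath.dirname(doc_path)
    for target in LINK_RE.findall(markdown):
        # Absolute URLs, site-absolute paths and in-page anchors are out of scope here.
        if re.match(r"^[a-z][a-z0-9+.-]*:", target) or target.startswith(("/", "#")):
            continue
        path = target.split("#", 1)[0]
        resolved = posixpath.normpath(posixpath.join(doc_dir, path))
        if not (resolved + "/").startswith(root.rstrip("/") + "/"):
            # The link climbs out of the published docs tree.
            problems.append((target, "links outside " + root))
        elif resolved not in existing_files and resolved + ".md" not in existing_files:
            problems.append((target, "target does not exist"))
    return problems
```

Checking the boundary before checking existence means a link that escapes `docs/published/` is reported as a boundary violation even if the file it points to exists elsewhere in the repository.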
Docs contain code examples (like openssl commands for local dev SSL) that trigger false positives from security rules. Documentation is not executable code and shouldn't be scanned for code security patterns.
…-docs-from-website
Build an index of all published doc URL paths at startup and flag any absolute posthog.com links that point to locally available docs — these should be relative links instead. Also resolve pnpm-lock.yaml merge conflict from master merge.
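The index-and-flag step described in this commit might look roughly like the Python sketch below. It is illustrative only and not taken from the PR: the function names are hypothetical, and the mapping from a file under `docs/published/` to a site path (strip the root and the `.md` extension) is an assumption about how the docs are served.

```python
import re
from urllib.parse import urlparse

# Absolute http(s) targets of inline markdown links.
ABSOLUTE_LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def build_url_index(published_files, root="docs/published"):
    """Return the set of site paths assumed to be served from published_files."""
    index = set()
    prefix = root.rstrip("/") + "/"
    for path in published_files:
        if path.startswith(prefix) and path.endswith(".md"):
            # docs/published/handbook/x.md -> /handbook/x
            index.add("/" + path[len(prefix):-len(".md")])
    return index


def find_links_that_should_be_relative(markdown, url_index):
    """Return absolute posthog.com links whose path exists in the local index."""
    flagged = []
    for url in ABSOLUTE_LINK_RE.findall(markdown):
        parsed = urlparse(url)
        if parsed.hostname in ("posthog.com", "www.posthog.com"):
            # Ignore a trailing slash when comparing against the index.
            if (parsed.path.rstrip("/") or "/") in url_index:
                flagged.append(url)
    return flagged
```

Building the index once at startup keeps the per-link check to a set lookup, so links to posthog.com content that lives only on the website pass through untouched while links to locally available docs are reported.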
|
Docs from this PR will be published at posthog.com
Moves 14 handbook/engineering docs from posthog.com that pertain directly to this codebase. These cover project structure, local dev setup, coding conventions, database guides, ClickHouse operations, and internal architecture docs.
They now live under `docs/published/handbook/engineering/` and get pulled into posthog.com via `gatsby-source-git` as described in docs/README.md.

Files moved:
Minor fixes: added frontmatter to person-processing.md (had none), added language specifiers to bare fenced code blocks across several files.
Companion PR: PostHog/posthog.com — removes these same files from the website repo.