A study of how various popular systems (Teleport, GCP, AWS) capture audit events and session recordings, including the vibe-sketch of a session classifier on top of a live Cloud + External Audit Storage tenant.

| Directory | What's in it | Entry point |
|---|---|---|
| `notes/` | Durable research notes: plumbing, storage topology, tap options, multi-step pipeline design, open questions. | `notes/README.md` |
| `notes-gcp/` | Parallel notes for the GCP substrate: Cloud Audit Logs, org-aggregated sink to BigQuery, tap points, GCP-side pipeline design. | `notes-gcp/README.md` |
| `prior-art/` | Bare factual notes on third-party prior art. | `prior-art/README.md` |
| `harnesses/` | Third-party benchmarks or harnesses that could be used as labeled data or test fixtures. | `harnesses/README.md` |
| `upstream-repo/` | Pinned read-only Teleport source (currently 2797910 = Release 17.7.20). | github.com/gravitational/teleport |

The work has a multi-step plan; each step's output is a durable
artifact under notes/ (Teleport substrate) or notes-gcp/ (GCP
substrate). Both substrates will feed the same sessions.sqlite /
labels schema used in later steps.
- Step 1: Plumbing research.
  - Teleport: See `notes/01-audit-log-plumbing.md` through `notes/05-tap-points-for-detection.md`, with what couldn't be answered from source alone collected in `notes/99-open-questions.md`.
  - GCP: See `notes-gcp/01-cloud-audit-logs.md` through `notes-gcp/05-tap-points-for-detection.md`, with live-tenant questions in `notes-gcp/99-open-questions.md`.
- Step 2: Pipeline design.
  - Teleport: See `notes/06-pipeline-design.md`: a single static Go binary `teleport-analyze` that taps Athena for `session.upload` events and S3 directly for recordings, parses ProtoStreamV1 via `lib/events.NewProtoReader`, and upserts per-session features plus Kubernetes-style classification labels into a local `sessions.sqlite`.
  - GCP: See `notes-gcp/06-pipeline-design.md`: a sibling Go binary (or extension of `teleport-analyze`) that queries BigQuery for per-`(principal, minute)` feature rows, synthesises sessions by gluing adjacent buckets, and writes into the same `sessions.sqlite` with GCP-flavoured labels (`substrate.kind=gcp-cloud-audit`, `gcp.ua.tool`, etc.).
- Step 3: Classifier. Outstanding. Reads from the shared SQLite extract; phase-1 is rules-only (Teleport: "is this a human terminal?"; GCP: "is this an operator-driven session?"), phase-2 is LLM-on-call-graph (or PTY bytes, on the Teleport side) for the sessions that phase-1 routes to it.

Shared across `notes/`, `prior-art/`, and `harnesses/`:

- `path:line` references in `notes/` are relative to `upstream-repo/` unless prefixed. `notes-gcp/` has no equivalent pin (GCP is a managed service); it cites product names, log names, and documented schemas instead.
- Tenant-specific values use placeholders. Teleport: `<your-tenant>` for the Cloud hostname, `<uuid>` for AWS resource nonces. GCP: `<org-id>`, `<logging-project>`, `<bq-dataset>`, `<gcs-archive-bucket>`, `<your-domain>`.
- Code blocks marked `// v17.7.20` are copied directly from `upstream-repo/` at the pinned commit; everything else is illustrative.
- New facts that can't be sourced get added to the relevant `99-open-questions.md` with a verification note, not inlined as if known.

upstream-repo is gravitational/teleport, which itself declares a
nested submodule e pointing at git@github.com:gravitational/teleport.e.git
— Teleport's proprietary enterprise half, private to Gravitational
employees. Nothing in this study reads from e, so the recommended
clone sequence inits the direct submodules first, marks e inactive,
then recurses for everything else:

```sh
git clone <this-repo>
cd <this-repo>
git submodule update --init                          # direct submodules only
git -C upstream-repo config submodule.e.update none
git submodule update --init --recursive              # safe to recurse now
```

The submodules are shallow by design; pinning is what guarantees
reproducibility, not history depth. The `submodule.e.update = none`
setting must live in `upstream-repo`'s local config because that is
the only repo where `e` is a direct submodule; setting it at the
shell scope or outer-project level has no effect. Subsequent recursive
updates will report `Skipping submodule 'upstream-repo/e'`.