feat: daily LinkedIn microservice + autocli CDP wiring + supporting fixes #2
Create scripts/job_priority_config.py with all configuration constants, regex patterns, and keyword sets for the deterministic job priority scoring system. Contains no scoring logic -- only configuration to be imported by the scorer, sync pipeline, backfill scripts, and tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pure, deterministic scoring engine for AutoCLI jobs with 8 components: compensation, role fit, seniority, work arrangement, application path, freshness, data completeness, and source quality. Includes penalty system, hard-reject guard, and tier mapping (high/medium/low/reject). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
REPEATED_PUNCT_RE used {2,} which matches 3+ total consecutive punctuation
chars (e.g. "!!!" -> "!"). Changed to {1,} so 2+ consecutive chars are
collapsed (e.g. "!!" -> "!", "!!!" -> "!").
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
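A minimal sketch of the quantifier change, assuming the pattern has the form `([punct])\1{n,}` (a captured character followed by n or more copies of itself). The commit does not show the full expression, so the character class here is a guess, and `collapse_punct` and `OLD_RE` are hypothetical names.

```python
import re

# Hypothetical reconstruction: one punctuation char, then {1,} more copies
# of the same char, i.e. a run of 2 or more in total.
REPEATED_PUNCT_RE = re.compile(r"([!?.,;:])\1{1,}")

# The previous quantifier needed 2+ extra copies, i.e. 3+ in total.
OLD_RE = re.compile(r"([!?.,;:])\1{2,}")


def collapse_punct(text: str) -> str:
    """Collapse any run of 2+ identical punctuation chars to one char."""
    return REPEATED_PUNCT_RE.sub(r"\1", text)


if __name__ == "__main__":
    print(collapse_punct("Great role!!"))     # Great role!
    print(collapse_punct("Apply now!!!"))     # Apply now!
    print(OLD_RE.sub(r"\1", "Great role!!"))  # Great role!!  (old bug: untouched)
```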
- Import score_job in sync_autocli_jobs.py and call it per-record
- Pass ScoreResult fields (priority_score, priority_tier, priority_version, priority_signals) to upsert_job RPC
- Add --disable-scoring flag for testing
- Report priority score distribution in dry-run mode
- Add comprehensive test suite (104 tests across 14 classes) covering all 8 scoring components, penalties, hard-reject guard, edge cases, and integration scenarios
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Migration 20260509182000: add priority scoring columns to jobs.jobs table (priority_score, priority_tier, priority_version, priority_signals, priority_scored_at)
- Migration 20260509184000: add update_job_priority_score RPC that only touches scoring fields (not the full row), with schema-scoped and public wrappers
- scripts/backfill_priority_scores.py: batch backfill script with --force, --limit, --dry-run, --env-file options; reconstructs job_data from raw_record or DB columns; reports per-row scores, tiers, and errors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Codex <noreply@openai.com>
- Rename priority_version column to priority_scorer_version in both migrations
- Add 'unknown' to priority_tier check constraint
- Fix indices to include last_seen_at desc and priority_score desc per spec
- Add --min-priority-score and --priority-tier CLI flags for optional filtering
- Enhance dry-run with top_priority_jobs, low_priority_count, priority_tiers
- Add source-quality summary (recruiter/aggregator/raw-jd-fallback counts)
- Update backfill RPC param name to match column rename
Design covers:
- 6-container stack: chrome (Stagehand), daily (cron+FastAPI), cloudflared, prometheus, grafana
- Cloudflare Tunnel + Access for public exposure of /vnc /cdp /api /jobs /grafana
- GHCR + Watchtower pull-based deploy
- Phased acceptance criteria with verification commands
Worktree: feat/daily-microservice (branched from main).
Critical fixes:
- Add prereq section for autocli BrowserBridge CDP-wiring patch
- Fix cargo build to use package name 'autocli' (was '-cli')
- Switch /jobs to client.schema('jobs').table('jobs') API
- Use /json/list + page target (was /json/version, browser-level)
- Rewrite ws host localhost->autocli-chrome:9222
- Standardize on SUPABASE_SERVICE_ROLE_KEY
- Make API_RUN_TOKEN actually enforced + tested in Phase 4
- Add machine-verifiable Cloudflare Access gate before cdp ingress
High-severity fixes:
- Feature branches publish :branch-*+:sha-* only; :main from main
- Pin cloudflared/prometheus/grafana to specific semver
- Switch Cloudflare Tunnel to --token mode (no config.yml mix)
- Replace path routes with 5 subdomains (avoids prefix-strip)
- Split Access into two policies per Application (Token OR Email)
- Drop Grafana Infinity plugin dependency
- VNC password generated random in prod (no 'stagehand' default)
- shred only temp copy of operator secrets, never the source
- Unify retry to 3-attempts/15-60-240s across code+runbook+metrics
- Add explicit restart: unless-stopped to autocli-daily
- Specify Prometheus metrics_path: /api/metrics
- Unified CI build context = repo root for both Dockerfiles
- Note GHCR creds already configured on target host
Bugs:
- L103: component table referenced stale /metrics path -> /api/metrics
- L209/L236: github.ref_name with '/' produces invalid Docker tags; switch to docker/metadata-action's type=ref,event=branch which slugifies
- L321: /json/new requires PUT, not POST (Chrome >= M86)
- L354: jobs.autocli/ routed to backend root but /jobs is the actual route; drop the jobs subdomain entirely, serve via api.autocli/jobs (4 subdomains)
- L473: Phase 0 build context disagreed with CI; unify on repo root
- L522: Phase 4 step 2 implied Service Token works on vnc/grafana where no machine policy exists; split per-subdomain expectations
- L526: Phase 4 probed cdp.autocli before the spec said cdp ingress was added; split Phase 4 into 4a (pre-CDP gate) / 4b (add cdp ingress) / 4c (cdp probes)
- L549: Phase 5 status call missing Bearer
Risks:
- L486: Phase 1 status call missing Bearer; added
Also:
- Fix '6 services' / '6 new containers' counts; actual count is 5
- Update §2.2 boundaries note from /json/version to /json/list + PUT /json/new
Bugs:
- L107: discovery wasn't actually re-run per /api/run; updated process tree so run-daily.sh calls cdp-discover.sh before each run, then sources /run/cdp-endpoint.env. §5.2 'Boot ordering' split into 'Discovery cadence' (boot + per-run) and 'Boot ordering'.
- L292: process tree now spells out 'PUT /json/new?about:blank' for the empty-list case, matching §5.2.
- L429: API_RUN_TOKEN row now lists every Bearer-protected route (/api/status, /api/run, /api/logs, /jobs) plus the open ones (/api/health, /api/metrics).
- L496: Phase 2 acceptance split into feature-branch expectation (:branch-feat-daily-microservice + :sha-*) vs post-merge expectation (:main + :sha-*). Phase 3 explicitly reads :main.
Risks / nits:
- L129: file layout comment changed to match real routes
- L358: 'five hostnames' -> 'four hostnames' after dropping jobs subdomain
- L628: §9 risk #1 rewritten to reference Phase 4a/4b/4c and three pre-CDP subdomains, not the old Phase 4 step 1+2 / four subdomains.
Bugs:
- L211: slugifier comment was inside 'tags: |' literal block where
metadata-action would parse it as a rule. Moved above 'tags:' as a
proper YAML comment.
- L478: Phase 0 'cargo build' on operator's arm64-darwin host then
COPY into linux/amd64 image would inject a Mach-O. Replaced with a
docker run --platform linux/amd64 rust:1.81-slim-bookworm builder
step + 'file' verification ('ELF 64-bit'). CI is unchanged because
ubuntu-latest is already linux/amd64.
Risks:
- L312: /api/metrics annotation 'only reachable via docker network'
was misleading — api.autocli does expose it externally via
Cloudflare Access. Re-annotated both /api/metrics and /api/health
as dual-path: internal direct, external via Access.
- L368: Tailscale-CGNAT IP gate on cdp.autocli would never match —
Cloudflare sees public/WARP egress IP, not 100.x. Replaced with
dedicated short-lived Service Token + mTLS client cert (machines),
email OTP + required WARP posture (humans). §9 risk #1 and Phase
4b/4c reworded to match.
Nit:
- L551: Phase 4a 'Authenticated' header was wrong for vnc/grafana
(no auth sent). Renamed to 'humans-only negative machine probe'
and made it actually send a Service Token to prove it gets denied.
Phase 4c also updated to use mTLS-style probes (dedicated CF_ID_CDP)
plus a websocket-upgrade smoke test for the CDP surface.
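The Phase 0 'file' check guards against a Mach-O slipping into a linux/amd64 image. A rough Python equivalent that reads only the header bytes is sketched below; `describe_binary` and `is_linux_amd64_elf` are hypothetical helpers, not part of the repo, and the strings they return only approximate what `file` prints.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_X86_64 = 0x3E  # e_machine value for AMD x86-64
# 64-bit Mach-O magic (0xfeedfacf) as it appears on disk, both byte orders.
MACHO_MAGICS = {b"\xcf\xfa\xed\xfe", b"\xfe\xed\xfa\xcf"}


def describe_binary(header: bytes) -> str:
    """Classify an executable from its first 20 bytes, roughly like `file`."""
    if header[:4] in MACHO_MAGICS:
        return "Mach-O 64-bit"
    if header[:4] != ELF_MAGIC or len(header) < 20:
        return "unknown"
    ei_class = header[4]  # 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_ident is 16 bytes, e_type is 2 bytes, then e_machine at offset 18.
    (e_machine,) = struct.unpack(endian + "H", header[18:20])
    bits = "64-bit" if ei_class == 2 else "32-bit"
    arch = "x86-64" if e_machine == EM_X86_64 else f"machine=0x{e_machine:x}"
    return f"ELF {bits} {arch}"


def is_linux_amd64_elf(path: str) -> bool:
    with open(path, "rb") as f:
        return describe_binary(f.read(20)) == "ELF 64-bit x86-64"
```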
Bugs (all on Phase 4c WebSocket probe):
- curl -sI is HEAD; WebSocket Upgrade requires GET. Replaced with proper websocat client (preferred) and curl --http1.1 -i -N fallback. No more -I anywhere on the WS path.
- /devtools/page/<id> placeholder cannot run as written. Probe now extracts the actual page id by GETing /json/list, picking the first type:'page' target, and rewriting host to cdp.autocli.
- 'HTTP/2 101 Switching Protocols' does not exist: 101 is HTTP/1.1 semantics, and Cloudflare does not speak RFC 8441 multiplexed WS. Probe now forces --http1.1 and expects 'HTTP/1.1 101 Switching Protocols'. The websocat path checks for a CDP round-trip (Target.getTargets) instead.
Phase 4c renumbered 4c-1..4c-4 to make the four checks explicit.
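For the curl fallback, the server's reply can be validated without a WebSocket library. The sketch below follows RFC 6455: the client sends a GET with a random 16-byte `Sec-WebSocket-Key`, and a correct `HTTP/1.1 101` response carries `Sec-WebSocket-Accept = base64(SHA-1(key + GUID))`. The helper names and example host are placeholders, and the Cloudflare Access headers a real probe would need are omitted.

```python
import base64
import hashlib
import os

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def make_ws_key() -> str:
    """16 random bytes, base64-encoded, as RFC 6455 requires."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_request(host: str, path: str, key: str) -> str:
    # Must be GET over HTTP/1.1; a HEAD request (curl -I) cannot upgrade.
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )


def handshake_ok(status_line: str, headers: dict, key: str) -> bool:
    """True for an HTTP/1.1 101 whose accept hash matches our key.
    `headers` is assumed to have lower-cased names."""
    return (
        status_line.startswith("HTTP/1.1 101")
        and headers.get("sec-websocket-accept") == expected_accept(key)
    )
```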
Was 1.81 — arbitrary, drifting from operator's rustc 1.94.1 and from ubuntu-latest's default stable. Pinned to 1.94-slim-bookworm so Phase 0 / CI / dev agree. Spec also notes the long-term hardening (repo-tracked rust-toolchain.toml) — that task is included in the implementation plan.
34 bite-sized tasks across 8 phases (A-H), each with TDD substeps, exact file paths, exact commands, expected outputs. Covers:
- Phase A: rust-toolchain.toml + BrowserBridge CDP patch + smoke test
- Phase B: deploy/ scaffold (chrome, daily, prometheus, grafana, compose)
- Phase C: GitHub Actions workflow
- Phase D: Phase 0 image build (Docker rust 1.94) + Phase 1 local e2e
- Phase E: GHCR push (Phase 2)
- Phase F: 100.108.80.9 server bring-up (Phase 3)
- Phase G: Cloudflare Tunnel + Access (Phase 4a/4b/4c)
- Phase H: forced run + monitoring (Phase 5) + schedule rollover (Phase 6)
Plan is self-contained: no TBDs or 'similar to Task N' placeholders. Self-review section maps every SPEC section to its implementing task(s).
Aligns local dev (operator was on rustc 1.94.1), CI (was using ubuntu-latest default), and the Phase 0 Docker builder (deploy/SPEC.md). Single source of truth; future bumps touch only this file.
Add AUTOCLI_CDP_ENDPOINT env-var branch at the top of BrowserBridge::connect. When set, skip daemon spawn + extension polling and return Arc<CdpPage> directly. The IPage trait contract is unchanged so pipeline executors and YAML adapters consume either implementation transparently. Required prerequisite for the autocli-daily microservice (deploy/SPEC.md §1.A) which runs autocli in a container with no Chrome extension or daemon, connecting to a sibling Chrome container via CDP.
Two robustness improvements from code review:
- RAII Drop guard ensures AUTOCLI_CDP_ENDPOINT is cleared even if the test panics mid-way, preventing cross-test env leakage.
- Assert on CliError::BrowserConnect variant directly instead of string-matching the Display output. Resilient to future error-message wording changes.
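The Drop-guard idea has a direct Python analogue: a context manager whose `finally` clause restores the variable even when the body raises. This is an illustration only; the actual test is Rust, and `temp_env` is a hypothetical name.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(name: str, value: str):
    """Set an env var for the duration of a block and always restore the
    previous state, even if the block raises (what a Drop guard does in Rust)."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```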
Copy of my-stagehand-app/Dockerfile.chrome with the COPY path rewritten for repo-root build context (deploy/SPEC.md §4.1).
Verbatim from my-stagehand-app/scripts/entrypoint-vnc.sh: Xvfb -> x11vnc -> noVNC -> socat 9222->9223 -> Chromium with --remote-debugging-port=9223 --user-data-dir=/root/.config/chromium. Extension loading via /opt/extensions/*/manifest.json is preserved even though this design ships with no extensions.
Multi-arch-aware single-stage image:
- python:3.12-slim-bookworm base
- tini (PID 1), util-linux (flock), jq (CDP discovery), curl (probes)
- supercronic (container cron) pinned to v0.2.30 with sha1 verify
- uv (Astral) for Python deps
- Pre-built autocli binary copied from deploy/daily/bin/
- FastAPI app + scripts/sync_autocli_jobs.py
- Boot via tini -> entrypoint.sh
- TZ=Europe/London, CRON_SCHEDULE default 03:00.
DONE_WITH_CONCERNS: scripts/job_priority_scorer.py and scripts/job_priority_config.py are absent from the worktree; their COPY lines have been omitted from the Dockerfile.
Find or create a CDP page target on autocli-chrome:9222.
- GET /json/list, pick first type:page
- if list is empty, PUT /json/new?about:blank (Chrome >= M86)
- rewrite host (localhost:9223 -> autocli-chrome:9222) so the WS URL is reachable from the daily container's network namespace
- write to /run/cdp-endpoint.env (sourced by run-daily.sh)
- 60s retry budget; exit 1 on timeout (entrypoint exits non-zero, and restart: unless-stopped restarts the container until chrome is ready).
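The pure parts of the discovery step (pick the page target, rewrite the host, emit the env line) can be sketched as follows. The fields `type` and `webSocketDebuggerUrl` are what Chrome's `/json/list` returns; the function names are hypothetical, and the empty-list `PUT /json/new` path and the 60s retry loop are left out.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def pick_page_ws_url(targets: list) -> Optional[str]:
    """Return webSocketDebuggerUrl of the first type == 'page' target."""
    for t in targets:
        if t.get("type") == "page" and t.get("webSocketDebuggerUrl"):
            return t["webSocketDebuggerUrl"]
    return None  # caller would fall back to PUT /json/new?about:blank


def rewrite_ws_host(ws_url: str, host: str, port: int) -> str:
    """Swap Chrome's self-reported host:port (e.g. localhost:9223) for one
    reachable from the daily container; path and page id are preserved."""
    parts = urlsplit(ws_url)
    return urlunsplit(parts._replace(netloc=f"{host}:{port}"))


def env_file_line(ws_url: str) -> str:
    # One KEY=VALUE line, in the shape run-daily.sh sources.
    return f"AUTOCLI_CDP_ENDPOINT={ws_url}\n"
```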
- flock -n to prevent cron + /api/run from colliding
- per-attempt cdp-discover refresh (page id may have rotated)
- runs autocli linkedin recommended -> JSON -> sync_autocli_jobs.py
- unified retry: 3 attempts at 15s/60s/240s (SPEC §5.2)
- writes /data/output/last_run.json consumed by /api/status.
Boot-time cdp-discover gate, then runs supercronic + uvicorn in parallel under tini. wait -n exits as soon as either child dies, so compose's restart policy can pick up failure modes (e.g. uvicorn panic, supercronic crash).
03:00 daily LinkedIn pull + 04:00 30-day output retention sweep (SPEC §5.2). TZ resolved by the container's TZ=Europe/London.
After rebase onto local main, scripts/job_priority_scorer.py and scripts/job_priority_config.py are present. sync_autocli_jobs.py imports them at runtime, so the daily image must ship all three.
uv-managed; pins fastapi/uvicorn/supabase/prometheus-client/httpx to compatible ranges. Lockfile checked in so the Dockerfile's 'uv sync --frozen' is reproducible.
Used by POST /api/run to spawn run-daily.sh non-blockingly. is_running() is a non-destructive flock probe so /api/status can report in_progress without affecting the actual run.
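A sketch of what such a non-destructive probe might look like with `fcntl.flock`, assuming the lock is a plain file path (the real path and function body are not shown here). Because `flock` locks belong to the open file description, the probe opens its own descriptor and releases the lock immediately if it wins it.

```python
import errno
import fcntl
import os


def is_running(lock_path: str) -> bool:
    """Try to take the lock without blocking. If another holder has it, a run
    is in progress; if we get it, drop it at once so a real run can proceed."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return True
        raise
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```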
Routes per SPEC §5.1:
GET /api/health [open] chrome reachability + cdp file probe
GET /api/metrics [open] Prometheus exposition (delta-aware counters)
GET /api/status [Bearer] last_run.json + in_progress
POST /api/run [Bearer] spawn run-daily.sh, 409 if already running
GET /api/logs [Bearer] tail of latest log (default 200 lines)
GET /jobs [Bearer] Supabase 'jobs.jobs' read proxy via
client.schema('jobs').table('jobs').
Import style B: 'import trigger' (flat), because entrypoint.sh does
'cd /app/api && uvicorn main:app' — no package context, flat import works.
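The open-vs-Bearer split above reduces to a small check. This stdlib-only sketch is not the FastAPI dependency the service uses; `authorized` and `OPEN_ROUTES` are hypothetical names, and `token` stands in for the value of API_RUN_TOKEN.

```python
import hmac
from typing import Optional

OPEN_ROUTES = {"/api/health", "/api/metrics"}


def authorized(path: str, authorization: Optional[str], token: str) -> bool:
    """Open routes skip auth; everything else needs 'Authorization: Bearer <token>'.
    An unset (empty) token denies protected routes instead of matching ''."""
    if path in OPEN_ROUTES:
        return True
    if not token or not authorization:
        return False
    scheme, _, supplied = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(supplied.encode(), token.encode())
```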
9 tests covering:
- /api/status, /api/run, /api/logs, /jobs all return 401 without Bearer and 401 with wrong Bearer
- /api/status default-shape + reflects last_run.json
- /api/metrics is open and contains the autocli_daily_ family
- /api/health returns 503 when chrome:9222 unreachable.
conftest.py adds deploy/daily/api to sys.path (flat import, matching entrypoint.sh's 'cd /app/api && uvicorn main:app' invocation). Prometheus registry is cleared before each fresh module import to avoid duplicate-timeseries errors across test fixtures.
Single job scraping autocli-daily:8080/api/metrics every 15s. metrics_path is required because FastAPI mounts under /api/*.
- Datasource: Prometheus at prometheus:9090 (uid prom-autocli)
- Dashboard provider points at /etc/grafana/provisioning/dashboards
- autocli.json: time-since-last-run, last exit code, rows-upserted-today, CDP-up %, daily scraped/upserted/skipped time series, duration
- No plugin dependencies (Infinity dropped per L313 review).
5 services on shared autocli-net bridge:
- autocli-chrome (Stagehand, watchtower-tracked, healthcheck on 9222)
- autocli-daily (cron+FastAPI, watchtower-tracked, depends_on chrome healthy, env scoped to Supabase creds only)
- cloudflared (Tunnel token mode, depends_on daily healthy)
- prometheus (pinned, 90-day retention)
- grafana (pinned, anon disabled, signup disabled, admin from env)
Named volumes for profile / output / tsdb / grafana state.
Binds host ports under non-conflicting numbers (6081/5902/9223/8081/9091/3001) so the operator can keep their existing local Chrome and Grafana running alongside. cloudflared moved to a 'disabled' profile.
All required environment variables with empty values + inline generator hints. Real .env never committed (.gitignore already covers it under '.env').
Quickstart, Cloudflare dashboard checklist, forced-run snippet, common-failure table. Points back at SPEC + PLAN for the why.
3 jobs:
1. build-autocli-binary: cargo build --release -p autocli on ubuntu-latest (linux/amd64) with Swatinem cache; uploads artifact
2. build-chrome-image: builds deploy/chrome from repo-root context; docker/metadata-action generates :main on main, :branch-<slug> on feature branches, :sha-<short> always
3. build-daily-image: downloads the autocli artifact, builds deploy/daily from repo-root context, same tag policy
Path filters include rust-toolchain.toml so a toolchain bump triggers a rebuild.
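The slugify step matters because a branch like `feat/daily-microservice` is not a valid Docker tag. The sketch below only approximates that sanitising for illustration: it maps characters outside Docker's tag alphabet to `-`, caps the length at 128, and uses the `branch-` prefix and 7-character `sha-` tag this workflow is described as producing. It is not metadata-action's actual code.

```python
import re

# Docker tags may contain only [A-Za-z0-9_.-] and are at most 128 chars.
_INVALID = re.compile(r"[^A-Za-z0-9_.-]")


def docker_tag_from_branch(branch: str, prefix: str = "branch-") -> str:
    """Approximate a type=ref,event=branch tag: '/' and other invalid
    characters become '-', and the result is capped at 128 chars."""
    tag = prefix + _INVALID.sub("-", branch)
    tag = tag.lstrip(".-")  # a tag may not start with '.' or '-'
    return tag[:128]


def sha_tag(sha: str) -> str:
    return f"sha-{sha[:7]}"
```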
The placeholder value was wrong (build failed with 'computed checksum did NOT match'). Verified by downloading the GitHub release asset and computing sha1sum from the operator's laptop.
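For reference, the check the Dockerfile performs with `sha1sum` amounts to hashing the downloaded file in chunks and comparing hex digests. A minimal sketch with hypothetical helper names; the pinned digest itself lives in the Dockerfile and is not reproduced here.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream the file through SHA-1 so large binaries need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha1(path: str, expected: str) -> None:
    actual = sha1_of_file(path)
    if actual != expected.strip().lower():
        raise ValueError(f"computed checksum did NOT match: {actual} != {expected.strip()}")
```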
CI builds the binary as a separate job and uploads as artifact; Phase 0 locally rebuilds inside a Docker rust container and writes to deploy/daily/bin/. Never commit this file (it's ~8MB).
rick-ubuntu-ssh tunnel's running replica is 2026.3.0 (per Zero Trust dashboard). Our container joins as a 2nd HA replica; matching the connector version avoids mixed-version edge cases.
Prod host (100.108.80.9) already has a process bound to :5900, so publishing 5900:5900 failed during container network setup. Native VNC is only a local convenience and is NOT part of the Cloudflare ingress; noVNC on 6080 (+ vnc.autocli route) is the real access path. Container still listens on 5900 internally for websockify -> noVNC.
Chrome DevTools rejects /json* and /devtools requests whose Host header is not an IP or localhost. Reaching autocli-chrome by docker service name failed with 'Host header is specified and is not an IP address or localhost'.
- cdp-discover.sh: resolve CHROME_HOST -> container IP (getent, python fallback); use the IP for the /json probe AND the rewritten ws:// URL so every Host header Chrome sees is an IP. Re-resolved each run.
- main.py /api/health: send Host: localhost on the liveness probe (yes/no check, body unused).
Found during Phase 3 server bring-up; daily container was crash-looping on 'chrome unreachable after 60s' despite DNS + same-network OK.
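Both fixes can be sketched in Python: resolve the service name to an IPv4 address (the job `getent` does in cdp-discover.sh), and send the liveness probe with an explicit `Host: localhost`. The `/json/version` endpoint and the helper names are assumptions; the source only says the probe is a yes/no check.

```python
import socket
import urllib.request


def resolve_ipv4(host: str) -> str:
    """Resolve a docker service name (or literal IP) to an IPv4 address."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return infos[0][4][0]


def chrome_health_request(host: str, port: int = 9222) -> urllib.request.Request:
    # An explicit Host header overrides the one urllib would derive from the
    # URL, so DevTools' Host check sees 'localhost' even for a service name.
    return urllib.request.Request(
        f"http://{host}:{port}/json/version",
        headers={"Host": "localhost"},
    )


def chrome_reachable(host: str, port: int = 9222, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(chrome_health_request(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts all subclass OSError
        return False
```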
Free Cloudflare zones get Universal SSL covering only <zone> + one-level
*.<zone>. Two-level subdomains like vnc.autocli.<zone> handshake-fail
('Unauthorized' / sslv3 alert) until the operator upgrades to Pro,
Total TLS, or ACM.
Rename across SPEC / PLAN / README:
vnc.autocli.<zone> -> autocli-vnc.<zone>
cdp.autocli.<zone> -> autocli-cdp.<zone>
api.autocli.<zone> -> autocli-api.<zone>
grafana.autocli.<zone> -> autocli-grafana.<zone>
§9 risk #4 now documents the Free-plan SSL constraint as the reason for
the flat naming.
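The constraint follows from wildcard matching: `*.<zone>` covers exactly one extra DNS label. A small sketch of the rule, using the `pumped.ink` zone named in the PR summary; the function name is hypothetical.

```python
def covered_by_universal_ssl(hostname: str, zone: str) -> bool:
    """Universal SSL on a Free zone covers <zone> and *.<zone>. A wildcard
    matches exactly one label, so a.b.<zone> is not covered."""
    hostname = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    if hostname == zone:
        return True
    suffix = "." + zone
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    return bool(label) and "." not in label
```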
Host ubuntu-latest gives GLIBC 2.39 binaries that fail to load in the daily runtime image (Debian Bookworm = GLIBC 2.36) with 'GLIBC_2.39 not found'. Pin build container to rust:1.94-slim-bookworm so binary GLIBC requirements match runtime. Also adds a readelf-based check that fails the build if the binary's max GLIBC requirement exceeds 2.36.
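The readelf gate can be sketched as: collect every `GLIBC_x.y[.z]` token from the version dump and fail if the highest exceeds 2.36. This assumes text from `readelf -V` (or similar) as input; the exact command in the workflow is not shown, and the helper names are hypothetical.

```python
import re

GLIBC_RE = re.compile(r"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def max_glibc(version_dump: str) -> tuple:
    """Highest GLIBC_x.y[.z] symbol version mentioned in the dump."""
    versions = [
        tuple(int(p) for p in m.groups() if p is not None)
        for m in GLIBC_RE.finditer(version_dump)
    ]
    return max(versions, default=(0,))


def glibc_gate(version_dump: str, ceiling: tuple = (2, 36)) -> None:
    """Exit non-zero if the binary needs a newer glibc than the runtime has."""
    found = max_glibc(version_dump)
    if found > ceiling:  # tuple comparison: (2, 39) > (2, 36)
        raise SystemExit(
            f"binary needs GLIBC_{'.'.join(map(str, found))}, "
            f"runtime has {'.'.join(map(str, ceiling))}"
        )
```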
`source /run/cdp-endpoint.env` only sets a shell variable; without
export, the autocli child process never sees AUTOCLI_CDP_ENDPOINT and
falls through to BrowserBridge's daemon path
("Chrome is not running"). Wrap source with `set -a`/`set +a` so the
assignment auto-exports as an env var that survives across fork/exec.
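The same rule holds outside bash: only variables placed in a child's environment survive exec. A Python analogue, assuming the env file contains plain unquoted `KEY=VALUE` lines (real `source` semantics such as quoting and expansion are not handled); the helper names are hypothetical.

```python
import os
import subprocess
import sys


def load_env_file(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


def child_sees(var: str, extra_env: dict) -> str:
    """Run a child interpreter and report what it sees for `var`."""
    code = f"import os; print(os.environ.get({var!r}, '<unset>'))"
    out = subprocess.run(
        [sys.executable, "-c", code],
        env={**os.environ, **extra_env},  # the "exported" set
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```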
sync_autocli_jobs.py pretty-prints its summary with indent=2:
{
"input_rows": 573,
"upserted": 573,
...
}
The old run-daily.sh did 'grep "^{" log | tail -1' which matched only
the opening '{' line, yielding invalid JSON. Subsequent jq parses
failed silently, --argjson got empty values, the final jq -n -> dev/null
overwrote LAST_RUN_JSON with an empty file.
Fix: redirect sync stdout to /tmp/sync-DATE-N.json, also append to
log, then jq parses the captured JSON directly. Status now correctly
reflects rows_scraped/upserted/skipped from each run.
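The fix above captures stdout to a separate file. As an alternative illustration of the underlying problem, a decoder-based scan can recover a pretty-printed object from mixed log text, where the line-oriented `grep "^{"` only ever sees the opening brace. `last_json_object` is a hypothetical helper, not part of run-daily.sh.

```python
import json


def last_json_object(log_text: str):
    """Return the last complete JSON object in the text, even when it spans
    many lines; return None if no object decodes."""
    decoder = json.JSONDecoder()
    last = None
    idx = log_text.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(log_text, idx)
        except json.JSONDecodeError:
            idx = log_text.find("{", idx + 1)  # not a valid object; keep scanning
            continue
        if isinstance(obj, dict):
            last = obj
        idx = log_text.find("{", end)  # skip past the object we just parsed
    return last
```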
When run-daily.sh did 'exec 9>LOCK; flock 9' and then invoked autocli, bash's FD 9 inherited into the autocli process by default. If autocli took the daemon-path fallback (pre-env-export fix; or any future code path that spawns a daemon), the detached 'autocli --daemon' child inherited FD 9 too and held the lock for its lifetime. is_running() then returned True forever, breaking /api/status. Add '9>&-' to autocli and uv invocations so children can't see or hold the lock. Verified by /proc/<pid>/fd inspection in production.
1. crates/autocli-browser/src/cdp.rs:L302: 🔴 bug: CDP close sends Browser.close; post-run page.close kills shared Chrome. Use Target.closeTarget or make CDP close no-op.
2. deploy/chrome/entrypoint-vnc.sh:L38: 🔴 bug: -nopw disables VNC auth; noVNC exposes logged-in browser without password. Drop -nopw; rely on -rfbauth.
3. deploy/docker-compose.yml:L20: 🔴 bug: 9222:9222 publishes unauthenticated CDP outside Cloudflare Access. Bind 127.0.0.1 or remove host port.
4. supabase/migrations/20260509182000_add_priority_scoring_columns.sql:L140: 🔴 bug: unscored upsert overwrites old priority with 0 because insert coerces null to default. Branch on p_priority_score, not excluded.priority_score.
5. scripts/backfill_priority_scores.py:L136: 🔴 bug: client.table("jobs.jobs") queries a public table literally named jobs.jobs. Use client.schema("jobs").table("jobs") or public.jobs_jobs.
6. scripts/backfill_priority_scores.py:L147: 🔴 bug: backfill skips migrated rows; priority_score is NOT NULL default 0 and version defaults current. Filter priority_scored_at.is.null instead.
7. deploy/daily/api/main.py:L164: 🔴 bug: /jobs uses anon client against schema("jobs"), but migrations expose public.jobs_jobs view. Query public.jobs_jobs or expose jobs schema to PostgREST.
8. deploy/daily/crontab:L3: 🟡 risk: CRON_SCHEDULE env ignored; compose override never changes schedule. Generate crontab from env at entrypoint or remove env.
9. deploy/daily/crontab:L6: 🟡 risk: OUTPUT_RETENTION_DAYS ignored; retention is hardcoded to 30. Generate crontab from env or remove the env knob.
10. scripts/sync_autocli_jobs.py:L468: 🟡 risk: scoring exceptions are swallowed and pipeline exits success with scored:0. Log row error and fail when scoring is enabled.
11. scripts/sync_autocli_jobs.py:L555: 🟡 risk: dry-run reads application_path, but scorer writes application_friction; aggregator summary stays zero. Use application_friction.
cdp.rs (item 1): IPage::close was sending Browser.close, which kills the
SHARED Chrome in CDP-direct mode (and every other consumer attached to
it). Made it a no-op with explanation. Callers that need per-page
cleanup should send Target.closeTarget directly.
entrypoint-vnc.sh (item 2): -nopw was overriding -rfbauth and leaving
VNC open with no password. Anyone reaching :5900/6080 (via Tailscale
or any leaked path) could drive the logged-in browser. Removed the
flag; password auth from /root/.vnc/passwd is now enforced.
docker-compose.yml (item 3 + defense-in-depth on 6080): bound both
6080 and 9222 host ports to 127.0.0.1 only. Public path is Cloudflare
Tunnel + Access; direct host-port access would bypass every auth layer.
Backup: 'ssh -L 6080:localhost:6080' from a Tailscale-connected box.
backfill_priority_scores.py (items 5 + 6): client.table('jobs.jobs')
queried a literal 'jobs.jobs' name in public schema (always 0 rows);
fixed to client.schema('jobs').table('jobs'). Filter also moved from
priority_score.is.null (already NOT NULL DEFAULT 0 post-migration, so
matches nothing) to priority_scored_at.is.null (the only honest 'never
scored' signal).
crontab + Dockerfile + .env.example (items 8 + 9): CRON_SCHEDULE and
OUTPUT_RETENTION_DAYS env vars were placebos — supercronic reads
/etc/cron.d/autocli verbatim and does not env-substitute. Dropped the
misleading env knobs from compose / Dockerfile / .env.example and added
a comment in crontab explaining the contract.
NOT addressed in this commit:
- Item 4 (migration upsert priority overwrite) — needs a follow-up
migration; pre-existing in main.
- Item 7 (/jobs schema) — empirically returns 500 rows with a loose
filter; PostgREST DOES expose the jobs schema in this project. The
reviewer's hypothesis was incorrect for this Supabase config. Pushing
back on this one with evidence.
- Items 10, 11 — pre-existing sync_autocli_jobs.py issues from main;
worth a separate cleanup PR.
Addressed in
Additional critical finding surfaced by Supabase MCP during verification:
CI on
Server re-verified against
The
Items 1, 2, 3 from PR review #4466756456:
1) New migration 20260516120000_fix_priority_upsert_data_loss.sql: recreates jobs.upsert_job so the ON CONFLICT DO UPDATE branches on the function PARAMETER (p_priority_score IS NOT NULL) instead of excluded.priority_score (which the INSERT body had already coerced from NULL to 0, making the case-when always true and silently zeroing prior scores). Same correction for priority_tier / scorer_version / signals / scored_at. Applied to production via Supabase MCP; verified success: True.
2) New migration 20260516120100_enable_jobs_jobs_rls.sql + GRANT migration: turns on RLS on jobs.jobs with a select-only policy for anon/authenticated, grants USAGE on the jobs schema and SELECT on the table to those roles. Server .env now uses the real anon JWT for SUPABASE_ANON_KEY (sync writes still use SUPABASE_SERVICE_ROLE_KEY which bypasses RLS). Combined with Cloudflare Access + Bearer this gives defence in depth.
3) /jobs endpoint now filters on created_at (database insert time) instead of post_time (LinkedIn original posting date, almost always older than today for fresh scrapes). Doc string updated; created_at added to the SELECT projection so clients can see it. Verified by direct REST against PostgREST + by python-in-container test (3 rows returned for since=today).
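A Python model of why item 1's original conflict branch always fired, assuming the INSERT coerces a NULL score to the column default 0 as described. This models the SQL logic only; it is not the migration, and the function names are hypothetical.

```python
from typing import Optional


def insert_value(p_priority_score: Optional[int]) -> int:
    # The INSERT body coerces a NULL parameter to the column default 0,
    # so this is what excluded.priority_score holds on conflict.
    return 0 if p_priority_score is None else p_priority_score


def on_conflict_buggy(existing: int, p_priority_score: Optional[int]) -> int:
    excluded = insert_value(p_priority_score)
    # CASE WHEN excluded.priority_score IS NOT NULL ... is always true here.
    return excluded if excluded is not None else existing


def on_conflict_fixed(existing: int, p_priority_score: Optional[int]) -> int:
    # Branch on the function PARAMETER, which still knows it was NULL.
    return p_priority_score if p_priority_score is not None else existing
```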
Companion to 20260516120100. RLS policies don't grant SELECT; PostgREST also needs the role to have USAGE on the schema and SELECT on the table. Already applied to production via Supabase MCP but the file was missing from the PR — without it a fresh project provisioning from these migrations would have count=0 on /jobs until the GRANT was applied manually.
Three follow-up items addressed in
1. Migration data-loss fix (commit
2. Anon key + RLS (commits
3. /jobs since semantics (was 🟡 trivial, fixed not deferred):
PR now at 53 commits. CI green throughout. Ready when you are.
Summary
Implements the auto-scheduled daily LinkedIn-recommended pipeline as a 5-container microservice on
100.108.80.9, publicly exposed via Cloudflare Tunnel + Access. See `deploy/SPEC.md` for design, `deploy/PLAN.md` for the walkthrough.
What's live (verified):
- `autocli-chrome` (Stagehand-style Chromium+VNC+CDP), `autocli-daily` (cron + FastAPI), `cloudflared`, `prometheus`, `grafana`
- `autocli-vnc.pumped.ink`, `autocli-api.pumped.ink`, `autocli-grafana.pumped.ink`
- `POST /api/run` triggered a real LinkedIn scrape: 573 jobs upserted to Supabase (41 new + ~507 updated)
What's deferred (separate follow-ups):
- `autocli-cdp.pumped.ink` with mTLS): design in spec, not built
Architecture highlights
- `d7cd312`/`6f60b06`/`6539458`): wires `CdpPage` into `BrowserBridge::connect` behind `AUTOCLI_CDP_ENDPOINT` so a containerised autocli drives a sibling Chrome container without the daemon+extension path.
- `rust-toolchain.toml` pins workspace to 1.94 for local/CI/Phase 0 parity.
- `:branch-<slug>` + `:sha-*`; `:main` only on `main`).
- `autocli-<sub>.zone`) because Free zones get Universal SSL only on apex + one-level wildcard.
Notable bug fixes discovered during deploy
- `cdp-discover.sh` resolves to IP; `/api/health` sends `Host: localhost`
- `rust:1.94-slim-bookworm` container; readelf gate fails build if binary needs >2.36
- `source /run/cdp-endpoint.env` set shell var only, autocli child didn't see env -> `set -a`/`set +a` around source
- `last_run.json` empty because grep+tail on pretty-printed JSON yielded just `{`
- `5900:5900` mapping (noVNC 6080 + Cloudflare ingress are the real paths)
Test plan
- `bridge::tests::test_connect_uses_cdp_endpoint_when_env_var_set` passes
- `100.108.80.9`
Notes for reviewer
- `main` had unpushed priority-scoring work that got swept along by the rebase. The core daily-microservice changes are everything under `deploy/`, `.github/workflows/deploy-microservice.yml`, `rust-toolchain.toml`, and the small `bridge.rs` patch.
- `:branch-feat-daily-microservice` for staging; post-merge bump to `:main` is a one-liner sed (documented in `deploy/README.md`).