feat: daily LinkedIn microservice + autocli CDP wiring + supporting fixes by RickSanchez88E · Pull Request #2 · RickSanchez88E/AutoCLI

RickSanchez88E · 2026-05-16T10:47:39Z

Summary

Implements the auto-scheduled daily LinkedIn-recommended pipeline as a 5-container microservice on 100.108.80.9, publicly exposed via Cloudflare Tunnel + Access. See deploy/SPEC.md for design, deploy/PLAN.md for the walkthrough.

What's live (verified):

5 healthy containers: autocli-chrome (Stagehand-style Chromium+VNC+CDP), autocli-daily (cron + FastAPI), cloudflared, prometheus, grafana
3 public Cloudflare-Access-gated subdomains: autocli-vnc.pumped.ink, autocli-api.pumped.ink, autocli-grafana.pumped.ink
Service Token (machine auth) + Bearer (app-layer auth) verified end-to-end
Forced POST /api/run triggered a real LinkedIn scrape: 573 jobs upserted to Supabase (41 new + ~507 updated)
Phase 4a probes 9/9 green via public HTTPS

What's deferred (separate follow-ups):

Phase 4b/4c (autocli-cdp.pumped.ink with mTLS) — design in spec, not built
30-day idle observation (Phase 6) — calendar-dependent

Architecture highlights

Hard Rust prereq (commits d7cd312/6f60b06/6539458): wires CdpPage into BrowserBridge::connect behind AUTOCLI_CDP_ENDPOINT so a containerised autocli drives a sibling Chrome container without the daemon+extension path.
rust-toolchain.toml pins workspace to 1.94 for local/CI/Phase 0 parity.
GHCR + Watchtower pull-based deploy with branch-safe slugified tags (feature branches get :branch-<slug> + :sha-*; :main only on main).
Cloudflare Tunnel in token mode; ingress + DNS + Access apps + Service Token all created via Cloudflare API (no dashboard clicks).
Subdomain naming flattened to one level (autocli-<sub>.zone) because Free zones get Universal SSL only on apex + one-level wildcard.

Notable bug fixes discovered during deploy

| Issue | Fix |
| --- | --- |
| Chrome DevTools DNS-rebinding protection rejects service-name Host headers | `cdp-discover.sh` resolves to IP; `/api/health` sends `Host: localhost` |
| CI built the binary on ubuntu-latest (GLIBC 2.39) but the runtime is Debian Bookworm (GLIBC 2.36) | CI now runs in a `rust:1.94-slim-bookworm` container; a readelf gate fails the build if the binary needs >2.36 |
| `source /run/cdp-endpoint.env` set a shell variable only, so the autocli child didn't see it in its environment | `set -a` / `set +a` around `source` |
| `last_run.json` empty because grep+tail on pretty-printed JSON yielded just `{` | Capture sync stdout to a dedicated file and parse it with jq directly |
| Token-mode cloudflared running on a Mac as a second replica caused intermittent 502s | Operator stopped the Mac replica (Ctrl+C); the server container is the sole replica |
| Stagehand host port 5900 already taken on the prod host | Drop the `5900:5900` mapping (noVNC 6080 + Cloudflare ingress are the real paths) |

Test plan

Rust unit: bridge::tests::test_connect_uses_cdp_endpoint_when_env_var_set passes
FastAPI: 9 pytest tests all green (auth + route shape)
Phase 0 local: ELF x86-64 verified inside daily image, autocli --version returns
Phase 2 CI: green on every commit, correct tag policy
Phase 3: 5 containers up + healthy on 100.108.80.9
Phase 4a: 9/9 public probes match expected codes (302/200/401)
Phase 5: real LinkedIn scrape, 573 jobs upserted (verified via Supabase MCP)
Phase 6: 2 consecutive scheduled 03:00 BST runs — calendar-dependent, post-merge

Notes for reviewer

Branch carries 40+ commits because local main had unpushed priority-scoring work that got swept along by the rebase. The core daily-microservice changes are everything under deploy/, .github/workflows/deploy-microservice.yml, rust-toolchain.toml, and the small bridge.rs patch.
For staging, the server compose file was sed'd to :branch-feat-daily-microservice; the post-merge bump to :main is a one-liner sed (documented in deploy/README.md).

Create scripts/job_priority_config.py with all configuration constants, regex patterns, and keyword sets for the deterministic job priority scoring system. Contains no scoring logic -- only configuration to be imported by the scorer, sync pipeline, backfill scripts, and tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Pure, deterministic scoring engine for AutoCLI jobs with 8 components: compensation, role fit, seniority, work arrangement, application path, freshness, data completeness, and source quality. Includes penalty system, hard-reject guard, and tier mapping (high/medium/low/reject). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

REPEATED_PUNCT_RE used {2,} which matches 3+ total consecutive punctuation chars (e.g. "!!!" -> "!"). Changed to {1,} so 2+ consecutive chars are collapsed (e.g. "!!" -> "!", "!!!" -> "!"). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
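
A minimal sketch of the quantifier change described above. The real character class in `scripts/job_priority_config.py` is not shown in this PR, so the class below is an assumption; only the `{2,}` to `{1,}` difference is taken from the commit message.

```python
import re

# Assumed shape: one captured punctuation char followed by repeats of itself.
# The character class is hypothetical; only the quantifier mirrors the commit.
OLD_REPEATED_PUNCT_RE = re.compile(r"([!?.,;:])\1{2,}")  # char + 2 repeats = 3+ total
NEW_REPEATED_PUNCT_RE = re.compile(r"([!?.,;:])\1{1,}")  # char + 1 repeat  = 2+ total


def collapse(text: str, pattern: re.Pattern = NEW_REPEATED_PUNCT_RE) -> str:
    """Collapse each run of a repeated punctuation char to a single char."""
    return pattern.sub(r"\1", text)
```

With the old pattern, `collapse("Apply now!!", OLD_REPEATED_PUNCT_RE)` leaves the string untouched, which is the behaviour the commit fixes.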

- Import score_job in sync_autocli_jobs.py and call it per-record
- Pass ScoreResult fields (priority_score, priority_tier, priority_version, priority_signals) to upsert_job RPC
- Add --disable-scoring flag for testing
- Report priority score distribution in dry-run mode
- Add comprehensive test suite (104 tests across 14 classes) covering all 8 scoring components, penalties, hard-reject guard, edge cases, and integration scenarios
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Migration 20260509182000: add priority scoring columns to jobs.jobs table (priority_score, priority_tier, priority_version, priority_signals, priority_scored_at)
- Migration 20260509184000: add update_job_priority_score RPC that only touches scoring fields (not the full row), with schema-scoped and public wrappers
- scripts/backfill_priority_scores.py: batch backfill script with --force, --limit, --dry-run, --env-file options; reconstructs job_data from raw_record or DB columns; reports per-row scores, tiers, and errors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-authored-by: Codex <noreply@openai.com>

- Rename priority_version column to priority_scorer_version in both migrations
- Add 'unknown' to priority_tier check constraint
- Fix indices to include last_seen_at desc and priority_score desc per spec
- Add --min-priority-score and --priority-tier CLI flags for optional filtering
- Enhance dry-run with top_priority_jobs, low_priority_count, priority_tiers
- Add source-quality summary (recruiter/aggregator/raw-jd-fallback counts)
- Update backfill RPC param name to match column rename

Design covers:
- 6-container stack: chrome (Stagehand), daily (cron+FastAPI), cloudflared, prometheus, grafana
- Cloudflare Tunnel + Access for public exposure of /vnc /cdp /api /jobs /grafana
- GHCR + Watchtower pull-based deploy
- Phased acceptance criteria with verification commands
Worktree: feat/daily-microservice (branched from main).

Critical fixes:
- Add prereq section for autocli BrowserBridge CDP-wiring patch
- Fix cargo build to use package name 'autocli' (was '-cli')
- Switch /jobs to client.schema('jobs').table('jobs') API
- Use /json/list + page target (was /json/version, browser-level)
- Rewrite ws host localhost->autocli-chrome:9222
- Standardize on SUPABASE_SERVICE_ROLE_KEY
- Make API_RUN_TOKEN actually enforced + tested in Phase 4
- Add machine-verifiable Cloudflare Access gate before cdp ingress
High-severity fixes:
- Feature branches publish :branch-*+:sha-* only; :main from main
- Pin cloudflared/prometheus/grafana to specific semver
- Switch Cloudflare Tunnel to --token mode (no config.yml mix)
- Replace path routes with 5 subdomains (avoids prefix-strip)
- Split Access into two policies per Application (Token OR Email)
- Drop Grafana Infinity plugin dependency
- VNC password generated random in prod (no 'stagehand' default)
- shred only temp copy of operator secrets, never the source
- Unify retry to 3-attempts/15-60-240s across code+runbook+metrics
- Add explicit restart: unless-stopped to autocli-daily
- Specify Prometheus metrics_path: /api/metrics
- Unified CI build context = repo root for both Dockerfiles
- Note GHCR creds already configured on target host

Bugs:
- L103: component table referenced stale /metrics path -> /api/metrics
- L209/L236: github.ref_name with '/' produces invalid Docker tags; switch to docker/metadata-action's type=ref,event=branch which slugifies
- L321: /json/new requires PUT, not POST (Chrome >= M86)
- L354: jobs.autocli/ routed to backend root but /jobs is the actual route; drop the jobs subdomain entirely, serve via api.autocli/jobs (4 subdomains)
- L473: Phase 0 build context disagreed with CI; unify on repo root
- L522: Phase 4 step 2 implied Service Token works on vnc/grafana where no machine policy exists; split per-subdomain expectations
- L526: Phase 4 probed cdp.autocli before the spec said cdp ingress was added; split Phase 4 into 4a (pre-CDP gate) / 4b (add cdp ingress) / 4c (cdp probes)
- L549: Phase 5 status call missing Bearer
Risks:
- L486: Phase 1 status call missing Bearer; added
Also:
- Fix '6 services' / '6 new containers' counts; actual count is 5
- Update §2.2 boundaries note from /json/version to /json/list + PUT /json/new

Bugs:
- L107: discovery wasn't actually re-run per /api/run; updated process tree so run-daily.sh calls cdp-discover.sh before each run, then sources /run/cdp-endpoint.env. §5.2 'Boot ordering' split into 'Discovery cadence' (boot + per-run) and 'Boot ordering'.
- L292: process tree now spells out 'PUT /json/new?about:blank' for the empty-list case, matching §5.2.
- L429: API_RUN_TOKEN row now lists every Bearer-protected route (/api/status, /api/run, /api/logs, /jobs) plus the open ones (/api/health, /api/metrics).
- L496: Phase 2 acceptance split into feature-branch expectation (:branch-feat-daily-microservice + :sha-*) vs post-merge expectation (:main + :sha-*). Phase 3 explicitly reads :main.
Risks / nits:
- L129: file layout comment changed to match real routes
- L358: 'five hostnames' -> 'four hostnames' after dropping jobs subdomain
- L628: §9 risk #1 rewritten to reference Phase 4a/4b/4c and three pre-CDP subdomains, not the old Phase 4 step 1+2 / four subdomains.

Bugs:
- L211: slugifier comment was inside 'tags: |' literal block where metadata-action would parse it as a rule. Moved above 'tags:' as a proper YAML comment.
- L478: Phase 0 'cargo build' on operator's arm64-darwin host then COPY into linux/amd64 image would inject a Mach-O. Replaced with a docker run rust:1.81-slim-bookworm --platform linux/amd64 builder step + 'file' verification ('ELF 64-bit'). CI is unchanged because ubuntu-latest is already linux/amd64.
Risks:
- L312: /api/metrics annotation 'only reachable via docker network' was misleading; api.autocli does expose it externally via Cloudflare Access. Re-annotated both /api/metrics and /api/health as dual-path: internal direct, external via Access.
- L368: Tailscale-CGNAT IP gate on cdp.autocli would never match, because Cloudflare sees the public/WARP egress IP, not 100.x. Replaced with dedicated short-lived Service Token + mTLS client cert (machines), email OTP + required WARP posture (humans). §9 risk #1 and Phase 4b/4c reworded to match.
Nit:
- L551: Phase 4a 'Authenticated' header was wrong for vnc/grafana (no auth sent). Renamed to 'humans-only negative machine probe' and made it actually send a Service Token to prove it gets denied. Phase 4c also updated to use mTLS-style probes (dedicated CF_ID_CDP) plus a websocket-upgrade smoke test for the CDP surface.

Bugs (all on Phase 4c WebSocket probe):
- curl -sI is HEAD; WebSocket Upgrade requires GET. Replaced with proper websocat client (preferred) and curl --http1.1 -i -N fallback. No more -I anywhere on the WS path.
- /devtools/page/<id> placeholder cannot run as written. Probe now extracts the actual page id by GETing /json/list, picking the first type:'page' target, and rewriting host to cdp.autocli.
- 'HTTP/2 101 Switching Protocols' does not exist: 101 is HTTP/1.1 semantics, and Cloudflare does not speak RFC 8441 multiplexed WS. Probe now forces --http1.1 and expects 'HTTP/1.1 101 Switching Protocols'. The websocat path checks for a CDP round-trip (Target.getTargets) instead.
Phase 4c renumbered 4c-1..4c-4 to make the four checks explicit.

Was 1.81 — arbitrary, drifting from operator's rustc 1.94.1 and from ubuntu-latest's default stable. Pinned to 1.94-slim-bookworm so Phase 0 / CI / dev agree. Spec also notes the long-term hardening (repo-tracked rust-toolchain.toml) — that task is included in the implementation plan.

34 bite-sized tasks across 8 phases (A-H), each with TDD substeps, exact file paths, exact commands, expected outputs. Covers:
- Phase A: rust-toolchain.toml + BrowserBridge CDP patch + smoke test
- Phase B: deploy/ scaffold (chrome, daily, prometheus, grafana, compose)
- Phase C: GitHub Actions workflow
- Phase D: Phase 0 image build (Docker rust 1.94) + Phase 1 local e2e
- Phase E: GHCR push (Phase 2)
- Phase F: 100.108.80.9 server bring-up (Phase 3)
- Phase G: Cloudflare Tunnel + Access (Phase 4a/4b/4c)
- Phase H: forced run + monitoring (Phase 5) + schedule rollover (Phase 6)
Plan is self-contained: no TBDs or 'similar to Task N' placeholders. Self-review section maps every SPEC section to its implementing task(s).

Aligns local dev (operator was on rustc 1.94.1), CI (was using ubuntu-latest default), and the Phase 0 Docker builder (deploy/SPEC.md). Single source of truth; future bumps touch only this file.

Add AUTOCLI_CDP_ENDPOINT env-var branch at the top of BrowserBridge::connect. When set, skip daemon spawn + extension polling and return Arc<CdpPage> directly. The IPage trait contract is unchanged so pipeline executors and YAML adapters consume either implementation transparently. Required prerequisite for the autocli-daily microservice (deploy/SPEC.md §1.A) which runs autocli in a container with no Chrome extension or daemon, connecting to a sibling Chrome container via CDP.

Two robustness improvements from code review: - RAII Drop guard ensures AUTOCLI_CDP_ENDPOINT is cleared even if the test panics mid-way, preventing cross-test env leakage. - Assert on CliError::BrowserConnect variant directly instead of string-matching the Display output. Resilient to future error-message wording changes.

Copy of my-stagehand-app/Dockerfile.chrome with the COPY path rewritten for repo-root build context (deploy/SPEC.md §4.1).

Verbatim from my-stagehand-app/scripts/entrypoint-vnc.sh: Xvfb -> x11vnc -> noVNC -> socat 9222->9223 -> Chromium with --remote-debugging-port=9223 --user-data-dir=/root/.config/chromium. Extension loading via /opt/extensions/*/manifest.json is preserved even though this design ships with no extensions.

Multi-arch-aware single-stage image:
- python:3.12-slim-bookworm base
- tini (PID 1), util-linux (flock), jq (CDP discovery), curl (probes)
- supercronic (container cron) pinned to v0.2.30 with sha1 verify
- uv (Astral) for Python deps
- Pre-built autocli binary copied from deploy/daily/bin/
- FastAPI app + scripts/sync_autocli_jobs.py
- Boot via tini -> entrypoint.sh
- TZ=Europe/London, CRON_SCHEDULE default 03:00.
DONE_WITH_CONCERNS: scripts/job_priority_scorer.py and scripts/job_priority_config.py are absent from the worktree; their COPY lines have been omitted from the Dockerfile.

Find or create a CDP page target on autocli-chrome:9222.
- GET /json/list, pick first type:page
- if list is empty, PUT /json/new?about:blank (Chrome >= M86)
- rewrite host (localhost:9223 -> autocli-chrome:9222) so the WS URL is reachable from the daily container's network namespace
- write to /run/cdp-endpoint.env (sourced by run-daily.sh)
- 60s retry budget; exit 1 on timeout (entrypoint exits non-zero, restart: unless-stopped recreates container until chrome ready).
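
The target-selection and host-rewrite steps can be sketched as a pure function. This is an illustration in Python, not the shell in `cdp-discover.sh`; the function name is hypothetical, and it assumes the `/json/list` response has already been fetched and decoded into a list of dicts carrying the standard `type` and `webSocketDebuggerUrl` fields.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def pick_page_ws_url(targets: list[dict], reachable_host: str) -> Optional[str]:
    """Return the first page target's WebSocket URL with its host:port replaced.

    `targets` is the decoded /json/list response; `reachable_host` is the
    host:port that is routable from the calling container.
    """
    for target in targets:
        if target.get("type") == "page" and target.get("webSocketDebuggerUrl"):
            parts = urlsplit(target["webSocketDebuggerUrl"])
            # Keep scheme and path, swap only the network location.
            return urlunsplit(parts._replace(netloc=reachable_host))
    return None  # caller would fall back to creating a target via /json/new
```

Returning `None` for an empty list keeps the "create a new target" fallback as a separate, explicit step for the caller.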

- flock -n to prevent cron + /api/run from colliding
- per-attempt cdp-discover refresh (page id may have rotated)
- runs autocli linkedin recommended -> JSON -> sync_autocli_jobs.py
- unified retry: 3 attempts at 15s/60s/240s (SPEC §5.2)
- writes /data/output/last_run.json consumed by /api/status.

Boot-time cdp-discover gate, then runs supercronic + uvicorn in parallel under tini. wait -n exits as soon as either child dies, so compose's restart policy can pick up failure modes (e.g. uvicorn panic, supercronic crash).

03:00 daily LinkedIn pull + 04:00 30-day output retention sweep (SPEC §5.2). TZ resolved by the container's TZ=Europe/London.

After rebase onto local main, scripts/job_priority_scorer.py and scripts/job_priority_config.py are present. sync_autocli_jobs.py imports them at runtime, so the daily image must ship all three.

uv-managed; pins fastapi/uvicorn/supabase/prometheus-client/httpx to compatible ranges. Lockfile checked in so the Dockerfile's 'uv sync --frozen' is reproducible.

Used by POST /api/run to spawn run-daily.sh non-blockingly. is_running() is a non-destructive flock probe so /api/status can report in_progress without affecting the actual run.
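
One way such a probe could look, as a sketch only: the actual `is_running()` body is not shown in this PR, and the lock path parameter is an assumption. It tries a non-blocking exclusive `flock` on its own file descriptor and releases immediately if it succeeds.

```python
import fcntl
import os


def is_running(lock_path: str) -> bool:
    """Report whether another open file description holds the flock on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking attempt: fails at once if a run holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # Nobody held it; release right away so we never block a real run for long.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

A probe written this way does hold the lock for a very short window when no run is active, so a `flock -n` started in that exact instant could lose the race; whether that matters depends on how often the status route is polled.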

Routes per SPEC §5.1:
- GET /api/health [open] chrome reachability + cdp file probe
- GET /api/metrics [open] Prometheus exposition (delta-aware counters)
- GET /api/status [Bearer] last_run.json + in_progress
- POST /api/run [Bearer] spawn run-daily.sh, 409 if already running
- GET /api/logs [Bearer] tail of latest log (default 200 lines)
- GET /jobs [Bearer] Supabase 'jobs.jobs' read proxy via client.schema('jobs').table('jobs').
Import style B: 'import trigger' (flat), because entrypoint.sh does 'cd /app/api && uvicorn main:app', so there is no package context and a flat import works.

9 tests covering:
- /api/status, /api/run, /api/logs, /jobs all return 401 without Bearer and 401 with wrong Bearer
- /api/status default-shape + reflects last_run.json
- /api/metrics is open and contains the autocli_daily_ family
- /api/health returns 503 when chrome:9222 unreachable.
conftest.py adds deploy/daily/api to sys.path (flat import, matching entrypoint.sh's 'cd /app/api && uvicorn main:app' invocation). Prometheus registry is cleared before each fresh module import to avoid duplicate-timeseries errors across test fixtures.
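
The 401-without-Bearer / 401-with-wrong-Bearer behaviour these tests cover can be sketched framework-free. This is not the FastAPI dependency in `main.py` (its code is not shown here); the function name is hypothetical, and the constant-time comparison is a common hardening choice rather than something this PR states.

```python
import hmac
from typing import Optional


def bearer_status(authorization: Optional[str], expected_token: str) -> int:
    """Return 200 when the Authorization header carries the expected Bearer token, else 401."""
    scheme, _, supplied = (authorization or "").partition(" ")
    if scheme.lower() != "bearer" or not expected_token:
        return 401
    # compare_digest avoids leaking the match position through timing.
    matches = hmac.compare_digest(supplied.encode(), expected_token.encode())
    return 200 if matches else 401
```

Treating an empty `expected_token` as a failure means a misconfigured deployment rejects every request instead of accepting an empty Bearer.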

Single job scraping autocli-daily:8080/api/metrics every 15s. metrics_path is required because FastAPI mounts under /api/*.

- Datasource: Prometheus at prometheus:9090 (uid prom-autocli)
- Dashboard provider points at /etc/grafana/provisioning/dashboards
- autocli.json: time-since-last-run, last exit code, rows-upserted-today, CDP-up %, daily scraped/upserted/skipped time series, duration
- No plugin dependencies (Infinity dropped per L313 review).

5 services on shared autocli-net bridge:
- autocli-chrome (Stagehand, watchtower-tracked, healthcheck on 9222)
- autocli-daily (cron+FastAPI, watchtower-tracked, depends_on chrome healthy, env scoped to Supabase creds only)
- cloudflared (Tunnel token mode, depends_on daily healthy)
- prometheus (pinned, 90-day retention)
- grafana (pinned, anon disabled, signup disabled, admin from env)
Named volumes for profile / output / tsdb / grafana state.

Binds host ports under non-conflicting numbers (6081/5902/9223/8081/ 9091/3001) so the operator can keep their existing local Chrome and Grafana running alongside. cloudflared moved to a 'disabled' profile.

All required environment variables with empty values + inline generator hints. Real .env never committed (.gitignore already covers it under '.env').

Quickstart, Cloudflare dashboard checklist, forced-run snippet, common-failure table. Points back at SPEC + PLAN for the why.

3 jobs:
1. build-autocli-binary: cargo build --release -p autocli on ubuntu-latest (linux/amd64) with Swatinem cache; uploads artifact
2. build-chrome-image: builds deploy/chrome from repo-root context; docker/metadata-action generates :main on main, :branch-<slug> on feature branches, :sha-<short> always
3. build-daily-image: downloads the autocli artifact, builds deploy/daily from repo-root context, same tag policy
Path filters include rust-toolchain.toml so a toolchain bump triggers a rebuild.
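
The tag policy can be approximated in a few lines. This is an illustrative sketch, not docker/metadata-action's own sanitizer: the regex, the 7-character short SHA, and the function name are all assumptions.

```python
import re


def image_tags(ref_name: str, sha: str, is_main: bool) -> list[str]:
    """Approximate the policy: :sha-<short> always, :main on main, :branch-<slug> otherwise."""
    tags = [f"sha-{sha[:7]}"]  # short-SHA length is an assumption
    if is_main:
        tags.append("main")
    else:
        # Replace anything a Docker tag cannot contain (notably '/') with '-'.
        slug = re.sub(r"[^A-Za-z0-9_.-]+", "-", ref_name).strip("-.")
        tags.append(f"branch-{slug}")
    return tags
```

For the branch in this PR, `feat/daily-microservice` maps to `branch-feat-daily-microservice`, matching the staging tag mentioned in the reviewer notes.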

The placeholder value was wrong (build failed with 'computed checksum did NOT match'). Verified by downloading the GitHub release asset and computing sha1sum from the operator's laptop.

CI builds the binary as a separate job and uploads as artifact; Phase 0 locally rebuilds inside a Docker rust container and writes to deploy/daily/bin/. Never commit this file (it's ~8MB).

rick-ubuntu-ssh tunnel's running replica is 2026.3.0 (per Zero Trust dashboard). Our container joins as a 2nd HA replica; matching the connector version avoids mixed-version edge cases.

Prod host (100.108.80.9) already has a process bound to :5900, so the 5900:5900 mapping failed container networking. Native VNC is only a local convenience and is NOT part of the Cloudflare ingress; noVNC on 6080 (+ vnc.autocli route) is the real access path. Container still listens on 5900 internally for websockify -> noVNC.

Chrome DevTools rejects /json* and /devtools Host headers that aren't an IP or localhost. Reaching autocli-chrome by docker service name failed with 'Host header is specified and is not an IP address or localhost'.
- cdp-discover.sh: resolve CHROME_HOST -> container IP (getent, python fallback); use the IP for the /json probe AND the rewritten ws:// URL so every Host header Chrome sees is an IP. Re-resolved each run.
- main.py /api/health: send Host: localhost on the liveness probe (yes/no check, body unused).
Found during Phase 3 server bring-up; daily container was crash-looping on 'chrome unreachable after 60s' despite DNS + same-network OK.
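
A rough pre-flight check for the rule described above, useful when deciding whether a Host value needs rewriting before it reaches Chrome. It only encodes the rule as this commit message states it (IP literal or `localhost`); Chrome's own check may differ in edge cases, and the function name is hypothetical.

```python
import ipaddress


def devtools_host_allowed(host_header: str) -> bool:
    """True if the Host header value is 'localhost' or an IP literal (port optional)."""
    host = host_header.strip()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. [::1]:9222
        host = host[1:host.find("]")]
    elif host.count(":") == 1:
        # name:port or IPv4:port
        host = host.rsplit(":", 1)[0]
    if host == "localhost":
        return True
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False
```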

Free Cloudflare zones get Universal SSL covering only <zone> + one-level *.<zone>. Two-level subdomains like vnc.autocli.<zone> handshake-fail ('Unauthorized' / sslv3 alert) until the operator upgrades to Pro, Total TLS, or ACM. Rename across SPEC / PLAN / README:
- vnc.autocli.<zone> -> autocli-vnc.<zone>
- cdp.autocli.<zone> -> autocli-cdp.<zone>
- api.autocli.<zone> -> autocli-api.<zone>
- grafana.autocli.<zone> -> autocli-grafana.<zone>
§9 risk #4 now documents the Free-plan SSL constraint as the reason for the flat naming.
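
The coverage rule (apex plus exactly one label under it) is easy to check mechanically. A small sketch with a hypothetical helper name:

```python
def covered_by_universal_ssl(hostname: str, zone: str) -> bool:
    """True if hostname is the zone apex or exactly one label below it."""
    hostname = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    if hostname == zone:
        return True
    suffix = "." + zone
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    # A one-level wildcard matches a single label, so no further dots allowed.
    return bool(label) and "." not in label
```

`covered_by_universal_ssl("autocli-vnc.pumped.ink", "pumped.ink")` holds, while the old two-level name `vnc.autocli.pumped.ink` does not.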

Host ubuntu-latest gives GLIBC 2.39 binaries that fail to load in the daily runtime image (Debian Bookworm = GLIBC 2.36) with 'GLIBC_2.39 not found'. Pin build container to rust:1.94-slim-bookworm so binary GLIBC requirements match runtime. Also adds a readelf-based check that fails the build if the binary's max GLIBC requirement exceeds 2.36.
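
The gate's logic, sketched in Python rather than the CI shell step (whose exact command is not shown here). It assumes the input is version-requirement text of the kind `readelf -V` prints, and the sample strings in the usage note are illustrative, not captured output.

```python
import re


def max_glibc(readelf_output: str) -> tuple[int, ...]:
    """Highest GLIBC_x.y symbol version mentioned in the text, as an int tuple."""
    found = re.findall(r"GLIBC_(\d+(?:\.\d+)+)", readelf_output)
    versions = [tuple(int(part) for part in v.split(".")) for v in found]
    return max(versions, default=(0,))


def glibc_gate(readelf_output: str, ceiling: tuple[int, ...] = (2, 36)) -> bool:
    """True if the binary's highest GLIBC requirement is within the runtime ceiling."""
    return max_glibc(readelf_output) <= ceiling
```

Comparing integer tuples avoids the string-comparison trap where `"2.4"` would sort above `"2.36"`. For example, text mentioning `GLIBC_2.34` and `GLIBC_2.2.5` passes, while text mentioning `GLIBC_2.39` fails.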

`source /run/cdp-endpoint.env` only sets a shell variable; without export, the autocli child process never sees AUTOCLI_CDP_ENDPOINT and falls through to BrowserBridge's daemon path ("Chrome is not running"). Wrap source with `set -a`/`set +a` so the assignment auto-exports as an env var that survives across fork/exec.
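
The difference is easy to reproduce. The sketch below drives `sh` from Python: it sources an env file with and without `set -a`, then asks a child shell whether it can see the variable. The helper name is hypothetical and it assumes a POSIX `sh` on PATH.

```python
import os
import subprocess


def child_sees(var: str, env_file: str, auto_export: bool) -> str:
    """Source env_file in sh, then print what a child sh sees for var ('unset' if nothing)."""
    source = '. "' + env_file + '"'
    if auto_export:
        # set -a marks every subsequently assigned variable for export.
        source = "set -a; " + source + "; set +a"
    # The single-quoted body is expanded by the child shell, not the parent.
    probe = "sh -c 'printf %s \"${" + var + ":-unset}\"'"
    result = subprocess.run(
        ["sh", "-c", source + "; " + probe],
        env={"PATH": os.environ.get("PATH", "/usr/bin:/bin")},
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Without `auto_export` the child prints `unset`, which corresponds to autocli falling through to the daemon path.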

sync_autocli_jobs.py pretty-prints its summary with indent=2: { "input_rows": 573, "upserted": 573, ... } The old run-daily.sh did 'grep "^{" log | tail -1' which matched only the opening '{' line, yielding invalid JSON. Subsequent jq parses failed silently, --argjson got empty values, the final jq -n -> dev/null overwrote LAST_RUN_JSON with an empty file. Fix: redirect sync stdout to /tmp/sync-DATE-N.json, also append to log, then jq parses the captured JSON directly. Status now correctly reflects rows_scraped/upserted/skipped from each run.
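
The failure mode can be reproduced in a few lines. The first helper mimics the old `grep "^{" | tail -1` over a log that contains a pretty-printed summary; the second parses the captured stdout as a whole, which is the shape of the fix. Helper names and the sample log lines are illustrative.

```python
import json


def last_brace_line(log_text: str) -> str:
    """Mimic `grep '^{' log | tail -1`: last line that starts with '{'."""
    matches = [line for line in log_text.splitlines() if line.startswith("{")]
    return matches[-1] if matches else ""


def parse_summary(captured_stdout: str) -> dict:
    """Parse the sync summary from its own capture file, not from the mixed log."""
    return json.loads(captured_stdout)


summary = json.dumps({"input_rows": 573, "upserted": 573}, indent=2)
log = "starting sync\n" + summary + "\nsync done\n"
```

With `indent=2` the only line starting with `{` is the bare opening brace, so `last_brace_line(log)` is not valid JSON, while `parse_summary(summary)` recovers both counters.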

When run-daily.sh did 'exec 9>LOCK; flock 9' and then invoked autocli, bash's FD 9 inherited into the autocli process by default. If autocli took the daemon-path fallback (pre-env-export fix; or any future code path that spawns a daemon), the detached 'autocli --daemon' child inherited FD 9 too and held the lock for its lifetime. is_running() then returned True forever, breaking /api/status. Add '9>&-' to autocli and uv invocations so children can't see or hold the lock. Verified by /proc/<pid>/fd inspection in production.

RickSanchez88E · 2026-05-16T11:38:40Z

crates/autocli-browser/src/cdp.rs:L302: 🔴 bug: CDP close sends Browser.close; post-run page.close kills shared Chrome. Use Target.closeTarget or make CDP close no-op.

deploy/chrome/entrypoint-vnc.sh:L38: 🔴 bug: -nopw disables VNC auth; noVNC exposes logged-in browser without password. Drop -nopw; rely on -rfbauth.

deploy/docker-compose.yml:L20: 🔴 bug: 9222:9222 publishes unauthenticated CDP outside Cloudflare Access. Bind 127.0.0.1 or remove host port.

supabase/migrations/20260509182000_add_priority_scoring_columns.sql:L140: 🔴 bug: unscored upsert overwrites old priority with 0 because insert coerces null to default. Branch on p_priority_score, not excluded.priority_score.

scripts/backfill_priority_scores.py:L136: 🔴 bug: client.table("jobs.jobs") queries a public table literally named jobs.jobs. Use client.schema("jobs").table("jobs") or public.jobs_jobs.

scripts/backfill_priority_scores.py:L147: 🔴 bug: backfill skips migrated rows; priority_score is NOT NULL default 0 and version defaults current. Filter priority_scored_at.is.null instead.

deploy/daily/api/main.py:L164: 🔴 bug: /jobs uses anon client against schema("jobs"), but migrations expose public.jobs_jobs view. Query public.jobs_jobs or expose jobs schema to PostgREST.

deploy/daily/crontab:L3: 🟡 risk: CRON_SCHEDULE env ignored; compose override never changes schedule. Generate crontab from env at entrypoint or remove env.

deploy/daily/crontab:L6: 🟡 risk: OUTPUT_RETENTION_DAYS ignored; retention is hardcoded to 30. Generate crontab from env or remove the env knob.

scripts/sync_autocli_jobs.py:L468: 🟡 risk: scoring exceptions are swallowed and pipeline exits success with scored:0. Log row error and fail when scoring is enabled.

scripts/sync_autocli_jobs.py:L555: 🟡 risk: dry-run reads application_path, but scorer writes application_friction; aggregator summary stays zero. Use application_friction.

- cdp.rs (item 1): IPage::close was sending Browser.close, which kills the SHARED Chrome in CDP-direct mode (and every other consumer attached to it). Made it a no-op with explanation. Callers that need per-page cleanup should send Target.closeTarget directly.
- entrypoint-vnc.sh (item 2): -nopw was overriding -rfbauth and leaving VNC open with no password. Anyone reaching :5900/6080 (via Tailscale or any leaked path) could drive the logged-in browser. Removed the flag; password auth from /root/.vnc/passwd is now enforced.
- docker-compose.yml (item 3 + defense-in-depth on 6080): bound both 6080 and 9222 host ports to 127.0.0.1 only. Public path is Cloudflare Tunnel + Access; direct host-port access would bypass every auth layer. Backup: 'ssh -L 6080:localhost:6080' from a Tailscale-connected box.
- backfill_priority_scores.py (items 5 + 6): client.table('jobs.jobs') queried a literal 'jobs.jobs' name in public schema (always 0 rows); fixed to client.schema('jobs').table('jobs'). Filter also moved from priority_score.is.null (already NOT NULL DEFAULT 0 post-migration, so matches nothing) to priority_scored_at.is.null (the only honest 'never scored' signal).
- crontab + Dockerfile + .env.example (items 8 + 9): CRON_SCHEDULE and OUTPUT_RETENTION_DAYS env vars were placebos: supercronic reads /etc/cron.d/autocli verbatim and does not env-substitute. Dropped the misleading env knobs from compose / Dockerfile / .env.example and added a comment in crontab explaining the contract.
NOT addressed in this commit:
- Item 4 (migration upsert priority overwrite): needs a follow-up migration; pre-existing in main.
- Item 7 (/jobs schema): empirically returns 500 rows with a loose filter; PostgREST DOES expose the jobs schema in this project. The reviewer's hypothesis was incorrect for this Supabase config. Pushing back on this one with evidence.
- Items 10, 11: pre-existing sync_autocli_jobs.py issues from main; worth a separate cleanup PR.

RickSanchez88E · 2026-05-16T11:46:11Z

Addressed in af1ada7. Per-item:

| # | Item | Action |
| --- | --- | --- |
| 1 | `cdp.rs` Browser.close | ✅ Fixed: made `IPage::close` a no-op in CDP-direct mode; killed the "kill-the-shared-Chrome" foot-gun. Per-page cleanup should use `Target.closeTarget`. |
| 2 | `-nopw` | ✅ Removed. `-rfbauth` now actually enforced. |
| 3 | `9222:9222` host port | ✅ Fixed: both `9222` AND `6080` bound to `127.0.0.1` only. Public path is Cloudflare Tunnel + Access; direct host-port was bypassing every auth layer. (Backup: `ssh -L 6080:localhost:6080`.) |
| 4 | migration upsert | ⏭️ Out of this PR's scope: needs a follow-up migration to rewrite `upsert_job` branching on `p_priority_score IS NOT NULL` (the RPC param) instead of `excluded.priority_score IS NOT NULL` (which is always non-null because of `NOT NULL DEFAULT 0`). The buggy migration is already applied in prod, so the fix needs a new migration revision. Filed as a follow-up. |
| 5 | backfill `client.table("jobs.jobs")` | ✅ Fixed: `client.schema("jobs").table("jobs")`. (My earlier fix on the indeed branch got reverted by the rebase to origin/main.) |
| 6 | backfill filter | ✅ Fixed: `priority_scored_at.is.null` instead of `priority_score.is.null` (the latter matches nothing post-migration). |
| 7 | `/jobs` schema | ❌ Pushing back: empirically the endpoint returns rows. Test from the public path: `curl ".../jobs?since=1970-01-01" → {"count":500, ...}` with real LinkedIn job rows. PostgREST in this Supabase project IS exposing the `jobs` schema (visible in the Supabase API → Exposed schemas setting). The 0-rows I observed earlier was the `post_time >= today` filter, not a schema-exposure issue: LinkedIn `post_time` is the original posting date (often days old), not `created_at`. If we want "added since" semantics, that's a separate UX fix (switch to `created_at`). Happy to make that change if you confirm the semantics. |
| 8 | `CRON_SCHEDULE` env ignored | ✅ Fixed: dropped the placebo env from compose / Dockerfile / .env.example. supercronic reads the crontab verbatim with no env substitution. Schedule lives in `deploy/daily/crontab`; comment now explains. |
| 9 | `OUTPUT_RETENTION_DAYS` env ignored | ✅ Same fix as #8. |
| 10 | sync_autocli_jobs.py L468 scoring swallow | ⏭️ Pre-existing in `main` (rebase-carried). Real bug worth fixing in a focused PR on `scripts/sync_autocli_jobs.py`. |
| 11 | sync_autocli_jobs.py L555 application_path/friction | ⏭️ Same as #10: pre-existing, dry-run-only impact. |

Additional critical finding surfaced by Supabase MCP during verification:

jobs.jobs has RLS DISABLED. Combined with us mapping SUPABASE_KEY (service-role) to BOTH SUPABASE_SERVICE_ROLE_KEY and SUPABASE_ANON_KEY in the daily container's env, the /jobs route effectively runs with service-role privileges — anyone who passes Cloudflare Access + Bearer can read the entire jobs.jobs table.
Recommended follow-up: (a) create a real anon key, (b) enable RLS with a read-only policy for anon, (c) /jobs continues using anon → defense-in-depth restored. Service-role stays scoped to the sync write path.
Reproducing SQL (don't auto-apply): ALTER TABLE jobs.jobs ENABLE ROW LEVEL SECURITY; plus appropriate CREATE POLICY ... statements.

CI on af1ada7 is rebuilding now; will pull + re-verify on server once green and report back.

RickSanchez88E · 2026-05-16T11:53:00Z

Server re-verified against af1ada7:

5 containers healthy after recreate
9222 + 6080 host bindings now 127.0.0.1 only — external nc 100.108.80.9 9222 / 6080 both refused
Public path https://autocli-vnc.pumped.ink still 302→Access ✅
/api/health (Bearer-protected) returns {"chrome": true, "cdp_endpoint_file": true} ✅
/api/status still reflects the last 572-row successful run

The -nopw removal means the VNC web client now requires the VNC password (in server .env as VNC_PASSWORD) after Cloudflare Access — defense in depth restored.

Items 1, 2, 3 from PR review #4466756456:
1) New migration 20260516120000_fix_priority_upsert_data_loss.sql: recreates jobs.upsert_job so the ON CONFLICT DO UPDATE branches on the function PARAMETER (p_priority_score IS NOT NULL) instead of excluded.priority_score (which the INSERT body had already coerced from NULL to 0, making the case-when always true and silently zeroing prior scores). Same correction for priority_tier / scorer_version / signals / scored_at. Applied to production via Supabase MCP; verified success: True.
2) New migration 20260516120100_enable_jobs_jobs_rls.sql + GRANT migration: turns on RLS on jobs.jobs with a select-only policy for anon/authenticated, grants USAGE on the jobs schema and SELECT on the table to those roles. Server .env now uses the real anon JWT for SUPABASE_ANON_KEY (sync writes still use SUPABASE_SERVICE_ROLE_KEY which bypasses RLS). Combined with Cloudflare Access + Bearer this gives defence in depth.
3) /jobs endpoint now filters on created_at (database insert time) instead of post_time (LinkedIn original posting date, almost always older than today for fresh scrapes). Doc string updated; created_at added to the SELECT projection so clients can see it. Verified by direct REST against PostgREST + by python-in-container test (3 rows returned for since=today).
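
The upsert bug in item 1 comes down to which value the conflict branch tests. A toy model in Python, not the SQL in the migration; the function names are hypothetical and only the score column is modelled:

```python
from typing import Optional


def upsert_buggy(existing_score: int, p_priority_score: Optional[int]) -> int:
    """Old behaviour: the conflict branch tests the already-coerced inserted value."""
    # The INSERT side turns a NULL parameter into the column default 0 ...
    excluded_score = p_priority_score if p_priority_score is not None else 0
    # ... so this check can never be false and the old score is always overwritten.
    return excluded_score if excluded_score is not None else existing_score


def upsert_fixed(existing_score: int, p_priority_score: Optional[int]) -> int:
    """New behaviour: test the function parameter itself."""
    return p_priority_score if p_priority_score is not None else existing_score
```

An unscored upsert over a row scored 87 yields 0 in the buggy version and keeps 87 in the fixed one; a scored upsert behaves the same in both.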

Companion to 20260516120100. RLS policies don't grant SELECT; PostgREST also needs the role to have USAGE on the schema and SELECT on the table. Already applied to production via Supabase MCP, but the file was missing from the PR. Without it, a fresh project provisioned from these migrations would return count=0 on /jobs until the GRANT was applied manually.

RickSanchez88E · 2026-05-16T12:10:24Z

Three follow-up items addressed in 5143d21, plus the RLS GRANT migration fix:

1. Migration data-loss fix (commit 5143d21, file supabase/migrations/20260516120000_fix_priority_upsert_data_loss.sql):
Recreated jobs.upsert_job so the ON CONFLICT DO UPDATE branches on the function PARAMETER p_priority_score IS NOT NULL instead of excluded.priority_score (which the INSERT side had already coerced from NULL to 0, making the case-when always true and silently zeroing prior scores). Same correction for tier / scorer_version / signals / scored_at. Applied to production via Supabase MCP — apply_migration returned {success: true}.
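
To make the failure mode concrete, here is a small Python model of the two branching rules. It is only an illustration of the logic described above, not the SQL in the migration; the helper names (`insert_side_coerce`, `buggy_update`, `fixed_update`) are hypothetical.

```python
from typing import Optional


def insert_side_coerce(p_priority_score: Optional[int]) -> int:
    # Models the INSERT body: a NULL score is coerced to 0 before the conflict branch runs.
    return 0 if p_priority_score is None else p_priority_score


def buggy_update(existing: int, p_priority_score: Optional[int]) -> int:
    excluded = insert_side_coerce(p_priority_score)
    # Branching on the coerced value: it is never None, so the stored score is always replaced.
    return excluded if excluded is not None else existing


def fixed_update(existing: int, p_priority_score: Optional[int]) -> int:
    excluded = insert_side_coerce(p_priority_score)
    # Branching on the raw parameter: a missing score leaves the stored one untouched.
    return excluded if p_priority_score is not None else existing
```

With a stored score of 87 and no new score supplied, `buggy_update(87, None)` yields 0 while `fixed_update(87, None)` keeps 87; when a score is supplied, both rules write it.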

2. Anon key + RLS (commits 5143d21 + grant migration):

Server .env switched: SUPABASE_ANON_KEY now holds the project's real legacy anon JWT (eyJ...role:anon), not the service-role key. Service-role still wires only to SUPABASE_SERVICE_ROLE_KEY for the sync write path.
Migrations 20260516120100_enable_jobs_jobs_rls.sql + 20260516120200_grant_anon_read_jobs_jobs.sql enable RLS on jobs.jobs, grant USAGE on the schema + SELECT on the table to anon and authenticated, and create a select-only policy anon_read_jobs_jobs (USING (true)).
Verified end-to-end:
- GET /jobs?since=2026-05-16 via Cloudflare returns count=42 real LinkedIn rows.
- PATCH /rest/v1/jobs?id=eq.<real-uuid> with anon key + Prefer: return=representation returns [] (0 rows affected) — RLS blocks the write. Read-back confirms priority_score unchanged.
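
The write-blocking check above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the base URL and key are placeholders, and the `Content-Profile: jobs` header assumes the `jobs` schema is not the default schema exposed by PostgREST. The request is only built here, not sent.

```python
import json
import urllib.request


def build_anon_patch(base_url: str, anon_key: str, job_id: str, new_score: int) -> urllib.request.Request:
    """Build (but do not send) a PATCH that tries to overwrite priority_score as anon."""
    url = f"{base_url}/rest/v1/jobs?id=eq.{job_id}"
    body = json.dumps({"priority_score": new_score}).encode("utf-8")
    headers = {
        "apikey": anon_key,
        "Authorization": f"Bearer {anon_key}",
        "Content-Type": "application/json",
        "Content-Profile": "jobs",  # assumption: table lives in a non-default schema
        "Prefer": "return=representation",  # ask PostgREST to echo the affected rows
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


def write_was_blocked(response_body: str) -> bool:
    """With return=representation, an empty JSON array means no row was updated."""
    return json.loads(response_body) == []
```

Passing the built request to `urllib.request.urlopen` and feeding the decoded body to `write_was_blocked` gives a pass/fail signal; a read-back of the row, as done above, remains the stronger confirmation.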

3. /jobs since semantics (was 🟡 trivial — fixed not deferred):
gte("post_time", since) → gte("created_at", since). post_time is LinkedIn's posting date (often days/weeks old even for fresh scrapes); created_at is the database insert time which matches the "jobs added since" expectation. Ordering also flipped to created_at desc. created_at added to the response projection so callers can see it. 9/9 FastAPI tests still pass.
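
The difference between the two columns is easy to see on in-memory rows. The sketch below is not the endpoint code; `added_since` is a hypothetical helper, and treating `since` as UTC midnight is an assumption made for the example.

```python
from datetime import date, datetime, timezone
from typing import Dict, List


def added_since(rows: List[Dict], since: date, column: str = "created_at") -> List[Dict]:
    """Keep rows whose `column` is on or after `since` (UTC midnight), newest first."""
    cutoff = datetime(since.year, since.month, since.day, tzinfo=timezone.utc)
    kept = [row for row in rows if row[column] >= cutoff]
    return sorted(kept, key=lambda row: row[column], reverse=True)


rows = [
    # Posted on LinkedIn a week earlier, scraped and inserted on 2026-05-16.
    {"id": "a", "post_time": datetime(2026, 5, 9, tzinfo=timezone.utc),
     "created_at": datetime(2026, 5, 16, 3, 5, tzinfo=timezone.utc)},
    # Posted and inserted before the cutoff.
    {"id": "b", "post_time": datetime(2026, 5, 1, tzinfo=timezone.utc),
     "created_at": datetime(2026, 5, 10, 3, 5, tzinfo=timezone.utc)},
]
```

Filtering these rows with `since=date(2026, 5, 16)` on `created_at` returns row `a`; the same filter on `post_time` returns nothing, which is the behaviour the fix removes.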

PR now at 53 commits. CI green throughout. Ready when you are.

RickSanchez88E_a8cc and others added 30 commits May 9, 2026 19:45

Normalize job fields before scoring

0c25ca8

Co-authored-by: Codex <noreply@openai.com>

fix: priority scoring backfill + JD fallback

51b9a5c

Merge branch 'feat/job-priority-scoring-v1'

7450fe9

chore: remove tracked local artifacts

fbb21f9

chore: pin workspace Rust toolchain to 1.94

e07db5d

Aligns local dev (operator was on rustc 1.94.1), CI (was using ubuntu-latest default), and the Phase 0 Docker builder (deploy/SPEC.md). Single source of truth; future bumps touch only this file.

feat(deploy): chrome image Dockerfile

3562fad

Copy of my-stagehand-app/Dockerfile.chrome with the COPY path rewritten for repo-root build context (deploy/SPEC.md §4.1).

feat(deploy): daily entrypoint.sh

e19ff41

Boot-time cdp-discover gate, then runs supercronic + uvicorn in parallel under tini. wait -n exits as soon as either child dies, so compose's restart policy can pick up failure modes (e.g. uvicorn panic, supercronic crash).
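
The "exit as soon as either child dies" pattern can be sketched in Python for readers less familiar with the shell's `wait -n`. This is an analogue, not the contents of entrypoint.sh; `run_until_first_exit` is a hypothetical name, and unlike the shell version it terminates the surviving children itself rather than leaving that to the container runtime.

```python
import subprocess
import time
from typing import List, Sequence


def run_until_first_exit(commands: Sequence[List[str]], poll_interval: float = 0.1) -> int:
    """Start every command and return the exit code of whichever finishes first."""
    children = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for child in children:
                code = child.poll()  # None while the child is still running
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Stop whatever is still alive so the supervisor can restart everything together.
        for child in children:
            if child.poll() is None:
                child.terminate()
                child.wait()
```

Returning the first exit code lets the caller exit with it, so a restart policy sees the failure instead of a half-running service.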

feat(deploy): supercronic crontab

6cf906a

03:00 daily LinkedIn pull + 04:00 30-day output retention sweep (SPEC §5.2). TZ resolved by the container's TZ=Europe/London.
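
A 30-day retention sweep of the kind scheduled at 04:00 might look like the sketch below. The actual sweep command is not shown in this PR excerpt, so the function name, the use of file modification time, and the flat directory layout are all assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional


def sweep_old_outputs(output_dir: Path, max_age_days: int = 30,
                      now: Optional[float] = None) -> List[Path]:
    """Delete regular files in output_dir older than max_age_days; return what was removed."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed = []
    for path in sorted(output_dir.iterdir()):
        # Compare modification time against the cutoff; directories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The `now` parameter exists only so the cutoff can be pinned in tests.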

fix(deploy): copy job_priority_scorer + config into daily image

2b9a1ab

After rebase onto local main, scripts/job_priority_scorer.py and scripts/job_priority_config.py are present. sync_autocli_jobs.py imports them at runtime, so the daily image must ship all three.

feat(deploy): FastAPI project metadata + lockfile

52c7277

uv-managed; pins fastapi/uvicorn/supabase/prometheus-client/httpx to compatible ranges. Lockfile checked in so the Dockerfile's 'uv sync --frozen' is reproducible.

RickSanchez88E_a8cc added 20 commits May 16, 2026 02:17

feat(deploy): trigger.py — shared run-daily executor

05ad14f

Used by POST /api/run to spawn run-daily.sh non-blockingly. is_running() is a non-destructive flock probe so /api/status can report in_progress without affecting the actual run.
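
One way to implement a flock probe like the one described is to try a non-blocking lock and release it immediately. This is a sketch rather than the code in trigger.py; the lock path is a caller-supplied placeholder and the approach is Linux/Unix only.

```python
import fcntl
import os


def is_running(lock_path: str) -> bool:
    """Report whether another open file description currently holds the flock."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking attempt: fails immediately if a run holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # Nobody held it; release right away so the probe leaves no lasting state.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

The probe does hold the lock for a brief moment when no run is active, so a run that starts in exactly that window with a non-blocking lock attempt could be refused; whether that matters depends on how run-daily.sh takes its lock.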

feat(deploy): prometheus scrape config

f64ffb3

Single job scraping autocli-daily:8080/api/metrics every 15s. metrics_path is required because FastAPI mounts under /api/*.

feat(deploy): local-only override

1e9f37b

Binds host ports under non-conflicting numbers (6081/5902/9223/8081/9091/3001) so the operator can keep their existing local Chrome and Grafana running alongside. cloudflared moved to a 'disabled' profile.

feat(deploy): .env.example template

fac1f4d

All required environment variables with empty values + inline generator hints. Real .env never committed (.gitignore already covers it under '.env').

docs(deploy): operator-facing README + runbook

5619a6d

Quickstart, Cloudflare dashboard checklist, forced-run snippet, common-failure table. Points back at SPEC + PLAN for the why.

fix(deploy): correct supercronic v0.2.30 sha1sum

a5f55f5

The placeholder value was wrong (build failed with 'computed checksum did NOT match'). Verified by downloading the GitHub release asset and computing sha1sum from the operator's laptop.
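
The same checksum verification can be done without the `sha1sum` binary. A minimal sketch with hypothetical function names; the expected digest must come from a trusted download of the release asset, as described above.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha1(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-1 against a pinned value, ignoring case and whitespace."""
    return sha1_of(path) == expected_hex.strip().lower()
```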

chore: gitignore Phase 0 local autocli binary output

e8a9063

CI builds the binary as a separate job and uploads as artifact; Phase 0 locally rebuilds inside a Docker rust container and writes to deploy/daily/bin/. Never commit this file (it's ~8MB).

fix(deploy): pin cloudflared to 2026.3.0 (match live tunnel replica)

e692a40

rick-ubuntu-ssh tunnel's running replica is 2026.3.0 (per Zero Trust dashboard). Our container joins as a 2nd HA replica; matching the connector version avoids mixed-version edge cases.

RickSanchez88E_a8cc added 2 commits May 16, 2026 13:04

RickSanchez88E merged commit d074028 into main May 16, 2026

RickSanchez88E deleted the feat/daily-microservice branch May 16, 2026 13:41

RickSanchez88E mentioned this pull request May 16, 2026

fix(sync): surface scoring exceptions + fix aggregator_like_rows (PR #2 follow-up items 10 & 11) #3

Open

7 tasks

RickSanchez88E merged 53 commits into main from feat/daily-microservice


1 participant
