PathScout is a local-only role discovery CLI for finding high-fit startup opportunities before they become obvious job posts.
It fetches broad signals, scores them against a personal fit profile, stores deduped observations in SQLite, and emits a canonical JSON artifact plus a readable Markdown digest.
- A local-only CLI for monitoring companies, careers pages, RSS feeds, portfolio lists, and manual notes.
- A fit-profile engine for surfacing target roles, hidden-search hypotheses, and weaker watch signals.
- An explainable findings scanner: every surfaced item includes score, tier, reasons, flags, source metadata, and suppression state.
- It is not a hosted marketplace.
- It is not a recruiting CRM.
- It is not a general-purpose job board scraper.
- It does not provide hosted storage, sync, or remote persistence.
From GitHub:
pipx install git+https://github.com/ckoglmeier/pathscout.git
From a local checkout:
pipx install .
For development:
python3 -m pathscout doctor
python3 -m pathscout run --dry-run --format both

pathscout start
pathscout next
pathscout init
pathscout setup
pathscout doctor
pathscout run --format both
pathscout start is a read-only startup checklist. It shows what exists, what is missing, and the next recommended command without creating or editing files.
pathscout next prints only the next recommended action. /next is also accepted as an alias.
pathscout setup is an interactive guided setup flow. It walks through environment, role/function, locations, avoid terms, background, proof points, constraints, and network context in order, saving answers into local JSON files as it goes.
During init, PathScout asks two onboarding questions in this order:
- What is the right environment for you?
- What is the right role for you?
For scripted setup, pass answers directly:
pathscout init \
--environment "Remote AI startups" \
--role "Founding Product Lead"
Use --no-input to create default sample config without prompts.
Outputs:
- data/pathscout.sqlite: local state and dedupe history.
- outputs/latest.json: canonical machine-readable findings artifact.
- outputs/latest.md: human-readable digest rendered from the JSON findings.
- outputs/packages/: optional portable opportunity packages created from findings.
- config/profile.json: personal fit profile.
- config/background.sample.json: tracked example candidate context.
- config/background.local.json: private candidate context and proof points.
- config/sources.json: source adapter configuration.
- config/watchlist.json: curated company list.
- config/suppressions.json: structured ignored findings.
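The dedupe history kept in data/pathscout.sqlite can be pictured with a small sketch. This is an illustration under stated assumptions, not PathScout's actual schema: the `observations` table, its columns, and the `content_hash` and `record` helpers are hypothetical names. The idea is to hash a canonical form of each observation and let a primary key reject repeats.

```python
import hashlib
import json
import sqlite3


def content_hash(observation: dict) -> str:
    """Hash the canonical JSON form so equal observations get equal IDs."""
    canonical = json.dumps(observation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record(conn: sqlite3.Connection, observation: dict) -> bool:
    """Insert the observation; return True only if it was not seen before."""
    cursor = conn.execute(
        # Hypothetical table: the primary key on id makes repeats a no-op.
        "INSERT OR IGNORE INTO observations (id, payload) VALUES (?, ?)",
        (content_hash(observation), json.dumps(observation, sort_keys=True)),
    )
    conn.commit()
    return cursor.rowcount == 1


# In-memory database for the sketch; a real run would open a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
)
```

Sorting keys before hashing means two observations with the same fields in a different order still collapse to one row.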
PathScout uses schema-versioned JSON files.
config/profile.json is the personal fit model. It contains target roles, stages, domains, excluded domains, location preferences, travel constraints, authority terms, and scoring thresholds.
config/sources.json describes inputs. Each source uses this adapter contract:
{
"id": "watchlist_careers",
"type": "watchlist_careers",
"name": "Watchlist careers pages",
"enabled": true,
"config": {
"path": "config/watchlist.json"
}
}
id is stable and scriptable. name is display-only. type selects the adapter. config is adapter-specific.
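As a rough illustration of the adapter contract, one entry from config/sources.json could be loaded into a typed record like this. `SourceEntry` and `parse_source` are hypothetical names, and the defaults for `enabled` and `config` are assumptions made for the sketch, not documented PathScout behavior.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SourceEntry:
    id: str    # stable, scriptable identifier
    type: str  # selects the adapter
    name: str  # display-only label
    enabled: bool = True  # assumed default
    config: dict[str, Any] = field(default_factory=dict)  # adapter-specific


def parse_source(raw: dict[str, Any]) -> SourceEntry:
    """Validate the required keys and build a SourceEntry."""
    missing = [key for key in ("id", "type", "name") if key not in raw]
    if missing:
        raise ValueError(f"source entry is missing: {', '.join(missing)}")
    return SourceEntry(
        id=raw["id"],
        type=raw["type"],
        name=raw["name"],
        enabled=bool(raw.get("enabled", True)),
        config=dict(raw.get("config", {})),
    )
```

Keeping `config` as an opaque dictionary mirrors the contract: only the adapter selected by `type` needs to understand its keys.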
The watchlist_careers, web_page, and rss adapters share a single network chokepoint (pathscout.fetchers.http_get) that:
- Retries transient network failures (timeouts, connection errors) with jittered exponential backoff before giving up.
- Honors ETag/Last-Modified response headers via an injectable ResponseCache, reusing the cached body on a 304 Not Modified response instead of re-parsing a fresh one.
- Logs fetch failures through the standard logging module (logging.getLogger("pathscout.fetchers")) instead of swallowing them silently; attach a handler to observe what failed and why.
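The retry behavior in the list above can be sketched as follows. This is not the code inside pathscout.fetchers.http_get: `with_retries`, its parameters, the "full jitter" strategy, and the choice of which exceptions count as transient are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these built-in exception types stand in for transient failures.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch, retrying transient errors with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TRANSIENT_ERRORS:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2**attempt))
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller could wrap any zero-argument fetch function, for example `with_retries(lambda: download(url))` with a hypothetical `download`.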
watchlist_careers additionally supports a per-host rate_limit_seconds config field, enforcing a minimum delay between requests to the same host (independent of the source's overall max_elapsed_seconds run budget):
{
"id": "watchlist_careers",
"type": "watchlist_careers",
"name": "Watchlist careers pages",
"enabled": true,
"config": {
"path": "config/watchlist.json",
"timeout_seconds": 3,
"candidate_paths": ["careers", "jobs"],
"max_elapsed_seconds": 300,
"rate_limit_seconds": 1
}
}
ResponseCache and the per-host rate limiter are constructor-injectable (not global state), so a long-lived caller, such as a scheduled worker running fetches for many users, can supply persistent implementations instead of the default in-memory, one-per-run behavior the CLI uses.
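A per-host limiter of the kind described here might look like the sketch below. `HostRateLimiter` and its `wait` method are hypothetical names, not PathScout's API; the injectable `clock` and `sleep` callables are assumptions that illustrate why constructor injection is convenient for testing and for long-lived callers.

```python
import time
from typing import Callable
from urllib.parse import urlsplit


class HostRateLimiter:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until the URL's host may be hit again; return seconds waited."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self._min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to start.
        self._last_request[host] = now + delay
        return delay
```

With `rate_limit_seconds` set to 1, a caller would construct `HostRateLimiter(1.0)` and call `wait(url)` before each fetch; requests to different hosts do not delay each other.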
config/suppressions.json stores structured ignores:
{
"schema_version": 1,
"suppressions": [
{
"id": "finding-content-hash",
"scope": "finding",
"reason": "Not a fit",
"expires_at": "2026-12-31",
"created_at": "2026-06-29"
}
]
}
Suppressions affect output visibility. They do not delete observations from SQLite.
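One way to read the expires_at field is sketched below. The helper names `is_active` and `visible_ids` are hypothetical, and two details are assumptions rather than documented behavior: a suppression with no expires_at never expires, and the expiry date itself is still covered.

```python
from datetime import date
from typing import Any


def is_active(suppression: dict[str, Any], today: date) -> bool:
    """Return True while the suppression should still hide its target."""
    expires_at = suppression.get("expires_at")
    if not expires_at:
        return True  # assumption: no expiry recorded means indefinite
    # Assumption: the expiry date is inclusive.
    return today <= date.fromisoformat(expires_at)


def visible_ids(
    finding_ids: list[str],
    suppressions: list[dict[str, Any]],
    today: date,
) -> list[str]:
    """Filter out finding IDs covered by an active finding-scoped suppression."""
    hidden = {
        s["id"]
        for s in suppressions
        if s.get("scope") == "finding" and is_active(s, today)
    }
    return [fid for fid in finding_ids if fid not in hidden]
```

The filter only decides what is shown; nothing in it touches stored observations, which matches the visibility-only role of suppressions.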
The v0.2 runner supports standard-library fetches for:
- manual: config-entered notes for companies or opportunities you want tracked.
- watchlist: turns every active watchlist company into a hidden-search observation.
- watchlist_careers: probes active watchlist companies' careers pages for posted role evidence.
- portfolio: turns companies from config/portfolio.json into relationship-context observations.
- web_page: fetches a single web page.
- rss: fetches an RSS or Atom feed.
radar_portfolio remains as a deprecated alias for one release. Use portfolio for new config.
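A deprecated alias like this is commonly handled by mapping the old name to the new one and emitting a warning. The sketch below shows that pattern; `DEPRECATED_ALIASES` and `resolve_source_type` are hypothetical names, and whether PathScout itself emits a DeprecationWarning is an assumption.

```python
import warnings

# Old source type -> replacement, taken from the note above.
DEPRECATED_ALIASES = {"radar_portfolio": "portfolio"}


def resolve_source_type(source_type: str) -> str:
    """Return the current adapter type, warning when an old alias is used."""
    replacement = DEPRECATED_ALIASES.get(source_type)
    if replacement is None:
        return source_type
    warnings.warn(
        f"source type {source_type!r} is deprecated; use {replacement!r}",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```

Existing config keeps working for the transition release, while the warning points at the name to use in new config.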
pathscout start
pathscout next
pathscout init
pathscout setup
pathscout doctor
pathscout watchlist
pathscout portfolio
pathscout review
pathscout explain <finding-id>
pathscout notes <finding-id> --add "Question to verify before outreach"
pathscout thesis <finding-id>
pathscout package <finding-id>
pathscout suppress <finding-id> --reason "Not a fit"
pathscout run --format json
pathscout run --format markdown
pathscout run --format both
Useful paths can be overridden:
pathscout run \
--profile config/profile.json \
--sources config/sources.json \
--watchlist config/watchlist.json \
--suppressions config/suppressions.json \
--db data/pathscout.sqlite \
--json-out outputs/latest.json \
--out outputs/latest.md
- Act Now: explicit target role or recruiter-visible mandate with strong fit signals.
- Hidden Search Hypothesis: no role posted, but company signals suggest a likely hiring need.
- Watch Signal: weaker signal, lower-level posting, or incomplete evidence.
- Filtered: captured for history but excluded from the main digest.
Use review to scan findings from the latest JSON artifact without opening the file:
pathscout review --limit 10
pathscout review --tier "Act Now"
Use explain to inspect why a finding surfaced:
pathscout explain <finding-id>
Use notes to keep local judgment attached to a finding or company:
pathscout notes <finding-id> --add "Ask a former employee whether this team is still founder-led"
pathscout notes --company "Northstar Robotics"
Use thesis to generate a local role-thesis package from a finding. Copy config/background.sample.json to config/background.local.json first if you want the thesis to include private candidate context:
pathscout thesis <finding-id>
Thesis packages are written to outputs/theses/ and are generated from the same JSON finding objects used by review and Markdown digests. They include the company moment, problem map, proposed function, fit argument, 90-180 day wedge, notes, and evidence gaps. They are thinking artifacts, not generated job descriptions or send-ready outreach.
Use suppress to hide a finding from later Markdown digests while keeping the raw observation in SQLite and the finding marked in JSON:
pathscout suppress <finding-id> --reason "Not a fit" --expires 2026-12-31
Careers pages are parsed into separate role findings when PathScout can identify role-title rows. If a page does not expose clear role titles, PathScout falls back to one page-level finding.
Use package to create a portable, human-readable and agent-readable opportunity package from a finding in outputs/latest.json:
pathscout package <finding-id>
Each package includes a manifest, a human Markdown brief, agent instructions, and canonical JSON data under outputs/packages/. See docs/artifacts.md for the artifact contract.
config/background.local.json, legacy config/background.json, data/notes.json, outputs/theses/, and outputs/packages/ are ignored by default because they may contain private candidate context.
See DATA_CONTRACT.md and docs/source_of_truth.md for the local-only storage boundary and agent-readable artifact contract. Network source fetches collect evidence for local runs; they are not hosted storage or sync.
PathScout models its findings on scanner output: stable IDs, evidence, severity-like tiers, reasons, flags, and suppressions.
The config split borrows from dbt-style separation of personal profile from project config. Source IDs follow the pre-commit convention: stable machine IDs plus human names. Suppressions borrow from security scanners: structured ignores with reasons and optional expiration dates.