# apply-agent
Self-hosted job scraper runner, with self-hosted LLM-powered CV matching.
> **Caution:** It's possible to filter out legitimate jobs, so use it with caution.
There are three user-editable configuration files. Running `./scripts/install.sh` in the project root creates basic placeholder versions, but they require some tuning to produce reliable results.
Copy `.env.example` to `.env` (defaults), then put machine-specific overrides and secrets into `.env.local`.
```shell
cp .env.example .env
```

Precedence is: system env > `.env.local` > `.env`.
Running LLMs on CPU won't provide optimal performance. If you have access to a properly configured Ollama server, set the `OLLAMA_BASE_URL` environment variable accordingly, e.g. `OLLAMA_BASE_URL=http://<ollama-host>:11434`.
Otherwise, you can adjust the models by choosing a lighter `BATCH_MODEL` and a stronger `AGENT_MODEL`.
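For example, a CPU-only machine might use a lighter/heavier split like this in `.env.local`. The model tags below are illustrative, not project defaults; pull them first (e.g. with `ollama pull <tag>`) and use whatever tags your Ollama install actually has:

```shell
# .env.local — hypothetical model split for a CPU-only machine
BATCH_MODEL=llama3.2:1b   # lighter model for bulk batch scoring
AGENT_MODEL=qwen3:1.7b    # stronger model for agent reasoning
```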
| Model | Size | Notes |
|---|---|---|
| SmolLM2 1.7B | ~1.7 B | Compact, efficient general LLM |
| Qwen3-1.7B | ~1.7 B | Similar balance of performance/size |
| TinyLlama 1.1B | ~1.1 B | Slightly smaller, faster |
| Llama3.2 1B | ~1 B | Meta model with good instruction ability |
| Gemma3 1B | ~1 B | Lightweight, strong CPU performance |
| DeepSeek-R1 1.5B | ~1.5 B | Another mid-size small model |
Job search parameters live under the `jobspy` root node. This object is passed to the scraper function; the exact location of the configuration file is defined by the `CONFIG_FILE` environment variable.
See the full parameter list to find the right (working) settings.
Example:
```yaml
jobspy:
  site_name:
    - linkedin
    # - zip_recruiter
    - indeed
    # - glassdoor
    # - google
    # - bayt
    # - bdjobs
  search_term: software engineer
  location: London
  country_indeed: UK
  results_wanted: 10
  # hours_old: 72
  verbose: 0
```

The CV location is defined by the `CV_FILE` environment variable. The file should be in Markdown format. The better the CV, the better the job matches.
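For illustration, a minimal `CV_FILE` in Markdown might look like this (all names and details are hypothetical):

```markdown
# Jane Doe

## Summary
Backend engineer with 6 years of experience in TypeScript and Python.

## Experience
- **Acme Corp** (2020–2024): built queue-based ingestion services in Node.js.

## Skills
TypeScript, Python, PostgreSQL, Docker
```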
The agent runs in strict mode by default. To skip questions, set the mode to exploratory.
| Strict | Exploratory |
|---|---|
| Any unresolved uncertainty → WAIT_FOR_HUMAN | Hard gaps → ask once, then proceed |
| Hard gaps → WAIT_FOR_HUMAN | Low confidence → assume best-case |
| Low confidence → WAIT_FOR_HUMAN | LOW_QUALITY → downgrade severity, proceed |
| LOW_QUALITY from EVALUATE/CHALLENGE → FAILED | Bias toward PLAN |
```
[ Python scraper ]
        ↓
  (job records)
        ↓
[ job inbox (files) ]
        ↓
 [ batch scorer ]
        ↓
 [ ranked jobs ]
        ↓
 [ agent runs ]
```
1. Clear job folders

   ```shell
   rm -rv ./data/jobs/*
   ```

2. Set up the project

   ```shell
   ./scripts/install.sh
   ```

3. Install Python requirements into a venv

   ```shell
   ./scripts/install_tools.sh
   ```

4. Scrape jobs

   Activate the virtual environment locally, then run the scraper:

   ```shell
   source tools/scraper/venv/bin/activate
   python tools/scraper/runner.py
   ```

5. Pre-process scraped jobs

   ```shell
   bun cli ingest
   ```

6. Batch-score jobs

   ```shell
   bun cli scoring
   ```

7. Evaluate jobs

   ```shell
   bun cli evaluation
   ```

8. Answer questions

   ```shell
   bun cli answer
   ```

   Follow the instructions. After answering, don't forget to re-evaluate jobs.
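The answer-then-re-evaluate loop can be chained into a single command (a sketch using the `bun cli` steps above):

```shell
bun cli answer && bun cli evaluation
```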
Use a stable Python version (3.11, 3.12, or 3.13) due to NumPy compatibility.
```shell
python3.12 -m venv tools/scraper/venv
source tools/scraper/venv/bin/activate
pip install -r tools/scraper/requirements.txt
```

Run a single step:

```
$ bun cli

USAGE
  bun cli <ingest|scoring|evaluation|answer> [job-id]
```

Runs indefinitely, except for the answers step:

```shell
bun start
```

Monitor the jobs folder to see each job's actual state:

```shell
bun lt
```

Every job is a JSON file. During the evaluation process, it gets updated with notes and travels between status folders. No database required.
Here is the folder structure; each `./[job-id].json` file moves through it as it is processed:
```
data/jobs
├── inbox           # raw scraped jobs (unscored)
├── screened_out    # rejected by batch scoring
├── shortlisted     # passed batch scoring
├── awaiting_input  # agent needs human input
├── declined        # rejected by agent reasoning
└── approved        # agent-approved jobs
```
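Since state is just files, a quick status overview needs nothing more than the shell. A small sketch (run from the project root; assumes the `data/jobs` layout above):

```shell
# Count how many jobs sit in each status folder
for dir in data/jobs/*/; do
  count=$(find "$dir" -maxdepth 1 -name '*.json' | wc -l)
  printf '%-16s %s\n' "$(basename "$dir")" "$count"
done
```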
```mermaid
stateDiagram-v2
    classDef GameOver stroke:darkred
    classDef WellDone stroke:yellow
    class ScreenedOut GameOver
    class FAILED GameOver
    class DONE WellDone
    [*] --> Scraping
    Scraping --> inbox
    note right of Scraping
        The Python script downloads jobs to the inbox.
    end note
    state inbox {
        [*] --> TransformToSchema
        TransformToSchema --> BatchScoring
    }
    state BatchScoring {
        [*] --> Score
    }
    Score --> Shortlisted
    Score --> ScreenedOut
    Shortlisted --> IDLE
    state StateMachine {
        IDLE --> INGEST
        INGEST --> NORMALIZE
        NORMALIZE --> EVALUATE
        EVALUATE --> CHALLENGE
        CHALLENGE --> DECIDE
        DECIDE --> PLAN
        DECIDE --> WAIT_FOR_HUMAN
        WAIT_FOR_HUMAN --> DECIDE
    }
    INGEST --> FAILED
    NORMALIZE --> FAILED
    EVALUATE --> FAILED
    CHALLENGE --> FAILED
    DECIDE --> FAILED
    WAIT_FOR_HUMAN --> FAILED
    PLAN --> DONE
    DONE --> [*]
```
The simplest way to run the project is with Docker Compose. It automatically sets up local LLM models (CPU-only for now) and starts searching for jobs right away. You'll need about 3 GB of disk space with the default settings. To run with the default settings:
1. Clone the repository

2. Create `.env` from the example

   ```shell
   cp .env.example .env
   ```

3. Run Docker Compose

   ```shell
   docker compose up
   ```
Footnotes

- *Apply* in the repo name is confusing; it doesn't actually do anything.