apply-agent 👷💭 ^noname¹+_wip

Self-hosted job scraper runner, with self-hosted LLM-powered CV matching.

Caution

🍪 It’s possible to filter out legitimate jobs, so use it with caution.

User Configuration

Three config files are user-editable. Running ./scripts/install.sh in the project root creates basic placeholder versions, but they need some tuning to produce reliable results.

⛓️ .env.local

Copy .env.example to .env (defaults), then put machine-specific overrides and secrets into .env.local.

cp .env.example .env

Precedence is: system env > .env.local > .env.

Models

Running LLMs on CPU is slow. If you have access to a properly configured Ollama server, point the OLLAMA_BASE_URL environment variable at it, e.g. OLLAMA_BASE_URL=http://_/"-._/"-._:11434.

Otherwise, you can adjust the models by choosing a lighter BATCH_MODEL and a stronger AGENT_MODEL.

| Model | Size (parameters) | Notes |
| --- | --- | --- |
| SmolLM2 1.7B | ~1.7 B | Compact, efficient general LLM |
| Qwen3-1.7B | ~1.7 B | Similar balance of performance/size |
| TinyLlama 1.1B | ~1.1 B | Slightly smaller, faster |
| Llama3.2 1B | ~1 B | Meta model with good instruction ability |
| Gemma3 1B | ~1 B | Lightweight, strong CPU performance |
| DeepSeek-R1 1.5B | ~1.5 B | Another mid-size small model |

⛓️ config.yaml

Job search parameters live under the jobspy root node.

This object is passed to the scraper function. The exact location of the configuration is defined by the CONFIG_FILE environment variable.

See the full parameter list to find the right (working) settings.

Example:

jobspy:
  site_name:
    - linkedin
    # - zip_recruiter
    - indeed
    # - glassdoor
    # - google
    # - bayt
    # - bdjobs
  search_term: software engineer
  location: London
  country_indeed: UK
  results_wanted: 10
  # hours_old: 72
  verbose: 0
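
As a rough sketch of how the `jobspy` node might reach the scraper, the snippet below extracts that node from an already-parsed config and forwards it as keyword arguments. The helper names, the `REQUIRED_KEYS` check, and the injected `scrape` callable are assumptions made for illustration; the real runner may work differently.

```python
from typing import Any, Callable

# Assumption: these two keys are treated as mandatory for this sketch only.
REQUIRED_KEYS = ("site_name", "search_term")

# Dict equivalent of the YAML example above (as a YAML parser would return it).
EXAMPLE_CONFIG: dict[str, Any] = {
    "jobspy": {
        "site_name": ["linkedin", "indeed"],
        "search_term": "software engineer",
        "location": "London",
        "country_indeed": "UK",
        "results_wanted": 10,
        "verbose": 0,
    }
}


def jobspy_params(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the mapping stored under the `jobspy` root node."""
    params = config.get("jobspy")
    if not isinstance(params, dict):
        raise ValueError("config is missing the 'jobspy' root node")
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"jobspy config is missing: {', '.join(missing)}")
    return dict(params)


def run_scraper(config: dict[str, Any], scrape: Callable[..., Any]) -> Any:
    """Pass the jobspy node to the scraper function as keyword arguments."""
    return scrape(**jobspy_params(config))
```

In a real run, the config would presumably come from parsing the file named by CONFIG_FILE (for example with PyYAML's `yaml.safe_load`), and `scrape` would be the scraper library's entry point rather than a stand-in.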

⛓️ cv.md

The CV location is defined by the CV_FILE environment variable. The file should be in Markdown format. The better the CV, the better the job matches.

Mode semantics

The agent runs in strict mode by default. To skip questions, set the mode to exploratory.

| Strict | Exploratory |
| --- | --- |
| Any unresolved uncertainty → WAIT_FOR_HUMAN | Hard gaps → ask once, then proceed |
| Hard gaps → WAIT_FOR_HUMAN | Low confidence → assume best-case |
| Low confidence → WAIT_FOR_HUMAN | LOW_QUALITY → downgrade severity, proceed |
| LOW_QUALITY from EVALUATE/CHALLENGE → FAILED | Bias toward PLAN |
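
The rules above can be read as a small decision function. The sketch below is one possible interpretation, not the agent's actual code: the `next_state` name, its boolean inputs, and the order in which the rules are checked are all assumptions.

```python
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    EXPLORATORY = "exploratory"


def next_state(
    mode: Mode,
    *,
    hard_gaps: bool = False,
    low_confidence: bool = False,
    low_quality: bool = False,
    already_asked: bool = False,
) -> str:
    """Map the signals of one job to the next state, following the table."""
    if mode is Mode.STRICT:
        if low_quality:
            return "FAILED"  # LOW_QUALITY from EVALUATE/CHALLENGE
        if hard_gaps or low_confidence:
            return "WAIT_FOR_HUMAN"  # any unresolved uncertainty
        return "PLAN"
    # Exploratory: ask about hard gaps once, then proceed regardless.
    if hard_gaps and not already_asked:
        return "WAIT_FOR_HUMAN"
    # Low confidence is assumed best-case and LOW_QUALITY is downgraded,
    # so everything else is biased toward PLAN.
    return "PLAN"
```

For example, a job with low confidence but no hard gaps waits for a human in strict mode and goes straight to PLAN in exploratory mode.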

Data flow

[ Python scraper ]
        ↓
  (job records)
        ↓
[ job inbox (files) ]
        ↓
[ batch scorer ]
        ↓
[ ranked jobs ]
        ↓
[ agent runs ]

Run step by step

Clear job folders
```
rm -rv ./data/jobs/*
```

Setup project

./scripts/install.sh

Install Python requirements into a venv
```
./scripts/install_tools.sh
```

Scrape jobs

Activate the virtual environment
```
source tools/scraper/venv/bin/activate
```

python tools/scraper/runner.py

Pre-process scraped jobs
```
bun cli ingest
```
Batch scoring jobs
```
bun cli scoring
```
Evaluate jobs
```
bun cli evaluation
```
Answer questions
```
bun cli answer
```
After answering, don’t forget to re-evaluate jobs.

Install

Bun

Follow the instructions. 🤓

Python

Use a stable Python version (3.11, 3.12, or 3.13) due to NumPy compatibility.

python3.12 -m venv tools/scraper/venv
source tools/scraper/venv/bin/activate
pip install -r tools/scraper/requirements.txt

Run

A single process

$ bun cli
Run a single step.

USAGE
  bun cli <ingest|scoring|evaluation|answer> [job-id]

The orchestrator

Runs every step in a loop indefinitely, except for the answer step, which needs human input.

bun start

Verbose

Monitor the jobs folder to see each job's current state.

bun lt

Jobs folder structure

Every job is a JSON file. During the evaluation process, it gets updated with notes and travels between status folders. No database required.

Here is the folder structure that holds the ./[job-id].json files as they move through the process:

data/jobs
     ├── inbox              # raw scraped jobs (unscored)
     ├── screened_out       # rejected by batch scoring
     ├── shortlisted        # passed batch scoring
     ├── awaiting_input     # agent needs human input
     ├── declined           # rejected by agent reasoning
     └── approved           # agent-approved jobs

States

stateDiagram-v2
  classDef GameOver stroke:darkred
  classDef WellDone stroke:yellow
  class ScreenedOut GameOver
  class FAILED GameOver
  class DONE WellDone

  [*] --> Scraping
  Scraping --> inbox

  note right of Scraping
    Python script downloads the jobs to inbox.
  end note

  state inbox {
    [*] --> TransformToSchema
    TransformToSchema --> BatchScoring
  }

  state BatchScoring {
    [*] --> Score
  }

  Score --> Shortlisted
  Score --> ScreenedOut
  Shortlisted --> IDLE

  state StateMachine {
    IDLE --> INGEST
    INGEST --> NORMALIZE
    NORMALIZE --> EVALUATE
    EVALUATE --> CHALLENGE
    CHALLENGE --> DECIDE
    DECIDE --> PLAN
    DECIDE --> WAIT_FOR_HUMAN
    WAIT_FOR_HUMAN --> DECIDE
  }

  INGEST --> FAILED
  NORMALIZE --> FAILED
  EVALUATE --> FAILED
  CHALLENGE --> FAILED
  DECIDE --> FAILED
  WAIT_FOR_HUMAN --> FAILED
  PLAN --> DONE

  DONE --> [*]
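
The agent part of the diagram can also be written as a transition table, which makes it easy to check whether a given move is allowed. This is a sketch transcribed from the diagram above; the helper names `can_transition` and `is_valid_path` are hypothetical and not taken from the codebase.

```python
# Allowed transitions, copied from the state diagram.
TRANSITIONS: dict[str, set[str]] = {
    "IDLE": {"INGEST"},
    "INGEST": {"NORMALIZE", "FAILED"},
    "NORMALIZE": {"EVALUATE", "FAILED"},
    "EVALUATE": {"CHALLENGE", "FAILED"},
    "CHALLENGE": {"DECIDE", "FAILED"},
    "DECIDE": {"PLAN", "WAIT_FOR_HUMAN", "FAILED"},
    "WAIT_FOR_HUMAN": {"DECIDE", "FAILED"},
    "PLAN": {"DONE"},
    "DONE": set(),    # terminal
    "FAILED": set(),  # terminal
}


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an edge from current to target."""
    return target in TRANSITIONS.get(current, set())


def is_valid_path(states: list[str]) -> bool:
    """True if every consecutive pair of states is an allowed transition."""
    return all(can_transition(a, b) for a, b in zip(states, states[1:]))
```

A run that pauses for a question and then resumes, such as DECIDE, WAIT_FOR_HUMAN, DECIDE, PLAN, DONE, passes `is_valid_path`, while jumping from EVALUATE directly to PLAN does not.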

How to run?

The simplest way to run the project is with Docker Compose. It automatically sets up local LLM models (CPU-only for now) and starts searching for jobs right away. You’ll need about 3 GB of disk space with the default settings. To run with the defaults:

1. Clone the repository
2. Create .env from the example
```
cp .env.example .env
```
3. Run Docker Compose
```
docker compose up
```

¹ “Apply” in the repo name is confusing: it doesn’t actually do anything.
