
dfseo — DataForSEO CLI

dfseo is a single-binary CLI that wraps the DataForSEO v3 API behind a hierarchical command tree. It defaults to compact TOON output, strips envelope noise, supports JSONPath filtering, and caches raw responses on disk so repeat queries and filter iterations hit zero network and zero credits. It is built for two kinds of consumers: LLM agents that need small, deterministic payloads, and CLI-native developers who would otherwise hand-roll curl | jq pipelines.

Why dfseo?

DataForSEO already ships an official MCP server, and it works. The catch is response size: a single SERP or backlinks call can return tens of thousands of tokens of envelope, metadata, and list payload — enough to chew through an LLM's context window before the model has done any thinking. That's the gap dfseo was built to close.

The CLI sits between you (or your agent) and the DataForSEO v3 API and trims the response down to what's actually useful: the envelope is stripped by default, output is emitted as TOON (~40–70% fewer tokens than JSON on these payloads), --filter lets you pull only the fields you need via JSONPath, --summary collapses long lists to aggregates, and on-disk caching means repeat queries and filter iterations cost zero credits and zero round-trips. The result is payloads small enough to hand to an LLM directly, and a workflow that's pleasant from a plain terminal too.

Big thanks to the DataForSEO team for the API itself — the coverage and data quality are what make this whole ecosystem work. dfseo is a thin ergonomic wrapper on top of their excellent platform.

Install

curl -fsSL https://raw.githubusercontent.com/KLIXPERT-io/dataforseo-cli/main/scripts/install.sh | sh

Detects OS/arch (darwin/linux/windows × amd64/arm64), downloads the matching binary from the latest GitHub Release, verifies SHA-256, and installs to /usr/local/bin/dfseo (root) or $HOME/.local/bin/dfseo (otherwise). Once installed, dfseo update keeps the binary current.

For Windows installs, manual downloads, go install, building from source, mirror/CDN configuration, checksum verification, cutting releases, and uninstalling, see INSTALL.md.

Requirements

  • Go 1.25+ if installing from source (matches the module declaration).
  • macOS / Linux / Windows. The OS keychain backs credentials by default. On Linux the secret-service backend needs libsecret installed — if you're on a headless box or container without it, use the DFSEO_LOGIN / DFSEO_PASSWORD env vars and skip dfseo init.

Quickstart

dfseo init                                                   # store DataForSEO creds in the OS keychain
dfseo serp google organic live --keyword "running shoes" \
  --location-code 2840 --language-code en

Every leaf subcommand's --help includes the flag list, credit-cost hint, example, and a redacted response sample — read that first, then construct your call. For just the sample (no flag noise), use dfseo schema <command>.

Global flags

These flags are registered on the root command and apply to every subcommand.

Flag             Default   Purpose
--format         toon      Output format: toon, json, or table.
--filter         (none)    JSONPath applied before formatting (ohler55/ojg).
--raw            false     Emit the full DataForSEO envelope (tasks / status_code / ...) instead of the stripped result.
--limit          10        Truncate list-typed results. 0 = unlimited.
--summary        false     Emit aggregate fields and only the first --limit items.
--sandbox        false     Route to sandbox.dataforseo.com.
--live           false     Route to api.dataforseo.com (overrides config.default_env).
--no-cache       false     Bypass cache read and write for this invocation.
--cache-ttl      24h       Cache TTL for this invocation. 0 = no cache.
--stats          false     Force the telemetry line on stderr (auto-detected for TTYs).
-v, --verbose    false     Print the redacted HTTP request/response to stderr.
--log-off        false     Disable the request log for this invocation.

Output & filtering

  • The default format is TOON — a compact, structured text format that typically comes in at ~40–70% fewer tokens than equivalent JSON on DataForSEO payloads. Switch with --format json (standard JSON) or --format table (human-readable tabular output).
  • --filter takes a JSONPath expression evaluated against the post-envelope-strip payload. Examples:
    • --filter 'items[*].{title,url}'
    • --filter 'items[?(@.rank_absolute<=5)].url'
  • --raw emits the full envelope (tasks, status_code, status_message, cost, …) — useful when you need task_id or per-task status.
  • --limit N truncates top-level list results. --limit 0 disables the limit. The default is 10.
  • --summary collapses verbose list payloads to aggregates plus the first --limit items — ideal for quick exploration without burning tokens.

Caching

dfseo writes a canonical-JSON copy of every response to ~/.config/dfseo/cache/<env>/<category>/<sha>.json. Repeat calls with identical method + path + body hit the cache and return in <50ms.

  • Default TTL: 24h. Override with --cache-ttl 1h; use --cache-ttl 0 to skip writing on this call.
  • Sandbox and live caches are isolated — sandbox responses never cross into the live cache or vice versa.
  • Inspection:
    • dfseo cache path — print the resolved cache directory.
    • dfseo cache stats — entry count, bytes, hit/miss counters.
    • dfseo cache get <hash|prefix> — dump a raw cached response by hash (TTL-ignored, for debugging).
    • dfseo cache clean — delete expired entries and enforce the LRU caps (10k entries / 500MB by default; evicts to 8k / 400MB).
  • --no-cache bypasses both read and write for a single call when you need a guaranteed-fresh fetch.
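The cache-key idea above can be sketched locally. The README states that repeat calls with an identical method + path + body hit the same on-disk file under ~/.config/dfseo/cache/<env>/<category>/<sha>.json; the exact canonicalization dfseo applies is an assumption here, so this only illustrates that identical inputs map deterministically to one filename:

```shell
#!/bin/sh
# Sketch: derive a cache filename from method + path + body.
# The concatenation format is an assumption for illustration; only the
# "identical request -> identical file" property matters.
method="POST"
path="/v3/serp/google/organic/live/advanced"
body='[{"keyword":"running shoes","location_code":2840,"language_code":"en"}]'

key=$(printf '%s %s %s' "$method" "$path" "$body" | sha256sum | cut -d' ' -f1)
key2=$(printf '%s %s %s' "$method" "$path" "$body" | sha256sum | cut -d' ' -f1)

echo "cache file: ~/.config/dfseo/cache/live/serp/$key.json"
```

Because the key is a pure function of the request, iterating on --filter against a cached response never re-fetches, which is what makes filter experimentation free.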

Authentication

  • dfseo init (alias: dfseo auth) prompts for your DataForSEO login email + API password, verifies them against /v3/appendix/user_data, and stores them in the OS keychain: Keychain on macOS, Secret Service (libsecret) on Linux, Credential Manager on Windows.
  • dfseo init status (alias: dfseo auth status) prints the credential source, redacted login, account balance, and pinned OpenAPI spec SHA — handy as a health probe.
  • Env vars take precedence over the keychain:
    • DFSEO_LOGIN — your account email.
    • DFSEO_PASSWORD — the API password (not your dashboard login).
  • For headless containers / CI without libsecret, skip dfseo init entirely and use env vars.
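The precedence rule (env vars over keychain) can be sketched in shell. keychain_lookup and resolve_login are illustrative stand-ins, not real dfseo internals:

```shell
#!/bin/sh
# Sketch of credential precedence: DFSEO_LOGIN wins when set, otherwise
# fall back to the OS keychain. keychain_lookup is a stub for illustration.
keychain_lookup() { echo "stored-user@example.com"; }

resolve_login() { echo "${DFSEO_LOGIN:-$(keychain_lookup)}"; }

DFSEO_LOGIN="ci-user@example.com"
echo "login: $(resolve_login)"    # env var takes precedence

unset DFSEO_LOGIN
echo "login: $(resolve_login)"    # falls back to the keychain
```

This is why headless CI works with env vars alone: when they are set, the keychain (and libsecret) is never consulted.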

No plaintext secrets ever land on disk; only a non-secret reference is written to ~/.config/dfseo/config.json.

Sandbox vs live

  • --sandbox / --live override routing per-call.
  • config.default_env persists a default (set during dfseo init).
  • Sandbox uses the same credentials as live; DataForSEO treats the auth identically.
  • Caches are namespaced per-env, so a sandbox response can never be returned for a live call.

Async tasks

Task mode costs roughly half the credits of live, is queue-friendly for batches, and doesn't block. Default to it whenever you don't need the result inline.

# Submit and walk away — exits immediately with the task ID + local record path.
dfseo serp google organic task --keyword pizza --location-code 2840 --language-code en

# Later, retrieve the result. While the task is in queue, this prints
# {ok: true, status: in_progress, id: ...} with exit 0 — not an error.
dfseo task get <id>

# Or block now until the task is done (Ctrl-C prints a resume hint).
dfseo task wait <id>

# Equivalent inline form: submit + wait in one call.
dfseo serp google organic task --keyword pizza --location-code 2840 --language-code en --wait

# Inspect persisted records.
dfseo task list
dfseo task delete <id>   # local-only; the task on DataForSEO is unaffected

Storage. Records persist at ~/.config/dfseo/tasks/<id>.json with {id, endpoint, task_get_path, submitted_at, body}. DataForSEO is the source of truth for task IDs — losing this directory does not lose tasks; you can still fetch them with dfseo task get <id> if you have the ID.

Polling. task wait and --wait use exponential backoff 2s → 4s → 8s → 16s → 30s (cap 30s, no overall timeout). Ctrl-C exits 130 and prints a resume hint with dfseo task get <id> and the record path.
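The documented schedule (2s doubling to a 30s cap) can be reproduced in a few lines of shell, which is also the pattern to copy if you script your own polling around dfseo task get:

```shell
#!/bin/sh
# Reproduce the documented poll schedule: delay doubles from 2s, capped at 30s.
delay=2
schedule=""
for attempt in 1 2 3 4 5 6 7; do
  schedule="$schedule $delay"
  delay=$((delay * 2))
  if [ "$delay" -gt 30 ]; then delay=30; fi
done
echo "poll delays (s):$schedule"
# -> poll delays (s): 2 4 8 16 30 30 30
```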

Status semantics. DataForSEO status_code 20100 ("Task In Queue") and the 4060x family (handler limit, transient) render as status: in_progress with exit 0 — not an error. Everything else follows the standard error model below.

Batch input

Bulk endpoints accept multi-item inputs from flags, files, or stdin. The CLI auto-chunks at the API's per-call limit and runs chunks in parallel.

# Inline list (comma- or repeat-flag-separated).
dfseo backlinks bulk-ranks live --targets example.com,example.org

# From a file (.txt, .csv, or .json — extension is auto-detected, with a
# content-sniff fallback).
dfseo backlinks bulk-ranks live --file targets.txt

# From stdin.
cat targets.txt | dfseo backlinks bulk-ranks live --targets -

# Inspect chunking before paying for it.
dfseo backlinks bulk-ranks live --file targets.txt --dry-run

--concurrency N (default 4) caps in-flight chunks. Partial failures exit with code 5 and surface per-chunk errors in the output.
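The chunking behaviour can be previewed locally. The 1000-targets-per-call limit below is an assumed figure for illustration only — the real cap varies per endpoint and is what --dry-run reports:

```shell
#!/bin/sh
# Sketch: how a 2500-line target file splits at an assumed 1000-per-call cap.
tmpdir=$(mktemp -d)
seq 1 2500 | sed 's/^/site-/;s/$/.example.com/' > "$tmpdir/targets.txt"

# split mirrors the CLI's auto-chunking: each chunk becomes one API call.
split -l 1000 "$tmpdir/targets.txt" "$tmpdir/chunk-"
chunks=$(ls "$tmpdir"/chunk-* | wc -l | tr -d ' ')
echo "2500 targets -> $chunks chunks"
# -> 2500 targets -> 3 chunks
rm -r "$tmpdir"
```

With --concurrency 4, those chunks would be submitted up to four at a time.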

Categories & commands

dfseo synthesizes its command tree from a YAML registry generated from the DataForSEO OpenAPI spec — every v3 path is reachable as a hierarchical subcommand. The 12 top-level categories:

  • ai-opt — AI Optimization (LLM mentions, ChatGPT scraper, …)
  • app-data — App Data (Google Play, App Store)
  • appendix — User data, errors, status
  • backlinks — Backlinks API (summary, anchors, bulk, …)
  • business-data — Business Data (Google Business, Yelp, Trustpilot)
  • content-analysis — Content Analysis (search, summary, phrase trends)
  • domain-analytics — Domain Analytics (technologies, whois)
  • keywords-data — Keywords Data (Google Ads, Bing Ads, Trends)
  • labs — DataForSEO Labs (ranked-keywords, competitors, intent)
  • merchant — Merchant API (Amazon, Google Shopping)
  • onpage — OnPage API (instant-pages, lighthouse)
  • serp — SERP API (Google, Bing, YouTube, Yahoo, Naver, …)

Run dfseo <cmd> --help at any level to drill in. Leaf help shows flags, credit cost, an example, and a redacted response sample so you can pick filter paths without a trial request. For just the sample, use dfseo schema <command>.

For any v3 path you'd rather hand-write, use the escape hatch:

dfseo call <v3-path> --method POST --body '<json>'
dfseo call appendix/user_data --method GET

dfseo call authenticates, respects routing and caching, and runs output through the same --format / --filter / --raw / --limit pipeline as synthesized commands.

Request log

Every call appends a JSONL record to ~/.config/dfseo/logs/ capturing ts, cmd, env, endpoint, cache state (hit/miss/bypass), credits, latency_ms, status_code, size_bytes. Files rotate at 10MB; up to 5 rotated files are retained.

  • dfseo logs tail [-n N] — last N entries (default 50).
  • dfseo logs stats — counts by cache state and env, total credits, p50/p95 latency.
  • --log-off disables the log for a single call.
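Because the log is plain JSONL, it is easy to post-process yourself when dfseo logs stats doesn't cut the exact view you need. The three records below are invented samples; the field names match the ones listed above:

```shell
#!/bin/sh
# Sketch: compute a cache hit rate from JSONL log records with grep/wc.
# Sample lines are invented; field names follow the README's log schema.
log=$(cat <<'EOF'
{"ts":"2025-01-01T10:00:00Z","cmd":"serp google organic live","env":"live","cache":"miss","credits":0.003,"latency_ms":812}
{"ts":"2025-01-01T10:01:00Z","cmd":"serp google organic live","env":"live","cache":"hit","credits":0,"latency_ms":14}
{"ts":"2025-01-01T10:02:00Z","cmd":"backlinks summary live","env":"live","cache":"hit","credits":0,"latency_ms":11}
EOF
)
hits=$(printf '%s\n' "$log" | grep -c '"cache":"hit"')
total=$(printf '%s\n' "$log" | wc -l | tr -d ' ')
echo "cache hits: $hits/$total"
# -> cache hits: 2/3
```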

Errors & exit codes

Every error is emitted on stdout in the selected format (TOON / JSON / table), with a stable shape:

ok: false
status_code: <int>
status_message: <string>
hint: <human-readable recovery hint>
endpoint: <path>
cached: <bool>

Exit codes:

Code   Meaning
0      success
1      usage / flag error
2      authentication failure
3      API error (status_code ≥ 40000)
4      filter / parse error
5      partial batch failure
6      network / timeout
7      cache / filesystem error

Hints are curated per status_code family — e.g. 40101 points at dfseo init; 40202 points at plan limits.
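In scripts, branch on the exit code rather than parsing the error body. A minimal sketch, using `sh -c 'exit 3'` as a stand-in so it runs without credentials — swap in a real dfseo invocation:

```shell
#!/bin/sh
# Sketch: branch on dfseo's documented exit codes.
run_dfseo() { sh -c 'exit 3'; }   # stand-in for a real dfseo call

rc=0
run_dfseo || rc=$?
case $rc in
  0) outcome="success" ;;
  2) outcome="auth failure: run dfseo init" ;;
  3) outcome="API error: inspect status_code and hint" ;;
  5) outcome="partial batch failure: retry failed chunks" ;;
  6) outcome="network/timeout: retry with backoff" ;;
  *) outcome="other failure ($rc)" ;;
esac
echo "$outcome"
# -> API error: inspect status_code and hint
```

The `|| rc=$?` form keeps the script alive under `set -e` while still capturing the code.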

LLM usage

If you're pointing Claude / GPT / any other LLM at this CLI, point it at skills/dfseo/SKILL.md — a concise consumption guide covering discovery, filter construction, cache-aware iteration, and error branching.

For Claude Code (and any harness that supports the Anthropic skills format), install the skill directly with:

npx skills add https://github.com/klixpert-io/dataforseo-cli --skill dfseo

That drops SKILL.md into your local skills directory so the agent loads it automatically — no copy/paste required.

License

Licensed under the Apache License 2.0.

Troubleshooting

  • no credentials configured → run dfseo init, or export DFSEO_LOGIN / DFSEO_PASSWORD.
  • jsonpath error → check your --filter syntax; the underlying library is ohler55/ojg. Filter expressions are evaluated against the stripped result by default; combine with --raw if you need to reach envelope-level fields.
  • Cache looks stale → run dfseo cache clean, or pass --no-cache once to bypass both read and write.
  • 40202 status_code → DataForSEO rate or plan limit. Check the hint in the error output; back off, or raise limits in the DataForSEO dashboard.
  • 40101 status_code → auth failed. Re-run dfseo init or verify the API password (not the dashboard login) in the DataForSEO console.
  • Linux without libsecret → use env vars; there is no file-backed credential fallback by design.
