v0.3.1

@meirk-brd meirk-brd released this 28 May 08:36
· 6 commits to main since this release

Scraper Studio gains AI self-healing: when a saved scraper drifts (selectors
move, a page is redesigned, output goes empty or partial), the agent fixes it
in place, so the collector_id keeps working and the scraper improves rather
than being rebuilt from scratch. The fix is human-in-the-loop by default:
heal stops at an approval gate, and a new approve command commits it.

All changes are additive — existing scraper create and scraper run
invocations behave exactly as before. This delivers the self-healing path
that v0.2.0 listed as planned.

Features

scraper heal — AI self-healing in place (#11)

bdata scraper heal <collector_id> "<prompt>" is the maintenance twin of
scraper create. It triggers Bright Data's AI self-healing flow
(POST /dca/collectors/{id}/refactor_template) and polls progress, reusing
the same async trigger→poll machinery (429 backoff, retry forwarding) as
create.
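
To make the trigger→poll behaviour concrete, here is a minimal Python sketch of a poll loop with 429 backoff. It is an illustration under stated assumptions, not the CLI's actual implementation: the function name, the exponential delay schedule, the cap of 5, and the in-progress status "running" are all hypothetical. Only the settled statuses (done, failed, awaiting_approval) come from these notes. The --timeout handling is left out for brevity.

```python
import time
from typing import Callable, Tuple

# Statuses named in these release notes at which polling can stop.
SETTLED = {"done", "failed", "awaiting_approval"}


def poll_until_settled(
    fetch: Callable[[], Tuple[int, str]],
    max_throttled: int = 5,      # hypothetical 429 cap
    base_delay: float = 1.0,     # hypothetical delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until the job settles, backing off on HTTP 429.

    fetch() is a caller-supplied function returning (http_status, job_status).
    """
    throttled = 0
    while True:
        http_status, job_status = fetch()
        if http_status == 429:
            throttled += 1
            if throttled > max_throttled:
                raise RuntimeError("429 cap exhausted")
            # Assumed schedule: double the wait after each throttled reply.
            sleep(base_delay * 2 ** (throttled - 1))
            continue
        if job_status in SETTLED:
            return job_status
        # Still in progress: wait one base interval and poll again.
        sleep(base_delay)
```

Passing `sleep` and `fetch` as parameters keeps the loop testable without any network access.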

bdata scraper heal c_xxx \
  "Price stopped extracting after the page redesign — it's now in span.price-now" \
  --url https://example.com/product/1 -o heal.json

  • You are the detector. The CLI never decides on its own that a scraper is
    broken — a heal is slow, billable, and mutating. You inspect the run output
    and decide. scraper run stays read-only — there is no --heal flag.
  • The collector_id is preserved — the scraper is improved, not replaced.
  • Required <prompt> (≤1000 chars, validated up front); name what's wrong
    and what the correct output should be.
  • Carries over --timeout, --max-retries / --no-retry, and all output
    flags (-o / --json / --pretty / --legacy-output / --timing / -k).
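
Because a heal is slow and billable, a calling script may want to check the prompt before invoking the CLI at all. The sketch below mirrors the ≤1000-character limit from the list above. The helper name is hypothetical, and two details are assumptions: that the limit counts characters as Python's len() does, and that surrounding whitespace can be trimmed. The CLI's own validation may differ.

```python
MAX_PROMPT_CHARS = 1000  # limit stated in the release notes


def check_heal_prompt(prompt: str) -> str:
    """Return the trimmed prompt, or raise ValueError if it is empty or too long."""
    cleaned = prompt.strip()  # assumption: surrounding whitespace is not significant
    if not cleaned:
        raise ValueError("heal prompt is required")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"heal prompt is {len(cleaned)} chars; limit is {MAX_PROMPT_CHARS}"
        )
    return cleaned
```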

Human-in-the-loop approval gate

By default, heal runs the fix and then stops at an approval gate rather
than committing it — exiting 0 with a status: "awaiting_approval" envelope
that carries preview_result (sample rows the fixed scraper would produce)
and a next_step pointing at scraper approve:

{
  "collector_id": "c_xxx",
  "status": "awaiting_approval",
  "preview_result": [ ... ],
  "next_step": "bdata scraper approve c_xxx --url https://example.com/product/1"
}

awaiting_approval is not a failure — the fix is ready and waiting for
your decision.
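
An agent or script can turn that envelope into an approve/reject decision. The following Python sketch shows one possible rule. The function name and the acceptance rule are hypothetical, and since the notes elide the contents of preview_result, the sketch assumes it is a list of flat JSON objects keyed by field name.

```python
from collections.abc import Iterable, Mapping


def review_heal(envelope: Mapping, required_fields: Iterable[str]) -> str:
    """Return 'approve', 'reject', or 'not_ready' for a heal envelope.

    Hypothetical rule: approve only if the preview has at least one row and
    every row carries a non-empty value for each required field.
    """
    if envelope.get("status") != "awaiting_approval":
        return "not_ready"
    rows = envelope.get("preview_result") or []
    fields = list(required_fields)
    ok = bool(rows) and all(
        isinstance(row, Mapping)
        and all(row.get(name) not in (None, "") for name in fields)
        for row in rows
    )
    return "approve" if ok else "reject"
```

A "reject" here would map to `bdata scraper approve <collector_id> --reject`, and an "approve" to the command given in the envelope's next_step.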

scraper approve — commit or reject a fix (#11)

bdata scraper approve <collector_id> commits a fix that heal left awaiting
approval (POST /dca/collectors/{id}/resume_automation_job, then polls to
done). On success, the envelope's next_step points at scraper run so you
can verify the committed fix.

# Commit the proposed fix
bdata scraper approve c_xxx --url https://example.com/product/1 -o approve.json

# Reject it and start over with a sharper prompt
bdata scraper approve c_xxx --reject

--auto-approve — fully autonomous heal

For unattended flows, heal --auto-approve approves the fix automatically and
polls through to done in one command:

bdata scraper heal c_xxx \
  "Reviews stopped extracting after the page redesign" --auto-approve

The self-healing loop

The intended agent flow: run → inspect → heal → approve → re-run to verify.

bdata scraper run c_xxx https://example.com/product/1 -o run.json   # 1. run
# 2. inspect run.json — if the data is wrong:
bdata scraper heal c_xxx "<what's wrong>" --url https://example.com/product/1 -o heal.json
# 3. review heal.json's preview_result, then commit:
bdata scraper approve c_xxx --url https://example.com/product/1 -o approve.json
# 4. re-run to verify the committed fix:
bdata scraper run c_xxx https://example.com/product/1 -o run.json
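
The same loop can be driven from a script. The Python sketch below chains the commands shown above. It assumes that `-o <file>` writes the JSON envelope to that file and that the CLI exits 0 on success. The helper names, the return labels, and the two caller-supplied predicates (looks_wrong, accept_preview) are hypothetical. The detection step stays with the caller, as these notes require.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import Any, Callable, Sequence


def run_bdata(args: Sequence[str], capture: bool = True) -> Any:
    """Run `bdata <args>`; with capture, add `-o <tmpfile>` and load its JSON."""
    if not capture:
        subprocess.run(["bdata", *args], check=True)
        return {}
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "out.json"
        subprocess.run(["bdata", *args, "-o", str(out)], check=True)
        return json.loads(out.read_text())


def heal_cycle(
    collector_id: str,
    url: str,
    prompt: str,
    looks_wrong: Callable[[Any], bool],     # caller decides if run output is broken
    accept_preview: Callable[[Any], bool],  # caller decides if the preview is good
    cli: Callable[..., Any] = run_bdata,
) -> str:
    """run -> inspect -> heal -> approve or reject -> re-run to verify."""
    if not looks_wrong(cli(["scraper", "run", collector_id, url])):
        return "healthy"
    heal = cli(["scraper", "heal", collector_id, prompt, "--url", url])
    if heal.get("status") != "awaiting_approval":
        return "heal_not_ready"
    if not accept_preview(heal.get("preview_result")):
        cli(["scraper", "approve", collector_id, "--reject"], capture=False)
        return "rejected"
    cli(["scraper", "approve", collector_id, "--url", url])
    verify = cli(["scraper", "run", collector_id, url])
    return "still_wrong" if looks_wrong(verify) else "healed"
```

Passing `cli` as a parameter lets the flow be exercised with a stub before it is pointed at a real collector.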

Non-destructive failure

A failed heal (429 cap exhausted, timeout, or a terminal failed status) leaves
the existing scraper unchanged and still working. This differs from create,
where a failure can leave a half-built collector. The recovery note says so.

Upgrade notes

  • No action required — fully additive, backward compatible.
  • scraper run is unchanged and remains read-only by design.

Full changelog: v0.3.0...v0.3.1 (PR #11)