v2.0.0b2 — Custom Provider System & Quality Pass

Pre-release

bluet released this 08 May 02:03
38ed5a2

🎯 Custom Proxy Provider System & Quality Pass

This release adds a user-extensible proxy provider system (the headline feature) and bundles a focused quality pass: parser bug fixes with regression tests, removal of recurring SAST false positives by fixing the underlying code (not just suppressing), modernization of %-formatting to f-strings, and isolation of the unverified-SSL bypass into a named helper.

Headline: Custom proxy providers via YAML/JSON

Put your YAML/JSON config files in a directory, mount it into the container, and proxybroker auto-loads them. No code, no PRs, no fork. This targets the Docker bind-mount workflow for entry-level / no-code users.

# /configs/my_api.yaml
type: api
url: https://my-proxy-api.example.com/v1/proxies
api_key: ${PROXY_API_KEY}
proxy_path: data.proxies

docker run -v $(pwd)/configs:/configs ghcr.io/bluet/proxybroker2 find --types HTTP --limit 10
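
The `${PROXY_API_KEY}` placeholder in the config above implies environment-variable substitution at load time. How proxybroker performs it is not shown in these notes, so the following is only a minimal sketch of the idea. The helper name `expand_env` is hypothetical, and raising on an unset variable (rather than expanding to an empty string) is an assumption.

```python
import os
import re

# Matches ${NAME} placeholders; NAME follows the usual env-var naming rules.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, environ=None):
    """Replace ${NAME} placeholders in a string with environment values.

    Hypothetical helper for illustration; not proxybroker's loader code.
    """
    env = os.environ if environ is None else environ

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Assumption: failing loudly beats sending an empty API key.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


config = {"type": "api", "api_key": "${PROXY_API_KEY}"}
resolved = {k: expand_env(v, {"PROXY_API_KEY": "s3cret"}) for k, v in config.items()}
print(resolved["api_key"])  # s3cret
```

With a scheme like this, the key stays out of the mounted config file and would be supplied to the container through its environment.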

Four provider helpers: SimpleProvider (text/CSV/JSON list endpoints with format autodetect), PaginatedProvider (numbered-page endpoints), APIProvider (JSON APIs with optional auth and dotted proxy_path), ConfigurableProvider (factory).
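
To make the dotted `proxy_path` concrete: a value such as `data.proxies` names a route through nested JSON objects to the list of proxies. The sketch below shows one way such a lookup could work. The function name `follow_path` is hypothetical, and returning an empty list on any mismatch is an assumption, not the documented behavior of APIProvider.

```python
def follow_path(payload, dotted_path):
    """Walk a dotted path such as "data.proxies" through nested dicts.

    Returns an empty list when a key is missing or an intermediate value
    is not a dict. Hypothetical helper, not proxybroker's implementation.
    """
    current = payload
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return []
        current = current[key]
    # Assumption: only a list is a usable result.
    return current if isinstance(current, list) else []


response = {"data": {"proxies": [{"ip": "1.2.3.4", "port": 8080}]}}
print(follow_path(response, "data.proxies"))  # [{'ip': '1.2.3.4', 'port': 8080}]
print(follow_path({"data": "error"}, "data.proxies"))  # []
```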

CLI / API

  • --provider-dir PATH (repeatable) on top-level and every subcommand
  • $PROXYBROKER_PROVIDER_DIR env fallback
  • /configs Docker convention (auto-loaded if directory exists)
  • Python: Broker(provider_dirs=[...]). Empty-list contract preserved: providers=[] means "no bundled defaults" (distinct from providers=None which means "use defaults")
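
The empty-list contract in the last bullet is easy to break with a truthiness check, so here is a small sketch of the distinction. `resolve_providers` and `BUNDLED_DEFAULTS` are hypothetical stand-ins, not names from proxybroker; only the `None` versus `[]` semantics come from the notes above.

```python
BUNDLED_DEFAULTS = ["bundled-provider-a", "bundled-provider-b"]  # placeholder names


def resolve_providers(providers=None):
    """Return the provider list a broker would start from.

    None -> the caller expressed no preference, so use the bundled defaults.
    []   -> the caller explicitly asked for no bundled defaults.
    """
    if providers is None:
        return list(BUNDLED_DEFAULTS)
    return list(providers)


print(resolve_providers())    # ['bundled-provider-a', 'bundled-provider-b']
print(resolve_providers([]))  # []
```

Writing `providers or BUNDLED_DEFAULTS` instead would treat `[]` like `None` and silently re-enable the defaults, which is exactly what the contract rules out.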

Security model

YAML/JSON loader uses yaml.safe_load and never executes Python from configs. The Python file loader (load_python_providers_from_directory) is opt-in only — not wired to the CLI; users must call it from their own code if they want it.

Parser bug fixes (with regression tests)

  • CSV: stdlib csv module, handles quoted fields with embedded commas (e.g. "Company, Inc",80)
  • Text: extracts only the leading digit run after : as the port (e.g. 1.2.3.4:8080:tag no longer crashes)
  • JSON: no duplicate proxies for items with both ip and host keys
  • JSON: unwraps single-level object-wrapped responses ({"proxies": [...]}, {"data": [...]}, etc.)
  • Paginated: an existing page= query param is replaced instead of being appended as a duplicate
  • API: proxy_path navigation handles non-dict values gracefully
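
Three of the fixes above can be illustrated with the standard library alone. The functions below are sketches of the techniques the bullets name (stdlib `csv` parsing, leading-digit port extraction, and `page=` replacement); the function names are hypothetical and the code is not taken from proxybroker's parsers.

```python
import csv
import io
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def parse_csv_row(line):
    """Split one CSV line with the stdlib parser so quoted commas survive."""
    return next(csv.reader(io.StringIO(line)))


def parse_text_proxy(line):
    """Return (host, port) from "host:port[...]", or None if no port follows.

    Only the leading digit run after the first colon is taken as the port.
    Assumption: IPv4 or hostname input; IPv6 literals are out of scope here.
    """
    match = re.match(r"\s*([^:\s]+):(\d+)", line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def set_page(url, page):
    """Return url with its page= parameter set to page, never duplicated."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(parse_csv_row('1.2.3.4,"Company, Inc",80'))  # ['1.2.3.4', 'Company, Inc', '80']
print(parse_text_proxy("1.2.3.4:8080:tag"))  # ('1.2.3.4', 8080)
print(set_page("https://example.com/list?page=1&type=http", 2))
# https://example.com/list?type=http&page=2
```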

Quality pass

  • Judge.is_working is now a @property (was a plain attribute) — removes 5 recurring SAST false positives
  • Unverified-SSL context construction extracted to _make_unverified_ssl_context_for_proxy_testing() helper
  • Eighteen %-formatting calls modernized to f-strings (they had been silently accumulating because the pre-commit hook was missing --unsafe-fixes; now fixed)
  • aiodns.DNSResolver() lazy-init for Python 3.14 compatibility
  • Dockerfile pinned to specific SHA256 digest of python:3.14-slim
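
The lazy-init item follows a common pattern: build the expensive object on first use instead of at construction time. The notes do not say why Python 3.14 needs this; a plausible reason is that the resolver wants an event loop, which may not exist yet when the owning object is created. The class below is a generic sketch with a hypothetical name and a stand-in factory, not proxybroker's resolver code.

```python
class LazyResolver:
    """Defer building an expensive resource until it is first needed."""

    def __init__(self, factory):
        # In the real code the factory would be something like
        # aiodns.DNSResolver; here it is any zero-argument callable.
        self._factory = factory
        self._resolver = None

    @property
    def resolver(self):
        # Built on first access, then cached for later calls.
        if self._resolver is None:
            self._resolver = self._factory()
        return self._resolver


calls = []


def fake_factory():
    """Stand-in factory that records how often it is invoked."""
    calls.append(1)
    return object()


holder = LazyResolver(fake_factory)
print(len(calls))  # 0
first = holder.resolver
second = holder.resolver
print(len(calls), first is second)  # 1 True
```

Exposing the resource through a `@property` keeps call sites unchanged while moving construction to the point of first use.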

Removed

  • proxybroker.utils.update_geoip_db() body. The function attempted to download from geolite.maxmind.com which has been NXDOMAIN since 2019-12-30 (MaxMind retired the unauthenticated endpoint, license key now required). Now raises RuntimeError linking to the tracking issue. Bundled GeoLite2 databases continue to work for runtime IP lookups.

Python support

3.10–3.14 (added 3.14, default in CI/Docker).

Tracked follow-ups for next releases

  • #200 — GeoIP database replacement strategy (DB-IP, IPLocate, license-key MaxMind, etc.)
  • #201 — IPv6 regex → stdlib ipaddress (eliminates ReDoS suppression)
  • #202 — Cognitive-complexity refactor in parsers (SonarCloud S3776)
  • #203 — argparse.FileType and asyncio.get_event_loop() deprecation migrations

Full changelog: https://github.com/bluet/proxybroker2/blob/v2.0.0b2/CHANGELOG.md
PR: #199