wasserportal (v0.7.0): R Package with Functions for Scraping Data of Wasserportal Berlin
Latestwasserportal 0.7.0 2026-06-19
This release extends the ThingsBoard integration shipped in 0.6.0 so it also
works against self-hosted ThingsBoard instances and can push the entire
Wasserportal groundwater archive, not just the five-station demo.
Self-hosted ThingsBoard authentication
- New
tb_login()plus username/password (JWT) authentication across all
ThingsBoard tenant-API helpers (tb_setup_devices(),tb_get_device_id(),
tb_list_device_telemetry_keys(),tb_delete_device_telemetry()).
Self-hosted Community Edition has no account-level API keys (a ThingsBoard
Cloud convenience), so it is reached viaPOST /api/auth/login→ a
short-lived JWT sent asX-Authorization: Bearer <token>. SetTB_USERNAMETB_PASSWORD(these win overTB_API_KEYwhen both are present); the
Cloud API-key path keeps working unchanged. Thethingsboard-push.yaml
workflow reads the two new credentials from the matching repository secrets.
tb_login()retries on{408, 429, 500, 502, 503, 504}with up to 4 tries,
sotb_setup_devices()survives a cold-start 5xx from a self-hosted
ThingsBoard sitting behind nginx / a load balancer.tb_auth_header()warns when only one ofTB_USERNAME/TB_PASSWORD
is set and a leftoverTB_API_KEYsilently falls back to the Cloud path,
so the common "one secret missing" misconfiguration is called out instead
of failing obscurely downstream.- Non-2xx responses surface a server response-body excerpt (≤ ~800 chars) in
R errors and retry messages. Stock ThingsBoard does not echo credentials,
but operators whose reverse proxy echoes request fields should mask secrets
in their CI logs. - Internal tidy-up:
tb_list_device_telemetry_keys()keeps a clean
api_key/username/passwordsignature; the pre-resolved-header path
used bytb_delete_device_telemetry()moved into a private helper, so there
is no ambiguous credential precedence on the exported function.
Push every station, not just the demo five
TB_MAX_DEVICES=0lifts the 5-device demo cap and pushes every selected
station -- now wired into the workflow as a repository secret and a
workflow_dispatchinput.TB_STATION_SCOPEchooses which groundwater stations the auto-pick
considers:both(default -- level and quality, the proven demo set),
any(level or quality),gwl/gwq(has that series, possibly both)
orgwl-only/gwq-only(has only that series).TB_TELEMETRY_TYPES
still decides which series get pushed per station.- Station scoring counts distinct quality parameters once via
split()+
vapply(), so scoring the full several-hundred-station pool stays fast. - The selection diagnostic is clearer and self-reconciling: the strict
"both masters + both series" row is labelled distinctly from the relaxed
per-series counts, a reconciling row is added so the numbers add up, and
orphan stations (IDs that have data but are missing from both master
files, and are therefore dropped from the candidate pool) are flagged with
a message -- so master/data drift is visible instead of silent. - The push script validates its numeric
TB_*variables up front
(TB_MAX_DEVICES,TB_HISTORY_DAYS,TB_CHUNK_SIZE,TB_THROTTLE_SECONDS,
TB_MAX_ACTIVE) and aborts with a clear message on a non-numeric value,
instead of letting anNAcrash mid-push after every device attribute set
was already uploaded. The message flags the usual cause --.Renvirondoes
not support inline# comments.
For a full self-hosted push set
TB_PLAN=ce,TB_MAX_DEVICES=0and
TB_STATION_SCOPE=any. Mind the volume -- several hundred devices over the
full archive is millions of data points; start with a bounded
TB_HISTORY_DAYSto validate the run.
Documentation & packaging
- Update the Kompetenzzentrum Wasser Berlin (KWB) author logo on the pkgdown
site to the new brand asset. - The package now requires R ≥ 4.1.0 (
Depends), reflecting the native
|>pipe used in the ThingsBoard code; the unusedLazyDatafield was
removed.