wasserportal (v0.7.0): R Package with Functions for Scraping Data of Wasserportal Berlin

@mrustl mrustl released this 19 Jun 10:00
c8965aa

wasserportal 0.7.0 2026-06-19

This release extends the ThingsBoard integration shipped in 0.6.0 so it also
works against self-hosted ThingsBoard instances and can push the entire
Wasserportal groundwater archive, not just the five-station demo.

Self-hosted ThingsBoard authentication

  • New tb_login() plus username/password (JWT) authentication across all
    ThingsBoard tenant-API helpers (tb_setup_devices(), tb_get_device_id(),
    tb_list_device_telemetry_keys(), tb_delete_device_telemetry()).
    Self-hosted Community Edition has no account-level API keys (a ThingsBoard
    Cloud convenience), so it is reached via POST /api/auth/login → a
    short-lived JWT sent as X-Authorization: Bearer <token>. Set TB_USERNAME
    and TB_PASSWORD (these win over TB_API_KEY when both are present); the
    Cloud API-key path keeps working unchanged. The thingsboard-push.yaml
    workflow reads the two new credentials from the matching repository secrets.
  • tb_login() retries on {408, 429, 500, 502, 503, 504} with up to 4 tries,
    so tb_setup_devices() survives a cold-start 5xx from a self-hosted
    ThingsBoard sitting behind nginx / a load balancer.
  • tb_auth_header() warns when only one of TB_USERNAME / TB_PASSWORD
    is set and a leftover TB_API_KEY makes the call fall back silently to the
    Cloud path, so the common "one secret missing" misconfiguration is called
    out instead of failing obscurely downstream.
  • Non-2xx responses surface a server response-body excerpt (≤ ~800 chars) in
    R errors and retry messages. Stock ThingsBoard does not echo credentials,
    but operators whose reverse proxy echoes request fields should mask secrets
    in their CI logs.
  • Internal tidy-up: tb_list_device_telemetry_keys() keeps a clean
    api_key / username / password signature; the pre-resolved-header path
    used by tb_delete_device_telemetry() moved into a private helper, so there
    is no ambiguous credential precedence on the exported function.
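
The credential precedence and the retry policy described above can be sketched in base R. This is a minimal illustration, not the package's implementation: resolve_tb_auth() and with_retry() are hypothetical names, the request is modelled as any function returning a list with a status element, and the sketch does not wait between tries, whereas real code would usually back off.

```r
# Hypothetical helper (not part of wasserportal): decide which credential
# path applies, following the precedence described in the release notes.
resolve_tb_auth <- function(username = Sys.getenv("TB_USERNAME"),
                            password = Sys.getenv("TB_PASSWORD"),
                            api_key  = Sys.getenv("TB_API_KEY")) {
  has_user <- nzchar(username)
  has_pass <- nzchar(password)
  has_key  <- nzchar(api_key)

  # Username + password together win over an API key.
  if (has_user && has_pass) {
    return("jwt")
  }
  if (has_key) {
    # Exactly one of the two JWT credentials is set: call out the fallback.
    if (xor(has_user, has_pass)) {
      warning("Only one of TB_USERNAME / TB_PASSWORD is set; ",
              "falling back to TB_API_KEY", call. = FALSE)
    }
    return("api_key")
  }
  stop("No usable ThingsBoard credentials found", call. = FALSE)
}

# Status codes the release notes list as retryable.
retry_statuses <- c(408L, 429L, 500L, 502L, 503L, 504L)

# Hypothetical retry wrapper: `request` is any zero-argument function that
# returns a list with an integer `status` element.
with_retry <- function(request, max_tries = 4L, wait_seconds = 0) {
  for (attempt in seq_len(max_tries)) {
    response <- request()
    retryable <- response$status %in% retry_statuses
    # Stop on success, on a non-retryable status, or on the last try.
    if (!retryable || attempt == max_tries) {
      return(response)
    }
    Sys.sleep(wait_seconds)
  }
}

# Simulated cold start: two 503 responses, then a 200.
calls <- 0L
flaky <- function() {
  calls <<- calls + 1L
  list(status = if (calls < 3L) 503L else 200L)
}
result <- with_retry(flaky)
```

Passing the credentials as arguments with environment-variable defaults keeps the decision testable without touching the process environment.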

Push every station, not just the demo five

  • TB_MAX_DEVICES=0 lifts the 5-device demo cap and pushes every selected
    station -- now wired into the workflow as a repository secret and a
    workflow_dispatch input.
  • TB_STATION_SCOPE chooses which groundwater stations the auto-pick
    considers: both (default -- level and quality, the proven demo set),
    any (level or quality), gwl / gwq (has that series, possibly both)
    or gwl-only / gwq-only (has only that series). TB_TELEMETRY_TYPES
    still decides which series get pushed per station.
  • Station scoring counts distinct quality parameters once via split() +
    vapply(), so scoring the full several-hundred-station pool stays fast.
  • The selection diagnostic is clearer and self-reconciling: the strict
    "both masters + both series" row is labelled distinctly from the relaxed
    per-series counts, a reconciling row is added so the numbers add up, and
    orphan stations (IDs that have data but are missing from both master
    files, and are therefore dropped from the candidate pool) are flagged with
    a message -- so master/data drift is visible instead of silent.
  • The push script validates its numeric TB_* variables up front
    (TB_MAX_DEVICES, TB_HISTORY_DAYS, TB_CHUNK_SIZE, TB_THROTTLE_SECONDS,
    TB_MAX_ACTIVE) and aborts with a clear message on a non-numeric value,
    instead of letting an NA crash mid-push after every device attribute set
    was already uploaded. The message flags the usual cause -- .Renviron does
    not support inline # comments.
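
As a rough illustration of the scope values and the split() + vapply() scoring mentioned above, the following base-R sketch uses made-up data. The function names and the columns station_id, has_gwl, has_gwq and parameter are assumptions for this example; the package's real data structures may differ.

```r
# Hypothetical filter: keep the stations matching a TB_STATION_SCOPE value.
# `stations` is assumed to carry logical columns has_gwl and has_gwq.
filter_by_scope <- function(stations, scope = "both") {
  keep <- switch(
    scope,
    "both"     = stations$has_gwl & stations$has_gwq,
    "any"      = stations$has_gwl | stations$has_gwq,
    "gwl"      = stations$has_gwl,
    "gwq"      = stations$has_gwq,
    "gwl-only" = stations$has_gwl & !stations$has_gwq,
    "gwq-only" = stations$has_gwq & !stations$has_gwl,
    stop("Unknown TB_STATION_SCOPE: ", scope, call. = FALSE)
  )
  stations[keep, , drop = FALSE]
}

# Count distinct quality parameters per station in one pass:
# split() groups the parameter column once, vapply() reduces each group.
count_distinct_parameters <- function(quality) {
  by_station <- split(quality$parameter, quality$station_id)
  vapply(by_station, function(p) length(unique(p)), integer(1))
}

# Made-up example data.
stations <- data.frame(
  station_id = c("A", "B", "C"),
  has_gwl    = c(TRUE, TRUE, FALSE),
  has_gwq    = c(TRUE, FALSE, TRUE)
)
quality <- data.frame(
  station_id = c("A", "A", "A", "C"),
  parameter  = c("nitrate", "nitrate", "sulfate", "nitrate")
)

both_ids <- filter_by_scope(stations, "both")$station_id
scores   <- count_distinct_parameters(quality)
```

Grouping once and reducing each group avoids re-scanning the whole quality table per station, which is the property that matters when the pool grows to several hundred stations.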

For a full self-hosted push, set TB_PLAN=ce, TB_MAX_DEVICES=0 and
TB_STATION_SCOPE=any. Mind the volume -- several hundred devices over the
full archive is millions of data points; start with a bounded
TB_HISTORY_DAYS to validate the run.
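
A bounded trial run along these lines could be configured from R as sketched below. The value 30 for TB_HISTORY_DAYS is only an illustrative choice, and read_numeric_env() is a hypothetical helper that mimics the up-front numeric validation described above; it is not the push script's actual code.

```r
# Settings for a full self-hosted push, with a bounded history window.
# TB_HISTORY_DAYS = "30" is an assumed example value for a first trial run.
Sys.setenv(
  TB_PLAN          = "ce",
  TB_MAX_DEVICES   = "0",
  TB_STATION_SCOPE = "any",
  TB_HISTORY_DAYS  = "30"
)

# Hypothetical validator: read an environment variable and fail early with
# a clear message if it is unset or not numeric.
read_numeric_env <- function(name) {
  raw   <- Sys.getenv(name)
  value <- suppressWarnings(as.numeric(raw))
  if (is.na(value)) {
    stop(sprintf(
      "%s must be numeric, got '%s'. Note: .Renviron does not support inline # comments.",
      name, raw
    ), call. = FALSE)
  }
  value
}

# Validate before any upload starts.
settings <- vapply(
  c("TB_MAX_DEVICES", "TB_HISTORY_DAYS"),
  read_numeric_env,
  numeric(1)
)
```

Checking every numeric variable before the first request means a typo such as `TB_CHUNK_SIZE=500 # rows` is reported immediately rather than after device attributes have already been uploaded.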

Documentation & packaging

  • Update the Kompetenzzentrum Wasser Berlin (KWB) author logo on the pkgdown
    site to the new brand asset.
  • The package now requires R ≥ 4.1.0 (Depends), reflecting the native
    |> pipe used in the ThingsBoard code; the unused LazyData field was
    removed.