
wasserportal (v0.6.0): R Package with Functions for Scraping Data of Wasserportal Berlin


mrustl released this on 17 Jun 14:01
Commit 5a3dac1

wasserportal 0.6.0 2026-06-17

This release adds a complete ThingsBoard integration: push Wasserportal
groundwater time series and master data into a ThingsBoard tenant, visualise
them on a ready-made dashboard, and keep everything up to date from GitHub
Actions. The remaining changes harden the historical push so it survives
multi-hour runs against the ThingsBoard Cloud free tier.

New: ThingsBoard integration

  • Push API: new exported helpers tb_setup_devices(),
    tb_push_station_telemetry(), tb_push_station_attributes() and
    tb_push_latest_telemetry() ship Wasserportal time series and master data
    into a ThingsBoard tenant via the device-token telemetry API.
    tb_setup_devices() bootstraps a fresh tenant from an account-level API key
    (sent as X-Authorization: ApiKey <key>), so the whole workflow runs from R.
  • Device discovery & cleanup: tb_get_device_id() and
    tb_list_device_telemetry_keys() provide read-only discovery, and
    tb_delete_device_telemetry() deletes telemetry selectively (latitude/longitude
    and other attributes are preserved, so map widgets keep working after a wipe).
  • Demo vignette: vignettes/thingsboard-demo.Rmd walks through the
    ThingsBoard Cloud free-tier (Maker) demo on eu.thingsboard.cloud, including
    the switch to self-hosted Community Edition.
  • Importable dashboard: inst/extdata/thingsboard-dashboard.json contains an
    OpenStreetMap view of the five Berlin groundwater stations, a master-data table and
    two time-series charts (level + selected quality parameters). Widgets discover
    the wasserportal-gw-* devices via an entityName-prefix alias, so the import
    needs no hardcoded device IDs.
  • Automation: .github/workflows/thingsboard-push.yaml runs
    inst/scripts/push_to_thingsboard.R on push to main/master/dev, daily at
    07:00 UTC and via workflow_dispatch. The script consumes the daily JSON
    artefacts published to gh-pages (no scrape of its own), auto-selects the five
    groundwater stations with the richest gwl + gwq history, and uploads master
    data as attributes plus level and quality series as telemetry. Credentials come
    from the TB_HOST / TB_API_KEY repository secrets.
  • Geocoding: Rechtswert_UTM_33_N / Hochwert_UTM_33_N (ETRS89 / UTM 33N,
    EPSG:25833) are converted to WGS84 latitude / longitude attributes so
    ThingsBoard map widgets work out of the box.
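The geocoding step turns ETRS89 / UTM 33N easting/northing pairs into latitude/longitude. The package's own R implementation is not shown in these notes, so the block below is a hypothetical, dependency-free Python sketch using Snyder's transverse Mercator series (USGS Professional Paper 1395) on the GRS80 ellipsoid. It assumes northern-hemisphere coordinates and ignores the small (metre-level) offset between ETRS89 and WGS84, which does not matter for placing markers on a map. The names utm33n_to_wgs84() and wgs84_to_utm33n() are illustrative, not package exports.

```python
import math

# GRS80 ellipsoid (the ellipsoid of ETRS89).
A_AXIS = 6378137.0
FLATTENING = 1 / 298.257222101
E2 = FLATTENING * (2 - FLATTENING)   # first eccentricity squared
EP2 = E2 / (1 - E2)                  # second eccentricity squared
K0 = 0.9996                          # UTM scale factor on the central meridian
FALSE_EASTING = 500_000.0
LON0 = math.radians(15.0)            # central meridian of UTM zone 33


def _meridian_arc(phi: float) -> float:
    """Distance along the meridian from the equator to latitude phi (radians)."""
    e4, e6 = E2 ** 2, E2 ** 3
    return A_AXIS * (
        (1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi
        - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
        + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
        - (35 * e6 / 3072) * math.sin(6 * phi)
    )


def wgs84_to_utm33n(lat: float, lon: float) -> tuple[float, float]:
    """Forward projection (degrees -> metres); used here to check the inverse."""
    phi = math.radians(lat)
    n = A_AXIS / math.sqrt(1 - E2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = EP2 * math.cos(phi) ** 2
    a = (math.radians(lon) - LON0) * math.cos(phi)
    x = K0 * n * (a + (1 - t + c) * a**3 / 6
                  + (5 - 18 * t + t**2 + 72 * c - 58 * EP2) * a**5 / 120)
    y = K0 * (_meridian_arc(phi) + n * math.tan(phi) * (
        a**2 / 2 + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
        + (61 - 58 * t + t**2 + 600 * c - 330 * EP2) * a**6 / 720))
    return FALSE_EASTING + x, y


def utm33n_to_wgs84(easting: float, northing: float) -> tuple[float, float]:
    """Inverse projection: UTM 33N (Rechtswert, Hochwert) in metres -> (lat, lon) in degrees."""
    x = easting - FALSE_EASTING
    m = northing / K0                 # false northing is 0 in the northern hemisphere
    mu = m / (A_AXIS * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    # Footpoint latitude: the latitude whose meridian arc equals m.
    phi1 = (mu + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))
    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = EP2 * cos1**2
    t1 = tan1**2
    n1 = A_AXIS / math.sqrt(1 - E2 * sin1**2)
    r1 = A_AXIS * (1 - E2) / (1 - E2 * sin1**2) ** 1.5
    d = x / (n1 * K0)
    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2 - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2) * d**6 / 720)
    lon = LON0 + (d - (1 + 2 * t1 + c1) * d**3 / 6
                  + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2) * d**5 / 120) / cos1
    return math.degrees(lat), math.degrees(lon)


if __name__ == "__main__":
    # A point in central Berlin, roughly 108 km west of the zone's central meridian.
    print(utm33n_to_wgs84(392_000.0, 5_820_000.0))
```

In practice an R package would more likely delegate this to PROJ (for example via sf::st_transform()) than hand-roll the series; the sketch just shows what the EPSG:25833 to latitude/longitude step computes.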

Reliability of large historical pushes

  • Transport-error retries: every httr2::req_retry() call now uses
    retry_on_failure = TRUE, so TCP/TLS dropouts ("Broken pipe", peer-closed
    sessions, brief DNS hiccups) are retried with exponential backoff
    (2 / 4 / 8 / 16 s) instead of aborting a station mid-push.
  • Batch-level retry: each parallel mode = "single" batch is wrapped in a
    4-attempt retry (2 / 4 / 8 s) that forces a fresh libcurl connection,
    recovering from poisoned connection-pool handles. Safe because ThingsBoard
    de-duplicates telemetry by (ts, key), so re-POSTs never create duplicate rows.
  • Parallel single-mode push: mode = "single" now uses
    httr2::req_perform_parallel() (max_active, default 10 on Free), lifting
    throughput from ~1.2 to ~10 records/s. Batches are paced one max_active group
    at a time and retried on transient 500/502/503/504 to stay under the Free
    tier's per-device rate limit.
  • Plan presets: tb_plan_defaults() and the TB_PLAN env var pick
    mode / chunk_size / throttle_seconds / max_active per ThingsBoard plan
    (free, free-bulk, prototype/pilot/startup/business, ce).
    TB_TELEMETRY_MODE, TB_CHUNK_SIZE, TB_THROTTLE_SECONDS and TB_MAX_ACTIVE
    override individual values; plan, station_ids, history_days and
    telemetry_types are exposed as workflow_dispatch inputs.
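The batch-level retry and the (ts, key) de-duplication argument fit together roughly as follows. This is a hypothetical Python sketch, not the package's httr2 code: retry_with_backoff(), TransientError and FakeTelemetryStore are illustrative names, the fake store assumes rows are keyed by (ts, key) as described above, and the real implementation's "fresh connection per attempt" detail is not modelled.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (HTTP 500/502/503/504, dropped connection)."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); after each TransientError wait 2, 4, 8 ... seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be >= 1")


class FakeTelemetryStore:
    """Stand-in for one device's telemetry, keyed by (ts, key) so re-POSTs overwrite."""

    def __init__(self, fail_first_n: int = 0):
        self.rows: dict[tuple[int, str], float] = {}
        self.calls = 0
        self.fail_first_n = fail_first_n

    def post(self, records: list[dict]) -> int:
        self.calls += 1
        for rec in records:
            for key, value in rec["values"].items():
                self.rows[(rec["ts"], key)] = value  # same (ts, key) replaces, never duplicates
        if self.calls <= self.fail_first_n:
            # The write landed, but the client only sees an error (e.g. a lost response).
            raise TransientError(f"HTTP 502 on call {self.calls}")
        return len(records)


if __name__ == "__main__":
    delays: list[float] = []
    store = FakeTelemetryStore(fail_first_n=2)
    batch = [{"ts": 1_700_000_000_000 + i, "values": {"gwl": 34.0 + i}} for i in range(3)]
    retry_with_backoff(lambda: store.post(batch), sleep=delays.append)
    print(delays, store.calls, len(store.rows))  # prints [2.0, 4.0] 3 3
```

Even though the whole batch is posted three times, the store ends up with three rows rather than nine, which is the property that makes blind batch re-POSTs safe.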

Data-handling fixes for the Maker free tier

  • Per-triple records: single mode sends one record per
    (timestamp, key, value) triple instead of grouping ~30 analytes per sampling
    event into one "fat" record, which the Cloud Maker gateway rejected with an
    opaque HTTP 500.
  • Key sanitisation: sanitize_tb_key() folds umlauts, drops parentheses
    and replaces spaces / dots / commas with underscores, so quality parameters
    like Leitfaehigkeit 25 grd C vor Ort or pH-Wert (Feld) push through.
  • Pre-1970 timestamps dropped: build_telemetry_payload() keeps only records with
    ts_ms > 0; stations starting in the 1950s produced negative epoch
    milliseconds that the Maker tier answered with HTTP 500 (e.g. station 3 loses
    ~17 years of monthly readings but keeps ~7800 values).
  • Clearer errors: tb_error_body() surfaces ThingsBoard's JSON message
    field in R errors instead of the generic "HTTP 500 Internal Server Error".
  • Removed the per-device tb_push_latest_telemetry() smoke test from the push
    script (it left a misleading "latest" row); the helper stays exported for
    ad-hoc connectivity probes.
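The three Maker-tier payload fixes (per-triple records, key sanitisation and the ts_ms > 0 filter) combine naturally into one small payload builder. The sketch below is a hypothetical Python illustration, not sanitize_tb_key() or build_telemetry_payload() themselves: sanitize_key(), to_epoch_ms() and build_triple_records() are illustrative names, the key rules are applied exactly as listed above (plus ß to ss, which is an assumption), other characters such as hyphens are left alone, and records use ThingsBoard's {"ts": ..., "values": {...}} telemetry JSON shape.

```python
from datetime import datetime, timezone

# Assumed folding table: the notes say "folds umlauts"; ß -> ss is an extra assumption.
UMLAUT_FOLDS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def sanitize_key(name: str) -> str:
    """Hypothetical re-implementation of the documented key rules."""
    for char, replacement in UMLAUT_FOLDS.items():
        name = name.replace(char, replacement)
    name = name.replace("(", "").replace(")", "")  # drop parentheses
    for separator in (" ", ".", ","):
        name = name.replace(separator, "_")        # spaces, dots, commas -> "_"
    return name


def to_epoch_ms(moment: datetime) -> int:
    """Milliseconds since 1970-01-01 UTC; negative for earlier dates."""
    return round(moment.timestamp() * 1000)


def build_triple_records(events: list[tuple[int, dict[str, float]]]) -> list[dict]:
    """Turn (ts_ms, {analyte: value}) sampling events into one record per triple."""
    records = []
    for ts_ms, values in events:
        if ts_ms <= 0:
            continue  # keep only ts_ms > 0, as described above
        for name, value in values.items():
            records.append({"ts": ts_ms, "values": {sanitize_key(name): value}})
    return records


if __name__ == "__main__":
    events = [
        (to_epoch_ms(datetime(1955, 3, 1, tzinfo=timezone.utc)), {"pH-Wert (Feld)": 7.0}),
        (to_epoch_ms(datetime(2024, 5, 2, 8, 0, tzinfo=timezone.utc)),
         {"pH-Wert (Feld)": 7.2, "Leitfähigkeit 25 grd C vor Ort": 812.0}),
    ]
    for record in build_triple_records(events):
        print(record)
```

The 1955 event is dropped because its epoch timestamp is negative, and the remaining sampling event fans out into one record per analyte instead of a single "fat" record.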

Notes

  • The free-bulk preset is kept as a reproducible baseline but is confirmed
    not to work on the public Cloud Maker tier (as of 2026-05): the gateway
    rejects the array form regardless of chunk size. The default free preset
    (single mode) is the proven end-to-end path.
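To make the difference between the two presets concrete: ThingsBoard's device telemetry endpoint accepts either a single JSON object or a JSON array of objects, and it is the array form that the Maker gateway rejects. The hypothetical helper below (telemetry_bodies(), with "bulk" as an assumed mode name) only serialises request bodies; it sends nothing and is not the package's code.

```python
import json


def telemetry_bodies(records: list[dict], mode: str = "single", chunk_size: int = 100) -> list[str]:
    """Serialise telemetry records into HTTP request bodies.

    mode="single": one JSON object per request (the path that works on Maker).
    mode="bulk" (assumed name): JSON arrays of up to chunk_size objects.
    """
    if mode == "single":
        return [json.dumps(rec) for rec in records]
    if mode == "bulk":
        if chunk_size < 1:
            raise ValueError("chunk_size must be >= 1")
        return [json.dumps(records[i:i + chunk_size])
                for i in range(0, len(records), chunk_size)]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    recs = [{"ts": 1_700_000_000_000 + i, "values": {"gwl": float(i)}} for i in range(5)]
    print(len(telemetry_bodies(recs)), len(telemetry_bodies(recs, mode="bulk", chunk_size=2)))
```

Single mode trades five requests for five records against three requests in bulk mode; on the Maker tier only the former is accepted, which is why throughput there comes from parallelism (max_active) rather than larger payloads.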