wasserportal (v0.6.0): R Package with Functions for Scraping Data of Wasserportal Berlin
wasserportal 0.6.0 2026-06-17
This release adds a complete ThingsBoard integration: push Wasserportal
groundwater time series and master data into a ThingsBoard tenant, visualise
them on a ready-made dashboard, and keep everything up to date from GitHub
Actions. The remaining changes harden the historical push so it survives
multi-hour runs against the ThingsBoard Cloud free tier.
New: ThingsBoard integration
- Push API: new exported helpers `tb_setup_devices()`,
  `tb_push_station_telemetry()`, `tb_push_station_attributes()` and
  `tb_push_latest_telemetry()` ship Wasserportal time series and master data
  into a ThingsBoard tenant via the device-token telemetry API.
  `tb_setup_devices()` bootstraps a fresh tenant from an account-level API key
  (sent as `X-Authorization: ApiKey <key>`), so the whole workflow runs from R.
- Device discovery & cleanup: `tb_get_device_id()`,
  `tb_list_device_telemetry_keys()` and `tb_delete_device_telemetry()` for
  read-only discovery and selective telemetry deletion (latitude/longitude and
  other attributes are preserved, so map widgets keep working after a wipe).
- Demo vignette: `vignettes/thingsboard-demo.Rmd` walks through the
  ThingsBoard Cloud free-tier (Maker) demo on `eu.thingsboard.cloud`, including
  the switch to self-hosted Community Edition.
- Importable dashboard: `inst/extdata/thingsboard-dashboard.json`: an
  OpenStreetMap of the five Berlin groundwater stations, a master-data table and
  two time-series charts (level + selected quality parameters). Widgets discover
  the `wasserportal-gw-*` devices via an `entityName`-prefix alias, so the import
  needs no hardcoded device IDs.
- Automation: `.github/workflows/thingsboard-push.yaml` runs
  `inst/scripts/push_to_thingsboard.R` on push to `main`/`master`/`dev`, daily at
  07:00 UTC and via `workflow_dispatch`. The script consumes the daily JSON
  artefacts published to `gh-pages` (no scrape of its own), auto-selects the five
  groundwater stations with the richest gwl + gwq history, and uploads master
  data as attributes plus level and quality series as telemetry. Credentials come
  from the `TB_HOST`/`TB_API_KEY` repository secrets.
- Geocoding: `Rechtswert_UTM_33_N`/`Hochwert_UTM_33_N` (ETRS89 / UTM 33N,
  EPSG:25833) are converted to WGS84 `latitude`/`longitude` attributes so
  ThingsBoard map widgets work out of the box.
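
The coordinate conversion in the geocoding item can be illustrated with a minimal sketch. It assumes the `sf` package is installed and uses a made-up easting/northing pair near central Berlin; the release notes do not say how the package performs the conversion internally, so this is one possible approach, not its actual implementation.

```r
library(sf)

# Hypothetical station row; the coordinate values are illustrative only.
stations <- data.frame(
  station_id = "demo",
  Rechtswert_UTM_33_N = 392000,  # easting in metres
  Hochwert_UTM_33_N = 5820000    # northing in metres
)

# Declare the source CRS (ETRS89 / UTM 33N) and reproject to WGS84.
pts_utm <- st_as_sf(
  stations,
  coords = c("Rechtswert_UTM_33_N", "Hochwert_UTM_33_N"),
  crs = 25833
)
pts_wgs84 <- st_transform(pts_utm, 4326)

# st_coordinates() returns a matrix with columns X (longitude) and Y (latitude).
xy <- st_coordinates(pts_wgs84)
stations$longitude <- xy[, "X"]
stations$latitude <- xy[, "Y"]

print(stations[, c("station_id", "latitude", "longitude")])
```

The resulting `latitude`/`longitude` columns are in decimal degrees and could then be stored as device attributes, which is what map widgets typically read.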
Reliability of large historical pushes
- Transport-error retries: every `httr2::req_retry()` call now uses
  `retry_on_failure = TRUE`, so TCP/TLS dropouts ("Broken pipe", peer-closed
  sessions, brief DNS hiccups) are retried with exponential backoff
  (2 / 4 / 8 / 16 s) instead of aborting a station mid-push.
- Batch-level retry: each parallel `mode = "single"` batch is wrapped in a
  4-attempt retry (2 / 4 / 8 s) that forces a fresh libcurl connection,
  recovering from poisoned connection-pool handles. This is safe because
  ThingsBoard de-duplicates telemetry by `(ts, key)`, so re-POSTs never create
  duplicate rows.
- Parallel single-mode push: `mode = "single"` now uses
  `httr2::req_perform_parallel()` (`max_active`, default 10 on Free), lifting
  throughput from ~1.2 to ~10 records/s. Batches are paced one `max_active` group
  at a time and retried on transient 500/502/503/504 to stay under the Free
  tier's per-device rate limit.
- Plan presets: `tb_plan_defaults()` and the `TB_PLAN` env var pick
  `mode`/`chunk_size`/`throttle_seconds`/`max_active` per ThingsBoard plan
  (`free`, `free-bulk`, `prototype`/`pilot`/`startup`/`business`, `ce`).
  `TB_TELEMETRY_MODE`, `TB_CHUNK_SIZE`, `TB_THROTTLE_SECONDS` and `TB_MAX_ACTIVE`
  override individual values; `plan`, `station_ids`, `history_days` and
  `telemetry_types` are exposed as `workflow_dispatch` inputs.
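
The transport-error retry settings can be sketched with `httr2` (version 1.0.0 or later, where `retry_on_failure` is available). The host, device token, payload and `max_tries = 5` below are assumptions chosen so that the four waits match the 2 / 4 / 8 / 16 s schedule; the status codes are borrowed from the parallel-push item. This is not the package's actual request code.

```r
library(httr2)

# Exponential backoff: attempt 1 waits 2 s, then 4, 8 and 16 s.
backoff_seconds <- function(attempt) 2^attempt

# Placeholder host and token (hypothetical); nothing is sent until req_perform().
req <- request("https://thingsboard.example.invalid") |>
  req_url_path_append("api", "v1", "HYPOTHETICAL_DEVICE_TOKEN", "telemetry") |>
  req_body_json(list(ts = 1700000000000, values = list(gwl = 32.41))) |>
  req_retry(
    max_tries = 5,            # assumption: five tries give four backoff waits
    retry_on_failure = TRUE,  # also retry TCP/TLS/DNS failures, not only HTTP errors
    is_transient = function(resp) resp_status(resp) %in% c(500, 502, 503, 504),
    backoff = backoff_seconds
  )

print(req)
# resp <- req_perform(req)  # uncomment with a real host and device token
```

Without `retry_on_failure = TRUE`, `req_retry()` only retries responses flagged by `is_transient`, so a dropped connection would surface as an error on the first attempt.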
Data-handling fixes for the Maker free tier
- Per-triple records: single mode sends one record per
  `(timestamp, key, value)` triple instead of grouping ~30 analytes per sampling
  event into one "fat" record, which the Cloud Maker gateway rejected with an
  opaque HTTP 500.
- Key sanitisation: `sanitize_tb_key()` folds umlauts, drops parentheses
  and replaces spaces / dots / commas with underscores, so quality parameters
  like `Leitfaehigkeit 25 grd C vor Ort` or `pH-Wert (Feld)` push through.
- Pre-1970 timestamps dropped: `build_telemetry_payload()` keeps only rows
  with `ts_ms > 0`; stations starting in the 1950s produced negative epoch
  milliseconds that the Maker tier answered with HTTP 500 (e.g. station 3 loses
  ~17 years of monthly readings but keeps ~7800 values).
- Clearer errors: `tb_error_body()` surfaces ThingsBoard's JSON `message`
  field in R errors instead of the generic "HTTP 500 Internal Server Error".
- Removed the per-device `tb_push_latest_telemetry()` smoke test from the push
  script (it left a misleading "latest" row); the helper stays exported for
  ad-hoc connectivity probes.
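
The three data-handling steps above (key sanitisation, dropping pre-1970 timestamps, one record per triple) can be sketched in base R. `sanitize_key_sketch()` and the sample readings are hypothetical stand-ins written from the description in these notes; the exported `sanitize_tb_key()` and `build_telemetry_payload()` may differ in detail, for example in how runs of separators are collapsed.

```r
# Hypothetical re-implementation of the described sanitisation rules.
sanitize_key_sketch <- function(x) {
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae", "oe", "ue", "Ae", "Oe", "Ue", "ss")
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)  # fold umlauts and sharp s
  }
  x <- gsub("[()]", "", x)      # drop parentheses
  gsub("[ .,]+", "_", x)        # spaces, dots, commas -> underscore (runs collapsed)
}

# Illustrative readings; the first timestamp is 1960-01-01 in epoch milliseconds.
readings <- data.frame(
  ts_ms = c(-315619200000, 1700000000000, 1700000000000),
  key   = c("gwl", "gwl", "pH-Wert (Feld)"),
  value = c(31.9, 32.4, 7.2),
  stringsAsFactors = FALSE
)

# Keep only rows with a positive epoch timestamp, then sanitise the keys.
readings <- readings[readings$ts_ms > 0, ]
readings$key <- sanitize_key_sketch(readings$key)

# One {ts, values} record per (timestamp, key, value) triple.
records <- lapply(seq_len(nrow(readings)), function(i) {
  list(
    ts = readings$ts_ms[i],
    values = setNames(list(readings$value[i]), readings$key[i])
  )
})

str(records)
```

Note that the two remaining readings share a timestamp but still become two separate records, which is the opposite of the "fat" grouping that the notes describe as failing.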
Notes
- The `free-bulk` preset is kept as a reproducible baseline but is confirmed
  not to work on the public Cloud Maker tier (as of 2026-05): the gateway
  rejects the array form regardless of chunk size. The default `free` (single
  mode) is the proven end-to-end path.
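
To make the difference between the two request shapes concrete, here is a small sketch using `jsonlite`. It assumes that "array form" means several `{ts, values}` records serialised as one JSON array per chunk, which is a format the ThingsBoard telemetry upload API documents; the sample records and `chunk_size` are hypothetical, and what the `free-bulk` preset sends exactly is not spelled out in these notes.

```r
library(jsonlite)

# Illustrative records; values are made up.
records <- list(
  list(ts = 1700000000000, values = list(gwl = 32.41)),
  list(ts = 1700086400000, values = list(gwl = 32.39))
)

to_body <- function(x) {
  # digits = NA keeps full numeric precision for the epoch milliseconds.
  as.character(toJSON(x, auto_unbox = TRUE, digits = NA))
}

# Single mode: each record becomes its own JSON object (one POST per record).
single_bodies <- vapply(records, to_body, character(1))

# Array form: records are grouped into chunks, each sent as one JSON array.
chunk_size <- 2  # hypothetical value
chunks <- split(records, ceiling(seq_along(records) / chunk_size))
array_bodies <- vapply(chunks, function(chunk) to_body(unname(chunk)), character(1))

print(single_bodies)
print(array_bodies)
```

Single mode trades more requests for smaller bodies, which is why the parallel push and pacing described earlier matter for throughput on the free tier.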