feat(privacy-filter): add server-side heartbeat privacy filtering #599
TimeToBuildBob wants to merge 1 commit into
Add configurable regex-based privacy filters that intercept heartbeat
events before they reach the datastore. Rules are stored in the
settings key `privacy_filters` as a JSON array and are applied on
every heartbeat request.
Each rule supports:
- `bucket_prefix` (optional): scope the rule to specific buckets
- `field`: the event data key to match (e.g. "title", "app", "url")
- `pattern`: fancy-regex pattern (supports lookaheads, Unicode)
- `action`: "drop" (discard event) or "redact" (replace field value)
- `replacement`: custom redaction string (default: "[redacted]")
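For illustration, the rule schema above can be mirrored as a plain Rust type. This is a minimal std-only sketch, not the PR's code: the PR deserializes rules with serde, and the names `Action`, `applies_to`, and `replacement_or_default` are hypothetical. It also assumes `bucket_prefix` is a plain string-prefix match on the bucket ID (e.g. `aw-watcher-window_myhost`).

```rust
use std::str::FromStr;

/// What a matched rule does to the event (mirrors the `action` JSON field).
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Drop,
    Redact,
}

impl FromStr for Action {
    type Err = String;

    // Accept only the two documented action strings.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "drop" => Ok(Action::Drop),
            "redact" => Ok(Action::Redact),
            other => Err(format!("unknown action: {other}")),
        }
    }
}

/// Hypothetical std-only mirror of one `privacy_filters` entry.
/// (The PR uses serde for this; the derive is omitted so the sketch
/// compiles without external crates.)
#[derive(Debug, Clone)]
pub struct PrivacyFilterRule {
    pub bucket_prefix: Option<String>,
    pub field: String,
    pub pattern: String,
    pub action: Action,
    pub replacement: Option<String>,
}

const DEFAULT_REPLACEMENT: &str = "[redacted]";

impl PrivacyFilterRule {
    /// A rule without `bucket_prefix` applies to every bucket (assumed prefix match).
    pub fn applies_to(&self, bucket_id: &str) -> bool {
        match &self.bucket_prefix {
            Some(prefix) => bucket_id.starts_with(prefix.as_str()),
            None => true,
        }
    }

    /// String used by `redact`, falling back to "[redacted]".
    pub fn replacement_or_default(&self) -> &str {
        self.replacement.as_deref().unwrap_or(DEFAULT_REPLACEMENT)
    }
}
```

In this sketch an unknown `action` string fails to parse instead of being treated as a no-op, which keeps typos visible.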
Implementation details:
- Filter logic lives in a new `aw_transform::privacy_filter` module
- Invalid regex patterns are logged and skipped (fail-open / graceful)
- Non-string field values are skipped (type-safe matching)
- Dropped events return HTTP 200 with an empty event body so clients
see no error
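The matching behaviour described above (drop vs. redact, skipping non-string fields, skipping invalid patterns) can be sketched as follows. To stay dependency-free, this hypothetical version swaps fancy-regex for a toy matcher that treats `pattern` as case-insensitive, `|`-separated substrings, so `(?i)` and real regex syntax are not supported; `Value` stands in for `serde_json::Value`. It assumes rules are evaluated in order against the event as modified so far, and it omits `bucket_prefix` for brevity.

```rust
use std::collections::HashMap;

/// Stand-in for serde_json::Value with only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Number(f64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub data: HashMap<String, Value>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Drop,
    Redact,
}

pub struct Rule {
    pub field: String,
    pub pattern: String,
    pub action: Action,
    pub replacement: String,
}

/// Toy "compiler", NOT fancy-regex: lowercased `|`-separated substrings.
/// An empty pattern or empty alternative is rejected as invalid.
fn compile(pattern: &str) -> Result<Vec<String>, String> {
    let alts: Vec<String> = pattern.split('|').map(|a| a.to_lowercase()).collect();
    if alts.iter().any(|a| a.is_empty()) {
        return Err(format!("invalid pattern: {pattern:?}"));
    }
    Ok(alts)
}

/// Returns None if a drop rule matched, otherwise the (possibly redacted) event.
pub fn apply_privacy_filter(mut event: Event, rules: &[Rule]) -> Option<Event> {
    for rule in rules {
        // Fail-open: an invalid pattern is logged and the rule is skipped.
        let alts = match compile(&rule.pattern) {
            Ok(a) => a,
            Err(e) => {
                eprintln!("Privacy filter: {e}");
                continue;
            }
        };
        // Only string values are matched; missing or non-string fields are skipped.
        let text = match event.data.get(&rule.field) {
            Some(Value::String(s)) => s.to_lowercase(),
            _ => continue,
        };
        if !alts.iter().any(|a| text.contains(a.as_str())) {
            continue;
        }
        match rule.action {
            Action::Drop => return None,
            Action::Redact => {
                event
                    .data
                    .insert(rule.field.clone(), Value::String(rule.replacement.clone()));
            }
        }
    }
    Some(event)
}
```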
Example rule (drop private-browsing window titles):
{"field":"title","pattern":"(?i)private browsing|incognito","action":"drop"}
Closes: ActivityWatch/activitywatch#1 (partial — server-side filter MVP)
Relates to: ActivityWatch#482
Greptile Summary
This PR adds a server-side privacy filtering system for heartbeat events, applying configurable regex-based drop/redact rules from the

Confidence Score: 4/5
Safe to merge as an MVP, but the per-heartbeat DB read and regex compilation will add measurable overhead for any user with filters configured, and the dropped-event response shape may cause subtle watcher state drift.
The core filtering logic is correct and the test coverage is solid. The main concerns are the per-heartbeat cost of loading settings from SQLite and recompiling regexes (both avoidable with a cache), the semantically odd `Event::default()` returned for dropped events (which can confuse watcher `last_event` tracking), and the silent bypass of privacy rules on the bulk-insert endpoint. None of these break the feature outright, but they are real rough edges for a privacy-sensitive path.
aw-server/src/endpoints/bucket.rs: the per-heartbeat overhead and dropped-event response shape both live here and warrant the most attention before this ships widely.
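The summary notes that the bulk-insert endpoint bypasses the rules. A minimal sketch of closing that gap, assuming the per-event filter keeps the `apply_privacy_filter` contract (`None` means drop): run it over the whole batch before inserting. `filter_batch` and the one-field `Event` are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub title: String,
}

/// Runs a per-event privacy filter (`None` = drop) over a batch so the
/// bulk-insert path enforces the same rules as the heartbeat path.
/// Also returns how many events were dropped, e.g. for logging.
pub fn filter_batch<F>(events: Vec<Event>, filter: F) -> (Vec<Event>, usize)
where
    F: Fn(Event) -> Option<Event>,
{
    let total = events.len();
    let kept: Vec<Event> = events.into_iter().filter_map(filter).collect();
    let dropped = total - kept.len();
    (kept, dropped)
}
```

Because the heartbeat path and the batch path would share one filter function, a rule change cannot apply to one endpoint and silently miss the other.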
Sequence Diagram
sequenceDiagram
participant W as Watcher
participant E as bucket_events_heartbeat
participant DS as Datastore
participant PF as apply_privacy_filter
W->>E: "POST /buckets/<id>/heartbeat"
E->>DS: get_key_value("settings.privacy_filters")
DS-->>E: raw JSON string (or error to [])
E->>E: "serde_json::from_str to Vec<PrivacyFilterRule>"
loop each rule
E->>PF: compile Regex::new(rule.pattern)
end
E->>PF: "apply_privacy_filter(bucket_id, heartbeat, &rules)"
alt Drop rule matched
PF-->>E: None
E-->>W: 200 OK Event::default()
else Redact rule(s) matched
PF-->>E: Some(event with redacted fields)
E->>DS: datastore.heartbeat(bucket_id, event, pulsetime)
DS-->>E: stored/merged Event
E-->>W: 200 OK stored Event
else No rule matched
PF-->>E: Some(original event)
E->>DS: datastore.heartbeat(bucket_id, event, pulsetime)
DS-->>E: stored/merged Event
E-->>W: 200 OK stored Event
end
```rust
let rules: Vec<PrivacyFilterRule> = match datastore.get_key_value("settings.privacy_filters") {
    Ok(raw) => serde_json::from_str(&raw).unwrap_or_else(|e| {
        warn!("Failed to parse privacy_filters setting: {}", e);
        vec![]
    }),
    Err(_) => vec![],
};
```
Per-heartbeat DB read and regex compilation
`get_key_value("settings.privacy_filters")` is called on every heartbeat, adding a SQLite read and a JSON deserialization to the hottest write path. `apply_privacy_filter` then compiles every rule's regex from scratch on each call, so that cost grows with the number of rules. aw-watcher-window sends roughly one heartbeat per second, so a user with a few rules pays both costs up to ~86,400 times per day (once per second over 24 hours). Settings should be loaded once (at startup, or lazily with a short-lived cache) and the compiled regexes stored alongside the parsed rules.
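One way to act on this suggestion, sketched under assumptions: keep the compiled rules in shared server state and rebuild them only after the settings key changes. `RuleCache`, `get_or_load`, and `invalidate` are hypothetical names; `R` would be the compiled rule type (e.g. a rule paired with its compiled regex), and the loader closure stands in for the `get_key_value` + `serde_json::from_str` + compile step.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical cache of compiled privacy rules shared across requests.
pub struct RuleCache<R> {
    inner: RwLock<Option<Arc<Vec<R>>>>,
}

impl<R> RuleCache<R> {
    pub fn new() -> Self {
        Self { inner: RwLock::new(None) }
    }

    /// Returns the cached rules, calling `load` only when the cache is empty.
    pub fn get_or_load<F>(&self, load: F) -> Arc<Vec<R>>
    where
        F: FnOnce() -> Vec<R>,
    {
        // Fast path: shared read lock, no DB access or regex compilation.
        {
            let guard = self.inner.read().unwrap();
            if let Some(rules) = guard.as_ref() {
                return Arc::clone(rules);
            }
        } // read lock released before taking the write lock

        let mut guard = self.inner.write().unwrap();
        // Another thread may have filled the cache while we waited.
        if let Some(rules) = guard.as_ref() {
            return Arc::clone(rules);
        }
        let rules = Arc::new(load());
        *guard = Some(Arc::clone(&rules));
        rules
    }

    /// Drops the cached rules so the next heartbeat reloads them.
    pub fn invalidate(&self) {
        *self.inner.write().unwrap() = None;
    }
}
```

The settings write path would call `invalidate()` after storing `privacy_filters`, so in steady state heartbeats neither touch SQLite nor recompile regexes. A short TTL is an alternative if the setting can change without going through that path (for example, direct database edits).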
```rust
    // Event matched a drop rule — acknowledge without storing.
    None => return Ok(Json(Event::default())),
};
```
Dropped-event response misleads the client
When a drop rule fires, the handler returns `Event::default()`, which has `timestamp: Utc::now()`, `duration: 0`, and `data: {}`. The Python and Rust watcher clients store the returned heartbeat as `last_event` and compare its data and computed end time (timestamp + duration) when deciding whether to merge the next heartbeat. Returning an event with empty data and a "now" timestamp means the client treats the drop as a zero-duration event that just ended, which can distort the start time of the very next legitimate event. Returning the incoming heartbeat unchanged (still without storing it) would give the client a more accurate view of what happened.
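A sketch of the suggested fix, assuming the handler clones the heartbeat before filtering so it can echo it back on a drop. `handle_heartbeat`, the simplified `Event`, and the `filter`/`store` closures are hypothetical stand-ins for `apply_privacy_filter` and `datastore.heartbeat(bucket_id, event, pulsetime)`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub timestamp: i64, // seconds since the epoch (simplified)
    pub duration: f64,  // seconds
    pub data: HashMap<String, String>,
}

/// Filters a heartbeat; stores it if kept, otherwise echoes the original
/// back unchanged (instead of Event::default()) without storing it.
pub fn handle_heartbeat<F, S>(heartbeat: Event, filter: F, store: S) -> Event
where
    F: FnOnce(Event) -> Option<Event>,
    S: FnOnce(Event) -> Event,
{
    // Keep a copy so a dropped heartbeat can be returned to the client.
    let echo = heartbeat.clone();
    match filter(heartbeat) {
        Some(event) => store(event),
        None => echo,
    }
}
```

The extra clone is one small copy per heartbeat; if that matters, the filter could instead return a match decision and leave the event untouched on a drop.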
```rust
let regex = match Regex::new(&rule.pattern) {
    Ok(r) => r,
    Err(e) => {
        warn!("Privacy filter: invalid regex '{}': {}", rule.pattern, e);
        continue;
    }
};

let field_str = match event.data.get(&rule.field) {
    Some(Value::String(s)) => s.clone(),
    _ => continue,
};

let matches = match regex.is_match(&field_str) {
    Ok(m) => m,
    Err(e) => {
        warn!(
            "Privacy filter: regex match error for '{}': {}",
            rule.pattern, e
        );
        continue;
    }
};
```
Fail-open on parse error silently disables all filters
When `serde_json::from_str` fails (malformed JSON, a schema change, a type mismatch), the whole setting is discarded and every event passes through unfiltered. Because the array is deserialized as a single `Vec<PrivacyFilterRule>`, one malformed rule is enough to disable all of them. A user who makes a typo in a rule's JSON gets no indication that their privacy filters stopped working beyond a `warn!` log entry. Consider surfacing this more visibly (e.g., returning a 500 on the next heartbeat, or adding a `/api/0/settings/privacy_filters/status` endpoint) so users can detect misconfiguration before sensitive data accumulates.
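A fail-closed variant, sketched under assumptions: distinguish "no setting stored" from "setting present but unparsable", and have the heartbeat handler refuse to store events in the second case. `FilterConfig`, `load_config`, and `gate` are hypothetical; the `parse` closure stands in for `serde_json::from_str::<Vec<PrivacyFilterRule>>`, and the `(u16, String)` error stands in for an HTTP 500 response.

```rust
/// Hypothetical load state: either usable rules or the parse error.
#[derive(Debug)]
pub enum FilterConfig<R> {
    Loaded(Vec<R>),
    Misconfigured(String),
}

/// A missing setting means "no rules"; an unparsable one is kept as an
/// error instead of being collapsed into an empty rule list.
pub fn load_config<R, P>(stored: Option<&str>, parse: P) -> FilterConfig<R>
where
    P: FnOnce(&str) -> Result<Vec<R>, String>,
{
    match stored {
        None => FilterConfig::Loaded(Vec::new()),
        Some(raw) => match parse(raw) {
            Ok(rules) => FilterConfig::Loaded(rules),
            Err(e) => FilterConfig::Misconfigured(e),
        },
    }
}

/// Fail-closed gate for the heartbeat handler: while the filters cannot
/// be read, nothing is stored and the client gets an error status.
pub fn gate<R>(config: &FilterConfig<R>) -> Result<&[R], (u16, String)> {
    match config {
        FilterConfig::Loaded(rules) => Ok(rules.as_slice()),
        FilterConfig::Misconfigured(e) => {
            Err((500, format!("privacy_filters setting is invalid: {e}")))
        }
    }
}
```

A per-rule alternative is to deserialize into `Vec<serde_json::Value>` and convert each element with `serde_json::from_value`, so one bad rule can be reported without disabling the others.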
Codecov Report
❌ Patch coverage is

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #599      +/-   ##
==========================================
+ Coverage   70.81%   76.11%   +5.29%
==========================================
  Files          51       61      +10
  Lines        2916     4710    +1794
==========================================
+ Hits         2065     3585    +1520
- Misses        851     1125     +274
```
Summary
Adds configurable regex-based privacy filters applied at heartbeat ingestion — the most-requested ActivityWatch feature (ActivityWatch/activitywatch#1, 10+ years open).
Rules are stored in the settings key `privacy_filters` as a JSON array and applied on every `POST /api/0/buckets/<id>/heartbeat` request before the event reaches the datastore.

Rule schema
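```json
[
  {
    "bucket_prefix": "aw-watcher-window",
    "field": "title",
    "pattern": "(?i)private browsing|incognito",
    "action": "drop"
  },
  {
    "field": "title",
    "pattern": "(?i)secret|confidential",
    "action": "redact",
    "replacement": "[redacted]"
  }
]
```

| Key | Description |
|---|---|
| `field` | Event data key to match (`title`, `app`, `url`, …) |
| `pattern` | fancy-regex pattern (supports lookaheads, Unicode) |
| `action` | `drop`: discard event entirely; `redact`: replace field value |
| `bucket_prefix` | Optional: scope the rule to specific buckets |
| `replacement` | Optional: replacement string for the `redact` action (default: `[redacted]`) |

Implementation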
- New `aw_transform::privacy_filter` module: zero new dependencies (fancy-regex already present)
- Hooked into `bucket_events_heartbeat` before `datastore.heartbeat()`

What this does NOT include (intentional MVP scope)
- … `/api/0/settings/privacy_filters` directly or via UI in a follow-up

Related