Suggestion: Regex-based inbound data filters #482

Open
GeorchW opened this issue May 15, 2024 · 4 comments

Comments

@GeorchW

GeorchW commented May 15, 2024

Being able to filter the inbound data has been the most popular feature request for a very long time and comes up again and again in the issues of different repositories. In the status quo, some watchers (e.g. the window watcher) have their own filtering, but they have various problems:

  1. Discoverability: The configuration is so hidden that even the developers forget that it exists (no offense). Even when it's exposed as a config, it still requires the user to find it somewhere in the docs, then edit the configuration file to set it up correctly. This is not very discoverable.
  2. Consistency: The user needs to look into the docs for each watcher to see whether it supports data filtering and to find out how it's configured.
  3. Limited expressiveness: For the window watcher, it's only possible to remove all window titles. Many people say they only want to exclude some sensitive information (e.g. mail subjects). In my case, I'm frequently using an app that shows a timer in the window title, creating a new entry every second, which makes the timeline barely readable and very unresponsive.

Suggestion

We could add regex-based filtering on the heartbeat level: whenever a heartbeat comes in, it's checked against a set of user-configurable regexes. If one matches on any field of the entry, the entry is discarded. We could also extend this feature to allow regex-replacing entries or matching only some fields of the JSON entry.
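
As a rough illustration of the suggestion above, the check could look something like the sketch below. It assumes a heartbeat carries a flat `data` dict of string fields (such as `app` and `title`); `FilterRule` and `apply_rules` are made-up names for this example, not existing ActivityWatch APIs.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterRule:
    """Hypothetical rule: discard or rewrite entries whose fields match `pattern`."""
    pattern: str
    field: Optional[str] = None        # None means "check every string field"
    replacement: Optional[str] = None  # None means "discard the whole entry"


def apply_rules(data: dict, rules: list[FilterRule]) -> Optional[dict]:
    """Return the (possibly rewritten) data dict, or None if the entry is discarded."""
    result = dict(data)  # work on a copy so the caller's dict is untouched
    for rule in rules:
        regex = re.compile(rule.pattern)
        fields = [rule.field] if rule.field is not None else list(result)
        for name in fields:
            value = result.get(name)
            if not isinstance(value, str) or not regex.search(value):
                continue
            if rule.replacement is None:
                return None  # a discard rule matched
            result[name] = regex.sub(rule.replacement, value)
    return result


rules = [
    # Discard anything that looks like a mail window (matches any field).
    FilterRule(pattern=r"(?i)inbox|mail"),
    # Replace a ticking timer in the title so it stops creating new entries.
    FilterRule(pattern=r"\b\d{1,2}:\d{2}\b", field="title", replacement="<timer>"),
]

print(apply_rules({"app": "Thunderbird", "title": "Inbox - Re: salary"}, rules))
print(apply_rules({"app": "Pomodoro", "title": "24:59 remaining"}, rules))
```

The first call returns `None` (the entry is dropped), the second keeps the entry but rewrites the title to `<timer> remaining`.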

Similar inbound data filters can be found in e.g. Sentry.

I could probably implement this myself, at least in Python and the Vue frontend, and probably in Rust as well, but I'd like to know whether the approach is welcome in the first place. Tbh, if it isn't, I'd consider writing a simple proxy server that does exactly this -- applying some replacements to the heartbeat endpoint and passing everything else through.
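
A rough sketch of the scrubbing part of such a proxy follows. It assumes heartbeats are POSTed as JSON to a path shaped like `/api/0/buckets/<bucket_id>/heartbeat` and carry a `data` object; `rewrite_body` and `REPLACEMENTS` are hypothetical names, and the actual HTTP forwarding is left out.

```python
import json
import re

# Assumption: heartbeat requests use a path of this shape (query string optional).
HEARTBEAT_PATH = re.compile(r"^/api/0/buckets/[^/]+/heartbeat(\?.*)?$")

# Hypothetical replacement table: compiled pattern -> replacement text.
REPLACEMENTS = [(re.compile(r"\b\d{1,2}:\d{2}\b"), "<timer>")]


def rewrite_body(path: str, body: bytes) -> bytes:
    """Scrub string fields of a heartbeat body; return any other request body unchanged."""
    if not HEARTBEAT_PATH.match(path):
        return body  # not a heartbeat: pass through untouched
    event = json.loads(body)
    data = event.get("data", {})
    for key, value in data.items():
        if isinstance(value, str):
            for regex, replacement in REPLACEMENTS:
                value = regex.sub(replacement, value)
            data[key] = value
    return json.dumps(event).encode("utf-8")
```

A proxy built around this would call `rewrite_body` on every incoming request and forward the result to the real server.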

@ErikBjare
Member

The window watcher recently had another PR merged for filtering window titles by regex on the client-side before sending: ActivityWatch/aw-watcher-window#99

I don't like the idea of server-side data filters (ideally it'd happen already on the client), but totally agree about discoverability/ease of configuration. This could be addressed with the server-side settings that are in the recent betas. Watchers could fetch the server's filter settings (which could be configured in the Settings view) and filter before sending, just like in the PR above.
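
A minimal sketch of that client-side flow, under stated assumptions: how the server would expose filter settings is not specified in this thread, so the fetch step is an injected callable (`fetch_patterns`), and `HeartbeatFilter` is a hypothetical helper rather than part of any existing watcher.

```python
import re
from typing import Callable, Iterable, Optional


class HeartbeatFilter:
    """Client-side filter that compiles patterns obtained from a caller-supplied source."""

    def __init__(self, fetch_patterns: Callable[[], Iterable[str]]):
        self._fetch_patterns = fetch_patterns
        self._compiled = []

    def refresh(self) -> None:
        """Re-read the patterns, e.g. periodically or on watcher start."""
        self._compiled = [re.compile(p) for p in self._fetch_patterns()]

    def filter(self, data: dict) -> Optional[dict]:
        """Return data unchanged, or None if any string field matches a pattern."""
        for value in data.values():
            if isinstance(value, str) and any(r.search(value) for r in self._compiled):
                return None
        return data


def fake_fetch() -> list[str]:
    # Stand-in for a request to the server's settings; the real call is an open question.
    return [r"(?i)private browsing"]


watcher_filter = HeartbeatFilter(fake_fetch)
watcher_filter.refresh()
```

A watcher would then skip sending whenever `watcher_filter.filter(data)` returns `None`.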

I think your plan sounds great, PRs welcome!

@GeorchW
Author

GeorchW commented May 20, 2024

What's your argument for client-side data filtering exactly? I don't see the point of implementing regex filtering in each client again and again tbh. I think even performance-wise it would be nicer to have a single efficient implementation in Rust.

I see that there is a bit of overhead involved with sending the full window title to the server, but after all, the communication is happening on localhost, where the bandwidth is basically unlimited, and we're talking about much less than 1 kByte/s.

I'm not sure if there's any privacy advantage by filtering earlier either. I see that there is some possibility of MITM'ing the server, but I don't think it's very likely that such an attack happens -- and if it does, the early filtering is not a sufficient privacy guarantee at all. On the other hand, it's much more likely that users want to show their timeline to others, but want to make sure that some data will never be visible there. Depending on what each watcher implements, they might not have the ability to do so.

Implementing it in a central place also allows to iterate on the design much easier, e.g. when adding replacements.

I see that different watchers provide different fields on which the regexes could be applied, so they might want to have some control over the way the filtering works. But then again, the categories work the same way.

Speaking of which, I think by having the filtering implemented in the server, it could deliver a much nicer user experience when setting it up, since it could preview the changes it would have applied if it were active in the past. We could even re-use the implementation for data scrubbing.
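
Such a preview boils down to a dry run of a pattern over already-stored events. A small sketch, assuming events are dicts with a `data` dict of fields; `preview` is a hypothetical name:

```python
import re


def preview(events: list[dict], pattern: str) -> list[dict]:
    """Return the stored events that the pattern would have discarded; nothing is modified."""
    regex = re.compile(pattern)
    return [
        event for event in events
        if any(
            isinstance(value, str) and regex.search(value)
            for value in event.get("data", {}).values()
        )
    ]


events = [
    {"data": {"app": "Pomodoro", "title": "24:59 remaining"}},
    {"data": {"app": "Editor", "title": "notes.txt"}},
]
affected = preview(events, r"\b\d{1,2}:\d{2}\b")
print(len(affected), "of", len(events), "events would be filtered")
```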

@ErikBjare
Member

ErikBjare commented May 20, 2024

> What's your argument for client-side data filtering exactly?

It feels wrong to send potentially sensitive information (even if locally) only for it to be discarded.

> I think even performance-wise it would be nicer to have a single efficient implementation in Rust.

Performance is not a concern as regexes are fast in any language and the strings involved are short. Most people are still using aw-server-python (default) and we are keeping them at feature-parity, so there'd be no "single implementation" anyway.

> Implementing it in a central place also allows to iterate on the design much easier, e.g. when adding replacements.

It's practically already implemented in aw-watcher-window.

imo there's very little to iterate on here. The design is clear, just need to add a setting for it in aw-webui and make the watcher respect it.

> I think by having the filtering implemented in the server, it could deliver a much nicer user experience when setting it up, since it could preview the changes it would have applied if it were active in the past.

I don't see how it would affect the user experience in any way. None of those things require filtering implemented in the server.

> We could even re-use the implementation for data scrubbing.

Data scrubbing with previews would be purely a UI feature in aw-webui using the existing API, no changes needed to the server.

> On the other hand, it's much more likely that users want to show their timeline to others, but want to make sure that some data will never be visible there.

This seems like a different but similar feature, where you want some sensitive data stored (not filtered in the first place), but you want it hidden/masked for the purpose of sharing/screenshots (prob what I want instead of a filter). Seems like another purely UI feature. Already stored data matching the filter expression could be hidden/masked by default in the UI.
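
To illustrate the difference from filtering, here is a sketch of display-time masking: the stored events stay intact and only a copy is altered for showing. It is written in Python for consistency with the rest of the thread's examples, although the real thing would live in aw-webui; `mask_for_display` and the event shape are assumptions.

```python
import copy
import re


def mask_for_display(events: list[dict], pattern: str, mask: str = "[hidden]") -> list[dict]:
    """Return a masked copy of the events; the stored events are left untouched."""
    regex = re.compile(pattern)
    masked = copy.deepcopy(events)
    for event in masked:
        data = event.get("data", {})
        for key, value in data.items():
            if isinstance(value, str):
                # A function replacement keeps the mask literal (no backslash escapes).
                data[key] = regex.sub(lambda _match: mask, value)
    return masked


stored = [{"data": {"app": "Thunderbird", "title": "Re: salary negotiation"}}]
shown = mask_for_display(stored, r"(?i)salary \w+")
print(shown[0]["data"]["title"])
print(stored[0]["data"]["title"])
```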

@GeorchW
Author

GeorchW commented May 20, 2024

Ok, I think I'll just write myself a proxy for scrubbing then

2 participants