Usage data sanitization #11
Comments
- paths -> replace dirs except for common ones
- replace parts of arguments if possible:
- replace whole arguments if they don't match any special form
- keep standard arguments and commands (?)
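A rough sketch of the path rule, assuming directory names are hashed one component at a time and a small set of common names is left alone. `COMMON_DIRS`, `sanitize_path`, and the 8-character truncated SHA-256 are illustrative assumptions, not what resh actually does:

```python
import hashlib

# Hypothetical set of path components considered safe to keep as-is.
COMMON_DIRS = {"", "~", ".", "..", "home", "usr", "bin", "etc", "tmp", "var", "src"}


def _hash(part: str) -> str:
    """Return a short, stable hex placeholder for one path component."""
    return hashlib.sha256(part.encode("utf-8")).hexdigest()[:8]


def sanitize_path(path: str) -> str:
    """Replace every path component that is not in COMMON_DIRS with its hash."""
    parts = path.split("/")
    return "/".join(p if p in COMMON_DIRS else _hash(p) for p in parts)


# "/home/alice/src" keeps "home" and "src"; only "alice" is replaced.
print(sanitize_path("/home/alice/src"))
```

Hashing per component rather than hashing the whole path keeps the directory depth and any shared prefixes visible in the sanitized data.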
Almost done: https://github.com/curusarn/resh/tree/dev_2
I'm handling different types of data differently.
Types
Single value entries
e.g. username, hostname (usually sensitive information)
Paths
Git origin URL
Command line
I need to replace the command and arguments separately so that I can analyze partial matches later.
Whitelisting
I created a whitelist containing various common strings.
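A minimal sketch of how token-wise replacement and a whitelist could fit together, assuming the command line is split shell-style and each token is either kept (if whitelisted) or hashed. The contents of `WHITELIST` and the name `sanitize_cmdline` are made up for illustration:

```python
import hashlib
import shlex
from typing import List

# Hypothetical whitelist of common commands and arguments that are kept verbatim.
WHITELIST = {"git", "commit", "push", "ls", "cd", "-m", "-la", "--help"}


def _hash(token: str) -> str:
    """Return a short, stable hex placeholder for one token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]


def sanitize_cmdline(cmdline: str) -> List[str]:
    """Split the command line and sanitize the command and each argument separately."""
    tokens = shlex.split(cmdline)
    return [t if t in WHITELIST else _hash(t) for t in tokens]


# "git", "commit" and "-m" survive; only the quoted message is hashed.
print(sanitize_cmdline('git commit -m "fix login bug"'))
```

Because each token gets its own placeholder, two records that share a command or an argument still match on those positions after sanitization.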
TL;DR
I pretty much hash everything except:
I have shown this to 3 of my colleagues. Everyone was okay with the result. One suggestion I got was that the data is sanitized too much.
I have found a couple of file extension databases. However, they don't seem very fond of other people using their data. I have asked FileInfo.com for permission to use their data.
I have added a few common TLDs to the list. Source: https://www.hayksaakian.com/most-popular-tlds/
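One way whitelisted TLDs might be used, as a sketch only: hash every label of a hostname but keep the last label when it is a known TLD. The thread does not say hostnames are handled this way; `COMMON_TLDS` and `sanitize_hostname` are assumptions for illustration:

```python
import hashlib

# Hypothetical subset of whitelisted TLDs.
COMMON_TLDS = {"com", "org", "net", "io"}


def _hash(label: str) -> str:
    """Return a short, stable hex placeholder for one hostname label."""
    return hashlib.sha256(label.encode("utf-8")).hexdigest()[:8]


def sanitize_hostname(host: str) -> str:
    """Hash all labels, keeping the final one only if it is a whitelisted TLD."""
    *rest, tld = host.split(".")
    sanitized = [_hash(label) for label in rest]
    sanitized.append(tld if tld.lower() in COMMON_TLDS else _hash(tld))
    return ".".join(sanitized)


# The ".com" suffix is kept; "example" is replaced by a placeholder.
print(sanitize_hostname("example.com"))
```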
My idea is to replace sensitive info with placeholders.
It's important to make sure that the same piece of information is always replaced with the same placeholder.
Replace sensitive info with hashes.
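The "same info, same placeholder" requirement falls out naturally if the placeholder is a deterministic hash of the value. A minimal sketch, assuming a truncated SHA-256 hex digest is an acceptable placeholder (the function name `sanitize` and the default length are hypothetical):

```python
import hashlib


def sanitize(value: str, length: int = 12) -> str:
    """Map a sensitive string to a stable placeholder (truncated SHA-256 hex digest)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


# The same input always yields the same placeholder, across records and runs.
print(sanitize("alice"), sanitize("alice"))
```

No lookup table has to be stored for this to work. One caveat of a plain hash is that short, guessable values such as usernames can be recovered by hashing candidate values, so a keyed hash (for example `hmac` with a per-user secret) may be worth considering.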