Clean #40

Open
edsu opened this issue May 13, 2020 · 2 comments
Comments

@edsu
Member

edsu commented May 13, 2020

When pointed at a JSONL file, Hydrator could rehydrate (or clean) the dataset, filtering out any tweets that have since been deleted. In addition, Hydrator could let the user enter an optional list of Twitter users to exclude from the dataset, for cases where someone has asked for their data to be removed and the Hydrator user doesn't want to force that person to delete or protect their tweets.
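To make the idea concrete, here is a minimal sketch of what the clean step might look like. It is not Hydrator's implementation. It assumes each JSONL line is a tweet object with an `id_str` field and a nested `user` object carrying `id_str` and `screen_name`, and it assumes the set of tweet ids that still exist (`live_ids`) has already been obtained by rehydrating against Twitter, which is not shown. The names `clean_jsonl`, `live_ids` and `excluded_users` are hypothetical.

```python
import json


def clean_jsonl(lines, live_ids, excluded_users=frozenset()):
    """Yield tweets that still exist and whose author is not excluded.

    lines          -- iterable of JSONL strings, one tweet per line
    live_ids       -- set of tweet id strings that rehydration found (assumed input)
    excluded_users -- user ids or screen names to drop (hypothetical option)
    """
    # Normalise the exclusion list so "@Bob", "bob" and "BOB" all match.
    excluded = {u.lstrip("@").lower() for u in excluded_users}

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)

        # Drop tweets that have been deleted or protected since collection.
        if tweet["id_str"] not in live_ids:
            continue

        # Drop tweets by users who asked to be removed.
        user = tweet.get("user", {})
        if user.get("id_str") in excluded:
            continue
        if user.get("screen_name", "").lower() in excluded:
            continue

        yield tweet
```

Written as a generator, this streams a large file line by line, and the exclusion list stays local to the person running the clean.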

(Thanks to @bergisjules for the use case and the name).

@edsu
Member Author

edsu commented May 14, 2020

Here's some more context for this issue.

On a recent DocNow call we discussed whether it would be possible for Twitter users to register that they want to be removed from a tweet ID collection. The Hydrator could then consult that list when hydrating the IDs, and filter those users' tweets out.

However, we wouldn't want people to have to register publicly that they want to be removed, since that could bring unwanted attention to their choice. But there are cases where we might want the registration to be public (e.g. elected officials).

It would be relatively easy to allow a Hydrator user to enter a list of users to exclude from hydration. But are there any creative approaches for letting the Hydrator user know which Twitter users might want to be removed?

@edsu
Member Author

edsu commented May 14, 2020

I was thinking that perhaps one obvious approach would be not to publish the User IDs of the users who wish to be removed, but to publish hashes of them instead.

When the Hydrator hydrates the data, it could then hash each User ID, see if it matches one that wants to be removed, and act accordingly. The problem then is that other people could easily write a tool that walks through the dataset and highlights which users wanted to be removed.

So hashes by themselves aren't really enough. It feels like there is some missing social piece that can help ensure contextual integrity.
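A rough sketch of both halves of that argument, assuming SHA-256 as the hash (no particular hash function has been chosen) and tweets shaped as dicts with a nested `user` object carrying `id_str`. The function names are hypothetical. The first function is what the Hydrator might do with a published list of hashes; the second is the tool someone else could write against the same list.

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Hash a user id string. SHA-256 is an assumption, not a decision."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def filter_opted_out(tweets, opted_out_hashes):
    """What the Hydrator would do: drop tweets whose author's hash is listed."""
    return [
        t for t in tweets
        if hash_user_id(t["user"]["id_str"]) not in opted_out_hashes
    ]


def reveal_opted_out(tweets, opted_out_hashes):
    """What anyone else could do: recover the ids of users who opted out.

    The same membership test that enables filtering also identifies
    every listed user who appears in a dataset.
    """
    return {
        t["user"]["id_str"] for t in tweets
        if hash_user_id(t["user"]["id_str"]) in opted_out_hashes
    }
```

The two functions differ only in what they do with a match, which is why hashing alone doesn't hide who opted out from anyone who holds a dataset containing those users.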
