Supporting anonymous research data #14

Closed
tangollama opened this issue Apr 22, 2016 · 4 comments

Comments

@tangollama
Member

@tangollama tangollama commented Apr 22, 2016

We need an automated strategy for supporting anonymized data sets for research. By anonymized, I specifically mean dropping identifying information from patient records, such as:

  • First and Last Name
  • Street Address
  • Phone number
  • Email
  • Names and addresses of related contacts

It feels to me like this feature ought to be built around filtered replication in CouchDB, so that specific records are handled as a research copy of the data. That said, I don't have the technical details worked out... which is why someone needs to own this as a feature.
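
To make the list of fields above concrete, here is a minimal sketch of dropping identifiers from a single patient document. The field names (`firstName`, `lastName`, `address`, `phone`, `email`, `additionalContacts`) are assumptions for illustration only; the real patient schema may name them differently.

```javascript
// Hypothetical identifier field names; adjust to the actual patient schema.
const IDENTIFIER_FIELDS = [
  'firstName',
  'lastName',
  'address',
  'phone',
  'email',
  'additionalContacts',
];

// Returns a shallow copy of the document without the identifier fields.
// The input document is left untouched.
function dropIdentifiers(doc, fields = IDENTIFIER_FIELDS) {
  const copy = { ...doc };
  for (const field of fields) {
    delete copy[field];
  }
  return copy;
}

const patient = {
  _id: 'patient_1',
  firstName: 'Jane',
  lastName: 'Doe',
  email: 'jane@example.org',
  sex: 'F',
};

const researchCopy = dropIdentifiers(patient);
console.log(researchCopy);
```

Returning a copy rather than mutating the input keeps the original record safe if the same object is used elsewhere.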

@pgte
Contributor

@pgte pgte commented May 24, 2016

Since filtered replication can only decide whether a document should or shouldn't be replicated, I suggest we do a mapped one-way replication from the main database into an anonymized database.
This mapped replication would listen to changes from the main database and map each passing document on the fly.
This could be a special-purpose Node process acting as a replication proxy (invoked on demand by CouchDB, so that we don't have to reimplement replication and can limit ourselves to filtering documents on the fly).
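
A rough sketch of the mapping step, under simplifying assumptions: it is not the replication proxy described above, just an in-memory stand-in. The change rows are shaped like CouchDB `_changes` results requested with `include_docs=true` (`seq`, `id`, `doc`, `deleted`), the target is a plain `Map` rather than a real database, and `applyMappedChanges` and `stripName` are hypothetical names.

```javascript
// Applies a batch of change rows to a target store, mapping each document
// on the way. Returns the last sequence seen so the caller can checkpoint.
function applyMappedChanges(changes, mapDoc, target, since = 0) {
  let lastSeq = since;
  for (const change of changes) {
    if (change.deleted) {
      // Deletions propagate to the anonymized copy as well.
      target.delete(change.id);
    } else {
      target.set(change.id, mapDoc(change.doc));
    }
    lastSeq = change.seq;
  }
  return lastSeq;
}

// Example mapping function: removes two assumed name fields.
const stripName = (doc) => {
  const { firstName, lastName, ...rest } = doc;
  return rest;
};

const anonymized = new Map();
const checkpoint = applyMappedChanges(
  [
    { seq: 1, id: 'p1', doc: { _id: 'p1', firstName: 'Jane', lastName: 'Doe', sex: 'F' } },
    { seq: 2, id: 'p2', doc: { _id: 'p2', firstName: 'John', lastName: 'Roe', sex: 'M' } },
    { seq: 3, id: 'p2', deleted: true },
  ],
  stripName,
  anonymized
);
console.log(checkpoint, [...anonymized.values()]);
```

A real implementation would also need to handle revisions, attachments, and restarts from the stored checkpoint, all of which this sketch leaves out.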

A research user (or any other consumer of anonymized data) would point to this database instead of the main one.

Another use would be to replicate from the anonymized database into a central database, which could then be used for reporting purposes.

Some ideas for desirable side effects:

(perhaps these should go into separate issues)

  • The "researcher" user role would be forced to use this database instead of the main one.
  • Filter writes made by the researcher role (error when trying to write to anonymized documents).
  • Include some tests in the test suite to validate that a research user only has access to the anonymized database.
  • Separately implement the process of anonymizing a document (Simpson / Star Wars / * characters replacing real personal data).
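
Two of the ideas above could be sketched as follows. The first function follows the shape of a CouchDB `validate_doc_update` function (arguments `newDoc`, `oldDoc`, `userCtx`, `secObj`; rejecting by throwing an object with a `forbidden` key) and would live in a design document of the anonymized database. The second picks a stable placeholder name per document. The role name comes from the list above; the pseudonym list and the function names are assumptions for illustration.

```javascript
// Rejects every write attempted by a user holding the "researcher" role.
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  if (userCtx.roles.indexOf('researcher') !== -1) {
    throw { forbidden: 'The researcher role has read-only access.' };
  }
}

// Hypothetical pool of placeholder names.
const PSEUDONYMS = ['Homer Simpson', 'Marge Simpson', 'Luke Skywalker', 'Leia Organa'];

// Deterministically maps a document id to a placeholder name, so the same
// patient always gets the same pseudonym. This is not a cryptographic
// scheme; different ids can share a pseudonym.
function pseudonymFor(id) {
  let sum = 0;
  for (const ch of String(id)) {
    sum += ch.codePointAt(0);
  }
  return PSEUDONYMS[sum % PSEUDONYMS.length];
}

console.log(pseudonymFor('patient_1'));
```

Because the check keys on the role alone, the process that writes anonymized documents into this database would need to run under a different role.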
@stale

@stale stale bot commented Aug 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 7, 2019
@fox1t fox1t self-assigned this Aug 7, 2019
@stale stale bot removed the wontfix label Aug 7, 2019
@fox1t
Member

@fox1t fox1t commented Aug 7, 2019

This is one of the main goals of the project!

@stale

@stale stale bot commented Oct 6, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 6, 2019
@fox1t fox1t closed this Jan 14, 2020
3 participants