We need an automated strategy for supporting anonymized data sets for research. By anonymized, I specifically mean dropping identifying information from patient records, like:
It feels to me like this feature ought to be built around filtered replication in CouchDB, so that specific records are handled as a research copy of the data. That said, I don't have the technical details worked out... which is why someone needs to own this as a feature.
Since filtered replication can only tell whether a document should or shouldn't be replicated, and cannot change a document's contents, I suggest we do a mapped one-way replication from the main database into an anonymized database.
A research user (or any other anonymized data user) would then point to this database instead of the main one.
Another usage of this would be to replicate from the anonymized database into a central database, which could then be used for reporting purposes.
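To make the mapped one-way replication idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the databases are plain in-memory dicts keyed by `_id` rather than real CouchDB databases, the names `anonymize` and `replicate_mapped` are hypothetical, and the fields in `IDENTIFIER_FIELDS` are placeholders, since the actual list of identifiers still has to be agreed on.

```python
import copy

# Hypothetical identifier fields, used only for illustration.
# The real list of fields to drop is still to be decided.
IDENTIFIER_FIELDS = frozenset({"firstName", "lastName", "phone"})


def anonymize(doc, identifier_fields=IDENTIFIER_FIELDS):
    """Return a copy of doc without the identifier fields.

    `_rev` is dropped as well, on the assumption that the anonymized
    database keeps its own revision history.
    """
    return {
        key: copy.deepcopy(value)
        for key, value in doc.items()
        if key not in identifier_fields and key != "_rev"
    }


def replicate_mapped(source, target, transform=anonymize):
    """One-way copy: transform each source document and store it in target.

    Both `source` and `target` are dicts keyed by `_id`, standing in for
    the main and the anonymized database. The source is never modified.
    """
    for doc_id, doc in source.items():
        target[doc_id] = transform(doc)
    return target
```

For example, calling `replicate_mapped(main_db, {})` on a `main_db` holding patient documents would yield a second dict with the same `_id` keys but without the listed identifier fields. Unlike a replication filter, the transform changes what is written, not just whether a document is copied. Note that this sketch keeps `_id` unchanged, so it only works as-is if ids carry no identifying information.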
Some ideas for desirable side effects:
(perhaps these should go into separate issues)