Supporting anonymous research data #14
We need an automated strategy for supporting anonymized data sets for research. By anonymized, I'm specifically calling out dropping identifier information in patient records like:
It feels to me like this feature ought to be focused around filtered replication in couch to handle specific records as a research copy of the data. That said, I don't have the technical details worked out... which is why someone needs to own this as a feature.
Since filtered replication can only decide whether a document should or shouldn't be replicated, and cannot change its contents, I suggest we do a mapped one-way replication from the main database into an anonymized database.
A research user (or any other consumer of anonymized data) would then point to this database instead of the main one.
Another use would be to replicate from the anonymized database into a central database, which could then be used for reporting.
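To make the mapped one-way replication idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under several assumptions: the field names in `IDENTIFIER_FIELDS` are hypothetical placeholders (the real list would come from the patient schema), the target database is modeled as a plain dict rather than a real CouchDB instance, and hashing `_id` with a salt is my own addition in case the document ids themselves identify patients.

```python
import hashlib
from typing import Any, Dict, Iterable

# Hypothetical identifier fields, for illustration only.
# The real list would have to come from the project's patient schema.
IDENTIFIER_FIELDS = {"firstName", "lastName", "phone", "address"}


def anonymize(doc: Dict[str, Any], salt: str) -> Dict[str, Any]:
    """Return a copy of doc with identifier fields and _rev dropped."""
    clean = {
        key: value
        for key, value in doc.items()
        if key not in IDENTIFIER_FIELDS and key != "_rev"
    }
    # Assumption: ids may identify patients, so replace them with a salted
    # hash. The hash is deterministic, so re-running the replication
    # overwrites the same target document instead of creating a duplicate.
    clean["_id"] = hashlib.sha256((salt + doc["_id"]).encode("utf-8")).hexdigest()
    return clean


def replicate_mapped(
    source_docs: Iterable[Dict[str, Any]],
    target: Dict[str, Dict[str, Any]],
    salt: str,
) -> int:
    """Map each source document and write it one-way into target.

    target is a dict standing in for the anonymized database.
    Returns the number of documents written.
    """
    written = 0
    for doc in source_docs:
        mapped = anonymize(doc, salt)
        target[mapped["_id"]] = mapped
        written += 1
    return written
```

In a real setup the source documents would presumably come from the main database's changes feed, and the writes would go to the anonymized database over HTTP. Nothing is ever written back to the main database, which is what keeps the replication one-way.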
Some ideas for desirable side effects:
(perhaps these should go into separate issues)