anonymization → de-identification #108

Merged
merged 1 commit into from
Jul 9, 2021

DimitriPapadopoulos
Contributor

Fixes #106.

@CPernet
Contributor

CPernet commented Jul 5, 2021

@yarikoptic this merge should not happen -- de-identification is the term used by the US Health Insurance Portability and Accountability Act (HIPAA), and as such you can retain patient IDs, which is akin to pseudonymization (to make it clear: US de-identified data as defined in HIPAA remains personal data under GDPR)

by contrast, anonymization is a clear-cut data status

@DimitriPapadopoulos
Contributor Author

DimitriPapadopoulos commented Jul 5, 2021

@CPernet

by contrast, anonymization is a clear-cut data status

The meaning of "anonymization" differs widely between the US and the EU. I'm not sure what you mean by "clear-cut data status". I do understand "de-identification" has a specific meaning under HIPAA, but so does "anonymization" under GDPR.

I do not see a problem in the fact that data de-identified under HIPAA remains personal data under GDPR. It is almost impossible to anonymize data under GDPR: almost any interesting data will remain personal (pseudonymized instead of anonymized). The idea here is precisely to pseudonymize data, not to "anonymize" data, therefore "de-identification" is the proper term.

@CPernet
Contributor

CPernet commented Jul 5, 2021

then pseudonymized is the proper term; de-identified is HIPAA-specific

@DimitriPapadopoulos
Contributor Author

To make things clear:

  • "Anonymization" has a specific and very restrictive meaning under GDPR:
  • On the other hand, "de-identification" under HIPAA and "pseudonymization" under GDPR are similar.
  • The idea here is to "pseudonymize" or "de-identify" data, as it is often unfeasible to "anonymize" neuroimaging data in the GDPR sense.
  • Hence I recommend you use the term "de-identification", as in "pseudonymization", because the term "anonymization" (as in the GDPR) is misused here.

@CPernet
Contributor

CPernet commented Jul 5, 2021

I agree with all the above :-) except 'Hence I recommend you use the term "de-identification", as in "pseudonymization", because the term "anonymization" (as in the GDPR) is misused here.' Why? To be American-centric?

@DimitriPapadopoulos
Contributor Author

I find "pseudonymization" too specific. The idea is to use a term that covers attempts to de-identify data, ranging from simple pseudonymization to anonymization. The only term that comes to mind is "de-identification". I acknowledge the term has been hijacked by HIPAA to mean something specific, but that's certainly less specific than "anonymization" under GDPR. I cannot find an alternative term to de-identification in its broad (not HIPAA) sense.

@CPernet
Contributor

CPernet commented Jul 5, 2021

ok, but then we must first define de-identification to make sure people understand what you mean (I would actually copy/paste the points you made above since they are correct)

@DimitriPapadopoulos
Contributor Author

I totally agree. We need a glossary with some basic terms:

  • de-identification (in the broad sense)
  • de-identification (as in HIPAA)
  • pseudonymization: may involve more than changing names to pseudonyms but I do not think the term is precisely defined
  • anonymization (as in the GDPR)
  • anonymization in the broad sense?

@CPernet
Contributor

CPernet commented Jul 5, 2021

what about

  • anonymization: a process that involves removing sufficient elements so that a natural person can no longer be identified (ISO 29100:2011)
  • de-identification: concept from the US Health Insurance Portability and Accountability Act (HIPAA) which does not involve irreversibility as anonymization does
  • pseudonymization: defined by GDPR as a process that removes information about identity, which may or may not change the status of data from personal to anonymous (it does not for brain imaging data, because the EU Court of Justice indicated that the possibility of identification is enough to consider data as personal)

Unless specifically mentioned, we use de-identification here in the generic sense, meaning removing obvious features leading to identification (names, addresses, maybe facial information from MRI).

@DimitriPapadopoulos
Contributor Author

Yes, looks good.

Is ISO 29100:2011 equivalent to the G29 opinion on anonymization techniques? I haven't read ISO 29100:2011 as I cannot find the PDF on the web site of the ISO/IEC Information Technology Task Force.

By the way, do you have a link to this decision: the EU court of justice indicated that the possibility of identification is enough to consider data as personal?

@CPernet
Contributor

CPernet commented Jul 5, 2021

sure - case from IP address - https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2933781

@DimitriPapadopoulos
Contributor Author

On the other hand, Wikipedia defines "pseudonymization" as a "de-identification" procedure, hence as a subset of it:

Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms.[1] A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing.

Perhaps we should use "de-identification" in the broad sense, as defined by Wikipedia:

De-identification is the process used to prevent someone's personal identity from being revealed. For example, data produced during human subject research might be de-identified to preserve the privacy of research participants. Biological data may be de-identified in order to comply with HIPAA regulations that define and stipulate patient privacy laws. [1]
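
As a rough illustration of the pseudonymization procedure quoted above (not code from this pull request), the sketch below replaces identifying fields with artificial identifiers. The function name `pseudonymize`, the `sub-` prefix, and the record layout are assumptions made for the example. Because the mapping table is kept, the result is reversible, which is why this counts as pseudonymization rather than anonymization.

```python
def pseudonymize(records, fields, prefix="sub-"):
    """Replace the values of `fields` with artificial identifiers.

    Hypothetical helper for illustration only. Returns the new records
    and the mapping from (field, original value) to pseudonym.
    """
    mapping = {}
    pseudonymized = []
    for record in records:
        new_record = dict(record)  # leave the input record untouched
        for field in fields:
            if field not in new_record:
                continue
            key = (field, new_record[field])
            if key not in mapping:
                # Same original value always gets the same pseudonym,
                # so records stay linkable for analysis.
                mapping[key] = f"{prefix}{len(mapping) + 1:03d}"
            new_record[field] = mapping[key]
        pseudonymized.append(new_record)
    return pseudonymized, mapping


records = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 51},
    {"name": "Alice", "age": 34},
]
pseudonymized, mapping = pseudonymize(records, ["name"])
```

Anyone holding `mapping` can re-identify the records, and fields left in place (here `age`) may still allow identification, so data treated this way would normally remain personal data in the GDPR sense discussed in this thread.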

@CPernet
Contributor

CPernet commented Jul 5, 2021

sure, we can use that definition; I would still add that it's a concept from the US Health Insurance Portability and Accountability Act (HIPAA) which does not involve irreversibility as anonymization does

@CPernet
Contributor

CPernet commented Jul 5, 2021

don't have the ISO thing either

@DimitriPapadopoulos
Contributor Author

I'll add the glossary.

@DimitriPapadopoulos
Contributor Author

@CPernet Can you have a look at the updated merge request? I have added the glossary. Feel free to modify.

@DimitriPapadopoulos
Contributor Author

Here are the ISO documents:

Contributor

@CPernet CPernet left a comment

proposed to reverse for credits - the EU workshop discussed anonymization

@CPernet
Contributor

CPernet commented Jul 9, 2021

where is the glossary? I could not see it -- we need to define up front what we mean in the doc by de-identification and refer to the glossary

Successfully merging this pull request may close these issues.

"Anonymization"
2 participants