Create guidelines for OBO maintainers who want to be included in Wikidata #285

Open
cmungall opened this Issue Jul 12, 2016 · 44 comments

cmungall commented Jul 12, 2016

Most OBO ontologies are CC-BY, but Wikidata requires CC0. Some ontologies have apparently granted Wikidata permission to redistribute part or all of their ontology.

We want to make sure this is streamlined with a common process for everyone. It is not clear to me how this should be done; ideas are welcome, please add them below.

cmungall referenced this issue in obophenotype/uberon Jul 13, 2016: uberon and Wikidata #1235 (Open)

mcourtot commented Jul 13, 2016

The license is CC-BY, but the attribution chosen is by PURL. As long as Wikidata uses the PURL (perhaps replacing the URL formatter that currently links directly to AmiGO), I think it should be OK.
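As a rough illustration of what "using the PURL" could look like, here is a minimal Python sketch that turns a prefixed OBO identifier such as GO:0008150 into its OBO Library PURL. The function name `obo_purl` is hypothetical, and the sketch assumes the usual `http://purl.obolibrary.org/obo/PREFIX_LOCALID` pattern; it does not describe Wikidata's actual URL formatter configuration.

```python
# Assumption: OBO term PURLs follow the pattern
# http://purl.obolibrary.org/obo/<PREFIX>_<LOCAL_ID>
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def obo_purl(curie: str) -> str:
    """Turn a prefixed identifier like 'GO:0008150' into an OBO PURL.

    'obo_purl' is a hypothetical helper for illustration only.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"expected PREFIX:LOCAL_ID, got {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


if __name__ == "__main__":
    print(obo_purl("GO:0008150"))
```

A link built this way resolves through the ontology's own PURL rather than through a specific browser such as AmiGO, which is the point made above.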

elviram commented Jul 13, 2016

I agree that this needs to be written down somewhere. It would make everything much clearer, and we could avoid getting into lengthy discussions later on, or even having to remove data, if something were to happen.

andrewsu commented Jul 13, 2016

re: @mcourtot's comment, it's true that Wikidata probably generally satisfies the attribution requirement for CC-BY. But Wikidata itself is CC0, so if you grant Wikidata permission to use your data, then you also grant downstream users those same CC0 terms. So someone who downloads your ontology via Wikidata would not be required to attribute in any way.

cmungall commented Jul 13, 2016

On 13 Jul 2016, at 8:39, Andrew Su wrote:

re: @mcourtot's comment, it's true that Wikidata probably generally
satisfies the attribution requirement for CC-BY. But Wikidata itself
is CC0, so if you grant Wikidata permission to use your data, then you
also grant downstream users those same CC0 terms. So someone who
downloads your ontology via Wikidata would not be required to
attribute in any way.

Additionally, this points to an assumption in the OBO license: that there
is a PURL for every unit of attributable work. What if I want to
produce an ontology that is purely axioms on an existing ontology? These
could be logical axioms (e.g. providing equivalence axioms for DO, or
logical definitions for CHEBI roles) or annotation axioms (e.g. a
translation to another language). If I want to ensure attribution, I
would have to add PURLs for every axiom, which can be a high overhead.

NuriaQueralt commented Jul 14, 2016

Hi, this is Nuria, a new post-doc in Andrew's lab. IMO, CC-BY is a common license choice for data provider groups because it gives visibility to their resources and lets them demonstrate community use to funding agencies. Initiatives to develop quality and resource-use metrics are ongoing in ELIXIR and NIH to support decision-making in funding agencies. A win-win idea would be to suggest Wikidata to ELIXIR/NIH as one component of the metrics used to compute resource use by the community. In this way, funding agencies could use Wikidata as a data endpoint to evaluate and identify relevant resources for the community, and data providers could use it as a platform to make their resources widely visible and available to both the community and funding agencies. This would encourage data providers to grant Wikidata permission to share their data under a CC0 license. I am not sure whether Wikidata could show rankings such as the number of downloads per year of each ontology, or the number of citations per ontology...

elviram commented Jul 14, 2016

Hi Nuria, I have been in quite a few Wikidata/Wikipedia meetings, and the
one thing that is mentioned over and over is that they do not keep track of
users and data. It is all in the spirit of open and free data.
What is doable is to look at how many times an item or property
is used. And there is always the possibility of pointing to how many times
Wikipedia and Wikidata are accessed.
I am all for your suggestion about ELIXIR/NIH.

What we could do about the licensing issues is draft a general agreement for
OBO ontologies and their free use in Wikidata. Anybody who wanted their
ontology in Wikidata would have to sign/agree to it.

Cheers,
Elvira

Elvira Mitraka, PhD
Postdoctoral Fellow
Institute of Genome Sciences
University of Maryland, School of Medicine
BioPark II, Office 664
emitraka@som.umaryland.edu
www.igs.maryland.edu

goodb commented Aug 1, 2016

I like the idea of standardizing this process. That being said, we have made significant progress working through the addition of one resource at a time and getting permission one at a time. So, while negotiations for an OBO-wide pattern continue, if we want data (e.g. Henning's suggestion of Reactome) in Wikidata, let's go ahead and ask the owners directly.

goodb commented Aug 1, 2016

You know what would make this all go away? Making OBO Foundry require a CC0 license.

Let's try to answer the attribution problem with good software for tracking usage, not with lawyers writing text that is unenforceable for the bad guys and massively distracting (as demonstrated here) for the good guys.

To Elvira's point above: actually, Wikipedia/Wikidata does keep extensive logs on usage. I started a thread about gaining access to them for the purpose of building an attribution engine. The response was pretty positive, but I didn't have the bandwidth to follow it up.

balhoff commented Aug 1, 2016

+1 for CC0. At least, I think it should be recommended more strongly (right now OBO recommends CC-BY).

cc @hlapp

mcourtot commented Aug 1, 2016

Is there a specific reason Wikidata can't accommodate CC-BY? Other than "they don't want to".

goodb commented Aug 1, 2016

This was a (good) decision made a long time ago, at a higher level than
the one we are operating on here, and it is not likely to change. Adding
technology to accommodate multiple licensing patterns in the same knowledge
graph is not trivial and would be a distraction from the main objective.

hlapp commented Aug 1, 2016

You know what would make this all go away? Making OBO Foundry require a CC0 license.

I fought quite hard to even allow CC0 in the respective OBO Foundry principle. The recommendation is still CC-BY (for, IMHO, poorly motivated reasons).

One argument I made during the discussions leading up to that was that, in particular because of the Realism principle espoused by OBO Foundry, most of the content of an OBO Foundry ontology is unlikely to even qualify as creative expression. Others, most prominently @alanruttenberg, argued against that, citing previous case law (of which there isn't much, but there is precedent of some ontology in some field having been ruled eligible for copyright protection).

IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse.

Bottom line, I remain entirely in favor of requiring, or at the very least strongly recommending, that OBO Foundry ontologies be released under a CC0 waiver.

andrewsu commented Aug 1, 2016

IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse.

Eloquently put, Hilmar. I could not agree more...

dhimmel commented Aug 2, 2016

👍 for CC0, 👎 for CC BY

I think the OBO Foundry should strongly recommend CC0 and nudge ontologies to switch from CC BY to CC0 when possible. I'll start with the legalese that reusers are subject to under a CC BY 4.0 license:

Section 3 – License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

a. _Attribution._

  1. If You Share the Licensed Material (including in modified form), You must:

    A. retain the following if it is supplied by the Licensor with the Licensed Material:

    1. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
    2. a copyright notice;
    3. a notice that refers to this Public License;
    4. a notice that refers to the disclaimer of warranties;
    5. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;

    B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and

    C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.

  2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.

  3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.

  4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.

The effect of Section 3 is uncertainty and drudgery. Did the Licensor identify the creators? If so, you must retain that identification. Did the Licensor supply a copyright notice? If so, you must retain it. Don't fail to mention if you modified the resource. Even if you license your derivative work under a compatible license such as CC BY-NC, you must still mention the original license. After reading these conditions, I think it's likely that my use of CC BY ontologies in Hetionet (an integrative network of biology) may not comply with the entirety of these CC BY conditions, even though I went to great pains trying to comply with the incredible burden that laws and licenses place on publicly-funded data.
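To make the bookkeeping concrete, here is a minimal Python sketch that treats part of Section 3(a)(1) as a checklist and assembles an attribution line from whatever the licensor supplied. Everything in it (the `LicensedMaterial` class, its field names, the `attribution_notice` function, and the example values) is hypothetical and for illustration only. It leaves out the notice about the disclaimer of warranties, and it is not a statement of what actually satisfies the license.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LicensedMaterial:
    """Hypothetical record of what a licensor supplied with a resource."""
    title: str
    license_name: str
    license_uri: str
    creators: Optional[str] = None          # 3(a)(1)(A): creator identification
    copyright_notice: Optional[str] = None  # 3(a)(1)(A): copyright notice
    source_uri: Optional[str] = None        # 3(a)(1)(A): URI to the material


def attribution_notice(material: LicensedMaterial, modified: bool) -> str:
    """Build one attribution line, keeping only the items that were supplied."""
    parts = [material.title]
    if material.creators:
        parts.append(f"by {material.creators}")
    if material.copyright_notice:
        parts.append(material.copyright_notice)
    if material.source_uri:
        parts.append(f"source: {material.source_uri}")
    # 3(a)(1)(C): name the license and link to it.
    parts.append(f"licensed under {material.license_name} ({material.license_uri})")
    # 3(a)(1)(B): say whether the material was modified.
    if modified:
        parts.append("modified from the original")
    return "; ".join(parts)


if __name__ == "__main__":
    example = LicensedMaterial(
        title="Example Ontology",  # placeholder, not a real resource
        license_name="CC BY 4.0",
        license_uri="https://creativecommons.org/licenses/by/4.0/",
        creators="Example Consortium",
    )
    print(attribution_notice(example, modified=True))
```

Even this toy version needs one such record per upstream source, which is where the drudgery comes from once several resources are integrated.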

The best applications of knowledge will be integrative. Integrating CC BY content can be tricky because you must deal with multiple potentially-contradictory license conditions as well as attribution stacking. The number of weird, tricky situations that arise when you do even a little integration is astounding. Some CC BY resources will have sui generis database rights. Others will not. Most lawyers don't have the expertise to provide guidance on these issues, and lawyers generally avoid giving advice unless contracted to do so. Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is picking up the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse. CC0 was designed to avoid uncertainty. The license is lengthy, but since the whole point is to place the content in the public domain, you don't have to worry about any conditions of reuse.

Legally-enforced attribution is overrated. Best practice requires establishing data provenance. Any high quality resource will attribute when that attribution is productive. Sometimes it's not productive to attribute. Sometimes it's destructive. For example, I created PharmacotherapyDB, a CC0 catalog of drug–disease treatments. The drugs are coded using DrugBank and the diseases are coded using the Disease Ontology. I don't want my users to be burdened by licensing and I want my data to be maximally reused, so I used CC0. But am I violating the Disease Ontology's CC BY License? I've created a derivative work that includes 97 DO terms, and these terms potentially represent an original work of authorship. Answering this question requires wading through legal precedent, which is an extreme burden. Much of this precedent is yet to exist: the space is filled with open questions. Sometimes it's nice to just use an identifier and not have to attribute anything. Identifiers usually have their provenance embedded anyway. Based on these considerations, DrugBank, a dually licensed (aka commercial) resource, released the core of their resource as CC0.

The aforementioned practice of granting Wikidata permission to release data under CC0 but then officially releasing the same data under CC BY is not ideal. This will create confusion, as it's unclear whether Wikidata actually had sufficient permission to apply CC0. Users of Wikidata content could be liable for violating upstream data licensing, and many users won't want to take that risk. To avoid these situations, the authoritative source of the data should apply the most permissive license under which the data is released anywhere. You also don't want two classes of users: those who access the data from the authoritative site and get the restrictive license, and those who use Wikidata. Finally, there's the possibility of a resource diverging, similar to the recent Ethereum hard fork. This could happen if Wikidata is granted permission to reproduce an ontology at one point, but subsequent contributions are made under the CC BY license.

Finally, licenses and laws change over time. Currently, CC BY 4.0 is compatible with a broad range of licenses. However, incompatibilities may arise in the future. Let's create knowledge and content that withstands the test of time. From the perspective of a creator, I want to maximize the reuse of my creations. Most of us are in the incredibly lucky position that the public funds us to create knowledge. Don't waste the opportunity to do something revolutionary over petty attribution concerns. Don't rely on the threat of suing your greatest advocates (those who use your data) for recognition.

@goodb

goodb Aug 2, 2016

Hear, hear!

On Mon, Aug 1, 2016 at 8:38 PM, Daniel Himmelstein <notifications@github.com> wrote:

👍 for CC0, 👎 for CC BY

I think the OBOFoundry should strongly recommend CC0 and nudge ontologies
to switch from CC BY to CC0 when possible. I'll start with the legalese
that reusers are subject to under a CC BY 4.0 license
(https://creativecommons.org/licenses/by/4.0/legalcode):

Section 3 – License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

a. Attribution.

  1. If You Share the Licensed Material (including in modified form), You
     must:

     A. retain the following if it is supplied by the Licensor with the
        Licensed Material:

        i. identification of the creator(s) of the Licensed Material and any
           others designated to receive attribution, in any reasonable manner
           requested by the Licensor (including by pseudonym if designated);
        ii. a copyright notice;
        iii. a notice that refers to this Public License;
        iv. a notice that refers to the disclaimer of warranties;
        v. a URI or hyperlink to the Licensed Material to the extent
           reasonably practicable;

     B. indicate if You modified the Licensed Material and retain an
        indication of any previous modifications; and

     C. indicate the Licensed Material is licensed under this Public
        License, and include the text of, or the URI or hyperlink to, this
        Public License.

  2. You may satisfy the conditions in Section 3(a)(1) in any reasonable
     manner based on the medium, means, and context in which You Share the
     Licensed Material. For example, it may be reasonable to satisfy the
     conditions by providing a URI or hyperlink to a resource that includes
     the required information.

  3. If requested by the Licensor, You must remove any of the information
     required by Section 3(a)(1)(A) to the extent reasonably practicable.

  4. If You Share Adapted Material You produce, the Adapter's License You
     apply must not prevent recipients of the Adapted Material from complying
     with this Public License.

The effect of Section 3 is uncertainty and drudgery. Did the Licensor
identify the creators? If so, you must retain it. Did the Licensor
supply a copyright notice? If so, you must retain it. Don't fail to mention
if you modified the resource. Even if you license your derivative work
under a compatible license such as CC BY-NC, you must still mention the
original license. After reading these conditions, I think it's likely that
my use of CC BY ontologies in Hetionet https://neo4j.het.io — an
integrative network of biology — may not comply with the entirety of these
CC BY conditions, even though I went to great pains
https://doi.org/10.15363/thinklab.d107 trying to comply with the
incredible burden laws and licenses place on publicly-funded data.

The best applications of knowledge will be integrative. Integrating CC BY
content can be tricky because you must deal with multiple
potentially-contradictory license conditions as well as attribution
stacking. The amount of weird tricky situations that arise when you do even
a little integration is astounding. Some CC BY resources will have Sui
Generis database rights. Others will not. Most lawyers don't have the
expertise to provide guidance on these issues and lawyers generally avoid
giving advice unless contracted to do so. Academics and others who just
want to do science don't have sufficient access to legal experts. Even when
you have access to a lawyer, the process injects a long delay, at great
expense to whoever is paying the tab. The overall effect is that whenever
there are legally ambiguous situations, you waste users' time and dissuade
reuse. CC0 was designed to avoid uncertainty. The license is lengthy, but
since the whole point is to make the content in the public domain, you
don't have to worry about any conditions of reuse.

Legally-enforced attribution is overrated. Best practice requires
establishing data provenance. Any high quality resource will attribute when
that attribution is productive. Sometimes it's not productive to attribute.
Sometimes it's destructive. For example, I created PharmacotherapyDB
https://doi.org/10.6084/m9.figshare.3103054 — a CC0 catalog of
drug–disease treatments. The drugs are coded using DrugBank and the
diseases are coded using the Disease Ontology. I don't want my users to be
burdened by licensing and I want my data to be maximally reused, so I used
CC0. But am I violating the Disease Ontology's CC BY License? I've created
a derivative work that includes 97 DO terms, and these terms potentially
represent an original work of authorship. Answering this question requires
wading through legal precedent, which is an extreme burden. Much of this
precedent is yet to exist: the space is filled with open questions.
Sometimes it's nice to just use an identifier and not have to attribute
anything. Identifiers usually have their provenance embedded anyways. Based
on these considerations, DrugBank — a dually licensed (aka commercial)
resource — released
https://thinklab.com/discussion/sounding-the-alarm-on-drugbanks-new-license-and-terms-of-use/213#10
the core of their resource as CC0.

The aforementioned practice <#m_6883376251829826149_issue-165202910> of
granting WikiData permission to release data under CC0 but then officially
releasing the same data under CC BY is not ideal. This will create
confusion as it's unclear whether WikiData actually had sufficient
permission to apply CC0. Users of WikiData content could be liable for
violating upstream data licensing and many users won't want to take that
risk. The authoritative source of the data should apply the most permissive
license that the data is released under anywhere to avoid these situations.
You also don't want two classes of users: those who access from the
authoritative site and get the restrictive license and those who use
WikiData. Finally, there's the possibility of a resource diverging, similar
to the recent Ethereum hard fork. This could happen if WikiData is granted
permission to reproduce an ontology at one point, but subsequent
contributions are made under the CC BY license.

Finally, licenses and laws change over time. Currently, CC BY 4.0 is
compatible with a broad range of licenses. However, incompatibilities may
arise in the future. Let's create knowledge and content that withstands the
test of time. From the perspective of a creator, I want to maximize the
reuse of my creations. Most of us are in the incredibly lucky position that
the public funds us to create knowledge. Don't waste the opportunity to do
something revolutionary over petty attribution concerns. Don't rely on the
threat of suing your greatest advocates (those who use your data) for
recognition.



@cgreene

cgreene Aug 2, 2016

👍 to @dhimmel - especially "Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse."

@obigriffith obigriffith referenced this issue in griffithlab/civic-client Aug 2, 2016

Closed

change content license to CC0 #493

@goodb

goodb Aug 2, 2016

In case it or the references are useful, here is an open letter to NIGMS in support of broad adoption of CC0.

@malachig

malachig Aug 3, 2016

Thanks @goodb for making that publicly available and linking it here. Very helpful to be able to refer to discussions like this thread, that letter, and this OpenData StackExchange thread. Collectively this has convinced us to switch to CC0 for the www.civicdb.org project.

@cmungall

cmungall Aug 8, 2016

Contributor

So what is the proposed solution to the attribution issue?

@andrewsu says "Most of us are in the incredibly lucky position that the public funds us to create knowledge". But the reality is that a lot of the content in the OBO Library is not funded, and that which is funded does not have secure funding. Future funding relies on the content creators justifying to funders that their ontology is widely adopted in different databases and platforms (commercial and academic). Is CC-BY a perfect tool for ensuring that companies don't take an ontology, sell it as part of their product suite and provide it to their customers with no attribution? Far from it. But many perceive this as the only tool they have. In fact the inclination is usually to go for a more restrictive license: look at the databases these ontologies are used with for examples, which typically have discriminatory, restrictive licenses. Not everyone uses the same function to evaluate the tradeoff between perceived control and obstructed reuse. Some may prefer a sliver of protection at the cost of some obstruction to integration in some data warehouses.

How do we move forward?

  1. CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".
  2. CC-BY advocates need to provide clearer arguments for why the license should not and does not restrict good actors. The OBO documentation on how OBO prevents attribution stacking is a good start, but it's not clear how that works.
  3. What are some short or long term compromises? For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?
  4. Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.
@cgreene

cgreene Aug 9, 2016

@cmungall : I don't really see how CC-BY helps one justify that the ontology is widely adopted. In practice, I expect that scientists who want to disseminate their research are going to cite the ontology regardless of its CC0/CC-BY status.

CC-BY is essentially using the threat of the legal system (which, let's be honest, is very unlikely to be enforced) to require this in some manner. Hypothetically if some commercial entity took a CC-BY resource and attempted to sell it as their own, would one imagine a university or individual using the legal system to require them to acknowledge the source? That seems like a lot of cost with relatively low reward.

I wonder if the best way to make a strong case for funding is to emphasize the impact that a resource has had. If CC-BY provides a sliver of protection but increases barriers to use in some contexts, then it may hurt one's ability to fund a resource because the overall impact of the resource may be diminished.

@dhimmel

dhimmel Aug 9, 2016

> CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".

@cmungall, this issue illustrates the argument for CC0 — if an ontology wants to be part of projects like WikiData, it needs to be CC0 compatible.

> For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?

I'm having trouble understanding what "axiom" means. But I think at a minimum, nodes (terms) should be released as CC0. This would include term identifiers, names, synonyms, and descriptions. This would remove any barriers to creating public domain relationships that use OBO Foundry nodes as endpoints.

> Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.

CC0 will bestow a competitive advantage with respect to funding. Funders want to see their commissioned research making the greatest contribution. If given a choice between funding a CC0 and CC BY resource, I expect the funders would prefer CC0 because of the greater reuse potential. CC BY also creates the potential that the work must be repeated (say for inclusion in WikiData), which is a horrific concept to a funder.

Maximizing reuse will create the strongest argument for continued funding. Say a company does use an ontology without attribution. Grant proposals can still mention this reuse and that the ontology is creating value in industry, which will demonstrate the broad relevance and user base for the resource. At a time when the science community is beginning to appreciate the importance of open data, OBO Foundry ontologies can bolster their appeal to funders by leading the way.

@goodb

goodb Aug 12, 2016

"is there a template for providing a CC-0 axiom-subset of a CC-BY ontology". To clarify this: many OBO ontologies now make extensive use of OWL description logic to build computable definitions of their classes. This makes it possible to, for example, infer a subclassOf or instanceOf relationship automatically based on the properties of the entity or class in question. When terms from an ontology are used in many applications (any that do not use OWL), these class membership axioms may not be integrated. Hence, we can imagine that a subset of the ontology minus these more sophisticated logical constructs might be shared differently than the entire thing. Since these logical definitions contain a significant fraction of the intellectual property of the ontologies that use them, perhaps it would be more satisfactory to their authors to share the other portions of the ontologies (term names, identifiers, basic concept graphs) more completely openly. This seems to be what @dhimmel is suggesting, and is in fact what we have already started to do with the Gene Ontology import into Wikidata.

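As a rough illustration of the subset idea above, here is a minimal Python sketch that filters an OBO-format file down to term identifiers, names, synonyms, text definitions, and `is_a` links, dropping logical-definition tags such as `intersection_of`. The list of tags to keep, the function name `simple_subset`, and the `EX:` terms are assumptions made for this example, not an agreed OBO Foundry template, and deciding which parts can actually be released as CC0 remains a licensing question rather than a technical one.

```python
# Tags kept in the hypothetical openly shared subset (an assumption for this
# sketch): identifiers, labels, synonyms, text definitions, and is_a links.
KEEP_TAGS = {"id", "name", "synonym", "def", "is_a"}


def simple_subset(obo_text: str) -> str:
    """Return only the [Term] stanzas of an OBO string, restricted to KEEP_TAGS."""
    out = []
    in_term = False
    for line in obo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new stanza starts; only [Term] stanzas are copied.
            in_term = stripped == "[Term]"
            if in_term:
                out.append("")
                out.append("[Term]")
            continue
        if not in_term or not stripped:
            continue
        tag = stripped.split(":", 1)[0]
        if tag in KEEP_TAGS:
            out.append(stripped)
    return "\n".join(out).strip() + "\n"


# Made-up example input; EX: identifiers are not real ontology terms.
EXAMPLE = """\
format-version: 1.2

[Term]
id: EX:0000001
name: example process
def: "A made-up term used only for illustration." []
synonym: "sample process" EXACT []
is_a: EX:0000000 ! root
intersection_of: EX:0000000 ! root
intersection_of: part_of EX:0000002 ! example part
relationship: part_of EX:0000002 ! example part

[Typedef]
id: part_of
name: part of
"""

if __name__ == "__main__":
    print(simple_subset(EXAMPLE))
```

The sketch keeps only the `is_a` hierarchy as the "basic concept graph"; whether other relationships belong in the open subset is a choice each ontology would have to make.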
@drseb

drseb Aug 12, 2016

Member

I would like to give a view from my personal side, as one of the developers of the Human Phenotype Ontology (HPO). (I do not speak for all HPO developers.) Also, I am no expert on licenses. I just came across this thread following a discussion about derivatives of HPO.

HPO's intention is to be a tool for the community and a tool created by the community. We try to keep the quality of HPO high by accepting change requests, but still letting an HPO developer decide whether each one is a valid request and eventually implement the changes.

HPO is now used in several contexts, in research, but also by several genetic diagnostics companies around the world that provide phenotype-driven diagnostics. For a given set of symptoms of a patient, HPO is also used to find similar patients or physicians that might be the best experts.

I vote for a more restrictive license for HPO:

  • that ensures acknowledgement, such that we know who uses HPO (funding et al.), but also that the user of HPO-based tools sees that HPO is used (and which version of HPO)
  • that ensures no changes are made to HPO that are not checked by experts
  • that ensures no derivatives are published, or, if derivatives are created, that they are clearly marked as not being the original HPO (I fear legal consequences)

To motivate this I want to give a few examples that I have encountered over the last few years:

  • Person A of company X: started to add new ontology classes to his version of HPO using his own ID space. We do not know which superclasses were defined for these classes.
  • Person B: in that person's database, the ontology was transformed from a DAG into a tree (probably for simplicity).
  • Companies XYX: only show that HPO is used on some sub-page (in a tiny footnote).

These changes (A+B) are IMHO pretty significant, as they possibly affect the results of (semantic) similarity calculations performed over HPO.
My fear is that such changes might lead to missed results or even to a slow-down in the diagnostic process. In the worst-case scenario patients have to wait longer for their diagnosis and (sorry for my pessimism) patients might die during that time.
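To make the DAG-versus-tree concern concrete, here is a small Python sketch. The term names are invented for illustration and are not real HPO classes, and the similarity measure (Jaccard overlap of ancestor sets) is just one simple stand-in for the semantic similarity methods used in practice. It assumes a tree is produced by naively keeping only the first parent of each term.

```python
# Hypothetical mini-ontology: each term maps to its list of parents (a DAG).
PARENTS_DAG = {
    "root": [],
    "abnormal_heart": ["root"],
    "abnormal_muscle": ["root"],
    "cardiomyopathy": ["abnormal_heart", "abnormal_muscle"],
    "skeletal_myopathy": ["abnormal_muscle"],
}


def to_tree(parents):
    """Keep only the first listed parent of every term (a naive DAG-to-tree step)."""
    return {term: ps[:1] for term, ps in parents.items()}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, parents):
    """Jaccard overlap of the two ancestor sets, used here as a simple similarity."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


dag_score = jaccard("cardiomyopathy", "skeletal_myopathy", PARENTS_DAG)
tree_score = jaccard("cardiomyopathy", "skeletal_myopathy", to_tree(PARENTS_DAG))

if __name__ == "__main__":
    print(dag_score, tree_score)  # 0.4 in the DAG, 0.2 in the tree
```

In this toy case the tree drops the `abnormal_muscle` parent of `cardiomyopathy`, so the two terms share only the root and their similarity is halved. A ranking built on such scores could plausibly change as well.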

I fear that this might fall back on HPO, in terms of public opinion on the quality of HPO or even in terms of being sued and having to prove that it was the company's fault and not HPO's.

I have no idea which ready-made license is most appropriate for this, I just wanted to give a little insight on my thoughts/background.

cc @pnrobinson @mellybelly

@pnrobinson

pnrobinson Aug 12, 2016

Hi everybody. I agree with Sebastian that because the HPO is being used in an ever broader range of medical contexts, extra care and responsibility are needed on our part. I think that we should basically discourage others from changing the HPO for their own needs because (i) if the change is good, we want all potential patients to benefit from it; and (ii) if the change is bad, we do not want the patients who are being served by the company in question to suffer negative consequences, and we also do not want to be held legally responsible for a mistake that somebody else has made.

How does the rest of the OBO community feel about this? Is any kind of ND license acceptable in this forum owing to the status of the HPO as a resource that is being used directly in clinical care?

-peter

Peter Robinson

Professor of Computational Biology

The Jackson Laboratory for Genomic Medicine

10 Discovery Drive

Farmington, CT 06032

860.837.2095 t | 860.990.3130 m

peter.robinson@jax.org

www.jax.org

The Jackson Laboratory: Leading the search for tomorrow's cures


@mcourtot

mcourtot Aug 12, 2016

Contributor

I think that by arguing CC0 vs CC-BY we are losing track of what we are trying to achieve. Here we have a set of resources with diverse licenses - CC0, CC-BY + a few others - that would like to know how (if?) it is possible for their data to exist within Wikidata. Note that in addition to OBO resources there are many others (e.g. UniProt) which are not CC0, so I don't think this issue is isolated to the OBO community.

I like the solutions Chris offers:

  • Can Wikidata suggest a way to accommodate non CC0 resources?
  • Can OBO resources produce a CC0 subset?

Looking at the UniProt page at https://www.wikidata.org/wiki/Q905695, it states:
[Screenshot of the UniProt Wikidata page, 2016-08-12; image not preserved]

Could we have something similar for each OBO resource?

Once we have some sort of resolution for this, we can work on the other issues that need to be addressed for inclusion in Wikidata:

  • proper attribution, #299
  • reuse of URIs, #298
@cmungall

cmungall Aug 12, 2016

Contributor

I agree the thread has diverged from the original topic of how to get OBOs into WD. But this is an important development. @pnrobinson and @drseb make good arguments for a license that is more restrictive than the two recommended by OBO. With my OBO hat on I want to see HP adopt BY, but with my HPO hat on I see the arguments.

What would be the implications of HPO adopting ND? As it is generally not imported and used for axiomatization, the effect on the rest of OBO might be relatively low (of course implications for WD and @dhimmel's graph store are another matter).

However, if an ontology that is used for axiomatization were to adopt ND that could have very bad implications: making an import module may be in breach of the ND clause.

From a practical POV, are we looking at a two level split within OBO: 'axiomatic' ontologies and 'application' ontologies, with weaker licensing imposed on the former?

@cmungall

cmungall Aug 12, 2016

Contributor

@mcourtot - the problem with my solution is that the subset to go into WD may often constitute 99% of the content, so it feels like a get-out clause.

Not sure what you're suggesting re: UniProt. Are you saying that UniProt is in WD despite the more restrictive CC-ND license, so we could do the same thing for OBOs?

@Public-Health-Bioinformatics

Public-Health-Bioinformatics Aug 14, 2016

Contributor

I presume I'm not alone in lacking the weeks of time required to personally understand the legal ramifications of licensing. I must turn to others in the community (ideally lawyers making forays into what look to me like untested waters) for recommendations.

Two aspects of ontology re-use have been raised that people are turning to legalese to solve: quality control and marketing survival (funder stuff). I think they should be understood as separate challenges.

Marketing
Ontology uptake is a popularity contest that has real survival ramifications. Along the lines of cgreen's "emphasize the impact that a resource has had", I'm ok with the simplicity of having a license require that its ontology be listed sufficiently with a website or data repository that uses it. Would a more precise solution that focuses on standards for reporting summaries of term usage in a system be more attractive? Something like "if a curated ontology has defined an online usage reporting service according to the semantic web W3C standard XYZ, use (views/records?) in your application of that term's URIs should be counted and reported semi-annually to that service". This is like the royalty system for playing songs on the radio - just the stats part. That syncs with any software resource provider's desire to understand its content. 'Course it would take some time to develop such a repository, but at least the onus is on the ontology provider(s) to do so. Carrot and stick: would this really need to be enforced by legalese though, or would it be attractive enough just by existing as a service?
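
As a rough sketch of the "just the stats part", the counting side of such a report could be as small as the following. Everything here is hypothetical: the record layout, the `build_usage_report` helper, the `period` label and the report shape are assumptions for illustration, since no reporting standard or service is specified in this thread.

```python
import json
from collections import Counter


def build_usage_report(records, period):
    """Count how often each term URI occurs across application records."""
    counts = Counter(uri for record in records for uri in record["term_uris"])
    return {"period": period, "term_usage": dict(sorted(counts.items()))}


# Hypothetical application records that annotate data with ontology term URIs.
records = [
    {"id": 1, "term_uris": ["http://purl.obolibrary.org/obo/GO_0097114"]},
    {
        "id": 2,
        "term_uris": [
            "http://purl.obolibrary.org/obo/GO_0097114",
            "http://purl.obolibrary.org/obo/HP_0000118",
        ],
    },
]

report = build_usage_report(records, "2016-H2")
print(json.dumps(report, indent=2))
```

The application only counts and serializes; where the report is sent, and in what format, would be up to the ontology provider's service.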

Quality Control
Taking a historical step back, it is fascinating to compare "pre-ontology" dictionary projects like the OED, in which curators avidly sought first-use instances of words and phrases in documents, but did not associate any kind of proprietary ownership beyond, say, acknowledging the case where a phrase was trademarked, with our digital-age ontology quest, where we are witnessing the merging of term definition, formal logic, and distributed software reuse of entities. To "own" chunks of language at this level strikes me as hopefully a temporary historical concept, much like the patenting of DNA fragments. However, I get that communities want to control the definitions they curate, and thereby provide quality control, and the only way to do this is to ensure that a particular use of an entity can be uniquely traced back to its curating community and the conditions under which it legally permits use. In that scenario - unfolding now - software may be composed of thousands of entities from hundreds of orthogonal ontologies. Will exclusive use of a CC0-licensed family of ontologies be the only way out of this complexity? Can't a simple license protect term curators from issues arising from the misuse or repurposing of the terms they curate?

P.s. I like a simple model where the use in a given database or ontology of an entity URI like http://purl.obolibrary.org/obo/GO_0097114 is sufficient for providing reference back to a term's primary ontology where any necessary further legal attribution of an ontology term and restrictions on potential re-use of its labels, definitions and other immediately associated axioms is stated.
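
A minimal sketch of how that trace-back could work, assuming the usual OBO PURL layout in which a term URI ends in `PREFIX_LOCALID` and the ontology itself lives at the lower-cased prefix plus `.owl`. The `ontology_for_term` helper is hypothetical, and the sketch does not fetch anything or read any license metadata.

```python
from urllib.parse import urlparse

OBO_PATH_PREFIX = "/obo/"


def ontology_for_term(term_uri):
    """Split an OBO term PURL into (ontology prefix, local id, ontology PURL)."""
    path = urlparse(term_uri).path
    if not path.startswith(OBO_PATH_PREFIX):
        raise ValueError(f"not an OBO PURL: {term_uri}")
    local_name = path[len(OBO_PATH_PREFIX):]
    # The prefix is everything before the last underscore, e.g. "GO".
    prefix, separator, local_id = local_name.rpartition("_")
    if not separator or not prefix or not local_id:
        raise ValueError(f"no PREFIX_LOCALID pattern in: {term_uri}")
    # Assumed convention: the ontology PURL is the lower-cased prefix + ".owl".
    ontology_purl = f"http://purl.obolibrary.org/obo/{prefix.lower()}.owl"
    return prefix, local_id, ontology_purl


result = ontology_for_term("http://purl.obolibrary.org/obo/GO_0097114")
print(result)
```

A consumer holding only the term URI can thus locate the primary ontology, which is where attribution and re-use terms would be stated under this model.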

@goodb

goodb Aug 15, 2016

I think the comparison to the OED project is truly apt here. You can, to this day, buy a copy of the OED - paying its maintainers and assuring that you have the most appropriate definitions according to them. The fact that its terms are CC0 does not prevent that from happening - in fact, it's the only reason it does happen. Imagine the challenge of writing, well, anything, if you had to negotiate for use of each semantic region of language with some different curatorial authority?!

Unless this community wants to follow in the footsteps of the Chemical Abstracts Service and start suing people for use of unique identifiers for entities in the world, I see no advantages whatsoever to be gained by sticking any form of license on a PURL, a set of aliases, and a textual definition.

One of the fundamental principles of the OBO Foundry is the idea of building orthogonal ontologies. They basically cannot be used without mixing them together. Consider even the case of the HPO. How useful would it be if we did not have access to the names of genes? Their coordinates on genomes? etc. Should NCBI and Ensembl start licensing the entities in their collections?

Now, what about the ontology in its full logical glory? A few thoughts:

  1. Regarding the original question about Wikidata and the subsequent HetNet use case provided by Daniel Himmelstein, this is basically a moot point. Neither application can represent or compute with the OWL axioms that encode the ontology logic. Both are simple networks in need of coherent, globally unique names for their nodes and edges.
  2. The concern raised above that a group that imported an ontology might change it internally, to suit their application, and potentially generate results that are not in agreement with the intent of the ontology owners and thus potentially wrong and thus potentially dangerous is fundamentally without merit.
    a. The possibility of external changes of public information entities (e.g. all open source code) in no way breaks the curatorial authority of the owners of the resource. Unless they allowed completely unvetted changes into the ontology that they distribute from external sources, the owners continue to own. If someone wants the consortium standard view of an ontology, they can get it from that authority.
    b. It is entirely possible that people changing an ontology for use in their software might actually make it better, not worse. Presumably people building software – especially those selling software – want it to perform well at its task. They have no incentive to make it worse.
    c. If an ontology is to be released for use by the public, using whatever license you want, it is fundamentally impossible to enforce a restriction that it is used in a pre-specified, unaltered way.
  3. Ontologies do not do anything useful until they are operationalized in software. If the owners of an ontology truly believe there is only one way their creation can and should be used, then perhaps they should consider selling a binary implementation of that software rather than going down the rather confusing path of appearing on GitHub as if they are an open resource that is seeking community input.

Apart from anything else, this discussion is mostly about money, right? Developers of the OBO ontologies would, quite reasonably, like to get paid to continue their work. If everyone here in the room was driving a Tesla to work every day, free to philosophize about the nature of biological reality without worrying about their next grant, I doubt we would see such concern about a CC-BY versus CC0 license for their work products. So let's answer the question: how do licenses impact our ability to keep ontology development efforts funded? Let's assume, for now, that the federal government is going to be the main source of revenue. What do they want to see for their investment? Perhaps it would be fruitful to invite some of your program officers into the discussion here, but my impression is that they want to see the maximum impact per dollar invested - and that comes about through maximal openness.

@dhimmel

dhimmel Aug 16, 2016

The name "Open Biomedical Ontologies" suggests that all OBO Foundry content should meet the Open Definition. Therefore, any "no derivative" or "non-commercial" stipulations should be out of the question. In addition, the current OBO Foundry principles specify that either a CC BY or CC0 license must be applied. Therefore, it seems that several ontologies are currently non-compliant, such as HPO, whose license states:

That neither the content of the HPO file(s) nor the logical relationships embedded within the HPO file(s) be altered in any way.

In addition to @goodb's points on the importance of derivatives, there is another very important consideration. Derivatives are necessary to decouple the content in an ontology from its initial creators. Say for example that a situation arises where the HPO is no longer effectively curating their ontology. Such a situation could occur due to a funding shortfall or faculty passing on. No derivatives means other groups cannot create parallel or successive projects. You have essentially tied the future of the knowledge to the future of the initial creators.

Regarding the liability comments by @drseb and @pnrobinson, CC0 and CC BY both contain strong liability disclaimers. CC BY goes further to require the provision of "a notice that refers to the disclaimer of warranties" and an indication "if You modified the Licensed Material". Therefore, "no derivatives" achieves little-to-no extra protection from liability at great cost.

pnrobinson Aug 16, 2016

Dear everybody, I see both sides of this coin. We are not concerned about commercial use (in fact, quite a number of companies are using the HPO for free, which is allowed by our license). The HPO is in a relatively special situation in that it is being used in numerous clinical situations, not only for research but increasingly for patient care. We are therefore trying to fulfill a responsibility to patients whose data is being analyzed with the HPO, and I believe that the composition of our team (MDs, computer scientists, and ontologists) puts us in a unique position to do so. Also, given that the HPO is now being used by projects such as the 100,000 Genomes Project and the NIH Undiagnosed Diseases Network, and has been translated into 6 languages, it is becoming an unofficial standard for patient data exchange in the field of medical genetics, and it would not be at all useful to have multiple versions of the HPO that would make data exchange difficult.

In any situation like this, there are multiple partially incompatible goals, and our license was our best attempt to balance between them.

That said, there exists (for instance) a derivative version of the HPO embedded in a larger effort that has essentially been renamed (medgen). This is OK, since nobody would mistake that version for the "real" HPO and medgen is being used for other, very useful purposes. I am not sure whether any of the current CC licenses captures this nuance.

-Peter

Peter Robinson

Professor of Computational Biology

The Jackson Laboratory for Genomic Medicine

10 Discovery Drive

Farmington, CT 06032

860.837.2095 t | 860.990.3130 m

peter.robinson@jax.org

www.jax.org

The Jackson Laboratory: Leading the search for tomorrow's cures


JervenBolleman Aug 16, 2016

Regarding UniProt (for whom I am not a spokesperson), a license has been given to export some data from UniProt to Wikidata. I can look up the details if desired. (Remember that you can have more than one license at a time.) For now we are also keeping CC-BY-ND 3.0, mostly because no one, external or internal, cares enough to really push this forward.

The thing is, most of the worries fall under trademark rights, not copyright. If someone does something stupid with an HPO derivative, it would probably be easier to pursue them legally under trademark or even libel law than under copyright. @drseb, the Person A and Person B problems are not practically resolvable using a copyright license if the derivative stays within the company. Enforcing against HPO getting only tiny mentions is also unlikely to be effective or affordable.

In the end I think the guidance should be: do you have a large legal enforcement budget, 100,000 USD or more per year? If yes, use that budget to get advice from your legal counsel. If not, use CC0 so that people are free to reuse and improve upon your work and to collaborate with you.

But in the end as long as doing this right is not a key concern for PI's it is going to stay the wild west that this is.

cgreene Aug 16, 2016

All of the HPO concerns seem to center around trademark. This seems to indicate a concern around trademark, not license:

This is OK, since nobody would mistake that version for the "real" HPO and medgen is being used for other, very useful purposes.

I expect that HPO could be under CC0 and could still prevent people from using their trademark without their consent.

@goodb really hits the nail on the head. If one wants to develop an impactful ontology, it makes the most sense to have it be broadly used. Restrictive licenses that apply to the ontology provide an incentive to develop a competing ontology (e.g. in this case something like an "Open Phenotype Ontology"). This would force funders to decide between a resource with a track record that can be used only in certain ways, or a resource that is newer but has higher potential long-term impact due to the diversity of ways that it can be used. Maintaining openness (but controlling trademark to avoid use of the branding without your consent) seems to accomplish both goals without providing the incentive to users and funders to develop an alternative.

egonw Aug 16, 2016

I hope everyone is clear on the concept that CCZero is not a true license and that it is a waiver where you waive any legal and moral rights on the data, in a (American) public domain fashion. No one has copyright over data in Wikidata, which means that anyone can use it and it cannot be incompatible with any data license. (Otherwise, great discussion, which will serve as an excellent example for others!)

andrewsu Aug 16, 2016

Contributor

Agreed, great discussion everyone!

Regarding @JervenBolleman:

But in the end as long as doing this right is not a key concern for PI's it is going to stay the wild west that this is.

and @cgreene:

Restrictive licenses that apply to the ontology provide an incentive to develop a competing ontology (e.g. in this case something like an "Open Phenotype Ontology").

I agree that things are currently like the wild west (and @dhimmel has shown in gory detail why this impedes scientific progress). And I agree that open alternatives are likely to emerge. My bet is that Wikidata will be a very strong driving force on both fronts (taming the wild west and developing open alternatives), and the key question is whether each individual OBO Foundry ontology wants to provide a foundation on which to build.

Since @pnrobinson has been a gracious participant in this discussion, I thought I'd highlight how an HPO term is represented in Wikidata. Consider headache in Wikidata (https://www.wikidata.org/wiki/Q86). It has mappings to ICD-9 and -10, MeSH, ICPC, etc., as well as translations to Spanish and Chinese. We (the Gene Wiki team) added the drug links (https://www.wikidata.org/w/index.php?title=Q86&diff=325686519&oldid=320472033), but those aside, this record was created and is maintained by other members of the community.

There are certainly differences relative to HPO, particularly in the specificity of the clinical definition and the higher-level categorization of phenotype classes. And while I'm sure HPO's team is more principled and consistent, I suspect that over time Wikidata will emerge as a de facto standard because it is highly interconnected and seamless to use (similar to how DBpedia emerged as a hub for Linked Open Data).

I also want to return to a few of @cmungall's driving questions/goals.

How do we move forward?

  1. CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".

I actually think the argument that CC-BY is an impediment to integration efforts is quite persuasive and compelling. And I'd add to that the argument that the technical platforms for community-based, open alternatives to emerge exist now, so the question for ontology developers is whether to treat them as collaborators or competitors.

  1. What are some short or long term compromises? For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?

My understanding of your proposal (after some translation help from @goodb) is to potentially separate the descriptions of things (IDs, definitions, relationships) from the constraints and rules that allow one to programmatically reason over the ontology. The former could perhaps be released as CC0 as a mechanism to build community infrastructure and encourage adoption, whereas the latter has more derived IP that could be protected in some way. That seems reasonable to me...

Another compromise approach would be to have a CC0 release after some restricted period, similar to the approach taken by Phoenix Bioinformatics (http://www.phoenixbioinformatics.org/), e.g., for TAIR (http://database.oxfordjournals.org/content/2016/baw018.long). Users who need absolutely up-to-date versions presumably would abide by whatever terms you set, but things eventually get to full sharing by CC0. (The downside of this solution, of course, is having to manage and explain the discrepancies between the various versions.)

  1. Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.

I would like to believe that the greater integration enabled by CC0 (and Wikidata in particular) would actually be strong selling points to justify continued funding. I think the question to ontology developers is "What metrics would best demonstrate how awesome your resource is in your grant proposals?" If we had a better idea of this point, we might be able to collaboratively create those tools (both in general and for Wikidata in particular).

I'd also echo @goodb's call to point your program officers and funders to this thread. Hearing their perspective I think would be very valuable. I'd earlier invited several BD2K program staff here, and I hope they will chime in at some point.

pnrobinson Aug 16, 2016

Hi Andrew,

I suspect that we actually agree on everything (and certainly we are both working towards the same aims), but your example of headache in Wikidata is what I am worried about. The English language definition is imprecise (wrong for a stickler), and does not agree with the Spanish language definition. None of the three medications listed for headache are actually indicated for headache. I doubt that anybody is making medical decisions based on Wikidata currently, but people are making medical decisions based partly on HPO analysis. As you know, we are planning on working with you to put the HPO entirely into Wikidata. Imagine that somebody then ingested all of Wikidata (including incorrect items that were not vetted by the HPO team), and then made wrong medical decisions based on this, thinking that it was a rebranded version of HPO. This completely exaggerated scenario is what I would like to avoid (and it would certainly damage Wikidata as much as the HPO). I hope everybody realises that I am playing devil's advocate to make a point, but from my previous days as a practicing physician, I can say that wrong decisions are made in medical practice, and I really think that one needs to do everything possible to prevent this. I think that is a quite realistic assessment of what is needed to act responsibly and promote wider use of ontologies in medical fields.

-Peter

@andrawaag

andrawaag Aug 17, 2016

> Imagine that somebody then ingested all of wikidata (including incorrect items that were not vetted by the HPO team), and then made wrong medical decisions based on this, thinking that it was a rebranded version of HPO.

This is a very valid point, and it shows the need for thorough referencing on every Wikidata statement originating from HPO. I would even argue that when a statement is properly referenced in Wikidata, it isn't wrong but rather a disagreement between respected sources. By properly referencing statements (and disregarding those with poor referencing), a user should be able to distinguish between good and bad content.
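To make the idea of disregarding poorly referenced statements concrete, here is a minimal sketch of how a downstream user might separate statements by their cited source. It assumes statements shaped like Wikidata's JSON export, where each statement may carry a `references` list whose `snaks` are keyed by property ID, and it uses P248 ("stated in") as the source property. The function names, the statement IDs, and the placeholder item `Q999999999` are hypothetical; `Q5282129` is the Disease Ontology item mentioned elsewhere in this thread, used here only as an example of a trusted source.

```python
from typing import Any

STATED_IN = "P248"  # Wikidata property "stated in"


def reference_sources(statement: dict[str, Any]) -> set[str]:
    """Collect the item IDs named by "stated in" snaks in a statement's references."""
    sources: set[str] = set()
    for reference in statement.get("references", []):
        for snak in reference.get("snaks", {}).get(STATED_IN, []):
            if snak.get("snaktype") == "value":
                sources.add(snak["datavalue"]["value"]["id"])
    return sources


def split_by_source(statements: list[dict[str, Any]], trusted: set[str]):
    """Return (vetted, unvetted): statements that cite a trusted source, and the rest."""
    vetted, unvetted = [], []
    for statement in statements:
        if reference_sources(statement) & trusted:
            vetted.append(statement)
        else:
            unvetted.append(statement)
    return vetted, unvetted


def _stated_in(item_id: str) -> dict[str, Any]:
    """Build a minimal reference block (hypothetical helper for the sample data)."""
    return {
        "snaks": {
            STATED_IN: [
                {
                    "snaktype": "value",
                    "property": STATED_IN,
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": item_id},
                    },
                }
            ]
        }
    }


# Hypothetical sample statements; IDs are illustrative only.
statements = [
    {"id": "example-1", "references": [_stated_in("Q5282129")]},
    {"id": "example-2", "references": [_stated_in("Q999999999")]},
    {"id": "example-3"},  # no references at all
]

vetted, unvetted = split_by_source(statements, trusted={"Q5282129"})
print([s["id"] for s in vetted])
print([s["id"] for s in unvetted])
```

A consumer who only wants ontology-vetted content would keep the first list and treat the second as unreviewed community content. This only checks that a source is cited, not that the statement still matches that source.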

@andrewsu

andrewsu Aug 30, 2016

Contributor

For reference, Twitter pointed me to another eloquent post arguing for CC0 (or actually something referred to as "CC0 (+BY)") over CC-BY in a different domain: http://www.dancohen.org/2013/11/26/cc0-by/.

@pnrobinson thanks for the followup, and you're welcome for the example that (inadvertently) illustrated your point... ;) I agree with you and Andra that it shows why we need to have firm processes in place. And I think our Wikidata team is on our way to building that solid infrastructure to implement those processes.

What about next steps? Or do we just continue to work with the few groups (DO, HPO, uberon) who are willing to be test cases so we can see how things work in practice?

@Public-Health-Bioinformatics

Public-Health-Bioinformatics Aug 30, 2016

Contributor

I definitely want to hear from others about next steps. About this "implied or ethical attribution" idea, I'd like to touch on what that could look like for resources like Wikidata that carry on crowd-sourced annotation around imported ontology terms, as one might find with HPO "headache" (regardless of its particular content license). If Wikidata is importing several aspects of a term, say its label, definition, and synonyms, from an ontology, I would love to see that visually marked in a distinct (bolded or layered) way as taken word-for-word from the "reference" ontology; all other annotations could then be more easily distinguished and judged on their secondary merit.
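As a rough illustration of what could drive such marking, the sketch below compares an imported record against the reference ontology's record and reports which fields still match word-for-word. The record layout, the field names, the function name `verbatim_fields`, and the placeholder values are all assumptions made for this example, not an actual Wikidata or HPO format.

```python
def verbatim_fields(imported, reference, fields=("label", "definition", "synonyms")):
    """Map each field name to True if the imported value matches the reference exactly."""
    result = {}
    for field in fields:
        ours, theirs = imported.get(field), reference.get(field)
        if isinstance(ours, (list, set, tuple)) and isinstance(theirs, (list, set, tuple)):
            # Synonym lists match if they contain the same strings, in any order.
            result[field] = set(ours) == set(theirs)
        else:
            result[field] = ours is not None and ours == theirs
    return result


# Hypothetical records with placeholder text.
reference = {
    "label": "example term",
    "definition": "Reference definition text.",
    "synonyms": ["synonym A", "synonym B"],
}
imported = {
    "label": "example term",
    "definition": "Community-edited definition text.",
    "synonyms": ["synonym B", "synonym A"],
}

print(verbatim_fields(imported, reference))
```

A display layer could then bold or otherwise highlight only the fields flagged `True`, leaving community-edited fields visually distinct.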

@andrewsu

andrewsu Sep 15, 2016

Contributor

I hope nobody objects to my adding a link to another relevant blog post, this one discussing why data is different from software in terms of copyleft: http://lu.is/blog/2016/09/14/copyleft-and-data-databases-as-poor-subject/

tl;dr: Open licensing works when you strike a healthy balance between obligations and reuse. Data, and how it is used, is different from software in ways that change that balance, making reasonable compromises in software (like attribution) suddenly become insanely difficult barriers.

This is the second post in a series, with apparently more to come...

@brightbyte

brightbyte Sep 16, 2016

> If Wikidata is importing several aspects of a term, say its label, definition, and synonyms, from an ontology, I would love to see that visually marked in a distinct (bolded or layered) way as existing word-for-word from the existing "reference" ontology

Ironically, label, description, and aliases are three of the few things that Wikidata does not record source or provenance for (because these are considered editorial content, not sourced "statements"). I suppose the description has the biggest claim on copyright, and should probably not be imported from a source with an incompatible license. A description from a CC-BY source could be imported as a statement, if properly sourced. A (copyrightable) description from a CC-BY-SA source cannot be imported into Wikidata without special permission by the copyright holder. Which description may or may not be copyrightable depends on the jurisdiction, I suppose. My personal rule of thumb is that a description < 100 characters is probably not copyrightable (for lack of originality), but I wouldn't bet much on this holding up under all circumstances.

@lschriml

lschriml Sep 22, 2016

Contributor

The Human Disease Ontology (DO), which is licensed under CC-BY 3.0, has decided to provide its content to Wikidata (see: https://www.wikidata.org/wiki/User:ProteinBoxBot/Legal) under Wikidata's CC0 licensing.

The Human Disease Ontology (DO) is licensed under CC-BY 3.0. The intent of DO's licensing choice is to promote open sharing and adaptation of the DO, as an ontology of human diseases, with attribution to the DO project. As a project and resource to the community, we decided to import DO's terms, term-related data and class hierarchy into Wikidata. The Disease Ontology object in Wikidata (Q5282129) provides attribution to the DO project for the related DO information loaded into Wikidata. As Principal Investigator of the DO project, I freely provide the content of DO for use and distribution to the Wikidata project without restrictions of attribution for the use of each term, and its related information, in the ontology. The Disease Ontology was created to be a community resource. Wikidata and related projects enable the content of DO to be used without restriction, thus serving the greater good of the community.

@cmungall

cmungall Sep 28, 2016

Contributor

@lschriml - I'm not quite clear on how this differs from DO having a CC-0 license?

Also, (and this is a general question, not meaning to pick on you), is being the PI of the DO project sufficient for being able to override the CC-BY rights in this way? If the DO is a community project, then is it not the case that all content providers throughout the history of the DO have copyright and must also be consulted and agree to the transfer? (this seems like one argument for starting out with CC-0, to avoid these issues).

@dhimmel

dhimmel Nov 17, 2016

I just came across two recent & amazing blog posts by Katie Fortney writing for the Office of Scholarly Communications at the University of California. These are the best introductions to academic data licensing that I'm aware of:

@dhimmel dhimmel referenced this issue in CDLUC3/dash Nov 17, 2016

Open

CC-BY Rationale #19

andrewsu added a commit to andrewsu/OBOFoundry.github.io that referenced this issue Jul 6, 2017

modify license choice language
Given that the choice between CC0 and CC-BY is a nuanced one with many pros and cons on both sides of the issue, I offer three suggestions for this document:

1. linking to OBOFoundry#285 where many issues are explicitly discussed
2. removing the explicit recommendation of CC-BY
3. adding a request for attribution in all cases regardless of license (following [this pattern](http://www.dancohen.org/2013/11/26/cc0-by/))

I of course understand that this policy is ultimately under the purview of the Editorial WG, but I've formulated this as a pull request just to propose something specific.

@pbuttigieg pbuttigieg referenced this issue in EnvironmentOntology/envo Mar 20, 2018

Open

Make ENVO CC-0 #600
