Create guidelines for OBO maintainers who want to be included in Wikidata #285

Open
cmungall opened this issue Jul 12, 2016 · 48 comments

Comments

@cmungall
Contributor

@cmungall cmungall commented Jul 12, 2016

Most OBO ontologies are CC-BY, but Wikidata requires CC0. Some ontologies have apparently granted Wikidata permission to redistribute part or all of their ontology.

We want to make sure this is streamlined, with a common process for everyone. It's not clear to me how this should be done; ideas welcome, please add them below.

@mcourtot
Contributor

@mcourtot mcourtot commented Jul 13, 2016

CC-BY, but the attribution mechanism chosen is the PURL: as long as Wikidata uses the PURL (perhaps replacing the URL formatter that currently links directly to AmiGO), I think it should be OK.
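For illustration only, a minimal Python sketch of what linking via the PURL could look like. It assumes the usual OBO convention that an identifier such as GO:0008150 resolves at http://purl.obolibrary.org/obo/GO_0008150; the function name `curie_to_purl` is hypothetical and not part of any Wikidata or OBO tooling.

```python
def curie_to_purl(curie: str) -> str:
    """Expand an OBO CURIE such as 'GO:0008150' into its OBO PURL.

    Assumes the OBO convention: the colon becomes an underscore and the
    result is appended to the OBO PURL base.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"


print(curie_to_purl("GO:0008150"))  # http://purl.obolibrary.org/obo/GO_0008150
```

A formatter built this way points at the ontology's own resolver rather than at a particular browser such as AmiGO.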

@elviram
Contributor

@elviram elviram commented Jul 13, 2016

I agree that this needs to be written down somewhere. It would make everything much clearer, and we could avoid lengthy discussions later down the line, or even having to remove data if something were to happen.

@andrewsu
Contributor

@andrewsu andrewsu commented Jul 13, 2016

re: @mcourtot's comment, it's true that Wikidata probably generally satisfies the attribution requirement for CC-BY. But Wikidata itself is CC0, so if you grant Wikidata permission to use your data, then you also grant downstream users those same CC0 terms. So someone who downloads your ontology via Wikidata would not be required to attribute in any way.

@cmungall
Contributor Author

@cmungall cmungall commented Jul 13, 2016

On 13 Jul 2016, at 8:39, Andrew Su wrote:

re: @mcourtot's comment, it's true that Wikidata probably generally
satisfies the attribution requirement for CC-BY. But Wikidata itself
is CC0, so if you grant Wikidata permission to use your data, then you
also grant downstream users those same CC0 terms. So someone who
downloads your ontology via Wikidata would not be required to
attribute in any way.

Additionally, this points to an assumption in the OBO license that
there is a PURL for every unit of attributable work. What if I want to
produce an ontology that is purely axioms on an existing ontology? This
could be logical axioms (e.g. providing equivalence axioms for DO, or
logical definitions for CHEBI roles) or annotation axioms (e.g. a
translation to another language). If I want to ensure attribution, I
would have to add PURLs for every axiom, which can be a high overhead.

@NuriaQueralt

@NuriaQueralt NuriaQueralt commented Jul 14, 2016

Hi, this is Nuria, a new postdoc in Andrew's lab. IMO, CC-BY is a common license choice for data provider groups because it gives visibility to their resources and helps demonstrate community use to funding agencies. Initiatives to develop quality and resource-use metrics are ongoing in ELIXIR and NIH to support decision-making in funding agencies. A win-win idea would be to suggest Wikidata to ELIXIR/NIH as one component of the metrics used to measure community use of resources. That way, funding agencies could use Wikidata as a data endpoint to evaluate and identify resources relevant to the community, and data providers could use it as a platform to make their resources widely visible and available to both the community and funding agencies. This would encourage data providers to grant Wikidata permission to share their data under a CC0 license. I am not sure whether Wikidata could currently show rankings such as the number of downloads per year of an ontology, or the number of citations per ontology...

@elviram
Contributor

@elviram elviram commented Jul 14, 2016

Hi Nuria, I have been in quite a few Wikidata/Wikipedia meetings and the
one thing that is mentioned over and over is that they do not keep track of
users and data. It is all in the spirit of open and free data.
What is doable, is that we can look at how many times an item or property
is used. And there is always the possibility to point at how many times
Wikipedia and Wikidata are accessed.
I am all for your suggestion about ELIXIR/NIH.

What we could do with the licensing issues is draft a general agreement for
OBO ontologies and their free use in Wikidata. Anybody who would want their
ontology in Wikidata would have to sign/agree to it.

Cheers,
Elvira

Elvira Mitraka, PhD
Postdoctoral Fellow
Institute of Genome Sciences
University of Maryland, School of Medicine
BioPark II, Office 664
emitraka@som.umaryland.edu
www.igs.maryland.edu

@goodb

@goodb goodb commented Aug 1, 2016

I like the idea of standardizing this process. That being said, we have made significant progress by adding one resource at a time and getting permission one at a time. So, while negotiations for an OBO-wide pattern continue, if we want data (e.g. Henning's suggestion of Reactome) in Wikidata, let's go ahead and ask the owners directly.

@goodb

@goodb goodb commented Aug 1, 2016

You know what would make this all go away? Making OBO Foundry require a CC0 license.

Let's try to answer the attribution problem with good software for tracking usage, not with lawyers writing text that is unenforceable against the bad guys and massively distracting (as demonstrated here) for the good guys.

To Elvira's point above. Actually Wikipedia/Wikidata does keep extensive logs on usage. I started a thread about gaining access to them for the purpose of building an attribution engine. Response was pretty positive, but I didn't have the bandwidth to follow it up.

@balhoff
Contributor

@balhoff balhoff commented Aug 1, 2016

+1 for CC0. At least, I think it should be recommended more strongly (right now OBO recommends CC-BY).

cc @hlapp

@mcourtot
Contributor

@mcourtot mcourtot commented Aug 1, 2016

Is there a specific reason Wikidata can't accommodate CC-BY? Other than "they don't want to".

@goodb

@goodb goodb commented Aug 1, 2016

This was a (good) decision made a long time ago and at a higher level than
we are operating on here that is not likely to change. Adding technology
to accommodate multiple licensing patterns in the same knowledge graph is
not trivial and would be a distraction from the main objective.

@hlapp
Contributor

@hlapp hlapp commented Aug 1, 2016

You know what would make this all go away? Making OBO Foundry require a CC0 license.

I fought quite hard to even allow CC0 in the respective OBO Foundry principle. The recommendation is still CC-BY (for, IMHO, poorly motivated reasons).

One argument I made during the discussions leading up to that was that, in particular because of the Realism principle espoused by OBO Foundry, most of the content of an OBO Foundry ontology is unlikely to even qualify as creative expression. Others, most prominently @alanruttenberg, argued against that, citing previous case law (of which there isn't much, but there is precedent of some ontology in some field having been ruled eligible for copyright protection).

IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse.

Bottom line, I remain entirely in favor of requiring, or at the very least strongly recommending, that OBO Foundry ontologies be released under a CC0 waiver.

@andrewsu
Contributor

@andrewsu andrewsu commented Aug 1, 2016

IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse.

Eloquently put, Hilmar. I could not agree more...

@dhimmel

@dhimmel dhimmel commented Aug 2, 2016

👍 for CC0, 👎 for CC BY

I think the OBO Foundry should strongly recommend CC0 and nudge ontologies to switch from CC BY to CC0 when possible. I'll start with the legalese that reusers are subject to under a CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/legalcode):

Section 3 – License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the following conditions.

a. _Attribution._

  1. If You Share the Licensed Material (including in modified form), You must:

    A. retain the following if it is supplied by the Licensor with the Licensed Material:

    1. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated);
    2. a copyright notice;
    3. a notice that refers to this Public License;
    4. a notice that refers to the disclaimer of warranties;
    5. a URI or hyperlink to the Licensed Material to the extent reasonably practicable;

    B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and

    C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License.

  2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information.

  3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable.

  4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License.

The effect of Section 3 is uncertainty and drudgery. Did the Licensor identify the creators? If so, you must retain that identification. Did the Licensor supply a copyright notice? If so, you must retain it. Don't forget to mention whether you modified the resource. Even if you license your derivative work under a compatible license such as CC BY-NC, you must still mention the original license. After reading these conditions, I think it's likely that my use of CC BY ontologies in Hetionet (https://neo4j.het.io), an integrative network of biology, may not comply with all of these CC BY conditions, even though I went to great pains (https://doi.org/10.15363/thinklab.d107) trying to comply with the incredible burden that laws and licenses place on publicly funded data.

The best applications of knowledge will be integrative. Integrating CC BY content can be tricky because you must deal with multiple, potentially contradictory license conditions as well as attribution stacking. The number of weird, tricky situations that arise when you do even a little integration is astounding. Some CC BY resources will have sui generis database rights. Others will not. Most lawyers don't have the expertise to provide guidance on these issues, and lawyers generally avoid giving advice unless contracted to do so. Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse. CC0 was designed to avoid uncertainty. The license is lengthy, but since the whole point is to place the content in the public domain, you don't have to worry about any conditions of reuse.

Legally enforced attribution is overrated. Best practice requires establishing data provenance. Any high-quality resource will attribute when that attribution is productive. Sometimes it's not productive to attribute. Sometimes it's destructive. For example, I created PharmacotherapyDB (https://doi.org/10.6084/m9.figshare.3103054), a CC0 catalog of drug–disease treatments. The drugs are coded using DrugBank and the diseases are coded using the Disease Ontology. I don't want my users to be burdened by licensing and I want my data to be maximally reused, so I used CC0. But am I violating the Disease Ontology's CC BY license? I've created a derivative work that includes 97 DO terms, and these terms potentially represent an original work of authorship. Answering this question requires wading through legal precedent, which is an extreme burden. Much of this precedent does not yet exist: the space is filled with open questions. Sometimes it's nice to just use an identifier and not have to attribute anything. Identifiers usually have their provenance embedded anyway. Based on these considerations, DrugBank, a dually licensed (aka commercial) resource, released the core of their resource as CC0 (https://thinklab.com/discussion/sounding-the-alarm-on-drugbanks-new-license-and-terms-of-use/213#10).

The aforementioned practice of granting WikiData permission to release data under CC0 but then officially releasing the same data under CC BY is not ideal. This will create confusion as it's unclear whether WikiData actually had sufficient permission to apply CC0. Users of WikiData content could be liable for violating upstream data licensing and many users won't want to take that risk. The authoritative source of the data should apply the most permissive license that the data is released under anywhere to avoid these situations. You also don't want two classes of users: those who access from the authoritative site and get the restrictive license and those who use WikiData. Finally, there's the possibility of a resource diverging, similar to the recent Ethereum hard fork. This could happen if WikiData is granted permission to reproduce an ontology at one point, but subsequent contributions are made under the CC BY license.

Finally, licenses and laws change over time. Currently, CC BY 4.0 is compatible with a broad range of licenses. However, incompatibilities may arise in the future. Let's create knowledge and content that withstands the test of time. From the perspective of a creator, I want to maximize the reuse of my creations. Most of us are in the incredibly lucky position that the public funds us to create knowledge. Don't waste the opportunity to do something revolutionary over petty attribution concerns. Don't rely on the threat of suing your greatest advocates (those who use your data) for recognition.

@goodb

@goodb goodb commented Aug 2, 2016

Hear, hear!

@cgreene

@cgreene cgreene commented Aug 2, 2016

👍 to @dhimmel, especially "Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse."

@goodb

@goodb goodb commented Aug 2, 2016

In case it or the references are useful, here is an open letter to NIGMS in support of broad adoption of CC0.

@malachig

@malachig malachig commented Aug 3, 2016

Thanks @goodb for making that publicly available and linking it here. Very helpful to be able to refer to discussions like this thread, that letter, and this OpenData StackExchange thread. Collectively this has convinced us to switch to CC0 for the www.civicdb.org project.

@cmungall
Contributor Author

@cmungall cmungall commented Aug 8, 2016

So what is the proposed solution to the attribution issue?

@dhimmel says "Most of us are in the incredibly lucky position that the public funds us to create knowledge". But the reality is that a lot of the content in the OBO Library is not funded, and that which is funded does not have secure funding. Future funding relies on the content creators justifying to funders that their ontology is widely adopted in different databases and platforms (commercial and academic). Is CC-BY a perfect tool for ensuring that companies don't take an ontology, sell it as part of their product suite and provide it to their customers with no attribution? Far from it. But many perceive it as the only tool they have. In fact, the inclination is usually to go for a more restrictive license: look at the databases these ontologies are used with for examples, which typically have discriminatory, restrictive licenses. Not everyone uses the same function to evaluate the tradeoff between perceived control and obstructed reuse. Some may prefer a sliver of protection at the cost of some obstruction to integration in some data warehouses.

How do we move forward?

  1. CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".
  2. CC-BY advocates need to provide clearer arguments for why the license should not and does not restrict good actors. The OBO documentation on how OBO prevents attribution stacking is a good start, but it's not clear how that works.
  3. What are some short or long term compromises? For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?
  4. Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.

@cgreene

@cgreene cgreene commented Aug 9, 2016

@cmungall : I don't really see how CC-BY helps one justify that the ontology is widely adopted. In practice, I expect that scientists who want to disseminate their research are going to cite the ontology regardless of its CC0/CC-BY status.

CC-BY is essentially using the threat of the legal system (which, let's be honest, is very unlikely to be enforced) to require this in some manner. Hypothetically if some commercial entity took a CC-BY resource and attempted to sell it as their own, would one imagine a university or individual using the legal system to require them to acknowledge the source? That seems like a lot of cost with relatively low reward.

I wonder if the best way to make a strong case for funding is to emphasize the impact that a resource has had. If CC-BY provides a sliver of protection but increases barriers to use in some contexts, then it may hurt one's ability to fund a resource because the overall impact of the resource may be diminished.

@dhimmel

@dhimmel dhimmel commented Aug 9, 2016

CC0 advocates need to provide more persuasive arguments than "you're making it harder for me".

@cmungall, this issue illustrates the argument for CC0 — if an ontology wants to be part of projects like WikiData, it needs to be CC0 compatible.

For example, is there a template for providing a CC-0 axiom-subset of a CC-BY ontology?

I'm having trouble understanding what "axiom" means. But I think at a minimum, nodes (terms) should be released as CC0. This would include term identifiers, names, synonyms, and descriptions. This would remove any barriers to creating public domain relationships that use OBO Foundry nodes as endpoints.

Arguments in either direction need to be based in the reality of the current funding situation. Many of us would love to put things in the public domain for the common good, but we need a concrete plan to ensure funding in the face of corporate products taking content and using it without attribution.

CC0 will bestow a competitive advantage with respect to funding. Funders want to see their commissioned research making the greatest contribution. If given a choice between funding a CC0 and a CC BY resource, I expect the funders would prefer CC0 because of the greater reuse potential. CC BY also creates the possibility that the work must be repeated (say, for inclusion in WikiData), which is a horrific concept to a funder.

Maximizing reuse will create the strongest argument for continued funding. Say a company does use an ontology without attribution. Grant proposals can still mention this reuse and that the ontology is creating value in industry, which will demonstrate the broad relevance and user base for the resource. At a time when the science community is beginning to appreciate the importance of open data, OBO Foundry ontologies can bolster their appeal to funders by leading the way.

@goodb

@goodb goodb commented Aug 12, 2016

"is there a template for providing a CC-0 axiom-subset of a CC-BY ontology". To clarify this: many OBO ontologies now make extensive use of OWL description logic to build computable definitions of their classes. This makes it possible to, for example, infer a subclassOf or instanceOf relationship automatically based on the properties of the entity or class in question. When terms from an ontology are used in applications that do not use OWL (which is many of them), these class membership axioms may not be integrated. Hence, we can imagine that a subset of the ontology minus these more sophisticated logical constructs might be shared differently than the entire thing. Since these logical definitions contain a significant fraction of the intellectual property of the ontologies that use them, perhaps it would be more satisfactory to their authors to share the other portions of the ontologies (term names, identifiers, basic concept graphs) completely openly. This seems to be what @dhimmel is suggesting, and in fact it is what we have already started to do with the Gene Ontology import into Wikidata.
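A rough sketch of what such a "term-only" subset could look like, assuming the ontology is available in the OBO flat-file format. It keeps the kinds of content listed above (identifiers, names, synonyms, definitions) plus the basic is_a graph, and drops logical-definition tags such as intersection_of. The tag selection, the `EX` prefix and the function name `term_subset` are all hypothetical, and this naive line filter is not a full OBO parser: it discards the file header and all non-[Term] stanzas, and ignores escaping and trailing qualifiers.

```python
# Hypothetical tag selection for a CC0 "term" subset: identifiers, labels,
# synonyms, definitions and the plain is_a hierarchy. Logical definitions
# (intersection_of, union_of, ...) and all other tags are dropped.
KEEP_TAGS = {"id", "name", "synonym", "def", "is_a"}


def term_subset(obo_text: str) -> str:
    """Return only the [Term] stanzas of an OBO file, reduced to KEEP_TAGS.

    Naive line-based filter: the header and non-[Term] stanzas
    (e.g. [Typedef]) are discarded entirely.
    """
    out = []
    in_term = False
    for line in obo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new stanza begins; keep it only if it is a [Term] stanza.
            in_term = stripped == "[Term]"
            if in_term:
                out.append(stripped)
            continue
        if not in_term:
            continue
        if not stripped:
            out.append("")  # preserve blank lines separating stanzas
            continue
        tag = stripped.split(":", 1)[0]
        if tag in KEEP_TAGS:
            out.append(stripped)
    return "\n".join(out).strip() + "\n"


sample = """format-version: 1.2

[Term]
id: EX:0000001
name: example parent

[Term]
id: EX:0000002
name: example child
synonym: "child example" EXACT []
is_a: EX:0000001 ! example parent
intersection_of: EX:0000001
intersection_of: part_of EX:0000003
"""

result = term_subset(sample)
print(result)
```

Working at the tag level keeps the example short; an OWL release would instead need to drop axiom types (for example equivalence axioms), which is better done with a proper OWL library than with text filtering.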

@drseb
Member

@drseb drseb commented Aug 12, 2016

I would like to give my personal view as one of the developers of the Human Phenotype Ontology (HPO). (I do not speak for all HPO developers.) Also, I am no expert on licenses. I just came across this thread during a discussion about derivatives of HPO.

HPO's intention is to be a tool for the community and a tool created by the community. We try to keep the quality of HPO high by accepting change requests, but still letting an HPO developer decide whether a request is valid and, if so, implement the changes.

HPO is now used in several contexts, in research, but also by several genetic diagnostics companies around the world that provide phenotype-driven diagnostics. For a given set of symptoms of a patient, HPO is also used to find similar patients or physicians that might be the best experts.

I vote for a more restrictive license for HPO:

  • that ensures acknowledgement, so that we know who uses HPO (for funding etc.), and also so that users of HPO-based tools can see that HPO is used (and which version of HPO)
  • that no changes are made to HPO that are not checked by experts
  • that no derivatives are published or, if derivatives are created, that they are clearly marked as not being the original HPO (I fear legal consequences)

To motivate this, here are a few examples I have encountered over the last few years:

  • Person A of company X: started adding new ontology classes to his version of HPO using his own ID space. We do not know which superclasses were defined for these classes.
  • Person B: in that person's database, the ontology was transformed from a DAG into a tree (probably for simplicity).
  • Companies XYX: only mention that HPO is used on some sub-page (in a tiny footnote).

These changes (A+B) are IMHO pretty strong, as they may affect the results of (semantic) similarity calculations performed over HPO.
My fear is that such changes might lead to missed results or even slow down the diagnostic process. In the worst case, patients would have to wait longer for their diagnosis and (sorry for my pessimism) might die in the meantime.
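A toy illustration of why flattening a DAG into a tree can change similarity results: the sketch below uses hypothetical terms (`A`, `B`, `C`, `D`) and plain Jaccard similarity over ancestor sets as a simplified stand-in for the semantic similarity measures actually computed over HPO. It assumes the flattening simply keeps each term's first parent.

```python
def ancestors(term, parents):
    """Return the term plus all of its ancestors, following every parent link."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, parents):
    """Jaccard similarity of the two terms' ancestor sets."""
    x, y = ancestors(a, parents), ancestors(b, parents)
    return len(x & y) / len(x | y)


# Hypothetical fragment: term C has two parents in the DAG.
dag = {"A": ["root"], "B": ["root"], "C": ["A", "B"], "D": ["B"]}
# Flattening to a tree here keeps only the first parent of each term.
tree = {term: ps[:1] for term, ps in dag.items()}

print(jaccard("C", "D", dag))   # 0.4
print(jaccard("C", "D", tree))  # 0.2
```

Dropping C's link to B removes the ancestor it shares with D, so the similarity between C and D halves even though neither term was edited directly.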

I fear that this might reflect back on HPO, both in terms of public opinion on HPO's quality and possibly in terms of being sued and having to prove that it was the company's fault and not HPO's.

I have no idea which ready-made license is most appropriate for this; I just wanted to give a little insight into my thoughts and background.

cc @pnrobinson @mellybelly

@pnrobinson

@pnrobinson pnrobinson commented Aug 12, 2016

Hi everybody. I agree with Sebastian that because the HPO is being used in an ever broader range of medical contexts, extra care and responsibility are needed on our part. I think that we should basically discourage others from changing the HPO for their own needs because (i) if the change is good, we want all potential patients to benefit from it; and (ii) if the change is bad, we do not want the patients who are being served by the company in question to suffer negative consequences, and we also do not want to be held legally responsible for a mistake that somebody else has made.

How does the rest of the OBO community feel about this? Is any kind of ND license acceptable in this forum owing to the status of the HPO as a resource that is being used directly in clinical care?

-peter

Peter Robinson
Professor of Computational Biology
The Jackson Laboratory for Genomic Medicine
10 Discovery Drive
Farmington, CT 06032
860.837.2095 t | 860.990.3130 m
peter.robinson@jax.org
www.jax.org

@mcourtot
Contributor

@mcourtot mcourtot commented Aug 12, 2016

I think that by arguing CC0 vs CC-BY we are losing track of what we are trying to achieve. Here we have a set of resources with diverse licenses (CC0, CC-BY and a few others) that would like to know how (and whether) it is possible for their data to exist within Wikidata. Note that in addition to OBO resources there are many others (e.g. UniProt) that are not CC0, so I don't think this issue is isolated to the OBO community.

I like the solutions Chris offers:

  • Can Wikidata suggest a way to accommodate non CC0 resources?
  • Can OBO resources produce a CC0 subset?

Looking at the UniProt page at https://www.wikidata.org/wiki/Q905695, it states:
[Screenshot (2016-08-12) of the statement on the UniProt Wikidata page]

Could we have something similar for each OBO resource?

Once we have some sort of resolution for this, we can work on the other issues that need to be addressed for inclusion in Wikidata:

  • proper attribution, #299
  • reuse of URIs, #298

@cmungall
Contributor Author

@cmungall cmungall commented Aug 12, 2016

I agree the thread has diverged from the original question of how to get OBOs into WD. But this is an important development. @pnrobinson and @drseb make good arguments for a license that is more restrictive than the two recommended by OBO. With my OBO hat on, I want to see HP adopt BY, but with my HPO hat on, I see the arguments.

What would be the implications of HPO adopting ND? As it is generally not imported and used for axiomatization, the effect on the rest of OBO might be relatively low (of course, the implications for WD and @dhimmel's graph store are another matter).

However, if an ontology that is used for axiomatization were to adopt ND, that could have very bad implications: making an import module may be in breach of the ND clause.

From a practical POV, are we looking at a two-level split within OBO: 'axiomatic' ontologies and 'application' ontologies, with weaker licensing imposed on the former?

@brightbyte

@brightbyte brightbyte commented Sep 16, 2016

"If Wikidata is importing several aspects of a term, say its label, definition, and synonyms, from an ontology, I would love to see that visually marked in a distinct (bolded or layered) way as existing word-for-word from the existing 'reference' ontology."

Ironically, label, description, and aliases are three of the few things that Wikidata does not record a source or provenance for (because these are considered editorial content, not sourced "statements"). I suppose the description has the biggest claim to copyright, and should probably not be imported from a source with an incompatible license. A description from a CC-BY source could be imported as a statement, if properly sourced. A (copyrightable) description from a CC-BY-SA source cannot be imported into Wikidata without special permission from the copyright holder. Whether a given description is copyrightable may depend on the jurisdiction, I suppose. My personal rule of thumb is that a description under 100 characters is probably not copyrightable (for lack of originality), but I wouldn't bet much on this holding up under all circumstances.

@lschriml
Contributor

@lschriml lschriml commented Sep 22, 2016

The Human Disease Ontology (DO), which is licensed under CC-BY 3.0, decided to provide its content to Wikidata (see: https://www.wikidata.org/wiki/User:ProteinBoxBot/Legal) under Wikidata's CC0 license.

The Human Disease Ontology (DO) is licensed under CC-BY 3.0. The intent of DO's licensing choice is to promote open sharing and adaptation of the DO, as an ontology of human diseases, with attribution to the DO project. As a project and resource for the community, we decided to import DO's terms, term-related data and class hierarchy into Wikidata. The Disease Ontology object in Wikidata (Q5282129) provides attribution to the DO project for the related DO information loaded into Wikidata. As Principal Investigator of the DO project, I freely provide the content of DO for use and distribution to the Wikidata project without attribution restrictions on the use of each term, and its related information, in the ontology. The Disease Ontology was created to be a community resource. Wikidata and related projects enable the content of DO to be used without restriction, thus serving the greater good of the community.

@cmungall
Contributor Author

@cmungall cmungall commented Sep 28, 2016

@lschriml - I'm not quite clear on how this differs from DO having a CC-0 license?

Also, (and this is a general question, not meaning to pick on you), is being the PI of the DO project sufficient for being able to override the CC-BY rights in this way? If the DO is a community project, then is it not the case that all content providers throughout the history of the DO have copyright and must also be consulted and agree to the transfer? (this seems like one argument for starting out with CC-0, to avoid these issues).

@dhimmel

@dhimmel dhimmel commented Nov 17, 2016

I just came across two recent & amazing blog posts by Katie Fortney writing for the Office of Scholarly Communications at the University of California. These are the best introductions to academic data licensing that I'm aware of:

andrewsu added a commit to andrewsu/OBOFoundry.github.io that referenced this issue Jul 6, 2017
Given that the choice between CC0 and CC-BY is a nuanced one with many pros and cons on both sides of the issue, I offer three suggestions for this document:

1. linking to OBOFoundry#285 where many issues are explicitly discussed
2. removing the explicit recommendation of CC-BY
3. adding a request for attribution in all cases regardless of license (following [this pattern](http://www.dancohen.org/2013/11/26/cc0-by/))

I of course understand that this policy is ultimately under the purview of the Editorial WG, but I've formulated this as a pull request just to propose something specific.
@nlharris
Contributor

@nlharris nlharris commented Apr 13, 2020

What is the status of this?

@ddooley

@ddooley ddooley commented Apr 14, 2020

If someone has already explained this, please point me in the right direction: I'd like to release FoodOn under CC0, but it also imports terms from CC-BY ontologies. Am I out of luck until all source ontologies are CC0? Or can I simply state that CC0 pertains to FOODON_ prefixed terms?
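One way to picture the proposed scoping: the native terms are those whose IRIs fall in the `FOODON_` ID space under the OBO PURL base, and everything else is imported. The minimal Python sketch below partitions a list of IRIs that way; the `partition_by_prefix` name and the numeric IDs are hypothetical placeholders, and the sketch only shows which terms such a statement would cover by ID space, not whether the license question is settled.

```python
OBO_PURL = "http://purl.obolibrary.org/obo/"


def partition_by_prefix(iris, prefix="FOODON"):
    """Split IRIs into (native, imported) by their OBO ID-space prefix."""
    native_stem = f"{OBO_PURL}{prefix}_"
    native = [iri for iri in iris if iri.startswith(native_stem)]
    imported = [iri for iri in iris if not iri.startswith(native_stem)]
    return native, imported


# Hypothetical IRIs: the numeric parts are placeholders, not real terms.
iris = [
    OBO_PURL + "FOODON_00000001",
    OBO_PURL + "FOODON_00000002",
    OBO_PURL + "NCBITaxon_0000000",
]
native, imported = partition_by_prefix(iris)
print(len(native), "native;", len(imported), "imported")
```

Note that a FoodOn axiom can still mention imported terms; whether that matters for the license is the question taken up in the replies.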

@cmungall
Contributor Author

@cmungall cmungall commented Apr 14, 2020

@hlapp
Contributor

@hlapp hlapp commented Apr 14, 2020

"Or can I simply state that CC0 pertains to FOODON_ prefixed terms?"

I can't see how this would be a problem. You are not redistributing an upstream ontology under a conflicting license. You are only (re)using some terms from upstream ontologies in your axioms (and/or annotation axioms).

That said, IANAL.
