Create guidelines for OBO maintainers who want to be included in Wikidata #285
Comments
CC-by but attribution chosen is by PURL - as long as wikidata uses the PURL (perhaps replacing the URL formatter that currently links directly to AmiGO) I think it should be ok. |
I agree that this needs to be written down somewhere. It would make everything much clearer and we could avoid getting into lengthy discussion later down the line, or even have to remove data, if something were to happen. |
re: @mcourtot's comment, it's true that Wikidata probably generally satisfies the attribution requirement for CC-BY. But Wikidata itself is CC0, so if you grant Wikidata permission to use your data, then you also grant downstream users those same CC0 terms. So someone who downloads your ontology via Wikidata would not be required to attribute in any way. |
On 13 Jul 2016, at 8:39, Andrew Su wrote:
Additionally, this points to an assumption in the OBO license that |
Hi, this is Nuria, a new post-doc in Andrew's lab. IMO, the CC-BY license is a common choice for data provider groups because it gives visibility to their resources and lets them demonstrate community use to funding agencies. Initiatives to develop quality and resource-use metrics are ongoing in ELIXIR and NIH to support decision-making in funding agencies. A win-win idea would be to suggest Wikidata to ELIXIR/NIH as one component of the metrics used to compute resource use by the community. In this way, Wikidata could be used by funding agencies as a data endpoint to evaluate and identify relevant resources for the community, and by data providers as a platform to make their resources widely visible and available to both the community and funding agencies. This would encourage data providers to grant Wikidata permission for data sharing under the CC0 license. I am not sure whether Wikidata could show rankings such as the number of downloads per year of each ontology, or the number of citations per ontology... |
Hi Nuria, I have been in quite a few Wikidata/Wikipedia meetings and the
What we could do with the licensing issues is draft a general agreement for
Cheers,
Elvira Mitraka, PhD |
I like the idea of standardizing this process. That being said, we have made significant progress working through the addition of one resource at a time and getting permission one at a time. So, while negotiations for an OBO-wide pattern continue, if we want data (e.g. Henning's suggestion of Reactome) in Wikidata, let's go ahead and ask the owners directly. |
You know what would make this all go away? Making OBO Foundry require a CC0 license. Let's try to answer the attribution problem with good software for tracking usage, not with lawyers writing text that is unenforceable for the bad guys and massively distracting (as demonstrated here) for the good guys. To Elvira's point above: Wikipedia/Wikidata actually does keep extensive logs on usage. I started a thread about gaining access to them for the purpose of building an attribution engine. The response was pretty positive, but I didn't have the bandwidth to follow it up. |
+1 for CC0. At least, I think it should be recommended more strongly (right now OBO recommends CC-BY). cc @hlapp |
Is there a specific reason Wikidata can't accommodate CC-BY? Other than "they don't want to". |
This was a (good) decision made a long time ago and at a higher level than
|
I fought quite hard to even allow CC0 in the respective OBO Foundry principle. The recommendation is still CC-BY (for, IMHO, poorly motivated reasons). One argument I made during the discussions leading up to that was that, in particular because of the Realism principle espoused by OBO Foundry, most of the content of an OBO Foundry ontology is unlikely to even qualify as creative expression. Others, most prominently @alanruttenberg, argued against that, citing previous case law (of which there isn't much, but there is precedent of some ontology in some field having been ruled eligible for copyright protection). IMO, the stronger argument (which I have also made) is that CC-BY as a legal instrument is the wrong tool to bring to bear for declaring an attribution requirement and demanding compliance with it. Attribution can be given in many different ways that all satisfy the legal requirement of a CC-BY license, but very few of which will satisfy the mechanism of attribution we as scientists really want. So it's really a social norm that we request compliance with, not a legal one, and so a CC-BY license, by itself, adds very little if anything to stating what we expect in return for reuse. Bottom line, I remain entirely in favor of requiring, or at the very least strongly recommending, that OBO Foundry ontologies be released under a CC0 waiver. |
Eloquently put, Hilmar. I could not agree more... |
👍 for CC0, 👎 for CC BY
I think the OBO Foundry should strongly recommend CC0 and nudge ontologies to switch from CC BY to CC0 when possible. I'll start with the legalese that reusers are subject to under a CC BY 4.0 license:
The effect of Section 3 is uncertainty and drudgery. Did the Licensor identify the creators? If so, you must retain it. Did the Licensor supply a copyright notice? If so, you must retain it. Don't fail to mention if you modified the resource. Even if you license your derivative work under a compatible license such as CC BY-NC, you must still mention the original license. After reading these conditions, I think it's likely that my use of CC BY ontologies in Hetionet (an integrative network of biology) may not comply with the entirety of these CC BY conditions, even though I went to great pains trying to comply with the incredible burden laws and licenses place on publicly-funded data.
The best applications of knowledge will be integrative. Integrating CC BY content can be tricky because you must deal with multiple potentially-contradictory license conditions as well as attribution stacking. The number of weird, tricky situations that arise when you do even a little integration is astounding. Some CC BY resources will have Sui Generis database rights. Others will not. Most lawyers don't have the expertise to provide guidance on these issues and lawyers generally avoid giving advice unless contracted to do so. Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse.
CC0 was designed to avoid uncertainty. The license is lengthy, but since the whole point is to place the content in the public domain, you don't have to worry about any conditions of reuse.
Legally-enforced attribution is overrated. Best practice requires establishing data provenance. Any high quality resource will attribute when that attribution is productive. Sometimes it's not productive to attribute. Sometimes it's destructive. For example, I created PharmacotherapyDB, a CC0 catalog of drug–disease treatments. The drugs are coded using DrugBank and the diseases are coded using the Disease Ontology. I don't want my users to be burdened by licensing and I want my data to be maximally reused, so I used CC0. But am I violating the Disease Ontology's CC BY License? I've created a derivative work that includes 97 DO terms, and these terms potentially represent an original work of authorship. Answering this question requires wading through legal precedent, which is an extreme burden. Much of this precedent is yet to exist: the space is filled with open questions. Sometimes it's nice to just use an identifier and not have to attribute anything. Identifiers usually have their provenance embedded anyways. Based on these considerations, DrugBank, a dually licensed (aka commercial) resource, released the core of their resource as CC0.
The aforementioned practice of granting WikiData permission to release data under CC0 but then officially releasing the same data under CC BY is not ideal. This will create confusion as it's unclear whether WikiData actually had sufficient permission to apply CC0. Users of WikiData content could be liable for violating upstream data licensing and many users won't want to take that risk. The authoritative source of the data should apply the most permissive license that the data is released under anywhere to avoid these situations. You also don't want two classes of users: those who access from the authoritative site and get the restrictive license and those who use WikiData. There's also the possibility of a resource diverging, similar to the recent Ethereum hard fork. This could happen if WikiData is granted permission to reproduce an ontology at one point, but subsequent contributions are made under the CC BY license.
Finally, licenses and laws change over time. Currently, CC BY 4.0 is compatible with a broad range of licenses. However, incompatibilities may arise in the future. Let's create knowledge and content that withstands the test of time.
From the perspective of a creator, I want to maximize the reuse of my creations. Most of us are in the incredibly lucky position that the public funds us to create knowledge. Don't waste the opportunity to do something revolutionary over petty attribution concerns. Don't rely on the threat of suing your greatest advocates (those who use your data) for recognition. |
Hear, hear!
|
👍 to @dhimmel - especially "Academics and others who just want to do science don't have sufficient access to legal experts. Even when you have access to a lawyer, the process injects a long delay, at great expense to whoever is paying the tab. The overall effect is that whenever there are legally ambiguous situations, you waste users' time and dissuade reuse." |
In case it or the references are useful, here is an open letter to NIGMS in support of broad adoption of CC0. |
Thanks @goodb for making that publicly available and linking it here. Very helpful to be able to refer to discussions like this thread, that letter, and this OpenData StackExchange thread. Collectively this has convinced us to switch to CC0 for the www.civicdb.org project. |
So what is the proposed solution to the attribution issue? @andrewsu says "Most of us are in the incredibly lucky position that the public funds us to create knowledge". But the reality is that a lot of the content in the OBO Library is not funded, and that which is funded does not have secure funding. Future funding relies on the content creators justifying to funders that their ontology is widely adopted in different databases and platforms (commercial and academic). Is CC-BY a perfect tool for ensuring that companies don't take an ontology, sell it as part of their product suite and provide it to their customers with no attribution? Far from it. But many perceive this as the only tool they have. In fact the inclination is usually to go for a more restrictive license - look at the databases these ontologies are used with for examples, typically discriminatory restrictive licenses. Not everyone uses the same function to evaluate the tradeoff between perceived control and obstructed reuse. Some may prefer a sliver of protection at the cost of some obstruction to integration in some data warehouses. How do we move forward?
|
@cmungall : I don't really see how CC-BY helps one justify that the ontology is widely adopted. In practice, I expect that scientists who want to disseminate their research are going to cite the ontology regardless of its CC0/CC-BY status. CC-BY is essentially using the threat of the legal system (which, let's be honest, is very unlikely to be enforced) to require this in some manner. Hypothetically, if some commercial entity took a CC-BY resource and attempted to sell it as their own, would one imagine a university or individual using the legal system to require them to acknowledge the source? That seems like a lot of cost with relatively low reward. I wonder if the best way to make a strong case for funding is to emphasize the impact that a resource has had. If CC-BY provides a sliver of protection but increases barriers to use in some contexts, then it may hurt one's ability to fund a resource because the overall impact of the resource may be diminished. |
@cmungall, this issue illustrates the argument for CC0 — if an ontology wants to be part of projects like WikiData, it needs to be CC0 compatible.
I'm having trouble understanding what "axiom" means. But I think at a minimum, nodes (terms) should be released as CC0. This would include term identifiers, names, synonyms, and descriptions. This would remove any barriers to creating public domain relationships that use OBO Foundry nodes as endpoints.
CC0 will bestow a competitive advantage with respect to funding. Funders want to see their commissioned research making the greatest contribution. If given a choice between funding a CC0 and CC BY resource, I expect the funders would prefer CC0 because of the greater reuse potential. CC BY also creates the potential that the work must be repeated (say for inclusion in WikiData), which is a horrific concept to a funder. Maximizing reuse will create the strongest argument for continued funding. Say a company does use an ontology without attribution. Grant proposals can still mention this reuse and that the ontology is creating value in industry, which will demonstrate the broad relevance and user base for the resource. At a time when the science community is beginning to appreciate the importance of open data, OBO Foundry ontologies can bolster their appeal to funders by leading the way. |
"is there a template for providing a CC-0 axiom-subset of a CC-BY ontology". To clarify this: many OBO ontologies now make extensive use of OWL description logic to build computable definitions of their classes. This makes it possible to, for example, infer a subclassOf or instanceOf relationship automatically based on the properties of the entity or class in question. When terms from an ontology are used in applications that do not use OWL (which is many of them), these class membership axioms may not be integrated. Hence, we can imagine that a subset of the ontology minus these more sophisticated logical constructs might be shared differently than the entire thing. Since these logical definitions contain a significant fraction of the intellectual property of the ontologies that use them, perhaps it would be more satisfactory to their authors to share the other portions of the ontologies (term names, identifiers, basic concept graphs) completely openly. This seems to be what @dhimmel is suggesting, and is in fact what we have already started to do with the Gene Ontology import into Wikidata. |
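As a rough illustration of the split described above, here is a minimal Python sketch that is not taken from any OBO tooling. The `Term` record, its field names, the example identifiers, and the text of the logical definition are all hypothetical; a real ontology would be read with an OWL library rather than built by hand. It only shows the idea of publishing identifiers, names, synonyms, and asserted parent links while leaving the logical definitions out.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class Term:
    """Hypothetical, simplified stand-in for one ontology class."""
    term_id: str
    name: str
    synonyms: List[str] = field(default_factory=list)
    parents: List[str] = field(default_factory=list)  # asserted is-a links
    logical_definition: str = ""  # OWL-style computable definition, kept as text


def basic_subset(terms: List[Term]) -> List[Term]:
    """Return new Term records with only the logical definition blanked out."""
    return [replace(t, logical_definition="") for t in terms]


# Made-up terms for illustration only; these are not real ontology IDs.
full = [
    Term("EX:0000001", "example process"),
    Term(
        "EX:0000002",
        "example subprocess",
        synonyms=["toy subprocess"],
        parents=["EX:0000001"],
        logical_definition="'example process' and ('occurs in' some 'example cell')",
    ),
]

shareable = basic_subset(full)
for term in shareable:
    print(term.term_id, term.name, term.synonyms, term.parents)
```

Whether such a subset can actually carry a different license from the full ontology is the open question in this thread; the sketch only shows that the two parts are mechanically easy to separate.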
I would like to give a view from my personal side, as one of the developers of the Human Phenotype Ontology (HPO). (I do not speak for all HPO developers.) Also, I am no expert on licenses. I just came across this thread during a discussion about derivatives of HPO. HPO's intention is to be a tool for the community and a tool created by the community. We try to keep the quality of HPO high by accepting change requests, but still letting an HPO developer decide if each one is a valid request and eventually implement those changes. HPO is now used in several contexts: in research, but also by several genetic diagnostics companies around the world that provide phenotype-driven diagnostics. For a given set of symptoms of a patient, HPO is also used to find similar patients or physicians that might be the best experts. I vote for a more restrictive license for HPO:
To motivate this I want to give an excerpt of examples, that I encountered during the last years:
These changes (A+B) are IMHO pretty strong, as they possibly affect the result of (semantic) similarity calculations performed over HPO. I fear that this might fall back on HPO in terms of public opinion on the quality of HPO, or even in terms of being sued and having to prove that it was the company's fault and not HPO's. I have no idea which ready-made license is most appropriate for this; I just wanted to give a little insight into my thoughts/background. |
Hi everybody. I agree with Sebastian that because the HPO is being used in an ever broader range of medical contexts, extra care and responsibility are needed on our part. I think that we should basically discourage others from changing the HPO for their own needs because (i) if the change is good, we want all potential patients to benefit from it; and (ii) if the change is bad, we do not want the patients who are being served by the company in question to suffer negative consequences and we also do not want to be held legally responsible for a mistake that somebody else has made. How does the rest of the OBO community feel about this? Is any kind of ND license acceptable in this forum owing to the status of the HPO as a resource that is being used directly in clinical care? -peter Peter Robinson, Professor of Computational Biology, The Jackson Laboratory for Genomic Medicine |
I think that by arguing CC0 vs CC-BY we are losing track of what we are trying to achieve. Here we have a set of resources with diverse licenses (CC0, CC-BY, and a few others) that would like to know how (if?) it is possible for their data to exist within Wikidata. Note that in addition to OBO resources there are many others (e.g. UniProt) which are not CC0, so I don't think this issue is isolated to the OBO community. I like the solutions Chris offers:
Looking at the UniProt page at https://www.wikidata.org/wiki/Q905695, it states: Could we have something similar for each OBO resource? Once we have some sort of resolution for this, we can work on the other issues that need to be addressed for inclusion in Wikidata:
|
I agree the thread has diverged from the original one about how to get OBOs into WD. But this is an important development. @pnrobinson and @drseb make good arguments for a license that is more restrictive than the two recommended by OBO. With my OBO hat on I want to see HP adopt BY, but with my HPO hat on I see the arguments. What would be the implications of HPO adopting ND? As it is generally not imported and used for axiomatization, the effect on the rest of OBO might be relatively low (of course implications for WD and @dhimmel's graph store are another matter). However, if an ontology that is used for axiomatization were to adopt ND, that could have very bad implications: making an import module may be in breach of the ND clause. From a practical POV, are we looking at a two-level split within OBO: 'axiomatic' ontologies and 'application' ontologies, with weaker licensing imposed on the former? |
I just came across two recent & amazing blog posts by Katie Fortney writing for the Office of Scholarly Communications at the University of California. These are the best introductions to academic data licensing that I'm aware of:
|
Given that the choice between CC0 and CC-BY is a nuanced one with many pros and cons on both sides of the issue, I offer three suggestions for this document:
1. linking to OBOFoundry#285 where many issues are explicitly discussed
2. removing the explicit recommendation of CC-BY
3. adding a request for attribution in all cases regardless of license (following [this pattern](http://www.dancohen.org/2013/11/26/cc0-by/))
I of course understand that this policy is ultimately under the purview of the Editorial WG, but I've formulated this as a pull request just to propose something specific.
What is the status of this? |
If someone has already explained this, please point me in the right direction: I'd like to CC0 FoodOn, but it also imports CC-BY ontology terms. Am I out of luck until all source ontologies are CC0 ? Or can I simply state that CC0 pertains to FOODON_ prefixed terms? |
"Or can I simply state that CC0 pertains to FOODON_ prefixed terms?"
uncharted territory. What are the compositional semantics of owl:import and
dc:license? What about when robot merge is done, as is standard with import
chains?
You could certainly put a CC0 on foodon-base.owl but once you add the
import statements it seems to get murkier
I would start by poking upstream, asking if they are willing to make a CC-0
axiom subset that satisfies your import requirements. E.g., for some
ontologies I work on we have shied away from CC-0 as some text definitions
came from other sources, but the logical axioms and names are ours so we
release that CC-0.
|
I can't see how this would be a problem. You are not redistributing an upstream ontology under a conflicting license. You are only (re)using some terms from upstream ontologies in your axioms (and/or annotation axioms). That said, IANAL. |
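To make the "CC0 pertains to FOODON_ prefixed terms" idea from the question above concrete, here is a small Python sketch. It assumes terms are identified by OBO-style PURLs whose last path segment starts with the ontology prefix; the function name `split_by_prefix` and the example IRIs (including their numeric parts and the `EXAMPLE_` prefix) are made up for illustration. It says nothing about what the licenses legally allow; it only separates native terms from imported ones.

```python
from typing import Iterable, List, Tuple


def split_by_prefix(
    iris: Iterable[str], prefix: str = "FOODON_"
) -> Tuple[List[str], List[str]]:
    """Split term IRIs into (native, imported) based on the local ID prefix."""
    native: List[str] = []
    imported: List[str] = []
    for iri in iris:
        # The local ID is whatever follows the last "/" in the IRI.
        local_id = iri.rsplit("/", 1)[-1]
        if local_id.startswith(prefix):
            native.append(iri)
        else:
            imported.append(iri)
    return native, imported


# Placeholder IRIs for illustration; the numeric IDs are not real terms.
example_iris = [
    "http://purl.obolibrary.org/obo/FOODON_00000001",
    "http://purl.obolibrary.org/obo/EXAMPLE_0000001",
    "http://purl.obolibrary.org/obo/FOODON_00000002",
]

native_terms, imported_terms = split_by_prefix(example_iris)
print("native:", native_terms)
print("imported:", imported_terms)
```

A list like `native_terms` could then be the scope named in a license statement, with the imported terms left under whatever terms their upstream ontologies set.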
Are there still actionable things in this ticket? |
Most OBOs are CC-BY, Wikidata requires CC-0. Some ontologies have apparently granted Wikidata permission to redistribute part or all of their ontology.
We want to make sure this is streamlined with a common process for everyone. Not clear to me how this should be done, ideas welcome, add below.