About the CC BY-"NC"-SA LICENSE #14
Comments
This is a legacy dataset and any change of the license would have to be negotiated with Google. My personal opinion is that a trained model is just a large collection of numbers, which does not contain or provide access to the underlying texts. So it is not a copyrightable work and it should be possible to distribute it independently of the license of the corpus it was trained on (provided the license of the corpus did not restrict your usage of the corpus in such a way that would prevent you from obtaining the model). DISCLAIMER: I am not a lawyer and you should not take my opinion as legal advice. (But at least one lawyer, to whom I talked some years ago, agreed with this view of trained models when I explained them to him.) |
Thank you for your reply. I think there are two copyright holders: Google, which holds the original content, and Hiroshi Kanayama, who reprocessed it. Which of them is the dominant copyright holder (according to your reply, it's Google)? |
This is a good remark. Whoever added their wisdom to the data should have a say if the data were to be released under a different license. But Google comes earlier on the time scale, and the "-SA" ("share alike") part means that whoever redistributes the data must use the same license. If Hiroshi wanted to provide the data without the "-NC" restriction, he would have to obtain Google's consent anyway. If Google gives its consent, then I suppose it will pertain to the original Google version of the data, and you would have to ask Hiroshi's team whether they are willing to extend it to the version they processed. |
The NC restriction appears to be non-negotiable as far as Google is concerned. Many people have tried to buy a commercial license for one or more of the GSD treebanks, but so far to no avail. I fear that it applies here too. |
Currently I use part of Google's original annotation, so we need to follow Google's license, even though most of the sentences came from Japanese Wikipedia. I hope someone will negotiate with Google to rethink the GSD license for all languages, not just for Japanese. |
Does the Universal Dependencies project have any plan to negotiate the license problem? It would be hard for each language team leader (like Hiroshi Kanayama) to negotiate the license individually. |
I don't think UD has capacity for that. And it is very unlikely that all UD treebanks can be relicensed under CC BY-SA. Some time ago we tried to get rid at least of the GNU GPL licenses, which are not suitable for data, and we failed—in some cases we never got a response from a relevant party. |
Fortunately, the license of UD_Japanese-PUD is CC BY-SA, so we can use PUD-based pretrained models for commercial purposes even if those models are created by outsiders. I'm going to submit a PR tomorrow to add the PUD-based Japanese language model. |
I've just created a PR for a Japanese model trained on UD_Japanese-PUD with my NE annotations. |
Hi, the UD Japanese team is taking over the maintenance of the UD_Japanese-GSD data from the preceding maintainers. The scope of claim is as follows:
|
Hi, we have negotiated the removal of "NC" from the UD_Japanese-GSD license. We will assign CC BY-SA to the newer version of UD_Japanese-GSD once "NC" has been removed from the original repository. |
This has been solved in v2.5. |
First, thank you for the great work on the Japanese dataset.
We are considering using UD_Japanese-GSD to train a built-in model for an open-source library.
explosion/spaCy#3756
The library (spaCy) is MIT-licensed, so it will sometimes be used commercially. Given this, could we use UD_Japanese-GSD to train the built-in spaCy model? (Other models are provided at the following links.)
https://spacy.io/models
https://github.com/explosion/spacy-models
UD_Japanese-GSD is licensed under CC BY-"NC"-SA. We are concerned about how the "NC" clause affects the trained model.