About the CC BY-"NC"-SA LICENSE #14

Closed
icoxfog417 opened this issue Jun 20, 2019 · 12 comments

@icoxfog417

icoxfog417 commented Jun 20, 2019

First, thank you for the great work on the Japanese dataset.

We are considering using UD_Japanese-GSD to train the built-in model of an open-source library.

explosion/spaCy#3756

The library (spaCy) is MIT-licensed, so it is sometimes used commercially. Given this, could we use UD_Japanese-GSD to train the built-in spaCy model? (Models for other languages are provided as follows.)

https://spacy.io/models
https://github.com/explosion/spacy-models

UD_Japanese-GSD is licensed under CC BY-"NC"-SA. We are concerned about how the "NC" clause affects the trained model.

@icoxfog417 icoxfog417 changed the title About the CC LICENSE About the CC BY-"NC"-SA LICENSE Jun 20, 2019
@dan-zeman
Member

This is a legacy dataset and any change of the license would have to be negotiated with Google.

My personal opinion is that a trained model is just a large collection of numbers, which does not contain or provide access to the underlying texts. So it is not a copyrightable work and it should be possible to distribute it independently of the license of the corpus it was trained on (provided the license of the corpus did not restrict your usage of the corpus in such a way that would prevent you from obtaining the model). DISCLAIMER: I am not a lawyer and you should not take my opinion as legal advice. (But at least one lawyer, to whom I talked some years ago, agreed with this view of trained models when I explained them to him.)

@icoxfog417
Author

Thank you for your reply.

I think there are two copyright holders: one is Google, the original content holder, and the other is Hiroshi Kanayama who reprocessed it. Who is the dominant copyright holder (according to your reply, it's Google)?

@dan-zeman
Member

> and the other is Hiroshi Kanayama who reprocessed it

This is a good remark. Whoever added their wisdom to the data should have a say should the data be released under a different license. But Google comes earlier on the time scale, and the "-SA" ("share alike") part means that whoever redistributes the data must use the same license. If Hiroshi wanted to provide the data without the "-NC" restriction, he would have to obtain Google's consent anyway. If Google gives the consent, then I suppose it will pertain to the original Google version of the data and you would have to ask Hiroshi's team whether they are willing to extend it to the version they processed.

@jnivre

jnivre commented Jun 27, 2019

The NC restriction appears to be non-negotiable as far as Google is concerned. Many people have tried to buy a commercial license for one or more of the GSD treebanks, but so far to no avail. I fear that it applies here too.

@kanayamah
Contributor

Currently I do use part of Google's original annotation, so we need to follow Google's license, even though most of the sentences are from Japanese Wikipedia. I hope someone negotiates with Google to rethink the GSD license for all languages, not just Japanese.

@icoxfog417
Author

Does the Universal Dependencies project as a whole have any plan to negotiate the license problem? It will be hard for each language team leader (like Hiroshi Kanayama) to negotiate the license individually.

@dan-zeman
Member

I don't think UD has capacity for that. And it is very unlikely that all UD treebanks can be relicensed under CC BY-SA. Some time ago we tried to get rid at least of the GNU GPL licenses, which are not suitable for data, and we failed—in some cases we never got a response from a relevant party.

@hiroshi-matsuda-rit

Fortunately, the license of UD_Japanese-PUD is CC BY-SA, so we can use PUD-based pretrained models for commercial purposes even if those models are created by outsiders.
In addition, I have just finished annotating NE labels (in the OntoNotes 5 style) on a thousand of the sentences in UD_Japanese-PUD. A spaCy-ready JSON file is also available.
https://github.com/megagonlabs/UD_Japanese-PUD

I'm going to submit a PR for the PUD-based Japanese language model tomorrow.

@hiroshi-matsuda-rit

I've just created a PR for a Japanese model using UD_Japanese-PUD with my NE annotations.
explosion/spaCy#3899

@masayu-a
Contributor

masayu-a commented Jul 3, 2019

Hi,

The UD Japanese team is taking over maintenance of the UD_Japanese-GSD data from the previous maintainers.
I have contacted a previous maintainer at Google to ask permission to revise the license of UD_Japanese-GSD.

The scope of claim is as follows:

@masayu-a
Contributor

Hi,

We have negotiated the removal of "NC" from the UD_Japanese-GSD license.
The previous maintainers accepted the removal.
It will be noted on the page https://github.com/ryanmcd/uni-dep-tb in the near future.

We will apply CC BY-SA to the newer version of UD_Japanese-GSD once "NC" has been removed from the original repository's license.

@kanayamah
Contributor

This has been solved in v2.5.
