About the CC BY-"NC"-SA LICENSE #14

icoxfog417 · 2019-06-20T00:00:36Z

First, thank you for the great work on the Japanese dataset.

We are considering using UD_Japanese-GSD to train the built-in model of an open-source library.

The library (spaCy) is MIT-licensed, so it will sometimes be used commercially. Given this, could we use UD_Japanese-GSD to train the built-in spaCy model (other models are provided as follows)?

https://spacy.io/models
https://github.com/explosion/spacy-models

UD_Japanese-GSD is licensed under CC BY-"NC"-SA. We are concerned about whether the "NC" restriction carries over to the trained model.

dan-zeman · 2019-06-20T05:55:23Z

This is a legacy dataset and any change of the license would have to be negotiated with Google.

My personal opinion is that a trained model is just a large collection of numbers, which does not contain or provide access to the underlying texts. So it is not a copyrightable work and it should be possible to distribute it independently of the license of the corpus it was trained on (provided the license of the corpus did not restrict your usage of the corpus in such a way that would prevent you from obtaining the model). DISCLAIMER: I am not a lawyer and you should not take my opinion as legal advice. (But at least one lawyer, to whom I talked some years ago, agreed with this view of trained models when I explained them to him.)

icoxfog417 · 2019-06-27T05:35:32Z

Thank you for your reply.

I think there are two copyright holders: one is Google, the original content holder, and the other is Hiroshi Kanayama who reprocessed it. Who is the dominant copyright holder (according to your reply, it's Google)?

dan-zeman · 2019-06-27T07:28:03Z

> and the other is Hiroshi Kanayama who reprocessed it

This is a good remark. Whoever added their wisdom to the data should have a say should the data be released under a different license. But Google comes earlier on the time scale, and the "-SA" ("share alike") part means that whoever redistributes the data must use the same license. If Hiroshi wanted to provide the data without the "-NC" restriction, he would have to obtain Google's consent anyway. If Google gives the consent, then I suppose it will pertain to the original Google version of the data and you would have to ask Hiroshi's team whether they are willing to extend it to the version they processed.

jnivre · 2019-06-27T08:28:45Z

The NC restriction appears to be non-negotiable as far as Google is concerned. Many people have tried to buy a commercial license for one or more of the GSD treebanks, but so far to no avail. I fear that it applies here too.

kanayamah · 2019-06-28T02:09:24Z

Currently I do use a part of Google's original annotation, so we need to follow Google's license, even though most of the sentences were from Japanese Wikipedia. I hope someone will negotiate with Google to rethink the GSD license for all languages, not just for Japanese.

icoxfog417 · 2019-06-28T02:38:59Z

Does the Universal Dependencies project have any plan to negotiate the license problem? It will be hard for each language team leader (like Hiroshi Kanayama) to negotiate the license individually.

dan-zeman · 2019-06-28T10:13:17Z

I don't think UD has the capacity for that. And it is very unlikely that all UD treebanks can be relicensed under CC BY-SA. Some time ago we tried to at least get rid of the GNU GPL licenses, which are not suitable for data, and we failed; in some cases we never got a response from a relevant party.

hiroshi-matsuda-rit · 2019-06-30T19:46:56Z

Fortunately, the license of UD_Japanese-PUD is CC-BY-SA, so we can use PUD-based pretrained models for commercial purposes even if those models are created by outsiders.
In addition, I have just finished annotating NE labels (in the OntoNotes 5 style) on a thousand of the sentences in UD_Japanese-PUD. A spaCy-ready JSON file is also available.
https://github.com/megagonlabs/UD_Japanese-PUD

I'm going to submit a PR for the PUD-based Japanese language model tomorrow.

hiroshi-matsuda-rit · 2019-07-02T18:25:35Z

I've just created a PR for a Japanese model using UD_Japanese-PUD with my NE annotations.
explosion/spaCy#3899

masayu-a · 2019-07-03T11:55:21Z

Hi,

The UD Japanese team has taken over maintenance of the UD_Japanese-GSD data from the previous maintainers.
I have contacted a previous maintainer at Google to ask for permission to restore the license of UD_Japanese-GSD.

Our claim is as follows:

UD_Japanese-GSD is based on Wikipedia texts, which are under CC BY-SA.
However, the UD_Japanese-GSD data is under CC BY-"NC"-SA. The license was not altered properly.
We hope that the license will be restored to CC BY-SA, the license of the Wikipedia text.
https://creativecommons.org/faq/#can-i-combine-material-under-different-creative-commons-licenses-in-my-work

masayu-a · 2019-08-28T10:28:10Z

Hi,

We have negotiated the removal of "NC" from the UD_Japanese-GSD license.
The previous maintainers agreed to the removal.
It will be noted on the page https://github.com/ryanmcd/uni-dep-tb in the near future.

We will apply CC BY-SA to newer versions of UD_Japanese-GSD once "NC" has been removed from the original repository.

kanayamah · 2020-05-01T05:05:56Z

This has been solved in v2.5.

icoxfog417 changed the title ~~About the CC LICENSE~~ About the CC BY-"NC"-SA LICENSE Jun 20, 2019

dan-zeman added the question label Jun 20, 2019

icoxfog417 mentioned this issue Aug 29, 2019

Japanese Model explosion/spaCy#3756

Closed

kanayamah closed this as completed May 1, 2020
