Precise licence #149

metal3d · 2021-01-04T17:43:25Z

Hi,
Thanks for that very nice work.

It would be helpful to specify the license for this repository. To be honest, I am thinking of using DeepSpeech in a professional project, but it's very important to specify the model license.

Our projects are open source but can be used commercially, so if you use something like GPL, LGPL, MIT, or BSD, that's OK for us, and of course we would include a note pointing to your source code.

If you choose a CC license with the "NC" (non-commercial) option, as VOSK did, we will not be able to use your models. That would be sad, but we respect the author's choice.

The question is, if you're using CC content to train your model, should you use the same license?

To clarify my point of view: both I and Smile, the company where I work, are open-source advocates. We release our work under open licences and contribute a lot to open-source solutions.
I think that something like the MIT, BSD, or LGPL license is the best choice for your work, but it's important to make sure that your datasets don't impose a license on the models generated from them.

lissyx · 2021-01-04T18:16:11Z

> To clarify my point of view: both I and Smile, the company where I work, are open-source advocates. We release our work under open licences and contribute a lot to open-source solutions.

Good to know Smile is interested. I used to collaborate a bit when I was working at Mandriva.

> The question is, if you're using CC content to train your model, should you use the same license?

For all the code I wrote, I have not really decided, but since DeepSpeech is MPL, using the same license could make sense.

Most of the datasets are either CC-0 or similar, but I don't remember every licence exactly. However, legal advised us that it was safe to release the model and checkpoints under whatever licence we like.

I can have a look at that when I'm back from PTO, on the 11th.

metal3d · 2021-01-05T08:14:32Z

Thanks for the reply 👍

It's not urgent. DeepSpeech is not fully stable anyway and (excuse me) the model is not accurate enough 😃, but it's very interesting for proofs of concept and worth keeping in mind for later use.

Personally, I'm impressed by the work that has been done.

Even if the model ends up under an "NC" license, I'm sure I will use it anyway for my personal use and projects at some point.

lissyx · 2021-01-11T14:35:54Z

So, as far as I can remember from the legal discussions: unless a dataset explicitly restricts the licensing of released models/checkpoints, we are free to use whatever license we want.

Currently, we use:

Lingua Libre: CC-BY-SA
Common Voice FR: CC-0
Training Speech: CC-0
African Accented French: Apache 2.0
M-AILABS French: BSD/MIT-like, with attribution
Centre de Conférence Pierre Mendès France: ~CC-0

So I think we should be safe to release that under any license?

metal3d · 2021-01-15T08:18:33Z

I think the most restrictive license on the list is CC-BY-SA, but if I'm not mistaken it should not interfere with your work because you don't distribute the dataset. The voices are used like libraries, so you can use any license you want.

The only advice I can give is not to use an overly restrictive license. I understand that you are making this available for free and that it could be frustrating that companies can use it to sell products.

Using a license that allows the model to be freely included in a product, with the only "restriction" being a notice such as "we are using commonvoice-fr from this address under the XXX license", is, IMHO, the best solution.

So LGPL, MIT, Apache, or CC-BY-SA would be a good choice.
GPL, CC-NC, and so on are very restrictive and cannot be used by professionals. That's a respectable choice but very limiting.

Of course, you are the author and you can choose what you prefer :)

lissyx · 2021-01-15T08:20:28Z

> The only advice I can give is not to use an overly restrictive license. I understand that you are making this available for free and that it could be frustrating that companies can use it to sell products.

Don't worry, I have quite a good understanding of the consequences.

lissyx · 2021-02-01T16:56:19Z

> I think the most restrictive license on the list is CC-BY-SA, but if I'm not mistaken it should not interfere with your work because you don't distribute the dataset. The voices are used like libraries, so you can use any license you want.

Indeed, although after discussing with colleagues, we remember receiving a legal warning on that point: it was not so obvious.

Given that the Lingua Libre dataset is rather small, and it seems its export is still broken (I contacted the dev several times but have had no reply yet, and I'm unsure whether he is still actively working on it), I guess it could be better to just re-train without this dataset so we don't take any legal risk?

Fix #149: Use MPL 2.0 for DeepSpeech

lissyx closed this as completed in 3bbc3b8 Mar 29, 2021

lissyx added a commit that referenced this issue Mar 29, 2021

Merge pull request #152 from lissyx/license

5699e59

Fix #149: Use MPL 2.0 for DeepSpeech
