Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precise licence #149

Closed
metal3d opened this issue Jan 4, 2021 · 6 comments
Closed

Precise licence #149

metal3d opened this issue Jan 4, 2021 · 6 comments

Comments

@metal3d
Copy link

metal3d commented Jan 4, 2021

Hi,
Thanks for that very nice work.

It could be interesting to precise the license for this repository, I will be honest, I think using deepseach to be used in a professional project. But... it's very important to precise the model license.

Our projects are open-source, but can be used commercially, so if you use something like GPL or LGPL, MIT, BSD... it's ok for us - and of course, we must include a note pointing on your source code.

If you choose (like did VOSK) one CC license with "NC" option (non-commercial), so we will not be able to use your models. That's sad but we respect author's choices.

The question is, if you're using CC content to train your model, should you use the same license?

I precise my point of view: me, and "Smile" enterprise where I work, we are opensource defender. We provide our work with open licences and we contribute a lot on opensource solutions.
I think that something like MIT, BSD or LGPL licenses are the best choice for your work, but it's important to be sure that your dataset doesn't impose a license to model generation.

@lissyx
Copy link
Collaborator

lissyx commented Jan 4, 2021

I precise my point of view: me, and "Smile" enterprise where I work, we are opensource defender. We provide our work with open licences and we contribute a lot on opensource solutions.

Good to know smile is interested, i used to collaborate somehow when i was working at mandriva.

The question is, if you're using CC content to train your model, should you use the same license?

For all the code i wrote, i have no t really decided but since deepspeech is MPL, it could make sense.

Most of the dataset are either cc-0 or similar, but i have not complete memory of the licence. However, legal advised us it was safe to release the model and checkpoints under some licence we like.

I can have a look at that when im back from pto, on the 11th.

@metal3d
Copy link
Author

metal3d commented Jan 5, 2021

Thanks for the reply 👍

That's not urgent, deepspeech is anyway not fully stable and (excuse me) the model is not precise enough 😃 - but it's very interesting to be used for proof of concept and to keep it in mind to be used later.

I'm personally impressed by the done work.

Even if the model will use "NC" license, I'm sure that I will use it anyway for my personal uses and project in a while.

@lissyx
Copy link
Collaborator

lissyx commented Jan 11, 2021

So, as much as I can remember from legal talks: unless a dataset explicitely limits released models/checkpoints licensing, we are free to use whatever we want.

Currently, we use:

  • Lingua Libre: CC-by-SA
  • Common Voice FR: CC-0
  • Training Speech: CC-0
  • African Accented French: Apache 2.0
  • M-AILABS French: BSD/MIT with attribution like
  • Centre de Conférence Pierre Mendès France: ~CC-0

So I think we should be safe to release that under any license?

@metal3d
Copy link
Author

metal3d commented Jan 15, 2021

I think that the most restrictive license on the list is CC-BY-SA but if I'm not wrong that could not interfere with your work because you don't provide the dataset. Voices are used as libraries. So you can use any license you want.

The only advice I can give is to not use too much restrictive license. I can understand that you make this free and that it could be frustrating that enterprises can use them to sell products.

Using a license that is freely includable inside a product with "restriction" to say "we are using commonvoice-fr from this address with XXX license" is, IMHO, the best solution.

  • So, LGPL, MIT, Apache license, or CC-BY-SA can be a good choice.
  • GPL, CC-NC, and so on are very restrictive and cannot be used by professionals. That's a respectable choice but very limiting.

Of course, you are the author and you can choose what you prefer :)

@lissyx
Copy link
Collaborator

lissyx commented Jan 15, 2021

The only advice I can give is to not use too much restrictive license. I can understand that you make this free and that it could be frustrating that enterprises can use them to sell products.

Dont worry, i have quite a good understanding of the consequences.

@lissyx
Copy link
Collaborator

lissyx commented Feb 1, 2021

I think that the most restrictive license on the list is CC-BY-SA but if I'm not wrong that could not interfere with your work because you don't provide the dataset. Voices are used as libraries. So you can use any license you want.

Indeed, although discussing with colleagues and we remember receiving some legal warning on that, it was not so obvious.

Given that the lingualibre dataset is rather small, and it seems its export is still broken (I contacted the dev several time but to no reply yet, unsure if he is still actively working onthat or what), I guess it could be better to just re-train without this dataset so we don't take any legal risk?

@lissyx lissyx closed this as completed in 3bbc3b8 Mar 29, 2021
lissyx added a commit that referenced this issue Mar 29, 2021
Fix #149: Use MPL 2.0 for DeepSpeech
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants