Precise licence #149
Good to know Smile is interested; I used to collaborate with them to some extent when I was working at Mandriva.
For all the code I wrote, I have not really decided, but since DeepSpeech is MPL, it could make sense. Most of the datasets are either CC0 or similar, but I don't remember every license exactly. However, legal advised us that it was safe to release the model and checkpoints under a license of our choosing. I can have a look at that when I'm back from PTO, on the 11th.
Thanks for the reply 👍 It's not urgent: DeepSpeech is not fully stable anyway and (excuse me) the model is not precise enough yet 😃, but it's very interesting for a proof of concept and worth keeping in mind for later use. I'm personally impressed by the work done. Even if the model ends up under an "NC" license, I'm sure I will still use it for my personal uses and projects in a while.
So, as far as I can remember from the legal talks: unless a dataset explicitly limits the licensing of released models/checkpoints, we are free to use whatever license we want. Currently, we use:
So I think we should be safe to release that under any license?
I think that the most restrictive license on the list is CC-BY-SA, but if I'm not mistaken, that should not interfere with your work because you don't distribute the dataset. Voices are used as libraries. So you can use any license you want. The only advice I can give is not to use too restrictive a license. I can understand that you make this free and that it could be frustrating that enterprises can use it to sell products. A license that lets the model be freely included in a product, with the only "restriction" being a notice such as "we are using commonvoice-fr from this address with XXX license", is, IMHO, the best solution.
Of course, you are the author and you can choose what you prefer :)
Don't worry, I have quite a good understanding of the consequences.
Indeed, although after discussing with colleagues, we remember receiving some legal warning on that, so it was not so obvious. Given that the lingualibre dataset is rather small, and its export still seems to be broken (I contacted the dev several times but have had no reply yet; I'm unsure whether he is still actively working on it), I guess it could be better to just re-train without this dataset so we don't take any legal risk?
Hi,
Thanks for that very nice work.
It would be helpful to specify the license for this repository. To be honest, I am considering using DeepSpeech in a professional project. But it's very important that the model license is specified.
Our projects are open-source but can be used commercially, so if you use something like GPL, LGPL, MIT or BSD, that's OK for us, and of course we must include a note pointing to your source code.
If you choose (as VOSK did) a CC license with the "NC" (non-commercial) option, we will not be able to use your models. That would be sad, but we respect the author's choices.
The question is, if you're using CC content to train your model, should you use the same license?
To clarify my point of view: I, and the company "Smile" where I work, are open-source advocates. We release our work under open licenses and we contribute a lot to open-source solutions.
I think that something like the MIT, BSD or LGPL license is the best choice for your work, but it's important to be sure that your datasets don't impose a license on the models generated from them.