License change from GPL to MPL #2456

merged 5 commits into from
Nov 6, 2017


@mattdowle mattdowle commented Nov 1, 2017

This pull request is a verbatim copy of an email I sent to all 24 contributors of code to the project. All 24 have now approved with either a review or a thumbs-up. Thanks everyone!

Dear contributor to data.table,

Since its creation in 2006, data.table has been GPL. You are receiving this email because you have contributed either R or C code to the data.table project (even one line) and are considered a license holder. Documentation-only contributors have not been included. Arun and I (because we are the biggest contributors) have tentatively agreed, subject to all your approvals and further comments, that it was never our intention to prevent closed-source products from using data.table, and to change to the Mozilla Public License (MPL). We think of data.table as a library.

I believed that closed-source products can already call interpreted R code, such as :

DT = fread(...)

It was never our intention to prevent usage like that by closed-source products. I believed that R code was "data" and not considered linking. The reason I believed that was this GNU FAQ :

The first paragraph is very clear :

When the interpreter just interprets a language, the answer is no. The interpreted program, to the interpreter, is just data; a free software license like the GPL, based on copyright law, cannot limit what data you use the interpreter on. You can run it on any data (interpreted program), any way you like, and there are no requirements about licensing that data to anyone.

So, the GPL does not apply to interpreted code. However, it has been brought to my attention that the same answer continues with 3 further paragraphs :

However, when the interpreter is extended to provide “bindings” to other facilities (often, but not necessarily, libraries), the interpreted program is effectively linked to the facilities it uses through these bindings. So if these facilities are released under the GPL, the interpreted program that uses them must be released in a GPL-compatible way. The JNI or Java Native Interface is an example of such a binding mechanism; libraries that are accessed in this way are linked dynamically with the Java programs that call them. These libraries are also linked with the interpreter. If the interpreter is linked statically with these libraries, or if it is designed to link dynamically with these specific libraries, then it too needs to be released in a GPL-compatible way.

Another similar and very common case is to provide libraries with the interpreter which are themselves interpreted. For instance, Perl comes with many Perl modules, and a Java implementation comes with many Java classes. These libraries and the programs that call them are always dynamically linked together.

A consequence is that if you choose to use GPL'd Perl modules or Java classes in your program, you must release the program in a GPL-compatible way, regardless of the license used in the Perl or Java interpreter that the combined Perl or Java program will run on.

So, it is my new understanding that data.table's GPL license could be interpreted as preventing closed-source products from using data.table, even at R-only level. However strange that sounds, I'm willing to agree to that definition, for the avoidance of doubt.

What we intended was that we contributed free software in the open under one condition: that if you improve data.table itself and distribute your improved data.table as a product, then that product must be open-source too. We were concerned about the code inside data.table itself, not use of data.table. This is why, for data.table, we never agreed with MIT or Apache, and still don't. We do not like those licenses for data.table because we think them unfair to the contributors. We are concerned about, say, data.table.PRO, a closed-source improvement of data.table being created based on our free contributions. We are not concerned about using data.table as a library by closed-source software. In fact, we are now concerned about that being perceived as prevented.

Whether or not we agree with the way this GNU FAQ is being interpreted, we are happy in principle to agree to that interpretation. This is not a change in policy, but a change in the license to match what we intended anyway, for the avoidance of doubt.

The natural first thought was LGPL. But that has some restrictions that are opposed :
The MPL (Mozilla Public License) does not have those restrictions. The MPL is even lesser than the LGPL. It is the lowest we can go without throwing the code to the wind and going with a lax license like MIT or Apache which would allow closed-source data.table.PRO to be created.

I didn't want to put anyone under public pressure. So this is an email first. If nobody disagrees then I will create a pull request making the license change. All project members will be reviewers and all must approve. I can't add non project members as reviewers because GitHub doesn't allow that, so non project members will need to please add their vote to 'thumbs-up'. Any single one of you can veto the change and that will be respected. Even if all project members agree, a single contributor who isn't a project member can still veto the change and that will be respected too. The pull request will basically be this email copied in. Further discussion could take place inside the PR before you approve; you don't have to approve now by email unless you want to. This is an opportunity to apply your veto privately via email to me, before the public PR is created.

To put some examples to it, if we change to MPL, here is what will be ok and what will not be ok.

What is ok under MPL

  • closed-source products can use data.table via any mechanism of their choosing. That includes R-only usage of data.table. Linking to and calling its C API in a closed-source product will now be ok under MPL but was not ok under GPL (if anyone wants to do that!).

  • improving data.table's implementation and distributing that improvement privately within your company, for profit or not, even across your international offices is ok by MPL and was ok by GPL too. It's only when the package is distributed outside a company (think data.table.PRO) that any improvements to data.table have to be released open-source. Those improvements don't have to be contributed back to data.table but they do have to be open-source in public and licensed as MPL to ensure the code stays open-source. Nothing compels private changes to GPL, LGPL or MPL code to be released. Distributing within a company is not distributing. However, the MPL's wording is simpler and more explicit than the GPL in this regard.

  • Re-implementing data.table in a "clean-room" independently from scratch: there's nothing we can do to prevent that. For example, TERR is a closed-source re-implementation of R by TIBCO which is distributed to its customers. We'd all prefer if TERR was open-source so we could benefit from it but there's nothing we can do because they did it independently with new code without looking at R's source code. That could, presumably, be done for data.table too. If we changed to MIT or Apache, then the task of improving data.table and making closed-source data.table.PRO would be much easier, which we don't think is fair to data.table's contributors.

What is not ok under MPL

  • Matt Dowle creating closed-source data.table.PRO. I am not the license holder. All the contributors (i.e. you) are jointly the license holders. As soon as you contributed to data.table you are a license holder and you can then veto any future license changes. I have prevented myself from making closed-source data.table.PRO via the choice of GPL. Moving to MPL will not change this, I will still be prevented. If I am prevented, I don't see why we should change to Apache or MIT to let someone else create data.table.PRO.

  • anyone creating data.table.PRO by starting from the code inside data.table, making it better and releasing that as a closed-source product. Or, the other way too: releasing the changes under an Apache, MIT, or similarly lax license. Because the lax license opens the door to closed-source data.table.PRO from there. Our intention is to prevent our free contributions from helping a closed-source improvement of data.table be created. We don't want to be disrespected in that way or taken advantage of. We don't mind if a closed-source product uses data.table.

Anticipated questions

Q: What does Matt and Arun's 'author' status mean?
A: We get mentioned in citation("data.table") because we're, currently, the biggest contributors. It's unrelated to ownership or licensing.

Q: What does the 'contributor' status mean in DESCRIPTION?
A: Again, it's not to do with licensing or ownership. The contributors names are listed on the CRAN page. It's kudos.

Q: Why is there no license holder in DESCRIPTION?
A: Because, currently, we like that there isn't. There still won't be under MPL. All the contributors own the project jointly and the license can't be changed without permission from all of them. If we changed to MIT or Apache we would then have to name a license holder (that's a requirement of those licenses). Who would the license holder be in our case? From that point, your contributions would be given to the license holder. They could change the license later (say, to closed-source, or a PRO version with enhancements). We could pick Apache and assign the Apache Foundation as license holder. But that is a long process to incubate. For now, we prefer the simpler MPL and leave the door open to consider Apache in the future.

Q: If we decide MPL was a mistake, can we change back to GPL?
A: Yes. We would all have to agree again. The MPL is the lowest and simplest we can go while retaining this "it-belongs-to-all-of-us" feature. However, once v1.11.0 was released as MPL we could not take that back. Any subsequent change back to GPL would apply from v1.11.2 onwards.

Q: Could we change to a lax license, like MIT, Apache or BSD?
A: Yes. We would all have to agree again. To change back from those licenses, though, your permission would no longer be needed. It would be the license holder who could decide that by themselves. This is one reason I have ruled out those licenses for data.table, for now, subject to your comments. The other reason being that they permit closed-source data.table.PRO.

Q: Why is it Matt that's writing?
A: Just because I'm the biggest contributor and current maintainer of the package on CRAN. I'm merely acting as an administrator/maintainer.

Q: What's driving the change?
A: H2O (my employer) has created a closed-source product called Driverless-AI. It uses a Python package pydatatable which is a port of data.table to Python. With data.table being GPL, pydatatable needs to be GPL, and therefore the concern is that Driverless-AI can't be closed-source because it would call a Python GPL package, even just interpreted Python. Changing R's data.table to MPL allows pydatatable to be MPL which would then allow closed-source Driverless-AI to use it. I see it as a good compromise between these two very differently licensed communities.

Q: Has Matt been asked about license changes before?
A: Yes. One person in the Julia community asked me last year if they could use fread.c in Julia. At the time I declined because Julia is MIT and allows, for example, JuliaPRO to be created. I didn't want to contribute for free to closed-source JuliaPRO. If data.table changed to MPL, it would be easier for Julia to use fread.c. They could, if they wished, take fread.c and include it in Julia with the MPL license. Any improvements to fread.c could not be made in JuliaPRO, however, without open-sourcing those improvements. Which I think is fair to fread's contributors. fread.c has already been separated/agnostified from the R API (freadR.c) so it should be easier to hook up into Julia. It is already hooked up into pydatatable.

Q: Has Matt ever been asked to change data.table to Apache?
A: Yes, recently. I discussed with Arun and we declined. I continued negotiations, discovered MPL, agreed with Arun and am now putting this forward to you all.

Q: Can any contributor make a pull request to change the license?
A: Yes. As long as you gain approval from all contributors, the change would be made.

Q: You said MPL is the "lowest we can go" and is lesser than the LGPL. Is there an independent table or something to look at?
A: See Concentrating on the first 3 columns of colored boxes (headings: Linking, Distribution, Modification), observe:
Apache & MIT : all 3 boxes green -- we think that is too lax
LGPL : all 3 boxes blue -- the "With restrictions" blue status in the first column for Linking is opposed (clauses 4d0 and 4d1 of LGPL-3)
MPL : first box green, other two blue.
No other license on the list has green, blue, blue. So MPL uniquely matches our intentions.

Q: Is there any debate about which version of MPL to choose?
A: No. v2 of MPL is clear, unlike GPL and LGPL, which have their quirks about versions and combinations of versions.

Q: What about the license of data.table's dependencies?
A: data.table does not have any dependencies other than R itself so this isn't an issue. If data.table depended on any GPL packages then we would not be free to choose MPL.

Q: Is Mozilla Public License (MPL) recognized by CRAN as an acceptable license for a package?
A: Yes, it is listed with the acronym MPL-2.

Q: Why will the license field in DESCRIPTION contain "MPL-2 | LICENSE"?
A: Google lawyers do not accept CRAN's acronyms or links. They require the actual license file to be present. The LICENSE file is a verbatim copy of the MPL-2. If we put just the LICENSE file then people might think it was a special unique license. So we've ended up with both the acronym and the file.
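
For illustration only, a DESCRIPTION fragment carrying both the acronym and the file might look like the sketch below. It is not copied from the project's actual file. Two details are assumptions to check against Writing R Extensions and R's license list: the `file LICENSE` form, which is how DESCRIPTION syntax points at a license file shipped with the package, and the spelling `MPL-2.0` for the acronym written as MPL-2 above.

```
Package: data.table
License: MPL-2.0 | file LICENSE
```

The `|` separates alternatives, so a reader sees the recognized acronym and can also open the verbatim LICENSE file in the package root.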

Q: What if Matt dies?
A: CRAN maintainers would ask the next largest contributor if they would like to be maintainer: currently, Arun.

Q: What if a contributor cannot be contacted?
A: That is one reason why this is an email first. To see if it is even possible to achieve 100% approval.

Please reply to indicate if you're ok in principle for the pull request to be created and your permission to be asked for in public there. I need a reply from everyone please to know whether full agreement is possible.


@mattdowle mattdowle added this to the v1.10.6 milestone Nov 1, 2017
@lianos lianos left a comment

I approve the license change.

@restonslacker restonslacker left a comment

I approve

codecov-io commented Nov 1, 2017

Codecov Report

Merging #2456 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2456   +/-   ##
  Coverage   91.58%   91.58%           
  Files          62       62           
  Lines       12028    12028           
  Hits        11016    11016           
  Misses       1012     1012

st-pasha commented Nov 1, 2017

Thanks Matt, this is really great! I understand that Alan was studying this move in great detail, so I wonder if he can post answers to a few more questions for the FAQ:

  • What about data.table's dependency on R? As you stated above, R is the only dependency of data.table. However, R itself is licensed under GPL-2 | GPL-3, with some of the header files being LGPL-2.1. The data.table library links against R's executable at runtime. Since R's executable is licensed under GPL-3, does that impose any restrictions on data.table? Does the fact that the headers are under LGPL help?
  • Could there be any problems with Rversion.h? Most, but not all, of R's header files are licensed under LGPL. Among those that are not, "Rversion.h" is the only one that data.table uses (as far as I can see). On one hand that file is auto-generated. On the other hand, it doesn't carry any explicit license note, which means it inherits the default R project's license: GPL-3. Does that create any repercussions for data.table?

mattdowle commented Nov 1, 2017

@st-pasha Yes the fact that R's headers are LGPL makes the difference. That's why it's possible for there to be a very wide range of licenses (including MPL, Apache and MIT) available for packages acceptable to CRAN listed in which is linked from CRAN policies and R-exts.
The COPYRIGHTS file details the reason for the change to LGPL made in 2001. I won't extract parts of that file here: the whole file should be read. So even closed-source packages for R are possible. Such closed-source packages just can't be on CRAN due to CRAN's open-source policy but the LGPL license allows even closed-source R packages off CRAN and some do exist.

In terms of Rversion.h I doubt anyone considers that a version number needs to be licensed. There isn't any code to be protected in a version number.

Yes, Alan (our in-house H2O lawyer) reviewed my email to contributors before it was sent out. If you need him to reply here about these supplemental questions, please ask him.

@MichaelChirico MichaelChirico left a comment

Approved. Hoping this means a wider audience for data.table and more :)

Just curious, will pydatatable be open-sourced?

st-pasha commented Nov 6, 2017

@dselivanov Yes

mattdowle added a commit that referenced this pull request Nov 6, 2017
mattdowle commented Nov 6, 2017

Everyone has approved. Thank you!

As the thumbs-up pop-up only shows the first 10 ids "plus 5 more", I need to post a final list in one place. It seems non project members can add an approving review even if they haven't been requested, so that's better than thumbs-up if there's ever a next time. I've heard from everyone via email too.

NB: the sum(commits) here is 2,871 and excludes merge commits, as stated at the top of the graph in Insights->Contributors. The 3,175 commits displayed at the top left of the code tab is 304 larger because that includes merge commits.


@mattdowle mattdowle merged commit 6fa0a96 into master Nov 6, 2017
@mattdowle mattdowle deleted the license branch November 6, 2017 23:14

You may add additional accurate notices of copyright ownership.

Exhibit B - "Incompatible With Secondary Licenses" Notice
@alamit alamit Feb 15, 2018

Since the LICENSE file contains Exhibit B and I couldn't find any other copyright notice, it is very unclear whether data.table is allowed to be combined with "Secondary Licenses" or not, because the License states that in the absence of a copyright notice, the LICENSE file should be used as the copyright notice.

As the relicensing was conducted to make data.table compatible with closed-source software, I believe including this Exhibit in the LICENSE file is not recommended, as it makes data.table incompatible with software under licenses other than MPL 2.0.

mattdowle commented Feb 16, 2018

Exhibit B is marked as an exhibit. It is not a notice but an exhibit of a notice. Therefore, that notice does not apply either to the LICENSE file itself or to the project, for the reason that it is not a notice; it is an exhibit of a notice.

If you still believe it is recommended to remove Exhibit B from the LICENSE file, please point me to a third party recommendation. I believe the LICENSE file is best left unchanged: it is best to be a verbatim copy of MPL-2.0 including its exhibits so there can be no doubt it is verbatim MPL-2.0; e.g. as confirmed by the file size in bytes and a hash.

I hope this answers every aspect of your comment and that you are able to proceed satisfactorily. If there is anything else I can answer in more detail or anything that I can change, please let me know.

A little more on Exhibit A (not exhibit B). This text appears just after Exhibit A at the bottom of the LICENSE file :

If it is not possible or desirable to put the notice [Exhibit A] in a particular
file, then You may include the notice in a location (such as a LICENSE
file in a relevant directory) where a recipient would be likely to look
for such a notice.

So, yes, I see no need to place the license, or Exhibit A at the top of each and every source file. I know many projects do, but I saw no need. I'm relying on this explicit paragraph in the license (GPL has a similar one) and I've called this file LICENSE in the root directory with that in mind, as is common in many other projects too. This saves having to check we've remembered to place Exhibit A at the top of each source file in the correct way. Also, when we changed from GPL to MPL we didn't need to go and touch every single source file. I've seen Exhibit A or similar in every source file in projects which have a mixture of licenses. So having one LICENSE file in the root of data.table conveys clearly and easily that there is no mixed licensing in data.table: it's all MPL 2.0.

That is my current thinking anyway. Happy to hear further comments and suggestions.

From :

Q25: What happens if someone doesn't use the per-file boilerplate, and just ships a copy of the full MPL 2 with their code?
The code is licensed under the plain MPL 2. It is not considered Incompatible with Secondary Licenses. Making code Incompatible with Secondary Licenses requires an active choice on the part of the licensor; it is not the default. The notice in Exhibit B is not considered "attached" merely by being present as the Exhibit B of a copy of the full MPL 2.

related to #4140 - approving

@chenghlee chenghlee left a comment

I approve this licensing change.
