-
Notifications
You must be signed in to change notification settings - Fork 298
Improve the bundle jar license and notice remove using exclude #1991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I doubt this statement is true. The shadow plugin does not change anything wrt to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this refactoring @gh-yzou ! The new build script looks much better to me... I still have a couple of comments :)
--conf spark.sql.catalog.polaris.credential="root:secret" \ | ||
--conf spark.sql.catalog.polaris.scope='PRINCIPAL_ROLE:ALL' \ | ||
--conf spark.sql.catalog.polaris.token-refresh-enabled=true \ | ||
--conf spark.sql.catalog.<catalog-name>.warehouse=<catalog-name> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not use polaris
as the example catalog name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the shell command without filling the actual value, i added an example in the README with specific value filled.
// customized LICENSE and NOTICE file are called CUSTOM-LICENSE and CUSTOM-NOTICE, | ||
// and renamed to LICENSE and NOTICE after include, this is to avoid the file | ||
// being excluded due to the exclude pattern matching used above. | ||
from("${projectDir}/CUSTOM-LICENSE") { rename { "LICENSE" } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If CUSTOM
is only to avoid falling under the exclude rules above, how about naming it BUNDLE-LICENSE
instead? I think that name is clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively (because the simple LICENSE
file name feel more appropriate to me), how about the following?
- Copy
${projectDir}/LICENSE"
tobuild/license/BUNDLE-LICENSE
- Include
build/license/BUNDLE-LICENSE
with rename inside the shadow task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative seems unnecessary complicated to me, I think name with BUNDLE-* is a perfect name, which indicates the license specifically for the bundle jar, which is different than the regular LICENSE and NOTICE we use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbonofre : WDYT about BUNDLE-LICENSE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gh-yzou : I edited by comment about copy at build time. I hope the intent is clearer now. Sorry about the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think i got your original comment, i don't think we need to do those extra copies, which seems unnecessary, and i think BUNDLE-LICENSE and BUNDLE-NOTICE is a good name there, we can see what @jbonofre think about the original file name, just note that in the final jar, the name will still be LICENSE and NOTICE, that is just the file name in the original project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These LICENSE
files are intended only for binary distributions. In the source code, using the name BUNDLE-LICENSE
is perfectly fine, as long as we ensure the correct file name is used in the final published binary artifacts. Let's make sure the naming is adjusted appropriately during the packaging step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimas-b I reverted the extra module change in this PR, I think we can investigate into how to reuse the shadowJar plugin in a followup PR. And I think @jbonofre is ok with the current license name, since it is not a user facing name, I think we can always improve on top of it later. could you please take one more look of this PR?
@snazy it doesn't change the sourceJar, however, it updates the generated .pom file to only dependent on the bundle jar, and it stop publishes the original sourceJar, where we only got the following jars
Therefore, we wasn't able to directly reuse the shadow jar. However, we are able to reuse it now, and with a separate bundle project, it also make things much more clear. |
// add polaris customized LICENSE and NOTICE for the bundle jar at top level. Note that the | ||
// customized LICENSE and NOTICE file are called BUNDLE-LICENSE and BUNDLE-NOTICE, | ||
// and renamed to LICENSE and NOTICE after include, this is to avoid the file | ||
// being excluded due to the exclude pattern matching used above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised a bit that this works based on the documentation for the exclude which I thought was the last thing that is run after the steps like this would execute but I'm glad that it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I need to make sure the original file name doesn't match any of the matching rules to avoid being excluded, the rename actually happens after the excluding. I have verified manually, the LICENSE and NOTICE is the only one I see at top level, and they are the right one
I still don't understand what you're trying to say here. In the PR description you say: There's nothing that stops users of the shadow plugin to publish both the "base" jar (e.g. using a different classifier) and the shaded jar from a single project. |
|
||
dependencies { implementation(project(":polaris-spark-${sparkMajorVersion}_${scalaVersion}")) } | ||
|
||
tasks.named<ShadowJar>("shadowJar") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this PR the bundled artifact maven coordinates are going to be different from what is currently in 1.0.0 RC6 (that is polaris-spark-3.5_2.12-1.0.0-incubating-bundle.jar
).
Since the polaris-spark
artifact name is occupied by the non-bundled Spark Client jar already (without a qualifier), I believe it is not possible to offer maven redirects, so this is going to be a breaking change in terms or Maven coordinates of Polaris artifacts. Regardless of whether we expect users to download this jar via Maven or now, I think it deserves to be mentioned in the CHANGELOG (unless we include this PR into 1.0.0).
Cf. https://lists.apache.org/thread/hrdchvljflcknyq9c7o7by9jhpv204op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i can add the jar name format update in the change log
exclude("META-INF/**/*NOTICE*") | ||
// exclude the top level LICENSE, LICENSE-*.txt and NOTICE | ||
exclude("LICENSE*") | ||
exclude("NOTICE*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do the same kind of filter in the existing spark client module? If not, what prevents up from doing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed for the other spark client module. This is needed by the bundle jar because the bundle jar packs all the dependencies in one jar, and therefore need a customized license. however, for the regular spark client, we do not pack any dependency in the jar, it is pure source code from Polaris, and the regular poalris license it good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, why not use the same build code for the bunded jar in the old Spark module? Why do we need a new module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are couple of reasons:
- with this approach, we will be able to reuse the current shadowJar plugin for maven publish, instead of this hack added https://github.com/apache/polaris/pull/1991/files#diff-d380e458e986c6183b1e23fcea1169811acfc68b2662ff9c88bb26b821238607L136, which i think make things more consistent.
- This seems a more commonly used approach to publish bundle jar or uber jar, such as Iceberg.
- This could make the module responsibility more clear, for example, the bundle-license and bundle-notice is only needed for the bundle project, not the regular project.
@snazy sorry for the confusion, by source jar, i don't refer to the *-source.jar, I am referring to the jar without any classifier, here is what we expected to see once we do publish to maven
where you can see we have a jar with name polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar. However, after reuse the shadowJar plugin, what we see are the following
and when customer uses --packages org.apach.polaris:polaris-spark:1.1.0-incubating-SNAPSHOT, it is actually installing the bundle jar. However, when user is using the --packages options, we expect the polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar is installed, and the other dependency will be downloaded accordingly. The polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT-bundle.jar is only intended for the --jars use case.
and polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT.jar is used as the --jars option, the original polari-spark project continue remain for the --package use case, with the following jars
|
a506adf
to
672cd1f
Compare
@gh-yzou I still do not understand why you need to add a new Gradle project just to have one other jar. It is possible to build and publish both the "raw" and the "shadow" jar from a single project, we do this in Nessie's Spark extensions. BTW: It is concerning that the term "customer" is being used. Polaris is an Apache project, not a commercial project! |
@snazy I don't need a separate project to publish both Jars (polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar and polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar). This is what happening today, however, we are not able to reuse the shadowJar plugin. Regarding to nessie, i assume you are referring this https://github.com/projectnessie/nessie/blob/main/integrations/spark-extensions/build.gradle.kts#L84. However, it seems that this shadowJar packs the dependencies and doesn't have any classifier, which means the shadowJar packed will be the project jar when publish. I did a compile of nessie project, and here is what I see
For which, i believe nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT.jar is the shadowJar. As we have mentioned, the shadowJar is packed for the --jars use case, where we pack everything. For --package use case, the dependencies are suppose to be downloaded while installing the project, and we just need the original project jar (not the shadowJar). Regarding to "he term "customer" is being used", are you referring to the jar name changes, like from project-spark-xxx-bundle.jar to project-spark-bundle-xxx.jar. Since this is for the --jar use case, and the jar name updates along with the version name for every release, i think the impact should be relative low, especially this is still an experimenting project. We are also making sure that we doc this change clearly in the CHANGELOG |
@gh-yzou It is possible to publish both the "non-shadow" and the "shadow" jar from one project, even with the classifiers reversed. It is rather a matter of configuring the Gradle artifacts and (Maven) publications. So I suggest to consider this approach.
No, I'm referring to the usage of the commercial term "customers", which inappropriate in Apache projects. Apache projects have users and are by definition not commercial. |
@snazy sure, we can look into how to improve the shadowJar plugin. What I can do is for his Pr just do the improvement of license, and we can do the other improvement in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I would suggest to add a note about the artifact coordinates change as users may be surprised by this change.
672cd1f
to
1e53608
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @gh-yzou ! The current state of this PR LGTM!
nit: The PR title seems a bit outdated? |
@dimas-b i just rebased about an hour ago, but I can rebase it again. |
@@ -1,3 +1,5 @@ | |||
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, looks like this import should be below the copyright comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, let me move it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimas-b i fixed the order, and also updated the PR title and description
no need to rebase, but the PR title still mentions a new "project", which is no longer the case, right? |
c42dad3
to
1239a9e
Compare
Previously we added addLicenseFilesToJar job to help removing the LICENSE and NOTICE from dependency jars and also add our own Jars.
The addLicenseFilesToJar does this brute forcely, this PR improves the process by using exclude, and we are also able to remove this extra addLicenseFilesToJar job.