Skip to content

[Build/Core] - Add a Gradle task to create a git version properties file for core module#5186

Closed
kbendick wants to merge 16 commits intoapache:masterfrom
kbendick:kb-generate-properties-file-for-core-on-build
Closed

[Build/Core] - Add a Gradle task to create a git version properties file for core module#5186
kbendick wants to merge 16 commits intoapache:masterfrom
kbendick:kb-generate-properties-file-for-core-on-build

Conversation

@kbendick
Copy link
Contributor

@kbendick kbendick commented Jul 3, 2022

This PR adds a gradle task createProperties, which generates a properties file that is intended for access from org.apache.iceberg.util.VersionPropertiesUtil.

The git build SHA (both short and full) and other useful information is accessible from that file.

This information is useful for debugging, and can also be sent as a header or request property with various catalogs for debugging catalog interactions with external resources.

The generated version.properties file is added to the runtime and test resources from within the createProperties task.

If anybody knows a cleaner way to do that so as not to pollute the git sources, please let me know

@kbendick
Copy link
Contributor Author

kbendick commented Jul 3, 2022

version.properties is added to core jar and thus runtime jars:

$ jar -tvf  spark/v3.3/spark-runtime/build/libs/iceberg-spark-runtime-3.3_2.12-0.14.0-SNAPSHOT.jar | grep version.properties
   291 Fri Feb 01 00:00:00 PST 1980 org/apache/iceberg/util/version.properties
   903 Fri Feb 01 00:00:00 PST 1980 org/apache/hc/client5/version.properties
   839 Fri Feb 01 00:00:00 PST 1980 org/apache/hc/core5/version.properties

version.properties is accessible at runtime:

$ spark-shell --jars spark/v3.3/spark-runtime/build/libs/iceberg-spark-runtime-3.3_2.12-0.14.0-SNAPSHOT.jar

scala> import org.apache.iceberg.util.VersionPropertiesUtil;
import org.apache.iceberg.util.VersionPropertiesUtil

scala> VersionPropertiesUtil.gitHashFull()
res0: String = aceda8990a62e0b36a895cb5f5d49c4ee8d82d6f

scala> VersionPropertiesUtil.gitHash()
res1: String = aceda8990a

scala> VersionPropertiesUtil.projectName()
res0: String = iceberg-core

scala> VersionPropertiesUtil.projectVersion()
res1: String = 0.14.0-SNAPSHOT

In CI, projectVersion winds up being the same as gitHash.

@kbendick kbendick changed the title Build / Core - Add a Gradle task to create a build properties file for core module and accessor class [WIP] [Build/Core] - Add a Gradle task to create a build properties file for core module and accessor class Jul 3, 2022
@nastra
Copy link
Contributor

nastra commented Jul 4, 2022

In general the approach makes sense to me. However, when looking at the spark-runtime jar I noticed that there's a git.properties file with a bunch of information being added (looks like this actually comes from the apache-arrow lib). Here's the content of that file:

#Generated by Git-Commit-Id-Plugin
#Sat Jan 29 05:11:45 UTC 2022
git.build.user.email=
git.build.host=Mac-1643432868411.local
git.dirty=true
git.remote.origin.url=https\://github.com/apache/arrow
git.closest.tag.name=apache-arrow-7.0.0
git.commit.id.describe-short=apache-arrow-7.0.0-0-dirty
git.commit.user.email=szucs.krisztian@gmail.com
git.commit.time=29.01.2022 @ 00\:08\:24 UTC
git.commit.message.full=[Release] Update versions for 7.0.0
git.build.version=7.0.0
git.commit.message.short=[Release] Update versions for 7.0.0
git.commit.id.abbrev=e90472e
git.branch=e90472e35b40f58b17d408438bb8de1641bfe6ef
git.build.user.name=
git.closest.tag.commit.count=0
git.commit.id.describe=apache-arrow-7.0.0-0-ge90472e-dirty
git.commit.id=e90472e35b40f58b17d408438bb8de1641bfe6ef
git.tags=apache-arrow-7.0.0
git.build.time=29.01.2022 @ 05\:11\:45 UTC
git.commit.user.name=Krisztián Szűcs

I looked at https://github.com/n0mer/gradle-git-properties and was wondering whether we could actually use that plugin to generate a git.properties file for Iceberg. It seems the plugin lets you control which info goes into that file. Also note that we could include the Iceberg version via a customProperty.

For completeness, here's how Arrow generates that file: https://github.com/apache/arrow/blob/master/java/pom.xml#L249-L290

@kbendick
Copy link
Contributor Author

kbendick commented Jul 5, 2022

In general the approach makes sense to me. However, when looking at the spark-runtime jar I noticed that there's a git.properties file with a bunch of information being added (looks like this actually comes from the apache-arrow lib). Here's the content of that file:

#Generated by Git-Commit-Id-Plugin
#Sat Jan 29 05:11:45 UTC 2022
git.build.user.email=
git.build.host=Mac-1643432868411.local
git.dirty=true
git.remote.origin.url=https\://github.com/apache/arrow
git.closest.tag.name=apache-arrow-7.0.0
git.commit.id.describe-short=apache-arrow-7.0.0-0-dirty
git.commit.user.email=szucs.krisztian@gmail.com
git.commit.time=29.01.2022 @ 00\:08\:24 UTC
git.commit.message.full=[Release] Update versions for 7.0.0
git.build.version=7.0.0
git.commit.message.short=[Release] Update versions for 7.0.0
git.commit.id.abbrev=e90472e
git.branch=e90472e35b40f58b17d408438bb8de1641bfe6ef
git.build.user.name=
git.closest.tag.commit.count=0
git.commit.id.describe=apache-arrow-7.0.0-0-ge90472e-dirty
git.commit.id=e90472e35b40f58b17d408438bb8de1641bfe6ef
git.tags=apache-arrow-7.0.0
git.build.time=29.01.2022 @ 05\:11\:45 UTC
git.commit.user.name=Krisztián Szűcs

I looked at https://github.com/n0mer/gradle-git-properties and was wondering whether we could actually use that plugin to generate a git.properties file for Iceberg. It seems the plugin lets you control which info goes into that file. Also note that we could include the Iceberg version via a customProperty.

For completeness, here's how Arrow generates that file: https://github.com/apache/arrow/blob/master/java/pom.xml#L249-L290

This is great. I'll try with this, as I can't find a clean way to deal with the additional file that's generated from the git tree.

@rdblue
Copy link
Contributor

rdblue commented Jul 5, 2022

@nastra, @kbendick, the Iceberg release process builds the binaries that are pushed to maven central from the source release tarball, where there is no git directory. To use a plugin like this, we will need to make sure we have some process for generating the file and probably putting it in the source tree as part of the process, like how we add the version.txt file.

Also, I couldn't tell from a brief look at the README, can that plugin produce a git.properties file that is not in the root of the Jar? We want to make sure we don't have a name conflict, so I'd prefer to have org/apache/iceberg/git.properties or something similar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a local variable?

Copy link
Contributor Author

@kbendick kbendick Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I made everything static because most things in util are static and because the generated properties file is relative to this path.

But this could be a local variable if we make VersionPropertiesUtil have a public constructor etc. Maybe we should move it out of util? Being in util also means that we have to place the generated properties file in main/src/resources/org/apache/iceberg/util for it to be readable from here which I don't care for.

@kbendick kbendick force-pushed the kb-generate-properties-file-for-core-on-build branch from e3c5bff to e6efb42 Compare July 5, 2022 17:14
@kbendick
Copy link
Contributor Author

kbendick commented Jul 5, 2022

@nastra, @kbendick, the Iceberg release process builds the binaries that are pushed to maven central from the source release tarball, where there is no git directory. To use a plugin like this, we will need to make sure we have some process for generating the file and probably putting it in the source tree as part of the process, like how we add the version.txt file.

Also, I couldn't tell from a brief look at the README, can that plugin produce a git.properties file that is not in the root of the Jar? We want to make sure we don't have a name conflict, so I'd prefer to have org/apache/iceberg/git.properties or something similar.

I'm not sure about the plugin. The way I'm doing it now definitely adds the file to the source tree (though I'd like to only do that for the release process). For now, I'm going to remove the .gitignore entry for this file and then possibly add steps in the release process to ensure version.properties is committed.

@nastra
Copy link
Contributor

nastra commented Jul 6, 2022

I have a minimal example with the below diffset that produces a my-git-file.properties in the root folder of the project (note that this file isn't added to jars):

$ git diff
diff --git a/build.gradle b/build.gradle
index c6395fa9d..91d654f1a 100644
--- a/build.gradle
+++ b/build.gradle
@@ -34,6 +34,7 @@ buildscript {
     classpath 'me.champeau.jmh:jmh-gradle-plugin:0.6.6'
     classpath "com.github.alisiikh:gradle-scalastyle-plugin:3.4.1"
     classpath 'com.palantir.gradle.revapi:gradle-revapi:1.7.0'
+    classpath "com.gorylenko.gradle-git-properties:gradle-git-properties:2.4.1"
   }
 }
 
@@ -42,8 +43,19 @@ plugins {
 }
 
 try {
-  // apply this plugin in a try-catch block so that we can handle cases without .git directory
+  // apply these plugins in a try-catch block so that we can handle cases without .git directory
   apply plugin: 'com.palantir.git-version'
+  apply plugin: "com.gorylenko.gradle-git-properties"
+
+  gitProperties {
+    // this produces a git properties file in the root of the project
+    gitPropertiesName = "my-git-file.properties"
+    gitPropertiesResourceDir = file("$project.rootDir")
+  }
 } catch (Exception e) {
   project.logger.error(e.getMessage())
 }
nastra@nastra-datastax-pc:~/Development/workspace/iceberg$ ./gradlew clean generateGitProperties

BUILD SUCCESSFUL in 3s
25 actionable tasks: 24 executed, 1 from cache
nastra@nastra-datastax-pc:~/Development/workspace/iceberg$ find . -name "my-git-file.properties"
./my-git-file.properties

I think what's unclear to me is whether this file is something that we'd want to add to every single jar (since that would actually make sense imo to have some git info in jars) or whether we just want to generate a single properties file in the root of the project.

In case we want to add the properties file to the root and to every single jar, then this minimal example does the job:

$ git diff
diff --git a/build.gradle b/build.gradle
index c6395fa9d..16e39bfb3 100644
--- a/build.gradle
+++ b/build.gradle
@@ -34,6 +34,7 @@ buildscript {
     classpath 'me.champeau.jmh:jmh-gradle-plugin:0.6.6'
     classpath "com.github.alisiikh:gradle-scalastyle-plugin:3.4.1"
     classpath 'com.palantir.gradle.revapi:gradle-revapi:1.7.0'
+    classpath "com.gorylenko.gradle-git-properties:gradle-git-properties:2.4.1"
   }
 }
 
@@ -42,8 +43,15 @@ plugins {
 }
 
 try {
-  // apply this plugin in a try-catch block so that we can handle cases without .git directory
+  // apply these plugins in a try-catch block so that we can handle cases without .git directory
   apply plugin: 'com.palantir.git-version'
+  apply plugin: "com.gorylenko.gradle-git-properties"
+
+  gitProperties {
+    // this produces a git properties file in the root of the project
+    gitPropertiesName = "my-git-file.properties"
+    gitPropertiesResourceDir = file("$project.rootDir")
+  }
 } catch (Exception e) {
   project.logger.error(e.getMessage())
 }
@@ -72,6 +80,7 @@ allprojects {
 subprojects {
   apply plugin: 'nebula.dependency-recommender'
   apply plugin: 'java-library'
+  apply plugin: "com.gorylenko.gradle-git-properties"
 
   configurations {
     testImplementation.extendsFrom compileOnly
@@ -149,6 +158,11 @@ subprojects {
       exceptionFormat "full"
     }
   }
+
+  gitProperties {
+    // this produces a git properties file in the root of each jar
+    gitPropertiesName = "my-git-file.properties"
+  }
 }
$ find . -name "my-git-file.properties"
./core/build/resources/main/my-git-file.properties
./aliyun/build/resources/main/my-git-file.properties
./my-git-file.properties
./parquet/build/resources/main/my-git-file.properties
./orc/build/resources/main/my-git-file.properties
./data/build/resources/main/my-git-file.properties
./bundled-guava/build/resources/main/my-git-file.properties
./flink/v1.15/flink-runtime/build/resources/main/my-git-file.properties
./flink/v1.15/flink/build/resources/main/my-git-file.properties
./flink/build/resources/main/my-git-file.properties
./spark/v3.3/spark-runtime/build/resources/main/my-git-file.properties
./spark/v3.3/spark/build/resources/main/my-git-file.properties
./spark/v3.3/spark-extensions/build/resources/main/my-git-file.properties
./spark/build/resources/main/my-git-file.properties
./dell/build/resources/main/my-git-file.properties
./hive-runtime/build/resources/main/my-git-file.properties
./aws/build/resources/main/my-git-file.properties
./api/build/resources/main/my-git-file.properties
./hive-metastore/build/resources/main/my-git-file.properties
./mr/build/resources/main/my-git-file.properties
./arrow/build/resources/main/my-git-file.properties
./common/build/resources/main/my-git-file.properties
./nessie/build/resources/main/my-git-file.properties
./gcp/build/resources/main/my-git-file.properties
./pig/build/resources/main/my-git-file.properties

Given that we don't specify which info we want, the full content of the file looks like this:

git.branch=error-prone-warnings-core-module
git.build.host=nastra-datastax-pc
git.build.user.email=etudenhoefner@gmail.com
git.build.user.name=Eduard Tudenhoefner
git.build.version=0.14.0-SNAPSHOT
git.closest.tag.commit.count=661
git.closest.tag.name=release-base-0.13.0
git.commit.id=fa8d8c5880202be71e4986091d1621d291ae01cf
git.commit.id.abbrev=fa8d8c5
git.commit.id.describe=release-base-0.13.0-661-gfa8d8c5-dirty
git.commit.message.full=Core\: Fix ErrorProne Warnings\n
git.commit.message.short=Core\: Fix ErrorProne Warnings
git.commit.time=2022-07-05T10\:44\:45+0200
git.commit.user.email=etudenhoefner@gmail.com
git.commit.user.name=Eduard Tudenhoefner
git.dirty=true
git.remote.origin.url=https\://github.com/apache/iceberg.git
git.tags=
git.total.commit.count=2919

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with double-check locking, I believe this should be marked volatile to prevent cache incoherence.

@kbendick kbendick force-pushed the kb-generate-properties-file-for-core-on-build branch from c6c2566 to 8717211 Compare July 6, 2022 18:10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make these Optional<String> for non-release builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current plan to add the properties file to the release tarball, we can include these all the time.

@kbendick kbendick force-pushed the kb-generate-properties-file-for-core-on-build branch from c7603c0 to a83dad4 Compare July 7, 2022 20:14
kbendick added 11 commits July 7, 2022 13:14
… output file

* Add a version.properties output file for core module to access build git sha and other property attributes at runtime
* Add a utility class, VersionPropertiesUtil, to access the properties file at runtime
* Add tests to ensure that VersionPropertiesUtil can read the properties file
@rdblue
Copy link
Contributor

rdblue commented Jul 7, 2022

@kbendick, can you either open a separate PR or update this one to only include the gradle task that drops a file into core/build/iceberg-build.properties? I'd like to get that reviewed quickly, without lots of unrelated things.

@kbendick
Copy link
Contributor Author

kbendick commented Jul 7, 2022

@kbendick, can you either open a separate PR or update this one to only include the gradle task that drops a file into core/build/iceberg-build.properties? I'd like to get that reviewed quickly, without lots of unrelated things.

I saved everything that was in this branch elsewhere, but I can open a new PR if we’d like.

@kbendick kbendick changed the title [WIP] [Build/Core] - Add a Gradle task to create a build properties file for core module and accessor class [Build/Core] - Add a Gradle task to create a git version properties file for core module Jul 7, 2022
@github-actions github-actions bot removed the INFRA label Jul 7, 2022
@github-actions github-actions bot added the INFRA label Jul 7, 2022
Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kbendick
Copy link
Contributor Author

Closed in favor of #5228.

Might come back to this task to add into our manifests at a later date.

@kbendick kbendick closed this Jul 12, 2022
@kbendick kbendick deleted the kb-generate-properties-file-for-core-on-build branch July 12, 2022 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments