BIGTOP-4044: Enhance Bigtop with Concurrent Compilation Support for A… #1212
What machine did you use?
@JiaLiangC I got the following error on Ubuntu 22.04 x86_64.
How about conditionally adding a Maven argument, as done for some products in do-component-build, instead of using
@iwasakims Additionally, I will provide a comparison of the time consumption before and after enabling concurrent compilation for all components.
@iwasakims The user compiles with a parallel compilation parameter, like compileThreads.
I have another idea:
@iwasakims Hello, based on our previous discussion about the method of introducing the parallel compilation patch, can we decide on a suitable plan? Then, I will proceed with subsequent testing based on this plan.
It would be nice if this could cover both rpm and deb. I prefer directly generating the maven.config patch (in package.gradle from a string template) over modifying a file with a regex.
I think just using
@iwasakims Currently, all Java components support concurrent compilation, so we only need to add maven_parallel_compile = False for the few components that do not support Java compilation. Alternatively, we could add maven_parallel_compile = True for each Java component individually. Which approach do you think is more appropriate?
@JiaLiangC How about
@iwasakims If you mean just simply changing the label name from maven_parallel_compile to maven_parallel_build, in the style of https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3, then of course that's better.
…dditional Components
d5c1420 to 2a7bab7
@iwasakims |
LGTM. I tested this on Ubuntu 22.04 by building some projects with -PbuildThreads=2C.
@@ -394,6 +416,8 @@ def genTasks = { target ->
delete ("$PKG_BUILD_DIR/deb")
def final DEB_BLD_DIR = "$PKG_BUILD_DIR/deb/$NAME-${PKG_VERSION}"
def final DEB_PKG_DIR = "$PKG_BUILD_DIR/deb/$PKG_NAME-${PKG_VERSION}-${BIGTOP_BUILD_STAMP}"
def final ENABLE_MAVEN_PARALLEL_BUILD = config.bigtop.components[target].maven_parallel_build
def final MAVEN_BUILD_THREADS = project.hasProperty('buildThreads') ? project.property('buildThreads') : null
Documentation and comments for the feature could be added later. We do not allow a number without C, while Maven does?
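For reference, Maven's -T option takes either a plain thread count (for example 4) or a per-core multiplier ending in C (for example 2C or 1.5C). A minimal sketch of a check that accepts both forms could look like the following. The function name is hypothetical and this is not the validation used in the patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: succeeds when the value looks like
# something Maven's -T option accepts, i.e. a plain integer ("4") or a
# per-core multiplier with a C suffix ("2C", "1.5C").
is_valid_maven_threads() {
  local value="$1"
  [[ "$value" =~ ^[0-9]+$ || "$value" =~ ^[0-9]+(\.[0-9]+)?C$ ]]
}

# Try a few candidate values and report how each one is classified.
for candidate in 4 2C 1.5C C two; do
  if is_valid_maven_threads "$candidate"; then
    echo "$candidate: accepted"
  else
    echo "$candidate: rejected"
  fi
done
```

Accepting the plain-integer form would keep -PbuildThreads aligned with what Maven itself allows, at the cost of one more pattern to document.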
Ideally, we should examine the warnings when using parallel builds to create a release.
@iwasakims |
Description of PR
Background:
Within the components maintained by Bigtop, a significant portion is built using Java and relies on Maven as the build tool.
Rationale:
Compiling components that consist of numerous modules can be a time-consuming process. For instance, some components contain hundreds of modules, and compiling them one by one consumes a substantial amount of time. Even when all dependencies are pre-downloaded for a second compilation, the process remains slow due to the sequential nature of compilation. Additionally, compiling all components together still results in sequential compilation, making it challenging to fully leverage CPU resources and reduce compilation time significantly. Consequently, repetitive compilation and testing phases impose prolonged waiting periods.
Proposal:
I propose the introduction of a new parameter that allows users to toggle parallel compilation for components built using Maven, thus empowering them to align compilation practices with their specific needs.
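As a rough illustration of this proposal, a per-component toggle combined with a user-supplied thread count could be turned into a Maven -T argument along the following lines. The function name and its two arguments are hypothetical and are not taken from the patch; only Maven's -T option itself is real.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prints "-T <threads>" only when the component has opted
# in to parallel builds AND the user supplied a thread count; otherwise prints
# nothing, so the Maven command line stays sequential.
maven_parallel_args() {
  local enabled="$1"   # "true" when the component opts in (assumed convention)
  local threads="$2"   # e.g. "2C"; empty when the user passed nothing
  if [[ "$enabled" == "true" && -n "$threads" ]]; then
    echo "-T $threads"
  fi
}

# Show the Maven command line that would result in each case.
echo "mvn $(maven_parallel_args true 2C) clean install -DskipTests"
echo "mvn $(maven_parallel_args false 2C) clean install -DskipTests"
```

Keeping the default sequential means a component that has not been tested with parallel builds is never affected unless it is explicitly switched on.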
Related Pull Requests (PRs):
This discussion can be divided into two main parts:
The first part adds the parallel compilation functionality and enables it for components that were tested without hitting any additional issues related to parallel compilation. These components are Hive, HBase, Flink, ZooKeeper, Alluxio, Phoenix, Livy, and Zeppelin.
The second part enables parallel compilation for components that have problems with it and need additional patches to work with Maven's parallel builds. These components are Ranger, Tez, Hadoop, and Spark.
Compilation Environment: CentOS 7 x86_64, 16C, SSD
The following table shows the time comparison for repeated compilations, where dependencies are already downloaded, before and after using parallel compilation.
As can be seen, there is an overall performance improvement of about 2-3 times. If it's the first compilation, given the massive dependencies that need to be downloaded, the advantage of parallel compilation becomes even more apparent. For example, the first compilation of Hadoop 3 was reduced from nearly 3 hours to about 1 hour.
How was this patch tested?
Manual tests:
Apache HBase compiled in parallel on Ubuntu 22 x86_64
Hive on CentOS 7 x86_64
Alluxio on CentOS 7 x86_64: parallel compilation takes about one-third of the non-parallel compilation time, only 7 minutes.
Phoenix on CentOS 7 x86_64
Livy on CentOS 7 x86_64
Zeppelin on CentOS 7 x86_64
For code changes: