Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BIGTOP-4044: Enhance Bigtop with Concurrent Compilation Support for A… #1212

Merged
merged 1 commit into from
Dec 28, 2023

Conversation

JiaLiangC
Copy link
Contributor

@JiaLiangC JiaLiangC commented Dec 15, 2023

…dditional Components

Description of PR

Background:
Within the components maintained by Bigtop, a significant portion is built using Java and relies on Maven as the build tool.

Rationale:
Compiling components that consist of numerous modules can be a time-consuming process. For instance, some components contain hundreds of modules, and compiling them one by one consumes a substantial amount of time. Even when all dependencies are pre-downloaded for a second compilation, the process remains slow due to the sequential nature of compilation. Additionally, compiling all components together still results in sequential compilation, making it challenging to fully leverage CPU resources and reduce compilation time significantly. Consequently, repetitive compilation and testing phases impose prolonged waiting periods.

Proposal:
I propose the introduction of a new parameter that allows users to toggle parallel compilation for components built using Maven, thus empowering them to align compilation practices with their specific needs.

Related Pull Requests (PRs):
This discussion can be divided into two main parts:

The first part entails adding parallel compilation functionality and enabling it for components that have undergone testing without encountering additional issues related to parallel compilation. These components include Hive, HBase, Flink, ZooKeeper, Alluxio, Phoenix, Livy, Zeppelin

The second part involves enabling parallel compilation for components that face challenges with parallel compilation and necessitate additional patches to address Maven's parallel compilation capabilities. These components include Ranger, Tez, Hadoop, Spark

Compilation Environment: CentOS 7 x86_64, 16C, SSD

**The following table shows the time comparison for repeated compilations, where dependencies are already downloaded, before and after using parallel compilation.

As can be seen, there is an overall performance improvement of about 2-3 times. If it's the first compilation, given the massive dependencies that need to be downloaded, the advantage of parallel compilation becomes even more apparent. For example, the first compilation of Hadoop 3 was reduced from nearly 3 hours to about 1 hour.**

Component Time Before Time After
Alluxio 21min 07:43min
Hive 05:33min 03:04min
HBase 06:18min 02:55min
Zookeeper 01:25min 35s
Livy 03:29min 03:12min
Phoenix 05:32min 11:23min
Zeppelin 14:15min 13:19min
Flink 14:16min 36:27min

How was this patch tested?

manual test
test compile apache hbase in parallel on ubuntu22 x8664
image

test hive on centos7 x86_64
image

On CentOS 7 x86_64, the parallel compilation speed of Alluxio is one-third of the non-parallel compilation speed, taking only 7 minutes.
image

phoenix centos7 x86_64
image

livy centos7 x86_64
image

zeppelin on centos7 x86_64

image

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
  • Make sure that newly added files do not have any licensing issues. When in doubt refer to https://www.apache.org/licenses/

@MacChen01
Copy link
Contributor

What machine you uesd?

@JiaLiangC
Copy link
Contributor Author

What machine you uesd?
@MacChen01
centos7 16C 16G SSD and maven dependency already downloaded.

@iwasakims
Copy link
Member

@JiaLiangC I got following error on Ubuntu 22.04 x86_64.

$ ./gradlew zookeeper-clean zookeeper-pkg                                                                                                                                                                                                                                                                   
...                                                                                                                                                                                                                                                                                                         
+ mvn clean install -DskipTests -Pfull-build                                                                                                                                                                                                                                                                
[ERROR] Error executing Maven.                                                                                                                                                                                                                                                                              
[ERROR] Invalid threads value: '@CORES@'. Supported are int values.                                                                                                                                                                                                                                         
make[1]: *** [debian/rules:32: override_dh_auto_build] Error 1                                                                                                                                                                                                                                              
make[1]: Leaving directory '/home/iwasakims/srcs/bigtop/output/zookeeper/zookeeper-3.7.2'                                                                                                                                                                                                                   
make: *** [debian/rules:29: build] Error 2                                                                                                                                                                                                                                                                  
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2                                                                                                                                                                                                                              

@iwasakims
Copy link
Member

How about conditionally adding Maven argument as done for some products in do-component-build instead of using .mvn/maven.config? We can access vars in bigtop.bom via environment variables.

@JiaLiangC
Copy link
Contributor Author

@iwasakims
Good idea! We just need to pass in parameters externally, and then, if the component supports concurrent compilation, we can simply handle the passed-in parameters by adding the corresponding Maven options during the do-component-build process. For components that do not support this, there's no need to process the parameter! I'll make some modifications and then test it again.

Additionally, I will provide a comparison of the time consumption before and after enabling concurrent compilation for all components.

@JiaLiangC
Copy link
Contributor Author

How about conditionally adding Maven argument as done for some products in do-component-build instead of using .mvn/maven.config? We can access vars in bigtop.bom via environment variables.

@iwasakims
I've reconsidered your approach and there's something I don't quite understand. What you're suggesting is:

The user compiles with a parallel compilation parameter, like compileThreads.
This parameter is then defined as an environmental variable.
Next, in the spec file or control file, this variable is passed to the do-component script for processing.
What's the benefit of this approach? It seems like it would lead to extensive, invasive code changes, requiring modifications to the do-component scripts and either the spec or control scripts of all involved components.

@JiaLiangC
Copy link
Contributor Author

JiaLiangC commented Dec 20, 2023

I have another idea:
We could save the mvn patch file in the template directory. For components that support concurrent compilation, we can simply copy it over. To determine which components support concurrent compilation, we might need to add an attribute in the bigtop.bom file to mark those components that support this feature.
The advantage of this method is that it involves very minimal code changes.

@JiaLiangC
Copy link
Contributor Author

@iwasakims Hello, based on our previous discussion about the method of introducing the parallel compilation patch, can we decide on a suitable plan? Then, I will proceed with subsequent testing based on this plan.

@iwasakims
Copy link
Member

@JiaLiangC

We could save the mvn patch file in the template directory. For components that support concurrent compilation, we can simply copy it over.

It would be nice if this can cover both rpm and deb. I prefer directly generating the maven.config patch (in package.gradle from string template) to modifiying a file with regex.

To determine which components support concurrent compilation, we might need to add an attribute in the bigtop.bom file to mark those components that support this feature.

I think just using project.hasProperty('compileThreads') as done in current patch would be enough with comments (and commented out defaults) in bigtop.bom.

@JiaLiangC
Copy link
Contributor Author

@iwasakims
Hello, I would like to ask about your reply mentioning "comments (and commented out defaults) in bigtop.bom". Does this refer to the 'label' under components, similar to how packaging = 'rpm' is used?

Currently, all Java components support concurrent compilation, so we only need to add maven_parallel_compile = False for the few components that do not support Java compilation. Alternatively, we could add maven_parallel_compile = True for each Java component individually. Which approach do you think is more appropriate?

@iwasakims
Copy link
Member

@JiaLiangC How about maven_parallel_build = True prefering opt-in style and following the name of the feature? Parallel processing seems not to be limited to compilation based on your experience.

@JiaLiangC
Copy link
Contributor Author

.
@iwasakims
Sorry, my English is not very good, so I don't quite understand what you mean. Could you please explain it in more detail?

@JiaLiangC
Copy link
Contributor Author

@iwasakims If you mean just simply changing the label name from maven_parallel_compile to maven_parallel_build, in the style of https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3, then of course that's better.

@JiaLiangC
Copy link
Contributor Author

@iwasakims
Can you help review this PR? It has already been tested on CentOS 7 and Ubuntu 22.

image

Copy link
Member

@iwasakims iwasakims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I tested this on Ubuntu 22.04 by building some projects with -PbuildThreads=2C.

@@ -394,6 +416,8 @@ def genTasks = { target ->
delete ("$PKG_BUILD_DIR/deb")
def final DEB_BLD_DIR = "$PKG_BUILD_DIR/deb/$NAME-${PKG_VERSION}"
def final DEB_PKG_DIR = "$PKG_BUILD_DIR/deb/$PKG_NAME-${PKG_VERSION}-${BIGTOP_BUILD_STAMP}"
def final ENABLE_MAVEN_PARALLEL_BUILD = config.bigtop.components[target].maven_parallel_build
def final MAVEN_BUILD_THREADS = project.hasProperty('buildThreads') ? project.property('buildThreads') : null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation/comments for the feature could be added later. We do not allow number without C while Maven do?

@iwasakims
Copy link
Member

Ideally, we should examine the warnings if we use parallel builds for creating release.

[INFO] --- mavanagaiata:0.9.4:commit (find-current-git-revision) @ zookeeper-client ---                                                                                                                                                                                                                     
[WARNING] *****************************************************************                                                                                                                                                                                                                                 
[WARNING] * Your build is requesting parallel execution, but this         *                                                                                                                                                                                                                                 
[WARNING] * project contains the following plugin(s) that have goals not  *                                                                                                                                                                                                                                 
[WARNING] * marked as thread-safe to support parallel execution.          *                                                                                                                                                                                                                                 
[WARNING] * While this /may/ work fine, please look for plugin updates    *                                                                                                                                                                                                                                 
[WARNING] * and/or request plugins be made thread-safe.                   *                                                                                                                                                                                                                                 
[WARNING] * If reporting an issue, report it against the plugin in        *                                                                                                                                                                                                                                 
[WARNING] * question, not against Apache Maven.                           *                                                                                                                                                                                                                                 
[WARNING] *****************************************************************                                                                                                                                                                                                                                 
[WARNING] The following plugins are not marked as thread-safe in Apache ZooKeeper - Documentation:                                                                                                                                                                                                          
[WARNING]   com.ruleoftech:markdown-page-generator-plugin:2.1.0                                                                                                                                                                                                                                             
[WARNING]                                                                                                                                                                                                                                                                                                   
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.                                                                                                                                                                                                                          
[WARNING] *****************************************************************                                                                                                                                                                                                                                 
[INFO]                                                                                                                                                                                                                                                                                                      
[INFO] --- maven-clean-plugin:3.1.0:clean (default-clean) @ zookeeper-docs ---                                                                                                                                                                                                                              
[WARNING] *****************************************************************                                                                                                                                                                                                                                 
[WARNING] * Your build is requesting parallel execution, but this         *                                                                                                                                                                                                                                 
[INFO]                                                                                                                                                                                                                                                                                                      
[INFO] --- mavanagaiata:0.9.4:commit (find-current-git-revision) @ zookeeper-docs ---                                                                                                                                                                                                                       
[WARNING] * project contains the following plugin(s) that have goals not  *                                                                                                                                                                                                                                 
[WARNING] * marked as thread-safe to support parallel execution.          *                                                                                                                                                                                                                                 
[WARNING] * While this /may/ work fine, please look for plugin updates    *                                                                                                                                                                                                                                 
[WARNING] * and/or request plugins be made thread-safe.                   *                                                                                                                                                                                                                                 
[WARNING] * If reporting an issue, report it against the plugin in        *                                                                                                                                                                                                                                 
[WARNING] * question, not against Apache Maven.                           *                                                                                                                                                                                                                                 
[WARNING] *****************************************************************                                                                                                                                                                                                                                 
[WARNING] The following plugins are not marked as thread-safe in Apache ZooKeeper - Jute:                                                                                                                                                                                                                   
[WARNING]   org.codehaus.mojo:javacc-maven-plugin:2.6                                                                                                                                                                                                                                                       
[WARNING]                                                                                                                                                                                                                                                                                                   
[WARNING] Enable debug to see precisely which goals are not marked as thread-safe.                                                                                                                                                                                                                          
[WARNING] *****************************************************************

@iwasakims iwasakims merged commit cfb8acc into apache:master Dec 28, 2023
@JiaLiangC
Copy link
Contributor Author

@iwasakims
Thank you for helping with the review.
Next, i will:
1.Add the corresponding documentation.
2.Enhance buildThreads to support numeric values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants