TEZ-4479: Eagerly Init/Load FileSystem In Tez Task Containers #274

Open
wants to merge 1 commit into
base: master

shameersss1
Contributor

Initializing/loading a FileSystem such as S3 can take ~10s to ~20s when called for the first time, while the time taken by subsequent calls is negligible. Loading the FileSystem well before it is used can save that time. This is especially useful for pre-warmed Tez containers, where the Tez task containers come up when the Application Master (AM) is launched rather than on demand (the default behavior). It is also useful when Mapper tasks spend considerable time consuming the upstream shuffle data before performing FileSystem operations. In all such cases we can save some FileSystem load-up time.
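
To make the idea concrete, here is a minimal JDK-only sketch of warming a cache in the background so that the first real lookup finds an already-initialized entry. It is not the patch itself: `SlowClient` and `WarmCache` are hypothetical stand-ins for an expensive-to-create FileSystem and its cache, and the 100 ms sleep only simulates the first-call cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a client that is slow to create the first time.
final class SlowClient {
  static final AtomicInteger CREATED = new AtomicInteger();
  final String uri;

  SlowClient(String uri) {
    try {
      Thread.sleep(100); // simulated initialization cost (assumption)
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    this.uri = uri;
    CREATED.incrementAndGet();
  }
}

// Hypothetical cache keyed by URI; get() creates an entry only on first use.
final class WarmCache {
  private static final Map<String, SlowClient> CACHE = new ConcurrentHashMap<>();

  private WarmCache() {}

  static SlowClient get(String uri) {
    return CACHE.computeIfAbsent(uri, SlowClient::new);
  }

  // Starts creation on daemon threads; the returned future completes
  // once every requested entry is in the cache.
  static CompletableFuture<Void> warmUp(List<String> uris) {
    ExecutorService pool = Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r, "eager-init");
      t.setDaemon(true);
      return t;
    });
    List<CompletableFuture<Void>> tasks = new ArrayList<>();
    for (String uri : uris) {
      tasks.add(CompletableFuture.runAsync(() -> get(uri), pool));
    }
    pool.shutdown(); // already-submitted tasks still run to completion
    return CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0]));
  }
}
```

In a real task the caller would not wait on the returned future; it would carry on with other work (for example shuffle fetching) and only pay whatever initialization time is still outstanding when it first calls `get`.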

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 25m 4s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 6m 35s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 10m 49s | master passed |
| +1 💚 | compile | 0m 59s | master passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 0m 53s | master passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| +1 💚 | checkstyle | 0m 56s | master passed |
| +1 💚 | javadoc | 1m 8s | master passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 56s | master passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| +0 🆗 | spotbugs | 0m 43s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 2m 10s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 9s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 37s | the patch passed |
| +1 💚 | compile | 0m 40s | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 40s | the patch passed |
| +1 💚 | compile | 0m 35s | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| +1 💚 | javac | 0m 35s | the patch passed |
| +1 💚 | checkstyle | 0m 20s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 37s | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 36s | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| -1 ❌ | findbugs | 0m 41s | tez-runtime-internals generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| | | | _ Other Tests _ |
| +1 💚 | unit | 2m 12s | tez-api in the patch passed. |
| +1 💚 | unit | 0m 35s | tez-runtime-internals in the patch passed. |
| +1 💚 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 59m 1s | |

| Reason | Tests |
|-------:|:------|
| FindBugs | module:tez-runtime-internals |
| | Incorrect lazy initialization of static field org.apache.tez.runtime.task.TezChild.eagerInitFsPool in org.apache.tez.runtime.task.TezChild.eagerInitFileSystemPaths(Configuration) At TezChild.java:field org.apache.tez.runtime.task.TezChild.eagerInitFsPool in org.apache.tez.runtime.task.TezChild.eagerInitFileSystemPaths(Configuration) At TezChild.java:[lines 512-513] |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/1/artifact/out/Dockerfile |
| GITHUB PR | #274 |
| JIRA Issue | TEZ-4479 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 582a4107fe97 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 6bd6f9c |
| Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| findbugs | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/1/artifact/out/new-findbugs-tez-runtime-internals.html |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/1/testReport/ |
| Max. process+thread count | 385 (vs. ulimit of 5500) |
| modules | C: tez-api tez-runtime-internals U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.
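
For context on the FindBugs warning reported above: an unsynchronized `if (field == null) field = ...` on a static field lets two threads both observe `null` and each create their own instance. One common way to address it, sketched here with JDK classes only, is to make the check and the assignment atomic. The class name `EagerInitPoolHolder` is hypothetical, and this is just one possible fix, not necessarily the one the patch should adopt.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder illustrating thread-safe lazy initialization.
final class EagerInitPoolHolder {
  private static ExecutorService pool; // guarded by the class lock

  private EagerInitPoolHolder() {}

  // synchronized makes the null check and the assignment a single atomic
  // step, so concurrent callers cannot create two pools.
  static synchronized ExecutorService getPool() {
    if (pool == null) {
      pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "eager-init-fs");
        t.setDaemon(true); // do not keep the JVM alive for warm-up work
        return t;
      });
    }
    return pool;
  }
}
```

If the pool is only ever needed once per process, initializing the field in a static initializer, or not keeping a static field at all, would avoid the lazy-init pattern entirely.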

@shameersss1
Contributor Author

@abstractdog Could you please review the changes?

* String value. Comma-separated list of FileSystem paths which need to be eagerly initialized.
* For example s3://bucket/,file://,hdfs://localhost:8020/
*/
@ConfigurationScope(Scope.AM)
Contributor

  1. Scope is VERTEX as we're doing this in TezChild
  2. constant name should contain TEZ_ prefix

Contributor Author

ack.
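
As a rough illustration of the naming point and of how a comma-separated value like the Javadoc example could be turned into a list, here is a JDK-only sketch. The constant name and key string are hypothetical (the real names are whatever the patch ends up defining), and Hadoop's `Configuration.getTrimmedStrings` could likely do the splitting in the actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EagerInitPaths {
  // Hypothetical key, shown only to illustrate the TEZ_ prefix convention.
  static final String TEZ_TASK_EAGER_INIT_FS_PATHS = "tez.task.eager-init.fs.paths";

  private EagerInitPaths() {}

  // Splits on commas, trims whitespace, and drops empty entries.
  static List<String> parse(String raw) {
    if (raw == null) {
      return Collections.emptyList();
    }
    List<String> paths = new ArrayList<>();
    for (String part : raw.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        paths.add(trimmed);
      }
    }
    return paths;
  }
}
```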

public void run() {
try {
new Path(path).getFileSystem(conf);
LOG.info("Eagerly initiated FileSystem at path {}", path);
Contributor

it would be nice to measure the time spent on initialization and print it to the logs too

Contributor Author

ack.
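
One way the timing suggestion could look, as a JDK-only sketch: wrap the initialization in a helper that measures elapsed time with `System.nanoTime()` (monotonic, so better suited to intervals than `System.currentTimeMillis()`) and include the result in the log message. `InitTimer` and the exact message wording are hypothetical; in the patch the action would be the `getFileSystem` call and the message would go through `LOG.info`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for timing an initialization step.
final class InitTimer {
  private InitTimer() {}

  // Runs the action and returns the elapsed wall-clock time in milliseconds.
  static long timeMillis(Runnable action) {
    long start = System.nanoTime();
    action.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  // Builds the message that would be handed to the logger.
  static String describe(String path, long elapsedMillis) {
    return String.format(
        "Eagerly initialized FileSystem at path %s in %d ms", path, elapsedMillis);
  }
}
```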

if (eagerInitFsPool == null && !eagerInitPaths.isEmpty()) {
eagerInitFsPool = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
.setDaemon(true)
.setPriority(Thread.MAX_PRIORITY)
Contributor

in the existing code, I can see we tend not to set a priority on thread pools, so I guess we can remove this to simplify the code further

Contributor Author

ack.

@Override
public void run() {
try {
new Path(path).getFileSystem(conf);
Contributor

shouldn't this filesystem be closed? does it hold any resources when it's open?

Contributor Author

Wouldn't this be gc'ed eventually? I can see many instances in the code where we are not explicitly closing the filesystem object. For example: https://github.com/apache/tez/blob/master/tez-plugins/tez-protobuf-history-plugin/src/main/java/org/apache/tez/dag/history/logging/proto/DatePartitionedLogger.java#L136

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 22m 58s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 6m 15s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 10m 15s | master passed |
| +1 💚 | compile | 1m 4s | master passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 1m 1s | master passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| +1 💚 | checkstyle | 1m 5s | master passed |
| +1 💚 | javadoc | 1m 12s | master passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 1m 5s | master passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| +0 🆗 | spotbugs | 0m 43s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 2m 4s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 0m 37s | the patch passed |
| +1 💚 | compile | 0m 38s | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 38s | the patch passed |
| +1 💚 | compile | 0m 34s | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| +1 💚 | javac | 0m 34s | the patch passed |
| +1 💚 | checkstyle | 0m 21s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 36s | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 37s | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| -1 ❌ | findbugs | 0m 40s | tez-runtime-internals generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| | | | _ Other Tests _ |
| +1 💚 | unit | 2m 11s | tez-api in the patch passed. |
| +1 💚 | unit | 0m 37s | tez-runtime-internals in the patch passed. |
| +1 💚 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 56m 31s | |

| Reason | Tests |
|-------:|:------|
| FindBugs | module:tez-runtime-internals |
| | Incorrect lazy initialization of static field org.apache.tez.runtime.task.TezChild.eagerInitFsPool in org.apache.tez.runtime.task.TezChild.eagerInitFileSystemPaths(Configuration) At TezChild.java:field org.apache.tez.runtime.task.TezChild.eagerInitFsPool in org.apache.tez.runtime.task.TezChild.eagerInitFileSystemPaths(Configuration) At TezChild.java:[lines 512-513] |
| | Dead store to fs in org.apache.tez.runtime.task.TezChild$3.run() At TezChild.java:org.apache.tez.runtime.task.TezChild$3.run() At TezChild.java:[line 524] |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/2/artifact/out/Dockerfile |
| GITHUB PR | #274 |
| JIRA Issue | TEZ-4479 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux a0619e798ce7 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 25a9536 |
| Default Java | Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u362-ga-0ubuntu1~22.04-b09 |
| findbugs | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/2/artifact/out/new-findbugs-tez-runtime-internals.html |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/2/testReport/ |
| Max. process+thread count | 388 (vs. ulimit of 5500) |
| modules | C: tez-api tez-runtime-internals U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-274/2/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@@ -503,6 +506,33 @@ public static TezChild newTezChild(Configuration conf, String host, int port, St
hadoopShim);
}

private static void eagerInitFileSystemPaths(Configuration conf) {
Contributor

It will be good to measure the time spent on FS init (even for cloud stores) and share the details before trying out this patch.

Reason: This gets initialized in TezChild, but for a running container, the FileSystem is closed explicitly via "FileSystem.closeAllForUGI(childUGI);". Refer to the "public ContainerExecutionResult run()" method.

Even if this gets initialized early, it will not have a major impact in the container reuse scenario. It will be good to measure and find out the time spent in FS init.

.build());
}
for (String path : eagerInitPaths) {
eagerInitFsPool.execute(new Runnable() {

Before rushing to create lots of fs instances in parallel, look at HADOOP-17313 and why we actually added semaphores to stop apps like Tez creating too many at the same time. This code may cause overload problems, or the fs semaphore will hold you back for safety.

Best to look at why it's taking so long; if s3a bucket existence checks aren't involved, then it'll be whatever auth mechanism is plugged in. Same for abfs.
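
To illustrate the kind of throttling described here, the following JDK-only sketch caps how many expensive creations can run at once with a `java.util.concurrent.Semaphore`. It is not Hadoop's implementation of HADOOP-17313: `BoundedCreator`, its methods, and the 20 ms simulated init are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical throttle: at most maxParallel creations run concurrently.
final class BoundedCreator {
  private final Semaphore permits;
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicInteger maxObserved = new AtomicInteger();

  BoundedCreator(int maxParallel) {
    this.permits = new Semaphore(maxParallel);
  }

  // Blocks until a permit is free, then runs the slow initialization.
  void create(Runnable slowInit) {
    permits.acquireUninterruptibly();
    try {
      int now = inFlight.incrementAndGet();
      maxObserved.accumulateAndGet(now, Math::max); // record peak concurrency
      slowInit.run();
    } finally {
      inFlight.decrementAndGet();
      permits.release();
    }
  }

  int maxObserved() {
    return maxObserved.get();
  }

  private static void simulateInit() {
    try {
      Thread.sleep(20); // stand-in for a slow FileSystem creation (assumption)
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Submits `tasks` creations from as many threads and returns the peak
  // number that were actually running at the same time.
  static int demo(int tasks, int maxParallel) {
    BoundedCreator creator = new BoundedCreator(maxParallel);
    ExecutorService pool = Executors.newFixedThreadPool(tasks);
    CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks];
    for (int i = 0; i < tasks; i++) {
      futures[i] = CompletableFuture.runAsync(
          () -> creator.create(BoundedCreator::simulateInit), pool);
    }
    CompletableFuture.allOf(futures).join();
    pool.shutdown();
    return creator.maxObserved();
  }
}
```

The practical consequence for an eager-init thread pool is that submitting many paths at once does not guarantee they initialize in parallel: callers beyond the permit count simply wait.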

5 participants