HDDS-15102. Avoid ArchiveOutputStream.createArchiveEntry due to libnss issue #10117

Merged
adoroszlai merged 4 commits into apache:master from symious:HDDS-15102
Apr 25, 2026

Contributor

@symious symious commented Apr 24, 2026

What changes were proposed in this pull request?

We encountered random Datanode (DN) crashes in our cluster, which appear to be related to sssd bugs. Detailed error logs are included below.

This change is a workaround: it avoids the code path that calls into sssd, so that the lookup can no longer crash the Datanode.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2bb5f5dd2b, pid=2252464, tid=2472131
#
# JRE version: OpenJDK Runtime Environment (21.0.9) (build 21.0.9-internal-adhoc.root.jdk)
# Java VM: OpenJDK 64-Bit Server VM (21.0.9-internal-adhoc.root.jdk, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64)
# Problematic frame:
# C  [libnss_sss.so.2+0x6d2b]  _nss_sss_endnetent+0x4b 
Stack: [0x00007f26e0bf6000,0x00007f26e0cf7000],  sp=0x00007f26e0cf4f10,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libnss_sss.so.2+0x6d2b]  _nss_sss_endnetent+0x4b
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 35103  sun.nio.fs.UnixNativeDispatcher.getpwuid(I)[B java.base@21.0.9-internal (0 bytes) @ 0x00007f2ba229ae19 [0x00007f2ba229adc0+0x0000000000000059]
J 47972 c2 org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.includeFile(Ljava/io/File;Ljava/lang/String;Lorg/apache/commons/compress/archivers/ArchiveOutputStream;)V (80 bytes) @ 0x00007f2ba1401160 [0x00007f2ba14004c0+0x0000000000000ca0]
J 48138 c1 org.apache.hadoop.ozone.container.keyvalue.TarContainerPacker.pack(Lorg/apache/hadoop/ozone/container/common/interfaces/Container;Ljava/io/OutputStream;)V (105 bytes) @ 0x00007f2b98c6a3ac [0x00007f2b98c69e80+0x000000000000052c]
J 48133 c1 org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.packContainerToDestination(Ljava/io/OutputStream;Lorg/apache/hadoop/ozone/container/common/interfaces/ContainerPacker;)V (63 bytes) @ 0x00007f2b998a1aa4 [0x00007f2b998a17c0+0x00000000000002e4]
J 48131 c1 org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.exportContainerData(Ljava/io/OutputStream;Lorg/apache/hadoop/ozone/container/common/interfaces/ContainerPacker;)V (166 bytes) @ 0x00007f2b989993ec [0x00007f2b98998b60+0x000000000000088c]
J 48128 c1 org.apache.hadoop.ozone.container.replication.OnDemandContainerReplicationSource.copyData(JLjava/io/OutputStream;Lorg/apache/hadoop/ozone/container/replication/CopyContainerCompression;)V (58 bytes) @ 0x00007f2b98e71d04 [0x00007f2b98e70ea0+0x0000000000000e64]
J 48118 c1 org.apache.hadoop.ozone.container.replication.GrpcReplicationService.download(Lorg/apache/hadoop/hdds/protocol/datanode/proto/ContainerProtos$CopyContainerRequestProto;Lorg/apache/ratis/thirdparty/io/grpc/stub/StreamObserver;)V (296 bytes) @ 0x00007f2b999c4d14 [0x00007f2b999c4360+0x00000000000009b4]
J 48117 c1 org.apache.hadoop.hdds.protocol.datanode.proto.IntraDatanodeProtocolServiceGrpc$MethodHandlers.invoke(Ljava/lang/Object;Lorg/apache/ratis/thirdparty/io/grpc/stub/StreamObserver;)V (50 bytes) @ 0x00007f2b99314bcc [0x00007f2b993149e0+0x00000000000001ec]
J 35258 c2 org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext()V (81 bytes) @ 0x00007f2ba0f58f18 [0x00007f2ba0f58d60+0x00000000000001b8]
J 16850 c2 org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run()V (35 bytes) @ 0x00007f2ba135e5c0 [0x00007f2ba135e220+0x00000000000003a0]
J 51312 c2 org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run()V (114 bytes) @ 0x00007f2ba123dab4 [0x00007f2ba123d9a0+0x0000000000000114]
J 27132 c2 java.util.concurrent.ThreadPoolExecutor$Worker.run()V java.base@21.0.9-internal (9 bytes) @ 0x00007f2ba0fa2f24 [0x00007f2ba0fa2c80+0x00000000000002a4]
J 25894 c2 java.lang.Thread.run()V java.base@21.0.9-internal (23 bytes) @ 0x00007f2ba125b5cc [0x00007f2ba125b520+0x00000000000000ac]
v  ~StubRoutines::call_stub 0x00007f2b9fc52cbf 
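
The Java frames above show `UnixNativeDispatcher.getpwuid` being reached from `TarContainerPacker.includeFile`, that is, a uid-to-name lookup that goes through NSS. As a minimal, stdlib-only sketch (not the code in this patch), the snippet below reads the metadata a tar header strictly needs (size, modification time, directory flag) through `BasicFileAttributes`. To my understanding of the JDK's Unix implementation, this is served by `stat` alone, while resolving an owner name, for example via `Files.getOwner`, is what involves `getpwuid`. The class `BasicEntryMetadata` and its members are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for illustration; not part of Ozone or commons-compress.
final class BasicEntryMetadata {

  /** Plain holder for the fields read below. */
  static final class EntryInfo {
    final long size;
    final long modifiedMillis;
    final boolean directory;

    EntryInfo(long size, long modifiedMillis, boolean directory) {
      this.size = size;
      this.modifiedMillis = modifiedMillis;
      this.directory = directory;
    }
  }

  private BasicEntryMetadata() { }

  /**
   * Reads only basic attributes. Owner and group are deliberately not
   * requested, so no user-name resolution is needed (assumption: the
   * basic attribute view does not resolve owner names).
   */
  static EntryInfo describe(Path file) throws IOException {
    BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
    // Directories carry no payload in a tar archive, so report size 0 for them.
    long size = attrs.isDirectory() ? 0L : attrs.size();
    return new EntryInfo(size, attrs.lastModifiedTime().toMillis(), attrs.isDirectory());
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("chunk", ".data");
    try {
      Files.write(tmp, new byte[] {1, 2, 3});
      EntryInfo info = describe(tmp);
      System.out.println("size=" + info.size + " directory=" + info.directory);
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

Values collected this way could then be copied onto an archive entry that was created from a name only, rather than from a `File`.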

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15102

How was this patch tested?

This change only replaces how the TarArchiveEntry is initialized, so the existing unit tests should be sufficient.

@symious symious requested a review from adoroszlai April 24, 2026 03:11
Contributor

@adoroszlai adoroszlai left a comment

Thanks @symious for the patch.

Comment thread hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/Archiver.java Outdated
Contributor

@adoroszlai adoroszlai left a comment

Thanks @symious for updating the patch.

private static TarArchiveEntry createBasicTarArchiveEntry(File file, String entryName)
    throws IOException {
  TarArchiveEntry entry = new TarArchiveEntry(entryName);
  entry.setMode(TarArchiveEntry.DEFAULT_FILE_MODE);
Contributor

I don't think setMode should be called. TarArchiveEntry(String) sets mode depending on whether name ends with /. It also sets linkFlag, which cannot be updated later. I think we should rely on this behavior by ensuring name ends with / for directories.

https://github.com/apache/commons-compress/blob/commons-compress-1.28.0/src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveEntry.java#L616-L619
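
The naming rule in the comment above could be applied with a small helper along these lines. This is only a sketch: `TarEntryNames.forFile` is a hypothetical name, and the code actually merged in this PR may look different. The idea is that the returned name would be passed to `new TarArchiveEntry(String)`, which then picks the directory or file defaults from the trailing `/`.

```java
import java.io.File;

// Hypothetical helper for illustration; not the code merged in this PR.
final class TarEntryNames {

  private TarEntryNames() { }

  /**
   * Returns entryName unchanged for regular files, and with exactly one
   * trailing "/" appended for directories that do not already end with one.
   */
  static String forFile(File file, String entryName) {
    if (file.isDirectory() && !entryName.endsWith("/")) {
      return entryName + "/";
    }
    return entryName;
  }

  public static void main(String[] args) {
    // The JVM's temp directory exists on any normal setup, so it serves as a directory example.
    File dir = new File(System.getProperty("java.io.tmpdir"));
    System.out.println(forFile(dir, "metadata"));
  }
}
```

With the name normalized up front, there is no need to call `setMode` afterwards, which is the point the comment makes.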

Contributor Author

Updated, PTAL.

Comment thread hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/Archiver.java Outdated
@adoroszlai adoroszlai changed the title from "HDDS-15102. Skip ArchiveOutputStream.createArchiveEntry to avoid libnss issue" to "HDDS-15102. Avoid ArchiveOutputStream.createArchiveEntry due to libnss issue" Apr 24, 2026
Contributor Author

symious commented Apr 24, 2026

@adoroszlai Thanks for the reminder.

At first, the change only covered includeFile, so the affected scope was smaller.

Let me try to fix these errors.

Contributor Author

symious commented Apr 24, 2026

@adoroszlai Updated, PTAL.

@adoroszlai adoroszlai merged commit 543a744 into apache:master Apr 25, 2026
130 of 133 checks passed
@adoroszlai
Contributor

Thanks @symious for the patch.

Contributor Author

symious commented Apr 27, 2026

@adoroszlai Thank you for the review and merge.
