SIGSEGV error in docker (Java client) #14534

Closed
lbenc135 opened this issue Mar 2, 2022 · 13 comments
Labels
deprecated/question Questions should happened in GitHub Discussions lifecycle/stale Stale

Comments

@lbenc135
Contributor

lbenc135 commented Mar 2, 2022

Describe the bug
The Pulsar Java client crashes with the message below when trying to create a Pulsar client. I reproduced the crash with versions 2.9.1, 2.8.2 and 2.7.4, but the same code works on 2.7.1. The crash also doesn't happen when running on a local machine, only when running in a Docker container (openjdk:14-alpine).

Logs:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003fd6, pid=1, tid=7
#
# JRE version: OpenJDK Runtime Environment (14.0+33) (build 14-ea+33)
# Java VM: OpenJDK 64-Bit Server VM (14-ea+33, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  0x0000000000003fd6
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /src/services/rule_engine/core.1)
#
# An error report file with more information is saved as:
# /src/services/rule_engine/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

Full log:
hs_err_pid1.log
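
The header above already contains the key facts: the signal, the program counter, and a problematic frame marked `C`, which hs_err reports use for native code. A minimal sketch for pulling those fields out of a crash header follows. The class `CrashHeader` and the method `summarize` are hypothetical names, and the regular expressions assume the header layout shown in this report, with each line starting with `#`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrashHeader {

    // Matches e.g. "#  SIGSEGV (0xb) at pc=0x0000000000003fd6, pid=1, tid=7"
    private static final Pattern SIGNAL =
            Pattern.compile("^#\\s+(SIG[A-Z]+) \\(0x[0-9a-f]+\\) at pc=(0x[0-9a-f]+)");

    // Matches the line after "# Problematic frame:", e.g. "# C  0x0000000000003fd6"
    private static final Pattern FRAME = Pattern.compile("^# ([A-Za-z])\\s+\\S");

    /** Summarizes signal, program counter and frame type from hs_err header lines. */
    public static String summarize(List<String> lines) {
        String signal = "unknown";
        String pc = "unknown";
        String frameType = "?";
        boolean expectFrame = false;
        for (String line : lines) {
            Matcher s = SIGNAL.matcher(line);
            if (s.find()) {
                signal = s.group(1);
                pc = s.group(2);
            }
            if (expectFrame) {
                Matcher f = FRAME.matcher(line);
                if (f.find()) {
                    frameType = f.group(1);
                }
                expectFrame = false;
            }
            if (line.startsWith("# Problematic frame:")) {
                expectFrame = true;
            }
        }
        // "C" marks a native frame; other codes are reported as-is.
        String where = frameType.equals("C") ? "native code" : "frame type " + frameType;
        return signal + " at pc=" + pc + " in " + where;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CrashHeader hs_err_pid1.log
        System.out.println(summarize(Files.readAllLines(Path.of(args[0]))));
    }
}
```

A native frame with no library name, as in this report, points at native code loaded by a dependency rather than at Java code in the application.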

Code:

client = PulsarClient.builder()
        .serviceUrl(BaseSettings.PULSAR_URL)  // "pulsar://localhost:6650"
        .loadConf(clientSettings)             // empty HashMap
        .build();

To Reproduce
Not sure. The description should hopefully provide enough info.

Expected behavior
Doesn't crash.

Desktop (please complete the following information):
Full info available in the full log file (under S Y S T E M near the end).

@lbenc135 lbenc135 added the type/bug The PR fixed a bug or issue reported a bug label Mar 2, 2022
@lhotari
Member

lhotari commented Mar 2, 2022

@lbenc135 Can you check whether the same problem reproduces when you don't use an Alpine-based OpenJDK base image? Please test with openjdk:14.
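
When comparing base images, it can help to log at startup which distribution the container is actually running. The following is a minimal Java sketch, not part of Pulsar: the class name `BaseImageCheck` and the helper `isAlpine` are hypothetical, and it assumes the image ships a standard `/etc/os-release` file whose `ID` field is `alpine` on Alpine Linux.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BaseImageCheck {

    /** Returns true if os-release style lines declare ID=alpine (quotes tolerated). */
    public static boolean isAlpine(List<String> osReleaseLines) {
        for (String line : osReleaseLines) {
            String trimmed = line.trim();
            // "ID_LIKE=..." does not start with "ID=", so only the ID field matches here.
            if (trimmed.startsWith("ID=")) {
                String id = trimmed.substring(3).replace("\"", "").replace("'", "");
                return id.equalsIgnoreCase("alpine");
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path osRelease = Path.of("/etc/os-release");
        if (!Files.isReadable(osRelease)) {
            System.out.println("No readable /etc/os-release; cannot tell which distribution this is");
            return;
        }
        boolean alpine = isAlpine(Files.readAllLines(osRelease));
        System.out.println(alpine
                ? "Alpine (musl libc) base image detected"
                : "Not an Alpine base image");
    }
}
```

Running this in both the openjdk:14 and openjdk:14-alpine containers should make it clear which libc flavor each test actually used.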

@lhotari
Member

lhotari commented Mar 2, 2022

Possibly related to #11415 #11224 #10798

@lhotari
Member

lhotari commented Mar 2, 2022

netty/netty-tcnative#649 (comment)

We switched to adoptopenjdk/openjdk15:alpine-slim (instead of openjdk:15-alpine) and the problem disappeared

Since adoptopenjdk is deprecated, can you try using the eclipse-temurin:17-alpine base image to see if that works for you? Eclipse Temurin images are maintained by Adoptium, which provides pre-built OpenJDK binaries.

@lbenc135
Contributor Author

lbenc135 commented Mar 2, 2022

@lhotari openjdk:14 works, but eclipse-temurin:17-alpine does not. In any case, changing the base image is a bit tricky for us. Is there any plan to fix this for alpine images?

@lhotari
Member

lhotari commented Mar 2, 2022

@lhotari openjdk:14 works, but eclipse-temurin:17-alpine does not. In any case, changing the base image is a bit tricky for us. Is there any plan to fix this for alpine images?

Since this is an open source project, it will depend on someone contributing a fix for this problem. One form of contributing is contributing a simple repro case. That could be a separate GitHub repository which contains the repro and instructions.

There might be workarounds. Some issues might be caused by shaded library versions conflicting with the application. Here's one issue about this in netty: netty/netty#11879

For Pulsar, it's possible to use the unshaded client. The coordinates are here:
https://search.maven.org/artifact/org.apache.pulsar/pulsar-client-original/2.8.2/jar

For Maven:

<dependency>
  <groupId>org.apache.pulsar</groupId>
  <artifactId>pulsar-client-original</artifactId>
  <version>2.8.2</version>
</dependency>

For Gradle:

implementation 'org.apache.pulsar:pulsar-client-original:2.8.2'

Does your application use Netty or contain shaded Netty?

@lhotari
Member

lhotari commented Mar 2, 2022

When using pulsar-client-original, you might also need to use dependencyManagement in Maven or Gradle's version alignment features to ensure that there aren't mixed versions of the Netty and netty-tcnative-boringssl-static libraries.

For maven, something like this:

  <properties>
    <netty.version>4.1.74.Final</netty.version>
    <netty-tc-native.version>2.0.48.Final</netty-tc-native.version>
  </properties>
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-bom</artifactId>
        <version>${netty.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
      <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-tcnative-boringssl-static</artifactId>
        <version>${netty-tc-native.version}</version>
      </dependency>
    </dependencies>
  </dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.pulsar</groupId>
      <artifactId>pulsar-client-original</artifactId>
      <version>2.8.2</version>
    </dependency>
  </dependencies>

@lbenc135 Are you using Maven or Gradle?
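
One way to check for mixed Netty versions at runtime is sketched below. It is an illustration under assumptions, not something taken from the Pulsar or Netty documentation: the class `NettyVersionCheck` and the helper `versionsIn` are hypothetical names, and it assumes that each Netty jar on the classpath ships a `META-INF/io.netty.versions.properties` resource with keys of the form `<artifact>.version`. Jars that do not ship this file, or shaded jars that relocate it, would not be covered.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

public class NettyVersionCheck {

    // Assumption: Netty jars ship this resource, with keys such as "netty-buffer.version".
    static final String RESOURCE = "META-INF/io.netty.versions.properties";

    /** Collects the distinct values of all keys ending in ".version". */
    public static Set<String> versionsIn(Properties props) {
        Set<String> versions = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            if (key.endsWith(".version")) {
                versions.add(props.getProperty(key).trim());
            }
        }
        return versions;
    }

    public static void main(String[] args) throws IOException {
        Set<String> versions = new TreeSet<>();
        Enumeration<URL> urls = NettyVersionCheck.class.getClassLoader().getResources(RESOURCE);
        while (urls.hasMoreElements()) {
            // Load each jar's file separately so duplicate artifacts are not overwritten.
            Properties props = new Properties();
            try (InputStream in = urls.nextElement().openStream()) {
                props.load(in);
            }
            versions.addAll(versionsIn(props));
        }
        if (versions.size() > 1) {
            System.out.println("Mixed Netty versions on the classpath: " + versions);
        } else {
            System.out.println("Netty versions found: " + versions);
        }
    }
}
```

Running `mvn dependency:tree` gives the same information at build time; the runtime check is only useful for confirming what ends up inside the container image.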

@codelipenghui codelipenghui added deprecated/question Questions should happened in GitHub Discussions and removed type/bug The PR fixed a bug or issue reported a bug labels Mar 15, 2022
@lbenc135
Contributor Author

@lhotari Sorry for the delay. We're using Maven and the solution with pulsar-client-original and Netty dependency management worked with openjdk:14-alpine. Thanks for the tip!

@github-actions

The issue had no activity for 30 days, mark with Stale label.

lhotari added a commit to lhotari/pulsar that referenced this issue May 18, 2022
- release notes https://netty.io/news/2022/05/06/2-1-77-Final.html
  - improves Alpine / musl compatibility
    - could help issues such as apache#14534 apache#11415 apache#11224 apache#10798
  - improves shading compatibility
lhotari added a commit that referenced this issue May 18, 2022
)

- release notes https://netty.io/news/2022/05/06/2-1-77-Final.html
  - improves Alpine / musl compatibility
    - could help issues such as #14534 #11415 #11224 #10798
  - improves shading compatibility
  - fixes a bug related to the native epoll transport and epoll_pwait2
lhotari added a commit that referenced this issue May 18, 2022
)

- release notes https://netty.io/news/2022/05/06/2-1-77-Final.html
  - improves Alpine / musl compatibility
    - could help issues such as #14534 #11415 #11224 #10798
  - improves shading compatibility
  - fixes a bug related to the native epoll transport and epoll_pwait2

(cherry picked from commit a8045fc)
@github-actions

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label May 30, 2022
@ehenoma

ehenoma commented Jun 29, 2022

I got the same issue using openjdk:17 in Docker:

# A fatal error has been detected by the Java Runtime Environment:
--
Wed, Jun 29 2022 5:53:04 pm | #
Wed, Jun 29 2022 5:53:04 pm | # SIGSEGV (0xb) at pc=0x0000000000003fd6, pid=7, tid=8
Wed, Jun 29 2022 5:53:04 pm | #
Wed, Jun 29 2022 5:53:04 pm | # JRE version: OpenJDK Runtime Environment (17.0+14) (build 17-ea+14)
Wed, Jun 29 2022 5:53:04 pm | # Java VM: OpenJDK 64-Bit Server VM (17-ea+14, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
Wed, Jun 29 2022 5:53:04 pm | # Problematic frame:
Wed, Jun 29 2022 5:53:04 pm | # C 0x0000000000003fd6
Wed, Jun 29 2022 5:53:04 pm | #
Wed, Jun 29 2022 5:53:04 pm | # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /usr/src/build/core.7)
Wed, Jun 29 2022 5:53:04 pm | #
Wed, Jun 29 2022 5:53:04 pm | # An error report file with more information is saved as:
Wed, Jun 29 2022 5:53:04 pm | # /usr/src/build/hs_err_pid7.log
Wed, Jun 29 2022 5:53:04 pm | #
Wed, Jun 29 2022 5:53:04 pm | # If you would like to submit a bug report, please visit:
Wed, Jun 29 2022 5:53:04 pm | # https://bugreport.java.com/bugreport/crash.jsp
Wed, Jun 29 2022 5:53:04 pm | # The crash happened outside the Java Virtual Machine in native code.
Wed, Jun 29 2022 5:53:04 pm | # See problematic frame for where to report the bug.
Wed, Jun 29 2022 5:53:04 pm | #

@github-actions github-actions bot removed the Stale label Jun 30, 2022
Jason918 pushed a commit to Jason918/pulsar that referenced this issue Aug 1, 2022
…che#15646)

- release notes https://netty.io/news/2022/05/06/2-1-77-Final.html
  - improves Alpine / musl compatibility
    - could help issues such as apache#14534 apache#11415 apache#11224 apache#10798
  - improves shading compatibility
  - fixes a bug related to the native epoll transport and epoll_pwait2

(cherry picked from commit a8045fc)
@github-actions

github-actions bot commented Aug 2, 2022

The issue had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Aug 2, 2022
@ypzhuang

After upgrading from pulsar-client-all:2.10.0 to 2.10.1, the issue is gone.

@tisonkun
Member

@ypzhuang Thanks for your report. Closed as fixed.
