[fix] Use OAuth2 on macOS arm64 failed. #282

Closed
shibd wants to merge 1 commit

shibd
Member

@shibd shibd commented Jan 10, 2023

Fixes #281

Motivation

This issue only occurs on arm64 macOS, and only when SSL is used.

It may be caused by statically linking a cross-compiled OpenSSL.

I noticed that curl can be configured with the --with-secure-transport option to use Apple's SSL/TLS implementation instead of OpenSSL:

Original document:

On modern Apple operating systems, curl can be built to use Apple's SSL/TLS implementation, Secure Transport, instead of OpenSSL. To build with Secure Transport for SSL/TLS, use the configure option --with-secure-transport. (It is not necessary to use the option --without-openssl.)

Modifications

  • Configure curl with the --with-secure-transport option so that it uses Apple's SSL/TLS implementation when building for arm64 macOS.

Verifying this change

  • I will add OAuth2 related unit tests later.

Documentation

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@shibd
Member Author

shibd commented Jan 10, 2023

@merlimat @BewareMyPower I'm not sure if this is a good solution, PTAL.

Comment on lines +170 to +174
if [ $ARCH = 'arm64' ]; then
    SSL_CONF="--with-secure-transport"
else
    SSL_CONF="--without-secure-transport --with-ssl=$PREFIX"
fi
Contributor

I don't understand why Secure Transport is disabled when building curl for x64 architectures.

Member Author

This parameter disables curl's use of Apple's native Secure Transport. Refer here.

I tested it, and the --with-secure-transport parameter doesn't work when compiling for x86_64. So when building for x86_64, the configuration is left as it is (using OpenSSL).
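
The branch in the diff above boils down to a two-way choice. A minimal sketch of that selection logic in JavaScript, for illustration only: `curlSslFlags` is a hypothetical helper name, not part of the build script, and `prefix` stands in for the script's $PREFIX.

```javascript
// Hypothetical helper mirroring the shell branch in pkg/mac/build-cpp-deps-lib.sh.
// It only builds the argument list; it does not run ./configure.
function curlSslFlags(arch, prefix) {
  if (arch === "arm64") {
    // arm64: let curl use Apple's Secure Transport for TLS.
    return ["--with-secure-transport"];
  }
  // Other architectures: keep the existing OpenSSL-based configuration.
  return ["--without-secure-transport", `--with-ssl=${prefix}`];
}

console.log(curlSslFlags("arm64", "/tmp/prefix").join(" "));
console.log(curlSslFlags("x86_64", "/tmp/prefix").join(" "));
```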

@BewareMyPower
Contributor

I think we need to wait for the feedback from @ericallam.

use Apple's SSL/TLS implementation instead of OpenSSL:

That does not make sense to me. Can you confirm it's a bug in OpenSSL? Maybe it requires some certificates, as in this issue: https://stackoverflow.com/questions/65442972/ssl-routinestls-process-server-certificatecertificate-verify-failed

@ericallam

What's the best way for me to test this?

@BewareMyPower
Contributor

@ericallam You can follow the guide here (make sure this PR is included):

pkg/mac/build-cpp-deps-lib.sh
pkg/mac/build-cpp-lib.sh
npm install

Then you can replace any file under the examples/ directory with your own code and run it.

@shibd BTW, I'm not sure whether we can provide the Pulsar.node to test. Though we have the Pulsar.node, there is no document to describe how to use it.

@shibd
Member Author

shibd commented Jan 11, 2023

@shibd BTW, I'm not sure whether we can provide the Pulsar.node to test. Though we have the Pulsar.node, there is no document to describe how to use it.

I don't think we need to expose Pulsar.node to users.

If the user builds from source, it can be used after a global installation:

npm install -g

@ericallam

Okay, I'm working on testing this now. I started by building from source using the steps in the README on the master branch, without the fix, to make sure I could reproduce the issue. Here's the example I created to run the test (with some sensitive details redacted):

const Pulsar = require("../");

(async () => {
  Pulsar.Client.setLogHandler((level, file, line, message) => {
    console.log("[%s][%s:%d] %s", level, file, line, message);
  });

  const auth = new Pulsar.AuthenticationOauth2({
    type: "sn_service_account",
    client_id: "...",
    client_secret: "...",
    issuer_url: "https://auth.streamnative.cloud/",
    audience: "...",
  });

  // Create a client
  const client = new Pulsar.Client({
    serviceUrl: "pulsar+ssl://<cluster>.<orgId>.snio.cloud:6651",
    authentication: auth,
  });

  // Create a consumer
  const consumer = await client.subscribe({
    topic: "persistent://public/default/my-topic",
    subscription: "sub1",
    subscriptionType: "Shared",
  });

  const producer = await client.createProducer({
    topic: "persistent://public/default/my-topic",
    sendTimeoutMs: 30000,
    batchingEnabled: true,
  });

  for (let i = 0; i < 10; i += 1) {
    const msg = `my-message-${i}`;
    producer.send({
      data: Buffer.from(msg),
    });
    console.log(`Sent message: ${msg}`);
  }
  await producer.flush();

  // Receive messages
  for (let i = 0; i < 10; i += 1) {
    const msg = await consumer.receive();
    console.log(msg.getData().toString());
    consumer.acknowledge(msg);
  }

  await producer.close();
  await consumer.close();
  await client.close();
})();

And here's the result of running that:

pulsar-client-node $ node examples/oauth.js 
[1][Client:87] Subscribing on Topic :persistent://public/default/my-topic
[1][ClientConnection:189] [<none> -> pulsar+ssl://<cluster>.<orgId>.snio.cloud:6651] Create ClientConnection, timeout=10000
[1][ConnectionPool:97] Created connection for pulsar+ssl://<cluster>.<orgId>.snio.cloud:6651
[1][ClientConnection:379] [192.168.1.8:61122 -> 184.72.154.254:6651] Connected to broker
[3][ClientConnection:472] [192.168.1.8:61122 -> 184.72.154.254:6651] Handshake failed: certificate verify failed (SSL routines, tls_process_server_certificate)
[1][ClientConnection:1584] [192.168.1.8:61122 -> 184.72.154.254:6651] Connection closed with ConnectError
[3][ClientImpl:407] Error Checking/Getting Partition Metadata while Subscribing on persistent://public/default/my-topic -- ConnectError
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

[Error: Failed to create consumer: ConnectError]
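
The numeric prefixes in this output are the log levels passed to the handler. A minimal sketch of a formatter that prints names instead of numbers. Note the assumptions: the mapping 0=DEBUG, 1=INFO, 2=WARN, 3=ERROR is inferred from the lines above (1 on informational lines, 3 on failures) and is not confirmed in this thread, and `formatLog` is a hypothetical helper name.

```javascript
// Assumed mapping of the numeric levels passed to the log handler.
const LEVEL_NAMES = ["DEBUG", "INFO", "WARN", "ERROR"];

// Hypothetical formatter with the same (level, file, line, message) signature
// as the handler passed to Pulsar.Client.setLogHandler in the example above.
function formatLog(level, file, line, message) {
  // Fall back to the raw number for any level outside the assumed mapping.
  const name = LEVEL_NAMES[level] ?? String(level);
  return `[${name}][${file}:${line}] ${message}`;
}

console.log(formatLog(3, "ClientConnection", 472, "Handshake failed"));
// prints "[ERROR][ClientConnection:472] Handshake failed"
```

It could be wired in with `Pulsar.Client.setLogHandler((...args) => console.log(formatLog(...args)))`.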

Now after integrating the changes from the PR, I did the following:

$ rm -rf ./pkg/mac/build
$ ./pkg/mac/build-cpp-deps-lib.sh
$ ./pkg/mac/build-cpp-lib.sh
$ npm install

Then I tried running an example and I'm getting an error:

node examples/producer
node:internal/modules/cjs/loader:1243
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^

Error: dlopen(/Users/eric/code/OpenSource/pulsar-client-node/lib/binding/Pulsar.node, 0x0001): symbol not found in flat namespace (_kSecAttrLabel)
    at Module._extensions..node (node:internal/modules/cjs/loader:1243:18)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/Users/eric/code/OpenSource/pulsar-client-node/src/pulsar-binding.js:24:17)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12) {
  code: 'ERR_DLOPEN_FAILED'
}

@ericallam

I don't think this is a macOS issue. I tried running 1.8.0 in a Docker container (built and run on Linux) and I'm getting a similar error:

[1][ClientConnection:189] [<none> -> pulsar+ssl://<cluster>.<orgId>.snio.cloud:6651] Create ClientConnection, timeout=10000
[3][AuthOauth2:224] Response failed for getting the well-known configuration https://auth.streamnative.cloud/. Error Code 77: error setting certificate verify locations:  CAfile: /etc/ssl/certs/ca-certificates.crt CApath: none
[1][ConnectionPool:97] Created connection for pulsar+ssl://<cluster>.<orgId>.snio.cloud:6651
[1][ClientConnection:379] [172.17.0.2:48284 -> 184.72.154.254:6651] Connected to broker
[3][ClientConnection:472] [172.17.0.2:48284 -> 184.72.154.254:6651] Handshake failed: certificate verify failed (SSL routines)
[1][ClientConnection:1584] [172.17.0.2:48284 -> 184.72.154.254:6651] Connection closed with ConnectError
[3][ClientImpl:184] Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/inside-docker -- ConnectError
[1][ClientConnection:267] [172.17.0.2:48284 -> 184.72.154.254:6651] Destroyed connection
[Error: Failed to create producer: ConnectError]

Am I supposed to be setting an option that I am not? Any of these:

[Screenshot: CleanShot 2023-01-11 at 14 18 40]

@shibd
Member Author

shibd commented Jan 11, 2023

Hi, @ericallam Thanks for your feedback.

I just want to confirm: did the same program run normally in the same environment before v1.7.0?

I haven't tested with a certified domain yet, but I can run it by manually importing the certificate. See this repo: https://github.com/shibd/pulsar-node-test

Am I supposed to be setting an option that I am not? Any of these:

If you can download the certificate, you can try pointing to it:

  const client = new Pulsar.Client({
    serviceUrl: 'pulsar+ssl://localhost:6651',
    authentication: auth,
    operationTimeoutSeconds: 30,
    tlsTrustCertsFilePath: './run-pulsar/cacert.pem',
    useTls: true,
    tlsValidateHostname: false,
    tlsAllowInsecureConnection: false,
  });
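
A minimal sketch of how those options could be assembled while failing early if the CA file is missing. `buildTlsClientConfig` is a hypothetical helper name; the option names are the ones used in the snippet above, and the returned object is meant to be passed to `new Pulsar.Client(...)`.

```javascript
const fs = require("node:fs");

// Hypothetical helper: builds the options object for `new Pulsar.Client(...)`.
// Option names are taken from the snippet above; nothing here talks to a broker.
function buildTlsClientConfig(serviceUrl, authentication, caCertPath) {
  if (!fs.existsSync(caCertPath)) {
    // Fail early with a readable message instead of a handshake error later.
    throw new Error(`CA certificate file not found: ${caCertPath}`);
  }
  return {
    serviceUrl,
    authentication,
    operationTimeoutSeconds: 30,
    tlsTrustCertsFilePath: caCertPath,
    useTls: true,
    tlsValidateHostname: false,
    tlsAllowInsecureConnection: false,
  };
}
```

For example: `new Pulsar.Client(buildTlsClientConfig('pulsar+ssl://localhost:6651', auth, './run-pulsar/cacert.pem'))`.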

I still suspect that statically linking OpenSSL is the cause. I will try linking against the dynamic libraries instead.

@ericallam

ericallam commented Jan 11, 2023

I have 1.7.0 working just fine locally, but there it connects to a local Docker container and I don't use any authentication. The problem only started once I tried to add authentication (because I'm trying to connect to a StreamNative cloud-hosted instance). I'm now also trying to deploy it to production, and this is where I'm running into issues. The StreamNative Node.js docs suggest passing the tlsAllowInsecureConnection: true option when connecting to their cloud (https://docs.streamnative.io/cloud/stable/connect/client/connect-nodejs), but I'm still getting a segfault when I do that:

8:25:40 PM: Connecting to pulsar instance at pulsar+ssl://cluster.org.snio.cloud:6651...
8:25:40 PM: 📡 Connected to pulsar at pulsar+ssl://cluster.org.snio.cloud:6651
8:25:40 PM: [20:25:40.931] [trigger.dev publisher]  Initializing publisher with config {"topic":"persistent://triggerdotdev/workflows/run-command-responses"}
8:25:40 PM: PID 55 received SIGSEGV for address: 0x0
8:25:40 PM: /app/node_modules/.pnpm/segfault-handler@1.3.0/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x372d)[0x7f06215af72d]
8:25:40 PM: /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f0628a4b520]
8:25:40 PM: /lib/libpulsar.so.2.10.3(SSL_get_peer_certificate+0x12)[0x7f061b8311e2]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x6137af)[0x7f061b8137af]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x61545b)[0x7f061b81545b]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x5e96de)[0x7f061b7e96de]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x5eebe2)[0x7f061b7eebe2]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x5d8c8e)[0x7f061b7d8c8e]
8:25:40 PM: /lib/libpulsar.so.2.10.3(curl_multi_perform+0x93)[0x7f061b7d9c93]
8:25:40 PM: /lib/libpulsar.so.2.10.3(curl_easy_perform+0x107)[0x7f061b7d3dd7]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x45e97f)[0x7f061b65e97f]
8:25:40 PM: /lib/x86_64-linux-gnu/libc.so.6(+0x99f68)[0x7f0628aa2f68]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x45f24e)[0x7f061b65f24e]
8:25:40 PM: /lib/libpulsar.so.2.10.3(_ZN6pulsar10AuthOauth211getAuthDataERSt10shared_ptrINS_26AuthenticationDataProviderEE+0x33)[0x7f061b65d423]
8:25:40 PM: /lib/libpulsar.so.2.10.3(_ZN6pulsar16ClientConnectionC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_St10shared_ptrINS_15ExecutorServiceEERKNS_19ClientConfigurationERKS9_INS_14AuthenticationEE+0x11c7)[0x7f061b54c9d7]
8:25:40 PM: /lib/libpulsar.so.2.10.3(_ZN6pulsar14ConnectionPool18getConnectionAsyncERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_+0x94b)[0x7f061b59a24b]
8:25:40 PM: /lib/libpulsar.so.2.10.3(_ZN6pulsar24BinaryProtoLookupService25getPartitionMetadataAsyncERKSt10shared_ptrINS_9TopicNameEE+0x11e)[0x7f061b52f2fe]
8:25:40 PM: /lib/libpulsar.so.2.10.3(+0x378068)[0x7f061b578068]
8:25:40 PM: /lib/libpulsar.so.2.10.3(_ZN6pulsar6Client19createProducerAsyncERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_21ProducerConfigurationESt8functionIFvNS_6ResultENS_8ProducerEEE+0x45)[0x7f061b5380d5]
8:25:40 PM: /lib/libpulsar.so.2.10.3(pulsar_client_create_producer_async+0xcf)[0x7f061b65154f]
8:25:40 PM: /app/node_modules/.pnpm/pulsar-client@1.7.0/node_modules/pulsar-client/build/Release/Pulsar.node(_ZN8Producer11NewInstanceERKN4Napi12CallbackInfoESt10shared_ptrI14_pulsar_clientE+0x3c7)[0x7f062132a647]
8:25:40 PM: /app/node_modules/.pnpm/pulsar-client@1.7.0/node_modules/pulsar-client/build/Release/Pulsar.node(_ZN6Client14CreateProducerERKN4Napi12CallbackInfoE+0x50)[0x7f0621323090]
8:25:40 PM: /app/node_modules/.pnpm/pulsar-client@1.7.0/node_modules/pulsar-client/build/Release/Pulsar.node(_ZN4Napi12InstanceWrapI6ClientE29InstanceMethodCallbackWrapperEP10napi_env__P20napi_callback_info__+0x133)[0x7f0621325fe3]
8:25:40 PM: node[0xb1499d]
8:25:40 PM: node[0xda5fa0]
8:25:40 PM: node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xaf)[0xda74df]
8:25:40 PM: node[0x16e9af9]
8:25:41 PM: undefined
8:25:41 PM: /app/apps/webapp:
8:25:41 PM:  ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL  webapp@1.0.0 start: `cross-env NODE_ENV=production node --max-old-space-size=8192 ./build/server.js`
8:25:41 PM: Exit status 1

@shibd
Member Author

shibd commented Jan 12, 2023

Hi, @ericallam Thanks for your feedback.

#282 (comment)

I also tested against the cloud environment, and after building from source following the README steps, I got the same error as you.

I will prioritize trying to fix this issue.

@ericallam

I was able to connect to StreamNative cloud with OAuth2 credentials on 1.7.0 from macOS, using the following steps to install the pulsar-client: https://github.com/triggerdotdev/trigger.dev/blob/main/DEVELOPMENT.md#pulsar-requirements

@ericallam

ericallam commented Jan 12, 2023

This issue only happens on Node.js 18.0+ (at least in my tests) and does not occur when we downgrade to Node.js 16.19.0 with version 1.7.0.
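
When comparing runs across Node.js versions, it can help to record the runtime details next to each result. A minimal sketch that uses only fields Node.js exposes on `process`; `runtimeInfo` is a hypothetical helper name.

```javascript
// Hypothetical helper: collects runtime details worth attaching to a bug report.
function runtimeInfo() {
  return {
    node: process.versions.node,       // Node.js release in use
    openssl: process.versions.openssl, // OpenSSL version bundled with this Node.js build
    platform: process.platform,        // 'darwin', 'linux', ...
    arch: process.arch,                // 'arm64', 'x64', ...
  };
}

console.log(runtimeInfo());
```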

@shibd
Member Author

shibd commented Jan 13, 2023

@ericallam Hi,

I released a hotfix version in my repository using the change from this PR, and I can use OAuth2 + SSL normally on my arm64 machine. (Version 13.1)

Could you follow the README.md in this repository to see whether it runs on your host?

@ericallam

I pulled down your repository and followed your directions and I got this error (using Node.js v18.12.1)

node:internal/modules/cjs/loader:1243
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^

Error: dlopen(/Users/eric/code/triggerdotdev/pulsar/pulsar-node-oauth2-ssl-test/node_modules/shibaodi-pulsar-client/lib/binding/Pulsar.node, 0x0001): symbol not found in flat namespace (_kSecAttrLabel)
    at Module._extensions..node (node:internal/modules/cjs/loader:1243:18)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12)
    at Module.require (node:internal/modules/cjs/loader:1061:19)
    at require (node:internal/modules/cjs/helpers:103:18)
    at Object.<anonymous> (/Users/eric/code/triggerdotdev/pulsar/pulsar-node-oauth2-ssl-test/node_modules/shibaodi-pulsar-client/src/pulsar-binding.js:24:17)
    at Module._compile (node:internal/modules/cjs/loader:1159:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1213:10)
    at Module.load (node:internal/modules/cjs/loader:1037:32)
    at Module._load (node:internal/modules/cjs/loader:878:12) {
  code: 'ERR_DLOPEN_FAILED'
}

I tried using Node.js v16.19.0 as well but got the same error.

@BewareMyPower
Contributor

@ericallam Could this error be reproduced by running the following code?

const Pulsar = require('./node_modules/shibaodi-pulsar-client/lib/binding/Pulsar.node')

You can change the path in require to the actual path of the Pulsar.node, which could be downloaded from https://github.com/shibd/pulsar-client-node/actions/runs/3904096695 (uncompress macos-18-arm64)
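
A minimal sketch for that check: it tries to load the native addon directly and reports the error code instead of crashing. `tryLoadBinding` is a hypothetical helper name, and the path is an assumption to be replaced with the actual location of Pulsar.node.

```javascript
const path = require("node:path");

// Hypothetical helper: attempts to load a native addon and reports the outcome.
function tryLoadBinding(bindingPath) {
  try {
    require(path.resolve(bindingPath));
    return { ok: true };
  } catch (err) {
    // A dlopen failure carries code 'ERR_DLOPEN_FAILED', as in the traces in this thread;
    // a wrong path shows up as a different code.
    return { ok: false, code: err.code, message: err.message };
  }
}

// Assumed path; adjust it to where Pulsar.node actually lives.
console.log(tryLoadBinding("./lib/binding/Pulsar.node"));
```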

@BewareMyPower
Contributor

BewareMyPower commented Jan 16, 2023

From #282 (comment) I see

I tested and the parameter doesn't work if compiled on x86_64.

Then I tried the following patch, which adds the same compile flags as this PR does for arm64 macOS.

diff --git a/pkg/mac/build-cpp-deps-lib.sh b/pkg/mac/build-cpp-deps-lib.sh
index e2078d1..aa4e04d 100755
--- a/pkg/mac/build-cpp-deps-lib.sh
+++ b/pkg/mac/build-cpp-deps-lib.sh
@@ -167,12 +167,12 @@ if [ ! -f curl-${CURL_VERSION}.done ]; then
     tar xfz curl-${CURL_VERSION}.tar.gz
     pushd curl-${CURL_VERSION}
       CFLAGS="-fPIC -arch ${ARCH} -mmacosx-version-min=${MACOSX_DEPLOYMENT_TARGET}" \
-            ./configure --with-ssl=$PREFIX \
+            ./configure \
               --without-nghttp2 \
               --without-libidn2 \
               --disable-ldap \
               --without-brotli \
-              --without-secure-transport \
+              --with-secure-transport \
               --disable-ipv6 \
               --prefix=$PREFIX \
               --host=$ARCH-apple-darwin

Then, build from source.

pkg/mac/build-cpp-deps-lib.sh
pkg/mac/build-cpp-lib.sh
npm install

After that, I successfully reproduced the same issue as in #282 (comment):

% node examples/producer
node:internal/modules/cjs/loader:1210
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^

Error: dlopen(/Users/xuyunze/node-demo/pulsar-client-node/lib/binding/Pulsar.node, 0x0001): symbol not found in flat namespace (_kSecAttrLabel)
    at Object.Module._extensions..node (node:internal/modules/cjs/loader:1210:18)
    at Module.load (node:internal/modules/cjs/loader:1004:32)
    at Function.Module._load (node:internal/modules/cjs/loader:839:12)
    at Module.require (node:internal/modules/cjs/loader:1028:19)
    at require (node:internal/modules/cjs/helpers:102:18)
    at Object.<anonymous> (/Users/xuyunze/node-demo/pulsar-client-node/src/pulsar-binding.js:24:17)
    at Module._compile (node:internal/modules/cjs/loader:1126:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1180:10)
    at Module.load (node:internal/modules/cjs/loader:1004:32)
    at Function.Module._load (node:internal/modules/cjs/loader:839:12) {
  code: 'ERR_DLOPEN_FAILED'
}

It's caused by loading the Pulsar.node:

> const Pulsar = require('./lib/binding/Pulsar.node')
Uncaught:
Error: dlopen(/Users/xuyunze/node-demo/pulsar-client-node/lib/binding/Pulsar.node, 0x0001): symbol not found in flat namespace (_kSecAttrLabel)
    at Object.Module._extensions..node (node:internal/modules/cjs/loader:1210:18)
    at Module.load (node:internal/modules/cjs/loader:1004:32)
    at Function.Module._load (node:internal/modules/cjs/loader:839:12)
    at Module.require (node:internal/modules/cjs/loader:1028:19)
    at require (node:internal/modules/cjs/helpers:102:18) {
  code: 'ERR_DLOPEN_FAILED'
}

Contributor

@BewareMyPower BewareMyPower left a comment

I think the root cause of the "symbol not found" error is that libcurl.a is statically linked. Once libcurl.a is compiled with the --with-secure-transport option, it depends on Apple's SSL/TLS implementation, which lives in the Security framework, for example:

/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60158.100.133)

(You can run otool -L libcurl.dylib to see the dependencies.)

If libpulsarwithdeps.a is not linked against the Security framework, the symbol cannot be found. The current build process works because libcurl's TLS functions depend on OpenSSL, which is also statically linked into libpulsarwithdeps.a.

Here is how I solved a similar error when linking statically to libcurl on macOS:

-DCMAKE_CXX_FLAGS="-framework SystemConfiguration -framework CoreFoundation"

The command above links to the SystemConfiguration and CoreFoundation frameworks of macOS.

The dependency tree of current master:

Pulsar.node
  - libpulsarwithdeps.a
    - libcurl.a (TLS related symbols are included from libssl.a)
    - libssl.a
    - ...

After this patch:

Pulsar.node
  - libpulsarwithdeps.a
    - libcurl.a (TLS related symbols are included from Security Framework)
    - libssl.a
    - ...
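
The otool check mentioned above can be scripted. A minimal sketch, assuming the text output of `otool -L` has already been captured into a string; `linksSecurityFramework` is a hypothetical helper name, the first sample line is the one quoted earlier in this comment, and the second is an illustrative non-matching input.

```javascript
// Hypothetical helper: scans `otool -L` output for the Security framework.
function linksSecurityFramework(otoolOutput) {
  return otoolOutput
    .split("\n")
    .map((line) => line.trim())
    .some((line) => line.startsWith("/System/Library/Frameworks/Security.framework/"));
}

const sample =
  "/System/Library/Frameworks/Security.framework/Versions/A/Security " +
  "(compatibility version 1.0.0, current version 60158.100.133)";

console.log(linksSecurityFramework(sample)); // true
console.log(linksSecurityFramework("/usr/lib/libSystem.B.dylib (compatibility version 1.0.0)")); // false
```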

@shibd
Member Author

shibd commented Jan 16, 2023

This approach is not a good idea, so I'll close this PR for now.

@shibd shibd closed this Jan 16, 2023
Development

Successfully merging this pull request may close these issues.

Failed to create producer: ConnectError when upgrading to 1.8.0
3 participants