Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker fails to start with function worker enabled and broker client using TLS #5151

Merged
merged 2 commits into from
Sep 24, 2019

Conversation

cdbartholomew
Copy link
Contributor

Motivation

I have a broker that is configured to run the function worker and for the broker client to use TLS. Here are the settings:

functionsWorkerEnabled=true
brokerClientTlsEnabled=true

After upgrading to 2.4.1, my broker would fail to start. This was working in 2.3.1. The function worker client would fail to connect. The error indicates that it is trying to use TLS on the plaintext port:

12:39:20.691 [main] INFO org.apache.pulsar.functions.worker.WorkerService - Created Pulsar client
12:39:21.057 [pulsar-io-24-1] INFO org.apache.pulsar.broker.service.ServerCnx - New connection from /192.168.34.192:60886
12:39:21.045 [pulsar-client-io-47-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x688be07c, L:/192.168.34.192:60886 - R:192.168.34.192/192.168.34.192:6650]] Connected to server
12:39:21.080 [pulsar-io-24-1] WARN org.apache.pulsar.broker.service.ServerCnx - [/192.168.34.192:60886] Got exception TooLongFrameException : Adjusted frame length exceeds 5253120: 369295620 - discarded
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 5253120: 369295620 - discarded
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:522) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:500) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.exceededFrameLength(LengthFieldBasedFrameDecoder.java:387) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:430) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:433) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:330) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

Modifications

Looking at the startup code when running the function worker with the broker, it is checking for TLS enabled in the function_worker.yml file to determine whether or not to use the TLS port, but when setting TLS enabled on the function worker, it is checking the broker.conf.

Since the function worker is running with the broker, it makes sense to look to the broker.conf as the single source of truth about whether or not to use TLS. I changed the code to check the broker client is configured to use TLS. If it is, then use TLS for the function worker, otherwise use plain text.

The same code exists in the startup path for the normal broker and the standalone version, so I made the change in both places.

Verifying this change

I ran the broker unit tests on this change and they all passed.

I also built a new docker image with this change and started using it in my cluster. The broker now starts successfully.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@merlimat merlimat added this to the 2.4.2 milestone Sep 8, 2019
@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Sep 8, 2019
boolean useTls = workerConfig.isUseTls();
// If the broker client is configured to use TLS, then we
// configure the function worker to use TLS
boolean useTls = brokerConfig.isBrokerClientTlsEnabled();
Copy link
Member

@jiazhai jiazhai Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change workerConfig.isUseTls() will not take effect any more.
This seems not friendly for worker config.

@@ -187,7 +189,7 @@ private static boolean argsContains(String[] args, String arg) {
workerConfig.setZooKeeperSessionTimeoutMillis(brokerConfig.getZooKeeperSessionTimeoutMillis());
workerConfig.setZooKeeperOperationTimeoutSeconds(brokerConfig.getZooKeeperOperationTimeoutSeconds());

workerConfig.setUseTls(brokerConfig.isTlsEnabled());
workerConfig.setUseTls(useTls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changes is right. we should reuse var useTls. As commenets above, value for useTls need more confirm

@jiazhai
Copy link
Member

jiazhai commented Sep 11, 2019

@cdbartholomew Thanks for the fix. This may work for your test, but may not friendly for user setting.
From my view the right logic could like this:

  1. workerConfig.isUseTls is the main config. If user set workerConfig.isUseTls=true, then we should comply with it.
  2. once workerConfig.isUseTls=true, then it is expect that the broker has already has broker.conf configured with tls enabled and "webServicePortTls" with right value,
  3. For the reason analysed in when authentication enabled pulsar-admin topics list will fail for nonpersistent topics #4560, function worker will create internal admin/client inside worker, it should passively config these:
brokerClientTlsEnabled=
brokerClientAuthenticationPlugin=
brokerClientAuthenticationParameters=

As you find above workerConfig.isUseTls is the main config that user wanted. and configs in 2 and 3 above, are some passively configs or configs that should comply with user's requirements.

@tuteng, who is work on fix for #4560.
@jerrypeng, who init this worker config.
@murong00, who also touched this part of code recently.

Copy link
Member

@jiazhai jiazhai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request some change

@cdbartholomew
Copy link
Contributor Author

@jiazhai , thanks for the feedback. I agree with your comment about`workerConfig.isUseTls. Now that I think about it more, with the change I have proposed there is no way to use TLS for connecting to other brokers (especially geo-replicated ones) but to not use TLS for the function worker running locally with the broker. When running locally there is no need to encrypt the traffic, so the overhead may not be wanted.

If I revert the change on line 163, then the user can choose whether to run the local function worker with TLS or not using the worker config. I will make that change and test it out.

@cdbartholomew
Copy link
Contributor Author

I made the changed described above so that the function worker uses the useTls setting in function_worker.yml when running with the broker. I tested with the useTls on and off and it works in both cases.

@sijie
Copy link
Member

sijie commented Sep 15, 2019

@jiazhai please review it.

Copy link
Member

@jiazhai jiazhai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, Thanks @cdbartholomew for the fix.

@jiazhai
Copy link
Member

jiazhai commented Sep 23, 2019

run java8 tests
run integration tests

@merlimat merlimat merged commit 6741bfe into apache:master Sep 24, 2019
wolfstudy pushed a commit that referenced this pull request Nov 20, 2019
…using TLS (#5151)

* Fixing function worker running with broker when using TLS

* Get function worker TLS config from function_worker.yml

(cherry picked from commit 6741bfe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants