
ARTEMIS-994 Support Netty Native Epoll on Linux#1093

Closed
michaelandrepearce wants to merge 1 commit into
apache:masterfrom
michaelandrepearce:netty-native-epoll

Conversation

@michaelandrepearce
Contributor

The following changes are made to support Epoll.

  • Refactored SharedNioEventLoopGroup into the renamed, generic SharedEventLoopGroup (so that we can reuse it for both NIO and Epoll)
  • Added support and toggles for Epoll in NettyAcceptor and NettyConnector (falling back to NIO if Epoll cannot be loaded)
  • Removed PartialPooledByteBufAllocator from the code; it caused a bad address error when running native and is no longer needed - see the JIRA discussion

New Connector Properties:

useEpoll - toggles whether to use epoll, default true (but we fall back to NIO gracefully)
epollRemotingThreads = same behaviour as nioRemotingThreads but for Epoll.
useEpollGlobalWorkerPool = same behaviour as useNioGlobalWorkerPool but for Epoll.

New Acceptor Properties:

useEpoll - toggles whether to use epoll, default true (but we fall back to NIO gracefully)
useEpollGlobalWorkerPool = same behaviour as useNioGlobalWorkerPool but for Epoll.
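
The "use epoll if requested, otherwise fall back to NIO" decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `TransportChooser`, the enum `Transport` and the method names are hypothetical. The only Netty call assumed is the static `io.netty.channel.epoll.Epoll.isAvailable()`, which is invoked reflectively here purely so that the sketch still runs when Netty is not on the classpath.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: decides between the epoll and NIO transports. */
final class TransportChooser {

    enum Transport { EPOLL, NIO }

    /**
     * Returns EPOLL only when the caller asked for it and Netty's native
     * epoll transport reports itself as available; otherwise returns NIO.
     */
    static Transport choose(boolean useEpoll) {
        return useEpoll && epollAvailable() ? Transport.EPOLL : Transport.NIO;
    }

    /**
     * Calls io.netty.channel.epoll.Epoll.isAvailable() reflectively so this
     * sketch compiles and runs even without Netty on the classpath.
     */
    static boolean epollAvailable() {
        try {
            Class<?> epoll = Class.forName("io.netty.channel.epoll.Epoll");
            Method isAvailable = epoll.getMethod("isAvailable");
            return (Boolean) isAvailable.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            // Netty or its native library is missing: treat as "not available".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("useEpoll=true  -> " + choose(true));
        System.out.println("useEpoll=false -> " + choose(false));
    }
}
```

With `useEpoll=false` the result is always NIO; with `useEpoll=true` the result depends on the platform and on whether the native library can be loaded, which is the graceful fallback the property description refers to.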

@clebertsuconic
Contributor

I am running the testsuite, and if everything is ok I should merge it tomorrow...

I have 50% of tests running and so far so good.. will know in 2 hours...

BTW: can you run these commands please?

git pull --rebase upstream master
git push origin -f {your branch name}

and then:

git rebase -i upstream/master

And squash these commits into one, with a description that matches it?

I could do this myself, but I would rather have you finding the best commit name once you squash them.

@clebertsuconic
Contributor

Awesome work! Testsuite is good...

@michaelandrepearce
Contributor Author

@clebertsuconic I have just now rebased and squashed, hope this is what you were wanting me to do.

@clebertsuconic
Contributor

@michaelandrepearce I don't think you really need three parameters for this... just one would do...

  • useEpoll...

The other two you introduced could just reuse the existing parameters, since the semantics are the same; only the transport is epoll.

That way you would configure whether to use epoll with a single switch. If you need different settings, then you can just update the values.. it gets easier on users IMHO.

WDYT?

clebertsuconic added a commit to clebertsuconic/artemis that referenced this pull request Mar 15, 2017
@michaelandrepearce
Contributor Author

@clebertsuconic I thought about this, but annoyingly they're named nioBlah, so just reusing them would not directly indicate that you're affecting epoll. Likewise, if we were to strip the nio part they would be generic, but that would break compatibility with existing clients, as they'd have to change their properties if used.

This is why I introduced the extra duplicate props. If you think the existing properties should be renamed to be generic, I can do this, but as noted it wouldn't be backward compatible.

@clebertsuconic
Contributor

@michaelandrepearce I will add a new generic parameter, and deprecate the old one.

Will do that on a separate commit. I just wanted to know if you had any reason.
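
The "new generic parameter, deprecate the old one" approach could look roughly like the sketch below. The property names `remotingThreads` and `nioRemotingThreads` come from the commit message later in this thread; the class `RemotingThreadsConfig`, the method `resolve` and the way the default is passed in are hypothetical and only illustrate the lookup order, not the broker's actual configuration code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver: prefers the generic property, accepts the deprecated one. */
final class RemotingThreadsConfig {

    static final String REMOTING_THREADS = "remotingThreads";
    /** Deprecated name, still honoured so existing configurations keep working. */
    static final String NIO_REMOTING_THREADS = "nioRemotingThreads";

    /**
     * Looks up the generic property first, then the deprecated one,
     * and finally falls back to the supplied default.
     */
    static int resolve(Map<String, String> params, int defaultValue) {
        String value = params.get(REMOTING_THREADS);
        if (value == null) {
            value = params.get(NIO_REMOTING_THREADS);
        }
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put(NIO_REMOTING_THREADS, "4");
        System.out.println(resolve(legacy, -1)); // old name is still honoured
    }
}
```

Checking the new name first means a configuration that sets both properties behaves predictably, while configurations that only use the old name continue to work during the deprecation period.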

@michaelandrepearce
Contributor Author

@clebertsuconic sounds good.

clebertsuconic added a commit to clebertsuconic/artemis that referenced this pull request Mar 15, 2017
@michaelandrepearce
Contributor Author

@clebertsuconic I notice this still didn't get merged. Did I misunderstand, and you expected me to make the configuration changes? If so no worries, just let me know.

@clebertsuconic
Contributor

@michaelandrepearce nope.. I just procrastinated for an afternoon :) give me till tomorrow

@michaelandrepearce
Contributor Author

@clebertsuconic I was thinking this afternoon: has 2.0.0 been cut for tagging? If not, it may be worth trying to get the property change done before that. As 2.0.0 is a major breaking release anyhow, this would avoid the need to deprecate and create a new generic property; we would just need a rename, which would be cleaner.

@clebertsuconic
Contributor

@michaelandrepearce 2.0.0 (Artemis) was already cut 3 days ago.. it is just the vote going on...

So this will be for 2.0.1.

I was thinking about setting the default as NIO at least for a while, and having a --epoll --no-epoll property through the CLI.

clebertsuconic added a commit to clebertsuconic/artemis that referenced this pull request Mar 16, 2017
@clebertsuconic
Contributor

@michaelandrepearce I did some performance tests with https://github.com/ssorj/quiver and the test hung... I will have to take double care before merging this.. it may take some extra time, as I don't want to make the broker unstable now.

I would like to investigate what's happening before merging this.

@michaelandrepearce
Contributor Author

michaelandrepearce commented Mar 17, 2017

@clebertsuconic thanks, I'll try to look into it as well and see if I spot anything.

I assume this is repeatable.
I note the Artemis version is 1.5.4 in that project.
Also, I assume current master doesn't suffer from this if you run it as is?
Lastly, does this also occur in the branch when set to NIO, or only when epoll is on?

(I had a quick go using it out of the box via dnf install on Fedora, but couldn't get it to run against current master (without my changes); I probably need to rebuild with 2.0.0?)

@clebertsuconic
Contributor

@michaelandrepearce I couldn't check other options.. I was using AMQP..

so, I was using this command line on quiver to replicate it:

quiver q0 --sender qpid-messaging-cpp --receiver qpid-messaging-cpp --messages 1m --body-size 100 --credit 1000 --timeout 10

You may need to use a few snapshots for quiver, but most stuff is available on Fedora. Look at the list of dependencies if you'd like to try it.

@michaelandrepearce
Contributor Author

michaelandrepearce commented Mar 18, 2017

Hi @clebertsuconic, I managed to get it running.

Some notes:
I was not able to run your command as is using the version that comes via the dnf repo; it seems some options are not supported or something :S

I got the below output.

$ quiver q0 --sender qpid-messaging-cpp --receiver qpid-messaging-cpp --messages 1m --body-size 100 --credit 1000 --timeout 10
usage: quiver [-h] [-m COUNT] [--impl NAME] [--body-size COUNT]
[--credit COUNT] [--timeout SECONDS] [--output DIRECTORY]
[--init-only] [--quiet] [--verbose]
ADDRESS
quiver: error: unrecognized arguments: --sender qpid-messaging-cpp --receiver qpid-messaging-cpp

I did however successfully run using just:

$ quiver q0 --messages 1m --body-size 100 --credit 1000 --timeout 10

I am using Fedora 25.

It ran fine for me with the server acceptor set to both epoll and NIO.

Attached is the output I got for both runs.

quiver.epoll.txt
quiver.nio.txt

@michaelandrepearce
Contributor Author

The only comment is that with epoll, I noticed the producer was producing faster than the consumer during the test, so some queue depth built up; with NIO, producer and consumer speeds were closer, so less queue depth built up.

The epoll producer rate exceeded the NIO producer rate by approx 200 messages a second, whereas the consumer rates differed by only approx 50.

@franz1981
Contributor

@michaelandrepearce thanks for the results!

clebertsuconic added a commit to clebertsuconic/artemis that referenced this pull request Mar 18, 2017
@clebertsuconic
Contributor

@michaelandrepearce you have to follow all the dependencies on quiver. Probably the cpp module didn't finish compiling during make?

There's something going on.. I will review it this week. If you run the cpp module, which generates a bit more load on AMQP, you will see a few weird errors. It just needs some testing.

@michaelandrepearce
Contributor Author

michaelandrepearce commented Mar 18, 2017

@clebertsuconic I have re-run with a much larger run size, and am now able to reproduce.

So it seems there is a direct memory leak. This occurs with both epoll and NIO set, so I assume it is related to replacing PartialPooledByteBufAllocator with the default Netty allocator rather than to epoll itself. I will dig into this area first.

@michaelandrepearce
Contributor Author

I turned on the Netty leak detector:

22:55:59,492 SEVERE [io.netty.util.ResourceLeakDetector] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 4
#4:
io.netty.buffer.AdvancedLeakAwareByteBuf.readBytes(AdvancedLeakAwareByteBuf.java:504)
io.netty.buffer.WrappedByteBuf.readBytes(WrappedByteBuf.java:661)
org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.inputBuffer(ProtonHandler.java:182)
org.apache.activemq.artemis.protocol.amqp.proton.AMQPConnectionContext.inputBuffer(AMQPConnectionContext.java:110)
org.apache.activemq.artemis.protocol.amqp.broker.ActiveMQProtonRemotingConnection.bufferReceived(ActiveMQProtonRemotingConnection.java:134)
org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:629)
org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.channelRead(ActiveMQChannelHandler.java:68)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:219)
io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:631)
io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:468)
io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:428)
org.apache.activemq.artemis.core.protocol.ProtocolHandler$ProtocolDecoder.decode(ProtocolHandler.java:185)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
org.apache.activemq.artemis.core.protocol.ProtocolHandler$ProtocolDecoder.channelRead(ProtocolHandler.java:128)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:1018)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:299)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
java.lang.Thread.run(Thread.java:745)

@michaelandrepearce
Contributor Author

Also:

#1:
io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:600)
io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:829)
io.netty.buffer.WrappedByteBuf.readBytes(WrappedByteBuf.java:616)
io.netty.buffer.AdvancedLeakAwareByteBuf.readBytes(AdvancedLeakAwareByteBuf.java:469)
io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:217)
io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:631)
io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:468)
io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:428)
org.apache.activemq.artemis.core.protocol.ProtocolHandler$ProtocolDecoder.decode(ProtocolHandler.java:185)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
org.apache.activemq.artemis.core.protocol.ProtocolHandler$ProtocolDecoder.channelRead(ProtocolHandler.java:128)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:1018)
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:299)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
java.lang.Thread.run(Thread.java:745)
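
For readers unfamiliar with the message "ByteBuf.release() was not called", the contract the leak detector checks can be illustrated with a toy model. The class below is not Netty's `ByteBuf` and not the fix applied in Artemis; `ToyRefCountedBuffer` and `readAndRelease` are hypothetical names. It only mirrors the general shape of Netty's reference counting: a buffer starts with a count of 1, `retain()` increments it, `release()` decrements it and returns true once the count reaches zero, and the last party to handle a buffer is expected to release it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy reference-counted buffer; a teaching model, not Netty's ByteBuf. */
final class ToyRefCountedBuffer {

    // A freshly allocated buffer starts with a reference count of 1.
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private final int capacity;

    ToyRefCountedBuffer(int capacity) {
        this.capacity = capacity;
    }

    int refCnt() {
        return refCnt.get();
    }

    int capacity() {
        return capacity;
    }

    /** Adds a reference; fails if the buffer has already been fully released. */
    ToyRefCountedBuffer retain() {
        int prev = refCnt.getAndUpdate(c -> c == 0 ? 0 : c + 1);
        if (prev == 0) {
            throw new IllegalStateException("buffer already released");
        }
        return this;
    }

    /** Drops a reference; returns true when this call released the last one. */
    boolean release() {
        int prev = refCnt.getAndUpdate(c -> c == 0 ? 0 : c - 1);
        if (prev == 0) {
            throw new IllegalStateException("buffer already released");
        }
        return prev == 1;
    }

    /** Handler pattern: whoever consumes the buffer last releases it in finally. */
    static int readAndRelease(ToyRefCountedBuffer buf) {
        try {
            return buf.capacity();
        } finally {
            buf.release();
        }
    }
}
```

A buffer whose count never reaches zero before it becomes unreachable is what a leak detector reports. With a pooled or direct allocator, that memory is not reclaimed the way ordinary heap objects are, which fits the direct memory growth seen in the load test.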

@clebertsuconic
Contributor

@michaelandrepearce that's what I was going after.. some sort of leak while dealing with AMQP.

I will review it this week.

@clebertsuconic
Contributor

I think I got to the bottom of this... there was a leak.. and some proton misuse.

@clebertsuconic
Contributor

@michaelandrepearce did you amend anything on your PR? I can't build it any longer (especially outside of Linux.. like on a Mac).

@michaelandrepearce
Contributor Author

No, I haven't touched it.

@michaelandrepearce
Contributor Author

michaelandrepearce commented Mar 22, 2017

@clebertsuconic I just did a completely fresh clone and rebuild of this PR / my branch, which I rebased the other day. It still built OK for me. Should I rebase again? Maybe some upstream commit in the last few days might have broken something?

@asfbot

asfbot commented Mar 23, 2017

Michael André Pearce on dev@activemq.apache.org replies:
Hi Clebert

Any luck getting it building?

Did you take my branch or did you take the pr and apply to a local fork/branch with other changes? If so can you point me to that so I can see what is up.

Also re the amqp stuff, I see a lot of upstream changes if I take latest now, is that expected to resolve the proton memory issue seen during load test with quiver that was stopping this being merged?

Cheers
Mike

@asfbot

asfbot commented Mar 23, 2017

Clebert Suconic on dev@activemq.apache.org replies:
I can't get it to build on Mac and Windows.

I also need to make some changes, as we talked about. I was using this to test the proton changes. It works and builds on Linux. Need to fix the build on Mac and Windows now.
--
Clebert Suconic

@asfbot

asfbot commented Mar 23, 2017

Michael André Pearce on dev@activemq.apache.org replies:
That's odd re Mac not building for you; my laptop is a MacBook and my primary dev env. As noted, I rebuilt to recheck using a fresh clone and wiped my local maven repo. I also built on Fedora using AWS.

What is the exact failure you're seeing?

@asfbot

asfbot commented Mar 23, 2017

Clebert Suconic on dev@activemq.apache.org replies:
Not finding epoll packages on netty connector.

Now that you mention it, I think I rebuilt netty, and I probably did something wrong.

Will redo in the morning with a fresh Maven. Going to sleep now :)
--
Clebert Suconic

clebertsuconic pushed a commit to clebertsuconic/artemis that referenced this pull request Mar 23, 2017
The following changes are made to support Epoll.

Refactored SharedNioEventLoopGroup into the renamed, generic SharedEventLoopGroup (so that we can reuse it for both NIO and Epoll)

Added support and toggles for Epoll in NettyAcceptor and NettyConnector (falling back to NIO if Epoll cannot be loaded)

Removed PartialPooledByteBufAllocator from the code; it caused a bad address error when running native and is no longer needed - see the JIRA discussion

New Connector Properties:

useEpoll - toggles whether to use epoll, default true (but we fall back to NIO gracefully)
remotingThreads = same behaviour as nioRemotingThreads. The previous property is deprecated.
useGlobalWorkerPool = same behaviour as useNioGlobalWorkerPool. The old property is deprecated.

New Acceptor Properties:

useEpoll - toggles whether to use epoll, default true (but we fall back to NIO gracefully)
useGlobalWorkerPool = same behaviour as useNioGlobalWorkerPool but for Epoll.

This closes apache#1093
@asfgit asfgit closed this in a610748 Mar 23, 2017
@clebertsuconic
Contributor

Please review master?

@michaelandrepearce
Contributor Author

michaelandrepearce commented Mar 24, 2017 via email

clebertsuconic pushed a commit to clebertsuconic/artemis that referenced this pull request Apr 11, 2017
franz1981 pushed a commit to franz1981/activemq-artemis that referenced this pull request May 12, 2017