New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in input event handling leads to NullPointerException #5106

Open
bernd opened this Issue Sep 13, 2018 · 0 comments

Comments

Projects
None yet
2 participants
@bernd
Member

bernd commented Sep 13, 2018

Expected Behavior

It should be possible to create an input and update it right after the creation. (e.g. when triggered from code)

Current Behavior

When a user creates an input and updates it right after the creation, the input doesn't start and throws a NullPointerException.

Input creation and update triggers events which are handled by the InputEventListener.
The event handler for the update event restarts the input by calling .stop() and then .launch().

The problem is this case is, that the input gets restarted while it is still starting up.

I noticed the issue when using a GELF UDP input which uses the UDPTransport class.

The UDPTransport.launch() method is using the bootstrap field several times during startup.

public void launch(final MessageInput input) throws MisfireException {
try {
bootstrap = getBootstrap(input);
final NettyTransportType transportType = nettyTransportConfiguration.getType();
int numChannels = (transportType == NettyTransportType.EPOLL || transportType == NettyTransportType.KQUEUE) ? workerThreads : 1;
for (int i = 0; i < numChannels; i++) {
LOG.debug("Starting channel on {}", socketAddress);
bootstrap.bind(socketAddress)
.addListener(new InputLaunchListener(channels, input, getRecvBufferSize()))
.syncUninterruptibly();
}
} catch (Exception e) {
throw new MisfireException(e);
}
}

In UDPTransport.stop() the bootstrap variable is set to null.

public void stop() {
if (channels != null) {
channels.close().syncUninterruptibly();
}
if (eventLoopGroup != null) {
eventLoopGroup.shutdownGracefully();
}
bootstrap = null;
}

So while the input is still running the code in UDPTransport.launch() the bootstrap field is set to null when UDPTransport.stop() gets triggered and the next time .launch() is using it, a NullPointerException gets thrown.

Example:

2018-09-13 20:32:43,954 INFO : org.graylog2.inputs.InputEventListener - ########## Input created: 5b9aad4b38f80c19f997f71a
2018-09-13 20:32:43,957 INFO : org.graylog2.inputs.transports.UdpTransport - ########## ENTER .launch()
2018-09-13 20:32:43,958 INFO : org.graylog2.inputs.InputEventListener - ########## Input updated: 5b9aad4b38f80c19f997f71a
2018-09-13 20:32:43,958 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/5b9aad4b38f80c19f997f71a] is now STARTING
2018-09-13 20:32:43,959 INFO : org.graylog2.inputs.transports.UdpTransport - ########## ENTER .stop()
2018-09-13 20:32:43,962 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/5b9aad4b38f80c19f997f71a] is now STOPPING
2018-09-13 20:32:43,965 WARN : org.graylog2.inputs.transports.UdpTransport - receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=GELF, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=28164cbe-4ad9-4c9c-a76e-088655aa7889} (channel [id: 0xb4982537, L:/0:0:0:0:0:0:0:0%0:12201]) should be 262144 but is 425984.
2018-09-13 20:32:43,973 INFO : org.graylog2.inputs.transports.UdpTransport - ########## EXIT  .stop()
2018-09-13 20:32:43,975 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/5b9aad4b38f80c19f997f71a] is now STOPPED
2018-09-13 20:32:43,976 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/5b9aad4b38f80c19f997f71a] is now TERMINATED
2018-09-13 20:32:44,066 INFO : org.graylog2.inputs.transports.UdpTransport - ########## EXIT  .launch()
2018-09-13 20:32:44,066 WARN : org.graylog2.inputs.transports.UdpTransport - receiveBufferSize (SO_RCVBUF) for input GELFUDPInput{title=GELF, type=org.graylog2.inputs.gelf.udp.GELFUDPInput, nodeId=28164cbe-4ad9-4c9c-a76e-088655aa7889} (channel [id: 0xeb347039, L:/0:0:0:0:0:0:0:0%0:12201]) should be 262144 but is 425984.
2018-09-13 20:32:44,068 ERROR: org.graylog2.shared.inputs.InputLauncher - The [org.graylog2.inputs.gelf.udp.GELFUDPInput] input with ID <5b9aad4b38f80c19f997f71a> misfired. Reason: null
org.graylog2.plugin.inputs.MisfireException: org.graylog2.plugin.inputs.MisfireException: java.lang.NullPointerException
	at org.graylog2.plugin.inputs.MessageInput.launch(MessageInput.java:158) ~[classes/:?]
	at org.graylog2.shared.inputs.InputLauncher$1.run(InputLauncher.java:84) [classes/:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) [metrics-core-4.0.2.jar:4.0.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) [?:1.8.0_181]
	at java.util.concurrent.FutureTask.run(FutureTask.java) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.graylog2.plugin.inputs.MisfireException: java.lang.NullPointerException
	at org.graylog2.inputs.transports.UdpTransport.launch(UdpTransport.java:120) ~[classes/:?]
	at org.graylog2.plugin.inputs.MessageInput.launch(MessageInput.java:155) ~[classes/:?]
	... 8 more
Caused by: java.lang.NullPointerException
	at org.graylog2.inputs.transports.UdpTransport.launch(UdpTransport.java:115) ~[classes/:?]
	at org.graylog2.plugin.inputs.MessageInput.launch(MessageInput.java:155) ~[classes/:?]
	... 8 more
2018-09-13 20:32:44,072 INFO : org.graylog2.inputs.InputStateListener - Input [GELF UDP/5b9aad4b38f80c19f997f71a] is now FAILED

I added some custom log statements to the code to illustrate the issue.

Possible Solution

We have to make sure to serialize the event handlers for the same input so they cannot run in parallel.

I think we can either add locks to the InputEventHandler or update all inputs so they are thread safe. The issue in this case is, that the UDPTransport is not thread safe. So we have to decide if the expect them to be thread safe or not.

Steps to Reproduce (for bugs)

I noticed the issue while installing a content pack that contains an input with extractors and/or static fields. The content pack installation code first creates an input and then updates it to add extractors and static fields right after that. This seems to be so fast that the race condition gets triggered.

  1. Create content pack with an input that has extractors and/or static fields
  2. Delete the input
  3. Install the content pack with the input
  4. Check the logs, the input will probably not start due to the issue described above

Context

Another problem is, that our input persistence layer doesn't offer an API to atomically create an input with extractors, converters and static fields. (it might be possible by adding the additional fields to a Map<String, Object> on input creation) But even if it would offer that API, the input system is still vulnerable to this race condition.

Your Environment

  • Graylog Version: 3.0.0-alpha.2-SNAPSHOT (rev b8b6a06)

@bernd bernd added bug blocker labels Sep 13, 2018

@bernd bernd added this to the 3.0.0 milestone Sep 13, 2018

@jalogisch jalogisch added the triaged label Sep 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment