Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-396 Poll delay when no readers are available #308

Closed
wants to merge 8 commits into from

Conversation

pgoranss
Copy link
Contributor

Adding polling pause to slow relentless polling from Connect when no readers are available.

This should address the CPU spike in
https://issues.jboss.org/browse/DBZ-396

@rhauch
Copy link
Member

rhauch commented Oct 18, 2017

I actually don't think we want this delay on ever call to ChainedReader.poll() when there are no results, because that will happen frequently when MySQL binlog has no updates and a delay is already incorporated in that case.

As mentioned on DBZ-396, I think the better way of handling this is to introduce a new BlockingReader with a poll() method that blocks until stop() is called. So something like:

private final CoundDownLatch latch = new CoundDownLatch(1);
public void poll() throws InterruptedException {
    latch.await();
}
public void stop() {
   latch.countDown();
}

Then, in MySqlConnectorTask.start(...), we'd want to add this reader with something like:

                readers.add(snapshotReader);

                if (taskContext.isInitialSnapshotOnly()) {
                    logger.warn("This connector will only perform a snapshot, and will stop after that completes.");
                    readers.add(new BlockingReader());  // <------ ADD THIS LINE
                    readers.uponCompletion("Connector configured to only perform snapshot, and snapshot completed successfully. Connector will terminate.");
                } else {

This will essentially block poll() until stop is called, which I think is what we want. @gunnarmorling, what do you think?

@pgoranss
Copy link
Contributor Author

pgoranss commented Oct 19, 2017

@rhauch I think the existing poll delay you are referring to (when MySQL binlog has no updates), is implemented in the AbstractReader poll(). It delays in this while loop until records are available, at which point it returns those records up the stack thru ChainedReader etc.

Meanwhile, the ChainedReader poll() implementation has no delay when no readers are available. So I believe adding a delay in here like I did will only be executed when switching between readers, or if none are available.

In any case, need some delay since as you said we can't truly stop the connector. I think the BlockingReader idea is a good approach too and I could PoC that. My questions on that:

  1. Would this always be running and therefore ChainedReader never can be in a stopped state?
  2. Would the BlockingReader delay be hardcoded (5 min)?
  3. Any performance impact adding this new Reader?

@rhauch
Copy link
Member

rhauch commented Oct 19, 2017

  1. Would this always be running and therefore ChainedReader never can be in a stopped state?

After the snapshot completes, the connector would transition to the BlockingReader and would continue forever doing nothing until the connector is stopped. Again, unfortunately Connect only allows a connector to stop by throwing an exception and then considering that connector as "failed", so IMO that's not really an option. The snapshot JMX metrics should be able to tell you the progress of the snapshot and whether it has completed. It's also probably worth adding an INFO level log message to the BlockingReader saying something like: "Connector has completed all of its work but will continue in the running state. It can be shutdown at any time."

  1. Would the BlockingReader delay be hardcoded (5 min)?

If you look at my suggested code above, there actually is no delay. The code uses a latch to block indefinitely until the task is stopped.

  1. Any performance impact adding this new Reader?

Well it certainly would perform far better than the current code, and it's also a bit more efficient than sleeping and periodically waking up. Plus, the sleep approach might cause issues with stopping the connector, or at least would require the stop() method to interrupt the thread. That seems really complicated and brittle, whereas using the latch is super simple.

@pgoranss
Copy link
Contributor Author

pgoranss commented Oct 19, 2017

@rhauch In your example you have the BlockingReader just only added to MySqlConnectorTask for case:
taskContext.isInitialSnapshotOnly()

Do you see a reason to not add it by default, or is it truly specific to this case? i.e. always add it to ChainedReaders rather than only in this case

@gunnarmorling
Copy link
Member

i.e. always add it to ChainedReaders rather than only in this case

We wouldn't want to add it if the binlog reader takes over after the snapshot reader.

+1 on @rhauch's approach overall, it seems very reasonable to me.

@rhauch
Copy link
Member

rhauch commented Oct 19, 2017

I agree with @gunnarmorling: there's no value to add it after the SnapshotReader, since the SnapshotReader never finishes, either. Adding it only will add confusion for developers in the future.

@pgoranss
Copy link
Contributor Author

I took at stab at this and updated the PR, comments welcome

Copy link
Member

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have a few questions below.

@Override
protected void doStop() {
this.completeSuccessfully();
latch.countDown();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest using a try-finally block here, so that the latch is counted down no matter what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All set

latch.await();
return super.poll();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractReader seems to be fairly heavy weight. Did you consider having this just implement Reader, and if so what were the tradeoffs that made you go this route?

(Really, I'm just asking. I haven't thought much about this and am interested in your thoughts.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tried just implementing Reader it ran into some issues. One example: it appeared to Re-Snapshot the DB every time the connector restarted.

So I figured extending AbstractReader was a requirement. Maybe it's not I'm just new to the codebase so didn't dig too deep.

@gunnarmorling
Copy link
Member

gunnarmorling commented Oct 19, 2017 via email

@pgoranss
Copy link
Contributor Author

@gunnarmorling Yes agreed, I will explore this further.

@gunnarmorling
Copy link
Member

Hey @pgoranss, just checking in on this one. Did you have a chance to gain any further insights? Thanks!

@pgoranss
Copy link
Contributor Author

@gunnarmorling Not yet, I've been tied up with some other work, but it's on my list to follow up in the next week.

@gunnarmorling
Copy link
Member

Ok, cool. Thanks for the update!

@pgoranss
Copy link
Contributor Author

pgoranss commented Nov 7, 2017

@gunnarmorling I've updated the BlockingReader to just implement Reader. I did copy the uponCompletion functionality from AbstractReader since I think that was the correct implementation. Let me know your thoughts.

Copy link
Member

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM!

@gunnarmorling gunnarmorling self-assigned this Nov 14, 2017
@gunnarmorling
Copy link
Member

Hi @pgoranss, I've squashed all the commits into one, rebased it and applied to master and 0.6. Also added a commit to get rid of some trailing whitespace. Thanks a lot for this contribution!

@pgoranss pgoranss deleted the DBZ-396 branch November 14, 2017 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants