New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metric "vertx.eventbus.messages.pending" can contain incorrect value. #1480

Closed
sfitts opened this Issue Jun 28, 2016 · 15 comments

Comments

Projects
None yet
3 participants
@sfitts

sfitts commented Jun 28, 2016

Version: 3.3.0

In the case where a handler is removed after a message is sent, but before it is received then the pending count can be incremented, but not decremented. Consider the following sequence:

A message is delivered locally and a handler is found at line 366 in EventBusImpl.java. This increments the pending count metric and calls deliverToHandler. When the runOnContext block is scheduled it is determined that the handler has been removed at line 498 and so the message is never actually processed. As a result the pending count is never decremented making is seem like there are pending messages when in fact there aren't.

This happens most often with the "credit" handler which is installed to receive the credit based flow control messages. It is often the case that the hander gets removed because the producer has been closed, but the consumer on the other side doesn't know this and sends the credit update.

@sfitts

This comment has been minimized.

sfitts commented Jun 28, 2016

This also results in the message contributing to the "delivered" metric, even though it isn't (though since the % of these is low, that's not as problematic).

@vietj

This comment has been minimized.

Contributor

vietj commented Jun 28, 2016

it seems to me that extrapolating a pending count based on these callbacks is not a good idea (although I think it's useful) because a message may not be effectively delivered for the reason you explained but there are other reasons, like a paused MessageConsumer that is unregistered.

@vietj

This comment has been minimized.

Contributor

vietj commented Jun 28, 2016

Perhaps that having a callback to signal that a message has been received for a particular handler would be more useful and provide something equivalent :

void enqueuedMessage(H handler, boolean local);

called in deliverToHandler - this would maintain a pending count per handler, that the implementation could handle as it wants. This way the metrics implementation is aware that a message shall be delivered for a particular handler and if that handler is unregistered it can update anything accordingly.

@vietj

This comment has been minimized.

@sfitts

This comment has been minimized.

sfitts commented Jun 29, 2016

OK, took me a bit to reason out the approach here, but I think it looks good. Having the pending count is definitely useful since it is the only way to know if messages are getting delayed due to lack of scheduling resources. Tracking that on a per-handler basis makes sense (since it gives you a more granular picture of which resources are the issue).

Based on this I assume that you'd mark the pending count for the handler in scheduleMessage and then adjust in either beginHandleMessage or unregisterHandler. You could maintain a global count if you wanted by using state in HandlerMetric. This would also allow for tracking of the time spent waiting to be scheduled.

Thanks for looking at this.

vietj added a commit to vert-x3/vertx-dropwizard-metrics that referenced this issue Jun 30, 2016

Fix the bug with pending count that is not correct when an handler is…
… unsubscribed when the message is scheduled for delivery - see eclipse-vertx/vert.x#1480
@vietj

This comment has been minimized.

Contributor

vietj commented Jun 30, 2016

the schedule message has been implemented and tested for Dropwizard in https://github.com/vert-x3/vertx-dropwizard-metrics/tree/eventbusmetrics-schedule-message

can you have a look and see how it goes for you ?

@vietj vietj referenced this issue Jun 30, 2016

Closed

Vert.x 3.3.1 umbrella #132

21 of 21 tasks complete
@sfitts

This comment has been minimized.

sfitts commented Jul 1, 2016

Will do -- I have a newbie question however -- where are the snapshot releases hosted? I looked in sonatype snapshots and it seems to only have 3.0.0. Also came up empty poking around the wiki, though I could easily have missed it. TIA for the help.

@vietj

This comment has been minimized.

Contributor

vietj commented Jul 1, 2016

    <profile>
      <id>sonatype-repository</id>
      <repositories>
        <repository>
          <id>sonatype-nexus-releases</id>
          <url>https://oss.sonatype.org/content/repositories/releases</url>
          <layout>default</layout>
        </repository>
        <repository>
          <id>sonatype-nexus-snapshots</id>
          <url>https://oss.sonatype.org/content/repositories/snapshots</url>
          <snapshots>
            <enabled>true</enabled>
          </snapshots>
          <layout>default</layout>
          <releases>
            <enabled>false</enabled>
          </releases>
        </repository>
    </repositories>
    </profile>
@vietj

This comment has been minimized.

@sfitts

This comment has been minimized.

sfitts commented Jul 1, 2016

Thanks -- not sure how I missed that.

@vietj

This comment has been minimized.

Contributor

vietj commented Jul 4, 2016

hi any feedback ?

@sfitts

This comment has been minimized.

sfitts commented Jul 5, 2016

Sorry -- I'm obviously missing something basic. I tried this on Thursday, but didn't see the newer code in sonatype, so I figured I'd wait a bit. Then life intruded so I just tried it again now I and still don't see the new code. Here is the URL to the sources JAR that I think is current:

https://oss.sonatype.org/content/repositories/snapshots/io/vertx/vertx-dropwizard-metrics/3.3.1-SNAPSHOT/vertx-dropwizard-metrics-3.3.1-20160704.234042-13-sources.jar

In my gradle build I have the following:

    repositories {
        maven {
            url 'https://oss.sonatype.org/content/repositories/snapshots'
        }
        mavenCentral()
        jcenter()

        // For hadrian analytics
        maven {
            url "http://repository.opendatagroup.com/maven"
        }

And I'm able to reference 3.3.1-SNAPSHOT just fine, but I don't see your new code. Sorry, I know I'm not being much help right now, but I'm not sure what I'm doing wrong.

@vietj

This comment has been minimized.

Contributor

vietj commented Jul 5, 2016

hum actually it's not deployed because it's not merged, I'm afraid you should build both yourself.

@sfitts

This comment has been minimized.

sfitts commented Jul 5, 2016

Got it -- failed to notice it was still in a branch. Unfortunately getting everything built and tested with our server is going to take a bit and I won't have time until after I'm done with some other work stuff. Walked through the code and it looks good, though I did have one quick question.

Once the code is available in a SNAPSHOT release I can easily give it a try and tell you how it behaves in our environment (for what's the worth).

@tsegismont

This comment has been minimized.

Contributor

tsegismont commented Jul 26, 2017

@sfitts the fix has been released now. Please reopen if there's anything outstanding.

@tsegismont tsegismont closed this Jul 26, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment