Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HystrixDashboard - doesn't seem to display the right TPS #1251

Closed
jmirc opened this issue Jun 18, 2016 · 4 comments
Closed

HystrixDashboard - doesn't seem to display the right TPS #1251

jmirc opened this issue Jun 18, 2016 · 4 comments

Comments

@jmirc
Copy link
Contributor

jmirc commented Jun 18, 2016

I am using the version 1.1.2.RELEASE of turbine stream and I think the dashboard has an issue. Let me know, if I am wrong or not.

Here is an example of the data that I got from the turbine stream

data: {"rollingCountFallbackFailure":0,"rollingCountFallbackSuccess":0,"propertyValue_circuitBreakerRequestVolumeThreshold":"200","propertyValue_circuitBreakerForceOpen":false,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":"10000","latencyTotal_mean":55,"type":"HystrixCommand","rollingCountResponsesFromCache":0,"TypeAndName":"TypeAndName=>HystrixCommand_lcs.aurora-read","rollingCountTimeout":0,"propertyValue_executionIsolationStrategy":"THREAD","instanceId":"lcs:08fe0c944c1c:8080","rollingCountFailure":0,"rollingCountExceptionsThrown":0,"latencyExecute_mean":55,"isCircuitBreakerOpen":false,"errorCount":0,"group":"AuroraLcsCacheDao","rollingCountSemaphoreRejected":0,"latencyTotal":{"0":35,"25":52,"50":69,"75":69,"90":85,"95":87,"99":143,"99.5":143,"100":441},"requestCount":2597,"rollingCountCollapsedRequests":0,"rollingCountShortCircuited":0,"latencyExecute":{"0":35,"25":52,"50":68,"75":69,"90":85,"95":87,"99":144,"99.5":144,"100":441},"propertyValue_circuitBreakerSleepWindowInMilliseconds":"5000","currentConcurrentExecutionCount":1,"propertyValue_executionIsolationSemaphoreMaxConcurrentRequests":"10","errorPercentage":0,"rollingCountThreadPoolRejected":0,"propertyValue_circuitBreakerEnabled":true,"propertyValue_executionIsolationThreadInterruptOnTimeout":true,"propertyValue_requestCacheEnabled":true,"rollingCountFallbackRejection":0,"propertyValue_requestLogEnabled":true,"rollingCountSuccess":2612,"propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests":"50","InstanceKey":"InstanceKey=>lcs:08fe0c944c1c:8080","propertyValue_circuitBreakerErrorThresholdPercentage":"50","propertyValue_circuitBreakerForceClosed":false,"name":"lcs.aurora-read","reportingHosts":17,"propertyValue_executionIsolationThreadPoolKeyOverride":"null","propertyValue_executionIsolationThreadTimeoutInMilliseconds":"10000"}

Here is the extracted interesting data:
propertyValue_metricsRollingStatisticalWindowInMilliseconds: 10000
reportingHosts: 17 -- As it is described this will get summed across instances in a cluster
requestCount: 2597

If I am looking the code in the hystrixCommand.js file I can find the following method

function preProcessData(data) {
...
   // do math
   convertAllAvg(data);
   calcRatePerSecond(data);
}

The convertAllAvg method updates the propertyValue_metricsRollingStatisticalWindowInMilliseconds value in the following way

    // the following will break when it becomes a compound string if the property is dynamically changed
   convertAvg(data, "propertyValue_metricsRollingStatisticalWindowInMilliseconds", false);

....

    function convertAvg(data, key, decimal) {
    if (decimal) {
       data[key] = getInstanceAverage(data[key], data["reportingHosts"], decimal);
    } else {
       data[key] = getInstanceAverage(data[key], data["reportingHosts"], decimal);
    }
   }

After this call, the propertyValue_metricsRollingStatisticalWindowInMilliseconds is set to Math.floor(10000 / 17) = 588

So now, the TPS is starting to be wrong.

function calcRatePerSecond(data) {
  var numberSeconds = data["propertyValue_metricsRollingStatisticalWindowInMilliseconds"] / 1000; (1)
  var totalRequests = data["requestCount"]; (2)
  if (totalRequests < 0) {
    totalRequests = 0;
  }
  data["ratePerSecond"] =  roundNumber(totalRequests / numberSeconds); (3)
  data["ratePerSecondPerHost"] =  roundNumber(totalRequests / numberSeconds / data["reportingHosts"]) (4)
}

(1) numberSeconds = 588 / 1000 = 0.588
(2) totalRequests = 2597
(3) ratePerSecond = 2597 / 0.588 = 4416
(4) ratePerSecondPerHost = 2597 / 0.588 / 17 = 259

It should be

(1) numberSeconds = 10000 / 1000 = 10
(2) totalRequests = 2597
(3) ratePerSecond = 2597 / 10 = 259.7
(4) ratePerSecondPerHost = 2597 / 10000 / 17 = 15.27

In conclusion we should remove the line

 convertAvg(data, "propertyValue_metricsRollingStatisticalWindowInMilliseconds", false);

Let me know if I am wrong.

@mattrjacobs
Copy link
Contributor

Yeah, I definitely agree with your explanation. Thanks for the explanation. I'd be happy to review a PR for a fix here - this looks like a bug to me.

jmirc added a commit to jmirc/Hystrix that referenced this issue Jun 21, 2016
mattrjacobs added a commit that referenced this issue Jun 21, 2016
spencergibb pushed a commit to spring-cloud/spring-cloud-netflix that referenced this issue Jun 21, 2016
spencergibb pushed a commit to spring-cloud/spring-cloud-netflix that referenced this issue Jun 21, 2016
@mattrjacobs
Copy link
Contributor

Closed via #1253

@tjuchniewicz
Copy link

tjuchniewicz commented Sep 21, 2017

What about TPS calculation for thread pool? Now it differs from calculation for commands. Should be the same.

Please also see my comment: spring-cloud/spring-cloud-netflix#1113 (comment). We introduced incompatibility with Turbine.

@cforce
Copy link

cforce commented Mar 1, 2018

Which versions of hystrix and which versions of turbine (as spring cloud release) are not compatible?
What is the way forward, what libs version do i have to get rid of?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants