
STORM-1151: Batching in DisruptorQueue #765

Merged 7 commits into apache:master on Oct 30, 2015

Conversation

revans2
Contributor

@revans2 revans2 commented Sep 28, 2015

I wanted to come up with an alternative to #694 that does batching at the queue level instead of at the spout level. This is the result of that work, although it still needs some testing/cleanup before I consider it fully ready. It is based off of #750, using the latest version of disruptor, and has a few bug fixes for automatic back-pressure that I need to split out and include as a separate pull request.

The work here is driven by results from some micro benchmarks I wrote to look at where the bottlenecks are in storm and what its theoretical limit is, in this case for a simple word count topology as represented by Q_SWC_X_04. With the same batching, the throughput went from 300,000 sentences/second processed to around 1.5 million, as run on my MacBook Pro laptop.

https://github.com/revans2/storm-micro-perf

The reason for queue-level batching is that it gives finer-grained control over latency, which I would like to use for automatic tuning in the future. It is also a smaller code change. If others disagree I am okay with #694, but I do want to do a head-to-head comparison between the two approaches.
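As a rough sketch of the idea (hypothetical names, not the actual DisruptorQueue code), queue-level batching means the producer side accumulates tuples and publishes them to the ring buffer a batch at a time instead of making one publish call per tuple:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: class and method names are made up for this sketch.
class BatchingPublisher<T> {
    private final int batchSize;
    private final List<T> pending;

    BatchingPublisher(int batchSize) {
        this.batchSize = batchSize;
        this.pending = new ArrayList<>(batchSize);
    }

    // Called on the producer thread for every tuple.
    synchronized void publish(T tuple) {
        pending.add(tuple);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Also called periodically by a flusher so low-rate queues are not starved.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        publishBatchToRingBuffer(new ArrayList<>(pending)); // one ring-buffer publish per batch
        pending.clear();
    }

    private void publishBatchToRingBuffer(List<T> batch) {
        // In the real queue this would claim slots on the LMAX Disruptor ring buffer and
        // publish the whole batch at once, amortizing the per-publish synchronization cost.
    }
}
```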

@harshach
Contributor

@revans2 this looks great. I am in favor of batching at the disruptor level. So this means we don't have any issues with acking?

@revans2
Contributor Author

revans2 commented Sep 28, 2015

I am just rebasing the code now so you can test that out yourself. This code has no issues with acking, but there are a few real issues.

First, the latency on low-throughput queues is much higher, because a partially filled batch has to wait for the batch timeout. That timeout is set to 1 ms by default, so it is not that bad, and in a follow-on JIRA we should be able to dynamically adjust the batch size for each queue on the fly to compensate.

Second, the number of threads used is a lot higher: one more per disruptor queue. I expect to reduce the total number of disruptor queues once I have optimized other parts of the code, but as it stands right now I don't want to do that, because the two-queues-per-bolt/spout design still improves performance in many cases.

Third, in the worst case it is possible to allocate many more objects than previously. It is not actually that many more, since we already allocate a lot of objects, but that needs to be looked at in a separate JIRA at some point.

Also, I don't want to shove this code in without doing a real comparison between the two approaches and their code. This is one way of doing batching, but there are others that may have advantages over it, or may complement this approach as well. I just want storm to eventually be a lot closer to the 1 million sentences/second mark than it is now.
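To make the timeout and extra-thread points above concrete, here is a minimal, hypothetical sketch of a per-queue flusher (again, not the actual code in this patch): a dedicated thread wakes up every flush interval and pushes out any partially filled batch, which bounds the added latency for low-throughput queues at the cost of roughly one extra thread per queue.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: a per-queue flusher thread that periodically flushes partial batches.
class BatchFlusher {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "disruptor-flusher");
            t.setDaemon(true);
            return t;
        });

    // A larger flushIntervalMillis means fewer wakeups (less CPU) but more added latency.
    void start(Runnable flushPartialBatch, long flushIntervalMillis) {
        scheduler.scheduleAtFixedRate(flushPartialBatch, flushIntervalMillis,
                                      flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```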

@mjsax
Member

mjsax commented Sep 28, 2015

Curious to see first performance results :)

@revans2
Contributor Author

revans2 commented Sep 29, 2015

@mjsax The difference is not that huge yet; it is 2x in a number of use cases. But we have a lot of slowness in the metrics processing code, which, at least in the case of word count, dominates the performance, and that is probably why you were not seeing the 6x performance improvement in your pull like you were seeing before. I am going to be doing some follow-on optimization work on that. The big thing right now is that the system CPU utilization went from 15-30%, depending on the OS etc., to 2-6%. This frees up a huge amount of CPU for other processing, so the overall throughput capability of a node increased significantly.

```clojure
;; Diff excerpt from the review context (lines shown in the order rendered by the diff view):
(delete-node cluster-state path)) ;; delete the znode since the worker is not congested
(if on?
(set-ephemeral-node cluster-state path nil acls))))) ;; create the znode since worker is congested
(when (not on?)
```

Since there is one conditional statement to run, is there any special reason for replacing "if" with "when"?

Contributor Author

It is mostly that I put in some debugging to fix an issue and didn't want an (if (do (log-message) ...)), and I ended up not changing it back after removing the debug logging. If you want me to change it back and add the comments back in, I can.


This is OK, never mind.

@zhuoliu

zhuoliu commented Sep 29, 2015

Really exciting work! It is very interesting that we move the overflow buffer inside the disruptor queue and use it for batch publishing. I also look forward to seeing how tuning the disruptor queue size and batch interval will impact the throughput of topologies with different tuple-sending/processing speeds.
One small concern is that this triples the number of threads for each executor (from 1 to 3). If more extensive tests show that it does not cause too much CPU contention, and thus lower throughput or higher resource usage, it should be fine.

@kishorvpatil
Contributor

Ran some sanity tests. LGTM. +1

@revans2
Contributor Author

revans2 commented Oct 10, 2015

I have some new test results. I did a comparison of several different branches. I looked at this branch, the upgraded-disruptor branch #750, STORM-855 #694, and apache-master 0.11.0-SNAPSHOT (04cf3f6). I had to make a few modifications to get my test to work. I applied the following patch https://gist.github.com/revans2/84301ef0fde0dc4fbe44 to each of the branches. For STORM-855 I had to modify the test a bit so it would optionally do batching. In that case batching was enabled on all streams and all spouts and bolts.

I then ran the test at various throughputs (100, 200, 400, 800, 1600, 3200, 6400, 10000, 12800, 25600 sentences/second, plus a few others when looking for the maximum throughput) and with different batch sizes.

Each test ran for 5 minutes. Here are the results, excluding the tests where the worker could not keep up with the rate.

99%-ile ns 99.9%-ile ns throughput branch-batch mean latency ns avg service latency ms std-dev ns
2,613,247 4,673,535 100 STORM-855-0 2,006,347.25 1.26 2,675,778.36
2,617,343 4,423,679 200 STORM-855-0 1,991,238.45 1.29 2,024,687.45
2,623,487 5,619,711 400 STORM-855-0 1,999,926.81 1.24 1,778,335.92
2,627,583 4,603,903 1600 STORM-855-0 1,971,888.24 1.30 893,085.40
2,635,775 8,560,639 800 STORM-855-0 2,010,286.65 1.35 2,134,795.12
2,654,207 302,252,031 3200 STORM-855-0 2,942,360.75 2.13 16,676,136.60
2,684,927 124,190,719 3200 batch-v2-1 2,154,234.45 1.41 6,219,057.66
2,701,311 349,700,095 5000 batch-v2-1 2,921,661.67 1.78 18,274,805.30
2,715,647 7,356,415 100 storm-base-1 2,092,991.53 1.30 2,447,956.21
2,723,839 4,587,519 400 storm-base-1 2,082,835.21 1.31 1,978,424.49
2,723,839 6,049,791 100 dist-upgraade-1 2,091,407.68 1.31 2,222,977.89
2,725,887 10,403,839 1600 batch-v2-1 2,010,694.30 1.27 2,095,223.90
2,725,887 4,607,999 200 storm-base-1 2,074,784.50 1.30 1,951,564.93
2,727,935 4,513,791 200 dist-upgraade-1 2,082,025.31 1.33 2,057,591.08
2,729,983 4,182,015 400 dist-upgraade-1 2,056,282.29 1.43 862,428.67
2,732,031 4,632,575 800 storm-base-1 2,092,514.39 1.27 2,231,550.66
2,734,079 4,472,831 800 dist-upgraade-1 2,095,994.08 1.28 1,870,953.62
2,740,223 4,192,255 200 batch-v2-1 2,011,025.19 1.21 911,556.19
2,742,271 4,726,783 1600 storm-base-1 2,089,581.40 1.35 2,410,668.79
2,748,415 4,444,159 400 batch-v2-1 2,055,600.78 1.34 1,729,257.92
2,748,415 4,575,231 100 batch-v2-1 2,035,920.21 1.31 1,213,874.52
2,754,559 16,875,519 1600 dist-upgraade-1 2,098,441.13 1.35 2,279,870.41
2,754,559 3,969,023 800 batch-v2-1 2,026,222.88 1.29 767,491.71
2,793,471 53,477,375 3200 storm-base-1 2,147,360.05 1.42 3,668,366.37
2,801,663 147,062,783 3200 dist-upgraade-1 2,358,863.31 1.59 7,574,577.81
13,344,767 180,879,359 6400 batch-v2-100 11,319,553.69 10.62 7,777,381.54
13,369,343 15,122,431 3200 batch-v2-100 10,699,832.23 10.02 1,623,949.38
13,418,495 15,392,767 800 batch-v2-100 10,589,813.17 9.86 2,439,134.80
13,426,687 14,680,063 400 batch-v2-100 10,738,973.68 10.03 2,298,229.99
13,484,031 14,368,767 200 batch-v2-100 10,941,653.28 10.20 2,471,899.43
13,508,607 14,262,271 100 batch-v2-100 11,099,257.68 10.35 1,658,054.66
13,524,991 14,376,959 1600 batch-v2-100 10,723,471.83 10.00 1,477,621.07
346,554,367 977,272,831 12800 batch-v2-100 18,596,303.93 15.59 78,326,501.83
710,934,527 827,326,463 4000 STORM-855-100 351,305,653.90 339.28 141,283,307.30
783,286,271 1,268,776,959 5000 STORM-855-100 332,417,358.65 312.07 139,760,316.82
888,668,159 1,022,361,599 3200 STORM-855-100 445,646,342.60 431.55 179,065,279.65
940,048,383 1,363,148,799 6400 storm-base-1 20,225,300.17 17.17 134,848,974.52
1,043,333,119 1,409,286,143 10000 batch-v2-1 22,750,840.18 6.13 146,235,076.73
1,209,008,127 1,786,773,503 6400 dist-upgraade-1 28,588,397.01 24.70 181,801,409.69
1,747,976,191 1,946,157,055 1600 STORM-855-100 738,741,774.85 734.75 374,194,675.56
2,642,411,519 3,124,756,479 20000 batch-v2-100 133,706,248.88 51.67 497,027,226.45
3,374,317,567 3,892,314,111 10000 dist-upgraade-1 141,866,760.39 69.39 589,014,777.73
3,447,717,887 3,869,245,439 10000 storm-base-1 139,149,514.03 56.45 609,509,456.98
3,456,106,495 3,953,131,519 22000 batch-v2-100 274,785,584.11 93.37 743,434,065.83
3,512,729,599 3,898,605,567 800 STORM-855-100 1,354,193,514.47 1,361.58 779,667,263.64
3,963,617,279 4,416,602,111 5500 STORM-855-100 450,364,286.22 415.96 575,017,536.40
4,185,915,391 5,347,737,599 4500 STORM-855-0 366,268,233.66 259.94 995,928,429.75
4,919,918,591 5,582,618,623 6000 STORM-855-100 534,520,242.96 497.47 758,754,139.61
7,071,596,543 7,843,348,479 400 STORM-855-100 2,652,137,010.52 2,630.51 1,589,666,333.78
14,159,970,303 15,653,142,527 200 STORM-855-100 5,202,877,719.25 5,206.33 3,199,275,795.66
27,648,851,967 31,205,621,759 100 STORM-855-100 10,201,124,134.76 10,169.37 6,289,786,882.10

I then filtered the list to show the maximum throughput for a given latency (several different ones)

99th percentile:

99%-ile ns 99.9%-ile ns throughput branch-batch mean latency ns avg service latency ms std-dev ns
2,613,247 4,673,535 100 STORM-855-0 2,006,347.25 1.26 2,675,778.36
2,617,343 4,423,679 200 STORM-855-0 1,991,238.45 1.29 2,024,687.45
2,623,487 5,619,711 400 STORM-855-0 1,999,926.81 1.24 1,778,335.92
2,627,583 4,603,903 1600 STORM-855-0 1,971,888.24 1.30 893,085.40
2,654,207 302,252,031 3200 STORM-855-0 2,942,360.75 2.13 16,676,136.60
2,701,311 349,700,095 5000 batch-v2-1 2,921,661.67 1.78 18,274,805.30
13,344,767 180,879,359 6400 batch-v2-100 11,319,553.69 10.62 7,777,381.54
346,554,367 977,272,831 12800 batch-v2-100 18,596,303.93 15.59 78,326,501.83
2,642,411,519 3,124,756,479 20000 batch-v2-100 133,706,248.88 51.67 497,027,226.45
3,456,106,495 3,953,131,519 22000 batch-v2-100 274,785,584.11 93.37 743,434,065.83

99.9th percentile:

99%-ile ns 99.9%-ile ns throughput branch-batch mean latency ns avg service latency ms std-dev ns
2,754,559 3,969,023 800 batch-v2-1 2,026,222.88 1.29 767,491.71
2,627,583 4,603,903 1600 STORM-855-0 1,971,888.24 1.30 893,085.40
13,369,343 15,122,431 3200 batch-v2-100 10,699,832.23 10.02 1,623,949.38
13,344,767 180,879,359 6400 batch-v2-100 11,319,553.69 10.62 7,777,381.54
346,554,367 977,272,831 12800 batch-v2-100 18,596,303.93 15.59 78,326,501.83
2,642,411,519 3,124,756,479 20000 batch-v2-100 133,706,248.88 51.67 497,027,226.45
3,456,106,495 3,953,131,519 22000 batch-v2-100 274,785,584.11 93.37 743,434,065.83

mean latency:

99%-ile ns 99.9%-ile ns throughput branch-batch mean latency ns avg service latency ms std-dev ns
2,627,583 4,603,903 1600 STORM-855-0 1,971,888.24 1.30 893,085.40
2,793,471 53,477,375 3200 storm-base-1 2,147,360.05 1.42 3,668,366.37
2,701,311 349,700,095 5000 batch-v2-1 2,921,661.67 1.78 18,274,805.30
13,344,767 180,879,359 6400 batch-v2-100 11,319,553.69 10.62 7,777,381.54
346,554,367 977,272,831 12800 batch-v2-100 18,596,303.93 15.59 78,326,501.83
2,642,411,519 3,124,756,479 20000 batch-v2-100 133,706,248.88 51.67 497,027,226.45
3,456,106,495 3,953,131,519 22000 batch-v2-100 274,785,584.11 93.37 743,434,065.83

service latency ms (storm's complete latency):

99%-ile ns 99.9%-ile ns throughput branch-batch mean latency ns avg service latency ms std-dev ns
2,740,223 4,192,255 200 batch-v2-1 2,011,025.19 1.21 911,556.19
2,623,487 5,619,711 400 STORM-855-0 1,999,926.81 1.24 1,778,335.92
2,725,887 10,403,839 1600 batch-v2-1 2,010,694.30 1.27 2,095,223.90
2,684,927 124,190,719 3200 batch-v2-1 2,154,234.45 1.41 6,219,057.66
2,701,311 349,700,095 5000 batch-v2-1 2,921,661.67 1.78 18,274,805.30
1,043,333,119 1,409,286,143 10000 batch-v2-1 22,750,840.18 6.13 146,235,076.73
346,554,367 977,272,831 12800 batch-v2-100 18,596,303.93 15.59 78,326,501.83
2,642,411,519 3,124,756,479 20000 batch-v2-100 133,706,248.88 51.67 497,027,226.45
3,456,106,495 3,953,131,519 22000 batch-v2-100 274,785,584.11 93.37 743,434,065.83

I also looked at approximately the maximum throughput each branch could handle.

branch-batch throughput mean latency ns 99%-ile latency ns
STORM-855-0 4,500 366,268,233.66 4,185,915,391
STORM-855-100 5,500 450,364,286.22 3,963,617,279
storm-base-1 10,000 139,149,514.03 3,447,717,887
dist-upgrade-1 10,000 141,866,760.39 3,374,317,567
batch-v2-1 10,000 22,750,840.18 1,043,333,119
batch-v2-100 22,000 274,785,584.11 3,456,106,495

I really would like some feedback here, because these numbers seem to contradict the STORM-855 results from my original speed-of-light test. I don't really like that test, even though I wrote it, because the throughput is limited only by storm: with acking disabled it measures the latency when we hit the wall and cannot provide any more throughput, and no one should run in production that way. When acking is enabled and we are using max spout pending for flow control, the throughput is directly related to the end-to-end latency. That shouldn't be the common case in production either, because it means we cannot keep up with the incoming rate and are falling behind.

This seems to indicate that the only time STORM-855 makes sense is when looking at the 99%-ile latency at a very low throughput, and even then it only gives about a 1/20th of a ms advantage over the others. In other cases it looks like the throughput per host it can support is about half of that without the change. This branch, however, has a weakness on the low end: when batching is enabled it is about 12 ms slower. But on the high end it can handle more than 2x the throughput with little change to the latency. If that 12 ms is important, I think we can mitigate it by allowing the batch size to self-adjust on a per-queue basis.

I really would like others to look at my numbers and my test to see if there are issues I am missing, because as I said they seem to contradict the numbers from STORM-855. The only explanation I can think of is that the messaging layer is the bottleneck in the speed-of-light test, which is what it was intended to stress, and STORM-855 is giving a significant batching advantage there. If that is the case then we should look at what STORM-855 is doing around the messaging layer and try to combine it with the batching we are doing here.

@ptgoetz @d2r @rfarivar @mjsax @kishorvpatil @knusbaum please let me know what you think.

@revans2
Contributor Author

revans2 commented Oct 10, 2015

Sorry, I forgot to mention that in one test STORM-855-100 was able to run at a throughput of 6000 sentences/second successfully, but in other runs it failed, so I set its maximum to 5500 even though 6000 is shown in the results.

@revans2
Contributor Author

revans2 commented Oct 14, 2015

I just rebased on the latest fixed disruptor upgrade code, now in pull request #797.

The critical path did not change at all, so the numbers should be similar.

@mjsax
Member

mjsax commented Oct 15, 2015

I had a look at the numbers. It's a lot of stuff and hard to parse for me... I am not sure what you mean by "contradict the numbers"? Can you explain in more detail?

However, I agree with your thought about STORM-855 (#694). It basically reduces the contention on the output queue, because fewer calls are made to it. I profiled Storm once and observed that at high data rates (when we hit the wall) the contention on the output queue is the bottleneck: the writing and reading threads have to lock the queue, and waiting for the lock consumes a fair share of the time. This waiting-for-the-lock time is reduced significantly by batching as STORM-855 does.

@revans2
Contributor Author

revans2 commented Oct 15, 2015

@mjsax

What I saw when testing STORM-855 was that the maximum throughput was cut almost in half, from 10,000 sentences per second to 5,500. But your numbers showed maximum throughput more than doubling, from around 7,960,300 tuples sent in 30 seconds to 16,347,100 in the same time period (no acking), and, with acking, from 1,832,160 in 30 seconds to 2,323,580, an increase of 25%.

To me this feels like a contradiction. The only thing I can think of is that the messaging layer is so scarily slow that cutting the maximum throughput of a worker in half has no impact on overall performance if doing more batching can double the throughput of the messaging layer.

This is likely the case: on the high end, 16,347,100 tuples / 30 seconds / 24 workers is about 22,000 tuples per second per worker, whereas 5,500 sentences per second results in about 181,500 total tuples per second per worker being processed.

I'm just looking for feedback from others on this, but it looks like I need to do a distributed apples to apples comparison as well to see the impact the messaging layer has.

@mjsax
Member

mjsax commented Oct 15, 2015

Thanks for the clarification. :)

@revans2
Contributor Author

revans2 commented Oct 21, 2015

So I have been running a number of tests trying to come to a conclusive decision on how storm should handle batching, and trying to understand the difference between my test results and those from #694. I ran the word count test I wrote as part of #805 on a 35-node storm cluster. This was done against several different storm versions: the baseline in the #805 pull request; this patch + #805 (batch-v2); and #694 + #805 + modifications to use the hybrid approach so that acking and batching work in a multi-process topology (STORM-855). To avoid a hard-to-parse wall of numbers I am just going to include some charts, but if anyone wants to see the raw numbers or reproduce them I am happy to provide data and/or branches.

The numbers below were collected after the topology had been running for at least 200 seconds, to avoid startup issues like JIT warmup. I filtered out any 30-second interval where the measured throughput was not within +/- 10% of the target throughput, on the assumption that if the topology could not keep up with the desired throughput, or was trying to catch up from previous slowness, it would not be within that range. I did not filter based on the number of failures, simply because that would have removed all of the STORM-855 results with batching enabled. None of the other test configurations saw any failures at all during testing.
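For clarity, the filtering rule described above boils down to something like the following sketch (the field names are made up for illustration):

```java
// Keep only 30-second intervals recorded after 200 seconds of uptime whose measured
// throughput is within +/- 10% of the target rate.
class IntervalFilter {
    static boolean keep(double uptimeSeconds, double measuredThroughput, double targetThroughput) {
        if (uptimeSeconds < 200.0) {
            return false; // skip startup effects such as JIT warmup
        }
        double deviation = Math.abs(measuredThroughput - targetThroughput) / targetThroughput;
        return deviation <= 0.10; // outside this band the topology was lagging or catching up
    }
}
```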

[chart: throughput vs. latency]

This shows the 99%-ile latency vs. measured throughput. It is not too interesting except to note that batching in STORM-855 at low throughput resulted in nothing being fully processed; all of the tuples timed out before they could finish. Only at a medium throughput, above 16,000 sentences/second, were we able to maintain enough in-flight tuples to complete batches regularly, but even then many tuples would still time out. This should be fixable with a batch timeout, but that is not implemented yet.

To get a better view I put the latency on a log scale.

[chart: throughput vs. latency (log scale)]

From this we can see that on the very low end batch-v2 increases the 99%-ile latency from 5-10 ms to 19-21 ms. Most of that you can get back by configuring the batch size to 1 instead of the default 100 tuples. However, once the baseline stops functioning at around 7000 sentences/sec, the batching code is able to keep working with either a batch size of 1 or 100. I believe this has to do with the automatic backpressure: in the baseline code backpressure does not take the overflow buffer into account, but in the batching code it does. I think this gives the topology more stability in maintaining a throughput, but I don't have any solid evidence for that.
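The backpressure difference can be pictured roughly like this (hypothetical names, not the actual Storm code): the batching code counts tuples sitting in the overflow buffer toward the high-water mark, while the baseline only looks at the ring-buffer population.

```java
// Illustrative only.
class BackpressureCheck {
    static boolean isCongested(long ringBufferPopulation, long overflowCount,
                               long capacity, double highWaterMarkFraction) {
        long effectiveLoad = ringBufferPopulation + overflowCount; // the baseline omits overflowCount
        return effectiveLoad >= (long) (capacity * highWaterMarkFraction);
    }
}
```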

I then zoomed in on the graphs to show what a 2 second SLA would look like

[chart: throughput vs. latency, zoomed to a 2-second SLA]

and a 100 ms SLA.

[chart: throughput vs. latency, zoomed to a 100 ms SLA]

In both cases the batching v2 with a batch size of 100 was able to handle the highest throughput for that given latency.

Then I wanted to look at memory and CPU Utilization.

[chart: throughput vs. memory usage]

Memory does not show much. The amount used varies a bit from one implementation to another, but keep in mind this is for 35 worker processes, so it ranges from about 70 MB/worker to about 200 MB/worker. The numbers simply show that as the throughput increases the memory utilization does too, and it does not vary much from one implementation to another.

[chart: throughput vs. CPU usage]

CPU, however, shows that on the low end we go from 7 or 8 cores' worth of CPU time to about 35 cores' worth for the batching code. This seems to be the result of the batch-flushing threads waking up periodically. We should be able to mitigate this by making that interval larger, but that would in turn impact the latency.

I believe that with further work we should be able to reduce that CPU utilization and the low-end latency by dynamically adjusting the batch size and timeout based on a specified SLA. At this point I feel this branch is ready for a formal review and inclusion into storm; the drawbacks of this patch do not seem to outweigh its clear advantages. Additionally, with the stability problems associated with #694 I cannot feel good about recommending it at this time. It is clear that some of what it is doing is worthwhile, and I think we should explore the alternative batch serialization between worker processes as a potential standalone piece.
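To make the proposed follow-on concrete, an SLA-driven adjustment could look something like the sketch below. None of this exists in the patch; it is purely illustrative of the idea.

```java
// Illustrative only: shrink the batch size when measured latency exceeds the target SLA,
// grow it back when there is plenty of headroom.
class AdaptiveBatchSizer {
    private int batchSize;
    private final int minBatch;
    private final int maxBatch;

    AdaptiveBatchSizer(int initial, int minBatch, int maxBatch) {
        this.batchSize = initial;
        this.minBatch = minBatch;
        this.maxBatch = maxBatch;
    }

    int adjust(double measuredLatencyMs, double slaLatencyMs) {
        if (measuredLatencyMs > slaLatencyMs) {
            batchSize = Math.max(minBatch, batchSize / 2);  // latency too high: batch less
        } else if (measuredLatencyMs < 0.5 * slaLatencyMs) {
            batchSize = Math.min(maxBatch, batchSize + 1);  // headroom: batch a bit more
        }
        return batchSize;
    }
}
```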

@d2r @knusbaum @kishorvpatil @harshach @HeartSaVioR @ptgoetz

If you could please review this, keeping in mind that it is based off of #797, I would like to try to get it in soon.

@mjsax
Member

mjsax commented Oct 22, 2015

Great analysis!
I would like to run these tests, too, especially to work on the flushing of batches. It's a pity that too many tuples time out in your test. To get a fair comparison, I would like to fix this and redo the test.

Btw: I am still wondering why #694 behaves so badly at low throughput. From a conceptual point of view there is no reason for this. Maybe my code is not good enough (as I am a Clojure newbie). I hope I can reproduce your results and get better insight into the problem.

@revans2
Contributor Author

revans2 commented Oct 22, 2015

@mjsax If you want to take some time to debug things for a better comparison I am happy to wait. It is entirely possible that I somehow messed things up when enabling hybrid mode, so getting an apples-to-apples comparison is important. I want the best final solution no matter what it is, but I also don't want to wait forever. For efficiency and cluster-utilization purposes I really would like to pull batching into the clusters that I run, but because the configurations/interfaces of the two approaches are incompatible with one another, I don't want to do it until the open source community has decided on a direction.

In the meantime I am going to be working on improving the metrics gathering. You said previously that with your first batching attempt you saw up to 6x the throughput. I suspect a lot of that is because the metrics are so very, very slow.

@mjsax
Member

mjsax commented Oct 22, 2015

I am not sure how the metrics relate to it... Furthermore, I also used Aeolus on the current master and got a 6x improvement (with acking disabled -- acking is currently not supported by Aeolus).

@revans2
Contributor Author

revans2 commented Oct 22, 2015

@mjsax At least in my profiling, the next largest consumer of CPU in word count without acking enabled, after adding in batching, is the metrics. They are horribly slow. I assumed that because Aeolus only counts batches, not individual tuples, it mitigates a lot of this slowness.

@mjsax
Member

mjsax commented Oct 22, 2015

Makes sense. It might be worth exploring counting in batches (i.e., increase a counter by the batch size for each batch), so we still get the correct value. Of course the metric will not increase smoothly over time but will "jump" in larger increments. I am not sure if this is acceptable behavior, and/or what other metrics there are that cannot be updated in a "batch" fashion... count might be too simple an example.
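A minimal sketch of that batch-wise counting idea (hypothetical interface, not Storm's metrics API): the total stays correct, it just advances in batch-sized jumps.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: one counter update per batch instead of one per tuple.
class BatchedCounter {
    private final AtomicLong count = new AtomicLong();

    void onBatch(int batchSize) {
        count.addAndGet(batchSize);
    }

    long value() {
        return count.get();
    }
}
```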

@revans2
Contributor Author

revans2 commented Oct 27, 2015

Just rebased.

@kishorvpatil
Contributor

@d2r @knusbaum @mjsax @harshach @HeartSaVioR @ptgoetz
Now that #797 is in, we should consider pulling in the batching solution. Can you please review this PR and share your opinion?

I am still +1 on this patch.

@mjsax
Member

mjsax commented Oct 29, 2015

I did not find time to rerun the tests and fix #694. If you want to move on with this PR, just go ahead... I will still try to get #694 into better shape --- I still hope I can get it accepted (if you are still interested).

@revans2
Contributor Author

revans2 commented Oct 29, 2015

@mjsax I am still very interested in #694 if it makes things better. I am skeptical that it will add much over what this does, because they both cover similar areas, but if it does then we should try to use it.

@harshach
Contributor

Ran a few topologies with and without acking. I am +1 on merging this in.

@revans2 revans2 changed the title Disruptor batching v2 STORM-1151: Batching in DisruptorQueue Oct 30, 2015
@asfgit asfgit merged commit c415d32 into apache:master Oct 30, 2015