
ARTEMIS-2216 Use a specific executor for pageSyncTimer #2484

Closed · wants to merge 1 commit

Conversation · 5 participants
qihongxu (Contributor) commented Dec 29, 2018

Improve paging throughput by using a specific executor for pageSyncTimer

Improving throughput in paging mode is one of our concerns, since our cluster uses paging heavily.
We found that the pageSyncTimer in PagingStoreImpl shared the same executor (taken from the thread pool) with the pageCursorProvider. Under heavy load, for example hundreds of consumers receiving messages simultaneously, it became difficult for the pageSyncTimer to get a turn on the shared executor because of contention. Page syncs were therefore delayed and producers suffered low throughput. The sketch below illustrates the queueing effect.
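To make the queueing effect concrete, here is a minimal, self-contained sketch (illustrative only, not Artemis code; all class and method names here are made up): a time-critical "sync" task submitted to an executor that is already busy with bulk cursor work waits behind that work, while the same task on a dedicated executor runs almost immediately.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only, not ActiveMQ Artemis code.
public class DedicatedExecutorSketch {

   public static void main(String[] args) throws InterruptedException {
      // Shared executor: the "page sync" queues behind bulk "cursor" work.
      ExecutorService shared = Executors.newSingleThreadExecutor();
      for (int i = 0; i < 100; i++) {
         shared.submit(DedicatedExecutorSketch::simulateCursorWork);
      }
      final long sharedStart = System.nanoTime();
      shared.submit(() -> System.out.printf("sync on shared executor delayed %.1f ms%n",
            (System.nanoTime() - sharedStart) / 1_000_000.0));

      // Dedicated executor: the "page sync" runs with no queueing delay.
      ExecutorService syncOnly = Executors.newSingleThreadExecutor();
      final long dedicatedStart = System.nanoTime();
      syncOnly.submit(() -> System.out.printf("sync on dedicated executor delayed %.1f ms%n",
            (System.nanoTime() - dedicatedStart) / 1_000_000.0));

      shared.shutdown();
      syncOnly.shutdown();
   }

   private static void simulateCursorWork() {
      try {
         Thread.sleep(5); // stand-in for depage/cursor scanning work
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      }
   }
}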

To achieve higher performance we assign a dedicated executor to the pageSyncTimer to avoid this contention. We then ran a small-scale test on a single modified broker.

Broker: 4C/8G/500G SSD
Producer: 200 threads, non-transactional send
Consumer: 200 threads, transactional receive
Message text size: 100–200 bytes (random)
AddressFullPolicy: PAGE

Test result:

               Only Send TPS   Only Receive TPS   Send&Receive TPS
Original ver   38k             33k                3k/30k
Modified ver   38k             34k                30k/12.5k

The table above shows that on the modified broker send TPS improves from “poor” to “extremely fast”, while receive TPS drops from “extremely fast” to “not bad” under heavy load. Considering that consumer systems usually have a long processing chain after receiving messages, we don’t need receive TPS to be that fast. Instead, we want to guarantee send TPS to cope with traffic peaks and to lower the producer’s delay. Moreover, total send and receive TPS rises from 33k to about 43k. All in all, this trade-off seems beneficial and acceptable.

@qihongxu qihongxu changed the title from "Use a specific executor for pageSyncTimer" to "ARTEMIS-2216 Use a specific executor for pageSyncTimer" Dec 29, 2018

@qihongxu qihongxu force-pushed the qihongxu:pageSyncTimer_executor branch from 01a09f2 to 330e3ae Dec 29, 2018

michaelandrepearce (Contributor) commented Dec 29, 2018

Comments on this are:

We have shared executors and thread pools as a way to pool these resources; it's part of the Artemis design so that they can be controlled, e.g. the number of threads is configurable. There are use cases behind this; we don't just add executors. If we add a new pool it should be configurable like the others.

Whilst in your use case this significantly improves things, as you note this change isn't entirely positive for all users, as it impacts use cases where people may care more about the consumer side (I am one of those). If anything, when the broker is saturated it's more important for us that consumers can dequeue and that producers get back-pressured. If consumers can't keep up and the producer peak continues, you can very soon end up in a situation where the paging depth just grows and grows.

To be sure how much of a concern this change is, it would be good to get more stats on how consumer rates ramp down as producer rates ramp up, and the critical cross-over points, along with the number of remote consumers and producers (in different combinations of transactional vs non-transactional). All of this could be moot if there is a way to eliminate the negative effect this change has.

To note, I'm actually shocked that simply adding a thread has such a negative effect on the consumer. Maybe with stats or further investigation it may become obvious and possibly easy to explain or resolve.

Just a side query: if you do a non-transactional send, why bother doing a transactional consume? In my mind, if you want strong transactional guarantees you'd typically have both consumer and sender transactional; if you don't need this, then have both non-transactional. Also remember transactions mean a journal entry as well.

qihongxu (Contributor, Author) commented Dec 30, 2018

@michaelandrepearce As presented in the test results, adding a dedicated executor does improve producer rates, and it also means that PagingStoreImpl::page acquires the write lock more often. At the same time, QueueImpl::deliver calls isPaging() to check whether the queue is paging, which acquires the read lock, so the two paths contend on the same lock (see the sketch below). This may explain why the increase in producer rates is accompanied by a decrease in consumer rates.
In light of this, we have an internal branch that removes the isPaging() method's read lock in PagingStoreImpl, along with the dedicated pageSyncTimer executor. After that we ran the same test and got 21k/25k Send&Receive, 46k TPS in total. This version runs smoothly in our use case but we are still exploring potential side effects.
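A minimal sketch of the contention described above, with illustrative names only (the real PagingStoreImpl is far more involved): the producer path holds the write lock while appending to the current page, every isPaging() check on the delivery path takes the read lock, and the "dirty read" variant discussed later in this thread simply reads the flag without locking.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only, not the ActiveMQ Artemis classes.
public class PagingLockSketch {

   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
   private volatile boolean paging = true;

   // Producer path: appending to the current page holds the write lock.
   public void page(byte[] message) {
      lock.writeLock().lock();
      try {
         // append the message to the current page file...
      } finally {
         lock.writeLock().unlock();
      }
   }

   // Consumer path: called on every delivery check; blocks while a writer holds the lock.
   public boolean isPaging() {
      lock.readLock().lock();
      try {
         return paging;
      } finally {
         lock.readLock().unlock();
      }
   }

   // The "dirty read" variant discussed in this PR: skip the lock, accept a possibly stale value.
   public boolean isPagingDirtyRead() {
      return paging;
   }
}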
The reason we use transactional receive is to make sure every message is indeed delivered and consumed. If we did a non-transactional receive and the consumer system failed later in the processing chain, the message could be delivered but effectively “lost”. So we let the consumer decide when to ack and commit, instead of auto-committing acks.

michaelandrepearce (Contributor) commented Dec 30, 2018

If you could supply more complete stats with ramp-ups, e.g. perf at 1k, 2k, 3k... 5k... 15k... 20k, 21k, 22k... 25k etc., over time, rather than just a summary snapshot. Also needed are stats with other consumer and producer combinations, e.g. with and without transactions.

Instead of a transactional consumer you could use client acknowledge or even individual acknowledge.

clebertsuconic (Contributor) commented Dec 30, 2018

The receiver could be slower because of more I/O. That’s not a bad thing.

I’m on vacation this week, so I won’t be able to get to this. Just saying why the receiver could be slower.

michaelandrepearce (Contributor) commented Dec 30, 2018

The bit around lock contention on the paging check is very interesting: @qihongxu removed it and performance rose, which suggests the consumer perf change isn't related to I/O but to lock contention in the code. It would definitely be good to see if it's safe to remove.

I took this code and have been running some perf tests on servers with fast SSDs we have. I don't see disk I/O maxed out when we go into paging, but I do see somewhat of a perf drop when we hit paging, so there are definitely areas in the paging code to improve somewhere.

michaelandrepearce (Contributor) commented Dec 31, 2018

@qihongxu,

re:
"
In light of this, we have an internal branch that removes the isPaging() method's read lock in PagingStoreImpl, along with the dedicated pageSyncTimer executor. After that we ran the same test and got 21k/25k Send&Receive, 46k TPS in total. This version runs smoothly in our use case but we are still exploring potential side effects.
"

Agreed that since QueueImpl::checkDepage, which is called from the deliver method, only schedules another depage asynchronously, the read-lock barrier there is overkill; it doesn't have to be strongly consistent.

I think changing the current method could be dangerous unless we do a fuller analysis of all its uses, but I think we could easily add a new method named something like "isPagingDirtyRead" which returns without the read lock. It will mean the returned value isn't always 100% accurate, but in cases such as checkDepage we don't need it to be; it's just a signal, and we can call it where we know it's safe.

Would you be happy adding this (it will need to be exposed via PageSubscription) and updating checkDepage to use it?

@qihongxu qihongxu force-pushed the qihongxu:pageSyncTimer_executor branch from 330e3ae to 3c01f25 Jan 3, 2019

qihongxu (Contributor, Author) commented Jan 3, 2019

@qihongxu,

re:
"
In light of this, we have an internal branch that removes the isPaging() method's read lock in PagingStoreImpl, along with the dedicated pageSyncTimer executor. After that we ran the same test and got 21k/25k Send&Receive, 46k TPS in total. This version runs smoothly in our use case but we are still exploring potential side effects.
"

Agreed that since QueueImpl::checkDepage, which is called from the deliver method, only schedules another depage asynchronously, the read-lock barrier there is overkill; it doesn't have to be strongly consistent.

I think changing the current method could be dangerous unless we do a fuller analysis of all its uses, but I think we could easily add a new method named something like "isPagingDirtyRead" which returns without the read lock. It will mean the returned value isn't always 100% accurate, but in cases such as checkDepage we don't need it to be; it's just a signal, and we can call it where we know it's safe.

Would you be happy adding this (it will need to be exposed via PageSubscription) and updating checkDepage to use it?

@michaelandrepearce
As you suggested, we tried updating checkDepage with a new "isPagingDirtyRead" method that returns without the read lock. But after running the same tests on this version, it seems checkDepage did not affect receive performance much. Instead, CursorIterator#hasNext in PageSubscriptionImpl also calls isPaging() and had a significant impact on receive performance. So we updated hasNext() to use the new “isPagingDirtyRead” as well. Below is the result chart.

(P.S. The original ver is a control group, not exactly the “original” master-branch build: it already includes the dedicated pageSyncTimer executor and the durable caching in PageReference; see ARTEMIS-2214.)

                           Send&Receive   In Total
Ver Original               30k/12.5k      42.5k
Ver Modified checkDepage   31.1k/11.3k    42.4k
Ver Modified hasNext       23.8k/23.5k    47.3k

A detailed TPS chart can be seen in the attachment. (Sorry, we don’t have enough time for running more tests; I just got back to work today.)
TPS chart.docx

I have pushed a new commit adding “isPagingDirtyRead” for hasNext(). This version seems to reach a balance between send and receive. Please review it at your convenience and tell us if any more tests are needed :)

qihongxu (Contributor, Author) commented Jan 3, 2019

Instead of a transactional consumer you could use client acknowledge or even individual acknowledge.

It seems that both client-acknowledge and individual-acknowledge mode ultimately use a Transaction on the server side. Considering these modes show no obvious difference in performance, we chose to use transactions as they are more reliable and support rollback.

franz1981 (Contributor) commented Jan 3, 2019

@qihongxu Just as a confirmation, I have used https://github.com/jvm-profiling-tools/async-profiler on lock events, i.e. the time spent waiting to enter a lock.
I have found exactly the same behaviour explained above:
QueueImpl::checkDepage (flame graph omitted)
QueueImpl::DepageRunner (flame graph omitted)

Both call PagingStoreImpl::isPaging and block each other.

      } finally {
         lock.readLock().unlock();
      }
   }

   @Override
   public boolean isPagingDirtyRead() {
      if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
franz1981 (Contributor) commented Jan 3, 2019

We can read it just once and save it in a local variable, avoiding 3 volatile loads; the same can be done in the original version too.

michaelandrepearce (Contributor) commented Jan 3, 2019

Nice idea! Note the original now delegates to the dirty-read version, Franz, so it just needs tidying here once.

wy96f (Contributor) commented Jan 4, 2019

addressFullMessagePolicy can change if the address settings are reapplied, so we need to load the value.

franz1981 (Contributor) commented Jan 4, 2019

Yes, but we can do the volatile load just once, before checking its value 3 times, on each call of isPagingDirtyRead.

michaelandrepearce (Contributor) commented Jan 4, 2019

@wy96f I think what @franz1981 is trying to say is that we can do the volatile read just once per method call by keeping it in a local variable, by adding one line, e.g.

     AddressFullMessagePolicy addressFullMessagePolicy = this.addressFullMessagePolicy;
     if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
        return false;
     }
     if (addressFullMessagePolicy == AddressFullMessagePolicy.FAIL) {
        return isFull();
     }
     if (addressFullMessagePolicy == AddressFullMessagePolicy.DROP) {
        return isFull();
     }
     return paging;

wy96f (Contributor) commented Jan 4, 2019

Yes, but we can do the volatile load just once, before checking its value 3 times, on each call of isPagingDirtyRead.

Got it. Nice catch 👍

michaelandrepearce (Contributor) commented Jan 3, 2019

Instead of a transactional consumer you could use client acknowledge or even individual acknowledge.

It seems that both client-acknowledge and individual-acknowledge mode ultimately use a Transaction on the server side. Considering these modes show no obvious difference in performance, we chose to use transactions as they are more reliable and support rollback.

@qihongxu you're using the JMS API, not the Core API, then?

michaelandrepearce (Contributor) commented Jan 3, 2019

CursorIterator::hasNext

I'm a bit concerned about this doing a dirty read, as this isn't something that triggers an async action; in actual fact hasNext is purposefully synchronized. (Especially as some concurrency issues have recently been found in paging, e.g. Howard's current PR, I'm sensitive to us being too haphazard here.)

@franz1981 if checkDepage is removed from using the same lock, why would DepageRunner now be locking it up so badly? If one is removed from contention the other should no longer be impacted; am I missing something here? In general a dirty read to trigger an async process is fine, but for something that acts on an expected sync'd state I'm a little more concerned we'd be introducing a concurrency issue.

franz1981 (Contributor) commented Jan 3, 2019

The version I have shown is just master, i.e. the read lock is still there!

@@ -1350,7 +1350,7 @@ public synchronized boolean hasNext() {
            return true;
         }

-        if (!pageStore.isPaging()) {
+        if (!pageStore.isPagingDirtyRead()) {

michaelandrepearce (Contributor) commented Jan 3, 2019

The concern here is that this isn't an async case.

Btw, I cannot see the change in QueueImpl::checkDepage to use the new paging dirty read.

michaelandrepearce (Contributor) commented Jan 3, 2019

The version I have shown is just master, i.e. the read lock is still there!

I get that; I'm more reflecting on this change in the PR.

michaelandrepearce (Contributor) commented Jan 3, 2019

@qihongxu I don't see checkDepage using the dirty read in the current commit.

franz1981 (Contributor) commented Jan 3, 2019

@michaelandrepearce @qihongxu Just a little off-topic, but I'm getting these warnings on master:

2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,136 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,139 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,142 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,145 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,148 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,151 during compact replay
2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] AMQ142007: Can not find record 8,103,154 during compact replay
franz1981 (Contributor) commented Jan 3, 2019

I have taken a look at the version used for this PR plus the change suggested by @michaelandrepearce on checkDepage, and I have found something.
LivePageCacheImpl (in violet) is now a major contention point (flame graph omitted), while the isPaging call sites are still sources of contention (flame graphs omitted).

franz1981 (Contributor) commented Jan 3, 2019

Looking at the CPU graph instead, I can see many odd things, i.e. compaction is stealing a lot of CPU and I/O (graph omitted).

wy96f (Contributor) commented Jan 4, 2019

CursorIterator::hasNext

I'm a bit concerned about this doing a dirty read, as this isn't something that triggers an async action; in actual fact hasNext is purposefully synchronized. (Especially as some concurrency issues have recently been found in paging, e.g. Howard's current PR, I'm sensitive to us being too haphazard here.)

I don't get why isPaging() in hasNext() needs to be consistent. The paging status can change right after isPaging() unless we hold the read lock across isPaging() and the subsequent operations. Without the read lock: a) if isPaging() returns true but another thread sets it to false, the subsequent next() call reads no data and returns null; b) if isPaging() returns false but another thread sets it to true (a new message coming), deliverAsync will be called later anyway.

clebertsuconic (Contributor) commented Jan 4, 2019

During initial development years ago I needed to be careful about when to leave paging; this needed some sync points to make sure depage would set it synchronously.

I'm not saying anything against the change (I'm still in vacation mode; I'm back on Monday), just saying what the original intent was when it was written.

qihongxu (Contributor, Author) commented Jan 4, 2019

@michaelandrepearce

@qihongxu you're using the JMS API, not the Core API, then?

Yes (sometimes for compatibility concerns).

@qihongxu I don't see checkDepage using the dirty read in the current commit.

Sorry, I probably did not make it clear enough in my last comment. As the chart shows, we tried checkDepage using the dirty read but found it makes no difference in perf. We even changed the isPaging() method in PageSubscriptionImpl from return pageStore.isPaging(); to return pageStore.isPagingDirtyRead();. That way not only checkDepage but also the other methods that call PageSubscription.isPaging() (as @franz1981 showed in the flame graphs above) all use the dirty read. But still, same perf as before.
Given all this, we thought it safer to leave the others unchanged and only force CursorIterator::hasNext to use the dirty read, since it affects perf a lot. My latest commit is based on these considerations.

michaelandrepearce (Contributor) commented Jan 4, 2019

@qihongxu Of the two, checkDepage is the safer one to use a dirty read, so I'd expect to see that changed first before anything else. Franz actually included checkDepage in his latest run and graphs.

michaelandrepearce (Contributor) commented Jan 4, 2019

CursorIterator::hasNext

I'm a bit concerned about this doing a dirty read, as this isn't something that triggers an async action; in actual fact hasNext is purposefully synchronized. (Especially as some concurrency issues have recently been found in paging, e.g. Howard's current PR, I'm sensitive to us being too haphazard here.)

I don't get why isPaging() in hasNext() needs to be consistent. The paging status can change right after isPaging() unless we hold the read lock across isPaging() and the subsequent operations. Without the read lock: a) if isPaging() returns true but another thread sets it to false, the subsequent next() call reads no data and returns null; b) if isPaging() returns false but another thread sets it to true (a new message coming), deliverAsync will be called later anyway.

I'd agree; I'm just cautious as we've been hit a few times with concurrency issues that have been a nightmare to find (and quite recently, as last month!). If that's the case then the synchronized method could be removed as well, further reducing locks, though I think @clebertsuconic has given some further info on why this is the way it is.

michaelandrepearce (Contributor) commented Jan 4, 2019

@franz1981 Nice graphs. It looks like it is essentially isPaging in QueueImpl: unless everything is a dirty read we simply move the problem somewhere else, so I think we need to make a more general call on whether all of these are safe to dirty-read or not.

IMO, and the more I look at the usage in QueueImpl: the moment the value is returned from isPaging, the state could already have changed, as we are now outside the read lock, and therefore any logic or code is already working on dirty state immediately. As such I think we're safe making all of the isPaging calls dirty. Any objections?

@qihongxu @wy96f @franz1981 - I think this would need consensus... so thoughts also please.

....
I'm starting to feel like Alice here and we're going to end up going down a rabbit hole ;) and will end up with the original isPaging just being dirty.

michaelandrepearce (Contributor) commented Jan 4, 2019

Looking at the CPU graph instead, I can see many odd things, i.e. compaction is stealing a lot of CPU and I/O.

Nice find.

michaelandrepearce (Contributor) commented Jan 4, 2019

@michaelandrepearce

Do you get this on master or this PR (I mean, is that a typo)?

I've got that on master!

OK, so I think we need not worry about this within the scope of this PR. (It probably needs looking into, but doesn't need to be solved in this PR, IMO.)

@qihongxu qihongxu force-pushed the qihongxu:pageSyncTimer_executor branch from 3c01f25 to 482f9bc Jan 4, 2019

wy96f (Contributor) commented Jan 4, 2019

I'd agree; I'm just cautious as we've been hit a few times with concurrency issues that have been a nightmare to find (and quite recently, as last month!).

em... Could you please tell us which issues? We need to verify how they affect our cluster.

qihongxu (Contributor, Author) commented Jan 4, 2019

@michaelandrepearce
Removed the read lock from isPaging().
Also, as @franz1981 suggested, we now load the volatile addressFullMessagePolicy only once per call.
Please review and let me know if any more tests are needed :)

michaelandrepearce (Contributor) commented Jan 4, 2019

em... Could you please tell us which issues? We need to verify how they affect our cluster.

The big issue I'm referring to, which became a nightmare for my organisation, was that under high concurrency (we saw it with a high-throughput, low-latency broker setup with concurrent consumers) the buffers could get mixed up, causing index-out-of-bounds issues.

Fixes were multiple:
024db5b
da7fb89

I'm also aware there have been some other concurrency fixes for smaller issues.

michaelandrepearce (Contributor) commented Jan 4, 2019

@qihongxu Looks good to me. It'd be good, for the record, if you have a final-version perf result.

@franz1981 what does the current PR look like in the heat maps?

franz1981 (Contributor) commented Jan 4, 2019

what does the current PR look like in the heat maps?

I'm finishing something to fix LivePageCacheImpl too, hopefully today, and I will post it; impl-wise this seems good to me as well 👍

franz1981 (Contributor) commented Jan 4, 2019

@michaelandrepearce @qihongxu I have re-implemented the LivePageCache to be completely lock-free: https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache
Feel free to try it in this PR too, and I can send a PR to your branch.
I will provide some charts of the contention state soon 👍

franz1981 (Contributor) commented Jan 4, 2019

@michaelandrepearce @qihongxu OK, checked: the latest version of this PR + my branch https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache is fully non-blocking :)

michaelandrepearce (Contributor) commented Jan 4, 2019

@franz1981 did you send a PR to @qihongxu's branch so he can merge it and this PR picks it up?

It'd be great to see a final stat in @qihongxu's test env.

franz1981 (Contributor) commented Jan 4, 2019

@michaelandrepearce Good idea! Let me do it now.
@wy96f please take a look at the implementation I did too, to check whether it seems correct (and you like it ;))

franz1981 (Contributor) commented Jan 4, 2019

@michaelandrepearce Done, the PR has been sent; now we can just wait for the perf results on it :)
I have improved the live page cache behaviour/reliability quite a bit (especially under OOME), but sadly I see that the most-called method, getMessage, cannot be improved any further without making the lock-free code a real nightmare.
The original version was O(n) depending on which message was queried, because it needs to walk the entire linked list of paged messages.
In my version I have amortized the cost by using an interesting hybrid between an ArrayList and a LinkedList, similar to https://en.wikipedia.org/wiki/Unrolled_linked_list, but (very) optimized for addition; a sketch of the idea is below.
I'm mentioning this because for a long time I have wanted to design a single-threaded version of this same data structure to be used as the main data structure inside QueueImpl.
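A rough, single-threaded sketch of the "chunked" idea described above (illustrative only; the actual implementation is lock-free and lives in LivePageCacheImpl, and this class name is made up): elements are appended into fixed-size arrays, so positional lookup is simple index arithmetic instead of an O(n) walk of a linked list, while append stays O(1) amortized.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not the ActiveMQ Artemis implementation.
public final class ChunkedAppendOnlyList<T> {

   private static final int CHUNK_SIZE = 32;
   private final List<Object[]> chunks = new ArrayList<>();
   private int size;

   // O(1) amortized: start a new fixed-size chunk only when the last one is full.
   public void add(T element) {
      int indexInChunk = size % CHUNK_SIZE;
      if (indexInChunk == 0) {
         chunks.add(new Object[CHUNK_SIZE]);
      }
      chunks.get(chunks.size() - 1)[indexInChunk] = element;
      size++;
   }

   // O(1) positional lookup via chunk index math, unlike a linked-list walk.
   @SuppressWarnings("unchecked")
   public T get(int position) {
      if (position < 0 || position >= size) {
         throw new IndexOutOfBoundsException("position " + position);
      }
      return (T) chunks.get(position / CHUNK_SIZE)[position % CHUNK_SIZE];
   }

   public int size() {
      return size;
   }
}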

qihongxu (Contributor, Author) commented Jan 5, 2019

@michaelandrepearce @qihongxu OK, checked: the latest version of this PR + my branch https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache is fully non-blocking :)

Nice work!!

It'd be great to see a final stat in @qihongxu's test env.

As soon as I'm back on Monday I will run the same tests on both versions (this one and Franz's).

qihongxu (Contributor, Author) commented Jan 7, 2019

@michaelandrepearce @franz1981
After we ran tests on both versions (one with no lock and the other with the new LivePageCache & no lock), the result chart is below.

                                  Send&Receive   In Total
Ver Original                      30k/12.5k      42.5k
Ver Modified checkDepage          31.1k/11.3k    42.4k
Ver Modified hasNext              23.8k/23.5k    47.3k
Ver no lock                       22.9k/26.5k    49.4k
Ver no lock & new livePageCache   24.5k/22k      46.5k

Tests were run under the same conditions as before. Clients consumed and produced simultaneously on the same queue, with 10 million messages in it.
All versions are based on ARTEMIS-2214, which caches the durable flag in PagedReference.
TPS chart2.docx

franz1981 (Contributor) commented Jan 7, 2019

@qihongxu

Ver no lock & new livePageCache

Is it making use of the last 2 commits I sent as a PR to your repository?
It should be a lot faster than 'Ver no lock', given that it uses the live page cache a lot less.

franz1981 (Contributor) commented Jan 7, 2019

@qihongxu ping! :) I'm too curious :)

qihongxu (Contributor, Author) commented Jan 7, 2019

@qihongxu

Ver no lock & new livePageCache

Is it making use of the last 2 commits I sent as a PR to your repository?
It should be a lot faster than 'Ver no lock', given that it uses the live page cache a lot less.

Yes, 'Ver no lock & new livePageCache' uses both your commits and the one in this PR.

A possible explanation might be that in our case consumers were reading old pages while producers were writing new pages. There is no intersection between these two groups, such as a common page being written and read by both of them.

franz1981 (Contributor) commented Jan 7, 2019

A possible explanation might be that in our case consumers were reading old pages while producers were writing new pages. There is no intersection between these two groups, such as a common page being written and read by both of them.

I have verified that the live page cache is (mostly) accessed on the producer side, and that could be the reason behind 24.5k vs 22.9k (without the new live page cache).
I can't understand why the consumers are slowed down, though...

michaelandrepearce (Contributor) commented Jan 7, 2019

@franz1981 based on this, shall we merge this PR as is (which is quite an impressive result at a combined 49.4k) and then work on the LivePageCache changes separately?

I'm aware we will be wanting to cut a new release soon.

franz1981 (Contributor) commented Jan 7, 2019

@michaelandrepearce I would like to trigger a CI job of some kind first; maybe @clebertsuconic can help with his superbox (just this time) so we get an answer sooner?

Re the cache, I was already thinking of sending another PR, but I have verified that it is virtually impossible that it is the reason for the consumer slow-down. These are the numbers of the benchmark comparing it with the original version:

Benchmark               (size)      (type)   Mode  Cnt          Score          Error  Units
CacheBench.getMessage1      32     chunked  thrpt   10  150039261.251 ± 12504804.694  ops/s
CacheBench.getMessage1      32  linkedlist  thrpt   10   31776755.611 ±  1405838.635  ops/s
CacheBench.getMessage1    1024     chunked  thrpt   10   31433127.788 ±  3902498.303  ops/s
CacheBench.getMessage1    1024  linkedlist  thrpt   10    2638404.341 ±   119171.758  ops/s
CacheBench.getMessage1  102400     chunked  thrpt   10     344799.911 ±    27267.965  ops/s
CacheBench.getMessage1  102400  linkedlist  thrpt   10      20020.925 ±     5392.418  ops/s
CacheBench.getMessage3      32     chunked  thrpt   10  384605640.192 ± 35164543.632  ops/s
CacheBench.getMessage3      32  linkedlist  thrpt   10   14124979.510 ±  2875341.272  ops/s
CacheBench.getMessage3    1024     chunked  thrpt   10   90418506.375 ±  4593688.556  ops/s
CacheBench.getMessage3    1024  linkedlist  thrpt   10    1562687.000 ±    91433.926  ops/s
CacheBench.getMessage3  102400     chunked  thrpt   10     978575.016 ±    44800.161  ops/s
CacheBench.getMessage3  102400  linkedlist  thrpt   10      21614.717 ±     5344.742  ops/s

Where getMessage1 is LivePageCacheImpl::getMessage called at random positions by 1 thread and
getMessage3 is LivePageCacheImpl::getMessage called at random positions by 3 threads.
chunked is the new version and linkedlist the original one: the difference is quite large and the new version scales linearly. A sketch of how such a comparison could be set up is shown below.
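For reference, a hedged sketch of how a JMH comparison with that shape could be written. The class and method names (CacheBench, getMessage1/getMessage3) and the chunked/linkedlist parameters mirror the output above, but the cache implementations here are simplified stand-ins, not the Artemis ones: ArrayList plays the role of the "chunked" structure (O(1) positional access) and LinkedList the role of the original cache (O(n) walk per lookup).

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

// Illustrative stand-in benchmark, not the one that produced the numbers above.
@State(Scope.Benchmark)
public class CacheBench {

   @Param({"32", "1024", "102400"})
   int size;

   @Param({"chunked", "linkedlist"})
   String type;

   private List<Integer> cache;

   @Setup
   public void setup() {
      // ArrayList stands in for the chunked cache, LinkedList for the original one.
      if ("chunked".equals(type)) {
         cache = new ArrayList<>();
      } else {
         cache = new LinkedList<>();
      }
      for (int i = 0; i < size; i++) {
         cache.add(i);
      }
   }

   @Benchmark
   @Threads(1)
   public Integer getMessage1() {
      return cache.get(ThreadLocalRandom.current().nextInt(size));
   }

   @Benchmark
   @Threads(3)
   public Integer getMessage3() {
      return cache.get(ThreadLocalRandom.current().nextInt(size));
   }
}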

michaelandrepearce (Contributor) commented Jan 7, 2019

@franz1981 I think we'll have to do some more real-world testing with it (there's a difference between an isolated bench and a full e2e test), hopefully with @qihongxu's help; it might be that by releasing some performance it causes an odd bottleneck or contention somewhere else in the code.

Are you OK, once full CI passes, with merging as is and continuing this in a separate PR?

franz1981 (Contributor) commented Jan 7, 2019

Yes, if @qihongxu agrees to give a hand :P Re this PR, it probably makes sense to include at least the second commit of my PR, which was unrelated to the new cache impl and just avoids an unnecessary query that was always performed...

franz1981 (Contributor) commented Jan 8, 2019

@qihongxu @michaelandrepearce
I'm running a CI job now: it will take some time, but once it's fine I will merge this 👍
@qihongxu After all the relevant paging bits are merged I will send another PR with the same 2 commits I sent to your branch: are you available to help check the effects of that PR in your tests?

michaelandrepearce (Contributor) commented Jan 8, 2019

@qihongxu big thanks for all the effort on this, and for providing the testing time!!

qihongxu (Contributor, Author) commented Jan 8, 2019

@qihongxu @michaelandrepearce
I'm running a CI job now: it will take some time, but once it's fine I will merge this
@qihongxu After all the relevant paging bits are merged I will send another PR with the same 2 commits I sent to your branch: are you available to help check the effects of that PR in your tests?

@qihongxu big thanks for all the effort on this, and for providing the testing time!!

My pleasure :) We are glad to see any boost in perf, especially in paging mode.

I will keep a close watch on the new PR you mentioned and run some more tests, as we have done in this issue, if needed.

Also thank you all for the reviews and work on this PR!

franz1981 (Contributor) commented Jan 8, 2019

FYI: we are working on fixing the CI machines now, so I won't be able to run any job on them at the moment :(
I hope to have the CI machines up and running soon to continue 👍

@asfgit asfgit closed this in 4e55c64 Jan 14, 2019

michaelandrepearce referenced this pull request Jan 15, 2019

ARTEMIS-2224 lock-free LivePageCache + tests
LivePageCacheImpl has been reimplemented to be
lock-free, multi-producer and multi-consumer
in any of its operations.