Adding utility threads for anti-cache eviction #175
Comments
Edit: Never mind, I think I found the problem. I'm throwing the exception all the way up, but ExecutionEngineJNI isn't checking the right buffer for it, which causes the failure. In point 2: "You will also need to modify ExecutionEngineJNI.antiCacheReadBlocks() to look for errors in the new utility buffers." What kind of errors should I be checking for? After adding the additional buffer, I'm failing three tests, two of which are related to not catching an UnknownBlockAccessException correctly. I am pretty sure I have initialized and passed the new buffer correctly, though it must not be passing exceptions back up to the Java frontend. Here is the latest commit. Thanks. |
Any idea what could be causing this? The logs of commit 52 just have this single error and I'm not sure how to diagnose the source. I'm not sure why this test would have an issue with the changes I've made as it only should affect the AntiCaching.
|
How should I go about testing items 1 and 2? In general, when would AntiCacheEvictionManager::readBlock throw an exception, besides when it's a system error such as UnknownBlockAccess? |
To be honest, the Exception mechanisms are not my strongest coding area, so if I were doing it by myself, I would simply query the AntiCacheEvictionManager for an Abort true/false. Exceptions are probably a better-engineered solution. AntiCacheEvictionManager::readBlock() could throw a (to be implemented) AbortAndReissueException when the block needed is from an SSD or disk backing store. I don't have a function/method yet to relay that information but the AntiCacheEvictionManager will know for which AntiCacheDBs it will stall and for which ones transactions will be aborted and reissued. Perhaps a boolean-returning method TransactionAbort? Something like that is what I would implement. |
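The two options in this comment (a boolean query versus a dedicated exception) can be put side by side in a small sketch. It is written in Java only for illustration; the real AntiCacheEvictionManager and AntiCacheDB live in the C++ EE. AbortAndReissueException does not exist yet, and every other name below (BackingStoreSketch, EvictionManagerSketch, abortsOnRead, transactionAbort) is hypothetical.

```java
// Hypothetical exception: signals that the txn should be aborted and
// reissued once the block has been fetched from a slow tier.
class AbortAndReissueException extends RuntimeException {
    final int blockId;

    AbortAndReissueException(int blockId) {
        super("block " + blockId + " lives in an aborting tier");
        this.blockId = blockId;
    }
}

// Stand-in for an AntiCacheDB; the flag is an assumed way of recording
// whether reads from this tier stall or abort.
final class BackingStoreSketch {
    final String name;
    final boolean abortsOnRead; // e.g. true for SSD or disk, false for a fast tier

    BackingStoreSketch(String name, boolean abortsOnRead) {
        this.name = name;
        this.abortsOnRead = abortsOnRead;
    }
}

final class EvictionManagerSketch {
    // Option A: the caller asks first and decides what to do.
    static boolean transactionAbort(BackingStoreSketch db) {
        return db.abortsOnRead;
    }

    // Option B: readBlock itself throws when the tier is an aborting one.
    static String readBlock(BackingStoreSketch db, int blockId) {
        if (db.abortsOnRead) {
            throw new AbortAndReissueException(blockId);
        }
        // Stalling tier: the read happens synchronously (faked here).
        return "block-" + blockId;
    }
}
```

With option B the caller cannot forget the check, at the cost of routing a normal control-flow decision through the exception path.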
Hi Andy and Michael, I have implemented 1 and 2 and it has passed the tests on Jenkins. The commits are: 0bf9ff6..641c861. I have started looking at 3 and have some thoughts about how to handle conflicting reads and writes while evicting data. I will sync up with Stan first and flesh out a base design for discussion with you guys. Thanks, |
I think I'm at a good place to sync as well. The migration between layers works and blocks can be found in any layer. In addition, all the multilevel configuration is tested. I just now added a method in AntiCacheDB to identify whether it is a stalling or aborting layer. We need to discuss how we're going to decide this, as well as what specific policy we'd like to start with for block placement. |
More than just passing existing test cases, did you add test cases for the new features? |
There are new tests in the EE (anticache_eviction_manager_test) to test the physical act of migration and LRU block selection, as well as a new junit test (TestAntiCacheMultiLevel) to configure multilevel, evict and merge tuples, as well as fill a level and be forced to write to the one below. They are based upon your edu.brown.hstore.TestAntiCacheManager test. |
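For readers who have not looked at those tests, here is a rough Java sketch of the behaviour they exercise: a full upper level pushes its least recently used block down to the level below, and reads can find a block in either level. It is not the EE implementation. MultiLevelSketch and its methods are hypothetical names, and the two-level layout with a fixed block count per level is an assumption.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-level store; assumes topCapacity >= 1.
final class MultiLevelSketch {
    private final int topCapacity;
    // accessOrder = true: iteration starts at the least recently used entry.
    final LinkedHashMap<Integer, String> top = new LinkedHashMap<>(16, 0.75f, true);
    final Map<Integer, String> lower = new LinkedHashMap<>();

    MultiLevelSketch(int topCapacity) {
        this.topCapacity = topCapacity;
    }

    // Writes a block to the top level, migrating the LRU block down first
    // if the top level is already full.
    void writeBlock(int blockId, String data) {
        if (!top.containsKey(blockId) && top.size() >= topCapacity) {
            Iterator<Map.Entry<Integer, String>> it = top.entrySet().iterator();
            Map.Entry<Integer, String> lru = it.next();
            lower.put(lru.getKey(), lru.getValue());
            it.remove();
        }
        top.put(blockId, data);
    }

    // Looks in the top level first (which refreshes the block's LRU
    // position), then falls back to the lower level.
    String readBlock(int blockId) {
        String data = top.get(blockId);
        return data != null ? data : lower.get(blockId);
    }
}
```

Writing blocks 1 and 2 into a level of capacity 2, reading block 1, and then writing block 3 should migrate block 2, since block 1 was touched more recently.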
Beautiful. Should we merge this code back into the master? |
Let me rerun those performance tests overnight and I'll submit a pull request tomorrow. I want to skim the code and make sure any hacky debugging printfs are gone. |
Ok. Let's try to schedule a call for this Friday. Can you send an email to the group? |
Will do. |
Would you guys be available to take a look at my initial patch for the second bullet point? I wrote a long commit message to convey the overall design. This however is based on my current understanding of the frontend, so please do point out what looks wrong to you. Many thanks! |
It all makes sense to me. Should we meet tomorrow and sync up? |
Tomorrow is NEDB day, so we're all going to be busy. On Thursday, January 29, 2015 10:53 AM Michael Giardino wrote:
Andy Pavlo |
The following is a rough outline for how to add support for an additional thread to operate "down" in the EE while the main PartitionExecutor thread processes transactions. This is not possible in the current architecture because there is a single shared buffer that we use to pass data + error codes between the Java layer and the C++ layer (through JNI). The mapping between the Java and C++ layers is as follows:
ExecutionEngineJNI.exceptionBuffer -> VoltDBEngine::m_reusedResultBuffer
ExecutionEngineJNI.deserializer (this is a wrapper for the ByteBuffer) -> VoltDBEngine::m_reusedResultBuffer
Note that the memory is allocated in Java and then we pass down the pointers to the C++ layer.
To give an example of why this is a problem now with the existing AntiCacheManager implementation, I will now discuss a race condition that can occur. The Java AntiCacheManager has its own thread that it uses to unevict data at a partition. If this uneviction process encounters an error, then it will write a SerializedException into that partition's shared buffer. If the PartitionExecutor is processing a txn at the same time, it may trip its own exception and want to write into that same buffer. If the other thread is trying to deserialize the exception, then the contents will get clobbered. This is a race condition that we have definitely seen crop up before.
1. Add a new utility buffer to ExecutionEngineJNI and update the parameters to ExecutionEngine.nativeSetBuffers() to pass down this new buffer pointer. You can see how we did the same thing with ExecutionEngineJNI.ariesLogBuffer. You will need to update VoltDBEngine::setBuffers() accordingly.
2. Modify VoltDBEngine::antiCacheReadBlocks() to use this new utility buffer when there is an exception. You can see the FIXME in the code that makes reference to this problem. You will also need to modify ExecutionEngineJNI.antiCacheReadBlocks() to look for errors in the new utility buffers.
3. [...] ReadWriteSet, and then write out the block. I'm not sure how we want to do this just yet because I don't want to have to check for a lock every time we normally execute txns. We can talk about this problem when you get to this point.
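The separate-buffer idea described above can be sketched on the Java side as follows. This is a minimal illustration under stated assumptions, not the H-Store code. The class, field, and method names are hypothetical, the buffer size is arbitrary, and the "int length followed by a UTF-8 message" layout is a stand-in for the real SerializedException format. The only thing taken from the description is the design itself: each thread gets its own buffer, and the frontend checks the utility buffer for errors after an anti-cache read.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: names, sizes, and the wire layout are assumptions.
final class UtilityBufferSketch {
    // Written only on behalf of the PartitionExecutor thread.
    final ByteBuffer exceptionBuffer = ByteBuffer.allocateDirect(4096);
    // Written only on behalf of the anti-cache utility thread.
    final ByteBuffer utilityExceptionBuffer = ByteBuffer.allocateDirect(4096);

    // Stand-in for the native side reporting an error into one buffer.
    static void writeError(ByteBuffer buf, String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        buf.clear();
        buf.putInt(bytes.length).put(bytes);
    }

    // Marks a buffer as holding no error (length prefix of 0).
    static void clearError(ByteBuffer buf) {
        buf.clear();
        buf.putInt(0);
    }

    // Returns the pending error message, or null if the length prefix is 0.
    static String readError(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // shares bytes, own position/limit
        view.clear();
        int length = view.getInt();
        if (length == 0) {
            return null;
        }
        byte[] bytes = new byte[length];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the two threads never write into the same buffer, an error raised during uneviction cannot overwrite one raised by a running txn. The reader of each buffer still has to be the thread that owns it.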