New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/remove checkpoint double pk lookup closes #333 #340
Conversation
63ebb4c
to
c1cdd15
Compare
Rebased on master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diegosalvi
I am sorry, but build is failing consistently:
[ERROR] testWithTransactionsWithCheckpoints(herddb.server.hammer.DirectMultipleConcurrentUpdatesTest) Time elapsed: 13.647 s <<< ERROR!
herddb.model.StatementExecutionException: herddb.storage.DataStorageManagerException: inconsistency! table mytable no record in memory for 746573745f313438 page 74, activePages {1=[10083,160,6400], 5=[10108,160,5920], 6=[10085,160,4960], 7=[10082,160,6240], 8=[10102,160,6240], 9=[10098,160,7040], 10=[10137,160,5920], 11=[10235,159,5883], 12=[10240,160,7040], 13=[10110,160,7200], 14=[10112,160,6400], 15=[10116,160,6880], 16=[10133,160,6720], 17=[10143,161,6279], 18=[10086,160,6080], 19=[10090,160,5920], 20=[10111,160,5280], 21=[10111,160,5760], 22=[10111,160,5600], 23=[10110,160,6400], 24=[10115,160,6080], 25=[10112,160,6400], 26=[10110,160,6720], 27=[10112,160,6240], 28=[10106,160,6240], 29=[10115,160,6880], 30=[10112,160,6400], 31=[10116,160,7040], 32=[10128,160,5920], 33=[10143,161,7245], 34=[10098,160,6080], 35=[7220,160,4960], 37=[10109,160,6080], 38=[10111,160,6240], 39=[10106,160,5760], 40=[10106,160,5600], 41=[10117,160,5120], 42=[10118,160,4480], 43=[10106,160,4800], 44=[10110,160,5920], 45=[10114,160,4320], 46=[10104,160,4800], 47=[10109,160,5920], 48=[10111,160,3040], 49=[10111,160,3520], 50=[10111,160,4320], 51=[10110,160,2880], 52=[10103,160,3200], 53=[10113,160,2400], 54=[10111,160,3040], 55=[10101,160,2880], 56=[10107,160,2400], 57=[10107,160,1920], 58=[10108,160,1920], 59=[10108,160,1120], 60=[10112,160,1600], 61=[10108,160,1440], 62=[10106,160,1120], 63=[10104,160,320], 64=[10111,160,320], 65=[10107,160,0], 67=[10113,160,0], 68=[10105,160,0], 69=[10114,160,0], 70=[10106,160,0], 71=[10124,160,0], 72=[10095,160,0], 73=[10102,160,0], 74=[10100,160,0], 75=[10115,160,0]} after many trials
at herddb.core.DBManager.executePlan(DBManager.java:672)
at herddb.core.TestUtils.scan(TestUtils.java:71)
at herddb.server.hammer.DirectMultipleConcurrentUpdatesTest.performTest(DirectMultipleConcurrentUpdatesTest.java:215)
at herddb.server.hammer.DirectMultipleConcurrentUpdatesTest.testWithTransactionsWithCheckpoints(DirectMultipleConcurrentUpdatesTest.java:87)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
Caused by: herddb.storage.DataStorageManagerException: inconsistency! table mytable no record in memory for 746573745f313438 page 74, activePages {1=[10083,160,6400], 5=[10108,160,5920], 6=[10085,160,4960], 7=[10082,160,6240], 8=[10102,160,6240], 9=[10098,160,7040], 10=[10137,160,5920], 11=[10235,159,5883], 12=[10240,160,7040], 13=[10110,160,7200], 14=[10112,160,6400], 15=[10116,160,6880], 16=[10133,160,6720], 17=[10143,161,6279], 18=[10086,160,6080], 19=[10090,160,5920], 20=[10111,160,5280], 21=[10111,160,5760], 22=[10111,160,5600], 23=[10110,160,6400], 24=[10115,160,6080], 25=[10112,160,6400], 26=[10110,160,6720], 27=[10112,160,6240], 28=[10106,160,6240], 29=[10115,160,6880], 30=[10112,160,6400], 31=[10116,160,7040], 32=[10128,160,5920], 33=[10143,161,7245], 34=[10098,160,6080], 35=[7220,160,4960], 37=[10109,160,6080], 38=[10111,160,6240], 39=[10106,160,5760], 40=[10106,160,5600], 41=[10117,160,5120], 42=[10118,160,4480], 43=[10106,160,4800], 44=[10110,160,5920], 45=[10114,160,4320], 46=[10104,160,4800], 47=[10109,160,5920], 48=[10111,160,3040], 49=[10111,160,3520], 50=[10111,160,4320], 51=[10110,160,2880], 52=[10103,160,3200], 53=[10113,160,2400], 54=[10111,160,3040], 55=[10101,160,2880], 56=[10107,160,2400], 57=[10107,160,1920], 58=[10108,160,1920], 59=[10108,160,1120], 60=[10112,160,1600], 61=[10108,160,1440], 62=[10106,160,1120], 63=[10104,160,320], 64=[10111,160,320], 65=[10107,160,0], 67=[10113,160,0], 68=[10105,160,0], 69=[10114,160,0], 70=[10106,160,0], 71=[10124,160,0], 72=[10095,160,0], 73=[10102,160,0], 74=[10100,160,0], 75=[10115,160,0]} after many trials
at herddb.core.TableManager.fetchRecord(TableManager.java:2944)
at herddb.core.TableManager.accessRecord(TableManager.java:2874)
at herddb.core.TableManager.lambda$streamTableData$22(TableManager.java:2801)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.Streams$StreamBuilderImpl.tryAdvance(Streams.java:397)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at herddb.core.StreamDataScanner.fetchNext(StreamDataScanner.java:53)
at herddb.core.StreamDataScanner.<init>(StreamDataScanner.java:44)
at herddb.core.TableManager.scanWithStream(TableManager.java:2636)
at herddb.core.TableManager.scan(TableManager.java:2374)
at herddb.core.TableSpaceManager.scan(TableSpaceManager.java:543)
at herddb.core.DBManager.scan(DBManager.java:753)
at herddb.core.DBManager.executePlanAsync(DBManager.java:684)
at herddb.core.DBManager.executePlan(DBManager.java:661)
... 30 more
I'll double check. I didn't have any problems on my pc
|
I cannot replicate the problem but I presume that is related with scan procedure. Actually checkpoint execute presuming a "stop the world" procedure (for both read and writes) but is isn't like that: scan do not take the checkpointLock so concurrent checkpoints can be done. Fetched records and pages can be moved away during scan (and record fetch) generating the problem. We could force pause of scans too or abort this patch (to do just one PK access we need to checkAndSet the PK so publishing on PK records from unflushed/uncreated pages) |
c1cdd15
to
2b47349
Compare
Pushed a modification on record memory "publish" Still a WIP |
abstract KeyToPageIndex createIndex(); | ||
|
||
@Test | ||
public void putIf() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add another test with concurrent updates ?
Added a concurrent test for "putIf" in key to page. |
Commented code removed, my bad :( The stacktrace can be kept to known why the test exists in the firts place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work.
will merge as soon as Travis is green again
This pull requext resolve #333
A new method on KeyToPageIndex has been added to update index values ONLY if current value is as expected parameter (like atomic compare and set). Such method is used during dirty page rebuild to avoid a double lookup (1st to known is current record is "old" and a 2nd one to update the page reference).
To be able to udpate page reference, page id "generation" has been moved in checkpoint code and there maintained.
This patch must be merged after #338 (is build on top of it)