
Fix/remove checkpoint double pk lookup closes #333 #340

Merged
merged 7 commits into master from fix/remove-checkpoint-double-pk-lookup on Mar 5, 2019

Conversation

diegosalvi
Contributor

@diegosalvi diegosalvi commented Feb 19, 2019

This pull request resolves #333.

A new method has been added to KeyToPageIndex to update an index value ONLY if the current value matches an expected parameter (like an atomic compare-and-set). This method is used during dirty page rebuild to avoid a double lookup (a first lookup to know whether the current record is "old" and a second one to update the page reference).

To be able to update the page reference, page id "generation" has been moved into the checkpoint code and is maintained there.

This patch must be merged after #338 (it is built on top of it).
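For reference, a minimal sketch of the compare-and-set idea described above (illustrative names and types only; the actual KeyToPageIndex signature in this PR may differ):

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative in-memory key-to-page map with a conditional update,
 * mirroring the "update only if the current value is the expected one" idea.
 * Names are hypothetical; the real KeyToPageIndex API may differ.
 */
public class KeyToPageSketch {

    private final ConcurrentHashMap<String, Long> keyToPage = new ConcurrentHashMap<>();

    /** Unconditional mapping update (plain put). */
    public void put(String key, long pageId) {
        keyToPage.put(key, pageId);
    }

    /**
     * Conditional update: remap key to newPageId only if it currently points
     * to expectedPageId. Returns true if the mapping was changed, so the
     * checkpoint code can tell in a single index access whether the record
     * was still "old" and has now been moved to the new page.
     */
    public boolean putIf(String key, long newPageId, long expectedPageId) {
        return keyToPage.replace(key, expectedPageId, newPageId);
    }

    public Long get(String key) {
        return keyToPage.get(key);
    }
}
```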

@diegosalvi diegosalvi self-assigned this Feb 19, 2019
@diegosalvi diegosalvi force-pushed the fix/remove-checkpoint-double-pk-lookup branch 2 times, most recently from 63ebb4c to c1cdd15 Compare February 20, 2019 07:48
@diegosalvi
Contributor Author

Rebased on master


@eolivelli eolivelli left a comment


@diegosalvi
I am sorry, but the build is failing consistently:

[ERROR] testWithTransactionsWithCheckpoints(herddb.server.hammer.DirectMultipleConcurrentUpdatesTest)  Time elapsed: 13.647 s  <<< ERROR!
herddb.model.StatementExecutionException: herddb.storage.DataStorageManagerException: inconsistency! table mytable no record in memory for 746573745f313438 page 74, activePages {1=[10083,160,6400], 5=[10108,160,5920], 6=[10085,160,4960], 7=[10082,160,6240], 8=[10102,160,6240], 9=[10098,160,7040], 10=[10137,160,5920], 11=[10235,159,5883], 12=[10240,160,7040], 13=[10110,160,7200], 14=[10112,160,6400], 15=[10116,160,6880], 16=[10133,160,6720], 17=[10143,161,6279], 18=[10086,160,6080], 19=[10090,160,5920], 20=[10111,160,5280], 21=[10111,160,5760], 22=[10111,160,5600], 23=[10110,160,6400], 24=[10115,160,6080], 25=[10112,160,6400], 26=[10110,160,6720], 27=[10112,160,6240], 28=[10106,160,6240], 29=[10115,160,6880], 30=[10112,160,6400], 31=[10116,160,7040], 32=[10128,160,5920], 33=[10143,161,7245], 34=[10098,160,6080], 35=[7220,160,4960], 37=[10109,160,6080], 38=[10111,160,6240], 39=[10106,160,5760], 40=[10106,160,5600], 41=[10117,160,5120], 42=[10118,160,4480], 43=[10106,160,4800], 44=[10110,160,5920], 45=[10114,160,4320], 46=[10104,160,4800], 47=[10109,160,5920], 48=[10111,160,3040], 49=[10111,160,3520], 50=[10111,160,4320], 51=[10110,160,2880], 52=[10103,160,3200], 53=[10113,160,2400], 54=[10111,160,3040], 55=[10101,160,2880], 56=[10107,160,2400], 57=[10107,160,1920], 58=[10108,160,1920], 59=[10108,160,1120], 60=[10112,160,1600], 61=[10108,160,1440], 62=[10106,160,1120], 63=[10104,160,320], 64=[10111,160,320], 65=[10107,160,0], 67=[10113,160,0], 68=[10105,160,0], 69=[10114,160,0], 70=[10106,160,0], 71=[10124,160,0], 72=[10095,160,0], 73=[10102,160,0], 74=[10100,160,0], 75=[10115,160,0]} after many trials
	at herddb.core.DBManager.executePlan(DBManager.java:672)
	at herddb.core.TestUtils.scan(TestUtils.java:71)
	at herddb.server.hammer.DirectMultipleConcurrentUpdatesTest.performTest(DirectMultipleConcurrentUpdatesTest.java:215)
	at herddb.server.hammer.DirectMultipleConcurrentUpdatesTest.testWithTransactionsWithCheckpoints(DirectMultipleConcurrentUpdatesTest.java:87)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
Caused by: herddb.storage.DataStorageManagerException: inconsistency! table mytable no record in memory for 746573745f313438 page 74, activePages {1=[10083,160,6400], 5=[10108,160,5920], 6=[10085,160,4960], 7=[10082,160,6240], 8=[10102,160,6240], 9=[10098,160,7040], 10=[10137,160,5920], 11=[10235,159,5883], 12=[10240,160,7040], 13=[10110,160,7200], 14=[10112,160,6400], 15=[10116,160,6880], 16=[10133,160,6720], 17=[10143,161,6279], 18=[10086,160,6080], 19=[10090,160,5920], 20=[10111,160,5280], 21=[10111,160,5760], 22=[10111,160,5600], 23=[10110,160,6400], 24=[10115,160,6080], 25=[10112,160,6400], 26=[10110,160,6720], 27=[10112,160,6240], 28=[10106,160,6240], 29=[10115,160,6880], 30=[10112,160,6400], 31=[10116,160,7040], 32=[10128,160,5920], 33=[10143,161,7245], 34=[10098,160,6080], 35=[7220,160,4960], 37=[10109,160,6080], 38=[10111,160,6240], 39=[10106,160,5760], 40=[10106,160,5600], 41=[10117,160,5120], 42=[10118,160,4480], 43=[10106,160,4800], 44=[10110,160,5920], 45=[10114,160,4320], 46=[10104,160,4800], 47=[10109,160,5920], 48=[10111,160,3040], 49=[10111,160,3520], 50=[10111,160,4320], 51=[10110,160,2880], 52=[10103,160,3200], 53=[10113,160,2400], 54=[10111,160,3040], 55=[10101,160,2880], 56=[10107,160,2400], 57=[10107,160,1920], 58=[10108,160,1920], 59=[10108,160,1120], 60=[10112,160,1600], 61=[10108,160,1440], 62=[10106,160,1120], 63=[10104,160,320], 64=[10111,160,320], 65=[10107,160,0], 67=[10113,160,0], 68=[10105,160,0], 69=[10114,160,0], 70=[10106,160,0], 71=[10124,160,0], 72=[10095,160,0], 73=[10102,160,0], 74=[10100,160,0], 75=[10115,160,0]} after many trials
	at herddb.core.TableManager.fetchRecord(TableManager.java:2944)
	at herddb.core.TableManager.accessRecord(TableManager.java:2874)
	at herddb.core.TableManager.lambda$streamTableData$22(TableManager.java:2801)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.stream.Streams$StreamBuilderImpl.tryAdvance(Streams.java:397)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
	at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at herddb.core.StreamDataScanner.fetchNext(StreamDataScanner.java:53)
	at herddb.core.StreamDataScanner.<init>(StreamDataScanner.java:44)
	at herddb.core.TableManager.scanWithStream(TableManager.java:2636)
	at herddb.core.TableManager.scan(TableManager.java:2374)
	at herddb.core.TableSpaceManager.scan(TableSpaceManager.java:543)
	at herddb.core.DBManager.scan(DBManager.java:753)
	at herddb.core.DBManager.executePlanAsync(DBManager.java:684)
	at herddb.core.DBManager.executePlan(DBManager.java:661)
	... 30 more

@diegosalvi
Contributor Author

diegosalvi commented Feb 21, 2019 via email

@diegosalvi
Contributor Author

I cannot replicate the problem, but I presume it is related to the scan procedure. Currently the checkpoint executes assuming a "stop the world" procedure (for both reads and writes), but that isn't the case: scans do not take the checkpointLock, so checkpoints can run concurrently with them. Fetched records and pages can be moved away during a scan (and record fetch), generating the problem.

We could force scans to pause too, or abandon this patch (to do just one PK access we need to checkAndSet the PK, thus publishing PK references to records from unflushed/not-yet-created pages).
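To make the ordering concern concrete, here is a minimal sketch (hypothetical names and structure, not the actual HerdDB internals) of the safer publish order: make the rebuilt page visible in memory before switching the PK reference, and switch it only conditionally:

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustration only: if the PK reference is switched before the new page
 * content is visible to readers, a concurrent scan can resolve a key to a
 * page it cannot fetch, which matches the "no record in memory ... page N"
 * failure reported above.
 */
public class CheckpointPublishSketch {

    private final ConcurrentHashMap<String, Long> keyToPage = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Long, byte[]> pagesInMemory = new ConcurrentHashMap<>();

    public void publishAndRemap(String key, long oldPageId, long newPageId, byte[] newPageContent) {
        // Publish the rebuilt page first, so any reader that already resolved
        // the new page id can actually fetch its content...
        pagesInMemory.put(newPageId, newPageContent);
        // ...then switch the PK reference, and only if the record has not been
        // relocated by a concurrent writer in the meantime (compare-and-set).
        keyToPage.replace(key, oldPageId, newPageId);
    }
}
```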

@diegosalvi diegosalvi closed this Feb 25, 2019
@diegosalvi diegosalvi reopened this Feb 25, 2019
@diegosalvi diegosalvi force-pushed the fix/remove-checkpoint-double-pk-lookup branch from c1cdd15 to 2b47349 Compare March 4, 2019 17:01
@diegosalvi
Contributor Author

Pushed a modification to the record memory "publish" step.

Still a WIP

@eolivelli eolivelli self-requested a review March 5, 2019 11:37
abstract KeyToPageIndex createIndex();

@Test
public void putIf() {


Should we add another test with concurrent updates?

@diegosalvi
Contributor Author

Added a concurrent test for "putIf" in the key-to-page index.
Fixed an error in MemoryDataStorageManager, which refused to let index pages be rewritten (actually BLink rewrites pages when it knows they aren't checkpoint pages).
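For reference, a sketch of what such a concurrent test could look like (JUnit-style, hypothetical names; the actual test in this PR may differ): many threads race to move the same key away from the same expected page, and exactly one must win.

```java
import static org.junit.Assert.assertEquals;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.junit.Test;

/** Hypothetical concurrent test for a conditional key-to-page update. */
public class ConditionalPutConcurrentTestSketch {

    @Test
    public void onlyOneThreadWinsTheConditionalPut() throws Exception {
        ConcurrentHashMap<String, Long> keyToPage = new ConcurrentHashMap<>();
        keyToPage.put("key", 1L);

        int threads = 16;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();

        for (int i = 0; i < threads; i++) {
            final long newPage = 100L + i;
            pool.submit(() -> {
                start.await();
                // conditional update: succeed only if the key still maps to page 1
                if (keyToPage.replace("key", 1L, newPage)) {
                    winners.incrementAndGet();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // exactly one racer should have moved the key away from page 1
        assertEquals(1, winners.get());
    }
}
```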

@diegosalvi
Contributor Author

Commented-out code removed, my bad :(

The stack trace can be kept to document why the test exists in the first place.


@eolivelli eolivelli left a comment


Great work.
Will merge as soon as Travis is green again.

@diegosalvi diegosalvi merged commit 6393743 into master Mar 5, 2019
@diegosalvi diegosalvi deleted the fix/remove-checkpoint-double-pk-lookup branch March 5, 2019 16:14

Successfully merging this pull request may close these issues.

Checkpoint: reduce PK lookup