Tests which show database integrity corruptions #1838

Closed. adamretter wants to merge 2 commits into develop-4.x.x from adamretter:feature/corruption-tests

4 participants
@adamretter
Member

adamretter commented Apr 18, 2018

Two test suites for database corruption, one for XML documents and one for Binary documents. Both test suites run the same tests but for the respective document type.

Currently there are:

  • 4 failing test cases for Binary documents.
  • 6 failing test cases for XML documents.

Each test case is executed on a clean instance of the database to ensure there are no "knock-on" effects from one test case to another.

Perhaps the most revealing of the test cases is AbstractRecoveryTest#storeAndLoad().

I have previously fixed the binary issues and can provide a PR for them!
Unfortunately the XML corruptions are much more serious and more difficult to understand, as they appear to lie between eXist-db's BTrees and the journal recovery.

I previously raised the issue of these corruptions in July 2017, but they have received little attention. I am opening this issue to make the tests more accessible and to increase visibility.

@adamretter adamretter added this to the 4.1.1 milestone Apr 18, 2018

@dizzzz

Member

dizzzz commented Apr 18, 2018

impressive

@wolfgangmm

Member

wolfgangmm commented Apr 18, 2018

There seems to be a bug in the test: looking at AbstractRecoveryTest.storeAndLoad, it deletes the temporary data dir via existEmbeddedServer.restart(), so recovery doesn't find a transaction log to recover from.

To see the real issue I have to disable temporary storage. It would be nicer to run it with a temporary data dir, though.
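The effect described above can be illustrated with a minimal sketch. It is not eXist-db code: the class `RestartSketch`, the method `journalSurvivesRestart`, and the file name `journal.log` are all hypothetical, and the "restart" is only simulated by optionally deleting the data directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical stand-in for an embedded server; this is not eXist-db's API. */
public class RestartSketch {
    private static final String JOURNAL = "journal.log"; // assumed file name

    /**
     * Writes a journal entry, simulates a restart, and reports whether
     * recovery would still find a journal to replay.
     */
    public static boolean journalSurvivesRestart(boolean wipeOnRestart) {
        try {
            Path dataDir = Files.createTempDirectory("sketch-data");
            Files.write(dataDir.resolve(JOURNAL),
                    "store doc1".getBytes(StandardCharsets.UTF_8));

            // Restart: a server using temporary storage discards its data dir here.
            if (wipeOnRestart) {
                deleteRecursively(dataDir);
            }
            boolean found = Files.exists(dataDir.resolve(JOURNAL));
            deleteRecursively(dataDir); // clean up after the sketch
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order deletes children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) {
        System.out.println("wiped on restart: " + journalSurvivesRestart(true));
        System.out.println("kept on restart:  " + journalSurvivesRestart(false));
    }
}
```

With the directory wiped, there is nothing left for recovery to replay, so a recovery test built this way can fail (or pass) for reasons unrelated to the journal itself.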

@adamretter adamretter force-pushed the adamretter:feature/corruption-tests branch from 3a418f5 to cb38457 Apr 19, 2018

@adamretter

Member

adamretter commented Apr 19, 2018

@wolfgangmm Right you are. I fixed that and just rebased. There are still failures, but fewer of them now.

@wolfgangmm

Member

wolfgangmm commented Apr 21, 2018

After two days of debugging I managed to fix the first of the underlying issues, bringing the number of failing XML recovery tests down to 4. The remaining failures are likely related. The issue is caused by a long comment in conf.xml whose value has to be stored in an overflow page because it exceeds 4k. The link to this overflow page was lost because the recovery log did not record it.
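As a rough illustration of this class of bug, here is a toy model of a store with overflow pages and a replayable journal. It is only a sketch under stated assumptions: none of the names (`OverflowLogSketch`, `Kind`, `LogRecord`, `store`, `recover`) come from eXist-db, the 4096-character inline limit is an assumed stand-in for "4k", and the `logLink` flag exists purely to reproduce the missing journal record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; all names are hypothetical and unrelated to eXist-db's classes. */
public class OverflowLogSketch {
    public static final int INLINE_LIMIT = 4096; // assumed stand-in for "4k"

    /** Journal record types in this toy model. */
    public enum Kind { WRITE_INLINE, WRITE_OVERFLOW_PAGE, LINK_OVERFLOW }

    public static final class LogRecord {
        final Kind kind;
        final int id;         // slot id, or page id for WRITE_OVERFLOW_PAGE
        final String payload; // value, or page number for LINK_OVERFLOW

        LogRecord(Kind kind, int id, String payload) {
            this.kind = kind;
            this.id = id;
            this.payload = payload;
        }
    }

    private final Map<Integer, String> inlineValues = new HashMap<>();
    private final Map<Integer, Integer> overflowLinks = new HashMap<>(); // slot -> page
    private final Map<Integer, String> overflowPages = new HashMap<>();
    private int nextPage = 0;

    /** Stores a value and journals it; logLink=false omits the link record. */
    public void store(int slot, String value, List<LogRecord> journal, boolean logLink) {
        if (value.length() <= INLINE_LIMIT) {
            inlineValues.put(slot, value);
            journal.add(new LogRecord(Kind.WRITE_INLINE, slot, value));
            return;
        }
        int page = nextPage++;
        overflowPages.put(page, value);
        overflowLinks.put(slot, page);
        journal.add(new LogRecord(Kind.WRITE_OVERFLOW_PAGE, page, value));
        if (logLink) {
            journal.add(new LogRecord(Kind.LINK_OVERFLOW, slot, Integer.toString(page)));
        }
    }

    /** Returns the stored value, following the overflow link if one exists. */
    public String read(int slot) {
        Integer page = overflowLinks.get(slot);
        return page != null ? overflowPages.get(page) : inlineValues.get(slot);
    }

    /** Rebuilds a store purely from the journal, as recovery after a crash would. */
    public static OverflowLogSketch recover(List<LogRecord> journal) {
        OverflowLogSketch db = new OverflowLogSketch();
        for (LogRecord r : journal) {
            switch (r.kind) {
                case WRITE_INLINE:
                    db.inlineValues.put(r.id, r.payload);
                    break;
                case WRITE_OVERFLOW_PAGE:
                    db.overflowPages.put(r.id, r.payload);
                    break;
                case LINK_OVERFLOW:
                    db.overflowLinks.put(r.id, Integer.parseInt(r.payload));
                    break;
            }
        }
        return db;
    }

    public static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String longComment = repeat('x', 5000);
        for (boolean logLink : new boolean[] {false, true}) {
            List<LogRecord> journal = new ArrayList<>();
            new OverflowLogSketch().store(1, longComment, journal, logLink);
            String recovered = recover(journal).read(1);
            System.out.println("link logged=" + logLink
                    + ", value recovered=" + longComment.equals(recovered));
        }
    }
}
```

In this model the overflow page content is replayed either way, but without the link record nothing points at it after recovery, so the value is effectively lost even though its bytes are still on a page.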

@dizzzz

Member

dizzzz commented Apr 21, 2018

@wolfgangmm Nice catch wolf!

@adamretter adamretter force-pushed the adamretter:feature/corruption-tests branch from cb38457 to 881d89a May 10, 2018

@adamretter adamretter modified the milestones: eXist-4.1.1, eXist-4.2.1 Jun 6, 2018

@adamretter adamretter modified the milestones: eXist-4.2.1, eXist-4.2.2 Jun 14, 2018

@adamretter adamretter modified the milestones: eXist-4.2.2, eXist-4.3.1 Jul 7, 2018

@adamretter adamretter modified the milestones: eXist-4.3.1, eXist-4.3.2 Jul 24, 2018

@duncdrum

Member

duncdrum commented Sep 21, 2018

needs a rebase

@adamretter adamretter modified the milestones: eXist-4.3.2, eXist-4.4.1 Sep 21, 2018

@adamretter adamretter force-pushed the adamretter:feature/corruption-tests branch from 881d89a to f114a8c Sep 25, 2018

@adamretter adamretter changed the base branch from develop to develop-4.x.x Sep 25, 2018

adamretter added some commits Apr 18, 2018

[test/bugfix] Deadlock timeout on Lucene ConcurrencyTest was previously too much (prev. 1 hour, now 3 minutes)

@adamretter adamretter force-pushed the adamretter:feature/corruption-tests branch from f114a8c to 18dd9a3 Sep 27, 2018

@dizzzz

Member

dizzzz commented Oct 31, 2018

Needs to be rebased

@adamretter

Member

adamretter commented Oct 31, 2018

Closed/Superseded by #2241

@adamretter adamretter closed this Oct 31, 2018

@adamretter adamretter deleted the adamretter:feature/corruption-tests branch Oct 31, 2018
