@adamretter
Contributor

@adamretter adamretter commented Apr 18, 2018

This PR adds two test suites for database corruption: one for XML documents and one for Binary documents. Both suites run the same tests, each against its respective document type.

Currently there are:

  • 4 failing test cases for Binary documents.
  • 6 failing test cases for XML documents.

Each test case is executed on a clean instance of the database to ensure there are no "knock-on" effects from one test case to another.

Perhaps the most revealing of the test cases is AbstractRecoveryTest#storeAndLoad().

I have previously fixed and can provide a PR for the binary issues!
Unfortunately the XML corruptions are much more serious and more difficult to understand, as they appear to arise in the interaction between eXist-db's BTrees and the Journal recovery.

I previously raised the issue of these corruptions in July 2017, but they have received little attention. I am opening this issue to make the tests more accessible and to increase visibility.

@adamretter adamretter added bug issue confirmed as bug high prio labels Apr 18, 2018
@adamretter adamretter added this to the 4.1.1 milestone Apr 18, 2018
@dizzzz
Member

dizzzz commented Apr 18, 2018

impressive

@wolfgangmm
Member

wolfgangmm commented Apr 18, 2018

There seems to be a bug in the test: looking at AbstractRecoverTest.storeAndLoad, it deletes the temporary data dir via existEmbeddedServer.restart(), so recovery doesn't find a transaction log to recover from.

To see the real issue I have to disable temporary storage. Would be nicer to run it with temporary data dir though.
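The test bug described above can be illustrated with a small sketch. It is a toy model only: `ToyEmbeddedServer`, its `restart(boolean)` flag, and the `journal.log` file name are hypothetical stand-ins, not eXist-db's actual test utilities. It shows why wiping a temporary data directory on restart leaves recovery with no transaction log to replay.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical stand-in for an embedded test server; NOT eXist-db's API.
final class ToyEmbeddedServer {
    private Path dataDir;

    ToyEmbeddedServer() {
        dataDir = newTempDir();
    }

    // Simulates a transaction being written to the journal before a crash.
    void writeJournalEntry(String entry) {
        try {
            Files.write(journal(), (entry + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restarts the toy server. When clearTemporaryStorage is true the data
    // directory is deleted and recreated, which also discards the journal.
    void restart(boolean clearTemporaryStorage) {
        if (clearTemporaryStorage) {
            deleteRecursively(dataDir);
            dataDir = newTempDir();
        }
    }

    // Recovery can only run if a journal survived the restart.
    boolean hasJournalToRecover() {
        return Files.exists(journal());
    }

    // Removes the temporary data directory once the test is finished.
    void close() {
        deleteRecursively(dataDir);
    }

    private Path journal() {
        return dataDir.resolve("journal.log");
    }

    private static Path newTempDir() {
        try {
            return Files.createTempDirectory("toy-data");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void deleteRecursively(Path dir) {
        // Delete children before parents by walking in reverse order.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a recovery test would call `restart(false)` so that the temporary data directory is kept, while still using a fresh directory per test case.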

@adamretter adamretter force-pushed the feature/corruption-tests branch from 3a418f5 to cb38457 Compare April 19, 2018 21:48
@adamretter
Contributor Author

@wolfgangmm Right you are. I fixed that and just rebased. Still failures, but fewer of them now.

@wolfgangmm
Member

After two days of debugging I managed to fix the first of the underlying issues, bringing the number of failing XML recovery tests down to 4. The remaining failures are likely related. The issue is caused by a long comment in conf.xml whose value exceeds 4k and therefore has to be stored in an overflow page. The link to this overflow page was lost because the recovery log did not record it.
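The failure mode described above can be sketched with a toy journaled store. Everything here is an assumption made for illustration: `ToyPagedStore`, its journal entry types, and the page layout are hypothetical and do not reflect eXist-db's real storage or journal classes. The sketch only shows the general principle that replaying a log cannot restore a pointer the log never recorded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; names and layout are hypothetical, not eXist-db's classes.
final class ToyPagedStore {
    static final int PAGE_SIZE = 4096;

    private final boolean journalOverflowLink;
    // The journal is assumed to survive a crash.
    private final List<String[]> journal = new ArrayList<>();
    // The maps below stand for page state that is lost on a crash.
    private final Map<String, String> inline = new HashMap<>();
    private final Map<String, Integer> links = new HashMap<>();
    private final Map<Integer, String> overflowPages = new HashMap<>();
    private int nextPage = 1;

    ToyPagedStore(boolean journalOverflowLink) {
        this.journalOverflowLink = journalOverflowLink;
    }

    void put(String key, String value) {
        if (value.length() <= PAGE_SIZE) {
            journal.add(new String[] {"INLINE", key, value});
            inline.put(key, value);
            return;
        }
        // Value is too large for a page: store it in an overflow page.
        int page = nextPage++;
        journal.add(new String[] {"OVERFLOW", Integer.toString(page), value});
        overflowPages.put(page, value);
        // The link from the key to its overflow page is journaled only if enabled.
        if (journalOverflowLink) {
            journal.add(new String[] {"LINK", key, Integer.toString(page)});
        }
        links.put(key, page);
    }

    // Simulates a crash: all non-journaled state is lost.
    void crash() {
        inline.clear();
        links.clear();
        overflowPages.clear();
    }

    // Replays the journal to rebuild the lost state.
    void recover() {
        for (String[] e : journal) {
            switch (e[0]) {
                case "INLINE":
                    inline.put(e[1], e[2]);
                    break;
                case "OVERFLOW":
                    overflowPages.put(Integer.parseInt(e[1]), e[2]);
                    break;
                case "LINK":
                    links.put(e[1], Integer.parseInt(e[2]));
                    break;
                default:
                    throw new IllegalStateException("unknown journal entry: " + e[0]);
            }
        }
    }

    // Returns the stored value, or null if it cannot be reached.
    String get(String key) {
        String value = inline.get(key);
        if (value != null) {
            return value;
        }
        Integer page = links.get(key);
        return page == null ? null : overflowPages.get(page);
    }
}
```

With `new ToyPagedStore(false)`, a value longer than `PAGE_SIZE` is unreachable after `crash()` followed by `recover()`, even though its overflow page was replayed; with `new ToyPagedStore(true)` the link is replayed as well and the value can be read again.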

@dizzzz
Member

dizzzz commented Apr 21, 2018

@wolfgangmm Nice catch wolf!

@adamretter adamretter force-pushed the feature/corruption-tests branch from cb38457 to 881d89a Compare May 10, 2018 12:23
@adamretter adamretter modified the milestones: eXist-4.1.1, eXist-4.2.1 Jun 6, 2018
@adamretter adamretter modified the milestones: eXist-4.2.1, eXist-4.2.2 Jun 14, 2018
@adamretter adamretter modified the milestones: eXist-4.2.2, eXist-4.3.1 Jul 7, 2018
@adamretter adamretter modified the milestones: eXist-4.3.1, eXist-4.3.2 Jul 24, 2018
@duncdrum
Contributor

needs a rebase

@adamretter adamretter modified the milestones: eXist-4.3.2, eXist-4.4.1 Sep 21, 2018
@adamretter adamretter force-pushed the feature/corruption-tests branch from 881d89a to f114a8c Compare September 25, 2018 16:43
@adamretter adamretter changed the base branch from develop to develop-4.x.x September 25, 2018 16:43
@dizzzz
Member

dizzzz commented Oct 31, 2018

Needs to be rebased

@adamretter
Contributor Author

adamretter commented Oct 31, 2018

Closed/Superseded by #2241

@adamretter adamretter closed this Oct 31, 2018
@adamretter adamretter deleted the feature/corruption-tests branch October 31, 2018 10:15
