Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix PermissionDeniedException in NativeBroker.defragXMLresource #5311

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

line-o
Copy link
Member

@line-o line-o commented May 23, 2024

Description:

Additional bugfix for NativeBroker.defragXMLresource where in specific setups with modules and collections with set group ID set a PermissionDeniedException is thrown when the modified XML resource is defragmented.

Example setup

  • an xml resource owned by admin/test with -rwxrwx---
  • in a collection owned by admin/test with crwxrws---
  • modified by a module owned by test/test with -rwxr-sr-x
  • called by guest

Before this patch is applied the defragXMLResource method attempted to change the owner of the tempDoc which it cannot do as the effective subject is guest.

Reference:

refs #5273
refs #5276

Type of tests:

Code review was addressed in UpdateInsertTriggersDefrag.java

When XML resources are modified using an effective group because the setGID bit is set on the resource NativeBroker.defragXMLResource might throw a PermissionDeniedException.

With this patch applied, this can no longer happen as the call to DocumentImpl#copyOf is removed in defragXMLresource.
The temporary document is just a container to realign nodes on one BTree page and is destroyed within the method.
@line-o line-o requested a review from a team as a code owner May 23, 2024 21:20
Copy link
Member

@adamretter adamretter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely needs a more nuanced fix

final DocumentImpl tempDoc = new DocumentImpl(null, doc.getDocId(), doc);
tempDoc.copyOf(this, doc, doc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think just removing this is the correct or secure solution. Can you write a bit more detail in the description of the PR about exactly what the problem was please, that might help me to suggest a better solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message body has some more detail.

With this patch applied, this can no longer happen as the call to DocumentImpl#copyOf is removed in defragXMLresource.
The temporary document is just a container to realign nodes on one BTree page and is destroyed within the method.

I hope this clarifies the motivation and justification.
I would like to point out that we arrived at this solution together with @wolfgangmm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely needs a more nuanced fix

Please only block PRs where you have either proof that it creates a problem or at least share the thoughts what the issue might be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please only block PRs where you have either proof that it creates a problem

Obviously!

or at least share the thoughts what the issue might be

I am concerned that the temporary document which looks persistent to me is not correctly initialised, also I am concerned that the security on it is now too relaxed, this could lead to a data leak/exploit.

Can you clearly describe the initial problem that you are trying to solve in the PR description please. For example, something like: There is an active user U with the security profile s(U), there is document D with the permissions p(D), when the action A is executed by the subject AS with security profile s(AS), it is rejected because s(AS) is incompatible with p(D).
Obviously you need to replace the symbols above with the actual details.

hope this clarifies the motivation and justification.

It just discusses the solution you arrived at, it does not clearly describe the problem you were trying to solve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debugging this code with Juri yesterday and as the original author of the defrag method, I confirm that the offending line tempDoc.copyOf(this, doc, doc); is not only unnecessary here, but can cause database corruption. The tempDoc is only used as a temporary container for copying nodes during defrag and is thrown away afterwards! Original permissions and ownership on the resource are not affected by the defragXMLResource operation in any way, because the tempDoc is never written. What is written is the original DocumentImpl, which still has the proper permissions and ownership.

It seems I originally added the call to tempDoc.copyOf 19 years ago and it was preserved during a series of refactorings later though the implementation of DocumentImpl had changed to the point that it now may actually cause a corruption when called in this case.

Please be aware that this issue is a ticking time bomb that will lead to data wipe out!!! Together with the even more fatal #5296 (because it's easier to run into) this is absolutely worth a hotfix, so I would give it a thumbs up, not excluding that more improvements could be done at a later point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wolfgangmm thanks for your reply. It is interesting to hear about these symptoms, and it does sound like something needs to be fixed. However, the actual original issue and its cause remain unclear. I would like further information please so that we may all understand it...

but can cause database corruption

Could you explain how that is possible please? As I understood it from @line-o's description there was a PermissionDeniedException which has not yet been explained. I am not sure how a PermissionDeniedException, which is a genuine way of forbidding a subject from doing something they are not allowed to do, could cause a database corruption here?

Original permissions and ownership on the resource are not affected by the defragXMLResource operation in any way, because the tempDoc is never written.

I did not expect that they would be.

What is written is the original DocumentImpl, which still has the proper permissions and ownership.

Okay.

What I (and presumably others) don't yet understand, and what has not yet been explained, is how can the current approach that places the same permissions onto tempDoc as the original resource that is to be defragmented, cause a PermissionDeniedException (and then also if I now understand you, a subsequent database corruption)?

Please be aware that this issue is a ticking time bomb that will lead to data wipe out!!!

I agree that it sounds like it needs to be fixed. At the moment I don't have enough information to see if the proposed fix correctly addresses the cause. I trust you and @line-o will provide the requested information about the underlying issue and cause...

Together with the even more fatal #5296 (because it's easier to run into)

Let's consider that separately. As at the moment #5296 is blocked by work that is still needed to address issues introduced in #5276 which itself was rushed and merged in violation of the agreed community process for reviewing and merging PRs.

this is absolutely worth a hotfix,

From what you have said so far, I agree. We just need more information to be able to complete the review of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@line-o added some information about how to reproduce the issue. It requires a certain combination of permissions. The corruption occurs because the PermissionDeniedException aborts the defrag process after some steps have already been applied. Therefore another way to prevent this might be to move the offending code further up to the start of the method. We did not follow this route as dropping the line is the cleaner way to solve the issue.

The PermissionDeniedException is eventually thrown in line 664 of DocumentImpl#copyOf:

permissions.setOwner(prev.get_1().getOwner());

why this happens I cannot explain entirely and it seems very weird that the owners would be different. It must be related to the setgid. But in itself it has nothing to do with the actual defrag routine and points to another bug in permission handling.

Copy link
Member

@adamretter adamretter May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was my feeling from the start, which is why I wanted more information from yourselves. Based on your summation it seems to me that the this PR is a workaround and not a fix for the root issue.

Well, the root issue this PR tries to solve is the database corruption resulting from the defrag operation. The only correct resolution for this is to fix defrag, which we did. I strongly object against calling a fix a workaround.

Fixing the permission issue may help with other operations, but won't change this PR in any way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wolfgangmm I think you have accidentally deleted or overwritten my comments
Screenshot 2024-05-24 at 16 39 27

@wolfgangmm Are you able to restore my comments please? If not, I can try and re-create them...

…Resource

Testing has shown that the previous call to construct tempDoc was actually the correct one,
as it is using the default file permissions of the pool.
That is key for situations in which the subject initiating the defragmentation is not the
_owner_ of the document and only belongs to the group with write permission.
Copy link

sonarcloud bot commented May 24, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

@line-o
Copy link
Member Author

line-o commented May 24, 2024

@adamretter Please note that the test was adapted to address your review in #5273 and also note that the test was written based on src/test/java/org/exist/xquery/update/AbstractTestUpdate.java
Any improvements in this family of tests for updates should be addressed separately. The seriousness of the problem fixed weighs higher in my opinion.

@adamretter
Copy link
Member

adamretter commented May 24, 2024

Any improvements in this family of tests for updates should be addressed separately. The seriousness of the problem fixed weighs higher in my opinion.

I don't agree. Your code fixed one thing you were interested in but also breaks another thing that you are not interested in. That is not how changes should be implemented in collaborative project IMHO.

@line-o
Copy link
Member Author

line-o commented May 24, 2024

@adamretter What do you mean with this statement?

Your code fixed one thing you were interested in but also breaks another thing that you are not interested in.

What did the code break and which code are you talking about, the test?

@line-o
Copy link
Member Author

line-o commented May 24, 2024

Maybe let's look at this the other way around. What is the use of the call to DocumentImpl#copyOf here? @adamretter

@wolfgangmm
Copy link
Member

@adamretter please watch your language, which I perceive as disrespectful in the last comments answering @line-o. We have no direct interest in this bugfix as we're not using any code affected by it. It's all just for the community's benefit.

@adamretter
Copy link
Member

@wolfgangmm I will. No disrespect was or is intended. I'm sorry you felt that way. I have constantly written offers to help throughout this thread. I trust you have seen them?
I think I've also made it clear that I'd like to see this fixed. It sounds like it could be important. But I want to see a fix to the problem and not to the symptom.
To be doubly clear - I can't accept this PR when the problem (the root cause) is not understood.

@wolfgangmm
Copy link
Member

@adamretter good. @line-o submitted the PR with the best intentions and did a great job reproducing this rather puzzling edge case.

In principle I agree the problem concerning permissions would be worth more investigation because it might also affect other copy operations. I was just arguing that it is not the correct solution for the corruption issue we try to address with this PR. The offending line was carried on over the years without checking if it still applies to the particular case of the defrag operation. As things are, it should clearly be dropped.

I'll try to provide more details concerning the other, the permission problem as time allows. It should be straightforward to reproduce using the setup instructions @line-o provided.

@adamretter
Copy link
Member

adamretter commented May 24, 2024

@line-o submitted the PR with the best intentions

I have never questioned that, nor have I cast any doubt to the contrary.

I am simply following a scientific method without emotion to: (a) understand the problem, (b) locate the root of the problem, (c) propose a hypothesis for a fix, and (d) test and prove such a hypotheis through experimental evidence.

it is not the correct solution for the corruption issue we try to address with this PR

@wolfgangmm Perhaps I am misunderstanding something... My understanding is:

  1. copyOf sets the permissions of the tempDoc the same as the original doc.
  2. defrag throws a PermissionDeniedException; which only as a side-effect corrupts the database due to inconsistent state.
  3. you state that the issue is not within copyOf, but that there is a problem with permissions elsewhere whose symptoms are exhibited through the PermissionDeniedException.

Do I have that correct?

Assuming that I have understood correctly, then I don't understand how this PR fixes the real issue. Removing the call to copyOf doesn't solve the issue, rather it still feels to me that this is indeed a workaround, that if anything makes the situation more difficult as it hides the evidence of an underlying issue which before this PR we could of observed and therefore potentially addressed.

I see that there is a test added in this PR. If I was to run just that test without the changes in this PR against eXist-db 7.0.0-SNAPSHOT, would it show me the PermissionDeniedException? If so I can then use that to debug the underlying issue, and I will work forward on that basis. If not, then can you tell me what the test demonstrates?

@line-o
Copy link
Member Author

line-o commented May 30, 2024

@adamretter I am wondering how to put two recent quotes from you together:

last week in this PR:

Your code fixed one thing you were interested in but also breaks another thing that you are not interested in.

By the way, you still haven't answered what I broke, allegedly.

In Slack 3 days ago:

My general advice to my customers still with eXist-db is not to use XQuery Update, and to instead always overtime the complete document

So I fixed a thing users in our community are interested in: exist-db's XQuery update extension. You clearly stated that you are not interested in it. So what is your vested interest here, as this change only affects defragmentation which can only ever run after modification?

@adamretter
Copy link
Member

adamretter commented May 30, 2024

By the way, you still haven't answered what I broke, allegedly.

Yes I did... it's been in the review comments of your PR for weeks now - as discussed there, your newly added tests are leaking resources.

My general advice to my customers still with eXist-db is not to use XQuery Update, and to instead always overtime the complete document

So I fixed a thing users in our community are interested in: exist-db's XQuery update extension. You clearly stated that you are not interested in it. So what is your vested interest here, as this change only affects defragmentation which can only ever run after modification?

I don't really see what you are getting at here! My interest is advising eXist-db users as to the best way of using eXist-db. Your fix addresses an issue with defragmentation, it does not fix other issues with XQuery Update.

@line-o
Copy link
Member Author

line-o commented May 30, 2024

OK, so then have a look at the test in this PR as it is changed to address your review.

it does not fix other issues with XQuery Update.

But it does fix an existing, reproducable and severe issue with XQuery Update. Other issues, you hinted at, are out of scope.

@adamretter
Copy link
Member

OK, so then have a look at the test in this PR as it is changed to address your review.

I'm not seeing that in this PR.

@line-o
Copy link
Member Author

line-o commented May 30, 2024

Well it is the other changed file.
And your review so far did only address the removal of the unnecessary call of copyOf.
I would still like to hear why you think it cannot be removed.

Copy link
Member

@dizzzz dizzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

As it addresses a 'timebomb' I'd vote to pull the PR in ASAP.

@adamretter
Copy link
Member

Well it is the other changed file.

No it isn't. It does not fully address my review and is still missing several things.

@line-o
Copy link
Member Author

line-o commented Jun 12, 2024

No it isn't. It does not fully address my review and is still missing several things.

I would need some more info on what you think is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants