Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue190 Add a recovery test for Compensate and AfterLRA annotations #240

Merged
merged 2 commits into from Nov 6, 2019

Conversation

xstefank
Copy link
Member

@xstefank xstefank commented Oct 7, 2019

Signed-off-by: xstefank xstefank122@gmail.com

resolves #190

@mmusgrov
Copy link
Contributor

mmusgrov commented Oct 7, 2019

I cannot find a test that validates that it is possible to register for an AfterLRA notification when the LRA is already cancelling or closing. This requirement is implicit in the spec but it is probably best to add a sentence that makes it explicit. This behaviour is required since the cancellation or closure of an LRA can take an indefinite amount of time. In contrast, note that it would never make sense to allow participants to register after the end phase has begun.

@xstefank
Copy link
Member Author

xstefank commented Oct 7, 2019

@mmusgrov since that scenario doesn't require a restart of the service, do you mind if I move it to a new issue?

@mmusgrov
Copy link
Contributor

mmusgrov commented Oct 8, 2019 via email

@xstefank
Copy link
Member Author

xstefank commented Oct 8, 2019

the mentioned scenario moved to #241. This PR is ready for review.

// wait for the timeout cancellation of the LRA while the service is still down
// Compensate should be attempted to be called while the participant service is down
try {
Thread.sleep(500);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeOut is also 500, so this is not according to the spec.

Spec around timeout (javaDoc): When this period has elapsed the LRA becomes eligible for cancellation.

This doesn't mean that when period has elapsed, the Compensate method is called.

So that means that we need to adapt the spec; "When this period has elapsed, LRA is cancelled and the Compensate method (if any) MUST be called immediate."

With the current specification, there is no deterministic way in writing such a test as this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense as when the participant or the service starting the LRA specifies timeout then this is a declaration that it will keep the resources for compensation for this period (or slightly more). If we allow Compensation for a timeout in the eventual point in the future after the timelimit is reached this information may no longer be present or relevant and thus we may get into invalid states.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we say something like when the period has elapsed the LRA should make reasonable effort to only transition to cancelled rather than mandate a specific action at a specific time? I am very hesitant to describe something as being done at an exact time. Even then, perhaps the LRA checks if it is not timed out, this check passes, then it moves to complete the LRA but due to a thread scheduling delay the actual complete call takes place after the timeout has occured. For example:
Time 1: lra.setTimeout(time3)
Time 2: if (!lraHasTimedOut(lra.getTimeout))
Time 3: // server overload, delay of an undefined period of time before next instruction, this LRA is now in theory timed out
Time 4: lra.complete() // Impossible to prevent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should not be any suggestion that a participant can ever decided ot perform it's compensation procedure unilaterally and still expect an overall atomic outcome for the transaction to be achieved.

A resource should guarantee to keep the resources available for compensation until the LRA directs it otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the period has elapsed the LRA should make reasonable effort to only transition to cancelled rather than mandate a specific action at a specific time?

Yes, this is what we currently have in the spec (or how I interpret it, we have no specific time requirements)
But this TCK test mandate that we have specific time requirements. During undeploy, wait and deploy, it assumes the timeout is completely processed.

We have to accept that when we say 'eventual consistency' and 'at a certain point in the future', etc ..., you basically loose the ability to really test this.

We have already introduced some 'tricks' to trigger the recovery (to make things testable) but maybe we should go the opposite direction and accept that parts of the spec can't be tested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xstefank @mmusgrov the way we are discussion things at the moment isn't working at all. People aren't replying to the correct answer, you can't indicate to what question you reply, Github is not even showing all comments anymore, etc...

This way, it is complete waste of my and everyone else his time.

And yes, this comment on the PR isn't either a good place for the above comment but that is the way we seem to work ....

Copy link
Contributor

@mmusgrov mmusgrov Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to reply to a specific comment github has a "Quote Reply" option in the ... pull down.

When github hides comments it adds a blue clickable "Load More" where it has hidden them. Clicking on that will expose them again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdebusscher Part of the problem is that we are fundamentally at odds on how this spec should work which generates lots of discussion. Are the issues a better way to discuss things? If so then as soon as it becomes clear on a PR that there are disagreements we should move back onto the issue for further discussion. Or do you have another suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make assumption on LraRecoveryService to be instantiable without CDI?

@xstefank I think it is okay for the TCK but I would prefer to avoid reliance on CDI in the spec itself since that constrains which frameworks/containers we can integrate with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdebusscher let's take the discussion on how to better discuss issues on gitter please. I agree that a lot of at least my PRs often diverge into some different issues.

@mmusgrov

Can we make assumption on LraRecoveryService to be instantiable without CDI?

@xstefank I think it is okay for the TCK but I would prefer to avoid reliance on CDI in the spec itself since that constrains which frameworks/containers we can integrate with.

This is still only TCK. Basically the original idea was to have it instantianable through service loader but I changed to only be a CDI bean. Now we need to go a step back to something similar as service loader. I think that you actually agree with the original statement (Can we make assumption on LraRecoveryService to be instantiable without CDI?)

Copy link
Contributor

@mmusgrov mmusgrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few inline comments. The main one was

Can you also include documentation that describes what each test is doing and how it does it.

If you can do that then I will review the rest of the new code.

@xstefank
Copy link
Member Author

@mmusgrov ready for review again.

Copy link
Contributor

@mmusgrov mmusgrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have nearly finished my review but this is what I have so far. I will finish of looking at the rest of the code either later today or over the weekend. But the PR looks goods so far.

// wait for the timeout cancellation of the LRA while the service is still down
// Compensate should be attempted to be called while the participant service is down
try {
Thread.sleep(500);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a short delay is probably enough although a possible implementation could be to mark the LRA as Cancelling when the timer expires and it then cancels the LRA on its next scheduled recovery pass which could be a while. Maybe the safest option is to "trigger a recovery pass".

Signed-off-by: xstefank <xstefank122@gmail.com>
Copy link
Member

@rdebusscher rdebusscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we discuss isn't efficient, useful and possibly some issues aren't properly covered.

Anyway, I'm just approving this since I can't remove myself as reviewer.

@mmusgrov
Copy link
Contributor

mmusgrov commented Nov 4, 2019

The way we discuss isn't efficient, useful and possibly some issues aren't properly covered.

I agree but the tool isn't the best. Do we have suggestions on how to improve it?

Anyway, I'm just approving this since I can't remove myself as reviewer.

You are allowed to abstain. Before you joined the team we asked the broader community for permission to merge PRs.

@mmusgrov mmusgrov merged commit a29866d into eclipse:master Nov 6, 2019
import java.net.URL;

@ApplicationScoped
public class LRATestService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment to get a link - please ignore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a recovery test for Compensate and AfterLRA annotations
5 participants