Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7159 fix restassured etc #7165

Merged
merged 4 commits into from
Aug 7, 2020
Merged

7159 fix restassured etc #7165

merged 4 commits into from
Aug 7, 2020

Conversation

landreev
Copy link
Contributor

@landreev landreev commented Aug 6, 2020

What this PR does / why we need it:
The issue was opened initially because I forgot to check in some RestAssured code (UtilIT.java, etc.) that was needed for the tests to run. (The new publish mechanism involves waiting for the dataset to unlock, if locks are present; so that was missing).
But we also discovered some non-test related issues that needed to be addressed:
There was a massive sleep statement in the service bean before calling the async. portion of the command that we forgot about. In this PR I added an onSuccess() to the PublishDataset command; the second command call is made from there, and this appears to make any sleep delay unnecessary (per suggestion from @scolapasta)
Due to some weirdness in how the onSuccess() of the second command is executed, I also realized that even after all the changes in #6918 there was still a possibility for the dataset to become unlocked before the publication was finalized. (If the indexing and/or export are taking a long time, for ex.). This was addressed by moving the unlocking code to onSuccess().
Further simplified the RestAssured changes (only UtilIT.java is modified).
Modifed PublishDatasetResult to treat "publish workflow" and "asynchronous publishing in progress" as distinct states.

Which issue(s) this PR closes:

Closes #7159

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

…for RestAssured to work;

but also some changes for the application proper to address some problematic things in the
execution framework of the publish commands. Will explain more in the PR. (#7159)
@sekmiller sekmiller self-assigned this Aug 6, 2020
@coveralls
Copy link

coveralls commented Aug 6, 2020

Coverage Status

Coverage decreased (-0.006%) to 19.54% when pulling 32756fc on 7159-fix-restassured-etc into 8223e27 on develop.

@qqmyers
Copy link
Member

qqmyers commented Aug 6, 2020

Wasn't the point (one of the points) of having onsuccess to reduce the time during which things were locked/the apparent time needed to complete? The move to unlock in onsuccess of FinalizePub would undo that. That may not be a bad thing - the dataset doesn't act fully published until the index is updated and metadata exports are available - but it does seem like a change. And, if FullTextIndexing is on, the lock may stay around longer than is reasonable (whatever that is). (If I'm waiting to make sure my publication worked and didn't fail and I have to wait for full-text indexing to complete, even though publish itself has been done for a while - is that what we'd want?)

@@ -1020,7 +1020,7 @@ public Response publishDataset(@PathParam("id") String id, @QueryParam("type") S
PublishDatasetResult res = execCommand(new PublishDatasetCommand(ds,
createDataverseRequest(user),
isMinor));
return res.isCompleted() ? ok(json(res.getDataset())) : accepted(json(res.getDataset()));
return res.isWorkflow() ? accepted(json(res.getDataset())) : ok(json(res.getDataset()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this result in a 200 when the dataset is still inprogress/publishing not yet finalized? Seems like 202 is the right code for that (as it was) and the test should be watching for a 202?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the async publication in progress it returns a 200; the presence of the "finalizing" lock is the indication of it being in progress.
To me it seemed like the 202 return code was specifically added for the workflow situation; and there may be some third party code that relies on it in that context. When the async publication was added we just used that mechanism that was there for the workflow. But using the same code for this scenario seemed wrong.
And most importantly, yes, this allowed me not to have to modify much more RestAssured code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the practicality and don't think it's a big issue in terms of any effect, but I think 202 is the right code for anything asynchronous that could fail after you get your response. If its not worth addressing now, I'd suggest a ToDo/note in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe the same code 202, but something in the body to differentiate from the "workflow" state. As it was implemented, it was returning the code 202 and {"status":"WORKFLOW...} in both cases - which was wrong.
It feels like this warrants its own issue - I'll open it.

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

@qqmyers

The primary reason this had to be done was that there is something wrong with how the execute() and onSuccess() are executed in the Finalize command specifically. (Apparently on account of how it's called from the dataset service - @scolapasta can explain it better). Normally execute() would be in its own transaction; then, by the time it's done the db update of the dataset is finalized, and the dataset/version appear published; then onSuccess() kicks in and does its thing. With Finalize, the 2 appear to run in the same transaction - so the dataset does not get updated until onSuccess is done. (this must be happening with other commands throughout the application; and we'll probably need to address it). The locks on the other hand were removed in transactions, at the end of execute(), so they would disappear before onSuccess() was done - meaning, while the dataset was still appearing as draft. I honestly wasn't thinking much about why we were indexing and exporting that way... moving the unlocking was just the easiest way to fix it.

Alternatively, once the transaction situation is fixed, the unlock can be moved back. Gustavo had a suggestion of a transaction attribute in a certain place that could fix it (for this specific command call). The question is, do you believe this is important enough to address right now, or can we live with it in v5.0?

Wasn't the point (one of the points) of having onsuccess to reduce the time during which things were locked/the apparent time needed to complete? ... That may not be a bad thing - the dataset doesn't act fully published until the index is updated and metadata exports are available - but it does seem like a change. ...

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

I'm going to try @scolapasta's suggestion for a fix, for callFinalizePublishCommandAsynchronously(). Just want to confirm first that the tests are passing in the standard environment.

@sekmiller sekmiller removed their assignment Aug 6, 2020
return checkDatasetLocks(datasetIdAsString, lockType, apiToken);
}

static Response checkDatasetLocks(String idOrPersistentId, String lockType, String apiToken) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broad question - now that finalize is always asynchronous, it looks like these tests check that the publish call succeeds and the lock goes away, but is there anything to check that the async publish succeeded rather than failed? (E.g. test that a new version exists after the lock?).

Copy link
Contributor Author

@landreev landreev Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the proper way to handle it, yes.
It didn't look like we were ever checking anything other than the return http code of that publish call. But you are right, even that was more than what we are doing now.
In many tests the success of the publish call is tested indirectly, because what the test does requires the dataset to be published. But I agree it would be prudent, to check on the version numbers before and after - or perform some other explicit checks.

@djbrooke djbrooke added this to the Dataverse 5 milestone Aug 6, 2020
Moved the unlock calls back to the end of the execute method. #7159
@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

@qqmyers @scolapasta
Adding the TransactionAttributeType.SUPPORTS to the dataset service method appears to have fixed it with the finalize command. I have moved the unlocking code back; and ran more testing.
Once again, we'll have to look for similar situations elsewhere where commands are executed from service beans. Or even from the API beans that (for some reason?) have the @Stateless annotation.

@scolapasta
Copy link
Contributor

So is this ready for review again?

And in this scenario is there still a chance it gets unlocked before the execute transaction completes? It's probably ok for now, but I wonder if we need to still somehow be after the execute transaction completes (but not onsuccess because we need it to happen even if there's a failure.

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

Seeing how it's the very last thing in execute() - I don't think so? I mean, not in any way that could encourage or enable a user to try to publish the dataset again, or to edit it etc. while the original transaction is still being executed; in a practical situation.

And in this scenario is there still a chance it gets unlocked before the execute transaction completes? It's probably ok for now, but I wonder if we need to still somehow be after the execute transaction completes (but not onsuccess because we need it to happen even if there's a failure.

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

So is this ready for review again?

I think so. I'm going to open a new issue for the other thing that was brought up. Changing the return code of the publish API; and the handling of it in RestAssured, with some added checks on the success of the operation. That's 2 things really - but I believe they should be handled in the same issue.

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

Of course one test failed on the last full jenkins run:

edu.harvard.iq.dataverse.api.FileMetadataIT.testJsonParserWithDirectoryLabels
Multiple Failures (2 failures)
	java.lang.AssertionError: Expected status code <201> doesn't match actual status code <500>.

	java.lang.AssertionError: Expected status code <200> doesn't match actual status code <404>.

is this anything we've seen before, as an intermittent thing?

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

Needless to say, it passes on my own system consistently.
If I need to insert a 1 second sleep in there, I will.

@scolapasta
Copy link
Contributor

I have not seen those before. If you'd like I can spin up an instance of two to see if it's consistent.

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

We can just keep running it at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/view/change-requests/job/PR-7165

The test doesn't try to publish any datasets.

I have not seen those before. If you'd like I can spin up an instance of two to see if it's consistent.

Copy link
Contributor

@scolapasta scolapasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some minor comments; nothing critical.

try {
Thread.sleep(15000);
Thread.sleep(1000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try it without this? I'd love to get this removed, if we don't actually need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I leave it in place?
I initially removed it completely; but I'm just afraid to take it out. Publishing appears to be working (at least I haven't seen it fail yet). But I believe if it happens instantly/very fast, on small datasets w/ fast registration, etc., it can confuse the autorefresh on the page.

dataset = ((PublishDatasetResult) r).getDataset();
}

if (dataset != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is just to be extra safe, but would this ever be false? (since we're already in OnSuccess, that means the execute succeeded, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in case something else still goes wrong, in some other way? - idk.

@qqmyers
Copy link
Member

qqmyers commented Aug 6, 2020

@landreev - not sure about the 500 error but the 404 could be from things now being asynchronous - the new dataset isn't guaranteed to be there by the next call anymore so could get a 404, and it could be a race - sometimes it works.

@landreev
Copy link
Contributor Author

landreev commented Aug 6, 2020

@landreev - not sure about the 500 error but the 404 could be from things now being asynchronous - the new dataset isn't guaranteed to be there by the next call anymore so could get a 404, and it could be a race - sometimes it works.

@qqmyers Except none of the datasets used in FilemetadataIT appear to be getting published. Unless it's done in some hidden indirect way that I'm not seeing... but I don't think so. (?)

Well, it worked on the next try, after the last commit. Run number 3 is blue again.

@qqmyers
Copy link
Member

qqmyers commented Aug 6, 2020

Good point - I was thinking create but that's not async/wasn't changed. Your change of the log text must have done it!

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Code Review 🦁 to QA 🔎✅ Aug 6, 2020
@kcondon kcondon self-assigned this Aug 7, 2020
@kcondon kcondon merged commit c7bbd8c into develop Aug 7, 2020
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA 🔎✅ to Done 🚀 Aug 7, 2020
@kcondon kcondon deleted the 7159-fix-restassured-etc branch August 7, 2020 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Fix RestAssured tests broken by #6918
9 participants