7940 stop harvest in progress #9187

Merged 7 commits on Dec 12, 2022

landreev
Contributor

What this PR does / why we need it:

This provides a mechanism for a Dataverse sysadmin to stop a long-running harvesting job in progress.

Which issue(s) this PR closes:

Closes #7940

Special notes for your reviewer:

Suggestions on how to test this:

will update w/ instructions

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

coveralls commented Nov 23, 2022

Coverage increased (+0.02%) to 20.0% when pulling 8e70d99 on 7940-stop-harvest-in-progress into 6b1ffa7 on develop.

scolapasta self-assigned this Nov 29, 2022

mreekie commented Dec 1, 2022

This just has the normal review and QA left.
It's hard to know how to size what's left as it's waiting for review and hasn't been reviewed in detail.
It looks like the review may be straightforward.

mreekie added the Size: 10 label (a percentage of a sprint; 7 hours) on Dec 1, 2022, after adding and then removing Size: 30 (21 hours; formerly size:33) and Size: 3 (2.1 hours).
Member

qqmyers left a comment

Looks like it will work overall. I think there's a cut/paste error and I suggested a doc update but otherwise looks good to me. There's also some other cleanup that's not directly related to stopping where I had another question.

sudo touch /usr/local/payara5/glassfish/domains/domain1/logs/stopharvest_bigarchive.70916
sudo chown dataverse /usr/local/payara5/glassfish/domains/domain1/logs/stopharvest_bigarchive.70916

We recommend that stop stop any running harvesting jobs using this mechanism if you need to restart the application server, otherwise the ongoing harvest will be killed, but may be left marked as if it's still in progress in the database.
Member

Typo: 'stop stop'. The sentence was also a little unclear. Suggested change: 'Note: If the application server is stopped and restarted, any running harvesting jobs will be killed but may remain marked as in progress in the database. We thus recommend using the mechanism here to stop the harvesting prior to a server restart.'

Contributor Author

Thank you, I adopted your version almost verbatim.
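
For illustration, the stop-file convention from the documentation excerpt above can be sketched in Java. This is a minimal sketch and not the code from this PR: the class and method names (`StopFileSketch`, `stopFileName`, `stopRequested`) are hypothetical, the name pattern is inferred from the single example `stopharvest_bigarchive.70916`, and the meaning of the numeric suffix is not stated in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; names and structure are assumptions, not the PR's code.
class StopFileSketch {

    // Builds a stop-file name following the pattern seen in the example:
    // "stopharvest_" + client nickname + "." + numeric suffix.
    static String stopFileName(String clientNickname, long suffix) {
        return "stopharvest_" + clientNickname + "." + suffix;
    }

    // Returns true when a matching stop file exists in the given logs directory.
    // A harvesting loop could call this between records and finish early
    // when it returns true.
    static boolean stopRequested(Path logsDir, String clientNickname, long suffix) {
        return Files.exists(logsDir.resolve(stopFileName(clientNickname, suffix)));
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the application server's logs directory.
        Path logsDir = Files.createTempDirectory("logs");

        // No stop file has been created yet.
        System.out.println(stopRequested(logsDir, "bigarchive", 70916L));

        // Equivalent of the `touch` command in the documentation excerpt.
        Files.createFile(logsDir.resolve(stopFileName("bigarchive", 70916L)));
        System.out.println(stopRequested(logsDir, "bigarchive", 70916L));
    }
}
```

Because the client nickname is embedded in the file name, a nickname that itself contains underscores (for example `stop_harvest_demo`) still has to be appended after the `stopharvest_` prefix unchanged.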

currentRun.setFailedDatasetCount(new Long(failedCount));
currentRun.setDeletedDatasetCount(new Long(deletedCount));
}
recordHarvestJobStatus(hcId, currentTime, harvestedCount, failedCount, deletedCount, ClientHarvestRun.RunResultType.INTERRUPTED);
Member

Not sure I understand - is this a typo? The setHarvestSuccess method is using RunResultType.INTERRUPTED?

Contributor Author

That was of course a typo, thank you.
(I'm working on automated tests right now that would catch this)

+ " dvobject o WHERE d.id = o.id AND o.owner_id in ("
+ dvs + ")").getSingleResult();
return (Long) em.createNativeQuery("SELECT count(d.id) FROM dataset d "
+ " WHERE d.harvestingclient_id in ("
Member

I can see where the join of dataset and dvobject isn't needed, but why the switch to harvestingclient_id? Was the old code just wrong or is this a typo?

Contributor Author

landreev commented Dec 6, 2022

(Sorry for the delay)
The new query is simpler, and correct: all harvested datasets are uniquely tied to their specific harvesting configurations (clients).

The old query must have followed the logic from pre-Dataverse 4 times; back then, harvested datasets ("studies") were similarly tied to their "harvesting dataverses". (A harvesting dataverse could only contain harvested studies, all harvested from the same place.) That old query still produces the expected results in our prod., because we still have dedicated collections for each harvested source. Strictly speaking, the dvobject-dataset join in it was useful, although o.dtype='Dataset' would achieve the same result. But I just remembered that dedicated collections are no longer required: in Dataverse 4+ it is possible to harvest into a collection that has other content, i.e. local datasets and/or datasets harvested from other places. So, shorter answer: yes, that query was wrong.
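
To make the difference concrete, here is a small in-memory Java sketch of the two counting strategies described above. It is an illustration under assumptions, not the project's code: the `Dataset` class, its fields, and the ids are hypothetical stand-ins for the `owner_id` and `harvestingclient_id` columns that appear in the queries.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the data model here is a simplified assumption.
class HarvestCountSketch {

    // Simplified stand-in for a dataset row.
    static final class Dataset {
        final long id;
        final long ownerId;            // collection that holds the dataset
        final Long harvestingClientId; // null for a local (non-harvested) dataset

        Dataset(long id, long ownerId, Long harvestingClientId) {
            this.id = id;
            this.ownerId = ownerId;
            this.harvestingClientId = harvestingClientId;
        }
    }

    // Old logic: count every dataset owned by the given collections.
    static long countByOwner(List<Dataset> datasets, Set<Long> ownerIds) {
        return datasets.stream()
                .filter(d -> ownerIds.contains(d.ownerId))
                .count();
    }

    // New logic: count only datasets tied to the given harvesting clients.
    static long countByClient(List<Dataset> datasets, Set<Long> clientIds) {
        return datasets.stream()
                .filter(d -> d.harvestingClientId != null
                        && clientIds.contains(d.harvestingClientId))
                .count();
    }

    public static void main(String[] args) {
        // Collection 10 holds mixed content, which Dataverse 4+ allows.
        List<Dataset> datasets = List.of(
                new Dataset(1, 10, 100L),   // harvested by client 100
                new Dataset(2, 10, 100L),   // harvested by client 100
                new Dataset(3, 10, null),   // local dataset
                new Dataset(4, 10, 200L));  // harvested by a different client

        System.out.println(countByOwner(datasets, Set.of(10L)));   // prints 4
        System.out.println(countByClient(datasets, Set.of(100L))); // prints 2
    }
}
```

With a dedicated collection per harvested source the two counts agree, which is why the old query could keep producing expected results; they diverge as soon as a collection contains local datasets or datasets from more than one client.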

Contributor

kcondon commented Dec 8, 2022

Test results:

  1. This worked when halting a no-set harvest from demo but did not seem to halt a no-set harvest from prod. I will retest.
    I set up both a no-set prod harvest and a no-set demo harvest and started both. Halting demo first worked; halting prod afterwards did not. Is leaving the demo halt file in place interfering in some way? I removed it, but the prod harvest still continued.
    [Kevin] Update: It appears that the demo harvest was not halted but completed.

  2. I am also seeing stack trace errors in the log related to this. Perhaps we are pulling the rug out from under some process and it gets a null pointer?
    This happens some time around touching the halt file, perhaps after the chown, when the process detects it?
    ImportServiceBean_err.txt

kcondon self-assigned this Dec 9, 2022
Contributor Author

landreev commented Dec 9, 2022

Not sure if all of these problems are related to my PR.
With the "no set" harvest from demo, I suspect it may just mean that the harvest finished too fast, before the stop file was created (would make sense if the default set on demo is not large enough).
The import error is just that - the client failing to import a record it got from the server; whatever the reason for that was, it would not be related to anything in this PR (it didn't touch any of the actual harvesting/importing/etc. functionality). Some records failing to harvest should not by itself be considered a problem, as long as the harvesting job finishes, and the failure is properly recorded.

The part about the long prod. harvest not stopping with the stop file in place does sound like a real problem. So let's run that again on Monday and I'll try to diagnose what's going on.

Contributor

kcondon commented Dec 12, 2022

Retesting this am, found it was a user error around the stop file name syntax. Should be stopharvest. but I confusingly named my harvest client stop_harvest_demo so the correct file name given the syntax would be stopharvest.stop_harvest_demo. This worked.

Testing rerun behavior, i.e. the pause use case: it works, but I needed to rerun it twice. I will check again.

kcondon merged commit 6c87b39 into develop Dec 12, 2022
kcondon deleted the 7940-stop-harvest-in-progress branch December 12, 2022 18:57
pdurbin added this to the 5.13 milestone Dec 15, 2022
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[feature request] stop an harvest job in progress
7 participants