Improve gather stage error handling #21

amercader · 2013-03-14T17:27:20Z

The way queue.py handles the gather stage is very inconvenient, as it captures all exceptions that may happen in the harvester gather_stage, preventing all debugging and leaving the job in a half state.

Similarly to the refactoring on the fetch stage, there should be no exceptions caught, as harvesters should be the ones being robust enough (if there are exceptions happening, we want to see them). If gather stage returns anything that is not a list of ids, the process stops. If the harvest object ids list is empty, the process stops. gather_finished is always set up, allowing the "run" command to flag the job as Finished (we want the "run" command to do it so the harvest source status is reindexed).

Also the debug messages with thousands of harvest object ids are not very useful.

On a later stage, we could investigate into implementing retry times as in the fetch stage.

The text was updated successfully, but these errors were encountered:

See issue for full details. Basically we don't want to catch any exception at the queue.py level, as they prevent debugging. Harvesters should deal with them and return a list of ids or an empty list if no objects need to be fetched. Also improved the debug messages.

davidread · 2015-05-15T10:23:28Z

You say that one's harvester's gather_stage/fetch_stage "should be the ones being robust enough" but I think that is most unhelpful for harvester writers.

A harvester writer should be most concerned about the normal operation. They should be able to then deploy a harvester and not have to worry about having to catch every last exception type. They may not be able to simulate everything that could go wrong. For example a socket time-out raises a different exception between Python 2.6 and 2.7.

So exceptions will sometimes happen when these things are deployed - we must accept that. As it is, this will cause the harvest to stop in an incomplete state, leaving it in limbo, unable to be rerun and display no error message in the web interface.

The reason you give is to aid development. Well how about printing the exception in the log and saving a HarvestGatherError/HarvestObjectError so that the user can see there was a problem. This should be quite sufficient for a harvester developer, reduces the burden on them and much more helpful to general users.

I have some code to do this - what do you think?

amercader · 2015-05-15T13:02:26Z

I said these things a long time ago :) I guess that my reasoning at the time is that for instance your harvester gathers objects from a remote API and fails on the last page because you messed up with the loop code you want to notice this to fix it. But you are right that if the issue is flagged as new harvest error that should make it apparent.

Happy to review and merge your changes.

davidread · 2015-05-15T13:27:24Z

Cool. I have various things to be more defensive with harvesters so that we don't end up in limbo states or without errors showing.

fix for py3 also works for py2;

amercader closed this as completed May 16, 2013

amercader pushed a commit that referenced this issue Oct 8, 2021

Merge pull request #21 from GSA/json-serialization-fix

99538a4

fix for py3 also works for py2;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve gather stage error handling #21

Improve gather stage error handling #21

amercader commented Mar 14, 2013

davidread commented May 15, 2015

amercader commented May 15, 2015

davidread commented May 15, 2015

Improve gather stage error handling #21

Improve gather stage error handling #21

Comments

amercader commented Mar 14, 2013

davidread commented May 15, 2015

amercader commented May 15, 2015

davidread commented May 15, 2015