Skip to content

fix(archiver): ensure items are returned to their list if any failure occurs#101

Merged
ljgray merged 2 commits intomasterfrom
ljg/archiver-catch-error
Jul 13, 2023
Merged

fix(archiver): ensure items are returned to their list if any failure occurs#101
ljgray merged 2 commits intomasterfrom
ljg/archiver-catch-error

Conversation

@ljgray
Copy link
Copy Markdown
Contributor

@ljgray ljgray commented Jul 12, 2023

Issue: if an error occurs after a dataset/state has been popped from a list but before it has been added to the chime database (such as a connection error with the database), that dataset/state will be lost from the list and won't ever be added to the database.

This just wraps the entire process of popping a dataset/state from a list and adding it to the chime database in a try-except block to make sure the dataset/state gets pushed back to the list if any sort of error occurs. We could consider rewriting this logic, but I think this is a minimal fix for it which shouldn't really affect any other behaviour

@ljgray ljgray requested review from jrs65 and ketiltrout July 12, 2023 19:59
Copy link
Copy Markdown
Member

@ketiltrout ketiltrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a log message (thought it doesn't have to be the one I suggest) to help us detect when we actually hit this except block. Other than that, I think it's fine.

Comment thread comet/archiver.py Outdated
@ljgray
Copy link
Copy Markdown
Contributor Author

ljgray commented Jul 13, 2023

I've updated it to log any uncaught exceptions, but also to separately wrap the pushback in a try-except which will dump the comet request string to the logs if the pushback fails. This might help us recover manually if there's a redis issue, although it's hard to say how useful that will be depending on what's actually happened

@ljgray ljgray requested a review from ketiltrout July 13, 2023 18:55
Copy link
Copy Markdown
Member

@ketiltrout ketiltrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better.

@ljgray
Copy link
Copy Markdown
Contributor Author

ljgray commented Jul 13, 2023

I'll merge now, but when should we deploy?

@ljgray ljgray merged commit a376eae into master Jul 13, 2023
@ljgray ljgray deleted the ljg/archiver-catch-error branch July 13, 2023 21:02
@ketiltrout
Copy link
Copy Markdown
Member

I think it's fairly safe to do so at any time because it doesn't affect the running instrument, but it's @mondana 's call.

@jrs65
Copy link
Copy Markdown
Contributor

jrs65 commented Jul 14, 2023

@ljgray Longer term it's probably more robust to do something like this:

  • Add an extra list for states being processed (probably should end up with at most one entry in it)
  • Have the main loop first process states in this list (i.e. add them to the database). One the state is processed successfully use LREM to remove it from the processing list.
  • After this has been done have the main loop use BLMOVE to wait for a new list item and move it to the processing list in a single call.

That should be robust to connection failures both for MySQL and Redis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants