Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry for curl requests #223

Merged
merged 8 commits into from
Jul 11, 2022
Merged

Add retry for curl requests #223

merged 8 commits into from
Jul 11, 2022

Conversation

CodyCBakerPhD
Copy link
Collaborator

fix #197

Attempting to resolve curl requests errors to S3 paths, up to max_retries using the suggested exponential back-off

@CodyCBakerPhD CodyCBakerPhD self-assigned this Jul 11, 2022
@CodyCBakerPhD CodyCBakerPhD added the category: bug errors in the code or code behavior label Jul 11, 2022
@bendichter
Copy link
Contributor

I think this looks good but perhaps we need an importable function that can help users do this

@CodyCBakerPhD
Copy link
Collaborator Author

I think this looks good but perhaps we need an importable function that can help users do this

I was toying around with some ways to do that; it was proving a tad tricky since the user (or ourselves) might wish to call any executable code on the open io handling the S3 requests - but my idea was to make a context class that does something like

with retry_s3_call(max_retries=10):
    # try to do something on an open io

where the thing you try to do could be as simple as nwbfile = io.read() or run_some_check_on_nwbfile(nwbfile=nwbfile) etc, but wasn't sure how to construct such a context capable of running arbitrary code multiple times (or a function for that matter).

@CodyCBakerPhD
Copy link
Collaborator Author

Though I suppose that a context is simply not possible for that kind of thing... https://stackoverflow.com/a/64770597

Perhaps a simple function like

def try_until_success(command, attempts=1):
    for _ in range(attempts):
        try:
            return command()
        except Exception as exc:
            err = exc

    raise err

might work (from https://stackoverflow.com/a/64770679)

@bendichter
Copy link
Contributor

bendichter commented Jul 11, 2022

yeah that would be cool but I don't know if that's possible. Usually contexts pass in an object that is used within the context, e.g.

with open("fpath", "r") as f:
    # use f here

I think the best approach might be to do two things:

  1. adjust NWBHDF5IO to use retries if driver == "ros3"
  2. create a robust_read function that can be used like
robust_read(nwbfile.acquisition["ElectricalSeries"].data, slice(1, 10), max_retries=5)

what do you think?

@bendichter
Copy link
Contributor

One tricky thing is that the HDF5 Datasets themselves don't have information about whether they are s3 or local. I can't think of a way to automatically switch to retrying if ros3 is being used. I suppose we could just always retry, but even that is tricky because we are interfacing with the h5py.Dataset objects directly. There are lots of ways we could potentially approach this. Let's discuss in our meeting today.

@CodyCBakerPhD
Copy link
Collaborator Author

I suppose we could just always retry

That's what I've done in latest commit using a very similar function to what you outlined in 2. above. It shouldn't hurt in the non-S3 case since it will just execute and return the output nonetheless on first try (or first retry).

Could also put in switch at the corresponding deployment levels to determine whether to do it at the classic way or S3 way.

Will discuss this later, then.

@CodyCBakerPhD
Copy link
Collaborator Author

CodyCBakerPhD commented Jul 11, 2022

Summary: this patch is fine for now, it hits most of the major error points where this can occur - long term solution is to support s3fs through PyNWB and that should automatically handle this kind of thing much better (and offer improved performance) but only downside is Windows would still have to go through ros3.

Otherwise, a more sophisticated approach would be a decorator that wraps functions in the registry (and io.read()) and does more or less the same thing as this utility but would therefore allow less awkward function arg/kwarg passing.

@CodyCBakerPhD CodyCBakerPhD merged commit 1f4010c into dev Jul 11, 2022
@CodyCBakerPhD CodyCBakerPhD deleted the add_curl_retry branch July 11, 2022 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: bug errors in the code or code behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Can't read data (curl cannot perform request)
2 participants