-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UX: Better message when ORA does not find store via HTTP #6535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR has no information on the rational of this particular approach for a fix, so I am wondering why the exception handler is implemented in verify_store()
and not in _get_version_info()
. This present fix solves the problem for the usage pattern directly associated with the bug report, but leaves it in place for usage via verify_ds_in_store()
.
IMHO _get_version_config()
must document and homogenize its exception behavior. The FileNotFoundError
error is already handled, as is the UnkownLayoutVersion
, as is the NoLayoutVersion
case (none documented either). I don't see why a very similar scenario now is handled with yet another exception type.
If that is necessary, why is the handling for HTTP different from other transports? Both UnkownLayoutVersion
, and NoLayoutVersion
are handled gracefully (read-only), and FileNotFoundError
is subsequently tests if there even is a store. Not so for HTTP which is left to crash with "Invalid store" -- although both intermittent connections issues, and "no store" are valid causes of such behavior (again none of these differences documented).
Why must a user see
[INFO] Fetching 'http://studyforrest.ds.inm7.de/ria-layout-version'
?
Isn't this no more than an implementation detail? Whenever things work, it carries no meaning. When things do not work, the exception has the info on what went wrong.
This is the downloader unconditionally logging at INFO level. Not sure what to do about that ATM. I think this needs to be addressed in a different PR, where the downloaders need to be (somewhat) changed to account for the fact that a download may be an internal thing and that this kind of messaging needs to be left to higher-level code. Re difference between HTTP and other access methods: The reason was that we can't easily distinguish what went wrong, since the exception we get in the HTTP case doesn't come as a proper chain, but has its chain folded into its string. So, a |
383e195
to
2fe29c7
Compare
@mih : How about this: 2fe29c7 ? Note, that messaging for
Agree on that? |
I grew a bit tired re waiting over the years: #6564 -- I think the logging part it is a no brainer, it just needs to be done. I don't see why it would need to wait for even better errors. |
No remarks on the code, but just a note that I would appreciate changes to the looks of cloning the studyforrest dataset sooner rather than later. Its my go-to example throughout talks, tutorials, and during workshops (the next one is on Wednesday), and the current way a clone looks like really makes the impression that something went wrong or broke for someone who just executed their first clone... :/ |
With this PR merged into the present mainline it now looks like
I think the volume of reasoning on the error handling in the patch is an indication that this is not in a good state. But given the urgency, we should move. We just need to fix the actual error message first. The store at It might also make sense to not use a This is all just speculation, because I honestly don't see why
|
Yes. The fundamental issue is SSH, which doesn't parse stderr currently. That's b/c originally we didn't do any validation of store/datasets but simply tried to access keys and treat failures as "key not available" and be done. That's why we only get a RemoteError no matter how things failed. Will adjust the message. |
I must be missing a large chunk of the problem. I am taking about the unprotected use of Now you are saying this (in this context):
AFAICS: This method ( Outside this file it is used in None of which indicates:
What am I missing? |
2fe29c7
to
973daf4
Compare
Now:
If it happens at the dataset level after the store validation already succeded, the message would reflect that and report |
This let's ORA account for HTTP failures during initremote/enableremote and report an inaccessible store (and its URL). Otherwise we'd spit out a possibly gigantic exception traceback that is not helpful for users. With support for annexremote's logging framework we'll have other means for detailed messaging soon. The approach implemented here tries to have a somewhat uniform handling across different IO classes by raising FileNotFoundError from within `HTTPRemoteIO.read_file` if we encountered a 404, while all other exceptions coming from the downloaders are bubbling up as they indicate a more general issue. AccessDeniedError and AccessFailedError are handled by the special remote class to provide a leaner, user-facing message when this happens during the verification of the store itself or the dataset within. Everything else will lead to failure as before.
973daf4
to
acf586f
Compare
Unify the way `RemoteError` renders the first level of an exception cause chain with what `format_exception_with_cause()` does down the line.
Reasoning in commit message.
What's curious to me right now, is that this
Appears to come from If you are fine with this approach, I'll squash. |
(Somehwhat) unify exceptions raised by the `read_file` methods of the IO classes. Use proper exception chains all the way up, so that `read_file`, `_get_version_config` and the `verify_store/verify_ds_in_store` methods amend the exception chain with their respective higher-level assessment. Goal is to have maintain the full chain for debugging but have it's start (first part of user-facing message) be comprehensible and hopefully actionable for (non-technical) users. For example: A message starting with "RIA store unavailable. -caused by- ..." may make clearer whether this is something to report to the store maintainer or something they can do anything about. While, of course this depends on the cause, I think it's much easier to parse when the first "high-level" assessment isn't something to be concluded from a long chain of technical gibberish. Furthermore, as the cause one would first see "Failed to access http://studyforrest.ds.inm7.de/ria-layout-version" indicating the RIA store's address (incl. access method) instead of the first thing being a beast like "Failed to establish a new session 1 times. -caused by- HTTPConnectionPool(host='studyforrest.ds.inm7.de', port=80): Max retries exceeded with url: /ria-layout-version (Caused by NewConnectionError( ..." I think the chain of causes should only be thrown out after we have proper logging in special remotes, where this part then could move to.
4d89cdd
to
e6b22d1
Compare
Code Climate has analyzed commit e6b22d1 and detected 6 issues on this pull request. Here's the issue category breakdown:
View more on Code Climate. |
Codecov Report
@@ Coverage Diff @@
## master #6535 +/- ##
==========================================
+ Coverage 91.06% 91.16% +0.10%
==========================================
Files 349 351 +2
Lines 44218 44288 +70
==========================================
+ Hits 40265 40375 +110
+ Misses 3953 3913 -40
Continue to review full report at Codecov.
|
TL;DR final change was: From 0.15 [INFO ] [INFO] Fetching 'http://studyforrest.ds.inm7.de/ria-layout-version' To 0.16 RIA store unavailable. -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to access http://studyforrest.ds.inm7.de/ria-layout-version -caused by- Failed to establish a new session 1 times. -caused by- HTTPConnectionPool(host='studyforrest.ds.inm7.de', port=80): Max retries exceeded with url: /ria-layout-version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4719758070>: Failed to establish a new connection: [Errno -2] Name or service not known')) Not sure how I feel about this. Certainly the "Failed to establish a new session 1 times." made little sense, but now we have a detailed technical log about a "failure" that any user will simply have to endure and cannot do anyhing about. The dataset has a remote configured that is not publicly accessible. |
@mih reported on the following current behavior:
This let's ORA account for HTTP failures during initremote/enableremote and report an invalid store (and its URL).
With support for
annexremote
's logging framework we'll have other means for detailed messaging soon.Please note, that the fact we get an
ok
result is not ORA's fault. It'sgit annex init
that actually succeeds (zero-exit), while ORA now correctly reportsINITREMOTE-FAILURE Invalid store at ria+http://studyforrest.ds.inm7.de
:`git clone` + `git annex --debug init`
Overall the above call would now look like this:
Changelog
🐛 Bug Fixes