Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a warning when a regex in a provider isn't valid #3258

Merged
merged 5 commits into from
Mar 29, 2019

Conversation

driusan
Copy link
Contributor

@driusan driusan commented Mar 22, 2019

When a provider has an invalid regular expression, datalad
currently provides obtuse error messages like
"bad character range [re.py_compile:251] (error)".

This catches any exceptions while compiling the re to explicitly
warn the user that there is an invalid regular expression in the
provider.

Unfortunately, there doesn't seem to be any way to provide the
path to the provider config file at this point in the code, so we
simply give the provider name.

When a provider has an invalid regular expression, datalad
currently provides obtuse error messages like
"bad character range [re.py_compile:251] (error)".

This catches any exceptions while compiling the re to explicitly
warn the user that there is an invalid regular expression in the
provider.

Unfortunately, there doesn't seem to be any way to provide the
path to the provider config file at this point in the code, so we
simply give the provider name.
if re.match(url_re, url):
lgr.debug("Returning provider %s for url %s", provider, url)
matching_providers.append(provider)
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't exception more specific? but oh my:

$> python -c "import re; re.match('(?<ZZ>.*)', '1')"   
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: syntax error

remnants of the past syntax (pre 2.5?)

Overall, here it is - re.error

import re; 

try:
  re.match('(?<ZZ>.*)', '1')
except re.error as exc:
  print(type(exc))

Please add a unittest to datalad/downloaders/tests/test_providers.py to assure that no exception is raised and the warning is logged ( with swallow_logs(new_level=logging.WARNING) as cml: ...assert_in(...) - search for those to see how it is done)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't exception more specific? but oh my

The reason I just caught all exceptions is that I can't think of any type of exception that would fail to compile the regex that you wouldn't want to turn into a warning. Regardless of how it's invalid, it should probably warn.

Please add a unittest to [...]

Sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is just a wrong regex - it is one thing. But may be it managed to run out of RAM and issued MemoryError right then -- should we just keep plowing forward? If user preseed Ctrl-C - should we ignore and just skip this particular regex issuing a warning that regex was invalid? ;-)
Regardless how unlikely, unless there is really an intention to ignore all errors, better to be specific. If something else unexpected happens (e.g. KeyboardInterrupt) - it might need to be handled "upstairs".

@yarikoptic
Copy link
Member

Unfortunately, there doesn't seem to be any way to provide the
path to the provider config file at this point in the code, so we
simply give the provider name.

yeah, ATM IIRC they are all identified by their names, not config paths. I guess could be changed/extended if there is a use case for it

@driusan
Copy link
Contributor Author

driusan commented Mar 22, 2019

yeah, ATM IIRC they are all identified by their names, not config paths.

It would be especially useful for the error below this where it says multiple configs are found and it's using the first one. For this regex error it would be useful for the user to know the path, but the name should be enough to hunt it down, it's just not as easy.

@codecov
Copy link

codecov bot commented Mar 22, 2019

Codecov Report

Merging #3258 into master will decrease coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3258      +/-   ##
==========================================
- Coverage   91.02%   90.94%   -0.08%     
==========================================
  Files         263      263              
  Lines       34125    34240     +115     
==========================================
+ Hits        31062    31140      +78     
- Misses       3063     3100      +37
Impacted Files Coverage Δ
datalad/downloaders/providers.py 95.47% <100%> (+0.06%) ⬆️
datalad/downloaders/tests/test_providers.py 100% <100%> (ø) ⬆️
datalad/cmdline/main.py 77.13% <0%> (-2.35%) ⬇️
datalad/downloaders/tests/test_http.py 88.86% <0%> (-2.23%) ⬇️
datalad/plugin/tests/test_addurls.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bb31b3...9c803d0. Read the comment docs.

Dave MacFarlane added 4 commits March 22, 2019 16:17
Attempting to provide the error message from the
exception was too ambitious. The exception is not
the same in Python 2.7 and Python 3.
@yarikoptic
Copy link
Member

sweet - thanks!

@yarikoptic yarikoptic merged commit 6d1e534 into datalad:master Mar 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants