-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More flexibility with wildcards in selection #2551
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2551 +/- ##
===========================================
- Coverage 90.68% 90.68% -0.01%
===========================================
Files 169 169
Lines 22833 22828 -5
Branches 2940 2939 -1
===========================================
- Hits 20707 20702 -5
Misses 1540 1540
Partials 586 586 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thanks for adding tests. One question, is there a performance difference? Ie if you do select_atoms('name H?')
on a large (~100k atoms) system, how does the timing of this compare?
assert ag == ag_wild | ||
|
||
def test_wildcard_double_selection(self, universe): | ||
ag = universe.select_atoms('resname ASN or resname ASP or resname HSD') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a shortcut here is resname ASN ASP HSD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will also need a CHANGELOG entry and adding yourself to AUTHORS
Thank you for the feedback. I implemented the suggested changes and added two more corner case tests that I could think of. As for the performance, I measured it using a bigger system (340K atoms) and the old version which doesn't support different wildcards performed slightly better (0.6s vs 0.9s). I am not aware of the underlying implementation of fnmatch but we could also try using the python re module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool, I just wanted to check we weren't tanking performance by making this change. Couple tweaks needed then we'll be good to go
package/CHANGELOG
Outdated
@@ -15,7 +15,7 @@ The rules for this file: | |||
------------------------------------------------------------------------------ | |||
mm/dd/yy richardjgowers, kain88-de, lilyminium, p-j-smith, bdice, joaomcteixeira, | |||
PicoCentauri, davidercruz, jbarnoud, RMeli, IAlibay, mtiberti, CCook96, | |||
Yuan-Yu, xiki-tempula | |||
Yuan-Yu, xiki-tempula, Iv-Hristov |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So because this is your first ever contribution, you also need to add your name to the AUTHORS
file
package/CHANGELOG
Outdated
@@ -56,6 +56,7 @@ Fixes | |||
* Added parmed to setup.py | |||
|
|||
Enhancements | |||
* Changed selection wildcards to support multiple wildcards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference the initial issue #
package/MDAnalysis/core/selection.py
Outdated
@@ -499,8 +500,7 @@ def apply(self, group): | |||
class StringSelection(Selection): | |||
"""Selections based on text attributes | |||
|
|||
Supports the use of one wildcard at the start, | |||
end, and middle of strings | |||
Supports multiple wildcards, based on fnmatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a .. versionchanged::
thing
package/MDAnalysis/core/selection.py
Outdated
mask |= np.char.startswith(values, val[:wc_pos]) | ||
mask &= np.char.endswith(values, val[wc_pos+1:]) | ||
|
||
values = getattr(group, self.field).astype(np.str_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I think I was calling .astype(np.str_)
here because we were pushing it into a np.char
function. Now we're using fnmatch
this likely isn't necessary, try removing this?
@IAlibay @richardjgowers Thank you for the feedback! I have combined all the wildcard tests to eliminate code duplication as Irfan suggested. |
package/MDAnalysis/core/selection.py
Outdated
Supports the use of one wildcard at the start, | ||
end, and middle of strings | ||
.. versionchanged:: 0.21 | ||
Supports multiple wildcards, based on fnmatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Iv-Hristov the text entry in a versionchanged
needs to be indented (this is what is causing Travis to fail).
So:
.. versionchanged:: 1.0.0
Supports...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Iv-Hristov this looks like an excellent contribution.
But the documentation is missing. fnmatch extends the simple *
-globbing that we had before. It is very important that this is documented. Find the parts in the documentation that explained the globbing and update them (e.g. Selections should now get its own section on pattern matching).
@lilyminium will need to know this, too, for the AtomSelection Language section of the User Guide.
I know that this seems a lot of extra work for only a few lines of code. But that's why we make you do PRs, so that you get a real idea what it means to produce software that are used by many people.
@orbeckst Thank you for the feedback! I will make sure to do that later today. |
I think I documented all the functionality and added two more tests. However, I am having a problem where pytest.mark.parametrize complains if I try and use square brackets as an input because it expects a string. I tried a few escape characters such as '', '\', '\Q... \E' but nothing seemed to work. Does anyone have any experience with what escape sequence might work so that I can squish all the tests together? This is what gives the error "tuple object not callable":
` |
You missed the comma after line EDIT: Good practice is to have a comma even after the last element so that you can easily add more elements to the list without the highly informative "tuple object not callable" error ;-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, just one typo in the docs.
Pattern matching | ||
---------------- | ||
|
||
The pattern matching notation described bellow is used to specify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
below
---------------- | ||
|
||
The pattern matching notation described bellow is used to specify | ||
patterns for matching strings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patterns for matching strings: | |
patterns for matching strings (based on :mod:`fnmatch`): |
Congratulations @Iv-Hristov on your first merged PR. Nice contribution! |
Fixes #2436
Changes made in this Pull Request:
PR Checklist