More flexibility with wildcards in selection #2551

Iv-Hristov · 2020-02-25T22:17:24Z

Fixes #2436

Changes made in this Pull Request:

Selection strings changed to use fnmatch. This now allows for more flexible wildcard usage as well as for using multiple wildcards at once.
Added two new tests to match the new functionality.
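
As a rough illustration of what fnmatch-based matching allows, the sketch below applies a few of the wildcard patterns from this PR's tests to a plain list of residue names. The `resnames` list and the `matching` helper are hypothetical stand-ins, not MDAnalysis code, and `fnmatchcase` is used here (an assumption) so that the result does not depend on the platform's case rules.

```python
from fnmatch import fnmatchcase

# Hypothetical residue names; a real Universe would supply these.
resnames = ["ASN", "ASP", "GLN", "GLU", "HSD", "LEU", "LYS", "MET", "THR", "TYR"]


def matching(pattern):
    """Return the residue names that match a shell-style pattern."""
    return [name for name in resnames if fnmatchcase(name, pattern)]


single_star = matching("T*R")   # one wildcard in the middle
star_and_q = matching("*S?")    # two different wildcards at once
double_star = matching("L**")   # repeated wildcard
around = matching("*M*")        # wildcard on both sides

for pattern in ("T*R", "*S?", "L**", "*M*"):
    print(pattern, matching(pattern))
```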

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

codecov · 2020-02-25T23:31:53Z

Codecov Report

Merging #2551 into develop will decrease coverage by 0.00%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #2551      +/-   ##
===========================================
- Coverage    90.68%   90.68%   -0.01%     
===========================================
  Files          169      169              
  Lines        22833    22828       -5     
  Branches      2940     2939       -1     
===========================================
- Hits         20707    20702       -5     
  Misses        1540     1540              
  Partials       586      586

Last update bc2f1a5...77443a3.

richardjgowers

This looks good, thanks for adding tests. One question: is there a performance difference? I.e., if you do select_atoms('name H?') on a large (~100k atoms) system, how does the timing of this compare?

richardjgowers · 2020-02-26T10:19:05Z

testsuite/MDAnalysisTests/core/test_atomselections.py

+        assert ag == ag_wild
+
+    def test_wildcard_double_selection(self, universe):
+        ag = universe.select_atoms('resname ASN or resname ASP or resname HSD')


a shortcut here is resname ASN ASP HSD

richardjgowers

Will also need a CHANGELOG entry and adding yourself to AUTHORS

Iv-Hristov · 2020-02-26T21:59:23Z

Thank you for the feedback. I implemented the suggested changes and added two more corner case tests that I could think of. As for the performance, I measured it using a bigger system (340K atoms), and the old version, which doesn't support different wildcards, performed slightly better (0.6 s vs 0.9 s). I am not aware of the underlying implementation of fnmatch, but we could also try using the Python re module.
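
One way to explore the `re` idea is sketched below: the pattern is translated once with `fnmatch.translate` and the compiled regular expression is reused for every value. The `names` list, the helper names, and the repeat count are made up for illustration; this is not the benchmark behind the numbers quoted above, and timings will vary by machine.

```python
import fnmatch
import re
import timeit

# Hypothetical atom names standing in for a large system.
names = ["H", "HA", "HB1", "CA", "N", "O", "HG2"] * 10000


def match_with_fnmatch(values, pattern):
    """Call fnmatchcase once per value."""
    return [fnmatch.fnmatchcase(v, pattern) for v in values]


def match_with_regex(values, pattern):
    """Translate the pattern once, then reuse the compiled regex."""
    regex = re.compile(fnmatch.translate(pattern))
    return [regex.match(v) is not None for v in values]


pattern = "H?"
# Both approaches should agree before their timings are compared.
assert match_with_fnmatch(names, pattern) == match_with_regex(names, pattern)

t_fnmatch = timeit.timeit(lambda: match_with_fnmatch(names, pattern), number=3)
t_regex = timeit.timeit(lambda: match_with_regex(names, pattern), number=3)
print(f"fnmatchcase: {t_fnmatch:.3f} s, compiled regex: {t_regex:.3f} s")
```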

richardjgowers

Ok cool, I just wanted to check we weren't tanking performance by making this change. Couple tweaks needed then we'll be good to go

richardjgowers · 2020-02-27T09:50:53Z

package/CHANGELOG

@@ -15,7 +15,7 @@ The rules for this file:
 ------------------------------------------------------------------------------
 mm/dd/yy richardjgowers, kain88-de, lilyminium, p-j-smith, bdice, joaomcteixeira,
         PicoCentauri, davidercruz, jbarnoud, RMeli, IAlibay, mtiberti, CCook96,
-         Yuan-Yu, xiki-tempula
+         Yuan-Yu, xiki-tempula, Iv-Hristov


So because this is your first ever contribution, you also need to add your name to the AUTHORS file

richardjgowers · 2020-02-27T09:51:02Z

package/CHANGELOG

@@ -56,6 +56,7 @@ Fixes
  * Added parmed to setup.py

 Enhancements
+  * Changed selection wildcards to support multiple wildcards


reference the initial issue #

richardjgowers · 2020-02-27T09:51:20Z

package/MDAnalysis/core/selection.py

@@ -499,8 +500,7 @@ def apply(self, group):
 class StringSelection(Selection):
    """Selections based on text attributes

-    Supports the use of one wildcard at the start, 
-    end, and middle of strings
+    Supports multiple wildcards, based on fnmatch


add a .. versionchanged:: thing

richardjgowers · 2020-02-27T09:52:48Z

package/MDAnalysis/core/selection.py

-                mask |= np.char.startswith(values, val[:wc_pos])
-                mask &= np.char.endswith(values, val[wc_pos+1:])
-
+            values = getattr(group, self.field).astype(np.str_)


so I think I was calling .astype(np.str_) here because we were pushing it into a np.char function. Now we're using fnmatch this likely isn't necessary, try removing this?

Iv-Hristov · 2020-02-27T21:32:25Z

@IAlibay @richardjgowers Thank you for the feedback! I have combined all the wildcard tests to eliminate code duplication as Irfan suggested.

IAlibay · 2020-02-28T06:37:30Z

package/MDAnalysis/core/selection.py

-    Supports the use of one wildcard at the start, 
-    end, and middle of strings
+    .. versionchanged:: 0.21
+    Supports multiple wildcards, based on fnmatch


@Iv-Hristov the text entry in a versionchanged needs to be indented (this is what is causing Travis to fail).

So:

.. versionchanged:: 1.0.0
   Supports...
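
For context, a minimal sketch of how an indented `versionchanged` body sits inside a class docstring is shown below. The class here is a hypothetical stand-alone stub rather than the real `StringSelection`, and the check at the end only confirms the indentation; it does not run Sphinx.

```python
import inspect


class StringSelection:  # hypothetical stub, not the MDAnalysis class
    """Selections based on text attributes

    .. versionchanged:: 1.0.0
       Supports multiple wildcards, based on fnmatch
    """


# cleandoc strips the docstring's common indentation, so the directive
# starts at column 0 and its body keeps the extra indentation.
doc_lines = inspect.cleandoc(StringSelection.__doc__).splitlines()
directive_index = doc_lines.index(".. versionchanged:: 1.0.0")
directive_line = doc_lines[directive_index]
body_line = doc_lines[directive_index + 1]
print(repr(directive_line))
print(repr(body_line))
```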

orbeckst

@Iv-Hristov this looks like an excellent contribution.

But the documentation is missing. fnmatch extends the simple *-globbing that we had before. It is very important that this is documented. Find the parts in the documentation that explained the globbing and update them (e.g. Selections should now get its own section on pattern matching).

@lilyminium will need to know this, too, for the AtomSelection Language section of the User Guide.

I know that this seems a lot of extra work for only a few lines of code. But that's why we make you do PRs, so that you get a real idea what it means to produce software that is used by many people.

Iv-Hristov · 2020-03-02T10:28:01Z

@orbeckst Thank you for the feedback! I will make sure to do that later today.

Iv-Hristov · 2020-03-03T07:15:46Z

I think I documented all the functionality and added two more tests. However, I am having a problem where pytest.mark.parametrize complains if I try to use square brackets as an input because it expects a string. I tried a few escape characters such as '', '\', '\Q... \E' but nothing seemed to work. Does anyone know which escape sequence might work so that I can squish all the tests together?

This is what gives the error "tuple object not callable":

@pytest.mark.parametrize('selstring, wildstring', [

    ('resname TYR THR', 'resname T*R'),
    ('resname ASN GLN', 'resname *N'),
    ('resname ASN ASP', 'resname AS*'),
    ('resname TYR THR', 'resname T?R'),
    ('resname ASN ASP HSD', 'resname *S?'),
    ('resname LEU LYS', 'resname L**'),
    ('resname MET', 'resname *M*')
    ('resname GLN GLU', 'resname GL[NY]')

])
def test_wildcard_selection(self, universe, selstring, wildstring):
    ag = universe.select_atoms(selstring)
    ag_wild = universe.select_atoms(wildstring)
    assert ag == ag_wild


orbeckst · 2020-03-03T07:31:18Z

You missed the comma after line ('resname MET', 'resname *M*').

EDIT: Good practice is to have a comma even after the last element so that you can easily add more elements to the list without the highly informative "tuple object not callable" error ;-)
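
The failure mode can be reproduced without pytest. In the sketch below (the helper name and the two-entry lists are illustrative only), two adjacent tuple literals with no comma between them are parsed as a call on the first tuple, which raises the TypeError; the square brackets in the pattern are not involved. The source is evaluated from a string so that the interpreter's SyntaxWarning about the missing comma can be silenced.

```python
import warnings

# Two tuple literals with no comma between them: Python reads this as
# calling the first tuple with the second tuple's items as arguments.
broken_source = """[
    ('resname MET', 'resname *M*')
    ('resname GLN GLU', 'resname GL[NY]')
]"""

# Same list with a comma after every element, including the last one.
fixed_source = """[
    ('resname MET', 'resname *M*'),
    ('resname GLN GLU', 'resname GL[NY]'),
]"""


def evaluate(source):
    """Evaluate a list literal, hiding the missing-comma SyntaxWarning."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", SyntaxWarning)
        return eval(source)


try:
    evaluate(broken_source)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

fixed = evaluate(fixed_source)
print(error_message)
print(len(fixed), "parameter sets")
```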

orbeckst

Looks really good, just one typo in the docs.

orbeckst · 2020-03-03T20:13:47Z

package/doc/sphinx/source/documentation_pages/selections.rst

+Pattern matching
+----------------
+
+The pattern matching notation described bellow is used to specify 


orbeckst · 2020-03-03T20:15:33Z

package/doc/sphinx/source/documentation_pages/selections.rst

+----------------
+
+The pattern matching notation described bellow is used to specify 
+patterns for matching strings:


Suggested change

-patterns for matching strings:
+patterns for matching strings (based on :mod:`fnmatch`):
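
Since the documented notation follows :mod:`fnmatch`, a small sketch of the bracket forms may help. The residue names below are hypothetical, and `fnmatchcase` is used (an assumption) to keep the comparison case-sensitive on every platform.

```python
from fnmatch import fnmatchcase

# Hypothetical residue names chosen to differ only in the last character.
names = ["GLN", "GLU", "GLY"]


def matching(pattern):
    """Return the names that match a shell-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


in_set = matching("GL[NY]")       # [seq]: any one character listed in seq
not_in_set = matching("GL[!NY]")  # [!seq]: any one character not in seq
any_single = matching("GL?")      # ?: exactly one character
any_run = matching("G*")          # *: any run of characters, including none

print(in_set, not_in_set, any_single, any_run)
```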

orbeckst · 2020-03-04T00:37:28Z

Congratulations @Iv-Hristov on your first merged PR. Nice contribution!

Iv-Hristov added 5 commits February 25, 2020 19:27

Changed string selection to support multiple wildcards using fnmatch

be679df

Remove test for multiple wildcards

7f69a11

Add tests for the new wildcard functionality and fixed comments

4444422

Fix a small error in one of the new tests

7a4a608

Small comment fix

e953bc9

richardjgowers requested changes Feb 26, 2020

richardjgowers reviewed Feb 26, 2020

richardjgowers requested changes Feb 26, 2020

richardjgowers self-assigned this Feb 26, 2020

Iv-Hristov added 3 commits February 26, 2020 21:22

Add two corner case tests

16d405a

Add two corner case tests

b7251ac

Small commit

faac32f

orbeckst added the GSOC Starter label Feb 26, 2020

richardjgowers requested changes Feb 27, 2020

IAlibay reviewed Feb 27, 2020

Parametrized tests

e2d8acc

IAlibay reviewed Feb 28, 2020

Iv-Hristov added 2 commits February 28, 2020 07:18

Fixed indendation

2743be9

Fixed identation

7b3dc83

richardjgowers approved these changes Feb 28, 2020

Iv-Hristov added 3 commits February 29, 2020 10:27

Resolved conflict in AUTHORS

c18417d

Resolved conflict in AUTHORS

fc21b84

Merge branch 'develop' into develop

33d237b

orbeckst requested changes Feb 29, 2020

Iv-Hristov added 2 commits March 2, 2020 21:58

Documentation update

89899bc

Merge branch 'develop' of https://github.com/Iv-Hristov/mdanalysis in…

02a3ae0

…to develop

IAlibay reviewed Mar 2, 2020

Add more tests

028715d

Iv-Hristov and others added 2 commits March 3, 2020 08:29

Fixed comma in pytest.parametrize

0b00c01

Merge branch 'develop' into develop

d23f209

richardjgowers approved these changes Mar 3, 2020

Iv-Hristov added 2 commits March 3, 2020 20:01

Fixed underline length

61b2430

Merge branch 'develop' of https://github.com/Iv-Hristov/mdanalysis in…

2eb3a59

…to develop

orbeckst requested changes Mar 3, 2020

Corrected typo

77443a3

orbeckst approved these changes Mar 4, 2020

orbeckst merged commit eb18a33 into MDAnalysis:develop Mar 4, 2020

orbeckst mentioned this pull request Jun 13, 2020

simple selection by name is slow due to fnmatch #2751

Closed

fiona-naughton added enhancement Component-Selections labels Sep 26, 2023
