This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Matching whole word that ends with a period causes odd results #1028

Open
1 task done
cdusold opened this issue Jun 19, 2018 · 6 comments

Comments

cdusold commented Jun 19, 2018

Prerequisites

Description

Whole Word find does not behave the same as Whole Word replace when the search term starts or ends with a period: find matches strings that replace then refuses to replace. This suggests a disconnect between the two functions. It also led me to not realize I had accidentally left Whole Word on, since find was matching strings that, in theory, it should not have.

Steps to Reproduce

  1. atom --safe
  2. In the untitled document enter a.b
  3. Hit enter
  4. Enter a.
  5. Hit ctrl-f
  6. Enter a.
  7. Click the "Whole Word" search option.
  8. Hit ctrl-f again
  9. Enter a\.
  10. Turn on Use Regex (ctrl+alt+/ if you want a shortcut)
  11. Hit "Replace"

Expected behavior: Honestly, I'm not entirely sure, but it should at least be consistent. The substring 'a.' of 'a.b' isn't a full word, yet the line where 'a.' is all that's there doesn't get picked up at all, and if anything that one looks more like a whole word. I'm pretty sure a whole word search for 'a.' shouldn't be able to match anything, since 'a.' isn't a whole word, although multi-word searches can use Whole Word just fine.

At the very least, I expect a matched string to be replaced at step 11.

Actual behavior: The substring 'a.' of 'a.b' is matched, but the next line isn't. Adding or removing whitespace of any kind (tab, space, or newline) before 'a.b' toggles whether it matches. Hitting Replace occasionally makes it stop matching in the find results, but I'm not sure whether there's a pattern to it.

Reproduces how often: Every time I've tried.

Versions

Atom : 1.27.2
Electron: 1.7.15
Chrome : 58.0.3029.110
Node : 7.9.0

apm 1.19.0
npm 3.10.10
node 6.9.5 x64
atom 1.27.2
python 2.7.12
git 2.13.0.windows
visual studio 2015

Windows 10 Pro
Version 1709
OS Build 16299.251

Additional Information

A multi-word "Whole Word" search ending in a period never matches, but one with a space after the period does (e.g. "longer test." vs. "longer test. " with a trailing space, matching against "This is a longer test. Test.").
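For illustration, here is a minimal sketch of why the trailing space makes a difference. It assumes the literal search text is escaped and then wrapped in \b...\b, and it uses Python's re module rather than Atom's own search engine, so it only demonstrates the word boundary rule itself. The helper name whole_word_pattern is hypothetical.

```python
import re


def whole_word_pattern(query: str) -> re.Pattern:
    """Hypothetical helper: escape a literal query and wrap it in word boundaries."""
    # Assumption: Whole Word is implemented as \b + escaped query + \b.
    return re.compile(r"\b" + re.escape(query) + r"\b")


text = "This is a longer test. Test."

# The query ends in ".", and the next character in the text is a space.
# Neither is a word character, so there is no boundary and nothing matches.
without_space = whole_word_pattern("longer test.").findall(text)

# The query ends in a space, and the next character is "T" (a word character),
# so the trailing \b is satisfied.
with_space = whole_word_pattern("longer test. ").findall(text)

print(without_space)  # []
print(with_space)     # ['longer test. ']
```

A \b only holds between a word character (letter, digit, or underscore) and a non-word character, so a query that ends in punctuation can only match when a word character happens to follow it.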

I found this while trying to find and replace self. in a method I had turned into a function in a Python script. I had accidentally left Whole Word on from an earlier replacement. It matched all of them, but when I hit "Replace," nothing happened. So the logic for finding seems to be different from the logic for replacing while Whole Word is on.

Author

cdusold commented Jun 20, 2018

Same issue with searches starting with periods.

Balaji-v commented Aug 6, 2018

This issue still exists with Atom 1.28.2

rsese commented Aug 13, 2018

Thanks for the report! I think (I'm not very experienced with regexes) what you're describing is expected?

Whole word search looks like it uses regex word boundaries:

expression = "\\b#{expression}\\b" if @wholeWord

And I think the issue is that the dot isn't considered an alphanumeric character? This StackOverflow post talks about a similar issue and this regex check seems to confirm:

https://regex101.com/r/s9ySh9/1
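As a rough check of the word boundary explanation, the sketch below runs the same wrapped pattern over the two lines from the reproduction steps. It uses Python's re module as a stand-in, on the assumption that \b behaves the same way for this ASCII input as it does in Atom's search.

```python
import re

# The two lines from the reproduction steps.
text = "a.b\na."

# Offsets where \b holds: every edge between a word and a non-word character.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# The user's regex a\. wrapped the way the whole word option wraps it.
pattern = re.compile(r"\ba\.\b")
matches = [(m.start(), m.group()) for m in pattern.finditer(text)]

print(boundaries)  # [0, 1, 2, 3, 4, 5]
print(matches)     # [(0, 'a.')]
```

Offset 6, the end of the text just after the final period, is not a boundary because no word character sits on either side of it. That is why the 'a.' inside 'a.b' matches while the standalone 'a.' on the second line does not.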

I also checked in Sublime, and it behaves the same way: with whole word + regex enabled, only the first line (a.b) matches when searching for a\. there.

Does that make sense @cdusold or am I misunderstanding?

Author

cdusold commented Aug 14, 2018

@rsese You are misunderstanding a little, but this explains a little too.
The most noticeable bug is that the matched term is not replaced at all. If it's matched, it should be replaced when hitting the button.
The symptoms are slightly different in 1.29, but the underlying problem seems to still be there.
When I type

a.b
a.b
a.
a.b
a.

into the file and search for a\. with Whole Word on, it matches all three. (It used to match alternating ones for some reason, but I figured that was probably just an odd result of whatever was going on.) Tapping Replace (it doesn't matter what is in the replace box) then highlights the next match, claims it's on "1 of 2" (instead of "1 of 3" as before), glitches out slightly, and then says "two found" in the status box, even though the text still contains three strings that \ba\.\b would match. After that, hitting Find cycles between the two other matches, the ones the cursor wasn't on after hitting Replace. Hitting Replace again on either of them seems to make it say "1 of 1" and then immediately correct itself to "[#] of 3".

This is on Ubuntu 16.04.5 LTS this time. As soon as I have the chance, I'll double check Windows 10 and get a screen capture of the behavior. Adding spaces before any of the a.b words still toggles whether that line matches (definitely a bug) and also seems to toggle others in a very peculiar pattern. (I definitely need to get a gifv of this.)

Contributor

Aerijo commented Aug 14, 2018

@rsese @cdusold can confirm
[Attached animated GIF: ezgif com-optimize]

Contributor

Aerijo commented Aug 14, 2018

Found the issue with the weird marker changes when pressing space, will patch soon.

Edit: It was more complicated than I thought. My current understanding is that after a change, only the text in the range between the two existing matches surrounding that change is rescanned. The word break check then fails for a match at the end of that range, because the word character to its right lies outside the range and is never seen. I.e.,

// all match initially (|...| is match)
|a.|a
|a.|b
|a.|c

// make a change between a.a and a.b
|a.|a "foo"
|a.|b
|a.|c

// recalculates from the start of a.a to the end of a.b
(looks at) "a.a\na."
           ^     ^- fails, because the b (important for the word break condition) is not looked at
           |- works because we see the whole thing 

// "new" markers are
|a.|a
a.b
|a.|c

This will take some more effort to fix. The actual searching gets down into the native superstring module. And this is a bigger problem than word breaks; any lookahead should be failing (and it does 😒)
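The windowing effect described above can be reproduced outside Atom. The following is a minimal sketch in Python, not the superstring code: it assumes the rescan simply runs the same pattern over a slice of the buffer that ends at the end of an existing match, and the window offsets are hypothetical values chosen to mirror the example.

```python
import re

WHOLE_WORD = re.compile(r"\ba\.\b")   # a\. wrapped in word boundaries
LOOKAHEAD = re.compile(r"a\.(?=b)")   # an explicit lookahead, for comparison

text = "a.a\na.b\na.c"

# Scanning the whole buffer sees the character after every "a.".
full_scan = [m.start() for m in WHOLE_WORD.finditer(text)]

# Hypothetical rescan window: from the start of the first match (offset 0)
# to the end of the second match (offset 6). The "b" at offset 6 is cut off.
window_start, window_end = 0, 6
window = text[window_start:window_end]  # "a.a\na."

windowed_scan = [window_start + m.start() for m in WHOLE_WORD.finditer(window)]

# The same truncation breaks a lookahead that needs the character after the match.
lookahead_full = [m.start() for m in LOOKAHEAD.finditer(text)]
lookahead_windowed = [window_start + m.start() for m in LOOKAHEAD.finditer(window)]

print(full_scan)           # [0, 4, 8]
print(windowed_scan)       # [0]
print(lookahead_full)      # [4]
print(lookahead_windowed)  # []
```

In this sketch, the match at offset 4 disappears as soon as the scanned text stops at the end of the match, which is consistent with the marker being dropped after an edit. Extending the window by at least one character past the last match would be one way to keep the boundary visible, though whether that fits the real implementation is not something this sketch can show.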

I haven't looked at the replace issue yet, but it's definitely related to this issue. Lookahead also fails to replace.
