Regular expression search issue with hex bytes #573

Closed
rohitab opened this issue Dec 15, 2022 · 8 comments

@rohitab
Contributor

rohitab commented Dec 15, 2022

Far Manager version

3.0.6069.0 x64

OS version

10.0.22623.1028

Other software

No response

Steps to reproduce

Setup

  1. Extract hex-test.bin from attached archive hex-test.zip
  2. In Far Manager, press F3 to load the file in Viewer, then press F4 to view in Hex mode
  3. You will see the following data: 01 31 87 EB AC 0C CD ED │ 52 6F 68 69 74 61 62 0D
  4. Press F7 to search. Select "Search for text" and enable checkbox for "Regular expressions".

Tests

  1. Search for \x31\x87
  2. Search for \x31..\xAC
  3. Search for \x52\x6F\x68\x69\x74
  4. Search for \x52\x6F\x68\x69
  5. Search for \xED

Expected behavior

Results

  1. Bytes 31 87 are found and selected
  2. Bytes 31 87 EB AC are found and selected
  3. Bytes 52 6F 68 69 74 are found and selected
  4. Bytes 52 6F 68 69 are found and selected
  5. Byte ED is found and selected

Actual behavior

Results

  1. Error message "Could not find the string /\x31\x87/"
  2. OK. Text is found as expected
  3. Search window closes. Text is not found or selected. There are no error messages either.
  4. OK. Text is found as expected
  5. Byte CD is selected instead of ED. Continuing the search selects ED

Searching for the same expression using grep works fine and the text is found as expected.
For example, grep -P "\x31\x87" hex-test.bin returns "grep: hex-test.bin: binary file matches"

If the same searches are performed using the Editor instead of the Viewer, the results are the same, except for Item 3; in the Editor, the text is found and selected.

I've tried searching using both \xNN and \xNNNN formats. Both of them give the same results.
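For reference, the byte-level results listed under "Expected behavior" can be reproduced outside Far Manager. The sketch below is only an illustration in Python, not anything Far does: it copies the 16 bytes shown in the Hex view and runs the five test patterns as bytes regexes, which is roughly what a byte-oriented tool such as grep -P does. The names DATA, PATTERNS and find_bytes are made up for this example.

```python
import re

# The 16 bytes shown in the Hex view of hex-test.bin (copied from the report).
DATA = bytes.fromhex("01 31 87 EB AC 0C CD ED 52 6F 68 69 74 61 62 0D")

# The five test patterns from the report, written as *bytes* patterns so that
# \xNN means "the byte NN" rather than "the code point U+00NN".
PATTERNS = [
    rb"\x31\x87",
    rb"\x31..\xAC",
    rb"\x52\x6F\x68\x69\x74",
    rb"\x52\x6F\x68\x69",
    rb"\xED",
]

def find_bytes(pattern, data=DATA):
    """Return the first match as an upper-case hex string, or None."""
    m = re.search(pattern, data)
    return m.group().hex(" ").upper() if m else None

for p in PATTERNS:
    print(p.decode("ascii"), "->", find_bytes(p))
```

On this data all five patterns match the bytes the report expects, because no decoding step sits between the file and the regex engine.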

@rohitab rohitab added the bug label Dec 15, 2022
@w17
Contributor

w17 commented Dec 15, 2022

Regex search works with text, not raw bytes.
The pattern is actually matched after the text has been converted to Unicode lines using the current code page.
I suppose that after selecting code page 28591 everything should work the way you want.

@w17
Contributor

w17 commented Dec 15, 2022

  3. In the Viewer, this is a bug.

@alabuzhev
Contributor

Both Viewer & Editor work with decoded Unicode codepoints, not raw bytes in the underlying files.
For example, if the file is opened in CP1252, byte 87 becomes U+2021 '‡', and this is what the search or any other operation sees.
It may be counter-intuitive at times, but it is what it is.
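The effect of decoding can be sketched in Python, whose cp1252 and latin-1 codecs correspond to CP1252 and ISO 8859-1 (code page 28591). This is an illustration of the principle only, not Far's implementation; search_decoded is a hypothetical helper, and the case-insensitive run at the end rests on an assumption that the thread does not confirm, namely that the search in the report ignored case.

```python
import re

# The 16 bytes shown in the Hex view of hex-test.bin (copied from the report).
DATA = bytes.fromhex("01 31 87 EB AC 0C CD ED 52 6F 68 69 74 61 62 0D")

def search_decoded(pattern, codec, flags=0):
    """Decode DATA with `codec`, run a *text* regex, return (start, end) or None."""
    text = DATA.decode(codec)
    m = re.search(pattern, text, flags)
    return m.span() if m else None

# Byte 87 is U+2021 under CP1252, so the text pattern \x87 (U+0087) cannot match.
print(hex(ord(b"\x87".decode("cp1252"))))
print(search_decoded(r"\x31\x87", "cp1252"))

# Under ISO 8859-1 every byte NN decodes to U+00NN, so the same pattern matches.
print(search_decoded(r"\x31\x87", "latin-1"))

# Test 2 works in both code pages: byte AC is U+00AC in CP1252 as well.
print(search_decoded(r"\x31..\xAC", "cp1252"))

# ASSUMPTION (not stated in the thread): a case-insensitive search.
# Byte CD decodes to U+00CD, the upper-case partner of U+00ED, so it matches first.
print(search_decoded(r"\xED", "cp1252", re.IGNORECASE))
```

Both codecs map each of these bytes to exactly one character, so the returned offsets coincide with byte offsets in the file. If the case-insensitivity assumption holds, it would also account for test 5 selecting CD before ED.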

Search window closes. Text is not found or selected. There are no error messages either.

This one is funny. The search string is "\x52\x6F\x68\x69\x74", i.e. 20 individual characters.
The viewer "optimizes" the search by checking the string length: if it's longer than the file size, it just calls it a day without invoking any searchers, assuming that it can't be found by definition anyway and not realizing that regexes can be arbitrarily long. Probably it has not been noticed earlier because viewed files are usually larger than typical regexes.
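A minimal sketch of that kind of shortcut, in Python rather than Far's own code: buggy_find and fixed_find are hypothetical names, and the length check is only a guess at the shape of the logic described above. The test file holds 16 characters, so the 20-character pattern is rejected before any regex runs, while the 16-character pattern from test 4 slips through.

```python
import re

# The file content from the report, decoded one byte per character (ISO 8859-1).
TEXT = bytes.fromhex("01 31 87 EB AC 0C CD ED 52 6F 68 69 74 61 62 0D").decode("latin-1")

def buggy_find(pattern, text):
    """Hypothetical shortcut: give up when the pattern source is longer than the text.

    Valid for literal strings, wrong for regexes, whose source length says
    nothing about the length of what they match.
    """
    if len(pattern) > len(text):
        return None
    return re.search(pattern, text)

def fixed_find(pattern, text):
    """Same search without the length shortcut."""
    return re.search(pattern, text)

long_pattern = r"\x52\x6F\x68\x69\x74"   # 20 characters of source, matches 5
short_pattern = r"\x52\x6F\x68\x69"      # 16 characters of source, matches 4

print(len(long_pattern), len(TEXT), buggy_find(long_pattern, TEXT))
print(fixed_find(long_pattern, TEXT).group())
print(buggy_find(short_pattern, TEXT).group())
```

Any such shortcut has to be skipped for regular expressions, since a pattern's source length is unrelated to the length of the text it matches.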

w17 added a commit that referenced this issue Dec 16, 2022
@w17
Contributor

w17 commented Dec 17, 2022

The only bug here should be fixed in 6071.

@w17 w17 self-assigned this Dec 17, 2022
@rohitab
Contributor Author

rohitab commented Dec 18, 2022

Thank you both for the explanation.

@w17, changing the code page to ISO 8859-1 worked.

I just tested version 3.0.6071.0 x64. Item 3 is now fixed.

One last thing. I may be making a mistake here, but searching for /.*/s does not find and select all the text. Anything after a carriage return or line feed (0D or 0A) is not selected. If I search for /\x61.*/s or /ab.*/s, it finds and selects 61 62. Shouldn't it find and select 61 62 0D, since I've included the s flag? Even if the file contains additional characters after 0D, they are not matched.

According to the documentation:
s - consider the whole text as one line, . matches any character
. - any character except carriage return. If there is s among the options then dot matches any character

@w17
Contributor

w17 commented Dec 19, 2022

Unfortunately, multiline regex search is not supported.
The regular expression is checked against each individual line (in the Editor or in the Viewer's text mode).
It is not a secret; you can see that in the help (just press F1 in the search dialog).
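Why the s flag cannot help in a line-by-line search can be sketched as follows. This is a Python illustration under assumptions, not Far's code: the sample text, the line-splitting rule, and the helper names are invented, and Python's re.DOTALL stands in for the s flag.

```python
import re

# Sample text: the tail of the test file plus an invented second line.
TEXT = "Rohitab\rsecond line"

def search_per_line(pattern, text, flags=0):
    """Split the text into lines first, then try the regex on each line."""
    for line in re.split(r"\r\n|\r|\n", text):
        m = re.search(pattern, line, flags)
        if m:
            return m.group()
    return None

def search_whole(pattern, text, flags=0):
    """Run the regex once over the whole text."""
    m = re.search(pattern, text, flags)
    return m.group() if m else None

# The line break is removed before the regex runs, so '.' never gets to see it,
# whether or not the "dot matches everything" flag is set.
print(repr(search_per_line(r"ab.*", TEXT, re.DOTALL)))

# Matching the whole text at once lets '.' run across the carriage return.
print(repr(search_whole(r"ab.*", TEXT, re.DOTALL)))
```

In the per-line version the match stops at "ab" because the carriage return is no longer part of the string being searched; the whole-text version continues through it to the end.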

@w17
Contributor

w17 commented Dec 19, 2022

You can try plugins like REsearch to avoid this limitation.

@rohitab
Contributor Author

rohitab commented Dec 19, 2022

No problem. Just curious, what is the s flag used for then?

It might be a good idea to update the Regular expressions help topic and remove references to \n and \r since they will never work.

It might also be helpful to include a short note on \xNNNN in the Regular expressions help topic, stating that conversion takes place according to the selected code page.
