Add support for the same encodings that ripgrep supports #12

acheronfail · 2020-07-01T10:23:07Z

We can't trust the absolute_offset that ripgrep reports for non-UTF-8 encoded files (see BurntSushi/ripgrep#1627 (comment)), so we need to parse the file ourselves.

Goals for this issue:

Use the same approach to encoding sniffing that ripgrep uses, either:
- checking for a UTF-8 or UTF-16 BOM, and then using that encoding (defaulting to UTF-8 otherwise)
- using the encoding passed on the command line
~~Find the exact location of the match in a non-UTF-8 encoded file, and insert the replacement text in the specified encoding~~. We changed tactics, but the result is the same: we now decode into UTF-8/ASCII, perform the replacements, and then re-encode before writing to disk.
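
The two goals above can be sketched roughly as follows. This is a minimal Python illustration, not rgr's actual implementation: the helper names `sniff_encoding` and `replace_in_bytes` are hypothetical, and letting an explicitly passed encoding take precedence over BOM sniffing is an assumption made for the sketch.

```python
import codecs
from typing import Optional, Tuple

# BOMs checked during sniffing: UTF-8 and UTF-16 (little- and big-endian).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, explicit: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (encoding, bom) for the given file contents.

    Assumption: an encoding passed on the command line wins outright,
    and no BOM is stripped in that case.
    """
    if explicit is not None:
        return explicit, b""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, bom
    # No BOM found: default to UTF-8.
    return "utf-8", b""


def replace_in_bytes(data: bytes, old: str, new: str,
                     explicit: Optional[str] = None) -> bytes:
    """Decode, perform the replacement on text, then re-encode."""
    encoding, bom = sniff_encoding(data, explicit)
    text = data[len(bom):].decode(encoding)
    # Re-attach the original BOM so the file keeps its encoding marker.
    return bom + text.replace(old, new).encode(encoding)
```

For example, calling `replace_in_bytes` on the raw bytes of a UTF-16 file with a BOM returns bytes in the same encoding with the BOM preserved, so the result can be written straight back to disk. Working on decoded text also avoids relying on byte offsets into the original file.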

Supported encodings (tests exist for them):

acheronfail · 2023-03-20T10:34:36Z

I'm going to close this for now.

If you spot any specific encoding issues while using rgr, please create an issue!

acheronfail · 2023-03-20T10:36:53Z

We can improve the encoding situation for rgr by contributing an upstream change in rg, see: BurntSushi/ripgrep#1629

acheronfail added the enhancement (New feature or request) label on Jul 1, 2020

acheronfail closed this as completed on Mar 20, 2023
