Ripgrep outputs BOM character for match on first line of a file with a BOM #632

Closed
roblourens opened this issue Oct 10, 2017 · 7 comments

Comments

@roblourens
Contributor

From microsoft/vscode#35633

As the title says, I noticed that ripgrep will output the BOM character when printing a match in the first line. Not really blocking anything, but I think it would make sense for ripgrep to strip this.

@BurntSushi
Owner

@roblourens That is interesting. I think my intuition was the opposite, but I could be completely wrong? For example, if you run rg pattern file > results and file has a BOM, shouldn't results also have a BOM?
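One way to check what actually ends up in results is to look at its first three bytes. The following is a minimal sketch, not part of ripgrep; has_bom is a hypothetical helper name, and it assumes only a POSIX shell with od and tr:

```shell
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
# Hypothetical helper for illustration; not a ripgrep feature.
has_bom() {
  # Dump the first 3 bytes as hex, drop spaces and newlines, then compare.
  [ "$(od -An -tx1 -N 3 "$1" | tr -d ' \n')" = "efbbbf" ]
}

# Example usage (assumes rg is installed and "file" starts with a BOM):
#   rg pattern file > results
#   has_bom results && echo "results starts with a BOM"
```

If the first matching line is also the first line of the file, this check should succeed with the behaviour described in this issue.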

@FSMaxB

FSMaxB commented Oct 10, 2017

But shouldn't the output encoding depend on the locale?

@BurntSushi
Owner

@FSMaxB ripgrep doesn't respect any locale settings and always uses UTF-8 for output. That seems orthogonal to the issue reported here.

@FSMaxB

FSMaxB commented Oct 10, 2017

Ok, sorry for the noise.

@roblourens
Contributor Author

It seems to me like the BOM is metadata and not really part of the actual text of the file. But I'm not sure and haven't checked what other tools do.

@okdana
Contributor

okdana commented Oct 10, 2017

rg's behaviour seems consistent with BSD grep, GNU grep, and ag:

% printf '\xef\xbb\xbftest\n' > bom.txt
% file bom.txt
bom.txt: UTF-8 Unicode (with BOM) text
% /usr/bin/grep . bom.txt | xxd -a
00000000: efbb bf74 6573 740a    ...test.
% /usr/local/bin/grep . bom.txt | xxd -a
00000000: efbb bf74 6573 740a    ...test.
% /usr/local/bin/ag --no-numbers . bom.txt | xxd -a
00000000: efbb bf74 6573 740a    ...test.
% /usr/local/bin/rg --no-line-number . bom.txt | xxd -a
00000000: efbb bf74 6573 740a    ...test.

I can see why someone might expect it to be stripped out tho
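For a consumer that does want the BOM gone, it can be stripped downstream of any of these tools. A minimal sketch, assuming a POSIX shell and awk; strip_bom is a hypothetical helper name, not a flag or feature of rg, grep, or ag:

```shell
# strip_bom: copy stdin to stdout, removing a leading UTF-8 BOM (EF BB BF).
# Hypothetical helper for illustration; not part of any tool shown above.
strip_bom() {
  # LC_ALL=C makes awk treat input as raw bytes; \357\273\277 is the BOM
  # in octal. Only the first line is touched, all other lines pass through.
  LC_ALL=C awk 'NR == 1 { sub(/^\357\273\277/, "") } { print }'
}

# Example usage (assumes rg is installed):
#   rg --no-line-number . bom.txt | strip_bom | xxd -a
```

Note that awk always ends the last line with a newline, so input that lacks a trailing newline gains one.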

@BurntSushi
Owner

BurntSushi commented Oct 21, 2017

I don't quite know what the right answer is here. Given that both BSD and GNU grep leave the BOM intact, I'm inclined to take that path as well. In particular, in the absence of more concrete use cases where removing the BOM makes sense, I'd like to side with tradition on this one.

I'm going to close this for now, but if there is a more compelling argument to be made, please make it here and we can revisit this.
