Issue #277 utf-8 byte order mark #279

Merged
merged 2 commits into from Jul 13, 2017

Conversation

doug24
Contributor

@doug24 doug24 commented Jul 2, 2017

Change for Issue #277 to preserve the original file's UTF-8 BOM. Also fixed a problem where "replace" added an extra newline to files that did not end with a newline.
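
As a rough illustration of the idea (not the code in this pull request), the sketch below re-encodes replaced text so that the BOM is written only if the original file had one, and writes the text exactly as given so no trailing newline is appended. The class and method names (`BomPreservingEncode`, `Encode`) and the `hadBom` flag are hypothetical.

```csharp
using System;
using System.Text;

public static class BomPreservingEncode
{
    // Hypothetical helper: 'hadBom' is assumed to come from a check on the original file.
    public static byte[] Encode(string text, bool hadBom)
    {
        // The constructor argument controls whether GetPreamble() returns EF BB BF.
        var encoding = new UTF8Encoding(hadBom);
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text); // text is written as-is, no newline added

        var result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Encode("a\nb", true).Length);  // 3 BOM bytes + 3 text bytes
        Console.WriteLine(Encode("a\nb", false).Length); // 3 text bytes only
    }
}
```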

I believe this fixes the current problem, but we could use more tests for replace on files with different encodings.

For Issue #210, added an additional fix for regex replace where the regex contains a $ token. The $ (end of line) token only matches at the UNIX newline character (\n), so Windows (\r\n) and Mac (\r) newlines must be converted to the UNIX newline before calling the replace method.
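
A minimal sketch of that conversion, assuming a hypothetical helper named `ReplaceAtLineEnds` (this is not the dnGrep implementation). It normalizes line endings to \n and then runs the replace with `RegexOptions.Multiline`. Restoring the original line endings afterward is a separate step that is not shown here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EolReplaceSketch
{
    // Hypothetical helper for illustration only.
    public static string ReplaceAtLineEnds(string text, string pattern, string replacement)
    {
        // Convert Windows (\r\n) first, then any remaining lone \r, to \n.
        string normalized = text.Replace("\r\n", "\n").Replace("\r", "\n");
        // With Multiline, $ matches at the end of the input and before each \n.
        return Regex.Replace(normalized, pattern, replacement, RegexOptions.Multiline);
    }

    public static void Main()
    {
        string input = "foo\r\nbar\rbaz";
        string result = ReplaceAtLineEnds(input, "o$", "0");
        // Show the newlines visibly when printing.
        Console.WriteLine(result.Replace("\n", "\\n"));
    }
}
```

Without the normalization step, the pattern `o$` would not match in "foo\r\n", because the character after the final "o" is \r rather than \n.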

…bom and line endings;

Issue dnGrep#210 add additional fix for regex replace where regex contains a $ eol token.
@JVimes
Contributor

JVimes commented Jul 4, 2017

I'll take a look as soon as I can. Thanks!

bb = inputStream.ReadByte();
if (0xBF == bb)
{
result = true;

Contributor

ReSharper got me in the habit of "invert if-statement to reduce nesting", which I've found I really like. Makes the code easier to follow. What do you think about something like this?

public static bool HasUtf8ByteOrderMark(Stream inputStream)
{
    int bb = inputStream.ReadByte();
    if (0xEF != bb) return false;
    bb = inputStream.ReadByte();
    if (0xBB != bb) return false;
    bb = inputStream.ReadByte();
    if (0xBF != bb) return false;
    inputStream.Seek(0, SeekOrigin.Begin);
    return true;
}

Contributor Author

@doug24 doug24 Jul 9, 2017

Yes, except the input stream needs to be reset to the origin on every return path. How about getting rid of the if-statements altogether?

        public static bool HasUtf8ByteOrderMark(Stream inputStream)
        {
            int b1 = inputStream.ReadByte();
            int b2 = inputStream.ReadByte();
            int b3 = inputStream.ReadByte();
            inputStream.Seek(0, SeekOrigin.Begin);

            return (0xEF == b1 && 0xBB == b2 && 0xBF == b3);
        }
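
One way to exercise the version above, as a sketch: the method body is repeated so the example compiles on its own, and the wrapper class `BomCheckDemo` and the sample streams are assumptions for illustration. It relies on `Stream.ReadByte` returning -1 at end of stream, so streams shorter than three bytes simply fail the comparison, and on the stream being seekable.

```csharp
using System;
using System.IO;

public static class BomCheckDemo
{
    // Same logic as the version above, repeated here so the sketch is self-contained.
    public static bool HasUtf8ByteOrderMark(Stream inputStream)
    {
        int b1 = inputStream.ReadByte();
        int b2 = inputStream.ReadByte();
        int b3 = inputStream.ReadByte();
        // Always rewind, whatever the result; requires a seekable stream.
        inputStream.Seek(0, SeekOrigin.Begin);

        return 0xEF == b1 && 0xBB == b2 && 0xBF == b3;
    }

    public static void Main()
    {
        var withBom = new MemoryStream(new byte[] { 0xEF, 0xBB, 0xBF, (byte)'h', (byte)'i' });
        var withoutBom = new MemoryStream(new byte[] { (byte)'h', (byte)'i' });
        var empty = new MemoryStream(new byte[0]);

        Console.WriteLine(HasUtf8ByteOrderMark(withBom));    // True
        Console.WriteLine(HasUtf8ByteOrderMark(withoutBom)); // False
        Console.WriteLine(HasUtf8ByteOrderMark(empty));      // False
        Console.WriteLine(withBom.Position);                 // 0, the stream was rewound
    }
}
```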

Contributor

Oh, that's it! That's a winner. 😸

@JVimes JVimes merged commit 2c0b835 into dnGrep:master Jul 13, 2017
@majkinetor
Contributor

Thanks for quickly looking into this.

@doug24 doug24 deleted the UnicodeBOM branch July 15, 2017 16:18