Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comby reports wrong range is the line ending is \r\n (CRLF) instead of \n (LF) #183

Closed
vn-ki-cn opened this issue May 12, 2020 · 3 comments · Fixed by #239
Closed

Comby reports wrong range is the line ending is \r\n (CRLF) instead of \n (LF) #183

vn-ki-cn opened this issue May 12, 2020 · 3 comments · Fixed by #239

Comments

@vn-ki-cn
Copy link

Describe the bug
Comby reports wrong range is the line ending is \r\n (CRLF) instead of \n (LF)

Reproducing

  • Create a file with \r\n as line ending
  • Use comby -json-lines to get output in json
  • See that the range reported is wrong

Expected behavior
Correct range is reported

@rvantonder
Copy link
Member

Hi there! I did the following (though I am not on a windows machine, but I think this will help us get to the issue):

echo -n '\r\n' > file.txt 
# now the file.txt contains only `\r\n`
comby ':[1]' '' -match-only -json-lines file.txt 2> /dev/null | python -m json.tool

The output of the above is:

{
    "matches": [
        {
            "environment": [
                {
                    "range": {
                        "end": {
                            "column": 1,
                            "line": 2,
                            "offset": 2
                        },
                        "start": {
                            "column": 1,
                            "line": 1,
                            "offset": 0
                        }
                    },
                    "value": "\n",
                    "variable": "1"
                }
            ],
            "matched": "\r\n",
            "range": {
                "end": {
                    "column": 1,
                    "line": 2,
                    "offset": 2
                },
                "start": {
                    "column": 1,
                    "line": 1,
                    "offset": 0
                }
            }
        }
    ],
    "uri": "/private/tmp/file.txt"
}
  • The "value": "\n" part looks like a bug: this should likely be \r\n, and in that case, it looks like the reported range for "variable": "1" is wrong, yes.

  • The range for matched looks OK to me (offsets are 0-based, line/column is 1-based). Do you expect different values here?

@vn-ki-cn
Copy link
Author

vn-ki-cn commented May 13, 2020

/tmp/comby-fixer-15612932977450046446.tmp
$ echo -n "this is line 1\r\nthis is line 2 \r\n" > file.txt                                           

/tmp/comby-fixer-15612932977450046446.tmp
$ xxd file.txt 
00000000: 7468 6973 2069 7320 6c69 6e65 2031 0d0a  this is line 1..
00000010: 7468 6973 2069 7320 6c69 6e65 2032 200d  this is line 2 .
00000020: 0a                                       .

/tmp/comby-fixer-15612932977450046446.tmp
$ comby ':[1]' '' -match-only -json-lines file.txt 2> /dev/null | python -m json.tool
{
    "uri": "/tmp/comby-fixer-15612932977450046446.tmp/file.txt",
    "matches": [
        {
            "range": {
                "start": {
                    "offset": 0,
                    "line": 1,
                    "column": 1
                },
                "end": {
                    "offset": 33,
                    "line": 3,
                    "column": 1
                }
            },
            "environment": [
                {
                    "variable": "1",
                    "value": "this is line 1\nthis is line 2 \n",
                    "range": {
                        "start": {
                            "offset": 0,
                            "line": 1,
                            "column": 1
                        },
                        "end": {
                            "offset": 33,
                            "line": 3,
                            "column": 1
                        }
                    }
                }
            ],
            "matched": "this is line 1\r\nthis is line 2 \r\n"
        }
    ]
}

as you can see with the above output there are 2 lines in the file, but the comby thinks it's just 1 huge line and reports the range as such.

EDIT: the value appears to be wrong as well

@rvantonder
Copy link
Member

as you can see with the above output there are 2 lines in the file, but the comby thinks it's just 1 huge line

The matched part for :[1] matches across new lines (it is not line-based), so the output "this is line 1\r\nthis is line 2 \r\n" here seems correct to me. The JSON format means that newlines are encoded as escape sequences \n. When the \r\n escape sequences are interpreted, this corresponds to two lines. The range reported for this fragment is:

"start": {
                    "offset": 0,
                    "line": 1,
                    "column": 1
                },
                "end": {
                    "offset": 33,
                    "line": 3,
                    "column": 1
                }

The current convention is to put the end of the range at the position after the match. Thus, if the last character of the range is a newline, the position after is the next line, at the first column position (line 3, column 1). It's possible to change the convention of what range means, but as it is this is currently consistent and doesn't seem incorrect to me.


That said, the value for

                    "variable": "1",
                    "value": "this is line 1\nthis is line 2 \n",

indeed looks wrong to me, since the \r\n's should be in there. I'll investigate that :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants