Skip to content

Conversation

@renkun-ken
Copy link
Member

@renkun-ken renkun-ken commented Apr 10, 2021

Closes #415

This PR makes STR_CONST token which is either an absolute path or a path relative to workspace rootPath.

It also checks if the file is a text file by reading its first 1000 bits into a raw vector using readBin and see if rawToChar succeeds it is a text file by detecting its text encoding with {stringi} functions.

image

@renkun-ken renkun-ken requested a review from randy3k April 11, 2021 02:15
@renkun-ken
Copy link
Member Author

I have no idea why the code coverage fails while the builds succeed. The error looks a bit weird

actual vs expected
- "file:///var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T//RtmpEg9DcE/var.R"
+ "file:///var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/RtmpMKNtQM/file92e2f824109.R"

where the actual uri is not even a random filename.

str_expr <- substr(document$content[str_line1], str_col1, str_col2)
str_text <- tryCatch(as.character(parse(text = str_expr, keep.source = FALSE)),
error = function(e) NULL)
if (is.character(str_text)) {
Copy link
Member

@randy3k randy3k Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to use parse here? as.character(expression("abc")) seems to work.

Copy link
Member Author

@renkun-ken renkun-ken Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used to handle raw strings where the text is like r"{hello}", same as how STR_CONST is handled in the document link provider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are right.

R/utils.R Outdated

is_text_file <- function(path, n = 1000) {
bin <- readBin(path, "raw", n = n)
result <- tryCatch(rawToChar(bin), error = function(e) NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will work.
rawToChar(as.raw(c(1:255))) basically converts all bytes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in fact a detection of \0 to cause error based on the assumption that a binary file is quite likely to have \0 in it.

Copy link
Member

@randy3k randy3k Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary true, especially when we read only the first 1000 byte. As we only support UTF-8 files, maybe better to use stringi::stri_enc_isutf8(bin)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, stri_enc_isutf8 looks better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ai, it doesn't work well if you accidentally cut of an unicode at the 1000 byte.

stringi::stri_enc_isutf8(charToRaw("中")[1:2])

Though it should not be a big concern most of the time.

Copy link
Member Author

@renkun-ken renkun-ken Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a UTF-8 character has at most 4 bytes, what about retry with the last byte in the raw vector dropped at most 3 times. This should work if the 1000 bytes cut in the middle of the a unicode character.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a more robust way is to check stringi::stri_enc_isutf8 first, and then fallback to stringi::stri_enc_detect so that if the top encoding has confidence=1 then it is almost always a text file. In this case, I guess we could accept other text encodings if it is so confidently guessed so.

I test with a variety of text and binary files and this approach seems to always deliver the correct result.

@randy3k
Copy link
Member

randy3k commented Apr 13, 2021

@renkun-ken
Please don't merge it now, I want to investigate why the coverage test failed.

@renkun-ken
Copy link
Member Author

renkun-ken commented Apr 16, 2021

@randy3k Would you mind I switch the coverage to using ubuntu? It works as expected: https://github.com/renkun-ken/languageserver/runs/2359742236?check_suite_focus=true.

@randy3k
Copy link
Member

randy3k commented Apr 16, 2021

I chose macOS because Travis coverage test is in Linux. However, the macOS build is causing troubles here, so it makes senses to switch to Linux for now. With that said, it would be interesting to understand why it failed in macOS.

@renkun-ken
Copy link
Member Author

I chose macOS because Travis coverage test is in Linux. However, the macOS build is causing troubles here, so it makes senses to switch to Linux for now.

Sure, I'll merge that switch then.

With that said, it would be interesting to understand why it failed in macOS.

Exactly, I did a little digging but no conclusion so far.

@renkun-ken renkun-ken merged commit 40bda48 into REditorSupport:master Apr 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

definitionProvider should work with file path

2 participants