Add encoding detection callback #2788
```java
        .findFirst()
        .get();
assertEquals("\"Привет мир\"", utf8Type.getField("s1").getAssignment().toString());
```
I love this expected value :-)
If you don't read Cyrillic, it means "Hello World" :)
I would like to have the file name and path available in the encoding detection callback. For example, a project may declare per-file encodings such as:

```
encoding//src/main/java/spoon/support/visitor/replace/ReplacementVisitor.java=ISO-8859-1
```

So I can imagine that many clients might want to select the encoding depending on the file name or file path. Of course, the bytes of the file might be a good input too! WDYT?

But before we design the final API for encoding detection, please note this bug in Spoon:

```java
class VirtualFile implements SpoonFile {
    ...
    @Override
    public InputStream getContent() {
        // String#getBytes() without arguments uses the platform default charset
        return new ByteArrayInputStream(content.getBytes());
    }
    ...
}
```

Here we convert the String value to bytes using the system-dependent encoding ... and later we convert the bytes back to chars using another encoding. ... But how do we combine that with an encoding detector?? ;-)

@surli @monperrus Do you agree to refactor the related file API here?
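To make the round-trip problem concrete, here is a small standalone sketch (plain JDK, not Spoon code; `RoundTripDemo` and `roundTrip` are hypothetical names). It encodes text with one charset and decodes the bytes with another, which is what happens when `getBytes()` picks the platform default but the bytes are later decoded with a different configured encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // mimicking getBytes() with one encoding followed by decoding with a different one.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "Привет мир";

        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(original)); // true

        // Mismatched charsets: every UTF-8 byte becomes its own ISO-8859-1 character.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(original)); // false

        // Keeping the text as chars (no byte round trip) cannot be corrupted this way.
        System.out.println(new String(original.toCharArray()).equals(original)); // true
    }
}
```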
```java
interface SpoonFile {
    default char[] getContentChars(Environment env) {
        // ... move loading code from FileCompilerConfig#initializeCompiler here ...
    }
}

class VirtualFile implements SpoonFile {
    ...
    @Override
    public char[] getContentChars(Environment env) {
        return content.toCharArray();
    }
    ...
}
```

WDYT?
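A minimal runnable sketch of the proposed shape, assuming Java 9+ (for `InputStream#readAllBytes`). `SourceFile`, `BytesSource` and `InMemorySource` are hypothetical stand-ins for `SpoonFile` and `VirtualFile`, and a plain `Charset` parameter replaces `Environment`; this is not Spoon's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentCharsSketch {

    // Hypothetical stand-in for SpoonFile; a plain Charset replaces Environment.
    interface SourceFile {
        InputStream getContent();

        // Default: read the raw bytes once and decode them with the given charset.
        default char[] getContentChars(Charset encoding) {
            try (InputStream in = getContent()) {
                byte[] bytes = in.readAllBytes(); // Java 9+
                return new String(bytes, encoding).toCharArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Hypothetical file-like source that only has raw bytes.
    static class BytesSource implements SourceFile {
        private final byte[] bytes;

        BytesSource(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public InputStream getContent() {
            return new ByteArrayInputStream(bytes);
        }
    }

    // Hypothetical stand-in for VirtualFile: the content is already text,
    // so it is returned as-is and the charset argument is ignored.
    static class InMemorySource implements SourceFile {
        private final String content;

        InMemorySource(String content) {
            this.content = content;
        }

        @Override
        public InputStream getContent() {
            // Explicit charset rather than the platform default.
            return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public char[] getContentChars(Charset encoding) {
            return content.toCharArray();
        }
    }

    public static void main(String[] args) {
        String text = "Привет мир";
        SourceFile physical = new BytesSource(text.getBytes(StandardCharsets.UTF_8));
        SourceFile virtual = new InMemorySource(text);

        // The in-memory source is correct whatever charset is passed.
        System.out.println(new String(virtual.getContentChars(StandardCharsets.ISO_8859_1)).equals(text)); // true
        // The byte-backed source decodes correctly with the right charset.
        System.out.println(new String(physical.getContentChars(StandardCharsets.UTF_8)).equals(text)); // true
    }
}
```

The design point is that in-memory content never passes through bytes, so no charset choice can corrupt it; only byte-backed files need an encoding at all.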
We can use

The And I personally would prefer a new interface

So, as far as I understand, you want me to add Am I right? As you can see WDYT?
if you add a "default" implementation in
yes
I believe this encoding is ignored (the compilation unit has no access to the original byte[], so the encoding is useless), so we can simply provide the default encoding from the environment.

@pvojtechovsky, I refactored the API as you suggested.
```java
byte[] bytes;
try {
    InputStream contentStream = getContent();
    bytes = new byte[contentStream.available()];
```

There is no way to get the size of the input stream (`available()` only estimates how many bytes can be read without blocking).
Use a `ByteArrayOutputStream`, copy the input into it with `IOUtils.copy`, and then call `ByteArrayOutputStream#toByteArray`.
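A JDK-only sketch of the suggested approach: the copy loop does what `IOUtils.copy` does, so no size is needed up front. `readFully` is a hypothetical helper name. The `SequenceInputStream` case shows why `available()` cannot serve as a size: it only reports the bytes of the current underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;

public class ReadFullySketch {

    // Reads the whole stream without knowing its size in advance;
    // the loop does the same job as IOUtils.copy(in, out) from Apache Commons IO.
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Two chunks behind one stream: 3 bytes, then 4 bytes.
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream(new byte[] {1, 2, 3}),
                new ByteArrayInputStream(new byte[] {4, 5, 6, 7}));

        // available() only counts the current underlying stream, so it is not the size.
        System.out.println(in.available());       // 3
        System.out.println(readFully(in).length); // 7
    }
}
```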
```diff
@@ -54,8 +56,9 @@ default char[] getContentChars(Environment env) {
 byte[] bytes;
 try {
     InputStream contentStream = getContent();
```

The `contentStream` must be closed. The best is:

```java
try (InputStream contentStream = getContent()) {
```

Oops, my bad, thanks.
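A small JDK-only sketch of why try-with-resources is the preferred form: the resource is closed both on normal exit and when an exception escapes the block. `TrackingStream`, `firstByte` and `failWhileReading` are hypothetical test helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CloseSketch {

    // Hypothetical test double that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Normal exit: the stream is closed after the value is read.
    static int firstByte(InputStream stream) {
        try (InputStream contentStream = stream) {
            return contentStream.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Exceptional exit: the stream is still closed before the exception propagates.
    static void failWhileReading(InputStream stream) {
        try (InputStream contentStream = stream) {
            contentStream.read();
            throw new IllegalStateException("simulated failure while parsing");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackingStream ok = new TrackingStream(new byte[] {42});
        System.out.println(firstByte(ok)); // 42
        System.out.println(ok.closed);     // true

        TrackingStream failing = new TrackingStream(new byte[] {1});
        try {
            failWhileReading(failing);
        } catch (IllegalStateException expected) {
            // the simulated failure is expected here
        }
        System.out.println(failing.closed); // true
    }
}
```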
Looks good to me. Thank you, Egor!
This PR provides the ability to specify a user-defined callback that can be used to detect the encoding of each file separately. See #2781.
For example, I use the juniversalchardet library:
Spoon API:
It helps to obtain a correct AST in projects with mixed encodings.
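As an illustrative sketch of such a per-file callback, using both inputs discussed in the review (the file path and the raw bytes): `EncodingDetector` and `SimpleDetector` are hypothetical names, not Spoon's API. The heuristic (explicit per-file override, UTF-8 byte order mark, strict UTF-8 validation, fallback charset) is a simple stand-in for a real detector library such as juniversalchardet.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class EncodingDetectionSketch {

    // Hypothetical callback shape: file path and raw bytes in, charset out.
    interface EncodingDetector {
        Charset detect(String path, byte[] bytes);
    }

    // Hypothetical heuristic detector; a real setup could delegate to a library instead.
    static class SimpleDetector implements EncodingDetector {
        private final Map<String, Charset> perFileOverrides;
        private final Charset fallback;

        SimpleDetector(Map<String, Charset> perFileOverrides, Charset fallback) {
            this.perFileOverrides = perFileOverrides;
            this.fallback = fallback;
        }

        @Override
        public Charset detect(String path, byte[] bytes) {
            // 1. An explicit per-file setting wins over any guessing.
            Charset configured = perFileOverrides.get(path);
            if (configured != null) {
                return configured;
            }
            // 2. A UTF-8 byte order mark (EF BB BF) identifies UTF-8.
            if (bytes.length >= 3
                    && (bytes[0] & 0xFF) == 0xEF
                    && (bytes[1] & 0xFF) == 0xBB
                    && (bytes[2] & 0xFF) == 0xBF) {
                return StandardCharsets.UTF_8;
            }
            // 3. Bytes that decode as strict UTF-8 are treated as UTF-8.
            if (isValidUtf8(bytes)) {
                return StandardCharsets.UTF_8;
            }
            // 4. Anything else gets the fallback charset.
            return fallback;
        }

        private static boolean isValidUtf8(byte[] bytes) {
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(bytes));
                return true;
            } catch (CharacterCodingException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) {
        EncodingDetector detector = new SimpleDetector(
                Map.of("src/Legacy.java", StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1);

        byte[] utf8 = "Привет мир".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "Gr\u00FC\u00DFe".getBytes(StandardCharsets.ISO_8859_1); // "Grüße"

        System.out.println(detector.detect("src/A.java", utf8));      // UTF-8
        System.out.println(detector.detect("src/B.java", latin1));    // ISO-8859-1
        System.out.println(detector.detect("src/Legacy.java", utf8)); // ISO-8859-1 (override)
    }
}
```

Strict UTF-8 validation is only a heuristic: a short Latin-1 file can occasionally also be valid UTF-8, which is why the explicit per-file setting is checked first.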