Revise usage of `encoding` in more CompilationUnitResolver methods #2641

mickaelistria · 2024-06-26T09:08:49Z

Making the encoding part of the API seems wrong as currently the encoding is not used consistently: Only Batch Compiler uses it in org.eclipse.jdt.internal.compiler.batch.CompilationUnit.getContents() while the IDE uses org.eclipse.jdt.internal.core.util.Util.getResourceContentsAsCharArray(IFile) which disrespect the given classpath encoding but uses the encoding configured in workspace and the utf bom.

We should revise usage of encoding here to either actually use it, or get rid of it.

The text was updated successfully, but these errors were encountered:

robstryker · 2024-06-26T14:27:34Z

The encodings in #2560 are used. I'm not sure why they are presumed to be ignored.

A quick trace through some methods shows that the encodings are used.

eclipse.jdt.core/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/CompilationUnitResolver.java

Line 1143 in aca0803

contents = Util.getFileCharContent(new File(sourceUnitPath), encoding);

And here:

eclipse.jdt.core/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/CompilationUnitResolver.java

Line 592 in aca0803

contents = Util.getFileCharContent(new File(sourceUnits[i]), encoding);

From what I can tell, both locations make use of contents = Util.getFileCharContent(new File(sourceUnits[i]), encoding);

I am not sure why @jukzi thinks the encodings are unused.

jukzi · 2024-06-26T15:08:12Z

My concern is not they are unused but they are used inconsistently.
Both references you gave create a batch.CompilationUnit while for example org.eclipse.jdt.internal.core.CompilationUnit does not know about it

robstryker · 2024-06-26T15:24:56Z

I think you might be looking at this the wrong way.

The encodings are required so that we know how to read the source file and turn it into a string of characters that we can parse. It is not terribly important that the resulting org.eclipse.jdt.internal.core.CompilationUnit is or is not aware of what encoding it had when it was read from the filesystem. It is more important that the file be read correctly and turned into the jdt.internal.core.CompilationUnit correctly.

The encodings are 100% used (in this PR at least) to read the source files before parsing them and turning them into jdt dom items.

This new interface provides only 5 methods. Only 2 of them require an array of encodings. And these encodings are 100% used in every occasion to read the file from the filesystem. And I would imagine very strongly that anyone extending or using this non-API code would similarly be using the encoding to read the file from the filesystem before parsing it. I know our alternate javac implementation does. The encodings are used, even if they are not part of the returned object.

jukzi · 2024-06-26T15:50:08Z

ASTParser has a org.eclipse.jdt.core.dom.ASTParser.setEnvironment(String[], String[], String[], boolean)
where someone could setup arbitrary encodings distinct from those used to read a File with Platform.

org.eclipse.jdt.internal.core.CompilationUnit on the other hand reads the content using getResourceContentsAsCharArray() - which does not know about the encoding configured in ASTParser. instead it uses the settings from Platform.

I would like to verify it in the IDE but as far as i see setEnvironment is only used durring tests, so that normally sourcepathsEncodings should be always null.

jukzi · 2024-06-26T15:51:32Z

robstryker · 2024-06-26T16:11:20Z

As with all things where a large historical codebase is involved, we assume:

Things are generally the way they are for a reason, and
It's best not to change that unless you:
a) understand the reason it is the way it is,
b) know 100% that the reason is wrong or faulty, and
c) have a fix

In this case, we assume that the logic that was in ASTParser and the logic that was in CompilationUnitResolver and ECJCompilationUnitResolver was generally correct and had a reason to be there.

The previous PR that sparked this discussion attempts to be very careful about merely moving functionality around, but not changing it much at all, as well as formalizing the contract with the caller.

While there's always a push and pull about whether to add something to the interface or not, there are typical concerns that should be weighed against each other.

Which implementation existed before? Try to stay faithful to that one.
Is something about the existing implementation dangerous or wrong? A high bar should be present here.
Should something NOT be added to the interface because it is 100% unnecessary? Variables that no implementer want or desire to use can easily be removed from the interface, but variables where the pre-existing implementation CLEARLY use those variables, and where there is nothing obviously wrong or dangerous about it, should likely be maintained, both to stay faithful to the pre-existing implementation and test cases, as well as to minimize disruption.

If there was something clearly wrong about including the encodings, or if the CompilationUnitResolver clearly wasn't using them, we would have a good argument to remove them from the interface. Since the variables are clearly being used and respected in their implementation at least, if not by callers, we have to fall back to some type of harm calculation.

Removing encodings from the interface has the potential to break any tests that make use of that ability to override the encodings while providing no significant benefit other than the smaller incremental benefit of pruning a codebase of unused features. I would argue the disruption of potentially adding chaos to the test suite and being forced to either disable a number of tests or find some other way to work around the issue points to a greater harm in removing the parameter than in allowing it to remain.

robstryker · 2024-07-15T17:08:02Z

@mickaelistria Is this issue still relevant? Or can it be closed?

mickaelistria mentioned this issue Jun 26, 2024

An extension point allowing third parties to override source to dom tree implementations #2560

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revise usage of `encoding` in more CompilationUnitResolver methods #2641

Revise usage of `encoding` in more CompilationUnitResolver methods #2641

mickaelistria commented Jun 26, 2024

robstryker commented Jun 26, 2024

jukzi commented Jun 26, 2024

robstryker commented Jun 26, 2024

jukzi commented Jun 26, 2024

jukzi commented Jun 26, 2024

robstryker commented Jun 26, 2024

robstryker commented Jul 15, 2024

Revise usage of encoding in more CompilationUnitResolver methods #2641

Revise usage of encoding in more CompilationUnitResolver methods #2641

Comments

mickaelistria commented Jun 26, 2024

robstryker commented Jun 26, 2024

jukzi commented Jun 26, 2024

robstryker commented Jun 26, 2024

jukzi commented Jun 26, 2024

jukzi commented Jun 26, 2024

robstryker commented Jun 26, 2024

robstryker commented Jul 15, 2024

Revise usage of `encoding` in more CompilationUnitResolver methods #2641

Revise usage of `encoding` in more CompilationUnitResolver methods #2641