Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise usage of encoding in more CompilationUnitResolver methods #2641

Open
mickaelistria opened this issue Jun 26, 2024 · 7 comments
Open

Comments

@mickaelistria
Copy link
Contributor

Quoting @jukzi on #2560 (comment)

Making the encoding part of the API seems wrong as currently the encoding is not used consistently: Only Batch Compiler uses it in org.eclipse.jdt.internal.compiler.batch.CompilationUnit.getContents() while the IDE uses org.eclipse.jdt.internal.core.util.Util.getResourceContentsAsCharArray(IFile) which disrespect the given classpath encoding but uses the encoding configured in workspace and the utf bom.

We should revise usage of encoding here to either actually use it, or get rid of it.

@robstryker
Copy link
Contributor

The encodings in #2560 are used. I'm not sure why they are presumed to be ignored.

A quick trace through some methods shows that the encodings are used.

contents = Util.getFileCharContent(new File(sourceUnitPath), encoding);

And here:

contents = Util.getFileCharContent(new File(sourceUnits[i]), encoding);

From what I can tell, both locations make use of contents = Util.getFileCharContent(new File(sourceUnits[i]), encoding);

I am not sure why @jukzi thinks the encodings are unused.

@jukzi
Copy link
Contributor

jukzi commented Jun 26, 2024

My concern is not they are unused but they are used inconsistently.
Both references you gave create a batch.CompilationUnit while for example org.eclipse.jdt.internal.core.CompilationUnit does not know about it

@robstryker
Copy link
Contributor

I think you might be looking at this the wrong way.

The encodings are required so that we know how to read the source file and turn it into a string of characters that we can parse. It is not terribly important that the resulting org.eclipse.jdt.internal.core.CompilationUnit is or is not aware of what encoding it had when it was read from the filesystem. It is more important that the file be read correctly and turned into the jdt.internal.core.CompilationUnit correctly.

The encodings are 100% used (in this PR at least) to read the source files before parsing them and turning them into jdt dom items.

This new interface provides only 5 methods. Only 2 of them require an array of encodings. And these encodings are 100% used in every occasion to read the file from the filesystem. And I would imagine very strongly that anyone extending or using this non-API code would similarly be using the encoding to read the file from the filesystem before parsing it. I know our alternate javac implementation does. The encodings are used, even if they are not part of the returned object.

@jukzi
Copy link
Contributor

jukzi commented Jun 26, 2024

ASTParser has a org.eclipse.jdt.core.dom.ASTParser.setEnvironment(String[], String[], String[], boolean)
where someone could setup arbitrary encodings distinct from those used to read a File with Platform.

org.eclipse.jdt.internal.core.CompilationUnit on the other hand reads the content using getResourceContentsAsCharArray() - which does not know about the encoding configured in ASTParser. instead it uses the settings from Platform.

I would like to verify it in the IDE but as far as i see setEnvironment is only used durring tests, so that normally sourcepathsEncodings should be always null.

@jukzi
Copy link
Contributor

jukzi commented Jun 26, 2024

image

@robstryker
Copy link
Contributor

As with all things where a large historical codebase is involved, we assume:

  1. Things are generally the way they are for a reason, and

  2. It's best not to change that unless you:
    a) understand the reason it is the way it is,
    b) know 100% that the reason is wrong or faulty, and
    c) have a fix

In this case, we assume that the logic that was in ASTParser and the logic that was in CompilationUnitResolver and ECJCompilationUnitResolver was generally correct and had a reason to be there.

The previous PR that sparked this discussion attempts to be very careful about merely moving functionality around, but not changing it much at all, as well as formalizing the contract with the caller.

While there's always a push and pull about whether to add something to the interface or not, there are typical concerns that should be weighed against each other.

  1. Which implementation existed before? Try to stay faithful to that one.
  2. Is something about the existing implementation dangerous or wrong? A high bar should be present here.
  3. Should something NOT be added to the interface because it is 100% unnecessary? Variables that no implementer want or desire to use can easily be removed from the interface, but variables where the pre-existing implementation CLEARLY use those variables, and where there is nothing obviously wrong or dangerous about it, should likely be maintained, both to stay faithful to the pre-existing implementation and test cases, as well as to minimize disruption.

If there was something clearly wrong about including the encodings, or if the CompilationUnitResolver clearly wasn't using them, we would have a good argument to remove them from the interface. Since the variables are clearly being used and respected in their implementation at least, if not by callers, we have to fall back to some type of harm calculation.

Removing encodings from the interface has the potential to break any tests that make use of that ability to override the encodings while providing no significant benefit other than the smaller incremental benefit of pruning a codebase of unused features. I would argue the disruption of potentially adding chaos to the test suite and being forced to either disable a number of tests or find some other way to work around the issue points to a greater harm in removing the parameter than in allowing it to remain.

@robstryker
Copy link
Contributor

@mickaelistria Is this issue still relevant? Or can it be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants