@mernst
Contributor

mernst commented May 29, 2020

In many parts of the Commons IO API, null may be passed as a Charset or as the name of one, in which case Commons IO uses the platform's default character encoding.
I assumed that was the case for ReversedLinesFileReader, and the result was a null pointer exception.

Consider the following minimal example:

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import org.apache.commons.io.input.ReversedLinesFileReader;

public class TestReversedLinesFileReader {
  public static void main(String[] args) throws IOException {
    ReversedLinesFileReader rfr =
        new ReversedLinesFileReader(new File("TestReversedLinesFileReader.java"),
                                    (Charset) null);
  }
}

Running this program results in:

Exception in thread "main" java.lang.NullPointerException
	at java.lang.String.getBytes(String.java:940)
	at org.apache.commons.io.input.ReversedLinesFileReader.<init>(ReversedLinesFileReader.java:130)
	at org.apache.commons.io.input.ReversedLinesFileReader.<init>(ReversedLinesFileReader.java:78)
	at TestReversedLinesFileReader.main(TestReversedLinesFileReader.java:8)

I assume that the ReversedLinesFileReader constructor is intended to accept null as an argument because of this line:

        final Charset charset = Charsets.toCharset(encoding);

which calls toCharset(Charset), which is a no-op unless its argument is null. So there would be no purpose to that line except to handle a null encoding argument to the constructor.
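The null handling described above can be sketched in a few lines. This is a stand-in written for illustration, not the Commons IO source: the class name ToCharsetSketch is hypothetical, and the method only mirrors the behavior stated in this description (a no-op unless the argument is null, in which case the platform default is used).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the null handling described above;
// not the Commons IO implementation.
class ToCharsetSketch {

    // Returns the argument unchanged unless it is null,
    // in which case the platform default charset is returned.
    static Charset toCharset(final Charset charset) {
        return charset == null ? Charset.defaultCharset() : charset;
    }

    public static void main(final String[] args) {
        // A non-null argument passes through untouched.
        System.out.println(toCharset(StandardCharsets.UTF_8) == StandardCharsets.UTF_8);
        // A null argument is replaced by the platform default.
        System.out.println(toCharset(null).equals(Charset.defaultCharset()));
    }
}
```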

The problem comes a few lines later, when the possibly-null value encoding is used; that use is the String.getBytes call at the top of the stack trace above.

This pull request changes the code so that the encoding field is non-null even if the encoding formal parameter is null, and uses this.encoding instead of encoding in 4 locations.
I think that setting the field encoding to a non-null value is the right thing because there are three calls to new String that use encoding as if it is non-null.
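A minimal sketch of the pattern this pull request describes, assuming a much smaller class than the real one. The class NullSafeEncodingSketch, its newLineBytes field, and its accessors are hypothetical names chosen for illustration; only the idea (normalize the parameter once, then use the field everywhere) comes from the description above.

```java
import java.nio.charset.Charset;

// Hypothetical miniature of the constructor pattern discussed above;
// the class and its members are illustrative, not the Commons IO source.
class NullSafeEncodingSketch {

    private final Charset encoding;
    private final byte[] newLineBytes;

    NullSafeEncodingSketch(final Charset encoding) {
        // Normalize once, so the field is never null.
        this.encoding = encoding == null ? Charset.defaultCharset() : encoding;
        // Use the field, not the possibly-null parameter:
        // "\n".getBytes(encoding) would throw NullPointerException for null.
        this.newLineBytes = "\n".getBytes(this.encoding);
    }

    Charset getEncoding() {
        return encoding;
    }

    int newLineLength() {
        return newLineBytes.length;
    }

    // Reproduces the failure mode from the stack trace:
    // String.getBytes(Charset) rejects a null charset.
    static boolean getBytesRejectsNull() {
        try {
            "\n".getBytes((Charset) null);
            return false;
        } catch (final NullPointerException expected) {
            return true;
        }
    }

    public static void main(final String[] args) {
        System.out.println(getBytesRejectsNull());
        System.out.println(new NullSafeEncodingSketch(null).getEncoding());
    }
}
```

Because every later use goes through the field, any code added afterwards is protected as well, without repeating the null check.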

An alternate fix would be to:

  • document the Charset argument as being non-null,
  • throw an exception within the constructor if a client passes null,
  • remove the declaration of the charset local variable, and
  • change all uses of the charset local variable into uses of encoding.
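The alternate fix listed above might look roughly like the following. This is a sketch under assumptions: the class NonNullEncodingSketch is hypothetical, and Objects.requireNonNull is one possible way to "throw an exception"; the description does not say which exception type was intended, and IllegalArgumentException would fit equally well.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical illustration of the alternate fix; not the Commons IO source.
class NonNullEncodingSketch {

    private final Charset encoding;

    /**
     * @param encoding the charset to decode with; must not be null
     * @throws NullPointerException if encoding is null
     */
    NonNullEncodingSketch(final Charset encoding) {
        // Fail fast with a descriptive message instead of deep inside String.getBytes.
        this.encoding = Objects.requireNonNull(encoding, "encoding");
    }

    Charset getEncoding() {
        return encoding;
    }

    // Returns the exception message produced when null is passed,
    // or null if no exception was thrown.
    static String messageForNull() {
        try {
            new NonNullEncodingSketch(null);
            return null;
        } catch (final NullPointerException e) {
            return e.getMessage();
        }
    }

    public static void main(final String[] args) {
        System.out.println(messageForNull());
        System.out.println(new NonNullEncodingSketch(StandardCharsets.UTF_8).getEncoding());
    }
}
```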

@coveralls

coveralls commented May 29, 2020

Coverage decreased (-0.03%) to 89.751% when pulling 042eb26 on mernst:ReversedLinesFileReader-encoding into cd154e9 on apache:master.

@garydgregory
Member

@mernst
Thank you for the PR. You are missing a unit test to show what issue this solves; IOW, the test fails without the change in the main folder and passes with the change.

@mernst
Contributor Author

mernst commented May 30, 2020

@garydgregory I added the test case from the pull request description. It fails on master and passes in this pull request.

  public void testNullEncoding() throws IOException, URISyntaxException {
    new ReversedLinesFileReader(new File(this.getClass().getResource("/test-file-empty.bin").toURI()),
                                (Charset) null);
  }
Member

Hi @mernst,

Thank you for your update to this PR.

The new unit test indeed tests that the ctor does not blow up on a null Charset, but it does not test that the default Charset kicks in.

Another update would be great.

Contributor Author

@garydgregory I agree that would be a useful test, even though it is not directly related to this bug fix.

I've made my best guess at how to test for the default Charset in a system-independent way.

I'm not sure whether this is what you had in mind. If not, could you point me at documentation about how Commons IO prefers to test that the default Charset is being used?

Thanks.

Member

Hi @mernst,

A better way to test would be to write to the temp file with Charset.defaultCharset() and still read with a null Charset.

From a black-box perspective, writing with a null Charset does not prove anything, since you're relying on the fact that another API does the right thing with a null Charset. Writing with Charset.defaultCharset() would make the test really match its expectations.
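The shape of the test suggested here can be sketched with the standard library alone. Everything named below is an assumption made for illustration: DefaultCharsetTestSketch and readLastLine are hypothetical, and readLastLine is only a stand-in for the reader under test, not ReversedLinesFileReader. The point is the structure: the write side names the default Charset explicitly, and only the read side passes null.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of the suggested test shape. readLastLine is a hypothetical stand-in
// for the reader under test; it is NOT ReversedLinesFileReader.
class DefaultCharsetTestSketch {

    // Hypothetical reader: treats a null charset as the platform default.
    static String readLastLine(final Path path, final Charset charset) throws IOException {
        final Charset effective = charset == null ? Charset.defaultCharset() : charset;
        final List<String> lines = Files.readAllLines(path, effective);
        return lines.isEmpty() ? null : lines.get(lines.size() - 1);
    }

    // Writes with the explicit default charset, reads with null, and compares.
    static boolean nullCharsetReadsDefaultEncodedFile() {
        try {
            final Path file = Files.createTempFile("write", ".txt");
            try {
                // ASCII-only text, chosen so that common default charsets can encode it.
                final String text = "first\nlast line";
                // The write side does not rely on any null handling.
                Files.write(file, text.getBytes(Charset.defaultCharset()));
                // Only the read side passes null.
                return "last line".equals(readLastLine(file, null));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(nullCharsetReadsDefaultEncodedFile());
    }
}
```

In the real test, the call to readLastLine would presumably be replaced by constructing the reader with (Charset) null, as in the test case quoted earlier in this thread.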

Contributor Author

@mernst mernst Jun 23, 2020

I agree these tests are useful to improve code coverage. But they are not directly related to the change, and previous changes to this code were permitted without adding these missing tests. Could we create a new pull request or issue to improve the coverage of the test suite, to avoid blocking this pull request?

@mernst
Contributor Author

mernst commented Jun 5, 2020

@garydgregory Can this pull request be merged?

@garydgregory
Member

I will review tomorrow.

@Test
public void testNullEncoding() throws IOException, URISyntaxException {
final File file = new File(temporaryFolder, "write.txt");
final String text = "Hello /u1234";
Member

What is /u1234 supposed to be?

Member

Ping. Why is the weird Unicode escape needed here? If it is needed, please add a comment; otherwise we don't need it, right?

garydgregory changed the title from "Prevent NullPointerException in ReversedLinesFileReader constructor" to "Prevent NullPointerException in ReversedLinesFileReader constructors" on Aug 29, 2020
@garydgregory
Member

@mernst
I implemented this differently in git master now. You still get credited in changes.xml ;-) Please verify and close.
TY for your patience.

@mernst
Contributor Author

mernst commented Aug 30, 2020

@garydgregory Thanks for the fix! I appreciate it. Credit is secondary.

@mernst mernst closed this Aug 30, 2020
@mernst mernst deleted the ReversedLinesFileReader-encoding branch January 5, 2021 23:14