Post release fixes by aherbert · Pull Request #44 · apache/commons-csv

aherbert · 2019-06-13T21:21:32Z

Fixes for issues found during release review of version 1.7 RC1.

Build now passes:

mvn checkstyle:check pmd:check

It still fails:

mvn findbugs:check

The parser field introduced into CSVRecord for CSV-239 is not Serializable. This should be fixed separately.

coveralls · 2019-06-13T22:48:08Z

Coverage increased (+1.7%) to 92.543% when pulling 1539f48 on aherbert:post-release-fixes into 3718ec3 on apache:master.

coveralls · 2019-06-13T22:48:08Z

Coverage increased (+1.7%) to 92.993% when pulling e9e98f0 on aherbert:post-release-fixes into 42b9fdb on apache:master.

garydgregory

Thank you for the PR. Overall, I wonder if there is anything we should promote to commons-parent?

garydgregory · 2019-06-14T11:41:20Z

pom.xml

    <checkstyle.version>3.0.0</checkstyle.version>
    <checkstyle.header.file>${basedir}/LICENSE-header.txt</checkstyle.header.file>
-    <checkstyle.resourceExcludes>LICENSE.txt, NOTICE.txt</checkstyle.resourceExcludes>
+    <checkstyle.resourceExcludes>LICENSE.txt, NOTICE.txt, **/maven-archiver/pom.properties</checkstyle.resourceExcludes>


Should this also be in commons-parent?

AFAICS there is no checkstyle configuration in commons-parent, or PMD. I thought it was just up to projects to configure as appropriate. Using the maven plugin property checkstyle.resourceExcludes is not done on other commons projects I know. These use a <configuration> section and <resourceExcludes> tag. There is more than one way to skin a cat.

Are you suggesting to add profiles like those for other optional plugins such as jacoco? Each project is of a different age and coding style. So a standard checkstyle probably would not work as is and a transition period to a common style would be needed. A common style would certainly make browsing source from different projects easier. I'd say the discussion on this is outside the scope of this PR.

outside the scope of this PR

Agreed.

garydgregory · 2019-06-14T11:41:41Z

pom.xml

+            <ruleset>${basedir}/src/main/resources/pmd/pmd-ruleset.xml</ruleset>
+          </rulesets>
+        </configuration>
+      </plugin>


Should this also be in commons-parent?

garydgregory · 2019-06-14T11:43:36Z

src/main/java/org/apache/commons/csv/CSVParser.java

                                String.format(
-                                    "The header contains a duplicate name: \"%s\" in %s. If this is valid then use CSVFormat.withAllowDuplicateHeaderNames().",
+                                    "The header contains a duplicate name: \"%s\" in %s. " +
+                                    "If this is valid then use CSVFormat.withAllowDuplicateHeaderNames().",


I am not a fan of programatically building strings this when the ONLY purpose is line length. It can also give one the impression there is an EOL when there is none.

OK. This was just to fix checkstyle line length and allow checkstyle:check to complete. I can put in a checkstyle exclusion instead. But if targeted to a specific line it is a pain to maintain in-line with the code. Targeted to a method then the scope of checking is often too wide. I could move these two checks within the if (containsHeader) statement to methods:

allowDuplicateHeaderNameOrThrow(boolean emptyHeader) allowMissingColumnNameOrThrow(boolean emptyHeader)

The line length violation will be in a single short method that can then be allowed.

I've looked at this again and the logic is a bit convoluted:

The containsHeader flag is set when the header is already in the header map.
The `emptyHeader flag is set when the header is null or only whitespace.

Then there is a check on the containsHeader flag. If true then this has been seen before and some cases are tested.
An exception is thrown if the header is not empty and settings do not allow duplicates. This means that empty is an allowed duplicate.
An exception is thrown if the header is empty and settings do not allow missing column names. But this can only be thrown if the header has been seen before (is in the header map).

This means that a single missing column name would be allowed if the settings allowed duplicates but not missing column names. This does not seem right. If missing column names are not allowed then this should always be checked irrespective of what is in the header map. The unit tests only checks for two missing column names:

// This passes as the exception is thrown @Test(expected = IllegalArgumentException.class) public void testHeadersMissingException() throws Exception { final Reader in = new StringReader("a,,c,,d\n1,2,3,4\nx,y,z,zz"); CSVFormat.DEFAULT.withHeader().parse(in).iterator(); } // This fails to throw an exception. // Only 1 column name is missing and this is not recognised as a problem. @Test(expected = IllegalArgumentException.class) public void testHeadersMissing1ColumnException() throws Exception { final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz"); CSVFormat.DEFAULT.withHeader().parse(in).iterator(); }

This seems to be a bug so should be under a Jira ticket.

You also only add to the header map when the header is not null. But if you are allowing missing column names then a null is a valid header and your list of header names does not match the order of the header record. Thus getHeaderNames() will not return a list of the correct size (number of columns). So perhaps "" should be added to the list of header names. This is a change outside the scope of this PR, and since I am not familiar with what the header names are meant to achieve perhaps this is intended (i.e. only return a list of header names that are not null).

A final issue is that the header map exists to convert a column header into an index. If you are allowing duplicates then the mapping is one to many and the map may not function as expected when trying to obtain the column index using a column header. Perhaps this hole in functionality should be documented in the getHeaderMap javadoc, i.e. the map of column name to column index cannot represent a one to many relationship when duplicate column names are present. This could be solved with user beware documentation that is this case the map will contain the index of the last matching column with that header if duplicates have been allowed. Or more drastically by controlling the Map with a specialisation that does special behaviour when a request with a duplicate column name occurs: throws an exception; returns as array of all matching columns, etc. Again outside the scope of this PR.

garydgregory · 2019-06-14T11:44:46Z

src/test/java/org/apache/commons/csv/CSVPrinterTest.java


+    @Test
+    public void testPrintReaderWithoutQuoteToWriter() throws IOException {
+        // Test to target the use of IOUtils::copyLarge.


Should this be a method comment (Javadoc or plain)?

garydgregory · 2019-06-14T11:44:52Z

src/test/java/org/apache/commons/csv/CSVPrinterTest.java

+
+    @Test
+    public void testPrintReaderWithoutQuoteToAppendable() throws IOException {
+        // Test to target the use of IOUtils::copy.


Should this be a method comment (Javadoc or plain)?

Sure. None of the other methods had Javadoc and when I initially added it as a Javadoc it was more lines than the 3 lines inside the method. It looked odd. I'll change.

This requires adding the default ruleset and then modifying with suppressions.

Requires the CSVFormat to have no quote or escape character, and the formatted value to be a java.io.Reader.

garydgregory reviewed Jun 14, 2019

View reviewed changes

aherbert and others added 12 commits June 15, 2019 15:10

Fix checkstyle: remove tabs

dd0bb04

Fix checkstyle: Split long line

d73e344

Fix checkstyle: exclude pom.properties

aaef154

Update findbugs to allow deliberate fall-through

084333f

Fix pmd: Remove ternary operator returning false

74ce045

Fix pmd: Remove implicit final

4a9a91e

Fix pmd: Ignore TooManyStaticImports.

db610b3

This requires adding the default ruleset and then modifying with suppressions.

Add tests to cover use of the IOUtils class.

69a9b7d

Requires the CSVFormat to have no quote or escape character, and the formatted value to be a java.io.Reader.

Clean-up findbugs exclude filter.

53b8e39

Removed unused import

5472126

Updated test comments for print tests targeting IOUtils.

9108a3b

Fix checkstyle: Suppress line length warning in CSVParser.

e9e98f0

aherbert force-pushed the post-release-fixes branch from 6b31460 to e9e98f0 Compare June 15, 2019 14:11

garydgregory merged commit 03550ab into apache:master Jun 15, 2019

Conversation

aherbert commented Jun 13, 2019

Uh oh!

coveralls commented Jun 13, 2019

Uh oh!

coveralls commented Jun 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

garydgregory left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coveralls commented Jun 13, 2019 •

edited

Loading