The following CSV file, with "\r" style line endings...
colA,colB,colC
a,A,"x"
b,B,k
...should parse as [[colA, colB, colC], [a, A, x], [b, B, k]]. However, when lineSeparatorDetectionEnabled=true and normalizeLineEndingsWithinQuotes=false, I instead get [[colA, colB, colC], [a, A, "x"\rb, B, k]].
Here is a complete test case, which fails with Univocity 2.9.1 on Windows 11 and Java 17:
import com.univocity.parsers.csv.CsvFormat;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.junit.Assert;
import org.junit.Test;

public class UnivocityLineEndingBugTest {
    private static final boolean TRIGGER_BUG = true;

    private static CsvParserSettings createUnivocitySettings() {
        final CsvParserSettings settings = new CsvParserSettings();
        final CsvFormat format = settings.getFormat();
        settings.setDelimiterDetectionEnabled(false);
        format.setDelimiter(',');
        settings.setQuoteDetectionEnabled(false);
        format.setQuote('\"');
        format.setQuoteEscape('\"');
        settings.setKeepEscapeSequences(false);
        settings.setKeepQuotes(false);
        // Setting this to true will also cause the bug to go away.
        settings.setNormalizeLineEndingsWithinQuotes(false);
        //format.setNormalizedNewline('\n');
        if (TRIGGER_BUG) {
            settings.setLineSeparatorDetectionEnabled(true);
        } else {
            settings.setLineSeparatorDetectionEnabled(false);
            format.setLineSeparator("\r");
        }
        return settings;
    }

    @Test
    public void testBug() throws IOException {
        String csvFile =
            "colA,colB,colC\r" +
            "a,A,\"x\"\r" +
            "b,B,k\r";
        CsvParserSettings settings = createUnivocitySettings();
        List<List<String>> result = new ArrayList<>();
        try (Reader reader = new StringReader(csvFile)) {
            CsvParser parser = new CsvParser(settings);
            parser.beginParsing(reader);
            while (true) {
                String[] row = parser.parseNext();
                if (row == null)
                    break;
                // System.out.println(Arrays.toString(row));
                result.add(new ArrayList<>(Arrays.asList(row)));
            }
        }
        System.out.println(result.toString());
        Assert.assertEquals("[[colA, colB, colC], [a, A, x], [b, B, k]]", result.toString());
    }
}
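Since the test passes when detection is disabled and the separator is set explicitly, one possible workaround is to detect the line separator outside the library and hand it to the format. The following is only a minimal sketch under that assumption: the class LineSeparatorSniffer and its detect method are hypothetical helpers, not part of Univocity, and the simple quote toggling assumes quotes are escaped by doubling, as in the settings above.

```java
public final class LineSeparatorSniffer {
    private LineSeparatorSniffer() {}

    /**
     * Returns the first line separator ("\r\n", "\r" or "\n") found outside
     * a quoted field, or the given fallback if the text contains none.
     */
    public static String detect(String text, char quote, String fallback) {
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == quote) {
                // A doubled quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
            } else if (!inQuotes) {
                if (c == '\r') {
                    boolean crlf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                    return crlf ? "\r\n" : "\r";
                }
                if (c == '\n') {
                    return "\n";
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String csv = "colA,colB,colC\r" + "a,A,\"x\"\r" + "b,B,k\r";
        String sep = detect(csv, '"', "\n");
        System.out.println(sep.equals("\r"));
        // The detected value would then be applied as in the non-bug branch of the test:
        //   settings.setLineSeparatorDetectionEnabled(false);
        //   settings.getFormat().setLineSeparator(sep);
    }
}
```

This only sidesteps the problem for inputs that fit in memory as a String; it does not address the underlying behavior of lineSeparatorDetectionEnabled=true.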
Thank you for your work on the excellent Univocity library! I am using it for Ultorg and am in the process of writing unit tests, which is how I found the bug above...
Also note that the Javadoc and parameter name for CharInputReader.enableNormalizeLineEndings(escaping) seem to reverse the actual behavior of the method as assumed by callers and implemented in AbstractCharInputReader. In fact, in the latter overriding method, the parameter has been renamed to normalizeLineEndings, which seems like a more accurate name.