-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-2769] CSV Adapter does not handle - Empty and malformed csv lines, space before or after comma #1032
base: main
Are you sure you want to change the base?
Conversation
CSV Adapter does not handle - Empty and malformed csv lines, space before or after comma
30,"Accounts" | ||
|
||
31,"My Dept" | ||
32, "My Another Depts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this line should be accepted; the space seems to belong to a field which is the quoted - it becomes cloudy for me that this field is quoted or not
try { | ||
current = rowConverter.convertRow(strings); | ||
} catch (Exception e) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should skip records if the conversion fails; I think every row should contain a full record.
A more descriptive error message and some info about where is the line in the file which is incorrect would probably help the end user to correct the mistake.
https://tools.ietf.org/html/rfc4180
@@ -237,6 +241,9 @@ public void close() { | |||
abstract E convertRow(String[] rows); | |||
|
|||
protected Object convert(CsvFieldType fieldType, String string) { | |||
if (string != null) { | |||
string = string.trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would probably mean that a space value is not anymore considered a value - it will be the same as an empty string
I think rfc4180 permits multiple spaces without quotes
Proposed solution looks very specific. What people think about the following alternative:
default Stream<String> lines() {
// TODO add close() logic
return new BufferedReader(reader()).lines();
}
public CsvSchema(Stream<Source> sources, CsvTable.Flavor flavor) {
} If user wants to modify / filter lines based on particular logic they can wrap existing source and override return lines().map(String::trim()).filter(line -> !line.isEmpty()); It also has the advantage of accepting generic sources of data like HDFS. |
80f411d
to
ca27fe9
Compare
49cb002
to
8768a23
Compare
8a5cf83
to
cf7f71b
Compare
CSV Adapter does not handle - Empty and malformed csv lines, space before or after comma
File: org.apache.calcite.adapter.csv.CsvEnumerator.java in example folder did not handle CSV with space between delimiter i.e."," and also did not handle empty lines or malformed lines in the middle of the file or to the end. The sqlline crashes is there is an empty line with java.lang.ArrayIndexOutOfBoundsException.
Also test cases fail. Refer jira: https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-2769