TAJO-1374 Support multi-bytes delimiter for CSV file #400
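The feature in the title comes down to splitting each raw line on a delimiter that may be longer than one byte. The following is a minimal Java sketch of that idea, assuming lines arrive as raw `byte[]`. The class and method names (`MultiByteDelimiterSplit`, `split`, `matchesAt`) are hypothetical illustrations, not the code in this patch or Tajo's `CSVFile` API, and the scan is the naive one that checks every position.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not Tajo's CSVFile implementation.
public class MultiByteDelimiterSplit {

  /** Splits a line into fields on a (possibly multi-byte) delimiter. */
  public static List<byte[]> split(byte[] line, byte[] delimiter) {
    if (delimiter.length == 0) {
      throw new IllegalArgumentException("delimiter must not be empty");
    }
    List<byte[]> fields = new ArrayList<>();
    int fieldStart = 0;
    int i = 0;
    // Scan left to right; at each position, test whether the whole delimiter matches.
    while (i <= line.length - delimiter.length) {
      if (matchesAt(line, i, delimiter)) {
        fields.add(Arrays.copyOfRange(line, fieldStart, i));
        i += delimiter.length; // skip the entire delimiter, not just one byte
        fieldStart = i;
      } else {
        i++;
      }
    }
    // Everything after the last delimiter is the final field (possibly empty).
    fields.add(Arrays.copyOfRange(line, fieldStart, line.length));
    return fields;
  }

  private static boolean matchesAt(byte[] line, int pos, byte[] delimiter) {
    for (int j = 0; j < delimiter.length; j++) {
      if (line[pos + j] != delimiter[j]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] line = "a||b||||c".getBytes(StandardCharsets.UTF_8);
    byte[] delim = "||".getBytes(StandardCharsets.UTF_8);
    for (byte[] field : split(line, delim)) {
      System.out.println("[" + new String(field, StandardCharsets.UTF_8) + "]");
    }
  }
}
```

Because the cursor jumps past the full delimiter after a match, consecutive delimiters yield empty fields rather than overlapping matches: `a||b||||c` split on `||` gives `a`, `b`, an empty field, and `c`.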
Hi @navis, Travis CI has been very slow these days, and many pull requests are pending because of it. It would be better to submit the patch to the JIRA issue and mark it as PATCH AVAILABLE. Please see this mailing list thread too.
@navis Good to see you
@jinossy If it will not be deprecated in the near future, is this still useful? I've fixed the test failures, anyway.
The patch looks good to me, and your work is certainly useful. It would be great if this feature were also applied to DelimitedTextFile, because DelimitedTextFile is the new replacement for CSVFile. DelimitedTextFile's performance is very good: according to some benchmark results, it can parse more than 500MB of CSV data per second. It can also improve query response times in many cases, especially for I/O-intensive workloads. I don't think the DelimitedTextFile work needs to be done in this issue; we can do it in another JIRA issue. In the meantime, could you fix the remaining test failure? There is still one.
@@ -134,7 +134,7 @@ else if (textBytes.length <= fieldId) {
       }
       textBytes[fieldId] = null;
     } else {
-      //non-projection
+      values[fieldId] = NullDatum.get(); //non-projection
This code fills the values array of LazyTuple with NullDatum when the user has not inserted any data at this index. However, the VTuple class does not initialize its values array itself.
Reverted this part. I hadn't noticed that Tajo expects null for columns that are not picked (not projected) and byte[0] for picked columns that do not exist.
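To make the distinction in this exchange concrete, here is a simplified Java model of a lazily decoded row. It is a sketch under stated assumptions, not Tajo's `LazyTuple`: the class `LazyRowSketch`, the `SQL_NULL` sentinel (standing in for `NullDatum.get()`), and decoding fields as UTF-8 strings are all hypothetical. It assumes a `null` entry in `textBytes` marks a column that was not picked, an empty `byte[0]` marks a picked column with no data, and a `fieldId` past the end of `textBytes` means the row had fewer fields than the schema. Java initializes object arrays to `null`, so non-projected slots simply stay `null`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model; not Tajo's LazyTuple.
public class LazyRowSketch {

  // Hypothetical sentinel standing in for SQL NULL (like NullDatum.get()).
  public static final String SQL_NULL = "<SQL NULL>";

  private final byte[][] textBytes; // raw fields; a null entry marks a non-projected column
  private final String[] values;    // decoded cache; Java initializes every slot to null

  public LazyRowSketch(byte[][] textBytes, int numColumns) {
    this.textBytes = textBytes;
    this.values = new String[numColumns];
  }

  public String get(int fieldId) {
    if (values[fieldId] != null) {
      return values[fieldId];                 // already decoded
    }
    if (fieldId >= textBytes.length) {
      values[fieldId] = SQL_NULL;             // row has fewer fields than the schema
    } else if (textBytes[fieldId] != null) {
      byte[] raw = textBytes[fieldId];
      // Assumption: an empty byte[0] marks a projected field that holds no data.
      values[fieldId] = raw.length == 0 ? SQL_NULL : new String(raw, StandardCharsets.UTF_8);
      textBytes[fieldId] = null;              // drop the raw bytes once decoded
    }
    // Otherwise the column was not projected: leave the slot as Java null instead of
    // filling it with a SQL NULL, which is what the reverted line did.
    return values[fieldId];
  }

  public static void main(String[] args) {
    byte[][] raw = {
        "x".getBytes(StandardCharsets.UTF_8), // projected, has data
        null,                                 // not projected
        new byte[0]                           // projected, but empty
    };
    LazyRowSketch row = new LazyRowSketch(raw, 4);
    for (int i = 0; i < 4; i++) {
      System.out.println(i + " -> " + row.get(i)); // 0 -> x, 1 -> null, 2 and 3 -> <SQL NULL>
    }
  }
}
```

Keeping Java `null` and a SQL NULL value distinct lets a caller tell "this column was never read" apart from "this column was read and is NULL".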
@navis Thanks
Fixed the test failures.
+1 LGTM. I'll commit it shortly.
Simple patch