Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

TAJO-1374 Support multi-bytes delimiter for CSV file #400

Closed
wants to merge 1 commit into from

Conversation

navis
Copy link
Contributor

@navis navis commented Mar 6, 2015

Simple patch

@hyunsik
Copy link
Member

hyunsik commented Mar 6, 2015

Hi @navis,
Thank you for your contribution. There may be mistake in the title. Could you rename the title?

In these days, TravisCI has been too slow. There have been many pending pull requests for this reason. So, it would be better if you submit the patch to the jira and mark the jira as PATCH AVAILABLE. Please see this mailing thread too.

http://markmail.org/message/itfwyhcoppqmaouu

@jinossy
Copy link
Member

jinossy commented Mar 6, 2015

@navis Good to see you
The CSVFile will be deprecated. please refer to DelimitedTextFile

@navis
Copy link
Contributor Author

navis commented Mar 9, 2015

@jinossy If it will be not deprecated in near future, can we still get some usefulness in this? Fixed test fails, anyway.

@navis navis changed the title HIVE-1374 Support multi-bytes delimiter for CSV file TAJO-1374 Support multi-bytes delimiter for CSV file Mar 9, 2015
@hyunsik
Copy link
Member

hyunsik commented Mar 10, 2015

The patch looks nice to me. Of course, your work is useful.

It would be great if this feature is applied to DelimitedTextFile too because DelimitedTextFile is new replacement to CSVFile. DelimitedTextFile's performance is really great. According to some benchmark result, it can parse more than 500MB CSV data sets per second. It also can boost up query response times in many cases, especially I/O intensive workloads.

I think that the work for DelimitedTextFile does not need to be done in this issue. We can do in another jira.

Anyway, could you fix some test failure? It still has one test failure.

@@ -134,7 +134,7 @@ else if (textBytes.length <= fieldId) {
}
textBytes[fieldId] = null;
} else {
//non-projection
values[fieldId] = NullDatum.get(); //non-projection
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code fills the NullDatum to values array of LazyTuple when user does not insert any data on this index. However, VTuple class does not initialize the values array on itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted this part. I've hardly noticed tajo expects null for not-picked and byte[0] for not-existing picked.

@jinossy
Copy link
Member

jinossy commented Mar 10, 2015

@navis
OK, I've create new jira issue TAJO-1381.
If you fix test failure, we can start review.

Thanks

@navis
Copy link
Contributor Author

navis commented Mar 10, 2015

Fixed test fails

@jinossy
Copy link
Member

jinossy commented Mar 13, 2015

+1 LGTM I'll commit it shortly.
Thank you for your contribution

@asfgit asfgit closed this in 7f05695 Mar 13, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants