New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-11991] Set headers to use for CSV output #8030
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jniocheCF,
thanks for your pull request and sorry that it took almost a month until somebody had a look at it.
I've added a few comments.
We would also need to add tests to CsvOutputFormatTest
to check that it works correctly.
Thank you,
Fabian
@@ -161,6 +173,29 @@ public void open(int taskNumber, int numTasks) throws IOException { | |||
super.open(taskNumber, numTasks); | |||
this.wrt = this.charsetName == null ? new OutputStreamWriter(new BufferedOutputStream(this.stream, 4096)) : | |||
new OutputStreamWriter(new BufferedOutputStream(this.stream, 4096), this.charsetName); | |||
// print headers | |||
if (this.headers != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the code to print the headers to a method?
String v = headers[i]; | ||
if (v != null) { | ||
if (i != 0) { | ||
this.wrt.write(this.fieldDelimiter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Construct the header row in a StringBuilder
and print it all at once?
* @see CsvOutputFormat | ||
* @see DataSet#writeAsText(String) Output files and directories | ||
*/ | ||
public DataSink<T> writeAsCsv(String filePath, String rowDelimiter, String fieldDelimiter, WriteMode writeMode, String[] headers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a balance between having a good amount of shortcuts and a bloated API.
We do not want to add too specific methods to the DataSet
class.
I think we should keep the DataSet
class as is it. If somebody wants to add header rows to their CSV files, they can manually create a CsvOutputFormat
, configure it correspondingly, and call DataSet.write()
.
I'm closing this as "Abandoned", since there is no more activity and the code base has moved on quite a bit. Please re-open this if you feel otherwise and work should continue. |
What is the purpose of the change
(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)
This PR adds a new functionality allowing to specify headers for CSV files in output.
Brief change log
Added a method to the Dataset class and modified the CSVOutputFormat.
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation