[FLINK-19868][csv] Csv Serialization schema contains line delimiter #13925
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community.
Automated Checks
Last check on commit 8e91a4e (Wed Nov 04 08:56:59 UTC 2020)
✅ no warnings
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
@wuchong @JingsongLi, could you please help to review this pull request if you are available?
...formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvFileSystemFormatFactory.java
flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvRowFormatFactory.java
...-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvRowSerializationSchema.java
LGTM.
Waiting for the build to pass.
The failed e2e test does not seem to be related to this PR. Will merge it.
What is the purpose of the change
- By default, `CsvRowSerializationSchema` and `CsvRowDataSerializationSchema` serialize `Row` and `RowData` to a byte array that ends with `\n`. A row-wise format should consume and produce only the bytes for a single row.
- The `csv.line-delimiter` option is not a good fit for the CSV format and should be removed, so that `CsvRowSerializationSchema` and `CsvRowDataSerializationSchema` do not append a new line.
Brief change log
- The `CsvSchema` in `CsvRowSerializationSchema` and `CsvRowDataSerializationSchema` configures the line separator as an empty string.
- The `line-delimiter` option in `CsvFormatFactory`, `CsvRowFormatFactory` and `CsvFileSystemFormatFactory` is removed.
- `CsvFileSystemFormatFactory` encodes each `RowData` record with a trailing line delimiter `\n` when writing.
Verifying this change
- `CsvRowDataSerDeSchemaTest` and `CsvRowDeSerializationSchemaTest` verify whether the byte array that `CsvRowSerializationSchema` and `CsvRowDataSerializationSchema` produce from `Row` and `RowData` includes the line delimiter `\n`.
- The test cases in `CsvFormatFactoryTest` and `CsvRowFormatFactoryTest` cover the `line-delimiter` option to verify that it can be removed cleanly.
Does this pull request potentially affect one of the following parts:
- `@Public(Evolving)`: (yes / no)
Documentation
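To make the split of responsibilities in the brief change log concrete, here is a minimal, dependency-free sketch. It is not the code from this PR and does not use Flink or Jackson: the class `RowWiseCsvSketch` and its methods `serializeRow` and `encodeForFile` are hypothetical names chosen for illustration, and quoting and escaping are deliberately left out. It only shows the idea that a row-wise serializer returns the bytes of one row with no line delimiter, while a file-style encoder appends `\n` after each record.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RowWiseCsvSketch {

    // Hypothetical helper (not Flink's API): joins the fields of ONE row with
    // the field delimiter and returns the bytes WITHOUT a trailing line delimiter.
    // Assumption: fields need no quoting or escaping.
    public static byte[] serializeRow(List<String> fields, char fieldDelimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(fieldDelimiter);
            }
            sb.append(fields.get(i));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical file-style encoder: the writer, not the row serializer,
    // is responsible for appending "\n" after every record.
    public static byte[] encodeForFile(List<List<String>> rows, char fieldDelimiter) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (List<String> row : rows) {
            byte[] bytes = serializeRow(row, fieldDelimiter);
            out.write(bytes, 0, bytes.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] single = serializeRow(Arrays.asList("a", "b", "c"), ',');
        // The row-wise result has no trailing newline.
        System.out.println(new String(single, StandardCharsets.UTF_8).endsWith("\n")); // prints "false"

        byte[] file = encodeForFile(
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", "d")), ',');
        // The file-style result terminates every record with "\n".
        System.out.print(new String(file, StandardCharsets.UTF_8));
    }
}
```

Keeping the delimiter out of the row serializer means a message-oriented sink receives exactly one row per message, while a file sink can still produce one record per line by adding the delimiter itself.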