[FLINK-19868][csv] Csv Serialization schema contains line delimiter #13925

Merged
merged 2 commits into from Nov 6, 2020
SteNicholas
Member

What is the purpose of the change

By default, CsvRowSerializationSchema and CsvRowDataSerializationSchema serialize Row and RowData to a byte array that ends with \n. A row-wise format should consume and produce only the bytes for a single row. The csv.line-delimiter option is therefore not a good fit for the CSV format and should be removed, so that CsvRowSerializationSchema and CsvRowDataSerializationSchema no longer append a new line.

Brief change log

  • The CsvSchema in CsvRowSerializationSchema and CsvRowDataSerializationSchema sets the line separator to an empty string.
  • The line-delimiter option is removed from CsvFormatFactory, CsvRowFormatFactory and CsvFileSystemFormatFactory.
  • CsvFileSystemFormatFactory appends the line delimiter \n to each encoded RowData record when writing.
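
The split of responsibilities described in the change log can be sketched as follows. This is a minimal, self-contained illustration and not Flink's implementation: the class RowWiseCsvSketch and its methods serializeRow, quoteIfNeeded and encodeRecords are hypothetical names, and the quoting rule is a simplified assumption. The row serializer returns only the bytes of one row, while the file-style encoder is the one place that appends \n after each record.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for illustration; these are not Flink classes. */
public class RowWiseCsvSketch {

    // Serializes exactly one row to CSV bytes. No line delimiter is appended,
    // so the result can be used directly as one message in a row-wise sink.
    static byte[] serializeRow(List<String> fields) {
        String line = fields.stream()
                .map(RowWiseCsvSketch::quoteIfNeeded)
                .collect(Collectors.joining(","));
        return line.getBytes(StandardCharsets.UTF_8);
    }

    // Simplified quoting (an assumption): quote fields that contain a comma,
    // a double quote or a newline, and double any embedded quotes.
    static String quoteIfNeeded(String field) {
        boolean needsQuotes =
                field.contains(",") || field.contains("\"") || field.contains("\n");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // File-style encoder: writes each record followed by '\n', because a file
    // holds many rows and needs a separator between them.
    static byte[] encodeRecords(List<List<String>> rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (List<String> row : rows) {
            byte[] bytes = serializeRow(row);
            out.write(bytes, 0, bytes.length);
            out.write('\n');
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] single = serializeRow(Arrays.asList("alice", "42"));
        System.out.println(single.length); // length of "alice,42" only

        byte[] file = encodeRecords(Arrays.asList(
                Arrays.asList("alice", "42"),
                Arrays.asList("bob", "7")));
        System.out.print(new String(file, StandardCharsets.UTF_8));
    }
}
```

Keeping the delimiter out of the row serializer means a message-oriented sink receives exactly one row per message, and only writers that concatenate rows have to know about \n.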

Verifying this change

  • Remove the line delimiter config from the test cases in CsvRowDataSerDeSchemaTest and CsvRowDeSerializationSchemaTest to verify whether the byte arrays that CsvRowSerializationSchema and CsvRowDataSerializationSchema produce for Row and RowData include the line delimiter \n.
  • Remove the line-delimiter option from the test cases in CsvFormatFactoryTest and CsvRowFormatFactoryTest to verify that the option can be removed cleanly.
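
The kind of check these tests perform might look like the sketch below. It is not the actual Flink test code: NoLineDelimiterCheck, serialize, deserialize and endsWithNewline are hypothetical helpers, and the stand-in serializer ignores quoting entirely. It asserts that the serialized bytes of a row do not end with \n and that the row survives a round trip without any line-delimiter setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical check for illustration; not taken from the Flink test suite. */
public class NoLineDelimiterCheck {

    // Stand-in serializer: joins fields with ',' and appends nothing after the row.
    static byte[] serialize(List<String> fields) {
        return String.join(",", fields).getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in deserializer: splits the bytes of one row back into its fields.
    static List<String> deserialize(byte[] rowBytes) {
        String line = new String(rowBytes, StandardCharsets.UTF_8);
        return Arrays.asList(line.split(",", -1));
    }

    // Returns true if the byte array ends with the '\n' line delimiter.
    static boolean endsWithNewline(byte[] bytes) {
        return bytes.length > 0 && bytes[bytes.length - 1] == '\n';
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("Test", "12", "Hello");
        byte[] bytes = serialize(row);

        if (endsWithNewline(bytes)) {
            throw new AssertionError("serialized row must not end with a line delimiter");
        }
        if (!deserialize(bytes).equals(row)) {
            throw new AssertionError("row must survive a serialize/deserialize round trip");
        }
        System.out.println("checks passed");
    }
}
```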

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Nov 4, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 8e91a4e (Wed Nov 04 08:56:59 UTC 2020)

✅no warnings

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Nov 4, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@SteNicholas
Member Author

SteNicholas commented Nov 4, 2020

@wuchong @JingsongLi , could you please help to review this pull request if you are available?

Member

@wuchong wuchong left a comment

LGTM.

Waiting for the build to pass.

@wuchong
Member

wuchong commented Nov 6, 2020

The failed e2e test does not seem to be related to this PR.

Will merge it.

@wuchong wuchong merged commit 2d12d32 into apache:master Nov 6, 2020