fix(query): Copy into CSV file support both CRLF and LF delimiter #18250

Merged
sundy-li merged 2 commits into databendlabs:main on Jun 26, 2025

Conversation

b41sh
Member

@b41sh b41sh commented Jun 25, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Description:

When importing CSV files with COPY INTO, Databend currently uses the Linux-style line feed (LF) as the default record delimiter. This causes problems when users import CSV files generated on Windows, which typically end each record with a carriage return followed by a line feed (CRLF).

When a CRLF-delimited file is imported, the extra carriage return character (\r) is treated as part of the data, so the last field of each record ends with an unwanted, invisible trailing character. As a result, stored values no longer compare equal to what users expect, and queries fail to find matching rows.
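To illustrate the failure mode, here is a minimal Python sketch (not Databend's code) that splits a CRLF-delimited payload on LF only. The sample data and the `split_records_lf_only` helper are hypothetical and exist only for this illustration.

```python
def split_records_lf_only(data: str) -> list[list[str]]:
    """Naively split records on LF only, then split fields on commas."""
    return [line.split(",") for line in data.split("\n") if line]


# Hypothetical payload as a Windows tool would write it (CRLF record endings).
windows_csv = "id,name\r\n1,alice\r\n2,bob\r\n"
rows = split_records_lf_only(windows_csv)

# The carriage return stays attached to the last field of every record,
# so an equality comparison against the expected value fails.
last_field = rows[1][-1]
print(repr(last_field))        # 'alice\r'
print(last_field == "alice")   # False
```

The stray `\r` is invisible in most result displays, which is why the mismatch is easy to miss.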

This PR addresses this issue by:

  • Modifying the COPY INTO command to accept both CRLF and LF as valid record delimiters when importing CSV files.
  • Ensuring compatibility with CSV files generated on Windows as well as Linux/macOS, so imports behave the same regardless of where the data was produced.
  • fixes: #[Link the issue here]
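The idea of accepting either delimiter can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the implementation in this PR: the `split_records` helper is hypothetical, and it ignores quoting, so it would mishandle quoted fields that contain commas or line breaks.

```python
def split_records(data: str) -> list[list[str]]:
    """Split records on LF and drop one trailing CR per record if present.

    Simplified: no support for quoted fields or embedded line breaks.
    """
    records = []
    for line in data.split("\n"):
        if line.endswith("\r"):
            line = line[:-1]  # treat CRLF the same as LF
        if line:
            records.append(line.split(","))
    return records


crlf_rows = split_records("id,name\r\n1,alice\r\n")
lf_rows = split_records("id,name\n1,alice\n")
print(crlf_rows == lf_rows)  # True
```

With this approach the same file contents load identically whether the file was written with CRLF or LF record endings.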

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@b41sh b41sh requested a review from youngsofun June 25, 2025 12:12
@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Jun 25, 2025
@b41sh b41sh requested a review from sundy-li June 25, 2025 12:12
@youngsofun
Member

@b41sh

This is good for loading, but I am afraid unloading will be affected too: the user specifies \n, but the resulting file contains \r\n.

@b41sh
Member Author

b41sh commented Jun 25, 2025

@b41sh

This is good for loading, but I am afraid unloading will be affected too: the user specifies \n, but the resulting file contains \r\n.

@youngsofun RecordDelimiter is only used by CsvReader, so unload is not affected.
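The separation between a tolerant reader and an exact writer can be illustrated with Python's standard `csv` module, whose reader recognizes both LF and CRLF while the writer emits exactly the configured `lineterminator`. This is an analogy only, not Databend's code; the `unload` and `load` helpers are hypothetical names.

```python
import csv
import io


def unload(rows, record_delimiter="\n"):
    """Write rows using exactly the record delimiter the caller specified."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator=record_delimiter).writerows(rows)
    return buf.getvalue()


def load(text):
    """Read rows; csv.reader accepts both LF and CRLF record endings."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["1", "alice"], ["2", "bob"]]
unloaded = unload(rows, "\n")

print("\r" in unloaded)                                # False
print(load(unloaded) == rows)                          # True
print(load(unloaded.replace("\n", "\r\n")) == rows)    # True
```

Relaxing what the reader accepts does not change what the writer produces, which is the property the reply above relies on.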

Member

@sundy-li sundy-li left a comment

others LGTM

@sundy-li sundy-li merged commit 916e2b4 into databendlabs:main Jun 26, 2025
163 of 166 checks passed
Labels
pr-bugfix this PR patches a bug in codebase
3 participants