New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix allow cr end of line for csv #56901
Fix allow cr end of line for csv #56901
Conversation
cc @Avogar |
A test is needed. |
We need a test for parallel parsing and count optimization. You can create large enougn file (maybe with ~1000000 rows) and run queries:
|
This is an automated comment for commit 619b2f2 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
The test is added . cc @Avogar |
const DB::ContextPtr context = DB::CurrentThread::get().getGlobalContext(); | ||
auto format_settings = getFormatSettings(context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use DB::CurrentThread::get().getGlobalContext()
. We should propogate query context to this place. And it will require some changes in FormatFactory.
We can change file_segmentation_engine
here:
ClickHouse/src/Formats/FormatFactory.h
Line 135 in e71ca7d
FileSegmentationEngine file_segmentation_engine; |
To
file_segmentation_engine_creator
with type FileSegmentationEngineCreator = std::function<FileSegmentationEngine(const FormatSettings & settings)>
And add new method to FormatFactory:
registerFileSegmentationEngineCreator(const String & name, FileSegmentationEngineCreator file_segmentation_engine_creator);
And we will need to change registerFileSegmentationEngine
implementation so it uses file_segmentation_engine_creator
.
After doing it, we can use registerFileSegmentationEngineCreator
in CSV format, so we will be able to use FormatSettings
to propogate our setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hello, I have modify the code according to you advices, but the test still can not pass, please help to review my code if the implemetion is right ? @Avogar
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
According to #54340, while allow cr at end of line for csv, some code also need to changes for it.
Documentation entry for user-facing changes