Skip to content

Fix the CSV serialization/deserialization bug for Time#87622

Merged
yariks5s merged 8 commits intomasterfrom
yarik/fix-time-csv-parsing-crash
Oct 1, 2025
Merged

Fix the CSV serialization/deserialization bug for Time#87622
yariks5s merged 8 commits intomasterfrom
yarik/fix-time-csv-parsing-crash

Conversation

@yariks5s
Copy link
Copy Markdown
Member

Closes #82407

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a bug when incorrect type order during CSV deserialization led to the LOGICAL_ERROR

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Sep 25, 2025

Workflow [PR], commit [2ffa20f]

Summary:

job_name test_name status info comment
Integration tests (arm_binary, distributed plan, 2/4) failure
test_trace_log_memory_context/test.py::test_memory_context_in_trace_log FAIL
Integration tests (arm_binary, distributed plan, 3/4) failure
test_storage_rabbitmq/test.py::test_rabbitmq_tsv_with_delimiter FAIL
test_storage_rabbitmq/test.py::test_rabbitmq_macros FAIL
test_keeper_memory_soft_limit/test.py::test_soft_limit_create FAIL

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Sep 25, 2025
@Avogar Avogar self-assigned this Sep 25, 2025
{
if (!tryReadTimeText(x, istr) || !checkChar('\'', istr))
{
istr.position() = const_cast<char *>(begin); // rollback on failure
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect, as the buffer could have changed.

}
time_t x = 0;
ReadBufferFromString buf(time_str);
// Parse inside an isolated buffer, and fail gracefully if it doesn't fit
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this change?

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Sep 26, 2025

@yariks5s, good news, the error was not anywhere around your code for the Time data type, and these changes are unneeded.

The error was introduced in this PR: #51716, @Avogar
It was impossible to implement that feature correctly, and it does not make sense in general.

I recommend @Avogar deprecating that old mis-feature.

@yariks5s
Copy link
Copy Markdown
Member Author

I think we can close this PR now as it will not fix the problem completely

@yariks5s yariks5s closed this Sep 26, 2025
@yariks5s yariks5s deleted the yarik/fix-time-csv-parsing-crash branch September 26, 2025 11:04
@Avogar
Copy link
Copy Markdown
Member

Avogar commented Sep 26, 2025

The problem not in the #51716, the problem is here:

throwUnexpectedDataAfterParsedValue(column, istr, settings, "Time");

Function throwUnexpectedDataAfterParsedValue calls column->popBack(1), but here in SerializationTime::deserializeTextCSV we don't add value to the column at the time we call this function. So we delete previous row and end up with 1 row less in the chunk.

Also I noticed that input_format_allow_errors_num doesn't work in this case because UNEXPECTED_DATA_AFTER_PARSED_VALUE is not listed as parsing error here:

bool isParseError(int code)
{
return code == ErrorCodes::CANNOT_PARSE_INPUT_ASSERTION_FAILED
|| code == ErrorCodes::CANNOT_PARSE_QUOTED_STRING
|| code == ErrorCodes::CANNOT_PARSE_DATE
|| code == ErrorCodes::CANNOT_PARSE_DATETIME
|| code == ErrorCodes::CANNOT_PARSE_NUMBER
|| code == ErrorCodes::CANNOT_PARSE_UUID
|| code == ErrorCodes::CANNOT_PARSE_BOOL
|| code == ErrorCodes::CANNOT_READ_ARRAY_FROM_TEXT
|| code == ErrorCodes::CANNOT_READ_MAP_FROM_TEXT
|| code == ErrorCodes::CANNOT_READ_ALL_DATA
|| code == ErrorCodes::TOO_LARGE_STRING_SIZE
|| code == ErrorCodes::ARGUMENT_OUT_OF_BOUND /// For Decimals
|| code == ErrorCodes::INCORRECT_DATA /// For some ReadHelpers
|| code == ErrorCodes::CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING
|| code == ErrorCodes::CANNOT_PARSE_IPV4
|| code == ErrorCodes::CANNOT_PARSE_IPV6
|| code == ErrorCodes::UNKNOWN_ELEMENT_OF_ENUM
|| code == ErrorCodes::CANNOT_PARSE_ESCAPE_SEQUENCE;
}

@yariks5s yariks5s restored the yarik/fix-time-csv-parsing-crash branch September 26, 2025 12:24
@yariks5s yariks5s reopened this Sep 26, 2025
Co-authored-by: Pavel Kruglov <48961922+Avogar@users.noreply.github.com>
@Avogar Avogar dismissed alexey-milovidov’s stale review September 30, 2025 14:35

The fix now is correct.

@yariks5s
Copy link
Copy Markdown
Member Author

yariks5s commented Oct 1, 2025

@yariks5s yariks5s added this pull request to the merge queue Oct 1, 2025
Merged via the queue into master with commit d6813be Oct 1, 2025
120 of 123 checks passed
@yariks5s yariks5s deleted the yarik/fix-time-csv-parsing-crash branch October 1, 2025 18:39
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical Error: Invalid number of rows in Chunk

4 participants