Use case
Allow users to specify a table that receives log rows which fail to parse during uploads that use the Template format.
Describe the solution you'd like
My first, naive suggestion is to store all characters of the current row in a string as they are parsed. If the row parses successfully, upload it as usual to the target table. If parsing fails, keep storing characters until the next `format_template_rows_between_delimiter`, then upload that string to a backup table specified by the user in a new setting (for example: `invalid_logs_table='logs.unstructured_log_rows'`).
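To make the first suggestion concrete, here is a minimal client-side sketch in Python. It is not ClickHouse code: `parse_row`, `route_rows`, and the `<level> <message>` row layout are hypothetical stand-ins for a Template row format, and the two returned lists stand in for the target table and the backup table.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical row layout "<level> <message>"; a stand-in for a Template row format.
ROW_PATTERN = re.compile(r"(?P<level>INFO|WARN|ERROR) (?P<message>.+)", re.DOTALL)


def parse_row(raw: str) -> Dict[str, str]:
    """Parse one raw row, raising ValueError if it does not match the layout."""
    match = ROW_PATTERN.fullmatch(raw)
    if match is None:
        raise ValueError(f"row does not match template: {raw!r}")
    return match.groupdict()


def route_rows(
    payload: str, row_delimiter: str = "\n"
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split payload on the row delimiter and separate parsed rows from rejects."""
    parsed: List[Dict[str, str]] = []    # would go to the target table
    rejected: List[str] = []             # would go to the backup table, unparsed
    for raw in payload.split(row_delimiter):
        if not raw:
            continue  # skip the empty piece after a trailing delimiter
        try:
            parsed.append(parse_row(raw))
        except ValueError:
            rejected.append(raw)
    return parsed, rejected
```

A call such as `route_rows("INFO started\ngarbage line\nERROR disk full\n")` keeps the two well-formed rows and sets `garbage line` aside as raw text, so one bad row no longer aborts the whole upload.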
I assume the failure is caused by the implementation not keeping any data in memory while parsing, which in turn I assume is done for efficiency. If that is the case, my second suggestion is that ClickHouse skips ahead to the next `format_template_rows_between_delimiter` and at least returns the line numbers of the rows that caused the issue, while still uploading the logs that can be parsed successfully. Keeping track of a line number costs almost nothing compared with keeping the line itself.
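The second suggestion can be sketched the same way. The Python below is only an illustration of the idea, not of ClickHouse internals: `iter_rows` and `failing_line_numbers` are hypothetical names, and `is_valid` is a placeholder for the real Template parser. The reader holds only the current row in memory and reports the starting line number of each row that fails.

```python
import io
from typing import Callable, Iterator, List, TextIO, Tuple


def iter_rows(
    stream: TextIO, row_delimiter: str = "\n", chunk_size: int = 4096
) -> Iterator[Tuple[int, str]]:
    """Yield (starting_line_number, raw_row), buffering only the current row."""
    buffer = ""
    line_number = 1
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        while True:
            raw, found, rest = buffer.partition(row_delimiter)
            if not found:
                break  # delimiter not complete yet, read more input
            yield line_number, raw
            # Advance past the newlines inside the row and inside the delimiter.
            line_number += raw.count("\n") + row_delimiter.count("\n")
            buffer = rest
    if buffer:
        yield line_number, buffer  # last row without a trailing delimiter


def failing_line_numbers(
    stream: TextIO,
    is_valid: Callable[[str], bool],
    row_delimiter: str = "\n",
    chunk_size: int = 4096,
) -> List[int]:
    """Return the starting line of every non-empty row rejected by is_valid."""
    return [
        number
        for number, raw in iter_rows(stream, row_delimiter, chunk_size)
        if raw and not is_valid(raw)
    ]
```

For example, with `is_valid = lambda row: "=" in row`, the input `io.StringIO("a=1\nbroken\nb=2\nalso broken\n")` reports lines 2 and 4, and the rejected text itself is never retained.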
Ideally I would like to be able to run a command similar to this:
Describe alternatives you've considered
The alternative is to preprocess each log file to ensure every row is structured correctly before sending it to ClickHouse. The solution I'm suggesting is preferable because it does not increase upload time, and it lets ClickHouse users store well-formed logs and logs that need more processing in the same place.