MDEV-39412: parse error reading tabs in ranges#5049
MDEV-39412: parse error reading tabs in ranges#5049bsrikanth-mariadb wants to merge 1 commit intobb-12.3-MDEV-39368-test-replayfrom
Conversation
Note:
while reading from information_schema.optimizer_context one level of unescaping
is already done i.e. (\\t becomes \t or \\\\t becomes \\t)
w.r.t the MDEV, there are 2 problems: -
1.
When reading from the sql script file, json parser is not able to parse
the range value in json_read_value() from json_lib.c
"ranges": [
"(b\t\t\t\t\t\t) <= (b) <= (b???????)"
],
mainly the \t\t stuff, and hence a warning.
It also stops loading the context into memory.
Since, a new table is created with empty data, and without context,
we get Impossible WHERE noticed after reading const tables
2.
There is unescaping call being made in read_string() from sql_json_lib.cc
while parsing of the context. With this \\t was becoming \t.
However, print_range() from opt_range.cc already does escaping of the values.
The value "b\t\t\t" was in fact produced as "\b\\t\\t\\t".
Later, we try to compare range values from the query and the context.
Here a mismatch a found because, in one case there was escaping,
and in the other case escaping got removed.
Solutions
=========
For Problem 1. have escaping for ranges.
This should be done while dumping range values into the context.
For Problem 2. Remove unscaping call in read_string().
There was a problem hiding this comment.
Code Review
This pull request addresses MDEV-39412 by implementing JSON escaping for range information in optimizer context dumps and adding relevant regression tests. Feedback highlights that removing the unescaping logic from the JSON reader breaks standard behavior and should be reverted. Additionally, an optimization was suggested to move a buffer allocation outside a loop in the context dumping logic to improve efficiency.
| } | ||
|
|
||
| value= strdup_root(mem_root, val_buf.c_ptr_safe()); | ||
| value= strmake_root(mem_root, (const char *) je->value, je->value_len); |
There was a problem hiding this comment.
Removing the unescaping logic from read_string breaks the standard behavior of a JSON parser. JSON strings are expected to be unescaped to correctly handle sequences like \n, \t, \", and \\.
The issue described in the PR (where \\t becomes \t but should remain \\t) is actually addressed by the double-escaping introduced in dump_mrr_info_calls. With double-escaping, the JSON will contain \\\\t, which after SQL unescaping becomes \\t, and then the JSON reader's unescaping will turn it into \t (the literal backslash and 't').
By removing unescaping here, you are making the JSON reader return the raw, escaped content of the JSON string, which is non-standard and will break other potential users of this utility function. Please restore the unescaping logic.
StringBuffer<128> val_buf;
if (json_unescape_to_string((const char *) je->value, je->value_len,
&val_buf))
{
err_buf->append(STRING_WITH_LEN("un-escaping error of "));
err_buf->append(read_elem_key, strlen(read_elem_key));
err_buf->append(STRING_WITH_LEN(" element"));
return true;
}
value= strdup_root(mem_root, val_buf.c_ptr_safe());| List_iterator rc_li(irc->range_list); | ||
| Json_writer_array ranges_wrapper(ctx_writer, "ranges"); | ||
| while (const char *range_str= rc_li++) | ||
| { | ||
| Json_writer_array ranges_wrapper(ctx_writer, "ranges"); | ||
| List_iterator rc_li(irc->range_list); | ||
| while (const char *range_str= rc_li++) | ||
| ranges_wrapper.add(range_str, strlen(range_str)); | ||
| const String range_info(range_str, strlen(range_str), | ||
| system_charset_info); | ||
| StringBuffer<128> escaped_range_info; | ||
| json_escape_to_string(&range_info, &escaped_range_info); | ||
| ranges_wrapper.add(escaped_range_info.c_ptr_safe(), | ||
| escaped_range_info.length()); | ||
| } | ||
| ranges_wrapper.end(); |
There was a problem hiding this comment.
The current implementation re-allocates the StringBuffer and its internal storage on every iteration of the loop. It is more efficient to move the buffer declaration outside the loop and reuse it by clearing its length in each iteration. Additionally, using a scoped block for the Json_writer_array is a cleaner way to ensure the array is closed correctly via its destructor, rather than calling end() explicitly.
{
Json_writer_array ranges_wrapper(ctx_writer, "ranges");
List_iterator rc_li(irc->range_list);
StringBuffer<128> escaped_range_info;
while (const char *range_str= rc_li++)
{
const String range_info(range_str, strlen(range_str),
system_charset_info);
escaped_range_info.length(0);
json_escape_to_string(&range_info, &escaped_range_info);
ranges_wrapper.add(escaped_range_info.c_ptr_safe(),
escaped_range_info.length());
}
}| rc= json_read_object(&je, array, &err_buf) || | ||
| strcmp(parsed_esc_str, esc_str_val); |
There was a problem hiding this comment.
This test change validates that the reader does NOT unescape, which is incorrect for a JSON library. For example, a JSON string "a\\bc" should be unescaped to a\bc (where \b is a backspace character if following standard JSON rules, or literal backslash and 'b' if escaped as \\b). Once read_string is restored to its correct behavior, this test should be updated to verify that unescaping is handled correctly.
Note:
while reading from information_schema.optimizer_context one level of unescaping is already done i.e. (\t becomes \t or \\t becomes \t)
w.r.t the MDEV, there are 2 problems: -
When reading from the sql script file, json parser is not able to parse the range value in json_read_value() from json_lib.c "ranges": [
"(b\t\t\t\t\t\t) <= (b) <= (b???????)"
],
mainly the \t\t stuff, and hence a warning.
It also stops loading the context into memory.
Since, a new table is created with empty data, and without context, we get Impossible WHERE noticed after reading const tables
There is unescaping call being made in read_string() from sql_json_lib.cc while parsing of the context. With this \t was becoming \t. However, print_range() from opt_range.cc already does escaping of the values. The value "b\t\t\t" was in fact produced as "\b\t\t\t". Later, we try to compare range values from the query and the context.
Here a mismatch a found because, in one case there was escaping, and in the other case escaping got removed.
Solutions
For Problem 1. have escaping for ranges.
This should be done while dumping range values into the context.
For Problem 2. Remove unscaping call in read_string().