MDEV-39412: parse error reading tabs in ranges by bsrikanth-mariadb · Pull Request #5049 · MariaDB/server

bsrikanth-mariadb · 2026-05-07T05:26:31Z

Note:
while reading from information_schema.optimizer_context one level of unescaping is already done i.e. (\t becomes \t or \\t becomes \t)

w.r.t the MDEV, there are 2 problems: -

When reading from the sql script file, json parser is not able to parse the range value in json_read_value() from json_lib.c "ranges": [
"(b\t\t\t\t\t\t) <= (b) <= (b???????)"
],
mainly the \t\t stuff, and hence a warning.
It also stops loading the context into memory.
Since, a new table is created with empty data, and without context, we get Impossible WHERE noticed after reading const tables

There is unescaping call being made in read_string() from sql_json_lib.cc while parsing of the context. With this \t was becoming \t. However, print_range() from opt_range.cc already does escaping of the values. The value "b\t\t\t" was in fact produced as "\b\t\t\t". Later, we try to compare range values from the query and the context.

Here a mismatch a found because, in one case there was escaping, and in the other case escaping got removed.

Solutions

For Problem 1. have escaping for ranges.
This should be done while dumping range values into the context.

For Problem 2. Remove unscaping call in read_string().

Note: while reading from information_schema.optimizer_context one level of unescaping is already done i.e. (\\t becomes \t or \\\\t becomes \\t) w.r.t the MDEV, there are 2 problems: - 1. When reading from the sql script file, json parser is not able to parse the range value in json_read_value() from json_lib.c "ranges": [ "(b\t\t\t\t\t\t) <= (b) <= (b???????)" ], mainly the \t\t stuff, and hence a warning. It also stops loading the context into memory. Since, a new table is created with empty data, and without context, we get Impossible WHERE noticed after reading const tables 2. There is unescaping call being made in read_string() from sql_json_lib.cc while parsing of the context. With this \\t was becoming \t. However, print_range() from opt_range.cc already does escaping of the values. The value "b\t\t\t" was in fact produced as "\b\\t\\t\\t". Later, we try to compare range values from the query and the context. Here a mismatch a found because, in one case there was escaping, and in the other case escaping got removed. Solutions ========= For Problem 1. have escaping for ranges. This should be done while dumping range values into the context. For Problem 2. Remove unscaping call in read_string().

gemini-code-assist

Code Review

This pull request addresses MDEV-39412 by implementing JSON escaping for range information in optimizer context dumps and adding relevant regression tests. Feedback highlights that removing the unescaping logic from the JSON reader breaks standard behavior and should be reverted. Additionally, an optimization was suggested to move a buffer allocation outside a loop in the context dumping logic to improve efficiency.

gemini-code-assist · 2026-05-07T05:31:41Z

-  }
-
-  value= strdup_root(mem_root, val_buf.c_ptr_safe());
+  value= strmake_root(mem_root, (const char *) je->value, je->value_len);


Removing the unescaping logic from read_string breaks the standard behavior of a JSON parser. JSON strings are expected to be unescaped to correctly handle sequences like \n, \t, \", and \\.

The issue described in the PR (where \\t becomes \t but should remain \\t) is actually addressed by the double-escaping introduced in dump_mrr_info_calls. With double-escaping, the JSON will contain \\\\t, which after SQL unescaping becomes \\t, and then the JSON reader's unescaping will turn it into \t (the literal backslash and 't').

By removing unescaping here, you are making the JSON reader return the raw, escaped content of the JSON string, which is non-standard and will break other potential users of this utility function. Please restore the unescaping logic.

StringBuffer<128> val_buf; if (json_unescape_to_string((const char *) je->value, je->value_len, &val_buf)) { err_buf->append(STRING_WITH_LEN("un-escaping error of ")); err_buf->append(read_elem_key, strlen(read_elem_key)); err_buf->append(STRING_WITH_LEN(" element")); return true; } value= strdup_root(mem_root, val_buf.c_ptr_safe());

gemini-code-assist · 2026-05-07T05:31:41Z

+    List_iterator rc_li(irc->range_list);
+    Json_writer_array ranges_wrapper(ctx_writer, "ranges");
+    while (const char *range_str= rc_li++)
    {
-      Json_writer_array ranges_wrapper(ctx_writer, "ranges");
-      List_iterator rc_li(irc->range_list);
-      while (const char *range_str= rc_li++)
-        ranges_wrapper.add(range_str, strlen(range_str));
+      const String range_info(range_str, strlen(range_str),
+                              system_charset_info);
+      StringBuffer<128> escaped_range_info;
+      json_escape_to_string(&range_info, &escaped_range_info);
+      ranges_wrapper.add(escaped_range_info.c_ptr_safe(),
+                         escaped_range_info.length());
    }
+    ranges_wrapper.end();


The current implementation re-allocates the StringBuffer and its internal storage on every iteration of the loop. It is more efficient to move the buffer declaration outside the loop and reuse it by clearing its length in each iteration. Additionally, using a scoped block for the Json_writer_array is a cleaner way to ensure the array is closed correctly via its destructor, rather than calling end() explicitly.

{ Json_writer_array ranges_wrapper(ctx_writer, "ranges"); List_iterator rc_li(irc->range_list); StringBuffer<128> escaped_range_info; while (const char *range_str= rc_li++) { const String range_info(range_str, strlen(range_str), system_charset_info); escaped_range_info.length(0); json_escape_to_string(&range_info, &escaped_range_info); ranges_wrapper.add(escaped_range_info.c_ptr_safe(), escaped_range_info.length()); } }

gemini-code-assist · 2026-05-07T05:31:42Z

+  rc= json_read_object(&je, array, &err_buf) ||
+      strcmp(parsed_esc_str, esc_str_val);


This test change validates that the reader does NOT unescape, which is incorrect for a JSON library. For example, a JSON string "a\\bc" should be unescaped to a\bc (where \b is a backspace character if following standard JSON rules, or literal backslash and 'b' if escaped as \\b). Once read_string is restored to its correct behavior, this test should be updated to verify that unescaping is handled correctly.

bsrikanth-mariadb added the MariaDB Corporation label May 7, 2026

bsrikanth-mariadb requested a review from spetrunia May 7, 2026 05:28

gemini-code-assist Bot reviewed May 7, 2026

View reviewed changes

bsrikanth-mariadb mentioned this pull request May 7, 2026

MDEV-39412: parse error reading tabs in ranges #4994

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MDEV-39412: parse error reading tabs in ranges#5049

MDEV-39412: parse error reading tabs in ranges#5049
bsrikanth-mariadb wants to merge 1 commit intobb-12.3-MDEV-39368-test-replayfrom
bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges

bsrikanth-mariadb commented May 7, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 7, 2026

Uh oh!

gemini-code-assist Bot May 7, 2026

Uh oh!

gemini-code-assist Bot May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

		rc= json_read_object(&je, array, &err_buf) \|\|
		strcmp(parsed_esc_str, esc_str_val);

Uh oh!

Conversation

bsrikanth-mariadb commented May 7, 2026

Solutions

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant