Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

substring in select distinct may lead to incorrect result #52991

Open
r33s3n6 opened this issue Apr 29, 2024 · 4 comments
Open

substring in select distinct may lead to incorrect result #52991

r33s3n6 opened this issue Apr 29, 2024 · 4 comments

Comments

@r33s3n6
Copy link

r33s3n6 commented Apr 29, 2024

1. Minimal reproduce step (Required)

Firstly, execute init.sql to create the table. Then executing error.sql yields unexpected results. Note that reproducing these results might not be entirely stable. Typically, it can be completed within three attempts. You can try executing error.sql multiple times or execute init.sql again to rebuild the table.

init.sql.txt
error.sql.txt

2. What did you expect to see? (Required)

The first column is substring(repeat(c_bek45hvu8g,9),-10000)

SUBSTRING(str,pos) from MySQL documentation:

It is also possible to use a negative value for pos. In this case, the beginning of the substring is pos characters from the end of the string, rather than the beginning

when abs(pos) > length(str), an empty string will be returned by TiDB.

mysql> select substring('abcdefg', -100);
+----------------------------+
| substring('abcdefg', -100) |
+----------------------------+
|                            |
+----------------------------+
1 row in set (0.00 sec)

The maximum length of the string is 90, which is less than 10000.

mysql> select max(length(repeat(c_bek45hvu8g,9))) from t_m1i;
+-------------------------------------+
| max(length(repeat(c_bek45hvu8g,9))) |
+-------------------------------------+
|                                  90 |
+-------------------------------------+
1 row in set (0.01 sec)

Therefore, the result set should only contain NULL and empty strings.

3. What did you see instead (Required)

However, it seems that in TiDB, when evaluating substring, it may be reading beyond the boundaries of the string, resulting in incorrect output.
output_re_main2.log

4. What is your TiDB version? (Required)

Release Version: v8.0.0
Edition: Community
Git Commit Hash: 8ba1fa452b1ccdbfb85879ea94b9254aabba2916
Git Branch: HEAD
UTC Build Time: 2024-03-28 14:22:15
GoVersion: go1.21.4
Race Enabled: false
Check Table Before Drop: false
Store: tikv

topology:

distributed.yaml:

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 10.0.2.31

tidb_servers:
  - host: 10.0.2.21

tikv_servers:
  - host: 10.0.2.11
  - host: 10.0.2.12
  - host: 10.0.2.13

monitoring_servers:
  - host: 10.0.2.8

grafana_servers:
  - host: 10.0.2.8

alertmanager_servers:
  - host: 10.0.2.8

tiflash_servers:
  - host: 10.0.2.32

single.yaml

global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

pd_servers:
  - host: 10.0.2.73

tidb_servers:
  - host: 10.0.2.72

tikv_servers:
  - host: 10.0.2.71

tiflash_servers:
  - host: 10.0.2.74

about us

We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned vulnerabilities in TiDB that may lead to database logic error.

@lcwangchao
Copy link
Collaborator

It seems to be a tiflash bug:

When use disabled tiflash, it output is all the right:

TiDB root@127.0.0.1:test> set @@tidb_allow_mpp=0;
Query OK, 0 rows affected
Time: 0.000s
TiDB root@127.0.0.1:test> explain SELECT DISTINCT
                       ->
                       ->   substring(
                       ->       cast(repeat(
                       ->         cast(ref_4.c_bek45hvu8g as char),
                       ->         9) as char),
                       ->       cast(-10000 as signed)) as c2
                       -> FROM
                       ->   t_m1i as ref_4;
+-----------------------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                    | estRows | task      | access object | operator info                                                                                                                                                                                                        |
+-----------------------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashAgg_4             | 163.00  | root      |               | group by:Column#6, funcs:firstrow(Column#5)->Column#3                                                                                                                                                                |
| └─Projection_13       | 504.00  | root      |               | substring(cast(repeat(cast(test.t_m1i.c_bek45hvu8g, var_string(5)), 9), var_string(5)), -10000)->Column#5, substring(cast(repeat(cast(test.t_m1i.c_bek45hvu8g, var_string(5)), 9), var_string(5)), -10000)->Column#6 |
|   └─TableReader_9     | 504.00  | root      |               | data:TableFullScan_7                                                                                                                                                                                                 |
|     └─TableFullScan_7 | 504.00  | cop[tikv] | table:ref_4   | keep order:false                                                                                                                                                                                                     |
+-----------------------+---------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set
Time: 0.003s
TiDB root@127.0.0.1:test> SELECT DISTINCT
                       ->
                       ->   substring(
                       ->       cast(repeat(
                       ->         cast(ref_4.c_bek45hvu8g as char),
                       ->         9) as char),
                       ->       cast(-10000 as signed)) as c2
                       -> FROM
                       ->   t_m1i as ref_4
                       -> ;
+--------+
| c2     |
+--------+
|        |
| <null> |
+--------+
2 rows in set

Then I set @@tidb_allow_mpp back to ON:

TiDB root@127.0.0.1:test> set @@tidb_allow_mpp=1;
Query OK, 0 rows affected
Time: 0.001s
TiDB root@127.0.0.1:test> SELECT DISTINCT
                       ->
                       ->   substring(
                       ->       cast(repeat(
                       ->         cast(ref_4.c_bek45hvu8g as char),
                       ->         9) as char),
                       ->       cast(-10000 as signed)) as c2
                       -> FROM
                       ->   t_m1i as ref_4
                       -> ;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| c2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| <null>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| cfffonqcfffonqcfffugugugugugugugugugne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xne9d80xcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxcagtcuq3uxudududududududududmmmmmmmmmjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqjgmdzdbqzzzzzzzzzgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvgg6uc4wblvnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qnp6_qojbi0xojbi0xojbi0xojbi0xojbi0xojbi0xojbi0xojbi0xojbi0x... |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
3 rows in set
Time: 0.044s

@lcwangchao lcwangchao added component/tiflash sig/execution SIG execution and removed sig/sql-infra SIG: SQL Infra labels Apr 30, 2024
@lcwangchao
Copy link
Collaborator

@zanmato1984 PTAL

@lcwangchao
Copy link
Collaborator

A more simple SQL to reproduce it:

TiDB root@127.0.0.1:test> DESC SELECT c_bek45hvu8g,substring(repeat(c_bek45hvu8g, 9), -10000) as c FROM t_m1i;
+------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------+
| id                     | estRows | task         | access object | operator info                                                                            |
+------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------+
| TableReader_12         | 504.00  | root         |               | MppVersion: 2, data:ExchangeSender_11                                                    |
| └─ExchangeSender_11    | 504.00  | mpp[tiflash] |               | ExchangeType: PassThrough                                                                |
|   └─Projection_4       | 504.00  | mpp[tiflash] |               | test.t_m1i.c_bek45hvu8g, substring(repeat(test.t_m1i.c_bek45hvu8g, 9), -10000)->Column#3 |
|     └─TableFullScan_10 | 504.00  | mpp[tiflash] | table:t_m1i   | keep order:false                                                                         |
+------------------------+---------+--------------+---------------+------------------------------------------------------------------------------------------+
4 rows in set
Time: 0.003s

@zanmato1984
Copy link
Contributor

Seems the substring implementation in tiflash doesn't respect the pos being negative.

However I think this is a pretty minor usage so I'm adjusting the severity to major.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants