Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE REQUEST] Is there a way to dump the varchar or text column as hex(col) and insert them with unhex("<col_hexstring">) #1371

Open
trikker opened this issue Dec 19, 2023 · 1 comment

Comments

@trikker
Copy link

trikker commented Dec 19, 2023

Describe the Feature Request
I am dumping MySQL data(there are blob and text columns in MySQL) and loading to Apache Doris. Apache Doris is a high-performance, real-time analytical database. Its introduction: https://doris.apache.org/docs/get-starting/what-is-apache-doris.

Doris is 90% compatible with MySQL in terms of SQL syntax and connection protocols. You can use mysql client to connect to Apache Doris and executing SQLs. However, Doris has some incompatible SQL syntax with MySQL like hex and unhex. For example when you insert a string to a varchar column:

In MySQL, you can do that like this:

root@testdb Tue Dec 19 10:11:36 2023>show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(60) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

root@testdb Tue Dec 19 10:11:42 2023>insert into t(name) values(0x616263);
Query OK, 1 row affected (0.01 sec)

root@testdb Tue Dec 19 10:11:57 2023>select * from t;
+----+------+
| id | name |
+----+------+
|  1 | abc  |
+----+------+
1 row in set (0.00 sec)

In Apache Doris, you cannot directly using the hex value(0x616263) and can only do that like this:

'root'@testdb Tue Dec 19 10:15:49 2023>show create table tt\G
*************************** 1. row ***************************
       Table: tt
Create Table: CREATE TABLE `tt` (
  `id` int(11) NULL,
  `name` varchar(60) NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS AUTO
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
1 row in set (0.00 sec)

# report error if directly use the hex value 0x616263
'root'@testdb Tue Dec 19 10:15:55 2023>insert into tt values(1, 0x616263);
ERROR 1054 (42S22): errCode = 2, detailMessage = Unknown column '0x616263' in 'table list'


'root'@testdb Tue Dec 19 10:16:17 2023>insert into tt values(1, unhex('616263'));
Query OK, 1 row affected (0.03 sec)
{'label':'insert_9041987fb6464c8c_95891465b86aecd2', 'status':'VISIBLE', 'txnId':'3822043'}

'root'@testdb Tue Dec 19 10:16:21 2023>select * from tt;
+------+------+
| id   | name |
+------+------+
|    1 | abc  |
+------+------+
1 row in set (0.03 sec)

I want to set the --hex-blob option to true(default is true) and dump the blob and varchar columns as hex value. But I cannot load the dumped data to Doris, is there any way I can add hex to surround the hex value?

Current behaviour
No unhex surrounding the hex value when dumping the blob and text columns. The dumped data is 0x616263.

Expected behavior
Add unhex surrounding to the hex value when dumping the blob and text columns. I want to dumped data to be unhex('616263').

Environment
No need to describe.

Additional context
No.

@davidducos
Copy link
Member

Hi @trikker , uuuufffff.... we might be able to do that with masquerade, but the problem is that masquerade uses the same memory reserved to the data on the column, and you are adding unhex(...) which are 7 more characters. There is no space there... we might be able to do it, if you create a config file like this:

[`test`.`tt`]
columns_on_select=id,concat(concat('      ',name),' ')

And then we add a function to masquerade that inserts unhex( to the begining and then ) to the end.
It will be a bit crazy but I think that it will be doable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants