Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

column with DEFAULT condition is not evaluated correctly when importing from Native file #54619

Closed
lesandie opened this issue Sep 14, 2023 · 11 comments · Fixed by #54655
Closed
Assignees

Comments

@lesandie
Copy link
Contributor

Describe the unexpected behaviour

If importing data from a Native file using clickhouse-client older than 23.1 and there is a column with DEFAULT condition, this condition will not be evaluated correctly.

How to reproduce

https://fiddle.clickhouse.com/5a0a8dc6-238b-4b07-b4b9-38ada65f78e1

and repro code here:

CREATE TABLE default.generate
(
    `key` String,
    `condition` UInt8,
)
ENGINE = MergeTree()
ORDER BY tuple();

INSERT INTO default.generate VALUES ('a', 0);
INSERT INTO default.generate VALUES ('b', 0);
INSERT INTO default.generate VALUES ('c', 0);

SELECT * FROM default.generate INTO OUTFILE '/var/lib/clickhouse/user_files/generate.native' FORMAT Native;

CREATE TABLE default.repro
(
    `key` String,
    `condition` UInt8,
    `derived_condition` UInt8 DEFAULT (condition = 0),
)
ENGINE = MergeTree()
ORDER BY tuple();

INSERT INTO default.repro FROM INFILE '/var/lib/clickhouse/user_files/generate.native' FORMAT Native;

SELECT * FROM default.repro;

┌─key─┬─condition─┬─derived_condition─┐
│ c   │         0 │                 0 │
│ a   │         0 │                 0 │
│ b   │         0 │                 0 │
└─────┴───────────┴───────────────────┘

derived condition should be evaluated as 1 and not 0

  • Which ClickHouse server version to use

Older than 23.1.

Expected behavior

SELECT * FROM default.repro;

┌─key─┬─condition─┬─derived_condition─┐
│ c   │         0 │                 1 │
│ a   │         0 │                 1 │
│ b   │         0 │                 1 │
└─────┴───────────┴───────────────────┘

Additional context

The bug has been fixed in 23.2 (clickhouse-client) and I'm checking in which commit.

@den-crane
Copy link
Contributor

technically it's not a bug.
the insert has been missed the column list and inserts 3 columns from the file with 2 columns

user should use INSERT INTO default.repro(key, condition) here.

https://fiddle.clickhouse.com/89efe375-5bf1-4a91-a15d-f3bbc529a85c

@den-crane
Copy link
Contributor

den-crane commented Sep 14, 2023

And it's still not working for Parquet https://fiddle.clickhouse.com/3ed5d1ab-8f22-4070-ba14-5d6b37585075
Though may be it should not work, and error is good, but in this case it should be the same for Native

@den-crane
Copy link
Contributor

den-crane commented Sep 14, 2023

One more problem https://fiddle.clickhouse.com/d90f890f-0b89-44ca-af6d-31d8260d6715

insert into function file('generate.native') 
SELECT * FROM VALUES (('a', 0),('b', 0),('c', 0))  format Native ;

select * from file('generate.native') format Pretty; -- OK, two columns in the file
+----+----+
| c1 | c2 |
+----+----+
| a  |  0 |
+----+----+
| b  |  0 |
+----+----+
| c  |  0 |
+----+----+

CREATE TABLE repro
(
    `key` String,
    `condition` UInt8,
    `derived_condition` UInt8 DEFAULT (condition = 0),
)
ENGINE = MergeTree()
ORDER BY tuple();

INSERT INTO repro  select * from file('generate.native');

SELECT * FROM repro format Pretty;
+-----+-----------+-------------------+
| key | condition | derived_condition |  -- expected derived_condition=1, not 0
+-----+-----------+-------------------+
|     |         0 |                 0 |
+-----+-----------+-------------------+
|     |         0 |                 0 |
+-----+-----------+-------------------+
|     |         0 |                 0 |
+-----+-----------+-------------------+

INSERT INTO repro(key, condition)  select * from file('generate.native');

Received exception from server (version 23.8.2):
Code: 20. DB::Exception: Received from localhost:9000. 
DB::Exception: Number of columns doesn't match. (NUMBER_OF_COLUMNS_DOESNT_MATCH)

????????????????


INSERT INTO repro(key, condition) select c1,c2 from file('generate.native');
OK



SELECT * FROM repro format Pretty;
+-----+-----------+-------------------+
| key | condition | derived_condition | Now inserted correct derived_condition=1
+-----+-----------+-------------------+
| a   |         0 |                 1 |
+-----+-----------+-------------------+
| b   |         0 |                 1 |
+-----+-----------+-------------------+
| c   |         0 |                 1 |
+-----+-----------+-------------------+
+-----+-----------+-------------------+
| key | condition | derived_condition |
+-----+-----------+-------------------+
|     |         0 |                 0 |
+-----+-----------+-------------------+
|     |         0 |                 0 |
+-----+-----------+-------------------+
|     |         0 |                 0 |
+-----+-----------+-------------------+

@den-crane
Copy link
Contributor

cc @Avogar

@Avogar
Copy link
Member

Avogar commented Sep 14, 2023

And it's still not working for Parquet https://fiddle.clickhouse.com/3ed5d1ab-8f22-4070-ba14-5d6b37585075

It works, you just need to specify input_format_parquet_allow_missing_columns=1.

@Avogar
Copy link
Member

Avogar commented Sep 14, 2023

INSERT INTO repro(key, condition)  select * from file('generate.native');

Received exception from server (version 23.8.2):
Code: 20. DB::Exception: Received from localhost:9000. 
DB::Exception: Number of columns doesn't match. (NUMBER_OF_COLUMNS_DOESNT_MATCH)

Seemls like a bug in use_structure_from_insertion_table_in_table_functions, disabling it fixes the error. Will take a look

@Avogar
Copy link
Member

Avogar commented Sep 14, 2023

-- expected derived_condition=1, not 0

Investigating, looks strange

@Avogar Avogar self-assigned this Sep 14, 2023
@lesandie
Copy link
Contributor Author

user should use INSERT INTO default.repro(key, condition) here.

Why?. If file has many columns, explicit them all in the INSERT can be painful if you want to insert them all and let clickhouse evaluate the DEFAULT expressions only.

I may be wrong here, talking about the behaviour of INSERT, my expectations are that whatever you use INSERT ... SELECT, INSERT FROM OUTFILE ... INSERT with inlined data, if the columns are not specified, then INSERT should process all columns and evaluate the DEFAULT expressions.

@den-crane
Copy link
Contributor

I may be wrong here, talking about the behaviour of INSERT

I think it's not documented - undefined behaviour, at least I don't remember such documents.
You have 3 column in the insert and 2 columns in file, how do you expect them to match? By positions? By names?

@Avogar
Copy link
Member

Avogar commented Sep 14, 2023

If importing data from a Native file using clickhouse-client older than 23.1 and there is a column with DEFAULT condition, this condition will not be evaluated correctly.

BTW, we no longer support versions older than 23.1, so it won't be fixed. I will fix issues found by den-crane in master. If you have any questions how insert into table from files works, I will try to explain

@lesandie
Copy link
Contributor Author

lesandie commented Sep 14, 2023

You have 3 column in the insert and 2 columns in file, how do you expect them to match? By positions? By names?

Yeah, good point. I would expect by name if I specify the column list. If not specified i would expect by position but these are my thoughts ... and as I said I may be wrong because implementation is another story (clickhouse inference mechanism works nicely but it is an overhead and taking into account all possible branches/possibilities will add unwanted complexity)

BTW, we no longer support versions older than 23.1, so it won't be fixed. I will fix issues found by den-crane in master. If you have any questions how insert into table from files works, I will try to explain

Yeah, fix is in 23.2 so no big deal, and I did not expect a backport. Upgrade is the best path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants