Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

21.10+: MODIFY COLUMN to Nullable may lead to data becoming unavaliable / incorrect #42083

Closed
filimonov opened this issue Oct 5, 2022 · 15 comments · Fixed by #65056
Closed
Labels
bug Confirmed user-visible misbehaviour in official release comp-mutations ALTER UPDATE/DELETE v22.3-affected v22.8-affected

Comments

@filimonov
Copy link
Contributor

filimonov commented Oct 5, 2022

repro

DROP TABLE IF EXISTS column_modify_test;

CREATE TABLE column_modify_test (id UInt64, val String, other_col UInt64) engine=MergeTree ORDER BY id SETTINGS min_bytes_for_wide_part=0;
INSERT INTO column_modify_test VALUES (1,'one',0);
INSERT INTO column_modify_test VALUES (2,'two',0);

-- on 21.9 that was done via mutations mechanism
ALTER TABLE column_modify_test MODIFY COLUMN val Nullable(String);

-- but since 21.10 it only applies that to new part, so old parts keep the old schema
SELECT * FROM system.mutations;

INSERT INTO column_modify_test VALUES (3,Null,0);

select name, path, type, active, modification_time from system.parts_columns where table='column_modify_test' and column='val';

-- till now everythings looks ok
SELECT * FROM column_modify_test;

-- now we do mutation. It will affect one of the parts
-- and it what part it will update columns.txt to the latest 'correct' state w/o updating the column file!
alter table column_modify_test update other_col=1 where id = 1 SETTINGS mutations_sync=1;

-- row 1 is damaged now: the column files of val columns & columns.txt is out of sync!
SELECT *, throwIf(val <> 'one') as issue FROM column_modify_test WHERE id = 1 FORMAT CSV;
-- expected result is: 1,"one",1,0

-- Code: 366. DB::Exception: Received from localhost:9000. DB::Exception: Logical error: Sizes of nested column and null map of Nullable column are not equal. (SIZES_OF_NESTED_COLUMNS_ARE_INCONSISTENT)
@filimonov filimonov added comp-mutations ALTER UPDATE/DELETE major potential bug To be reviewed by developers and confirmed/rejected. bug Confirmed user-visible misbehaviour in official release and removed potential bug To be reviewed by developers and confirmed/rejected. labels Oct 5, 2022
@filimonov
Copy link
Contributor Author

Workaround to make data readable again: for parts with old state/type of columns rewrite columns.txt

@Algunenano
Copy link
Member

I can't seem to reproduce it, neither in master nor in 22.3. Neither if I run the queries manually or if I call them with --queries-file:

$ ch_client_prod-01 --queries-file repro.txt 
delpop  ds      0000000000      DELETE WHERE t > \'1970-01-01 00:00:00\'        2022-10-03 18:14:44     ['all'] [1]     []  01               1970-01-01 00:00:00
all_1_1_0       /mnt/ch/ch_data/data/prod-1/store/e3a/e3a7f67e-b40d-4b69-a3cf-eecd12a36483/all_1_1_0/   String  1       2022-10-05 11:29:20
all_2_2_0       /mnt/ch/ch_data/data/prod-1/store/e3a/e3a7f67e-b40d-4b69-a3cf-eecd12a36483/all_2_2_0/   String  1       2022-10-05 11:29:20
all_3_3_0       /mnt/ch/ch_data/data/prod-1/store/e3a/e3a7f67e-b40d-4b69-a3cf-eecd12a36483/all_3_3_0/   Nullable(String)    12022-10-05 11:29:20
3       \N      0
2       two     0
1       one     0
3       \N      0
1       \N      1
2       two     0
1       \N      1
3       \N      0
2       two     0

Does this need some kind of restart, debug build or anything?

@filimonov
Copy link
Contributor Author

filimonov commented Oct 5, 2022

I've edited test a bit to show that it reproduces.

1       \N      1

that line is wrong, it should be

1       one      1

@filimonov
Copy link
Contributor Author

@Algunenano
Copy link
Member

Great, now I understand it. Thanks!

I guess that as a workaround to avoid this problem at the user level would be to force a rewrite of the parts after MODIFY COLUMN by calling OPTIMIZE FINAL.

@filimonov
Copy link
Contributor Author

Great, now I understand it. Thanks!

I guess that as a workaround to avoid this problem at the user level would be to force a rewrite of the parts after MODIFY COLUMN by calling OPTIMIZE FINAL.

OPTIMIZE FINAL can be superexpensive.
ALTER TABLE MODIFY COLUMN x = x WHERE 1

can be used.

But usually the problem discovered much later.

for d in `find . -name "*-*"`; do
  lc=`cat $d/columns.txt | grep "val String" | wc -l`
  if [[ ! -f $d/val.null.bin && $lc -gt 0 ]]; then
    echo "Fixing $d"
    cp -f /tmp/columns.txt $d/columns.txt
  fi
done

@filimonov
Copy link
Contributor Author

Root cause: #27787

Related: #42074

@filimonov filimonov changed the title MODIFY COLUMN issue may lead to data becoming unavaliable 21.10+: MODIFY COLUMN to Nullable may lead to data becoming unavaliable / incorrect Oct 5, 2022
@Slach
Copy link
Contributor

Slach commented Oct 7, 2022

ALTER TABLE MODIFY COLUMN x = x WHERE 1

Doesn't work, part types still inconsistent ;( but not fot all cases
Only OPTIMIZE FINAL helps ;(

@filimonov
Copy link
Contributor Author

filimonov commented Oct 7, 2022

Revert does not resolves the root cause of the issue.

columns.txt file still can be damaged by mutation / alter in a way when some columns will become unreadable.

Also related to #38067

@alexey-milovidov alexey-milovidov added st-need-info We need extra data to continue (waiting for response) and removed major labels Oct 20, 2022
@alexey-milovidov
Copy link
Member

@filimonov please provide a test case.

@den-crane
Copy link
Contributor

The root cause has been fixed #42319
tests are in 02461_alter_update_respect_part_column_type_bug.sql

@lesandie
Copy link
Contributor

lesandie commented Oct 30, 2022

Also another case after ALTER TABLE t MODIFY COLUMN c Nullable(UInt64):

Code: 341. DB::Exception: Exception happened during execution of mutation '0000000001' with part 'all_0_7_1_8' 
It seems part all_0_7_1_8 has missing columns ...

And this column is part of a projection but I don't know if it also is affected. I'll investigate if this is also fixed by #42319

@alexey-milovidov
Copy link
Member

The issue is not fixed.

@alexey-milovidov alexey-milovidov removed the st-need-info We need extra data to continue (waiting for response) label Feb 19, 2023
@CurtizJ CurtizJ assigned CurtizJ and unassigned CurtizJ Feb 20, 2023
@alexfernandez
Copy link

This issue seems fixed in all versions from 23.x onwards. At least the reproducer is not reproduced, e.g.. Are there any other cases where a MODIFY TABLE to Nullable may cause corruption? If so, do you have a reproducer?

@fm4v
Copy link
Member

fm4v commented Jun 10, 2024

Fixed

@fm4v fm4v closed this as completed Jun 10, 2024
fm4v added a commit that referenced this issue Jun 12, 2024
fm4v added a commit that referenced this issue Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed user-visible misbehaviour in official release comp-mutations ALTER UPDATE/DELETE v22.3-affected v22.8-affected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants