Drop detached partition with a S3 disk is not removing the blob files #55225

Closed
alifirat opened this issue Oct 4, 2023 · 4 comments

Comments

@alifirat

alifirat commented Oct 4, 2023

Describe what's wrong

Dropping detached parts from a table stored on an S3 disk does not remove the corresponding files from AWS S3.

Does it reproduce on recent release?

Yes, on 23.9.

How to reproduce

  • Use latest ClickHouse version
  • Run CREATE TABLE mytable (...) Engine = ReplicatedMergeTree ... PARTITION BY something SETTINGS disk = 's3_disk'
  • Insert some data
  • Run ALTER TABLE mytable FETCH PARTITION 'my_partition' FROM ''
  • Get the bucket size
  • Run ALTER TABLE mytable DROP DETACHED PARTITION 'my_partition'
  • Wait at least 8 minutes and get the bucket size again; the files have not been removed from S3.
  • Re-check after 15 minutes, same behavior.

Expected behavior

On a local disk, dropping detached parts frees disk space. I would expect the same for remote disks, i.e. removing only the metadata is not enough.

Once the detached partition has been dropped, the information about the remote blob files is gone, so those blobs become orphan files on S3.

Error message and/or stacktrace

N/A

Additional context

@alifirat alifirat added the potential bug To be reviewed by developers and confirmed/rejected. label Oct 4, 2023
@CheSema CheSema added the st-need-info We need extra data to continue (waiting for response) label Oct 4, 2023
@CheSema
Member

CheSema commented Oct 4, 2023

Hi.
From your description, I do not understand which partition has been removed from the detached directory. I also do not follow when and how any partition appears in the detached directory.
I do not understand the fetch command either. Do you literally mean ... FROM ''?

To avoid such questions, could you provide a test in a PR? That would make the problem much faster to understand.

@alifirat
Author

alifirat commented Oct 4, 2023

Hi @CheSema

I did the following test (on the same server).

// local table
CREATE TABLE test_local(c1 Int8, c2 Date) ENGINE = ReplicatedMergeTree('/{cluster_name}-{env}/tables/shard{shard}/test_local', '{replica}') PARTITION BY c2 ORDER BY c2

// table with s3 disk
CREATE TABLE test_s3(c1 Int8, c2 Date) ENGINE = ReplicatedMergeTree('/{cluster_name}-{env}/tables/shard{shard}/test_s3', '{replica}') PARTITION BY c2 ORDER BY c2 SETTINGS disk = 'databucket'

// Insert on the local table
INSERT INTO test_local VALUES (1, '2023-10-04'), (2, '2023-10-04')

// Check the number of objects in S3 (returns 1 because the tables have been created)
$ aws s3 ls --human-readable s3://mybucket/myprefix/ --recursive | wc -l
1

// Fetch partitions 
ALTER TABLE test_s3 FETCH PARTITION '2023-10-04' FROM '/{cluster_name}-{env}/tables/shard{shard}/test_local'

// Check that the previous command worked.
SELECT
    table,
    partition_id,
    name,
    formatReadableSize(bytes_on_disk)
FROM system.detached_parts

Query id: eedf6237-1897-4444-b8ba-5ca9e2fdb010

┌─table───┬─partition_id─┬─name───────────┬─formatReadableSize(bytes_on_disk)─┐
│ test_s3 │ 20231004     │ 20231004_0_0_0 │ 684.00 B                          │
└─────────┴──────────────┴────────────────┴───────────────────────────────────┘

// Check the number of objects on S3 after the fetch 
$ aws s3 ls --human-readable s3://mybucket/myprefix/ --recursive | wc -l
12 

// Now don't attach the partition, but drop it
ALTER TABLE test_s3 DROP DETACHED PARTITION '2023-10-04' SETTINGS allow_drop_detached = 1

// Check that there are no more detached parts
SELECT
    table,
    partition_id,
    name,
    formatReadableSize(bytes_on_disk)
FROM system.detached_parts

Query id: eedf6237-1897-4444-b8ba-5ca9e2fdb010

Ok.

0 row in set ..

// Check the number of objects in S3 again (still 12, so nothing was removed)
$ aws s3 ls --human-readable s3://mybucket/myprefix/ --recursive | wc -l
12 
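
The before/after comparison in the test above can be wrapped in two small shell helpers. This is only a sketch: the function names `count_objects` and `leftover_objects` are made up for illustration, and the bucket and prefix are the placeholders used in the transcript.

```shell
#!/usr/bin/env bash

# Count the objects under an S3 prefix, using the same command as above.
# Requires the AWS CLI and credentials for the bucket.
count_objects() {
    aws s3 ls "$1" --recursive | wc -l
}

# Given the object count before the fetch and the count after
# DROP DETACHED PARTITION, print how many objects were left behind.
leftover_objects() {
    local baseline=$1 after_drop=$2
    echo $(( after_drop - baseline ))
}

# Example usage (not run here, needs access to the bucket):
#   baseline=$(count_objects s3://mybucket/myprefix/)
#   ... FETCH PARTITION, then DROP DETACHED PARTITION ...
#   after_drop=$(count_objects s3://mybucket/myprefix/)
#   leftover_objects "$baseline" "$after_drop"
```

With the counts from the transcript (1 before the fetch, 12 after the drop), `leftover_objects 1 12` prints 11, i.e. every object created by the fetch is still in the bucket.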

@den-crane
Contributor

I guess you can reproduce it with a single MergeTree (not replicated) table: just use ALTER TABLE ... DETACH PARTITION.
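
A minimal sketch of that simpler reproduction, assuming the same 'databucket' disk as in the test above. The table name `test_s3_single` and the helper `repro_sql` are hypothetical, and this has not been run against a server.

```shell
#!/usr/bin/env bash

# Print the SQL for a single-table reproduction: a plain MergeTree table
# on the S3 disk, one partition detached and then dropped.
# 'test_s3_single' is a made-up table name; 'databucket' is the disk
# name from the earlier test.
repro_sql() {
    cat <<'SQL'
CREATE TABLE test_s3_single (c1 Int8, c2 Date) ENGINE = MergeTree PARTITION BY c2 ORDER BY c2 SETTINGS disk = 'databucket';
INSERT INTO test_s3_single VALUES (1, '2023-10-04'), (2, '2023-10-04');
ALTER TABLE test_s3_single DETACH PARTITION '2023-10-04';
ALTER TABLE test_s3_single DROP DETACHED PARTITION '2023-10-04' SETTINGS allow_drop_detached = 1;
SQL
}

# Example usage (needs a running server with the disk configured):
#   repro_sql | clickhouse-client --multiquery
```

Counting the objects in the bucket before the DETACH and after the DROP should show whether the blobs are removed.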

@den-crane den-crane added unexpected behaviour comp-s3 and removed potential bug To be reviewed by developers and confirmed/rejected. st-need-info We need extra data to continue (waiting for response) labels Oct 4, 2023
@alifirat
Author

Validated the fix on the latest patch release of 23.9.
Thanks @alesapin
