
S3 zero copy replication #16240

Merged · 40 commits · Mar 14, 2021

Conversation

@ianton-ru (Contributor) commented Oct 21, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Zero-copy replication for ReplicatedMergeTree over S3 storage

Detailed description / Documentation draft:

Zero-copy replication over S3 storage

@robot-clickhouse robot-clickhouse added the doc-alert and pr-feature (Pull request with new product feature) labels on Oct 21, 2020
@ianton-ru ianton-ru force-pushed the s3_zero_copy_replication branch 2 times, most recently from 123025c to d5bc7ad on October 22, 2020 11:26
In response, the sending replica returns the cookie send_s3_metadata=1 when it is sending metadata. In all other cases data is sent, as before.

Before the request, the receiver checks whether it will store the data on S3. The check is crude for now: if the storage contains an S3 disk, S3 is assumed. If so, the receiver sends send_s3_metadata=1 in the request.
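
A minimal sketch of that exchange, assuming hypothetical `FetchRequest`/`FetchResponse` holders for the HTTP query parameters and cookies (an illustration only, not the actual DataPartsExchange code):

```cpp
#include <map>
#include <string>

/// Hypothetical holders for the HTTP query parameters and cookies
/// exchanged during a part fetch (illustration only).
struct FetchRequest  { std::map<std::string, std::string> params; };
struct FetchResponse { std::map<std::string, std::string> cookies; bool metadata_only = false; };

/// Receiver side: before the fetch, check (crudely) whether the part may end
/// up on S3 and advertise that in the request.
FetchRequest makeFetchRequest(bool storage_has_s3_disk)
{
    FetchRequest request;
    if (storage_has_s3_disk)                          /// crude check: any S3 disk in the storage
        request.params["send_s3_metadata"] = "1";
    return request;
}

/// Sender side: if the receiver asked for metadata and the part really lives
/// on S3, reply with the cookie and send only the small metadata files;
/// otherwise send full data, as before.
FetchResponse handleFetch(const FetchRequest & request, bool part_is_on_s3)
{
    FetchResponse response;
    auto it = request.params.find("send_s3_metadata");
    if (it != request.params.end() && it->second == "1" && part_is_on_s3)
    {
        response.cookies["send_s3_metadata"] = "1";
        response.metadata_only = true;
    }
    return response;
}
```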
Contributor

Strictly speaking, a situation is possible where both the receiver and the sender are on S3 but the buckets are different. We should probably send the bucket address, or some hash of it.

Contributor Author

The address can differ, for example, when the nodes are in different data centers and use different proxies to access S3. As a safeguard, after receiving the metadata the receiver checks availability (in practice, the presence in S3 of the first object of the first file, without downloading it, just via a list request), and if the data is not accessible it falls back to the old mechanism with a full copy.
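
A minimal sketch of that safeguard, assuming a hypothetical `list_objects` callable in place of a real S3 LIST request (the actual code goes through ClickHouse's disk abstractions):

```cpp
#include <functional>
#include <string>
#include <vector>

/// Hypothetical stand-in for an S3 LIST call: given a key prefix and a limit,
/// return the matching object keys.
using ListObjectsFn = std::function<std::vector<std::string>(const std::string & prefix, int max_keys)>;

/// Check that the first S3 object of the first file referenced by the received
/// metadata exists, without downloading it (a list request is enough).
bool firstObjectIsReachable(const ListObjectsFn & list_objects, const std::string & first_object_key)
{
    return !list_objects(first_object_key, /*max_keys=*/ 1).empty();
}

enum class FetchMode { MetadataOnly, FullCopy };

/// Keep the metadata-only part if the shared data is reachable from this
/// replica; otherwise fall back to the old mechanism and fetch a full copy.
FetchMode chooseFetchMode(const ListObjectsFn & list_objects, const std::string & first_object_key)
{
    return firstObjectIsReachable(list_objects, first_object_key)
        ? FetchMode::MetadataOnly
        : FetchMode::FullCopy;
}
```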

Contributor Author

As a bonus, the ability to work with several different S3 endpoints came almost for free, for the case where different nodes have them in a different order (the receiver iterates over all S3 disks in the storage looking for the right one). But it's unclear who might actually need that. :)
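
Sketched roughly, with `S3DiskProbe` and `findDiskWithObject` as hypothetical names (not the real interface): the receiver simply probes every S3 disk of the storage until it finds the referenced object.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

/// Hypothetical view of one S3 disk from the storage policy: its name plus a
/// probe telling whether a given object key exists on it (a LIST-style check).
struct S3DiskProbe
{
    std::string name;
    std::function<bool(const std::string & key)> object_exists;
};

/// Iterate over all S3 disks of the storage in order and return the first one
/// that actually holds the referenced object; nullopt means it was not found
/// anywhere, in which case the fetch falls back to a full copy.
std::optional<std::string> findDiskWithObject(
    const std::vector<S3DiskProbe> & s3_disks,
    const std::string & first_object_key)
{
    for (const auto & disk : s3_disks)
        if (disk.object_exists(first_object_key))
            return disk.name;
    return std::nullopt;
}
```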

@ianton-ru ianton-ru changed the title from "[WIP] Prototype/MVP/Proof-of-concept etc. S3 zero copy replication" to "S3 zero copy replication" on Jan 18, 2021
@Sallery-X

How to solve the cache consistency problem of different replicas?

In ClickHouse data is "append only". Sometimes an entire part can be deleted. So could you please describe where you see a problem?

If multiple replicas share the same storage, how do we resolve shared-file conflicts when different replicas execute inserts and merges?

@Sallery-X

I think we need to consider the DDL worker: there is no need to execute the DDL on all replicas.

@ianton-ru (Contributor Author) commented Mar 9, 2021

How to solve the cache consistency problem of different replicas?

In ClickHouse data is "append only". Sometimes an entire part can be deleted. So could you please describe where you see a problem?

If multiple replicas share the same storage, how do we resolve shared-file conflicts when different replicas execute inserts and merges?

On insert, one replica creates all the files of the part; after that the files never change.
On merge, one replica takes a lock in ZooKeeper; the other replicas wait for it to finish and then fetch the merged part.
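
A rough sketch of that coordination, with a hypothetical `try_create_node` standing in for the real ZooKeeper client call and an assumed lock-node path layout:

```cpp
#include <functional>
#include <string>

/// Hypothetical ZooKeeper-like primitive: atomically create a node and return
/// true if this replica created it, i.e. acquired the lock for the merge.
using TryCreateNodeFn = std::function<bool(const std::string & path)>;

enum class MergeDecision { DoMergeLocally, WaitAndFetchMergedPart };

/// One replica wins the lock and executes the merge; the others do not redo
/// the work and instead fetch the already-merged part (metadata only) once it
/// appears in the shared S3 storage.
MergeDecision planMerge(const TryCreateNodeFn & try_create_node,
                        const std::string & zookeeper_locks_root,
                        const std::string & new_part_name)
{
    const std::string lock_path = zookeeper_locks_root + "/" + new_part_name;
    return try_create_node(lock_path)
        ? MergeDecision::DoMergeLocally
        : MergeDecision::WaitAndFetchMergedPart;
}
```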

@alesapin (Member) left a comment

Ok, but need to remove suspicious if.

src/Storages/StorageReplicatedMergeTree.cpp (outdated comment, resolved)
@alesapin (Member)

Yarrr! Conflict on ErrorCode.

@boqu commented Apr 13, 2021

Could you tell us whether this change will be merged into 21.3-lts? If so, is there a timeline? Thanks!

namespace DB
{

struct DiskType
Member

This looks like an antipattern.
And there are zero comments in this file 😭

What does "disk type" mean, and why do we need to discriminate between disk types?
Maybe remove this file and all its usages?
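
For context, a rough reconstruction of the kind of discriminator under discussion (my sketch, not the exact contents of the file): an enum naming the concrete disk implementation so callers can ask "is this disk S3?", which is what prompts the antipattern remark.

```cpp
#include <string>

namespace DB
{

/// Rough reconstruction (illustration only): a type tag naming the concrete
/// disk implementation, used by callers to branch on "is this S3?".
struct DiskType
{
    enum class Type
    {
        Local,
        RAM,
        S3,
    };

    static std::string toString(Type disk_type)
    {
        switch (disk_type)
        {
            case Type::Local: return "local";
            case Type::RAM:   return "memory";
            case Type::S3:    return "s3";
        }
        return "unknown";
    }
};

}
```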
