Tiered Storage documentation #1941
The patch adds a section in "Concepts and Architecture" and a cookbook for setting up tiered storage with S3. Master Issue: apache#1511
@aahmed-se @mgodave @srkukarni can you guys help review documentation?
```
s3ManagedLedgerOffloadBucket=pulsar-topic-offload
```

It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a none AWS storage service which provides an S3 compatible API.
a none -> another
should have been "a non-AWS storage service". Changed
Pulsar also provides some knobs to configure they size of requests sent to S3.

- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload.
Can you document what the default values are?
Added
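For reference, the settings discussed in this thread could be collected roughly as below. This is a sketch under stated assumptions, not text from the patch: `offload_settings` is a hypothetical helper, the key names for the driver and region are recalled from Pulsar's `broker.conf` and should be checked against the patch, and every value except the bucket name is a placeholder (the block size shown is an example, not a documented default).

```bash
#!/usr/bin/env bash
# Print illustrative S3 offload settings in broker.conf syntax.
# Only the bucket name comes from the patch; all other values are placeholders.
offload_settings() {
  local max_block_bytes=$((64 * 1024 * 1024))  # example "part" size: 64 MiB
  cat <<EOF
managedLedgerOffloadDriver=S3
s3ManagedLedgerOffloadRegion=eu-west-3
s3ManagedLedgerOffloadBucket=pulsar-topic-offload
# Uncomment for a non-AWS service with an S3 compatible API (placeholder URL):
# s3ManagedLedgerOffloadServiceEndpoint=https://s3.example.com
s3ManagedLedgerOffloadMaxBlockSizeInBytes=${max_block_bytes}
EOF
}

offload_settings
```

The output could be appended to the `broker.conf` of every broker, for example with `offload_settings >> conf/broker.conf`, so that all brokers share the same driver, region and bucket.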
If there is an error offloading, the error will be propagated to the offload-status command.

```bash
$ bin/pulsar-admin topics offload-status persistent://public/default/topic1
```
What do you mean by triggering the offload? Once turned on, shouldn't this always happen in the background?
Currently offload is triggered manually. Adding the offload on threshold should be pretty easy though.
Please be aware that there is currently an in-progress PR for this. Feel free to use that material as you wish.
@lucperkins had missed that. Will pull some of it in.
@srkukarni @merlimat please review the latest PR
Couple of minor comments
{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %}

Pulsar also provides some knobs to configure they size of requests sent to S3.
`they size` --> `the size`
👍
When triggering offload, you must specify the maximum size, in bytes, of backlog which will be retained locally on the bookkeeper. The offload mechanism will offload segments from the start of the topic backlog until this condition is met.

```bash
$ bin/pulsar-admin topics offload ---size-threshold 10000000 persistent://public/default/topic1
```
`---size-threshold` --> `--size-threshold`
Nit: `persistent://public/default/topic1` can be shortened to `topic1`. Even if the namespace is needed, `persistent://` is optional anyway.
Updated the dash. I'll remove the `persistent://` prefix but will leave the tenant and namespace, since there's no indication anywhere else that the topic is in public/default. In fact, I'll change it to my-tenant/my-namespace.
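The naming shortcut discussed in this thread can be illustrated with a small sketch. `full_topic_name` is a hypothetical helper written for this example, not part of `pulsar-admin`; it only mirrors the expansion described above, where a bare topic name lands in `public/default` and the `persistent://` prefix is implied when omitted.

```bash
#!/usr/bin/env bash
# Expand a short topic name to its fully qualified form.
# Hypothetical helper for illustration; it mirrors the rules stated in the review.
full_topic_name() {
  local name=$1
  case "$name" in
    *://*) echo "$name" ;;                              # already fully qualified
    */*/*) echo "persistent://$name" ;;                 # tenant/namespace/topic
    */*)   echo "ambiguous topic name: $name" >&2
           return 1 ;;                                  # neither short nor complete
    *)     echo "persistent://public/default/$name" ;;  # bare topic name
  esac
}

full_topic_name topic1
full_topic_name my-tenant/my-namespace/topic1
```

Both the short and the expanded spelling should therefore refer to the same topic when passed to `bin/pulsar-admin topics offload`.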
👍
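The size-threshold rule quoted in this thread ("offload segments from the start of the topic backlog until this condition is met") can be sketched as a shell function. This only illustrates the rule as worded and is not Pulsar's implementation: `segments_to_offload` is a hypothetical helper, the segment sizes are made up, and details such as whether the segment currently being written is eligible are ignored.

```bash
#!/usr/bin/env bash
# segments_to_offload THRESHOLD SIZE...
# Sizes are given oldest segment first. Prints the 0-based index of each
# segment that would be offloaded so that at most THRESHOLD bytes stay local.
segments_to_offload() {
  local threshold=$1
  shift
  local total=0 size i=0

  # Total backlog currently held locally.
  for size in "$@"; do
    total=$((total + size))
  done

  # Walk from the oldest segment, offloading until the remainder fits.
  for size in "$@"; do
    if [ "$total" -le "$threshold" ]; then
      break
    fi
    echo "$i"
    total=$((total - size))
    i=$((i + 1))
  done
}

# Three 6000000-byte segments with a 10000000-byte threshold:
# prints 0 and 1 (one per line), leaving 6000000 bytes locally.
segments_to_offload 10000000 6000000 6000000 6000000
```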