Tiered Storage documentation #1941

Merged 7 commits on Jun 18, 2018

ivankelly
Contributor

The patch adds a section in "Concepts and Architecture" and a cookbook
for setting up tiered storage with S3.

Master Issue: #1511

@sijie sijie requested a review from merlimat June 8, 2018 21:09
@sijie sijie added the labels doc ("Your PR contains doc changes, no matter whether the changes are in markdown or code files.") and area/tieredstorage on Jun 8, 2018
@sijie sijie added this to the 2.1.0-incubating milestone Jun 8, 2018
@sijie
Member

sijie commented Jun 8, 2018

@aahmed-se @mgodave @srkukarni can you help review the documentation?

```
s3ManagedLedgerOffloadBucket=pulsar-topic-offload
```

It is also possible to specify the s3 endpoint directly, using ```s3ManagedLedgerOffloadServiceEndpoint```. This is useful if you are using a none AWS storage service which provides an S3 compatible API.

Contributor

a none -> another

Contributor Author

It should have been "a non-AWS storage service". Changed.


Pulsar also provides some knobs to configure they size of requests sent to S3.

- ```s3ManagedLedgerOffloadMaxBlockSizeInBytes``` configures the maximum size of a "part" sent during a multipart upload.

Contributor

Can you document what the default values are?

Contributor Author

Added

If there is an error offloading, the error will be propagated to the offload-status command.

```bash
$ bin/pulsar-admin topics offload-status persistent://public/default/topic1
```

Contributor

What do you mean by triggering the offload? Shouldn't this, once turned on, always happen in the background?

Contributor Author

Currently, offload is triggered manually. Adding offload on a threshold should be pretty easy, though.
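
Because offload is triggered manually at this point, a typical session triggers it and then checks on it. A minimal sketch using only the two commands quoted in this PR (with the dash fix from the review applied): the `PULSAR_ADMIN` variable, the function name, and the topic and threshold in the usage line are illustrative placeholders, not part of the documentation under review.

```bash
#!/usr/bin/env bash
# Sketch: manually trigger an offload, then check its status.
# PULSAR_ADMIN is a hypothetical override for the path to the CLI.
PULSAR_ADMIN="${PULSAR_ADMIN:-bin/pulsar-admin}"

offload_and_check() {
  local topic="$1" threshold="$2"
  # Offload until at most $threshold bytes of backlog remain locally.
  "$PULSAR_ADMIN" topics offload --size-threshold "$threshold" "$topic" || return 1
  # Report the outcome; per the docs, offload errors surface in this command.
  "$PULSAR_ADMIN" topics offload-status "$topic"
}
```

Usage would look like `offload_and_check my-tenant/my-namespace/topic1 10000000`, run from the Pulsar installation directory.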

@lucperkins
Contributor

Please be aware that there is currently an in-progress PR for this. Feel free to use that material as you wish.

@ivankelly
Contributor Author

@lucperkins I had missed that. Will pull some of it in.

@sijie
Member

sijie commented Jun 12, 2018

@srkukarni @merlimat please review the latest PR

Contributor

@merlimat merlimat left a comment

Couple of minor comments


{% include admonition.html type="warning" content="The broker.conf of all brokers must have the same configuration for driver, region and bucket for offload to avoid data becoming unavailable as topics move from one broker to another." %}

Pulsar also provides some knobs to configure they size of requests sent to S3.

Contributor

they size --> the size

Contributor Author

👍
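
The warning quoted in this thread requires every broker's `broker.conf` to agree on the offload driver, region and bucket. One way to check that before a rollout is sketched below. It assumes that all offload keys contain `ManagedLedgerOffload`, as the key names quoted in this PR do, and it is stricter than the warning because it compares every such key, including the request-size knobs. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that several broker.conf files agree on their offload settings.
# Assumption: offload keys all contain "ManagedLedgerOffload".

# Print the uncommented offload settings of one file in a stable order.
offload_settings() {
  grep -E '^[[:space:]]*[A-Za-z0-9]*[Mm]anagedLedgerOffload[A-Za-z0-9]*=' "$1" \
    | sed 's/^[[:space:]]*//' | sort
}

# Return 0 if every file has the same offload settings as the first one.
check_offload_consistency() {
  local reference="$1" ref_settings file
  ref_settings="$(offload_settings "$reference")"
  shift
  for file in "$@"; do
    if [ "$(offload_settings "$file")" != "$ref_settings" ]; then
      echo "MISMATCH: $file differs from $reference"
      return 1
    fi
  done
  echo "OK: offload settings match"
}
```

It would be called with one copy of `broker.conf` per broker, for example `check_offload_consistency broker1.conf broker2.conf broker3.conf`.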

When triggering offload, you must specify the maximum size, in bytes, of backlog which will be retained locally on the bookkeeper. The offload mechanism will offload segments from the start of the topic backlog until this condition is met.

```bash
$ bin/pulsar-admin topics offload ---size-threshold 10000000 persistent://public/default/topic1
```

Contributor

---size-threshold --> --size-threshold

Contributor

Nit: persistent://public/default/topic1 can be shortened to topic1. Even if the namespace is needed, persistent:// is optional anyway.

Contributor Author

Updated the dash. I'll remove the persistent:// prefix but leave the tenant and namespace, since there's no indication anywhere else that the topic is in public/default. In fact, I'll change it to my-tenant/my-namespace.
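
The size-threshold rule in the paragraph under review (offload segments from the start of the backlog until the locally retained backlog fits the threshold) can be illustrated with a small calculation. This is only a sketch of the rule as the docs word it, not Pulsar's implementation. The function name and the segment sizes are made up, and it ignores details such as whether a segment is still being written.

```bash
#!/usr/bin/env bash
# Sketch of the size-threshold rule as described in the quoted docs.
# Arguments: threshold in bytes, then segment sizes in bytes, oldest first.
# Prints how many of the oldest segments would be offloaded.
segments_to_offload() {
  local threshold="$1"; shift
  local total=0 size count=0
  # Total backlog currently held locally.
  for size in "$@"; do total=$((total + size)); done
  # Walk from the oldest segment until the remainder fits the threshold.
  for size in "$@"; do
    [ "$total" -le "$threshold" ] && break
    total=$((total - size))
    count=$((count + 1))
  done
  echo "$count"
}
```

With a threshold of 10000000 and three hypothetical 6000000-byte segments, the two oldest segments are offloaded and 6000000 bytes stay local.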

Contributor

@merlimat merlimat left a comment

👍

@sijie sijie merged commit 6e0afee into apache:master Jun 18, 2018
5 participants