Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored JCloud Tiered Storage #6335

Merged
merged 31 commits into from
Oct 19, 2020
Merged

Conversation

david-streamlio
Copy link
Contributor

Motivation

In order to facilitate the support of additional JClouds-supported providers, we first needed to clean up the existing code, as there were a lot of if/then/else constructs throughout the code that were based on the assumption that we either supported AWS S3 or Google Cloud Storage. I didn't want to keep adding else if's to these code blocks for every new provider we add, so I decided to refactor the code to make it a bit cleaner

Modifications

in addition to being home for most of the aforementioned if/then/else blocks, the BlobStoreManagedLedgerOffloader class had multiple responsibilities in addition to providing an implementation for the LedgerOffloader interface. My goal was to simplify this class such that its only responsibility was to implement the LedgerOffloader interface.

The other major change was the addition of the JCloudBlobStoreProvider enum, which implements 3 interfaces that allow for it to handle the provider specific logic for things such as acquiring the credentials, validating the configuration, and creating a provider-specific instance of BlobStore.

Result

After this change, we will be able to easily add support for additional JClouds-supported providers by simply adding new elements to the JCloudBlobStoreProvider Enums since the other logic has been isolated and is not vendor specific.

See #2865 for more details

@david-streamlio
Copy link
Contributor Author

Any update on this?

@sijie
Copy link
Member

sijie commented Apr 6, 2020

@david-streamlio sorry for the late response. @jiazhai @codelipenghui and @gaoran10 can you guys review this pull request? @david-streamlio is waiting for a review.

OrderedScheduler scheduler) throws IOException {

TieredStorageConfiguration config = TieredStorageConfiguration.create(userMetadata);
return BlobStoreManagedLedgerOffloader.create(config, userMetadata, scheduler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the Pulsar support independent offload policy for each namespace, the userMetadata param contains the global offload configs and the offloadPolicies param contains the Namespace level offload configs. If the namespace has its own offload policy, we should use the independent offload policy, otherwise, we should use the global offload policy.

The OffloadPolicies has independent config for S3 and GCS, such as s3ManagedLedgerOffloadRegion and gcsManagedLedgerOffloadRegion, follow your way, we could combine them as the region.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The focus of this PR is to refactor the tiered storage to use JCloud. I feel that we need to keep this PR to just that. I suggest we create an issue for this and assign it to me. I will make the change after this PR has been successfully merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late response. If we don't care about the OffloadPolicies config, it will cause the namespace offload policies invalid.

Comment on lines 304 to 307
public OffloadPolicies getOffloadPolicies() {
return offloadPolicies;
// TODO Auto-generated method stub
return null;
}
Copy link
Contributor

@gaoran10 gaoran10 Apr 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, when the LedgerOffloader is generated by the PulsarService, the PulsarService will check the namespace's OffloadPolicies, if the new OffloadPolicies is same as the existing OffloadPolicies, the PulsarService will reuse the existing LedgerOffloader, so we should provide the OffloadPolicies of the LedgerOffloader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The focus of this PR is to refactor the tiered storage to use JCloud. I feel that we need to keep this PR to just that. I suggest we create an issue for this and assign it to me. I will make the change after this PR has been successfully merged.

@codelipenghui
Copy link
Contributor

move to 2.7.0 first.

@codelipenghui codelipenghui modified the milestones: 2.6.0, 2.7.0 Jun 4, 2020
@david-streamlio
Copy link
Contributor Author

Anything needed from me on this?

@sijie
Copy link
Member

sijie commented Jun 10, 2020

@gaoran10 @codelipenghui can you review this change again?

@jiazhai
Copy link
Member

jiazhai commented Sep 8, 2020

@david-streamlio Would you please help resolve the conflicts with master branch?

@david-streamlio
Copy link
Contributor Author

I will work on this conflicts shortly.

@jiazhai
Copy link
Member

jiazhai commented Sep 9, 2020

@congbobo184 @gaoran10 Would you please help review this PR?

@gaoran10
Copy link
Contributor

LGTM, that's so cool.

I added an issue to track the problem with the namespace level offload feature, we could complete this PR first and repair that problem. @david-streamlio Hi, could you make a rebase for this PR? Thanks.

@david-streamlio
Copy link
Contributor Author

Code has been rebased.

@gaoran10
Copy link
Contributor

/pulsarbot run-failure-checks

@apache apache deleted a comment from gaoran10 Sep 23, 2020
@gaoran10
Copy link
Contributor

gaoran10 commented Oct 12, 2020

@david-streamlio Hi, there are still some problems in this PR, I made some changes based on your work, could you have a look and add these changes to this PR, thanks. gaoran10#12

@wolfstudy
Copy link
Member

/pulsarbot run-failure-checks

@jiazhai
Copy link
Member

jiazhai commented Oct 19, 2020

Would like to merge this first.

@jiazhai jiazhai merged commit d3a5233 into apache:master Oct 19, 2020
@wolfstudy
Copy link
Member

Move this change to 2.6.2, because the #8310 depends on it.

wolfstudy pushed a commit that referenced this pull request Oct 30, 2020
Motivation

In order to facilitate the support of additional JClouds-supported providers, we first needed to clean up the existing code, as there were a lot of if/then/else constructs throughout the code that were based on the assumption that we either supported AWS S3 or Google Cloud Storage. I didn't want to keep adding else if's to these code blocks for every new provider we add, so I decided to refactor the code to make it a bit cleaner

Modifications

in addition to being home for most of the aforementioned if/then/else blocks, the BlobStoreManagedLedgerOffloader class had multiple responsibilities in addition to providing an implementation for the LedgerOffloader interface. My goal was to simplify this class such that its only responsibility was to implement the LedgerOffloader interface.

The other major change was the addition of the JCloudBlobStoreProvider enum, which implements 3 interfaces that allow for it to handle the provider specific logic for things such as acquiring the credentials, validating the configuration, and creating a provider-specific instance of BlobStore.

Result

After this change, we will be able to easily add support for additional JClouds-supported providers by simply adding new elements to the JCloudBlobStoreProvider Enums since the other logic has been isolated and is not vendor specific.

See #2865 for more details

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* fix test

* add test logs

* fix logs

* fix test

* fix

* fix

* fix broker log

* fix configuration

* fix

* fix

* fix

* fix get BlobStore

* repair test presto query tiered storage data

* fix test

* fix test

* fix test TestPrestoQueryTieredStorage

* fix

* fix

Co-authored-by: Sijie Guo <sijie@apache.org>
Co-authored-by: gaoran10 <gaoran_10@126.com>
Co-authored-by: xiaolong.ran <rxl@apache.org>
(cherry picked from commit d3a5233)
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Nov 13, 2020
Motivation

In order to facilitate the support of additional JClouds-supported providers, we first needed to clean up the existing code, as there were a lot of if/then/else constructs throughout the code that were based on the assumption that we either supported AWS S3 or Google Cloud Storage. I didn't want to keep adding else if's to these code blocks for every new provider we add, so I decided to refactor the code to make it a bit cleaner

Modifications

in addition to being home for most of the aforementioned if/then/else blocks, the BlobStoreManagedLedgerOffloader class had multiple responsibilities in addition to providing an implementation for the LedgerOffloader interface. My goal was to simplify this class such that its only responsibility was to implement the LedgerOffloader interface.

The other major change was the addition of the JCloudBlobStoreProvider enum, which implements 3 interfaces that allow for it to handle the provider specific logic for things such as acquiring the credentials, validating the configuration, and creating a provider-specific instance of BlobStore.

Result

After this change, we will be able to easily add support for additional JClouds-supported providers by simply adding new elements to the JCloudBlobStoreProvider Enums since the other logic has been isolated and is not vendor specific.

See apache#2865 for more details

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* fix test

* add test logs

* fix logs

* fix test

* fix

* fix

* fix broker log

* fix configuration

* fix

* fix

* fix

* fix get BlobStore

* repair test presto query tiered storage data

* fix test

* fix test

* fix test TestPrestoQueryTieredStorage

* fix

* fix

Co-authored-by: Sijie Guo <sijie@apache.org>
Co-authored-by: gaoran10 <gaoran_10@126.com>
Co-authored-by: xiaolong.ran <rxl@apache.org>
flowchartsman pushed a commit to flowchartsman/pulsar that referenced this pull request Nov 17, 2020
Motivation

In order to facilitate the support of additional JClouds-supported providers, we first needed to clean up the existing code, as there were a lot of if/then/else constructs throughout the code that were based on the assumption that we either supported AWS S3 or Google Cloud Storage. I didn't want to keep adding else if's to these code blocks for every new provider we add, so I decided to refactor the code to make it a bit cleaner

Modifications

in addition to being home for most of the aforementioned if/then/else blocks, the BlobStoreManagedLedgerOffloader class had multiple responsibilities in addition to providing an implementation for the LedgerOffloader interface. My goal was to simplify this class such that its only responsibility was to implement the LedgerOffloader interface.

The other major change was the addition of the JCloudBlobStoreProvider enum, which implements 3 interfaces that allow for it to handle the provider specific logic for things such as acquiring the credentials, validating the configuration, and creating a provider-specific instance of BlobStore.

Result

After this change, we will be able to easily add support for additional JClouds-supported providers by simply adding new elements to the JCloudBlobStoreProvider Enums since the other logic has been isolated and is not vendor specific.

See apache#2865 for more details

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* Refactored JCloud Tiered Storage

* Refactored JCloud Tiered Storage

* Added missing import statements

* fix test

* add test logs

* fix logs

* fix test

* fix

* fix

* fix broker log

* fix configuration

* fix

* fix

* fix

* fix get BlobStore

* repair test presto query tiered storage data

* fix test

* fix test

* fix test TestPrestoQueryTieredStorage

* fix

* fix

Co-authored-by: Sijie Guo <sijie@apache.org>
Co-authored-by: gaoran10 <gaoran_10@126.com>
Co-authored-by: xiaolong.ran <rxl@apache.org>
codelipenghui pushed a commit that referenced this pull request Nov 21, 2020
# Motivation

The PR #6335 lost some PR changes, related PRs as below.

1. PR 4196 (2019/5/29 Merli)
Configure static PulsarByteBufAllocator to handle OOM errors (#4196)

2. PR 5356 (2019/10/30 Kelly)
[TIEREDSTORAGE] Only seek when reading unexpected entry (#5356)

3. PR 4433 (2019/6/4 Higham)
[tiered-storage] Add support for AWS instance and role creds (#4433)
codelipenghui pushed a commit that referenced this pull request Nov 21, 2020
# Motivation

The PR #6335 lost some PR changes, related PRs as below.

1. PR 4196 (2019/5/29 Merli)
Configure static PulsarByteBufAllocator to handle OOM errors (#4196)

2. PR 5356 (2019/10/30 Kelly)
[TIEREDSTORAGE] Only seek when reading unexpected entry (#5356)

3. PR 4433 (2019/6/4 Higham)
[tiered-storage] Add support for AWS instance and role creds (#4433)

(cherry picked from commit 68759ff)
codelipenghui pushed a commit to streamnative/pulsar-archived that referenced this pull request Nov 21, 2020
…ache#8630)

# Motivation

The PR apache#6335 lost some PR changes, related PRs as below.

1. PR 4196 (2019/5/29 Merli)
Configure static PulsarByteBufAllocator to handle OOM errors (#4196)

2. PR 5356 (2019/10/30 Kelly)
[TIEREDSTORAGE] Only seek when reading unexpected entry (#5356)

3. PR 4433 (2019/6/4 Higham)
[tiered-storage] Add support for AWS instance and role creds (#4433)

(cherry picked from commit 68759ff)
(cherry picked from commit 8793963)
codelipenghui pushed a commit to streamnative/pulsar-archived that referenced this pull request Nov 21, 2020
…ache#8630)

# Motivation

The PR apache#6335 lost some PR changes, related PRs as below.

1. PR 4196 (2019/5/29 Merli)
Configure static PulsarByteBufAllocator to handle OOM errors (#4196)

2. PR 5356 (2019/10/30 Kelly)
[TIEREDSTORAGE] Only seek when reading unexpected entry (#5356)

3. PR 4433 (2019/6/4 Higham)
[tiered-storage] Add support for AWS instance and role creds (#4433)

(cherry picked from commit 68759ff)
(cherry picked from commit 8793963)
(cherry picked from commit 0388a1a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants