Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCS offload support(3): add configs to support GCS driver #2151

Merged
merged 13 commits into from
Jul 24, 2018

Conversation

zhaijack
Copy link
Contributor

@zhaijack zhaijack commented Jul 13, 2018

This is the third part to support Google Cloud Storage offload.
It aims to support GCS related config in ManagedLedgerOffloader. It is based on PR #2065. Please only review the commits after e5b1f7a.

Currently it passed the real test in GCS, by test case ManagedLedgerOffloaderTest#testGcsRealOffload, since lack of GCS mock docker image, will try to use read GCS to do the integration test in later PR.

Master Issue: #2067

@zhaijack
Copy link
Contributor Author

retest this please

@sijie
Copy link
Member

sijie commented Jul 23, 2018

@ivankelly can you review this?

@sijie sijie added type/feature The PR added a new feature or issue requested a new feature area/tieredstorage labels Jul 23, 2018
@sijie sijie added this to the 2.2.0-incubating milestone Jul 23, 2018
@sijie sijie requested a review from ivankelly July 23, 2018 21:29
conf/broker.conf Outdated
@@ -487,6 +487,8 @@ schemaRegistryStorageClassName=org.apache.pulsar.broker.service.schema.Bookkeepe
### --- Ledger Offloading --- ###

# Driver to use to offload old data to long term storage (Possible values: S3, aws-s3, google-cloud-storage)
# When using gcs, Make sure both Google Cloud Storage and Google Cloud Storage JSON API are enabled for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: gcs -> google-cloud-storage, though maybe it would be better to rename the driver gcs, as the options for it are prefixed with gcs.
nit: Make -> make

conf/broker.conf Outdated
@@ -507,6 +509,24 @@ s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
# For Amazon S3 ledger offload, Read buffer size in bytes (1MB by default)
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576

# For Google Cloud Storage ledger offload region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Google Cloud Storage ledger offload, region where offload bucket is located.

# For Google Cloud Storage ledger offload, Bucket to place offloaded ledger into
gcsManagedLedgerOffloadBucket=

# For Google Cloud Storage ledger offload, Max block size in bytes. (64MB by default, 5MB minimum)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5MB minimum was an AWS imposed constraint. It's probably different for GCS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Yes, It is also 5 MB in GCS.
https://cloud.google.com/storage/docs/json_api/v1/how-tos/multipart-upload
Seems GCS make some alignment with aws s3

conf/broker.conf Outdated
# For Google Cloud Storage ledger offload, Read buffer size in bytes (1MB by default)
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576

# Note: you need first config this to get gcs avliable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line. It's redundant.

conf/broker.conf Outdated
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576

# Note: you need first config this to get gcs avliable.
# For Google Cloud Storage credentials of service account key path, usually it is a json file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Google Cloud Storage, path to json file containing service account credentials.
For more details, see the "Service Accounts" section of https://support.google.com/googleapi/answer/6158849

private int gcsManagedLedgerOffloadReadBufferSizeInBytes = 1024 * 1024; // 1MB

// For Google Cloud Storage credentials of service account key path
// reference this page for more details of service account key: https://support.google.com/googleapi/answer/6158849
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments from conf file also apply here, though I'm not sure if this comments are even useful. We should probably just group them all together and point user to broker.conf if they want details, as they will never update the values in this file.

try {
gcsKeyContent = Files.toString(new File(gcsKeyPath), Charset.defaultCharset());
} catch (IOException ioe) {
log.error("meet exception when read gcs service account key file ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> "Cannot read GCS service account credentials file " + gcsKeyPath

if (credentials != null) {
id = credentials.getAWSAccessKeyId();
key = credentials.getAWSSecretKey();
if (isS3Driver(driver)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have AWS/GCS specific logic in this class. Instead, it should be in the factory method, maybe broken out into a utility method called from the factory method. Then the id & key should be passed in as parameters to the constructor of this class.

public void testGcsNoBucketConfigured() throws Exception {
ServiceConfiguration conf = new ServiceConfiguration();
conf.setManagedLedgerOffloadDriver("google-cloud-storage");
//conf.setGcsManagedLedgerOffloadServiceAccountKeyFile("~/Downloads/project-804d5e6a6f33.json");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't commit commented out code.

Assert.fail("Should have thrown exception");
} catch (PulsarServerException pse) {
// correct
log.error("expect pse", pse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.info("Expected pse", pse);

@zhaijack
Copy link
Contributor Author

@ivankelly Thanks for the comments, updated it.

Copy link
Contributor

@ivankelly ivankelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm #shipit

@sijie sijie merged commit de05a9b into apache:master Jul 24, 2018
sijie added a commit to sijie/pulsar that referenced this pull request Jul 25, 2018
 ### Motivation

s3 mock service doesn't need credential. so we will fail to load credentials on s3 integration tests.
that is expected. However there is a regression when introducing GCS support in apache#2151:
https://github.com/apache/incubator-pulsar/pull/2151/files#diff-9160ffc9df7afcf42a8335f214393c6dR188

 ### Changes

Don't throw exception when failed to load aws credentials
@sijie sijie mentioned this pull request Jul 25, 2018
sijie added a commit that referenced this pull request Jul 25, 2018
* Fix TestS3Offload

 ### Motivation

s3 mock service doesn't need credential. so we will fail to load credentials on s3 integration tests.
that is expected. However there is a regression when introducing GCS support in #2151:
https://github.com/apache/incubator-pulsar/pull/2151/files#diff-9160ffc9df7afcf42a8335f214393c6dR188

 ### Changes

* Don't throw exception when failed to load aws credentials
* change `log.error` to `log.warn`
sijie pushed a commit that referenced this pull request Aug 8, 2018
This is the 4th part to support Google Cloud Storage offload.
It aims to add documentations for GCS. And it is based on PR #2151 

Master Issue: #2067
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/tieredstorage type/feature The PR added a new feature or issue requested a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants