GCS offload support(3): add configs to support GCS driver #2151
Conversation
…erOffloadServiceAccountKeyFile
retest this please
@ivankelly can you review this?
conf/broker.conf (Outdated)
@@ -487,6 +487,8 @@ schemaRegistryStorageClassName=org.apache.pulsar.broker.service.schema.Bookkeepe
### --- Ledger Offloading --- ###

# Driver to use to offload old data to long term storage (Possible values: S3, aws-s3, google-cloud-storage)
# When using gcs, Make sure both Google Cloud Storage and Google Cloud Storage JSON API are enabled for
nit: gcs -> google-cloud-storage, though maybe it would be better to rename the driver gcs, as the options for it are prefixed with gcs.
nit: Make -> make
conf/broker.conf (Outdated)
@@ -507,6 +509,24 @@ s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864
# For Amazon S3 ledger offload, Read buffer size in bytes (1MB by default)
s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576

# For Google Cloud Storage ledger offload region
For Google Cloud Storage ledger offload, region where offload bucket is located.
# For Google Cloud Storage ledger offload, Bucket to place offloaded ledger into
gcsManagedLedgerOffloadBucket=

# For Google Cloud Storage ledger offload, Max block size in bytes. (64MB by default, 5MB minimum)
5MB minimum was an AWS imposed constraint. It's probably different for GCS.
Thanks. Yes, it is also 5 MB in GCS:
https://cloud.google.com/storage/docs/json_api/v1/how-tos/multipart-upload
It seems GCS aligns with AWS S3 here.
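For reference, the settings under discussion correspond to a broker.conf fragment like the following. Property names are taken from the diff in this PR, values are the quoted defaults, and the bucket name is a hypothetical placeholder:

```properties
# Sketch of the GCS offload settings under review (not authoritative).
managedLedgerOffloadDriver=google-cloud-storage
# Hypothetical bucket name for illustration
gcsManagedLedgerOffloadBucket=my-offload-bucket
# 64MB default; 5MB is the minimum multipart part size for both S3 and GCS
gcsManagedLedgerOffloadMaxBlockSizeInBytes=67108864
# 1MB default read buffer
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576
```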
conf/broker.conf (Outdated)
# For Google Cloud Storage ledger offload, Read buffer size in bytes (1MB by default)
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576

# Note: you need first config this to get gcs avliable.
Remove this line. It's redundant.
conf/broker.conf (Outdated)
gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576

# Note: you need first config this to get gcs avliable.
# For Google Cloud Storage credentials of service account key path, usually it is a json file.
For Google Cloud Storage, path to json file containing service account credentials.
For more details, see the "Service Accounts" section of https://support.google.com/googleapi/answer/6158849
private int gcsManagedLedgerOffloadReadBufferSizeInBytes = 1024 * 1024; // 1MB

// For Google Cloud Storage credentials of service account key path
// reference this page for more details of service account key: https://support.google.com/googleapi/answer/6158849
All comments from the conf file also apply here, though I'm not sure these comments are even useful. We should probably just group them all together and point the user to broker.conf if they want details, as they will never update the values in this file.
try {
    gcsKeyContent = Files.toString(new File(gcsKeyPath), Charset.defaultCharset());
} catch (IOException ioe) {
    log.error("meet exception when read gcs service account key file ");
-> "Cannot read GCS service account credentials file " + gcsKeyPath
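Applying the suggestion, the catch block would carry the offending path in its message so the failure is actionable. A minimal sketch (class and method names here are hypothetical, not the actual Pulsar code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper illustrating the reviewer's suggested error message.
public class GcsCredentialReader {
    public static String readKeyFile(String gcsKeyPath) throws IOException {
        try {
            // Read the service account JSON key file into a string
            return new String(Files.readAllBytes(Paths.get(gcsKeyPath)));
        } catch (IOException ioe) {
            // Include the path in the message, as suggested in review
            throw new IOException(
                "Cannot read GCS service account credentials file " + gcsKeyPath, ioe);
        }
    }
}
```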
if (credentials != null) {
    id = credentials.getAWSAccessKeyId();
    key = credentials.getAWSSecretKey();
if (isS3Driver(driver)) {
We shouldn't have AWS/GCS specific logic in this class. Instead, it should be in the factory method, maybe broken out into a utility method called from the factory method. Then the id & key should be passed in as parameters to the constructor of this class.
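The suggested refactor could look like the following sketch: a utility, called from the factory method, resolves driver-specific credentials and hands plain id/key values to the offloader constructor, which then stays driver-agnostic. All names here are hypothetical illustrations, not the actual Pulsar classes:

```java
// Hypothetical utility keeping AWS/GCS specifics out of the offloader class.
public class OffloadCredentialResolver {
    public static final class Creds {
        public final String id;
        public final String key;
        Creds(String id, String key) { this.id = id; this.key = key; }
    }

    static boolean isS3Driver(String driver) {
        return driver.equalsIgnoreCase("S3") || driver.equalsIgnoreCase("aws-s3");
    }

    static boolean isGcsDriver(String driver) {
        return driver.equalsIgnoreCase("google-cloud-storage");
    }

    // Called from the factory method; the offloader constructor just
    // receives the resolved id and key as parameters.
    public static Creds resolve(String driver, String awsId, String awsKey,
                                String gcsKeyJson) {
        if (isS3Driver(driver)) {
            return new Creds(awsId, awsKey);
        } else if (isGcsDriver(driver)) {
            // For GCS, the service-account key content acts as the credential
            // (illustrative; the real mapping depends on the storage client).
            return new Creds("gcs-service-account", gcsKeyJson);
        }
        throw new IllegalArgumentException("Unknown offload driver: " + driver);
    }
}
```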
public void testGcsNoBucketConfigured() throws Exception {
    ServiceConfiguration conf = new ServiceConfiguration();
    conf.setManagedLedgerOffloadDriver("google-cloud-storage");
    //conf.setGcsManagedLedgerOffloadServiceAccountKeyFile("~/Downloads/project-804d5e6a6f33.json");
Don't commit commented out code.
    Assert.fail("Should have thrown exception");
} catch (PulsarServerException pse) {
    // correct
    log.error("expect pse", pse);
log.info("Expected pse", pse);
@ivankelly Thanks for the comments, updated it.
lgtm #shipit
* Fix TestS3Offload

### Motivation
The s3 mock service doesn't need credentials, so failing to load credentials in the s3 integration tests is expected. However, the GCS support introduced in #2151 caused a regression: https://github.com/apache/incubator-pulsar/pull/2151/files#diff-9160ffc9df7afcf42a8335f214393c6dR188

### Changes
* Don't throw an exception when AWS credentials fail to load
* Change `log.error` to `log.warn`
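The warn-and-continue behavior from the follow-up fix can be sketched as below. Names are illustrative, not the actual Pulsar code; the point is only that a credential-load failure no longer aborts offloader creation:

```java
import java.util.function.Supplier;

// Hypothetical sketch of tolerating credential-load failures (the s3 mock
// service used in integration tests needs no credentials).
public class LenientCredentialLoader {
    public static String loadOrNull(Supplier<String> loader) {
        try {
            return loader.get();
        } catch (RuntimeException e) {
            // Previously this threw and aborted startup; now we only warn
            // and fall back to anonymous access.
            System.err.println("WARN: failed to load credentials, continuing without: "
                + e.getMessage());
            return null;
        }
    }
}
```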
This is the third part of Google Cloud Storage offload support. It adds GCS-related config to ManagedLedgerOffloader. It is based on PR #2065; please only review the commits after e5b1f7a. It currently passes a real test against GCS via the test case ManagedLedgerOffloaderTest#testGcsRealOffload. Since no GCS mock docker image is available, a later PR will run the integration test against real GCS.
Master Issue: #2067