GCS offload support(2): replace s3client api with jclouds related api #2065

Merged
merged 11 commits into apache:master on Jul 20, 2018

Conversation

zhaijack
Contributor

@zhaijack zhaijack commented Jul 2, 2018

This is the second part of Google Cloud Storage offload support.
It aims to replace the "s3 client" API with the "jclouds" API, and to make sure the unit tests and integration tests pass.
A follow-up change will add Google Cloud Storage support and related tests.

Change:
replace the s3client API with the jclouds API in S3ManagedLedgerOffloader

Master Issue: #2067
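To make the switch concrete, here is a minimal sketch (not the PR's actual code) of how a jclouds BlobStore is typically constructed; the provider name, credentials, and bucket name below are illustrative. The "transient" provider is jclouds' in-memory store, handy for local testing, whereas a real offloader would use a provider such as "aws-s3".

```java
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;

public class BlobStoreSketch {
    public static void main(String[] args) {
        // Build a provider-neutral BlobStore view; swapping the provider
        // string is what makes GCS support possible without s3client code.
        BlobStoreContext context = ContextBuilder.newBuilder("transient")
                .credentials("accesskey", "secretkey") // illustrative values
                .buildView(BlobStoreContext.class);
        BlobStore blobStore = context.getBlobStore();
        blobStore.createContainerInLocation(null, "my-bucket");
        System.out.println(blobStore.containerExists("my-bucket"));
        context.close();
    }
}
```

The same `BlobStore` interface then serves S3, GCS, and the in-memory test store.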

@sijie
Member

sijie commented Jul 2, 2018

@zhaijack I have merged the renames in #2064. You can rebase onto the latest master now.

@sijie sijie added this to the 2.2.0-incubating milestone Jul 2, 2018
@zhaijack zhaijack force-pushed the jclouds_offloader branch 6 times, most recently from 84e0fbe to 61c1c09 Compare July 4, 2018 07:17
@zhaijack
Contributor Author

zhaijack commented Jul 4, 2018

retest this please

error:
org.apache.pulsar.tests.integration.TestS3Offload.configureAndStartBrokers

07:41:58.384 [main] INFO  org.apache.pulsar.tests.PulsarClusterUtils - Connecting to zookeeper 172.18.0.4
~~~~~~~~~ SKIPPED -- [TestClass name=class org.apache.pulsar.tests.integration.TestS3Offload].null([])-------
07:43:06.995 [pool-2-thread-14] ERROR org.apache.pulsar.tests.DockerUtils - Error reading dir from container configuration-store_25b4cf62-be6d-4fed-95ef-1f05974eec0f
com.github.dockerjava.api.exception.NotFoundException: {"message":"lstat /var/lib/docker/aufs/mnt/aed05eca1add853b786134edec4d7a58eed32ca2aa47b1a6c1e8c0b3ed51baa9/pulsar/data/zookeeper: no such file or directory"}

	at com.github.dockerjava.jaxrs.filter.ResponseStatusExceptionFilter.filter(ResponseStatusExceptionFilter.java:47) ~[docker-java-3.0.14.jar:?]
	at org.glassfish.jersey.client.ClientFilteringStages$ResponseFilterStage.apply(ClientFilteringStages.java:140) ~[jersey-client-2.25.jar:?]
	at org.glassfish.jersey.client.ClientFilteringStages$ResponseFilterStage.apply(ClientFilteringStages.java:128) ~[jersey-client-2.25.jar:?]
	at org.glassfish.jersey.process.internal.Stages.process(Stages.java:171) ~[jersey-common-2.25.jar:?]

@zhaijack zhaijack force-pushed the jclouds_offloader branch 4 times, most recently from 69edf1d to 808603f Compare July 4, 2018 15:15
@zhaijack zhaijack changed the title WIP-GCS offload support(2): replace s3client api with jclouds related api GCS offload support(2): replace s3client api with jclouds related api Jul 5, 2018
@zhaijack zhaijack force-pushed the jclouds_offloader branch 2 times, most recently from f27b4c2 to 000675e Compare July 5, 2018 14:54
@zhaijack zhaijack changed the title GCS offload support(2): replace s3client api with jclouds related api WIP - GCS offload support(2): replace s3client api with jclouds related api Jul 5, 2018
@zhaijack
Contributor Author

zhaijack commented Jul 5, 2018

It seems the integration tests still have some issues; changing back to Work In Progress.

@zhaijack
Contributor Author

zhaijack commented Jul 11, 2018

retest this please
for unit test failures:
org.apache.pulsar.broker.service.ReplicatorTest.setup
org.apache.pulsar.broker.service.v1.V1_ReplicatorTest.setup
org.apache.pulsar.broker.service.BrokerBkEnsemblesTests.testSkipCorruptDataLedger
org.apache.pulsar.client.impl.SequenceIdWithErrorTest.testCheckSequenceId

@zhaijack zhaijack changed the title WIP - GCS offload support(2): replace s3client api with jclouds related api GCS offload support(2): replace s3client api with jclouds related api Jul 11, 2018
<include>org.apache.jclouds.api:*</include>
<include>org.apache.jclouds.common:*</include>
<include>org.apache.jclouds.provider:*</include>
<include>com.google.inject.extensions:guice-assistedinject</include>
Contributor

These are new dependencies, which need to be accounted for in the LICENSE/NOTICE.

pom.xml Outdated
@@ -101,6 +101,8 @@ flexible messaging model and an intuitive client API.</description>
<module>tests</module>
<module>pulsar-log4j2-appender</module>
<module>protobuf-shaded</module>
<!-- jclouds shaded for gson conflict: https://issues.apache.org/jira/browse/JCLOUDS-1166 -->
Contributor

we're shading so much crap. there must be a better way.

Contributor Author

Thanks, @ivankelly, Opened issue #2164 for this. Once it is done we could remove this shading.

@@ -268,6 +268,12 @@
<version>${project.version}</version>
<scope>test</scope>
</dependency>
Contributor

you need to remove the aws s3 dependency too.

Contributor Author

Oh, as you commented, it seems the AWSCredentials-related things can't be removed.

Contributor

We shouldn't need the full s3 dependency now, just whatever AWSCredentials is in.
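As a hedged sketch of what the trimmed dependency might look like: `aws-java-sdk-core` is the artifact that contains `com.amazonaws.auth.AWSCredentials`, so something like the following could replace the full S3 SDK (the version property is illustrative, not the project's actual one):

```
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-core</artifactId>
  <version>${aws-sdk.version}</version>
</dependency>
```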

<dependency>
<groupId>org.apache.jclouds</groupId>
<artifactId>jclouds-allblobstore</artifactId>
<version>2.2.0-SNAPSHOT</version>
Contributor

SNAPSHOT?

Contributor Author

Thanks. 2.2.1 has some issues related to GCS multi-part upload, so we use the latest snapshot. This is tracked in #2164.

Contributor

Ok, it'll have to be updated before an actual release.

VersionCheck versionCheck,
long objectLen, int bufferSize) {
this.s3client = s3client;
public BackedInputStreamImpl(BlobStore blobStore, String bucket, String key,
Contributor

Rename BlobStoreBackedInputStreamImpl

@@ -123,6 +191,10 @@ static String indexBlockOffloadKey(long ledgerId, UUID uuid) {
return String.format("%s-ledger-%d-index", uuid.toString(), ledgerId);
}

public boolean createBucket() {
Contributor

Seems unused.

Contributor Author

OK, will remove it. If needed, we can add it back in the future.

etags.add(uploadRes.getPartETag());
Payload partPayload = Payloads.newInputStreamPayload(blockStream);
partPayload.getContentMetadata().setContentLength((long)blockSize);
partPayload.getContentMetadata().setContentType("text/plain");
Contributor

It's not text/plain. It's application/octet-stream, if anything.
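The fix the reviewer is asking for can be sketched like this (a minimal, self-contained fragment, not the PR's exact code; the byte array stands in for a ledger block):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.jclouds.io.Payload;
import org.jclouds.io.Payloads;

public class PayloadSketch {
    public static void main(String[] args) {
        byte[] block = new byte[]{1, 2, 3, 4}; // stand-in for a ledger block
        InputStream blockStream = new ByteArrayInputStream(block);
        Payload partPayload = Payloads.newInputStreamPayload(blockStream);
        partPayload.getContentMetadata().setContentLength((long) block.length);
        // Ledger blocks are binary, so octet-stream is the appropriate type.
        partPayload.getContentMetadata().setContentType("application/octet-stream");
        System.out.println(partPayload.getContentMetadata().getContentType());
    }
}
```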

addVersionInfo(blobBuilder);
Payload indexPayload = Payloads.newInputStreamPayload(indexStream);
indexPayload.getContentMetadata().setContentLength((long)indexStream.getStreamSize());
indexPayload.getContentMetadata().setContentType("text/plain");
Contributor

application/octet-stream

blobStore.removeBlob(bucket, dataBlockOffloadKey(ledgerId, uid));
blobStore.removeBlob(bucket, indexBlockOffloadKey(ledgerId, uid));
// adobe/s3mock not support removeBlobs well.
/*blobStore.removeBlobs(bucket,
Contributor

Since we are shipping our own s3mock, we should fix it.
And I don't understand how it doesn't support removeBlobs if deleteObjects worked fine.
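For reference, the bulk-delete call the commented-out code intends looks roughly like this (a sketch against the in-memory "transient" provider; the bucket and key names loosely follow the PR's naming scheme and are illustrative):

```java
import com.google.common.collect.ImmutableList;
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;

public class RemoveBlobsSketch {
    public static void main(String[] args) {
        BlobStoreContext ctx = ContextBuilder.newBuilder("transient")
                .buildView(BlobStoreContext.class);
        BlobStore blobStore = ctx.getBlobStore();
        blobStore.createContainerInLocation(null, "bucket");
        blobStore.putBlob("bucket",
                blobStore.blobBuilder("uuid-ledger-1").payload("data").build());
        // Delete both offload objects (data block and index block) in one call,
        // instead of two separate removeBlob calls.
        blobStore.removeBlobs("bucket",
                ImmutableList.of("uuid-ledger-1", "uuid-ledger-1-index"));
        System.out.println(blobStore.blobExists("bucket", "uuid-ledger-1"));
        ctx.close();
    }
}
```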

Contributor Author

Thanks, will change it back and give it a try.

import org.testng.annotations.AfterMethod;
import org.testng.annotations.BeforeMethod;

public class BlobStoreTestBase {
Contributor

S3TestBase and the mock can be deleted now, no?

Contributor

S3TestBase had a feature to allow the test to use real AWS if a system property was set. It would be good to have something similar here to validate that the code is ok.

Contributor Author

Thanks, I'd like to do it later. Opened issue #2165 to track this work.

@zhaijack
Contributor Author

@ivankelly Thanks for the comments, updated this PR.

try {
Blob blob = blobStore.getBlob(bucket, key);
versionCheck.check(key, blob);
PayloadSlicer slicer = new BasePayloadSlicer();
Contributor

Slicer isn't what you need here. getBlob can take a GetOptions parameter; you can set the range in that.
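The range-read approach the reviewer suggests can be sketched as follows (illustrative code against the in-memory "transient" provider, not the PR's implementation; jclouds byte ranges are inclusive, like HTTP Range headers):

```java
import java.io.InputStream;
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.domain.Blob;
import org.jclouds.blobstore.options.GetOptions;

public class RangeReadSketch {
    public static void main(String[] args) throws Exception {
        BlobStoreContext ctx = ContextBuilder.newBuilder("transient")
                .buildView(BlobStoreContext.class);
        BlobStore blobStore = ctx.getBlobStore();
        blobStore.createContainerInLocation(null, "bucket");
        blobStore.putBlob("bucket",
                blobStore.blobBuilder("key").payload("0123456789").build());

        // Fetch only bytes 2..5 (inclusive) from the store, rather than
        // downloading the full payload and slicing it client-side.
        Blob blob = blobStore.getBlob("bucket", "key",
                new GetOptions().range(2, 5));
        try (InputStream in = blob.getPayload().openStream()) {
            System.out.println(new String(in.readAllBytes()));
        }
        ctx.close();
    }
}
```

Letting the store serve the range avoids reading and discarding the rest of the object, which matters for large offloaded ledger blocks.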

@zhaijack
Contributor Author

retest this please

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class S3BackedReadHandleImpl implements ReadHandle {
private static final Logger log = LoggerFactory.getLogger(S3BackedReadHandleImpl.class);
public class BackedReadHandleImpl implements ReadHandle {
Contributor

This should be BlobStoreBackedReadHandleImpl. Basically, anywhere you removed S3 we should have BlobStore.

Contributor Author

Thanks, will change it, if you insist on this.

log.error("Exception when get credentials for s3 ", e);
}

String id = "accesskey";
Contributor

default these to ""

Contributor Author

Thanks. Will keep this default value; an empty string causes an error in jclouds.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class S3ManagedLedgerOffloader implements LedgerOffloader {
private static final Logger log = LoggerFactory.getLogger(S3ManagedLedgerOffloader.class);
public class ManagedLedgerOffloader implements LedgerOffloader {
Contributor

rename to BlobStoreManagedLedgerOffloader

@zhaijack
Contributor Author

zhaijack commented Jul 17, 2018

retest this please

for the C++/Python tests build error

@sijie
Member

sijie commented Jul 17, 2018

I know it is approved, but let's not merge this until I cut the 2.1 release.

@sijie sijie added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Jul 20, 2018
@sijie sijie merged commit 3c8d13c into apache:master Jul 20, 2018
sijie pushed a commit that referenced this pull request Jul 24, 2018
This is the third part to support Google Cloud Storage offload.
It aims to support GCS-related config in `ManagedLedgerOffloader`. It is based on PR #2065. Please only review the commits after [e5b1f7a](e5b1f7a).

Currently it passes a real test against GCS, via test case [ManagedLedgerOffloaderTest#testGcsRealOffload](eda5097#diff-6387a2ab4cf9c9243135b5a34c8522efR584). Since we lack a GCS mock docker image, we will try to use real GCS for the integration test in a later PR.

Master Issue: #2067
Labels
area/tieredstorage, type/enhancement