Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3AttachmentStore #3779

Merged
merged 14 commits into from
Jul 27, 2018
Merged

S3AttachmentStore #3779

merged 14 commits into from
Jul 27, 2018

Conversation

chetanmeh
Copy link
Member

@chetanmeh chetanmeh commented Jun 19, 2018

This PR introduces a S3AttachmentStore which is an AttachmentStore implementation for storing attachments in S3 API compatible object storages

It should be possible to use this with IBM Cloud Object Storage as it supports S3 API.

Description

S3AttachmentStore uses Alpakka S3 Connector for making calls to S3. The object key is constructed in following form

<namespace>/<docId>/<name>

Where

  • namespace - Entity type in lowercase. For now only whiskentity is used with attachments
    • whiskentity
    • whiskauth
    • whiskactivation
  • docId - Document entity path
  • name - attachment name. It would be uuid

It supports following operations

  1. readAttachment - Read is done in a streaming way using GET Object
  2. attach - Stream is uploaded via multipart upload API where upload is done in chunks of 5 MB and then finally stitched together. This avoids buffering the whole stream on disk. So any upload would invoke 2+ remote calls to complete upload
  3. deleteAttachments - This is done via List Objects v2 API to list object under prefix <namespace>/<docId> and then remove them via delete object api
  4. deleteAttachment - This is done via single delete object api

Usage

To enable use of S3AttachmentStore following env variables need to be set

  1. CONFIG_whisk_spi_AttachmentStoreProvider = whisk.core.database.s3.S3AttachmentStoreProvider
  2. CONFIG_whisk_s3_bucket - Bucket name
  3. AWS_ACCESS_KEY_ID - AWS Access Key ID. See Access Keys for details on key id and access key. Also see AWS Java SDK for details on how access keys are looked up
  4. AWS_SECRET_ACCESS_KEY
  5. AWS_REGION - . AWS Region name. See AWS Region Selection for details

See Alpakka S3 for more configuration details. In OpenWhisk setup the config are under whisk.s3.alpakka namespace (see s3-reference.conf for details)

Testing

All attachment related test are consolidated in S3AttachmentStoreBehaviorBase and then there are suites to run it against various options

  1. S3AttachmentStoreMinioTests - Performs testing against a Minio S3 Proxy. These test only need docker to be present.
  2. S3AttachmentStoreAwsTests - Performs testing against S3. It requires following env variables to be configured. These are configured in travis settingsnfor master repo and in my fork. So are executed there.
    a. AWS_ACCESS_KEY_ID
    b. AWS_SECRET_ACCESS_KEY
    c. AWS_REGION

We can also use S3Mock which is built on top of akka stack but then it may later pose problem if our dependency version diverge. So using Minio for now

Dependencies

S3 support pulls in following dependencies

$ ./gradlew :core:controller:dependencies 
     \--- com.lightbend.akka:akka-stream-alpakka-s3_2.11:0.19
          +--- org.scala-lang:scala-library:2.11.12
          +--- com.typesafe.akka:akka-stream_2.11:2.5.12 (*)
          +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
          +--- com.typesafe.akka:akka-http-xml_2.11:10.0.13
          |    +--- org.scala-lang:scala-library:2.11.11 -> 2.11.12
          |    +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
          |    +--- com.typesafe.akka:akka-stream_2.11:2.4.20 -> 2.5.12 (*)
          |    \--- org.scala-lang.modules:scala-xml_2.11:1.0.6 (*)
          \--- com.amazonaws:aws-java-sdk-core:1.11.295
               +--- software.amazon.ion:ion-java:1.0.2
               \--- joda-time:joda-time:2.8.1

Add s3 support increases the size of controller.tar from 88 MB (92 jars) to 91 MB (97 jars). Following are version change and new jars being pulled in

14a15
> akka-http-xml_2.11-10.0.13.jar
22a24
> akka-stream-alpakka-s3_2.11-0.19.jar
30a33
> aws-java-sdk-core-1.11.295.jar
40a44
> ion-java-1.0.2.jar
52a57
> joda-time-2.8.1.jar
84c89
< scala-xml_2.11-1.0.5.jar
---
> scala-xml_2.11-1.0.6.jar

Below was the original dependency list which was trimmed down to previous one based on the assumption alpakka s3 client uses akk http stack. So commons http and jackson can be excluded. All tests pass with these dependencies removed

     \--- com.lightbend.akka:akka-stream-alpakka-s3_2.11:0.19
          +--- org.scala-lang:scala-library:2.11.12
          +--- com.typesafe.akka:akka-stream_2.11:2.5.12 (*)
          +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
          +--- com.typesafe.akka:akka-http-xml_2.11:10.0.13
          |    +--- org.scala-lang:scala-library:2.11.11 -> 2.11.12
          |    +--- com.typesafe.akka:akka-http_2.11:10.0.13 -> 10.1.1 (*)
          |    +--- com.typesafe.akka:akka-stream_2.11:2.4.20 -> 2.5.12 (*)
          |    \--- org.scala-lang.modules:scala-xml_2.11:1.0.6 (*)
          \--- com.amazonaws:aws-java-sdk-core:1.11.295
               +--- commons-logging:commons-logging:1.1.3 -> 1.2
               +--- org.apache.httpcomponents:httpclient:4.5.5 (*)
               +--- software.amazon.ion:ion-java:1.0.2
               +--- com.fasterxml.jackson.core:jackson-databind:2.6.7.1 -> 2.7.7 (*)
               +--- com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:2.6.7
               |    \--- com.fasterxml.jackson.core:jackson-core:2.6.7 -> 2.7.7
               \--- joda-time:joda-time:2.8.1

TODO

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-io
Copy link

codecov-io commented Jun 19, 2018

Codecov Report

Merging #3779 into master will decrease coverage by 4.68%.
The diff coverage is 84.48%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #3779      +/-   ##
=========================================
- Coverage   75.78%   71.1%   -4.69%     
=========================================
  Files         145     146       +1     
  Lines        6897    6953      +56     
  Branches      418     418              
=========================================
- Hits         5227    4944     -283     
- Misses       1670    2009     +339
Impacted Files Coverage Δ
...n/scala/whisk/core/database/CouchDbRestStore.scala 73.23% <0%> (ø) ⬆️
.../scala/whisk/core/database/AttachmentSupport.scala 100% <100%> (ø) ⬆️
.../scala/src/main/scala/whisk/core/WhiskConfig.scala 92.06% <100%> (+0.06%) ⬆️
...ala/whisk/core/database/s3/S3AttachmentStore.scala 85.18% <85.18%> (ø)
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.08%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...la/whisk/core/database/cosmosdb/CosmosDBUtil.scala 92% <0%> (-4%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e96c1bb...5966452. Read the comment docs.

@chetanmeh chetanmeh added the wip label Jun 19, 2018
@chetanmeh
Copy link
Member Author

Would need some suggestion on where to document steps for configuring S3AttachmentStore

@chetanmeh
Copy link
Member Author

This PR is now ready for review

@chetanmeh
Copy link
Member Author

Needs a PG

@cbickel
Copy link
Contributor

cbickel commented Jun 28, 2018

PG of commit aef88cf is 🔵

@markusthoemmes markusthoemmes added the review Review for this PR has been requested and yet needs to be done. label Jul 23, 2018
failure => s"[ATT_DELETE] '$prefix' internal error, doc: '$docId', failure: '${failure.getMessage}'")
}

override def shutdown(): Unit = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing to shutdown?

Copy link
Member Author

@chetanmeh chetanmeh Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses Akka cached Http connection pool which is bound to ActorSystem lifecycle so need not be explicitly closed. It may be possible to close the pool but comments here indicates that it may pose problem so not attempting them

@rabbah rabbah merged commit 4fea23b into apache:master Jul 27, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
This PR introduces a S3AttachmentStore which is an AttachmentStore implementation for storing attachments in S3 API compatible object storages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifactstore review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants