Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat action code as attachments #3945

Merged
merged 18 commits into from
Aug 20, 2018
Merged

Conversation

chetanmeh
Copy link
Member

@chetanmeh chetanmeh commented Aug 6, 2018

Treats action code as attachment for all kinds in default runtime.

Supersedes #3286.

Description

This PR builds on #3286 and enables use of attachment mode for all kinds. Some key aspects are

  1. Binary code (like zip/jar) is stored in raw form (NOT Base64 encoded) in attachment
  2. Change is backward compatible in reading code
  3. Upon update or new action being created the code would be stored as attachment
  4. attachmentType property of runtime.json is now not used. Instead one of the below is used
    • text/plain(UTF-8) - For plain string which are not base64 encoded
    • application/octet-stream - For codes which are Base64 string. Such codes are stored in raw form in persistent store

Note that with support for attachment inlining (#3709) it may happen that small code is still stored as part of same document (but in attachment object format)

Forms of code

Code related to an action can be stored in following forms

String property

{
  "exec": {
    "kind": "java",
    "code": "ZHViZWU=",
    "binary": true,
    "main": "hello"
  }
}

For Blackbox action also the code is stored as string property

 {
 "exec": {
    "kind": "blackbox",
    "image": "docker-custom.com/openwhisk-runtime/magic/nodejs:0.0.1",
    "binary": false,
    "code": "\"use strict\";\n\nvar fs = require('fs');..."
  }
  }

Attachment

{
  "exec": {
    "kind": "java",
    "code": {
      "attachmentName": "couch:f41be90d-5656-47f5-9be9-0d5656f7f555",
      "attachmentType": "application/java-archive",
      "length": 32768,
      "digest": "sha256-5a373715b8cdcd608059debe9dae2ad95087eb5322321d1d0b862f8330a0e54d"
    },
    "binary": true,
    "main": "hello"
  }
}

Potential Performance Impact

Currently for non java kinds the code is stored as part of same document. With this change the code may be (if size > max-inline-size) stored as attachment. Due to this in case of action invocation if there is a cache miss then it would require 2 remote calls to load the whole action compared to 1 remote call currently.

Note that post #2855 the attachments are now also cached. So if more new containers need to be spinned then they would not incur any further performance impact till action is cached.

Benefits

With this change it would be possible to store larger codes for setups using store which place restriction on maximum document size.

Further it also improves the heap usage for normal cases where the code is not required to be loaded. See #2847

TODO

  • Review moveCodeToAttachment.py
  • Add more compatability tests
  • Review default inline size
  • Restore support for jar property name
  • Use application/octet-stream for binary code. Even for jars and text/plain for non binary code

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-io
Copy link

codecov-io commented Aug 7, 2018

Codecov Report

Merging #3945 into master will decrease coverage by 4.73%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3945      +/-   ##
==========================================
- Coverage   85.44%   80.71%   -4.74%     
==========================================
  Files         146      146              
  Lines        7057     7056       -1     
  Branches      413      422       +9     
==========================================
- Hits         6030     5695     -335     
- Misses       1027     1361     +334
Impacted Files Coverage Δ
...src/main/scala/whisk/core/entity/WhiskAction.scala 86.09% <100%> (ø) ⬆️
.../scala/src/main/scala/whisk/core/entity/Exec.scala 81.69% <100%> (-2.72%) ⬇️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...la/whisk/core/database/cosmosdb/CosmosDBUtil.scala 92% <0%> (-4%) ⬇️
...src/main/scala/whisk/core/entity/Attachments.scala 88.88% <0%> (+5.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b5d805...0982e1e. Read the comment docs.

}
}
val attachment: Attachment[String] = {
code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, an undefined code results in a DeserializationException? If that's the case, why even bother checking for the "binary" field above if code is None? The binary field also feels redundant, is it ever used?

As I read it, this could be simplified to:

val (binary, attachment) = code match {
  case Some(JsString(c)) => 
    (isBinaryCode(c), attFmt[String].read(c))
  case _ => 
    throw new DeserializationException(s"'code' must be a string defined in 'exec' for '$kind' actions")
}

Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @chetanmeh, I was indeed missing something:

code can be a JsObject, that breaks my simplification of course.

@chetanmeh chetanmeh added review Review for this PR has been requested and yet needs to be done. and removed wip labels Aug 7, 2018
@chetanmeh
Copy link
Member Author

This PR is ready for review. One aspect which is still to be addressed is should we raise the default max-inline-size (defaults 16 KB) to some higher value to offset any potential perf impact on small function.

val (bytes, attachmentType) = if (binary) {
(Base64.getDecoder().decode(code), ContentTypes.`application/octet-stream`)
} else {
(code.getBytes("UTF-8"), ContentTypes.`text/plain(UTF-8)`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace "UTF-8" with StandardCharsets.UTF_8

@@ -300,33 +300,28 @@ protected[core] object Exec extends ArgNormalizer[Exec] with DefaultJsonProtocol
case None => throw new DeserializationException(s"kind '$kind' not in $runtimes")
}

manifest.attached.map { _ =>
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad push, this is dead code (sorry I forgot to take that out of my patch)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah ... missed checking prior to push. Fixed it now

@chetanmeh chetanmeh requested a review from dubee August 9, 2018 07:58
Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, does this use the new database format for newly created actions? If so does this break any existing artifact stores?

@chetanmeh
Copy link
Member Author

@dubee new actions would be stored with code as object. That format was already being used by Java actions so should work fine.

{
"code": {
      "attachmentName": "couch:f41be90d-5656-47f5-9be9-0d5656f7f555",
      "attachmentType": "application/java-archive",
      "length": 32768,
      "digest": "sha256-5a373715b8cdcd608059debe9dae2ad95087eb5322321d1d0b862f8330a0e54d"
    }
}

Is that you were looking for or you meant something else?

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, you may want to alert the dev-list about this change as it is rather significant.

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG2 3498 ⏳

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, have you run performance tests against this to detect any cold start regressions that may arise from the double database fetch?

"attachmentName": "jarfile",
"attachmentType": "application/java-archive"
"attachmentName": "codefile",
"attachmentType": "text/plain"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won’t break pre-existing java actions I presume.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes attachmentName and attachmentType are not used in actual flow. ContentType is just stored but not interpreted so far for any purpose

} getOrElse {
.map { _ =>
// java actions once stored the attachment in "jar" instead of "code"
val code = obj.fields.get("code").orElse(obj.fields.get("jar")).getOrElse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove this code or jar (separately). It’s been a long time since I made this schema change.

s"finding attachment '[\\w-/:]+' of document 'id: ${action.namespace}/${action.name}").mkString("(?s).*")
val nodeAction = WhiskAction(namespace, aname(), jsDefault(nonInlinedCode(entityStore)), Parameters("x", "b"))
val swiftAction = WhiskAction(namespace, aname(), swift3(nonInlinedCode(entityStore)), Parameters("x", "b"))
val actions = Seq((javaAction, JAVA_DEFAULT), (nodeAction, NODEJS6), (swiftAction, SWIFT3))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there defaults for the other runtimes to use as well? Node 6 and Swift 3 might have short shelf life. Not that imprtant can fix later if there isn’t a default.

@@ -1063,6 +1073,73 @@ class ActionsApiTests extends ControllerTestCommon with WhiskActionsApi {
stream.reset()
}

it should "ensure old and new action schemas are supported" in {
implicit val tid = transid()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks.

| "code": {
| "attachmentName": "foo:bar",
| "attachmentType": "application/java-archive",
| "length": 32768,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this test.

@rabbah
Copy link
Member

rabbah commented Aug 14, 2018

lgtm. As for the performance impact, sure two fetches will be slower than one. The size of the inline chunk could be controlled to tune the performance hit (ie based on expected size of code for example, or set it to 50MB to recover existing behavior). Also whatever comparison in performance will likely not match a production environment and hence skewed. Lastly this is the direction we need to go toward for streaming in any case.

If env CODE_SIZE is set then code would be padded with space * CODE_SIZE
@chetanmeh
Copy link
Member Author

you may want to alert the dev-list about this change as it is rather significant.

I have sent a mail on dev sometime back about this change ... would that be ok?. I can also highlight this on wednesday call

have you run performance tests against this to detect any cold start regressions that may arise from the double database fetch?

@dubee I modified the ColdBlockingInvokeSimulation to support tweaking the code size. So one can run this like below which would result in using a 100kb code size and thus use of attachmentMode. Note that default max-inline-size is 16kb so we need a value higher than that default size

OPENWHISK_HOST="server" USERS="1" REQUESTS_PER_SEC="1" SECONDS="30" CODE_SIZE="100000" ./gradlew gatlingRun-ColdBlockingInvokeSimulation

I tried to run the tests but the numbers are not stable

Pre Change Post Change
image image

Numbers for

  • 1MB code
  • 30 sec run

Below is aggregated stats

image

As these are run on a local setup with all OpenWhisk component running on same host (client on another) they would not convey any real value. To get true sense of impact we would need to run same test on a more prod like setup where CouchDB is remote and also bit populated otherwise it would be streaming stuff from memory!

Unfortunately I do not have access to such a setup so would not be able to get such numbers.

@dubee
Copy link
Member

dubee commented Aug 14, 2018

@chetanmeh, I'll run some performances tests on this.

@rabbah
Copy link
Member

rabbah commented Aug 14, 2018

What are you hoping to show? The real factor is the size of the Inlined code. If you have to do a fetch for the attachment it’s self evident the cost is additive.

Having the size of the average code size and perhaps one std dev would be more informative imo.

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG5 588 ⏳

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, should we update runtimes.json for php to support attachments?

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG2 3498 🔵

@chetanmeh
Copy link
Member Author

chetanmeh commented Aug 15, 2018

@dubee Good point. Updated php runtime entry also to use attachment

@chetanmeh
Copy link
Member Author

@dubee Is there anything else to be addressed in this PR?

Would like to have this merged early next week as there are 3 more PR to be done after this to complete #3944

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran some performances tests on these changes. For testing, I lowered the attachment size threshold from 16K to 0KB, so that all action code would be cached. The existing performance tests that we have all passed.

Going to merge this one.

Thanks a lot for taking this over for me @chetanmeh, much appreciated!!

@dubee dubee merged commit 74ffb4d into apache:master Aug 20, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
* Treat action code as attachments for all newly created or update action runtimes

* Store base64 encoded streams in raw form

* Add test for exec deserialization compatibility

* Support code stored in jar property

* Use octet stream type for binary and text/plain otherwise

Ignore contentType info from manifest

* Reword exception message to state that code can be string on object

* Drop rewrite of jar files as base64 encoded file

* Remove unnecessary special handling of java code

* Fix expected attachmentType

* Add some more compat tests

* Simplify json deserialization based on Markus patch

* Use std charset

* Add support for tweaking code size via CODE_SIZE

If env CODE_SIZE is set then code would be padded with space * CODE_SIZE

* Use attachment for php runtime also
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
companion review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants