New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Make action container log reading more robust #2137

Closed

sven-lange-last wants to merge 3 commits into apache:master from sven-lange-last:container-logs

Member

sven-lange-last commented Apr 13, 2017

This PR fixes a small defect in action container log file reading in the invoker and addresses a premature file end when reading the log file.

cbickel requested review from markusthoemmes and rabbah

April 13, 2017 12:31

markusthoemmes reviewed

View reviewed changes

core/invoker/src/main/scala/whisk/core/container/ContainerUtils.scala Outdated

@@ @@ -154,7 +154,7 @@ trait ContainerUtils { @@
                    * Reads the contents of the file at the given position.
                    * It is assumed that the contents does exist and that region is not changing concurrently.
                    */
-                  def getDockerLogContent(containerHash: ContainerHash, start: Long, end: Long, mounted: Boolean)(implicit transid: TransactionId): Array[Byte] = {
+                  def getDockerLogContent(containerHash: ContainerHash, start: Long, end: Long, mounted: Boolean)(implicit transid: TransactionId): java.nio.ByteBuffer = {

Contributor

markusthoemmes Apr 18, 2017

import java.nio.ByteBuffer and remove the qualifications please.

Member Author

sven-lange-last Apr 18, 2017

Changed as advised.

markusthoemmes reviewed

View reviewed changes

core/invoker/src/main/scala/whisk/core/container/ContainerUtils.scala

+                              } else {
+                                  // read < 0 means premature end of file => terminate loop
+                                  remain = 0
+                              }

Contributor

markusthoemmes Apr 18, 2017

Per personal discussion: We'll need a test that checks for this condition (and shows "wrong" behavior before the change)

Member

rabbah Apr 18, 2017

I posted and deleted two comments this morning as I try to remember what my concern was previously when I reviewed this PR. I can neither find that comment or remember exactly what it was but there are two high level bits. One, same as Markus' - test for the corner case? Second, looking back at the code, read() should not return -1 since we are reading a pre-determined number of bytes pool.getDockerLogContent(con.containerId, con.lastLogSize, size, runningInContainer).

Member Author

sven-lange-last Apr 18, 2017

We are only talking about edge cases here.

Current line 168 is wrong and would set remainto zero whenever channel.read() was able to read something - in other words: almost always. This bug actually delegates all problems with reading to the caller of getDockerLogContent() eating up the retry count.

Two different cases are possible:

It could happen that channel.read() is not able to fill the whole buffer in the first attempt, i.e. 0 < read < remain. The bug in line 168 causes the loop to terminate. The calling function cannot find the sentinel marker, will drop the read buffer and will retry - invoking getDockerLogContent() again. This condition may arise if the log file is large and a lot of I/O is going on.
It could happen that Docker truncates the log files after the its size has been determined by the caller of getDockerLogContent(). In case of full truncation, the loop is repeated until channel.read() returns a positive value - potentially blocking forever.

Both cases are expected to rarely happen in the wild - but if they happen, they can degrade performance considerably. And the fix is easy.

My recent commit 1e1494b847f4e0548d66db9d2108b3b514b5e082 adds unit tests.

Member

rabbah Apr 18, 2017

read - read 😳Thanks for pointing that out.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala Outdated

+                      val logText = ""
+                      val size = writeLogFile(fixture, logText)
+                      val buffer = fixture.cu.getDockerLogContent(containerHash = ContainerHash.fromString("0123"), start = 0, end = size, mounted = false)

Contributor

markusthoemmes Apr 18, 2017

externalize the default ContainerHash. We don't care about it anyways.

Member Author

sven-lange-last Apr 19, 2017

Makes sense.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala Outdated

+                      val logContent = new String(buffer.array, buffer.arrayOffset, buffer.position, StandardCharsets.UTF_8)
+                      logContent shouldBe logText
+                      stream.size() shouldBe 0

Contributor

markusthoemmes Apr 18, 2017

stream should have size 0

Member Author

sven-lange-last Apr 19, 2017

Good proposal - thanks. Changed.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala Outdated

+                      val qlen = logText.length / 4 // length of a quarter
+                      val from = qlen // start reading from second quarter
+                      val to = from + 2 * qlen // read second and third quarter
+                      val expectedText = logText.substring(from, to)

Contributor

markusthoemmes Apr 18, 2017

Why so complex? Why not just:

val logText = "hello scala, this is quite a nice thing we got running here, eh?"
val from  = 4
val to = 10

...

logContent shouldBe logText.substring(from, to)

Member Author

sven-lange-last Apr 19, 2017

Done

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala

+                          writer.close()
+                          file.delete()
+                      }
+                  }

Contributor

markusthoemmes Apr 18, 2017

This looks a bit overcomplicated + magic happens (methodnames need to match). Why not take a simpler approach like:

def createFile(content: String): File = //create your file and write the string to it

and then for each test you just initialize the class under test with the respective file:

it should .... in {
    val file = createFile("text")
    val utils = new TestContainerUtils(file)
}

I'd say that's more readable and involves far less magic. thoughts?

Member Author

sven-lange-last Apr 19, 2017

I was using fixtures because they provide a central place for setting and - particularly - cleaning up. The fixture makes sure that temporary files are closed and removed after each test.

All this is also possible with the approach you suggest above - but requires a try ... finallybracket around each single test.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala

+                  }
+              }
+              class ContainerUtilsTester(var logFile: File)(implicit val logging: Logging) extends ContainerUtils {

Contributor

markusthoemmes Apr 18, 2017 •

edited

Loading

If you apply the changes suggested above, this can become a val or even easier: a private member (omit the keyword alltogether)

Member Author

sven-lange-last Apr 19, 2017

I understand... When keeping the fixture approach, test provide an empty result on failure will be invoked with a fixture that isn't really used...

At the same time, said test overrides the logFile to a file that doesn't exist to provoke a failure.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala Outdated

+                  }
+                  it should "tolerate premature end of log file" in { fixture =>
+                      val logText = (1 to 2).map(i => s"Line ${i}\n").mkString("")

Contributor

markusthoemmes Apr 18, 2017

If you want to write lines, why not do

(1 to 2).map(i => s"Line ${i}").mkString("\n")

Member Author

sven-lange-last Apr 19, 2017

Your proposal does only add \n between the strings included in the range. In other words: with your proposal, the last line won't be followed by a \n.

Member

rabbah Apr 19, 2017

There's a mkstring variant with separators at start, middle and end.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala Outdated

+                  with BeforeAndAfter
+                  with Matchers
+                  with StreamLogging
+                  with TimeLimitedTests {

Contributor

markusthoemmes Apr 18, 2017

Why is this needed?

Member Author

sven-lange-last Apr 19, 2017

Time limitation of tests is not strictly needed at the moment. The tests perform real I/O instead of mocked I/O so that a time limitation may prevent too long runtimes when running this test suite as part of a full build.

In addition, my initial assumption was to prepare unit tests for the original code. With the bug fixed by this PR, at least one unit test would have looped such that a time limitation had been necessary to prevent excessive test runtime.

Member Author

sven-lange-last Apr 19, 2017

Just discussed with @markusthoemmes - removing the time limit because it isn't required at the moment.

markusthoemmes reviewed

View reviewed changes

tests/src/test/scala/whisk/core/container/test/ContainerUtilsTests.scala Outdated

+                  with StreamLogging
+                  with TimeLimitedTests {
+                  val timeLimit = 100 millis

Contributor

markusthoemmes Apr 18, 2017

Unused value.

Member Author

sven-lange-last Apr 19, 2017

See http://doc.scalatest.org/1.8/org/scalatest/concurrent/TimeLimitedTests.html.

timeLimit overrides a def from trait org.scalatest.concurrent.TimeLimitedTests activating the specified time limit for each single test.

Member Author

sven-lange-last Apr 19, 2017

I added a comment to this line to point this out.

sven-lange-last force-pushed the container-logs branch from 1e1494b to 94d4538 Compare

April 19, 2017 11:47

rabbah approved these changes

View reviewed changes

Member Author

sven-lange-last commented Apr 25, 2017 •

edited

Loading

PG4 - 232 running

sven-lange-last force-pushed the container-logs branch from 94d4538 to 12ae65a Compare

May 9, 2017 11:17

rabbah assigned markusthoemmes

rabbah added bug invoker review labels

sven-lange-last added 3 commits

July 17, 2017 10:54


          Make action container log reading more robust

6fe63e9


          Add unit tests for getDockerLogContent

d050ef7


          Address review comments

74a1e81

rabbah force-pushed the container-logs branch from 12ae65a to 74a1e81 Compare

July 17, 2017 14:54

Member

rabbah commented Jul 17, 2017

@markusthoemmes rebased this to master; pg4/552 running since I couldn't verify the above noted pg.

Member

rabbah commented Jul 17, 2017

pg 4/552 🔵

andrewnet approved these changes

View reviewed changes

andrewnet commented Jul 31, 2017 •

edited

Loading

Hi, can anyone merge this fix into master? I am still facing the problem and would like to give it a try once this is pulled in. I would have done it if I have write access.

Member

rabbah commented Jul 31, 2017

We've actually switched to a new implementation of the invoker which doesn't have this bug. Are you seeing the issue still in bluemix?

Member

rabbah commented Jul 31, 2017

This is the code that's now active https://github.com/apache/incubator-openwhisk/blob/master/core/invoker/src/main/scala/whisk/core/containerpool/docker/DockerContainer.scala. The code affected by this pr is effectively deadcode we plan on removing.

andrewnet commented Jul 31, 2017 •

edited

Loading

Yes, I am still seeing logs got truncated in bluemix. The action was completed successfully (I can see the expected response in activation record), but the log was not completed (the final log entry didn't show up)

andrewnet commented Jul 31, 2017

I have opened a defect #2548 to report the issue separately since this defect is out dated as @rabbah.

Contributor

markusthoemmes commented Aug 17, 2017

@sven-lange-last want to port this to the new folderstructure?

Member

rabbah commented Sep 7, 2017

@sven-lange-last @markusthoemmes is this still relevant or now out of date and should be closed?

Contributor

markusthoemmes commented Sep 11, 2017

I think we can safely close this, the code is gone anyway.

markusthoemmes closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment