Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make action container log reading more robust #2137

Closed
wants to merge 3 commits into from

Conversation

sven-lange-last
Copy link
Member

This PR fixes a small defect in action container log file reading in the invoker and addresses a premature file end when reading the log file.

@@ -154,7 +154,7 @@ trait ContainerUtils {
* Reads the contents of the file at the given position.
* It is assumed that the contents does exist and that region is not changing concurrently.
*/
def getDockerLogContent(containerHash: ContainerHash, start: Long, end: Long, mounted: Boolean)(implicit transid: TransactionId): Array[Byte] = {
def getDockerLogContent(containerHash: ContainerHash, start: Long, end: Long, mounted: Boolean)(implicit transid: TransactionId): java.nio.ByteBuffer = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import java.nio.ByteBuffer and remove the qualifications please.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as advised.

} else {
// read < 0 means premature end of file => terminate loop
remain = 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per personal discussion: We'll need a test that checks for this condition (and shows "wrong" behavior before the change)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted and deleted two comments this morning as I try to remember what my concern was previously when I reviewed this PR. I can neither find that comment or remember exactly what it was but there are two high level bits. One, same as Markus' - test for the corner case? Second, looking back at the code, read() should not return -1 since we are reading a pre-determined number of bytes pool.getDockerLogContent(con.containerId, con.lastLogSize, size, runningInContainer).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only talking about edge cases here.

Current line 168 is wrong and would set remainto zero whenever channel.read() was able to read something - in other words: almost always. This bug actually delegates all problems with reading to the caller of getDockerLogContent() eating up the retry count.

Two different cases are possible:

  1. It could happen that channel.read() is not able to fill the whole buffer in the first attempt, i.e. 0 < read < remain. The bug in line 168 causes the loop to terminate. The calling function cannot find the sentinel marker, will drop the read buffer and will retry - invoking getDockerLogContent() again. This condition may arise if the log file is large and a lot of I/O is going on.

  2. It could happen that Docker truncates the log files after the its size has been determined by the caller of getDockerLogContent(). In case of full truncation, the loop is repeated until channel.read() returns a positive value - potentially blocking forever.

Both cases are expected to rarely happen in the wild - but if they happen, they can degrade performance considerably. And the fix is easy.

My recent commit 1e1494b847f4e0548d66db9d2108b3b514b5e082 adds unit tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read - read 😳Thanks for pointing that out.

val logText = ""
val size = writeLogFile(fixture, logText)

val buffer = fixture.cu.getDockerLogContent(containerHash = ContainerHash.fromString("0123"), start = 0, end = size, mounted = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

externalize the default ContainerHash. We don't care about it anyways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

val logContent = new String(buffer.array, buffer.arrayOffset, buffer.position, StandardCharsets.UTF_8)

logContent shouldBe logText
stream.size() shouldBe 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stream should have size 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good proposal - thanks. Changed.

val qlen = logText.length / 4 // length of a quarter
val from = qlen // start reading from second quarter
val to = from + 2 * qlen // read second and third quarter
val expectedText = logText.substring(from, to)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so complex? Why not just:

val logText = "hello scala, this is quite a nice thing we got running here, eh?"
val from  = 4
val to = 10

...

logContent shouldBe logText.substring(from, to)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

writer.close()
file.delete()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit overcomplicated + magic happens (methodnames need to match). Why not take a simpler approach like:

def createFile(content: String): File = //create your file and write the string to it

and then for each test you just initialize the class under test with the respective file:

it should .... in {
    val file = createFile("text")
    val utils = new TestContainerUtils(file)
}

I'd say that's more readable and involves far less magic. thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using fixtures because they provide a central place for setting and - particularly - cleaning up. The fixture makes sure that temporary files are closed and removed after each test.

All this is also possible with the approach you suggest above - but requires a try ... finallybracket around each single test.

}
}

class ContainerUtilsTester(var logFile: File)(implicit val logging: Logging) extends ContainerUtils {
Copy link
Contributor

@markusthoemmes markusthoemmes Apr 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you apply the changes suggested above, this can become a val or even easier: a private member (omit the keyword alltogether)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand... When keeping the fixture approach, test provide an empty result on failure will be invoked with a fixture that isn't really used...

At the same time, said test overrides the logFile to a file that doesn't exist to provoke a failure.

}

it should "tolerate premature end of log file" in { fixture =>
val logText = (1 to 2).map(i => s"Line ${i}\n").mkString("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to write lines, why not do

(1 to 2).map(i => s"Line ${i}").mkString("\n")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal does only add \n between the strings included in the range. In other words: with your proposal, the last line won't be followed by a \n.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a mkstring variant with separators at start, middle and end.

with BeforeAndAfter
with Matchers
with StreamLogging
with TimeLimitedTests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time limitation of tests is not strictly needed at the moment. The tests perform real I/O instead of mocked I/O so that a time limitation may prevent too long runtimes when running this test suite as part of a full build.

In addition, my initial assumption was to prepare unit tests for the original code. With the bug fixed by this PR, at least one unit test would have looped such that a time limitation had been necessary to prevent excessive test runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just discussed with @markusthoemmes - removing the time limit because it isn't required at the moment.

with StreamLogging
with TimeLimitedTests {

val timeLimit = 100 millis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See http://doc.scalatest.org/1.8/org/scalatest/concurrent/TimeLimitedTests.html.

timeLimit overrides a def from trait org.scalatest.concurrent.TimeLimitedTests activating the specified time limit for each single test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment to this line to point this out.

@sven-lange-last
Copy link
Member Author

sven-lange-last commented Apr 25, 2017

PG4 - 232 running

@rabbah rabbah added bug invoker review Review for this PR has been requested and yet needs to be done. labels Jul 17, 2017
@rabbah
Copy link
Member

rabbah commented Jul 17, 2017

@markusthoemmes rebased this to master; pg4/552 running since I couldn't verify the above noted pg.

@rabbah
Copy link
Member

rabbah commented Jul 17, 2017

pg 4/552 🔵

@andrewnet
Copy link

andrewnet commented Jul 31, 2017

Hi, can anyone merge this fix into master? I am still facing the problem and would like to give it a try once this is pulled in. I would have done it if I have write access.

@rabbah
Copy link
Member

rabbah commented Jul 31, 2017

We've actually switched to a new implementation of the invoker which doesn't have this bug. Are you seeing the issue still in bluemix?

@rabbah
Copy link
Member

rabbah commented Jul 31, 2017

This is the code that's now active https://github.com/apache/incubator-openwhisk/blob/master/core/invoker/src/main/scala/whisk/core/containerpool/docker/DockerContainer.scala. The code affected by this pr is effectively deadcode we plan on removing.

@andrewnet
Copy link

andrewnet commented Jul 31, 2017

Yes, I am still seeing logs got truncated in bluemix. The action was completed successfully (I can see the expected response in activation record), but the log was not completed (the final log entry didn't show up)

@andrewnet
Copy link

I have opened a defect #2548 to report the issue separately since this defect is out dated as @rabbah.

@markusthoemmes
Copy link
Contributor

@sven-lange-last want to port this to the new folderstructure?

@rabbah
Copy link
Member

rabbah commented Sep 7, 2017

@sven-lange-last @markusthoemmes is this still relevant or now out of date and should be closed?

@markusthoemmes
Copy link
Contributor

I think we can safely close this, the code is gone anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug invoker review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants