-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make action container log reading more robust #2137
Conversation
@@ -154,7 +154,7 @@ trait ContainerUtils { | |||
* Reads the contents of the file at the given position. | |||
* It is assumed that the contents does exist and that region is not changing concurrently. | |||
*/ | |||
def getDockerLogContent(containerHash: ContainerHash, start: Long, end: Long, mounted: Boolean)(implicit transid: TransactionId): Array[Byte] = { | |||
def getDockerLogContent(containerHash: ContainerHash, start: Long, end: Long, mounted: Boolean)(implicit transid: TransactionId): java.nio.ByteBuffer = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import java.nio.ByteBuffer
and remove the qualifications please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as advised.
} else { | ||
// read < 0 means premature end of file => terminate loop | ||
remain = 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per personal discussion: We'll need a test that checks for this condition (and shows "wrong" behavior before the change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted and deleted two comments this morning as I try to remember what my concern was previously when I reviewed this PR. I can neither find that comment or remember exactly what it was but there are two high level bits. One, same as Markus' - test for the corner case? Second, looking back at the code, read()
should not return -1 since we are reading a pre-determined number of bytes pool.getDockerLogContent(con.containerId, con.lastLogSize, size, runningInContainer)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are only talking about edge cases here.
Current line 168 is wrong and would set remain
to zero whenever channel.read()
was able to read something - in other words: almost always. This bug actually delegates all problems with reading to the caller of getDockerLogContent()
eating up the retry count.
Two different cases are possible:
-
It could happen that
channel.read()
is not able to fill the whole buffer in the first attempt, i.e.0 < read < remain
. The bug in line 168 causes the loop to terminate. The calling function cannot find the sentinel marker, will drop the read buffer and will retry - invokinggetDockerLogContent()
again. This condition may arise if the log file is large and a lot of I/O is going on. -
It could happen that Docker truncates the log files after the its size has been determined by the caller of
getDockerLogContent()
. In case of full truncation, the loop is repeated untilchannel.read()
returns a positive value - potentially blocking forever.
Both cases are expected to rarely happen in the wild - but if they happen, they can degrade performance considerably. And the fix is easy.
My recent commit 1e1494b847f4e0548d66db9d2108b3b514b5e082
adds unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read - read 😳Thanks for pointing that out.
val logText = "" | ||
val size = writeLogFile(fixture, logText) | ||
|
||
val buffer = fixture.cu.getDockerLogContent(containerHash = ContainerHash.fromString("0123"), start = 0, end = size, mounted = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
externalize the default ContainerHash. We don't care about it anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
val logContent = new String(buffer.array, buffer.arrayOffset, buffer.position, StandardCharsets.UTF_8) | ||
|
||
logContent shouldBe logText | ||
stream.size() shouldBe 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stream should have size 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good proposal - thanks. Changed.
val qlen = logText.length / 4 // length of a quarter | ||
val from = qlen // start reading from second quarter | ||
val to = from + 2 * qlen // read second and third quarter | ||
val expectedText = logText.substring(from, to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why so complex? Why not just:
val logText = "hello scala, this is quite a nice thing we got running here, eh?"
val from = 4
val to = 10
...
logContent shouldBe logText.substring(from, to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
writer.close() | ||
file.delete() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit overcomplicated + magic happens (methodnames need to match). Why not take a simpler approach like:
def createFile(content: String): File = //create your file and write the string to it
and then for each test you just initialize the class under test with the respective file:
it should .... in {
val file = createFile("text")
val utils = new TestContainerUtils(file)
}
I'd say that's more readable and involves far less magic. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using fixtures because they provide a central place for setting and - particularly - cleaning up. The fixture makes sure that temporary files are closed and removed after each test.
All this is also possible with the approach you suggest above - but requires a try ... finally
bracket around each single test.
} | ||
} | ||
|
||
class ContainerUtilsTester(var logFile: File)(implicit val logging: Logging) extends ContainerUtils { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you apply the changes suggested above, this can become a val
or even easier: a private member (omit the keyword alltogether)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand... When keeping the fixture approach, test provide an empty result on failure
will be invoked with a fixture that isn't really used...
At the same time, said test overrides the logFile
to a file that doesn't exist to provoke a failure.
} | ||
|
||
it should "tolerate premature end of log file" in { fixture => | ||
val logText = (1 to 2).map(i => s"Line ${i}\n").mkString("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to write lines, why not do
(1 to 2).map(i => s"Line ${i}").mkString("\n")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your proposal does only add \n
between the strings included in the range. In other words: with your proposal, the last line won't be followed by a \n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a mkstring variant with separators at start, middle and end.
with BeforeAndAfter | ||
with Matchers | ||
with StreamLogging | ||
with TimeLimitedTests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time limitation of tests is not strictly needed at the moment. The tests perform real I/O instead of mocked I/O so that a time limitation may prevent too long runtimes when running this test suite as part of a full build.
In addition, my initial assumption was to prepare unit tests for the original code. With the bug fixed by this PR, at least one unit test would have looped such that a time limitation had been necessary to prevent excessive test runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just discussed with @markusthoemmes - removing the time limit because it isn't required at the moment.
with StreamLogging | ||
with TimeLimitedTests { | ||
|
||
val timeLimit = 100 millis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See http://doc.scalatest.org/1.8/org/scalatest/concurrent/TimeLimitedTests.html.
timeLimit
overrides a def
from trait org.scalatest.concurrent.TimeLimitedTests
activating the specified time limit for each single test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment to this line to point this out.
1e1494b
to
94d4538
Compare
PG4 - 232 running |
94d4538
to
12ae65a
Compare
@markusthoemmes rebased this to master; pg4/552 running since I couldn't verify the above noted pg. |
pg 4/552 🔵 |
Hi, can anyone merge this fix into master? I am still facing the problem and would like to give it a try once this is pulled in. I would have done it if I have write access. |
We've actually switched to a new implementation of the invoker which doesn't have this bug. Are you seeing the issue still in bluemix? |
This is the code that's now active https://github.com/apache/incubator-openwhisk/blob/master/core/invoker/src/main/scala/whisk/core/containerpool/docker/DockerContainer.scala. The code affected by this pr is effectively deadcode we plan on removing. |
Yes, I am still seeing logs got truncated in bluemix. The action was completed successfully (I can see the expected response in activation record), but the log was not completed (the final log entry didn't show up) |
@sven-lange-last want to port this to the new folderstructure? |
@sven-lange-last @markusthoemmes is this still relevant or now out of date and should be closed? |
I think we can safely close this, the code is gone anyway. |
This PR fixes a small defect in action container log file reading in the invoker and addresses a premature file end when reading the log file.