Use a pre-installed Minikube instance -- porting over logic from PR 521 #14

Merged (14 commits, Jan 12, 2018)

Conversation

ifilonenko (Member):

Porting logic from: apache-spark-on-k8s/spark#521

@ssuchter (Member) left a comment:

I think it looks good. I have some comments; take a look and fix them if you think they should be fixed.

Utils.tryWithSafeFinally {
val buf = new Array[Byte](1024)
var len = in.read(buf)
while (len != -1) {
Member:

This input-reading loop is typically written as an infinite loop with a break inside, so there aren't two read calls in the code.

Not that it's too important to fix, but have you considered an approach like this so it can use breaks?

https://alvinalexander.com/scala/break-continue-for-while-loops-in-scala-examples-how-to
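For illustration, the suggested restructuring might look like the following sketch (the helper name `copyStream` is hypothetical; `scala.util.control.Breaks` supplies the `breakable`/`break` pair the linked article describes):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

import scala.util.control.Breaks.{break, breakable}

// Hypothetical sketch: a single read call inside an "infinite" loop,
// exited via break when the stream is exhausted.
def copyStream(in: InputStream, out: OutputStream): Unit = {
  val buf = new Array[Byte](1024)
  breakable {
    while (true) {
      val len = in.read(buf)
      if (len == -1) break()
      out.write(buf, 0, len)
    }
  }
}

val sink = new ByteArrayOutputStream()
copyStream(new ByteArrayInputStream("hello".getBytes("UTF-8")), sink)
```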

Member Author:

Ditto to above ^^

scala.util.control.Exception.ignoring(classOf[IOException]) {
// FIXME: We copy the stream on the level of bytes to avoid encoding problems.
Utils.tryWithSafeFinally {
val buf = new Array[Byte](1024)
Member:

Should this line 99-105 go into its own subroutine?

Member:

As an alternative, do we have Guava available? Could we just use ByteStreams.copy() instead of the entire body here?
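If Guava is on the classpath, the whole loop does collapse to a single call; a minimal sketch, assuming Guava is available as a dependency:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

import com.google.common.io.ByteStreams

val in = new ByteArrayInputStream("some bytes".getBytes("UTF-8"))
val out = new ByteArrayOutputStream()

// ByteStreams.copy reads until EOF and returns the number of bytes copied.
// It does not close either stream, so a tryWithSafeFinally-style wrapper
// around the streams would still be useful.
val copied = ByteStreams.copy(in, out)
```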

Member Author:

These Utils are taken from Spark core... should I modify them?

Member:

I'd argue that the cleanest modification (using Guava) is something we should do. There's no reason to replicate this code from the Spark core.

Contributor:

I don't see this class being used anywhere anymore actually, so we can remove this.

def buildSparkDockerImages(): Unit = {
Eventually.eventually(TIMEOUT, INTERVAL) { dockerClient.ping() }
// Building Python distribution environment
val pythonExec = sys.env.get("PYSPARK_DRIVER_PYTHON")
Member:

Maybe we should log which Python gets selected? I can imagine debugging sessions where it's hard to tell what is actually running because of this automatic switching behavior.

.orElse(sys.env.get("PYSPARK_PYTHON"))
.getOrElse("/usr/bin/python")
val builder = new ProcessBuilder(
Seq(pythonExec, "setup.py", "sdist").asJava)
Member:

Should the path to this setup.py be specified more carefully? It seems like the code above determines the dockerfiles' location more precisely, but maybe I'm missing something there. This just resolves relative to the current directory.
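Both points raised here (logging the selected interpreter, and not relying on the current directory) could be addressed with something like the following sketch; `resolvePython` is a hypothetical helper that mirrors the fallback chain in the code above:

```scala
// Hypothetical helper mirroring the PYSPARK_DRIVER_PYTHON -> PYSPARK_PYTHON
// -> /usr/bin/python fallback chain, factored out so the choice can be
// logged and unit-tested.
def resolvePython(env: Map[String, String]): String =
  env.get("PYSPARK_DRIVER_PYTHON")
    .orElse(env.get("PYSPARK_PYTHON"))
    .getOrElse("/usr/bin/python")

val pythonExec = resolvePython(sys.env)
println(s"Building Python distribution with interpreter: $pythonExec")

// For the setup.py concern, ProcessBuilder#directory can pin the working
// directory explicitly instead of inheriting the caller's CWD.
```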

@ifilonenko (Member Author) commented Dec 30, 2017:

The problem I am having is this:

com.spotify.docker.client.exceptions.DockerException: 
ProgressMessage{id=null, status=null, stream=null, 
error=lstat dockerfiles/spark-base/entrypoint.sh: no such file or directory, 
progress=null, progressDetail=null}

I have no idea where this is stemming from, as the same logic is used in the master branch at the moment.

@ifilonenko (Member Author):

I noticed you guys modified the target Dockerfile to have this line instead:
COPY kubernetes/dockerfiles/spark-base/entrypoint.sh /opt/
which distribution are you pointing to?

@ifilonenko (Member Author):

Above problems resolved when pulling from spark/master.

@kimoonkim (Member) left a comment:

First batch of comments. PTAL.

README.md Outdated
Running the integration tests requires a Spark distribution package tarball that
contains Spark jars, submission clients, etc. You can download a tarball from
http://spark.apache.org/downloads.html. Or, you can create a distribution from
source code using `make-distribution.sh`. For example:

```
$ git clone git@github.com:apache/spark.git
$ https://github.com/apache/spark.git
Member:

What does this line mean only with a URL?

README.md Outdated
$ mvn clean integration-test \
-Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
-DextraScalaTestArgs=-Dspark.docker.test.persistMinikube=true
-DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://<master>
Member:

Is the master parameter now required? If so, should we explain how people can get it from Minikube?

Contributor:

kubectl cluster-info is already discussed above.
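For readers of this thread: assuming a running Minikube, the master URL can be read off as follows (exact addresses will differ per machine, and 8443 is only Minikube's usual default API server port):

```shell
# Prints the cluster master URL, e.g. https://192.168.99.100:8443
kubectl cluster-info

# Or build the test property value from the VM address directly:
echo "k8s://https://$(minikube ip):8443"
```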

Contributor:

Master parameter shouldn't be required.

Contributor:

...But this section is under the "running against an arbitrary cluster" section, so in this context the parameter would be required.

@@ -143,6 +152,10 @@ private[spark] class KubernetesSuite extends FunSuite with BeforeAndAfterAll wit
}
}
}
private def doMinikubeCheck(): Unit = {
assume(testBackend == MinikubeTestBackend)
Member:

Hmm. I remember @foxish just deleted this line recently so integration tests can run against GCE. Can you check with @foxish?

@mccheah (Contributor), Jan 8, 2018:

(Comment was for the wrong section)

private val MINIKUBE_STARTUP_TIMEOUT_SECONDS = 60

// NOTE: This and the following methods are synchronized to prevent deleteMinikube from
Member:

We are deleting this note. Maybe we don't need "synchronized" any more. Kill "synchronized" below?

@@ -37,5 +37,7 @@ private[spark] class GCETestBackend(val master: String) extends IntegrationTestB
defaultClient
}

override def name(): String = GCE_TEST_BACKEND
override def dockerImageTag(): String = {
return System.getProperty(KUBERNETES_TEST_DOCKER_TAG_SYSTEM_PROPERTY, "latest")
Member:

Why not generate a random ID like minikube backend code does? i.e. UUID.randomUUID().toString.replaceAll("-", "")

Contributor:

In the Minikube case we're building these images from scratch. In the GCE case, we don't create a Docker manager and hence are not building the images there. But this in itself seems to contradict this section of our readme:

If you're using a non-local cluster, you must provide an image repository which you have write access to, using the -i option, in order to store docker images generated during the test.

which indicates that GCE-backed tests should be building images as well. Is this correct @foxish?

Member:

That readme section is meant to highlight that we push the images to an image repository only in the cloud testing case, and don't have to in the minikube case since the images are built in the minikube VM's docker environment. That documentation pertains only to the use of the script, which avoids using maven for building images.

Contributor:

The problem then with using a random ID tag here is that it's impossible for this tag to actually match anything. Using "latest" at least guarantees that we pick up some image in the default case.

We can be more strict here and require the tag be explicitly specified.

Contributor:

Looking a little closer I think the miscommunication is because the docker image manager isn't serving the image tag but is instead being handed the tag by the test backend. The responsibilities thus aren't clear and the coupling of the provision of a custom tag vs. a generated tag, and how that influences whether or not images are built or deleted, is unclear.

I'm moving the generation of the tag vs. using the user-provided one into the docker manager. This should hopefully clarify the connection.

import org.apache.spark.deploy.k8s.integrationtest.Logging
import org.apache.spark.deploy.k8s.integrationtest.Utils.{RedirectThread, tryWithResource}

private[spark] class KubernetesSuiteDockerManager(
Member:

Can you make git associate SparkDockerImageBuilder.scala with this file so we can see the change side by side and also keep the history? Maybe find how-to at https://stackoverflow.com/questions/433111/how-to-make-git-mark-a-deleted-and-a-new-file-as-a-file-move
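The tricks in that post boil down to git's rename detection, which is similarity-based rather than recorded. A self-contained demo in a throwaway repo (file names here are illustrative):

```shell
# Git detects renames by content similarity at diff time rather than
# recording them. Demo: --follow walks history across a rename.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git config user.email demo@example.com && git config user.name demo
echo 'class Old' > SparkDockerImageBuilder.scala
git add . && git commit -qm 'original file'
git mv SparkDockerImageBuilder.scala KubernetesSuiteDockerManager.scala
git commit -qm 'rename'

# Shows both commits, following the file across the rename:
git log --follow --oneline -- KubernetesSuiteDockerManager.scala

# Lowering the similarity threshold helps when the rename includes big edits:
git diff -M50% HEAD~1 HEAD --stat
```

When the rename commit also rewrites most of the file, similarity drops below the threshold and git falls back to reporting a delete plus an add, which matches the behavior described below.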

Contributor:

I tried most of the suggestions in that stack overflow post to no avail, unfortunately, as the diff against master will still treat the move as a remove and add operation. We can make the modification a separate commit from the rename, and then even if github UI doesn't show the diff in the main PR view, one can navigate to the specific commit and view the diff from there. Is that a reasonable compromise?

Contributor:

@kimoonkim I rewrote some of the git history and 7470472 is the specific diff for this file. I suppose git is always treating it as removing the whole thing and adding it as a new file because of the large number of changes that were actually made here.

Member:

Ok. Sounds good enough. We can take a look at the commit. Thanks!

@kimoonkim (Member):

rerun integration tests please

@kimoonkim (Member):

I have two high level design questions. Perhaps @ifilonenko @mccheah, @ssuchter or @foxish know the answers.

  1. Are we targeting to run multiple integration tests simultaneously using a single shared minikube instance? Then what is the mechanism to cap the max number of tests so the minikube doesn't blow up?

  2. When we share a single minikube instance, it's important for each test to clean up when it finishes. Like delete all images, pods, secrets, etc. What is our story against poorly written tests that failed to clean up? How do we ensure the leaked resources over time do not make all other tests fail later?

@foxish (Member) commented Jan 5, 2018 via email.

@kimoonkim (Member):

I like the namespace story. I don't think dockerd does image GC (I could be wrong here); we'd need to follow up on that. I wonder if we want to move the namespace/Docker clean-up part to non-Scala code so that there is less chance for buggy Scala code in a PR to screw it up, but we can delay this until it becomes a real problem.

@mccheah (Contributor) commented Jan 9, 2018:

@ifilonenko heads up, I force pushed and rewrote some of the git history to address #14 (comment). (In retrospect I should have done git rm and git revert, etc. to only push onto the history stack though)

@kimoonkim (Member):

  1. Are we targeting to run multiple integration tests simultaneously using a single shared minikube instance? Then what is the mechanism to cap the max number of tests so the minikube doesn't blow up?

I wonder if we can use namespace resource quota for this. @foxish Does k8s reject a new namespace add request if the cluster would run out of resource by accepting the namespace's max resources? If yes, then the shared minikube can limit the max number of test namespaces, which in turn limit the max number of simultaneous tests?
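For concreteness, the quota idea would hang a `ResourceQuota` off each test namespace, roughly like this sketch (names and limits are made up). One caveat: Kubernetes quotas cap usage within a namespace, and the API server does not reject namespace creation based on cluster capacity, so a quota alone does not cap the number of concurrent test namespaces:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: spark-test-quota
  namespace: spark-test-1   # hypothetical per-test namespace
spec:
  hard:
    pods: "8"
    requests.cpu: "4"
    requests.memory: 8Gi
```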

@kimoonkim changed the title from "Porting over logic from PR 521" to "Use a pre-installed Minikube instance -- porting over logic from PR 521" on Jan 9, 2018.
@kimoonkim (Member) left a comment:

LGTM. Thanks for the change!

Let's delay merging until we resolve Jenkins setup issues somehow.

@ssuchter (Member) commented Jan 9, 2018 via email.

@ifilonenko (Member Author) commented Jan 10, 2018:

This LGTM. Thanks for the work @mccheah

@ssuchter (Member):

Kimoon brought up the question to me of how, when we (eventually) have a persistent Minikube, we will control the number of tests running against it simultaneously. E.g. maybe it can only hold 4 tests (or 8, or whatever). It turns out the Jenkins Lockable Resources plugin supports this: rather than just a single resource that gets locked, you can have N resources and request M of them for one test (e.g. N=4, M=1), and beyond that, builds will block. It's like a counting semaphore, more than just a mutex.
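A hypothetical Jenkinsfile fragment using the plugin's counting-semaphore behavior; the label `minikube-slot` is an assumed resource label that would be configured with N lockable resources on the Jenkins side:

```groovy
pipeline {
  agent any
  stages {
    stage('k8s-integration-test') {
      steps {
        // Blocks until 1 of the N resources labeled 'minikube-slot' is free.
        lock(label: 'minikube-slot', quantity: 1) {
          sh 'mvn clean integration-test'
        }
      }
    }
  }
}
```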

@kimoonkim (Member):

Right. The question I asked earlier is:

  1. Are we targeting to run multiple integration tests simultaneously using a single shared minikube instance? Then what is the mechanism to cap the max number of tests so the minikube doesn't blow up?

So we have two potential answers:

A. Use Jenkins lockable resource plugin

E.g. maybe it can only hold 4 tests (or 8 or whatever). Turns out the Jenkins lockable resource plugin supports this

B. Use namespace resource quota

I wonder if we can use namespace resource quota for this. @foxish Does k8s reject a new namespace add request if the cluster would run out of resource by accepting the namespace's max resources? If yes, then the shared minikube can limit the max number of test namespaces, which in turn limit the max number of simultaneous tests?

Which one do we like more?

@mccheah (Contributor) commented Jan 10, 2018:

In the SIG meeting on 01/10/2018 we decided that we can use the locking plugin with a certain count of concurrently runnable builds. We should be able to ensure that whatever build runs, that build can get resources to run tests and not worry about timing out. For now we can leave the number of concurrent builds as 1 and experiment with what happens as we scale up moving forward.

I think we agreed that we want this to go in. Can we merge this, and then install Minikube on the Pepperdata Jenkins that runs periodically with this?

@ssuchter (Member):

I'm fine with that plan.

@ssuchter (Member):

I should mention, in response to @kimoonkim's different options, that I prefer using the Jenkins plugin (or another job-locking mechanism until the Jenkins plugin is available) over the namespace-limit solution. I like the simplicity of the integration test code always expecting that its submissions to Kubernetes will succeed, and if they fail, the entire integration test should fail. If we used the namespace-limit solution, we'd have to write code to detect that the Kubernetes submissions failed for that particular reason, in order to avoid spurious failures of the integration test system due to an unfortunate co-occurrence of builds.

Doing both limiting methods would avoid spurious failures, but I'm not sure of the value of having the namespace limit if we have locks beforehand.

@kimoonkim (Member):

I don't have a strong preference either way. But there is a longer-term downside to using the Lockable Resources Jenkins plugin. The resource needs of the integration tests will change over time as we add more helper components like RSS, HDFS, the token refresh server, etc. We would have to reconfigure the lock plugin whenever we find it has underestimated the resource requirement. A more ominous case is when a new PR or two adds those extra helper components: suddenly not every test run needs the same amount of resources, and it's hard to express that using the lock plugin.

But I think we can set aside these concerns for now in pursuit of simplicity. And revisit other options if and when we actually face these issues.

I am personally more worried about the resource clean-up issue. I understand we can rely on pods being removed when the enclosing namespace gets deleted. And this PR cleans up Docker images in the Scala code. But a new, poorly written PR can easily disable the clean-up Scala code, so it won't be foolproof. I wonder if we need a more forceful clean-up mechanism sitting outside the Scala code.

  1. When we share a single minikube instance, it's important for each test to clean up when it finishes. Like delete all images, pods, secrets, etc. What is our story against poorly written tests that failed to clean up? How do we ensure the leaked resources over time do not make all other tests fail later?

@mccheah (Contributor) commented Jan 11, 2018:

We should probably have scripts that deal with the Docker cleanup outside of the Scala code. There's still some messiness related to the fact that we're building images in two places - see https://github.com/apache-spark-on-k8s/spark-integration/issues/30. Basically the Scala code should be best-effort but the scripts should make the final guarantees.

@mccheah (Contributor) commented Jan 12, 2018:

Are we good to merge this? I'm doing some refactors that will heavily depend on both this and #29.

@ssuchter (Member):

I'm good to go.

@foxish (Member) commented Jan 12, 2018:

Saw that the docker image management moves into the scripts in #31. This LGTM. Thanks Matt & Ilan.

@foxish foxish merged commit 9051b55 into apache-spark-on-k8s:master Jan 12, 2018