
Proposal: send logs directly from builder pods back to builder pod #207

Open
arschles opened this issue Feb 29, 2016 · 6 comments

@arschles (Member) commented Feb 29, 2016

Note: I believe others have suggested a similar or identical solution to this problem in the past. Hopefully this issue solidifies those ideas.

Related: #185
Related: #199
Related: #298

Problem Statement

As of this writing, the builder does the following to run a build:

  1. Launch a builder pod (slugbuilder or dockerbuilder)
  2. Poll the k8s API for the pod's existence
  3. Begin streaming pod logs after the pod exists

We've found issues with this approach, all of which stem from the fact that the pod may not be reported as running during any polling event. This is a race condition, and so far we've observed the following symptoms:

  1. The pod has started and completed within a single polling interval
    1. Attempted solution in #185 (fix(pkg/gitreceive): use a watch for pod additions (Prototype)). Note that this will not address the problem laid out in (2)
  2. The pod has started, completed, and been garbage collected within a single polling interval
    1. Temporary fix, which relies on the internal k8s GC implementation, in #206 (feat(race): change heritage label for every pod launch)
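
For reference, a rough sketch of the poll-then-stream flow described above, written against modern client-go names (illustrative only; not the builder's actual code). If the pod starts, finishes, and is garbage collected between two iterations of the loop, the poller never sees it as running, or never sees it at all:

```go
// pollsketch illustrates the launch/poll/stream approach and why it races
// with fast builds and with k8s garbage collection.
package pollsketch

import (
	"context"
	"io"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func waitAndStreamLogs(ctx context.Context, cs kubernetes.Interface, namespace, podName string) error {
	// Step 2: poll the k8s API for the pod's existence.
	for {
		pod, err := cs.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			// Not visible yet -- or already run and garbage collected (the race).
			time.Sleep(time.Second)
			continue
		}
		if err != nil {
			return err
		}
		if pod.Status.Phase == corev1.PodRunning || pod.Status.Phase == corev1.PodSucceeded {
			break
		}
		time.Sleep(time.Second)
	}

	// Step 3: begin streaming pod logs after the pod exists.
	stream, err := cs.CoreV1().Pods(namespace).
		GetLogs(podName, &corev1.PodLogOptions{Follow: true}).
		Stream(ctx)
	if err != nil {
		return err
	}
	defer stream.Close()
	_, err = io.Copy(os.Stdout, stream)
	return err
}
```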

Solution Details

Because of this race condition, we can't rely on polling, and even if we successfully use the event stream (#185), k8s GC doesn't guarantee that pod logs will still be available after the pod is done. This proposal calls for the builder pod to stream its logs back to the builder that launched it.

The following changes (as of this writing) would need to happen to make this work:

  1. Each git-receive hook process runs a websocket server (on a unique port, assigned by the builder SSH server) that accepts incoming logs from the builder pod; see the server sketch after this list. It uses these logs for two purposes:
    1. Writing them to STDOUT (for the builder to write back to the SSH connection)
    2. Watching for a FINISHED message that indicates the builder pod is done
  2. Each git-receive hook process launches builder pods with its "phone-home" IP and port, i.e. the address of the websocket server that they should write their logs to
  3. The builder pods now include a program that launches the builder logic (a shell script for slugbuilder and a python program for dockerbuilder); see the client sketch below. This program's purpose is to:
    1. Stream STDOUT & STDERR via a websocket connection to the phone-home address
    2. Send a FINISHED message when the builder logic exits
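
A minimal sketch of the per-push log receiver from item 1, assuming github.com/gorilla/websocket as the websocket library and a /logs endpoint (both are assumptions, not decisions this proposal makes):

```go
// logreceiver sketches the "phone home" server run by each git-receive hook.
// It listens on an ephemeral port, relays incoming log lines to STDOUT, and
// closes a channel when the builder pod sends FINISHED.
package logreceiver

import (
	"fmt"
	"net"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

// Start returns the port the server listens on and a channel that is closed
// when a FINISHED message arrives.
func Start() (int, <-chan struct{}, error) {
	// Port 0 asks the kernel for a free port, giving each push a unique one.
	ln, err := net.Listen("tcp", ":0")
	if err != nil {
		return 0, nil, err
	}
	done := make(chan struct{})

	mux := http.NewServeMux()
	mux.HandleFunc("/logs", func(w http.ResponseWriter, r *http.Request) {
		conn, err := upgrader.Upgrade(w, r, nil)
		if err != nil {
			return
		}
		defer conn.Close()
		for {
			_, msg, err := conn.ReadMessage()
			if err != nil {
				return // connection closed or broken
			}
			if string(msg) == "FINISHED" {
				close(done) // builder logic is done; the hook can tear things down
				return
			}
			// Relay to STDOUT, which the builder writes back over the SSH connection.
			fmt.Println(string(msg))
		}
	})
	go http.Serve(ln, mux)

	return ln.Addr().(*net.TCPAddr).Port, done, nil
}
```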

After the builder's git-receive hook receives the FINISHED message, or after a generous timeout, it can shut down the websocket server and continue with the logic it already has. The builder would no longer need to rely on polling the k8s API if this proposal were implemented.
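
And a matching sketch of the wrapper from item 3 that would run inside the builder pod; the PHONE_HOME_URL environment variable and /bin/build.sh entrypoint are placeholders for whatever the builder actually passes in and whatever slugbuilder/dockerbuilder actually run:

```go
// phonehome sketches the wrapper that launches the builder logic inside the
// builder pod, streams its combined STDOUT/STDERR over a websocket, and sends
// FINISHED when the build exits.
package main

import (
	"bufio"
	"io"
	"log"
	"os"
	"os/exec"

	"github.com/gorilla/websocket"
)

func main() {
	// e.g. ws://10.2.3.4:43127/logs, injected by the git-receive hook (placeholder name).
	phoneHome := os.Getenv("PHONE_HOME_URL")
	conn, _, err := websocket.DefaultDialer.Dial(phoneHome, nil)
	if err != nil {
		log.Fatalf("dialing phone-home address: %v", err)
	}
	defer conn.Close()

	// Run the real builder logic (shell script for slugbuilder, python program
	// for dockerbuilder); the path here is a placeholder.
	cmd := exec.Command("/bin/build.sh")
	pr, pw := io.Pipe()
	cmd.Stdout = pw
	cmd.Stderr = pw

	if err := cmd.Start(); err != nil {
		log.Fatalf("starting builder logic: %v", err)
	}
	go func() {
		cmd.Wait()
		pw.Close() // ends the scanner loop below once the build exits
	}()

	// One websocket message per log line.
	scanner := bufio.NewScanner(pr)
	for scanner.Scan() {
		if err := conn.WriteMessage(websocket.TextMessage, scanner.Bytes()); err != nil {
			log.Printf("sending log line: %v", err)
		}
	}

	// Tell the git-receive hook we're done so it can shut down its server.
	conn.WriteMessage(websocket.TextMessage, []byte("FINISHED"))
}
```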

@smothiki (Contributor) commented:

We are already thinking about implementing Jobs, which might change a lot of this behavior. Also, a pod getting garbage collected immediately, without the watch event type changing first, is not expected k8s behavior.
The intended sequence is:
Event -- pod status
Added -- the pod is created
Modified -- status changes from Pending to Running
Deleted -- status is Succeeded, or similar with an exit code of 0 or greater

Because of the labels mix-up, we are not observing the pod's status change from Pending to Running; instead, GC starts collecting the pod and the event goes straight to Deleted even though the pod is still running, which is not the intended behavior. There is no point in streaming the logs back if the pod is garbage collected in the middle of an execution.
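
For reference, a minimal sketch of watching that event sequence with client-go (modern API names; illustrative, not code from this repo). In the healthy case the watch yields Added, then Modified (Pending to Running), then Deleted; in the buggy case described above, Deleted arrives without the intermediate Modified event:

```go
// watchsketch illustrates observing a builder pod's lifecycle events.
package watchsketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

func watchBuilderPod(ctx context.Context, cs kubernetes.Interface, namespace, podName string) error {
	w, err := cs.CoreV1().Pods(namespace).Watch(ctx, metav1.ListOptions{
		FieldSelector: "metadata.name=" + podName,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for event := range w.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		fmt.Printf("event=%s phase=%s\n", event.Type, pod.Status.Phase)
		if event.Type == watch.Deleted {
			return nil // pod is gone, with or without ever having been seen Running
		}
	}
	return nil
}
```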

@smothiki (Contributor) commented:

#185 will solve a lot of things. I feel there is no need for a special websocket connection to stream logs back.

@arschles (Member, Author) commented Mar 3, 2016

@smothiki I'm not sure how #185 would solve this particular problem if we don't launch jobs. However, I am 👍 on using jobs for our builds when they come out of the extensions API group. If I understand http://kubernetes.io/v1.1/docs/user-guide/jobs.html correctly, we'll be able to make an API call to get the logs of the job even if it's complete at the time of calling.
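
A hedged sketch of what that API call could look like with client-go (modern names; the v1.1-era client differed), finding the Job's pods via the job-name label and reading their logs. This assumes the Job's pods still exist, i.e. nothing has cleaned them up:

```go
// buildlogs sketches reading logs for a (possibly completed) Job's pods.
package buildlogs

import (
	"context"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func printJobLogs(ctx context.Context, cs kubernetes.Interface, namespace, jobName string) error {
	// The Job controller labels its pods with job-name=<job>.
	pods, err := cs.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "job-name=" + jobName,
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		stream, err := cs.CoreV1().Pods(namespace).
			GetLogs(pod.Name, &corev1.PodLogOptions{}).
			Stream(ctx)
		if err != nil {
			return err
		}
		// Logs stay readable after the pod completes, as long as the pod object remains.
		_, _ = io.Copy(os.Stdout, stream)
		stream.Close()
	}
	return nil
}
```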

arschles modified the milestones: v2.0-beta3, v2.0-rc1 (Apr 15, 2016)
@arschles (Member, Author) commented:

promoting to beta3

@arschles (Member, Author) commented:

Punting to beta4

@Cryptophobia commented:

This issue was moved to teamhephy/builder#31
