
make builder read keys from database #122

Closed
smothiki opened this issue Jan 22, 2016 · 27 comments · Fixed by #148

@smothiki
Contributor

Builder reads keys from etcd. We should avoid doing that and read keys from database

@smothiki smothiki self-assigned this Jan 22, 2016
@arschles
Member

👍

Ref #81

@rimusz

rimusz commented Jan 22, 2016

👍

@slack slack mentioned this issue Jan 22, 2016
@arschles
Member

arschles commented Feb 2, 2016

Proposed API contract:

Call from Builder

POST /v2/hooks/key

{"app": "$APP_NAME", "public_key": "$PUBLIC_KEY"}

Response From Controller When Key is Valid for App

200 OK

Response From Controller When Key Is Not Valid for App

404 UNAUTHORIZED

All Other Responses From Controller

500 INTERNAL SERVER ERROR

cc/ @helgi

@helgi
Contributor

helgi commented Feb 2, 2016

Where are you going to get the public_key from?

https://github.com/deis/builder/blob/master/rootfs/etc/confd/templates/authorized_keys gets it from etcd and that's what we are solving for here.

I'm thinking you may want

GET /v2/hooks/keys/$APP_NAME

{"users": {"aaron": "keysandthings"}}

That'd get you keys for the given app, unless you have an alternative way to get the keys from?
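For concreteness, here's a minimal Go sketch of how the builder might consume an endpoint shaped like that. The controller address is hypothetical and the response shape is just the one proposed above, not a final contract:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// keysResponse mirrors the proposed body: {"users": {"aaron": "keysandthings"}}
type keysResponse struct {
	Users map[string]string `json:"users"`
}

// getAppKeys asks the controller for all users' keys for one app.
// The base URL and path follow the proposal above; both are assumptions.
func getAppKeys(controllerURL, app string) (map[string]string, error) {
	resp, err := http.Get(fmt.Sprintf("%s/v2/hooks/keys/%s", controllerURL, app))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("controller returned %d for app %q", resp.StatusCode, app)
	}
	var body keysResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body.Users, nil
}

func main() {
	// "http://deis-controller" is a placeholder for the in-cluster controller address.
	users, err := getAppKeys("http://deis-controller", "myapp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("keys:", users)
}
```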

@smothiki
Contributor Author

smothiki commented Feb 2, 2016

Wouldn't it be nice if the controller updated the builder whenever there is a new user key?
Some mechanism where the controller and builder talk to each other: maybe RPC, or a simple HTTP server inside the builder that listens for keys, with the controller calling that API endpoint whenever someone runs deis keys:add. The same would apply for app deletion.

@smothiki
Contributor Author

smothiki commented Feb 2, 2016

This avoids the builder's dependency on the database and the k8s API, but it couples the builder to workflow.

@helgi
Contributor

helgi commented Feb 2, 2016

That seems too complicated - the builder should never talk to the DB, ever. I can see your point about creating tight coupling between the two, but running an HTTP server on the builder creates unnecessary state (IMO).

If we go down the route of tightly coupling the two, then we need to discuss the implications of that, the same way we have to consider / discuss why we wouldn't have each component depend on the k8s API (more and more, rather than less and less).

@arschles
Member

arschles commented Feb 2, 2016

@helgi I misspoke in the payload that I posted above (the result of doing too many things at once). Yes, the builder w/o etcd wouldn't have a public key. I agree that the endpoint should look very similar to the one you proposed. However, I advocate for putting the user in the path, since a GET can't have a request body. The request would then look like GET /v2/hooks/keys/$APP_NAME/$USER_NAME.

I'd like to address some of the general comments in sections below, and I'd like to hear thoughts.

TL;DR my suggestion is to make the builder query the controller for both app existence and key validity.

Regarding the Controller Updating Builder vs. the Builder Querying the Controller

If the controller updates the builder, the builder has to hold more state. However, if the builder queries the controller (as described in the previous section), then state is isolated to the controller, where it already resides (more precisely, in the DB that the controller queries).

Managing state in a distributed system is challenging, so let's try to keep it as isolated as possible, and minimize the amount of state in the builder, in this refactor.

Regarding Tight Coupling Between the Builder and Controller

The builder already makes 3 RPCs to the controller on every build, and we've accepted the fact that the two components are tightly coupled and will be through the beta.

Adding a 4th RPC is not ideal from a performance perspective, but adding the RPC now and reducing them later is a manageable task.

Regarding Builder Accessing the DB

I 100% agree with @helgi that the builder should never access the DB. I'll expand - nothing but the controller should ever access the DB (see my comments above on isolating state).

Additionally, we've built an abstraction layer - the controller API - on top of the DB. If anything circumvents that layer, the abstraction is leaky and we will have introduced a large maintenance problem (imagine if many components are directly accessing the DB and we try to refactor the schema).

Regarding Listening to the Kubernetes Event Stream

I do agree that the event stream could be a good source from which to listen for app deletion events. However, if we do so, then we rely more heavily on builder state. Instead, why not rely on the aforementioned GET /v2/hooks/keys/$APP_NAME/$USER_NAME RPC to determine whether the app exists as well? If it doesn't exist, then we can delete its repository folder.

I also disagree that the event stream is a good data source for adding and removing SSH keys (again, for builder state reasons).

I'm not clear if anyone is proposing that key management strategy above, but I'd like to make my position clear.

@helgi
Contributor

helgi commented Feb 2, 2016

In my example the GET used $APP_NAME in the URL, and the JSON below it was the return body, where you get all users for the given app. Yours simply takes it one step further in granularity, and I'm fine with that.

If we use the API for app existence and user existence then you'd get a 404 either way. That's probably enough unless you want to provide more useful error messages.

I'm not seeing how much more state the builder would have to maintain if it gets app adds / removals from k8s API. Not via an event stream but rather label selectors. The state is owned by k8s and the builder is simply using it to see when it should for example clean up older apps.
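Roughly what I mean, as an illustrative Go sketch. The API server address, namespace, and the heritage/app labels are assumptions for illustration, not necessarily the actual workflow labels:

```go
package apps

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// rcList captures just the labels from a k8s ReplicationController list response.
type rcList struct {
	Items []struct {
		Metadata struct {
			Labels map[string]string `json:"labels"`
		} `json:"metadata"`
	} `json:"items"`
}

// Live lists apps via a label selector against the k8s API (no event stream).
// The "heritage=deis" selector and the "app" label are illustrative assumptions.
func Live(apiServer, namespace string) (map[string]bool, error) {
	u := fmt.Sprintf("%s/api/v1/namespaces/%s/replicationcontrollers?labelSelector=%s",
		apiServer, namespace, url.QueryEscape("heritage=deis"))
	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var list rcList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return nil, err
	}
	live := make(map[string]bool)
	for _, item := range list.Items {
		if app, ok := item.Metadata.Labels["app"]; ok {
			live[app] = true
		}
	}
	return live, nil
}
```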

If you rely on the API at git push time to figure out if an app exists, then when will you figure out what doesn't exist anymore? Are you only ever querying the API in the git push scenario, or are you planning on expanding this further?

@arschles
Member

arschles commented Feb 2, 2016

Thanks for the clarification, and I agree - it'd be nice to determine whether the app or user doesn't exist without having to check error messages.

I see what you're saying RE the Kubernetes API now, but regardless, the builder would still hold state - the state of polling - and we can get race conditions if the builder runs with more than one replica.

For example, say we're running 3 builder pods (replicas: 3) and pod A polls the k8s API and finds that myapp is deleted, but pods B and C don't. If a git push ends up on B or C, then it could succeed even though the app is actually deleted.

On the other hand, if the builder makes the controller API call on each git push, then a 404 from the controller tells it that the app doesn't exist (using your version of the API).
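As a rough Go sketch of that per-push check. The endpoint path, response shape, and checkout location are assumptions for illustration, not the final design:

```go
package hooks

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

// checkPush asks the controller for the (app, user) keys on each git push.
// A 404 means the app or user no longer exists, so the push is refused and
// the stale checkout can be removed.
func checkPush(controllerURL, app, user, checkoutDir string) (map[string]string, bool, error) {
	resp, err := http.Get(fmt.Sprintf("%s/v2/hooks/keys/%s/%s", controllerURL, app, user))
	if err != nil {
		return nil, false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		var body struct {
			Users map[string]string `json:"users"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
			return nil, false, err
		}
		return body.Users, true, nil
	case http.StatusNotFound:
		// App (or user) is gone on the controller side: clean up the local checkout.
		_ = os.RemoveAll(filepath.Join(checkoutDir, app+".git"))
		return nil, false, nil
	default:
		return nil, false, fmt.Errorf("unexpected status %d from controller", resp.StatusCode)
	}
}
```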

I don't see a need for the builder to know whether a given app exists at any other time. Am I missing something crucial there?

@helgi
Contributor

helgi commented Feb 2, 2016

Are old apps ever left behind if you are not cleaning things up? Are checkouts always blown away so there is no old code / apps around on the builder pods?

@arschles
Member

arschles commented Feb 2, 2016

In that scheme, old apps would be left behind if someone deletes an app and never tries a git push. That could be solved by a "reaper" goroutine that runs outside the critical path (and nothing in the critical path would be dependent on its state).

However, old code would be deleted when the RPC happens on a git push.

@helgi
Contributor

helgi commented Feb 2, 2016

What I'm confused about is how the reaper determines what to delete and what to keep without checking with something somewhere. It isn't the end of the world if an active app gets reaped, I suppose.

@arschles
Member

arschles commented Feb 2, 2016

All app checkouts live in the same directory (see here), so the reaper's job would be to walk that directory and query the controller for each app (at a slow interval).

Note that the app checkout directories and the reaper's state are expendable.
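A sketch of what that reaper loop could look like in Go; the directory layout, interval, and endpoint are illustrative assumptions:

```go
package reaper

import (
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// Run walks the shared checkout directory at a slow interval and removes
// checkouts whose app the controller no longer knows about (404). Nothing in
// the git-push critical path depends on this loop's state.
func Run(controllerURL, checkoutDir string, interval time.Duration) {
	for range time.Tick(interval) {
		entries, err := os.ReadDir(checkoutDir)
		if err != nil {
			continue
		}
		for _, e := range entries {
			if !e.IsDir() {
				continue
			}
			app := strings.TrimSuffix(e.Name(), ".git")
			resp, err := http.Get(fmt.Sprintf("%s/v2/hooks/keys/%s", controllerURL, app))
			if err != nil {
				continue
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusNotFound {
				// The controller says this app is gone; the checkout is expendable.
				os.RemoveAll(filepath.Join(checkoutDir, e.Name()))
			}
		}
	}
}
```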

@helgi
Contributor

helgi commented Feb 2, 2016

So instead of querying the k8s API you want to do the same against the controller API. I'm not really seeing the difference or benefit beyond perhaps not having to deal with the k8s API.

@arschles
Member

arschles commented Feb 2, 2016

I'm fine with querying the k8s API, I'm just advocating for querying something before running the receive hook on each git push, possibly in addition to running it in the background.

Maybe we're saying the same thing...

@helgi
Contributor

helgi commented Feb 2, 2016

I think we are ... So here is how I see it:

  • git push queries Controller API for app / user and gets a 404 if either is missing (you could make 2 calls if you want to know which is missing, I guess?) and gets back a key if all is okay
  • reaper queries k8s API to get active apps in one API call.

The reaper can work off the Controller API as well: either another endpoint, or the same endpoint as git push (sans username) to check each app you know about.

@arschles
Member

arschles commented Feb 2, 2016

👍 from me

💯

@arschles
Member

arschles commented Feb 3, 2016

Ref deis/workflow#336 for the keys endpoint

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

Here are a bunch of use cases I'm not clear about.

  • Do we have to make an API call to get the keys associated with the app for every git push?
    Wouldn't that be redundant most of the time: fetching the same user and key and then checking whether or not to write it to authorized_keys? This check at git push will definitely hurt the user experience unless we optimize it.
    One option is to avoid writing the key to a file and check the key directly from the API call.

@arschles
Member

arschles commented Feb 3, 2016

@smothiki I've posted answers to some of your questions below. I'm unclear on some others so can you please point out if I've left something unanswered?

Do we have to make an API call to get the keys associated with the app for every git push?

Not necessarily. Since we know the app and the user, we could make the call to just get the key for the (app, user) pair.

Wouldn't that be redundant most of the time?

No, because multiple users may git push to the same app.

This check at git push will definitely hurt the user experience

On what metric?

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

An app is not associated with the key, and a user can have many keys.

@arschles
Member

arschles commented Feb 3, 2016

Right, so the builder would make a call to the controller's /v2/hooks/keys/$APP/$USER endpoint, which would return back the keys for that user.

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

Let's say a user has 100 apps running and uses 200 keys. The call returns 200 keys for an app.
Either we have to match the private key against these 200 keys, or write the 200 keys to the authorized_keys file and let the SSH server figure it out. If we go with the writing option, most of the keys will already have been written; we'd have to keep an index of which keys are present and figure out how to write them efficiently.

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

My concern is about optimizing the git push operation if we are making a call to the controller for each git push.

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

Also, in practice, git pushes happen far more often than deis keys:add.
One optimization is to check the authorized_keys file first and then make an API call only for the remaining keys.
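Something like the following is the idea, as a rough Go sketch; the authorized_keys path and the one-bare-key-per-line format are assumptions for illustration:

```go
package keys

import (
	"fmt"
	"os"
	"strings"
)

// AppendNew writes only keys that aren't already in authorized_keys, so a git
// push that brings no new keys causes no extra writes.
func AppendNew(authorizedKeysPath string, fetched []string) error {
	existing, err := os.ReadFile(authorizedKeysPath)
	if err != nil && !os.IsNotExist(err) {
		return err
	}
	// Index the keys already on disk, one per line.
	have := make(map[string]bool)
	for _, line := range strings.Split(string(existing), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			have[line] = true
		}
	}
	f, err := os.OpenFile(authorizedKeysPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	for _, key := range fetched {
		if have[key] {
			continue // already present; skip the redundant write
		}
		if _, err := fmt.Fprintln(f, key); err != nil {
			return err
		}
	}
	return nil
}
```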

@arschles
Member

arschles commented Feb 4, 2016

The call returns 200 keys for an app. Either we have to match the private key against these 200 keys, or write the 200 keys to the authorized_keys file and let the SSH server figure it out

Correct. There is no way around this, regardless of the method of acquiring keys from the controller.

My concern is about optimizing the git push operation if we are making a call to the controller for each git push.

We already make 3 API calls to the controller for each git push operation. While that's not a good reason to add a fourth, it does show that builder <--> workflow RPCs are proven to work. As a sidenote, we should strive to reduce the latency, complexity and number of RPCs in a separate patch (I've created #144 for that).

Also, in practice, git pushes happen far more often than deis keys:add

I agree. Again, though, there's no way around this regardless of key delivery mechanism. The RPC method as proposed returns the full list every time, however, and that's non-optimal. Perhaps a future optimization would be to have the controller only deliver deltas.

One optimization is to check the authorized_keys file first and then make an API call only for the remaining keys.

I agree. See my above comment on a possible future optimization.
