
make builder read keys from database #122

Closed
smothiki opened this issue Jan 22, 2016 · 27 comments · Fixed by #148

@smothiki
Contributor

Builder reads keys from etcd. We should avoid doing that and read keys from database

@smothiki smothiki self-assigned this Jan 22, 2016
@arschles
Member

👍

Ref #81

@rimusz

rimusz commented Jan 22, 2016

👍

@slack slack mentioned this issue Jan 22, 2016
@arschles
Member

arschles commented Feb 2, 2016

Proposed API contract:

Call from Builder

POST /v2/hooks/key

{"app": "$APP_NAME", "public_key": "$PUBLIC_KEY"}

Response From Controller When Key is Valid for App

200 OK

Response From Controller When Key Is Not Valid for App

404 UNAUTHORIZED

All Other Responses From Controller

500 INTERNAL SERVER ERROR

cc/ @helgi

@helgi
Contributor

helgi commented Feb 2, 2016

Where are you going to get the public_key from?

https://github.com/deis/builder/blob/master/rootfs/etc/confd/templates/authorized_keys gets it from etcd and that's what we are solving for here.

I'm thinking you may want

GET /v2/hooks/keys/$APP_NAME

{"users": {"aaron": "keysandthings"}}

That'd get you keys for the given app, unless you have an alternative way to get the keys from?
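For concreteness, here's a minimal Go sketch of how the builder might consume an endpoint shaped like that. The controller address is hypothetical and the response shape is just the one proposed above, not a final contract:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// keysResponse mirrors the proposed body: {"users": {"aaron": "keysandthings"}}
type keysResponse struct {
	Users map[string]string `json:"users"`
}

// getAppKeys asks the controller for all users' keys for one app.
// The base URL and path follow the proposal above; both are assumptions.
func getAppKeys(controllerURL, app string) (map[string]string, error) {
	resp, err := http.Get(fmt.Sprintf("%s/v2/hooks/keys/%s", controllerURL, app))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("controller returned %d for app %q", resp.StatusCode, app)
	}
	var body keysResponse
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return nil, err
	}
	return body.Users, nil
}

func main() {
	// "http://deis-controller" is a placeholder for the in-cluster controller address.
	users, err := getAppKeys("http://deis-controller", "myapp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("keys:", users)
}
```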

@smothiki
Contributor Author

smothiki commented Feb 2, 2016

Wouldn't it be nice if the controller updated the builder whenever there is a new user key?
Some mechanism where the controller and builder talk to each other: maybe RPC, or a simple HTTP server inside the builder that listens for keys, with the controller calling that API endpoint whenever someone runs deis keys:add. The same would apply for app deletion.

@smothiki
Contributor Author

smothiki commented Feb 2, 2016

This avoids the builder's dependency on the database and the k8s API, but it couples the builder to workflow.

@helgi
Contributor

helgi commented Feb 2, 2016

That seems too complicated - the builder should never talk to the DB, ever. I can see your point about creating tight coupling between the two, but running an HTTP server on the builder creates unnecessary state (IMO).

If we go down the route of tightly coupling the two, then we need to discuss the implications of that, the same way we have to consider / discuss why we wouldn't have each component depend on the k8s API (more and more, rather than less and less).

@arschles
Member

arschles commented Feb 2, 2016

@helgi I misspoke in the payload that I posted above (the result of doing too many things at once). Yes, the builder w/o etcd wouldn't have a public key. I agree that the endpoint should look very similar to the one you proposed. However, I advocate for putting the user in the path, since a GET can't have a request body. The request would then look like GET /v2/hooks/keys/$APP_NAME/$USER_NAME.

I'd like to address some of the general comments in sections below, and I'd like to hear thoughts.

TL;DR my suggestion is to make the builder query the controller for both app existence and key validity.

Regarding the Controller Updating Builder vs. the Builder Querying the Controller

If the controller updates the builder, the builder has to hold more state. However, if the builder queries the controller (as described in the previous section), then state is isolated to the controller, where it already resides (more precisely, in the DB that the controller queries).

Managing state in a distributed system is challenging, so let's try to keep it as isolated as possible, and minimize the amount of state in the builder, in this refactor.

Regarding Tight Coupling Between the Builder and Controller

The builder already makes 3 RPCs to the controller on every build, and we've accepted the fact that the two components are tightly coupled and will be through the beta.

Adding a 4th RPC is not ideal from a performance perspective, but adding the RPC now and reducing them later is a manageable task.

Regarding Builder Accessing the DB

I 100% agree with @helgi that the builder should never access the DB. I'll expand - nothing but the controller should ever access the DB (see my comments above on isolating state).

Additionally, we've built an abstraction layer - the controller API - on top of the DB. If anything circumvents that layer, the abstraction is leaky and we will have introduced a large maintenance problem (imagine if many components are directly accessing the DB and we try to refactor the schema).

Regarding Listening to the Kubernetes Event Stream

I do agree that the event stream could be a good source from which to listen for app deletion events. However, if we do so, then we rely more heavily on builder state. Instead, why not rely on the aforementioned GET /v2/hooks/keys/$APP_NAME/$USER_NAME RPC to determine whether the app exists as well? If it doesn't exist, then we can delete its repository folder.

I also disagree that the event stream is a good data source for adding and removing SSH keys (again, for builder state reasons).

I'm not clear if anyone is proposing that key management strategy above, but I'd like to make my position clear.

@helgi
Contributor

helgi commented Feb 2, 2016

In my example the GET used $APP_NAME in the URL, and the JSON below it was the return body, where you get all users for the given app. Yours simply takes it one step further in granularity, and I'm fine with that.

If we use the API for app existence and user existence then you'd get a 404 either way. That's probably enough unless you want to provide more useful error messages.

I'm not seeing how much more state the builder would have to maintain if it gets app adds / removals from k8s API. Not via an event stream but rather label selectors. The state is owned by k8s and the builder is simply using it to see when it should for example clean up older apps.
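Roughly what I mean, as an illustrative Go sketch. The API server address, namespace, and the heritage/app labels are assumptions for illustration, not necessarily the actual workflow labels:

```go
package apps

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// rcList captures just the labels from a k8s ReplicationController list response.
type rcList struct {
	Items []struct {
		Metadata struct {
			Labels map[string]string `json:"labels"`
		} `json:"metadata"`
	} `json:"items"`
}

// Live lists apps via a label selector against the k8s API (no event stream).
// The "heritage=deis" selector and the "app" label are illustrative assumptions.
func Live(apiServer, namespace string) (map[string]bool, error) {
	u := fmt.Sprintf("%s/api/v1/namespaces/%s/replicationcontrollers?labelSelector=%s",
		apiServer, namespace, url.QueryEscape("heritage=deis"))
	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var list rcList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return nil, err
	}
	live := make(map[string]bool)
	for _, item := range list.Items {
		if app, ok := item.Metadata.Labels["app"]; ok {
			live[app] = true
		}
	}
	return live, nil
}
```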

If you rely on the API at git push time to figure out if an app exists, then when will you figure out what doesn't exist anymore? Are you only ever querying the API in the git push scenario, or are you planning on expanding this further?

@arschles
Member

arschles commented Feb 2, 2016

Thanks for the clarification, and I agree - it'd be nice to determine whether the app or user doesn't exist without having to check error messages.

I see what you're saying RE the Kubernetes API now, but regardless, the builder would still hold state - the state of polling - and we can get race conditions if the builder runs with more than one replica.

For example, say we're running 3 builder pods (replicas: 3) and pod A polls the k8s API and finds that myapp is deleted, but pods B and C don't. If a git push ends up on B or C, then it could succeed even though the app is actually deleted.

On the other hand, if the builder makes the controller API call on each git push, then a 404 from the controller tells it that the app doesn't exist (using your version of the API).
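As a rough Go sketch of that per-push check. The endpoint path, response shape, and checkout location are assumptions for illustration, not the final design:

```go
package hooks

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
)

// checkPush asks the controller for the (app, user) keys on each git push.
// A 404 means the app or user no longer exists, so the push is refused and
// the stale checkout can be removed.
func checkPush(controllerURL, app, user, checkoutDir string) (map[string]string, bool, error) {
	resp, err := http.Get(fmt.Sprintf("%s/v2/hooks/keys/%s/%s", controllerURL, app, user))
	if err != nil {
		return nil, false, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		var body struct {
			Users map[string]string `json:"users"`
		}
		if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
			return nil, false, err
		}
		return body.Users, true, nil
	case http.StatusNotFound:
		// App (or user) is gone on the controller side: clean up the local checkout.
		_ = os.RemoveAll(filepath.Join(checkoutDir, app+".git"))
		return nil, false, nil
	default:
		return nil, false, fmt.Errorf("unexpected status %d from controller", resp.StatusCode)
	}
}
```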

I don't see a need for the builder to know whether a given app exists at any other time. Am I missing something crucial there?

@helgi
Contributor

helgi commented Feb 2, 2016

Are old apps ever left behind if you are not cleaning things up? Are checkouts always blown away so there is no old code / apps around on the builder pods?

@arschles
Member

arschles commented Feb 2, 2016

In that scheme, old apps would be left behind if someone deletes an app and never tries a git push. That could be solved by a "reaper" goroutine that runs outside the critical path (and nothing in the critical path would be dependent on its state).

However, old code would be deleted when the RPC happens on a git push.

@helgi
Contributor

helgi commented Feb 2, 2016

What I'm confused about is how the reaper determines what to delete and what to keep without checking with something somewhere. It isn't the end of the world if an active app gets reaped, I suppose.

@arschles
Member

arschles commented Feb 2, 2016

All app checkouts live in the same directory (see here), so the reaper's job would be to walk that directory and query the controller for each app (at a slow interval).

Note that the app checkout directories and the reaper's state are expendable.
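A sketch of what that reaper loop could look like in Go; the directory layout, interval, and endpoint are illustrative assumptions:

```go
package reaper

import (
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// Run walks the shared checkout directory at a slow interval and removes
// checkouts whose app the controller no longer knows about (404). Nothing in
// the git-push critical path depends on this loop's state.
func Run(controllerURL, checkoutDir string, interval time.Duration) {
	for range time.Tick(interval) {
		entries, err := os.ReadDir(checkoutDir)
		if err != nil {
			continue
		}
		for _, e := range entries {
			if !e.IsDir() {
				continue
			}
			app := strings.TrimSuffix(e.Name(), ".git")
			resp, err := http.Get(fmt.Sprintf("%s/v2/hooks/keys/%s", controllerURL, app))
			if err != nil {
				continue
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusNotFound {
				// The controller says this app is gone; the checkout is expendable.
				os.RemoveAll(filepath.Join(checkoutDir, e.Name()))
			}
		}
	}
}
```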

@helgi
Contributor

helgi commented Feb 2, 2016

So instead of querying the k8s API you want to do the same against the controller API. I'm not really seeing the difference or benefit beyond perhaps not having to deal with the k8s API.

@arschles
Member

arschles commented Feb 2, 2016

I'm fine with querying the k8s API, I'm just advocating for querying something before running the receive hook on each git push, possibly in addition to running it in the background.

Maybe we're saying the same thing...

@helgi
Contributor

helgi commented Feb 2, 2016

I think we are ... So here is how I see it:

  • git push queries Controller API for app / user and gets a 404 if either is missing (you could make 2 calls if you want to know which is missing, I guess?) and gets back a key if all is okay
  • reaper queries k8s API to get active apps in one API call.

The reaper can work off the Controller API as well: either another endpoint, or the same endpoint as git push (sans username) to check each app you know about.

@arschles
Member

arschles commented Feb 2, 2016

👍 from me

💯

@arschles
Member

arschles commented Feb 3, 2016

Ref deis/workflow#336 for the keys endpoint

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

Here are a bunch of use cases I'm not clear about.

  • Do we have to make an API call to get the keys associated with the app for every git push?
    Wouldn't that be redundant most of the time: fetching the same user and key and then checking whether or not to write it to authorized_keys? This check at git push will definitely hurt the user experience unless we optimize it.
    One option is to avoid writing the key to a file and check the key directly from the API call.

@arschles
Member

arschles commented Feb 3, 2016

@smothiki I've posted answers to some of your questions below. I'm unclear on some others so can you please point out if I've left something unanswered?

Do we have to make an API call to get the keys associated with the app for every git push?

Not necessarily. Since we know the app and the user, we could make the call to just get the key for the (app, user) pair.

Wouldn't that be redundant most of the time?

No, because multiple users may git push to the same app.

This check at git push will definitely hurt the user experience

On what metric?

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

An app is not associated with the key, and a user can have many keys.

@arschles
Member

arschles commented Feb 3, 2016

Right, so the builder would make a call to the controller's /v2/hooks/keys/$APP/$USER endpoint, which would return back the keys for that user.

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

Let's say a user has 100 apps running and uses 200 keys. The call returns 200 keys for an app.
Either we have to match the private key against these 200 keys, or write the 200 keys to the authorized_keys file and let the SSH server figure it out. If we go with the writing option, most of the keys will already have been written; we'd have to keep an index of which keys are present and figure out how to write them efficiently.

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

My concern is about optimizing the git push operation if we are making a call to the controller for each git push.

@smothiki
Contributor Author

smothiki commented Feb 3, 2016

Also, in practice, git pushes happen far more often than deis keys:add.
One optimization is to check the authorized_keys file first and then make an API call only for the remaining keys.
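Something like the following is the idea, as a rough Go sketch; the authorized_keys path and the one-bare-key-per-line format are assumptions for illustration:

```go
package keys

import (
	"fmt"
	"os"
	"strings"
)

// AppendNew writes only keys that aren't already in authorized_keys, so a git
// push that brings no new keys causes no extra writes.
func AppendNew(authorizedKeysPath string, fetched []string) error {
	existing, err := os.ReadFile(authorizedKeysPath)
	if err != nil && !os.IsNotExist(err) {
		return err
	}
	// Index the keys already on disk, one per line.
	have := make(map[string]bool)
	for _, line := range strings.Split(string(existing), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			have[line] = true
		}
	}
	f, err := os.OpenFile(authorizedKeysPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	for _, key := range fetched {
		if have[key] {
			continue // already present; skip the redundant write
		}
		if _, err := fmt.Fprintln(f, key); err != nil {
			return err
		}
	}
	return nil
}
```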

@arschles
Member

arschles commented Feb 4, 2016

The call returns 200 keys for an app. Either we have to match the private key against these 200 keys, or write the 200 keys to the authorized_keys file and let the SSH server figure it out

Correct. There is no way around this, regardless of the method of acquiring keys from the controller.

My concern is about optimizing the git push operation if we are making a call to the controller for each git push.

We already make 3 API calls to the controller for each git push operation. While that's not a good reason to add a fourth, it does show that builder <--> workflow RPCs are proven to work. As a sidenote, we should strive to reduce the latency, complexity and number of RPCs in a separate patch (I've created #144 for that).

Also, in practice, git pushes happen far more often than deis keys:add

I agree. Again, though, there's no way around this regardless of key delivery mechanism. The RPC method as proposed returns the full list every time, however, and that's non-optimal. Perhaps a future optimization would be to have the controller only deliver deltas.

One optimization is to check the authorized_keys file first and then make an API call only for the remaining keys.

I agree. See my above comment on a possible future optimization.
