[meta] Deis v2 Beta #4809
Comments
two additional items to consider:
I agree that the user experience for folks first setting up the platform would be greatly aided by something which isn't raw k8s wrangling. There is certainly room for Builder to benefit from a non-bash life; however, one of my motivating priorities is working code now, versus the thing we'd like to have. A builder Go rewrite could easily be delivered in a 2.0 point release, which is what I'm leaning toward at the moment. A huge caveat: if builder is consistently melting our faces through beta, we can reconsider.
@slack ok, fair enough. Re: Go in the builder, it's actually not as big a task as a rewrite, since Builder is mostly written in Go already. There are a few face-melting shell scripts, however, that do a lot of important things and are untested. Those need to be ported and tests written for them. deis/builder#29 explains a bit further, and I'm happy to elaborate. Not sure if this all needs to be in the list; IMO it easily fits under a stability bullet point.
also, @slack we should add persistent storage for minio, under 'object storage' |
Storage added! |
👍 |
minio persistent storage example:
minio's boot should run |
Do we need to expose slugbuilder metrics? After all, it's an intermittent job that runs, produces a slug, and dies.
@slack question about the following that was stated WRT router:
Are any more solid requirements available on what sort of metrics are desired? |
I haven't had the time to pull out specific sets of stats for each of the components. Since we are using prometheus to scrape metrics, a survey of existing prometheus exporters is a great place to start. In the absence (or even presence) of existing exporters, what we want to enable is operators of the platform to be able to understand the health of the components through these metrics, which then become potential sources for alerts. To your question @smothiki, emitting metrics for all the platform components lets us answer questions and debug problems with numbers rather than guts. So the more the 📈 📉 the better. |
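For anyone surveying exporters, here is a rough sketch of what a scrape target would return in the Prometheus text exposition format. The metric name `deis_component_up` and the `component` label are made up for illustration and are not existing Deis metrics; label values are assumed to need no escaping.

```python
def render_prometheus(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed to contain no quotes, backslashes, or newlines.
    """
    lines = [
        "# HELP {} {}".format(name, help_text),
        "# TYPE {} {}".format(name, metric_type),
    ]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                '{}="{}"'.format(key, val) for key, val in sorted(labels.items())
            )
            label_part = "{" + pairs + "}"
        else:
            label_part = ""
        lines.append("{}{} {}".format(name, label_part, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical health gauge: 1 means healthy, 0 means unhealthy.
    print(render_prometheus(
        "deis_component_up",
        "Whether the component is healthy.",
        "gauge",
        [({"component": "router"}, 1), ({"component": "builder"}, 0)],
    ))
```

A gauge like this is the kind of number an alert rule could watch, which is the "health of the components" use case described above.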
Just an update on the metrics side of things. I have a working telegraf/influx setup. So I'm trying to determine what I should do next. If we have the VTS plugin back in the router component then I can take a stab at building a plugin that will scrape those metrics and send them over to our sink. For other components we should figure out what is most important and how we want to get that data to influx. For example, should we have a /stats endpoint in the controller that prints out metrics in json that we can scrape like we do for router? Or should it just send the metrics to influx as it calculates them? |
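To make the two options in the question above concrete, here is a minimal sketch of both payloads: a JSON body that a `/stats` endpoint could return for scraping, and the same numbers formatted as an InfluxDB line protocol point for pushing. The function names, the `controller` measurement, and the field names are all hypothetical, and tag keys and values are assumed to contain no spaces or commas, so no escaping is done.

```python
import json


def render_stats_json(counters):
    """Body for a hypothetical pull-style /stats endpoint."""
    return json.dumps(counters, sort_keys=True)


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point in InfluxDB line protocol for a push-style sink.

    Assumes simple tag keys/values (no spaces, commas, or equals signs).
    """
    tag_part = "".join(
        ",{}={}".format(key, value) for key, value in sorted(tags.items())
    )
    field_items = []
    for key, value in sorted(fields.items()):
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = "{}i".format(value)  # integers carry an 'i' suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:
            rendered = '"{}"'.format(str(value).replace('"', '\\"'))
        field_items.append("{}={}".format(key, rendered))
    return "{}{} {} {}".format(
        measurement, tag_part, ",".join(field_items), timestamp_ns
    )


if __name__ == "__main__":
    stats = {"deploys": 3, "latency": 0.25}
    print(render_stats_json(stats))
    print(to_line_protocol("controller", {"component": "controller"}, stats, 1))
```

The trade-off is roughly this: the pull style keeps the component unaware of the sink but needs a scraper plugin per component, while the push style needs no plugin but ties each component to the sink's wire format.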
I'm worried about some of the project level requirements such as:
I know there has been some chatter re: dogfooding, but I have yet to see long running tests of applications and performance/load testing. That historically has fallen into my purview but I'm currently working on improving the release process (mutable -> immutable Docker tags). |
@slack if this is the official document for releasing beta, should we remove the metrics requirements from it? |
Yeah, everything but router. Will update. Done! |
The final push for v2 beta is visible on #4962 |
Meta issue for Deis v2 Beta; as specific issues are filed in other repos, we will link them back to this issue.
Beta Issues Search
Beta Goals
The Beta period is focused on stabilizing backing services and reaching availability and recoverability targets. Technology choices for each of the backing services should be well understood for each of our blessed configurations.
e2e testing infrastructure for the v2 platform should be in place. The long-running/soak tests are a requirement to leave beta (see test target below).
Beta should also exit with a strong opinion about production and staging configurations and how best to go from zero to kubernetes.
Project-level
- Services that support TLS should be deployed with TLS by default
Meta
- deis/prototype-repo
- all components output version information on startup
Release Checklist
Components
Client
Controller deis/workflow
- Metrics: expose operational metrics to platform monitor component
Database deis/postgres
- Metrics: expose operational metrics to platform monitor component
Etcd deis/etcd
- Availability: HA configuration deployed by default
- Metrics: expose operational metrics to platform monitor component
- Recoverability: documented backup + restore
- Testing: single, multi-node failure, recovery
Registry deis/registry
- Metrics: expose operational metrics to monitor component
Object storage
- helm generate works for all of the above scenarios (Make 'helm generate' generate the correct credentials for object storage charts#95)
- SSL/TLS by default
- RiakCS
- Tested persistent volumes
Router deis/router
Builder deis/builder
- Dockerfile functionality
- Metrics: expose operational metrics to monitor component
Slugbuilder deis/slugbuilder
- Metrics: expose operational metrics to monitor component
Slugrunner deis/slugrunner
- Metrics: expose operational metrics to monitor component
Logger deis/logger
- deis logs for all apps
Monitoring deis/monitor