Production Quality Deployment #340

Closed
colhom opened this Issue Mar 22, 2016 · 36 comments

@colhom
Contributor

colhom commented Mar 22, 2016

The goal is to offer a "production ready solution" for provisioning a CoreOS Kubernetes cluster. These are the major functionality blockers that I can think of.

  • Cluster upgrade path
    • Enable decommissioning of kubelets when instances are rotated out of the ASG
    • Automatically remove nodes when instances are rotated out of the ASG
  • Put controllers into an ASG, behind an ELB
  • Spread workers across AZs (federation-lite) -- thanks @mumoshu! ref #439
  • Dedicated etcd cluster in an ASG, behind an ELB
    • Set up etcd TLS
  • Set up controller and worker AutoscalingGroups to recover from EC2 instance failures
  • Secure etcd peer/client connections with TLS
  • Route53 integration for APIServerEndpoint. Automatically create hosted-zone and/or A record for controller EIP on kube-aws up -- DONE #389 (requires that the hosted zone already exist; see the sketch after this list)
  • Provision AWS Elasticsearch cluster
    • Kibana/Elasticsearch/Fluentd addons (ELK logging)
    • Enable heapster Elasticsearch sink functionality (kubernetes/heapster#733)
  • Support deploying to existing VPC (and maybe existing subnet as well?) -- DONE #346
  • Cluster PKI infrastructure (ref #420)
    • Kubelet TLS bootstrapping upstream proposal
    • Figure out what we're going to do about automated CSR signing in kube-aws (necessary for self-healing and autoscaling)
    • Provide option to use pre-existing CA certificate and key to sign component certs (integrate with existing PKI systems)
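
For illustration, a minimal aws-sdk-go sketch of the Route53 piece referenced above (not necessarily how #389 implements it; the zone ID, DNS name, and IP below are placeholders): given an already-existing hosted zone, upsert an A record pointing externalDNSName at the controller EIP.

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/route53"
)

// upsertControllerRecord creates or updates an A record for the controller EIP
// in a hosted zone that must already exist.
func upsertControllerRecord(svc *route53.Route53, zoneID, dnsName, eip string) error {
	_, err := svc.ChangeResourceRecordSets(&route53.ChangeResourceRecordSetsInput{
		HostedZoneId: aws.String(zoneID),
		ChangeBatch: &route53.ChangeBatch{
			Changes: []*route53.Change{{
				Action: aws.String("UPSERT"),
				ResourceRecordSet: &route53.ResourceRecordSet{
					Name: aws.String(dnsName),
					Type: aws.String("A"),
					TTL:  aws.Int64(300),
					ResourceRecords: []*route53.ResourceRecord{
						{Value: aws.String(eip)},
					},
				},
			}},
		},
	})
	return err
}

func main() {
	svc := route53.New(session.Must(session.NewSession()))
	// Placeholder values; in kube-aws these would come from cluster.yaml and the allocated EIP.
	if err := upsertControllerRecord(svc, "Z123EXAMPLE", "kubernetes.example.com", "203.0.113.10"); err != nil {
		log.Fatal(err)
	}
}
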
@pieterlange

pieterlange commented Mar 23, 2016

All good points. Very happy to see you working on making it easier to deploy to existing VPCs!

I'm currently using https://github.com/MonsantoCo/etcd-aws-cluster/ to bootstrap a dedicated etcd cluster (discovery happens by specifying the ASG for the etcd cluster and assigning the appropriate IAM describe roles); a rough sketch of that approach is below.

I'm not too sure about automatically provisioning an AWS Elasticsearch cluster. The AWS native cluster is stuck on a very old ES version. Maybe this'll become a whole lot easier once Kubernetes EBS support matures a bit; then we could just host it in the provisioned kube cluster.
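
Roughly, that discovery amounts to something like the following (a hedged sketch, not the actual etcd-aws-cluster shell script; the ASG name and peer port are assumptions): list the instances of the etcd ASG and build the initial-cluster string from their private IPs.

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	asSvc := autoscaling.New(sess)
	ec2Svc := ec2.New(sess)

	// Look up the members of the etcd ASG (requires IAM describe permissions).
	asgOut, err := asSvc.DescribeAutoScalingGroups(&autoscaling.DescribeAutoScalingGroupsInput{
		AutoScalingGroupNames: []*string{aws.String("my-etcd-asg")}, // assumed name
	})
	if err != nil || len(asgOut.AutoScalingGroups) == 0 {
		log.Fatal("describe ASG: ", err)
	}

	var ids []*string
	for _, inst := range asgOut.AutoScalingGroups[0].Instances {
		ids = append(ids, inst.InstanceId)
	}

	// Resolve each instance ID to its private IP.
	ec2Out, err := ec2Svc.DescribeInstances(&ec2.DescribeInstancesInput{InstanceIds: ids})
	if err != nil {
		log.Fatal("describe instances: ", err)
	}

	var peers []string
	for _, r := range ec2Out.Reservations {
		for _, inst := range r.Instances {
			peers = append(peers, fmt.Sprintf("%s=https://%s:2380", *inst.InstanceId, *inst.PrivateIpAddress))
		}
	}

	// This value would feed ETCD_INITIAL_CLUSTER on each member.
	fmt.Println(strings.Join(peers, ","))
}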

@bfallik

Contributor

bfallik commented Mar 23, 2016

Very eager for this work and happy to help if I can. I can't advocate for deploying a k8s+coreos cluster in AWS at work until I have a good answer for many of the items on this list, especially the upgrade path and high availability.

@colhom

Contributor

colhom commented Mar 24, 2016

@bfallik do you want to work on any of the bullet points in particular?

@bfallik

Contributor

bfallik commented Mar 24, 2016

@colhom nothing in particular, though I suppose I'm most interested in the cluster upgrades and the ELB+ASG work.

@pieterlange

pieterlange commented Mar 24, 2016

@colhom if you like the discovery method used for etcd, I think I can help with that.

@colhom

Contributor

colhom commented Mar 25, 2016

@pieterlange putting etcd in an autoscaling group worries me as of now. The MonsantoCo script seems kind of rickety: for example, it does not support scaling down the cluster, as far as I can tell.

@drewblas

drewblas commented Apr 5, 2016

This list is fantastic. It represents exactly what we need in order to consider Kubernetes+CoreOS production ready for our use. I can't wait to see these executed!

@krancour

krancour commented Apr 8, 2016

This is just what I have always wanted!

cgag added a commit to cgag/coreos-kubernetes that referenced this issue Apr 12, 2016

kube-aws: add option to create a DNS record automatically
Currently it's on the user to create a record, via Route53 or otherwise,
in order to make the controller IP accessible via externalDNSName.  This
commit adds an option to automatically create a Route53 record in a given
hosted zone.

Related to: #340, #257
@pieterlange

pieterlange commented Apr 18, 2016

@colhom I suggest adding #420 to the list as well, since even the deployment guidelines point it out as a production deficiency.

You are right about having etcd in an autoscaling group, of course. I'm running a dedicated etcd cluster across all availability zones, which feels a little safer but is still a hazard, since I'm depending on a majority of the etcd cluster staying up and reachable. Not sure what the answer is here.

I'm spending some time on HA controllers myself; I'll try to make whatever adjustments I make mergeable.

@mumoshu

Contributor

mumoshu commented Apr 26, 2016

@colhom Hi, thanks for maintaining this project :)

Enable decommissioning of kubelets when instances are rotated out of the ASG
Automatically remove nodes when instances are rotated out of ASG

Would you mind sharing what you think the requirements are, and how you would like this done?
Does running something like kubectl drain against a node, when it is scheduled to be detached/terminated from an ASG, make sense to you?

If so, I guess I can contribute that (auto scaling lifecycle hooks + SQS + a tiny golang app container which runs kubectl drain or similar on each new SQS message).
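
A rough sketch of what I have in mind (the queue URL, hook wiring, and node-name lookup below are placeholder assumptions, not an existing implementation):

package main

import (
	"encoding/json"
	"log"
	"os/exec"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
	"github.com/aws/aws-sdk-go/service/sqs"
)

// lifecycleMessage is the subset of the ASG lifecycle notification we need.
// This assumes the hook notification lands in the queue as plain JSON.
type lifecycleMessage struct {
	LifecycleTransition  string
	AutoScalingGroupName string
	LifecycleHookName    string
	LifecycleActionToken string
	EC2InstanceId        string
}

func main() {
	sess := session.Must(session.NewSession())
	sqsSvc := sqs.New(sess)
	asSvc := autoscaling.New(sess)
	queueURL := "https://sqs.us-west-2.amazonaws.com/123456789012/node-drain" // assumed

	for {
		out, err := sqsSvc.ReceiveMessage(&sqs.ReceiveMessageInput{
			QueueUrl:        aws.String(queueURL),
			WaitTimeSeconds: aws.Int64(20),
		})
		if err != nil {
			log.Println("receive:", err)
			continue
		}
		for _, m := range out.Messages {
			var msg lifecycleMessage
			if err := json.Unmarshal([]byte(*m.Body), &msg); err != nil || msg.LifecycleTransition != "autoscaling:EC2_INSTANCE_TERMINATING" {
				continue
			}
			// Drain the node that is about to be terminated.
			node := resolveNodeName(msg.EC2InstanceId)
			if err := exec.Command("kubectl", "drain", node, "--force", "--ignore-daemonsets").Run(); err != nil {
				log.Println("drain:", err)
			}
			// Tell the ASG it may proceed with termination.
			if _, err := asSvc.CompleteLifecycleAction(&autoscaling.CompleteLifecycleActionInput{
				AutoScalingGroupName:  aws.String(msg.AutoScalingGroupName),
				LifecycleHookName:     aws.String(msg.LifecycleHookName),
				LifecycleActionToken:  aws.String(msg.LifecycleActionToken),
				LifecycleActionResult: aws.String("CONTINUE"),
			}); err != nil {
				log.Println("complete lifecycle action:", err)
			}
			if _, err := sqsSvc.DeleteMessage(&sqs.DeleteMessageInput{
				QueueUrl:      aws.String(queueURL),
				ReceiptHandle: m.ReceiptHandle,
			}); err != nil {
				log.Println("delete message:", err)
			}
		}
	}
}

// resolveNodeName is a placeholder: a real implementation would look up the
// instance's private DNS name, which the kubelet uses as the node name on AWS.
func resolveNodeName(instanceID string) string { return instanceID }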

@colhom

Contributor

colhom commented Apr 26, 2016

I was thinking that nodes would trigger kubectl drain via a systemd service on shutdown.

@mumoshu

Contributor

mumoshu commented Apr 29, 2016

@colhom Sounds much better than my idea in terms of simplicity!

I'd like to contribute that, but I'm not sure what to include in the kubeconfig used by the worker's `kubectl`.
Would you mind sharing your ideas?

(Also, we may want to create a separate issue for this.)

@mumoshu

Contributor

mumoshu commented May 3, 2016

@colhom

Set up controller and worker AutoscalingGroups to recover from EC2 instance failures

I believe this is solved for workers by PR #439.
I'd appreciate it if you could update your original comment on this issue to reference that (just to track how things are going; I'm a huge fan of this project and wish for this issue to be solved 😄).

@colhom

Contributor

colhom commented May 3, 2016

@mumoshu that excerpt is referring to the fact that our controllers are not in an autoscaling group, and if the instance is killed the control plane will be down pending human intervention. I do believe the worker pool ASG should recover from an instance failure on its own, though. I will edit that line to reference just the controller.

@mumoshu

Contributor

mumoshu commented May 5, 2016

@colhom I have just submitted #465 for #340 (comment)
I'd appreciate it if you could look into it 🙇

@mumoshu

Contributor

mumoshu commented May 5, 2016

FYI, regarding this:

Dedicated etcd cluster in an ASG, behind an ELB

In addition to MonsantoCo/etcd-aws-cluster, which @pieterlange mentioned, I have recently looked into crewjam/etcd-aws and its accompanying blog post. It seems to be great work.

mumoshu added a commit to mumoshu/coreos-kubernetes that referenced this issue May 8, 2016

kube-aws: Drain nodes before shutting them down to give running pods time to gracefully stop.

This change basically achieves it by running `docker run IMAGE kubectl drain THE_NODE --force` on the to-be-shut-down node before the kubelet gets SIGTERM in CoreOS' shutdown process.

Without this change, kubelets getting SIGTERM without a prior `drain` results in non-functional (actually unschedulable) pods keeping the status `Ready`.

With this change, when an ASG's desired capacity is decreased, a node's status changes over time as follows:

On desired cap change:
  STATUS=Ready
On shut-down started:
  STATUS=Ready,SchedulingDisabled (<- Pods are stopped and the status is changed by `kubectl drain`)
On shut-down finished:
  STATUS=NotReady,SchedulingDisabled (<- It's `NotReady`, but it won't result in downtime because we have already stopped both further scheduling and the pods)
After a minute:
  The node disappears from the output of `kubectl get nodes`

Note that:

* This applies to manual shutdowns (via running `sudo systemctl shutdown`, for example) and automated shutdowns (triggered by AWS AutoScaling when nodes get rotated out of a group).
* We currently depend on the community docker image `mumoshu/kubectl` because the `kubectl` included in the official `coreos/hyperkube` image doesn't work due to the Kubernetes issue kubernetes/kubernetes#24088. Once the issue is fixed and the CoreOS team publishes a new hyperkube image with the updated Kubernetes, we can remove that dependency.
* The author considers this an experimental feature, so you shouldn't expect the configuration API around this to be stable. It may change in the future.

ref #340

Re-format code introduced in the previous commit with gofmt to conform to the rules and make the build pass.
@mgoodness

mgoodness commented May 12, 2016

Re: etcd clustering via ASG, there's also sttts/elastic-etcd. I believe the author presented at CoreOS Fest this week.

@pdressel

pdressel commented Jun 7, 2016

I submitted #525, implementing a dedicated etcd cluster. It does not use autoscaling groups but instead relies on EBS and instance recovery, since IMHO this is the simplest and most robust way of deploying. Putting three nodes each in their own AZ should make etcd very robust and also paves the way to easily putting the controllers in an ASG.

With this solution, we don't need to worry about potentially catastrophic failures in etcd runtime reconfiguration, and we also don't need an ELB, which would incur extra costs.

@enxebre

Member

enxebre commented Jun 7, 2016

We've been working on https://github.com/Capgemini/kubeform, which is based on Terraform, Ansible + CoreOS and is in line with some of the thinking here. Happy to help contribute to something here.

@harsha-y

Contributor

harsha-y commented Jun 10, 2016

When multi-AZ support was announced, combined with #346 being checked off in the list above, we got excited and tried to deploy a kube-aws cluster without actually verifying that existing subnets are supported. Obviously we ran into issues. What we ended up doing was taking the CF template output after running kube-aws init/kube-aws render, editing the template to include our existing subnets, and launching the cluster with aws cloudformation. After a bit of hacking we did end up with a working cluster in our existing subnets, but this solution seems brittle.

Here are a few things that IMHO would make the cluster launch more "productionized":

  • Support for existing subnets
  • Private subnets/instances with no public IP address
    • Leave AWS managed NAT gateway attachment/routing to the end users?
    • This was partially accomplished in the cluster we launched, but the controller still assigned itself an EIP in a private subnet
  • Clear upgrade paths
  • Docker pull-through cache registry on the master
  • More granularity around the different addons
    • Would most definitely include SkyDNS and Kubernetes Dashboard by default, but...
    • Folks might have alternate solutions around Calico, Heapster (Prometheus/Sysdig) and Fluentd-ELK (Fluentd-Graylog2) -- these should be optional

Maybe this list should be split into must-haves vs. nice-to-haves? Or better, layers of CloudFormation templates? (I might be over-simplifying things here, but you get the idea.)
Something along the lines of:

  • kube-aws up cluster
  • kube-aws up calico
  • kube-aws up logging

When we initially launched our k8s clusters last year, there were very few solutions that solved some of the requirements we had. So we went ahead and wrote a lengthy but working CloudFormation template, and that solved most of our reqs. But we ended up with a template that was hard to maintain and a cluster that needed to be replaced whenever we wanted to upgrade/patch -- which doesn't really work well when you're running production workloads unless you have some serious orchestration around the cluster. The current toolset (kargo/kube-aws) around CoreOS/Kubernetes still leaves much to be desired.

@igalbk

igalbk commented Jul 12, 2016

@harsha-y Thank you for this info.
We are trying to do the same and modify the template output so that CloudFormation uses an existing subnet.
Can you please share exactly how to do it and what exactly to modify?

@sdouche

sdouche commented Jul 12, 2016

Hi @igalbk,
Today I successfully installed CoreOS-Kubernetes on an existing subnet (only one, it's a POC) with existing IAM roles. What I did:

  • remove the creation of subnet0 from the CF template
  • remove the creation of IAM* from the CF template
  • remove the creation of EIPController from the CF template
  • remove the creation of RouteTableAssociation from the CF template
  • add the subnet and IAMInstanceProfile* as parameters
  • use the private IP of the controller for the DNS record

And in cluster.go (quick & dirty, I'm sorry):

-       if err := c.ValidateExistingVPC(*existingVPC.CidrBlock, subnetCIDRS); err != nil {
-               return fmt.Errorf("error validating existing VPC: %v", err)
-       }
+       //if err := c.ValidateExistingVPC(*existingVPC.CidrBlock, subnetCIDRS); err != nil {
+       //      return fmt.Errorf("error validating existing VPC: %v", err)
+       //}

        return nil
 }
@@ -266,7 +266,7 @@ func (c *Cluster) Info() (*Info, error) {
        cfSvc := cloudformation.New(c.session)
        resp, err := cfSvc.DescribeStackResource(
                &cloudformation.DescribeStackResourceInput{
-                       LogicalResourceId: aws.String("EIPController"),
+                       LogicalResourceId: aws.String("InstanceController"),
                        StackName:         aws.String(c.ClusterName),
                },
        )

Stack creation works, but I get an error with the kubernetes-wrapper (need to investigate).

@sdouche

sdouche commented Jul 12, 2016

@igalbk I can send you patches if you want. Thanks for your support.

@igalbk

igalbk commented Jul 13, 2016

Thank you @sdouche.
It would be great if you could send me or share more details.
I can successfully create a CloudFormation stack after modifying the `kube-aws up --export` output in the same VPC+subnet, but the cluster itself is not functional because flannel.service is not starting; maybe I did something wrong, but I don't know what. BTW, we don't need to use an existing IAM role.
Is that a must? And why did you have to remove the EIPController?

@sdouche

sdouche commented Jul 13, 2016

  1. I can't create the roles (they are created and validated by ops). Thanks to AWS for allowing roles to be created with more power than the role's creator.
  2. Elastic IPs work only on public subnets. You can't use them for private clusters.
  3. Have you set a /16 for the podCIDR? Flannel uses a /24 for each node. Hard to say what's wrong without logs. I think it's not a CF issue, more a k8s one.
@sdouche

sdouche commented Jul 20, 2016

Update: I can't create an ELB (see kubernetes/kubernetes#29298) with an existing subnet.

EDIT: You must add the service.beta.kubernetes.io/aws-load-balancer-internal annotation to create an internal ELB.

@detiber

detiber commented Jul 21, 2016

@dgoodwin not sure if you've seen this.

@colhom

Contributor

colhom commented Aug 2, 2016

Update here on work that is closing in on being ready for review:

  • discrete etcd cluster w/ TLS PR - #544
  • HA/cross zone control plane - #596
  • Cluster upgrades... - coming next!

@colhom colhom referenced this issue Aug 9, 2016

Closed

Cluster upgrades #608

@colhom

Contributor

colhom commented Aug 9, 2016

Cluster upgrade PR is in #608

@AlmogBaku

AlmogBaku commented Sep 2, 2016

Heapster now fully supports the Elasticsearch sink (including hosted ES clusters on AWS):
kubernetes/heapster#733 MERGED
kubernetes/heapster#1260 MERGED
kubernetes/heapster#1276 MERGED

Documentation:
https://github.com/kubernetes/heapster/blob/master/docs/sink-configuration.md#aws-integration

@AlmogBaku

AlmogBaku commented Sep 2, 2016

Fluentd integration depends on #650, plus making an image that is preconfigured for the cluster (we can contribute that; cc @Thermi).

@AlmogBaku

AlmogBaku commented Sep 26, 2016

kubernetes/heapster#1313 -- this PR will fix the ES sink compatibility. However, since AWS doesn't allow "scripted fields", it's still impossible to calculate resource usage rate as a percentage of capacity.

@aaronlevy

Member

aaronlevy commented Nov 17, 2016

The kube-aws tool has been moved to its own top-level repository at https://github.com/coreos/kube-aws

If this issue still needs to be addressed, please re-open it under the new repository.

@aaronlevy aaronlevy closed this Nov 17, 2016

@drewblas

drewblas commented Nov 17, 2016

No worries, nobody cares about production quality deploys. That'd be ridiculous...

@colhom

Contributor

colhom commented Nov 17, 2016

@drewblas the project has simply moved to a new repo, where significant progress has been made in merging functionality towards these goals in the last few weeks.

@aaronlevy

Member

aaronlevy commented Nov 17, 2016

@drewblas Sorry, I was copy/pasting a generic notice. As @colhom said, a lot of the work towards these goals is being merged there.
