vpc asg / elb issues #486

Open
tjbaker opened this Issue Mar 4, 2014 · 17 comments

@tjbaker
tjbaker commented Mar 4, 2014

Running 1.4.2 compiled from commit 5623fad, as a standalone jar or as a war on Tomcat, I get errors when trying to interact with ELBs and ASGs in a VPC. When creating a new ASG, the UI correctly shows all my subnets grouped by the immutable_metadata "purpose". However, after choosing the desired subnet group and attempting to create the ASG, I get the following exception:

Launch Config 'stg-Web-external-20140304182434' has been created. Could not create AutoScaling Group: com.amazonaws.AmazonServiceException: Status Code: 400, AWS Service: AmazonAutoScaling, AWS Request ID: 3cdf5f7e-a3ca-11e3-a648-c7cb9ecd624d, AWS Error Code: ValidationError, AWS Error Message: At least one Availability Zone or VPC Subnet is required.. Launch Config 'stg-Web-external-20140304182434' has been deleted.

Even though I selected the VPC subnet grouping (all same "purpose") the Availability Zone chooser is not populated, so it seems the VPC subnet choice from which the AZs can be inferred is not being inspected. I presume no AZ or VPC subnets are being passed to the API call resulting in the error.

I've attached a screenshot showing the populated VPC section but blank AZ.

After the error, the page refreshes with "Launch non-VPC instances" selected and the AZ chooser shows the default a/c/d zones.
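
To make the expected behavior concrete, here is a minimal sketch in Python. It is not Asgard's actual code: the record shape, the sample `SUBNETS` data, and the function names are hypothetical, and the `target` filter is an assumption about how ASG subnets are distinguished from ELB subnets. `VPCZoneIdentifier` (a comma-separated list of subnet IDs) and `AvailabilityZones` are the placement parameters of the AWS CreateAutoScalingGroup call. The idea is that a chosen "purpose" should always resolve to a non-empty set of subnets and zones before the request is sent.

```python
import json
from collections import defaultdict

# Hypothetical subnet records, loosely modeled on what /subnet/list displays.
# The "immutable_metadata" tag holds a JSON string with purpose and target.
SUBNETS = [
    {"subnet_id": "subnet-aaa111", "zone": "us-east-1a",
     "tags": {"immutable_metadata": '{"purpose": "external", "target": "ec2"}'}},
    {"subnet_id": "subnet-bbb222", "zone": "us-east-1c",
     "tags": {"immutable_metadata": '{"purpose": "external", "target": "ec2"}'}},
    {"subnet_id": "subnet-ccc333", "zone": "us-east-1a",
     "tags": {"immutable_metadata": '{"purpose": "external", "target": "elb"}'}},
    {"subnet_id": "subnet-ddd444", "zone": "us-east-1d",
     "tags": {"immutable_metadata": '{"purpose": "internal", "target": "ec2"}'}},
    {"subnet_id": "subnet-eee555", "zone": "us-east-1d", "tags": {}},
]


def metadata_of(subnet):
    """Parse the immutable_metadata tag; return {} if it is missing or invalid."""
    raw = subnet["tags"].get("immutable_metadata")
    if not raw:
        return {}
    try:
        parsed = json.loads(raw)
    except ValueError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def group_by_purpose(subnets, target):
    """Group subnets with the given target by their purpose."""
    groups = defaultdict(list)
    for subnet in subnets:
        meta = metadata_of(subnet)
        if meta.get("target") == target and "purpose" in meta:
            groups[meta["purpose"]].append(subnet)
    return dict(groups)


def asg_placement(subnets, purpose, target="ec2"):
    """Build the placement part of a CreateAutoScalingGroup request."""
    chosen = group_by_purpose(subnets, target).get(purpose, [])
    if not chosen:
        # Fail early instead of sending a request with no AZ and no subnet.
        raise ValueError("no %s subnets found for purpose %r" % (target, purpose))
    return {
        "VPCZoneIdentifier": ",".join(sorted(s["subnet_id"] for s in chosen)),
        "AvailabilityZones": sorted({s["zone"] for s in chosen}),
    }


if __name__ == "__main__":
    print(asg_placement(SUBNETS, "external"))
```

With a lookup like this, an empty result for a purpose that is visibly populated in the UI would point at the purpose-to-zone lookup rather than at the subnet tags themselves.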

[Screenshot, 2014-03-04 1:24 pm: populated VPC section with a blank Availability Zone chooser]

@claymccoy
Contributor

It is possible that the space or dash in the purpose could be causing a problem.
There is a hidden screen that will show your subnets. I go there to start debugging this sort of thing.
http:////subnet/list
Something is wrong where the zones are looked up based on the purpose.

@tjbaker
tjbaker commented Mar 4, 2014

All purpose related subnets appear correct in /subnet/list
i.e., they have the correct subnet, VPC ID, and matching purpose and target values.

I'll remove the spaces and dash and try again.

@tjbaker
tjbaker commented Mar 4, 2014

I removed the spaces and converted the dash to underscore but still get the same exception. Can I change a log level somewhere to get more info?

The ELB is in a public VPC subnet and directs traffic to servers in a private subnet via VPC security group ingress/egress rules. I think this is a pretty standard setup, though.

@claymccoy claymccoy was assigned by danveloper Mar 7, 2014
@jperry
jperry commented Mar 12, 2014

any status on this issue?

@danveloper
Contributor

Sorry, Jay, we've been a little backed up internally. This issue is definitely on our radar and as soon as we get some bandwidth we'll get it sorted. Sorry for the issue.

@jperry
jperry commented Mar 12, 2014

@danveloper thanks! Any idea though if the bug was introduced in a specific version or is still being determined?

@danveloper
Contributor

We're still investigating the issue, but I'll update as we learn more. Thanks for your patience!

@jperry
jperry commented Mar 12, 2014

okay, thanks again!

@jperry
jperry commented Apr 4, 2014

@danveloper - bump?

@danveloper
Contributor

@jperry sorry for the delay in getting back to you :-( ... The short answer to this is that I'm keeping this issue open to remind me to double check this issue. We're doing a MAJOR overhaul on Asgard -- details to come soon, but this problem is on the radar for the refactor.

@jperry
jperry commented Apr 7, 2014

@danveloper - thanks for the follow-up. Any ETA on a timeline for all of this? Sorry if I'm asking too much. I'm just really excited about what Asgard can offer my company and want to start using it ASAP. Thanks again!

@danveloper
Contributor

I'm hoping that we can get some of the details and timetables out this week, but will know more soon.

@jperry
jperry commented Apr 9, 2014

@danveloper thanks!

@jperry
jperry commented May 13, 2014

@danveloper - checking in. Any update on when details and timetables will be out? Thanks!

@danveloper
Contributor

Hi Jay,

Yes, I can provide some details now about what we're doing with respect to Asgard 2. First, though, it's probably worthwhile to outline the problem space that we're facing, and why we're going through a major architectural refactoring of the existing Asgard project.

In terms of a continuous delivery and continuous deployment pipeline, Asgard is really just the back side to that goal. As in, the project needs to be built, tested, and packaged, and then that package needs to be staged for a deployment. With Asgard and AWS, that "staging" comes in the form of an AMI that is pre-cooked and ready to be sent off for deployment into an AutoScaling group. But there are really big parts that lead up to the point where we have an AMI that is ready to be shipped, and it's something that we can't ignore if we want to get the "full picture".

Internal to Netflix, we have a tool that automates the aspects of continuous delivery for our deployment pipelines. That tool is called Mimir, and it allows teams to build workflows that automate the stages of delivery and deployment, from SCM to an ASG accepting traffic, and everything in between. Mimir integrates with Asgard to provide the deployment functions, but, in general, the two systems are autonomous and self-contained. Although they share common concepts, like "applications" and "clusters", they have no common data model, and, in fact, they derive some of those details in very different ways. Mimir was written as a prototype to demonstrate that continuous-delivery-at-scale is viable; it has been very successful and widely adopted, to the point that a large portion of Netflix teams now use it to automate their builds, bakes, and deployments.

As we move forward with supporting the functions of continuous delivery and deployment, we want to solidify the concepts that make Mimir and Asgard respectively valuable tools, and to bridge the gap between the two components. To that end, Asgard 2 will focus on the full breadth of what it means to "deploy to the cloud". At a very high level, Asgard 2 will fold the functionality of Mimir and Asgard into a single platform that operates within a common context and understanding of how we view the cloud.

At a lower level, we're looking to improve and modularize the code bases of both projects to facilitate greater extensibility, which will allow us to deliver features and fixes with much greater velocity. From a practical standpoint, we're also trying to support a much higher level of engagement within the code base. In the past year, the Delivery Engineering team at Netflix has grown from just two people working on the Asgard code base to eight, and that number will surely grow as we start to need more people working to improve the platform. Given that, we need the ability to operate in a much more Agile manner, and the current code bases of Asgard and Mimir aren't amenable to that goal.

From a lower level, and as a strategic maneuver for project architecture and management, we are building the Asgard 2 platform as a set of independent micro-services, which join together to service the common goal of cloud delivery and deployment. Each function in the architecture is able to exist within its own repository and code base, and is also able to be deployed independently of the greater platform and with a level of scalability that neither Asgard 1 nor Mimir has today.

These micro-services are being built on top of Spring Boot. Boot gives us all of the rapid development aspects that we love about Grails, it enables us to integrate with other parts of the Spring ecosystem (another factor we love about Grails), and its deployable footprint can be as low as a few dozen megabytes. In addition, Boot's Gradle integration (our preferred method) allows us to choose our packaging strategy -- war or runnable jar. To that end, we can deploy slices of the platform on the same instance, which will better utilize our usage footprint with Amazon. Moreover, Boot applications are inherently libraries, which means that we can compose components of the platform as necessary. This gives much more agility (little 'a') in refactoring as we go forward.

It's important to note, however, that this is NOT a departure from Grails. About three months ago, as we were evaluating how the next phase of Asgard and Mimir would look, we knew two things: 1) we need the components to be micro-services (for all the reasons mentioned); and 2) we LOVE Grails. Grails, in its present-day form, however, is not a great framework for building micro-services that are lightweight deployables. On top of that, there is no migration story for Grails 2.x (current version) to Grails 3, which leaves us a little trepidatious about adopting it for a near-term major refactoring. A much more suitable migration path to Grails 3 is through Spring Boot. The Grails core team is working hard to extract into their own libraries the parts of Grails that make it so great, where those libraries are capable of being used independently of the core framework. You may have seen their recent releases of standalone GORM and standalone GSP libraries, which are both able to work inside of Boot applications. We very much want to circle back to Grails when the time is right, and structuring our architecture in a micro-service way gives us a much better path to do that in the future.

From an open source perspective, we are working diligently to ensure that the respective pieces of the Asgard 2 platform have a really good open source story -- as in, "how do I put all this stuff together?". We want to be sure that the open source community can leverage all or a slice of the platform, as is appropriate for their needs. To that end, we're looking at lxc technologies, like Docker, to ensure that we can bring an end-to-end experience that is not-so-dissimilar from the experience that you get today. For outstanding issues on Asgard 1, we're addressing those internally to Asgard 2, and we're working to find the right (easy) way to bring those issues into context with respect to GitHub. We'll have many more details around our open source story in the coming weeks and months, but rest assured that it is at the forefront of our development workflows.

@kalosoid
kalosoid commented Sep 9, 2014

@danveloper - Any update on Asgard 2.0? Thanks!

@marco-hoyer

It would be nice to hear something about Asgard 2. We're also waiting on any news about it. Did you maybe think about integrating with CloudFormation as the infrastructure management tool, in combination with Asgard as a tool to manage parts of the infrastructure dynamically?
