Example to create multiple instances #155

Open
vsoch opened this issue May 11, 2023 · 24 comments

Comments

@vsoch

vsoch commented May 11, 2023

Describe the Feature

I'd like to know how to modify this example https://github.com/cloudposse/terraform-aws-ec2-instance/blob/master/examples/complete/main.tf for multiple EC2 instances

Expected Behavior

NA

Use Case

Bringing up a small networked cluster

Describe Ideal Solution

A similar example in the examples folder for multiple networked EC2 instances from custom AMI.

Alternatives Considered

No response

Additional Context

No response

@jamengual
Contributor

If you need instances not on an autoscaling group, you can use a for_each or count on the module.

if you can put them on an ASG then use the asg module.

@vsoch
Author

vsoch commented May 11, 2023

okay let me put this together - so I'd take this part of that spec:

module "ec2_instance" {
  source = "../../"

  ssh_key_pair                = module.aws_key_pair.key_name
  vpc_id                      = module.vpc.vpc_id
  subnet                      = module.subnets.private_subnet_ids[0]
  security_groups             = [module.vpc.vpc_default_security_group_id]
  assign_eip_address          = var.assign_eip_address
  associate_public_ip_address = var.associate_public_ip_address
  instance_type               = var.instance_type
  security_group_rules        = var.security_group_rules
  instance_profile            = aws_iam_instance_profile.test.name
  tenancy                     = var.tenancy

  context = module.this.context
}

and reading about for_each I would do something like this?

module "ec2_instance" {
  source = "../../"
  # This would be a 4 node cluster
  for_each = toset( ["0", "1", "2", "3"] )
 
 # Where would I specify a name or a hostname?
  name = "instance-${each.key}"
  ssh_key_pair                = module.aws_key_pair.key_name
  vpc_id                      = module.vpc.vpc_id
  subnet                      = module.subnets.private_subnet_ids[0]
  security_groups             = [module.vpc.vpc_default_security_group_id]
  assign_eip_address          = var.assign_eip_address
  associate_public_ip_address = var.associate_public_ip_address
  instance_type               = var.instance_type
  security_group_rules        = var.security_group_rules
  instance_profile            = aws_iam_instance_profile.test.name
  tenancy                     = var.tenancy

  context = module.this.context
}

Or should I just set instance count to the number that I want? In which case, how would I be able to know their hostnames in advance? Thanks for the help! Sorry I'm new to this.
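One way to know the names ahead of time is to derive both the `for_each` keys and the expected names from a single node count. The sketch below is illustrative only: `node_count`, `node_keys`, `node_names`, and `node_range` are assumed names, and the `instance-` prefix simply mirrors the snippet above. Note that `name` feeds the module's naming and tags; the operating system hostname would still have to be set separately, for example in user data.

```hcl
variable "node_count" {
  description = "Number of instances in the cluster (hypothetical variable)"
  type        = number
  default     = 4
}

locals {
  # String keys "0".."3", usable in the module as: for_each = toset(local.node_keys)
  node_keys = [for i in range(var.node_count) : tostring(i)]

  # Names that match: name = "instance-<key>" in the module block
  node_names = [for k in local.node_keys : "instance-${k}"]

  # Compact bracketed range, e.g. "instance-[0-3]" for four nodes
  node_range = "instance-[0-${var.node_count - 1}]"
}

output "node_names" {
  value = local.node_names
}

output "node_range" {
  value = local.node_range
}
```

Because the names depend only on the count, they are known at plan time, before any instance exists.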

@jamengual
Contributor

the hostname is something you will have to set up, since the instances will pick the instance id as hostname i-dsad3424dsfds

if you are building a cluster that may or may not autoscale I will just use the asg module not this one. https://github.com/cloudposse/terraform-aws-ec2-autoscale-group

if you have the requirement to have separated and unique instances that are not that mutable then you could use this module for that with the for_each or count.

@vsoch
Author

vsoch commented May 11, 2023

The cluster will have our job manager, Flux framework, and the different node hostnames need to be known at creation time (and we don't currently support any concept of scaling) so that's why I was looking to this config! The instances can be separated and not unique, but I do need to figure out how to, for example, get all the hostnames in the cluster for the main broker instance. As an example, when I do this in Kubernetes I use an Indexed Job, and then I know the hostname is something like:

flux-sample-0.flux-service.flux-operator.svc.cluster.local

And then the broker config just needs to know the shared DNS network (flux-service.flux-operator.svc.cluster.local) and then the range of hosts (e.g., flux-sample[0-4]). When we set this up in terraform with GCP (and I didn't create these recipes so I understand them superficially) I think we set the hostnames via a variable that goes into metadata for the instance https://github.com/GoogleCloudPlatform/scientific-computing-examples/blob/b6995a84ba084bd55e08d3e09d9a1b8e6715db65/fluxfw-gcp/tf/modules/compute/main.tf#L72-L80 and then during startup we can ping that API to get the exact value. Does that make sense? I'm looking for something similar / simple here - I basically just need a set number of instances (no auto-scaling) that I can somehow retrieve the hostnames for to write into the cluster configs.

@vsoch
Author

vsoch commented May 11, 2023

What about this recipe / setup under this same org? I see there is a DNS -> hostname https://github.com/clouddrove/terraform-aws-ec2/blob/edcac308fa17b8135b7813e643c2f7448306c01e/main.tf#L193

@jamengual
Contributor

jamengual commented May 11, 2023

it all depends on what you are doing, I have no idea about your requirements so I can only assume.

if the software you are using requires a valid domain name record to point to the instance then you can use that.
if it uses the hostname then you can add a userdata script that sets the hostname in a predictable format.

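#!/bin/bash
# Shell script, so it starts with a shebang; a "#cloud-config" header would be parsed as YAML instead.
# Rendered through a Terraform template: $${...} is left for the shell, ${...} is filled in by Terraform.
instanceid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id | sed 's/i-//g')
hostnamectl set-hostname "cassandra-$${instanceid}.${namespace}-${environment}"
echo "cassandra-$${instanceid}.${namespace}-${environment}" > /etc/hostname
hostname -F /etc/hostname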

@vsoch
Author

vsoch commented May 11, 2023

okay I've been able to use the recipe to deploy two nodes, and I'm looking at the metadata above!

instanceid=$(curl -s http://169.254.169.254/latest/meta-data/instance-id | sed 's/i-//g')
[rocky@ip-172-16-215-227 ~]$ echo $instanceid 
0b5058dd4d3be6142

Stupid question - where did you get http://169.254.169.254 from (because it works for me too)? I'm assuming the above would be able to set a hostname based on the instance id (which I can't control?) from within a single node, but I would not be able to request getting a set of instance ids? And if I were to set up the DNS section of the config (that I linked to above) that would be done manually on S3 first, and then do the instances get a predictable name?

I think the requirements are fairly loose - I just need an ip address / hostname that one instance can see for the other. I am fairly indifferent about how that is accomplished. If we could walk through the logic of one simple way, I'd be really appreciative!

For the above - if we know the instance id (or some other metadata variable) that is unique to the instance across all instances, this would actually be perfect. I would set up a user data section that can run on each to generate an updated hostname and add the known hostnames to the other nodes generated. They would be added to /etc/hosts. If we created the equivalent of a headless service, then I would still need to know the hostnames for the broker, but wouldn't need to add to /etc/hosts because they would be discovered via DNS.
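For context, `169.254.169.254` is the link-local address of the EC2 instance metadata service. It is reachable only from inside an instance and describes only that instance, so it cannot return the IDs of the other nodes. A minimal sketch of a user data script that reads the instance's own ID and private IP is below, using the token-based (IMDSv2) request flow. The local name `imds_lookup` and the log path are made up for illustration.

```hcl
locals {
  # Shell script held in a heredoc so it can be passed to an instance as user data.
  # Only "${" and "%{" are special inside a Terraform heredoc, so $(...) and $var
  # reach the shell unchanged.
  imds_lookup = <<-EOT
    #!/bin/bash
    base="http://169.254.169.254/latest"
    # Request a short-lived session token (IMDSv2)
    token=$(curl -s -X PUT "$base/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
    # Query this instance's own metadata
    instance_id=$(curl -s -H "X-aws-ec2-metadata-token: $token" "$base/meta-data/instance-id")
    private_ip=$(curl -s -H "X-aws-ec2-metadata-token: $token" "$base/meta-data/local-ipv4")
    echo "$instance_id $private_ip" > /var/log/node-identity.log
  EOT
}
```

Since the instance ID is assigned by AWS at launch, a value chosen in Terraform (such as an index) is usually the easier basis for a predictable hostname.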

@vsoch
Author

vsoch commented May 11, 2023

Maybe I could try setting user-data and seeing if it shows up in this list:

image

And then part of the user data could (somehow) be the count (the for_each part?) although I don't fully understand how that relates to the explicit instance_count (or maybe I could pipe that index in?)

@jamengual
Contributor

you can template the userdata to pass whatever string to it instead of the instanceid, I use that in my case.

do you need the ips as immutable?
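Templating the user data could look like the sketch below, which passes the `for_each` key into the script in place of the instance ID. It assumes the module exposes a `user_data` input and reuses the arguments from the example earlier in this thread; the `flux-` prefix is illustrative.

```hcl
module "ec2_instance" {
  source   = "../../"
  for_each = toset(["0", "1", "2", "3"])

  # Distinct name per instance
  name = "flux-${each.key}"

  # ${each.key} is filled in by Terraform before the script reaches the instance
  user_data = <<-EOT
    #!/bin/bash
    hostnamectl set-hostname "flux-${each.key}"
  EOT

  ssh_key_pair                = module.aws_key_pair.key_name
  vpc_id                      = module.vpc.vpc_id
  subnet                      = module.subnets.private_subnet_ids[0]
  security_groups             = [module.vpc.vpc_default_security_group_id]
  assign_eip_address          = var.assign_eip_address
  associate_public_ip_address = var.associate_public_ip_address
  instance_type               = var.instance_type
  security_group_rules        = var.security_group_rules
  instance_profile            = aws_iam_instance_profile.test.name
  tenancy                     = var.tenancy

  context = module.this.context
}
```

Each instance then boots with a hostname that was fixed at plan time, so the full list of hostnames can be written into every node's configuration.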

@vsoch
Author

vsoch commented May 12, 2023

do you need the ips as immutable?

They only need to be consistent for the lifecycle of a single cluster deployment - the general design is we bring it up, use it (and knowing the ips for a single broker in the cluster allows them to see one another) and then we throw away. We can bring up another one later with completely different ones.

@jamengual
Contributor

I will say use the asg module instead if that is the case

@vsoch
Author

vsoch commented May 12, 2023

Can you tell me how to get the index of the instance for the launch script? I tried setting for each, but it doesn't seem to use it - it tries to create the same one twice (and then tells me it already exists). I found this example https://www.middlewareinventory.com/blog/terraform-aws-ec2-user_data-example/ that has a count.index that it starts at 1 but I don't understand how that's working.
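On the `count.index` question: when a resource or module sets `count = N`, Terraform creates N copies and exposes a zero-based `count.index` inside the block, so an example that starts at 1 is most likely adding one to it. A minimal sketch with a plain `aws_instance` resource follows. `var.ami_id`, `var.subnet_id`, and the `flux-` prefix are placeholders, not values from this thread.

```hcl
variable "ami_id" {
  description = "Custom AMI to launch (placeholder)"
  type        = string
}

variable "subnet_id" {
  description = "Subnet for the instances (placeholder)"
  type        = string
}

resource "aws_instance" "node" {
  count         = 2
  ami           = var.ami_id
  instance_type = "m4.large"
  subnet_id     = var.subnet_id

  # count.index is 0 for the first copy and 1 for the second
  tags = {
    Name = "flux-${count.index}"
  }

  user_data = <<-EOT
    #!/bin/bash
    hostnamectl set-hostname "flux-${count.index}"
  EOT
}
```

The copies are addressed as `aws_instance.node[0]` and `aws_instance.node[1]`. With `for_each`, the key takes the place of the index and is available as `each.key`.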

@vsoch
Author

vsoch commented May 12, 2023

E.g., I can see there are indices here: module.ec2["2"].aws_iam_instance_profile.default[0]

@vsoch
Author

vsoch commented May 12, 2023

Even if I could get an index though, I don't know how to get the instances to see one another. E.g., I manually set one of two to flux-1, but flux-1 cannot see flux-0

[rocky@flux-0 ~]$ ping flux-1
PING flux-1(flux-1 (fe80::10f9:24ff:feb0:836b%eth0)) 56 data bytes
64 bytes from flux-1 (fe80::10f9:24ff:feb0:836b%eth0): icmp_seq=1 ttl=64 time=0.040 ms
64 bytes from flux-1 (fe80::10f9:24ff:feb0:836b%eth0): icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from flux-1 (fe80::10f9:24ff:feb0:836b%eth0): icmp_seq=3 ttl=64 time=0.034 ms
^C
--- flux-1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2057ms
rtt min/avg/max/mdev = 0.034/0.037/0.040/0.002 ms
[rocky@flux-0 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

[rocky@flux-0 ~]$ ping flux-0
ping: flux-0: Name or service not known

@vsoch
Author

vsoch commented May 12, 2023

I will say use the asg module instead if that is the case

The autoscaling module is using kubernetes? We already have a flux operator (for Kubernetes) and we are looking to deploy on bare VMs, hence this approach!

@jamengual
Contributor

jamengual commented May 12, 2023 via email

@vsoch
Author

vsoch commented May 12, 2023

Do you know what this tag is for?
image

@vsoch
Author

vsoch commented May 12, 2023

okay I figured out how to get for_each working, and with custom variables. If I do:

locals {
  multiple_instances = {
    one = {
      instance_type = "m4.large"
    }
    two = {
      instance_type = "m4.large"
    }
    three = {
      instance_type = "m4.large"
    }
  }
}

then in the block we were talking about earlier:

  for_each         = local.multiple_instances

and then I have ${each.key} available as a variable (e.g., to set a hostname), but I ran into an error that the instance profile was already created, so I turned this off. Not sure what the implications of that are:

  instance_profile_enabled = false

But that seemed to bring up the machines, and now they have at least unique hostnames! So the last step is figuring out how to get them to see one another... should I try enabling the DNS?
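Put together, the fragments above might look like the following sketch. Inside the block, `each.key` is the map key (`one`, `two`, `three`) and `each.value` is the object stored under it. The remaining arguments are carried over from the original example and the `flux-` prefix is hypothetical. As for the implications of disabling the instance profile: as far as I can tell, the instances then launch without an IAM role, which should matter only if software on the node needs to call AWS APIs with instance credentials.

```hcl
module "ec2_instance" {
  source   = "../../"
  for_each = local.multiple_instances

  # flux-one, flux-two, flux-three
  name = "flux-${each.key}"

  # Per-instance setting read from the map value
  instance_type = each.value.instance_type

  # As reported above: avoids the "already created" error, at the cost of no IAM role
  instance_profile_enabled = false

  ssh_key_pair                = module.aws_key_pair.key_name
  vpc_id                      = module.vpc.vpc_id
  subnet                      = module.subnets.private_subnet_ids[0]
  security_groups             = [module.vpc.vpc_default_security_group_id]
  assign_eip_address          = var.assign_eip_address
  associate_public_ip_address = var.associate_public_ip_address
  security_group_rules        = var.security_group_rules
  tenancy                     = var.tenancy

  context = module.this.context
}
```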

@vsoch
Author

vsoch commented May 12, 2023

I tried enabling DNS, making a Route 53 zone to get a zone id, and changing the hostnames there to match my instances. They were created correctly, but I can't seem to ping any of them from any instance. What am I missing? Also note I keep seeing this deprecation notice:

╷
│ Warning: Argument is deprecated
│ 
│   with module.vpc.aws_vpc.default[0],
│   on .terraform/modules/vpc/main.tf line 29, in resource "aws_vpc" "default":
│   29:   enable_classiclink               = var.enable_classiclink
│ 
│ With the retirement of EC2-Classic the enable_classiclink attribute has been deprecated and will be removed in a future
│ version.
│ 
│ (and one more similar warning elsewhere)

Going to try the autoscale recipe now - I might be epically failing but at least I'm learning little bits along the way! 😆
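For the DNS attempt described above, one possible arrangement is a private hosted zone attached to the VPC with one A record per instance. This is a sketch under several assumptions: the module is instantiated with `for_each = local.multiple_instances`, it exposes a `private_ip` output, and `flux.internal` is a made-up domain. Records in a private hosted zone resolve only from inside the associated VPC, and the VPC needs DNS support and DNS hostnames enabled. A failed `ping` is also not conclusive on its own, because security groups have to allow ICMP between the nodes; a lookup such as `getent hosts flux-one.flux.internal` checks name resolution separately.

```hcl
locals {
  cluster_domain = "flux.internal" # hypothetical private domain
}

resource "aws_route53_zone" "cluster" {
  name = local.cluster_domain

  # Associating a VPC makes this a private hosted zone
  vpc {
    vpc_id = module.vpc.vpc_id
  }
}

resource "aws_route53_record" "node" {
  for_each = local.multiple_instances

  zone_id = aws_route53_zone.cluster.zone_id
  name    = "flux-${each.key}.${local.cluster_domain}"
  type    = "A"
  ttl     = 60

  # Assumes the instance module exposes a private_ip output
  records = [module.ec2_instance[each.key].private_ip]
}
```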

@vsoch
Author

vsoch commented May 12, 2023

Okay one issue for the other repos (let me know if you want me to report them there or if they belong with a submodule)

│ Error: Error in function call
│ 
│   on .terraform/modules/subnets/outputs.tf line 53, in output "nat_ips":
│   53:   value       = coalescelist(aws_eip.default.*.public_ip, aws_eip.nat_instance.*.public_ip, data.aws_eip.nat_ips.*.public_ip, list(""))
│     ├────────────────
│     │ while calling list(vals...)
│ 
│ Call to function "list" failed: the "list" function was deprecated in Terraform v0.12 and is no longer available; use tolist([ ... ])
│ syntax to write a literal list.

This looks promising, but I'm not able to ssh in. It looks like the recipe there takes "security_groups" but that is actually linked to security group ids (not a spec for groups). Is it possible to add the creation / association of a security group with port 22 open to the autoscale spec? So I don't have to do it manually?
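Creating the security group in Terraform, rather than by hand, could look like the sketch below. It uses the AWS provider's `aws_security_group` resource directly; `var.ssh_cidr` and the `flux-cluster-` prefix are hypothetical. The second ingress rule is an assumption about what the cluster needs: it lets members of the group reach each other on all ports.

```hcl
variable "ssh_cidr" {
  description = "CIDR range allowed to reach port 22 (hypothetical variable)"
  type        = string
}

resource "aws_security_group" "cluster" {
  name_prefix = "flux-cluster-"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "SSH"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.ssh_cidr]
  }

  ingress {
    description = "All traffic between members of this group"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

The resulting `aws_security_group.cluster.id` can then be passed wherever the module expects a list of security group IDs.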

@vsoch
Author

vsoch commented May 12, 2023

oh wait I think I can figure this out!! Be back - will try after dinner. Sorry having fun :)

@vsoch
Author

vsoch commented May 12, 2023

okay! (lol still working on this!) I was able to put pieces together, and I think I have an instance group plus security groups that actually work to allow me to ssh in: https://github.com/converged-computing/flux-terraform-ami/pull/1/files#diff-6e45d26e502f88302f69c4c196babd8939186d9cd298f94caca283c128a2d186.

You can read the description of that PR - the next step (and really the final one I need help with) is to understand how to refer to the different nodes on the network, and how to predict the names so I can put into the user data start script to ensure the broker is ready. I can see that I have a hostname (I didn't do anything yet in userdata to change it):

$ hostname
ip-172-16-100-10.ec2.internal

/etc/hosts isn't populated with anything of interest:

[rocky@ip-172-16-100-10 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

But I think the DNS lookup is set up in /etc/resolv.conf


[rocky@ip-172-16-100-10 ~]$ cat /etc/resolv.conf 
# Generated by NetworkManager
search ec2.internal
nameserver 172.16.0.2

So TLDR: I am hopeful that if we can figure out setting a unique hostname (some count or index from your autoscale module) I can set that, figure out how to refer to it for DNS to resolve, and then be able to write that predictably into the flux broker config file (and everything should work if the networking is good!) And hopefully with this config there is something more concrete for you to see and work with! And I found https://github.com/meltwater/terraform-aws-asg-dns-handler in case that gives us hints about the hostnames.

@vsoch
Author

vsoch commented May 12, 2023

I think this could work if we are able to define a lifecycle and use that module I pointed out... still trying.

@vsoch
Author

vsoch commented May 12, 2023

Is there any way I can reference an index in the user data? I'm really struggling to get anything working.
