Description
Hi,
I want to start a cfncluster with the MASTER node in a public subnet and the COMPUTE nodes in a private subnet. I don't want my compute nodes to be exposed: even though every inbound path to the nodes can be closed through the Security Group, I'm convinced that giving the compute nodes a public IP is an unnecessary security weakness.
My question is:
How can I start a cluster with the compute nodes in a strictly private network (with no Internet access at all)?
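Whether a subnet is "strictly private" in the sense asked about here depends on its route table: a subnet with no default route (0.0.0.0/0) to an internet gateway or NAT device has no Internet access. A plausible but unconfirmed explanation for the failure reported below is that the compute-node bootstrap needs to reach AWS service endpoints, which such a subnet cannot do without VPC endpoints. The sketch classifies where a subnet's default route points, assuming the input has the shape returned by boto3's `ec2.describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["RouteTables"]`. The function names and the route table and subnet IDs in the sample data are hypothetical.

```python
from typing import Any, Dict, List, Optional

RouteTable = Dict[str, Any]


def route_table_for_subnet(tables: List[RouteTable], subnet_id: str) -> Optional[RouteTable]:
    """Return the table explicitly associated with subnet_id, else the VPC's main table."""
    main_table = None
    for table in tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main_table = table  # used by subnets without an explicit association
    return main_table


def default_route_target(tables: List[RouteTable], subnet_id: str) -> str:
    """Classify the 0.0.0.0/0 route: 'internet-gateway', 'nat-gateway', 'other' or 'none'."""
    table = route_table_for_subnet(tables, subnet_id)
    if table is None:
        return "none"
    for route in table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"
        if route.get("NatGatewayId"):
            return "nat-gateway"
        return "other"  # e.g. a NAT instance or another appliance
    return "none"


# Hypothetical sample data in the describe_route_tables response shape.
SAMPLE_TABLES = [
    {"RouteTableId": "rtb-main", "Associations": [{"Main": True}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-123"}]},
    {"RouteTableId": "rtb-private",
     "Associations": [{"SubnetId": "subnet-private", "Main": False}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"}]},
]

print(default_route_target(SAMPLE_TABLES, "subnet-private"))  # none
print(default_route_target(SAMPLE_TABLES, "subnet-public"))   # internet-gateway (main table)
```

A result of "none" for the compute subnet matches the "no Internet access at all" setup described in the question.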
Additional information:
When I try to start such a cluster, the ComputeFleet fails to initialize and cluster creation fails with the following error:
Status: cfncluster-Test - CREATE_FAILED
Cluster creation failed. Failed events:
- AWS::CloudFormation::Stack cfncluster-Test The following resource(s) failed to create: [ComputeFleet].
- AWS::AutoScaling::AutoScalingGroup ComputeFleet Received 2 FAILURE signal(s) out of 2. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement
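The second error line is a threshold check. With initial_queue_size = 2 and a 100% MinSuccessfulInstancesPercent, both compute instances had to signal success, and both signalled FAILURE instead. The sketch below is a simplified model of that check (it ignores CloudFormation's timeout handling and any rounding rules); the helper name is hypothetical.

```python
def creation_policy_satisfied(success_signals: int, total_instances: int,
                              min_successful_pct: int) -> bool:
    """Simplified check: at least min_successful_pct percent of instances signalled success."""
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    # Integer form of success / total >= pct / 100, avoiding floating-point rounding.
    return success_signals * 100 >= min_successful_pct * total_instances


# The failing stack above: 2 FAILURE signals out of 2, so 0 successes.
print(creation_policy_satisfied(0, 2, 100))  # False
print(creation_policy_satisfied(2, 2, 100))  # True
```

With a 100% requirement a single failing compute node is enough to roll back the whole stack.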
In cloud-init.log, I can see that the script /var/lib/cloud/instance/scripts/part-002 fails on the COMPUTE nodes (there is no error on the MASTER, and no error when the COMPUTE nodes are started in a public subnet):
2018-08-28 15:36:38,160 - util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/part-002'] with allowed return codes [0] (shell=False, capture=False)
2018-08-28 15:47:11,727 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-002 [1]
2018-08-28 15:47:11,728 - util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/part-002 [1]
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 802, in runparts
    subp(prefix + [exe_path], capture=False)
  File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 1858, in subp
    cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/part-002']
Exit code: 1
Reason: -
Stdout: -
Stderr: -
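The two timestamps show that part-002 ran for roughly ten and a half minutes before exiting with code 1. A long run followed by a failure with empty Stdout and Stderr is consistent with, but does not prove, the script blocking on a network call until it times out. A minimal sketch that computes the elapsed time from the log lines, assuming cloud-init's "YYYY-mm-dd HH:MM:SS,mmm - " prefix seen above; the function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable

LOG_LINES = [
    "2018-08-28 15:36:38,160 - util.py[DEBUG]: Running command "
    "['/var/lib/cloud/instance/scripts/part-002'] with allowed return codes [0] "
    "(shell=False, capture=False)",
    "2018-08-28 15:47:11,727 - util.py[WARNING]: Failed running "
    "/var/lib/cloud/instance/scripts/part-002 [1]",
]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamp(line: str) -> datetime:
    # Everything before the first " - " is the timestamp; ",160" parses as 0.160 s.
    return datetime.strptime(line.split(" - ", 1)[0], TIMESTAMP_FORMAT)


def script_runtime_seconds(lines: Iterable[str], script: str) -> float:
    """Seconds between the 'Running command' and 'Failed running' entries for script."""
    start = end = None
    for line in lines:
        if "Running command" in line and script in line and start is None:
            start = parse_timestamp(line)
        elif "Failed running" in line and script in line:
            end = parse_timestamp(line)
    if start is None or end is None:
        raise ValueError(f"no start/failure pair found for {script}")
    return (end - start).total_seconds()


elapsed = script_runtime_seconds(LOG_LINES, "part-002")
print(f"part-002 ran for {elapsed:.3f} s before failing")  # part-002 ran for 633.567 s before failing
```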
Please find the configuration I use below:
[aws]
aws_region_name = eu-west-1
aws_access_key_id = xxx
aws_secret_access_key = xxx
[global]
update_check = true
sanity_check = true
cluster_template = TorqueCluster
[cluster TorqueCluster]
vpc_settings = VPC-test
key_name = mykey
scheduler = torque
base_os = centos7
initial_queue_size = 2
max_queue_size = 2
maintain_initial_size = true
master_instance_type = m4.xlarge
compute_instance_type = t2.micro
[vpc VPC-test]
vpc_id = vpc-xxx
master_subnet_id = subnet-<public>
compute_subnet_id = subnet-<private>
vpc_security_group_id = sg-xxx
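The sections reference each other by name: cluster_template in [global] selects [cluster TorqueCluster], whose vpc_settings selects [vpc VPC-test]. A short sketch using Python's standard configparser that follows those references to confirm which subnet each node type is placed in; the helper name is hypothetical and the sample omits the [aws] credentials.

```python
import configparser
from typing import Dict, Optional

CONFIG_TEXT = """
[global]
cluster_template = TorqueCluster
[cluster TorqueCluster]
vpc_settings = VPC-test
initial_queue_size = 2
max_queue_size = 2
[vpc VPC-test]
vpc_id = vpc-xxx
master_subnet_id = subnet-<public>
compute_subnet_id = subnet-<private>
"""


def resolve_subnets(text: str) -> Dict[str, Optional[str]]:
    """Follow global -> cluster -> vpc references and return the subnet IDs in use."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cluster = parser[f"cluster {parser['global']['cluster_template']}"]
    vpc = parser[f"vpc {cluster['vpc_settings']}"]
    return {
        "vpc_id": vpc["vpc_id"],
        "master_subnet_id": vpc["master_subnet_id"],
        # None if compute_subnet_id is not set; this sketch does not model any default.
        "compute_subnet_id": vpc.get("compute_subnet_id"),
    }


print(resolve_subnets(CONFIG_TEXT))
```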
Thanks in advance for your help.