Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sanity check error when providing custom EC2 instance role #1439

Closed
reisingerf opened this issue Nov 11, 2019 · 19 comments
Closed

sanity check error when providing custom EC2 instance role #1439

reisingerf opened this issue Nov 11, 2019 · 19 comments

Comments

@reisingerf
Copy link

Environment:

  • AWS ParallelCluster version 2.4.1
  • OS: alinux
  • Scheduler: Slurm
  • Master instance type: t2.micro
  • Compute instance type: t2.micro

Bug description and how to reproduce:
When I try to provide a custom ec2_iam_role and sanity_check is enabled I receive an error:
IAM role error on user provided role parallelcluster-ec2-instance-role: action ec2:DescribeVolumes is implicitDeny

When turning sanity_check off, the cluster creation seems to work fine and the permissions seem to be added as expected. A quick test showed the cluster working with no apparent issues.

Additional context:
My IAM policy is pretty much a copy of the one use by default (or documented). I just add permissions for SSM, so I can login without SSH. I even used a role name that is compatible with the existing PassRole permission.
I also double checked that the ec2:DescribeVolumes permission is Allowed on all resources.

Any pointers would be much appreciated.

@lukeseawalker
Copy link
Contributor

Hi @reisingerf,
can you please attach the IAM policy so we can double check?

Thanks

@reisingerf
Copy link
Author

Attached.
I also attach the AWS managed policies AmazonSSMManagedInstanceCore and AmazonS3ReadOnlyAccess to that role and there are no permission boundaries set.

policy.txt

@sean-smith
Copy link
Contributor

Can you provide your config file?

@reisingerf
Copy link
Author

[aws]
aws_region_name = ap-southeast-2

[global]
cluster_template = default
update_check     = true
sanity_check     = true


[cluster default]
key_name              = cluster_test
vpc_settings          = public
efs_settings          = customfs
s3_read_resource      = *
master_instance_type  = t2.micro
compute_instance_type = t2.micro
post_install          = s3://my-bucket/parallel_cluster/testcluster/bootstrap.sh
post_install_args     = "R wget"
ec2_iam_role          = parallelcluster-ec2-instance-role
scheduler             = slurm
initial_queue_size    = 0

[vpc public]
vpc_id           = vpc-efg2e648
master_subnet_id = subnet-dcf23a84

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[efs customfs]
shared_dir       = efs
encrypted        = false
performance_mode = generalPurpose

@sean-smith
Copy link
Contributor

Did you run into #1241 ?

@reisingerf
Copy link
Author

Does not sound familiar, no.
Btw: I just pushed my cluster setup (excluding the EC2 role) to GH, in case you want to try and see if the error is reproducible. Again, the create worked fine (as far as I can tell) if I disable the sanity_check.

https://github.com/umccr/infrastructure/tree/parallel_cluster/parallel_cluster/testcluster

@sean-smith
Copy link
Contributor

nice use of ssm btw!

@reisingerf
Copy link
Author

Thanks!
We replaced SSH with SSM login not too long ago and so far we are quite happy with that move.
Also means we don't really need the key in the cluster setup, although it's good to have that as a backup option, I guess.

@sean-smith
Copy link
Contributor

@reisingerf yep, we're going in that direction. Until recently ssm didn't support all the os's that pcluster supports (Ubuntu 1404 being the culprit), now that it does that's definitely in the roadmap.

I was not able to reproduce the issue you're seeing. I've created the exact same role and config and tried with pcluster 2.4.1 and it's working for me. Can you run pcluster version please?

@reisingerf
Copy link
Author

(parallel_cluster) ~ $ pcluster version
2.4.1
(parallel_cluster) ~ $

@reisingerf
Copy link
Author

OK, some progress...
We have an organisation wide SCP that denies operations outside our region:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAllOutsideAU",
            "Effect": "Deny",
            "NotAction": [
                "a4b:*",
                "acm:*",
                "budgets:*",
                "ce:*",
                "chime:*",
                "cloudfront:*",
                "cur:*",
                "globalaccelerator:*",
                "health:*",
                "iam:*",
                "sts:*",
                "importexport:*",
                "mobileanalytics:*",
                "organizations:*",
                "route53:*",
                "route53domains:*",
                "shield:*",
                "support:*",
                "trustedadvisor:*",
                "waf:*",
                "wellarchitected:*",
                "chatbot:*",
                "cloudwatch:*",
                "cloudtrail:*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "ap-southeast-2"
                    ]
                }
            }
        }
    ]
}

When I detach that policy I run into the error you mentioned earlier:

Unexpected error of type ValueError: too many values to unpack (expected 2)

I still can't see the connection with the ec2:DescribeVolumes though...
Would I have to use a region specific Resource instead of * in the policy? I didn't have issues with that before...

@sean-smith
Copy link
Contributor

This is just the simulate_principal_policy() giving an error saying it's not allowed. It's the first action in the policy. See https://github.com/aws/aws-parallelcluster/blob/develop/cli/pcluster/config/validators.py#L347

Can you run echo $AWS_DEFAULT_REGION and see what region is given? It's possible that boto3 in the simulate_principal_policy() call is using a different region than in your config (which is what we use for cluster creation). We get the region here for simulate_principal_policy()

@reisingerf
Copy link
Author

The variable was not set, but even setting it did not change the behaviour.

(parallel_cluster) ~ $ echo $AWS_DEFAULT_REGION
ap-southeast-2
(parallel_cluster) ~ $
(parallel_cluster) ~ $ pcluster create my-test-cluster
Beginning cluster creation for cluster: my-test-cluster
IAM role error on user provided role parallelcluster-ec2-instance-role: action ec2:DescribeVolumes is implicitDeny
See https://aws-parallelcluster.readthedocs.io/en/latest/iam.html
(parallel_cluster) ~ $

@reisingerf
Copy link
Author

reisingerf commented Nov 12, 2019

I think the implicitDeny is due to our SCP. So the remaining issue seems to be the same as #1241, and by the looks of it a fix it's going to be released in version 2.5.0.
So I just disable the sanity_check and wait for the new version.
Any idea when that might happen?

@reisingerf
Copy link
Author

BTW: could the pcluster user policy here:

(
[
"ec2:DescribeVolumes",
"ec2:AttachVolume",
"ec2:DescribeInstanceAttribute",
"ec2:DescribeInstanceStatus",
"ec2:DescribeInstances",
],
"*",
),

be changed to separate out the ec2:AttachVolume action with region and account bound Resources?

Something like:

(
    [
        "ec2:DescribeVolumes",
        "ec2:AttachVolume",
        "ec2:DescribeInstanceAttribute",
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances",
    ],
    "*",
),
(
    [
        "ec2:AttachVolume"
    ],
    [
        "arn:%s:ec2:%s:%s:instance/*" % (partition, region, account_id),
        "arn:%s:ec2:%s:%s:volume/*" % (partition, region, account_id),
    }
)

Perhaps other actions would need to be treated similarly...

Not sure if that would help, but it might just allow us to keep our region restriction SCP?

Or do you see any particular downside?

@sean-smith
Copy link
Contributor

@reisingerf Unfortunately for simulate_principal_policy() those resource arns are too restrictive. Even with those arns it'll still return implicit deny. I'd be open to making that a warning instead of an error message. Seems too draconian as an error.

@reisingerf
Copy link
Author

That would work.

@reisingerf
Copy link
Author

Thanks for all the help!

@demartinofra
Copy link
Contributor

Issue with sanity_check got released as part of version 2.5.0.
Feel free to reopen in case you are still facing any issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants