Regenerate NASA SMCE AWS account credentials every 60 days #2434

Open
Tracked by #2538
yuvipanda opened this issue Mar 29, 2023 · 23 comments · Fixed by #2605
@yuvipanda
Member

yuvipanda commented Mar 29, 2023

Cluster status

Every 60 days (roughly two months), we need to re-generate the deployer credentials for each NASA AWS account we manage. This issue tracks the status of those accounts and describes how to re-generate the credentials. Update the status below once you have re-generated credentials.

| Cluster | Re-generation history |
| --- | --- |
| nasa-esdis | April 23 via #3979, Feb 28 via #3747, Jan 5 via #3575, Dec 6 2023 via #3503 |
| nasa-ghg | April 23 via #3979, Feb 28 via #3747, Jan 5 via #3575, Dec 12 2023 via #3528, Nov 19 2023 via #3442, Oct 4 2023 via #3220 |
| nasa-veda | April 23 via #3979, Feb 28 via #3747, Jan 5 via #3575, Dec 6 2023 via #3506, Nov 19 2023 via #3442, Oct 4 2023 via #3219 |

Upcoming regeneration: 23 April

How to re-generate credentials for deployer

#2339 is related, but this is specifically for the continuous deployer key for individual nasa AWS accounts.

Here's how to do this:

  1. Authenticate yourself with MFA on the CLI; see https://repost.aws/knowledge-center/authenticate-mfa-cli for how.
  2. cd terraform/aws
  3. CLUSTER_NAME=...
  4. # put yourself in the right workspace
    export TF_WORKSPACE=$CLUSTER_NAME
  5. # replace previous credentials with new
    terraform apply -replace=aws_iam_access_key.continuous_deployer -var-file=projects/$CLUSTER_NAME.tfvars
  6. # write new credentials to file, and then encrypt them in place
    terraform output -raw continuous_deployer_creds > ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    sops -i -e ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
  7. # verify function of new credentials
    deployer use-cluster-credentials $CLUSTER_NAME
  8. # add relevant files updated, and then...
    git commit -m "nasa smce clusters: re-generate deployer credentials"
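
For anyone who would rather script steps 5 to 7 than paste them one by one, here is a minimal Python sketch. It only wraps the commands listed above; the function names `regeneration_steps` and `run_steps` are hypothetical and not part of the deployer. It assumes you have already authenticated (step 1) and that it is run from the repository root.

```python
import os
import subprocess
from pathlib import Path

# Terraform resource address taken from step 5 above.
REPLACE_TARGET = "aws_iam_access_key.continuous_deployer"


def regeneration_steps(cluster_name):
    """Return (argv, stdout_path) pairs mirroring steps 5-7 above.

    Paths are relative to terraform/aws, as in the original instructions.
    stdout_path is None unless the command's output is redirected to a file.
    """
    creds = f"../../config/clusters/{cluster_name}/enc-deployer-credentials.secret.json"
    return [
        (["terraform", "apply", f"-replace={REPLACE_TARGET}",
          f"-var-file=projects/{cluster_name}.tfvars"], None),
        (["terraform", "output", "-raw", "continuous_deployer_creds"], creds),
        (["sops", "-i", "-e", creds], None),
        (["deployer", "use-cluster-credentials", cluster_name], None),
    ]


def run_steps(cluster_name, cwd="terraform/aws"):
    """Run each step in order; check=True stops at the first failure."""
    # Step 4: select the terraform workspace through the environment.
    env = os.environ | {"TF_WORKSPACE": cluster_name}
    for argv, stdout_path in regeneration_steps(cluster_name):
        if stdout_path is None:
            subprocess.run(argv, cwd=cwd, env=env, check=True)
        else:
            # Equivalent of the shell redirection `> file` in step 6.
            with open(Path(cwd) / stdout_path, "w") as out:
                subprocess.run(argv, cwd=cwd, env=env, stdout=out, check=True)
```

Calling `run_steps("nasa-veda")` would then run the same four commands as the manual steps; committing the result (step 8) is left as a manual step on purpose.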

yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this issue Aug 4, 2023
@consideRatio consideRatio changed the title Regenerate nasa-veda credentials every 60 days Regenerate nasa-veda and nasa-ghg credentials every 60 days Oct 4, 2023
@consideRatio
Member

I'm authenticating myself like this:

  1. I've set up a .aws/credentials file that references an access key I created:
    # about config: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html
    # about aws cli env vars: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
    # about regenerating creds: https://github.com/2i2c-org/infrastructure/issues/2434
    #
    
    [nasa-veda]
    # https://smce-veda.signin.aws.amazon.com/console
    #
    # TMP=$(aws sts get-session-token --profile nasa-veda --serial-number <...> --token-code <...>)
    # export AWS_ACCESS_KEY_ID=$(echo "$TMP" | jq -r .Credentials.AccessKeyId)
    # export AWS_SECRET_ACCESS_KEY=$(echo "$TMP" | jq -r .Credentials.SecretAccessKey)
    # export AWS_SESSION_TOKEN=$(echo "$TMP" | jq -r .Credentials.SessionToken)
    #
    aws_access_key_id=<...>
    aws_secret_access_key=<...>
    
  2. I then run the following commands to acquire temporary, MFA-backed CLI credentials, relying on aws and jq:
    TMP=$(aws sts get-session-token --profile nasa-ghg --serial-number arn-for-mfa-device --token-code code-from-token)
    export AWS_ACCESS_KEY_ID=$(echo "$TMP" | jq -r .Credentials.AccessKeyId)
    export AWS_SECRET_ACCESS_KEY=$(echo "$TMP" | jq -r .Credentials.SecretAccessKey)
    export AWS_SESSION_TOKEN=$(echo "$TMP" | jq -r .Credentials.SessionToken)
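
The jq extraction above could also be done in Python. This is a small sketch of the mapping from the JSON printed by `aws sts get-session-token` to the three environment variables; `session_env`, `export_lines` and the `SAMPLE` values are made up for illustration and contain no real credentials.

```python
import json
import shlex


def session_env(sts_output):
    """Map `aws sts get-session-token` JSON output to AWS_* variables."""
    creds = json.loads(sts_output)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


def export_lines(env):
    """Render the variables as shell `export` statements, safely quoted."""
    return [f"export {name}={shlex.quote(value)}" for name, value in env.items()]


# Placeholder values only; real output comes from `aws sts get-session-token`.
SAMPLE = json.dumps({
    "Credentials": {
        "AccessKeyId": "EXAMPLEKEYID",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-token",
        "Expiration": "2024-01-01T00:00:00Z",
    }
})

for line in export_lines(session_env(SAMPLE)):
    print(line)
```

The printed lines can be pasted into a shell, which has the same effect as the three `export` commands in step 2.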
    

@sgibson91
Member

sgibson91 commented Oct 4, 2023

@consideRatio there is a command in the deployer that handles MFA for the CLI for these kinds of accounts. Yuvi added it after I struggled with the bucket setup.

Edited to add link (edited again to update link by erik):

@exec_app.command()
def aws(
    profile: str = typer.Argument(..., help="Name of AWS profile to operate on"),
    mfa_device_id: str = typer.Argument(
        ..., help="Full ARN of MFA Device the code is from"
    ),
    auth_token: str = typer.Argument(
        ..., help="6 digit 2 factor authentication code from the MFA device"
    ),
):
    """
    Exec into a shell with appropriate AWS credentials (including MFA)
    """
    # Ask AWS STS for temporary session credentials, authenticated with MFA
    creds = json.loads(
        subprocess.check_output(
            [
                "aws",
                "sts",
                "get-session-token",
                "--serial-number",
                mfa_device_id,
                "--token-code",
                str(auth_token),
                "--profile",
                profile,
            ]
        ).decode()
    )
    # Pass the temporary credentials to the new shell via its environment
    env = os.environ | {
        "AWS_ACCESS_KEY_ID": creds["Credentials"]["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["Credentials"]["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["Credentials"]["SessionToken"],
        "AWS_PROFILE": profile,
    }
    # Start a login shell that inherits the credentials
    subprocess.check_call([os.environ["SHELL"], "-l"], env=env)

@consideRatio

This comment was marked as resolved.

@consideRatio
Member

Re-generated, see #3442!

@yuvipanda

This comment was marked as resolved.

@sgibson91
Member

sgibson91 commented Dec 6, 2023

I'm now having to do this for nasa-esdis. I ran into the same issue Erik described here, and applied the proposed changes since they were expected to be non-disruptive. But I got an error message:

│ Error: creating EFS Mount Target (fs-0013506a2d5ee70fc): MountTargetConflict: mount target already exists in this AZ
│ {
│   RespMetadata: {
│     StatusCode: 409,
│     RequestID: "71f43368-7e14-4949-9856-60dbd0d56a78"
│   },
│   ErrorCode: "MountTargetConflict",
│   Message_: "mount target already exists in this AZ"
│ }
│ 
│   with aws_efs_mount_target.homedirs["subnet-0614449fa3a84a06e"],
│   on efs.tf line 70, in resource "aws_efs_mount_target" "homedirs":
│   70: resource "aws_efs_mount_target" "homedirs" {

EDIT: I worked around this by running terraform apply with -refresh-only, and then a regular apply. I am now at step 6.

@sgibson91 sgibson91 changed the title Regenerate nasa-veda and nasa-ghg credentials every 60 days Regenerate NASA SMCE AWS account credentials every 60 days Dec 6, 2023
@sgibson91
Member

sgibson91 commented Dec 6, 2023

I also generalised the title of this issue a bit

@damianavila
Contributor

We might need a new iteration for nasa-ghg before the EOY break, right?

@sgibson91
Member

@damianavila It may need to happen now. There is no Access Key associated with the hub-continuous-deployer user in the AWS console, and running deployer deploy nasa-ghg staging failed with:

An error occurred (UnrecognizedClientException) when calling the DescribeCluster operation: The security token included in the request is invalid.

@damianavila
Contributor

> @damianavila It may need to happen now.

@sgibson91, can you take care of this one before EOW?

sgibson91 added a commit to sgibson91/infrastructure that referenced this issue Dec 12, 2023
@sgibson91
Member

Done

@consideRatio
Member

Re-generated all creds again to keep them aligned as part of doing it for esdis where the deployer key had been deleted for some reason.

@consideRatio consideRatio self-assigned this Jan 31, 2024
@consideRatio
Member

consideRatio commented Jan 31, 2024

@2i2c-org/engineering I'll assign myself to check on this regularly, every ~50 days or so, aiming to give 1-2 weeks' notice.

  1. re-generate deployer credentials before they expire

    # code that works on Erik's computer via bash functions
    cd terraform/aws
    
    nasa_esdis ...
    terraform apply -replace=aws_iam_access_key.continuous_deployer -var-file=projects/$CLUSTER_NAME.tfvars
    terraform output -raw continuous_deployer_creds > ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    sops -i -e ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    deployer use-cluster-credentials $CLUSTER_NAME
    
    nasa_ghg ...
    terraform apply -replace=aws_iam_access_key.continuous_deployer -var-file=projects/$CLUSTER_NAME.tfvars
    terraform output -raw continuous_deployer_creds > ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    sops -i -e ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    deployer use-cluster-credentials $CLUSTER_NAME
    
    nasa_veda ...
    terraform apply -replace=aws_iam_access_key.continuous_deployer -var-file=projects/$CLUSTER_NAME.tfvars
    terraform output -raw continuous_deployer_creds > ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    sops -i -e ../../config/clusters/$CLUSTER_NAME/enc-deployer-credentials.secret.json
    deployer use-cluster-credentials $CLUSTER_NAME
    
    # add relevant files updated, and then...
    git commit -am "nasa smce clusters: re-generate deployer credentials"
  2. check whether any 2i2c member has been blocked because their password expired, which shows up as membership in the AccountDisabled user group; if so, remove them from the group, re-generate their credentials, and send them via Slack.

  3. write a reminder in #engineering to refresh both cloud console passwords and any CLI credentials people may have set up

    @Yuvi @Sarah @Georgiana you have credentials for three SMCE-managed NASA clusters that force all credentials to be updated every 60 days. This is a reminder to take action. I had ~6 days left before mine expired this time.
    
    For each cluster, you may want to do two things to login quickly later:
    - update your web console password (IAM -> Users -> Your username -> Security credentials -> Manage console access)
    - delete and re-create any access key you may be using in order to perform the 2FA check to get proper access
    
    If you had your account disabled, I've re-enabled it and sent you a generated password to be able to sign in again. You may want to take the same steps as outlined above also for this cluster.
    
    The SMCE managed nasa clusters' web consoles are available at:
    - nasa-esdis: https://smce-esdis-hub.signin.aws.amazon.com/console
    - nasa-ghg: https://smce-ghg-center.signin.aws.amazon.com/console
    - nasa-veda: https://smce-veda.signin.aws.amazon.com/console
    
  4. schedule next re-generation after seven weeks (49 days)
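
To make the scheduling in step 4 concrete, here is a small sketch of the date arithmetic. It assumes the 60-day limit counts from the day credentials were re-generated; the function name `rotation_dates` and the example date are illustrative only.

```python
from datetime import date, timedelta

ROTATION_LIMIT = timedelta(days=60)  # credentials must be updated every 60 days
SCHEDULE_AFTER = timedelta(days=49)  # seven weeks, as in step 4


def rotation_dates(last_regenerated):
    """Return (next scheduled re-generation, 60-day deadline)."""
    return last_regenerated + SCHEDULE_AFTER, last_regenerated + ROTATION_LIMIT


scheduled, deadline = rotation_dates(date(2024, 1, 5))
print(scheduled)                    # 2024-02-23
print(deadline)                     # 2024-03-05
print((deadline - scheduled).days)  # 11
```

Scheduling at 49 days leaves 11 days of slack before the 60-day limit.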

@yuvipanda
Member Author

This is amazing, thank you so much @consideRatio

@consideRatio
Member

On Jan 5 I re-generated the deployer credentials for all NASA hubs, but they are no longer valid and need to be re-generated even though 60 days haven't passed.

Looking at the AWS console, nasa-ghg's key for the hub-continuous-deployer isn't around any more - so it got deleted I presume. But what deleted it? Looking at https://github.com/2i2c-org/infrastructure/actions/runs/7853052002 I see that it worked then but not at https://github.com/2i2c-org/infrastructure/actions/runs/7877219028.

With that, I can conclude that the credentials existed from Jan 5 ~12 AM, when I re-created them in #3575, that they still worked on Feb 10 ~7 AM, and that they stopped working sometime between then and Feb 12 ~10 PM.

I think this shows that the security credentials for our deployer seem to be invalidated more often than every 60 days: the evidence points to invalidation after 36-38 days. I have security credentials that have been around and remained active for 40 days, so this may be specific to the hub-deployer user, which stands out by being exempt from the 2FA requirement.
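
The 36-38 day window can be checked with a few lines of Python. The year 2024 is an assumption inferred from the dates of the surrounding comments, and the variable names are illustrative.

```python
from datetime import date

regenerated = date(2024, 1, 5)     # credentials re-created in #3575
last_working = date(2024, 2, 10)   # deployer CI run still succeeded
first_failing = date(2024, 2, 12)  # deployer CI run failed

print((last_working - regenerated).days)   # 36
print((first_failing - regenerated).days)  # 38
```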

I think the required action is then to re-generate the deployer credentials once a month instead.

@consideRatio
Member

Ah, I think the issue is that the instructions didn't replace existing credentials; they only created new ones if the previous ones were gone.

I've updated the instructions to use a flag for terraform apply, making us get what we want done more directly: -replace=aws_iam_access_key.continuous_deployer.

I think this means my conclusion about needing to re-generate more often than every 60 days was incorrect. We just had to make sure credentials were actually re-generated even when existing, non-expired credentials were still around.

@consideRatio
Member

I've scheduled myself to re-generate 23 April, in two weeks.
