# Cleaning up and tagging AMIs, snapshots and volumes associated with AWS Batch

# WARNING: DO **NOT** RUN ALL CELLS AUTOMATICALLY. READ THROUGH THE TEXT OR YOU MAY DELETE SOMETHING IMPORTANT!

Set up clients

In [49]:
import boto3
from botocore.exceptions import ClientError

batch = boto3.client("batch")
ec2 = boto3.client("ec2")


Find out which AMIs Batch actually uses. Some AMIs are used by more than one Batch compute environment, so we remove duplicates by converting `batch_amis` to a `set` and then back to a `list`.

In [15]:
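# Page through all compute environments (a single call returns only one page of results).
paginator = batch.get_paginator('describe_compute_environments')
envs = [env for page in paginator.paginate() for env in page['computeEnvironments']]
# Unmanaged environments have no 'computeResources'; managed ones may omit 'imageId'.
batch_amis = [x['computeResources']['imageId'] for x in envs
              if 'imageId' in x.get('computeResources', {})]
print(batch_amis)
print(len(batch_amis))
print(len(set(batch_amis)))
batch_amis = list(set(batch_amis))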


['ami-fd6dae85', 'ami-07b385342c44f150a', 'ami-40e03038', 'ami-2327865b', 'ami-63a0ea1b', 'ami-07b385342c44f150a', 'ami-0d8f32ba95357dc9f', 'ami-0334b3e09dd001f8e', 'ami-4ba2f333', 'ami-c9ce1eb1', 'ami-fd6dae85', 'ami-cab4feb2', 'ami-0334b3e09dd001f8e', 'ami-083c40aa53398f984']
14
11


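The same extraction can be written as a pure function over the `computeEnvironments` dictionaries, which makes it easy to check without AWS credentials. This is a sketch; `unique_batch_amis` is a hypothetical helper name, and it uses `dict.fromkeys` instead of `set` so that the de-duplicated IDs keep their first-seen order and printed output stays stable between runs.

```python
def unique_batch_amis(envs):
    """Return the distinct AMI IDs referenced by Batch compute environments.

    `envs` is a list of dicts shaped like the 'computeEnvironments' entries
    returned by batch.describe_compute_environments(). Environments without
    'computeResources', or without an 'imageId' in it, are skipped.
    """
    ids = (env.get('computeResources', {}).get('imageId') for env in envs)
    # dict.fromkeys de-duplicates while preserving first-seen order
    return list(dict.fromkeys(i for i in ids if i))


envs_example = [
    {'computeResources': {'imageId': 'ami-aaa'}},
    {'computeResources': {'imageId': 'ami-bbb'}},
    {'computeResources': {'imageId': 'ami-aaa'}},
    {'computeResources': {}},  # managed env with no custom AMI
    {},                        # e.g. an unmanaged env: no computeResources
]
print(unique_batch_amis(envs_example))  # ['ami-aaa', 'ami-bbb']
```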

Which AMIs in EC2 are associated with Batch?

There is no automated way to find out. We rely on the convention that these AMIs have names and descriptions that include the word "batch". Because this is only a naming convention, **a human must review the resulting list** before proceeding further. **Otherwise, you could end up deleting resources that should not be deleted**.
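One way to enforce that review step in code is to print the IDs and require an explicit confirmation before any delete loop runs. A minimal sketch; `confirm_ids` is a hypothetical helper, and asking for the exact count (rather than "y") is just one design choice to make a reflexive keypress harmless. `prompt_fn` defaults to `input` but can be swapped out for testing.

```python
def confirm_ids(ids, prompt_fn=input):
    """Print the IDs to be deleted and ask the user to confirm.

    Returns True only if the user types the exact number of IDs, so a
    reflexive 'y' or an accidental Enter does not trigger a deletion.
    """
    ids = list(ids)
    for resource_id in ids:
        print(resource_id)
    answer = prompt_fn("Type {} to confirm deleting these resources: ".format(len(ids)))
    return answer.strip() == str(len(ids))
```

For example, the deregistration loop later on could be guarded with `if confirm_ids(unused_amis):`.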

First, look at all image names, along with their IDs:

In [28]:
ec2_amis = ec2.describe_images(Owners=['self'])['Images']
print(len(ec2_amis))
for x in ec2_amis:
    if 'Name' in x:
        print("{}\t{}".format(x['ImageId'], x['Name']))

47
ami-0334b3e09dd001f8e	GPU-enabled AMI for Batch 201808301031
ami-0546947d	TrainingAnalysis3-clone
ami-07b385342c44f150a	aws-batch-image-with-sminot-refdbs-and-scratch-201809061525
ami-083c40aa53398f984	GPU-enabled AMI for Batch 201809071234
ami-08c4d1a439ca1082f	GPU-enabled AMI for Batch 201808310947
ami-0bb2a97b2098506ad	GPU-enabled AMI for Batch 201808301331
ami-0d615f55b7ffd97c2	GPU-enabled AMI for Batch 201808161415
ami-0d8f32ba95357dc9f	GPU-enabled AMI for Batch 201808171141
ami-0e7505f639c668593	GPU-enabled AMI for Batch 201808301357
ami-10fa6270	dr-avere-2
ami-13f96173	dr-avere-1
ami-2327865b	GPU-enabled AMI for Batch 201712141141
ami-2c107d4c	ubuntu1604-ENA
ami-2cc65e4c	dr-avere-3
ami-3149d151	cloudburst-avere-2
ami-33f4384b	aws-hsesbx-testfortinet-1-fw_fullconfig-image-5.4.5
ami-3d07e545	Brat-Super-Secure
ami-449f543c	aws-hsepro-1-fw_fullconfig-image-5.4.5
ami-4ba2f333	GPU-enabled AMI for Batch 201807091006
ami-4ec03936	ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server



Also, some of the AMIs used by Batch (`batch_amis`) belong to a different AWS account (the SciComp account), so they do not appear in `ec2_amis`. Let's remove those from `batch_amis`. (This notebook, or similar code, should also be run in that account to remove unused resources there.)

In [16]:
# out of batch_amis, which ones are not in ec2_amis (i.e. belong to another account)?
ec2_ami_id_list = {x['ImageId'] for x in ec2_amis}
foreign = set(batch_amis) - ec2_ami_id_list
print(len(foreign))
batch_amis = list(set(batch_amis) - foreign)
len(batch_amis)

3


8

It's probably safe to include all images whose descriptions contain "batch" (matched case-insensitively, since the description is lowercased first), plus the one called "ECS image with 4TB non-encrypted scratch at /scratch. Created at 201808071644", which the cell below adds by its ID, `ami-bd5177c5`. (The giveaway is the scratch space, which is a concept in AWS Batch.)
So let's make a smaller candidate list of all of these.

In [17]:
candidates = [x for x in ec2_amis if 'Description' in x and 'batch' in x['Description'].lower()]

candidates.extend([x for x in ec2_amis if x['ImageId'] == 'ami-bd5177c5'])
print(len(candidates))

19
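The filter above matches on `Description` only, while the surrounding text also talks about names. A sketch of a filter that checks both fields case-insensitively and also accepts an explicit allow-list of extra IDs; `is_batch_candidate` is a hypothetical name and the sample images are made up. On real data it may match more images than the description-only filter, so the counts below would need to be re-checked.

```python
def is_batch_candidate(image, extra_ids=()):
    """True if an image dict (as returned by ec2.describe_images) looks Batch-related.

    Matches 'batch' case-insensitively in either 'Name' or 'Description'
    (either key may be missing), or an ImageId listed in `extra_ids`.
    """
    if image['ImageId'] in extra_ids:
        return True
    text = ' '.join([image.get('Name', ''), image.get('Description', '')]).lower()
    return 'batch' in text


# Made-up sample data for illustration only.
images = [
    {'ImageId': 'ami-1', 'Name': 'GPU-enabled AMI for Batch 201808301031'},
    {'ImageId': 'ami-2', 'Name': 'dr-avere-2', 'Description': 'example description'},
    {'ImageId': 'ami-3', 'Name': 'ECS image with 4TB non-encrypted scratch'},
]
print([x['ImageId'] for x in images if is_batch_candidate(x, extra_ids={'ami-3'})])
# ['ami-1', 'ami-3']
```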


Now find all the members of `candidates` whose IDs are **NOT** in `batch_amis`. These **should** be safe to remove.


In [26]:
unused = [x for x in candidates if x['ImageId'] not in batch_amis]
print(len(unused))

for x in unused:
    print(x['Name'])


11
GPU-enabled AMI for Batch 201808310947
GPU-enabled AMI for Batch 201808301331
GPU-enabled AMI for Batch 201808161415
GPU-enabled AMI for Batch 201808301357
aws-batch-image-with-sminot-refdbs-and-scratch-201801042032
aws-batch-image-with-sminot-refdbs-and-scratch-201802131338
aws-batch-image-with-sminot-refdbs
aws-batch-image-with-sminot-refdbs-and-scratch-20180271822
aws-batch-image-with-sminot-refdbs-and-scratch-201801011818
aws-batch-image-with-sminot-refdbs-and-scratch-201802231222
ECS image with 4TB non-encrypted scratch at /scratch. Created at 201808071644
11


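OK: of the 19 candidates, 11 are not in `batch_amis`, so 8 are, and 8 is exactly the length of `batch_amis`. In other words, every in-account AMI that Batch uses was caught by the candidate filter, which gives some confidence that the filter is picking up the right images. So it does seem safe to delete the AMIs in `unused`.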
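The count check (19 - 11 = 8) only works out if every in-account AMI that Batch uses was caught by the candidate filter. Set operations can make that check explicit instead of relying on the arithmetic. A sketch with a hypothetical helper name:

```python
def split_candidates(candidate_ids, in_use_ids):
    """Split candidate AMI IDs into (in use, unused) sets.

    Raises ValueError if some in-use AMI was not found among the candidates,
    which would mean the candidate filter is missing Batch AMIs.
    """
    candidate_ids, in_use_ids = set(candidate_ids), set(in_use_ids)
    missed = in_use_ids - candidate_ids
    if missed:
        raise ValueError("in-use AMIs not matched by the filter: {}".format(sorted(missed)))
    return candidate_ids & in_use_ids, candidate_ids - in_use_ids


in_use, unused_ids = split_candidates(['ami-1', 'ami-2', 'ami-3'], ['ami-2'])
print(sorted(in_use), sorted(unused_ids))  # ['ami-2'] ['ami-1', 'ami-3']
```

With the notebook's data this would be called as `split_candidates([x['ImageId'] for x in candidates], batch_amis)`. Note that it only checks consistency with Batch; it cannot show that the unused AMIs are not referenced elsewhere, for example by launch templates or by other accounts they are shared with.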

## Snapshots

Before we delete anything, let's look at what snapshots are associated with the AMIs we are going to delete.

Just in case, we'll also collect information on the volumes associated with those snapshots, which may or may not still exist.


In [33]:
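# Page through all snapshots owned by this account.
paginator = ec2.get_paginator('describe_snapshots')
all_snaps = [s for page in paginator.paginate(OwnerIds=['self']) for s in page['Snapshots']]
all_snaps[0]  # inspect one snapshot to see which fields are available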

{'Description': 'Created by CreateImage(i-012e27bb0fcebc688) for ami-10fa6270 from vol-077ddf5ff6906b85c',
 'Encrypted': False,
 'OwnerId': '064561331775',
 'Progress': '100%',
 'SnapshotId': 'snap-170fce52',
 'StartTime': datetime.datetime(2017, 5, 1, 20, 32, 55, tzinfo=tzutc()),
 'State': 'completed',
 'VolumeId': 'vol-077ddf5ff6906b85c',
 'VolumeSize': 200}
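Snapshots made by `CreateImage` carry a description of the form shown above (`Created by CreateImage(...) for ami-... from vol-...`). Instead of testing each AMI ID as a substring of every description, the ID can be parsed out and looked up in a set, which gives an exact match and avoids the nested loop. A sketch, assuming the relevant snapshots follow this description pattern (others simply yield `None`); `ami_from_description` and `snapshots_for_amis` are hypothetical names.

```python
import re

# Matches the AMI ID in descriptions like
# 'Created by CreateImage(i-...) for ami-10fa6270 from vol-...'
_CREATE_IMAGE_RE = re.compile(r'\bfor (ami-[0-9a-f]+)\b')


def ami_from_description(description):
    """Return the AMI ID named in a CreateImage snapshot description, or None."""
    match = _CREATE_IMAGE_RE.search(description or '')
    return match.group(1) if match else None


def snapshots_for_amis(snapshots, ami_ids):
    """Return the snapshots whose description names one of `ami_ids` exactly."""
    ami_ids = set(ami_ids)
    return [s for s in snapshots if ami_from_description(s.get('Description')) in ami_ids]
```

With the notebook's data this would be `snapshots_for_amis(all_snaps, unused_amis)`.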

Let's collect the ones that are associated with the AMIs in `unused`.

In [41]:
unused_amis = [x['ImageId'] for x in unused] # easier to work with a list of IDs only
unused_snaps = []
for snap in all_snaps:
    for ami in unused_amis:
        if ami in snap['Description']:
            unused_snaps.append(snap)
            break
print(len(unused_snaps))
unused_snaps[0]


37


{'Description': 'Created by CreateImage(i-0c5dd1d75d7b00467) for ami-99b63ee1 from vol-0ce025ee4254b1e8f',
 'Encrypted': False,
 'OwnerId': '064561331775',
 'Progress': '100%',
 'SnapshotId': 'snap-014b86838e991ca0a',
 'StartTime': datetime.datetime(2018, 2, 28, 2, 22, 37, tzinfo=tzutc()),
 'State': 'completed',
 'VolumeId': 'vol-0ce025ee4254b1e8f',
 'VolumeSize': 22}
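Another way to find these snapshots, which does not depend on description text at all, is to read them from the AMIs themselves: each EBS entry in an image's `BlockDeviceMappings` (as returned by `describe_images`) has an `Ebs.SnapshotId`. This needs the image data collected while the AMIs are still registered; the `ec2_amis` list fetched earlier would do. A sketch with a hypothetical helper name and made-up test data:

```python
def snapshot_ids_for_images(images, image_ids):
    """Collect the EBS snapshot IDs backing the given AMIs.

    `images` are dicts as returned in describe_images()['Images'].
    Instance-store (ephemeral) mappings have no 'Ebs' key and are skipped.
    """
    wanted = set(image_ids)
    snap_ids = []
    for image in images:
        if image['ImageId'] not in wanted:
            continue
        for mapping in image.get('BlockDeviceMappings', []):
            snap_id = mapping.get('Ebs', {}).get('SnapshotId')
            if snap_id:
                snap_ids.append(snap_id)
    return snap_ids
```

Comparing `sorted(snapshot_ids_for_images(ec2_amis, unused_amis))` with `sorted(s['SnapshotId'] for s in unused_snaps)` would be a cheap cross-check of the description-based matching.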

### Finally, deleting stuff.


In [43]:
for ami in unused_amis:
    ec2.deregister_image(ImageId=ami)
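Given the warning at the top of this notebook, it can help to run each delete loop once in a preview mode that only prints what it would do. A minimal sketch; `apply_to_ids` is a hypothetical helper, and `action` is any function taking one ID. (EC2 calls also accept `DryRun=True`, which checks permissions without making the change and signals success by raising a `ClientError` with code `DryRunOperation`; that is a different check from previewing the list.)

```python
def apply_to_ids(ids, action, dry_run=True):
    """Call action(resource_id) for each ID, or only print the IDs if dry_run.

    Returns the list of IDs that were (or would have been) acted on.
    """
    done = []
    for resource_id in ids:
        if dry_run:
            print("would process", resource_id)
        else:
            action(resource_id)
        done.append(resource_id)
    return done
```

Usage: run `apply_to_ids(unused_amis, lambda ami: ec2.deregister_image(ImageId=ami))` first, review the printed IDs, then repeat with `dry_run=False`.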

In [45]:
for snap in unused_snaps:
    ec2.delete_snapshot(SnapshotId=snap['SnapshotId'])
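The order of these two cells matters: a snapshot that still backs a registered AMI cannot be deleted (EC2 rejects it with `InvalidSnapshot.InUse`), which is why the AMIs are deregistered first. Also, if any single call fails, the plain loops above stop at that point and leave the remaining resources untouched. A sketch that keeps going and collects failures for review instead; `delete_each` is hypothetical, and it catches any exception so that it also handles botocore's `ClientError` without importing it.

```python
def delete_each(ids, delete_fn):
    """Call delete_fn(id) for every ID; return {id: error message} for the failures."""
    failures = {}
    for resource_id in ids:
        try:
            delete_fn(resource_id)
        except Exception as exc:  # e.g. botocore.exceptions.ClientError
            failures[resource_id] = str(exc)
    return failures
```

Usage: `failures = delete_each([s['SnapshotId'] for s in unused_snaps], lambda sid: ec2.delete_snapshot(SnapshotId=sid))`, then inspect `failures`.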

In [54]:
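# Collect the (de-duplicated) source volume IDs of the deleted snapshots.
unused_volumes = sorted({snap['VolumeId'] for snap in unused_snaps})

# find out which volumes in unused_volumes actually exist and are detached;
# attached ('in-use') volumes cannot be deleted and are skipped:
existing_unused_volumes = []

for vol in unused_volumes:
    try:
        res = ec2.describe_volumes(VolumeIds=[vol])
    except ClientError as e:
        # Volume no longer exists (or is a placeholder ID): nothing to delete.
        if e.response['Error']['Code'] in ('InvalidVolume.NotFound', 'InvalidVolumeID.Malformed'):
            continue
        raise
    if res['Volumes'][0]['State'] == 'available':
        existing_unused_volumes.append(vol)

# and delete them:

for vol in existing_unused_volumes:
    ec2.delete_volume(VolumeId=vol)
print("Done")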

Done
