# Cleaning up and tagging AMIs, snapshots and volumes associated with AWS Batch

# WARNING: DO **NOT** RUN ALL CELLS AUTOMATICALLY. READ THROUGH THE TEXT OR YOU MAY DELETE SOMETHING IMPORTANT!

Set up clients

In [1]:
import boto3

batch = boto3.client("batch")
ec2 = boto3.client("ec2")


Find out which AMIs Batch actually uses.

In [2]:
envs = batch.describe_compute_environments()['computeEnvironments']
batch_amis = [x['computeResources']['imageId'] for x in envs if 'imageId' in x['computeResources']]
print(batch_amis)
print(len(batch_amis))

['ami-fd6dae85', 'ami-07b385342c44f150a', 'ami-40e03038', 'ami-2327865b', 'ami-63a0ea1b', 'ami-07b385342c44f150a', 'ami-0d8f32ba95357dc9f', 'ami-0334b3e09dd001f8e', 'ami-4ba2f333', 'ami-c9ce1eb1', 'ami-fd6dae85', 'ami-cab4feb2', 'ami-0334b3e09dd001f8e', 'ami-083c40aa53398f984']
14
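
The list above contains repeats, since several compute environments can point at the same AMI. Below is a minimal sketch of a more defensive extraction. It assumes that some environments might lack a `computeResources` block or an `imageId` key; `extract_image_ids` is a hypothetical helper name, and the sample dictionaries are invented to mirror the shape of the response used above.

```python
def extract_image_ids(envs):
    """Return the sorted, de-duplicated AMI IDs referenced by compute environments."""
    ids = set()
    for env in envs:
        # Tolerate environments without a computeResources block (assumption).
        resources = env.get("computeResources") or {}
        image_id = resources.get("imageId")
        if image_id:
            ids.add(image_id)
    return sorted(ids)


# Made-up sample data, shaped like describe_compute_environments output.
sample_envs = [
    {"computeResources": {"imageId": "ami-aaa"}},
    {"computeResources": {"imageId": "ami-aaa"}},  # repeat of the same AMI
    {"computeResources": {}},                      # no custom AMI set
    {},                                            # no computeResources at all
    {"computeResources": {"imageId": "ami-bbb"}},
]

print(extract_image_ids(sample_envs))  # ['ami-aaa', 'ami-bbb']
```

In this notebook the call would be `extract_image_ids(envs)`. If the account has many compute environments, the response may be paginated via `nextToken`, in which case `envs` should be collected across all pages first.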


Which AMIs in EC2 are associated with Batch?

There is no automated way to find out. We rely on the fact that the AMIs have names and descriptions that include the word "batch". However, **a human must review this list** before proceeding further. **Otherwise, you could end up deleting resources that should not be deleted**.

First, look at all image names, along with their IDs:

In [3]:
ec2_amis = ec2.describe_images(Owners=['self'])['Images']
print(len(ec2_amis))
for x in ec2_amis:
    if 'Name' in x:
        print("{}\t{}".format(x['ImageId'], x['Name']))

47
ami-0334b3e09dd001f8e	GPU-enabled AMI for Batch 201808301031
ami-0546947d	TrainingAnalysis3-clone
ami-07b385342c44f150a	aws-batch-image-with-sminot-refdbs-and-scratch-201809061525
ami-083c40aa53398f984	GPU-enabled AMI for Batch 201809071234
ami-08c4d1a439ca1082f	GPU-enabled AMI for Batch 201808310947
ami-0bb2a97b2098506ad	GPU-enabled AMI for Batch 201808301331
ami-0d615f55b7ffd97c2	GPU-enabled AMI for Batch 201808161415
ami-0d8f32ba95357dc9f	GPU-enabled AMI for Batch 201808171141
ami-0e7505f639c668593	GPU-enabled AMI for Batch 201808301357
ami-10fa6270	dr-avere-2
ami-13f96173	dr-avere-1
ami-2327865b	GPU-enabled AMI for Batch 201712141141
ami-2c107d4c	ubuntu1604-ENA
ami-2cc65e4c	dr-avere-3
ami-3149d151	cloudburst-avere-2
ami-33f4384b	aws-hsesbx-testfortinet-1-fw_fullconfig-image-5.4.5
ami-3d07e545	Brat-Super-Secure
ami-449f543c	aws-hsepro-1-fw_fullconfig-image-5.4.5
ami-4ba2f333	GPU-enabled AMI for Batch 201807091006
ami-4ec03936	ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server

It is probably safe to include all images whose descriptions contain "batch" in any letter case (the code checks the `Description` field, lowercased, rather than `Name`), plus the one called "ECS image with 4TB non-encrypted scratch at /scratch. Created at 201808071644". (The giveaway is that scratch space is a concept in AWS Batch.)
So let's make a smaller candidate list of all of these.

In [4]:
candidates = [x for x in ec2_amis if 'Description' in x and 'batch' in x['Description'].lower()]

candidates.extend([x for x in ec2_amis if x['ImageId'] == 'ami-bd5177c5'])
print(len(candidates))

19
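
The selection rule can also be written as a small helper, which keeps the manual exception visible to whoever reviews the list. This is a sketch, not the notebook's own code: `select_candidates` and the sample image records are hypothetical, and matching on `Name` as well as `Description` is an assumption that widens the filter used above.

```python
def select_candidates(images, keyword="batch", extra_ids=()):
    """Pick images whose name or description mentions keyword, plus explicit IDs."""
    keyword = keyword.lower()
    extra = set(extra_ids)
    selected = []
    for image in images:
        # Name and Description are optional in describe_images results.
        text = "{} {}".format(
            image.get("Name") or "", image.get("Description") or ""
        ).lower()
        if keyword in text or image["ImageId"] in extra:
            selected.append(image)  # each image is appended at most once
    return selected


# Made-up records, shaped like entries of describe_images()['Images'].
sample_images = [
    {"ImageId": "ami-1", "Name": "GPU-enabled AMI for Batch", "Description": ""},
    {"ImageId": "ami-2", "Name": "dr-avere-1"},
    {"ImageId": "ami-3", "Name": "ECS image with scratch", "Description": "scratch image"},
    {"ImageId": "ami-4", "Name": "plain", "Description": "built for AWS BATCH jobs"},
]

picked = select_candidates(sample_images, extra_ids=["ami-3"])
print([image["ImageId"] for image in picked])  # ['ami-1', 'ami-3', 'ami-4']
```

Because each image is tested once, an AMI that matches the keyword and is also listed in `extra_ids` cannot end up in the result twice.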


Now find all the members of `candidates` whose IDs are **NOT** in `batch_amis`. These **should** be safe to remove.


In [5]:
unused = [x for x in candidates if x['ImageId'] not in batch_amis]
print(len(unused))
for x in unused:
    print(x['Name'])

11
GPU-enabled AMI for Batch 201808310947
GPU-enabled AMI for Batch 201808301331
GPU-enabled AMI for Batch 201808161415
GPU-enabled AMI for Batch 201808301357
aws-batch-image-with-sminot-refdbs-and-scratch-201801042032
aws-batch-image-with-sminot-refdbs-and-scratch-201802131338
aws-batch-image-with-sminot-refdbs
aws-batch-image-with-sminot-refdbs-and-scratch-20180271822
aws-batch-image-with-sminot-refdbs-and-scratch-201801011818
aws-batch-image-with-sminot-refdbs-and-scratch-201802231222
ECS image with 4TB non-encrypted scratch at /scratch. Created at 201808071644


In [6]:
len(set([x['Name'] for x in ec2_amis]))

47

OK, I am a little concerned/confused. There were 47 total AMIs owned by the HSE account. 
Batch is actively using 14 of these. We identified 19 out of the 47 that were probably Batch AMIs.
When we filtered out the ones that are actively in use, we ended up with 11. 
I would have expected 5 (19 - 14 = 5). 
I checked to see if there were any duplicate names that could be throwing us off.
We can use sets, which remove duplicates and also support useful operations such as union, intersection, and difference.

In [7]:
assert len(ec2_amis) == len(set([x['Name'] for x in ec2_amis])) 

It doesn't appear so. Let's look at the names of the AMIs that Batch is actively using:

In [8]:
batch_used_names = [x['Name'] for x in ec2_amis if x['ImageId'] in batch_amis]
print(len(batch_used_names))
for x in batch_used_names:
    print(x)

8
GPU-enabled AMI for Batch 201808301031
aws-batch-image-with-sminot-refdbs-and-scratch-201809061525
GPU-enabled AMI for Batch 201809071234
GPU-enabled AMI for Batch 201808171141
GPU-enabled AMI for Batch 201712141141
GPU-enabled AMI for Batch 201807091006
ubuntu-deep-learning-ami-10.0-for-batch-201806291507
ubuntu-deep-learning-ami-10.0-for-batch-201806291829


Now I am really confused. I would have expected this list to have 14 elements, but it only has 8.
AHA. I remember now that in the early days of using Batch at the Hutch, some AMIs were created in the SciComp account and made public. So those would not show up in our calculations. Let's check if some of the Batch AMI IDs do not exist in our full list of AMIs.

In [9]:
ec2_ami_ids = {x['ImageId'] for x in ec2_amis}
# IDs that Batch references but that are not owned by this account
foreign_amis = set(batch_amis) - ec2_ami_ids
print(len(foreign_amis))

3


OK, there are 3 AMIs that are not in our account. 8 + 3 is 11, which is what we ended up with. Let's filter those 3 out of our `unused` list:

In [10]:
unused = [x for x in unused if x['ImageId'] not in foreign_amis]
print(len(unused))

11


It looks like there were no "foreign" AMIs among the `unused` list. Hmm. Still not sure why we have an unexpected number of items in `unused`. Will come back to this later.
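
One possible explanation can be checked from the numbers already printed. The sketch below copies `batch_amis` from the output of cell 2 and counts the distinct IDs. It assumes that all 8 in-use AMIs owned by this account are among the 19 candidates, which the outputs above suggest but do not prove.

```python
from collections import Counter

# Copied verbatim from the output of cell 2.
batch_amis = [
    'ami-fd6dae85', 'ami-07b385342c44f150a', 'ami-40e03038', 'ami-2327865b',
    'ami-63a0ea1b', 'ami-07b385342c44f150a', 'ami-0d8f32ba95357dc9f',
    'ami-0334b3e09dd001f8e', 'ami-4ba2f333', 'ami-c9ce1eb1', 'ami-fd6dae85',
    'ami-cab4feb2', 'ami-0334b3e09dd001f8e', 'ami-083c40aa53398f984',
]

counts = Counter(batch_amis)
duplicated = sorted(ami for ami, n in counts.items() if n > 1)
distinct = len(counts)

foreign = 3                               # from cell 9
owned_in_use = distinct - foreign         # should match the 8 names in cell 8
candidates = 19                           # from cell 4
expected_unused = candidates - owned_in_use  # assumes all owned in-use AMIs are candidates

print(distinct)         # 11
print(duplicated)       # ['ami-0334b3e09dd001f8e', 'ami-07b385342c44f150a', 'ami-fd6dae85']
print(owned_in_use)     # 8
print(expected_unused)  # 11
```

Under that assumption the counts are consistent: the 14 entries in `batch_amis` shrink to 11 distinct IDs, only 8 of which belong to this account, so the expected size of `unused` is 19 - 8 = 11 rather than 19 - 14 = 5.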