s3_summary_spreadsheet_script

This repo has a Python script that creates a spreadsheet with a summary of your S3 buckets:

  • How many buckets do I have?
  • How many objects are in each bucket?
  • How many bytes are in each bucket?

The last one is directly tied to your S3 costs, so this script gives you a quick way to spot easy reductions in your S3 bill. Run the script, and then ask yourself: given what I use all my buckets for, are any of them surprisingly large or populous?

Usage

You need Python 3, which you can install through most package managers or from the Python website.

  1. Install boto3 (the AWS SDK for Python):

    $ pip3 install --user boto3
  2. Set up your AWS config and credentials for boto3. (See the instructions in the boto3 docs; there's a quick way to check this worked after these steps.)

  3. Clone this repository:

    $ git clone https://github.com/alexwlchan/s3_summary_spreadsheet_script.git
    $ cd s3_summary_spreadsheet_script
  4. Run the script:

    $ python3 create_s3_summary_spreadsheet.py
    ✨ Written a summary of your S3 stats to buckets.csv ✨
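
If you want to check that your credentials from step 2 are working before you run the full script, here's a minimal sanity check (not part of this repo) that lists your buckets with boto3. If the credentials aren't set up, boto3 raises an error such as NoCredentialsError instead of printing bucket names:

    import boto3

    # List every bucket in the account -- this only works if boto3 can
    # find your AWS credentials.
    for bucket in boto3.client("s3").list_buckets()["Buckets"]:
        print(bucket["Name"])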

Motivation

I work at Wellcome Collection. Whenever I look at our AWS bill, one of the biggest costs is always S3 storage. That's not a surprise -- our account holds, among other things, two copies of our entire digital archive, which is nearly 120TB and growing every day. If we ever got a bill and there wasn't a big number next to S3, it would be time to panic.

We spend about $25,000 on S3 storage every year. That's not nothing, but it's also not exorbitant in the context of a large organisation. It'd be nice to find some easy wins, but developer time costs money too -- it's worth an hour to save a few thousand dollars a year, but a complete audit to squeeze out a few extra dollars is out of the question.

I wrote this script to give me a quick overview of our buckets, to see if there were any quick wins. Among other things, the first time I ran it I discovered:

  • Some leftover files from old experiments that could be deleted
  • A bucket with versioning enabled, where we'd "deleted" all the objects, but the versions were all hanging around
  • A bucket where objects were being saved in the wrong storage class

It took about half an hour to write the initial version, and a few hours more to tidy it up. I'm sharing it here so other people can use it to find quick wins in their own AWS accounts.

How it works

The script uses the S3 CloudWatch metrics to determine the size of each bucket. They're only updated every few days, and may be a bit out-of-date or inaccurate, but that's okay – I'm only using this to get a rough idea of which buckets have an unexpected number of objects or bytes.

On an account with a dozen buckets, it only takes about a second to run.
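
For reference, this is roughly the kind of CloudWatch query involved -- a minimal sketch, not the exact code in create_s3_summary_spreadsheet.py. It assumes the standard AWS/S3 storage metrics (BucketSizeBytes and NumberOfObjects); note that these metrics are published in each bucket's own region, and that the StandardStorage dimension below only counts the Standard storage class:

    import datetime

    import boto3


    def get_latest_metric(cloudwatch, bucket_name, metric_name, storage_type):
        """Return the most recent daily datapoint for an S3 storage metric, or 0."""
        now = datetime.datetime.now(datetime.timezone.utc)
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/S3",
            MetricName=metric_name,
            Dimensions=[
                {"Name": "BucketName", "Value": bucket_name},
                {"Name": "StorageType", "Value": storage_type},
            ],
            # The metrics are daily, so look back a few days for the latest value.
            StartTime=now - datetime.timedelta(days=3),
            EndTime=now,
            Period=86400,
            Statistics=["Average"],
        )
        datapoints = resp["Datapoints"]
        if not datapoints:
            return 0
        return int(max(datapoints, key=lambda d: d["Timestamp"])["Average"])


    s3 = boto3.client("s3")
    cloudwatch = boto3.client("cloudwatch")

    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        size = get_latest_metric(cloudwatch, name, "BucketSizeBytes", "StandardStorage")
        count = get_latest_metric(cloudwatch, name, "NumberOfObjects", "AllStorageTypes")
        print(f"{name},{count},{size}")

Because these numbers are pre-aggregated by CloudWatch, the script never has to list the objects in a bucket, which is why it runs so quickly.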

License

MIT.

Say thanks

If you found this useful, you can say thanks on Twitter, or by donating to one of the charities I support.
