Skip to content
This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

Latest commit

 

History

History
278 lines (198 loc) · 12 KB

s3.md

File metadata and controls

278 lines (198 loc) · 12 KB

Getting Mastodon working with Amazon S3 file-hosting

This document explains what these options in the mastodon config file are for and how you configure them:

S3_ENABLED
S3_BUCKET
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
S3_REGION
S3_PROTOCOL
S3_HOSTNAME

And optionally:

S3_ALIAS_HOST

What / Why

The most disk-space-intensive part of hosting a mastodon instance is hosting all of your users' content, that is their avatars, header images, and attachments. Not only that, but all remote users which are brought into your instance via people following them or viewing their profile also create local copies of these kinds of content.

Having all this content sit on the same server as the rest of the mastodon instance can make it difficult to do backups (there will be more contention for disk while the backup is occurring) and create problems down the line if you start running out of space. Also, serving the file data over the same pipe as the web data could bottleneck your network connection.

To solve this, Mastodon supports offloading these kinds of assets onto an Amazon S3 bucket. If you're not familiar, Amazon S3 is kind of like a hard drive in the cloud, which is optimized for serving static (infrequently changing) content very quickly. A large percent of websites today use it to serve images, stylesheets, big data files, and so on. (Note that this also means if it ever goes down, like one S3 region did a few months ago, your instance will be affected along with a large chunk of the rest of the web. That's a fairly unlikely occurrence that doesn't weight very heavily against the other concerns, though).

Cost

Amazon charges both for how much data you're storing on the cloud ($0.020 - $0.030 per GB) and how much is transferred out due to e.g. serving web requests ($0.020 - $0.250 per GB past the first GB). Prices are given in ranges because they differ depending on what region (datacenter) you tell it to use -- US, Canada and EU datacenters tend to run cheapest.

For a small instance, this will most likely amount to less than $1 / month.

How

Setting up AWS and an S3 Bucket

(Feel free to skim this if you know how to make an S3 bucket already, but check the !!IMPORTANT!! note below.)

First, you'll need to create an AWS account. Go to aws.amazon.com and sign up if you haven't already. You'll need a credit card so that Amazon can charge you for the costs noted above. Note: amazon may ask you which region you'd like to be in at this point, but this will not affect which region your S3 bucket is in -- it will ask about that later.

Now go to console.aws.amazon.com, click on Services, and navigate to S3 (under Storage). You'll be looking at a list, probably empty, which will show your bucket when it's created. Click Create bucket. Here's where you'll fill in the bucket information, including the bucket name (which we'll call $S3_BUCKET) and the region (which we'll call $S3_REGION) for the bucket to be in.

!!IMPORTANT!!: The bucket name must NOT contain any dots (.), or you will not be able to serve content over HTTPS! The Amazon HTTPS certificates are only wildcarded for one level of domain name, so they will cover domains like:

https://mastodon.s3.amazonaws.com

but NOT domain names like:

https://masto.mycoolsite.s3.amazonaws.com

The only other bucket settings that need to be changed are the four public access settings – "block new public ACLs," "remove public access granted through public ACLs," "block new public bucket policies," and "block public and cross-account access if bucket has public policies." You should disable all four of these.

In the end, the bucket will have the access status "Objects can be public," which means that objects aren't enumerable (using List Objects), but objects are still public if you know the URL. This is what we want for Mastodon media.

The other bucket settings don't need to be changed, so you can now click "Create" to create the bucket.

Setting up an AWS API Key

You'll also need to create an API keypair that Mastodon can use to upload content to the S3 bucket. Go to console.aws.amazon.com, click on Services, and navigate to IAM (under Security, Identity, & Compliance). On the side column, click on Users, then Add user. Put in a descriptive name (like "Mastodon-API"), and enable "Programmatic access". You can add this user to the "Administrators" group if you like, or if you're paranoid you can create a new group for it (make sure it has AmazonS3FullAccess, or a custom S3 access restricting it to a single bucket). Finish creating the user, and copy its Access key ID and Secret access key (which we'll call $AWS_ACCESS_KEY and $AWS_SECRET_ACCESS_KEY) down somewhere -- you'll need them later.

Configuring Mastodon to use the S3 bucket

Transitioning from an existing server

!!IMPORTANT!! If your Mastodon server has already been live for a while, you need to upload all of the content to the s3 bucket before continuing! (If you're setting up a new server, you can skip this section).

Also, these instructions are specific to manual deployments, you may need to modify paths slightly for docker or other automatic deployments.

Install the command line tool s3cmd:

sudo apt-get install s3cmd

(As the Mastodon user, if you aren't already) Run the configuration wizard to set up API access:

s3cmd --configure

(Note that this command will fail if the account only has access to a single bucket, but you can temporarily grant AmazonS3FullAccess just to run this command.)

You can test that it's working by doing:

echo "test" > tmp.txt
s3cmd --acl-public put tmp.txt s3://$S3_BUCKET/tmp.txt

(Substitute the name of your bucket for $S3_BUCKET.) Now, navigate to your mastodon deployment folder, e.g.:

cd ~/mastodon/live

Before copying over the content, you'll want to pare it down to the minimal amount possible to save on bandwidth. Run this command to remove old remote media:

RAILS_ENV=production ./bin/tootctl media remove --days=7

Now run the following command. Substitute the name of your bucket for $S3_BUCKET:

s3cmd --acl-public sync --add-header="Cache-Control:public, max-age=315576000, immutable" public/system/ s3://$S3_BUCKET

This may take a while depending on how long your instance has been running.

!!IMPORTANT!!: You may also need to perform another copy after updating the Mastodon config, depending on how active your server is -- attachments, avatars, and headers can be created between when this tool runs and when you switch out the config!

If you're a fan of fancy shell commands, the following series of pipes will do several transfers in parallel:

find -type f | cut -d"/" -f 2- | xargs -P 6 -I {} s3cmd --acl-public sync --add-header="Cache-Control:public, max-age=315576000, immutable" {} s3://$S3_BUCKET/$(echo {} | sed "s:public/system/\(.*\):\1:")

Note that you should run the above command inside of the live/public/system directory, and you can use -P 6 to use more or less than 6 parallel tasks.

Filling out the config

Open up the mastodon environment config in your favorite text editor:

nano .env.production

Here's where we'll update those lines you saw earlier. Uncomment all the s3 lines and fill them out with the values we've been keeping track of. That is, don't just copy and paste this, make sure to fill in the actual values I told you to keep track of in all the places where there are $VARIABLES:

S3_ENABLED=true
S3_BUCKET=$S3_BUCKET
AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS
S3_REGION=$S3_REGION
S3_PROTOCOL=https
S3_HOSTNAME=s3-$S3_REGION.amazonaws.com                        

Your S3 region name needs to be a lowercase, dash-delimited version of whatever region you created the bucket in -- Amazon provides these here. For example, for region "US West (Oregon)", you would fill them in as follows:

S3_REGION=us-west-2
S3_HOSTNAME=s3-us-west-2.amazonaws.com                        

Mastodon will use the protocol and hostname parameters when generating the URLs for the content it displays on the page. I haven't yet determined why it needs to know the region but I assume it has something to do with the API calls it makes.

Optionally, if you want to use your own URL for assets (e.g. assets.myinstance.com, you can set up CloudFront and a custom CNAME). If you do, you'll want to tell Mastodon to use this custom URL when serving assets:

S3_ALIAS_HOST=assets.myinstance.com

Restarting Mastodon

Now restart your mastodon server (or start it up if it's a fresh one). e.g.:

sudo systemctl restart mastodon-*.service

Now is the time to perform another upload of all the local files if your server was running between the time you did the last upload and now. You will be missing content, otherwise.

Go to your web browser to double check that it's working. If you can see people's avatars and attachments, and your attachment URLs look like the following, then you're good to go!

https://cybre.s3-us-west-2.amazonaws.com/media_attachments/files/000/005/330/original/halebopp_dimai_960.jpg?1491719606

Follow-up

Since there may still be URLs out there in the wild pointing to the old assets URLs, you'll also want to set up a redirect for them. In your nginx config, make /packs point to local resources (e.g. Webpack assets), whereas /system points to the new S3 bucket:

  location ~ ^/(emoji|packs) {
    add_header Cache-Control "public, max-age=31536000, immutable";
    try_files $uri @proxy;
  }

  location /system {
    rewrite ^/system(.*) https://assets.myinstance.com$1 permanent;
  }

Troubleshooting

Here are some problems you might encounter while getting this up and running. If you encounter new ones and figure out how to solve them, let me know and I'll update this document.

Avatars / attachments are blank

Either the URLs mastodon is generating for the bucket path are incorrect, or something about the way they are being served is incorrect. Right click on a broken attachment and open it in a new tab. You might see one of a few things:

SSL certificate error: Either you didn't listen to me above about not having . in your domain name, or something is wrong with the way the URL is generating. Double check your bucket name in the config file.

S3 XML showing not found message: Files from pre-S3 may not have been copied into the bucket correctly. Double check that the basic paths of the copy were okay by going to the bucket root in the AWS console: you should see at least "accounts" "media_attachments" and maybe "preview_cards" folders.

If just a few images are blank, you may also have missed a few files in between copying the files onto the bucket and restarting your mastodon server. (You can check if this is the case by taking the path after the domain name -- e.g. accounts/avatars/... -- and checking if the file exists in your mastodon deployment under public/system/accounts/avatars/...)

S3 XML showing AccessDenied message: The permissions on either the files or the bucket are not set correctly. The files should have a permission for "Everyone" that allows them to Read the file. Mastodon does this by default for all files it creates; you may have omitted "--acl-public" or an equivalent parameter when copying the files into the bucket.

Mastodon won't start / displays an error page

It may not be able to access the AWS API or its API calls are failing. Double check the bucket name, region name, and your access and secret keys.

Contact

If you have anything you want to add to this guide or questions about the setup, ping me at @chr@cybre.space or email me at the address listed on my github profile and I'll try to help or forward on your question to those who might know more.

Thanks!