Merge pull request #116 from alphagov/file-performance-documentation
Add some documentation
chrislo committed Aug 9, 2017
2 parents 4a282d0 + 8bb4c8a commit fa1cbba
Showing 2 changed files with 70 additions and 0 deletions.
9 changes: 9 additions & 0 deletions docs/costing.md
@@ -0,0 +1,9 @@
# Estimated cost of AWS storage

Storing all of the [current assets](existing_assets.md) in Asset Manager and Whitehall would require ~670 GB. [S3 storage is currently priced at $0.023/GB/month](https://aws.amazon.com/s3/pricing/), which equates to ~$15/month.

Amazon also offers an [Elastic File System (EFS)](https://aws.amazon.com/efs/) in the [Ireland and Frankfurt](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/) regions. Its apparent advantage over EBS is that volumes scale automatically as data is added. Because it can be mounted as a file system on an EC2 instance, it potentially offers an alternative that would require smaller changes to the existing Asset Manager codebase: the mounted EFS volume would appear to the application as a file system, like the current NFS model.

EFS is more expensive than S3: it is [currently priced](https://aws.amazon.com/efs/pricing/) at $0.33/GB/month, or ~$221/month for ~670 GB.
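
The two monthly figures can be reproduced with a few lines of shell. This is only a sketch of the arithmetic: the prices and the ~670 GB size are the ones quoted above, and `monthly_cost` is a helper name made up for this example.

```bash
# Approximate monthly storage cost, using the figures quoted in this document.
storage_gb=670
s3_price=0.023   # USD per GB per month (S3)
efs_price=0.33   # USD per GB per month (EFS)

# Hypothetical helper: multiply size by price and round to two decimal places.
monthly_cost() {
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

s3_cost=$(monthly_cost "$storage_gb" "$s3_price")
efs_cost=$(monthly_cost "$storage_gb" "$efs_price")
echo "S3:  \$${s3_cost}/month"
echo "EFS: \$${efs_cost}/month"
```

On these numbers EFS costs roughly fourteen times as much as S3 for the same volume of data.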

The cost of serving the assets has not yet been calculated.
61 changes: 61 additions & 0 deletions docs/existing_assets.md
@@ -0,0 +1,61 @@
# Existing assets

This document is a short overview of the assets currently stored in the NFS mount. On integration, the sizes of the directories under `/mnt/uploads` are:

``` bash
@integration-asset-master-1:~$ cd /mnt/uploads/
@integration-asset-master-1:/mnt/uploads$ du -h --max-depth=1
35G ./asset-manager
629G ./whitehall
7.6M ./publisher
14G ./support-api
16K ./lost+found
677G .
```

Comparing this to the [production Grafana dashboard](https://grafana.publishing.service.gov.uk/dashboard/db/assets) (674.99G today) leads me to believe that integration has all of the assets that are on production.

The asset manager application stores a record in MongoDB for each asset. On integration, the number of records is:

``` bash
@integration-backend-1:/var/apps/asset-manager$ sudo su - deploy
deploy@integration-backend-1:~$ cd /var/apps/asset-manager
deploy@integration-backend-1:/var/apps/asset-manager$ govuk_setenv asset-manager bundle exec rails c
Loading production environment (Rails 4.2.7.1)
irb(main):002:0> Asset.count
=> 57232
```

We generate a list of all the files stored in the asset-manager directory of the NFS mount:

``` bash
@integration-asset-master-1:/mnt/uploads/asset-manager$ find . -type f | xargs ls -s > ~/file_sizes.txt
```
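
If any paths contain spaces, plain `xargs` splits them into separate arguments. A more defensive variant, shown here on a throwaway directory rather than the real mount, passes NUL-separated names using the standard `find -print0` and `xargs -0` options. The directory and file names below are made up for the demonstration.

```bash
# Demo directory containing a file name with a space (made-up names).
demo_dir=$(mktemp -d)
printf 'x' > "$demo_dir/plain.txt"
printf 'y' > "$demo_dir/with space.txt"

# NUL-separated names survive spaces (and newlines) in paths.
listing=$(cd "$demo_dir" && find . -type f -print0 | xargs -0 ls -s)
line_count=$(printf '%s\n' "$listing" | wc -l | tr -d ' ')
echo "$line_count files listed"

rm -rf "$demo_dir"
```

Whether this matters for the real data depends on whether any asset paths contain whitespace, which has not been checked here.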

This indicates that there are 58,613 files in the asset-manager directory of the NFS mount, which is 1,381 more than the number of records in MongoDB.

``` bash
12:18 $ wc -l file_sizes.txt
58613 file_sizes.txt
```

I haven't yet investigated why this difference exists. However, we can look at the sizes of the files on the mount by converting the listing to CSV:

``` bash
cat file_sizes.txt | tr -d ' ' | awk -F"[.]/" '{print $1","$2}' > file_sizes.csv
```
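
To make the conversion concrete, here is the same `tr` and `awk` pipeline applied to two made-up lines in the `size ./path` layout that `ls -s` produces. The sample paths are invented for illustration.

```bash
# Two made-up lines in the format produced by `ls -s` (size in blocks, then path).
sample='  204 ./aa/bb/report.pdf
 2376 ./cc/dd/summary.csv'

# Strip all spaces, then split each line on the first "./" into size and filename.
csv=$(printf '%s\n' "$sample" | tr -d ' ' | awk -F"[.]/" '{print $1","$2}')
printf '%s\n' "$csv"
```

Note that `tr -d ' '` removes every space on the line, so any spaces inside a filename would be dropped as well.
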
Loading the resulting `file_sizes.csv` into R allows us to calculate the distribution of file sizes:
``` r
library(readr)
d <- read_csv('file_sizes.csv', col_names=c('size', 'filename'))
quantile(d$size, c(.5, .8, .95, .99, 1))
```
```
50% 80% 95% 99% 100%
204.00 732.00 2376.00 6031.52 174844.00
```
Assuming `ls -s` reports sizes in 1 KB blocks (the GNU default), the median file size is 204 KB, 95% of all assets are under about 2.3 MB (2,376 KB) and the largest asset is about 171 MB (174,844 KB).
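
As a rough cross-check that doesn't need R, percentiles can also be estimated with `sort` and `awk`. The sketch below uses the nearest-rank method, so its results may differ slightly from R's `quantile`, which interpolates by default. `percentile` is a helper name made up for this example, and the data is a small stand-in for `file_sizes.csv`.

```bash
# Hypothetical helper: nearest-rank percentile of the first CSV column.
# Usage: percentile FILE P   (P between 0 and 100)
percentile() {
  cut -d, -f1 "$1" | sort -n | awk -v p="$2" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p / 100) * NR)
      if (rank < (p / 100) * NR) rank++   # round up to the next whole rank
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Small made-up data set standing in for file_sizes.csv.
demo=$(mktemp)
printf '%s\n' '10,a' '20,b' '30,c' '40,d' '50,e' > "$demo"
median=$(percentile "$demo" 50)
largest=$(percentile "$demo" 100)
echo "median=$median largest=$largest"
rm -f "$demo"
```

Run against the real `file_sizes.csv`, `percentile file_sizes.csv 95` would give a figure comparable to the 95% value above.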
