Merge pull request #116 from alphagov/file-performance-documentation
Add some documentation
2 changed files with 70 additions and 0 deletions.
# Estimated cost of AWS storage

Storing all of the [current assets](existing_assets.md) in Asset Manager and Whitehall would require ~670 GB. S3 storage is currently [priced at $0.023/GB/month](https://aws.amazon.com/s3/pricing/), which equates to ~$15/month.
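
The arithmetic is simply size multiplied by price. A minimal sketch, assuming a flat per-GB price (S3 pricing is tiered, but ~670 GB sits well inside the first tier) and counting storage only; `monthly_cost` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: monthly storage cost = size (GB) x price ($/GB/month).
# Assumes a flat price; request and data-transfer charges are not included.
monthly_cost() {
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

monthly_cost 670 0.023   # at the S3 price quoted above; prints 15.41
```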

Amazon also offers an [Elastic File System (EFS)](https://aws.amazon.com/efs/) in the [Ireland and Frankfurt](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/) regions. It appears to have an advantage over EBS in that volumes scale automatically with the data that is added. Because EFS can be mounted as a file system on an EC2 instance, it potentially offers an alternative for Asset Manager that would require smaller changes to the existing Asset Manager codebase: the mounted EFS volume would appear to the application as a file system, just like the current NFS mount.

EFS is more expensive than S3: it is [currently priced](https://aws.amazon.com/efs/pricing/) at $0.33/GB/month, or ~$221/month for ~670 GB.
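
To put the two prices side by side, here is a minimal sketch of the storage-only comparison using the per-GB prices quoted above; `compare_storage_costs` is a hypothetical helper, and the figures exclude requests and data transfer:

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare monthly storage cost on S3 and EFS for one volume.
# Arguments: size in GB, S3 price and EFS price in $/GB/month. Storage only.
compare_storage_costs() {
  awk -v s="$1" -v a="$2" -v b="$3" 'BEGIN {
    printf "S3:  $%.2f/month\n", s * a
    printf "EFS: $%.2f/month\n", s * b
    printf "EFS premium: $%.2f/month, $%.2f/year (%.1fx S3)\n", s * (b - a), 12 * s * (b - a), b / a
  }'
}

compare_storage_costs 670 0.023 0.33
```

At ~670 GB the EFS premium works out at roughly $206/month, with EFS storage costing about 14 times as much per GB as S3.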

The cost of serving the assets has not yet been calculated.
# Existing assets

This document is a short overview of the assets currently stored in the NFS mount. On integration, the sizes of the directories under `/mnt/uploads` are:

``` bash
@integration-asset-master-1:~$ cd /mnt/uploads/
@integration-asset-master-1:/mnt/uploads$ du -h --max-depth=1
35G     ./asset-manager
629G    ./whitehall
7.6M    ./publisher
14G     ./support-api
16K     ./lost+found
677G    .
```
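
To total selected directories from this listing (for example Asset Manager plus Whitehall, the basis of the ~670 GB storage estimate), a rough sketch that converts `du -h` sizes to GiB. The `to_gib` helper is hypothetical and assumes GNU `du`, whose `-h` suffixes K, M, G and T are powers of 1024:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert `du -h` lines ("SIZE PATH") to "GIB PATH".
# Assumes GNU du suffixes K, M, G, T (powers of 1024); no suffix means bytes.
to_gib() {
  awk '{
    size = $1
    unit = substr(size, length(size), 1)
    num = substr(size, 1, length(size) - 1) + 0
    if (unit == "K")      gib = num / (1024 * 1024)
    else if (unit == "M") gib = num / 1024
    else if (unit == "G") gib = num
    else if (unit == "T") gib = num * 1024
    else                  gib = (size + 0) / (1024 * 1024 * 1024)
    printf "%.2f %s\n", gib, $2
  }'
}

# Total the Asset Manager and Whitehall directories from the listing above.
printf '%s\n' '35G ./asset-manager' '629G ./whitehall' |
  to_gib |
  awk '{ total += $1 } END { printf "%.0f GiB\n", total }'
```

This gives 664 GiB for the two directories, in line with the ~670 GB figure.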

Comparing this to the [production Grafana dashboard](https://grafana.publishing.service.gov.uk/dashboard/db/assets) (674.99G at the time of writing) leads me to believe that integration has all of the assets that are on production.

The Asset Manager application stores a record in MongoDB for each asset. On integration, the number of records is:

``` bash
@integration-backend-1:/var/apps/asset-manager$ sudo su - deploy
deploy@integration-backend-1:~$ cd /var/apps/asset-manager
deploy@integration-backend-1:/var/apps/asset-manager$ govuk_setenv asset-manager bundle exec rails c
Loading production environment (Rails 4.2.7.1)
irb(main):002:0> Asset.count
=> 57232
```

We generated a list of all the files in the `asset-manager` directory of the NFS mount, along with their sizes:

``` bash
@integration-asset-master-1:/mnt/uploads/asset-manager$ find . -type f -print0 | xargs -0 ls -s > ~/file_sizes.txt
```

Counting the lines of this listing shows 58,613 files in the `asset-manager` directory, 1,381 more than the 57,232 records in MongoDB:

``` bash
12:18 $ wc -l file_sizes.txt
58613 file_sizes.txt
```

I haven't yet investigated why this difference exists. However, we can look at the sizes of the files on the mount by converting the listing into a CSV of size and path:

``` bash
# Remove all spaces (including any in filenames), then split "SIZE./PATH" into "SIZE,PATH"
cat file_sizes.txt | tr -d ' ' | awk -F"[.]/" '{print $1","$2}' > file_sizes.csv
```
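
An alternative, sketched under the assumption that GNU `find` is available (its `-printf` action is a GNU extension), writes the CSV in one step, keeps spaces in paths, and records each file's apparent size in bytes rather than the allocated blocks that `ls -s` reports. The `list_file_sizes` helper and the `file_sizes_bytes.csv` name are hypothetical; paths containing commas would still need quoting before loading the result as CSV.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "SIZE_IN_BYTES,RELATIVE_PATH" for each regular file under a directory.
# Requires GNU find: %s is the apparent size in bytes, %P the path relative to the start directory.
list_file_sizes() {
  find "$1" -type f -printf '%s,%P\n'
}

# e.g. on the asset master:
#   list_file_sizes /mnt/uploads/asset-manager > ~/file_sizes_bytes.csv
```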

Loading `file_sizes.csv` into R allows us to calculate the distribution of file sizes:

``` r
library(readr)
# size is the allocated size reported by `ls -s` (1 KiB blocks by default with GNU ls)
d <- read_csv('file_sizes.csv', col_names=c('size', 'filename'))
quantile(d$size, c(.5, .8, .95, .99, 1))
```

``` | ||
50% 80% 95% 99% 100% | ||
204.00 732.00 2376.00 6031.52 174844.00 | ||
``` | ||
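
Because `ls -s` reports allocated disk usage in blocks (1 KiB each by default with GNU `ls`), these figures are in KiB: the median file size is 204 KiB, 95% of all assets are 2,376 KiB (about 2.3 MiB) or smaller, and the largest asset is 174,844 KiB (about 171 MiB).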
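
Where R isn't available, the same percentiles can be computed on the server. A minimal sketch, assuming R's default `quantile()` method (type 7: linear interpolation between sorted values, which is consistent with the fractional 99% figure above); `quantile` here is a hypothetical shell function, not an existing tool:

```bash
#!/usr/bin/env bash
# Hypothetical helper: type-7 quantile (R's default) of numbers read one per line on stdin.
# With 0-based sorted values x, h = (n - 1) * p and the result is
# x[floor(h)] + (h - floor(h)) * (x[floor(h) + 1] - x[floor(h)]).
quantile() {
  local p=$1
  sort -n | awk -v p="$p" '
    { x[NR - 1] = $1 }
    END {
      n = NR
      if (n == 0) exit 1
      h = (n - 1) * p
      lo = int(h)
      hi = (lo + 1 < n) ? lo + 1 : lo
      printf "%.2f\n", x[lo] + (h - lo) * (x[hi] - x[lo])
    }'
}

# e.g. cut -d, -f1 file_sizes.csv | quantile 0.95
```

Run against the first column of `file_sizes.csv`, this should reproduce the R percentiles above, since both use the same interpolation rule.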