From 971c186b8681407467d85d9201fb98aa200aad1c Mon Sep 17 00:00:00 2001
From: Chris Lowis
Date: Wed, 9 Aug 2017 11:48:05 +0100
Subject: [PATCH 1/2] Add a document about existing assets

A short overview of the assets currently stored on the NFS mount and the distribution of their file sizes.

Fixes: #103
---
 docs/existing_assets.md | 61 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 docs/existing_assets.md

diff --git a/docs/existing_assets.md b/docs/existing_assets.md
new file mode 100644
index 00000000..20df8e13
--- /dev/null
+++ b/docs/existing_assets.md
@@ -0,0 +1,61 @@
+# Existing assets
+
+This document is a short overview of the assets currently stored in the NFS mount. On integration, the sizes of the directories under `/mnt/uploads` are:
+
+``` bash
+@integration-asset-master-1:~$ cd /mnt/uploads/
+@integration-asset-master-1:/mnt/uploads$ du -h --max-depth=1
+35G ./asset-manager
+629G ./whitehall
+7.6M ./publisher
+14G ./support-api
+16K ./lost+found
+677G .
+```
+
+Comparing this to the [production Grafana dashboard](https://grafana.publishing.service.gov.uk/dashboard/db/assets) (674.99G today) leads me to believe that integration has all of the assets that are on production.
+
+The asset manager application stores a record in MongoDB for each asset. On integration, the number of records is:
+
+``` bash
+@integration-backend-1:/var/apps/asset-manager$ sudo su - deploy
+deploy@integration-backend-1:~$ cd /var/apps/asset-manager
+deploy@integration-backend-1:/var/apps/asset-manager$ govuk_setenv asset-manager bundle exec rails c
+Loading production environment (Rails 4.2.7.1)
+irb(main):002:0> Asset.count
+=> 57232
+```
+
+We generate a list of all the files stored on NFS in the asset manager directory:
+
+``` bash
+@integration-asset-master-1:/mnt/uploads/asset-manager$ find . -type f | xargs ls -s > ~/file_sizes.txt
+```
+
+This indicates that there are 58,613 files in the NFS mount, which is 1,381 more than the number of records in MongoDB.
+
+``` bash
+12:18 $ wc -l file_sizes.txt
+   58613 file_sizes.txt
+```
+
+I haven't yet investigated why this difference exists. However, we can look at the sizes of the files on the mount by first converting the listing to CSV:
+
+``` bash
+cat file_sizes.txt | tr -d ' ' | awk -F"[.]/" '{print $1","$2}' > file_sizes.csv
+```
+
+Loading this file into R allows us to calculate the distribution of file sizes:
+
+``` r
+library(readr)
+d <- read_csv('file_sizes.csv', col_names=c('size', 'filename'))
+quantile(d$size, c(.5, .8, .95, .99, 1))
+```
+
+```
+      50%       80%       95%       99%      100%
+   204.00    732.00   2376.00   6031.52 174844.00
+```
+
+Sizes are in kilobytes (GNU `ls -s` reports 1 KB blocks by default). The median file size is 204 KB, 95% of all assets are under 2.4 MB and the largest asset is just over 174 MB.
From 8bb4c8a05835a9608dd231405dbd7b672740db55 Mon Sep 17 00:00:00 2001
From: Chris Lowis
Date: Wed, 9 Aug 2017 11:54:44 +0100
Subject: [PATCH 2/2] Comparing the costs of S3 and EFS

A short comparison of the costs of storage for existing assets on S3 and EFS.

Fixes: #104
---
 docs/costing.md | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 docs/costing.md

diff --git a/docs/costing.md b/docs/costing.md
new file mode 100644
index 00000000..8ca6045c
--- /dev/null
+++ b/docs/costing.md
@@ -0,0 +1,9 @@
+# Estimated cost of AWS storage
+
+Storing all of the [current assets](existing_assets.md) in Asset Manager and Whitehall would require ~670 GB. S3 storage is [currently priced at $0.023/GB/month](https://aws.amazon.com/s3/pricing/), which equates to ~$15/month.
+
+Amazon also offers an [Elastic File System (EFS)](https://aws.amazon.com/efs/) in the [Ireland and Frankfurt](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/) regions. Its apparent advantage over EBS is that volumes scale automatically as data is added. Because it can be mounted as a file system on an EC2 instance, it potentially offers an alternative for Asset Manager that would require smaller changes to the existing Asset Manager codebase (the mounted EFS volume would appear to the application as a file system, like the current NFS model).
+
+EFS is more expensive than S3, [currently priced](https://aws.amazon.com/efs/pricing/) at $0.33/GB/month or ~$221/month for ~670 GB.
+
+The cost of serving the assets has not yet been calculated.
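
As a quick sanity check on the storage estimates in `docs/costing.md`, the arithmetic can be reproduced in the shell. This is only a sketch: `monthly_cost` is a hypothetical helper, not part of the patch, and the ~670 GB size and per-GB prices are simply the figures quoted in the document, which may have changed since.

```bash
# Hypothetical helper (not part of the patch): monthly storage cost in USD.
#   $1 = stored size in GB, $2 = price in USD per GB per month
monthly_cost() {
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

# Figures assumed from docs/costing.md: ~670 GB, $0.023 (S3) and $0.33 (EFS).
printf 'S3:  $%s/month\n' "$(monthly_cost 670 0.023)"   # S3:  $15.41/month
printf 'EFS: $%s/month\n' "$(monthly_cost 670 0.33)"    # EFS: $221.10/month
```

On these figures EFS storage comes out roughly 14 times the cost of S3, before any request or data-transfer charges are considered.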