proposal footprint benchmarks

Joseph Heck edited this page Mar 9, 2016 · 2 revisions

RackHD Footprint benchmarking

(from ORFS-140)

Background

We want to consider future work to reduce the resource footprint of running RackHD as a system, making it more amenable to running in a top-of-rack switch or other "constrained" environments. To enable that, we need clear measurements that show where to make improvements, a benchmark to measure improvement against, a means of collecting both, and a clear marker for success.

Goals

  • create repeatable benchmarks to measure memory and CPU footprint within a consistent system for a basic set of tasks (for example, maybe a 5 node discovery, installation of an OS, monitoring IPMI)
  • pick 1 benchmark set (configuration, run-through, etc) as an initial baseline for footprint comparison work
  • measure each process independently - CPU, memory consumed, disk IO, network IO - plot over lifetime of tasks
    • analysis should also measure mongodb and rabbitmq
  • measure disk consumed in mongodb
  • come up with metrics to compare each so we know where we’re consuming the most resources
  • process needs to be easily repeatable with distinct metrics from each run to compare as we change over time
  • leverage integration test scripts to run while profiling system
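
As one possible starting point for the per-process memory goal above, here is a minimal sketch that reads the resident set size of a process from `/proc/<pid>/status`. It assumes a Linux host; the helper names `rss_kb` and `read_rss_kb` are hypothetical, and the sample text is made up for illustration rather than taken from a RackHD run.

```python
def rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     52340 kB".
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the resident set size of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return rss_kb(handle.read())


# Made-up sample text in the /proc/<pid>/status layout.
sample = "Name:\tnode\nVmPeak:\t  900000 kB\nVmRSS:\t   52340 kB\n"
print(rss_kb(sample))
```

Calling `read_rss_kb` at a fixed interval for each RackHD process, plus mongod and rabbitmq, would give the memory-over-lifetime series described in the goals.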

API

Notes

  • Using node 4 debuggers to profile real time CPU performance/utilization stats.

Monitor MongoDB disk consumption

  1. MongoDB disk structure. Every MongoDB instance consists of a namespace file, journal files and data files. A “namespace” is the concatenation of the database name and a collection name with a period character in between. Data files are preallocated: db-name.0 is 64M by default and db-name.1 is 128M if needed, and file sizes can grow to a maximum of 2G. Users can configure the preallocated file size to be smaller. Each data file is made up of multiple extents. Some concepts used in the footprint benchmark:
  • Extents: logical containers within data files used to store documents (data) and indexes.
  • dataSize: the sum of the sizes (in bytes) of all the documents and padding stored in the database. (Padding is a small amount of extra space for a document. It reduces the likelihood that a slight increase in document size will cause the document to exceed its allocated record size.)
  • storageSize: the size (in bytes) of all the data extents in the database.
  • fileSize: the size (in bytes) of all the data extents, index extents and yet-unused space (in data files) in the database.

Details can be found in "how big is your mongodb". Although the disk space occupied by MongoDB is fixed because it is preallocated, it is still valuable to measure how much of it is actually used, since we can change the configuration if the usage is small. As described above, dataSize is the value directly affected by RackHD code, so that is the change we will monitor.
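
To make the relationship between the three metrics concrete, the following sketch computes how much of the allocated space actually holds documents. The function name `storage_ratios` is hypothetical; the input numbers are the dataSize, storageSize and fileSize values from the pxe `db.stats()` output shown later on this page.

```python
def storage_ratios(stats):
    """Return fill ratios for a dict shaped like the output of db.stats()."""
    data = float(stats["dataSize"])
    storage = float(stats["storageSize"])
    files = float(stats["fileSize"])
    return {
        "data_over_storage": data / storage,   # documents vs. data extents
        "storage_over_file": storage / files,  # data extents vs. data files
        "data_over_file": data / files,        # documents vs. data files
    }


# Values (in bytes) copied from the pxe db.stats() output on this page.
pxe = {"dataSize": 6232940, "storageSize": 12083200, "fileSize": 201326592}
ratios = storage_ratios(pxe)
for name, value in sorted(ratios.items()):
    print("%s: %.3f" % (name, value))
```

A low data_over_file ratio suggests that most of the preallocated space is unused, which is the case where a smaller preallocation is worth considering.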

  2. Find preallocated disk consumption in RackHD
  • Get MongoDB configuration. The configuration can be viewed with
    vi /etc/mongodb.conf
    
    in which the data file path is
    
    # mongodb.conf
    # Where to store the data.
    dbpath=/var/lib/mongodb
    
  • Look into the directory
  ```
  $ ll -h /var/lib/mongodb/
  total 497M
  drwxr-xr-x  3 mongodb mongodb 4.0K Feb 29 05:03 ./
  drwxr-xr-x 44 root    root    4.0K Mar  4 09:53 ../
  drwxr-xr-x  2 mongodb nogroup 4.0K Feb 29 05:16 journal/
  -rw-------  1 mongodb nogroup  64M Feb 29 05:15 local.0
  -rw-------  1 mongodb nogroup  16M Feb 29 05:15 local.ns
  -rwxr-xr-x  1 mongodb nogroup    5 Feb 29 05:15 mongod.lock*
  -rw-------  1 mongodb nogroup  64M Mar  1 06:22 onserve.0
  -rw-------  1 mongodb nogroup 128M Jan 13 12:14 onserve.1
  -rw-------  1 mongodb nogroup  16M Mar  1 06:22 onserve.ns
  -rw-------  1 mongodb nogroup  64M Mar  8 05:59 pxe.0
  -rw-------  1 mongodb nogroup 128M Dec 23 16:48 pxe.1
  -rw-------  1 mongodb nogroup  16M Mar  7 09:51 pxe.ns
  ```
  ```
  $ du -h --max-depth=0 /var/lib/mongodb/journal/
  3.1G    /var/lib/mongodb/journal/
  ```
  This confirms that the MongoDB preallocated data files are 64M and then 128M by default, that the namespace file (*.ns) is a fixed 16M, and that the journal takes up a large amount of space (3.1G).
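
The same inspection can be scripted so that it is repeatable across benchmark runs. This is a minimal sketch, assuming the file layout shown above (`<db-name>.<n>` and `<db-name>.ns` files directly under dbpath, plus a `journal` directory); the function name `dbpath_usage` is hypothetical.

```python
import os
from collections import defaultdict


def dbpath_usage(dbpath):
    """Sum file sizes per database prefix and for the journal directory.

    Sizes are apparent file sizes in bytes (what "ll" shows), which can
    differ from the allocated blocks that "du" reports.
    """
    per_db = defaultdict(int)
    journal = 0
    for root, _dirs, files in os.walk(dbpath):
        rel = os.path.relpath(root, dbpath)
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            if rel.split(os.sep)[0] == "journal":
                journal += size
            elif rel == "." and "." in name and name != "mongod.lock":
                # "pxe.0", "pxe.1" and "pxe.ns" all count toward "pxe".
                per_db[name.split(".")[0]] += size
    return dict(per_db), journal


# Example call (path taken from the dbpath setting above):
# per_db, journal_bytes = dbpath_usage("/var/lib/mongodb")
```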
  3. Change preallocated disk size. MongoDB has a "smallfiles" configuration setting. It reduces the initial size of data files and limits their maximum size to 512 megabytes, and it also reduces the size of each journal file from 1 gigabyte to 128 megabytes. Use this setting if you have a large number of databases that each hold a small quantity of data. Using the onserve db as an example:
  • Delete onserve db
    mongo
    use onserve
    db.dropDatabase()
    exit
    
  • Stop MongoDB daemon
    sudo killall mongod
    
  • Add the "smallfiles" setting to the config file (/etc/mongodb.conf)
    smallfiles=true
    
  • Remove the journal files (the directory is named "journal", as shown in the listing above)
    sudo rm -rf /var/lib/mongodb/journal
    
  • Restart MongoDB (this runs mongod as root rather than as the mongodb user, so it is only suitable for verifying that the file size can be changed)
    sudo /usr/bin/mongod -f /etc/mongodb.conf
    
  • Restart onserve to recreate database
    sudo service onrack-conductor restart
    
  • Check the size of data file in onserve database
  ```
  $ ll -h /var/lib/mongodb/
  total 353M
  drwxr-xr-x  4 mongodb mongodb 4.0K Mar  8 06:39 ./
  drwxr-xr-x 42 root    root    4.0K Feb 28 06:47 ../
  drwxr-xr-x  2 root    root    4.0K Mar  8 06:38 journal/
  -rw-------  1 mongodb nogroup  64M Mar  8 06:38 local.0
  -rw-------  1 mongodb nogroup  16M Mar  8 06:38 local.ns
  -rwxr-xr-x  1 mongodb nogroup    5 Mar  8 06:38 mongod.lock*
  -rw-------  1 root    root     16M Mar  8 06:39 onserve.0
  -rw-------  1 root    root     32M Mar  8 06:39 onserve.1
  -rw-------  1 root    root     16M Mar  8 06:39 onserve.ns
  -rw-------  1 mongodb nogroup  64M Feb 22 08:33 pxe.0
  -rw-------  1 mongodb nogroup 128M Feb 21 13:51 pxe.1
  -rw-------  1 mongodb nogroup  16M Feb 22 08:33 pxe.ns
  drwxr-xr-x  2 root    root    4.0K Mar  8 06:39 _tmp/
  ```
The onserve data files have shrunk to 16M + 32M. This lets us measure the disk space that MongoDB actually uses in RackHD: if that turns out to be small, "smallfiles" can be applied to shrink the disk footprint.
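
Before and after a change like this, a benchmark script may want to record which settings were in effect. Here is a minimal sketch that parses the key=value format of /etc/mongodb.conf shown earlier; the function name `parse_mongodb_conf` is hypothetical and the sample text mirrors the settings used on this page.

```python
def parse_mongodb_conf(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """\
# mongodb.conf
# Where to store the data.
dbpath=/var/lib/mongodb
smallfiles=true
"""
conf = parse_mongodb_conf(sample)
print(conf["dbpath"], conf.get("smallfiles", "false") == "true")
```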
4. Tools to measure actual disk consumption: [dbStats](https://docs.mongodb.org/manual/reference/command/dbStats/)

    $ mongo
    MongoDB shell version: 2.4.9
    connecting to: test
    > use pxe
    switched to db pxe
    > db.stats()
    {
        "db" : "pxe",
        "collections" : 14,
        "objects" : 877,
        "avgObjSize" : 7107.115165336374,
        "dataSize" : 6232940,
        "storageSize" : 12083200,
        "numExtents" : 26,
        "indexes" : 17,
        "indexSize" : 155344,
        "fileSize" : 201326592,
        "nsSizeMB" : 16,
        "dataFileVersion" : {
            "major" : 4,
            "minor" : 5
        },
        "ok" : 1
    }

dataSize + indexSize + nsSize + journalSize (obtained with the "du" command) is the disk space that RackHD is currently using, where **dataSize** is the value directly affected by RackHD code.
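
The sum above can be written as a small helper. This is a sketch under stated assumptions: the stats come from `db.stats()` with the default scale (bytes), nsSize is derived from the `nsSizeMB` field, and `journal_bytes` is a hypothetical parameter that the caller fills in from the "du" measurement. It is left at 0 in the example because the journal is shared by the whole MongoDB instance.

```python
def used_bytes(stats, journal_bytes=0):
    """Return dataSize + indexSize + nsSize + journalSize in bytes."""
    ns_bytes = stats["nsSizeMB"] * 1024 * 1024
    return stats["dataSize"] + stats["indexSize"] + ns_bytes + journal_bytes


# Values copied from the pxe db.stats() output above.
pxe = {"dataSize": 6232940, "indexSize": 155344, "nsSizeMB": 16}
total = used_bytes(pxe)
print("pxe uses %d bytes, excluding the journal" % total)
```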
5. Corresponding Python API
[Pymongo](https://api.mongodb.org/python/current/index.html) is the recommended library for working with MongoDB from Python. The db.stats() command can be issued from Python as:

    from pymongo import MongoClient

    client = MongoClient()
    db = client['pxe']
    stats_ret = db.command('dbstats', 1, scale=1024)

With scale=1024, the size fields in stats_ret (dataSize, storageSize, indexSize, fileSize) are reported in units of 1024 bytes:

	stats_ret = {
	    u'storageSize': 11800,
	    u'ok': 1.0,
	    u'avgObjSize': 7595.496332518337,
	    u'dataFileVersion': {
	        u'major': 4,
	        u'minor': 5
	    },
	    u'db': u'pxe',
	    u'indexes': 17,
	    u'objects': 818,
	    u'collections': 14,
	    u'fileSize': 196608,
	    u'numExtents': 26,
	    u'dataSize': 6067,
	    u'indexSize': 151,
	    u'nsSizeMB': 16
	}

Thus disk consumption data can be extracted from the above structure.
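
To compare runs over time, the extracted values could be appended to a CSV file, one row per run. The following is a minimal sketch; the field selection, the helper names `stats_row` and `write_rows`, and the run label "baseline" are assumptions made for illustration. The numbers come from the stats_ret structure above.

```python
import csv
import io

FIELDS = ["run", "db", "dataSize", "storageSize", "indexSize", "fileSize"]


def stats_row(run_label, stats_ret):
    """Pick the disk-related fields out of a dbstats result."""
    row = {"run": run_label}
    for key in FIELDS[1:]:
        row[key] = stats_ret[key]
    return row


def write_rows(rows, out):
    """Write a header plus one CSV line per run to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Subset of the stats_ret structure shown above (units of 1024 bytes).
stats_ret = {"db": "pxe", "dataSize": 6067, "storageSize": 11800,
             "indexSize": 151, "fileSize": 196608}
buf = io.StringIO()
write_rows([stats_row("baseline", stats_ret)], buf)
print(buf.getvalue())
```

Passing an open file instead of the in-memory buffer would persist the rows between benchmark runs.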


    
	