HDD tests selection

The following test parameters were selected:

Common parameters:

  • vm_count = 1 (no VM, only local host)
  • number of retries = 7 (under discussion)
  • size = 10G
  • ramp_time = 5
  • runtime = 30

Tests:

  • 4k, randwrite, sync, th_count={1, 5, 10, 15, 20, 30, 40, 80}
  • 4k, randread, direct, th_count={1, 5, 10, 15, 20, 30, 40, 80}
  • 4k, randwrite, direct, th_count=1
  • 1m, read, direct, th_count=1
  • 1m, write, direct, th_count=1

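The parameter names above match fio options, so - as an assumption, since the load tool is not named on this page - the test matrix could be expanded into load-generator invocations roughly as in the sketch below. The target file path, job name and exact option spelling are illustrative, not taken from this page.

    # Minimal sketch: expand the test matrix above into fio command lines.
    # Assumes fio is the load generator; target path and job name are made up.
    SIZE = "10G"
    RAMP_TIME = 5
    RUNTIME = 30

    TESTS = [
        # (block size, rw mode, io mode, thread counts)
        ("4k", "randwrite", "sync",   [1, 5, 10, 15, 20, 30, 40, 80]),
        ("4k", "randread",  "direct", [1, 5, 10, 15, 20, 30, 40, 80]),
        ("4k", "randwrite", "direct", [1]),
        ("1m", "read",      "direct", [1]),
        ("1m", "write",     "direct", [1]),
    ]

    def fio_cmd(bs, rw, mode, numjobs):
        opts = [
            "fio", "--name=hdd_test",
            "--filename=/mnt/test/hdd_test_file",   # target path is an assumption
            f"--bs={bs}", f"--rw={rw}", f"--size={SIZE}",
            f"--ramp_time={RAMP_TIME}", f"--runtime={RUNTIME}", "--time_based",
            f"--numjobs={numjobs}", "--group_reporting",
        ]
        # 'direct' bypasses the page cache; 'sync' waits for each write to land on disk
        opts.append("--direct=1" if mode == "direct" else "--sync=1")
        return " ".join(opts)

    for bs, rw, mode, th_counts in TESTS:
        for th in th_counts:
            print(fio_cmd(bs, rw, mode, th))
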
Selection reasoning

Absence of async mode

In async mode all requests go to a RAM buffer first and only eventually hit the disk. Write performance consists of two main stages: the buffer-filling stage, when write performance depends only on CPU and RAM performance, and a second stage, when the buffers are full and no new data can be written into the buffer until some old data gets written from the buffer to disk. Two processes happen to write requests before they go from the buffer to the driver write queue:

  • Reordering
  • Merging

First, requests are reordered to avoid large HDD head movements; also, if there are two requests for consecutive parts of the disk, they are merged into one single request. The probability of merging two blocks is BUFFER_SIZE / TEST_FILE_SIZE in the case of an even distribution, and larger for any real random generator. The OS buffer manager is also involved in deciding which block should be written first.

As a result:

  • We need to skip the first, buffer-fill stage, preferably by using a large enough ramp_time value (~1 min)
  • Once the buffer is filled, performance equals direct write performance, as direct mode allows requests to be reordered as optimally as async mode does, thanks to the large driver reorder queue. The only real difference is in request merging, which does not depend on disk performance, but only on the disk cache size, the file size, and the random distribution in the load tool.

That means we can estimate async performance from direct performance quite reliably.

ASYNC_IOPS_4k = DIRECT_IOPS_4k * (1 + BUFFER_SIZE / TEST_FILE_SIZE), if (BUFFER_SIZE / TEST_FILE_SIZE) << 1 (<< means much smaller)

ASYNC_BW_1M = DIRECT_BW_1M, as this is the maximum write performance and merging has no effect on linear operations.
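
For illustration, a tiny sketch of this estimate; the measured direct IOPS and the buffer size below are made-up example numbers, not results from this page:

    # Rough async-IOPS estimate from direct-mode results, per the formula above.
    # All concrete numbers are illustrative assumptions.
    direct_iops_4k = 200          # measured 4k direct randwrite IOPS (example value)
    buffer_size = 256 * 2**20     # writeback buffer size, 256 MiB (assumption)
    test_file_size = 10 * 2**30   # 10 GiB test file, as in the common parameters

    ratio = buffer_size / test_file_size          # 0.025, i.e. << 1, so the formula applies
    async_iops_4k = direct_iops_4k * (1 + ratio)  # ~205 IOPS: merging adds only ~2.5%
    print(async_iops_4k)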

NEED TO ATTACH DIRECT VS ASYNC PERFORMANCE COMPARISON PLOT


Absence of block sizes other than 4k and 1M

In short: request processing time can be calculated with the following formula with a high level of confidence:

TIME = 1 / IOPS = SEEK_TIME + BLOCK_SIZE / INTERNAL_BANDWIDTH

This formula has only two parameters - SEEK_TIME and INTERNAL_BANDWIDTH - which can be determined by measuring IOPS for two block sizes. It is better to use one small and one large block size to decrease the influence of measurement errors.
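
A sketch of how the two parameters can be recovered from two measured IOPS values; the input numbers below are hypothetical measurements, not data from this page:

    # Solve the two-equation system
    #   1/IOPS_small = SEEK_TIME + BS_small / INTERNAL_BANDWIDTH
    #   1/IOPS_large = SEEK_TIME + BS_large / INTERNAL_BANDWIDTH
    # for SEEK_TIME and INTERNAL_BANDWIDTH.
    bs_small, iops_small = 4 * 1024, 120.0     # 4k random, direct (hypothetical measurement)
    bs_large, iops_large = 1024 * 1024, 75.0   # 1M random, direct (hypothetical measurement)

    internal_bw = (bs_large - bs_small) / (1.0 / iops_large - 1.0 / iops_small)
    seek_time = 1.0 / iops_small - bs_small / internal_bw

    print(f"internal bandwidth ~ {internal_bw / 1e6:.0f} MB/s")   # ~209 MB/s for these inputs
    print(f"average seek time  ~ {seek_time * 1e3:.1f} ms")       # ~8.3 ms for these inputs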

https://mirantis.jira.com/wiki/display/MOL/Procedure+and+tests+selection

Only one thread for direct write

Direct mode means that data goes into the drive's IO queue directly, bypassing the FS cache. The writing process does not wait for requests to be written to disk and continues execution once the request is placed into the driver queue. If the queue is full, the process blocks until there is space for a new request. The typical size of the driver queue and the SATA NCQ HDD queue is the smaller of ~100 requests or 8-64 MiB. A single thread fills the driver and disk write queues completely in direct mode within a couple of milliseconds. Additional threads don't increase IOPS; latency increases linearly with increasing thread count.
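
A back-of-the-envelope check of the "couple of milliseconds" claim; the per-request submission overhead used below is an assumed ballpark, not a measurement:

    # Time for a single thread to fill a ~100-request queue in direct mode.
    queue_depth = 100           # typical driver + NCQ queue depth from the text above
    submit_overhead_s = 20e-6   # ~20 us per request submission (assumption)

    fill_time_ms = queue_depth * submit_overhead_s * 1e3
    print(f"queue fill time ~ {fill_time_ms:.0f} ms")   # ~2 ms, i.e. a couple of milliseconds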

NEED TO ATTACH PLOT HERE

Thread limitation for read/sync write modes

Read and sync write performance (IOPS/BW) increases when more threads operate on the same device in parallel; latency increases as well. This happens due to request reordering, described above in the 'Absence of async mode' section. In sync mode each thread/process waits until the HDD acknowledges that the request data has been successfully written to disk and does not issue a new request until the acknowledgement is received. That means that with a single thread the driver and HDD have no chance to optimize head movement. With X threads, (X-1) requests are waiting in the queue and can be reordered. Yet with increasing X the reordering gain becomes smaller and smaller; usually after ~100 threads there is no improvement anymore. Along with increasing IOPS, latency also increases. Somewhere around 100 threads latency starts to increase linearly - that means that new requests just get queued but don't improve performance anymore.
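
The linear latency growth follows from Little's law: average latency is roughly the number of outstanding requests divided by IOPS, so once IOPS stops growing, latency grows linearly with thread count. A small sketch; the IOPS saturation model and its numbers are illustrative assumptions:

    # Little's law: latency ~= outstanding_requests / IOPS.
    # The saturation model (flat IOPS above ~100 threads) is an assumption
    # matching the description above, not a measured curve.
    def iops_model(threads, single_thread_iops=80.0, max_iops=400.0, knee=100):
        """Toy model: IOPS grows towards max_iops and stays flat past the knee."""
        gain = (max_iops - single_thread_iops) * min(threads, knee) / knee
        return single_thread_iops + gain

    for threads in (1, 10, 50, 100, 200, 400):
        iops = iops_model(threads)
        latency_ms = threads / iops * 1e3    # Little's law, in milliseconds
        print(f"{threads:4d} threads: {iops:5.0f} IOPS, ~{latency_ms:6.1f} ms avg latency")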

NEED TO ATTACH PLOT HERE

Only one thread for sequential operations

With two or more threads a linear operation effectively becomes random, as write requests from different threads interleave.

File size

The file size should be large enough to make the probability of merging write requests in direct mode negligible:

100 (avg IO queue depth) * (BLOCK_SZ / FILE_SZ) << 1. 10 GiB is more than enough.
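
A quick check of this condition for the chosen parameters:

    # Merge probability estimate for direct random writes: queue depth times
    # block size over file size (see 'Absence of async mode' above).
    queue_depth = 100       # average IO queue depth
    block_size = 4 * 1024   # 4k blocks
    file_size = 10 * 2**30  # 10 GiB test file

    merge_prob = queue_depth * block_size / file_size
    print(merge_prob)       # ~3.8e-05, clearly << 1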

Test run and ramp-up times
Number of retries
Results distribution

The distribution was tested on a couple of HDDs and found to be very close to normal. NEED TO ATTACH PLOT AND DISTRIBUTION PROPERTIES HERE