
how to benchmark dmdedup #30

Open
venktesh-bolla opened this issue Dec 17, 2016 · 8 comments

@venktesh-bolla

Hi Team,

I would like to benchmark dmdedup as described in the documentation/published paper.
Somewhere in it, it is stated that the test exercise was done with 40 Linux kernels, to see the level of deduplication achieved by dmdedup.
In the process of learning, I want to reproduce the claimed numbers.
I will share the tabulated values as soon as I accomplish it.

Could you please share some info about it and shed some light?

Thanks in advance,
Venkatesh.

@sectorsize512 changed the title from "Need info!! regarding to benchmark dmdedup abilities.." to "how to benchmark dmdedup" on Dec 17, 2016
@venktesh-bolla
Author

Any info or update?
-Venktesh.

@sectorsize512
Member

What specific info is needed?

@venktesh-bolla
Author

Hi Vasily,
I want to benchmark dmdedup. To do this, I have a hard drive with two partitions. As stated in the dmdedup documentation, I want to reproduce those numbers, i.e., the performance and the rate of deduplication.

For instance, if I copy 100GB of data (including several files, such as Linux kernel trees) to the dmdedup device, what amount of writes goes to the metadata partition and what amount goes to the data partition?

In short, I would like to reproduce the numbers tabulated in the dmdedup paper. Could you please tell me how exactly you benchmarked it?
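For reference, here is roughly how I plan to create the dedup target over the two partitions. This is only a sketch based on the construction parameters in the dmdedup documentation; the device names, block size, backend, and flush interval below are placeholders, so please correct me if the table line is wrong:

```
# Table line: <start> <length> dedup <meta_dev> <data_dev> <block_size> <hash_algo> <backend> <flushrq>
META_DEV=/dev/sdb1                          # metadata partition (placeholder)
DATA_DEV=/dev/sdb2                          # data partition (placeholder)
DATA_SIZE=$(blockdev --getsz "$DATA_DEV")   # target length in 512B sectors

dmsetup create mydedup --table \
    "0 $DATA_SIZE dedup $META_DEV $DATA_DEV 4096 md5 cowbtree 100"
```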

Thanks in advance,
Venkatesh

@sectorsize512
Member

Hi, as the paper describes on page 10:

"Linux kernels (Figure 6). This dataset contains the source code of 40
Linux kernels from version 2.6.0 to 2.6.39, archived in a single tarball.
We first used an unmodified tar , which aligns files on 512B bound-
aries ( tar-512). In this case, the tarball size was 11GB and the
deduplication ratio was 1.18. We then modi- fied tar to align files on
4KB boundaries ( tar-4096). In this case, the tarball size was 16GB
and the dedu- plication ratio was 1.88. Dmdedup uses 4KB chunking, which is
why aligning files on 4KB boundaries increases the deduplication ratio. One
can see that although tar- 4096 produces a larger logical tarball, its physical
size (16GB / 1.88 = 8.5GB) is actually smaller than the tar- ball produced
by tar-512 (11GB / 1.18 = 9 = 9.3GB"

We just used the dd command to write the corresponding data to dmdedup.
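Roughly like this (the device and file names below are only examples; dmsetup status on the dedup device should report the data/metadata usage counters):

```
# Write the kernel tarball sequentially to the dedup target
dd if=linux-kernels.tar of=/dev/mapper/mydedup bs=4096 oflag=direct

# Inspect how many data/metadata blocks were actually consumed
dmsetup status mydedup
```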

A word of warning: we did not use two partitions of the same HDD. Instead, we used a separate SSD for metadata. The paper has details on it:

https://www.fsl.cs.sunysb.edu/docs/ols-dmdedup/dmdedup-ols14.pdf

HTH,
Vasily

@venktesh-bolla
Author

Thanks a lot, that really helped.
How can I create a tarball with alignment?

@Oliver-Luo

Another question: how do you test random writes? For sequential writes, dd is enough, though you still need another device to store the data and read from into the dmdedup device. But I don't know how you test random writes. Did you write a program to do that?
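One way I could imagine doing it is with fio, driving random writes straight at the device, e.g. like below (the device name and sizes are placeholders, and the dedupe_percentage option only roughly controls what fraction of the written buffers are duplicates), but I'd like to know how you actually did it:

```
fio --name=dedup-randwrite \
    --filename=/dev/mapper/mydedup \
    --rw=randwrite --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 \
    --size=10G \
    --dedupe_percentage=50
```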

By the way, I'm still wondering how to create a tarball with alignment as well. It would be helpful if you have any clue.

Thanks.

@sectorsize512
Member

We used Filebench and modified it to generate data with the required deduplication ratio. I'm attaching an old FB patch to give you a sense.

I'm attaching the tar patch to this post as well.
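If you just want a dataset with a controlled deduplication ratio without patching Filebench, a crude alternative (not what we used for the paper) is to repeat a pool of unique 4KB blocks; a minimal sketch, with example names and sizes:

```
# Build a dataset whose expected deduplication ratio is ~RATIO:
# every unique 4KB block appears RATIO times.
RATIO=2                 # logical size / unique data
BLOCKS=262144           # total 4KB blocks (1GiB logical)
UNIQUE=$((BLOCKS / RATIO))

dd if=/dev/urandom of=unique.bin bs=4096 count=$UNIQUE
rm -f dataset.bin
i=0
while [ "$i" -lt "$RATIO" ]; do
    cat unique.bin >> dataset.bin
    i=$((i + 1))
done
```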

