
how to benchmark dmdedup #30

Open
venktesh-bolla opened this issue Dec 17, 2016 · 8 comments

@venktesh-bolla

Hi Team,

I would like to benchmark dmdedup as described in the documentation/published paper.
Somewhere in it, it is stated that the test exercise was done with 40 Linux kernels, to see the level of deduplication achieved by dmdedup.
In the process of learning, I want to reproduce the claimed numbers.
I will share the tabulated values as soon as I accomplish it.

Could you please share some info about it and shed some light?

Thanks in advance,
Venkatesh.

@sectorsize512 changed the title from "Need info!! regarding to benchmark dmdedup abilities.." to "how to benchmark dmdedup" on Dec 17, 2016
@venktesh-bolla
Author

Any info or update?
-Venktesh.

@sectorsize512
Member

What specific info is needed?

@venktesh-bolla
Author

Hi Vasily,
I want to benchmark dmdedup. To do this, I have a hard drive with two partitions. As stated in the dmdedup documentation, I want to reproduce those numbers, i.e., the performance and the rate of deduplication.

For instance, if I copy 100GB of data (including several files, such as Linux kernel trees) to the dmdedup device, what amount of writes goes to the metadata partition and what amount goes to the data partition?

In short, I would like to reproduce the numbers tabulated in the dmdedup paper. Could you please tell me how exactly you benchmarked it?
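For reference, here is roughly how I plan to create the dedup target over the two partitions. This is only a sketch based on the construction parameters in the dmdedup documentation; the device names, block size, backend, and flush interval below are placeholders, so please correct me if the table line is wrong:

```
# Table line: <start> <length> dedup <meta_dev> <data_dev> <block_size> <hash_algo> <backend> <flushrq>
META_DEV=/dev/sdb1                          # metadata partition (placeholder)
DATA_DEV=/dev/sdb2                          # data partition (placeholder)
DATA_SIZE=$(blockdev --getsz "$DATA_DEV")   # target length in 512B sectors

dmsetup create mydedup --table \
    "0 $DATA_SIZE dedup $META_DEV $DATA_DEV 4096 md5 cowbtree 100"
```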

Thanks in advance,
Venkatesh

@sectorsize512
Member

Hi, as the paper describes on page 10:

"Linux kernels (Figure 6). This dataset contains the source code of 40
Linux kernels from version 2.6.0 to 2.6.39, archived in a single tarball.
We first used an unmodified tar , which aligns files on 512B bound-
aries ( tar-512). In this case, the tarball size was 11GB and the
deduplication ratio was 1.18. We then modi- fied tar to align files on
4KB boundaries ( tar-4096). In this case, the tarball size was 16GB
and the dedu- plication ratio was 1.88. Dmdedup uses 4KB chunking, which is
why aligning files on 4KB boundaries increases the deduplication ratio. One
can see that although tar- 4096 produces a larger logical tarball, its physical
size (16GB / 1.88 = 8.5GB) is actually smaller than the tar- ball produced
by tar-512 (11GB / 1.18 = 9 = 9.3GB"

We just used the dd command to write the corresponding data to dmdedup.
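Roughly like this (the device and file names below are only examples; dmsetup status on the dedup device should report the data/metadata usage counters):

```
# Write the kernel tarball sequentially to the dedup target
dd if=linux-kernels.tar of=/dev/mapper/mydedup bs=4096 oflag=direct

# Inspect how many data/metadata blocks were actually consumed
dmsetup status mydedup
```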

A word of warning: we did not use two partitions of the same HDD. Instead, we used a separate SSD for metadata. The paper has details on it:

https://www.fsl.cs.sunysb.edu/docs/ols-dmdedup/dmdedup-ols14.pdf

HTH,
Vasily

@venktesh-bolla
Author

Thanks a lot, that really helped.
How can I create a tarball with alignment?

@Oliver-Luo

Another question: how do you test random writes? For sequential writes, dd is enough, though you still need another device to store the data and read from into the dmdedup device. But I don't know how you test random writes. Did you write a program to do that?
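One way I could imagine doing it is with fio, driving random writes straight at the device, e.g. like below (the device name and sizes are placeholders, and the dedupe_percentage option only roughly controls what fraction of the written buffers are duplicates), but I'd like to know how you actually did it:

```
fio --name=dedup-randwrite \
    --filename=/dev/mapper/mydedup \
    --rw=randwrite --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 \
    --size=10G \
    --dedupe_percentage=50
```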

By the way, I'm still wondering how to create a tarball with alignment as well. It would be helpful if you have any clue.

Thanks.

@sectorsize512
Member

We used Filebench and modified it to generate data with the required deduplication ratio. I'm attaching an old FB patch to give you a sense.

I'm attaching the tar patch to this post as well.
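If you just want a dataset with a controlled deduplication ratio without patching Filebench, a crude alternative (not what we used for the paper) is to repeat a pool of unique 4KB blocks; a minimal sketch, with example names and sizes:

```
# Build a dataset whose expected deduplication ratio is ~RATIO:
# every unique 4KB block appears RATIO times.
RATIO=2                 # logical size / unique data
BLOCKS=262144           # total 4KB blocks (1GiB logical)
UNIQUE=$((BLOCKS / RATIO))

dd if=/dev/urandom of=unique.bin bs=4096 count=$UNIQUE
rm -f dataset.bin
i=0
while [ "$i" -lt "$RATIO" ]; do
    cat unique.bin >> dataset.bin
    i=$((i + 1))
done
```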

