YCSB on 9.1G Movie Data
rockeet edited this page on Aug 23, 2018 (15 revisions).
- 1. Introduction
- 2. Test Method
  - 2.1. Hardware
- 3. Write Performance
- 4. Read Performance
  - 4.1. Data Much Smaller than Memory (64 GB Memory)
  - 4.2. Data Smaller than Memory (8 GB Memory)
  - 4.3. Data Larger than Memory (4 GB Memory)
  - 4.4. Data Much Larger than Memory (2 GB Memory)
1. Introduction

We have embedded TerarkDB into MongoDB Community Edition as a storage engine, and we will continue to publish benchmark reports.
- TerarkDB is RocksDB with its SST (Static Sorted Table) format replaced by our own implementation.
- Mongo-Rocks is Facebook's adapter that lets MongoDB use RocksDB as its storage engine.
- Mongo-Terocks is a modified Mongo-Rocks that uses TerarkDB.
- The MongoDB version is r3.5.4.
2. Test Method

- Test tool: YCSB
- Test dataset: YCSB's own data is generated from random strings, which is far from real-world scenarios, so we use the Amazon movie review data (~8 million reviews) instead.
- Dataset details:
  - About 9.1 GB in total
  - About 8 million records
  - About 1 KB per record
- Storage engines:
  - WiredTiger, MongoDB's default storage engine
  - Facebook's Mongo-Rocks
  - Terark's Mongo-Terocks
- Read tests are run under both the uniform distribution and the Zipf distribution.
- We measure the 95th and 99th percentile read latencies.
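The report does not show how the movie reviews were loaded into MongoDB. As a minimal sketch, assuming the data is the SNAP "Amazon movie reviews" dump (movies.txt), where each review is a block of `key: value` lines such as `product/productId: ...` and `review/text: ...` separated by blank lines, the records could be turned into one dictionary per MongoDB document like this. The function name `parse_reviews` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def parse_reviews(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per review from blank-line-separated 'key: value' lines.

    Assumes the SNAP movies.txt layout, e.g. 'review/score: 5.0'.
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current review.
            if record:
                yield record
                record = {}
            continue
        # Split on the first ':' only; review text may itself contain colons.
        # Lines without a colon are ignored.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    if record:
        # The last review may not be followed by a blank line.
        yield record


# Hypothetical sample in the assumed format.
sample = """product/productId: B003AI2VGA
review/score: 3.0
review/text: Good: not great

product/productId: B00006HAXW
review/score: 5.0
"""
for doc in parse_reviews(sample.splitlines()):
    print(doc["product/productId"], doc["review/score"])
```

Because `parse_reviews` is a generator, passing it an open file object yields the ~8 million reviews one at a time, so the 9.1 GB file never has to be held in memory.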
2.1. Hardware

- CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- Memory: Kingston 16G @ 2133 MHz x4 (64G total)
- Storage: SanDisk SD8SBAT256G1122 (256G SSD)
- OS: Ubuntu 16.04.2 LTS
3. Write Performance

- [Figure: Write speed and 95/99 percentile latency]
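A 95th (99th) percentile latency is the value that 95% (99%) of operations do not exceed. The sketch below uses the nearest-rank definition; YCSB collects latencies with its own measurement code (for example histograms), so the numbers it reports may be computed differently. The helper name `percentile` is hypothetical.

```python
import math
from typing import Sequence


def percentile(latencies: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with >= p% of samples at or below it."""
    if not latencies:
        raise ValueError("no latency samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies)
    # 1-based rank; multiply before dividing to avoid float error (95 * 100 / 100 == 95.0).
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies 1..100 (e.g. in microseconds), one per operation.
print(percentile(list(range(1, 101)), 95), percentile(list(range(1, 101)), 99))  # prints 95 99
```

The nearest-rank method always returns an observed latency instead of interpolating between two samples, which keeps tail numbers such as p99 tied to a real operation.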
4. Read Performance

Before read testing, we load all data into the database, restart the test server, and then start the test. Note that all read tests run under both the uniform and the Zipf distribution, except the test with 2 GB of memory.
- The memory limits are enforced by running the tests in a virtual machine.
- RocksDB has the `allow_mmap_reads` option enabled, with `BlockSize` set to 4 KB.
- WiredTiger and TerarkDB use their default configurations.
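The two read distributions differ in which records the requests hit: uniform picks every record with equal probability, while Zipf concentrates requests on a small set of hot records. The sketch below is our own Python port of the rejection-free Zipfian method from Gray et al. that YCSB's `ZipfianGenerator` is based on, using YCSB's default skew constant 0.99; the class and function names are hypothetical. In YCSB itself the choice is made with the `requestdistribution` workload property (`uniform` or `zipfian`), and the `zipfian` setting additionally scrambles ranks so that hot keys are spread across the key space.

```python
import random
from typing import Optional


class ZipfianGenerator:
    """Draws ranks in [0, n); rank k has probability proportional to 1/(k+1)**theta.

    Rank 0 is the most popular item. Ranks 0 and 1 are exact; larger ranks
    use the closed-form approximation from Gray et al.
    """

    def __init__(self, n: int, theta: float = 0.99,
                 rng: Optional[random.Random] = None) -> None:
        if n < 3:
            raise ValueError("need at least three items")
        if not 0.0 < theta < 1.0:
            raise ValueError("theta must be in (0, 1)")
        self.n = n
        self.theta = theta
        self.rng = rng if rng is not None else random.Random()
        self.zetan = self._zeta(n, theta)
        zeta2 = self._zeta(2, theta)
        self.alpha = 1.0 / (1.0 - theta)
        self.eta = (1.0 - (2.0 / n) ** (1.0 - theta)) / (1.0 - zeta2 / self.zetan)

    @staticmethod
    def _zeta(n: int, theta: float) -> float:
        # Generalized harmonic number: sum of 1 / i**theta for i = 1..n.
        return sum(1.0 / i ** theta for i in range(1, n + 1))

    def next(self) -> int:
        u = self.rng.random()
        uz = u * self.zetan
        if uz < 1.0:
            return 0  # probability 1 / zetan
        if uz < 1.0 + 0.5 ** self.theta:
            return 1  # probability 0.5**theta / zetan
        rank = int(self.n * (self.eta * u - self.eta + 1.0) ** self.alpha)
        return min(rank, self.n - 1)  # guard against float rounding as u nears 1


def uniform_key(n: int, rng: random.Random) -> int:
    """Uniform distribution: every one of the n records is equally likely."""
    return rng.randrange(n)
```

For the ~8 million records in this test, `_zeta` loops over every item once when the generator is created, which may take several seconds in pure Python; each `next()` call afterwards is constant time.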
4.1. Data Much Smaller than Memory (64 GB Memory)

- [Figure: Compressed data size and memory usage]
- Because the compressed database size (Storage Size) does not depend on the memory limit of the read tests, we do not repeat it in the rest of this article.
- All subsequent tests use the same dataset.
- The YCSB client uses at least 240% CPU throughout the test.
4.2. Data Smaller than Memory (8 GB Memory)

- In this scenario the data is slightly smaller than memory. We set the database cache (which holds decompressed data) to 4 GB.
- (WiredTiger and RocksDB both recommend using half of physical memory for this cache.)
- TerarkDB needs only 2.84 GB of memory, well below 8 GB, so its read performance is not affected.
- The 95/99 percentile read latencies shown are from the uniform distribution test.
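The cache settings in the 8 GB and 4 GB tests follow the half-of-physical-memory rule quoted above (8 GB to 4 GB, 4 GB to 2 GB), and TerarkDB's 2.84 GB requirement is why it is unaffected at 8 GB and 4 GB but not at 2 GB. A small sketch of that arithmetic, with hypothetical helper names; "fits" is simplified to requirement < physical memory and ignores the OS and the YCSB client.

```python
TERARKDB_REQUIRED_GB = 2.84  # memory TerarkDB needed in these tests, per this report


def cache_size_gb(physical_gb: float) -> float:
    """Rule of thumb quoted in this report: cache = half of physical memory."""
    return physical_gb / 2


def fits(required_gb: float, physical_gb: float) -> bool:
    """Simplified check: the requirement is below physical memory."""
    return required_gb < physical_gb


for mem_gb in (8, 4):
    print(f"{mem_gb} GB RAM: cache {cache_size_gb(mem_gb):g} GB, "
          f"TerarkDB fits: {fits(TERARKDB_REQUIRED_GB, mem_gb)}")
# prints:
# 8 GB RAM: cache 4 GB, TerarkDB fits: True
# 4 GB RAM: cache 2 GB, TerarkDB fits: True
print("2 GB RAM, TerarkDB fits:", fits(TERARKDB_REQUIRED_GB, 2))  # False
```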
4.3. Data Larger than Memory (4 GB Memory)

- In this scenario the data is slightly larger than memory. We set the database cache to 2 GB.
- TerarkDB needs only 2.84 GB of memory, less than 4 GB, so it is not affected.
- The 95/99 percentile read latencies shown are from the uniform distribution test.
4.4. Data Much Larger than Memory (2 GB Memory)

- None of the engines has enough memory, so all of them are affected.
- The bottleneck is file system I/O, and the performance of every engine drops sharply.
- The 95/99 percentile read latencies shown are from the uniform distribution test.