
MongoShake Performance Document


This article is a partial performance test document for MongoShake; it presents test data for several different cases.

Environment

The source MongoDB, the destination MongoDB, and MongoShake are all located on the same host. The configuration is as follows:

  • CPU: 64 cores, Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
  • Memory: 32*16 GB
  • Network card: 20 Gbps
  • OS: Linux x86_64
  • MongoDB version: 3.4.15
  • Hard disk: SSD
  • Go version: 1.10.3

[Figure: topology of the experiment deployment]

The figure above shows the topology of our experiment deployment, where the tunnel type is "direct" so that MongoShake writes data into the target MongoDB directly.
The source MongoDB is either a replica set or a sharded cluster, while the target MongoDB is a replica set. Each replica set in our experiment contains only one mongod instance. For convenience, the oplog contains only insert operations.
In our pre-test, the write QPS limit of the target MongoDB with 8 threads is about 70,000 when documents are inserted one at a time and about 540,000 when multiple insertions are batched together. With 64 threads, the QPS is about 80,000 for single-document inserts and about 610,000 for batched inserts.
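The gap between single-document and batched insertion can be reproduced with a small standalone benchmark. The sketch below is only an illustration of such a pre-test, not the tool we actually used; it uses the official go.mongodb.org/mongo-driver package, and the connection URI, database name, document shape, and batch size are all assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	// The target MongoDB address is an assumption; replace it with the real one.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://127.0.0.1:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)
	coll := client.Database("bench").Collection("insert_test")

	const total = 100000
	doc := func(i int) bson.M { return bson.M{"k": i, "pad": "xxxxxxxxxxxxxxxxxxxx"} }

	// Single-document inserts: one round trip per document.
	start := time.Now()
	for i := 0; i < total; i++ {
		if _, err := coll.InsertOne(ctx, doc(i)); err != nil {
			panic(err)
		}
	}
	fmt.Printf("single-insert QPS: %.0f\n", total/time.Since(start).Seconds())

	// Batched inserts: many documents merged into one InsertMany call.
	const batch = 128
	start = time.Now()
	for i := 0; i < total; i += batch {
		docs := make([]interface{}, 0, batch)
		for j := 0; j < batch && i+j < total; j++ {
			docs = append(docs, doc(i+j))
		}
		if _, err := coll.InsertMany(ctx, docs); err != nil {
			panic(err)
		}
	}
	fmt.Printf("batched-insert QPS: %.0f\n", total/time.Since(start).Seconds())
}
```

The sketch is single-threaded for brevity; the pre-test above ran 8 or 64 such threads concurrently.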

Test Condition

The test data covers the following dimensions: latency, QPS, CPU usage, and memory usage. All values are 60-second averages. Latency is the time difference between a document being inserted into the source database and the same document being copied into the destination database: the source timestamp is taken from the "ts" key of the source oplog, while the target timestamp is taken from the "ts" key of the oplog generated on the target side. It is not easy to compare oplog timestamps from two hosts whose clocks differ, so we choose a topology in which the source MongoDB and the destination MongoDB are located on the same host. QPS is obtained from MongoShake's RESTful API, which counts the number of oplogs written every second. Workload describes the data distribution. We also report CPU and memory usage.
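Since the high 32 bits of a BSON timestamp are seconds since the Unix epoch, one way to estimate this latency is to insert a marker document on the source side and compare the "ts" field of its insert oplog entry on both sides. The sketch below illustrates that idea with the official Go driver; it is not part of MongoShake, and the connection URIs and the marker-document approach are assumptions.

```go
package main

import (
	"context"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// oplogTs returns the "ts" of the insert oplog entry whose document _id matches id.
func oplogTs(ctx context.Context, uri string, id interface{}) (primitive.Timestamp, error) {
	client, err := mongo.Connect(ctx, options.Client().ApplyURI(uri))
	if err != nil {
		return primitive.Timestamp{}, err
	}
	defer client.Disconnect(ctx)

	var entry struct {
		Ts primitive.Timestamp `bson:"ts"`
	}
	err = client.Database("local").Collection("oplog.rs").
		FindOne(ctx, bson.M{"op": "i", "o._id": id}).Decode(&entry)
	return entry.Ts, err
}

func main() {
	ctx := context.Background()
	id := primitive.NewObjectID()
	// ... insert {_id: id} into the source database here and wait until it
	// appears in the target database (omitted for brevity) ...

	// The URIs are placeholders; errors are ignored for brevity.
	src, _ := oplogTs(ctx, "mongodb://source:27017", id)
	dst, _ := oplogTs(ctx, "mongodb://target:27017", id)
	// Field T of a BSON timestamp holds the seconds since the epoch.
	fmt.Printf("latency: about %d seconds\n", int64(dst.T)-int64(src.T))
}
```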
The test variables include the source MongoDB type, the tunnel type, the workload of the data in the source database, the worker concurrency, and the executor concurrency.

Test

Case 1

In this case, the tunnel type is "mock", which throws away all the data, so we can measure the QPS of reading and handling in the pipeline. There is 1 db with 5 collections in the source database; the data is distributed fairly evenly, with about 24 million documents per collection. For convenience, all oplog operations are inserts. Each document includes 5 columns, and the total size of each oplog document is about 220 bytes.
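The exact data generator is not given here; the sketch below shows one assumed way to produce a comparable workload (5 collections, 5 string fields per document, on the order of 220 bytes per document) with the official Go driver. The database name, field sizes, and batch size are illustrative choices.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// randString returns a random lower-case string of length n.
func randString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = byte('a' + rand.Intn(26))
	}
	return string(b)
}

func main() {
	ctx := context.Background()
	// The source MongoDB address is a placeholder.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://source:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)

	db := client.Database("bench")
	// 5 collections; each document has 5 string fields of 30 bytes, so the BSON
	// document (including _id and field names) is on the order of 220 bytes.
	for c := 0; c < 5; c++ {
		coll := db.Collection(fmt.Sprintf("tbl_%d", c))
		batch := make([]interface{}, 0, 1000)
		for i := 0; i < 1000; i++ {
			batch = append(batch, bson.M{
				"f1": randString(30), "f2": randString(30), "f3": randString(30),
				"f4": randString(30), "f5": randString(30),
			})
		}
		if _, err := coll.InsertMany(ctx, batch); err != nil {
			panic(err)
		}
	}
}
```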

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | A total of 5 collections in one db; each document includes 5 columns and the total size of each oplog document is about 220 bytes. |
| Tunnel type | mock |
| Worker concurrency | 8 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 424,251 |
| CPU usage | 1175% |
| Memory usage | About 0%, 130 MB |

The QPS is about 420 thousand while the CPU usage is 1175%. The memory usage is close to 0% because the data is thrown away in the tunnel, so the queue usage is very low.

Case 2

In this case, we change the workload. There is still 1 db with 5 collections, but each document has only 1 column and the size of each document is about 180 bytes.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | A total of 5 collections in one db; each document includes 1 column and the total size of each oplog document is about 180 bytes. |
| Tunnel type | mock |
| Worker concurrency | 8 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 545,117 |
| CPU usage | 694% |
| Memory usage | About 0%, 43 MB |

Compared to case 1, the QPS in case 2 is higher and the CPU and memory usage are lower. This is because each oplog is smaller, so the cost of deserialization is lower.

Case 3

Compared to case 1, we change the tunnel type to "direct", which writes data into the target MongoDB directly. It should be emphasized that, before writing, MongoShake merges consecutive oplogs that have the same namespace and the same operation into one bulk write (see the sketch below). In this case, however, the operations are spread evenly across the collections, which is not conducive to merging before writing.
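The sketch below illustrates this merge conceptually; it is not MongoShake's actual code, and the Oplog type and its fields are assumptions. Consecutive oplogs that share a namespace and an operation are grouped into one batch that can then be written with a single bulk command.

```go
package main

import "fmt"

// Oplog is a simplified stand-in for a parsed oplog entry (the real entry also
// carries fields such as "ts" and the raw BSON document).
type Oplog struct {
	Namespace string      // e.g. "db.collection"
	Operation string      // "i", "u" or "d"
	Object    interface{} // the inserted document, update spec, etc.
}

// mergeBatches groups consecutive oplogs with the same namespace and operation
// into one batch, so each batch can be replayed as a single bulk write.
func mergeBatches(oplogs []Oplog) [][]Oplog {
	var batches [][]Oplog
	for _, log := range oplogs {
		n := len(batches)
		if n > 0 &&
			batches[n-1][0].Namespace == log.Namespace &&
			batches[n-1][0].Operation == log.Operation {
			batches[n-1] = append(batches[n-1], log)
			continue
		}
		batches = append(batches, []Oplog{log})
	}
	return batches
}

func main() {
	oplogs := []Oplog{
		{"db.a", "i", 1}, {"db.a", "i", 2}, // merged into one insert batch
		{"db.b", "i", 3},                   // different namespace: a new batch
		{"db.a", "i", 4},                   // not adjacent to the first run: a new batch
	}
	for _, b := range mergeBatches(oplogs) {
		fmt.Println(len(b), b[0].Namespace, b[0].Operation)
	}
}
```

When oplogs from 5 collections are interleaved evenly, as in this case, adjacent entries rarely share a namespace, so most batches contain only a single oplog and little merging happens.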

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | A total of 5 collections in one db; each document includes 5 columns and the total size of each oplog document is about 220 bytes. This data is not conducive to merging before writing. |
| Tunnel type | direct |
| Worker concurrency | 8 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 55,705 |
| CPU usage | 497% |
| Memory usage | About 0.4%, 1.9 GB |

In this case, we can see that the QPS is about one-eighth of that in case 1. The bottleneck is the write speed of the target database.

Case 4

Compared to case 3, we increase the worker concurrency to 64 so that there are 64 worker threads in the pipeline.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | A total of 5 collections in one db; each document includes 5 columns and the total size of each oplog document is about 220 bytes. This data is not conducive to merging before writing. |
| Tunnel type | direct |
| Worker concurrency | 64 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 186,755 |
| CPU usage | 2256% |
| Memory usage | About 0.4%, 1.9 GB |

In this case, we increase the worker concurrency, which also increases the number of replayers doing the writing. As a result, both the QPS and the CPU usage increase.

Case 5

Compared to case 2, we adjust the workload to only 1 db and 1 collection with only 1 field, and the tunnel type is "direct". As expected, these oplogs are conducive to merging before writing, so the QPS should be higher than in case 3, where the data could not be merged.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | 1 db, 1 collection with only 1 field |
| Tunnel type | direct |
| Worker concurrency | 8 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 455,567 |
| CPU usage | 1049% |
| Memory usage | About 0.28%, 1.5 GB |

The QPS is about 456 thousand, which is much higher than in case 3, because the data is more likely to be merged and then inserted into MongoDB in batches.

Case 6

Compared to case 5, we increase the worker concurrency just as case 4 does.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | 1 db, 1 collection with only 1 field |
| Tunnel type | direct |
| Worker concurrency | 64 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 456,990 |
| CPU usage | 1811% |
| Memory usage | About 0.18%, 0.74 GB |

The QPS is almost equal to that of case 5. The result is limited by the write speed and the write conflicts on the target MongoDB.

Case 7

In this case, we increase the collection number from 5 to 128 compared to case 3 to reduce the conflicts when inserting into the target MongoDB. Each document has 5 fields.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | 1 db, 128 collections with 5 fields each |
| Tunnel type | direct |
| Worker concurrency | 8 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 63,206 |
| CPU usage | 508% |
| Memory usage | About 0.4%, 1.8 GB |

The QPS is higher than in case 3 because we ease the write conflicts: in MongoDB, there is an optimistic lock on a collection when it is written to concurrently.

Case 8

Compared to case 7, we also increase the worker concurrency from 8 to 64. The result should be slightly higher than that of case 4.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by id |
| Workload | 1 db, 128 collections with 5 fields each |
| Tunnel type | direct |
| Worker concurrency | 64 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 189,140 |
| CPU usage | 1348% |
| Memory usage | About 0.3%, 1.3 GB |

The value is slightly higher than that of case 4, which meets our expectation.

Case 9

We adjust the shard_key from "auto" to "collection" and set the worker concurrency to 128.

| Variable | Value |
| --- | --- |
| Source MongoDB type | replica set |
| Unique index / Shard key | No; hash by collection |
| Workload | 1 db, 128 collections with 5 fields each |
| Tunnel type | direct |
| Worker concurrency | 128 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 279,633 |
| CPU usage | 2096% |
| Memory usage | About 0.3%, 1.6 GB |

Setting the worker concurrency equal to the collection number achieves good performance when the collection number is large and the data is distributed evenly.
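Conceptually, the shard_key decides what the hash that assigns an oplog to a worker is computed on. The sketch below is an assumed illustration of the difference between hashing by document id and hashing by collection name; it is not MongoShake's implementation, and the function and parameter names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickWorker maps an oplog to a worker index. Hashing by the document id spreads
// load evenly across workers, while hashing by the collection name sends all
// oplogs of one collection to the same worker, which keeps per-collection order
// and makes them easier to merge into bulk writes.
func pickWorker(shardKey, ns, docID string, workers int) int {
	h := fnv.New32a()
	switch shardKey {
	case "collection":
		h.Write([]byte(ns))
	default: // "id"
		h.Write([]byte(docID))
	}
	return int(h.Sum32()) % workers
}

func main() {
	const workers = 128
	fmt.Println(pickWorker("id", "db.tbl_1", "5b3c0a...", workers))
	fmt.Println(pickWorker("collection", "db.tbl_1", "5b3c0a...", workers))
}
```

With 128 collections and 128 workers, hashing by collection spreads the collections across the workers while keeping each collection on a single worker, which is why this setting performs well when the data is distributed evenly.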

Case 10

In this case, we change the source MongoDB from a replica set to a sharded cluster with 8 shards. There are still a total of 5 collections in one db with 5 fields in each document, the same as case 3. This data is also not conducive to merging before writing.

| Variable | Value |
| --- | --- |
| Source MongoDB type | sharding |
| Unique index / Shard key | No; hash by id |
| Workload | A total of 5 collections in one db; each document includes 5 columns and the total size of each oplog document is about 220 bytes. This data is not conducive to merging before writing. |
| Tunnel type | direct |
| Worker concurrency | 8 |

Here comes the result of measurement:

| Measurement | Value |
| --- | --- |
| QPS | 61,184 |
| CPU usage | 518% |
| Memory usage | About 2.7%, 13.8 GB |

Compared to case 3, the QPS is very close, which means we get similar results regardless of whether the source database is a replica set or a sharded cluster.

Case 11

At first, we wanted to test the performance when unique keys exist. However, the QPS depends on how many unique keys there are, how evenly the collections are distributed, and how many conflicts happen between these unique keys. Moreover, in the current open-source v1.0.0, the "uk" field is not supported, so we do not give this result.
In general, the QPS will be slightly lower than the values above because the hash method switches from "id" to "collection", and in most cases the data is distributed unevenly.

Case 12

So far, we didn’t give the latency value in the above experiments because these experiments are all fetching data at the beginning of oplogs generated at the source database. So that if we want to calculate the latency of a new generated data, we have to wait for all the previous data to be transmitted first. Under these circumstances, the calculated value is inaccurate.
So, we came up an idea to inserting data after all previous data finish transmitting and then calculate the difference between timestamp in the oplog of source database and target database as the latency. However, the time clock in the source database host and target database host maybe different, so we only calculate the latency that source database and target database located in the same host to eliminate this difference.
Latency: less than 1 second.
