
Better latency page. Use of huge pages moved into an appendix as it is confusing for most non-advanced users. Fork time table added.
1 parent 66d5276 commit 18caa09945d079be04de8fec5b15d8b5b553c25e @antirez committed Apr 12, 2012
Showing with 91 additions and 67 deletions.
  1. +91 −67 topics/latency.md
@@ -112,9 +112,9 @@ a sign that slow commands are used.
Latency generated by fork
-------------------------
-Depending on the chosen persistency mechanism, Redis has to fork background
-processes. The fork operation (running in the main thread) can induce latency
-by itself.
+In order to generate the RDB file in the background, or to rewrite the Append Only File
+if AOF persistence is enabled, Redis has to fork background processes.
+The fork operation (running in the main thread) can induce latency by itself.
Forking is an expensive operation on most Unix-like systems, since it involves
copying a good number of objects linked to the process. This is especially
@@ -131,73 +131,22 @@ which will involve allocating and copying 48 MB of memory. It takes time
and CPU, especially on virtual machines where allocation and initialization
of a large memory chunk can be expensive.
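
As a quick sanity check of the 48 MB figure, assuming 4 kB pages and 8 bytes per
page table entry (the usual values on x86_64), the arithmetic for a 24 GB data set is:

    $ # pages = 24 GB / 4 kB; page table size = pages * 8 bytes; shown in MB
    $ echo $(( (24 * 1024 * 1024 / 4) * 8 / 1024 / 1024 ))
    48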
-Some CPUs can use different page size though. AMD and Intel CPUs can support
-2 MB page size if needed. These pages are nicknamed *huge pages*. Some
-operating systems can optimize page size in real time, transparently
-aggregating small pages into huge pages on the fly.
-
-On Linux, explicit huge pages management has been introduced in 2.6.16, and
-implicit transparent huge pages are available starting in 2.6.38. If you
-run recent Linux distributions (for example RH 6 or derivatives), transparent
-huge pages can be activated, and you can use a vanilla Redis version with them.
-
-This is the preferred way to experiment/use with huge pages on Linux.
-
-Now, if you run older distributions (RH 5, SLES 10-11, or derivatives), and
-not afraid of a few hacks, Redis requires to be patched in order to support
-huge pages.
-
-The first step would be to read [Mel Gorman's primer on huge pages](http://lwn.net/Articles/374424/)
-
-There are currently two ways to patch Redis to support huge pages.
-
-+ For Redis 2.4, the embedded jemalloc allocator must be patched.
-[patch](https://gist.github.com/1171054) by Pieter Noordhuis.
-Note this patch relies on the anonymous mmap huge page support,
-only available starting 2.6.32, so this method cannot be used for older
-distributions (RH 5, SLES 10, and derivatives).
-
-+ For Redis 2.2, or 2.4 with the libc allocator, Redis makefile
-must be altered to link Redis with
-[the libhugetlbfs library](http://libhugetlbfs.sourceforge.net/).
-It is a straightforward [change](https://gist.github.com/1240452)
-
-Then, the system must be configured to support huge pages.
-
-The following command allocates and makes N huge pages available:
+Fork time in different systems
+------------------------------
- $ sudo sysctl -w vm.nr_hugepages=<N>
-
-The following command mounts the huge page filesystem:
-
- $ sudo mount -t hugetlbfs none /mnt/hugetlbfs
-
-In all cases, once Redis is running with huge pages (transparent or
-not), the following benefits are expected:
-
-+ The latency due to the fork operations is dramatically reduced.
- This is mostly useful for very large instances, and especially
- on a VM.
-+ Redis is faster due to the fact the translation look-aside buffer
- (TLB) of the CPU is more efficient to cache page table entries
- (i.e. the hit ratio is better). Do not expect miracle, it is only
- a few percent gain at most.
-+ Redis memory cannot be swapped out anymore, which is interesting
- to avoid outstanding latencies due to virtual memory.
+Modern hardware is pretty fast at copying the page table, but Xen is not.
+The problem with Xen is not virtualization-specific, but Xen-specific. For instance,
+using VMware or VirtualBox does not result in slow fork times.
+The following table compares fork times for different Redis instance
+sizes. Data is obtained by performing a BGSAVE and looking at the `latest_fork_usec` field in the `INFO` command output (see the example after the table).
+
-Unfortunately, and on top of the extra operational complexity,
-there is also a significant drawback of running Redis with
-huge pages. The COW mechanism granularity is the page. With
-2 MB pages, the probability a page is modified during a background
-save operation is 512 times higher than with 4 KB pages. The actual
-memory required for a background save therefore increases a lot,
-especially if the write traffic is truly random, with poor locality.
-With huge pages, using twice the memory while saving is not anymore
-a theoretical incident. It really happens.
-
-The result of a complete benchmark can be found
-[here](https://gist.github.com/1272254).
+* **Linux beefy VM on VMware** 6.0GB RSS forked in 77 milliseconds (12.8 milliseconds per GB).
+* **Linux running on physical machine (Unknown HW)** 6.1GB RSS forked in 80 milliseconds (13.1 milliseconds per GB).
+* **Linux running on physical machine (Xeon @ 2.27GHz)** 6.9GB RSS forked in 62 milliseconds (9 milliseconds per GB).
+* **Linux VM on EC2 (Xen)** 6.1GB RSS forked in 1460 milliseconds (239.3 milliseconds per GB).
+* **Linux VM on Linode (Xen)** 0.9GB RSS forked in 382 milliseconds (424 milliseconds per GB).
+
+As you can see, a VM running on Xen has a performance hit that is between one and two orders of magnitude. We believe this is a severe problem with Xen and we hope it will be addressed ASAP.
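+
+If you want to take the same measurement against your own instance, a minimal
+way to do it is to trigger a BGSAVE and then read the `latest_fork_usec` field
+from `INFO` (the `grep` is just for convenience):
+
+ $ redis-cli BGSAVE
+ Background saving started
+ $ redis-cli INFO | grep latest_fork_usec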
Latency induced by swapping (operating system paging)
-----------------------------------------------------
@@ -528,3 +477,78 @@ The following is an example of what you'll see printed in the log file once the
Note: in the example the **DEBUG SLEEP** command was used in order to block the server. The stack trace is different if the server blocks in a different context.
If you happen to collect multiple watchdog stack traces you are encouraged to send everything to the Redis Google Group: the more traces we obtain, the simpler it will be to understand what the problem with your instance is.
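
For reference, a simple way to reproduce a trace like this against a test
instance is to enable the software watchdog and then block the server on
purpose (the 500 milliseconds period used here is just an example value):

    $ redis-cli CONFIG SET watchdog-period 500
    OK
    $ redis-cli DEBUG SLEEP 1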
+
+APPENDIX A: Experimenting with huge pages
+-----------------------------------------
+
+Latency introduced by fork can be mitigated using huge pages, at the cost of bigger memory usage during persistence. The following appendix describes in detail this feature as implemented in the Linux kernel.
+
+Some CPUs can use different page sizes though. AMD and Intel CPUs can support
+2 MB page size if needed. These pages are nicknamed *huge pages*. Some
+operating systems can optimize page size in real time, transparently
+aggregating small pages into huge pages on the fly.
+
+On Linux, explicit huge pages management has been introduced in 2.6.16, and
+implicit transparent huge pages are available starting in 2.6.38. If you
+run recent Linux distributions (for example RH 6 or derivatives), transparent
+huge pages can be activated, and you can use a vanilla Redis version with them.
+
+This is the preferred way to experiment with (and use) huge pages on Linux.
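+
+A minimal check, assuming a kernel built with transparent huge pages support
+(note that the exact sysfs path may differ in some distributions, for instance
+RH 6 uses `redhat_transparent_hugepage`):
+
+ $ cat /sys/kernel/mm/transparent_hugepage/enabled
+ always madvise [never]
+ $ echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
+
+The value between square brackets is the currently active one.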
+
+Now, if you run older distributions (RH 5, SLES 10-11, or derivatives) and are
+not afraid of a few hacks, Redis needs to be patched in order to support
+huge pages.
+
+The first step would be to read [Mel Gorman's primer on huge pages](http://lwn.net/Articles/374424/).
+
+There are currently two ways to patch Redis to support huge pages.
+
++ For Redis 2.4, the embedded jemalloc allocator must be patched.
+A [patch](https://gist.github.com/1171054) by Pieter Noordhuis is available.
+Note this patch relies on the anonymous mmap huge page support,
+only available starting with 2.6.32, so this method cannot be used for older
+distributions (RH 5, SLES 10, and derivatives).
+
++ For Redis 2.2, or 2.4 with the libc allocator, the Redis makefile
+must be altered to link Redis with
+[the libhugetlbfs library](http://libhugetlbfs.sourceforge.net/).
+It is a straightforward [change](https://gist.github.com/1240452).
+
+Then, the system must be configured to support huge pages.
+
+The following command allocates and makes N huge pages available:
+
+ $ sudo sysctl -w vm.nr_hugepages=<N>
+
+The following command mounts the huge page filesystem:
+
+ $ sudo mount -t hugetlbfs none /mnt/hugetlbfs
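+
+You can verify that the pages were actually reserved by looking at `/proc/meminfo`
+(`HugePages_Total` should match the value of N used above):
+
+ $ grep Huge /proc/meminfo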
+
+In all cases, once Redis is running with huge pages (transparent or
+not), the following benefits are expected:
+
++ The latency due to the fork operations is dramatically reduced.
+ This is mostly useful for very large instances, and especially
+ on a VM.
++ Redis is faster because the translation look-aside buffer
+ (TLB) of the CPU is more efficient at caching page table entries
+ (i.e. the hit ratio is better). Do not expect miracles, it is only
+ a few percent gain at most.
++ Redis memory cannot be swapped out anymore, which helps avoid
+ latencies otherwise caused by virtual memory (swapping).
+
+Unfortunately, and on top of the extra operational complexity,
+there is also a significant drawback of running Redis with
+huge pages. The COW mechanism granularity is the page. With
+2 MB pages, the probability a page is modified during a background
+save operation is 512 times higher than with 4 KB pages. The actual
+memory required for a background save therefore increases a lot,
+especially if the write traffic is truly random, with poor locality.
+With huge pages, using twice the memory while saving is no longer
+a theoretical incident: it really happens.
+
+The result of a complete benchmark can be found
+[here](https://gist.github.com/1272254).
+
+
+
