Redis at Disqus
We've been running Redis in production for a bit over three months now on projects ranging from critical (paying premium users) to experimental (in-house). Just as Will at Bump was inspired by someone else, I was inspired by Will to do a write up on what we're up to.
Our entire premium analytics service runs on Redis. These stats are, as you'd expect, time-based. In order to squeeze this problem into a key/value store, we just append date-string suffixes to the keys: "comments_by_day:disqus:disqus:2011-01-01". If we split on ":" we have 4 pieces: the stat name, the account, the site and the date. (I'll go into the purpose of the account piece shortly.) Having the stats split by day means it's easy to calculate totals (or averages, unions, etc.) over arbitrary user-defined time periods. For example, a user could select the time range "Jan 1, 2011 - Feb 4, 2011", and we can give them accurate stats simply by summing the results of a mget of each day in the range.
Now, that's all nice and easy, but it's not much of a selling point for Redis over any other key/value store (or even a good RDBMS). The real advantage Redis offers is powerful support for data structures and the operations you'd expect to use on them. The big selling point for us was the availability of sets (sorted and not). We need to track sets of unique users that do certain activities in certain time periods. An example of this would be the set of (unique) users who commented in a single day. Can you store sets pretty easily in other key/value stores? Sure you can. The difference for us is that our sets can easily contain tens of thousands of elements each. The benefit of a store like Redis is that sets are "first class", and you can ask the server to do operations like unions for you. Imagine if a user were to request stats for an arbitrary 17-day period: We would just do a zunionstore of all 17-day sets and get the cardinality of that new set, which is the total number of unique users who posted in that time period. If Redis didn't support sets or set operations, we'd have to fetch all 17-day sets ourselves then calculate it locally. The network I/O alone would kill performance.
So what was the "account" bit in the key all about? Sharding is one of the pains of planning ahead when using Redis. Since we use (and depend on) things like set operations, we can't just arbitrarily split our data across different nodes. What if Jan 1 was on Node1 and Jan 2 on Node2? Doing a set union when the data isn't in memory takes us right back to doing network I/O of entire sets, killing performance. The first obvious step is to shard on sites (so my blog and Disqus' blog might be on different Redis nodes). No harm there, an easy win. But what if we're talking about Big Newspaper Inc., who have one Disqus account but 20 distinct Disqus sites? We could easily provide them stats per site with the current sharding scheme, but that's no fun. We wanted to offer "network-wide" stats. That is, the ability to see things like total comments, upvotes, replies, and even unique users across all of their sites at once, and for any arbitrary time period. That means all of a user's sites need to be hosted on the same Redis node. This is actually pretty trivial once you decide to do it (we just take the modulo of the owning user's ID against the number of nodes we have to decide which node to read/write from/to), and the bit in the key isn't strictly necessary, but we like to have it in there just incase we need to reason about them later.
In short, we attempted to migrate our sessions from PostgreSQL to Redis but failed. Redis started as an in-memory database but added a virtual memory backend in (if I recall correctly) the 2.0 branch. While the VM backend helped, we found that it still wouldn't stay within the bounds we set, and would continually grow no matter what we set. We did report the issue but never came to a good solution in time. For example, we could give Redis an entire 12GB server and set the VM to 4GB, and given enough time (under high load, mind you) it would climb well above 12GB and start to swap, more or less killing our site. Since we didn't have a need for the powerful Redis datastructures here, we eventually decided to go with Membase because its main purpose is a simple disk-backed key/value store.
We look forward for Redis 2.4's "diskstore" which aims to solve the issues the VM backend has had.
We have many other counters and sets for all sites and users in Disqus network, outside of the complex and rich premium analytics data. There's nothing too fancy here, but again Redis has been our choice (often migrating away from PostgreSQL) when we need a high write capacity. As most (all?) large sites eventually find, our master database can be a bottleneck, and using SQL to increment rows that are already essentially key/value stores can be a waste. When (and only when!) it's more natural, we move denormalized counters and other data to Redis.
Lately we've been pushing the limits further with some in-house (maybe soon to be user-facing) projects that do all of the above at much larger and faster rates. I've been using Redis as storage for output from Hadoop jobs and we've been pushing data to other nodes to collect statistics on reads. If you imagine the full network size of Disqus being rather large in writes, you'd have to see a few services crumble when you try to do a write-per-request. ;)
All in all, I'd say we're big fans. Aside from the VM issues (which should only affect a very small percentage of users, and should be resolved in 2.4) it's been a pleasure to administrate even for our very small team. I wouldn't call us technically conservative by any means, but we have a decent amount of PostgreSQL expertise in house and we don't throw in new moving parts for no reason. When it's the right tool for the job, it's great. Huge props to Salvatore Sanfilippo, Pieter Noordhuis and others.