niho edited this page Dec 29, 2011 · 10 revisions

Welcome to the Related Wiki

Philosophy

The design of Related is highly opinionated, and Related is not intended to be a hammer for all nails. The goal is specifically to be an easy-to-use yet powerful tool for creating social applications. Quality over quantity! If you're doing DNA sequencing or trying to figure out who is a terrorist, you should probably look into another tool. If you're building the next Facebook or a cool new semantic web product, you've probably come to the right place.

Design goals

  • Simple and lightweight. If you already have Redis installed the only step to get started with Related is "gem install related". If you are familiar with Active Record then Related will probably feel very natural to you.
  • Extremely fast JSON output. The main use case for Related is to build web APIs and to output large amounts of JSON data very efficiently. Timestamps, for example, are stored as ISO 8601 strings, which is the same format used when serializing timestamps in JSON, so no costly post-processing step on the data is needed. Related can very easily return several thousand nodes in JSON format in less than 100 ms and is easy to integrate with a cache layer like memcached. For most use cases, bandwidth to the API consumer will be the limiting factor rather than the performance of Related.
  • Distributed architecture and easy to shard. All graph walking is currently done on the client side, and IDs are Base62-encoded MD5 strings rather than auto-incrementing integers, which makes it fairly simple to use Related in a distributed setup. Since Redis is so damn fast, doing the graph walking on the client is not really a performance problem, and if you have a sharded setup where nodes are stored in different Redis instances, Related can potentially visit many nodes in parallel. The only limiting factor will once again be bandwidth.
  • Powerful and easy to use querying. Querying the graph should not require that you learn a new programming language or even understand much about graph algorithms at all. Related provides ready-made features like the Related::Follower module and query methods like shortest_path_to.
  • Parallelism. Related will use concurrency mechanisms like EventMachine and Fibers to perform operations in parallel whenever possible. Many graph algorithms can easily be made to execute in parallel, and with a graph distributed over a large cluster of servers, running such algorithms with Related can be very fast. Related handles parallel execution as transparently as possible and will require hardly any special code.
  • Real-time stream processing. It should be easy to process the data that gets put into Related in real time, in a way that is efficient and easy to use and understand. Most non-trivial applications will need to aggregate the graph data in one way or another, and the closer to "real-time" that aggregation can happen the better. Related should provide the primitives needed to do that. Implementing a real-time version of the Google PageRank algorithm or the Twitter Trending Topics algorithm on a large-scale graph should be trivial using Related (or at least, that's the goal).
  • Non-ACID. Related makes the same guarantees about data integrity and safety as the underlying Redis database. Related does not support (distributed) transactions, for example, but will never corrupt your graph in a way that makes it unreadable, even if an update is only partially successful. In social graphs especially, each individual update to the graph is usually expendable and of low overall importance, but the graph as a whole can often grow very large and require consistently low-latency access, so making ACID a lower priority makes sense. Related therefore prioritizes multi-server scalability and low latency over atomicity, consistency, isolation and durability (if you need a great graph database with ACID guarantees and excellent single-server performance, you should try Neo4j instead).
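
Two of the storage choices named in the list above, ISO 8601 timestamp strings and Base62-encoded MD5 IDs, can be sketched in plain Ruby using only the standard library. This is a minimal illustration, not Related's actual code: the helper names (base62_encode, generate_id, timestamp_for_storage), the alphabet ordering and the UUID seed are all assumptions made for the example.

```ruby
require 'digest/md5'
require 'json'
require 'securerandom'
require 'time'

# Assumed alphabet ordering; the ordering Related uses is not stated in this page.
BASE62_ALPHABET = [*'0'..'9', *'a'..'z', *'A'..'Z'].freeze

# Hypothetical helper: encode a non-negative integer in Base62.
def base62_encode(number)
  return BASE62_ALPHABET[0] if number.zero?

  digits = []
  while number > 0
    number, remainder = number.divmod(62)
    digits << BASE62_ALPHABET[remainder]
  end
  digits.reverse.join
end

# Hypothetical helper: derive an ID from the MD5 digest of a seed.
# Seeding with a random UUID is an assumption. No central counter is
# involved, so any client or shard can generate IDs independently.
def generate_id(seed = SecureRandom.uuid)
  base62_encode(Digest::MD5.hexdigest(seed).to_i(16))
end

# Hypothetical helper: format a timestamp once, at write time.
def timestamp_for_storage(time = Time.now)
  time.utc.iso8601
end

# The stored string is already in its JSON form, so serializing a node's
# attributes needs no per-timestamp conversion on the read path.
stored = {
  'id' => generate_id,
  'created_at' => timestamp_for_storage
}
puts JSON.generate(stored)
```

A 128-bit MD5 digest fits in at most 22 Base62 characters, which keeps the IDs compact enough to use directly as keys and in URLs.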

Graphs and graph processing have been the defining technical feature behind many of the most successful internet companies of the last 10 years (Google, Facebook, Twitter, etc.). In short, the vision behind Related is to bring large-scale graphs and graph processing to the masses and allow anyone to play and experiment with graph technology.
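
To give a feel for the kind of client-side graph walking mentioned in the design goals, here is a minimal breadth-first shortest-path sketch in plain Ruby. It is not how Related implements shortest_path_to, and it does not use Related's API: the GRAPH hash is a hypothetical in-memory stand-in for looking up a node's outgoing relationships by ID, and the shortest_path function is invented for this example.

```ruby
require 'set'

# Hypothetical adjacency data: node ID => IDs of nodes it points to.
# In a real setup, each lookup would be a round trip to a Redis instance.
GRAPH = {
  'alice' => %w[bob carol],
  'bob'   => %w[dave],
  'carol' => %w[dave],
  'dave'  => %w[erin],
  'erin'  => []
}.freeze

# Breadth-first search: returns the list of node IDs on a shortest path
# from source to target, or nil when the target is unreachable.
def shortest_path(graph, source, target)
  return [source] if source == target

  visited = Set.new([source])
  queue = [[source]] # each queue entry is the path walked so far

  until queue.empty?
    path = queue.shift
    graph.fetch(path.last, []).each do |neighbor|
      next if visited.include?(neighbor)
      return path + [neighbor] if neighbor == target

      visited << neighbor
      queue << path + [neighbor]
    end
  end

  nil
end

p shortest_path(GRAPH, 'alice', 'erin')
p shortest_path(GRAPH, 'erin', 'alice')
```

Each level of the search only needs the neighbors of the nodes found in the previous level, which is why lookups against nodes stored on different shards could in principle be issued in parallel.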

Indexing

Related does not implement any kind of indexing of the data you store in a node or relationship. This means the only way to access that data is to know the ID of a node or relationship and then query the graph to get to the data. There are no plans to implement any index functionality in Related.

The reason is that most useful indexing is either too application-specific or too heavy, complex or inefficient to be added to Related. Some applications will need full text search, some will need geospatial indices, some will not need any indices at all. Some will need to index the data in real time, some would rather index the data in offline background jobs. Supporting all those use cases in an optimal way is not realistic. Related is a graph database, not a full text search solution. Some people might not like it and I know it might be controversial, but my opinion is that indexing does not belong in the database. Another strong reason is that database-side indexing does not play very well with caching. If you want to cache an object in memcached, for example, you will need its ID each time you want to look that object up in memcached. If the only way to get that ID is to query an index, and that index is part of your database, then every read still hits the database and some of the benefit of having a caching layer in front of it is lost. So the recommendation is: "Use the right tool for the right job and everything will work much better!".
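
The caching argument above can be sketched as a read-through cache keyed by ID. This is an illustration under stated assumptions, not part of Related: ReadThroughCache is a hypothetical class, a plain Hash stands in for memcached, and DATABASE stands in for the node store.

```ruby
# Hypothetical read-through cache; a Hash stands in for memcached.
class ReadThroughCache
  attr_reader :backend_reads

  # The block is called with an ID whenever that ID is not cached yet.
  def initialize(&loader)
    @store = {}
    @loader = loader
    @backend_reads = 0
  end

  # Returns the cached object for id, loading it from the backend on a miss.
  def fetch(id)
    return @store[id] if @store.key?(id)

    @backend_reads += 1
    @store[id] = @loader.call(id)
  end
end

# Hypothetical stand-in for the database, keyed by node ID.
DATABASE = { 'n1' => { 'name' => 'Alice' } }.freeze

cache = ReadThroughCache.new { |id| DATABASE[id] }
cache.fetch('n1') # miss: reads from the backend
cache.fetch('n1') # hit: served from the cache
puts cache.backend_reads
```

When the caller already holds the ID, repeated reads never reach the backend. If the ID first had to be resolved through an index stored in the same database, every request would still cost a database query before the cache could be consulted.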

If you need a simple and fast distributed index solution you should try Related's sister project Resolver. It uses Redis as a storage backend just like Related and integrates easily with any ActiveModel-like object.
