Readings in distributed systems
2013-07-12 12:00:35 -0400
This post is a work in progress.
Inspired by a recent purchase of the Red Book, which provides a curated list of important papers on database systems, I've decided to begin assembling a list of important papers in distributed systems. Similar to the Red Book, I've grouped the papers into categories, each tracing a progression of related ideas over time within a specific area of research.
In keeping with the tradition of the Red Book, I've included both papers that resulted in very successful systems or techniques and papers that introduced a concept which was either immediately dismissed or proven incorrect. This emphasizes the progression of ideas that led to the development of these systems.
The problems of establishing consensus in a distributed system.
- In Search of an Understandable Consensus Algorithm Diego Ongaro, John Ousterhout 2013
- A Simple Totally Ordered Broadcast Protocol Benjamin Reed, Flavio P. Junqueira 2008
- Paxos Made Live - An Engineering Perspective Tushar Deepak Chandra, Robert Griesemer, Joshua Redstone 2007
- The Chubby Lock Service for Loosely-Coupled Distributed Systems Mike Burrows 2006
- Paxos Made Simple Leslie Lamport 2001
- Impossibility of Distributed Consensus with One Faulty Process Michael Fischer, Nancy Lynch, Michael Paterson 1985
- The Byzantine Generals Problem Leslie Lamport, Robert Shostak, Marshall Pease 1982
Types of consistency, and practical solutions for ensuring atomic operations across a set of replicas.
- Highly Available Transactions: Virtues and Limitations Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica 2013
- Consistency Tradeoffs in Modern Distributed Database System Design Daniel J. Abadi 2012
- CAP Twelve Years Later: How the "Rules" Have Changed Eric Brewer 2012
- Calvin: Fast Distributed Transactions for Partitioned Database Systems Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi 2012
- Optimistic Replication Yasushi Saito and Marc Shapiro 2005
- Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Seth Gilbert, Nancy Lynch 2002
- Harvest, Yield, and Scalable Tolerant Systems Armando Fox, Eric A. Brewer 1999
- Linearizability: A Correctness Condition for Concurrent Objects Maurice P. Herlihy, Jeannette M. Wing 1990
- Time, Clocks, and the Ordering of Events in a Distributed System Leslie Lamport 1978
Conflict-free data structures
Studies on data structures which do not require coordination to ensure convergence to the correct value.
- A Comprehensive Study of Convergent and Commutative Replicated Data Types Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski 2011
- A Commutative Replicated Data Type For Cooperative Editing Nuno Preguiça, Joan Manuel Marques, Marc Shapiro, Mihai Letia 2009
- CRDTs: Consistency Without Concurrency Control Mihai Letia, Nuno Preguiça, Marc Shapiro 2009
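As a rough illustration of what "no coordination" means here, below is a minimal sketch of a grow-only counter (G-Counter), one of the state-based types surveyed in the Shapiro et al. study. The class and method names are my own for illustration and are not taken from any of the papers. It assumes each replica has a unique ID and that replicas exchange full state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one non-negative slot per replica (hypothetical sketch)."""
    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all per-replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(2)   # update applied locally at replica a
b.increment(3)   # concurrent update at replica b
a.merge(b)
b.merge(a)
b.merge(a)       # merging the same state again changes nothing
print(a.value(), b.value())
```

Both replicas accept updates locally and still agree after exchanging state in any order, which is the property these papers formalize.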
Languages aimed towards disorderly distributed programming as well as case studies on problems in distributed programming.
- Logic and Lattices for Distributed Programming Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, David Maier 2012
- Dedalus: Datalog in Time and Space Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears 2011
- MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean, Sanjay Ghemawat 2004
- A Note On Distributed Computing Samuel C. Kendall, Jim Waldo, Ann Wollrath, Geoff Wyant 1994
Implemented and theoretical distributed systems.
- Spanner: Google’s Globally-Distributed Database James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford 2012
- ZooKeeper: Wait-free coordination for Internet-scale systems Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, Benjamin Reed 2010
- A History Of The Virtual Synchrony Replication Model Ken Birman 2010
- Cassandra — A Decentralized Structured Storage System Avinash Lakshman, Prashant Malik 2009
- Dynamo: Amazon’s Highly Available Key-Value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels 2007
- Stasis: Flexible Transactional Storage Russell Sears, Eric Brewer 2006
- Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber 2006
- The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 2003
- Lessons from Giant-Scale Services Eric A. Brewer 2001
- Towards Robust Distributed Systems Eric A. Brewer 2000
- Cluster-Based Scalable Network Services Armando Fox, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, Paul Gauthier 1997
- The Process Group Approach to Reliable Distributed Computing Ken Birman 1993
Overviews and details covering many of the above papers and concepts compiled into single resources.
- Distributed Systems: for fun and profit Mikito Takada 2013
- Programming Distributed Computing Systems: A Foundational Approach Carlos A. Varela, Gul Agha 2013
- Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services Ken Birman 2012
- Introduction to Reliable and Secure Distributed Programming Christian Cachin, Rachid Guerraoui, Luís Rodrigues 2011
I'm hoping to make this into a living document, so please submit pull requests or leave comments!