Merge pull request #2 from duarten/master

Talk outline
R42 · Oct 23, 2012 · 4f3aade · 4f3aade
2 parents f860a15 + d3a269c
commit 4f3aade

Showing 7 changed files with 178 additions and 2 deletions.
diff --git a/00 - why.md b/00 - why.md
@@ -0,0 +1,22 @@
+# Why
+
+## Definition
+
+ * None (there is no generally accepted definition of NoSQL)
+
+## Cracks in the hegemony
+
+ * Impedance mismatch
+ * Application databases
+ * Clusters (relational databases were not designed to run on them)
+
+## Characteristics 
+
+ * Not using the relational model
+ * Schemaless
+ * (Mostly) designed to run on clusters
+ * (Mostly) Open-source
+
+## Results
+
+ * Polyglot persistence
diff --git a/01 - data models.md b/01 - data models.md
@@ -0,0 +1,55 @@
+# Data Models
+
+ * How we perceive the data
+ * Not the storage model
+ * Typically, the relational data model
+
+## Aggregates
+
+ * Unit for data manipulation
+ * Unit of distribution
+ * Transactional boundary
+ * Relationships are not enforced
+ * Denormalization
+ * Typically, no atomic operations spanning multiple aggregates (exception: RavenDB)
+ * Schemaless
+
+## Key-value and document stores
+
+ * Lookup by ID
+ * The value can be opaque (key-value) or used by queries (document)
+ * For key-value stores, you can integrate search tools for query support
+ * Model for data access
+
+## Column-family stores
+
+ * Confusing model
+ * Sparse table: columns can be added to any row, and rows can have different columns 
+ * Two-level map: first key identifies the row, second one identifies the column
+ * Column families (super-columns in Cassandra)
+ * Storage model more suited for reading
+ * Columns are ordered (by name, timestamp, etc.)
+ * Model for data access
+
+## Materialized views
+
+ * Cached queries
+ * Can be stale
+ * Can be included in a document
+ * Map-reduce
+
+## Graphs
+
+ * Small records with complex interconnections
+ * Nodes connected by edges
+ * Faster than relational databases at traversing relationships
+ * Not suitable for clusters
+ * Transactions need to span multiple nodes
+
+## Facts
+
+ * Immutability
+ * Time
+ * Storage is cheap
+ * Event sourcing
+ * Datomic (Memory Image)
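The Facts bullets above (immutability, time, event sourcing) can be illustrated with a minimal Python sketch. It assumes that state is never stored directly but derived by replaying an append-only log of immutable events. The `Event` type, the `"deposited"` / `"withdrawn"` kinds, and the `balance_as_of` helper are hypothetical names chosen for illustration; they are not taken from the talk or from Datomic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a fact cannot be changed once recorded
class Event:
    timestamp: int  # logical time at which the fact was recorded
    kind: str       # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


def balance_as_of(log, timestamp):
    """Replay every event recorded up to `timestamp` to derive the balance."""
    balance = 0
    for event in log:
        if event.timestamp > timestamp:
            continue  # this fact was not yet known at the requested time
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
    return balance


# The log is append-only: corrections are new events, never in-place updates.
log = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "deposited", 5),
]
```

Because nothing is overwritten, the same log answers both "what is the balance now?" (`balance_as_of(log, 3)`) and "what was it earlier?" (`balance_as_of(log, 2)`).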
diff --git a/02 - map-reduce.md b/02 - map-reduce.md
@@ -0,0 +1,21 @@
+# Map-Reduce
+
+* Computation closer to the data
+* Minimize cluster traffic
+* Parallelization
+* Based on the map and reduce (fold) operations from functional programming
+* Map produces key-value pairs
+* Reduce combines map outputs with the same key
+* By example
+
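+    from post in posts                                          // map: one entry per tag on each post
+    from tag in post.Tags
+    select new { Name = tag.ToString().ToLower(), Count = 1 };  // emit (tag name, 1)
+
+    from tagCount in results                                    // reduce: group the map output by tag name
+    group tagCount by tagCount.Name into g
+    select new { Name = g.Key, Count = g.Sum(x => x.Count) };   // sum the counts per tag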
+* Reduce should be combinable
+* Map-reduce pipeline
+* Incremental
+* Feeds materialized views
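The "Reduce should be combinable" point can be made concrete with a small Python sketch of the same tag-count example. This is an illustration under stated assumptions, not the database's implementation: `map_post`, `reduce_counts`, and the sample `posts` are hypothetical, and the two "partitions" are just slices of one in-memory list.

```python
from collections import defaultdict


def map_post(post):
    """Map: emit one (tag, 1) pair per tag on a post."""
    for tag in post["tags"]:
        yield (tag.lower(), 1)


def reduce_counts(pairs):
    """Reduce: sum the counts of all pairs that share a key."""
    totals = defaultdict(int)
    for name, count in pairs:
        totals[name] += count
    return sorted(totals.items())


posts = [
    {"tags": ["NoSQL", "Graphs"]},
    {"tags": ["nosql", "CAP"]},
    {"tags": ["cap", "NoSQL"]},
]

# Each "node" maps and reduces only its own partition of the posts.
partial_a = reduce_counts(pair for post in posts[:2] for pair in map_post(post))
partial_b = reduce_counts(pair for post in posts[2:] for pair in map_post(post))

# The reduce output has the same shape as its input, so partial results
# can be fed back into the same reduce function.
combined = reduce_counts(partial_a + partial_b)
single_pass = reduce_counts(pair for post in posts for pair in map_post(post))
```

Here `combined` and `single_pass` hold the same totals, which is what allows reduction to run per node first and keeps cluster traffic low.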
diff --git a/03 - distribution.md b/03 - distribution.md
@@ -0,0 +1,25 @@
+# Distribution
+
+ * Most NoSQL databases are specifically designed to run on clusters
+ * But not all (e.g., graph databases)
+ * Single server vs sharding and/or replication
+
+## Sharding
+
+ * Scales writes
+ * How can a user get all the data they need from a single server?
+ * Aggregates are the unit of distribution
+ * Can be handled in application logic
+ * Many NoSQL databases offer auto-sharding
+
+## Master-Slave Replication
+
+ * Scales reads
+ * Read resilience
+ * Affects read consistency
+
+## Peer-to-Peer Replication
+
+ * Scales reads and writes
+ * No single point of failure
+ * Affects read and write consistency
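For the Sharding section's points that aggregates are the unit of distribution and that sharding can be handled in application logic, here is a minimal Python sketch. It assumes the application routes each aggregate by a stable hash of its ID; the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical server names


def shard_for(aggregate_id, shards=SHARDS):
    """Pick a shard from a stable hash of the aggregate's ID.

    hashlib is used instead of the built-in hash(), whose value for
    strings changes between Python processes.
    """
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Since the whole aggregate lives under one ID, every read and write for it goes to the same server. Plain modulo hashing remaps most keys when the number of shards changes, so this is only a sketch of the routing idea, not of how an auto-sharding database rebalances data.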
diff --git a/04 - consistency.md b/04 - consistency.md
@@ -0,0 +1,46 @@
+# Consistency
+
+ * Relational databases exhibit strong consistency
+ * NoSQL stores allow you to relax consistency
+ * Consistency requirements per operation or per request
+
+## Write consistency
+
+ * Lost updates
+ * Concurrency control
+ * Pessimistic
+ * Optimistic - conditional updates
+ * Vector clocks and version vectors
+
+## Read consistency
+
+ * Inconsistent reads
+ * Logical consistency 
+ * Inconsistency window
+ * Replication consistency
+ * Eventually consistent
+ * Read-your-writes consistency
+
+## CAP Theorem
+
+ * Consistency
+ * Partition tolerance
+ * Availability
+
+   > Every request received by a nonfailing node in the system
+   > must result in a response
+   
+ * Tradeoff between consistency and availability/latency
+
+## Durability
+
+ * Can also be sacrificed
+ * Periodically flush writes to disk
+ * What if the master node fails before replication?
+
+## Quorums
+
+ * The tradeoff is flexible
+ * Replication factor - N
+ * Write quorum - W > N/2
+ * Read quorum - R + W > N
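The two quorum inequalities above can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical, and real stores may expose these settings differently.

```python
def is_write_quorum(n, w):
    """W > N/2: two conflicting writes cannot both reach a majority."""
    return 1 <= w <= n and 2 * w > n


def is_read_consistent(n, r, w):
    """R + W > N: every set of R replicas overlaps every set of W replicas."""
    return 1 <= r <= n and 1 <= w <= n and r + w > n
```

With a replication factor of N = 3, writing to W = 2 nodes and reading from R = 2 nodes satisfies both conditions, while W = 2 and R = 1 keeps the write quorum but allows a read to hit the one replica that missed the latest write.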
diff --git a/05 - future.md b/05 - future.md
@@ -0,0 +1,4 @@
+# Future
+
+ * More systems, more models, more adoption
+ * Polyglot persistence
diff --git a/README.md b/README.md
@@ -1,4 +1,7 @@
-nosql-intro
+Introduction to NoSQL
 ===========
 
-Introduction to NoSQL
+* Data models
+* Map-Reduce
+* Distribution
+* Consistency