Merge pull request #2 from duarten/master
Talk outline
duarten committed Oct 23, 2012
2 parents f860a15 + d3a269c commit 4f3aade
Showing 7 changed files with 178 additions and 2 deletions.
22 changes: 22 additions & 0 deletions 00 - why.md
@@ -0,0 +1,22 @@
# Why

## Definition

* None (there is no generally accepted definition)

## Cracks in the hegemony

* Impedance mismatch
* Application databases
* Clusters (vs relational databases)

## Characteristics

* Not using the relational model
* Schemaless
* (Mostly) Adequate for clusters
* (Mostly) Open-source

## Results

* Polyglot persistence
55 changes: 55 additions & 0 deletions 01 - data models.md
@@ -0,0 +1,55 @@
# Data Models

* How we perceive the data
* Not the storage model
* Typically, the relational data model

## Aggregates

* Unit for data manipulation
* Unit of distribution
* Transactional boundary
* Relationships are not enforced
* Denormalization
* Typically, no atomic operations spanning multiple aggregates (exception: RavenDB)
* Schemaless
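
To make the aggregate idea concrete, here is a minimal sketch. The order structure, the `store` dictionary, and the `save`/`load` helpers are hypothetical and not tied to any particular database; they only illustrate that the whole nested structure is read and written as one unit, and that a reference to another aggregate is just an ID that nothing enforces.

```python
import json

# Hypothetical order aggregate: line items are embedded (denormalized),
# so the whole order is read, written, and distributed as one unit.
order = {
    "id": "orders/1",
    "customer_id": "customers/42",  # a plain reference; nothing checks that it exists
    "lines": [
        {"product": "book", "price": 30, "quantity": 2},
        {"product": "pen", "price": 2, "quantity": 5},
    ],
}

# A toy store that maps aggregate IDs to serialized aggregates.
store = {}


def save(aggregate):
    # One write replaces the whole aggregate: this is the transactional boundary.
    store[aggregate["id"]] = json.dumps(aggregate)


def load(aggregate_id):
    # One read returns the whole aggregate; no joins are needed.
    return json.loads(store[aggregate_id])


save(order)
total = sum(line["price"] * line["quantity"] for line in load("orders/1")["lines"])
```

Computing the order total needs a single lookup, which is the payoff of denormalizing the line items into the aggregate.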

## Key-value and document stores

* Lookup by ID
* The value can be opaque (key-value) or used by queries (document)
* For key-value stores, you can integrate search tools for query support
* Model for data access
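
The difference between an opaque value and a queryable one can be sketched as follows. Both stores are plain dictionaries and `find_by` is a hypothetical helper; a real document store would use indexes rather than a full scan.

```python
# Key-value store: the value is an opaque blob, so the only access path is the key.
kv_store = {"users/1": b'{"name": "Ana", "city": "Lisbon"}'}
blob = kv_store["users/1"]

# Document store: the store understands the structure, so it can query on fields.
documents = {
    "users/1": {"name": "Ana", "city": "Lisbon"},
    "users/2": {"name": "Rui", "city": "Porto"},
}


def find_by(field, value):
    """Return the IDs of documents whose field equals value (a full scan)."""
    return sorted(
        doc_id for doc_id, doc in documents.items() if doc.get(field) == value
    )


lisbon_users = find_by("city", "Lisbon")
```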

## Column-family stores

* Confusing model
* Sparse table: columns can be added to any row, and rows can have different columns
* Two-level map: first key identifies the row, second one identifies the column
* Column families (super-columns in Cassandra)
* Storage model more suited for reading
* Columns are ordered (by name, timestamp, etc.)
* Model for data access
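
The two-level map can be sketched with nested dictionaries. This is only an illustration of the data model, not of how any column-family store lays data out on disk; the row keys and the `put`/`get_row` helpers are hypothetical.

```python
# Two-level map: row key -> column name -> value.
# Rows are sparse: each row holds only the columns it actually has.
rows = {
    "user:1": {"name": "Ana", "email": "ana@example.com"},
    "user:2": {"name": "Rui"},
}


def put(row_key, column, value):
    # Any column can be added to any row without a schema change.
    rows.setdefault(row_key, {})[column] = value


def get_row(row_key):
    # Columns come back ordered by name, mimicking ordered column storage.
    return sorted(rows.get(row_key, {}).items())


put("user:2", "twitter", "@rui")
put("user:3", "name", "Eva")
```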

## Materialized views

* Cached queries
* Can be stale
* Can be included in a document
* Map-reduce

## Graphs

* Small records with complex interconnections
* Nodes connected by edges
* More performant than relational databases when traversing relationships
* Not suitable for clusters
* Transactions need to span multiple nodes

## Facts

* Immutability
* Time
* Storage is cheap
* Event sourcing
* Datomic (Memory Image)
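
A minimal sketch of the facts idea, assuming a hypothetical account ledger: facts are only appended, and state at any point in time is derived by replaying them. The `record` and `balance` helpers are illustrative names, not part of Datomic or any event-sourcing library.

```python
# Append-only log of immutable facts: (timestamp, account, amount).
facts = []


def record(timestamp, account, amount):
    # Facts are only ever appended, never updated or deleted.
    facts.append((timestamp, account, amount))


def balance(account, as_of=None):
    # State is derived by replaying the facts, optionally up to a point in time.
    return sum(
        amount
        for timestamp, acc, amount in facts
        if acc == account and (as_of is None or timestamp <= as_of)
    )


record(1, "acc-1", 100)
record(2, "acc-1", -30)
record(3, "acc-1", 50)
```

Because nothing is overwritten, `balance("acc-1", as_of=2)` can answer what the balance was at time 2 as easily as `balance("acc-1")` answers what it is now.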
21 changes: 21 additions & 0 deletions 02 - map-reduce.md
@@ -0,0 +1,21 @@
# Map-Reduce

* Computation closer to the data
* Minimize cluster traffic
* Parallelization
* Based on the map and reduce (fold) operations from functional programming
* Map produces key-value pairs
* Reduce combines map outputs with the same key
* By example

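    // Map: emit one { Name, Count } entry per tag of each post
    from post in posts
    from tag in post.Tags
    select new { Name = tag.ToString().ToLower(), Count = 1 };

    // Reduce: group the map outputs by tag name and sum the counts
    from tagCount in results
    group tagCount by tagCount.Name into g
    select new { Name = g.Key, Count = g.Sum(x => x.Count) };
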
* Reduce should be combinable
* Map-reduce pipeline
* Incremental
* Feeds materialized views
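
The "combinable" requirement can be illustrated with a small sketch of the same tag count. The sample `posts` and the function names are assumptions made for the example. Because the reduce output has the same shape as its input, partial results computed separately (as if on different nodes) can be reduced again.

```python
from collections import defaultdict

posts = [
    {"title": "Intro", "tags": ["NoSQL", "Talks"]},
    {"title": "Models", "tags": ["nosql", "Graphs"]},
]


def map_post(post):
    # Map: emit one (name, count) pair per tag.
    return [(tag.lower(), 1) for tag in post["tags"]]


def reduce_counts(pairs):
    # Reduce: sum counts per key. The output has the same shape as the input,
    # so partial results can be fed back into the same function.
    totals = defaultdict(int)
    for name, count in pairs:
        totals[name] += count
    return sorted(totals.items())


# Reduce each post's output separately, then combine the partial results.
partials = [reduce_counts(map_post(post)) for post in posts]
combined = reduce_counts(pair for partial in partials for pair in partial)

# Reducing everything in a single pass gives the same answer.
single_pass = reduce_counts(pair for post in posts for pair in map_post(post))
```

The same property is what makes incremental updates possible: only the partial result for a changed post has to be recomputed before re-reducing.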
25 changes: 25 additions & 0 deletions 03 - distribution.md
@@ -0,0 +1,25 @@
# Distribution

* Most NoSQL databases are specifically designed for running on clusters
* But not all (e.g., graph databases)
* Single server vs sharding and/or replication

## Sharding

* Scales writes
* How do users get all the data from a single server?
* Aggregates are the unit of distribution
* Can be handled in application logic
* Many NoSQL databases offer auto-sharding
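
Handling sharding in application logic can be as simple as hashing the aggregate ID. The sketch below assumes three hypothetical servers and a hypothetical `shard_for` helper; it is not how any specific database implements auto-sharding.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names


def shard_for(aggregate_id):
    # Hash the aggregate ID so the whole aggregate always lands on one shard.
    # A stable hash (rather than Python's per-process randomized hash())
    # keeps the routing identical across processes and restarts.
    digest = hashlib.sha256(aggregate_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


placement = {
    order_id: shard_for(order_id)
    for order_id in ("orders/1", "orders/2", "orders/3")
}
```

Plain modulo hashing remaps most keys when the number of shards changes, which is one reason to prefer built-in auto-sharding where it is available.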

## Master-Slave Replication

* Scales reads
* Read resilience
* Affects read consistency

## Peer-to-Peer Replication

* Scales reads and writes
* No single point of failure
* Affects read and write consistency
46 changes: 46 additions & 0 deletions 04 - consistency.md
@@ -0,0 +1,46 @@
# Consistency

* Relational databases exhibit strong consistency
* NoSQL stores allow you to relax consistency
* Consistency requirements per operation or per request

## Write consistency

* Lost updates
* Concurrency control
* Pessimistic
* Optimistic - conditional updates
* Vector clocks and version vectors
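
An optimistic conditional update can be sketched with a version number per key. The in-memory `store`, the `VersionConflict` exception, and `conditional_write` are hypothetical names for the example; real stores expose the same idea through mechanisms such as ETags or compare-and-set operations.

```python
class VersionConflict(Exception):
    """Raised when the stored version no longer matches the one the writer read."""


store = {}  # key -> (version, value)


def read(key):
    # Missing keys are reported as version 0 with no value.
    return store.get(key, (0, None))


def conditional_write(key, expected_version, value):
    # The write succeeds only if nobody else has written since we read.
    current_version, _ = read(key)
    if current_version != expected_version:
        raise VersionConflict(
            f"{key}: expected v{expected_version}, found v{current_version}"
        )
    store[key] = (current_version + 1, value)


# Two writers read the same version; the second write is rejected instead of
# silently overwriting the first (which would be a lost update).
version, _ = read("cart/1")
conditional_write("cart/1", version, ["book"])
try:
    conditional_write("cart/1", version, ["pen"])
    conflict = False
except VersionConflict:
    conflict = True
```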

## Read consistency

* Inconsistent reads
* Logical consistency
* Inconsistency window
* Replication consistency
* Eventually consistent
* Read-your-writes consistency

## CAP Theorem

* Consistency
* Partition tolerance
* Availability

> Every request received by a nonfailing node in the system
> must result in a response
* Tradeoff between consistency and availability/latency

## Durability

* Can also be sacrificed
* Periodically flush writes to disk
* What if the master node fails before replication?

## Quorums

* The tradeoff is flexible
* Replication factor - N
* Write quorum - W > N/2
* Read quorum - R + W > N
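
The two inequalities can be checked directly. In this small sketch `quorum_properties` is a hypothetical helper; it only evaluates the conditions from the list above for a given replication factor.

```python
def quorum_properties(n, w, r):
    """Check the two quorum inequalities for replication factor n."""
    return {
        # W > N/2: two conflicting writes cannot both reach a majority.
        "consistent_writes": w > n / 2,
        # R + W > N: every read set overlaps the latest write set.
        "consistent_reads": r + w > n,
    }


strict = quorum_properties(n=3, w=2, r=2)
relaxed = quorum_properties(n=3, w=1, r=1)
```

With N = 3, choosing W = 2 and R = 2 satisfies both conditions, while W = 1 and R = 1 trades both away for lower latency.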
4 changes: 4 additions & 0 deletions 05 - future.md
@@ -0,0 +1,4 @@
# Future

* More systems, more models, more adoption
* Polyglot persistence
7 changes: 5 additions & 2 deletions README.md
@@ -1,4 +1,7 @@
-nosql-intro
+Introduction to NoSQL
 ===========

-Introduction to NoSQL
+* Data models
+* Map-Reduce
+* Distribution
+* Consistency
