### Introduction to MongoDB

### Mateusz 

Date: Tuesday, 7 July 2015, day 2  
Start: 14:30  
End: 17:30  

### Agenda:
- 13:30 - 14:00 - Introduction to NoSQL and MongoDB
- 14:00 - 14:30 - Queries and commands mastering
- 14:30 - 14:40 - Short break, please be back on time
- 14:40 - 15:10 - More on MongoDB
- 15:10 - 17:30 - Building sample application with Python and Pymongo

### Objectives:
- learn fundamentals of MongoDB
- learn how to use most common commands and queries
- know when to use and when to avoid MongoDB
- know how to avoid most common MongoDB gotchas

### URLs:
- http://docs.mongodb.org/manual/ - MongoDB official documentation
- http://api.mongodb.org/python/current/ - PyMongo driver documentation
- http://www.mongodbspain.com/wp-content/uploads/2014/03/MongoDBSpain-CheetSheet.pdf - mongo cheatsheet
- http://openmymind.net/mongodb.pdf - The Little MongoDB Book (Karl Seguin)
- http://www.slideshare.net/friedo/data-modeling-examples - schema design examples
- http://blog.mongodb.org/post/33700094220/how-mongodbs-journaling-works - how journaling in MongoDB works
- https://www.youtube.com/watch?v=9qmIa_m5Y8w - mongostat and mongotop explained

# NoSQL motivation

- Why do we need NoSQL?
- Why are RDBMSs sometimes not enough?


### MongoDB performance vs functionality
- Scalability
- Performance
- High availability (HA)
- Fast development

<img src="http://www.mongothinking.com/wp-content/uploads/2014/06/Knee-Curve_2.png">

# NoSQL vs SQL
- Document vs Relation
- Available vs Consistent
- No transactions vs transactions
- Embedding vs joins

# Why MongoDB?
- Out of the box sharding, that enables horizontal scaling
- Out of the box replication and failover
- Performance (often faster than an RDBMS for simple reads and writes)
- Powerful (most RDBMS features, except for transactions and joins)
- Fast development
- Stability
- Official Python driver

# Terminology
- Database -> Database
- Table -> Collection
- Row -> Document
- Column -> Field
- Index -> Index
- Join -> Embedded
- FK -> Reference
- Partition -> Shard

# NoSQL vs. SQL schema comparison
### SQL
<br/>
<img src="http://img.ctrlv.in/img/15/07/07/559bd2a63e8e4.png">  
<br/>
### NoSQL
<img src="http://img.ctrlv.in/img/15/07/07/559bd2387b87f.png">
- Note: the maximum size of a MongoDB (BSON) document is 16 MB

# Queries
<table>
<tr><td>SQL</td><td>MongoDB</td></tr>
<tr><td>CREATE TABLE users</td><td>Implicitly created on first insert command</td></tr>
<tr><td>ADD / DROP COLUMN x</td><td>Structure not enforced</td></tr>
<tr><td>INSERT INTO users (name) VALUES ('x')</td><td>db.users.insert({'name' : 'x'})</td></tr>
<tr><td>SELECT \* FROM users</td><td>db.users.find()</td></tr>
<tr><td>SELECT \* FROM users WHERE name = 'x'</td><td>db.users.find({'name' : 'x'})</td></tr>
<tr><td>ALTER TABLE users ADD age ...; UPDATE users SET age = 1 WHERE name = 'x';</td><td>db.users.update({'name' : 'x'}, {$set: {age: 1}}, {multi : true})</td></tr>
<tr><td>DELETE FROM users WHERE name = 'x'</td><td>db.users.remove({'name' : 'x'})</td></tr>
<tr><td>UPDATE ...; SELECT ...;</td><td>db.users.findAndModify()</td></tr>
</table>
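For the Python part of the workshop, the same rows expressed as the filter and update documents PyMongo expects. A minimal sketch: the helper names are hypothetical, and the commented calls assume PyMongo 3.x or later (`insert_one`, `update_many`, `delete_many`) and a `users` collection object. Only the documents are built here, so it runs without a server.

```python
def insert_doc(name):
    """INSERT INTO users (name) VALUES (name)"""
    return {"name": name}


def name_filter(name):
    """WHERE name = name (shared by SELECT, UPDATE and DELETE)"""
    return {"name": name}


def set_age(name, age):
    """UPDATE users SET age = age WHERE name = name, as (filter, update)."""
    return name_filter(name), {"$set": {"age": age}}


# With a live server (not executed here), `users` being a pymongo Collection:
#   users.insert_one(insert_doc("x"))
#   list(users.find(name_filter("x")))
#   users.update_many(*set_age("x", 1))   # like {multi: true} in the shell
#   users.delete_many(name_filter("x"))
```

PyMongo's counterpart of `findAndModify` is `find_one_and_update`.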

In [None]:
# Exercise 1 - mongoimport the zips dataset
# Download zips database - http://media.mongodb.org/zips.json
$ mongoimport --db summercamp --collection zips --file zips.json
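If you prefer to load the data from Python instead of `mongoimport`, here is a minimal sketch. It assumes `zips.json` holds one plain-JSON document per line and that `collection` is a PyMongo collection (`insert_many` needs PyMongo 3.0+). `load_zips` and `import_zips` are hypothetical helper names, and the batch size is an arbitrary choice.

```python
import json


def load_zips(lines):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def import_zips(collection, path="zips.json", batch_size=1000):
    """Read `path` and insert its documents in batches (needs a live server)."""
    with open(path) as f:
        docs = load_zips(f)
    for start in range(0, len(docs), batch_size):
        collection.insert_many(docs[start:start + batch_size])
    return len(docs)
```

Reading the whole file into memory is fine for the zips dataset (tens of thousands of small documents), but not for very large files.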

In [None]:
# Exercise 2 - findOne
# connects to the test database by default
$ mongo
# switches to summercamp database
> use summercamp
> db.zips.findOne()

In [None]:
# Exercise 3 - find, filtering fields, sorting, limiting, pretty
> db.zips.find()
> db.zips.find().pretty()
> db.zips.find({city: 'CHICAGO'})
> db.zips.find({city: 'CHICAGO'}, {city: 1})
> db.zips.find({city: 'CHICAGO'}, {city: 1, _id: 0})
> db.zips.find().limit(1).sort({pop: -1})
> db.zips.find().limit(2).skip(1).sort({pop: -1})
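The chaining order in the shell does not change the result: the server always applies `sort` first, then `skip`, then `limit`, so `find().limit(1).sort({pop: -1})` returns the most populated zip code. Below is a plain-Python model of that order. It is an illustration with made-up sample documents, not how the server implements it.

```python
def sort_skip_limit(docs, key, descending=False, skip=0, limit=0):
    """Model cursor semantics: sort, then skip, then limit.
    limit=0 means "no limit", as in MongoDB."""
    ordered = sorted(docs, key=lambda d: d[key], reverse=descending)
    ordered = ordered[skip:]
    return ordered[:limit] if limit else ordered


# Made-up sample documents.
zips = [
    {"_id": "a", "pop": 10},
    {"_id": "b", "pop": 30},
    {"_id": "c", "pop": 20},
]
# db.zips.find().limit(1).sort({pop: -1})
top = sort_skip_limit(zips, "pop", descending=True, limit=1)
# db.zips.find().limit(2).skip(1).sort({pop: -1})
next_two = sort_skip_limit(zips, "pop", descending=True, skip=1, limit=2)
```

In PyMongo the second query is `zips.find().sort("pop", -1).skip(1).limit(2)`, and the chaining order does not matter there either.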


In [None]:
# Exercise 4 - find with $ne, $in, $gt, $gte, $lte, $regex, $exists, $or
> db.zips.find({city: {$ne: 'CHICAGO'}})
> db.zips.find({city: {$in: ['NEW YORK', 'CHICAGO']}})
> db.zips.find({pop: {$gte: 40}})
> db.zips.find({pop: {$lte: 40}})
> db.zips.find({city: {$regex: /CHI/}})
> db.zips.find({loc: {$exists: true}})
> db.zips.find({city: 'CHICAGO', pop: {$gt: 100000}})
> db.zips.find({$or: [{city: 'CHICAGO'}, {state : "WA"}]})
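The Exercise 4 filters written as PyMongo filter documents. In Python, keys and operators must be quoted strings, and a shell regex literal such as `/CHI/` becomes `re.compile("CHI")` (PyMongo encodes compiled patterns as BSON regular expressions). This is a sketch: the `FILTERS` name is hypothetical, and the commented call assumes PyMongo 3.7+ for `count_documents`.

```python
import re

# Exercise 4 filters as PyMongo filter documents (plain dicts).
FILTERS = {
    "ne": {"city": {"$ne": "CHICAGO"}},
    "in": {"city": {"$in": ["NEW YORK", "CHICAGO"]}},
    "gte": {"pop": {"$gte": 40}},
    "lte": {"pop": {"$lte": 40}},
    "regex": {"city": re.compile("CHI")},   # shell: {city: {$regex: /CHI/}}
    "exists": {"loc": {"$exists": True}},   # Python True, shell true
    "and": {"city": "CHICAGO", "pop": {"$gt": 100000}},
    "or": {"$or": [{"city": "CHICAGO"}, {"state": "WA"}]},
}

# With a live server (not executed here), `zips` being a pymongo Collection:
#   for label, flt in FILTERS.items():
#       print(label, zips.count_documents(flt))
```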

In [None]:
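# Exercise 5 - updates, multi, $set, $unset, remove, insert
# without update operators the first matching document is replaced: only _id and city remain
> db.zips.update({city: 'CHICAGO'}, {city: 'WARSAW'})
# $set changes only the listed field (still only in the first matching document)
> db.zips.update({city: 'WARSAW'}, {$set: {city: 'CHICAGO'}})
# multi: true applies the update to every matching document
> db.zips.update({city: 'CHICAGO'}, {$set: {city: 'WARSAW'}}, {multi: true})
# $unset removes the pop field from the first match (the value '' is ignored)
> db.zips.update({city: 'WARSAW'}, {$unset: {pop: ''}})
> db.zips.insert({city: 'CRACOW'})
> db.zips.find({city: 'CRACOW'})
# remove deletes every matching document
> db.zips.remove({city: 'CRACOW'})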
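The difference between the first and second update in Exercise 5 is easy to miss. Below is a plain-Python model of the three update forms, with dicts standing in for documents. The helper names and the sample document are made up for illustration.

```python
def apply_replacement(doc, replacement):
    """Update without operators: the whole document is replaced, only _id survives."""
    new_doc = {"_id": doc["_id"]}
    new_doc.update(replacement)
    return new_doc


def apply_set(doc, fields):
    """{$set: fields}: only the listed fields change."""
    new_doc = dict(doc)
    new_doc.update(fields)
    return new_doc


def apply_unset(doc, fields):
    """{$unset: fields}: the listed fields are removed, their values are ignored."""
    return {k: v for k, v in doc.items() if k not in fields}


# Made-up sample document.
chicago = {"_id": "z1", "city": "CHICAGO", "pop": 1000, "state": "IL"}
```

PyMongo makes this distinction explicit: `replace_one` is for whole-document replacement, while `update_one` and `update_many` only accept update operators and raise a `ValueError` for a document without them.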

## Aggregations
<img src="http://docs.mongodb.org/manual/_images/aggregation-pipeline.png">

In [None]:
# Exercise 6 - aggregations

# Find states with a population over 10 million
db.zips.aggregate( [
   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
   { $match: { totalPop: { $gte: 10000000 } } }
] )

# Find average city population by state
db.zips.aggregate( [
   { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
   { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )

# Find the smallest and biggest cities by state.
# TIP: sort by population first; inside $group, $first and $last return the value
# from the first and the last document of each group
# {$sort: {pop: -1}}
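One possible answer, as a sketch: the pipeline written as a PyMongo list of stages, plus a pure-Python reference implementation (the name `smallest_biggest` is hypothetical) for checking the result on a small sample. Ties in population may come out in a different order in MongoDB than in Python.

```python
# "Smallest and biggest cities by state" as a PyMongo pipeline.
SMALLEST_BIGGEST = [
    # 1) one document per (state, city) with the summed population
    {"$group": {"_id": {"state": "$state", "city": "$city"}, "pop": {"$sum": "$pop"}}},
    # 2) biggest cities first
    {"$sort": {"pop": -1}},
    # 3) per state, the first document is the biggest city and the last the smallest
    {"$group": {
        "_id": "$_id.state",
        "biggestCity": {"$first": "$_id.city"},
        "biggestPop": {"$first": "$pop"},
        "smallestCity": {"$last": "$_id.city"},
        "smallestPop": {"$last": "$pop"},
    }},
]


def smallest_biggest(docs):
    """Pure-Python equivalent of the pipeline, keyed by state."""
    city_pop = {}
    for d in docs:
        key = (d["state"], d["city"])
        city_pop[key] = city_pop.get(key, 0) + d["pop"]
    result = {}
    # Walk cities from biggest to smallest: the first seen per state is the
    # biggest, the last one overwritten is the smallest.
    for (state, city), pop in sorted(city_pop.items(), key=lambda kv: -kv[1]):
        entry = result.setdefault(state, {"biggestCity": city, "biggestPop": pop})
        entry["smallestCity"], entry["smallestPop"] = city, pop
    return result
```

With a live server, `list(zips.aggregate(SMALLEST_BIGGEST))` runs the pipeline.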

## Indexing theory
### Single field index
<img src="http://docs.mongodb.org/manual/_images/index-ascending.png">
### Compound index
<img src="http://docs.mongodb.org/manual/_images/index-compound-key.png">

## Explain
<code>db.zips.find({city: 'CHICAGO'}).explain()</code>
<code>
{
  "cursor" : "<Cursor Type and Index>",
  "isMultiKey" : <boolean>,
  "n" : <num>,
  "nscannedObjects" : <num>,
  "nscanned" : <num>,
  "nscannedObjectsAllPlans" : <num>,
  "nscannedAllPlans" : <num>,
  "scanAndOrder" : <boolean>,
  "indexOnly" : <boolean>,
  "nYields" : <num>,
  "nChunkSkips" : <num>,
  "millis" : <num>,
  "indexBounds" : { <index bounds> },
  "allPlans" : [
                 { "cursor" : "<Cursor Type and Index>",
                   "n" : <num>,
                   "nscannedObjects" : <num>,
                   "nscanned" : <num>,
                   "indexBounds" : { <index bounds> }
                 },
                  ...
               ],
  "oldPlan" : {
                "cursor" : "<Cursor Type and Index>",
                "indexBounds" : { <index bounds> }
              },
  "server" : "<host:port>",
  "filterSet" : <boolean>
}
</code>

In [None]:
# Exercise 7 - write JS code to insert 500k documents into the summercamp collection
# Each document should have 3 fields - name, gender and hairs
# Name should be a random string
# Gender should be randomly chosen from ['male', 'female']
# Hairs should be randomly chosen from ['black', 'brown', 'blond', 'red', 'auburn', 'chestnut', 'white'] 
# Use Math.floor(Math.random() * some_array.length) to generate array random index
# Use Math.random().toString(36).substring(13); to create a random string
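The same exercise from Python, as a sketch. `random_person` and `insert_people` are hypothetical names. The 3-character names (matching the `'l4i'` used in the indexing exercise) and the batch size of 10000 are assumptions, and `insert_many` needs PyMongo 3.0+. Inserting in batches avoids one round trip per document.

```python
import random
import string

GENDERS = ["male", "female"]
HAIRS = ["black", "brown", "blond", "red", "auburn", "chestnut", "white"]


def random_person(rng, name_length=3):
    """One document with a random name, gender and hair colour."""
    alphabet = string.ascii_lowercase + string.digits
    name = "".join(rng.choice(alphabet) for _ in range(name_length))
    return {"name": name, "gender": rng.choice(GENDERS), "hairs": rng.choice(HAIRS)}


def insert_people(collection, total=500000, batch_size=10000, seed=None):
    """Insert `total` random documents in batches (needs a live server)."""
    rng = random.Random(seed)
    inserted = 0
    while inserted < total:
        n = min(batch_size, total - inserted)
        collection.insert_many([random_person(rng) for _ in range(n)])
        inserted += n
    return inserted
```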

In [None]:
# Exercise 7 (continued) - indexing (creating, dropping, explaining queries), comparing queries with and without indexes
# pick a name that exists in your generated data
> var name = 'l4i';
> db.summercamp.getIndexes()
> db.summercamp.find({name: name})
> db.summercamp.createIndex({name: 1})
> db.summercamp.find({name: name})
> db.summercamp.find({name: name}).explain()
> db.summercamp.find({hairs: 'blond'}).explain()
> db.summercamp.dropIndex("name_1")
> db.summercamp.createIndex({name: 1, hairs: 1})
# hairs alone is not a prefix of {name: 1, hairs: 1}, so this query cannot use the index
> db.summercamp.find({hairs: 'blond'}).explain()
# name (+ hairs) is a prefix, so this query can use the index
> db.summercamp.find({name: name, hairs: 'blond'}).explain()
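A compound index can serve queries on a prefix of its keys. With an index on `{name: 1, hairs: 1}`, a query on `name`, or on `name` and `hairs`, can use it, but a query on `hairs` alone cannot. Below is a simplified check under stated assumptions: equality queries only, ignoring sorts, range queries and index intersection. The helper name is hypothetical.

```python
def usable_prefix(index_keys, query_fields):
    """Number of leading index keys that an equality query on `query_fields`
    can use. 0 means the index cannot narrow the search (collection scan)."""
    count = 0
    for key in index_keys:  # index_keys are listed in index order
        if key not in query_fields:
            break
        count += 1
    return count


index = ["name", "hairs"]  # {name: 1, hairs: 1}
```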

## Profiling

In [None]:
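# Exercise 8 - profiling
# level 1: log only slow operations (slower than slowms, 100 ms by default)
> db.setProfilingLevel(1)
# level 2: log all operations (level 0 switches profiling off)
> db.setProfilingLevel(2)
> db.system.profile.find()

# Find the last 10 operations on the summercamp collection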
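A sketch for the last task. The profiler writes one document per operation into `system.profile`, with the namespace (`"<db>.<collection>"`) in `ns` and the timestamp in `ts`. In the shell the query is `db.system.profile.find({ns: 'summercamp.summercamp'}).sort({ts: -1}).limit(10)`. Below are the same pieces for PyMongo (the helper name is hypothetical).

```python
def last_ops_query(db_name, collection_name, n=10):
    """Filter, sort spec and limit for the last `n` profiled operations on one collection."""
    return {"ns": "%s.%s" % (db_name, collection_name)}, [("ts", -1)], n


# With a live server (not executed here), `db` being a pymongo Database:
#   db.command("profile", 1, slowms=100)    # same as db.setProfilingLevel(1)
#   flt, sort, n = last_ops_query("summercamp", "summercamp")
#   for op in db.system.profile.find(flt).sort(sort).limit(n):
#       print(op["op"], op.get("millis"))
```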

## Replication #1
- A replica set: a group of N servers holding copies of the same data
- Any electable node can become primary
- All writes go to primary
- All reads go to primary (by default, can be configured)
- Automatic failover
- Automatic recovery

## Replication #2

- Backup
- Disaster recovery
- Reporting
- Increased read capacity  

http://docs.mongodb.org/master/MongoDB-replication-guide.pdf

In [None]:
# Exercise (todo) - set up a replica set locally
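A starting point for this exercise, under stated assumptions. Three `mongod` processes run on one machine, each started with its own port and data directory, for example `mongod --replSet rs0 --port 27017 --dbpath /data/rs0-0` (the paths are hypothetical). The set is then initiated once from one member with this config document, either via `rs.initiate(config)` in the shell or `client.admin.command("replSetInitiate", config)` in PyMongo.

```python
def replica_set_config(name="rs0", host="localhost", ports=(27017, 27018, 27019)):
    """Config document for rs.initiate() / replSetInitiate, one member per port."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": "%s:%d" % (host, port)} for i, port in enumerate(ports)],
    }
```

After initiation, `rs.status()` in the shell shows which member was elected primary.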

## Write concerns
http://docs.mongodb.org/manual/core/write-concern/
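- Unacknowledged - {w: 0} - fire and forget; the client gets no confirmation or error report
- Acknowledged - {w: 1} - the default; the primary confirms the write, so the client sees network, duplicate key and other errors
- Journaled - {w: 1, j: true} - confirmed only after the write reaches the on-disk journal, so it can be recovered after a crash
- Replica acknowledged - {w: 2} - confirmed by the primary and at least one secondary  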


- Useful for trading availability (and latency) against consistency
- Weakest - {w: 0} - when would you use it?
- Strongest - {w: "majority"}
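`w: "majority"` waits for a majority of the voting members. The model below is simplified and assumes every voting member also holds data (no arbiters). It also shows why odd-sized replica sets are recommended: a fourth member raises the majority without tolerating any extra failure. The function names are hypothetical.

```python
def majority(voting_members):
    """Acknowledgements needed for w: "majority": a strict majority of voting members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many members can be lost while a majority (and hence a primary
    and w: "majority" writes) is still available."""
    return voting_members - majority(voting_members)
```

In PyMongo a write concern can be set per collection, e.g. `coll.with_options(write_concern=WriteConcern(w="majority", wtimeout=5000))` after `from pymongo.write_concern import WriteConcern`.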

## Read preferences
http://docs.mongodb.org/manual/core/read-preference/#read-preference-modes
- Helps to take load off the primary
- At our company, secondaries are used for backups and failover
- In theory, primary and secondaries should run on the same hardware, but in practice secondaries are often less powerful, which causes problems when a failover occurs
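From Python the read preference is usually passed in the connection string. This is a sketch with hypothetical host names and helper name. The five mode names are the ones MongoDB defines.

```python
from urllib.parse import urlencode

READ_PREFERENCE_MODES = ("primary", "primaryPreferred", "secondary",
                         "secondaryPreferred", "nearest")


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a connection string that routes reads according to `read_preference`."""
    if read_preference not in READ_PREFERENCE_MODES:
        raise ValueError("unknown read preference: %s" % read_preference)
    query = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://%s/?%s" % (",".join(hosts), query)
```

The resulting string can be passed straight to `MongoClient(...)`. With `secondaryPreferred`, reads fall back to the primary only when no secondary is available.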

## Sharding
<img src="http://docs.mongodb.org/manual/_images/sharded-collection.png">
- A single server's CPU/RAM/disk are limited
- Enables horizontal scaling
- Out of the box feature
- Choosing a shard key is a challenge
- Once you choose a shard key you cannot modify it



## Shard key properties
- Cardinality (distribution of data)
- Write scaling
- Query isolation
- Reliability
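Cardinality puts a hard limit on distribution. MongoDB splits a sharded collection into chunks by ranges of the shard key, and a chunk cannot be split between documents that share the same key value. A low-cardinality key such as `gender` therefore yields at most two chunks, whatever the number of shards. Below is a toy model of that effect. It is an assumption-laden illustration: it hashes the key value modulo the shard count instead of using real chunk ranges and balancing, and the documents and shard count are made up.

```python
import hashlib
from collections import Counter


def toy_shard_for(value, num_shards):
    """Toy placement: hash the shard-key value, then take it modulo the shard count."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def toy_distribution(docs, key, num_shards):
    """Count how many documents land on each shard for a given shard key."""
    return Counter(toy_shard_for(doc[key], num_shards) for doc in docs)


# Made-up documents: two genders, 1000 distinct users.
docs = [{"gender": "male" if i % 2 else "female", "user": "u%d" % i} for i in range(1000)]
by_gender = toy_distribution(docs, "gender", 8)  # at most 2 shards receive data
by_user = toy_distribution(docs, "user", 8)      # spread across the shards
```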

## Sharding exercise
Imagine you are designing an email inbox system.  
Would `_id` be a good shard key?  
What is the most frequent query?  
What would your shard key look like?

## Monitoring tools
- mongotop - tracks and reports the current reading and writing activity of a MongoDB instance, providing per-collection visibility into use.
- mongostat - view live MongoDB performance statistics.
- MMS (MongoDB Management Service) - https://cloud.mongodb.com - hosted monitoring; we use it for company-wide statistics

## MongoDB and Python

- <a href="http://api.mongodb.org/python/current/">PyMongo</a> - official driver  
- <a href="http://mongoengine.org/">MongoEngine</a> - Object-Document Mapper (ODM)
- <a href="http://merciless.sourceforge.net/index.html">Ming</a> - Unit of work pattern

## MongoDB - when to use it and when to avoid it
- Would you use a NoSQL database for financial transactions?

## MongoDB common gotchas
- Field names cannot contain dots (i.e. `.`) or null characters, and they must not start with a dollar sign (i.e. `$`)  
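- `$or` uses indexes only if every clause is supported by an index (otherwise it falls back to a collection scan), while `$in` on an indexed field can always use that index
- Type-sensitive queries (querying for an integer when strings are stored finds nothing)
- Field order matters when querying for an exact match on an embedded document or array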
- The total size of an index entry must be less than 1024 bytes
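The field-name rules above matter most when keys come from user input (e.g. a JSON form). This is a minimal sketch of a pre-insert check under stated assumptions: the helper names are hypothetical, it encodes only the three rules listed above, and newer server versions relax some of them.

```python
def field_name_problems(name):
    """Reasons why `name` breaks the field-name rules listed above."""
    problems = []
    if "." in name:
        problems.append("contains '.'")
    if "\x00" in name:
        problems.append("contains a null character")
    if name.startswith("$"):
        problems.append("starts with '$'")
    return problems


def bad_field_names(doc):
    """Collect (field name, problems) for every invalid key in a nested
    document, including documents stored inside lists."""
    bad = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            problems = field_name_problems(key)
            if problems:
                bad.append((key, problems))
            bad.extend(bad_field_names(value))
    elif isinstance(doc, list):
        for item in doc:
            bad.extend(bad_field_names(item))
    return bad
```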


## Further education
- <a href="https://university.mongodb.com/courses/M101P/about">MongoDB for Python developers</a>
- <a href="https://university.mongodb.com/courses/M102/about">MongoDB for DBAs</a>
- <a href="https://university.mongodb.com/courses/M202/about">MongoDB advanced deployment and operations</a>