### Introduction to MongoDB

### Mateusz 

Date: Tuesday, 7 June 2015, day 2  
Start: 13:30  
End: 17:30  

### Agenda:
- 13:30 - 14:00 - Introduction to NoSQL and MongoDB
- 14:00 - 14:30 - Queries and commands mastering
- 14:30 - 14:40 - Short break, please be back on time
- 14:40 - 15:10 - More on MongoDB
- 15:10 - 17:30 - Building a sample application with Python and PyMongo

### Objectives:
- learn fundamentals of MongoDB
- learn how to use most common commands and queries
- know when to use and when to avoid MongoDB
- know how to avoid most common MongoDB gotchas
- learn how to design a schema

### URLs:
- http://docs.mongodb.org/manual/ - MongoDB official documentation
- http://api.mongodb.org/python/current/ - PyMongo driver documentation
- http://www.mongodbspain.com/wp-content/uploads/2014/03/MongoDBSpain-CheetSheet.pdf - mongo cheatsheet
- http://openmymind.net/mongodb.pdf - The Little MongoDB Book
- http://www.slideshare.net/friedo/data-modeling-examples - schema design examples
- http://blog.mongodb.org/post/33700094220/how-mongodbs-journaling-works - how journaling in MongoDB works
- https://www.youtube.com/watch?v=9qmIa_m5Y8w - mongostat and mongotop explained

# Intro
Slides are here:
https://docs.google.com/presentation/d/1Ii_B4FM1WV6QZewgQS-q_HvQyNgDB5rx7rXxolLUi8Y/edit#slide=id.g3610562c7_12


# NoSQL motivation

- Why do we need NoSQL?
- Why is an RDBMS sometimes not enough?


### MongoDB performance vs functionality

<img src="http://www.mongothinking.com/wp-content/uploads/2014/06/Knee-Curve_2.png">

# NoSQL vs SQL
- Document vs Relation
- Available vs ACID
- No transactions vs transactions
- Embedding vs joins

# Why MongoDB?
- Out-of-the-box sharding, which enables horizontal scaling
- Out-of-the-box replication and failover
- Performance (often faster than an RDBMS for simple document reads and writes, depending on the workload)
- Powerful (most of the features of an RDBMS, except for transactions and relations)

# Terminology
- Table -> Collection
- Row -> Document
- Index -> Index
- Join -> Embedded
- FK -> Reference
- Partition -> Shard
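
To make the `Join -> Embedded` and `FK -> Reference` rows concrete, here is a minimal sketch of the two document shapes. The post and comment documents, their field names and the `_id` values are made up for illustration and do not come from the workshop data.

```javascript
// Embedding: the comments live inside the post document, so a single
// read returns everything (this takes the place of a join).
var embeddedPost = {
  _id: 1,
  title: 'Hello MongoDB',
  comments: [
    { author: 'ann', text: 'Nice post' },
    { author: 'bob', text: 'Thanks' }
  ]
};

// Referencing: each comment stores the _id of its post, much like a
// foreign key. The database does not enforce the link.
var referencedPost = { _id: 1, title: 'Hello MongoDB' };
var referencedComments = [
  { _id: 10, post_id: 1, author: 'ann', text: 'Nice post' },
  { _id: 11, post_id: 1, author: 'bob', text: 'Thanks' }
];

// With references the application performs the "join" itself.
var commentsForPost = referencedComments.filter(function (c) {
  return c.post_id === referencedPost._id;
});
```

Embedding tends to suit data that is always read together with its parent, while references tend to suit data that grows without bound or is shared between documents.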

# Schema comparison

# Queries

In [None]:
# Exercise 1 - mongoimport zips database
# Download zips database - http://media.mongodb.org/zips.json
# Import into the zips collection, which the following exercises query
$ mongoimport --db summercamp --collection zips --file zips.json

In [None]:
# Exercise 2 - findOne
# connects by default to test database
$ mongo
# switches to summercamp database
> use summercamp
> db.zips.findOne()

In [None]:
# Exercise 3 - find, filtering fields, sorting, limiting, pretty
> db.zips.find()
> db.zips.find().pretty()
> db.zips.find({city: 'CHICAGO'})
> db.zips.find({city: 'CHICAGO'}, {city: 1})
> db.zips.find({city: 'CHICAGO'}, {city: 1, _id: 0})
> db.zips.find().limit(1).sort({pop: -1})
> db.zips.find().limit(2).skip(1).sort({pop: -1})


In [None]:
# Exercise 4 - find $ne, $gte, $lte, $regex, $in, $exists, $or
> db.zips.find({city: {$ne: 'CHICAGO'}})
> db.zips.find({city: {$in: ['NEW YORK', 'CHICAGO']}})
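# loc is a [longitude, latitude] array; a comparison on an array matches when any element satisfies it
> db.zips.find({loc: {$gte: 40}})
> db.zips.find({loc: {$lte: 40}})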
> db.zips.find({city: {$regex: /CHI/}})
> db.

In [None]:
# Exercise 5 - updates, multi, $set, $unset, remove, insert
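
One possible walk-through for this exercise, written so that it can be pasted into the mongo shell. The `visited` field and the `SUMMERCAMP` document are invented for the example, and the `update`/`remove`/`insert` helpers are the classic shell helpers; newer shells also offer `updateMany`, `deleteMany` and `insertOne`.

```javascript
// Filter and update documents are plain objects.
var chicago = { city: 'CHICAGO' };
var setVisited = { $set: { visited: true } };   // add or overwrite a field
var unsetVisited = { $unset: { visited: '' } }; // remove a field
var newDoc = { _id: '00000', city: 'SUMMERCAMP', pop: 1, state: 'XX' };

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== 'undefined') {
  db.zips.insert(newDoc);

  // Without multi: true only the first matching document is updated.
  db.zips.update(chicago, setVisited);
  db.zips.update(chicago, setVisited, { multi: true });

  // Remove the field again from every matching document.
  db.zips.update(chicago, unsetVisited, { multi: true });

  // Delete the document inserted at the start.
  db.zips.remove({ _id: newDoc._id });
}
```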

In [None]:
# Exercise 6 - aggregations
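
As a starting point, here is a sketch of a pipeline over the zips collection that sums the population per state and keeps the five largest. The output field name `totalPop` is an arbitrary choice.

```javascript
// Each stage receives the documents produced by the previous stage.
var pipeline = [
  // Group by state and add up the pop field of every zip code.
  { $group: { _id: '$state', totalPop: { $sum: '$pop' } } },
  // Largest population first.
  { $sort: { totalPop: -1 } },
  // Keep only the top five states.
  { $limit: 5 }
];

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== 'undefined') {
  db.zips.aggregate(pipeline).forEach(printjson);
}
```

This assumes a shell version in which `aggregate` returns a cursor.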

## Indexing theory

In [None]:
# Exercise 7 - write JS code to insert 500k documents into some collection
# Have 3 fields - name, gender and hairs
# Name should be a random string
# Gender should be randomly chosen from ['male', 'female']
# Hairs should be randomly chosen from ['black', 'brown', 'blond', 'red', 'auburn', 'chestnut', 'white'] 
# Use Math.floor(Math.random() * some_array.length) to generate a random array index
# Use Math.random().toString(36).substring(7); to create a random string
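
One way to approach the exercise is sketched below. The collection name `people`, the helper names and the batch size of 1000 are assumptions, not part of the task; inserting in batches is just one option for keeping the number of round trips down.

```javascript
var GENDERS = ['male', 'female'];
var HAIRS = ['black', 'brown', 'blond', 'red', 'auburn', 'chestnut', 'white'];

// Pick a random element using the index formula from the exercise.
function randomChoice(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Random string using the expression from the exercise.
function randomName() {
  return Math.random().toString(36).substring(7);
}

function makePerson() {
  return {
    name: randomName(),
    gender: randomChoice(GENDERS),
    hairs: randomChoice(HAIRS)
  };
}

// Build an array of documents so one insert call sends many of them.
function makeBatch(size) {
  var batch = [];
  for (var i = 0; i < size; i++) {
    batch.push(makePerson());
  }
  return batch;
}

// `db` only exists inside the mongo shell, so guard the inserts.
if (typeof db !== 'undefined') {
  var TOTAL = 500000;
  var BATCH_SIZE = 1000;
  for (var done = 0; done < TOTAL; done += BATCH_SIZE) {
    db.people.insert(makeBatch(BATCH_SIZE));
  }
  print(db.people.count());
}
```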

In [None]:
# Exercise 7 - indexing (creating, dropping, query explaining), comparing queries without indexes vs with indexes
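
A possible sequence for the comparison, assuming the generated documents were stored in a collection called `people` (a name chosen here, not given by the exercise) and a shell version that supports `explain('executionStats')`.

```javascript
var query = { hairs: 'red' };
var indexSpec = { hairs: 1 }; // ascending single-field index

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== 'undefined') {
  // Without an index, expect a full collection scan (COLLSCAN).
  printjson(db.people.find(query).explain('executionStats'));

  db.people.createIndex(indexSpec);
  printjson(db.people.getIndexes());

  // With the index, expect an index scan (IXSCAN).
  printjson(db.people.find(query).explain('executionStats'));

  // Drop the index again to restore the starting state.
  db.people.dropIndex(indexSpec);
}
```

In the two explain outputs, `executionStats.totalDocsExamined` and `executionStats.executionTimeMillis` are the fields worth comparing.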

## Profiling

In [None]:
# Exercise 8 - profiling
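
A minimal sketch of a profiling session in the mongo shell. The 100 ms threshold is an assumed value; pick one that fits the workload.

```javascript
var SLOW_MS = 100; // assumed slow-operation threshold in milliseconds

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== 'undefined') {
  // Level 1 records operations slower than SLOW_MS,
  // level 2 records every operation, level 0 turns the profiler off.
  db.setProfilingLevel(1, SLOW_MS);
  printjson(db.getProfilingStatus());

  // Profiled operations are stored in the system.profile collection
  // of the current database; show the five most recent ones.
  db.system.profile.find().sort({ ts: -1 }).limit(5).forEach(printjson);

  // Switch profiling off again when done.
  db.setProfilingLevel(0);
}
```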

## Replication
- Backup
- Disaster recovery
- Reporting
- Increased read capacity

In [None]:
# Exercise todo - setup replica set locally
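
A possible local setup with three members, as a sketch only. The replica set name `rs0`, the ports and the data directories are all assumptions.

```javascript
// Assumed preparation: three mongod processes started beforehand, e.g.
//   mongod --replSet rs0 --port 27017 --dbpath /data/rs0-0
//   mongod --replSet rs0 --port 27018 --dbpath /data/rs0-1
//   mongod --replSet rs0 --port 27019 --dbpath /data/rs0-2
var replicaSetConfig = {
  _id: 'rs0', // must match the --replSet name
  members: [
    { _id: 0, host: 'localhost:27017' },
    { _id: 1, host: 'localhost:27018' },
    { _id: 2, host: 'localhost:27019' }
  ]
};

// `rs` is only defined inside the mongo shell; run this while
// connected to one of the members listed above.
if (typeof rs !== 'undefined') {
  printjson(rs.initiate(replicaSetConfig));
  printjson(rs.status());
}
```

After the election finishes, `rs.status()` should list one member as PRIMARY and the others as SECONDARY.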

## Write concerns

- Useful for controlling the trade-off between availability and consistency
- Weakest - w: 0 - when to use?
- Strongest - w: "majority"
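
The two extremes above could look as follows in the mongo shell. The `events` collection, the documents and the 5000 ms timeout are invented for the example.

```javascript
// w: 0 does not wait for any acknowledgement: fast, but errors go unnoticed.
var fireAndForget = { writeConcern: { w: 0 } };

// w: 'majority' waits until a majority of replica set members have the
// write; wtimeout (milliseconds) bounds how long to wait.
var majority = { writeConcern: { w: 'majority', wtimeout: 5000 } };

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== 'undefined') {
  // A lost page-view event is tolerable.
  db.events.insert({ kind: 'page_view' }, fireAndForget);
  // A lost payment record is not.
  db.events.insert({ kind: 'payment' }, majority);
}
```

This hints at an answer to "when to use?": `w: 0` may be acceptable for data that can be lost without harm, such as logs or metrics.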

## Read preferences
- Help take load off the primary
- In the company, the secondary is used for backup and failover
- In theory the primary and the secondary should be the same size, but in practice the secondary is often less powerful (which causes problems when a failover occurs)
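
A short sketch of how reads can be directed to secondaries from the mongo shell, assuming a connection to a replica set. The filter on `state` is only an example.

```javascript
var READ_PREF = 'secondaryPreferred'; // fall back to the primary if no secondary is available

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== 'undefined') {
  // Connection-wide setting for subsequent reads.
  db.getMongo().setReadPref(READ_PREF);
  db.zips.find({ state: 'IL' }).limit(3).forEach(printjson);

  // Per-query setting on a single cursor.
  db.zips.find({ state: 'IL' }).readPref('secondary').limit(3).forEach(printjson);
}
```

Replication is asynchronous, so reads from a secondary can return slightly stale data.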

## Sharding
- CPU/RAM/HDD are limited
- Enable horizontal scaling
- Out of the box feature
- Choosing a shard key is a challenge
- Once you choose a shard key you cannot modify it
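
For orientation, sharding a collection from the shell could look roughly like this. The shard key `{ state: 1, _id: 1 }` is an arbitrary pick for illustration, not a recommendation, and the commands assume a running sharded cluster.

```javascript
var database = 'summercamp';
var namespace = database + '.zips';
var shardKey = { state: 1, _id: 1 }; // assumed shard key

// `sh` is only defined inside the mongo shell, and these calls only
// succeed when the shell is connected to a mongos router.
if (typeof sh !== 'undefined') {
  sh.enableSharding(database);

  // A non-empty collection needs an index that starts with the shard key.
  db.zips.createIndex(shardKey);

  sh.shardCollection(namespace, shardKey);
  sh.status();
}
```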

## Monitoring tools
- mongotop - tracks and reports the current reading and writing activity of a MongoDB instance, providing per-collection visibility into use.
- mongostat - view live MongoDB performance statistics.
- MMS - https://cloud.mongodb.com - company statistics

## MongoDB and Python

- <a href="http://api.mongodb.org/python/current/">PyMongo</a> - official driver  
- <a href="http://mongoengine.org/">Mongoengine</a> - ODM pattern
- <a href="http://merciless.sourceforge.net/index.html">Ming</a> - Unit of work pattern

## MongoDB - when to use and when to avoid
- Would you use a NoSQL database for financial activities?

## MongoDB common gotchas
- Field names cannot contain dots (i.e. `.`) or null characters, and they must not start with a dollar sign (i.e. `$`)  
- An `$or` query may not use an index in cases where an equivalent `$in` query does
- Type-sensitive queries
- The order of fields matters when querying for an embedded document
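
The last two gotchas can be sketched with plain filter objects. The `address` sub-document is hypothetical (the zips collection has no such field), and the zip code value is only an example.

```javascript
// Type-sensitive: the string '60601' and the number 60601 are different
// values, so these two filters do not match the same documents.
var byStringId = { _id: '60601' };
var byNumberId = { _id: 60601 };

// Field order: an exact match on an embedded document compares the
// fields in order, so these two filters are not interchangeable.
var cityFirst = { address: { city: 'CHICAGO', state: 'IL' } };
var stateFirst = { address: { state: 'IL', city: 'CHICAGO' } };

// Dot notation matches individual fields regardless of their order.
var orderIndependent = { 'address.city': 'CHICAGO', 'address.state': 'IL' };

// JavaScript keeps the insertion order of these keys, which is why the
// two filters are sent to the server as different documents.
var serializeTheSame = JSON.stringify(cityFirst) === JSON.stringify(stateFirst);

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== 'undefined') {
  // In zips.json the _id values are strings, so the counts can differ.
  print(db.zips.find(byStringId).count());
  print(db.zips.find(byNumberId).count());
}
```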
