In [1]:
# code for loading the format for the notebook
import os

# path: store the current working directory so we can change back to it later
path = os.getcwd()
os.chdir('../notebook_format')
from formats import load_style
load_style()

In [2]:
os.chdir(path)
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = 8, 6 # change default figure size

# 1. magic to print version
# 2. magic so that the notebook will reload external python modules
%load_ext watermark
%load_ext autoreload 
%autoreload 2

# python -m pip install pymongo
import json
import pymongo
%watermark -a 'Ethen' -d -t -v -p numpy,pandas,pymongo,matplotlib

Ethen 2016-08-01 19:55:13 

CPython 3.5.2
IPython 4.2.0

numpy 1.11.1
pandas 0.18.1
pymongo 3.3.0
matplotlib 1.5.1


## QuickStart on PyMongo

For installing MongoDB on a Mac, refer to this [MongoDB installation gist](https://gist.github.com/adamgibbons/cc7b263ab3d52924d83b). After installing it, we can type `mongod` from the command line to start a MongoDB instance. To kill the instance, we can run `killall mongod`.

Further reading:

- https://api.mongodb.com/python/current/tutorial.html
- http://altons.github.io/python/2013/01/21/gentle-introduction-to-mongodb-using-pymongo/
- http://blog.mycodesite.com/mongodb-basics-and-tips/
- http://blog.pythonisito.com/2012/01/moving-along-with-pymongo.html

In [10]:
# connect to MongoDB
conn = pymongo.MongoClient()
print(conn)

# get a database called Library and a collection called Books inside it;
# MongoDB only creates them once the first document is inserted
db = conn['Library']
collection = db['Books']

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)


Unlike relational databases, where we need to create the schema up front, in MongoDB databases and collections are created dynamically, the first time we insert data into them! MongoDB stores data as JSON-like documents (serialized in a binary format called BSON) with dynamic schemas, rather than predefined schemas. **Observations (data) are called documents and these documents are stored in collections**.

Compared to relational databases, we could say collections are like tables, and documents are like records. But there is one big difference: every record in a table has the same fields (with, usually, differing values) in the same order, while each document in a collection can have completely different fields from the other documents.
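To make the table analogy concrete, here is a small stdlib-only sketch. The documents below are made up for illustration: they could all live in one collection, whereas a relational table would need the union of all their fields as columns, padding the missing ones with `NULL` (`None` here).

```python
# Hypothetical documents that could all be stored in the same collection;
# each one carries its own set of fields.
books = [
    {"title": "MongoDB book", "year": 2014, "authors": ["Author 1", "Author 2"]},
    {"title": "Untitled draft"},                           # no year, no authors
    {"title": "Magazine", "issue": 42, "monthly": True},   # extra fields
]

# A table would need every field that appears in any document as a column ...
columns = sorted({field for doc in books for field in doc})

# ... and each row would be padded with None wherever a field is missing.
rows = [[doc.get(col) for col in columns] for doc in books]

print(columns)  # ['authors', 'issue', 'monthly', 'title', 'year']
for row in rows:
    print(row)
```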

When using `pymongo`, documents are basically Python dictionaries that have strings as their keys and values of various types (int, float, str, datetime, as well as nested lists and dicts).
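One way in which BSON is only "JSON-like": BSON has a native date type, so PyMongo can store a Python `datetime` as a date, while plain JSON has no date type at all. The sketch below uses only the standard `json` module on a made-up document (the `publisher` and `added` fields are hypothetical) to show that a nested dict and list serialize fine, but the `datetime` has to be turned into something else, here a string.

```python
import json
from datetime import datetime

# A document is just a dict: nested lists/dicts and datetimes are all allowed.
doc = {
    "title": "MongoDB book",
    "authors": ["Author 1", "Author 2"],
    "publisher": {"name": "Some Press", "city": "Somewhere"},  # hypothetical field
    "added": datetime(2016, 8, 1, 12, 0),                      # hypothetical field
}

# Plain JSON has no date type, so the standard json module rejects the datetime ...
plain_json_ok = True
try:
    json.dumps(doc)
except TypeError as exc:
    plain_json_ok = False
    print("json.dumps failed:", exc)

# ... and we would have to fall back to, e.g., its string representation.
serialized = json.dumps(doc, default=str)
print(serialized)
```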

To insert a document into MongoDB, all we need to do is create a dictionary and call `.insert_one()` on the collection object:

In [11]:
collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'Library'), 'Books')

In [12]:
book_dict = {
    # '_id': '978-3-16-148410-0',  # optionally supply our own unique key
    'title': 'MongoDB book',
    'year': 2014,
    'authors': ['Author 1', 'Author 2'],
    'sells': 0
}

book = collection.insert_one(book_dict)
book.inserted_id

ObjectId('579f39b240187910aa1d0853')

When a document is inserted, a special key, `_id`, is automatically added if the document doesn't already contain an `_id` key. The value of `_id` must be unique across the collection.
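The automatically generated `_id` is an `ObjectId`: a 12-byte value (24 hex digits) whose first 4 bytes are a big-endian Unix timestamp, in seconds, of when the id was created. As a stdlib-only sketch, we can decode the timestamp from the id printed above; `objectid_timestamp` is a helper written for this illustration (PyMongo's own `ObjectId.generation_time` property exposes the same information).

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Return the creation time encoded in a 24-hex-digit ObjectId string.

    Hypothetical helper for illustration only.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex digits")
    # The first 4 bytes (8 hex digits) are seconds since the Unix epoch, big-endian.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp('579f39b240187910aa1d0853')
print(created)  # 2016-08-01 11:59:46+00:00
```

Because the timestamp comes first, ObjectIds generated later generally sort after earlier ones, which is why sorting on `_id` roughly follows insertion order.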

When inserting documents, we don't need to save the return value of `collection.insert_one(book_dict)` (an `InsertOneResult` object) to a variable; we did so here only to show the inserted id.

In [13]:
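# we can check that our database and collection do in fact exist
# (newer PyMongo versions provide list_database_names() and
# list_collection_names() instead; the two methods below were removed in 4.0)
print(conn.database_names())
print(db.collection_names())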

# or we can remove the collection that we've created
# db['Books'].drop()

['Library', 'Yelp', 'local']
['Books']


There are multiple ways to query documents from a collection. Simply calling `find()` without any arguments returns a *cursor* over all the documents in the collection, which we can then iterate over.
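`find()` also accepts a filter document, e.g. `collection.find({'year': 2014})` returns only the documents whose `year` is 2014. To illustrate the idea without a running server, here is a toy in-memory sketch of the equality-only case; `toy_find` is a hypothetical helper, not part of PyMongo, and real MongoDB queries also support operators such as `$gt` and matching values inside arrays.

```python
def toy_find(documents, query=None):
    """Yield the documents that match every key/value pair in `query`.

    A toy model of equality filters only (hypothetical, not PyMongo's API).
    """
    query = query or {}
    for doc in documents:
        if all(key in doc and doc[key] == value for key, value in query.items()):
            yield doc


sample_books = [
    {"title": "MongoDB book", "year": 2014},
    {"title": "Another book", "year": 2016},
]

# an empty filter matches everything, just like find() with no arguments
print([b["title"] for b in toy_find(sample_books)])                 # ['MongoDB book', 'Another book']
print([b["title"] for b in toy_find(sample_books, {"year": 2014})])  # ['MongoDB book']
```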

In [14]:
# note that the _id matches the id printed above
cursor = collection.find()
for c in cursor:
    print(c)

{'_id': ObjectId('579f39b240187910aa1d0853'), 'year': 2014, 'authors': ['Author 1', 'Author 2'], 'title': 'MongoDB book', 'sells': 0}


In [15]:
# in order to loop through the cursor again
# we need to call .rewind()
cursor.rewind()
for c in cursor:
    print(c)

{'_id': ObjectId('579f39b240187910aa1d0853'), 'year': 2014, 'authors': ['Author 1', 'Author 2'], 'title': 'MongoDB book', 'sells': 0}
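The reason `.rewind()` is needed is that a cursor behaves like a regular Python iterator: once it has been consumed, looping over it again yields nothing. The stdlib sketch below mimics this with a generator (`toy_cursor` is a hypothetical stand-in, not a PyMongo cursor). For small result sets, an alternative is to materialize the results once with `list(collection.find())` and reuse the list.

```python
def toy_cursor(documents):
    # A generator, like a cursor, can only be iterated over once.
    for doc in documents:
        yield doc


docs = [{"title": "MongoDB book"}]
cursor_like = toy_cursor(docs)

first_pass = list(cursor_like)
second_pass = list(cursor_like)  # already exhausted: nothing left to yield
print(len(first_pass), len(second_pass))  # 1 0
```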
