<DIV ALIGN=CENTER>

# Introduction to MongoDB
## Professor Robert J. Brunner
  
</DIV>  
-----
-----

## Introduction

Previously in this course, we have discussed .

In this IPython Notebook, we explore using MongoDB, a document style database, from Python.

https://en.wikipedia.org/wiki/MongoDB


-----

## Python with MongoDB

To use Python to interact with MongoDB, we need to use a suitable Python
library. The recommended Python library is [_pymongo_][pymdb], which
provides support for establishing a connection between a Python program
and a MongoDB server as well as support tools for working with MongoDB. 

We have already installed _pymongo_ in the course Docker container;
however, you can easily install it yourself by using `pip`. For example, to
install _pymongo_ for use with Python3 for the current user, we can
execute:

```console
pip3 install pymongo --user
```

Once this library is installed, we can import the MongoDB client to
establish a connection and retrieve data and MongoDB information.

```python
from pymongo import MongoClient
```

-----

[pymdb]: http://api.mongodb.org/python/current/

In [1]:
from pymongo import MongoClient

## Local MongoDB Server

To use a local MongoDB server, for instance, a MongoDB server running
inside our course Docker container, we need to first start the server.
To do this, open a terminal window inside the Docker container, most
easily done using the _New_ menu on the JupyterHub Server homepage,
followed by _Terminal_.

![New Terminal](images/new-term.png)

Inside this new terminal window, start up the MongoDB server by issuing
the following command:

```console
mongod --nojournal
```

This will start the mongo database daemon with no journaling (since we
are not worried about crash safety). This will produce a list of
messages such as the following in your terminal window.

![New MongoDB local server](images/new-mongod.png)

At this point the local server is ready to start accepting connections.
To open a connection to the localhost using pymongo, we establish a new
MongoDB client:


```python
client = MongoClient()
```

which assumes a local server listening on the default port. Alternatively,
we can explicitly list the hostname and port. This form is preferred since
the server and port number are easy to recognize, and they can be easily
changed when we move to a remote MongoDB server.

```python
client = MongoClient("mongodb://localhost:27017")
```

which connects to the local MongoDB daemon using the default local host
name and port.

-----

## Remote MongoDB Server

To connect to a remote MongoDB server, for instance by using the course
cluster system, we simply need the IP address for the server and the
port number on which the MongoDB daemon is listening. For this course,
Notebooks running on the course JupyterHub Server can access a MongoDB
server on `10.0.3.126` and the default port number of `27017`:


```python
client = MongoClient("mongodb://10.0.3.126:27017")
```
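
As a minimal sketch of how the local and remote connection strings relate, the helper below builds a `mongodb://host:port` string from its parts. The function name `mongo_uri` is hypothetical (it is not part of _pymongo_); the host and port values are the ones quoted in the text above.

```python
def mongo_uri(host="localhost", port=27017):
    """Build a connection string of the form mongodb://host:port.

    Hypothetical helper for illustration; not part of pymongo.
    """
    return "mongodb://{}:{}".format(host, port)


local_uri = mongo_uri()                    # local Docker server
remote_uri = mongo_uri(host="10.0.3.126")  # remote course server

print(local_uri)
print(remote_uri)
```

Either string can then be passed to `MongoClient`, so switching between the local and remote server only requires changing the host argument.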

-----

In [2]:
# Establish a connection to MongoDB (uncomment only one of these lines)

# For remote course server use
#client = MongoClient("mongodb://10.0.3.126:27017")

# For local Docker server use
client = MongoClient("mongodb://localhost:27017")

-----
## MongoDB Database

MongoDB provides storage for collections of documents. To manage a set
of related collections, MongoDB uses the concept of a database. Thus a
MongoDB database is similar to a standard relational database, which
contains a collection of tables.

In the next few sections, we explore the _pymongo_ library in a manner
similar to the official [_pymongo_ tutorial][pymt]. In this
Notebook we use dictionary style access to acquire a database,
collection, or document. There is also an attribute style method to
access these items, but dictionary style is preferred since it reinforces
the concept that MongoDB is a document style database and that Python
dictionaries are used to create document schema. In addition, the
dictionary style enables names to be used that might not be legal Python
names, such as `test-database`.
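
The short sketch below illustrates the last point using only the Python standard library. It checks which names are legal Python identifiers, since only those could be written after a dot in attribute style access. The function name `access_styles` is hypothetical and used here only for illustration.

```python
def access_styles(name):
    """Report which access styles a database or collection name allows."""
    # str.isidentifier() is True only for names that are legal Python
    # identifiers, which is what attribute style access requires.
    if name.isidentifier():
        return 'attribute or dictionary'
    return 'dictionary only'


for name in ['test-database', 'test_collection', 'students']:
    print(name, '->', access_styles(name))
```

A name such as `test-database` would be parsed by Python as a subtraction if written in attribute style, which is why only the dictionary style works for it.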

-----
[pymt]: http://api.mongodb.org/python/current/tutorial.html

In [32]:
# We will delete our working database if it exists before recreating it.

dbname = 'test-database'
if dbname in client.database_names():
    client.drop_database(dbname)
    
print('Existing databases:', client.database_names())

Existing databases: ['local', 'admin']


In [33]:
db = client['test-database']
print('Existing databases:', client.database_names())

Existing databases: ['local', 'admin']


----

Databases with no collections, or with only empty collections, will not show up in the output of `database_names()`. The same holds when we try to list empty collections in a database.

An important note about collections (and databases) in MongoDB is that they are created lazily: none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.
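
To make the idea of lazy creation concrete, here is a toy in-memory model written in plain Python. It is not _pymongo_ code and does not contact a server; the classes `ToyDatabase` and `ToyCollection` are hypothetical stand-ins that only mimic the behavior described above, where a collection becomes visible once it holds a document.

```python
class ToyCollection:
    """In-memory stand-in for a collection (illustration only)."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        # Store a copy so later changes to the caller's dict are not shared.
        self.documents.append(dict(document))


class ToyDatabase:
    """In-memory stand-in for a database with lazily created collections."""

    def __init__(self):
        self._collections = {}

    def __getitem__(self, name):
        # Acquiring a handle does not make the collection visible.
        return self._collections.setdefault(name, ToyCollection())

    def collection_names(self):
        # Only collections holding at least one document are listed.
        return [name for name, coll in self._collections.items()
                if coll.documents]


toy_db = ToyDatabase()
toy_students = toy_db['students']
print(toy_db.collection_names())   # prints []

toy_students.insert_one({'fname': 'Jane'})
print(toy_db.collection_names())   # prints ['students']
```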


Next, we acquire a collection. Like the database, it is not actually created until a document is inserted into it.

-----

In [34]:
collection = db['test_collection']

print('Existing databases:', client.database_names())
print('Existing collections:', db.collection_names())

Existing databases: ['local', 'admin']
Existing collections: []


-----

## Adding data

We now insert data into a collection, starting with a simple document before adding documents with more complex structure.

-----

In [35]:
student = {'fname': 'Jane',
           'lname': 'Doe',
           'company': 'bdg surf shop'}

students = db['students']

jane_id = students.insert_one(student).inserted_id
print("New Student ID: ", jane_id)

New Student ID:  560ac053107efe0053ec7e56


In [36]:
print('Existing databases:', client.database_names())
print('Existing collections:', db.collection_names())

Existing databases: ['test-database', 'local', 'admin']
Existing collections: ['students', 'system.indexes']


-----

Next, we insert two new documents that each have a different schema: the first adds a list of numbers, and the second adds a date.

-----

In [37]:
student = {'fname': 'John',
           'lname': 'Doe',
           'company': 'bdg surf shop',
           'lucky_numbers': [2, 5, 9, 13, 27]}

john_id = students.insert_one(student).inserted_id
print("New Student ID: ", john_id)

New Student ID:  560ac058107efe0053ec7e57


In [38]:
import datetime

student = {'fname': 'Pat',
           'lname': 'Doe',
           'company': 'bdg surf shop',
           # utcnow() is deprecated in recent Python versions; use an aware UTC time.
           'hire_date': datetime.datetime.now(datetime.timezone.utc)}

pat_id = students.insert_one(student).inserted_id
print("New Student ID: ", pat_id)

New Student ID:  560ac05c107efe0053ec7e58


In [50]:
print("Number of students = ", students.count())

Number of students =  5


-----

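We can retrieve a single document with `find_one()`, retrieve one specific document by passing a filter such as its `_id`, or iterate through all documents with `find()`. Note that the count above and the full listing below also include the two documents added by the `insert_many` cell later in this Notebook since, as the cell numbers show, these cells were executed after it.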

-----

In [39]:
students.find_one()

{'_id': ObjectId('560ac053107efe0053ec7e56'),
 'company': 'bdg surf shop',
 'fname': 'Jane',
 'lname': 'Doe'}

In [41]:
students.find_one({"_id": pat_id})

{'_id': ObjectId('560ac05c107efe0053ec7e58'),
 'company': 'bdg surf shop',
 'fname': 'Pat',
 'hire_date': datetime.datetime(2015, 9, 29, 16, 46, 20, 639000),
 'lname': 'Doe'}

In [51]:
for student in students.find():
    print(student)

{'_id': ObjectId('560ac053107efe0053ec7e56'), 'fname': 'Jane', 'lname': 'Doe', 'company': 'bdg surf shop'}
{'_id': ObjectId('560ac058107efe0053ec7e57'), 'fname': 'John', 'lucky_numbers': [2, 5, 9, 13, 27], 'lname': 'Doe', 'company': 'bdg surf shop'}
{'_id': ObjectId('560ac05c107efe0053ec7e58'), 'fname': 'Pat', 'hire_date': datetime.datetime(2015, 9, 29, 16, 46, 20, 639000), 'lname': 'Doe', 'company': 'bdg surf shop'}
{'_id': ObjectId('560ac256107efe0053ec7e59'), 'fname': 'Mike', 'lname': 'Simone', 'products': [{'name': 'eyeware', 'id': 1}, {'name': 'hat', 'id': 2}], 'company': 'Del Ray Enterprises'}
{'_id': ObjectId('560ac256107efe0053ec7e5a'), 'fname': 'Clair', 'lname': 'Hwu', 'comment': 'Great supplier, fast, fair, and courteous.', 'company': 'Hoboken Surfware Incorporated'}


-----

Next, we insert several new documents at once by using `insert_many`, and obtain the updated document count.

-----

In [45]:
new_students = [
    {'fname': 'Mike',
     'lname': 'Simone',
     'company': 'Del Ray Enterprises',
     'products': [{'id': 1, 'name': 'eyeware'}, {'id': 2, 'name': 'hat'}]},
    {'fname': 'Clair',
     'lname': 'Hwu',
     'company': 'Hoboken Surfware Incorporated',
     'comment': 'Great supplier, fast, fair, and courteous.'}]

result = students.insert_many(new_students)

print(result.inserted_ids)

[ObjectId('560ac256107efe0053ec7e59'), ObjectId('560ac256107efe0053ec7e5a')]


In [52]:
print("Number of students = ", students.count())

Number of students =  5


-----

We can also find only those documents that match a filter, and count them.

-----

In [47]:
for student in students.find({"lname": "Hwu"}):
    print(student)

{'_id': ObjectId('560ac256107efe0053ec7e5a'), 'fname': 'Clair', 'lname': 'Hwu', 'comment': 'Great supplier, fast, fair, and courteous.', 'company': 'Hoboken Surfware Incorporated'}


In [55]:
print("Number of students = ", students.find({"lname": "Doe"}).count())

Number of students =  3
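
The way a simple filter such as `{"lname": "Doe"}` selects documents can be modeled in plain Python. The sketch below is a simplified illustration, not how MongoDB or _pymongo_ implements matching: the hypothetical function `matches` handles only equality on top-level fields, and the sample list repeats just the names from the documents inserted above.

```python
def matches(document, query):
    """Return True if every field in query equals that field in document."""
    return all(document.get(field) == value
               for field, value in query.items())


# Names taken from the documents inserted above (other fields omitted).
sample = [
    {'fname': 'Jane', 'lname': 'Doe'},
    {'fname': 'John', 'lname': 'Doe'},
    {'fname': 'Pat', 'lname': 'Doe'},
    {'fname': 'Mike', 'lname': 'Simone'},
    {'fname': 'Clair', 'lname': 'Hwu'},
]

does = [doc for doc in sample if matches(doc, {'lname': 'Doe'})]
print("Number of students = ", len(does))
```

In this model an empty filter `{}` matches every document, which mirrors calling `find()` with no arguments.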


-----

Since each document is returned as a Python dictionary, we can extract specific elements by key.

-----

In [49]:
for student in students.find():
    print(student['fname'], student['lname'])

Jane Doe
John Doe
Pat Doe
Mike Simone
Clair Hwu
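
As an alternative sketch, the server can be asked to return only the needed fields by passing a projection as the second argument to `find()`. The function name `full_names` is hypothetical, and the code assumes every document has both `fname` and `lname`, as is the case for the documents in this Notebook.

```python
def full_names(students):
    """Return 'fname lname' strings, fetching only those two fields."""
    # In a projection, 1 includes a field and 0 excludes it; _id is
    # returned by default unless it is explicitly excluded.
    projection = {'_id': 0, 'fname': 1, 'lname': 1}
    return [doc['fname'] + ' ' + doc['lname']
            for doc in students.find({}, projection)]
```

Calling `full_names(students)` with the collection used above should give the same names as the loop, while transferring less data for documents with many fields.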


----


## Querying

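MongoDB supports a full range of query operators, which are written as nested dictionaries inside the filter document. The example below uses the `$eq` operator and sorts the matching documents by first name.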

http://www.mongodb.org/display/DOCS/Advanced+Queries


In [56]:
for student in students.find({"lname": {'$eq': 'Doe'}}).sort('fname'):
    print(student)

{'_id': ObjectId('560ac053107efe0053ec7e56'), 'fname': 'Jane', 'lname': 'Doe', 'company': 'bdg surf shop'}
{'_id': ObjectId('560ac058107efe0053ec7e57'), 'fname': 'John', 'lucky_numbers': [2, 5, 9, 13, 27], 'lname': 'Doe', 'company': 'bdg surf shop'}
{'_id': ObjectId('560ac05c107efe0053ec7e58'), 'fname': 'Pat', 'hire_date': datetime.datetime(2015, 9, 29, 16, 46, 20, 639000), 'lname': 'Doe', 'company': 'bdg surf shop'}
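
Other operators follow the same nested dictionary pattern. As a hedged sketch, the hypothetical function below uses the `$in` operator to match any of several last names and sorts the result by first name; it expects a collection such as `students` to be passed in.

```python
def find_by_last_names(students, last_names):
    """Return documents whose lname is in last_names, sorted by fname."""
    # $in matches when the field equals any value in the given list.
    query = {'lname': {'$in': list(last_names)}}
    return list(students.find(query).sort('fname'))
```

For example, `find_by_last_names(students, ['Doe', 'Hwu'])` would be expected to return the three Doe documents together with the Hwu document, ordered by `fname`.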


-----
## Breakout Session

During this breakout, you should work with the previous MongoDB examples
in order to better learn how MongoDB works, and how it is different from
pure relational databases. Specific problems you can attempt
include the following:

1. 

2. 

3.

Additional, more advanced problems:

1. Read in Airline data (100k rows) and store in collection?

-----



### Return to the [Week Three](index.ipynb) index.

-----