# MongoDB from Python
This notebook introduces how we communicate with a MongoDB database server from Python. A package called PyMongo provides the necessary functionality.

First, we need to install two packages into our virtual environment:  `pymongo` and `dnspython`.  The `dnspython` package is needed by `pymongo` to resolve the `mongodb+srv://` connection strings used by MongoDB Atlas.  Recent versions of `pymongo` install it automatically, but older versions do not, so installing it explicitly does no harm.  With our virtual environment activated, enter the following at the command line:

```
pip install pymongo
pip install dnspython
```

Next, we must connect to a running instance of MongoDB. A MongoDB database server can be running in many different places.  It could be on your machine (`localhost` or `127.0.0.1`), or perhaps a machine in the cloud (`vcm-0000.vm.duke.edu`), or a MongoDB Cloud server. This notebook demonstrates connecting to the MongoDB Atlas cloud database server.  The following instructions assume that you have already set up and configured your own instance of a MongoDB Atlas cluster.

## Creating a Connection to MongoDB
In our code, we are going to create a "client" variable that we will use to access the MongoDB server.  First, we need to import the `MongoClient` class from the `pymongo` package.

In [None]:
from pymongo import MongoClient

Next, we create a `MongoClient` instance as follows:

In [None]:
uri = "<connection_string>"
client = MongoClient(uri)

In the above code, replace `<connection_string>` with the connection string you obtained from the MongoDB Atlas online interface.  Remember to replace `<db_username>` and `<db_password>` in that string with the values you created when setting up the database access account.
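
Usernames and passwords that contain characters such as `@`, `:`, or `/` need to be percent-encoded before they are placed in a connection string.  The sketch below shows one possible way to assemble the string with the Python standard library.  The function name `build_atlas_uri`, the credentials, and the host `cluster0.example.mongodb.net` are placeholders made up for illustration; use the values from your own Atlas cluster.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host):
    """Return a mongodb+srv connection string with escaped credentials."""
    # quote_plus percent-encodes reserved characters such as '@' and '/'
    safe_user = quote_plus(username)
    safe_password = quote_plus(password)
    return "mongodb+srv://{}:{}@{}/".format(safe_user, safe_password, host)


# Hypothetical credentials and host, for illustration only
example_uri = build_atlas_uri("db_user", "p@ss/word",
                              "cluster0.example.mongodb.net")
print(example_uri)
```

The resulting string could then be passed to `MongoClient` in place of a hand-edited connection string.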

You can test the connection as follows:

In [None]:
client.admin.command({'ping': 1})

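If the connection is successful, the above command returns a response and the program continues.  If the server cannot be reached, PyMongo raises a `ServerSelectionTimeoutError` after a timeout (30 seconds by default).  If the server is reached but rejects the username or password, an `OperationFailure` error is raised instead.  Either error will halt the program and provide some information about what went wrong.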

While not necessary for this class, you can explicitly require that communications with the MongoDB server be encrypted with the TLS protocol.  To do so, add `tls=True` to the creation of the `MongoClient` as shown here:  `client = MongoClient(uri, tls=True)`.  Connection strings that begin with `mongodb+srv://`, such as those provided by Atlas, already enable TLS by default.

## MongoDB Organization:  Databases, Collections, and Documents
Within the MongoDB cluster you created in Atlas, you can have multiple databases.  Within each database, you can have multiple collections.  A collection is a set of documents.

Let's start at the bottom.  A __document__ is the basic entry in the database and generally is meant to refer to a specific item:  a single patient, a blog entry, a piece of equipment, etc.  You will generally have many of these specific items that you want to keep track of.  And, each specific item will have its own document.

These documents of the same type are grouped in __collections__.  So, all patients would be in the Patient collection, all blog posts in the Blog collection, or all equipment in the Equipment collection, etc.  This way, when you are looking for a certain item, you can go to a particular collection and look for it there.

Finally, groups of collections can be put together into a __database__.  These collections in a database are generally related to each other in some way.

The exact organization is up to the database/software designer.

## Access a Database
Generally, you will have a single database for each project you are working on.  We define a variable to point to the database of interest by using the `client` variable as follows:

In [None]:
database = client["class_demo"]

If the given database name does not exist in MongoDB, it will be created when the first document is added to it.

## Access a Collection
To access a collection within a database, we define a variable to point to that collection using the `database` variable as follows:

In [None]:
collection = database["user"]

Again, if the collection does not already exist in MongoDB, it will be created when the first document is added.
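
One way to see this lazy creation in action is to ask the server which collections currently exist.  The sketch below assumes a `database` object like the one defined above; the helper name `collection_exists` is made up for illustration.

```python
def collection_exists(database, name):
    """Return True if a collection called `name` exists on the server."""
    # list_collection_names() asks the server for the names of the
    # collections that currently exist in this database
    return name in database.list_collection_names()
```

If the `user` collection has never been written to, `collection_exists(database, "user")` should return `False` before the first insert and `True` afterwards.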

## Database Usage:  CRUD

### Create
To create an entry in our database, we first make a connection to the collection in which we want to add a new document following the steps above.  Then, we need to define the document.  The contents of a document are generally defined as key:value pairs, or a dictionary in Python terminology.  For example:

In [None]:
user_document = {"email": "suyash@suyashkumar.com", "first_name": "Suyash", "last_name": "Kumar", "age": 1000}

Then, we can add this document to our collection as follows:

In [None]:
collection.insert_one(user_document)

Note that the command above returned a result that included an object id.  If you go to MongoDB, you will see that the document has been added and has been assigned an `_id` key with that object id.  MongoDB, like any database, needs to identify each document uniquely.  In MongoDB, this unique id is stored in the `_id` key.  We will talk about that a little more below.
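
The object id can also be read from the result in code rather than from the printed output.  Here is a small sketch; the function name `insert_and_get_id` is hypothetical, while `inserted_id` is the attribute of the result returned by `insert_one`.

```python
def insert_and_get_id(collection, document):
    """Insert `document` and return the `_id` that identifies it."""
    result = collection.insert_one(document)
    # inserted_id holds the `_id` of the new document; when the document
    # had no `_id` key, this is the ObjectId generated for it
    return result.inserted_id
```

Be aware that `insert_one` also adds the `_id` key to the dictionary you pass in, so inserting the same dictionary a second time will fail because that `_id` is already in use.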

Let's add a couple more documents to our database.  Remember that in a non-relational database like MongoDB, each document can have its own set of key:value pairs.  So, it is up to us as programmers to be consistent and use the same set of keys when we are compiling similar documents in a collection.  Here is where a function to create a database entry might be helpful.


In [None]:
def create_user_entry(collection, email, first_name, last_name, age):
    user_document = {"email": email, "first_name": first_name, "last_name": last_name, "age": age}
    result = collection.insert_one(user_document)
    return result

create_user_entry(collection, "mark@test.com", "Mark", "Palmeri", 2000)
create_user_entry(collection, "bob@test.com", "Bob", "Smith", 2000)
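
When several documents need to be added at once, the collection's `insert_many` method sends them in a single call.  The sketch below follows the same pattern as `create_user_entry`; the function name `create_user_entries` and the tuple layout of `users` are assumptions made for this example.

```python
def create_user_entries(collection, users):
    """Insert several users at once.

    `users` is a non-empty list of (email, first_name, last_name, age)
    tuples."""
    documents = [
        {"email": email, "first_name": first_name,
         "last_name": last_name, "age": age}
        for email, first_name, last_name, age in users
    ]
    result = collection.insert_many(documents)
    # inserted_ids lists the `_id` of each new document, in order
    return result.inserted_ids
```

For example, `create_user_entries(collection, [("ann@test.com", "Ann", "Lee", 30)])` would add one more user with the same keys as the others.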


### Retrieve
We can retrieve data from the database by using the `.find` method of the collection variable.  The `find` method does not return the actual documents, but a "cursor" to the set of found documents.  You then need to navigate through the documents using this cursor.  The easiest way of doing that is with a `for` loop.

In [None]:
all_users_cursor = collection.find()
for user in all_users_cursor:
    print(user)

Or, you can move the cursor manually through the found documents using the `.next()` method of the find results:

In [None]:
all_users_cursor = collection.find()
print(all_users_cursor.next())
print(all_users_cursor.next())

With this approach, you need to be careful because `.next()` raises a `StopIteration` error once you run out of entries.

In [None]:
print(all_users_cursor.next())
print(all_users_cursor.next())
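
One way to avoid that error is Python's built-in `next()` function, which accepts a default value to return when there is nothing left.  This is a minimal sketch; the helper name `next_or_none` is made up for illustration.

```python
def next_or_none(cursor):
    """Return the next document from `cursor`, or None when it is
    exhausted."""
    # The built-in next() returns the given default instead of raising
    # StopIteration when the iterator has no more items
    return next(cursor, None)
```

A loop can then call `next_or_none(all_users_cursor)` repeatedly and stop as soon as it receives `None`.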

The reason `pymongo` returns a cursor to the documents instead of the documents themselves is that it is not uncommon for database searches to return thousands of documents or more.  That number of documents may overwhelm the memory available to your program.  Remember, one advantage of using an external database is to save on memory usage with large databases.  So, the cursor concept allows you to search for and find many documents, but only have access to a few of them at any one time to save on memory.

However, if you do want all of the documents at once, you can convert the cursor into an actual list of results.  Just be careful that your list isn't too long.

In [None]:
all_users_cursor = collection.find()
all_users = list(all_users_cursor)
print(len(all_users))
print(all_users)
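
A middle ground between one document at a time and the whole list is to work through the cursor in fixed-size chunks.  The sketch below uses only the standard library and works with any iterable, including a cursor; the function name `iter_chunks` is hypothetical.

```python
from itertools import islice


def iter_chunks(cursor, chunk_size):
    """Yield lists of at most `chunk_size` documents from `cursor`."""
    iterator = iter(cursor)
    while True:
        # islice pulls up to chunk_size items without reading the rest
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk
```

For example, `for chunk in iter_chunks(collection.find(), 100):` would hand your code at most 100 documents per loop iteration.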

#### Query Filter
When making a find request, you can specify a query to return only certain documents that match a certain set of criteria.  For example, let's find those users whose age is 2000.

In [None]:
users_cursor = collection.find({"age": 2000})
for user in users_cursor:
    print(user)

If we want to look at a range of possible results, we use comparisons.  Details on Comparison Query Operators in MongoDB can be found at <https://www.mongodb.com/docs/manual/reference/operator/query-comparison/>.  Below is example syntax of a greater than or equal query.  Also, note in this example that the query is done directly in the `for` loop definition.

In [None]:
for user in collection.find({"age": {"$gte": 1000}}):
    print(user)
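
Several comparison operators can be combined on the same key to describe a bounded range.  The sketch below builds such a query dictionary using `$gte` and `$lt`; the function name `age_range_query` is made up for illustration.

```python
def age_range_query(min_age, max_age):
    """Build a filter that matches min_age <= age < max_age."""
    return {"age": {"$gte": min_age, "$lt": max_age}}


query = age_range_query(1000, 2000)
print(query)
```

Passing this `query` to `collection.find()` with the three users inserted above should match only the user whose age is 1000, since the two users with age 2000 fall outside the upper bound.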

#### Returning Single Document
If you know that you will only find a single document (or you only want the first document that matches a query), you can use the `.find_one()` method.

In [None]:
mark_user = collection.find_one({"first_name": "Mark"})
print(mark_user)
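
When no document matches the query, `.find_one()` returns `None` instead of raising an error, so it is worth checking the result before using it.  A minimal sketch, with the hypothetical helper name `get_age`:

```python
def get_age(collection, first_name):
    """Return the age of the first user called `first_name`, or None
    if there is no such user."""
    user = collection.find_one({"first_name": first_name})
    if user is None:
        # find_one returns None when nothing matches the query
        return None
    return user["age"]
```

Without such a check, indexing the result with `["age"]` would fail with a `TypeError` when the user does not exist.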

### Update
To update a document in MongoDB, the easiest way is to 1) retrieve the document to update, 2) modify its contents as desired, and 3) replace the document using its unique identifier.

The steps for doing this are shown in the example below where Mark's age is changed from 2000 to 1750.

In [None]:
# 1. Retrieve document
mark_user = collection.find_one({"first_name": "Mark"})
print("Age before change: {}".format(mark_user["age"]))
# 2. Update the contents of the document
mark_user["age"] = 1750
# 3. Replace with the updated document using its unique identifier
collection.replace_one({"_id": mark_user["_id"]}, mark_user)

# Verify it worked
new_mark_user = collection.find_one({"first_name": "Mark"})
print("Age after change: {}".format(new_mark_user["age"]))
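
As an alternative to the retrieve-modify-replace steps above, PyMongo also offers `update_one`, which changes selected keys on the server in a single call using the `$set` operator.  This is a sketch of that different technique, not the approach used in the rest of this notebook; the function name `set_user_age` is hypothetical.

```python
def set_user_age(collection, first_name, new_age):
    """Set the age of the first user called `first_name`.

    Returns True if a matching document was found."""
    result = collection.update_one(
        {"first_name": first_name},   # which document to update
        {"$set": {"age": new_age}},   # change only the "age" key
    )
    # matched_count is the number of documents that matched the filter
    return result.matched_count == 1
```

For example, `set_user_age(collection, "Mark", 1750)` would make the same change as the cell above without first downloading the document.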

### Delete
There is not often a reason to delete documents from a database.  But, it is sometimes necessary.  And, as we will discuss in database testing, we will often want to add test entries and then delete them during testing.  You can delete a single document by using the `.delete_one` method of the collection and specifying the search criteria for the document that should be deleted.

In [None]:
collection.delete_one({"first_name": "Mark"})
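
The result returned by `.delete_one` reports how many documents were actually removed, which is useful for confirming that the query matched something.  A minimal sketch, with the hypothetical helper name `delete_user`:

```python
def delete_user(collection, first_name):
    """Delete the first user called `first_name`.

    Returns True if a document was deleted, False if none matched."""
    result = collection.delete_one({"first_name": first_name})
    # deleted_count is 0 when no document matched the query
    return result.deleted_count == 1
```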

The `.delete_one` method will only delete the first item found.  If multiple items that match the query need to be deleted, use the `.delete_many` method.  To delete all documents in the collection, use `.delete_many` and provide an empty query.  Example:

In [None]:
collection.delete_many({})

all_users = list(collection.find())
print(len(all_users))

Another way of deleting all documents in a collection is to simply delete the collection itself, as shown below.

In [None]:
collection.drop()

## Using Unique ID as a Primary Key
As mentioned above, each document in a collection needs a unique ID (`_id`) to separate it from every other document.  This is often known as a primary key.  It is the "key" that primarily defines the document.  If we do not specify a unique ID when creating a document in MongoDB, MongoDB creates its own.  But, it is possible for us to define our own unique ids to use as a primary key.

For example, let's think about a user database.  What piece of information about a person uniquely identifies them?  It isn't name because we know many people have the same name.  It isn't age or address.  But, it could be an e-mail:  only one person should have a particular e-mail (although that isn't always true for families).  Or, it could be a medical record number.

MongoDB enforces that no two documents can have the same unique id, or `_id`.  So, if there is a similar field in our document that we want to be unique to each user, we could use that as the `_id` instead of having MongoDB define it for us.  Let's think of the example of a medical equipment database.  What can uniquely identify each item is its serial number.  Here is an example:

In [None]:
eq_database = client["equipment"]
eq_collection = eq_database["equipment"]

def add_equipment(eq_collection, serial_number, name, room_location):
    new_equipment = {"_id": serial_number, "name": name, "room_location": room_location}
    eq_collection.insert_one(new_equipment)

add_equipment(eq_collection, 4564567, "Electrocardiograph", "B203")
add_equipment(eq_collection, 334576, "BP Monitor", "B203")
add_equipment(eq_collection, 7445644, "Electrocardiograph", "A123")

# Find by serial number
device = eq_collection.find_one({"_id": 7445644})
print("Device Serial Number: {}".format(device["_id"]))
print("Device Name: {}".format(device["name"]))
print("Device Room Location: {}".format(device["room_location"]))
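
Because `_id` values must be unique, calling `insert_one` with a serial number that is already stored raises a `DuplicateKeyError`; this also happens if the cell above is run a second time.  One possible way around this, sketched below, is to use `replace_one` with `upsert=True`, which inserts the document if the `_id` is new and overwrites the stored document otherwise.  The function name `save_equipment` is hypothetical.

```python
def save_equipment(eq_collection, serial_number, name, room_location):
    """Insert the equipment, or overwrite it if the serial number is
    already stored.

    Returns True if a new document was inserted."""
    equipment = {"_id": serial_number, "name": name,
                 "room_location": room_location}
    # upsert=True inserts the document when nothing matches the filter
    result = eq_collection.replace_one({"_id": serial_number}, equipment,
                                       upsert=True)
    # upserted_id is None when an existing document was replaced
    return result.upserted_id is not None
```

Whether overwriting is the right behavior depends on the application; in some cases the error on a duplicate serial number is exactly the safeguard you want.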


## Python Data Types in MongoDB Documents
The examples above used simple data types, integers and strings, as values in the documents.  But, MongoDB can accept more complex data types, such as dictionaries, lists, booleans, and nested types.  Here is an example:

In [None]:
from datetime import datetime

database = client["university_db"]
collection = database["students"]

# Create
document = {"student_id": 123,
            "grades": ["A", "A", "C", "B"],
            "address": {"street": "123 Main", "city": "Durham"},
            "enrolled": True,
            "balance": 14000.13,
            "timestamp": datetime.now()}
collection.insert_one(document)

# Retrieve
student = collection.find_one({"student_id": 123})
print(student)

# Update
student["grades"].append("B+")
student["enrolled"] = False
student["balance"] -= 3000.54
collection.replace_one({"_id": student["_id"]}, student)

# Verify changes
updated_student = collection.find_one({"student_id": 123})
print(updated_student)
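
Nested values can also be used in queries.  MongoDB's dot notation reaches a key inside an embedded dictionary, so the `city` stored inside `address` can be matched as `"address.city"`.  The following is a small sketch; the function name `find_students_in_city` is made up for illustration.

```python
def find_students_in_city(collection, city):
    """Return a list of students whose nested address has this city."""
    # "address.city" uses dot notation to reach the "city" key inside
    # the embedded "address" dictionary
    return list(collection.find({"address.city": city}))
```

With the student document created above, `find_students_in_city(collection, "Durham")` should return a list containing that student.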

More information on the types of data that can be stored can be found at <https://www.mongodb.com/docs/languages/python/pymongo-driver/current/data-formats/>.

## References
PyMongo Documentation:  <https://www.mongodb.com/docs/languages/python/pymongo-driver/current/>