# MongoDB

MongoDB is a cross-platform, document-oriented database. Classified as a NoSQL database, MongoDB avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (stored in a binary format MongoDB calls BSON), which can make integrating data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB was free and open-source software; since October 2018, new server versions are licensed under the Server Side Public License (SSPL) instead.

## Install MongoDB on OS X

http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/

I followed the manual installation steps on my Mac.

I also added MongoDB's `bin` directory to the PATH in my bash profile (`~/.bash_profile`):

`export PATH="<mongodb-install-directory>/bin:$PATH"`

I put the command that starts mongod with its database path (`mongod --dbpath <path>`) in a shell script.

Make sure to give the script execute permission with `chmod ugo+x <shell_script_file_name>.sh`.

Then run it: `./<shell_script_file_name>.sh`

Open http://localhost:27017/ in your browser; if the server is running, MongoDB replies with a short plain-text message.

Control-C terminates mongod.

Control-Z only suspends mongod and returns you to the terminal; the process still exists (stopped, so it does not serve requests). Close the terminal or kill the process to terminate it.

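How to kill the process:

`ps -ef | grep mongo` (find the process ID)

`kill processId` sends SIGTERM, which lets mongod shut down cleanly. Use `kill -9 processId` (SIGKILL) only if it does not respond, since that skips the clean shutdown.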
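The start script's contents are not shown above. As a hedged sketch, the same start/stop cycle could be driven from Python with the standard `subprocess` module, assuming `mongod` is on your `PATH` and the data directory is `data/db` (the path mentioned later in these notes). The helper names are hypothetical; `terminate()` sends SIGTERM on macOS and Linux, so it asks mongod for a clean shutdown rather than killing it outright.

```python
import shutil
import subprocess
from pathlib import Path


def mongod_command(dbpath="data/db", port=27017):
    """Build the argument list for starting mongod on a given data directory."""
    return ["mongod", "--dbpath", str(dbpath), "--port", str(port)]


def start_mongod(dbpath="data/db", port=27017):
    """Start mongod in the background and return the Popen handle."""
    if shutil.which("mongod") is None:
        raise FileNotFoundError("mongod is not on PATH; check ~/.bash_profile")
    # mongod exits at startup if the data directory does not exist
    Path(dbpath).mkdir(parents=True, exist_ok=True)
    return subprocess.Popen(mongod_command(dbpath, port))


def stop_mongod(proc, timeout=30):
    """Ask mongod to shut down cleanly (SIGTERM on POSIX) and wait for it."""
    proc.terminate()
    return proc.wait(timeout=timeout)
```

Usage would be `proc = start_mongod()` followed later by `stop_mongod(proc)`; keeping the command in a separate function makes it easy to check what will be run before starting anything.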


## Install MongoDB on Windows

http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
    

## Getting Started with MongoDB (Python Edition)

### Introduction to MongoDB

https://docs.mongodb.org/getting-started/python/introduction/

### Import Example Dataset

https://docs.mongodb.org/getting-started/python/import-data/

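The `mongoimport` command in that tutorial connects to the running mongod server and loads the dataset into the `restaurants` collection of the `test` database. The data is stored under the database path set in the server start script:

`data/db`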
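`mongoimport` reads one JSON document per line by default. To show what it is reading, here is a stdlib sketch that loads such a file into a list of dicts ready for `insert_many`. It assumes dates in the file are written in MongoDB Extended JSON as `{"$date": <milliseconds since epoch>}` and handles only that one form; `bson.json_util.loads`, which ships with PyMongo, covers the full Extended JSON syntax. The helper names and the sample lines are made up for illustration.

```python
import io
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _decode_dates(obj):
    """json object_hook: turn {"$date": <ms since epoch>} into a UTC datetime."""
    if len(obj) == 1 and isinstance(obj.get("$date"), int):
        return EPOCH + timedelta(milliseconds=obj["$date"])
    return obj


def load_documents(stream):
    """Read one JSON document per line, skipping blank lines."""
    docs = []
    for line in stream:
        line = line.strip()
        if line:
            docs.append(json.loads(line, object_hook=_decode_dates))
    return docs


sample = io.StringIO(
    '{"name": "A", "grades": [{"date": {"$date": 1393804800000}, "score": 2}]}\n'
    "\n"
    '{"name": "B", "grades": []}\n'
)
docs = load_documents(sample)
# With a running server: MongoClient().test.restaurants.insert_many(docs)
```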

### Python Driver (PyMongo)

https://docs.mongodb.org/getting-started/python/client/

I installed the driver with `pip install pymongo`; pip was already set up earlier for some other third-party Python libraries.

### Import pymongo

In [None]:
from pymongo import MongoClient

### Create a Connection

In [None]:
client = MongoClient()

### Access Database Objects

In [None]:
db = client.primer

### Access Collection Objects

In [None]:
coll = db.dataset

### Getting Started with the mongo Shell

http://docs.mongodb.org/manual/tutorial/getting-started-with-the-mongo-shell/

### Insert Data with PyMongo

https://docs.mongodb.org/getting-started/python/insert/

In [None]:
# strptime parses a date string into a datetime object; PyMongo stores datetime values as BSON dates
from datetime import datetime

datetime.strptime("2014-01-16", "%Y-%m-%d")
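`strptime` only parses the string into a Python `datetime`; PyMongo then stores it as a BSON date, a 64-bit count of milliseconds since the Unix epoch in UTC. A naive datetime like the one above is treated as UTC, and precision below one millisecond is dropped. This stdlib sketch (the helper name `to_bson_millis` is hypothetical) computes that stored value so you can see what ends up in the database.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(dt):
    """Milliseconds since the Unix epoch, treating naive datetimes as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # timedelta // timedelta gives a whole number, dropping sub-millisecond parts
    return (dt - EPOCH) // timedelta(milliseconds=1)


parsed = datetime.strptime("2014-10-01", "%Y-%m-%d")
print(parsed, to_bson_millis(parsed))
```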

In [None]:
# Insert one restaurant document; the grade dates are parsed with strptime as above.

from pymongo import MongoClient
from datetime import datetime

client = MongoClient()
db = client.test

result = db.restaurants.insert_one(
    {
        "address": {
            "street": "2 Avenue",
            "zipcode": "10075",
            "building": "1480",
            "coord": [-73.9557413, 40.7720266]
        },
        "borough": "Manhattan",
        "cuisine": "Italian",
        "grades": [
            {
                "date": datetime.strptime("2014-10-01", "%Y-%m-%d"),
                "grade": "A",
                "score": 11
            },
            {
                "date": datetime.strptime("2014-01-16", "%Y-%m-%d"),
                "grade": "B",
                "score": 17
            }
        ],
        "name": "Vella",
        "restaurant_id": "41704620"
    }
)


In [None]:
result.inserted_id
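`inserted_id` is the `_id` MongoDB generated for the new document, an `ObjectId` unless you supplied your own. Its first 4 bytes hold the creation time in seconds since the epoch, so the id reveals roughly when the document was inserted. PyMongo exposes this as `ObjectId.generation_time`; the stdlib sketch below decodes the same field from the 24-character hex string (for example `str(result.inserted_id)`). The helper name and the sample id are illustrative only.

```python
from datetime import datetime, timezone


def objectid_created_at(hex_id):
    """Decode the 4-byte big-endian timestamp at the start of an ObjectId."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_created_at("507f1f77bcf86cd799439011"))
```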

### Find or Query Data with PyMongo

https://docs.mongodb.org/getting-started/python/query/

In [None]:
from pymongo import MongoClient

client = MongoClient()
db = client.test

In [None]:
# Query for all documents in a collection. I added .limit(5) to return only the first 5; printing the whole collection takes a very long time.
cursor = db.restaurants.find().limit(5)

for document in cursor:
    print(document)

In [None]:
# Specify Equality Conditions, again I put a limit on documents being returned
cursor = db.restaurants.find({"borough": "Bronx"}).limit(5)

for document in cursor:
    print(document)

In [None]:
# Query by a Field in an Embedded Document
cursor = db.restaurants.find({"address.zipcode": "10075"})

for document in cursor:
    print(document)

In [None]:
# Query by a Field in an Array, again I put a limit on documents being returned
cursor = db.restaurants.find({"grades.grade": "B"}).limit(5)

for document in cursor:
    print(document)

In [None]:
# Specify Conditions with Operators
# Greater Than Operator ($gt)
cursor = db.restaurants.find({"grades.score": {"$gt": 30}}).limit(5)

for document in cursor:
    print(document)

In [None]:
# Less Than Operator ($lt)
cursor = db.restaurants.find({"grades.score": {"$lt": 10}}).limit(5)

for document in cursor:
    print(document)
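The embedded-document and array queries above follow one rule: a condition on `"grades.score"` matches a document if *any* value reached through that path satisfies it, descending into arrays along the way. That is why `{"grades.score": {"$gt": 30}}` returns a restaurant as soon as one of its grades scores above 30. The pure-Python matcher below is a simplified model of that rule (equality, `$gt` and `$lt` only), not PyMongo code; MongoDB's real matcher handles many more cases. Note that, as in MongoDB, two operators on an array field may be satisfied by different elements; `$elemMatch` is the operator that forces a single element to satisfy all of them.

```python
import operator

# Comparison operators covered by this simplified model.
OPS = {"$gt": operator.gt, "$lt": operator.lt}


def values_at(value, path):
    """Yield every value reachable by a dotted path, fanning out over arrays."""
    if isinstance(value, list):
        for item in value:
            yield from values_at(item, path)
        return
    if not isinstance(value, dict):
        return
    key, _, rest = path.partition(".")
    if key not in value:
        return
    child = value[key]
    if rest:
        yield from values_at(child, rest)
    elif isinstance(child, list):
        yield child        # the whole array can equal a queried array
        yield from child   # and each element is tested on its own
    else:
        yield child


def _compare(op, value, arg):
    try:
        return OPS[op](value, arg)
    except TypeError:  # e.g. a list or string against a number: no match
        return False


def matches(doc, query):
    """True if every field condition holds (several keys = implicit AND)."""
    for path, cond in query.items():
        values = list(values_at(doc, path))
        if isinstance(cond, dict):
            # each operator may be satisfied by a different array element
            ok = all(any(_compare(op, v, arg) for v in values) for op, arg in cond.items())
        else:
            ok = any(v == cond for v in values)
        if not ok:
            return False
    return True


restaurant = {
    "address": {"zipcode": "10075"},
    "grades": [{"grade": "A", "score": 11}, {"grade": "B", "score": 17}],
}
```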

In [None]:
# Combine Conditions
# Logical AND
cursor = db.restaurants.find({"cuisine": "Italian", "address.zipcode": "10075"})

for document in cursor:
    print(document)

In [None]:
# Logical OR
cursor = db.restaurants.find(
    {"$or": [{"cuisine": "Italian"}, {"address.zipcode": "10075"}]})

for document in cursor:
    print(document)
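Implicit AND works by listing several keys in one filter dict, which brings a Python-specific pitfall: a dict literal cannot hold the same key twice, so a second condition on the same field silently replaces the first. The sketch below shows the problem and two filters that keep both conditions: both operators in one sub-document, or an explicit `$and`. It also shows `$in`, the usual shorthand for an `$or` whose branches all test the same field. These are plain dicts you would pass to `db.restaurants.find(...)`; the score thresholds and the second cuisine are made-up values.

```python
# Two conditions on the same key: Python keeps only the last one.
broken = {"grades.score": {"$gt": 10}, "grades.score": {"$lt": 20}}

# Both conditions kept, either in one operator document or with explicit $and.
both_operators = {"grades.score": {"$gt": 10, "$lt": 20}}
explicit_and = {"$and": [{"grades.score": {"$gt": 10}}, {"grades.score": {"$lt": 20}}]}

# $or over a single field can be written more compactly with $in.
italian_or_thai = {"$or": [{"cuisine": "Italian"}, {"cuisine": "Thai"}]}
same_with_in = {"cuisine": {"$in": ["Italian", "Thai"]}}
```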

In [None]:
# Sort Query Results
import pymongo

cursor = db.restaurants.find().sort([
    ("borough", pymongo.ASCENDING),
    ("address.zipcode", pymongo.DESCENDING)
]).limit(20)

for document in cursor:
    print(document)
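The sort specification is a list of `(field, direction)` pairs applied in order: documents are ordered by `borough` ascending, and only ties within the same borough are ordered by `address.zipcode` descending. A list is used rather than a dict because the key order matters. This in-memory sketch (the helper `sort_like_mongo` is hypothetical, and it ignores missing fields and MongoDB's ordering across types) reproduces that ordering with Python's stable sort, sorting by the least significant key first. `pymongo.ASCENDING` and `pymongo.DESCENDING` are the constants `1` and `-1`.

```python
from functools import reduce


def get_path(doc, path):
    """Follow a dotted path through embedded documents."""
    return reduce(lambda d, key: d[key], path.split("."), doc)


def sort_like_mongo(docs, spec):
    """Sort dicts by [(field, 1 or -1), ...]; 1 = ascending, -1 = descending."""
    result = list(docs)
    # Stable sort: apply the least significant key first, the primary key last.
    for field, direction in reversed(spec):
        result.sort(key=lambda d: get_path(d, field), reverse=(direction == -1))
    return result


docs = [
    {"borough": "Queens", "address": {"zipcode": "11101"}},
    {"borough": "Bronx", "address": {"zipcode": "10451"}},
    {"borough": "Bronx", "address": {"zipcode": "10467"}},
]
ordered = sort_like_mongo(docs, [("borough", 1), ("address.zipcode", -1)])
```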

### Update Data with PyMongo

https://docs.mongodb.org/getting-started/python/update/

In [None]:
from pymongo import MongoClient

client = MongoClient()
db = client.test

# update_one modifies only the first document that matches the filter;
# $currentDate sets lastModified to the current date, adding the field if it is missing
result = db.restaurants.update_one(
    {"name": "Juni"},
    {
        "$set": {
            "cuisine": "American (New)"
        },
        "$currentDate": {"lastModified": True}
    }
)


In [None]:
result.matched_count

In [None]:
result.modified_count

In [None]:
cursor = db.restaurants.find({"name": "Juni"})

for document in cursor:
    print(document)
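`update_one` takes a filter and an update document made of operators. `$set` writes the given fields (creating them if missing, and accepting dotted paths such as `"address.street"` for embedded fields), and `$currentDate` sets a field to the current date. The in-memory sketch below (hypothetical helper `apply_update`; it models only these two operators and uses the local clock rather than the server's) shows how such an update changes one document. The sample document values are made up.

```python
import copy
from datetime import datetime, timezone


def _set_path(doc, path, value):
    """Assign doc[a][b]... = value for a dotted path, creating embedded dicts."""
    *parents, last = path.split(".")
    for key in parents:
        doc = doc.setdefault(key, {})
    doc[last] = value


def apply_update(doc, update, now=None):
    """Return a modified copy of doc for an update using $set / $currentDate."""
    new_doc = copy.deepcopy(doc)
    now = now or datetime.now(timezone.utc)
    for op, fields in update.items():
        if op == "$set":
            for path, value in fields.items():
                _set_path(new_doc, path, value)
        elif op == "$currentDate":
            for path in fields:
                _set_path(new_doc, path, now)
        else:
            raise ValueError(f"operator not modelled in this sketch: {op}")
    return new_doc


before = {"_id": 1, "name": "Juni", "cuisine": "American", "address": {"street": "East 31st"}}
stamp = datetime(2015, 1, 1, tzinfo=timezone.utc)
after = apply_update(
    before,
    {
        "$set": {"cuisine": "American (New)", "address.street": "East 31st Street"},
        "$currentDate": {"lastModified": True},
    },
    now=stamp,
)
```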

In [None]:
# updates one or more documents
result = db.restaurants.update_many(
    {"address.zipcode": "10016", "cuisine": "Other"},
    {
        "$set": {"cuisine": "Category To Be Determined"},
        "$currentDate": {"lastModified": True}
    }
)

In [None]:
result.matched_count

In [None]:
result.modified_count

In [None]:
cursor = db.restaurants.find({"address.zipcode": "10016", "cuisine": "Category To Be Determined"})

for document in cursor:
    print(document)

In [None]:
# replace_one replaces a whole document: the first argument is the filter, the second is the
# new document. Every field except _id is replaced.
result = db.restaurants.replace_one(
    {"restaurant_id": "41704620"},
    {
        "name": "Vella 2",
        "address": {
            "coord": [-73.9557413, 40.7720266],
            "building": "1480",
            "street": "2 Avenue",
            "zipcode": "10075"
        }
    }
)

In [None]:
result.matched_count

In [None]:
result.modified_count

In [None]:
cursor = db.restaurants.find({"address.zipcode": "10075", "address.building": "1480"})

for document in cursor:
    print(document)
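Unlike `update_one`, `replace_one` takes a whole new document: every field except `_id` is dropped and replaced, so the replaced restaurant above loses `borough`, `cuisine`, `grades` and `restaurant_id`, which is why the follow-up query searches by address instead. PyMongo also checks the two argument kinds: `update_one` rejects a document without `$` operators, and `replace_one` rejects one that contains them. The sketch below models these rules in memory; `check_update`, `check_replacement` and `replace_document` are hypothetical helpers, not PyMongo's implementation or error messages.

```python
def check_update(update):
    """update_one/update_many need operator keys such as $set."""
    if not update or not all(key.startswith("$") for key in update):
        raise ValueError("an update document must use $ operators")


def check_replacement(replacement):
    """replace_one needs a plain document, without $ operators."""
    if any(key.startswith("$") for key in replacement):
        raise ValueError("a replacement document cannot contain $ operators")


def replace_document(old, replacement):
    """Keep the old _id and swap in every other field from the replacement."""
    check_replacement(replacement)
    if "_id" in replacement and replacement["_id"] != old["_id"]:
        raise ValueError("_id cannot be changed")
    rest = {k: v for k, v in replacement.items() if k != "_id"}
    return {"_id": old["_id"], **rest}


old = {"_id": 7, "name": "Vella", "restaurant_id": "41704620", "cuisine": "Italian"}
new = replace_document(old, {"name": "Vella 2", "address": {"zipcode": "10075"}})
```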

### Remove Data with PyMongo

https://docs.mongodb.org/getting-started/python/remove/

In [None]:
from pymongo import MongoClient

client = MongoClient()
db = client.test

# remove all matching documents
result = db.restaurants.delete_many({"borough": "Manhattan"})

In [None]:
result.deleted_count

In [None]:
# delete all the documents
result = db.restaurants.delete_many({})

In [None]:
result.deleted_count

In [None]:
# drop a collection
db.restaurants.drop()

### Data Aggregation with PyMongo

In [None]:
from pymongo import MongoClient

client = MongoClient()
db = client.test

# group all documents by borough (the _id of each output document) and count them:
# {"$sum": 1} adds 1 for every document in the group
# (re-import the dataset first if you ran the delete/drop cells above, or the result is empty)
cursor = db.restaurants.aggregate(
    [
        {"$group": {"_id": "$borough", "count": {"$sum": 1}}}
    ]
)

for document in cursor:
    print(document)

In [None]:
cursor = db.restaurants.aggregate(
    [
        {"$match": {"borough": "Queens", "cuisine": "Brazilian"}},
        {"$group": {"_id": "$address.zipcode", "count": {"$sum": 1}}}
    ]
)

for document in cursor:
    print(document)
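An aggregation pipeline runs its stages in order: `$match` filters documents (like `find`), then `$group` collapses them into one output document per distinct `_id` expression, here the zip code, while `{"$sum": 1}` adds 1 per input document, i.e. counts them. The pure-Python sketch below reproduces what the second pipeline computes over a few made-up documents; it is only a model for reading the results (hypothetical helper, top-level equality in the match only). MongoDB does not guarantee the order of `$group` output, so add a `$sort` stage if you need one.

```python
from collections import Counter


def match_then_count(docs, criteria, group_path):
    """Mimic [{"$match": criteria}, {"$group": {"_id": "$<path>", "count": {"$sum": 1}}}]."""
    counts = Counter()
    for doc in docs:
        if all(doc.get(k) == v for k, v in criteria.items()):  # top-level equality only
            key = doc
            for part in group_path.split("."):
                key = key.get(part) if isinstance(key, dict) else None
            counts[key] += 1  # a missing field groups under None, like null
    return [{"_id": key, "count": n} for key, n in counts.items()]


docs = [
    {"borough": "Queens", "cuisine": "Brazilian", "address": {"zipcode": "11106"}},
    {"borough": "Queens", "cuisine": "Brazilian", "address": {"zipcode": "11106"}},
    {"borough": "Queens", "cuisine": "Brazilian", "address": {"zipcode": "11377"}},
    {"borough": "Bronx", "cuisine": "Brazilian", "address": {"zipcode": "10451"}},
]
result = match_then_count(docs, {"borough": "Queens", "cuisine": "Brazilian"}, "address.zipcode")
```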

### Showing the data in a DataFrame

In [None]:
import pandas as pd
from pymongo import MongoClient

client = MongoClient()
db = client.test
input_data = db.restaurants
data = pd.DataFrame(list(input_data.find()))
data
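`pd.DataFrame(list(...find()))` keeps embedded documents such as `address` as dict-valued cells and includes the `_id` column of `ObjectId`s. If you want one column per nested field, pandas provides `pandas.json_normalize` (which joins nested keys with `.` by default), and a projection such as `find({}, {"_id": 0})` leaves `_id` out. To show what that flattening produces, here is a stdlib sketch (hypothetical `flatten` helper, made-up sample row) that turns nested dicts into dotted column names matching the dot notation used in the queries; arrays such as `grades` are left as single cells.

```python
def flatten(doc, prefix=""):
    """Turn {"address": {"zipcode": "10075"}} into {"address.zipcode": "10075"}."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value  # scalars and arrays stay in a single column
    return flat


rows = [flatten(doc) for doc in [
    {"name": "Vella", "address": {"zipcode": "10075", "street": "2 Avenue"}, "grades": []},
]]
# With pandas installed: pd.DataFrame(rows)
```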