In [18]:
!pip install pymongo





In [19]:
from pymongo import MongoClient
import datetime

In [20]:
client = MongoClient()

The code above connects to the default host and port (`localhost:27017`). You can also specify the host and port explicitly, either as keyword arguments or as a connection URI:

In [21]:
client = MongoClient(host="localhost", port=27017)
# OR
client = MongoClient("mongodb://localhost:27017")

client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)
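The URI form can also carry credentials. PyMongo's documentation notes that a username or password containing reserved characters must be percent-escaped with `urllib.parse.quote_plus`. The sketch below builds such a URI. `build_mongo_uri` is a hypothetical helper, and the user name and password are made-up placeholders.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_mongo_uri(host: str, port: int,
                    username: Optional[str] = None,
                    password: Optional[str] = None) -> str:
    """Build a mongodb:// connection string (hypothetical helper)."""
    credentials = ""
    if username is not None and password is not None:
        # Reserved characters such as '@', ':' or '/' in the credentials must
        # be percent-escaped, or the URI parser will misread where they end.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}"


print(build_mongo_uri("localhost", 27017))
# mongodb://localhost:27017
print(build_mongo_uri("localhost", 27017, "mike", "p@ss:word"))
# mongodb://mike:p%40ss%3Aword@localhost:27017
```

The resulting string can be passed straight to `MongoClient(...)` in place of the literal URI.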

**Getting a Database**
Once you have a `MongoClient` instance, you can access any database managed by the MongoDB server it points to. Select a database with attribute (dot) notation or with dictionary-style access:

In [22]:
db = client.test_database
# OR 
db = client["test_database"]
# Dictionary-style access is handy when the database name isn't a valid Python identifier, e.g. "test-database".

db

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test_database')
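Dot notation only works for names that Python accepts after a dot. PyMongo also refuses attribute access for names starting with an underscore and asks you to use brackets instead. Names that collide with existing attributes are a further trap: `db.name`, for example, returns the database's own name rather than a collection called `name`. The sketch below checks the first three cases; `needs_bracket_access` is a hypothetical helper and does not detect attribute collisions.

```python
import keyword


def needs_bracket_access(name: str) -> bool:
    """Return True if `name` cannot safely be used with dot notation,
    e.g. db.<name> or client.<name> (hypothetical helper)."""
    return (
        not name.isidentifier()      # e.g. "test-database", "2022_logs"
        or keyword.iskeyword(name)   # e.g. "class" is a syntax error after a dot
        or name.startswith("_")      # PyMongo rejects attribute access for these
    )


for name in ["test_database", "test-database", "class", "_private"]:
    print(name, needs_bracket_access(name))
```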

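**Getting a Collection**
A collection is a group of documents stored in MongoDB, roughly the equivalent of a table in a relational database. You get a collection the same way you get a database. The resulting object is an instance of `Collection` and represents a physical collection of documents in your database. Collections and databases are created lazily: none of these statements touch the server, and a collection is only created when the first document is inserted into it.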

In [23]:
collection = db.test_collection
#OR
collection = db['test_collection']
# Dictionary-style access is handy when the collection name isn't a valid Python identifier, e.g. "test-collection".

collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test_database'), 'test_collection')

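**Sample Document**
The following example shows a document for a blog site. In PyMongo, a document is simply a Python dictionary of key-value pairs, and values can be strings, lists, dates, nested dictionaries and so on. To store it, call `.insert_one()` on a collection with the document as its argument; the returned `InsertOneResult` exposes the new document's `_id` as `inserted_id`. If the document has no `_id` key, one is generated and added to it automatically.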

In [24]:
post = {"author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.now(datetime.timezone.utc)}  # utcnow() is deprecated since Python 3.12

In [25]:
post_collection = db.posts
post_id = post_collection.insert_one(post).inserted_id
post_id

ObjectId('62a956a000e3583e1758d80c')

In [26]:
import pprint
pprint.pprint(post_collection.find_one())

{'_id': ObjectId('628de76d9c628e0a3f69ff33'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 5, 25, 8, 16, 48, 627000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
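The `date` field comes back with a microsecond value of `627000`. BSON stores datetimes as milliseconds since the Unix epoch, so any sub-millisecond part is dropped when the document is saved. The datetime is also returned naive (in UTC) because the client uses the default `tz_aware=False`; creating it with `MongoClient(tz_aware=True)` returns timezone-aware values instead. The sketch below mimics the truncation in plain Python; `to_bson_precision` is a hypothetical helper and the `512` extra microseconds in the input are made up for illustration.

```python
import datetime


def to_bson_precision(dt: datetime.datetime) -> datetime.datetime:
    """Truncate a datetime to whole milliseconds, as happens when it is stored as BSON."""
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)


original = datetime.datetime(2022, 5, 25, 8, 16, 48, 627512)
print(to_bson_precision(original))  # 2022-05-25 08:16:48.627000
```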


In [27]:
pprint.pprint(post_collection.find_one({"author": "Mike"}))

{'_id': ObjectId('628de76d9c628e0a3f69ff33'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 5, 25, 8, 16, 48, 627000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


In [28]:
post_collection.find_one({"author": "Eliot"})
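`find_one()` returns the first document that matches the query, or `None` when nothing matches; the notebook displays nothing for `None`, which is why In [28] produces no output. (The document returned in In [26] and In [27] has a different `_id` from `post_id`, which suggests the collection already held posts from earlier runs.) The sketch below mimics that contract for simple equality queries. `find_one_in_memory` and the sample `posts` list are hypothetical; real MongoDB queries also support operators, nested fields and array matching.

```python
from typing import Any, Dict, List, Optional


def find_one_in_memory(docs: List[Dict[str, Any]],
                       query: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first document whose fields equal every key in `query`,
    or None if nothing matches (simple equality only; hypothetical helper)."""
    for doc in docs:
        if all(key in doc and doc[key] == value for key, value in query.items()):
            return doc
    return None


posts = [
    {"_id": 1, "author": "Mike", "text": "My first blog post!"},
    {"_id": 2, "author": "Leo", "text": "Fasting 14-10"},
]
print(find_one_in_memory(posts, {"author": "Mike"}))
print(find_one_in_memory(posts, {"author": "Eliot"}))  # None
print(find_one_in_memory(posts, {}))  # an empty query matches the first document
```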

In [29]:
pprint.pprint(post_collection.find_one({"_id": post_id}))

{'_id': ObjectId('62a956a000e3583e1758d80c'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 6, 15, 3, 48, 48, 279000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
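Querying by `_id` works here because `post_id` is an `ObjectId`, not a string: a query such as `{"_id": str(post_id)}` would not match, so an id received as text must first be converted with `bson.objectid.ObjectId(...)`. An ObjectId also records when it was created. Its first 4 bytes are a big-endian Unix timestamp in seconds, which PyMongo exposes as `ObjectId.generation_time`. The sketch below decodes that timestamp by hand from the hex string printed above; `objectid_generation_time` is a hypothetical helper. The result, 03:48:48 UTC on 2022-06-15, agrees with the post's `date` field to the second.

```python
import datetime


def objectid_generation_time(oid_hex: str) -> datetime.datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId.

    An ObjectId is 12 bytes (24 hex characters); its first 4 bytes are a
    big-endian count of seconds since the Unix epoch.
    """
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId has exactly 24 hex characters")
    seconds = int(oid_hex[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


print(objectid_generation_time("62a956a000e3583e1758d80c"))
# 2022-06-15 03:48:48+00:00
```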


## Inserting Many Documents
`insert_many()` inserts multiple documents into a collection in a single call. Its argument is an iterable (typically a list) of dictionaries, one per document, and it returns an `InsertManyResult` whose `inserted_ids` attribute lists the `_id` of each inserted document.


In [30]:
post_2 = {"author": "Leo",
          "text": "Fasting 14-10",
          "tags": ["python", "pymongo", "django"],
          "date": datetime.datetime.now(datetime.timezone.utc)}

post_3 = {"author": "Jack",
          "text": "Fastest Car",
          "tags": ["mongodb", "python", "pyspark"],
          "date": datetime.datetime.now(datetime.timezone.utc)}

In [31]:
new_result = post_collection.insert_many([post_2, post_3])

for i in new_result.inserted_ids:
    pprint.pprint(post_collection.find_one({"_id": i}))

{'_id': ObjectId('62a956a000e3583e1758d80d'),
 'author': 'Leo',
 'date': datetime.datetime(2022, 6, 15, 3, 48, 48, 320000),
 'tags': ['python', 'pymongo', 'django'],
 'text': 'Fasting 14-10'}
{'_id': ObjectId('62a956a000e3583e1758d80e'),
 'author': 'Jack',
 'date': datetime.datetime(2022, 6, 15, 3, 48, 48, 320000),
 'tags': ['mongodb', 'python', 'pyspark'],
 'text': 'Fastest Car'}


A single `.insert_many()` call is generally faster and simpler than calling `.insert_one()` once per document, because the documents are sent to the server as a batch instead of one round trip each. Here it inserts both posts into the `posts` collection of `test_database`, and the loop above fetches each one back by its `_id`.
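`insert_many()` also accepts an `ordered` keyword (default `True`). With ordered inserts, the server stops at the first failing document, such as one with a duplicate `_id`, and skips the rest; with `ordered=False` it attempts every document. The plain-Python simulation below illustrates the difference, using a dictionary as a stand-in for the collection. `simulate_insert_many` is hypothetical: it processes documents sequentially even in unordered mode, and in real PyMongo both modes raise `pymongo.errors.BulkWriteError` after the batch instead of returning the failures.

```python
from typing import Any, Dict, List, Tuple


def simulate_insert_many(store: Dict[Any, dict], docs: List[dict],
                         ordered: bool = True) -> Tuple[List[Any], List[Any]]:
    """Insert docs into `store` (keyed by _id), mimicking insert_many's
    `ordered` behaviour for duplicate-key errors (hypothetical helper).

    Returns (inserted_ids, failed_ids).
    """
    inserted, failed = [], []
    for doc in docs:
        if doc["_id"] in store:
            failed.append(doc["_id"])
            if ordered:
                break  # ordered: abort the remaining inserts at the first error
            continue   # unordered: record the error and keep going
        store[doc["_id"]] = doc
        inserted.append(doc["_id"])
    return inserted, failed


docs = [{"_id": 1}, {"_id": 2}, {"_id": 1}, {"_id": 3}]
print(simulate_insert_many({}, docs, ordered=True))   # ([1, 2], [1])
print(simulate_insert_many({}, docs, ordered=False))  # ([1, 2, 3], [1])
```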

## Querying for More Than One Document


To retrieve several documents from a collection, use `.find()`. It returns a `Cursor` object that yields matching documents on demand as you iterate over it; called without arguments, it matches every document in the collection:

In [32]:
for doc in post_collection.find():
    print(doc)

{'_id': ObjectId('628de76d9c628e0a3f69ff33'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2022, 5, 25, 8, 16, 48, 627000)}
{'_id': ObjectId('628e77c09598a2b042476679'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2022, 5, 25, 18, 38, 56, 601000)}
{'_id': ObjectId('628ef229eebc6749ebf9c50d'), 'author': 'Mike', 'text': 'My first blog post!', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2022, 5, 26, 3, 21, 13, 397000)}
{'_id': ObjectId('628ef46feebc6749ebf9c50e'), 'author': 'Mike', 'text': 'My first blog post !', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2022, 5, 26, 3, 30, 54, 831000)}
{'_id': ObjectId('628ef4b8eebc6749ebf9c50f'), 'author': 'Tomas', 'text': 'My first blog post !', 'tags': ['mongodb', 'python', 'pymongo'], 'date': datetime.datetime(2022, 5, 26, 3, 32, 7, 766000)}
{'_id': ObjectId(
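`.find()` accepts the same kind of filter as `.find_one()`, for example `post_collection.find({"author": "Mike"})`. One MongoDB rule worth knowing is that an equality condition on an array field matches when any element of the array equals the value, so `{"tags": "mongodb"}` finds every post tagged `mongodb`. The sketch below mimics that rule, and the on-demand behaviour of a cursor, with a Python generator. `find_in_memory` and `field_matches` are hypothetical, handle equality only, and the sample posts reuse the tags of the documents inserted earlier.

```python
from typing import Any, Dict, Iterator, List


def field_matches(value: Any, wanted: Any) -> bool:
    """MongoDB-style equality: an array field matches if any element equals `wanted`."""
    if isinstance(value, list):
        return wanted in value or value == wanted
    return value == wanted


def find_in_memory(docs: List[Dict[str, Any]],
                   query: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Lazily yield matching documents, loosely like iterating a Cursor
    (hypothetical helper; equality queries only)."""
    for doc in docs:
        if all(key in doc and field_matches(doc[key], wanted)
               for key, wanted in query.items()):
            yield doc


posts = [
    {"author": "Mike", "tags": ["mongodb", "python", "pymongo"]},
    {"author": "Leo", "tags": ["python", "pymongo", "django"]},
    {"author": "Jack", "tags": ["mongodb", "python", "pyspark"]},
]
for doc in find_in_memory(posts, {"tags": "mongodb"}):
    print(doc["author"])  # Mike, then Jack
```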

## Counting

If you only need to know how many documents match a query, use `count_documents()` instead of running a full query. It takes a filter document; passing an empty filter `{}` counts every document in the collection:

In [33]:
post_collection.count_documents({})

20

## Aggregation

MongoDB has offered several ways of aggregating data. The examples below use the aggregation pipeline, which is the recommended approach: map-reduce is deprecated as of MongoDB 5.0, and the old `group` command was removed in MongoDB 4.2.

Create a sample collection named `inventory` containing the following document:

In [35]:
db.inventory.insert_one({"_id" : 2, "item" : "ABC1", "sizes": [ "S", "M", "L"]})

<pymongo.results.InsertOneResult at 0x27ff2d6f7c0>

The following aggregation uses the `$unwind` stage to output one document for each element in the `sizes` array:

In [49]:
result = db.inventory.aggregate( [ { "$unwind": "$sizes" } ] )
print(list(result))

[{'_id': 2, 'item': 'ABC1', 'sizes': 'S'}, {'_id': 2, 'item': 'ABC1', 'sizes': 'M'}, {'_id': 2, 'item': 'ABC1', 'sizes': 'L'}]
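To make the stage's behaviour concrete, the plain-Python sketch below reproduces what `$unwind` did to the `sizes` array. `unwind` is a hypothetical helper, not part of PyMongo. It also follows `$unwind`'s default of dropping documents whose array is missing, `null` or empty (the stage's `preserveNullAndEmptyArrays` option changes that), and treats a non-array value as a one-element array.

```python
from typing import Any, Dict, Iterator, List


def unwind(docs: List[Dict[str, Any]], field: str) -> Iterator[Dict[str, Any]]:
    """Emit one copy of each document per element of its array `field`,
    with the array replaced by that element, like {"$unwind": "$field"}.

    Documents whose field is missing, null or an empty array are dropped;
    a non-array value is passed through as if it were a one-element array.
    """
    for doc in docs:
        value = doc.get(field)
        if value is None or value == []:
            continue
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            yield {**doc, field: element}


inventory = [{"_id": 2, "item": "ABC1", "sizes": ["S", "M", "L"]}]
print(list(unwind(inventory, "sizes")))
# [{'_id': 2, 'item': 'ABC1', 'sizes': 'S'}, {'_id': 2, 'item': 'ABC1', 'sizes': 'M'}, {'_id': 2, 'item': 'ABC1', 'sizes': 'L'}]
```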


In [55]:
db.inventory.insert_many([{"x": 1, "tags": ["dog", "cat"]},
                          {"x": 2, "tags": ["cat"]},
                          {"x": 2, "tags": ["mouse", "cat", "dog"]},
                          {"x": 3, "tags": []}])


<pymongo.results.InsertManyResult at 0x27ff2d6f730>

In [56]:
result = db.inventory.aggregate( [ {"$unwind": "$tags"}, {"$group": {"_id": "$tags", "count": {"$sum": 1}}} ] )
print(list(result))

[{'_id': 'dog', 'count': 6}, {'_id': 'cat', 'count': 9}, {'_id': 'mouse', 'count': 3}]
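A single execution of In [55] contributes `dog` 2, `cat` 3 and `mouse` 1; the counts shown above are exactly three times that, which suggests the insert cell was run three times before the aggregation. The sketch below recomputes the single-run counts in plain Python with `collections.Counter`, mirroring what the `$unwind` plus `$group`/`$sum` pipeline does. The `inventory` list restates the documents inserted above; the `ABC1` document has no `tags` field and therefore contributes nothing.

```python
from collections import Counter

# The ABC1 document from In [35] plus the four documents from In [55].
inventory = [
    {"_id": 2, "item": "ABC1", "sizes": ["S", "M", "L"]},
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 2, "tags": ["mouse", "cat", "dog"]},
    {"x": 3, "tags": []},
]

# $unwind: one entry per tag (missing or empty arrays contribute nothing);
# $group with {"$sum": 1}: count how many times each tag appears.
tag_counts = Counter(tag for doc in inventory for tag in doc.get("tags", []))
print(dict(tag_counts))  # {'dog': 2, 'cat': 3, 'mouse': 1}
```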


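Since Python 3.7, dictionaries preserve insertion order, so a plain `dict` already keeps the keys of an order-sensitive stage such as `$sort` in the order you wrote them. On older Python versions you had to use `SON` or `collections.OrderedDict` wherever explicit key ordering was required, as in the `$sort` stage below: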

In [52]:
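from bson.son import SON

pipeline = [
    {"$unwind": "$tags"},
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
    # Sort by count (descending), then by _id (descending) to break ties.
    # SON fixes the key order; on Python 3.7+ a plain dict would work too.
    {"$sort": SON([("count", -1), ("_id", -1)])}
]
result = db.inventory.aggregate(pipeline)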

In [53]:
print(list(result))

[{'_id': 'cat', 'count': 9}, {'_id': 'dog', 'count': 6}, {'_id': 'mouse', 'count': 3}]
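The key order of the `$sort` document matters: MongoDB sorts by `count` first and uses `_id` only to break ties. The plain-Python sketch below reproduces that compound sort on the grouped results shown above. `sort_like_mongo` is hypothetical, and it relies on dictionaries keeping insertion order (guaranteed since Python 3.7) to read the sort keys in priority order.

```python
from operator import itemgetter
from typing import Any, Dict, List

groups = [{"_id": "dog", "count": 6}, {"_id": "cat", "count": 9},
          {"_id": "mouse", "count": 3}]


def sort_like_mongo(docs: List[Dict[str, Any]],
                    spec: Dict[str, int]) -> List[Dict[str, Any]]:
    """Sort by each key in `spec` (1 = ascending, -1 = descending), with earlier
    keys taking priority, like {"$sort": spec} (hypothetical helper)."""
    result = list(docs)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives the
    # combined ordering.
    for key, direction in reversed(list(spec.items())):
        result.sort(key=itemgetter(key), reverse=(direction == -1))
    return result


print(sort_like_mongo(groups, {"count": -1, "_id": -1}))
# [{'_id': 'cat', 'count': 9}, {'_id': 'dog', 'count': 6}, {'_id': 'mouse', 'count': 3}]
```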
