In [3]:
!pip install pymongo



In [4]:
from pymongo import MongoClient
import datetime

In [5]:
client = MongoClient()

The code above connects to the default host and port. You can also specify the host and port explicitly:

In [6]:
client = MongoClient(host="localhost", port=27017)
# OR
client = MongoClient("mongodb://localhost:27017")

client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

**Getting a Database**
Once you have a connected instance of MongoClient, you can access any database managed by that MongoDB server. To select the database you want to use, you can use dot notation:

In [7]:
db = client.test_database
# OR 
db = client["test_database"]
# The bracket form is handy when the name of your database isn’t a valid Python identifier.

db

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test_database')

A collection is accessed from a database in the same way, with dot or bracket notation. In the code below, **collection** is an instance of **Collection** and represents a physical collection of documents in your database. You can insert documents into it by calling `.insert_one()` on it with a document as an argument:

In [8]:
collection = db.test_collection
#OR
collection = db['test_collection']
# The bracket form is handy when the name of your collection isn’t a valid Python identifier.

collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'test_database'), 'test_collection')

**Sample Document**
The following example shows the structure of a blog post document, which is simply a set of comma-separated key-value pairs:

In [9]:
post = {"author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow()}

In [10]:
posts = db.posts
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('628e77c09598a2b042476679')

In [11]:
import pprint
pprint.pprint(posts.find_one())

{'_id': ObjectId('628de76d9c628e0a3f69ff33'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 5, 25, 8, 16, 48, 627000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


In [12]:
pprint.pprint(posts.find_one({"author": "Mike"}))

{'_id': ObjectId('628de76d9c628e0a3f69ff33'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 5, 25, 8, 16, 48, 627000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


In [13]:
posts.find_one({"author": "Eliot"})  # No matching document, so this returns None

In [15]:
pprint.pprint(posts.find_one({"_id": post_id}))

{'_id': ObjectId('628e77c09598a2b042476679'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 5, 25, 18, 38, 56, 601000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


In [16]:
post_id_as_str = str(post_id)
posts.find_one({"_id": post_id_as_str})  # No result: the stored _id is an ObjectId, not a str

## Inserting many Documents
`insert_many()` inserts multiple documents into a collection in a single call. It takes a list of dictionaries, one for each document that we want to insert.


In [18]:
post_2 = {"author": "Leo",
        "text": "Fasting 14-10",
        "tags": ["python", "pymongo", "django"],
        "date": datetime.datetime.utcnow()}

post_3 = {"author": "Jack",
        "text": "Fastest Car",
        "tags": ["mongodb", "python", "pyspark"],
        "date": datetime.datetime.utcnow()}

In [19]:
new_result = collection.insert_many([post_2, post_3])

print(f"Multiple items: {new_result.inserted_ids}")

Multiple items: [ObjectId('628e79149598a2b04247667a'), ObjectId('628e79149598a2b04247667b')]


This is faster and more straightforward than calling `.insert_one()` multiple times. The call to `.insert_many()` takes an iterable of documents and inserts them into the `test_collection` collection in your `test_database` database.

## Querying for More Than One Document


To retrieve documents from a collection, you can use `.find()`. Without arguments, `.find()` returns a Cursor object that yields the documents in the collection on demand:

In [39]:
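# newCollection is assumed to be a Collection populated earlier; its setup is not shown in this section.
for doc in newCollection.find():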
    print(doc)

{'_id': ObjectId('6220899c61cdcb1411c5421d'), 'title': 'Working With JSON Data in Python', 'author': 'Lucas', 'contributors': ['Aldren', 'Dan', 'Joanna'], 'url': 'https://realpython.com/python-json/'}
{'_id': ObjectId('622089ca61cdcb1411c5421e'), 'title': 'Working With JSON Data in Python', 'author': 'Lucas', 'contributors': ['Aldrenn', 'Dan', 'Joanna'], 'url': 'https://realpython.com/python-json/'}
{'_id': ObjectId('62208aa561cdcb1411c5421f'), 'title': "Python's Requests Library (Guide)", 'author': 'Alex', 'contributors': ['Aldren', 'Brad', 'Joanna'], 'url': 'https://realpython.com/python-requests/'}
{'_id': ObjectId('62208aa561cdcb1411c54220'), 'title': 'Object-Oriented Programming (OOP) in Python 3', 'author': 'David', 'contributors': ['Aldren', 'Joanna', 'Jacob'], 'url': 'https://realpython.com/python3-object-oriented-programming/'}


## Counting

If we just want to know how many documents match a query, we can call `count_documents()` instead of running a full query. Passing an empty filter counts all of the documents in a collection:

In [37]:
newCollection.count_documents({})

4

## Aggregation

There are several ways to perform aggregations in MongoDB. The examples below use the aggregation framework through `Collection.aggregate()`.

Create a sample collection containing the following document:

In [22]:
from pymongo import MongoClient
database = MongoClient().database
collection = database.collection
collection.insert_one({"_id" : 2, "item" : "ABC1", "sizes": [ "S", "M", "L"]})

<pymongo.results.InsertOneResult at 0x2478d9581c0>

The following aggregation uses the $unwind stage to output a document for each element in the sizes array:

In [23]:
result = collection.aggregate( [ { "$unwind": "$sizes" } ] )
print(list(result))

[{'_id': 2, 'item': 'ABC1', 'sizes': 'S'}, {'_id': 2, 'item': 'ABC1', 'sizes': 'M'}, {'_id': 2, 'item': 'ABC1', 'sizes': 'L'}]
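To make the effect of `$unwind` concrete, here is a rough pure-Python equivalent that needs no server. It is a simplified sketch: the function `unwind` is hypothetical, and it only covers list-valued fields, skipping documents where the field is missing or not a list.

```python
def unwind(documents, field):
    """Yield one copy of each document per element of its `field` list."""
    for doc in documents:
        values = doc.get(field)
        if not isinstance(values, list):
            continue  # simplification: the real stage handles more cases
        for value in values:
            # Copy the document, replacing the list with a single element.
            yield {**doc, field: value}


inventory = [{"_id": 2, "item": "ABC1", "sizes": ["S", "M", "L"]}]
unwound = list(unwind(inventory, "sizes"))
print(unwound)
```

An empty list yields no documents at all, which is why a document with an empty array disappears from the output of an unwind step like this one.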


In [24]:
from pymongo import MongoClient
database = MongoClient().database
collection = database.collection

result = collection.insert_many([{"x": 1, "tags": ["dog", "cat"]},
                                {"x": 2, "tags": ["cat"]},
                                {"x": 2, "tags": ["mouse", "cat", "dog"]},
                                {"x": 3, "tags": []}])
result.inserted_ids

[ObjectId('628e7c849598a2b04247667f'),
 ObjectId('628e7c849598a2b042476680'),
 ObjectId('628e7c849598a2b042476681'),
 ObjectId('628e7c849598a2b042476682')]

Python dictionaries only guarantee insertion order from Python 3.7 onward, so on older versions you should use `SON` or `collections.OrderedDict` where explicit ordering is required, e.g. in `$sort`:

In [25]:
from bson.son import SON
pipeline = [
    {"$unwind": "$tags"},
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
    {"$sort": SON([("count", -1), ("_id", -1)])}
]
result = collection.aggregate( pipeline )

In [26]:
print(list(result))

[{'_id': 'cat', 'count': 3}, {'_id': 'dog', 'count': 2}, {'_id': 'mouse', 'count': 1}]
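As a cross-check of what the pipeline computes, the same tag counts can be reproduced in plain Python on the four sample documents. This is only an illustration of the logic of the three stages, not how MongoDB executes them; the variable names are arbitrary.

```python
from collections import Counter

documents = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 2, "tags": ["mouse", "cat", "dog"]},
    {"x": 3, "tags": []},
]

# $unwind + $group: count one occurrence per tag element across all documents.
tag_counts = Counter(tag for doc in documents for tag in doc["tags"])

# $sort: descending by count, then descending by tag name to break ties.
summary = sorted(
    ({"_id": tag, "count": count} for tag, count in tag_counts.items()),
    key=lambda row: (row["count"], row["_id"]),
    reverse=True,
)
print(summary)
```

The order of the two sort keys is the reason the `$sort` specification has to preserve ordering: sorting by `_id` first and `count` second would give a different result.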
