# MongoDB Guide

### Step 1
Install the [pymongo](https://www.mongodb.com/docs/drivers/pymongo/) driver:

In [1]:
!pip install "pymongo[srv]"

Collecting pymongo[srv]
  Downloading pymongo-4.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (677 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo[srv])
  Downloading dnspython-2.5.0-py3-none-any.whl (305 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.5.0 pymongo-4.6.1


### Step 2

#### Connect to Atlas cluster

In [2]:
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

In [3]:
!curl ipecho.net/plain

35.184.57.17

In [4]:
username = ""
password = ""
cluster_url = ""

uri = f"mongodb+srv://{username}:{password}@{cluster_url}/?retryWrites=true&w=majority"

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Pinged your deployment. You successfully connected to MongoDB!
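Usernames and passwords that contain characters such as `@`, `:` or `/` need to be percent-encoded before they are placed in a connection string. The following is a minimal sketch using only the standard library; `build_uri` is a hypothetical helper name, and the credentials and cluster host are made-up placeholders rather than real values.

```python
from urllib.parse import quote_plus


def build_uri(username: str, password: str, cluster_url: str) -> str:
    """Build a mongodb+srv URI with percent-encoded credentials."""
    # quote_plus escapes reserved characters such as '@', ':' and '/'.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{cluster_url}/?retryWrites=true&w=majority"


# Placeholder values for illustration only.
uri = build_uri("app@user", "p@ss:w/rd", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can be passed to MongoClient in the same way as the uri built in the cell above.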


### Step 3

#### Basic commands

##### Getting a Database
A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances:

In [None]:
db = client.test_database

If your database name is such that using attribute style access won’t work (like test-database), you can use dictionary style access instead:

In [None]:
db = client['test-database']

##### Getting a Collection

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:

In [None]:
collection = db.test_collection

or (using dictionary style access):

In [None]:
collection = db["test-collection"]

An important note about collections (and databases) in MongoDB is that they are created lazily - none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.

##### Documents
Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

In [None]:
import datetime
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.now(tz=datetime.timezone.utc),
}

Note that documents can contain native Python types (like datetime.datetime instances) which will be automatically converted to and from the appropriate BSON types.
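One consequence of that conversion is worth keeping in mind: BSON stores datetimes as UTC with millisecond precision, and with default client settings PyMongo returns them as naive datetime objects. The sketch below approximates that round trip with the standard library only; `as_round_tripped` is a hypothetical helper for illustration, not a PyMongo function.

```python
import datetime


def as_round_tripped(dt: datetime.datetime) -> datetime.datetime:
    """Approximate a datetime after being stored in and read back from MongoDB."""
    # Timezone-aware values are converted to UTC; naive values are taken as UTC.
    if dt.tzinfo is not None:
        dt = dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
    # Microseconds are truncated to whole milliseconds.
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


original = datetime.datetime(
    2023, 11, 25, 10, 46, 39, 771234,
    tzinfo=datetime.timezone(datetime.timedelta(hours=1)),
)
round_tripped = as_round_tripped(original)
print(round_tripped)
```

This is why a document read back from the server may not compare equal to the dictionary that was inserted if its datetime carried sub-millisecond precision or a timezone.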

##### Inserting a Document
To insert a document into a collection we can use the insert_one() method:

In [None]:
posts = db.posts2
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('6561c2824615117e21a1e4be')

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.
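An automatically generated ObjectId is a 12-byte value whose first four bytes hold its creation time in seconds since the Unix epoch. As a rough illustration, the sketch below decodes that timestamp from the hex string shown above using only the standard library; `object_id_timestamp` is a hypothetical helper, and PyMongo's own ObjectId exposes the same information through its `generation_time` attribute.

```python
import datetime


def object_id_timestamp(hex_id: str) -> datetime.datetime:
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    # The first 8 hex characters (4 bytes) are a big-endian Unix timestamp.
    seconds = int(hex_id[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


created = object_id_timestamp("6561c2824615117e21a1e4be")
print(created.isoformat())
```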

After inserting the first document, the posts2 collection (which the posts variable refers to) has actually been created on the server. We can verify this by listing all of the collections in our database:

In [None]:
db.list_collection_names()

['posts', 'posts2']

##### Getting a Single Document
The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection:

In [None]:
import pprint
pprint.pprint(posts.find_one())

{'_id': ObjectId('6561c2824615117e21a1e4be'),
 'author': 'Mike',
 'date': datetime.datetime(2023, 11, 25, 9, 46, 39, 771000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


The result is a dictionary matching the one that we inserted previously.

find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author “Mike” we do:

In [None]:
pprint.pprint(posts.find_one({"author": "Mike"}))

{'_id': ObjectId('6561c2824615117e21a1e4be'),
 'author': 'Mike',
 'date': datetime.datetime(2023, 11, 25, 9, 46, 39, 771000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


If we try with a different author, like “Eliot”, we’ll get no result:

In [None]:
pprint.pprint(posts.find_one({"author": "Eliot"}))

None


##### Querying By ObjectId
We can also find a post by its _id, which in our example is an ObjectId:

In [None]:
post_id
pprint.pprint(posts.find_one({"_id": post_id}))

{'_id': ObjectId('6561c2824615117e21a1e4be'),
 'author': 'Mike',
 'date': datetime.datetime(2023, 11, 25, 9, 46, 39, 771000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


Note that an ObjectId is not the same as its string representation:

In [None]:
post_id_as_str = str(post_id)
print(post_id_as_str, type(post_id_as_str))
print(post_id, type(post_id))


pprint.pprint(posts.find_one({"_id": post_id_as_str}))  # No result: the string does not match

pprint.pprint(posts.find_one({"_id": post_id}))  # Matches: the ObjectId itself is used

6561c2824615117e21a1e4be <class 'str'>
6561c2824615117e21a1e4be <class 'bson.objectid.ObjectId'>
None
{'_id': ObjectId('6561c2824615117e21a1e4be'),
 'author': 'Mike',
 'date': datetime.datetime(2023, 11, 25, 9, 46, 39, 771000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


A common task in web applications is to get an ObjectId from the request URL and find the matching document. It’s necessary in this case to convert the ObjectId from a string before passing it to find_one:

In [None]:
from bson.objectid import ObjectId

pprint.pprint(posts.find_one({'_id': ObjectId(post_id_as_str)}))

{'_id': ObjectId('6561c2824615117e21a1e4be'),
 'author': 'Mike',
 'date': datetime.datetime(2023, 11, 25, 9, 46, 39, 771000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
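Because the string arrives from outside the application, it may not be a well-formed id at all, and ObjectId() raises an error for malformed input. A minimal pre-check, assuming ids are passed as 24-character hexadecimal strings, might look like the sketch below; `looks_like_object_id` is a hypothetical helper written with the standard library only.

```python
import re

# An ObjectId in string form is exactly 24 hexadecimal characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value) -> bool:
    """Return True if value is a 24-character hexadecimal string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


print(looks_like_object_id("6561c2824615117e21a1e4be"))
print(looks_like_object_id("not-an-id"))
```

A request handler could return a "not found" response when this check fails instead of passing the value on to find_one. PyMongo's ObjectId.is_valid() offers a comparable check.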


##### Bulk Inserts
In order to make querying a little more interesting, let’s insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:

In [None]:
new_posts = [
    {
        "author": "Mike",
        "text": "Another post!",
        "tags": ["bulk", "insert"],
        "date": datetime.datetime(2009, 11, 12, 11, 14),
    },
    {
        "author": "Eliot",
        "title": "MongoDB is fun",
        "text": "and pretty easy too!",
        "date": datetime.datetime(2009, 11, 10, 10, 45),
    },
]
result = posts.insert_many(new_posts)
result.inserted_ids

##### Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection:

In [None]:
for post in posts.find():
    pprint.pprint(post)

Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is “Mike”:

In [None]:
for post in posts.find({"author": "Mike"}):
    pprint.pprint(post)

##### Counting
If we just want to know how many documents match a query we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection:

In [None]:
posts.count_documents({})

or just of those documents that match a specific query:

In [None]:
posts.count_documents({"author": "Mike"})

##### Range Queries
MongoDB supports many different types of advanced queries. As an example, let’s perform a query where we limit results to posts older than a certain date, but also sort the results by author:

In [None]:
d = datetime.datetime(2009, 11, 12, 12)
for post in posts.find({"date": {"$lt": d}}).sort("author"):
    pprint.pprint(post)

Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.
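To make the semantics concrete, the sketch below reproduces the same filter and sort in plain Python over in-memory copies of the three posts from this guide. It is only an emulation for illustration and does not talk to a server; the list `docs` stands in for the collection.

```python
import datetime

# In-memory stand-ins for the three posts inserted earlier in this guide.
docs = [
    {"author": "Mike", "text": "My first blog post!",
     "date": datetime.datetime(2023, 11, 25, 9, 46, 39, 771000)},
    {"author": "Mike", "text": "Another post!",
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

d = datetime.datetime(2009, 11, 12, 12)

# {"date": {"$lt": d}} keeps documents whose date is strictly earlier than d;
# sort("author") orders the matches by author, ascending.
matching = sorted(
    (doc for doc in docs if doc["date"] < d),
    key=lambda doc: doc["author"],
)
authors = [doc["author"] for doc in matching]
print(authors)
```

Only the two posts from 2009 fall before the cutoff, so the 2023 post is excluded.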

##### Indexing
Adding indexes can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we’ll demonstrate how to create a unique index on a key that rejects documents whose value for that key already exists in the index.

First, we’ll need to create the index:

In [None]:
from pymongo import ASCENDING


result = db.profiles.create_index([("user_id", ASCENDING)], unique=True)
sorted(list(db.profiles.index_information()))

Notice that we have two indexes now: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id we just created.

Now let’s set up some user profiles:

In [None]:
user_profiles = [{"user_id": 211, "name": "Luke"}, {"user_id": 212, "name": "Ziltoid"}]
result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id is already in the collection:

In [None]:
new_profile = {"user_id": 213, "name": "Drew"}
duplicate_profile = {"user_id": 212, "name": "Tommy"}
result = db.profiles.insert_one(new_profile)  # This is fine.
result = db.profiles.insert_one(duplicate_profile)  # Raises DuplicateKeyError.