### Accessing your MongoDB Atlas cluster through PyMongo

Make sure PyMongo is installed on your machine (for example, with `pip install pymongo`).
See [reference](https://pypi.org/project/pymongo/).

In [65]:
# importing necessary libraries
import pymongo

In [66]:
# creating a connection to your cluster and database
# Check the connection tab in MongoDB Atlas for the connection string to use for your cluster.
# Remember to check the Network Access tab and add your current IP address *EVERY TIME* you change your network connection.
# Important parameters: username, password, cluster name, and database name
# Replace <username> and <password> with your database user's credentials; avoid hard-coding real passwords in a notebook.
myclient = pymongo.MongoClient("mongodb+srv://<username>:<password>@cluster0-meb.u4vks.mongodb.net/ST207_TEST?retryWrites=true&w=majority")
myclient

MongoClient(host=['cluster0-meb-shard-00-01.u4vks.mongodb.net:27017', 'cluster0-meb-shard-00-00.u4vks.mongodb.net:27017', 'cluster0-meb-shard-00-02.u4vks.mongodb.net:27017'], document_class=dict, tz_aware=False, connect=True, retrywrites=True, w='majority', authsource='admin', replicaset='atlas-b40p0x-shard-0', ssl=True)
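If your username or password contains reserved characters such as `@`, `:` or `/`, they must be percent-escaped before being placed in the URI. The PyMongo documentation recommends `urllib.parse.quote_plus` for this. Below is a minimal sketch. The `build_atlas_uri` helper and the credentials are hypothetical. The host and database name come from the connection string above.

```python
from urllib.parse import quote_plus


def build_atlas_uri(user: str, password: str, host: str, db: str) -> str:
    """Build a mongodb+srv URI, percent-escaping the credentials (hypothetical helper)."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/{db}"
        "?retryWrites=true&w=majority"
    )


# hypothetical credentials containing reserved characters
uri = build_atlas_uri("student", "p@ss:word/1", "cluster0-meb.u4vks.mongodb.net", "ST207_TEST")
print(uri)
```

You would then pass the resulting string to `pymongo.MongoClient(uri)`, as in the cell above.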

In [67]:
# We can retrieve the list of existing databases in the cluster.
# Even though the connection string names a specific database (ST207_TEST),
# we can still list all the databases and switch to any of them.
print(myclient.list_database_names())

['ST207_TEST', 'sample_airbnb', 'sample_analytics', 'sample_geospatial', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'admin', 'local']


In [68]:
# accessing a database by name opens it if it already exists; otherwise MongoDB
# creates it lazily, when the first document is inserted into it
# NewDB is the name of your database
mydb = myclient["NewDB"] 

In [69]:
# we can create a new collection (like the database, it is only created when its first document is inserted)
mycol = mydb["customers"] 

In [70]:
# creating an example document
newDoc = { "userId" : 1, "name": "Peter Pan", "address": "Highway 37" }

In [71]:
# inserting one document into the collection
x = mycol.insert_one(newDoc)
# checking the object id for the new document
print("Object ID for the new document:", x.inserted_id)

Object ID for the new document: 619fe9918fb9a2fe042f255b
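An ObjectId is 12 bytes long, and its first 4 bytes are a big-endian Unix timestamp in seconds. This means the id printed above also records when it was generated. PyMongo exposes this as `ObjectId.generation_time`. The sketch below decodes it by hand from the hex string, using only the standard library. `objectid_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the UTC creation time stored in the first 4 bytes of an ObjectId (hypothetical helper)."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("619fe9918fb9a2fe042f255b")
print(created)  # 2021-11-25 19:52:49+00:00
```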


The most basic type of query that can be performed in MongoDB is `find_one()`. This method returns a single document matching a query (or `None` if there are no matches). It is useful when you know there is only one matching document, or when you are only interested in the first match. Here we use `find_one()` to get the first document from the `customers` collection:

In [72]:
# retrieving the first document from our collection
person1 = mycol.find_one()
print("Document:", person1)

Document: {'_id': ObjectId('619fe9918fb9a2fe042f255b'), 'userId': 1, 'name': 'Peter Pan', 'address': 'Highway 37'}
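To make the "first match or `None`" behaviour concrete, here is a rough pure-Python model of `find_one()` over a list of dictionaries. It is a hypothetical helper, not PyMongo's method. It only handles plain equality filters, and it ignores query operators, projections and sort order.

```python
from typing import Any, Optional


def find_one(docs: list[dict[str, Any]],
             query: Optional[dict[str, Any]] = None) -> Optional[dict[str, Any]]:
    """Return the first document whose fields equal every value in query, else None.

    Simplification: unlike MongoDB, a None value in query does not match a missing field.
    """
    query = query or {}  # an empty filter matches every document
    for doc in docs:
        if all(field in doc and doc[field] == value for field, value in query.items()):
            return doc
    return None


customers = [{"userId": 1, "name": "Peter Pan", "address": "Highway 37"}]
print(find_one(customers))                     # the Peter Pan document
print(find_one(customers, {"name": "Wendy"}))  # None
```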


In [73]:
# bulk insert with insert_many()
# we can insert several documents in one call, and each document can have a slightly different structure (e.g., only some have an email field)

import datetime

newDocs = [{"userId": 10,
            "name": "Mike",
            "address": "Street One, 12",
            "date": datetime.datetime(2021, 11, 24, 11, 0)},
           {"userId": 11,
            "name": "Eliot",
            "address": "Street Two, 20",
            "email": "eliot@someplace.com",
            "date": datetime.datetime(2021, 11, 25, 11, 10)},
           {"userId": 12,
            "name": "Mary",
            "address": "Street Two, 45",
            "email": "mary@someplace2.com",
            "date": datetime.datetime(2021, 11, 25, 11, 10)},
           {"userId": 13,
            "name": "Ana",
            "address": "High Street, 200",
            "date": datetime.datetime(2021, 11, 26, 11, 20)},
           {"userId": 14,
            "name": "Billy",
            "address": "Street One, 50",
            "email": "billy@someplace2.com",
            "date": datetime.datetime(2021, 11, 27, 11, 10)},
           {"userId": 15,
            "name": "Karl",
            "address": "Street Two, 2001",
            "date": datetime.datetime(2021, 11, 27, 11, 20)},
           {"userId": 16,
            "name": "Bia",
            "address": "Street One, 5000",
            "email": "bia@someplace2.com",
            "date": datetime.datetime(2021, 11, 28, 11, 10)}]
result = mycol.insert_many(newDocs)
result.inserted_ids

[ObjectId('619fe9938fb9a2fe042f255c'),
 ObjectId('619fe9938fb9a2fe042f255d'),
 ObjectId('619fe9938fb9a2fe042f255e'),
 ObjectId('619fe9938fb9a2fe042f255f'),
 ObjectId('619fe9938fb9a2fe042f2560'),
 ObjectId('619fe9938fb9a2fe042f2561'),
 ObjectId('619fe9938fb9a2fe042f2562')]

In [74]:
# retrieving a particular document based on some search criteria
import pprint

pprint.pprint(mycol.find_one({"name": "Ana"}))

{'_id': ObjectId('619fe9938fb9a2fe042f255f'),
 'address': 'High Street, 200',
 'date': datetime.datetime(2021, 11, 26, 11, 20),
 'name': 'Ana',
 'userId': 13}


In [75]:
# we can also query by ObjectId
# here we use the _id of the first document inserted into the collection (x holds the result of insert_one above)
pprint.pprint(mycol.find_one({"_id": x.inserted_id}))

{'_id': ObjectId('619fe9918fb9a2fe042f255b'),
 'address': 'Highway 37',
 'name': 'Peter Pan',
 'userId': 1}


In [76]:
# retrieving all documents from a collection (find() with no filter)
for n in mycol.find():
    pprint.pprint(n)

{'_id': ObjectId('619fe9918fb9a2fe042f255b'),
 'address': 'Highway 37',
 'name': 'Peter Pan',
 'userId': 1}
{'_id': ObjectId('619fe9938fb9a2fe042f255c'),
 'address': 'Street One, 12',
 'date': datetime.datetime(2021, 11, 24, 11, 0),
 'name': 'Mike',
 'userId': 10}
{'_id': ObjectId('619fe9938fb9a2fe042f255d'),
 'address': 'Street Two, 20',
 'date': datetime.datetime(2021, 11, 25, 11, 10),
 'email': 'eliot@someplace.com',
 'name': 'Eliot',
 'userId': 11}
{'_id': ObjectId('619fe9938fb9a2fe042f255e'),
 'address': 'Street Two, 45',
 'date': datetime.datetime(2021, 11, 25, 11, 10),
 'email': 'mary@someplace2.com',
 'name': 'Mary',
 'userId': 12}
{'_id': ObjectId('619fe9938fb9a2fe042f255f'),
 'address': 'High Street, 200',
 'date': datetime.datetime(2021, 11, 26, 11, 20),
 'name': 'Ana',
 'userId': 13}
{'_id': ObjectId('619fe9938fb9a2fe042f2560'),
 'address': 'Street One, 50',
 'date': datetime.datetime(2021, 11, 27, 11, 10),
 'email': 'billy@someplace2.com',
 'name': 'Billy',
 'userId': 14}


In [77]:
# counting documents (an empty filter counts every document in the collection)
mycol.count_documents({})

8
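`count_documents()` accepts any query filter. For example, `mycol.count_documents({"email": {"$exists": True}})` counts only the customers that have an `email` field. The plain-Python sketch below reproduces both counts over the eight customers inserted so far. Fields that are not needed have been trimmed, so it runs without a database.

```python
# the eight customers inserted so far (only the fields needed here)
customers = [
    {"userId": 1, "name": "Peter Pan"},
    {"userId": 10, "name": "Mike"},
    {"userId": 11, "name": "Eliot", "email": "eliot@someplace.com"},
    {"userId": 12, "name": "Mary", "email": "mary@someplace2.com"},
    {"userId": 13, "name": "Ana"},
    {"userId": 14, "name": "Billy", "email": "billy@someplace2.com"},
    {"userId": 15, "name": "Karl"},
    {"userId": 16, "name": "Bia", "email": "bia@someplace2.com"},
]

total = len(customers)                                      # count_documents({})
with_email = sum(1 for doc in customers if "email" in doc)  # count_documents({"email": {"$exists": True}})
print(total, with_email)  # 8 4
```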

In [78]:
# range queries
# we want all documents whose date field is earlier than 25/11/2021 12:00, sorted by name
d = datetime.datetime(2021, 11, 25, 12)
for n in mycol.find({"date": {"$lt": d}}).sort("name"):
    pprint.pprint(n)

{'_id': ObjectId('619fe9938fb9a2fe042f255d'),
 'address': 'Street Two, 20',
 'date': datetime.datetime(2021, 11, 25, 11, 10),
 'email': 'eliot@someplace.com',
 'name': 'Eliot',
 'userId': 11}
{'_id': ObjectId('619fe9938fb9a2fe042f255e'),
 'address': 'Street Two, 45',
 'date': datetime.datetime(2021, 11, 25, 11, 10),
 'email': 'mary@someplace2.com',
 'name': 'Mary',
 'userId': 12}
{'_id': ObjectId('619fe9938fb9a2fe042f255c'),
 'address': 'Street One, 12',
 'date': datetime.datetime(2021, 11, 24, 11, 0),
 'name': 'Mike',
 'userId': 10}
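The cutoff is noon on 25/11/2021, so Eliot and Mary (11:10 that day) are included along with Mike. Peter Pan has no `date` field, so the `$lt` comparison does not match him. The sketch below is a pure-Python equivalent of the filter and the `.sort("name")` call, using the same data. It is meant only to make the semantics explicit, not to replace the query.

```python
from datetime import datetime

customers = [
    {"userId": 1, "name": "Peter Pan"},  # no "date" field
    {"userId": 10, "name": "Mike", "date": datetime(2021, 11, 24, 11, 0)},
    {"userId": 11, "name": "Eliot", "date": datetime(2021, 11, 25, 11, 10)},
    {"userId": 12, "name": "Mary", "date": datetime(2021, 11, 25, 11, 10)},
    {"userId": 13, "name": "Ana", "date": datetime(2021, 11, 26, 11, 20)},
    {"userId": 14, "name": "Billy", "date": datetime(2021, 11, 27, 11, 10)},
    {"userId": 15, "name": "Karl", "date": datetime(2021, 11, 27, 11, 20)},
    {"userId": 16, "name": "Bia", "date": datetime(2021, 11, 28, 11, 10)},
]

cutoff = datetime(2021, 11, 25, 12)
# {"date": {"$lt": cutoff}}: documents without a "date" field do not match
matches = [doc for doc in customers if "date" in doc and doc["date"] < cutoff]
# .sort("name") sorts ascending by name
for doc in sorted(matches, key=lambda d: d["name"]):
    print(doc["name"], doc["date"])
```

To sort the other way in PyMongo, pass a direction, e.g. `.sort("date", pymongo.DESCENDING)`.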


In [79]:
# regular expressions
# all addresses starting with S
myquery = { "address": { "$regex": "^S" } }

mydoc = mycol.find(myquery)

for x in mydoc:
    print(x)

{'_id': ObjectId('619fe9938fb9a2fe042f255c'), 'userId': 10, 'name': 'Mike', 'address': 'Street One, 12', 'date': datetime.datetime(2021, 11, 24, 11, 0)}
{'_id': ObjectId('619fe9938fb9a2fe042f255d'), 'userId': 11, 'name': 'Eliot', 'address': 'Street Two, 20', 'email': 'eliot@someplace.com', 'date': datetime.datetime(2021, 11, 25, 11, 10)}
{'_id': ObjectId('619fe9938fb9a2fe042f255e'), 'userId': 12, 'name': 'Mary', 'address': 'Street Two, 45', 'email': 'mary@someplace2.com', 'date': datetime.datetime(2021, 11, 25, 11, 10)}
{'_id': ObjectId('619fe9938fb9a2fe042f2560'), 'userId': 14, 'name': 'Billy', 'address': 'Street One, 50', 'email': 'billy@someplace2.com', 'date': datetime.datetime(2021, 11, 27, 11, 10)}
{'_id': ObjectId('619fe9938fb9a2fe042f2561'), 'userId': 15, 'name': 'Karl', 'address': 'Street Two, 2001', 'date': datetime.datetime(2021, 11, 27, 11, 20)}
{'_id': ObjectId('619fe9938fb9a2fe042f2562'), 'userId': 16, 'name': 'Bia', 'address': 'Street One, 5000', 'email': 'bia@someplace2
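MongoDB's `$regex` uses Perl-compatible syntax and matches anywhere in the string unless the pattern is anchored. For a simple pattern like `^S`, Python's `re.search` behaves the same way, so the sketch below reproduces which customers match. It also shows a case-insensitive variant, which corresponds to `{"$regex": "^s", "$options": "i"}`. PyMongo also accepts a compiled `re` pattern directly as the filter value.

```python
import re

# name -> address for every customer in the collection at this point
addresses = {
    "Peter Pan": "Highway 37",
    "Mike": "Street One, 12",
    "Eliot": "Street Two, 20",
    "Mary": "Street Two, 45",
    "Ana": "High Street, 200",
    "Billy": "Street One, 50",
    "Karl": "Street Two, 2001",
    "Bia": "Street One, 5000",
}

pattern = re.compile("^S")  # same pattern as {"$regex": "^S"}
starts_with_s = [name for name, addr in addresses.items() if pattern.search(addr)]
print(starts_with_s)  # ['Mike', 'Eliot', 'Mary', 'Billy', 'Karl', 'Bia']

# case-insensitive variant, like {"$regex": "^s", "$options": "i"}
pattern_ci = re.compile("^s", re.IGNORECASE)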

In [80]:
# updating data

# filter condition
myquery = { "address": "High Street, 200" }
# update rule
newvalues = { "$set": { "address": "High Street, 2000" } }

# update_one modifies only the first matching document; update_many modifies all matching documents
mycol.update_many(myquery, newvalues)

# print "customers" after the update:
for x in mycol.find():
    print(x)

{'_id': ObjectId('619fe9918fb9a2fe042f255b'), 'userId': 1, 'name': 'Peter Pan', 'address': 'Highway 37'}
{'_id': ObjectId('619fe9938fb9a2fe042f255c'), 'userId': 10, 'name': 'Mike', 'address': 'Street One, 12', 'date': datetime.datetime(2021, 11, 24, 11, 0)}
{'_id': ObjectId('619fe9938fb9a2fe042f255d'), 'userId': 11, 'name': 'Eliot', 'address': 'Street Two, 20', 'email': 'eliot@someplace.com', 'date': datetime.datetime(2021, 11, 25, 11, 10)}
{'_id': ObjectId('619fe9938fb9a2fe042f255e'), 'userId': 12, 'name': 'Mary', 'address': 'Street Two, 45', 'email': 'mary@someplace2.com', 'date': datetime.datetime(2021, 11, 25, 11, 10)}
{'_id': ObjectId('619fe9938fb9a2fe042f255f'), 'userId': 13, 'name': 'Ana', 'address': 'High Street, 2000', 'date': datetime.datetime(2021, 11, 26, 11, 20)}
{'_id': ObjectId('619fe9938fb9a2fe042f2560'), 'userId': 14, 'name': 'Billy', 'address': 'Street One, 50', 'email': 'billy@someplace2.com', 'date': datetime.datetime(2021, 11, 27, 11, 10)}
{'_id': ObjectId('619fe99
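Only Ana matches the filter above, so `update_one` and `update_many` would give the same result here. The difference shows up when several documents match. PyMongo reports this through the `matched_count` and `modified_count` attributes of the returned `UpdateResult`. The sketch below is a hypothetical in-memory model of a `$set` update, not PyMongo code. It uses Eliot and Gracie, who share the address "Street Two, 20" once Gracie is inserted later on. The new address "Street Two, 21" is made up for illustration.

```python
import copy
from typing import Any


def update(docs: list[dict[str, Any]], query: dict[str, Any],
           set_fields: dict[str, Any], many: bool) -> tuple[int, int]:
    """Apply a $set-style update in place; return (matched, modified) (hypothetical model)."""
    matched = modified = 0
    for doc in docs:
        # plain equality filter on every field in query
        if all(k in doc and doc[k] == v for k, v in query.items()):
            matched += 1
            # a document counts as modified only if some value actually changes
            if any(k not in doc or doc[k] != v for k, v in set_fields.items()):
                modified += 1
            doc.update(set_fields)
            if not many:  # update_one stops after the first match
                break
    return matched, modified


base = [
    {"name": "Eliot", "address": "Street Two, 20"},
    {"name": "Gracie", "address": "Street Two, 20"},
]
query = {"address": "Street Two, 20"}
new_values = {"address": "Street Two, 21"}  # hypothetical new address

docs_one = copy.deepcopy(base)
docs_many = copy.deepcopy(base)
print(update(docs_one, query, new_values, many=False))  # (1, 1)
print(update(docs_many, query, new_values, many=True))  # (2, 2)
```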

In [81]:
# Aggregations

# we can create a new collection
mycol2 = mydb["orders"] 

result = mycol2.insert_many( [
   { "_id" : 0, "productName" : "Steel beam", "status" : "new", "quantity" : 10 },
   { "_id" : 1, "productName" : "Steel beam", "status" : "urgent", "quantity" : 20 },
   { "_id" : 2, "productName" : "Steel beam", "status" : "urgent", "quantity" : 30 },
   { "_id" : 3, "productName" : "Iron rod", "status" : "new", "quantity" : 15 },
   { "_id" : 4, "productName" : "Iron rod", "status" : "urgent", "quantity" : 50 },
   { "_id" : 5, "productName" : "Iron rod", "status" : "urgent", "quantity" : 10 }
] )

result.inserted_ids

[0, 1, 2, 3, 4, 5]

The aggregation pipeline in the next cell has two stages. The `$match` stage:

* Filters the documents to those whose `status` is `"urgent"`.
* Outputs the filtered documents to the `$group` stage.

The `$group` stage:

* Groups the input documents by `productName`.
* Uses `$sum` to calculate the total quantity for each `productName`, which is stored in the `sumQuantity` field returned by the aggregation pipeline.

In [88]:
result = mycol2.aggregate([
   { "$match": { "status": "urgent" } },
   { "$group": { "_id" : "$productName", "sumQuantity": { "$sum": "$quantity" } } }
])

for x in result:
    pprint.pprint(x)

{'_id': 'Iron rod', 'sumQuantity': 60}
{'_id': 'Steel beam', 'sumQuantity': 50}
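As a cross-check on the output above (20 + 30 = 50 for steel beams and 50 + 10 = 60 for iron rods), here is a plain-Python sketch of what the two stages compute. It runs over the same six orders. Note that `$group` does not guarantee any output order. If the order matters, add a stage such as `{"$sort": {"_id": 1}}` to the pipeline.

```python
from collections import defaultdict

orders = [
    {"_id": 0, "productName": "Steel beam", "status": "new", "quantity": 10},
    {"_id": 1, "productName": "Steel beam", "status": "urgent", "quantity": 20},
    {"_id": 2, "productName": "Steel beam", "status": "urgent", "quantity": 30},
    {"_id": 3, "productName": "Iron rod", "status": "new", "quantity": 15},
    {"_id": 4, "productName": "Iron rod", "status": "urgent", "quantity": 50},
    {"_id": 5, "productName": "Iron rod", "status": "urgent", "quantity": 10},
]

# $match: keep only the urgent orders
urgent = [order for order in orders if order["status"] == "urgent"]

# $group by productName, with $sum over quantity
sum_quantity: dict[str, int] = defaultdict(int)
for order in urgent:
    sum_quantity[order["productName"]] += order["quantity"]

pipeline_result = [{"_id": name, "sumQuantity": total} for name, total in sum_quantity.items()]
print(pipeline_result)
```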


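MongoDB supports [different types of indexes](https://docs.mongodb.com/manual/indexes/), including single-field, compound, multikey, text, geospatial, and hashed indexes. Below we create a unique single-field index on `userId`, so that no two customers can share the same `userId`.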

In [83]:
# indexes

result = mydb.customers.create_index([('userId', pymongo.ASCENDING)], unique=True)
sorted(list(mydb.customers.index_information()))

['_id_', 'userId_1']
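The name `userId_1` follows MongoDB's default naming scheme, which joins each indexed field with its direction using underscores. You can override it with the `name` option of `create_index`. The sketch below reproduces the default scheme. `default_index_name` is a hypothetical helper, and the compound index on `name` and `date` is only an illustration.

```python
from typing import Any

ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def default_index_name(keys: list[tuple[str, Any]]) -> str:
    """Join each field and its direction with underscores (hypothetical helper)."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


print(default_index_name([("userId", ASCENDING)]))                      # userId_1
print(default_index_name([("name", ASCENDING), ("date", DESCENDING)]))  # name_1_date_-1
```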

In [84]:
# inserting some data 
newDocs = [{"userId": 200,
            "name": "Steve",
            "address": "Street One, 120",
            "date": datetime.datetime(2021, 11, 27, 11, 0)},
           {"userId": 211,
            "name": "Gracie",
            "address": "Street Two, 20",
            "email": "gracie@otherplace.com",
            "date": datetime.datetime(2021, 11, 27, 11, 10)},
           {"userId": 212,
            "name": "Penny",
            "address": "High Street, 200",
            "email": "penny@someplace2.com",
            "date": datetime.datetime(2021, 11, 28, 11, 10)}]
result = mycol.insert_many(newDocs)
result.inserted_ids

[ObjectId('619fe9a88fb9a2fe042f2563'),
 ObjectId('619fe9a88fb9a2fe042f2564'),
 ObjectId('619fe9a88fb9a2fe042f2565')]

In [86]:
# userId 200 already belongs to Steve, so the unique index on userId rejects this insert
newDoc = {"userId": 200, "name": "Bruce", "address": "Street Four, 120", "date": datetime.datetime(2021, 11, 28, 11, 0)}

result = mycol.insert_one(newDoc)
result.inserted_id

DuplicateKeyError: E11000 duplicate key error collection: NewDB.customers index: userId_1 dup key: { userId: 200 }, full error: {'index': 0, 'code': 11000, 'keyPattern': {'userId': 1}, 'keyValue': {'userId': 200}, 'errmsg': 'E11000 duplicate key error collection: NewDB.customers index: userId_1 dup key: { userId: 200 }'}
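In PyMongo you would normally wrap the insert in `try`/`except pymongo.errors.DuplicateKeyError`. That way, a rejected document does not stop the rest of the notebook. The sketch below is a hypothetical in-memory model of the rule the unique index enforces. It is not how MongoDB implements indexes. The model also reflects one documented detail: a document that lacks the indexed field is indexed as `null`, so only one such document can be stored under a plain unique index.

```python
from typing import Any


class DuplicateKey(Exception):
    """Raised when an indexed value is inserted a second time (hypothetical)."""


class UniqueIndexedCollection:
    """Hypothetical in-memory collection with a unique index on a single field."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.docs: list[dict[str, Any]] = []
        self._seen: set[Any] = set()

    def insert_one(self, doc: dict[str, Any]) -> None:
        value = doc.get(self.key)  # a missing field is treated as null (None)
        if value in self._seen:
            raise DuplicateKey(f"dup key: {{ {self.key}: {value!r} }}")
        self._seen.add(value)
        self.docs.append(doc)


customers = UniqueIndexedCollection("userId")
customers.insert_one({"userId": 200, "name": "Steve"})
try:
    customers.insert_one({"userId": 200, "name": "Bruce"})
except DuplicateKey as err:
    print("rejected:", err)  # rejected: dup key: { userId: 200 }
```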