## MongoDB

In [1]:
import pymongo

from pymongo import MongoClient

## Establishing a Connection

In [2]:
client = MongoClient()
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

## Working with Databases, Collections, and Documents

In [3]:
db = client.rptutorials

In [4]:
db

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'rptutorials')

In [5]:
db = client['rptutorials']

*_NOTE_*: When you use the `mongo` shell, you have access to the database through the `db` global object. When you use `pymongo`, you can assign the database to the variable called `db` to get similar behaviour.

In [6]:
tutorial1 = {
    "title" : "Working With JSON Data in Python",
    "author" : "Lucas",
    "contributors" : [
        "Aldren",
        "Dan",
        "Joanna"
    ],
    "url" : "https://realpython.com/python-json/"
}

In [7]:
tutorial = db.tutorial
tutorial

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'rptutorials'), 'tutorial')

In [8]:
result = tutorial.insert_one(tutorial1)
result

<pymongo.results.InsertOneResult at 0x124c31a60>

In this case, `tutorial` is an instance of `Collection` and represents a physical collection of documents in your database. You can insert a document into `tutorial` by calling `.insert_one()` on it with the document as an argument.

In [9]:
print(f'One tutorial: {result.inserted_id}')

One tutorial: 6286ae480a31df0bdf40a239


Here, `.insert_one()` takes `tutorial1`, inserts it into the `tutorial` collection, and returns an `InsertOneResult` object. The object provides feedback on the inserted document. Note that since MongoDB generates the `ObjectId` dynamically, your output won't match the `ObjectId` shown above.
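One reason the ids differ between runs is that an `ObjectId` encodes its creation time: the first 4 of its 12 bytes hold a Unix timestamp in seconds. As a rough illustration using only the standard library, the helper `objectid_timestamp` below (a hypothetical name, not part of `PyMongo`) decodes that timestamp from the hexadecimal string printed above.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id):
    """Decode the creation time embedded in a 24-character ObjectId hex string.

    Hypothetical helper for illustration; it is not part of PyMongo.
    """
    if len(hex_id) != 24:
        raise ValueError("an ObjectId hex string has exactly 24 characters")
    # The first 8 hex digits (4 bytes) are seconds since the Unix epoch.
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("6286ae480a31df0bdf40a239")
print(created.isoformat())  # 2022-05-19T20:53:28+00:00
```

With `PyMongo` itself, the same value should be available as `result.inserted_id.generation_time`.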

In [10]:
tutorial2 = {
    "title" : "Python's Requests Library (Guide)",
    "author" : "Alex",
    "contributors" : [
        "Aldren",
        "Brad",
        "Joanna"
    ],
    "url" : "https://realpython.com/python-requests/"
}

tutorial3 = {
    "title" : "Object-Oriented Programming (OOP) in Python 3",
    "author": "David",
    "contributors": [
        "Aldren",
        "Joanna",
        "Jacob"
        ],
    "url": "https://realpython.com/python3-object-oriented-programming/"
}

In [11]:
new_result = tutorial.insert_many([tutorial2, tutorial3])

In [12]:
print(f'Multiple tutorials: {new_result.inserted_ids}')

Multiple tutorials: [ObjectId('6286ae480a31df0bdf40a23a'), ObjectId('6286ae480a31df0bdf40a23b')]


The call to `.insert_many()` takes an iterable of documents and inserts them into the `tutorial` collection in your `rptutorials` database. The method returns an instance of `InsertManyResult`, which provides information on the inserted documents.
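As a minimal sketch of how you might wrap this call, the function below assumes only that `collection` behaves like a `PyMongo` `Collection`. The name `insert_tutorials` is made up for this example. It guards against an empty list, because `.insert_many()` raises an error when it receives no documents.

```python
def insert_tutorials(collection, documents):
    """Insert several documents and return their generated ids as strings.

    `collection` is assumed to expose PyMongo's Collection.insert_many().
    """
    documents = list(documents)
    if not documents:
        # insert_many() rejects an empty list, so skip the call entirely.
        return []
    result = collection.insert_many(documents)
    # inserted_ids holds one id per document, in insertion order.
    return [str(inserted_id) for inserted_id in result.inserted_ids]
```

With the objects defined above, a call could look like `insert_tutorials(tutorial, [tutorial2, tutorial3])`.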

In [13]:
from pprint import pprint

In [14]:
for doc in tutorial.find():
    pprint(doc)

{'_id': ObjectId('62869a48be2eb911c3d96219'),
 'author': 'Jon',
 'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
 'title': 'Reading and Writing CSV Files in Python',
 'url': 'https://realpython.com/python-csv/'}
{'_id': ObjectId('62869b49be2eb911c3d9621a'),
 'author': 'Leodanis',
 'contributors': ['Aldren', 'Jim', 'Joanna'],
 'title': 'How to Iterate Through a Dictionary in Python',
 'url': 'https://realpython.com/iterate-through-dictionary-python/'}
{'_id': ObjectId('62869b49be2eb911c3d9621b'),
 'author': 'Joanna',
 'contributors': ['Adrianna', 'David', 'Dan', 'Jim', 'Pavel'],
 'title': "Python 3's f-Strings: An Improved String Formatting Syntax",
 'url': 'https://realpython.com/python-f-strings/'}
{'_id': ObjectId('62869ded2c4da9a620f36cc9'),
 'author': 'Lucas',
 'contributors': ['Aldren', 'Dan', 'Joanna'],
 'title': 'Working With JSON Data in Python',
 'url': 'https://realpython.com/python-json/'}
{'_id': ObjectId('62869e052c4da9a620f36ccb'),
 'author': 'Lucas',
 'contri

To retrieve documents from a collection, you can use `.find()`. Without arguments, `.find()` returns a `Cursor` object that yields every document in the collection on demand.

To retrieve specific documents instead, you can pass a dictionary of the field values that the documents must match.

In [15]:
jon_tutorial = tutorial.find_one({'author' : 'Jon'})
pprint(jon_tutorial)

{'_id': ObjectId('62869a48be2eb911c3d96219'),
 'author': 'Jon',
 'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
 'title': 'Reading and Writing CSV Files in Python',
 'url': 'https://realpython.com/python-csv/'}
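The same kind of filter dictionary also works with `.find()`, which returns every match rather than only the first one. The sketch below assumes that `collection` behaves like a `PyMongo` `Collection` and that each document has a `title` field. The function name `titles_by_author` is invented for this example.

```python
def titles_by_author(collection, author):
    """Return the titles of every document whose 'author' equals `author`."""
    # find() returns a cursor over all matches; find_one() would return only
    # the first match, or None when nothing matches.
    return [doc["title"] for doc in collection.find({"author": author})]
```

For example, `titles_by_author(tutorial, 'Lucas')` would list the titles of all tutorials written by Lucas.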


## Closing Connections

Establishing a connection to a `MongoDB` database is typically an expensive operation. If you have an application that constantly retrieves and manipulates data in a `MongoDB` database, then you probably don't want to open and close the connection all the time, since this might affect your application's performance.

In this kind of situation, you should keep your connection alive and only close it before exiting the application to release all the acquired resources.

In [16]:
client.close()

Another situation is when you have an application that only occasionally uses a `MongoDB` database. In this case, you might want to open the connection when needed and close it immediately after use to free the acquired resources. A consistent approach to this problem is to use the `with` statement.

In [17]:
with MongoClient() as client:
    
    db = client.rptutorials
    
    for doc in db.tutorial.find():
        pprint(doc)

{'_id': ObjectId('62869a48be2eb911c3d96219'),
 'author': 'Jon',
 'contributors': ['Aldren', 'Geir Arne', 'Joanna', 'Jason'],
 'title': 'Reading and Writing CSV Files in Python',
 'url': 'https://realpython.com/python-csv/'}
{'_id': ObjectId('62869b49be2eb911c3d9621a'),
 'author': 'Leodanis',
 'contributors': ['Aldren', 'Jim', 'Joanna'],
 'title': 'How to Iterate Through a Dictionary in Python',
 'url': 'https://realpython.com/iterate-through-dictionary-python/'}
{'_id': ObjectId('62869b49be2eb911c3d9621b'),
 'author': 'Joanna',
 'contributors': ['Adrianna', 'David', 'Dan', 'Jim', 'Pavel'],
 'title': "Python 3's f-Strings: An Improved String Formatting Syntax",
 'url': 'https://realpython.com/python-f-strings/'}
{'_id': ObjectId('62869ded2c4da9a620f36cc9'),
 'author': 'Lucas',
 'contributors': ['Aldren', 'Dan', 'Joanna'],
 'title': 'Working With JSON Data in Python',
 'url': 'https://realpython.com/python-json/'}
{'_id': ObjectId('62869e052c4da9a620f36ccb'),
 'author': 'Lucas',
 'contri

At the end of the `with` code block, the client's `.__exit__()` method gets called, which in turn closes the connection by calling `.close()`.
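To see the mechanism in isolation, here is a small sketch of the context manager protocol. `DemoClient` is a hypothetical stand-in class that only tracks whether it has been closed. It is not how `MongoClient` is implemented, but it follows the same pattern of calling `.close()` from `.__exit__()`.

```python
class DemoClient:
    """Hypothetical stand-in for a client; it only tracks whether it is closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # The value returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, even if the block raised.
        self.close()
        return False  # do not suppress exceptions


with DemoClient() as demo:
    inside = demo.closed  # False: still open inside the block

after = demo.closed  # True: __exit__() called close()
```

Because `.__exit__()` also runs when the block raises an exception, the resource is released on both the success path and the error path.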

## Using MongoDB with Python and MongoEngine

While `PyMongo` is powerful for interfacing with MongoDB, it may not be high-level enough for more complex projects. One library that provides a higher abstraction on top of PyMongo is `MongoEngine`. This is an _object-document mapper_ (ODM), which is roughly equivalent to an SQL-based _object-relational mapper_ (ORM). `MongoEngine` provides a class-based abstraction, so all the models you create are classes.

In [20]:
from mongoengine import connect, Document, ListField, StringField, URLField
connect(db='rptutorials', host='localhost', port=27017)

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary(), uuidrepresentation=3)

To create documents with `MongoEngine`, you first need to define what data you want the documents to have--otherwise known as the _document schema_. `MongoEngine` encourages you to define a document schema to help you reduce coding errors and to allow you to define utility or helper methods.

Similar to ORMs, ODMs like `MongoEngine` provide a base or model class for you to define a document schema. In ORMs, that class is equivalent to a table, and its instances are equivalent to rows. In ODMs, the class corresponds to a collection, and its instances correspond to documents.

In [21]:
class Tutorial(Document):
    title = StringField(required=True, max_length=70)
    author = StringField(required=True, max_length=20)
    contributors = ListField(StringField(max_length=20))
    url = URLField(required=True)

With this model, you tell `MongoEngine` that you expect a `Tutorial` document to have a `.title`, an `.author`, a list of `.contributors`, and a `.url`. The base class, `Document`, uses that information along with the field types to validate the input data for you.

> *_Note_*: One of the more difficult tasks with database models is **data validation**. How do you make sure that the input data conforms to your format requirements? That's one of the reasons for you to have a coherent and uniform document schema.
>
>`MongoDB` is said to be a schemaless database, but that doesn't mean it's schema-free. Having documents with a different schema within the same collection can lead to processing errors and inconsistent behaviour.

Now, if you try to save a `Tutorial` object without a `.title`, then your model raises an exception and lets you know. You can take this even further and add more restrictions through field parameters, such as a maximum length for the `.title`.

Here are some of the general parameters that fields accept:
- `db_field` specifies a different field name to use in the database
- `required` ensures that the field is provided
- `default` provides a default value for a given field if no value is given
- `unique` ensures that no other document in the collection has the same value for this field

> Each specific field type also has its own set of parameters.
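To make the idea of schema validation concrete, here is a minimal plain-Python sketch of the kind of checks that `required` and `max_length` imply for the `Tutorial` model. The helper `validate_tutorial` and its `rules` table are hypothetical. They only imitate the behaviour for illustration and are not `MongoEngine`'s actual implementation.

```python
def validate_tutorial(data):
    """Return a list of error messages for a tutorial-like dictionary.

    Hypothetical helper: it imitates the `required` and `max_length` checks
    declared in the Tutorial model, not MongoEngine's real validation code.
    """
    rules = {
        "title": {"required": True, "max_length": 70},
        "author": {"required": True, "max_length": 20},
        "url": {"required": True},
    }
    errors = []
    for field, rule in rules.items():
        value = data.get(field)
        if value is None:
            if rule.get("required"):
                errors.append(f"Field is required: {field}")
            continue
        max_length = rule.get("max_length")
        if max_length is not None and len(value) > max_length:
            errors.append(f"{field} is longer than {max_length} characters")
    return errors


problems = validate_tutorial({"author": "Alex", "url": "https://realpython.com/"})
print(problems)  # ['Field is required: title']
```

Declaring the rules once, next to the fields they apply to, is what lets an ODM run checks like these for every document before it reaches the database.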

To save a document to your database, you need to call `.save()` on a document object. If the document already exists, then all the changes will be applied to the existing document. If the document doesn't exist, then it'll be created.

In [22]:
tutorial1 = Tutorial(
    title = 'Beatiful Soup: Build a Web Scraper with Python',
    author = 'Martin',
    contributors = [
        'Aldren',
        'Geir Arne',
        'Jaya',
        'Joanna',
        'Mike'
    ],
    url = 'https://realpython.com/beautiful-soup-web-scraper-python'
)

In [23]:
# insert the new tutorial
tutorial1.save()

<Tutorial: Tutorial object>

By default, `.save()` inserts the new document into a collection named after the model class `Tutorial`, except using lowercase letters. In this case, the collection name is `tutorial`, which matches the collection you've been using to save the other tutorials.

`MongoEngine` performs **data validation** when you call `.save()`. This means that it checks the input data against the schema you declared in the `Tutorial` model class. If the input data violates the schema or any of its constraints, then you get an exception, and the data isn't saved into the database.

In [26]:
# tutorial2 does not specify the field `title`, but the class `Tutorial` requires it
tutorial2 = Tutorial()
tutorial2.author = "Alex"
tutorial2.contributors = ["Aldren", "Jon", "Joanna"]
tutorial2.url = "https://realpython.com/convert-python-string-to-int/"
tutorial2.save()

ValidationError: ValidationError (Tutorial:None) (Field is required: ['title'])

Each `Document` subclass has an `.objects` attribute that you can use to access the documents in the associated collection. You can also call `.objects()` with keyword arguments to filter the documents by field value.

In [27]:
for doc in Tutorial.objects:
    print(doc.title)

Reading and Writing CSV Files in Python
How to Iterate Through a Dictionary in Python
Python 3's f-Strings: An Improved String Formatting Syntax
Working With JSON Data in Python
Working With JSON Data in Python
Python's Requests Library (Guide)
Object-Oriented Programming (OOP) in Python 3
Working With JSON Data in Python
Python's Requests Library (Guide)
Object-Oriented Programming (OOP) in Python 3
Working With JSON Data in Python
Python's Requests Library (Guide)
Object-Oriented Programming (OOP) in Python 3
Working With JSON Data in Python
Python's Requests Library (Guide)
Object-Oriented Programming (OOP) in Python 3
Beatiful Soup: Build a Web Scraper with Python


In [28]:
for doc in Tutorial.objects(author='Alex'):
    print(doc.title)

Python's Requests Library (Guide)
Python's Requests Library (Guide)
Python's Requests Library (Guide)
Python's Requests Library (Guide)
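The repeated titles in the output above come from duplicate documents in the collection. As a rough sketch, the helper below counts how often each title occurs among the matching documents. It assumes that `model` behaves like a `MongoEngine` `Document` subclass whose documents have a `.title` attribute. The name `title_counts` is invented for this example.

```python
from collections import Counter


def title_counts(model, **filters):
    """Count how often each title occurs among the matching documents."""
    # model.objects(**filters) mirrors the Tutorial.objects(author='Alex')
    # call shown above.
    return Counter(doc.title for doc in model.objects(**filters))
```

For the data shown above, `title_counts(Tutorial, author='Alex')` would report a count of 4 for "Python's Requests Library (Guide)".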
