<img src="https://s3.amazonaws.com/edu-static.mongodb.com/lessons/M220/notebook_assets/screen_align.png" style="margin: 0 auto;">


<h1 style="text-align: center; font-size: 58px;">Writes with Error Handling</h1>

At this point we've gotten pretty comfortable writing data to MongoDB, creating and updating documents with different durabilities. We've even configured the driver to change the way our writes are perceived. But there are still times when the writes we send to the server will result in an error, and we've only briefly discussed the way our application can deal with these errors.

In this lesson we're gonna encounter some of the basic errors in the PyMongo driver, and look at how to handle these errors in a way that makes our application more consistent and reliable.

In [None]:
from pymongo import MongoClient, errors
uri = "mongodb+srv://m220-user:m220-pass@m220-lessons-mcxlm.mongodb.net/test"
mc = MongoClient(uri)
lessons = mc.lessons
shipments = lessons.shipments

So here's a URI string connecting to our Atlas cluster, and I've initialized a client with that string.

We're using a new collection called `shipments`, and the scenario for this lesson is that our application is a clothing manufacturer that also handles the shipping for their clothing items.

In [None]:
import time
import random
from pprint import pprint

shipments.drop()

cities = [ "Atlanta", "New York", "Miami", "Chicago", "Los Angeles", "Seattle", "Dallas" ]
products = [ "shoes", "pants", "shirts", "hats", "socks" ]
quantities = [ 10, 20, 40, 80, 160, 320, 640, 1280, 2560 ]
docs = []

for truck_id in range(30):
    source = random.choice(cities)
    destination = random.choice([c for c in cities if c != source])
    product = random.choice(products)
    quantity = random.choice(quantities)
    
    doc = {
        "truck_id": truck_id,
        "source": source,
        "destination": destination,
        "product": product,
        "quantity": quantity
    }
    
    docs.append(doc)

In [None]:
insert_response = shipments.insert_many(docs)
shipments.count_documents({})

This is a short script that's gonna create some test data for the clothing manufacturer. This is included in this notebook so you can test it out yourself.

You can see the documents we're producing have 5 fields each, with this `truck_id` determined by the iteration of our loop (point to `truck_id`). The (point) source and destination are both derived from this `cities` array, and this (point to `destination`) part will make sure that the destination city is different from the source.

Each shipment also has a product and a quantity, but the part we're gonna focus on is this (point) `truck_id` field. This is gonna record the truck currently allocated for this shipment, so that truck can be considered unavailable for any other shipments. This way when a new shipment comes in, we can make sure the truck that gets assigned to that shipment isn't already doing another one.

In [None]:
shipments.find_one()

Right now this loop only has 30 iterations (point to the loop) so we have exactly 30 documents in the `shipments` collection. If we take a look at one of them...

(run command)

Then we can see that they each have these five fields, plus the `_id` that MongoDB added on insert. The assumption I'm making for this data is that, while this (point) document exists in the collection, the shipment is still ongoing. When this shipment is complete, we would delete this document, or maybe add a flag to the document like `{ "completed": True }`.

This means that 30 documents in the `shipments` collection means 30 shipments that are happening right now. And if we try to insert a new shipment, it has to have a unique `truck_id` (point). This way each truck is only assigned to one shipment at a time.

In [None]:
shipments.create_index("truck_id", unique=True)

So this is the way we're gonna enforce uniqueness among these `truck_id`s. This is called a unique index, which will create an index on the `truck_id` field, and also make sure that there are no duplicate `truck_id`s.

(enter command)

And it created this index called `truck_id_1`, the 1 meaning that the index is sorted in ascending order.
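One thing worth knowing here: building a unique index fails if the collection already holds duplicate values for that field. Our generated data is safe because `truck_id` comes straight from the loop counter, but with other data a quick pre-check can help. Here's a minimal sketch that runs on an in-memory list of documents rather than on the server; `find_duplicates` is a hypothetical helper, not part of PyMongo, and the sample documents are made up for illustration.

```python
from collections import Counter


def find_duplicates(docs, field):
    """Return the values of `field` that appear in more than one document."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return sorted(value for value, n in counts.items() if n > 1)


# Hand-written sample data shaped like the shipment documents.
sample = [
    {"truck_id": 0, "product": "socks"},
    {"truck_id": 1, "product": "hats"},
    {"truck_id": 1, "product": "shoes"},  # same truck as the previous shipment
]

print(find_duplicates(sample, "truck_id"))  # [1]
```

An empty list means a unique index on that field should build cleanly for this data.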

In [None]:
doc = {
    "source": "New York",
    "destination": "Atlanta",
    "truck_id": 4,
    "product": "socks",
    "quantity": 40
}

try:
    res = shipments.insert_one(doc)
    print(res.inserted_id)
except errors.DuplicateKeyError:
    truck_id = doc["truck_id"]
    print(f"Truck #{truck_id} is currently performing a shipment. Please select another truck.")

Here's an example of a shipment being added to our collection. We want to ship 40 socks from New York to Atlanta, and we've chosen to assign truck 4 to perform this shipment. This (point) truck number could have been user input, or something determined by our application, but either way this is going to cause a `DuplicateKeyError` because we already have a shipment assigned to truck (point) 4.

(enter command)

So using the try-except block, our program prints out a message when a `DuplicateKeyError` is raised. The message tells us that the truck we wanted to use has already been sent out. So the application allows the insert to fail, and then sends a message up to the user asking them to choose another truck.

In [None]:
import string

trucks = lessons.trucks
trucks.drop()

trucks.insert_many([
    { "_id": i, "license": "".join(random.choice(string.ascii_uppercase + string.digits) for _ in range(7)) } for i in range(50)
])
trucks.count_documents({})

But we can actually be a little more proactive in handling this error, if we know about the other trucks that are available for this job.

Here's a new collection called `trucks`, which we're gonna use to find another available truck. This should insert exactly 50 documents into this collection...

(enter command)

And it looks like it worked.

In [None]:
trucks.find_one()

The documents each only have two fields: an `_id` from 0 to 49 (point), which will relate to the `truck_id` from the `shipments` collection. And I've assigned a random string of 7 uppercase letters and numbers to be the license plate number, although actually some US states only allow 6 characters.

In [None]:
doc = {
    "source": "New York",
    "destination": "Atlanta",
    "truck_id": 4,
    "product": "socks",
    "quantity": 40
}

try:
    res = shipments.insert_one(doc)
    print(res.inserted_id)
except errors.DuplicateKeyError:
    # The requested truck is taken: work out which trucks are still free.
    busy_trucks = set(shipments.distinct("truck_id"))
    all_trucks = set(trucks.distinct("_id"))
    available_trucks = all_trucks.difference(busy_trucks)
    old_truck_id = doc["truck_id"]
    if available_trucks:
        # Reassign the shipment to a randomly chosen free truck and retry the insert.
        chosen_truck = random.choice(list(available_trucks))
        new_truck_id = doc["truck_id"] = chosen_truck
        res = shipments.insert_one(doc)
        print(f"Truck #{old_truck_id} is currently performing a shipment. Truck #{new_truck_id} has been sent out instead.")
    else:
        # Every truck is out on a shipment, so surface the problem to the user.
        print(f"Truck #{old_truck_id} is currently performing a shipment. Could not find another truck.")

So the error handling now is a little more proactive.

Instead of just surfacing an error to the user, the application actually chooses a new truck, sends out that truck, and then alerts the user that the action was performed, just by a different vehicle.

(enter command)

In this case we tried to send truck number 4 out, but it wasn't available. So we decided to try sending another truck.

But this time we were a little more careful. We performed a couple queries to figure out all the vehicles that are available for this shipment.

We pulled the `truck_id`s from all current shipments into a set, pulled the `_id`s of all the trucks into another set, and then took the difference to figure out which trucks were not already out on a shipment.
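The set logic on its own can be sketched without a database. This is just an illustration: `pick_available_truck` is a hypothetical helper name, and the two ranges stand in for what `trucks.distinct("_id")` and `shipments.distinct("truck_id")` would return in the notebook.

```python
import random


def pick_available_truck(all_truck_ids, busy_truck_ids, rng=random):
    """Return a random truck id that is not busy, or None if every truck is out."""
    available = set(all_truck_ids) - set(busy_truck_ids)
    if not available:
        return None
    # Sort first so the choice is reproducible when a seeded generator is passed in.
    return rng.choice(sorted(available))


all_ids = range(50)   # stand-in for trucks.distinct("_id")
busy_ids = range(30)  # stand-in for shipments.distinct("truck_id")

chosen = pick_available_truck(all_ids, busy_ids)
print(chosen in range(30, 50))  # True
```

Returning `None` when nothing is free mirrors the `else` branch in the cell above, where the application reports that it could not find another truck.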

This check, to see which trucks are available, is actually somewhat expensive, as the two `distinct()` queries require two database round trips. Because of this, the application takes a pretty lazy approach to assigning trucks to shipments, and doesn't do any extra round trips until the truck it tries to send out comes up unavailable.

This might be suitable if the collisions won't occur very often. Which is to say, trucks are usually available when we request them. But on the rare occasions they are not available, we do a little extra work, and then send out a truck that we KNOW, with some certainty, will be available.
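One caveat with "some certainty": between the availability check and the second insert, another client could claim the same truck, so the retry can itself hit a duplicate key. Here's a minimal sketch of the lazy approach with a bounded retry loop. Everything in it is an assumption for illustration: `insert_with_fallback`, `fake_insert`, and the `DuplicateKey` stand-in exception are hypothetical names, used so the sketch runs without a cluster.

```python
import random


class DuplicateKey(Exception):
    """Stand-in for pymongo.errors.DuplicateKeyError (hypothetical, for this sketch only)."""


def insert_with_fallback(insert, find_available, doc,
                         duplicate_error=DuplicateKey, max_attempts=3):
    """Try the requested truck first; only look for another truck after a collision.

    Returns the truck id that was finally inserted, or None if no truck could be assigned.
    """
    for _ in range(max_attempts):
        try:
            insert(doc)
            return doc["truck_id"]
        except duplicate_error:
            # The expensive availability check only happens after a collision.
            available = find_available()
            if not available:
                return None
            doc["truck_id"] = random.choice(sorted(available))
    return None


# In-memory stand-ins for the two collections.
fleet = set(range(8))
busy = {0, 1, 2, 3, 4}


def fake_insert(doc):
    """Mimic a unique index on truck_id using a plain set."""
    if doc["truck_id"] in busy:
        raise DuplicateKey(doc["truck_id"])
    busy.add(doc["truck_id"])


assigned = insert_with_fallback(fake_insert, lambda: fleet - busy,
                                {"truck_id": 4, "product": "socks"})
print(assigned in {5, 6, 7})  # True
```

Against the notebook's collections, the same function could presumably be called with `shipments.insert_one` as `insert`, a lambda computing `set(trucks.distinct("_id")) - set(shipments.distinct("truck_id"))` as `find_available`, and `errors.DuplicateKeyError` as `duplicate_error`.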

## Summary

* `DuplicateKeyError` can occur on `_id` as well as fields in unique indexes
* When handling errors, determine how fatal the error is
    * Should this error be returned to the user?
    * Can we react to this error in a useful way?

So in this lesson we demonstrated how to handle a specific `DuplicateKeyError`, and it's important to remember that this error usually occurs on `_id`, which is unique by default. But it also pertains to fields contained in a unique index, like the index we had on `truck_id`.
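If the application needs to tell these two cases apart, the exception's `details` attribute (inherited from `OperationFailure`) carries the server's error document. Here's a minimal sketch, assuming the server includes a `keyPattern` entry in that document, which may depend on the server version. The helper names are hypothetical and `sample_details` is hand-written for illustration, not captured server output.

```python
def duplicated_fields(details):
    """Return the field names of the index that rejected the write, if reported."""
    key_pattern = (details or {}).get("keyPattern") or {}
    return list(key_pattern)


def is_id_collision(details):
    """True when the duplicate was on the default _id index."""
    return duplicated_fields(details) == ["_id"]


# Hand-written example shaped like an error document; real output may differ.
sample_details = {
    "code": 11000,
    "keyPattern": {"truck_id": 1},
    "keyValue": {"truck_id": 4},
}

print(duplicated_fields(sample_details))  # ['truck_id']
print(is_id_collision(sample_details))    # False
```

In an `except errors.DuplicateKeyError as e:` block, `e.details` would be the value to pass in. When `keyPattern` is missing, the helper returns an empty list, so the caller can fall back to treating the error generically.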

Really when handling these errors we want to think about how much we can do after receiving the error. If there's nothing we can do in response, if this error is truly fatal, we should just return it to the user.

But if we can do something, as was the case with the `shipments` collection, we should try to handle the error in a more flexible way. In the example, the error was that the truck we tried to reserve was already in use. But at that time, the program actually had the resources to determine which trucks were still available, so it sent one of those trucks out instead.