# Indexing: Exercises

In [None]:
from pymongo import MongoClient

client = MongoClient()
db = client.nobel

## An index for high-share categories

We want to speed up the following operation:
```python
db.prizes.distinct("category", {"laureates.share": {"$gt": "3"}})
```
- Confirm that the operation takes approximately 1 ms without an index.

In [None]:
%%timeit
db.prizes.distinct("category", {"laureates.share": {"$gt": "3"}})

- Specify a compound index model `index_model` to pass to `db.prizes.create_index`.

In [None]:
index_model = [("laureates.share", 1), ("category", 1)]
db.prizes.create_index(index_model)
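Before timing again, it can help to confirm that the index really exists. The following is a minimal sketch, assuming only that the collection object exposes PyMongo's `index_information()` method. The helper name `has_index` is hypothetical and not part of PyMongo.

```python
def has_index(collection, index_model):
    """Return True if `collection` has an index with exactly these keys.

    `has_index` is a hypothetical helper for this exercise. It relies on
    `index_information()`, which maps each index name to a dict whose
    "key" entry is a list of (field, direction) pairs.
    """
    wanted = [tuple(pair) for pair in index_model]
    for info in collection.index_information().values():
        existing = [tuple(pair) for pair in info["key"]]
        # Key order matters in a compound index, so compare lists, not sets
        if existing == wanted:
            return True
    return False
```

In the notebook, `has_index(db.prizes, index_model)` should then evaluate to `True` after the `create_index` call and to `False` once the index is dropped.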

- Confirm that the execution time is now below 400 µs.

In [None]:
%%timeit
db.prizes.distinct("category", {"laureates.share": {"$gt": "3"}})

In [None]:
# Drop the index for consistency
db.prizes.drop_index(index_model)
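The `%%timeit` magic is only available in IPython and Jupyter. Outside a notebook, the same before-and-after comparison can be sketched with the standard-library `timeit` module. The helper name `best_time_per_call` and its default arguments are assumptions chosen for illustration.

```python
import timeit


def best_time_per_call(func, number=100, repeat=5):
    """Return the best observed time per call of `func`, in seconds.

    `func` is called `number` times in each of `repeat` rounds; the
    fastest round is used, since slower rounds mostly reflect noise.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


# Hypothetical usage against the notebook's `db` (needs a running server):
# query = lambda: db.prizes.distinct(
#     "category", {"laureates.share": {"$gt": "3"}})
# print(f"{best_time_per_call(query) * 1e6:.0f} µs per call")
```

Absolute numbers depend on the machine and the server, so the ratio between the indexed and unindexed timings is more informative than either figure alone.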

## Recently single?

A prize might be awarded to a single laureate or to several. For each prize category, report the most recent year in which a single laureate, rather than several, received a prize in that category.

- Specify an index model `index_model` to pass to `db.prizes.create_index` that speeds up finding prizes by category and sorting results by decreasing year. That is, the model should index first on category (ascending) and second on year (descending).
- Save a string `report` for printing the last single-laureate year for each distinct category, one category per line. To do this, for each distinct prize category, find the latest-year prize of that category with a laureate share of "1".

In [None]:
# Specify an index model for compound sorting
index_model = [("category", 1), ("year", -1)]
db.prizes.create_index(index_model)

# Collect the last single-laureate year for each category
report = ""
for category in sorted(db.prizes.distinct("category")):
    doc = db.prizes.find_one(
        {"category": category, "laureates.share": "1"},
        sort=[("year", -1)]
    )
    report += "{category}: {year}\n".format(**doc)

print(report)

# Drop the index for consistency
db.prizes.drop_index(index_model)
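The loop above assumes that every category has at least one single-laureate prize; `find_one` returns `None` otherwise, and `format(**doc)` would then raise a `TypeError`. A more defensive variant might look like the sketch below. The function name `last_single_laureate_report` and the fallback message are assumptions, and the function only requires an object with `distinct` and `find_one` methods, such as `db.prizes`.

```python
def last_single_laureate_report(prizes):
    """Build one 'category: year' line per category.

    `prizes` is expected to behave like the `db.prizes` collection.
    """
    lines = []
    for category in sorted(prizes.distinct("category")):
        # With an index on (category ascending, year descending), this
        # lookup can read the newest matching prize first.
        doc = prizes.find_one(
            {"category": category, "laureates.share": "1"},
            sort=[("year", -1)],
        )
        if doc is None:
            # No prize in this category went to a single laureate
            lines.append(f"{category}: no single-laureate prize found")
        else:
            lines.append(f"{category}: {doc['year']}")
    return "\n".join(lines)
```

Calling `print(last_single_laureate_report(db.prizes))` while the index exists should print the same category and year pairs as the cell above.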

## Born and affiliated

For some laureates, the country of birth ("bornCountry") is also a country of affiliation for one or more of their prizes ("prizes.affiliations.country"). You will find the five countries of birth with the highest counts of such laureates.

- Create an index on country of birth ("bornCountry") for `db.laureates` to ensure efficient gathering of distinct values and counting of documents.
- Complete the skeleton dictionary comprehension to construct `n_born_and_affiliated`, the count of laureates as described above for each distinct country of birth.


In [None]:
from collections import Counter

# Ensure an index on country of birth
db.laureates.create_index([("bornCountry", 1)])

# Collect a count of laureates for each country of birth
n_born_and_affiliated = {
    country: db.laureates.count_documents({
        "bornCountry": country,
        "prizes.affiliations.country": country
    })
    for country in db.laureates.distinct("bornCountry")
}

five_most_common = Counter(n_born_and_affiliated).most_common(5)
print(five_most_common)

# Drop the index for consistency
db.laureates.drop_index([("bornCountry", 1)])
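To make the filter's meaning concrete, here is a minimal in-memory sketch of what the `count_documents` call matches: a laureate counts when their `bornCountry` appears as the `country` of any affiliation of any of their prizes. This is a plain-Python model, not how MongoDB evaluates the query, and the toy documents are hypothetical placeholders rather than real laureates.

```python
from collections import Counter


def affiliation_countries(laureate):
    """Collect every country reachable via prizes -> affiliations -> country."""
    return {
        affiliation["country"]
        for prize in laureate.get("prizes", [])
        for affiliation in prize.get("affiliations", [])
        # Skip malformed or empty affiliation entries
        if isinstance(affiliation, dict) and "country" in affiliation
    }


def count_born_and_affiliated(laureates):
    """Count, per country of birth, laureates also affiliated there."""
    counts = Counter()
    for laureate in laureates:
        born = laureate.get("bornCountry")
        if born is not None and born in affiliation_countries(laureate):
            counts[born] += 1
    return counts


# Hypothetical toy documents shaped like those in db.laureates
toy_laureates = [
    {"bornCountry": "A", "prizes": [{"affiliations": [{"country": "A"}]}]},
    {"bornCountry": "A", "prizes": [{"affiliations": [{"country": "B"}]},
                                    {"affiliations": [{"country": "A"}]}]},
    {"bornCountry": "B", "prizes": [{"affiliations": [{"country": "A"}]}]},
    {"bornCountry": "B", "prizes": [{"affiliations": [{"country": "B"}]}]},
    {"prizes": [{"affiliations": [{"country": "A"}]}]},
]

toy_counts = count_born_and_affiliated(toy_laureates)
print(toy_counts.most_common(5))
```

Unlike the dictionary comprehension above, this `Counter` omits countries whose count is zero, which does not affect the top five as long as at least five countries have a nonzero count.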