# MongoDB Aggregation Guide

### Step 1
Install the [pymongo](https://www.mongodb.com/docs/drivers/pymongo/) driver.

In [None]:
!pip install "pymongo[srv]"

### Step 2

#### Connect to Atlas cluster

In [None]:
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

In [None]:
# Show this machine's public IP address (useful when adding it to the Atlas IP access list)
!curl ipecho.net/plain

In [None]:
username = ""
password = ""
cluster_url = ""

uri = f"mongodb+srv://{username}:{password}@{cluster_url}/?retryWrites=true&w=majority"

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)
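
Usernames and passwords that contain characters such as `@`, `:` or `/` need to be percent-escaped before they are placed in a connection string. A minimal sketch of one way to do that with the standard library follows; `build_atlas_uri` is a hypothetical helper introduced here, and the user name, password and host in the usage line are placeholders rather than real values.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, cluster_url):
    """Hypothetical helper: build an SRV connection string with escaped credentials."""
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{cluster_url}/?retryWrites=true&w=majority"
    )


# Placeholder credentials and host, for illustration only
print(build_atlas_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net"))
```

The resulting string can be passed to `MongoClient` in place of the hand-built `uri`.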

### Step 3

#### Setup

To start, we’ll insert some example data which we can perform aggregations on:

In [None]:
db = client.aggregation_example
db.things.insert_many(
    [
        {"x": 1, "tags": ["dog", "cat"]},
        {"x": 2, "tags": ["cat"]},
        {"x": 2, "tags": ["mouse", "cat", "dog"]},
        {"x": 3, "tags": []},
    ]
)

#### Aggregation Framework

This example shows how to run an aggregation with the [aggregate()](https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.aggregate) method. We’ll perform a simple aggregation that counts how often each tag occurs in the `tags` array across the entire collection. The pipeline needs three stages: first unwind the `tags` array so that each tag becomes its own document, then group by tag and sum the occurrences, and finally sort by count.

Python dictionaries preserve insertion order only from Python 3.7 onward, so where key order matters, as in a multi-key “$sort”, you can make the ordering explicit with [SON](https://pymongo.readthedocs.io/en/stable/api/bson/son.html#bson.son.SON) or [collections.OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict):

In [None]:
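import pprint

from bson.son import SON

pipeline = [
    # Emit one document per element of the tags array
    {"$unwind": "$tags"},
    # Count how many documents carry each tag
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
    # Sort by count, then by tag name, both descending
    {"$sort": SON([("count", -1), ("_id", -1)])},
]

pprint.pprint(list(db.things.aggregate(pipeline)))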
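
To see what each stage contributes, the same computation can be sketched in plain Python. This is an illustration, not how MongoDB evaluates the pipeline. It assumes the collection holds exactly the four example documents, inserted once; `docs`, `unwound`, `counts` and `expected` are names introduced here.

```python
from collections import Counter

# Local copy of the example documents (assumed to be the only ones in the collection)
docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 2, "tags": ["mouse", "cat", "dog"]},
    {"x": 3, "tags": []},
]

# $unwind: one entry per tag; a document with an empty array contributes nothing
unwound = [tag for doc in docs for tag in doc["tags"]]

# $group with {"$sum": 1}: number of occurrences per tag
counts = Counter(unwound)

# $sort: count descending, then _id descending
expected = sorted(
    ({"_id": tag, "count": n} for tag, n in counts.items()),
    key=lambda d: (d["count"], d["_id"]),
    reverse=True,
)
print(expected)
# [{'_id': 'cat', 'count': 3}, {'_id': 'dog', 'count': 2}, {'_id': 'mouse', 'count': 1}]
```

If the insert cell has been run more than once, the counts returned by the server will be correspondingly larger than in this local sketch.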

##### Different ways to iterate a cursor:

* Using `list()` to turn the cursor into a list

In [None]:
# Get cursor object
results = db.things.aggregate(pipeline)

# Pass the cursor to list() to load every result into memory
results_list = list(results)

# Work with the list, print the length, first item, last item, ...
print(f'Length of the list is {len(results_list)}')
print(f'First item in the list is : {results_list[0]}')
print(f'Last item in the list is : {results_list[-1]}')

* Using `for item in cursor`

In [None]:
# Get cursor object
results = db.things.aggregate(pipeline)

# Iterate the cursor, find the result with count = 2
for item in results:
    if item['count'] == 2:
        print(item)
        break

* Using `.alive` and `next()`

In [None]:
# Get cursor object
results = db.things.aggregate(pipeline)

# Iterate the cursor, find the result with count = 2
while results.alive:
    # next() raises StopIteration once no documents remain, so guard it in real code
    item = results.next()
    if item['count'] == 2:
        print(item)
        break
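
The search loops above can also be condensed with the built-in `next()` and a default value, which avoids handling `StopIteration` by hand. The sketch below uses a plain Python iterator as a stand-in for the cursor so that it runs without a database; `first_with_count` and `fake_cursor` are hypothetical names introduced here, and the sample documents mirror the expected result for the example data.

```python
def first_with_count(cursor, count):
    """Return the first document whose "count" equals `count`, or None if there is none."""
    return next((doc for doc in cursor if doc["count"] == count), None)


# Stand-in for db.things.aggregate(pipeline): an iterator that can be read only once
fake_cursor = iter(
    [
        {"_id": "cat", "count": 3},
        {"_id": "dog", "count": 2},
        {"_id": "mouse", "count": 1},
    ]
)

match = first_with_count(fake_cursor, 2)
print(match)

# The iterator has already advanced past the match, so a second search finds nothing
second = first_with_count(fake_cursor, 2)
print(second)
```

The second call illustrates why each cell above requests a fresh cursor: results that have been consumed are not replayed.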