# Aggregation in MongoDB

- `$match`: Filters documents, passing only those that match the specified condition(s) to the next pipeline stage.
- `$group`: Groups documents by a specified identifier expression and applies the accumulator expression(s) to each group.
- `$project`: Passes documents to the next stage of the pipeline with only the requested fields. It can also add new (computed) fields.
- `$sort`: Sorts the documents.
- `$skip`: Skips the given number of documents and passes the rest to the next stage.
- `$limit`: Limits the number of documents passed to the next stage.
- `$unwind`: Deconstructs an array field from the input documents to output one document for each element.
- `$sum`, `$avg`, `$max`, `$min` (and other accumulator operators): Accumulator functions that compute a value over the documents in each group, for example inside `$group`.
- `$lookup`: Performs a left outer join to another collection in the same database, adding the matching documents from the "joined" collection to each input document for further processing.
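To make the `$unwind` then `$group` pattern concrete, here is a small pure-Python sketch that mimics what those two stages do to a few in-memory documents shaped like the `accounts` collection used below. It only illustrates the semantics: `unwind` and `group_count` are hypothetical helpers, not PyMongo functions, and the document values are made up. On a real server you would send the equivalent `pipeline` at the end to `collection.aggregate()`.

```python
from collections import Counter

# Toy documents shaped like sample_analytics.accounts (values are made up).
docs = [
    {"account_id": 1, "products": ["Commodity", "InvestmentStock"]},
    {"account_id": 2, "products": ["InvestmentStock"]},
    {"account_id": 3, "products": ["Brokerage", "InvestmentStock"]},
]


def unwind(documents, field):
    """Mimic {"$unwind": "$<field>"}: one output document per array element."""
    for doc in documents:
        value = doc.get(field)
        if value is None or value == []:
            continue  # $unwind drops missing, null and empty-array fields by default
        elements = value if isinstance(value, list) else [value]  # a scalar acts like a 1-element array
        for element in elements:
            yield {**doc, field: element}  # copy of the document with the array replaced by one element


def group_count(documents, field):
    """Mimic {"$group": {"_id": "$<field>", "count": {"$sum": 1}}}."""
    counts = Counter(doc[field] for doc in documents)
    return [{"_id": key, "count": n} for key, n in counts.items()]


per_product = group_count(unwind(docs, "products"), "products")

# The equivalent server-side pipeline for collection.aggregate(pipeline):
pipeline = [
    {"$unwind": "$products"},
    {"$group": {"_id": "$products", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
```

Running the stages on the server rather than in Python avoids transferring every document to the client just to count it.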


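```python
import os

from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

# Read the connection string from an environment variable (the name MONGODB_URI is
# an arbitrary choice) instead of hard-coding credentials, e.g.
# "mongodb+srv://<username>:<password>@cluster0.sje4wcv.mongodb.net/?retryWrites=true&w=majority"
uri = os.environ["MONGODB_URI"]

# Create a new client and connect to the server (Stable API version 1)
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)
```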

```
Pinged your deployment. You successfully connected to MongoDB!
```


```python
databases = client.list_database_names()
print("Databases: ", databases)

db = client['sample_analytics']
collections = db.list_collection_names()
print("Collections: ", collections)

collection = db['accounts']
```

```
Databases:  ['metadata', 'sample_airbnb', 'sample_analytics', 'sample_geospatial', 'sample_guides', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'admin', 'local']
Collections:  ['accounts', 'customers', 'transactions']
```


```python
pipeline = [
    # Exact array match: keeps documents whose `products` array is exactly
    # ['Commodity', 'InvestmentStock'], with the same elements in the same order
    {"$match": {'products': ['Commodity', 'InvestmentStock']}}
]
result = list(collection.aggregate(pipeline))  # run the pipeline and collect the results
result[:5]  # show the first five matches
```

```
[{'_id': ObjectId('5ca4bbc7a2dd94ee5816239d'),
  'account_id': 864905,
  'limit': 10000,
  'products': ['Commodity', 'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816239c'),
  'account_id': 692278,
  'limit': 10000,
  'products': ['Commodity', 'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816240d'),
  'account_id': 160912,
  'limit': 10000,
  'products': ['Commodity', 'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee58162510'),
  'account_id': 177069,
  'limit': 10000,
  'products': ['Commodity', 'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816243d'),
  'account_id': 175894,
  'limit': 10000,
  'products': ['Commodity', 'InvestmentStock']}]
```
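Matching an array field against a list literal is an exact match: order and length must agree, so an account holding `['InvestmentStock', 'Commodity']` would not be returned. If the intent is "accounts that hold both products", the `$all` operator is usually the better fit. The sketch below shows the three common forms as stage dictionaries (the variable names are illustrative), plus two hypothetical pure-Python helpers that mirror the exact and `$all` semantics for comparison.

```python
# Exact match: the whole array must equal this list (same elements, same order).
exact_match = {"$match": {"products": ["Commodity", "InvestmentStock"]}}

# $all: the array must contain both values, in any order, possibly alongside others.
contains_all = {"$match": {"products": {"$all": ["Commodity", "InvestmentStock"]}}}

# A scalar against an array field: matches if any element equals it.
contains_one = {"$match": {"products": "Commodity"}}


def matches_exact(products, required):
    """Pure-Python analogue of the exact array match (illustration only)."""
    return products == required


def matches_all(products, required):
    """Pure-Python analogue of $all (illustration only)."""
    return all(item in products for item in required)
```

With the `collection` from above, `list(collection.aggregate([contains_all, {"$limit": 5}]))` would return up to five accounts holding both products, whatever their order.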

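```python
pipeline = [
    # Intended: group by account_id and count the records in each group.
    # Note: "account_id" (no "$") is a constant string, not a field path, so all
    # documents form a single group; use "$account_id" to group per account.
    {"$group": {"_id": "account_id", "total_records": {"$sum": 1}}}
]
result = list(collection.aggregate(pipeline))
result[-5:]  # show the last five groups
```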


```
[{'_id': 'account_id', 'total_records': 1760}]
```
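Because `"_id": "account_id"` has no `$`, the whole collection collapses into one group and `total_records` is simply the document count. Prefixing the field name with `$` turns it into a field path. The sketch below contrasts the two forms and adds a hypothetical `group_by_field` helper that groups on a field which actually repeats across accounts (such as `limit`) and applies several accumulators at once. The output field names are illustrative, and `avg_products` assumes every document has a `products` array, since `$size` errors otherwise.

```python
# Constant _id: every document lands in the same single group.
constant_group = {"$group": {"_id": "account_id", "total_records": {"$sum": 1}}}

# Field path _id: one group per distinct account_id value.
per_account = {"$group": {"_id": "$account_id", "total_records": {"$sum": 1}}}


def group_by_field(field):
    """Build a pipeline that groups documents by `field` and summarizes each group."""
    return [
        {
            "$group": {
                "_id": f"${field}",                         # "$" makes it a field path
                "accounts": {"$sum": 1},                    # number of documents in the group
                "min_account_id": {"$min": "$account_id"},
                "max_account_id": {"$max": "$account_id"},
                "avg_products": {"$avg": {"$size": "$products"}},  # assumes `products` is always an array
            }
        },
        {"$sort": {"accounts": -1}},  # largest groups first
    ]
```

Usage with the `collection` from above: `list(collection.aggregate(group_by_field("limit")))`.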

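**Important for the `$project` stage; these are good examples**
- You choose which fields appear in the output documents by setting a field to 1 (or `True`) to include it, or to 0 (or `False`) to exclude it. A single `$project` cannot mix inclusions and exclusions; the only exception is `_id`, which may always be excluded.
For example:
```python
{"$project": {"name": 1, "_id": 0}}
```
This means that `name` is included and `_id` is excluded in the output documents; every other field (such as `city`) is dropped because it is not listed. Writing `{"name": 1, "city": 0}` instead would be rejected with an error.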

- You can also create derived fields computed from existing fields.
For example:
```python
{"$project": {"fullName": {"$concat": ["$firstName", " ", "$lastName"]}}}
```
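Applied to the `accounts` collection, which has no name fields, both projection styles might look like the sketch below. `product_count` and `label` are illustrative field names; `product_count` assumes every document has a `products` array, and `$toString` converts the numeric `account_id` because `$concat` accepts only strings. The `mixes_include_and_exclude` helper is a hypothetical pure-Python check of the rule that inclusions and exclusions cannot be mixed, other than for `_id`.

```python
# Inclusion projection: list the fields to keep; computed fields count as inclusions.
project_stage = {
    "$project": {
        "_id": 0,                                    # _id may be excluded alongside inclusions
        "account_id": 1,
        "product_count": {"$size": "$products"},     # assumes `products` is always an array
        "label": {"$concat": ["acct-", {"$toString": "$account_id"}]},
    }
}

# Exclusion projection: list only the fields to drop; everything else is kept.
exclude_stage = {"$project": {"products": 0}}


def mixes_include_and_exclude(spec):
    """Return True if a $project spec mixes 0s with inclusions outside `_id`."""
    values = [v for k, v in spec.items() if k != "_id"]
    has_exclude = any(v in (0, False) for v in values)
    has_include = any(v not in (0, False) for v in values)  # 1, True, or an expression
    return has_exclude and has_include
```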

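```python
pipeline = [
    # Intended: show account_id and whether the limit is at least 10000.
    # Note: "limit" (no "$") is a string literal, not the field, and the [...]
    # wraps the result in an array; hence [True] below. Use "$limit" for the field.
    {"$project": {"is_limit": [{"$gte": ["limit", 10000]}], "account_id": 1}}
]
result = list(collection.aggregate(pipeline))
result[:5]
```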

```
[{'_id': ObjectId('5ca4bbc7a2dd94ee5816239d'),
  'account_id': 864905,
  'is_limit': [True]},
 {'_id': ObjectId('5ca4bbc7a2dd94ee581623a0'),
  'account_id': 572981,
  'is_limit': [True]},
 {'_id': ObjectId('5ca4bbc7a2dd94ee58162392'),
  'account_id': 794875,
  'is_limit': [True]},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816239c'),
  'account_id': 692278,
  'is_limit': [True]},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816238d'),
  'account_id': 557378,
  'is_limit': [True]}]
```
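Every `is_limit` above is `[True]` regardless of the account: in MongoDB's comparison order for mixed types, strings sort above numbers, so the literal `"limit"` is always `>=` 10000, and the list brackets turn the boolean into a one-element array. A corrected sketch compares the `limit` field itself and returns a plain boolean. The names `has_high_limit` and `above_10000` are illustrative; `$gte` means "at least", so "strictly above" needs `$gt`.

```python
# "$limit" here is a field path to the document's `limit` field (not the $limit stage).
project_limit_flag = {
    "$project": {
        "account_id": 1,
        "has_high_limit": {"$gte": ["$limit", 10000]},  # True when limit >= 10000
    }
}

# Strictly greater than 10000.
project_limit_above = {
    "$project": {
        "account_id": 1,
        "above_10000": {"$gt": ["$limit", 10000]},
    }
}
```

Usage with the `collection` from above: `list(collection.aggregate([project_limit_flag, {"$limit": 5}]))`.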

```python
pipeline = [
    {"$sort": {"account_id": 1}},  # sort by account_id, ascending
    {"$skip": 10},                 # skip the first 10 documents
    {"$limit": 3}                  # pass on only the next 3 documents
]
result = list(collection.aggregate(pipeline))
result
```

```
[{'_id': ObjectId('5ca4bbc7a2dd94ee58162499'),
  'account_id': 54977,
  'limit': 10000,
  'products': ['CurrencyService',
   'Commodity',
   'InvestmentFund',
   'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816248a'),
  'account_id': 55104,
  'limit': 10000,
  'products': ['InvestmentFund', 'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee581629d5'),
  'account_id': 55473,
  'limit': 10000,
  'products': ['Brokerage', 'InvestmentStock']}]
```
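The `$sort`, `$skip`, `$limit` sequence is the usual way to page through results: sort first so the order is deterministic, then skip the earlier pages, then keep one page. A minimal sketch of a hypothetical `page_pipeline` helper with 1-based page numbers; it assumes the sort field (here `account_id` by default) is unique, otherwise ties can make page boundaries unstable between runs.

```python
def page_pipeline(page, page_size, sort_field="account_id"):
    """Return [$sort, $skip, $limit] stages for the 1-based page number `page`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return [
        {"$sort": {sort_field: 1}},           # deterministic order first
        {"$skip": (page - 1) * page_size},    # drop the earlier pages
        {"$limit": page_size},                # keep exactly one page
    ]
```

For example, `list(collection.aggregate(page_pipeline(3, 5)))` skips 10 documents and returns the next 5. The order of the stages matters: placing `$limit` before `$skip` would cut the result set down before skipping.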

```python
pipeline = [
    # $limit on its own: the first 2 documents the server returns
    # (no $sort, so which documents these are is not guaranteed)
    {"$limit": 2}
]
result = list(collection.aggregate(pipeline=pipeline))
result
```

```
[{'_id': ObjectId('5ca4bbc7a2dd94ee5816239d'),
  'account_id': 864905,
  'limit': 10000,
  'products': ['Commodity', 'InvestmentStock']},
 {'_id': ObjectId('5ca4bbc7a2dd94ee581623a0'),
  'account_id': 572981,
  'limit': 10000,
  'products': ['InvestmentStock', 'CurrencyService']}]
```
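`$lookup` appears in the stage list but is not exercised in these cells. A sketch that attaches to each account the customer documents referencing it follows. It assumes, without it being shown above, that documents in `sample_analytics.customers` store their account numbers in an array field named `accounts` and have a `name` field; `owners` and `owner_names` are arbitrary output names. When `foreignField` is an array, the equality match succeeds if any element equals the local value. The server call only runs if the (hypothetical) `MONGODB_URI` environment variable is set.

```python
import os

lookup_pipeline = [
    {"$limit": 2},  # keep the example small
    {
        "$lookup": {
            "from": "customers",          # collection in the same database
            "localField": "account_id",   # field on the accounts documents
            "foreignField": "accounts",   # ASSUMED array of account ids on customers
            "as": "owners",               # new array field holding the matched customers
        }
    },
    # "$owners.name" collects the `name` (assumed field) of every matched customer
    {"$project": {"_id": 0, "account_id": 1, "owner_names": "$owners.name"}},
]

if os.environ.get("MONGODB_URI"):
    from pymongo import MongoClient

    accounts = MongoClient(os.environ["MONGODB_URI"])["sample_analytics"]["accounts"]
    for doc in accounts.aggregate(lookup_pipeline):
        print(doc)
```

Because it is a left outer join, accounts with no matching customer still come through, with an empty `owners` array.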