# Big Data Modeling and Management 2021


## 🚚 BDMM Third Homework Assignment 🚚 

_The Wide World Importers (WWI) is a wholesale novelty goods importer and distributor operating from the San Francisco Bay Area. In this assignment we will be working with their database._ 
More information and details about the WWI database can be found at the following link: https://docs.microsoft.com/en-us/sql/samples/wide-world-importers-what-is?view=sql-server-ver15

The focus of the third assignment is modelling. We will use the same data source that was used in the previous assignment, the Wide World Importers database, and convert it to a document-based database. To that end, we will be leveraging concepts like data denormalization, indexes, and MongoDB design patterns. 

More information on the extended data model can be found here: <br>  
https://docs.microsoft.com/en-us/sql/samples/wide-world-importers-oltp-database-catalog?view=sql-server-ver15

## Problem Description

Your team has just arrived at WWI (a leading company in logistics). Welcome!   <br>
Even though business is thriving, the IT department is going through a bad time.   <br>
Digitalization was never a priority for the company, and now the company's operational and analytical requirements are starting to grow beyond the capabilities of their existing data architecture.   <br>

WWI data is spread across different systems: an old SQL database, data extracted through an API, and data stored in CSV files. <br>
Currently, the costs to develop the necessary queries to collect data to answer questions asked by the different departments are too high. <br>
Management concluded it is the right time to revise and revamp the data architecture, in order to speed up operations. 

In that context, your team was tasked with merging all the company data into a single and coherent Mongo database. <br>
It is expected that, with your solution, WWI will have a better understanding of their business and that the different departments will be able to efficiently obtain the answers they desperately need.

The WWI team shared with you an ERD of their current data model:<br>
![datamodel](WWI.png)

Additionally, the WWI team asked you to deliver the following outputs in **10 days**:
- Understand and model the database.  
- Migrate all data to the database
- Answer the questions.  
- Submit the results by following the instructions.  
- Prepare a short oral presentation to explain your design choices and the results you obtained.

With these deliverables, you will have created a prototype that allows management to decide whether MongoDB is a good solution that meets their requirements.

### Design Requirements

You have been informed that the WWI has the following query requirements to the database.

The web team needs to know:  
- Which state provinces are our suppliers from?   
- Which state provinces are the customers with the highest credit limits from?  


The warehouse group needs to know:  
- Which items get ordered together the most?   
- Which items get ordered the most in bulk (bigger amounts)?  
- Which customers have delivery addresses less than 10 km apart?  

The CFO needs to know:  
- What is the monthly order count?  
- What are the average monthly sales prices?  
- What are the yearly expenditures with suppliers (per supplier name)?  

Partnerships needs to know:  
- What is the most common payment type?  
- Which supplier in the `Novelty Goods Supplier` category has the most transactions?  

The marketing team:  
- Wants to make an appreciation post and needs the name of the salesperson with the most invoices in 2013 (the person whose customers brought in the most money).

---

Transform the SQL tables, API results and CSV files provided in the annex to this file, and model a database following MongoDB best practices.

Write MongoDB queries to answer the questions above.

Take advantage of database indexes to improve your query speeds.
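Every `$lookup` used later in the notebook matches on a `foreignField`; an index on that field can let each lookup use the index instead of scanning the foreign collection. Below is a minimal sketch of an index plan derived from those joins. The plan itself is an assumption of this sketch, and `LOOKUP_INDEX_PLAN` and `apply_indexes` are hypothetical names; the helper only calls pymongo's `Collection.create_index` with a single field name.

```python
# (collection, field) pairs taken from the foreignField of each $lookup in this notebook
LOOKUP_INDEX_PLAN = [
    ("Application_StateProvinces", "StateProvinceID"),
    ("Sales_OrderLines", "OrderID"),
    ("Warehouse_StockItems", "StockItemID"),
    ("Purchasing_Suppliers", "SupplierID"),
    ("Application_PaymentMethods", "PaymentMethodID"),
    ("Purchasing_SupplierCategories", "SupplierCategoryID"),
    ("Purchasing_SupplierTransactions", "SupplierID"),
    ("Sales_Invoices", "SalespersonPersonID"),
    ("Sales_InvoiceLines", "InvoiceID"),
]


def apply_indexes(db, plan=LOOKUP_INDEX_PLAN):
    """Create a single-field ascending index for every (collection, field) pair.

    `db` is expected to behave like a pymongo Database (db[name] -> Collection).
    Returns the index names reported by create_index.
    """
    created = []
    for collection, field in plan:
        # Creating an index that already exists with the same spec is a no-op
        created.append(db[collection].create_index(field))
    return created
```

With the group connection this would be called once as `apply_indexes(wwi)`.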

### Deliverables

1. Notebook with all DB creation operations and CRUD operations;
2. Second notebook with all required queries.

### Data Source Materials

For the development of this assignment, you will have access to the RDBMS/SQL database hosting the original WWI database. To connect to the database, use the following credentials:
```
# host: rhea.isegi.unl.pt
# user: wwi-read-only-user
# pass: jGp2GCqrss6nfTEu5ZawhW3mksLsQYQb
# database: WWI

# !pip install mysql-connector-python
import mysql.connector

host = "rhea.isegi.unl.pt"
user = "wwi-read-only-user"
password = "jGp2GCqrss6nfTEu5ZawhW3mksLsQYQb"
database = "WWI"

mydb = mysql.connector.connect(host=host, user=user, database=database, port=3306, password=password)
mycursor = mydb.cursor()

# List all tables, then inspect the schema of one of them
mycursor.execute('SHOW TABLES;')
print(f"Tables: {mycursor.fetchall()}")
mycursor.execute('DESCRIBE Purchasing_PurchaseOrderLines;')
print(f"Purchasing_PurchaseOrderLines schema: {mycursor.fetchall()}")
```
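Migrating a table means turning each SQL row into a document. BSON (and therefore pymongo) cannot encode `decimal.Decimal` or bare `datetime.date` values, which MySQL connectors commonly return, so they need converting first. A minimal sketch, assuming rows come from a DB-API cursor such as `mycursor` above; `rows_to_documents` is a hypothetical helper, and converting `Decimal` to `float` is a lossy simplification (`bson.decimal128.Decimal128` keeps exact precision).

```python
from datetime import date, datetime
from decimal import Decimal


def rows_to_documents(columns, rows):
    """Turn SQL result rows into MongoDB-ready dicts (hypothetical helper)."""
    documents = []
    for row in rows:
        doc = {}
        for name, value in zip(columns, row):
            if isinstance(value, Decimal):
                # BSON has no encoder for decimal.Decimal; float is a lossy stand-in
                value = float(value)
            elif isinstance(value, date) and not isinstance(value, datetime):
                # BSON stores datetimes, not bare dates: use midnight of that day
                value = datetime.combine(value, datetime.min.time())
            doc[name] = value
        documents.append(doc)
    return documents


# Toy rows shaped like a SELECT on Application_StateProvinces
columns = ["StateProvinceID", "StateProvinceCode", "StateProvinceName"]
rows = [(15, "IN", "Indiana"), (45, "TX", "Texas")]
docs = rows_to_documents(columns, rows)
print(docs[0])  # {'StateProvinceID': 15, 'StateProvinceCode': 'IN', 'StateProvinceName': 'Indiana'}
```

With a live cursor, the column names come from `[d[0] for d in mycursor.description]` after `mycursor.execute(...)`, the rows from `mycursor.fetchall()`, and the resulting list can be passed to `insert_many`.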

Additionally, you have access to the following documents.

CSV with Warehouse Data  
**https://liveeduisegiunl-my.sharepoint.com/:f:/g/personal/fpinheiro_novaims_unl_pt/Eh8Mj-m6r4dOt84tPDGUnhUBd5oMC0CJKAeyJm3urNB-8g?e=JuPMuW**

API with Application data  
**http://rhea.isegi.unl.pt:8080/**

## Additional Information

#### Groups  

This is a group activity. <br>
Students should form groups of at least 4 and at most 5. <br>
We will use the groups established during the previous assignments, as identified on Moodle.

#### MongoDB database access  

Each group will have access to its own MongoDB instance.<br>
Each group will receive an email with their access credentials. <br>
You will use the database to store your results. <br>

Connection details will have the following template:<br>
```
Host: rhea.isegi.unl.pt:27017  
Username: {groups_username}  
Password: {groups_password}  
```
Which then can be used as follows:
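```
from pymongo import MongoClient

protocol = "mongodb"
host = "rhea.isegi.unl.pt"
port = 27017
user = "{groups_username}"      # replace with your group's username
password = "{groups_password}"  # replace with your group's password

client = MongoClient(f"{protocol}://{user}:{password}@{host}:{port}/")
```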
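If a username or password contains characters such as `@`, `:` or `/`, the URI above breaks; the PyMongo documentation recommends percent-escaping credentials with `urllib.parse.quote_plus`. A minimal sketch; `build_mongo_uri` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, protocol="mongodb"):
    """Build a MongoDB connection URI with escaped credentials (hypothetical helper)."""
    return f"{protocol}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


uri = build_mongo_uri("groups_username", "p@ss:word", "rhea.isegi.unl.pt", 27017)
print(uri)  # mongodb://groups_username:p%40ss%3Aword@rhea.isegi.unl.pt:27017/
```

The resulting string is passed unchanged to `MongoClient(uri)`.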

#### Submission Deadline

The submission must contain a notebook with the queries and their results; also indicate the name of the database that you created. <br>
Upload the notebook on Moodle before **23:59 of May 30th**.

#### Evaluation   

The third homework assignment counts 20% towards your final mark of the curricular unit. <br>
The assignment will be scored from 0 to 20. <br>
Your final task will be to present your database proposal to the owner of the company and explain how it would satisfy everyone. <br>

Each group submission will be evaluated on two components:
1. correctness of results;
2. simplicity of the solution;

50% -  Database design  
50% -  Query results  
*    25% - Correctness of queries   
*    25% - Right results

Please note that all code delivered in this assignment will go through automated plagiarism checks. <br>
Groups with high similarity levels in their code will undergo investigation.

**Presentations**

Presentations will be held on the 2nd and 3rd of June, and you need to sign up your group using this Calendly link:<br>
https://calendly.com/d/m9sj-qwpk/presentations (Please try to avoid empty windows)

```python
import pandas as pd
```

```python
from pymongo import MongoClient
from bson.objectid import ObjectId
from pprint import pprint

# Group MongoDB instance
host = "rhea.isegi.unl.pt"
port = "27044"
user = "GROUP_27"
password = "841604675h8960303529464l24011g49"
protocol = "mongodb"
client = MongoClient(f"{protocol}://{user}:{password}@{host}:{port}")

# The WWI collections are read from and written to the "admin" database
wwi = client["admin"]
print(client.list_database_names())
```

```
['Modeling', 'admin', 'config', 'local']
```


## Questions


### 1. Which state provinces are our suppliers from?

- We decided to merge both collections to make each supplier's location easier to access and navigate, embedding each state province inside its city document and each city inside its supplier document.

- The first two (commented-out) calls below create the intermediate collection used to merge the first and the second collections.
- The third call drops that intermediate collection.
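The `$lookup` + `$merge` pair embeds the matching documents as an array under a new field. A pure-Python sketch of the same two-step embedding on toy data, useful for checking the expected document shape before running the pipelines; `embed_lookup` is a hypothetical name and mirrors only the equality-match form of `$lookup`.

```python
def embed_lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Mimic an equality $lookup: attach every matching foreign doc as a list."""
    index = {}
    for doc in foreign_docs:
        index.setdefault(doc.get(foreign_field), []).append(doc)
    return [
        {**doc, as_field: index.get(doc.get(local_field), [])}
        for doc in local_docs
    ]


# Toy documents using the field names of the WWI collections
provinces = [{"StateProvinceID": 15, "StateProvinceName": "Indiana"}]
cities = [{"CityID": 38171, "CityName": "Zionsville", "StateProvinceID": 15}]
suppliers = [{"SupplierName": "A Datum Corporation", "DeliveryCityID": 38171}]

# Step 1: provinces inside cities; step 2: cities (with provinces) inside suppliers
cities_embedded = embed_lookup(cities, provinces, "StateProvinceID", "StateProvinceID", "StateProvinceInf")
suppliers_embedded = embed_lookup(suppliers, cities_embedded, "DeliveryCityID", "CityID", "City_Inf")
print(suppliers_embedded[0]["City_Inf"][0]["StateProvinceInf"][0]["StateProvinceName"])  # Indiana
```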

```python
#wwi.Application_Cities.aggregate(pipeline)
#wwi.Purchasing_Suppliers.aggregate(pipeline2)
#wwi.collection3_embeded.drop() 
```

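```python
# Step 1: embed each city's state province (StateProvinceInf) and write the
# result to the intermediate collection collection3_embeded
query_1 = {
    "$lookup": {
        "from": "Application_StateProvinces",
        "localField": "StateProvinceID",
        "foreignField": "StateProvinceID",
        "as": "StateProvinceInf"
    }
}

query_2 = {
    "$merge": {
        "into": "collection3_embeded",
        "whenMatched": "merge"
    }
}

# Step 2: embed each supplier's delivery city (with its province) as City_Inf
# and write the result to Purchasing_Suppliers_embeded
query_3 = {
    "$lookup": {
        "from": "collection3_embeded",
        "localField": "DeliveryCityID",
        "foreignField": "CityID",
        "as": "City_Inf"
    }
}

query_4 = {
    "$merge": {
        "into": "Purchasing_Suppliers_embeded",
        "whenMatched": "merge"
    }
}

pipeline = [query_1, query_2]
pipeline2 = [query_3, query_4]
wwi.Application_Cities.aggregate(pipeline)
wwi.Purchasing_Suppliers.aggregate(pipeline2)

# Step 3: the intermediate collection is no longer needed
wwi.collection3_embeded.drop()
```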

```python
wwi.Purchasing_Suppliers_embeded.find_one() 
```

```
{'_id': ObjectId('60b0300e9400f55a7f5964f2'),
 'AlternateContactPersonID': 22,
 'BankAccountBranch': 'Woodgrove Bank Zionsville',
 'BankAccountCode': '356981',
 'BankAccountName': 'A Datum Corporation',
 'BankAccountNumber': '8575824136',
 'BankInternationalCode': '25986',
 'City_Inf': [{'_id': ObjectId('60b0377f44fa2d62dc237ef8'),
   'CityID': 38171,
   'CityName': 'Zionsville',
   'LatestRecordedPopulation': 14160.0,
   'Location': '0xE6100000010CDE115F37B6F9434031276893C39055C0',
   'StateProvinceID': 15,
   'StateProvinceInf': [{'_id': ObjectId('60b0384d44fa2d62dc237f15'),
     'StateProvinceID': 15,
     'StateProvinceCode': 'IN',
     'StateProvinceName': 'Indiana',
     'CountryID': 230,
     'SalesTerritory': 'Great Lakes',
     'LatestRecordedPopulation': 6570902}]}],
 'DeliveryAddressLine1': 'Suite 10',
 'DeliveryAddressLine2': '183838 Southwest Boulevard',
 'DeliveryCityID': 38171,
 'DeliveryLocationLat': 39.95090103149414,
 'DeliveryLocationLong': -86.26190185546875,
 'Deli
```

```python
# Count suppliers per state province (path through the embedded arrays)
query_1 = {
    '$group': {
        '_id': {'StateProvince': '$City_Inf.StateProvinceInf.StateProvinceName'},
        'count': {'$sum': 1}
    }
}

# Most frequent provinces first
query_2 = {
    '$sort': {
        'count': -1
    }
}

# Keep the top two provinces
query_3 = {
    '$limit': 2
}

pipeline = [query_1, query_2, query_3]

r_7 = wwi.Purchasing_Suppliers_embeded.aggregate(pipeline)

result_7 = list(r_7)

result_7
```

```
[{'_id': {'StateProvince': [['California']]}, 'count': 3},
 {'_id': {'StateProvince': [['Tennessee']]}, 'count': 2}]
```
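Because `City_Inf` and `StateProvinceInf` are arrays, grouping on the dotted path yields nested lists such as `[['California']]`. A sketch of a variant that unwinds both arrays first, so the group key is a plain string, and that returns every province rather than the top two; it has not been run against the group database, and `suppliers_by_province_pipeline` is a hypothetical name.

```python
def suppliers_by_province_pipeline():
    """Count suppliers per state province with flat (non-nested) group keys."""
    return [
        # One document per embedded city, then per embedded province
        {"$unwind": "$City_Inf"},
        {"$unwind": "$City_Inf.StateProvinceInf"},
        {"$group": {
            "_id": "$City_Inf.StateProvinceInf.StateProvinceName",
            "count": {"$sum": 1},
        }},
        # Most frequent provinces first; no $limit so all provinces are listed
        {"$sort": {"count": -1}},
    ]


pipeline = suppliers_by_province_pipeline()
# With the connection above: list(wwi.Purchasing_Suppliers_embeded.aggregate(pipeline))
```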

### 2. Which state province are the customers with the highest credit limit from?

- Same thought process as before: we wanted a quicker and easier way to access all the details needed about any customer, so we embedded the state province into the city as before, and then embedded the city into Sales_Customers.

- The first two (commented-out) calls below create the intermediate collection used to merge the first and the second collections.
- The third call drops that intermediate collection.

```python
#wwi.Application_Cities.aggregate(pipeline)
#wwi.Sales_Customers.aggregate(pipeline2)
#wwi.collection3_embeded.drop()
```

```python
# Step 1: embed each city's state province (StateProvinceInf) into the intermediate collection
query_1 = {
    "$lookup": {
        "from": "Application_StateProvinces",
        "localField": "StateProvinceID",
        "foreignField": "StateProvinceID",
        "as": "StateProvinceInf"
    }
}

query_2 = {
    "$merge": {
        "into": "collection3_embeded",
        "whenMatched": "merge"
    }
}

# Step 2: embed each customer's postal city (with its province) as City_Inf
query_3 = {
    "$lookup": {
        "from": "collection3_embeded",
        "localField": "PostalCityID",
        "foreignField": "CityID",
        "as": "City_Inf"
    }
}

query_4 = {
    "$merge": {
        "into": "Sales_Customers_embeded",
        "whenMatched": "merge"
    }
}

# These pipelines are run by the commented-out calls in the previous cell
pipeline = [query_1, query_2]
pipeline2 = [query_3, query_4]
```

```python
# Customer with the highest credit limit, with its embedded city and province
pipeline = [
    {'$unwind': '$City_Inf'},
    {'$sort': {'CreditLimit': -1}},
    {'$limit': 1}
]

r_2 = wwi.Sales_Customers_embeded.aggregate(pipeline)

result_2 = list(r_2)

result_2
```

```
[{'_id': ObjectId('60b030129400f55a7f596add'),
  'AccountOpenedDate': datetime.datetime(2013, 1, 1, 0, 0),
  'AlternateContactPersonID': None,
  'BillToCustomerID': 890,
  'City_Inf': {'_id': ObjectId('60b0377f44fa2d62dc23187f'),
   'CityID': 11753,
   'CityName': 'Flowella',
   'LatestRecordedPopulation': 118.0,
   'Location': '0xE6100000010C4847832568373B40C63368E81F8458C0',
   'StateProvinceID': 45,
   'StateProvinceInf': [{'_id': ObjectId('60b0384d44fa2d62dc237f33'),
     'StateProvinceID': 45,
     'StateProvinceCode': 'TX',
     'StateProvinceName': 'Texas',
     'CountryID': 230,
     'SalesTerritory': 'Southwest',
     'LatestRecordedPopulation': 27506120}]},
  'CreditLimit': Decimal128('4630.50'),
  'CustomerCategoryID': 5,
  'CustomerID': 890,
  'CustomerName': 'Olya Izmaylov',
  'DeliveryAddressLine1': 'Suite 4',
  'DeliveryAddressLine2': '1129 Hulsegge Boulevard',
  'DeliveryCityID': 11753,
  'DeliveryLocationLat': 27.216400146484375,
  'DeliveryLocationLong': -98.064399719
```
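The query above returns only the single customer with the highest credit limit. To compare provinces instead, one option is to group customers by province and rank provinces by the highest credit limit found in each. A sketch under the same embedded layout; `credit_limit_by_province_pipeline` is a hypothetical name and the `top_n` cut-off is an arbitrary choice.

```python
def credit_limit_by_province_pipeline(top_n=5):
    """Rank state provinces by the highest customer credit limit found in each."""
    return [
        {"$unwind": "$City_Inf"},
        {"$unwind": "$City_Inf.StateProvinceInf"},
        # Ignore customers without a credit limit
        {"$match": {"CreditLimit": {"$ne": None}}},
        {"$group": {
            "_id": "$City_Inf.StateProvinceInf.StateProvinceName",
            "MaxCreditLimit": {"$max": "$CreditLimit"},
            # Number of customers with a credit limit in this province
            "Customers": {"$sum": 1},
        }},
        {"$sort": {"MaxCreditLimit": -1}},
        {"$limit": top_n},
    ]
```

It would run as `wwi.Sales_Customers_embeded.aggregate(credit_limit_by_province_pipeline())`.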

### 3. Which items get ordered together the most?

```python
import bson
```

- Our approach was to find the most frequent co-occurrences of items within the same order: each line of Sales_OrderLines is joined with the lines of the same order (OrderID). The pipeline below builds these item pairs; counting and sorting them by frequency gives the combinations ordered together most often.

```python
# Pair every order line (a) with every line of the same order (b)
pipeline_3 = [
    {
        "$project": {
            "_id": 0,
            "a": "$$ROOT"
        }
    },
    {
        "$lookup": {
            "localField": "a.OrderID",
            "from": "Sales_OrderLines",
            "foreignField": "OrderID",
            "as": "b"
        }
    },
    {
        "$unwind": {
            "path": "$b"
        }
    },
    # Keep only the two item descriptions of each pair
    {
        "$project": {
            "a.Description": "$a.Description",
            "b.Description": "$b.Description",
            "_id": 0
        }
    }
]
```

```python
r3 = wwi.Sales_OrderLines.aggregate(pipeline_3, allowDiskUse=True)
list(r3)
```

```
[{'a': {'Description': '32 mm Double sided bubble wrap 50m'},
  'b': {'Description': '32 mm Double sided bubble wrap 50m'}},
 {'a': {'Description': 'Ride on toy sedan car (Black) 1/12 scale'},
  'b': {'Description': 'Ride on toy sedan car (Black) 1/12 scale'}},
 {'a': {'Description': 'Developer joke mug - old C developers never die (White)'},
  'b': {'Description': 'Developer joke mug - old C developers never die (White)'}},
 {'a': {'Description': 'Developer joke mug - old C developers never die (White)'},
  'b': {'Description': 'USB food flash drive - chocolate bar'}},
 {'a': {'Description': '"The Gu" red shirt XML tag t-shirt (Black) 3XS'},
  'b': {'Description': '"The Gu" red shirt XML tag t-shirt (Black) 3XS'}},
 {'a': {'Description': '"The Gu" red shirt XML tag t-shirt (Black) 3XS'},
  'b': {'Description': '32 mm Anti static bubble wrap (Blue) 10m'}},
 {'a': {'Description': '32 mm Anti static bubble wrap (Blue) 10m'},
  'b': {'Description': '"The Gu" red shirt XML tag t-shirt (Bla
```
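The output above still contains each line paired with itself, and every pair appears twice (A/B and B/A); nothing is counted or sorted yet. A sketch of stages that could be appended to `pipeline_3` to finish the job, followed by a pure-Python check on toy orders. Comparing `Description` strings with `$lt` to keep one ordering per pair is an assumption of this sketch (it also drops pairs of two lines with the same item), and the pure-Python version counts each pair at most once per order, so it is only roughly equivalent.

```python
from collections import Counter
from itertools import combinations

# Stages to append after the last $project of pipeline_3
cooccurrence_stages = [
    # Keep each unordered pair once and drop line-with-itself pairs
    {"$match": {"$expr": {"$lt": ["$a.Description", "$b.Description"]}}},
    {"$group": {
        "_id": {"item_a": "$a.Description", "item_b": "$b.Description"},
        "count": {"$sum": 1},
    }},
    {"$sort": {"count": -1}},
    {"$limit": 10},
]


def top_pairs(orders, n=10):
    """Pure-Python check: count distinct item pairs that share an order."""
    counts = Counter()
    for items in orders.values():
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts.most_common(n)


# Toy orders: order id -> item descriptions
orders = {
    1: ["mug", "usb drive", "tape"],
    2: ["mug", "usb drive"],
    3: ["tape", "bubble wrap"],
}
print(top_pairs(orders, 1))  # [(('mug', 'usb drive'), 2)]
```

With the database, the combined pipeline would run as `wwi.Sales_OrderLines.aggregate(pipeline_3 + cooccurrence_stages, allowDiskUse=True)`.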

### 4. Which items get ordered the most in bulk (bigger amounts)?

```python
# Attach the stock item name to each order line, then total and average the
# ordered quantity per item
pipeline = [
    {
        "$lookup": {
            "from": "Warehouse_StockItems",
            "localField": "StockItemID",
            "foreignField": "StockItemID",
            "as": "StockItemInfo"
        }
    }, {
        '$unwind': '$StockItemInfo'
    }, {
        '$group': {
            '_id': '$StockItemInfo.StockItemName',
            'Sum_Quantity': {'$sum': '$Quantity'},
            'Avg_Quantity': {'$avg': '$Quantity'}
        }
    },
    # Items with the largest total quantity first
    {'$sort': {'Sum_Quantity': -1}},
    {'$limit': 3}
]

r_4 = wwi.Sales_OrderLines.aggregate(pipeline)

result_4 = list(r_4)

result_4
```

```
[{'_id': 'Shipping carton (Brown) 457x279x279mm',
  'Sum_Quantity': 1500,
  'Avg_Quantity': 187.5},
 {'_id': 'Clear packaging tape 48mmx75m',
  'Sum_Quantity': 1326,
  'Avg_Quantity': 165.75},
 {'_id': 'Black and orange fragile despatch tape 48mmx75m',
  'Sum_Quantity': 1152,
  'Avg_Quantity': 192.0}]
```
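Sorting by `Sum_Quantity` ranks items by total volume. If "in bulk" is read as "large quantities per order line", sorting by `Avg_Quantity` is an alternative; the output above already shows the two rankings can differ, since the third item has the highest average of the three. A small sketch that parameterises the sort field; `bulk_items_pipeline` is a hypothetical helper reusing the stages above.

```python
def bulk_items_pipeline(sort_field="Sum_Quantity", top_n=3):
    """Rank stock items by total or average ordered quantity."""
    if sort_field not in ("Sum_Quantity", "Avg_Quantity"):
        raise ValueError("sort_field must be 'Sum_Quantity' or 'Avg_Quantity'")
    return [
        {"$lookup": {
            "from": "Warehouse_StockItems",
            "localField": "StockItemID",
            "foreignField": "StockItemID",
            "as": "StockItemInfo",
        }},
        {"$unwind": "$StockItemInfo"},
        {"$group": {
            "_id": "$StockItemInfo.StockItemName",
            "Sum_Quantity": {"$sum": "$Quantity"},
            "Avg_Quantity": {"$avg": "$Quantity"},
        }},
        {"$sort": {sort_field: -1}},
        {"$limit": top_n},
    ]
```

For example, `wwi.Sales_OrderLines.aggregate(bulk_items_pipeline("Avg_Quantity"))`.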

### 5. Which customers have delivery addresses within 10 km of the customer with CustomerID 961?

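```python
# Read-only credentials for the MySQL source database (separate names so the
# MongoDB user/password variables defined earlier are not overwritten)
mysql_host = "rhea.isegi.unl.pt"
mysql_user = "wwi-read-only-user"
mysql_password = "jGp2GCqrss6nfTEu5ZawhW3mksLsQYQb"
mysql_database = "WWI"

import mysql.connector
mydb = mysql.connector.connect(host=mysql_host, user=mysql_user, database=mysql_database,
                               port=3306, password=mysql_password)
mycursor = mydb.cursor()
```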


```python
import pandas as pd
from pymongo import DESCENDING, ASCENDING, TEXT, GEO2D
from bson.son import SON

# Delivery coordinates of every customer
geo_df = pd.read_sql('SELECT CustomerID, DeliveryLocationLat, DeliveryLocationLong FROM Sales_Customers', con=mydb)

geo_df
```

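|     | CustomerID | DeliveryLocationLat | DeliveryLocationLong |
|-----|------------|---------------------|----------------------|
| 0   | 1          | 41.4972             | -102.6200            |
| 1   | 2          | 48.7163             | -115.8740            |
| 2   | 3          | 34.2689             | -112.7270            |
| 3   | 4          | 37.2811             | -98.5804             |
| 4   | 5          | 43.1992             | -78.5761             |
| ... | ...        | ...                 | ...                  |
| 658 | 1057       | 41.5853             | -87.8431             |
| 659 | 1058       | 41.2751             | -79.1131             |
| 660 | 1059       | 40.6267             | -90.3051             |
| 661 | 1060       | 47.1670             | -122.4050            |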


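```python
# One document per customer: the ID plus [longitude, latitude] coordinates,
# the order MongoDB geospatial queries expect
geo_dict = []
for customer_id, lat, lon in geo_df.itertuples(index=False):
    record = {
        'nome': int(customer_id),
        'coords': [float(lon), float(lat)]
    }
    geo_dict.append(record)

geo_dict[:5]
```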

```python
result = wwi.places.insert_many(geo_dict)
```

```python
wwi.places.create_index([("coords", GEO2D)])
```

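```python
# Reference customer (CustomerID 961) and its [longitude, latitude] coordinates
center = wwi.places.find_one({'nome': 961})['coords']

# $centerSphere expects the radius in radians: distance / Earth radius (~6378.1 km)
radius_radians = 10 / 6378.1

r_5 = wwi.places.find({
    'coords': {'$geoWithin': {'$centerSphere': [center, radius_radians]}},
    'nome': {'$ne': 961}  # exclude the reference customer itself
})
result_5 = list(r_5)
result_5
```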
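`$centerSphere` takes its radius in radians, i.e. the distance divided by the Earth's radius (about 6378.1 km in the MongoDB documentation). A pure-Python haversine check can confirm that returned customers really lie within 10 km; points are `[longitude, latitude]` as stored in `coords`, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6378.1  # value used in the MongoDB spherical-geometry examples


def km_to_radians(km):
    """Convert a distance on the Earth's surface into radians for $centerSphere."""
    return km / EARTH_RADIUS_KM


def haversine_km(p1, p2):
    """Great-circle distance in km between two [longitude, latitude] points."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude is roughly 111 km, far outside a 10 km radius
print(round(haversine_km([-87.8431, 41.5853], [-87.8431, 42.5853]), 1))  # 111.3
```

Each document in `result_5` could be checked with `haversine_km(center, doc['coords']) <= 10`.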

### 6. What is the monthly order count?

```python
# One row per year with one field per month counting that month's orders
r_6 = wwi.Sales_Orders.aggregate([
    {
        "$project":
          {
            "month": { "$month": "$OrderDate" },
            "year": { "$year": "$OrderDate" }
          }
    },
    {"$group" : {
        "_id" : '$year',
        "jan" : {"$sum" : { "$cond": [ {"$eq": ["$month", 1]}, 1, 0] }},
        "feb" : {"$sum" : { "$cond": [ {"$eq": ["$month", 2]}, 1, 0] }},
        "mar" : {"$sum" : { "$cond": [ {"$eq": ["$month", 3]}, 1, 0] }},
        "apr" : {"$sum" : { "$cond": [ {"$eq": ["$month", 4]}, 1, 0] }},
        "may" : {"$sum" : { "$cond": [ {"$eq": ["$month", 5]}, 1, 0] }},
        "jun" : {"$sum" : { "$cond": [ {"$eq": ["$month", 6]}, 1, 0] }},
        "jul" : {"$sum" : { "$cond": [ {"$eq": ["$month", 7]}, 1, 0] }},
        "aug" : {"$sum" : { "$cond": [ {"$eq": ["$month", 8]}, 1, 0] }},
        "sep" : {"$sum" : { "$cond": [ {"$eq": ["$month", 9]}, 1, 0] }},
        "oct" : {"$sum" : { "$cond": [ {"$eq": ["$month", 10]}, 1, 0] }},
        "nov" : {"$sum" : { "$cond": [ {"$eq": ["$month", 11]}, 1, 0] }},
        "dec" : {"$sum" : { "$cond": [ {"$eq": ["$month", 12]}, 1, 0] }}
      }}])
result_6 = list(r_6)
result_6
```
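The twelve `$cond` accumulators can also be replaced by grouping on year and month together, which yields one sortable row per month. A sketch of that variant, plus a pure-Python equivalent on a few dates for checking; `monthly_order_count_pipeline` and `monthly_order_count` are hypothetical names.

```python
from collections import Counter
from datetime import datetime


def monthly_order_count_pipeline():
    """One output row per (year, month) with the number of orders."""
    return [
        {"$group": {
            "_id": {"year": {"$year": "$OrderDate"}, "month": {"$month": "$OrderDate"}},
            "orders": {"$sum": 1},
        }},
        {"$sort": {"_id.year": 1, "_id.month": 1}},
    ]


def monthly_order_count(order_dates):
    """Pure-Python equivalent over a list of datetimes."""
    return Counter((d.year, d.month) for d in order_dates)


dates = [datetime(2013, 1, 1), datetime(2013, 1, 15), datetime(2013, 2, 3)]
print(monthly_order_count(dates))
```

It would run as `wwi.Sales_Orders.aggregate(monthly_order_count_pipeline())`.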

### 7. What are the average monthly sales prices?

```python
# Join each order with its lines, compute each line's value (UnitPrice x Quantity)
# and average it per calendar month
pipeline = [
    {
        '$lookup': {
            'from': 'Sales_OrderLines',
            'localField': 'OrderID',
            'foreignField': 'OrderID',
            'as': 'OrderInfo'
        }
    }, {
        '$unwind': '$OrderInfo'
    },
    {
        '$project': {
            'month': { '$month': "$OrderDate" },
            'total': { '$multiply': [ "$OrderInfo.UnitPrice", "$OrderInfo.Quantity" ] }
        }
    },
    {
        '$group': {'_id': {'month': "$month"}, 'avgValue': {'$avg': "$total" }}
    }
]

r_7 = wwi.Sales_Orders.aggregate(pipeline)

result_7 = list(r_7)

result_7
```
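The grouping above averages each calendar month across all years (January 2013 and January 2014 end up together) and returns months unsorted. If a per-year view is wanted, one option is to keep the year in the `$project` stage and group on year and month. A sketch of replacement stages for everything after the `$unwind`, with a pure-Python check on toy lines; the names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Stages that would replace the $project/$group stages after the $unwind
monthly_avg_stages = [
    {"$project": {
        "year": {"$year": "$OrderDate"},
        "month": {"$month": "$OrderDate"},
        "total": {"$multiply": ["$OrderInfo.UnitPrice", "$OrderInfo.Quantity"]},
    }},
    {"$group": {"_id": {"year": "$year", "month": "$month"}, "avgValue": {"$avg": "$total"}}},
    {"$sort": {"_id.year": 1, "_id.month": 1}},
]


def monthly_average_line_value(lines):
    """Pure-Python equivalent: lines are (order_date, unit_price, quantity) tuples."""
    totals = defaultdict(list)
    for order_date, unit_price, quantity in lines:
        totals[(order_date.year, order_date.month)].append(unit_price * quantity)
    return {key: sum(values) / len(values) for key, values in sorted(totals.items())}


lines = [
    (datetime(2013, 1, 2), 10.0, 2),
    (datetime(2013, 1, 9), 5.0, 4),
    (datetime(2014, 1, 3), 3.0, 1),
]
print(monthly_average_line_value(lines))
```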

### 8. What are the yearly expenditures with suppliers (per supplier name)?

- We grouped by year and supplier and summed the TransactionAmount paid to each supplier.

```python
# Supplier transactions (a) joined with their supplier (b), summed per year and supplier
pipeline_8 = [
    {
        '$project': {
            '_id': 0,
            'a': '$$ROOT'
        }
    },
    {
        '$lookup': {
            'localField': 'a.SupplierID',
            'from': 'Purchasing_Suppliers',
            'foreignField': 'SupplierID',
            'as': 'b'
        }
    },
    {
        '$unwind': {'path': '$b'}
    },
    # Keep only rows that carry a supplier invoice number
    {
        '$match': {
            'a.SupplierInvoiceNumber': {
                '$ne': None
            }
        }
    },
    # Note: $group below does not preserve this order
    {
        '$sort': bson.son.SON([('b.SupplierName', -1)])
    },
    {
        '$project': {
            'a.TransactionDate': {'$year': '$a.TransactionDate'},
            'b.SupplierName': '$b.SupplierName',
            'a.TransactionAmount': '$a.TransactionAmount',
            '_id': 0
        }
    },
    {
        '$group': {
            '_id': {'Year': '$a.TransactionDate', 'Supplier': '$b.SupplierName'},
            'Total': {'$sum': '$a.TransactionAmount'}
        }
    }
]
```

```python
r8 = wwi.Purchasing_SupplierTransactions.aggregate(pipeline_8, allowDiskUse=True)
list(r8)
```
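A pure-Python equivalent of the final `$group` of `pipeline_8`, useful for spot-checking a few suppliers by hand. The `(year, supplier_name, amount)` input shape and the toy data are assumptions of this sketch, and unlike the pipeline it returns results ordered by year and supplier.

```python
from collections import defaultdict
from decimal import Decimal


def yearly_supplier_totals(transactions):
    """Sum amounts per (year, supplier) from (year, supplier_name, amount) tuples."""
    totals = defaultdict(Decimal)
    for year, supplier, amount in transactions:
        totals[(year, supplier)] += amount
    # Ordered by year, then supplier name
    return dict(sorted(totals.items()))


transactions = [  # toy data, not taken from WWI
    (2014, "Supplier B", Decimal("20.00")),
    (2013, "Supplier A", Decimal("100.50")),
    (2013, "Supplier A", Decimal("49.50")),
]
print(yearly_supplier_totals(transactions))
```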

### 9. What is the most common payment type?

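```python
# Joining Sales_CustomerTransactions on PaymentMethodID after the first $lookup
# would pair every supplier transaction with every customer transaction sharing
# its method, multiplying the counts. Instead, combine both transaction
# collections and count each transaction once.
pipeline_9 = [
    # Add customer transactions to the supplier transactions ($unionWith: MongoDB 4.4+)
    {'$unionWith': {'coll': 'Sales_CustomerTransactions'}},
    # Skip transactions without a payment method
    {'$match': {'PaymentMethodID': {'$ne': None}}},
    # Count transactions per payment method
    {'$group': {'_id': '$PaymentMethodID', 'COUNT': {'$sum': 1}}},
    # Attach the payment method name
    {
        '$lookup': {
            'localField': '_id',
            'from': 'Application_PaymentMethods',
            'foreignField': 'PaymentMethodID',
            'as': 'method'
        }
    },
    {'$unwind': {'path': '$method'}},
    {'$project': {'_id': 0, 'PaymentMethodName': '$method.PaymentMethodName', 'COUNT': 1}},
    # Most common payment method first
    {'$sort': {'COUNT': -1}}
]
```

```python
r9 = wwi.Purchasing_SupplierTransactions.aggregate(pipeline_9)
list(r9)
```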
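A pure-Python equivalent of counting payment methods over both transaction lists, handy for checking the aggregation on toy data. The ID-to-name mapping below is an assumed toy mapping, and `most_common_payment_method` is a hypothetical name.

```python
from collections import Counter


def most_common_payment_method(supplier_tx, customer_tx, method_names):
    """Count transactions per payment method across both lists; ignore missing methods."""
    counts = Counter(
        tx["PaymentMethodID"]
        for tx in supplier_tx + customer_tx
        if tx.get("PaymentMethodID") is not None
    )
    method_id, count = counts.most_common(1)[0]
    return method_names[method_id], count


method_names = {1: "Cash", 2: "Check", 3: "Credit-Card", 4: "EFT"}  # assumed mapping
supplier_tx = [{"PaymentMethodID": 4}, {"PaymentMethodID": 4}, {"PaymentMethodID": None}]
customer_tx = [{"PaymentMethodID": 4}, {"PaymentMethodID": 2}]
print(most_common_payment_method(supplier_tx, customer_tx, method_names))  # ('EFT', 3)
```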

### 10. Which supplier in the `Novelty Goods Supplier` category has the most transactions?

```python
pipeline_10 = [
    {'$project': {'_id': 0, 'a': '$$ROOT'}},
    # Attach the supplier category
    {
        '$lookup': {
            'localField': 'a.SupplierCategoryID',
            'from': 'Purchasing_SupplierCategories',
            'foreignField': 'SupplierCategoryID',
            'as': 'b'
        }
    },
    {'$unwind': {'path': '$b'}},
    # Keep only novelty goods suppliers before joining the transactions
    {'$match': {'b.SupplierCategoryName': 'Novelty Goods Supplier'}},
    # One row per supplier transaction
    {
        '$lookup': {
            'localField': 'a.SupplierID',
            'from': 'Purchasing_SupplierTransactions',
            'foreignField': 'SupplierID',
            'as': 'c'
        }
    },
    {'$unwind': {'path': '$c'}},
    # Count transactions per supplier
    {
        '$group': {
            '_id': {
                'SupplierCategoryName': '$b.SupplierCategoryName',
                'SupplierName': '$a.SupplierName'
            },
            'COUNT': {'$sum': 1}
        }
    },
    {
        '$project': {
            'SupplierCategoryName': '$_id.SupplierCategoryName',
            'SupplierName': '$_id.SupplierName',
            'COUNT': '$COUNT',
            '_id': 0
        }
    },
    # Supplier with the most transactions
    {'$sort': {'COUNT': -1}},
    {'$limit': 1}
]
```

```python
r10 = wwi.Purchasing_Suppliers.aggregate(pipeline_10)
list(r10)
```

### 11. Which salesperson had the most invoices in 2013 (the person whose customers brought in the most money)?

```python
# Salespeople (a) with their 2013 invoices (b) and invoice lines (c)
pipeline_11 = [
    {'$project': {'_id': 0, 'a': '$$ROOT'}},
    # Invoices where this person is the salesperson
    {
        '$lookup': {
            'localField': 'a.PersonID',
            'from': 'Sales_Invoices',
            'foreignField': 'SalespersonPersonID',
            'as': 'b'
        }
    },
    {'$unwind': {'path': '$b'}},
    # Keep only 2013 invoices before joining the invoice lines
    {'$match': {'$expr': {'$eq': [{'$year': '$b.InvoiceDate'}, 2013]}}},
    {
        '$lookup': {
            'localField': 'b.InvoiceID',
            'from': 'Sales_InvoiceLines',
            'foreignField': 'InvoiceID',
            'as': 'c'
        }
    },
    {'$unwind': {'path': '$c'}},
    # Per salesperson: distinct invoices and the money they brought in
    # (ExtendedPrice is summed as the line value rather than Quantity)
    {
        '$group': {
            '_id': '$a.FullName',
            'Invoices': {'$addToSet': '$b.InvoiceID'},
            'TotalSales': {'$sum': '$c.ExtendedPrice'}
        }
    },
    {
        '$project': {
            '_id': 0,
            'FullName': '$_id',
            'InvoiceCount': {'$size': '$Invoices'},
            'TotalSales': 1
        }
    },
    # Highest total first
    {'$sort': {'TotalSales': -1}}
]
```

```python
r11 = wwi.Application_People.aggregate(pipeline_11)
list(r11)
```