# MongoDB

MongoDB is a general-purpose document database.

## Documents
Data in MongoDB is represented as JSON-like documents.

Fields can vary from one document to another.

Documents can be nested to express hierarchies, and fields can hold arrays.
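
As a rough illustration of these three points, the sketch below writes two documents as plain Python dictionaries, which is how PyMongo represents them. The names `order_a` and `order_b` and all field values are made up for this example; no database connection is needed.

```python
# Two documents that could live in the same collection.
# All names and values here are hypothetical.
order_a = {
    "customer": {"name": "Ana", "address": {"city": "Rosario"}},  # nested documents
    "items": ["apples", "bananas"],                               # array of values
}
order_b = {
    "customer": {"name": "Luis"},               # no address in this document
    "items": [{"name": "oranges", "qty": 6}],   # array of nested documents
    "notes": "deliver in the morning",          # field absent from order_a
}

# Top-level fields that appear only in the second document
fields_only_in_b = set(order_b) - set(order_a)
print(fields_only_in_b)  # {'notes'}
```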

### Collections

A collection is a group of documents. It is similar to a table but more flexible: it has no schema unless you configure one.
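
One way to configure a schema is a `$jsonSchema` validator passed when the collection is created. The sketch below assumes a PyMongo `Database` object is passed in as `db`; the collection name `fruits_validated`, the schema, and the helper `create_validated_collection` are hypothetical and only defined here, not run.

```python
# Hypothetical schema: every document needs a string name and a non-negative int qty.
fruit_schema = {
    "bsonType": "object",
    "required": ["name", "qty"],
    "properties": {
        "name": {"bsonType": "string"},
        "qty": {"bsonType": "int", "minimum": 0},
    },
}


def create_validated_collection(db, name="fruits_validated"):
    """Create a collection whose inserts and updates are checked against fruit_schema."""
    # Extra keyword arguments of create_collection are sent as options of the
    # create command; `validator` is one of those options.
    return db.create_collection(name, validator={"$jsonSchema": fruit_schema})
```

With a validator like this in place, inserting a document without `qty` would be expected to fail with a write error instead of being stored.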

### Indexes
MongoDB supports several indexing strategies for efficient query execution.

### Aggregation pipelines
MongoDB includes a framework for building data-processing pipelines with a wide variety of operators and expressions.

In [None]:
from pymongo import MongoClient
import pymongo

uri = "mongodb://mongo:27017/"
client = MongoClient(uri)
client.admin.command("ping")

In [None]:
db = client["clase"]

In [None]:
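# Uncomment to drop the collections if they already exist;
# create_collection raises CollectionInvalid for an existing collection.
#db["coleccion1"].drop()
#db["coleccion2"].drop()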

In [None]:
db.create_collection("coleccion1")
db.create_collection("coleccion2")

In [None]:
db.list_collection_names()

In [None]:
collection = db["coleccion1"]

# Insert

In [None]:
result = collection.insert_one({"key": "value"})
print(result.acknowledged)

In [None]:
document_list = [
   {"key": "value1"},
   {"key": "value2"}
]
result = collection.insert_many(document_list)
print(result.acknowledged)

# Update

In [None]:
query_filter = { "key" : "value2" }
update_operation = { "$set" : 
    { "key" : "value" }
}
result = collection.update_one(query_filter, update_operation)
print(result.modified_count)

In [None]:
query_filter = { "key" : "value" }
update_operation = { "$set" : 
    { "key" : "value1000" }
}
result = collection.update_many(query_filter, update_operation)
print(result.modified_count)

# Replace

In [None]:
query_filter = { "key" : "value1000" }
replace_document = { "another_key" : "another_value" }
result = collection.replace_one(query_filter, replace_document)
print(result.modified_count)

# Delete

In [None]:
query_filter = { "key" : "value1000" }
result = collection.delete_one(query_filter)
print(result.deleted_count)

In [None]:
query_filter = { "key" : "value1000" }
result = collection.delete_many(query_filter)
print(result.deleted_count)

# Find

In [None]:
collection = db["fruits"]
collection.insert_many([
        { "name": "apples", "qty": 5, "rating": 3, "color": "red", "type": ["fuji", "honeycrisp"] },
        { "name": "bananas", "qty": 7, "rating": 4, "color": "yellow", "type": ["cavendish"] },
        { "name": "oranges", "qty": 6, "rating": 2, "type": ["navel", "mandarin"] },
        { "name": "pineapple", "qty": 3, "rating": 5, "color": "yellow" },
])

In [None]:
results = collection.find({ "color": "yellow" })
list(results)

## Comparison

- `$eq`
- `$gt`
- `$gte`
- `$in`
- `$lt`
- `$lte`
- `$ne`
- `$nin`

[Reference](https://www.mongodb.com/docs/manual/reference/operator/query-comparison/)
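
Several of these operators can be combined in one filter. The sketch below only builds the filter documents as Python dictionaries, using the field names of the `fruits` collection; the variable names are made up for this example, and the call to `find` is left as a comment because it needs a live connection.

```python
# Range on a single field: both operators go in the same sub-document
rating_between_3_and_4 = {"rating": {"$gte": 3, "$lte": 4}}

# Membership test: the field must equal one of the listed values
yellow_or_red = {"color": {"$in": ["yellow", "red"]}}

# Conditions on different fields in one filter are combined with an implicit AND
combined = {**rating_between_3_and_4, **yellow_or_red}
print(combined)

# With a live connection: list(collection.find(combined))
```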

In [None]:
results = collection.find({ "rating": { "$gt" : 2 }})
list(results)

## Logical operators

- `$and`
- `$not`
- `$nor`
- `$or`

In [None]:
results = collection.find({ 
    "$or": [
        { "qty": { "$gt": 5 }},
        { "color": "yellow" }
    ]
})
list(results)
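
The remaining logical operators follow the same pattern as `$or`. The filters below are a sketch on the `fruits` fields; the variable names are hypothetical, and running them requires passing each filter to `collection.find`.

```python
# Explicit AND (equivalent here to putting both fields in one filter document)
qty_and_rating = {"$and": [{"qty": {"$gt": 3}}, {"rating": {"$gte": 3}}]}

# NOT wraps an operator expression; it also matches documents
# where the field is missing
qty_not_above_5 = {"qty": {"$not": {"$gt": 5}}}

# NOR: none of the listed conditions may hold
neither_yellow_nor_large = {"$nor": [{"color": "yellow"}, {"qty": {"$gt": 5}}]}

for query_filter in (qty_and_rating, qty_not_above_5, neither_yellow_nor_large):
    print(query_filter)
    # With a live connection: list(collection.find(query_filter))
```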

## Others

See [array operators](https://www.mongodb.com/docs/languages/python/pymongo-driver/current/read/specify-a-query/#array-operators), [element operators](https://www.mongodb.com/docs/languages/python/pymongo-driver/current/read/specify-a-query/#element-operators), and [evaluation operators](https://www.mongodb.com/docs/languages/python/pymongo-driver/current/read/specify-a-query/#evaluation-operators).

- `$exists`

In [None]:
results = collection.find({ 
    "type": {
        "$exists": False
    }
})
list(results)

# Projection

In [None]:
results = collection.find({
    "type": {
        "$exists": True
    }
}, {"name": 1})
list(results)

In [None]:
results = collection.find({
    "type": {
        "$exists": True
    }
}, {"name": 1, "_id": 0})
list(results)

# Documents to return

In [None]:
results = collection.find({
    "type": {
        "$exists": True
    }
}, {"name": 1, "rating": 1, "_id": 0}).sort("rating", pymongo.DESCENDING).skip(1).limit(2)
list(results)

## Distinct

In [None]:
results = collection.distinct("color", {"rating": {"$gte": 4}})
list(results)

# Indexes

See [more examples](https://www.mongodb.com/docs/languages/python/pymongo-driver/current/indexes/).

In [None]:
result = collection.create_index("type")
print(f'Index created: {result}')

In [None]:
result = collection.create_index([
    ("type", pymongo.ASCENDING),
    ("rating", pymongo.ASCENDING)
])
print(f"Index created: {result}")
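
To check whether a query uses an index, a cursor's `explain()` output can be inspected. The helper below is a minimal sketch: `plan_stages` and `sample_plan` are hypothetical names, the sample is hand-written rather than real server output, and the exact shape of the plan varies by server version. It only follows chains of a single `inputStage`.

```python
def plan_stages(plan):
    """Return the stage names of a query plan, outermost first."""
    stages = []
    while plan is not None:
        stages.append(plan["stage"])
        plan = plan.get("inputStage")  # None when there is no inner stage
    return stages


# Hand-written sample shaped like explain()["queryPlanner"]["winningPlan"]
sample_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "type_1"},
}
print(plan_stages(sample_plan))  # ['FETCH', 'IXSCAN']

# With a live connection (not run here):
# explanation = collection.find({"type": "fuji"}).explain()
# plan_stages(explanation["queryPlanner"]["winningPlan"])
```

An `IXSCAN` stage suggests an index scan, while a `COLLSCAN` stage means the whole collection was read.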

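# Aggregations

An aggregation is a data pipeline: documents pass through a sequence of stages, and each stage transforms the output of the previous one.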
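
As a small warm-up, the sketch below groups the `fruits` collection by color. The pipeline is just a list of stage documents; the name `fruits_pipeline` and the output field names are made up for this example, and the `aggregate` call is left as a comment because it needs a live connection.

```python
fruits_pipeline = [
    # Stage 1: keep only the documents that have a color
    {"$match": {"color": {"$exists": True}}},
    # Stage 2: one output document per color, with counts and totals
    {"$group": {
        "_id": "$color",
        "fruits": {"$sum": 1},
        "total_qty": {"$sum": "$qty"},
        "avg_rating": {"$avg": "$rating"},
    }},
    # Stage 3: largest total quantity first
    {"$sort": {"total_qty": -1}},
]

# Each stage is a dictionary with a single key naming the stage operator
stage_names = [next(iter(stage)) for stage in fruits_pipeline]
print(stage_names)  # ['$match', '$group', '$sort']

# With a live connection: list(client["clase"]["fruits"].aggregate(fruits_pipeline))
```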

In [None]:
import requests
import bson.json_util as jsutil
import json

accounts_url = "https://raw.githubusercontent.com/neelabalan/mongodb-sample-dataset/main/sample_analytics/accounts.json"
customers_url = "https://raw.githubusercontent.com/neelabalan/mongodb-sample-dataset/main/sample_analytics/customers.json"
transactions_url = "https://raw.githubusercontent.com/neelabalan/mongodb-sample-dataset/main/sample_analytics/transactions.json"

In [None]:
jsutil.loads(requests.get(accounts_url).content.decode().split("\n")[0])

In [None]:
from tqdm.autonotebook import tqdm

db = client["analytics"]


def load_collection(name, url):
    """Drop the collection and reload it from a newline-delimited JSON file."""
    db[name].drop()
    response = requests.get(url)
    response.raise_for_status()  # fail early if the download did not succeed
    for line in tqdm(response.content.decode().strip().split("\n")):
        db[name].insert_one(jsutil.loads(line))
    return db[name]


accounts = load_collection("accounts", accounts_url)
customers = load_collection("customers", customers_url)
transactions = load_collection("transactions", transactions_url)

In [None]:
pipeline = [
    # One output document per element of the transactions array
    {"$unwind": "$transactions"},
    # Keep only purchases of the selected symbols
    {"$match": {
        "transactions.transaction_code": "buy",
        "transactions.symbol": {"$in": ["aapl", "msft", "nvda"]},
    }},
    # Convert total to a double so that $sum can add it
    {"$set": {"transactions.total": {"$toDouble": "$transactions.total"}}},
    {"$group": {
        "_id": "$transactions.symbol",
        "operations": {"$sum": 1},
        "volume": {"$sum": "$transactions.total"},
        "shares_volume": {"$sum": "$transactions.amount"},
        # $min gives the earliest date; $first would depend on input order
        "first_purchase": {"$min": "$transactions.date"},
    }},
    {"$sort": {"volume": 1}},
]

result = transactions.aggregate(pipeline)
list(result)

In [None]:
customers.find_one({})

See [the reference](https://www.mongodb.com/docs/manual/aggregation/).

# Exercises
1. Read any one document from each of the three collections.
2. Compute how many accounts have each product type and their average limit.
3. The [following code](https://stackoverflow.com/a/60352517) is a template for creating [a view](https://www.mongodb.com/docs/manual/core/views/join-collections-with-view/):

```python
db.create_collection(
    'parsed_tests_view',
    viewOn='parsed_tests',
    pipeline=[{
        '$lookup': {
            'from': "raw_tests",
            'localField': "repository_path",
            'foreignField': "repository_path",
            'as': "raw_data"
        }
    }]
)
```

Create a view that joins `customers` with `accounts`.

4. Using the view created in exercise 3, show the emails of the 10 customers with the highest total account limit, summing the limits of all their accounts.
5. Build a pipeline to compute the average PnL across all accounts.
6. Build a pipeline to compute the top 5 accounts with the best realized gains.