<font size="6">Document Databases and MongoDB</font>

# Setup

1. run Jupyter 
2. run Docker
3. create a MongoDB container in Command Prompt (first make sure no 'leftover' container from a previous run exists):

        docker run --name mongodb -d -e MONGO_INITDB_ROOT_USERNAME=AzureDiamond -e MONGO_INITDB_ROOT_PASSWORD=hunter2 -p 27017:27017 mongo

4. run Studio3T

5. create New Connection OR reconnect to existing Connection:

    5.A	In Studio3T create Connection:
            a.	Press Connection -> New Connection
            b.	Insert credentials into URI field: mongodb://AzureDiamond:hunter2@localhost:27017
            c.	Press Test Connection
            d.	Assign a name to the Connection (top empty line) -> Save
            e.	Press Connect
            
    5.B In Studio3T reconnect to Connection:
            a. press Connect (top left corner)
            b. choose MongoDB connection  -> press Connect (MongoDB container must be running by this time)
            
6.	In Studio3T import the Database:
            a.	Press Import
            b.	Choose BSON mongodump archive 
            c.	Find your database path
            d.	Select All file formats and choose the database file ‘sampledata.archive’
            e.	Press Run: be patient while the database loads
            f.	Inspect the collections and documents

Done!

Overview of Lab 4:

1. Intro to document databases & basic terminology
2. Querying documents by using find_one() and find().
3. Using supporting operators: `$gt`, `$lt`, `$eq`, `$in`, `$and`, `$or`, `$elemMatch`.
4. Inserting documents by using insert_one() and insert_many().
5. Updating documents by using update_one() and update_many().
6. Deleting documents by using delete_one() and delete_many().

![logo](./img/studio3t.png)

# Key concepts and terms

***MongoDB Database*** - a general-purpose document database that stores data as documents similar to JSON objects. It has a flexible schema model.

***Features***: scalability, resilience, speed of development, privacy and security.

***Document model*** - allows data to be modelled in any shape or structure; makes application data easier to plan because it corresponds directly to the data in the database; can model key-value pairs, text, geospatial indices, time series, etc.

***Data format***: displayed as JSON, stored as BSON (binary JSON), which supports more data types than JSON, such as dates, distinct numeric types, ObjectIds, and more.

![MongoBigPicture_half1.png](./img/mongodb_logo.png)

***Document*** is a basic unit of data in MongoDB. Each document contains **field:value** pairs

***Collection*** is a grouping of those documents. Documents within a collection are typically similar, but they don't have to share the exact same structure. MongoDB allows for *polymorphic documents* where fields and value types can vary across documents within a collection.

***Database*** is a container for our collections.

***Document format***

Documents in MongoDB are displayed in JSON (JavaScript Object Notation) format, while stored in BSON (binary JSON).

**BSON** is optimized for storage, retrieval, and transmission of data across the wire. It is more secure than plaintext JSON, and supports more data types than JSON.

Every document requires an _id field, which acts as a primary key (MongoDB will auto-generate one if it's missing).

The values in a document can be any data type, including strings, objects, arrays, booleans, nulls, dates, ObjectIds, and more. 

**MongoDB Syntax**:

    { "key" : value,
      "key" : value,
      "key" : value }

**JSON example**:

    {   "_id" : 130,
        "name" : "AC3 Phone",
        "colors" : ["black", "silver"],
        "price" : 200,
        "available" : true  }
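
To make the JSON/BSON distinction concrete, here is a minimal sketch that uses only Python's standard library. It writes the JSON example above as a Python dict, places a differently shaped (polymorphic) document next to it, and shows that plain JSON has no date type, which is one reason MongoDB stores BSON. The second product and the `released` field are made up for illustration.

```python
import json
from datetime import datetime

# The JSON example above, written as a Python dict (PyMongo works with dicts).
phone = {
    "_id": 130,
    "name": "AC3 Phone",
    "colors": ["black", "silver"],
    "price": 200,
    "available": True,
}

# Polymorphic documents: same collection, different fields (hypothetical product).
case = {"_id": 131, "name": "AC3 Case", "material": "leather"}
shared_fields = sorted(set(phone) & set(case))

# Strings, numbers, arrays and booleans survive a JSON round trip unchanged.
round_tripped = json.loads(json.dumps(phone))

# Plain JSON has no date type, so the standard encoder rejects datetime values.
phone_with_date = {**phone, "released": datetime(2020, 1, 15)}  # hypothetical field
try:
    json.dumps(phone_with_date)
    json_accepts_dates = True
except TypeError:
    json_accepts_dates = False

print(shared_fields)
print(round_tripped == phone)
print(json_accepts_dates)
```

BSON, in contrast, has a native date type, so PyMongo can store a `datetime` value directly.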


# Loading and Inspecting database


In [2]:
# Python Connector

# %pip install pymongo
# or #!conda install -y pymongo

from datetime import datetime
from pprint import pprint
import time
from bson.objectid import ObjectId

from pymongo import MongoClient

user="AzureDiamond"
password="hunter2"
host="localhost"
port="27017"
protocol="mongodb"

client = MongoClient(f"{protocol}://{user}:{password}@{host}:{port}")

# Database check
db = client.sample_analytics 
print(f"Database info: {db}\n")
db.name 

#    (You could also run all scripts in the Studio3T UI directly.) 

Database info: Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'sample_analytics')



'sample_analytics'
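
The f-string in the cell above works because these credentials contain only letters and digits. If a username or password contains characters such as `@`, `:` or `/`, they have to be percent-escaped before they go into the URI. A small sketch using the standard library; `build_mongo_uri` is a hypothetical helper, not part of PyMongo, and the second password is invented:

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017, protocol="mongodb"):
    """Return a connection URI with the credentials percent-escaped."""
    return f"{protocol}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


# Same credentials as in the Setup section: nothing needs escaping.
uri = build_mongo_uri("AzureDiamond", "hunter2")
print(uri)

# Hypothetical password with reserved characters: they are escaped.
tricky_uri = build_mongo_uri("AzureDiamond", "p@ss:word/1")
print(tricky_uri)
```

The resulting string can be passed to `MongoClient(...)` in the same way as the URI built in the cell above.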

In [30]:
# Collections are inside our Database 'sample_analytics'

collection_list = db.list_collection_names()

print(f"The database contains {len(collection_list)} collections")

print(f"All collections: {collection_list[0:]}")

print(f"Collection {collection_list[0]} contains {db[collection_list[0]].count_documents({})} documents")

The database contains 3 collections
All collections: ['transactions', 'customers', 'accounts']
Collection transactions contains 1746 documents


[PyMongo documentation](https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html)

## FIND_ONE( )   and   FIND( )

[Documentation](https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.find_one)

In [31]:
# Find one document in the collection 'accounts'

document = db.accounts.find_one()
pprint(document)

# ObjectID, different data formats

# https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.find_one

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'status': 'VIP account',
 'verified': True}


[ObjectId](https://docs.mongodb.com/manual/reference/method/ObjectId/) - unique identifier of a document within a collection, and most probably unique within a database.

Important to note: Mongo creates this `_id` automatically and you should only set it yourself if you have a good reason.


 The 12-byte ObjectId value consists of:

    a 4-byte timestamp value, representing the ObjectId’s creation, in seconds (Unix epoch)
    a 5-byte random value
    a 3-byte incrementing counter, initialized to a random value
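
The three parts listed above can be pulled out of the 24-character hex string by hand. The sketch below uses only the standard library; `split_object_id` is a hypothetical helper written for this lab, not a PyMongo function.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # 4-byte timestamp (Unix epoch)
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                        # 5-byte random value
        "counter": int.from_bytes(raw[9:12], "big"),     # 3-byte counter
    }


parts = split_object_id("5ca4bbc7a2dd94ee5816238c")
print(parts["created"].isoformat())
print(parts["random"])
print(parts["counter"])
```

With PyMongo installed, `ObjectId("5ca4bbc7a2dd94ee5816238c").generation_time` returns the same creation timestamp without manual decoding.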
   

In [32]:
db.accounts.find_one({"_id": ObjectId('5ca4bbc7a2dd94ee5816238c')})

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'verified': True,
 'status': 'VIP account'}

In [33]:
# Finding a document by ObjectId

db.accounts.find_one({"_id": ObjectId('5ca4bbc7a2dd94ee5816238c')})

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'verified': True,
 'status': 'VIP account'}

In [34]:
query = {'products': ['Derivatives', 'CurrencyService', 'InvestmentStock']}

document = db.accounts.find_one(query)

pprint(document)

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238e'),
 'account_id': 198100,
 'limit': 10000,
 'products': ['Derivatives', 'CurrencyService', 'InvestmentStock'],
 'status': 'VIP account'}


In [35]:
# Finding several documents

documents = db.accounts.find().limit(3)

for x in documents:
    pprint(x)
    pprint("                         ")

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'status': 'VIP account',
 'verified': True}
'                         '
{'_id': ObjectId('5ca4bbc7a2dd94ee5816238d'),
 'account_id': 557378,
 'limit': 10000,
 'products': ['InvestmentStock', 'Commodity', 'Brokerage', 'CurrencyService'],
 'status': 'VIP account'}
'                         '
{'_id': ObjectId('5ca4bbc7a2dd94ee5816238e'),
 'account_id': 198100,
 'limit': 10000,
 'products': ['Derivatives', 'CurrencyService', 'InvestmentStock'],
 'status': 'VIP account'}
'                         '


In [36]:
# Alternative to looping through output - list:

pprint(list(db.accounts.find()[0:3]))

[{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
  'account_id': 371138,
  'limit': 9000,
  'products': ['Derivatives', 'InvestmentStock'],
  'status': 'VIP account',
  'verified': True},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816238d'),
  'account_id': 557378,
  'limit': 10000,
  'products': ['InvestmentStock', 'Commodity', 'Brokerage', 'CurrencyService'],
  'status': 'VIP account'},
 {'_id': ObjectId('5ca4bbc7a2dd94ee5816238e'),
  'account_id': 198100,
  'limit': 10000,
  'products': ['Derivatives', 'CurrencyService', 'InvestmentStock'],
  'status': 'VIP account'}]


### Comparison Operators: Greater than  &  Less than

In [37]:
# Comparison operators: $gt, $lt, $gte, $lte

query = {"limit" : {"$lt" : 9000}}

document = db.accounts.find(query).limit(3)

for x in document:
    pprint(x)

{'_id': ObjectId('5ca4bbc7a2dd94ee58162458'),
 'account_id': 852986,
 'limit': 7000,
 'products': ['Derivatives',
              'Commodity',
              'CurrencyService',
              'InvestmentFund',
              'InvestmentStock'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee5816247a'),
 'account_id': 777752,
 'limit': 7000,
 'products': ['CurrencyService',
              'Brokerage',
              'Commodity',
              'InvestmentFund',
              'InvestmentStock'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee581624d5'),
 'account_id': 312740,
 'limit': 8000,
 'products': ['CurrencyService', 'InvestmentStock'],
 'status': 'VIP account'}
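
Several comparison operators can be combined on the same field to express a range. A minimal sketch, assuming we want accounts with a limit between two bounds; `limit_between` is a hypothetical helper that only builds the filter dict:

```python
def limit_between(low, high):
    """Build a filter matching documents with low <= limit <= high."""
    return {"limit": {"$gte": low, "$lte": high}}


query = limit_between(7000, 9000)
print(query)

# The same condition in plain Python, applied to a few limits seen in the outputs above.
sample_limits = [3000, 7000, 8000, 9000, 10000]
in_range = [value for value in sample_limits if 7000 <= value <= 9000]
print(in_range)
```

The filter can then be used as `db.accounts.find(limit_between(7000, 9000))`.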


### Operators: EQ & IN

In [38]:
# Operator $eq: matches documents where the field equals the given value.
#     Syntax:  { <field>: { $eq:  <value> } }

query = {"limit" : {"$eq" : 3000}}

document = db.accounts.find(query).limit(3)

for x in document:
    pprint(x)

{'_id': ObjectId('5ca4bbc7a2dd94ee58162661'),
 'account_id': 417993,
 'limit': 3000,
 'products': ['InvestmentStock', 'InvestmentFund'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee581626ad'),
 'account_id': 113123,
 'limit': 3000,
 'products': ['CurrencyService', 'InvestmentStock'],
 'status': 'VIP account'}


In [39]:
# Operator $in: selects documents where the field equals any value in the given array
#     (for an array field: where at least one element equals one of the values).
#     Syntax:  { <field>: { $in:  [ <value>,  <value>,  <value>  ..] } } 

query = { "products": { "$in": ['Commodity', 'Brokerage'] } }

document = db.accounts.find(query).limit(2)

for x in document:
    pprint(x)

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238d'),
 'account_id': 557378,
 'limit': 10000,
 'products': ['InvestmentStock', 'Commodity', 'Brokerage', 'CurrencyService'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee58162390'),
 'account_id': 278603,
 'limit': 10000,
 'products': ['Commodity', 'InvestmentStock'],
 'status': 'VIP account'}
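
Because `products` is an array, `$in` matches a document as soon as one of its elements appears in the list. The function below is a simplified local model of that rule for illustration only; it is not how MongoDB implements `$in`, and `matches_in` is a hypothetical name.

```python
def matches_in(field_value, candidates):
    """Simplified model of $in for a single field value."""
    if isinstance(field_value, list):
        # Array field: one matching element is enough.
        return any(element in candidates for element in field_value)
    # Scalar field: the value itself must be one of the candidates.
    return field_value in candidates


wanted = ["Commodity", "Brokerage"]

first = matches_in(["Commodity", "InvestmentStock"], wanted)     # products from the output above
second = matches_in(["Derivatives", "InvestmentStock"], wanted)  # no wanted product
third = matches_in(10000, [3000, 10000])                         # scalar field such as limit

print(first, second, third)
```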


### Logical Operators: AND & OR & elemMatch

In [48]:
# Operator $and: performs a logical AND operation on an array of two or more expressions.
#     Syntax explicit:  { $and: [ { <expression>}, { <expression>}, … ] }

query = {
    "$and" : [
        {"limit" : 10000},
        {"products" : ['InvestmentStock', 'InvestmentFund']}
    ]
}

# same as
query = {
    "limit" : 10000,
    "products" : ['InvestmentStock', 'InvestmentFund']
    
}

document = db.accounts.find(query).limit(3)

for x in document:
    pprint(x)

{'_id': ObjectId('5ca4bbc7a2dd94ee58162477'),
 'account_id': 244662,
 'limit': 10000,
 'products': ['InvestmentStock', 'InvestmentFund'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee58162590'),
 'account_id': 638191,
 'limit': 10000,
 'products': ['InvestmentStock', 'InvestmentFund'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee5816268d'),
 'account_id': 683393,
 'limit': 10000,
 'products': ['InvestmentStock', 'InvestmentFund'],
 'status': 'VIP account'}


In [50]:
# Operator $or: performs a logical OR operation on an array of two or more expressions, 
#     and selects the documents that satisfy at least one of them.

#     Syntax explicit:  { $or: [ { <expression>}, { <expression>}, … ] }

query = {
    "$or" : [
        {"limit" : 10000},
        {"products" : ['InvestmentStock', 'InvestmentFund']} 
    ]
}

document = db.accounts.find(query).limit(3)

for x in document:
    pprint(x)

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238d'),
 'account_id': 557378,
 'limit': 10000,
 'products': ['InvestmentStock', 'Commodity', 'Brokerage', 'CurrencyService'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee5816238e'),
 'account_id': 198100,
 'limit': 10000,
 'products': ['Derivatives', 'CurrencyService', 'InvestmentStock'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee58162390'),
 'account_id': 278603,
 'limit': 10000,
 'products': ['Commodity', 'InvestmentStock'],
 'status': 'VIP account'}


In [51]:
# Operator $elemMatch: matches documents whose array field has at least one element
#     that satisfies all the given criteria.

#     Syntax to match a single array element: { field : { $elemMatch: { $eq: value } } }

# A Python dict keeps only the last value for a repeated key,
# so only one "$eq" can appear inside the $elemMatch document.
query = {
    "$and" : [
        {"limit": {"$lte" : 7000}}, 
        {"products": 
            { 
                "$elemMatch" : {
                    "$eq" : 'Brokerage'
                } 
            }
        }
    ]
}

document = db.accounts.find(query).limit(3)

for x in document:
    pprint(x)

{'_id': ObjectId('5ca4bbc7a2dd94ee5816247a'),
 'account_id': 777752,
 'limit': 7000,
 'products': ['CurrencyService',
              'Brokerage',
              'Commodity',
              'InvestmentFund',
              'InvestmentStock'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee58162530'),
 'account_id': 453851,
 'limit': 7000,
 'products': ['CurrencyService', 'Brokerage', 'Derivatives', 'InvestmentStock'],
 'status': 'VIP account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee581625ad'),
 'account_id': 385361,
 'limit': 7000,
 'products': ['InvestmentStock', 'Brokerage'],
 'status': 'VIP account'}
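
Note that the third document in the output above contains 'Brokerage' but not 'CurrencyService'. A Python dict keeps only the last value for a repeated key, so two `$eq` entries inside one `$elemMatch` reduce to the last one. If the goal is to require both products, the `$all` operator is one option. A minimal sketch that only builds the filters:

```python
# Python keeps only the last value when a dict literal repeats a key.
collapsed = {"$eq": "CurrencyService", "$eq": "Brokerage"}
print(collapsed)

# $all requires the array to contain every listed value, in any order.
query = {
    "limit": {"$lte": 7000},
    "products": {"$all": ["CurrencyService", "Brokerage"]},
}
print(query)
```

Passing `query` to `db.accounts.find(...)` should return only accounts that hold both products.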


## INSERT_ONE( ) & INSERT_MANY( )

Creates new documents (written in a JSON-like format) and adds them to the collection.

Insert operations are often part of a **multi-document transaction**. A transaction is a group of database operations that are completed together as a unit or not at all. Transactions are used when a group of related operations must either all succeed or all fail together. This property is known as atomicity. 
Use cases: online money transfers, online shopping. Example of a money transfer: two collections, accounts and transfers, must both be updated together.
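
The money-transfer case can be sketched with PyMongo's session API. This is an illustration under several assumptions: the `balance` field and the `transfers` collection are hypothetical (they are not part of the sample data), and multi-document transactions need a replica set or sharded cluster, so the standalone container from the Setup section is not expected to run this as-is.

```python
def transfer(client, db, from_id, to_id, amount):
    """Move `amount` between two accounts and log the transfer atomically.

    `client` is a pymongo.MongoClient and `db` one of its databases.
    """
    with client.start_session() as session:
        # Leaving the block normally commits the transaction;
        # an exception inside it aborts every operation below.
        with session.start_transaction():
            db.accounts.update_one(
                {"account_id": from_id},
                {"$inc": {"balance": -amount}},  # hypothetical field
                session=session,
            )
            db.accounts.update_one(
                {"account_id": to_id},
                {"$inc": {"balance": amount}},
                session=session,
            )
            db.transfers.insert_one(  # hypothetical collection
                {"from": from_id, "to": to_id, "amount": amount},
                session=session,
            )
```

Every operation receives the same `session` argument; that is what ties the two updates and the insert to one transaction.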

[Documentation](https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.insert_one)

In [52]:
# Inspect a document in the collection 'customers'

pprint (db.customers.find_one())

{'_id': ObjectId('5ca4bbcea2dd94ee58162a68'),
 'accounts': [371138, 324287, 276528, 332179, 422649, 387979],
 'active': True,
 'address': '9286 Bethany Glens\nVasqueztown, CO 22939',
 'birthdate': datetime.datetime(1977, 3, 2, 2, 20, 31),
 'email': 'arroyocolton@gmail.com',
 'name': 'Elizabeth Ray',
 'tier_and_details': {'0df078f33aa74a2e9696e0520c1a828a': {'active': True,
                                                           'benefits': ['sports '
                                                                        'tickets'],
                                                           'id': '0df078f33aa74a2e9696e0520c1a828a',
                                                           'tier': 'Bronze'},
                      '699456451cc24f028d2aa99d7534c219': {'active': True,
                                                           'benefits': ['24 '
                                                                        'hour '
                                              

In [53]:
# Create a new document

time_now = time.time()

document_new = db.customers.insert_one({
     'accounts': [],
     'active': True,
     'address': 'Travessa Outeiro da Vela, 18, Lisbon',
     'birthdate': datetime(1998, 3, 2, 2, 20, 31),
     'email': 'paulobarister@email.com',
     'name': 'Paulo Barister',
     'last_modified': time_now
})

In [54]:
document = db.customers.find({'email': 'paulobarister@email.com'})

for x in document:
    pprint(x)

{'_id': ObjectId('65f066d0c3b278d43078065b'),
 'accounts': [],
 'active': True,
 'address': 'Travessa Outeiro da Vela, 18, Lisbon',
 'birthdate': datetime.datetime(1998, 3, 2, 2, 20, 31),
 'email': 'paulobarister@email.com',
 'last_modified': 1710253776.038034,
 'name': 'Paulo Barister'}
{'_id': ObjectId('65f06f53c3b278d43078065c'),
 'accounts': [],
 'active': True,
 'address': 'Travessa Outeiro da Vela, 18, Lisbon',
 'birthdate': datetime.datetime(1998, 3, 2, 2, 20, 31),
 'email': 'paulobarister@email.com',
 'last_modified': 1710255955.2582924,
 'name': 'Paulo Barister'}
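
The output above lists two identical customers, apparently because the insert cell was run twice: every call to `insert_one()` creates a new document with a fresh `_id`. To add several documents in a single call, `insert_many()` takes a list. A minimal sketch; `add_customers` is a hypothetical helper and the two customers are invented:

```python
from datetime import datetime


def add_customers(collection, customers):
    """Insert several documents in one call and return their generated ids."""
    result = collection.insert_many(customers)
    return result.inserted_ids


# Invented example data; the documents do not need identical fields.
new_customers = [
    {
        "name": "Ana Example",
        "email": "ana@example.com",
        "accounts": [],
        "active": True,
        "birthdate": datetime(1990, 5, 17),
    },
    {
        "name": "Rui Example",
        "email": "rui@example.com",
        "accounts": [],
        "active": False,
    },
]

# With the connection from this lab:
# ids = add_customers(db.customers, new_customers)
```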


In [55]:
# Check the documents in reverse id order

check = db.customers.find().sort('_id', -1).limit(5)

for x in check:
    pprint(x)

{'_id': ObjectId('65f06f53c3b278d43078065c'),
 'accounts': [],
 'active': True,
 'address': 'Travessa Outeiro da Vela, 18, Lisbon',
 'birthdate': datetime.datetime(1998, 3, 2, 2, 20, 31),
 'email': 'paulobarister@email.com',
 'last_modified': 1710255955.2582924,
 'name': 'Paulo Barister'}
{'_id': ObjectId('65f066d0c3b278d43078065b'),
 'accounts': [],
 'active': True,
 'address': 'Travessa Outeiro da Vela, 18, Lisbon',
 'birthdate': datetime.datetime(1998, 3, 2, 2, 20, 31),
 'email': 'paulobarister@email.com',
 'last_modified': 1710253776.038034,
 'name': 'Paulo Barister'}
{'_id': ObjectId('5ca4bbcea2dd94ee58162c5e'),
 'accounts': [896364, 450464],
 'address': '6942 Connie Skyway\nPatrickville, WA 16551',
 'birthdate': datetime.datetime(1973, 10, 23, 23, 52, 10),
 'email': 'amber97@hotmail.com',
 'name': 'Brandon Contreras',
 'tier_and_details': {'f4cebafe5530421b991303dff297643d': {'active': True,
                                                           'benefits': ['shopping '
     

## UPDATE_ONE( ) & UPDATE_MANY( )

Updates documents that match the specified criteria.

Syntax: db.collection.update_one(filter, update)

Operators: `$inc`, `$set`, `$push`, ...

[Documentation](https://pymongo.readthedocs.io/en/stable/api/pymongo/operations.html#pymongo.operations.UpdateOne)


In [56]:
db.accounts.find_one()

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'verified': True,
 'status': 'VIP account'}

In [57]:
# Update a document
#     Syntax: db.collection.update_one (filter, update)

# Filter
document_to_update = {"_id": ObjectId("5ca4bbc7a2dd94ee5816238c")}

# Update
update_info = {"$set": {"verified": True}}

# Print original document
pprint(db.accounts.find_one(document_to_update))

# Execute the update
result = db.accounts.update_one(document_to_update, update_info)

# Print updated document
pprint('                    ')
pprint(db.accounts.find_one(document_to_update))


{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'status': 'VIP account',
 'verified': True}
'                    '
{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'status': 'VIP account',
 'verified': True}


In [58]:
# Update many documents
#     Syntax: db.collection.update_many (filter, update)

# Filter
document_to_update = {"limit": { "$lte" : 10000 }}

# Update
update_info = {"$set": {"status": 'VIP account'}}

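# Note: find() returns a lazy Cursor, so this prints the cursor object, not the documents.
#     Use find_one() or iterate over the cursor to see the documents themselves.
pprint(db.accounts.find(document_to_update))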

# Execute the update
result = db.accounts.update_many(document_to_update, update_info)


# Check results
pprint("Documents matched: " + str(result.matched_count))
pprint("Documents updated: " + str(result.modified_count))

pprint(db.accounts.find_one(document_to_update))


<pymongo.cursor.Cursor object at 0x7f9658d847d0>
'Documents matched: 1745'
'Documents updated: 0'
{'_id': ObjectId('5ca4bbc7a2dd94ee5816238c'),
 'account_id': 371138,
 'limit': 9000,
 'products': ['Derivatives', 'InvestmentStock'],
 'status': 'VIP account',
 'verified': True}
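
Both updates above use `$set` with values the documents already had, which is why `modified_count` is 0. The other operators listed at the start of this section change values relative to their current state. A minimal sketch combining `$inc` and `$push` in one update; `raise_limit_and_add_product` is a hypothetical helper:

```python
def raise_limit_and_add_product(collection, account_id, amount, product):
    """Increase an account's limit and append a product in a single update."""
    update = {
        "$inc": {"limit": amount},        # add `amount` to the current limit
        "$push": {"products": product},   # append to the products array
    }
    result = collection.update_one({"account_id": account_id}, update)
    return result.matched_count, result.modified_count


# With the connection from this lab:
# matched, modified = raise_limit_and_add_product(db.accounts, 371138, 1000, "Brokerage")
```

`$push` appends even if the value is already in the array; `$addToSet` is the operator to use when duplicates should be avoided.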


## REPLACE_ONE

In [59]:
pprint(db.accounts.find_one({'account_id': 674364}))

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238f'),
 'account_id': 674364,
 'status': 'Regular account'}


In [60]:
# Upsert doesn't exist as its own function; it is an option of the update and replace commands.
#     When upsert is True, a new document is inserted if no document matches the query. Defaults to False.

query = {'account_id': 674364}

pprint(db.accounts.find_one(query))

db.accounts.replace_one(query, {
    'account_id': 674364,
    'status': 'Regular account'
},upsert = True)

pprint(db.accounts.find_one(query)) #check

# The filter should identify the document uniquely; otherwise replace_one replaces the first document that matches it.

{'_id': ObjectId('5ca4bbc7a2dd94ee5816238f'),
 'account_id': 674364,
 'status': 'Regular account'}
{'_id': ObjectId('5ca4bbc7a2dd94ee5816238f'),
 'account_id': 674364,
 'status': 'Regular account'}
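
The `upsert` option is also accepted by `update_one()`. Combined with `$setOnInsert`, it gives a "create only if missing" operation that leaves an existing document untouched. A minimal sketch; `ensure_account` is a hypothetical helper:

```python
def ensure_account(collection, account_id, default_status="Regular account"):
    """Create the account if it does not exist; leave an existing one unchanged."""
    result = collection.update_one(
        {"account_id": account_id},
        # $setOnInsert is applied only when the upsert inserts a new document.
        {"$setOnInsert": {"status": default_status}},
        upsert=True,
    )
    # upserted_id is None when a matching document already existed.
    return result.upserted_id


# With the connection from this lab:
# new_id = ensure_account(db.accounts, 674364)
```

Unlike `replace_one()`, which overwrites the whole document, this keeps every field of a document that is already there.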


## DELETE_ONE ( )  &  DELETE_MANY ( )

Deletes documents from a collection.

Syntax: db.collection.delete_one(filter)

The most precise query filter is the ObjectId.

N.B. With an empty filter, delete_one deletes the first document in the collection: db.collection.delete_one({})

In [61]:
pprint(db.customers.find_one())

{'_id': ObjectId('5ca4bbcea2dd94ee58162a68'),
 'accounts': [371138, 324287, 276528, 332179, 422649, 387979],
 'active': True,
 'address': '9286 Bethany Glens\nVasqueztown, CO 22939',
 'birthdate': datetime.datetime(1977, 3, 2, 2, 20, 31),
 'email': 'arroyocolton@gmail.com',
 'name': 'Elizabeth Ray',
 'tier_and_details': {'0df078f33aa74a2e9696e0520c1a828a': {'active': True,
                                                           'benefits': ['sports '
                                                                        'tickets'],
                                                           'id': '0df078f33aa74a2e9696e0520c1a828a',
                                                           'tier': 'Bronze'},
                      '699456451cc24f028d2aa99d7534c219': {'active': True,
                                                           'benefits': ['24 '
                                                                        'hour '
                                              

In [62]:
# Delete one using ObjectId

query = {"_id": ObjectId("5ca4bbcea2dd94ee58162a69")}

pprint (db.customers.find_one( query ))

db.customers.delete_one( query )

pprint (db.customers.find_one( query ))

None
None


In [63]:
# Delete many using filter

query = {'limit':  {'$lte': 3000}}

pprint (db.accounts.find_one( query ))

db.accounts.delete_many( query )

pprint (db.accounts.find_one( query ))

{'_id': ObjectId('5ca4bbc7a2dd94ee58162661'),
 'account_id': 417993,
 'limit': 3000,
 'products': ['InvestmentStock', 'InvestmentFund'],
 'status': 'VIP account'}
None
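
Since `delete_many({})` removes every document in a collection, it can be worth counting the matches first and refusing an empty filter. A minimal sketch of that habit; `delete_matching` is a hypothetical helper, not a PyMongo function:

```python
def delete_matching(collection, query):
    """Delete all documents matching `query`; return (expected, deleted) counts."""
    if not query:
        # An empty filter matches everything, so treat it as a mistake here.
        raise ValueError("refusing to delete with an empty filter")
    expected = collection.count_documents(query)
    result = collection.delete_many(query)
    return expected, result.deleted_count


# With the connection from this lab:
# expected, deleted = delete_matching(db.accounts, {"limit": {"$lte": 3000}})
```

Comparing the two numbers is a quick check that the delete removed what the filter was expected to match.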


In Lab 4, you learned how to:

1. Query documents by using find_one() and find().
2. Insert documents by using insert_one() and insert_many().
3. Update documents by using update_one() and update_many().
4. Delete documents by using delete_one() and delete_many().
5. Use supporting operators: `$gt`, `$lt`, `$eq`, `$in`, `$and`, `$or`, `$elemMatch`.