## MongoDB
- MongoDB is a NoSQL, document-oriented database. It stores data as JSON-like documents (BSON internally) rather than as rows in tables.
- Where a conventional SQL database uses rows, MongoDB uses documents.
- It offers flexibility to work with evolving data models.


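### Why MongoDB
- Faster for many document-style workloads, since related data can be stored together in one document instead of being joined across tables.
- Partition tolerance: data can be replicated and sharded across multiple servers.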

Unlike SQL databases, where you must determine and declare a table's schema before inserting data, MongoDB's collections, by default, do not require their documents to have the same schema. That is:
- The documents in a single collection do not need to have the same set of fields and the data type for a field can differ across documents within a collection.

Database(Community) --> Collection(friends) --> Documents      

    {
        'name': "abc",
        'social_media': {
            'twitter': '@gdabc',
            'facebook': 'abc11'
        },
        'sports': ["volleyball", "tennis"]
    }
    {
        'user_name': "ghj",
        'social_media': {
            'twitter': '@gmhij'
        } 
    }


## Importing PyMongo and connecting to MongoDB

In [2]:
from pymongo import MongoClient

In [3]:
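# Connect to the default host and port ("localhost", 27017).
# Equivalent: MongoClient('localhost', 27017) or MongoClient('mongodb://localhost:27017/')
client = MongoClient()

# Drop test_db left over from earlier runs, then list the existing databases
client.drop_database('test_db')
client.list_database_names()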

['admin', 'agg_demo', 'config', 'local', 'newdatabase', 'newdb', 'trial']

## Creating an object for a database and a particular collection in mongodb
* A **collection** is a group of documents stored in MongoDB. Equivalent to **table** in Relational DB
* A **document** in a collection is equivalent to a **row** in a table of relational DB. 
* Data in MongoDB is represented (and stored) using **JSON-style** documents.
* In PyMongo we use **dictionaries** to represent documents.

### Document structure
##### Embedded/nested documents
    {
        '_id': ObjectId("5099803df3f4948bd2f98391"),
        'name': { 
            'first': "Alan",
            'last': "Turing" 
        },
        'contact': { 
            'phone': { 
                'type': "cell", 
                'number': "111-222-3333" 
            }
        },
        'birth': 'Jun 23, 1912',
        'death': 'Jun 07, 1954',
        'contribs': [ "Turing machine", "Turing test", "Turingery" ],
        'views' : NumberLong(1250000)
    }
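
Fields inside an embedded document are usually addressed with dot notation, for example `contact.phone.number`. The sketch below uses plain Python only, so it runs without a server. `get_path` is a hypothetical helper written for illustration (it is not part of PyMongo) and it handles nested dictionaries only, not arrays.

```python
# A trimmed copy of the document above, written as a Python dict.
person = {
    "name": {"first": "Alan", "last": "Turing"},
    "contact": {"phone": {"type": "cell", "number": "111-222-3333"}},
    "contribs": ["Turing machine", "Turing test", "Turingery"],
}


def get_path(doc, path):
    """Follow a dotted path through nested dicts; return None if a key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


print(get_path(person, "contact.phone.number"))  # 111-222-3333
print(get_path(person, "name.middle"))           # None
```

With a running server, the same dotted path can be used as a key in a PyMongo filter, for example `coll.find_one({"contact.phone.number": "111-222-3333"})`.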

In [4]:
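# Getting a database object (both styles are equivalent)
db = client.test_db      # attribute-style access
db = client['test_db']   # dictionary-style access
# test_db is not listed yet: MongoDB creates a database lazily, on the first write
client.list_database_names()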

['admin', 'agg_demo', 'config', 'local', 'newdatabase', 'newdb', 'trial']

In [5]:
#Defining a collection
collection = db.test_collection    #attribute style access
collection = db['test_collection'] #dictionary style access
#Listing the collections in the database
print(db.list_collection_names())

[]


### Topics
* Insert
* Find
* Count
* Update
* Limit
* Delete
* Aggregation

## Insert
* A special key, `_id`, is added automatically if the document does not already contain one. The value of `_id` must be unique within the collection.
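
The `_id` rule can be sketched with a small in-memory stand-in for a collection. This is an illustration in plain Python, not how MongoDB is implemented: `TinyCollection` is a hypothetical class, a random hex string stands in for the `ObjectId` that MongoDB would generate, and a `ValueError` stands in for the `pymongo.errors.DuplicateKeyError` that a real server would trigger.

```python
import uuid


class TinyCollection:
    """In-memory stand-in for a collection, for illustration only."""

    def __init__(self):
        self._docs = {}  # maps _id -> document

    def insert_one(self, doc):
        # Assumption: a random hex string stands in for MongoDB's ObjectId.
        if "_id" not in doc:
            doc["_id"] = uuid.uuid4().hex
        if doc["_id"] in self._docs:
            raise ValueError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]


tiny = TinyCollection()
auto_id = tiny.insert_one({"Name": "abc"})              # _id is generated
fixed_id = tiny.insert_one({"_id": 5, "Name": "E55"})   # _id is kept as given

try:
    tiny.insert_one({"_id": 5, "Name": "again"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(fixed_id, duplicate_rejected)  # 5 True
```

Like PyMongo's `insert_one`, the sketch adds the generated `_id` to the dictionary that was passed in.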


In [6]:
#Creating the collection

#db.test_col.drop()
coll = db.test_col
#Defining a document
doc =   {
    'Name': "abc",
    'social_media': {
        'twitter': '@gdabc',
        'facebook': 'abc11'
    },
    'sports': ["volleyball", "tennis"]
}
#Insert the doc in coll
data = coll.insert_one(doc)
data_id = data.inserted_id
#Listing the collections in the database
print(db.list_collection_names())
#New database added
client.list_database_names()

['test_col']


['admin',
 'agg_demo',
 'config',
 'local',
 'newdatabase',
 'newdb',
 'test_db',
 'trial']

In [7]:
data_id

ObjectId('60910b53f8cca3ed9afbf8f8')

In [8]:
doc = db.test_col.find_one()
print(doc)

{'_id': ObjectId('60910b53f8cca3ed9afbf8f8'), 'Name': 'abc', 'social_media': {'twitter': '@gdabc', 'facebook': 'abc11'}, 'sports': ['volleyball', 'tennis']}


In [8]:
#coll.delete_one({'_id':12})
doc =   { 
        '_id': 12,
        "Name": "A11",
        "Pass" : 12345,
        "Contact":6783428973
        }
data_id = coll.insert_one(doc).inserted_id
#print(coll.find_one({"_id":12}))

In [9]:
# Remove any leftover document with _id 5 so this cell can be re-run
coll.delete_one({'_id':5})
# insert_many() takes a list of documents
multiple_docs = [
            {
            "Name" : "C33",
            "Pass" : 25445,
            "Contact":6858869980
            },
             
            {
            "Name": "D44",
            "Pass" : 22335,
            "Contact":4251683492
            },
             
            {
            "_id" : 5,
            "Name": "E55",
            "Pass" : 65265,
            "Contact":7843697348
            }
            ]
#Insert many at a time
result = coll.insert_many(multiple_docs)
result.inserted_ids

[ObjectId('60910bc5f8cca3ed9afbf8f9'), ObjectId('60910bc5f8cca3ed9afbf8fa'), 5]

## Find
* find_one() method returns a single document matching a query (or None if there are no matches). 
* It is useful when you know there is only one matching document, or are only interested in the first match.

In [10]:
data_id

12

In [11]:
#find_one()
#pprint()

import pprint  # pretty-printer: sorts keys and wraps long documents; short ones stay on one line

print("\nPrinting without pprint:\n")
print(coll.find_one({"_id": data_id}))

print("\n\nPrinting with pprint and findone without arguments:\n")
pprint.pprint(coll.find_one())

print("\n\nPrinting with pprint and findone with arguments:\n")
pprint.pprint(coll.find_one({"Name": "A11"}))


Printing without pprint:

{'_id': ObjectId('60910b53f8cca3ed9afbf8f8'), 'Name': 'abc', 'social_media': {'twitter': '@gdabc', 'facebook': 'abc11'}, 'sports': ['volleyball', 'tennis']}


Printing with pprint and findone without arguments:

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}


Printing with pprint and findone with arguments:

None


In [13]:
#_id is not a string

print("\nPost ID:")
print(data_id)

print("\nFinding with post ID:")
pprint.pprint(coll.find_one({"_id": data_id}))

print("\nFinding with an integer _id:")
pprint.pprint(coll.find_one({"_id":5}))


Post ID:
60910b53f8cca3ed9afbf8f8

Finding with post ID:
{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

Finding with an integer _id:
{'Contact': 7843697348, 'Name': 'E55', 'Pass': 65265, '_id': 5}


In [14]:
# The auto-generated _id is an ObjectId, not a string
print(type(data_id))

<class 'bson.objectid.ObjectId'>


In [15]:
# Iterating over the docs:
for doc in coll.find():
    pprint.pprint(doc)   
    print()

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

{'Contact': 6858869980,
 'Name': 'C33',
 'Pass': 25445,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8f9')}

{'Contact': 4251683492,
 'Name': 'D44',
 'Pass': 22335,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8fa')}

{'Contact': 7843697348, 'Name': 'E55', 'Pass': 65265, '_id': 5}



In [16]:
# Sorting the results of find():
for doc in coll.find().sort("Pass"):
    pprint.pprint(doc)
    print()

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

{'Contact': 4251683492,
 'Name': 'D44',
 'Pass': 22335,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8fa')}

{'Contact': 6858869980,
 'Name': 'C33',
 'Pass': 25445,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8f9')}

{'Contact': 7843697348, 'Name': 'E55', 'Pass': 65265, '_id': 5}
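
The document without a `Pass` field comes first because MongoDB treats a missing field as null when sorting, and null sorts before numbers in ascending order. The same ordering can be sketched in plain Python; the `docs` list is a shortened copy of the data above and `pass_sort_key` is a hypothetical helper.

```python
docs = [
    {"Name": "abc"},                  # no "Pass" field
    {"Name": "C33", "Pass": 25445},
    {"Name": "D44", "Pass": 22335},
    {"Name": "E55", "Pass": 65265},
]


def pass_sort_key(doc):
    value = doc.get("Pass")
    # Tuples compare element by element: (False, ...) sorts before (True, ...),
    # so documents with no "Pass" come first, then ascending by value.
    return (value is not None, value if value is not None else 0)


ordered = sorted(docs, key=pass_sort_key)
print([d["Name"] for d in ordered])  # ['abc', 'D44', 'C33', 'E55']
```

For descending order with PyMongo, pass a direction: `coll.find().sort("Pass", pymongo.DESCENDING)`.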



In [17]:
# Using operators : $in, $eq, $gt etc.
#pprint.pprint(coll.find_one({"_id": data_id}))
for doc in coll.find({'Name': { '$in': ['User','abc']}}):
    pprint.pprint(doc)

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}
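
To make the operator semantics concrete, here is a minimal plain-Python sketch of how a filter with `$eq`, `$gt`, and `$in` selects documents. `matches` is a hypothetical helper, not part of PyMongo, and it deliberately ignores arrays, nulls, and mixed types, which the real query engine handles.

```python
def matches(doc, query):
    """Return True if doc satisfies every condition in query ($eq, $gt, $in only)."""
    for field, condition in query.items():
        present = field in doc
        value = doc.get(field)
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # {"Name": "abc"} is shorthand for $eq
        for op, operand in condition.items():
            if op == "$eq":
                ok = present and value == operand
            elif op == "$gt":
                ok = present and value > operand
            elif op == "$in":
                ok = present and value in operand
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")
            if not ok:
                return False
    return True


docs = [
    {"Name": "abc"},
    {"Name": "C33", "Pass": 25445},
    {"Name": "D44", "Pass": 22335},
    {"_id": 5, "Name": "E55", "Pass": 65265},
]

in_names = [d["Name"] for d in docs if matches(d, {"Name": {"$in": ["User", "abc"]}})]
big_pass = [d["Name"] for d in docs if matches(d, {"Pass": {"$gt": 25000}})]
by_id = [d["Name"] for d in docs if matches(d, {"_id": 5})]

print(in_names)  # ['abc']
print(big_pass)  # ['C33', 'E55']
print(by_id)     # ['E55']
```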


## Count

In [20]:
coll.count_documents({})

4

## Update

In [21]:
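# Using $regex to select documents and $set to change a field

# No name starts with "B", so this loop prints nothing
for doc in coll.find({"Name": {"$regex": "^B"}}):
    pprint.pprint(doc)
    
# Rename every document whose Name starts with "C" to 'ABC'
myquery = { "Name": { "$regex": "^C" } }
newvalues = { "$set": { "Name": 'ABC' } }

coll.update_many(myquery, newvalues)

# After the update no name starts with "C", so this prints nothing either
for doc in coll.find({"Name": {"$regex": "^C"}}):
    pprint.pprint(doc)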

In [22]:
coll.update_many({ "Name": { "$regex": "^C" } }, { "$set": { "Name": "Steve" } })
coll.update_one({ "Name": { "$regex": "^B" } }, { "$set": { "Name": "Annet" } })

<pymongo.results.UpdateResult at 0x17b5c890fc0>

In [23]:
for doc in coll.find():
    pprint.pprint(doc)
    print()

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

{'Contact': 6858869980,
 'Name': 'ABC',
 'Pass': 25445,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8f9')}

{'Contact': 4251683492,
 'Name': 'D44',
 'Pass': 22335,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8fa')}

{'Contact': 7843697348, 'Name': 'E55', 'Pass': 65265, '_id': 5}
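
`update_many` applies the `$set` change to every document whose `Name` matches the regular expression `^C` (starts with "C"). Only `C33` matched, which is why it appears as `ABC` above, and a second update with the same filter finds nothing left to change. A plain-Python sketch of that behaviour follows; `update_many_set` is a hypothetical helper, not a PyMongo function.

```python
import re

docs = [
    {"Name": "abc"},
    {"Name": "C33", "Pass": 25445},
    {"Name": "D44", "Pass": 22335},
    {"Name": "E55", "Pass": 65265},
]


def update_many_set(documents, field, pattern, new_values):
    """Apply new_values to every document whose field matches pattern; return the count."""
    regex = re.compile(pattern)
    modified = 0
    for doc in documents:
        value = doc.get(field)
        if isinstance(value, str) and regex.search(value):
            doc.update(new_values)
            modified += 1
    return modified


first_run = update_many_set(docs, "Name", "^C", {"Name": "ABC"})
second_run = update_many_set(docs, "Name", "^C", {"Name": "Steve"})

print(first_run, second_run)      # 1 0
print([d["Name"] for d in docs])  # ['abc', 'ABC', 'D44', 'E55']
```

With PyMongo, the returned `UpdateResult` reports the same information through its `matched_count` and `modified_count` attributes.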



## Limit

In [24]:
for doc in coll.find().limit(3):
    pprint.pprint(doc)
    print()

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

{'Contact': 6858869980,
 'Name': 'ABC',
 'Pass': 25445,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8f9')}

{'Contact': 4251683492,
 'Name': 'D44',
 'Pass': 22335,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8fa')}



## Delete

In [25]:
for doc in coll.find():
    pprint.pprint(doc)
    print()

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

{'Contact': 6858869980,
 'Name': 'ABC',
 'Pass': 25445,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8f9')}

{'Contact': 4251683492,
 'Name': 'D44',
 'Pass': 22335,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8fa')}

{'Contact': 7843697348, 'Name': 'E55', 'Pass': 65265, '_id': 5}



In [27]:
coll.delete_one({'_id': 5})
for doc in coll.find():
    pprint.pprint(doc)
    print()

{'Name': 'abc',
 '_id': ObjectId('60910b53f8cca3ed9afbf8f8'),
 'social_media': {'facebook': 'abc11', 'twitter': '@gdabc'},
 'sports': ['volleyball', 'tennis']}

{'Contact': 6858869980,
 'Name': 'ABC',
 'Pass': 25445,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8f9')}

{'Contact': 4251683492,
 'Name': 'D44',
 'Pass': 22335,
 '_id': ObjectId('60910bc5f8cca3ed9afbf8fa')}



## Aggregation

Aggregation operations process multiple documents and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.

Documentation for the aggregation framework in PyMongo:

https://pymongo.readthedocs.io/en/stable/examples/aggregation.html#aggregation-framework 

In [29]:
import pymongo
db = client.agg_demo
db.grades.drop()
docs = [
    {'s_id': 1,'c_id': 1,'grades': 10},
    {'s_id': 1,'c_id': 2,'grades': 15},
    {'s_id': 1,'c_id': 3,'grades': 50},
    {'s_id': 2,'c_id': 1,'grades': 40},
    {'s_id': 2,'c_id': 2,'grades': 20},
    {'s_id': 2,'c_id': 3,'grades': 11},
    {'s_id': 3,'c_id': 1,'grades': 16},
    {'s_id': 3,'c_id': 2,'grades': 18},
    {'s_id': 3,'c_id': 3,'grades': 37},
    {'s_id': 4,'c_id': 1,'grades': 23},
    {'s_id': 4,'c_id': 2,'grades': 41},
    {'s_id': 4,'c_id': 3,'grades': 53}
]

db.grades.insert_many(docs)

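# Sum the grades per student, sort by student id, keep the first two students
pipeline = [
    {"$group": {"_id": "$s_id", "total": {"$sum": "$grades"}}},
    {"$sort": {"_id": pymongo.ASCENDING}},
    {"$limit": 2},
]
for doc in db.grades.aggregate(pipeline):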
    pprint.pprint(doc)
    print()

{'_id': 1, 'total': 75}

{'_id': 2, 'total': 71}
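
To see what each pipeline stage contributes, the same computation can be written in plain Python. This is only an illustrative equivalent of the `$group`, `$sort`, and `$limit` stages, using a copy of the grades data; it is not how the server executes the pipeline.

```python
from collections import defaultdict

# (s_id, c_id, grades) rows copied from the documents inserted above
rows = [
    (1, 1, 10), (1, 2, 15), (1, 3, 50),
    (2, 1, 40), (2, 2, 20), (2, 3, 11),
    (3, 1, 16), (3, 2, 18), (3, 3, 37),
    (4, 1, 23), (4, 2, 41), (4, 3, 53),
]
docs = [{"s_id": s, "c_id": c, "grades": g} for s, c, g in rows]

# $group: one bucket per s_id, with $sum accumulating the grades
totals = defaultdict(int)
for doc in docs:
    totals[doc["s_id"]] += doc["grades"]
grouped = [{"_id": s_id, "total": total} for s_id, total in totals.items()]

# $sort: ascending by _id
grouped.sort(key=lambda d: d["_id"])

# $limit: keep the first two results
result = grouped[:2]
print(result)  # [{'_id': 1, 'total': 75}, {'_id': 2, 'total': 71}]
```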



In [26]:
# Printing the key names in a document:
db = client.test_db
print("Key Names for one document:")
for key_name in coll.find_one({'Name': 'ABC'}):
    print(key_name)

Key Names for one document:
_id
Name
Pass
Contact
