# MongoDB - PyMongo 
####  Shubham Kumar [18030142032]

<b>MongoDB</b> is an open-source, document-oriented database system that stores data as documents (sets of key-value pairs). It is part of the NoSQL family of database systems. Instead of storing data in tables as a relational database does, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), which makes integrating data in certain types of applications easier and faster. The database is used by MTV Networks, Craigslist, Foursquare and UIDAI Aadhaar. MongoDB is one of the most popular NoSQL database management systems.<br>

<b>PyMongo</b> is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python.

## <br> Downloading and Installation -

<b>[MongoDB]</b><br>
Follow the instructions given in the link to download and install MongoDB on your system:<br>
https://docs.mongodb.com/manual/administration/install-community/<br>

<b>[PyMongo]</b><br>
Follow the instructions given in the link to download and install PyMongo on your system:<br>
https://api.mongodb.com/python/current/installation.html<br>
or install it with pip: <code>python -m pip install pymongo</code>
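
After installing, one quick way to confirm that Python can see the package is to look it up without importing it. This is a minimal sketch that uses only the standard library; the variable name `pymongo_available` is purely illustrative.

```python
import importlib.util

# find_spec returns None when the top-level package cannot be found
pymongo_available = importlib.util.find_spec("pymongo") is not None
print("PyMongo installed:", pymongo_available)
```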

## <br><br> Introduction to NoSQL 
NoSQL databases differ from relational databases. In a relational database you must create the table, define the schema, and set the data types of the fields before you can insert any data. In a NoSQL database you do not have to define these up front; you can insert and update data on the fly.<br>
NoSQL databases are designed to scale out easily and are often faster for simple reads and writes on large data sets. There are situations where you would prefer a relational database over NoSQL; however, when you are dealing with huge amounts of loosely structured data, a NoSQL database is often a good fit.<br><br>
<b>Types of NoSQL Databases:</b><br><br>
<b>Key Value Store:</b> Memcached, Redis, Coherence<br>
<b>Tabular:</b> HBase, Bigtable, Accumulo<br>
<b>Document based:</b> MongoDB, CouchDB, Cloudant<br>
<br>
    
<b>When to choose NoSQL Database</b>

* You want to store and retrieve huge amounts of data.
* The relationships between the data you store are not that important.
* The data is unstructured and changes over time.
* Support for constraints and joins is not required at the database level.
* The data grows continuously and you need to scale the database regularly to handle it.


## Differences between some NoSQL Databases
<img src="./Cassandra_Reddis.png" style="width:100%">
<br><br>
<img src="./Mongo_Hadoop.png" style="width:100%">
<br><br>

## MongoDB Features - 
1. MongoDB provides high performance. Many common read and write operations can be faster than in relational databases.
2. MongoDB provides automatic replication, which lets you recover data quickly after a failure.
3. Horizontal scaling is possible in MongoDB because of sharding. Sharding partitions the data and places it on multiple machines; with ranged sharding, documents with nearby shard-key values are kept together, so the order of the data is preserved. Horizontal scaling means adding more machines to handle the data.
4. Load balancing: Horizontal scaling allows MongoDB to balance the load across machines.
5. High availability: Automatic replication improves the availability of a MongoDB database.
6. Indexing: An index is built on one or more fields of the documents in a collection. Indexes are used to locate data quickly without scanning every document in a collection, which improves the performance of queries.
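
The idea behind ranged sharding (feature 3) can be sketched in plain Python. This is only a toy model, not how MongoDB implements it: the split points, the `shard_for` helper and the sample documents are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical split points on the shard key (here: age).
# Shard 0 holds keys < 30, shard 1 holds 30 <= key < 60, shard 2 holds keys >= 60.
SPLIT_POINTS = [30, 60]

def shard_for(key):
    """Return the index of the shard responsible for this shard-key value."""
    return bisect_right(SPLIT_POINTS, key)

shards = {0: [], 1: [], 2: []}
for doc in [{"name": "A", "age": 25}, {"name": "B", "age": 30},
            {"name": "C", "age": 72}, {"name": "D", "age": 59}]:
    shards[shard_for(doc["age"])].append(doc["name"])

print(shards)
```

Because each shard owns a contiguous key range, a query for a range of ages only needs to visit the shards whose ranges overlap it.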

## <br><br> Documents and Collections 
<b>Document</b> is the unit of data storage in a MongoDB database.
Documents use a JSON (JavaScript Object Notation) style for storing data.
<br><br>
<b>JSON</b> is short for JavaScript Object Notation, and is a way to store information in an organized, easy-to-access manner. 
It gives us a human-readable collection of data that we can access in a logical manner.
Values can also be arrays or nested objects.<br>
JSON Example:<br>
<code>{"menu": {
      "id": "file",
      "value": "File",
      "popup": {
        "menuitem": [
          {"value": "New", "onclick": "CreateNewDoc()"},
          {"value": "Open", "onclick": "OpenDoc()"},
          {"value": "Close", "onclick": "CloseDoc()"}
        ]
      }
    }}</code>
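
To show how such a document is accessed from Python, the JSON example can be parsed with the standard `json` module. This is a small sketch; the variable names are illustrative only.

```python
import json

menu_json = """
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
"""

doc = json.loads(menu_json)                 # JSON object -> Python dict
items = doc["menu"]["popup"]["menuitem"]    # nested objects and arrays are reached by key
labels = [item["value"] for item in items]
print(labels)
```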
    
It is easy for developers to work with NoSQL because collections have no fixed structure and any number of keys can be added to a document, whereas in an RDBMS this is not possible without changing the table structure.
In relational database systems you must define a schema before adding records to a database. 
The schema is the structure described in a formal language supported by the database; it provides a blueprint for the tables in a database and the relationships between tables of data. 
NoSQL writes can be faster because the database does not have to check each record against a predefined schema.
We can also design NoSQL data in a structured way, which is why NoSQL is read as "not only SQL".<br><br>

<b>Collection: </b>A collection stores a number of documents and is analogous to a table in an RDBMS. Unlike a table, a collection may store documents that do not share the same structure. 
<br><br>
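
As a small illustration of documents with different structures living side by side, the sketch below models a collection as a plain Python list of dicts. The list stands in for a real collection and is only an assumption for demonstration.

```python
# Two documents that could live in the same collection even though
# the second one has no 'number' or 'floor' in its address.
people = [
    {"name": "Jordi Gonzalez",
     "address": {"street": "Torrent de l'Olla", "number": 70, "floor": None, "city": "Barcelona"}},
    {"name": "Maria Smith",
     "address": {"street": "Numancia", "city": "Barcelona"}},
]

# Code that reads such documents should tolerate missing keys.
numbers = [p["address"].get("number") for p in people]
print(numbers)
```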

### Key Differences


<b>Table(SQL) - RDBMS</b>
* Maintains relations between the data
* Fixed or predefined schema
* Data is stored in rows and columns
* Foreign key relations are supported by the DB.
* Data will not be stored if it violates a column data type, foreign key or primary key constraint.
* Joins can be used effectively to query the data.
* Vertically scalable (limited by the hardware: you cannot keep adding RAM to a server machine, because every machine has a limit on how much RAM it can hold)
* Storing and retrieving is comparatively slower when the data is huge.

<br>

<b>MongoDB Collection - NoSQL </b>
* No relations are maintained between the data
* Dynamic schema: data is stored as documents
* The dynamic schema allows saving documents with any data types and any number of fields.
* Horizontally scalable, simply by adding more servers
* Storing and retrieving is faster
* No explicit foreign key support is available, although the schema can be designed to hold foreign-key-like references


## Mapping RDBMS to MongoDB
<img src="./RDBMS_MongoDB_Mapping.jpg">

* Collections in MongoDB are equivalent to tables in an RDBMS.
* Documents in MongoDB are equivalent to rows in an RDBMS.
* Fields in MongoDB are equivalent to columns in an RDBMS.
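
The mapping can be made concrete with a few lines of Python. The table columns and rows below are hypothetical; the sketch only shows how a row and its column names line up with a document and its fields.

```python
columns = ("id", "name", "age")                          # hypothetical table columns
rows = [(1, "Mike Myers", 55), (2, "Maria Smith", 30)]   # hypothetical table rows

# Each row becomes one document; each column name becomes a field name.
documents = [dict(zip(columns, row)) for row in rows]
print(documents)
```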

### <br>Table vs Collection
<img src="Format_mapping_relational_database_to_MongoDB.jpg">
<br><br><br><br><br>

# PyMongo Basic Operations<br><br>

### Connecting to MongoDB

In [1]:
import pymongo

# Connection to MongoDB
conn = None
try:
    uri = 'localhost:27017'
    # Full URI format: 'mongodb://USER:PASSWORD@SERVER_NAME:PORT/DATABASENAME'
    conn = pymongo.MongoClient(uri)
    # MongoClient connects lazily, so ping the server to verify the connection
    conn.admin.command('ping')
    print("Connected successfully!!!")
except pymongo.errors.ConnectionFailure as e:
    print("Could not connect to MongoDB: %s" % e)  
conn

Connected successfully!!!


MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)
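
When the server requires a user name and password, the credentials have to be percent-escaped before they go into the URI. The helper below is a sketch using only the standard library; `build_mongo_uri`, the user name and the password are made-up names for illustration.

```python
from urllib.parse import quote_plus

def build_mongo_uri(user, password, host="localhost", port=27017, database=""):
    """Assemble mongodb://USER:PASSWORD@HOST:PORT/DATABASE with escaped credentials."""
    return "mongodb://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, database)

uri = build_mongo_uri("app_user", "p@ss/word", database="mydb")
print(uri)
```

The resulting string can then be passed to `pymongo.MongoClient(uri)` in place of the plain `'localhost:27017'` address.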

In [2]:
conn.stats # attribute access on a client returns the Database named 'stats'; it does not show connection details

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'stats')

### Database Operations 
MongoDB creates databases and collections automatically if they don't exist already. A single instance of MongoDB can support multiple independent databases.

In [3]:
# Set database name to work with or create if not available.
db = conn.mydb
db

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'mydb')

In [4]:
# Using dictionary style access
db1 = conn['mydb-test']
db1

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'mydb-test')

<b>NOTE:</b> Databases that contain no collections will not show up in list_database_names(). Likewise, a collection that has not been created explicitly and holds no documents will not show up when we list the collections in a database.

In [5]:
# Returns Databases Names
# conn.database_names() - deprecated
conn.list_database_names()

['admin', 'config', 'local', 'mydb', 'people']

In [6]:
# Drop existing Database
conn.drop_database('mydb-test')

### <br><br> Collection Operations 
We can create a collection explicitly or leave it to MongoDB to create it as soon as the first document is inserted.

In [7]:
# Create a new collection. 
db.create_collection('adressbook') # Optional collection creation

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'mydb'), 'adressbook')

In [8]:
# Show Collections.
list (db.list_collections())
# list_collections() returns a cursor, so it is wrapped in list() to show the results.

[{'name': 'adressbook',
  'type': 'collection',
  'options': {},
  'info': {'readOnly': False,
   'uuid': UUID('eade0218-e5fa-4504-98b2-d30bff9b0d82')},
  'idIndex': {'v': 2,
   'key': {'_id': 1},
   'name': '_id_',
   'ns': 'mydb.adressbook'}},
 {'name': 'addressbook',
  'type': 'collection',
  'options': {},
  'info': {'readOnly': False,
   'uuid': UUID('33165179-b258-4186-9cb0-f6636932cff9')},
  'idIndex': {'v': 2,
   'key': {'_id': 1},
   'name': '_id_',
   'ns': 'mydb.addressbook'}}]

In [9]:
# Set the collection to work
collection = db.adressbook
collection.insert_one({'name' : 'Shubham'})     # Insert one item to create the collection
list(db.list_collections())

[{'name': 'adressbook',
  'type': 'collection',
  'options': {},
  'info': {'readOnly': False,
   'uuid': UUID('eade0218-e5fa-4504-98b2-d30bff9b0d82')},
  'idIndex': {'v': 2,
   'key': {'_id': 1},
   'name': '_id_',
   'ns': 'mydb.adressbook'}},
 {'name': 'addressbook',
  'type': 'collection',
  'options': {},
  'info': {'readOnly': False,
   'uuid': UUID('33165179-b258-4186-9cb0-f6636932cff9')},
  'idIndex': {'v': 2,
   'key': {'_id': 1},
   'name': '_id_',
   'ns': 'mydb.addressbook'}}]

In [10]:
# Rename a collection
db.adressbook.rename('addressbooks')
collection = db.addressbook
list(db.list_collections())

[{'name': 'addressbooks',
  'type': 'collection',
  'options': {},
  'info': {'readOnly': False,
   'uuid': UUID('eade0218-e5fa-4504-98b2-d30bff9b0d82')},
  'idIndex': {'v': 2,
   'key': {'_id': 1},
   'name': '_id_',
   'ns': 'mydb.addressbooks'}},
 {'name': 'addressbook',
  'type': 'collection',
  'options': {},
  'info': {'readOnly': False,
   'uuid': UUID('33165179-b258-4186-9cb0-f6636932cff9')},
  'idIndex': {'v': 2,
   'key': {'_id': 1},
   'name': '_id_',
   'ns': 'mydb.addressbook'}}]

In [11]:
# Delete collection
db.drop_collection('addressbooks')
# an empty list '[]' would mean that there are no collections in the database
print(list (db.list_collections()))

[{'name': 'addressbook', 'type': 'collection', 'options': {}, 'info': {'readOnly': False, 'uuid': UUID('33165179-b258-4186-9cb0-f6636932cff9')}, 'idIndex': {'v': 2, 'key': {'_id': 1}, 'name': '_id_', 'ns': 'mydb.addressbook'}}]


### 1. Create Read Update Delete (one document)

#### 1.1 Create Document :  insert_one()

In [12]:
data = {  'name' : "Mike Myers",                                     # String
          'age' : 55,                                                # Integer
          'gender' : "Male",                                         # String
          'likes_python' : False,                                    # Boolean
          'address': {
              'street' : "564 $ Ronald Reagan Blvd, Longwood",       # String containing a special character ($)
              'number' : "405-359-385",                              # String containing a number
              'city' :  "New York",                                  # String
              'floor' : None,                                        # Null
              'postalcode' : "300193",                               # String containing a number
              'favouriteFruits': ['banana','pineapple','orange']     # Array (nested inside 'address')
          }
       }

insert_result = collection.insert_one(data)

In [13]:
insert_result.acknowledged    # Confirms that insert is successful

True

In [14]:
insert_result.inserted_id     # Shows the document ID

ObjectId('5c84a1b3ba268b3b354dc39f')
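
An ObjectId is not an arbitrary string: its first four bytes store the creation time in seconds since the Unix epoch. The sketch below decodes that part by hand with the standard library, using the id printed above; the variable names are illustrative.

```python
from datetime import datetime, timezone

object_id_hex = "5c84a1b3ba268b3b354dc39f"   # the id printed above

# The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
seconds = int(object_id_hex[:8], 16)
created_at = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(created_at.isoformat())
```

With PyMongo available, the `generation_time` attribute of an `ObjectId` should report the same moment without manual decoding.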

#### 1.2 Read document: find()

In [15]:
list (collection.find()) # gets all data of collection

[{'_id': ObjectId('5c84973bba268b39f39af59a'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']},
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'},
 {'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]

In [16]:
list( collection.find( {'_id' : insert_result.inserted_id } )) # Find the inserted document using the objectID

[{'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]

In [17]:
list ( collection.find( {'name' : "Mike Myers" } )) # find, can use one key or more

[{'_id': ObjectId('5c84973bba268b39f39af59a'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']},
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'},
 {'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]

In [18]:
list ( collection.find( {'address.city' : "New York" } )) # dot notation queries a field of the nested 'address' document

[{'_id': ObjectId('5c84973bba268b39f39af59a'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']},
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'},
 {'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]

In [19]:
list ( collection.find().limit(1)) # gets a Limited set of documents

[{'_id': ObjectId('5c84973bba268b39f39af59a'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']},
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'}]

In [20]:
list ( collection.find().skip(1)) # gets all documents skipping first

[{'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]
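
`skip()` and `limit()` are commonly combined to page through results. The helper below is a hypothetical sketch that only computes the two numbers; `page_window` is not part of PyMongo.

```python
def page_window(page, page_size):
    """Return (skip, limit) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size

skip, limit = page_window(page=3, page_size=10)
print(skip, limit)
# With a live collection this could be used as:
#   collection.find().skip(skip).limit(limit)
```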

#### 1.3 Update document: update_one()

In [21]:
## Update an existing document (update_one modifies only the first match)
update_result = collection.update_one( {'name' : "Mike Myers"}, {'$set' : { 'age' : 30 }}) 
list (collection.find( {'name' : "Mike Myers" }))

[{'_id': ObjectId('5c84973bba268b39f39af59a'),
  'name': 'Mike Myers',
  'age': 30,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']},
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'},
 {'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]

In [22]:
update_result.raw_result

{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}

In [23]:
## Upsert: update the matching document, or insert a new one if none matches (avoids a crash when the document already exists)
insert_result = collection.update_one( {'name' : 'Javi Gonzalez'}, {'$set' : { 'age' : 30 }}, upsert= True )
list(collection.find( {'name' : 'Javi Gonzalez'} ))

[{'_id': ObjectId('5c84a1b4a32d6bf367f8eb0f'),
  'name': 'Javi Gonzalez',
  'age': 30}]

In [24]:
update_result.acknowledged

True

#### 1.4 Delete document: delete_one()

In [25]:
delete = collection.delete_one({'name': 'Javi Gonzalez'})

In [26]:
delete.deleted_count   # informs that 1 document has been deleted

1

In [27]:
collection.delete_one({'name': "Mike Myers"})

<pymongo.results.DeleteResult at 0x7f94f04911c8>

### 2. Create Read Update Delete (many documents)

#### 2.1 Create documents: insert_many()

In [28]:
import datetime
collection.insert_many([ # <--- start a list with [
##  Insert Document 1
  {
  'name': 'Jordi Gonzalez',
  'age': 25,
  'likes_python': True,
  'registered': datetime.datetime(2018, 2, 11, 4, 22, 39),
  'address': {
      'street': 'Torrent de l\'Olla',
      'number': 70,
      'floor': None,
      'city': 'Barcelona',
      'postalCode': '08012'
             },
  'height':  1.72,
  'favouriteFruits': ['banana','pineapple','orange']
  },

##  Insert Document 2
  {
  'name': 'Maria Smith',
  'age': 30,
  'likes_python': True,
  'registered': datetime.datetime(2019, 2, 23, 7, 34, 12),
  'address': {
      'street': 'Numancia',
       ##  missing number
       ##  missing floor
      'city': 'Barcelona',
      'postalCode': '08029'
             },
  'height':  1.56,
  'favouriteFruits': ['lemon','pineapple']
  }
  ]) # <--- finalize the list ]

<pymongo.results.InsertManyResult at 0x7f94f04ad2c8>

#### 2.2 Read many documents: find()

In [29]:
list(collection.find({'$or': [{'name': 'Jordi Gonzalez'},{'name': 'Maria Smith'} ]}))
# $or (query operator) - Selects the documents that satisfy at least one of the given expressions.
# Documents that satisfy none of them are not returned.

[{'_id': ObjectId('5c84a1b4ba268b3b354dc3a0'),
  'name': 'Jordi Gonzalez',
  'age': 25,
  'likes_python': True,
  'registered': datetime.datetime(2018, 2, 11, 4, 22, 39),
  'address': {'street': "Torrent de l'Olla",
   'number': 70,
   'floor': None,
   'city': 'Barcelona',
   'postalCode': '08012'},
  'height': 1.72,
  'favouriteFruits': ['banana', 'pineapple', 'orange']},
 {'_id': ObjectId('5c84a1b4ba268b3b354dc3a1'),
  'name': 'Maria Smith',
  'age': 30,
  'likes_python': True,
  'registered': datetime.datetime(2019, 2, 23, 7, 34, 12),
  'address': {'street': 'Numancia',
   'city': 'Barcelona',
   'postalCode': '08029'},
  'height': 1.56,
  'favouriteFruits': ['lemon', 'pineapple']}]

#### 2.3 Update many documents: update_many()

In [30]:
collection.update_many( {'isActive': True }, {'$set' : { 'isActive': False }} )
list (collection.find( )) # List all documents

[{'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}},
 {'_id': ObjectId('5c84a1b4ba268b3b354dc3a0'),
  'name': 'Jordi Gonzalez',
  'age': 25,
  'likes_python': True,
  'registered': datetime.datetime(2018, 2, 11, 4, 22, 39),
  'address': {'street': "Torrent de l'Olla",
   'number': 70,
   'floor': None,
   'city': 'Barcelona',
   'postalCode': '08012'},
  'height': 1.72,
  'favouriteFruits': ['banana', 'pineapple', 'orange']},
 {'_id': ObjectId('5c84a1b4ba268b3b354dc3a1'),
  'name': 'Maria Smith',
  'age': 30,
  'likes_python': True,
  'registered': datetime.datetime(2019, 2, 23, 7, 34, 12),
  'address': {'street': 'Numancia',
   'city': 'Barcelona',
   'postalCode': '08029'},
  'height': 1.56,
  'f

#### 2.4 Delete many documents: delete_many()

In [31]:
delete = collection.delete_many({'likes_python': True}) # deletes every document that matches the filter
list (collection.find()) # List all documents

[{'_id': ObjectId('5c84a1b3ba268b3b354dc39f'),
  'name': 'Mike Myers',
  'age': 55,
  'gender': 'Male',
  'likes_python': False,
  'address': {'street': '564 $ Ronald Reagan Blvd, Longwood',
   'number': '405-359-385',
   'city': 'New York',
   'floor': None,
   'postalcode': '300193',
   'favouriteFruits': ['banana', 'pineapple', 'orange']}}]

In [32]:
delete.deleted_count   # items deleted

2

## Closing a Connection

In [33]:
try:
    conn.close()
except Exception as e:
    print(e) 

## <br><br><br> Importing DataSet directly into MongoDB

In [34]:
from pymongo import MongoClient 
uri = 'localhost:27017'
client = MongoClient( uri )

In [35]:
client.list_database_names() # From previous module, we will have already a database called 'people'
client.drop_database('people') # We delete previous module data

In [36]:
client.list_database_names()

['admin', 'config', 'local', 'mydb']

In [37]:
# sudo apt install mongo-tools - use to install mongo tools - mongoimport, etc
!mongoimport --jsonArray --db people --collection addressbook ./contacts.json

2019-03-10T11:03:41.856+0530	connected to: localhost
2019-03-10T11:03:42.144+0530	imported 1000 documents
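
The `--jsonArray` flag tells mongoimport that the file holds a single top-level JSON array of documents. The sketch below writes and reads back a tiny file of that shape; the two contacts and the file name `contacts_sample.json` are made up, and the real `contacts.json` is not reproduced here.

```python
import json
import os
import tempfile

# Two made-up contacts; the real contacts.json is not reproduced here.
contacts = [
    {"index": 0, "name": "Example One", "age": 30, "isActive": True},
    {"index": 1, "name": "Example Two", "age": 25, "isActive": False},
]

path = os.path.join(tempfile.mkdtemp(), "contacts_sample.json")
with open(path, "w") as handle:
    json.dump(contacts, handle, indent=2)   # writes one top-level JSON array

with open(path) as handle:
    loaded = json.load(handle)
print(type(loaded).__name__, len(loaded))
```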


In [38]:
client.list_database_names()

['admin', 'config', 'local', 'mydb', 'people']

### 2. Database overview

In [39]:
db = client.people  # Set the database to work on
db.list_collection_names() # List the collections available

['addressbook']

In [40]:
collection = db['addressbook'] # Set the collection to work on

In [41]:
## Dataset content summary
num_documents = collection.count_documents({'_id' : {'$exists' : 1}})
attributes = list (collection.find().limit(1))

print ('Number of documents : %d' % num_documents)
print ('Attributes names : %s' % attributes)

Number of documents : 1000
Attributes names : [{'_id': ObjectId('5c84a1b5a32d6bf367f8eb24'), 'index': 3, 'name': 'Karyn Rhodes', 'isActive': True, 'registered': datetime.datetime(2014, 3, 11, 3, 2, 33), 'age': 39, 'gender': 'female', 'eyeColor': 'green', 'favoriteFruit': 'strawberry', 'company': {'title': 'RODEMCO', 'email': 'karynrhodes@rodemco.com', 'phone': '+1 (801) 505-3760', 'location': {'country': 'USA', 'address': '521 Seigel Street'}}, 'tags': ['cillum', 'exercitation', 'excepteur']}]


### <br><br><br> 3. Adding and deleting attributes

In [42]:
# Adding Attributes
collection.update_many( {"age" :{ "$gte" :0 }}, {"$set" : { "favoriteColor" : "red" }})
collection.update_many( {"age" :{ "$nin" : [""] }}, {"$set" : { "favoriteBook" : "Harry Potter" }})
list(collection.find({"age" : 20} , {"favoriteColor","favoriteBook", "name","age"}).limit(3))

[{'_id': ObjectId('5c84a1b5a32d6bf367f8eb2e'),
  'name': 'Wendy Sampson',
  'age': 20,
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'},
 {'_id': ObjectId('5c84a1b5a32d6bf367f8eb31'),
  'name': 'Aurelia Gonzales',
  'age': 20,
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'},
 {'_id': ObjectId('5c84a1b5a32d6bf367f8eb32'),
  'name': 'Grace Larson',
  'age': 20,
  'favoriteColor': 'red',
  'favoriteBook': 'Harry Potter'}]

In [43]:
# Deleting Attributes
collection.update_many( {"age" :{ "$gte" :0 }}, {"$unset" :{ "favoriteColor" :1 , "favoriteBook" :1}})
list(collection.find({"age" : 20} , {"favoriteColor","favoriteBook", "name","age"}).limit(3))

[{'_id': ObjectId('5c84a1b5a32d6bf367f8eb2e'),
  'name': 'Wendy Sampson',
  'age': 20},
 {'_id': ObjectId('5c84a1b5a32d6bf367f8eb31'),
  'name': 'Aurelia Gonzales',
  'age': 20},
 {'_id': ObjectId('5c84a1b5a32d6bf367f8eb32'),
  'name': 'Grace Larson',
  'age': 20}]

### 4. Listing Attribute Names 

In [44]:
print(list(collection.find_one({"age" : 38, "gender" : 'female'})))   # iterating a document (dict) yields its attribute names

['_id', 'index', 'name', 'isActive', 'registered', 'age', 'gender', 'eyeColor', 'favoriteFruit', 'company', 'tags']


### <br><br> 5. Advanced Search Operations: 

#### 5.1 Find document by id

In [45]:
# Get the id of existing document
documents = collection.find( {"_id": {"$exists": True}} , ['name','age']).limit(1)
itemId = None

for item in documents:
    itemId = str( item['_id'] )

itemId

'5c84a1b5a32d6bf367f8eb24'

In [46]:
from bson.objectid import ObjectId
list(collection.find({"_id": ObjectId( itemId )} , ['name','age','favoriteFruit','company.email']))

[{'_id': ObjectId('5c84a1b5a32d6bf367f8eb24'),
  'name': 'Karyn Rhodes',
  'age': 39,
  'favoriteFruit': 'strawberry',
  'company': {'email': 'karynrhodes@rodemco.com'}}]

#### 5.2 Filter documents by field

In [47]:
filters = {"isActive": True}
fields = ['name','age', 'isActive','company.email']

list(collection.find( filters , fields ).limit(1))

[{'_id': ObjectId('5c84a1b5a32d6bf367f8eb24'),
  'name': 'Karyn Rhodes',
  'isActive': True,
  'age': 39,
  'company': {'email': 'karynrhodes@rodemco.com'}}]

In [48]:
print(collection.count_documents(filters))

516


In [49]:
filters = {"$or": [{"age" : 28}, {"age" : 29}] , "gender" : 'female'}

print(collection.count_documents (filters) ) # count in Mongo the found documents 
print(len (list (collection.find(filters))) ) # count in Python the found documents

36
36


#### 5.3 Find by REGEX

In [50]:
import re
regex = re.compile('^Sh', re.IGNORECASE)

filters = { 'name' : regex }
fields = { '_id' : 0, 'name' : 1, 'isActive' : 1, 'age' : 1 }     #  Hide _id in reply  

list ( collection.find( filters , fields ) )

[{'name': 'Sharon Grimes', 'isActive': True, 'age': 28},
 {'name': 'Sheila Lynch', 'isActive': True, 'age': 31},
 {'name': 'Sheri Jensen', 'isActive': False, 'age': 33},
 {'name': 'Shari Henderson', 'isActive': True, 'age': 22},
 {'name': 'Shepherd Haynes', 'isActive': False, 'age': 38},
 {'name': 'Shelly Wilson', 'isActive': True, 'age': 39},
 {'name': 'Sheena Spence', 'isActive': False, 'age': 28},
 {'name': 'Sharp Walker', 'isActive': False, 'age': 38},
 {'name': 'Shelley Cherry', 'isActive': True, 'age': 36},
 {'name': 'Sheryl Hogan', 'isActive': False, 'age': 39},
 {'name': 'Shirley Blankenship', 'isActive': True, 'age': 21},
 {'name': 'Shannon Burke', 'isActive': True, 'age': 33},
 {'name': 'Sherman Gutierrez', 'isActive': False, 'age': 36},
 {'name': 'Shana Fry', 'isActive': False, 'age': 39},
 {'name': 'Shaffer Hopkins', 'isActive': True, 'age': 38},
 {'name': 'Sherri Shepherd', 'isActive': True, 'age': 30}]
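
The same compiled pattern can be tried locally to see which names it accepts. This is a small sketch with the standard `re` module; the sample list mixes names from the data set with a lower-cased and a made-up entry added for illustration.

```python
import re

regex = re.compile('^Sh', re.IGNORECASE)
# Sample values; 'sheila lynch' is lower-cased and 'Josh Smith' is made up
names = ['Sharon Grimes', 'Karyn Rhodes', 'sheila lynch', 'Josh Smith']

# search() finds the pattern anywhere, so the ^ anchor is what restricts it to the start
matches = [name for name in names if regex.search(name)]
print(matches)
```

Without the `^` anchor, 'Josh Smith' would match as well, because it contains "sh" in the middle of the string.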

#### 5.4 Sorting Queries in Ascending or Descending Order

In [51]:
# Ascending
list ( collection.find( filters , fields ).sort('age', pymongo.ASCENDING) )

[{'name': 'Shirley Blankenship', 'isActive': True, 'age': 21},
 {'name': 'Shari Henderson', 'isActive': True, 'age': 22},
 {'name': 'Sharon Grimes', 'isActive': True, 'age': 28},
 {'name': 'Sheena Spence', 'isActive': False, 'age': 28},
 {'name': 'Sherri Shepherd', 'isActive': True, 'age': 30},
 {'name': 'Sheila Lynch', 'isActive': True, 'age': 31},
 {'name': 'Sheri Jensen', 'isActive': False, 'age': 33},
 {'name': 'Shannon Burke', 'isActive': True, 'age': 33},
 {'name': 'Shelley Cherry', 'isActive': True, 'age': 36},
 {'name': 'Sherman Gutierrez', 'isActive': False, 'age': 36},
 {'name': 'Shepherd Haynes', 'isActive': False, 'age': 38},
 {'name': 'Sharp Walker', 'isActive': False, 'age': 38},
 {'name': 'Shaffer Hopkins', 'isActive': True, 'age': 38},
 {'name': 'Shelly Wilson', 'isActive': True, 'age': 39},
 {'name': 'Sheryl Hogan', 'isActive': False, 'age': 39},
 {'name': 'Shana Fry', 'isActive': False, 'age': 39}]

In [52]:
# Descending 
list ( collection.find( filters , fields ).sort('age', pymongo.DESCENDING) )

[{'name': 'Shelly Wilson', 'isActive': True, 'age': 39},
 {'name': 'Sheryl Hogan', 'isActive': False, 'age': 39},
 {'name': 'Shana Fry', 'isActive': False, 'age': 39},
 {'name': 'Shepherd Haynes', 'isActive': False, 'age': 38},
 {'name': 'Sharp Walker', 'isActive': False, 'age': 38},
 {'name': 'Shaffer Hopkins', 'isActive': True, 'age': 38},
 {'name': 'Shelley Cherry', 'isActive': True, 'age': 36},
 {'name': 'Sherman Gutierrez', 'isActive': False, 'age': 36},
 {'name': 'Sheri Jensen', 'isActive': False, 'age': 33},
 {'name': 'Shannon Burke', 'isActive': True, 'age': 33},
 {'name': 'Sheila Lynch', 'isActive': True, 'age': 31},
 {'name': 'Sherri Shepherd', 'isActive': True, 'age': 30},
 {'name': 'Sharon Grimes', 'isActive': True, 'age': 28},
 {'name': 'Sheena Spence', 'isActive': False, 'age': 28},
 {'name': 'Shari Henderson', 'isActive': True, 'age': 22},
 {'name': 'Shirley Blankenship', 'isActive': True, 'age': 21}]

## <br><br><br>  Important Operators : 

#### 1. Count

In [53]:
collection.count_documents({"age": 38})

49

#### 2. Maximum and Minimum 

In [54]:
# Maximum
print(list(collection.find({},{"_id": 0, "age": 1}).sort('age', pymongo.DESCENDING).limit(1)))
max( collection.distinct( "age" ))

[{'age': 40}]


40

In [55]:
# Minimum
print(list( collection.find({},{"_id": 0, "age": 1}).sort('age', pymongo.ASCENDING).limit(1)))
min( collection.distinct( "age" ))

[{'age': 20}]


20

In [56]:
agemale   = collection.find({"gender" : 'male'}).distinct( "age" )
agefemale = collection.find({"gender" : 'female'}).distinct( "age")

print ('Male -  Min age: ' + str(min(agemale)) + ' and Max age: ' + str(max(agemale)))
print ('Female -  Min age: ' + str(min(agefemale)) + ' and Max age: ' + str(max(agefemale)))

Male -  Min age: 20 and Max age: 40
Female -  Min age: 20 and Max age: 40


#### 3. Inclusion, exclusion operators IN and NIN

In [57]:
print( collection.count_documents( 
    { "name" : { "$in": [ "Kimberley Chase", "Kinney Wynn" ] }}
))    # includes the names in count

print( collection.count_documents( 
    { "name" : { "$nin": [ "Kimberley Chase", "Kinney Wynn" ] }}
))   # excludes the names in count

2
998


In [58]:
list( collection.find( { 
    "name" : { "$in": ["Kimberley Chase", "Kinney Wynn"] }
}, ''))      # includes the names in find

[{'_id': ObjectId('5c84a1b6a32d6bf367f8ed5f')},
 {'_id': ObjectId('5c84a1b6a32d6bf367f8eee8')}]

In [59]:
collection.count_documents({"age" : { "$nin" : [""] } })# all documents

1000

#### 4. Relational Operators

In [60]:
collection.count_documents({"age": {"$gte" : 38}}) # greater than or equal

152

In [61]:
collection.count_documents({"age": {"$gt" : 38}}) # greater than

103

In [62]:
collection.count_documents({"age": {"$eq" : 38}}) # equal

49

In [63]:
collection.count_documents({"age": {"$lt" : 38}}) # less than

848

In [64]:
collection.count_documents({"age": {"$lte" : 38}}) # less than or equal

897
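
The counts above are consistent with each other: 152 ($gte) = 103 ($gt) + 49 ($eq), 897 ($lte) = 848 ($lt) + 49 ($eq), and 848 ($lt) + 152 ($gte) = 1000 documents. The same relationships can be checked on a small list in plain Python. This is a toy sketch: the `count_matching` helper and the sample ages are assumptions, not the real data set.

```python
import operator

# Python counterparts of the comparison operators used above
COMPARATORS = {
    "$gt": operator.gt, "$gte": operator.ge, "$eq": operator.eq,
    "$lt": operator.lt, "$lte": operator.le,
}

def count_matching(values, op, threshold):
    """Count values for which `value <op> threshold` holds."""
    compare = COMPARATORS[op]
    return sum(1 for value in values if compare(value, threshold))

ages = [20, 25, 38, 38, 39, 40]   # made-up sample, not the real data set

gte = count_matching(ages, "$gte", 38)
gt = count_matching(ages, "$gt", 38)
eq = count_matching(ages, "$eq", 38)
lt = count_matching(ages, "$lt", 38)

print(gte, gt, eq, lt)
```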

#### 5. Logical Operators

In [65]:
# AND Operator
filters = { "$and":[ {"name" : "Kinney Wynn"}, {"age": 22} ]}
fields = {}      # if fields are empty, it shows by default the id
list ( collection.find( filters , fields ))

[{'_id': ObjectId('5c84a1b6a32d6bf367f8ed5f')}]

In [66]:
# OR Operator
filters = {"$or":[ {"age" : 28}, {"age" : 29} ]}

collection.count_documents( filters )   # count_documents takes only a filter; a projection does not apply to counts

88

In [67]:
# AND + OR
filters = { "$and":[ 
                {"$or":[ {"name" : "Kinney Wynn"}, {"name" : "Kimberley Chase"}]},
                {"age": 22} 
            ]}
fields = {'name','age'}      # return only name and age (the _id is included by default)

list ( collection.find( filters , fields ))

[{'_id': ObjectId('5c84a1b6a32d6bf367f8ed5f'),
  'name': 'Kinney Wynn',
  'age': 22},
 {'_id': ObjectId('5c84a1b6a32d6bf367f8eee8'),
  'name': 'Kimberley Chase',
  'age': 22}]
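
To make the evaluation of `$and` and `$or` more tangible, here is a toy matcher in plain Python. It is a sketch under strong assumptions: the `matches` function is hypothetical, supports only `$and`, `$or` and plain equality on flat documents, and is not how MongoDB evaluates queries internally. The sample documents reuse names and ages from the outputs above.

```python
def matches(doc, query):
    """Toy matcher: supports $and, $or and plain field equality only."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif doc.get(key) != condition:
            return False
    return True

people = [
    {"name": "Kinney Wynn", "age": 22},
    {"name": "Kimberley Chase", "age": 22},
    {"name": "Karyn Rhodes", "age": 39},
]

explicit = {"$and": [{"name": "Kinney Wynn"}, {"age": 22}]}
implicit = {"name": "Kinney Wynn", "age": 22}   # several keys in one filter are ANDed
combined = {"$and": [{"$or": [{"name": "Kinney Wynn"}, {"name": "Kimberley Chase"}]},
                     {"age": 22}]}

explicit_names = [p["name"] for p in people if matches(p, explicit)]
implicit_names = [p["name"] for p in people if matches(p, implicit)]
combined_names = [p["name"] for p in people if matches(p, combined)]
print(explicit_names, implicit_names, combined_names)
```

The explicit `$and` and the implicit multi-key filter select the same document, which is why `$and` is mostly needed when the same field or operator has to appear more than once.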

In [68]:
# EXISTS + NOT EXISTS
print(collection.count_documents({'_id' : {'$exists' : 1}}))  # Counts all documents with attribute '_id'
# Counts documents without attribute 'age'
print(collection.count_documents({'age' : {'$exists' : 0}}))

1000
0


#### 6. Listing Items of a List 

In [69]:
# Count documents with age equal to 28, 29 or 30
print(collection.count_documents({'age' : {'$in': [ 28, 29, 30]}}))
# Count documents with age different from 28, 29 and 30
print(collection.count_documents({'age' : {'$nin': [ 28, 29, 30]}}))
# Count documents with favorite fruit different from banana and apple
print(collection.count_documents({'favoriteFruit' : {'$nin': [ 'banana', 'apple']}}))

126
874
323


## Indexing 

In [70]:
collection.index_information() # Shows the existing indexes

{'_id_': {'v': 2, 'key': [('_id', 1)], 'ns': 'people.addressbook'}}

In [71]:
collection.create_index([( "age" , pymongo.ASCENDING)]) # Returns the name of the index

'age_1'

In [72]:
collection.drop_index("age_1") # Drop the existing index
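
Why an index speeds up lookups can be sketched with a dictionary. MongoDB indexes are B-tree structures, so the dict below is only a conceptual stand-in; the sample documents and variable names are made up for illustration.

```python
from collections import defaultdict

people = [  # made-up sample documents
    {"name": "A", "age": 38}, {"name": "B", "age": 21},
    {"name": "C", "age": 38}, {"name": "D", "age": 30},
]

# Build the "index": age -> positions of the documents with that age
age_index = defaultdict(list)
for position, doc in enumerate(people):
    age_index[doc["age"]].append(position)

# Without the index: look at every document
scan_result = [doc["name"] for doc in people if doc["age"] == 38]
# With the index: jump straight to the matching positions
index_result = [people[pos]["name"] for pos in age_index[38]]

print(scan_result, index_result)
```

Both approaches return the same documents; the difference is that the indexed lookup does not have to visit every document, at the cost of keeping the extra structure up to date on every write.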