### MongoDB

 - JSON is the basis of MongoDB's data format
 - JSON has two collection structures:
     - objects map key strings to values
     - arrays hold ordered lists of values
 - Values can be strings, numbers, `true`, `false`, `null`, objects, or arrays
 - Each JSON data type has a Python equivalent:
 

<img src="assets/mongodb/python_json_maps.png" style="width: 300px;"/>
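To see this mapping in action, the sketch below parses a JSON string with Python's standard `json` module and prints the Python type of each value. The document is made up for illustration.

```python
import json

# A small, made-up JSON document that uses every JSON value type
raw = (
    '{"name": "Example", "year": 1903, "share": 0.5, "living": false, '
    '"spouse": null, "tags": ["physics", "chemistry"], "address": {"city": "Paris"}}'
)

doc = json.loads(raw)  # JSON object -> Python dict

# Print each field with the Python type json.loads produced for it
for key, value in doc.items():
    print(key, type(value).__name__)
```

The output pairs each field with `str`, `int`, `float`, `bool`, `NoneType`, `list`, and `dict` respectively.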
    
  - JSON/Python data types are expressed in MongoDB as follows:
  - A database maps names to collections. You can access a collection by name the same way you access a value in a Python dictionary.
  - A collection is like a list of dictionaries; MongoDB calls each dictionary a document.
  - When a dictionary is a value within a document, it's called a sub-document.
  - Values in a document can be any of the types above, as well as additional types such as dates and regular expressions.

<img src="assets/mongodb/python_mongodb_maps.png" style="width: 600px;"/>
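As a rough mental model only (these are plain Python objects, not pymongo types), the hierarchy above can be written as nested dictionaries and lists. The collection names match the `nobel` database used below; the document contents are hypothetical.

```python
from datetime import datetime

# Plain-Python model of MongoDB's structure (illustration only, not pymongo objects):
# database -> dict of named collections, collection -> list of documents (dicts)
database = {
    "laureates": [
        {
            "firstname": "Example",
            # A date value; pymongo stores Python datetime objects as BSON dates
            "born": datetime(1900, 1, 1),
            # A dict nested inside a document is a sub-document
            "address": {"city": "Springfield", "country": "USA"},
        },
    ],
    "prizes": [],
}

# Access a collection by name, like a dictionary key
laureates = database["laureates"]
first_doc = laureates[0]
print(first_doc["address"]["country"])  # USA
```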

  - Access databases by name as attributes of the client, e.g. `client.my_database`, or with brackets, e.g. `client["my_database"]`
  - Access collections by name as attributes of databases, e.g. `my_database.my_collection`

```python
import requests
from pymongo import MongoClient

# The client connects to localhost on port 27017 by default
client = MongoClient()
```


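```python
# Create the local "nobel" database on the fly
db = client["nobel"]

for collection_name in ["prizes", "laureates"]:

    # Fetch the data from the API ("prizes" -> prize.json, "laureates" -> laureate.json)
    response = requests.get("http://api.nobelprize.org/v1/{}.json".format(collection_name[:-1]))
    response.raise_for_status()

    # Parse the JSON body into Python objects and take the list of documents
    documents = response.json()[collection_name]

    # Create the collection on the fly and insert the documents
    # (re-running this cell inserts duplicate documents)
    db[collection_name].insert_many(documents)
```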
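The loop relies on a naming convention: each collection name is the plural of the API endpoint name, so `collection_name[:-1]` drops the trailing "s" to build the URL, and the parsed response stores its list of documents under the plural key. The sketch below isolates just that string and dictionary handling with no network access; `endpoint_url` is a hypothetical helper, and `payload` is a hypothetical stand-in for `response.json()`, not real API output.

```python
def endpoint_url(collection_name):
    # Hypothetical helper: "prizes" -> "prize", "laureates" -> "laureate"
    return "http://api.nobelprize.org/v1/{}.json".format(collection_name[:-1])

urls = [endpoint_url(name) for name in ["prizes", "laureates"]]
print(urls)

# Hypothetical stand-in for response.json(): the documents sit under the plural key
payload = {"laureates": [{"id": "1"}, {"id": "2"}]}
documents = payload["laureates"]
print(len(documents))  # 2
```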

```python
# Save a list of names of the databases managed by the client
db_names = client.list_database_names()
print(db_names)

# Save a list of names of the collections in the "nobel" database
nobel_coll_names = client.nobel.list_collection_names()
print(nobel_coll_names)
```

```python
# Connect to the "nobel" database
db = client.nobel

# Retrieve sample prize and laureate documents
prize = db.prizes.find_one()
laureate = db.laureates.find_one()

# Print the sample prize and laureate documents
print(prize)
print(laureate)
print(type(laureate))

# Get the fields present in each type of document
prize_fields = list(prize.keys())
laureate_fields = list(laureate.keys())

print(prize_fields)
print(laureate_fields)
```
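`find_one()` returns a single document, and MongoDB does not require every document in a collection to have the same fields, so one document's keys may not list every field in use. A sketch of collecting the union of keys across several documents, using hypothetical documents in place of query results (with pymongo you would iterate over a cursor such as `db.laureates.find()` instead):

```python
# Hypothetical documents standing in for the results of a find() query
docs = [
    {"firstname": "A", "born": "1900-01-01"},
    {"firstname": "B", "born": "1910-05-05", "died": "1980-02-02"},
    {"firstname": "C"},  # no "born" field at all
]

# Collect every field name that appears in at least one document
all_fields = set()
for d in docs:
    all_fields.update(d.keys())

print(sorted(all_fields))  # ['born', 'died', 'firstname']
```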

#### Comparisons and filtering

```python
# Greater than ("born" is stored as a string, so this is a string comparison)
print(db.laureates.count_documents({'born': {'$gt': '1700'}}))

# Less than
print(db.laureates.count_documents({'born': {'$lt': '1700'}}))

# Create a filter for Germany-born laureates who died in the USA and with the first name "Albert"
criteria = {'firstname': 'Albert',
            'bornCountry': 'Germany',
            'diedCountry': 'USA'}

# Save the count
count = db.laureates.count_documents(criteria)
print(count)
```
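Two behaviours are at work in this cell. Because `born` holds `YYYY-MM-DD` strings, `$gt` and `$lt` compare them character by character, which happens to match date order, so `'1700'` acts like a year cutoff. And listing several fields in one filter means all of them must match (an implicit AND). The sketch below imitates both on plain dicts; `matches_all` is a hypothetical, equality-only helper, not MongoDB's matcher, and the documents are made up.

```python
# Hypothetical laureate documents; "born" is a string, as in the queries above
laureates = [
    {"firstname": "Albert", "bornCountry": "Germany", "diedCountry": "USA", "born": "1880-01-01"},
    {"firstname": "Albert", "bornCountry": "France", "diedCountry": "France", "born": "1850-01-01"},
    {"firstname": "Other", "bornCountry": "Poland", "diedCountry": "France", "born": "1860-01-01"},
]

# Plain string comparison: "1880-01-01" > "1700" because "8" > "7" at the second character
print("1880-01-01" > "1700")  # True

def matches_all(doc, criteria):
    # Equality-only filter: every key/value pair must match (implicit AND)
    return all(doc.get(field) == value for field, value in criteria.items())

criteria = {"firstname": "Albert", "bornCountry": "Germany", "diedCountry": "USA"}
count = sum(1 for doc in laureates if matches_all(doc, criteria))
print(count)  # 1

born_after_1700 = sum(1 for doc in laureates if doc["born"] > "1700")
print(born_after_1700)  # 3
```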

```python
# Save a filter for laureates born in the USA, Canada, or Mexico
criteria = {'bornCountry': {"$in": ['USA', 'Canada', 'Mexico']}}

# Count them and save the count
count = db.laureates.count_documents(criteria)
print(count)
```
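`$in` matches a document when the field equals any value in the list, which amounts to several equality tests joined by OR. A simplified sketch on plain dicts, using a hypothetical `matches_in` helper and made-up documents; a document without the field does not match unless the list contains `None` (null).

```python
def matches_in(doc, field, allowed):
    # Simplified $in on a scalar field: the value must equal one of the listed values.
    # A missing field behaves like None, which is not in the list below.
    return doc.get(field) in allowed

laureates = [
    {"bornCountry": "USA"},
    {"bornCountry": "Canada"},
    {"bornCountry": "France"},
    {},  # no bornCountry field
]

count = sum(1 for d in laureates if matches_in(d, "bornCountry", ["USA", "Canada", "Mexico"]))
print(count)  # 2
```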

```python
# Save a filter for laureates who died in the USA and were not born there
criteria = {'diedCountry': 'USA',
            'bornCountry': {"$ne": 'USA'}}

# Count them
count = db.laureates.count_documents(criteria)
print(count)
```
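One subtlety of `$ne` is that it also matches documents where the field is missing entirely, so this count would include a laureate who died in the USA and has no `bornCountry` field. A simplified sketch of that behaviour on plain dicts; `matches_ne` is a hypothetical helper and the documents are made up.

```python
def matches_ne(doc, field, value):
    # Simplified $ne on a scalar field: a missing field behaves like None (null),
    # so it counts as "not equal" to any non-null value
    return doc.get(field) != value

laureates = [
    {"diedCountry": "USA", "bornCountry": "USA"},
    {"diedCountry": "USA", "bornCountry": "Germany"},
    {"diedCountry": "USA"},  # no bornCountry field: still matches $ne
]

count = sum(
    1 for d in laureates
    if d.get("diedCountry") == "USA" and matches_ne(d, "bornCountry", "USA")
)
print(count)  # 2
```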

#### Dot notation
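 - Dot notation lets us query fields inside sub-documents and arrays
 - Write the full path to a field from the document's root, joining field names with dots, e.g. `prizes.affiliations.country`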
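A simplified sketch of how a dotted path is resolved: each step looks up one field, and when it reaches an array it looks inside every element, so a single path can reach several values. `values_at_path` is a hypothetical helper illustrating the idea, not MongoDB's implementation; it ignores numeric array positions such as `prizes.2`, and the document is made up.

```python
def values_at_path(doc, path):
    """Collect every value reachable by a dotted path, descending into arrays."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for item in current:
            # An array on the path: look inside each of its elements
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    # Flatten a final array so each element counts as a separate value
    result = []
    for value in current:
        result.extend(value if isinstance(value, list) else [value])
    return result

# Hypothetical laureate with two prizes and three affiliations
doc = {
    "prizes": [
        {"affiliations": [{"country": "Austria"}, {"country": "Germany"}]},
        {"affiliations": [{"country": "USA"}]},
    ]
}

print(values_at_path(doc, "prizes.affiliations.country"))  # ['Austria', 'Germany', 'USA']
```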

```python
# Filter for laureates born in Austria with no prize affiliation in Austria
criteria = {'bornCountry': 'Austria',
            'prizes.affiliations.country': {"$ne": 'Austria'}}

# Count the number of such laureates
count = db.laureates.count_documents(criteria)
print(count)
```
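Because `prizes.affiliations.country` can reach several values for one laureate, `$ne` matches only when none of those values equals `'Austria'`; a laureate with no affiliation country at all also matches. The sketch below takes the reachable values as a list and applies that rule; `ne_matches` is a hypothetical helper, and the value lists are made up.

```python
def ne_matches(values, target):
    # $ne against a path that reaches several values:
    # match only if NO reachable value equals the target.
    # An empty list (field missing everywhere) also matches, since all([]) is True.
    return all(v != target for v in values)

# Hypothetical affiliation countries reachable via 'prizes.affiliations.country'
print(ne_matches(["Austria", "Germany"], "Austria"))  # False: one affiliation is in Austria
print(ne_matches(["Germany", "USA"], "Austria"))      # True: none is in Austria
print(ne_matches([], "Austria"))                      # True: no affiliation data at all
```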

```python
# Filter for documents without a "born" field
criteria = {"born": {"$exists": False}}

# Save count
count = db.laureates.count_documents(criteria)
print(count)
```
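`$exists` tests whether a field is present, not what it holds: a field whose value is `null` still exists, so `{"$exists": False}` only catches documents where the key is absent. A plain-Python sketch of that distinction, with a hypothetical `exists` helper and made-up documents:

```python
def exists(doc, field):
    # $exists checks presence of the key, not its value:
    # a field set to None (null) still exists
    return field in doc

laureates = [
    {"firstname": "A", "born": "1900-01-01"},
    {"firstname": "B", "born": None},  # present but null
    {"firstname": "C"},                # missing
]

missing_born = [d["firstname"] for d in laureates if not exists(d, "born")]
print(missing_born)  # ['C']
```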

```python
# Filter for laureates with at least three prizes
criteria = {"prizes.2": {'$exists': True}}

# Find one laureate with at least three prizes
doc = db.laureates.find_one(criteria)

# Print the document
print(doc)
```
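Array positions in dot notation are zero-based, so `prizes.2` refers to the third element, which exists only when the array holds at least three items. A plain-Python sketch of the same test for an array field; `has_index` is a hypothetical helper and the documents are made up.

```python
def has_index(doc, field, index):
    # Mirrors {"<field>.<index>": {"$exists": True}} for an array field:
    # position `index` exists only if the array has more than `index` elements
    value = doc.get(field)
    return isinstance(value, list) and len(value) > index

laureates = [
    {"firstname": "A", "prizes": [{"year": "1901"}]},
    {"firstname": "B", "prizes": [{"year": "1901"}, {"year": "1911"}, {"year": "1921"}]},
]

print([d["firstname"] for d in laureates if has_index(d, "prizes", 2)])  # ['B']
```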