# MongoDB

Based on https://docs.mongodb.com/getting-started 


MongoDB is a **NoSQL** open-source **document database**. MongoDB supports horizontal scaling by partitioning (sharding) the data over multiple nodes, and it improves reliability by replicating the data across nodes (replica sets).

A record in MongoDB is a **document**, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects or Python dictionaries. The values of fields may include other documents, arrays, and arrays of documents.

This is an example of a document:
```JSON
{
   "_id" : ObjectId("54c955492b7c8eb21818bd09"),
   "address" : {
      "street" : "2 Avenue",
      "zipcode" : "10075",
      "building" : "1480",
      "coord" : [ -73.9557413, 40.7720266 ]
   },
   "borough" : "Manhattan",
   "cuisine" : "Italian",
   "grades" : [
      {
         "date" : ISODate("2014-10-01T00:00:00Z"),
         "grade" : "A",
         "score" : 11
      },
      {
         "date" : ISODate("2014-01-16T00:00:00Z"),
         "grade" : "B",
         "score" : 17
      }
   ],
   "name" : "Vella",
   "restaurant_id" : "41704620"
}
```
In MongoDB, every document has a unique **_id** field that acts as its primary key. If you do not provide an _id yourself, one is added automatically (pymongo generates an `ObjectId` when you insert the document).
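The `_id` values in this notebook are `ObjectId`s. According to the MongoDB documentation, the first 4 bytes of an ObjectId store its creation time as seconds since the Unix epoch (pymongo exposes this as `ObjectId.generation_time`). A minimal pure-Python sketch that decodes this timestamp from the hex string in the example document above; the helper name `objectid_timestamp` is hypothetical:

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId.

    Hypothetical helper for illustration; with pymongo installed you would
    use bson.ObjectId(hex_id).generation_time instead.
    """
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    raw = bytes.fromhex(hex_id)
    # First 4 bytes: big-endian seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp("54c955492b7c8eb21818bd09"))  # → 2015-01-28 21:31:53+00:00
```

Because the timestamp comes first, ObjectIds created by the same process sort roughly by creation time.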

MongoDB stores documents in **collections**. Collections are analogous to tables in relational databases. Unlike a table, however, a collection does not require its documents to have the same schema.

You can start a MongoDB Docker container like this:
```bash
docker run -p 27017:27017 -d mongo
```

In production you really (!) need to enable authentication with a username and password, but for development purposes this is fine.

In [10]:
# Install the pymongo Python Package 
# !pip3 install pymongo

In [11]:
from pymongo import MongoClient
from pprint import pprint
import requests 

Use `MongoClient` to create a connection. If you do not pass any arguments, `MongoClient` connects to the MongoDB instance running on localhost at port 27017. You can also pass a complete MongoDB URI to define the connection, including the host and port number. For example, the following creates a connection to a MongoDB instance running on `mongodb0.example.net` at port 27017: `client = MongoClient("mongodb://mongodb0.example.net:27017")`
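When authentication is enabled, the username and password become part of the URI, and characters such as `@` or `:` in them must be percent-escaped; the pymongo documentation recommends `urllib.parse.quote_plus` for this. A minimal sketch, where the helper `build_mongo_uri` and the credentials are hypothetical:

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str = "localhost", port: int = 27017) -> str:
    """Build a mongodb:// URI with percent-escaped credentials (hypothetical helper)."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


uri = build_mongo_uri("alice", "p@ss:word")
print(uri)  # → mongodb://alice:p%40ss%3Aword@localhost:27017
# client = MongoClient(uri)  # needs a running server where this user exists
```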

In [12]:
# Client connects to "localhost" by default 
client = MongoClient()

The first fundamental class of objects you will interact with using pymongo is `Database`, which represents a database in MongoDB. Databases hold groups of logically related collections. MongoDB creates new databases implicitly upon their first use. Connect to (and thereby create) a database named after yourself, e.g.

```python
db = client["rolandmueller"]
``` 
or 

```python
db = client.rolandmueller
```

In [13]:
# Create local "bipm" database on the fly 
db = client['bipm']

In [14]:
# When we rerun the whole notebook, we start from scratch
# by dropping the collection "courses"
db.courses.drop()

In [15]:
# Create a Python list of dictionaries, one per course
courses = [
    {'title': 'Data Science',
     'lecturer': {
         'name': 'Markus Löcher',
         'department': 'Math',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Data Warehousing',
     'lecturer': {
         'name': 'Roland M. Mueller',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Business Process Management',
     'lecturer': {
         'name': 'Frank Habermann',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Stratigic Issues of IT',
     'lecturer': {
         'name': 'Sven Pohland',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 1},
    {'title': 'Text, Web and Social Media Analytics Lab',
     'lecturer': {
         'name': 'Markus Löcher',
         'department': 'Math',
         'status': 'Professor'
     },
     'semester': 2},
    {'title': 'Enterprise Architectures for Big Data',
     'lecturer': {
         'name': 'Roland M. Mueller',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
    {'title': 'Business Process Integration Lab',
     'lecturer': {
         'name': 'Frank Habermann',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
    {'title': 'IT-Security and Privacy',
     'lecturer': {
         'name': 'Dennis Uckel',
         'department': 'Information Systems',
         'status': 'External'
     },
     'semester': 2},
    {'title': 'Research Methods',
     'lecturer': {
         'name': 'Marcus Birkenkrahe',
         'department': 'Information Systems',
         'status': 'Professor'
     },
     'semester': 2},
]

In [16]:
pprint(courses)

[{'lecturer': {'department': 'Math',
               'name': 'Markus Löcher',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Data Science'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Roland M. Mueller',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Data Warehousing'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Frank Habermann',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Business Process Management'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Sven Pohland',
               'status': 'Professor'},
  'semester': 1,
  'title': 'Stratigic Issues of IT'},
 {'lecturer': {'department': 'Math',
               'name': 'Markus Löcher',
               'status': 'Professor'},
  'semester': 2,
  'title': 'Text, Web and Social Media Analytics Lab'},
 {'lecturer': {'department': 'Information Systems',
               'name': 'Roland M. Mu

## insert_many()

You can use the `insert_one()` method and the `insert_many()` method to add documents to a collection in MongoDB. If you attempt to add documents to a collection that does not exist, MongoDB will create the collection for you.

In [17]:
db.courses.insert_many(courses)

<pymongo.results.InsertManyResult at 0x199b29f3388>

## find()

You can use the `find()` method to query a collection in MongoDB. All queries in MongoDB have the scope of a single collection.
Queries can return all documents in a collection or only the documents that match a specified filter. You specify the filter as a document and pass it as a parameter to `find()`. With no parameter, `find()` returns all documents in the collection.

The `find()` method returns the results as a cursor, an iterable object that yields documents, so you can loop over it and print each document:

```python
cursor = db.my_collection.find()

for document in cursor:
    pprint(document)
```


### Find the names of all courses

#### Loop through the list and read the title from each dictionary

In [39]:
type(courses)

list

In [40]:
# Collect the "title" of every course dictionary that has one
course_list = [course["title"] for course in courses if "title" in course]
pprint(course_list)

['Data Science',
 'Data Warehousing',
 'Business Process Management',
 'Stratigic Issues of IT',
 'Text, Web and Social Media Analytics Lab',
 'Enterprise Architectures for Big Data',
 'Business Process Integration Lab',
 'IT-Security and Privacy',
 'Research Methods']


#### Use MongoDB's `find()` method

In [49]:
course_names = db.courses.find()

for doc in course_names:
    print(doc["title"])

Data Science
Data Warehousing
Business Process Management
Stratigic Issues of IT
Text, Web and Social Media Analytics Lab
Enterprise Architectures for Big Data
Business Process Integration Lab
IT-Security and Privacy
Research Methods


## JSON

You can store a JSON document after converting it to a Python dictionary:

In [51]:
import json

In [52]:
my_json = '{"title": "Master Thesis", "semester": 3}'
another_course = json.loads(my_json)
another_course

{'title': 'Master Thesis', 'semester': 3}
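Note that `json.loads()` only accepts strict JSON. The restaurant document at the top of this notebook uses mongo shell syntax (`ObjectId(...)`, `ISODate(...)`), which is not valid JSON, so it cannot be parsed this way. A small sketch using trimmed-down versions of that example:

```python
import json

# Mongo shell syntax: ObjectId(...) is not a JSON value.
shell_doc = '{"_id": ObjectId("54c955492b7c8eb21818bd09"), "name": "Vella"}'
try:
    json.loads(shell_doc)
    shell_doc_parsed = True
except json.JSONDecodeError:
    shell_doc_parsed = False

# Plain JSON without an _id: an _id is added when the document is inserted.
plain_doc = '{"name": "Vella", "borough": "Manhattan", "cuisine": "Italian"}'
restaurant = json.loads(plain_doc)

print(shell_doc_parsed)  # → False
print(restaurant["borough"])  # → Manhattan
```

If you need to round-trip `ObjectId`s or dates, pymongo's `bson.json_util` module reads and writes MongoDB Extended JSON (e.g. `{"$oid": "..."}`) instead.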

## insert_one()

The `insert_one()` method adds a single document to the collection.


In [53]:
# TODO: Store `another_course` as another course:
db.courses.insert_one(another_course)

<pymongo.results.InsertOneResult at 0x199b697acc8>

In [54]:
# TODO: Print all courses
course_names = db.courses.find()

for doc in course_names:
    print(doc["title"])

Data Science
Data Warehousing
Business Process Management
Stratigic Issues of IT
Text, Web and Social Media Analytics Lab
Enterprise Architectures for Big Data
Business Process Integration Lab
IT-Security and Privacy
Research Methods
Master Thesis


## find_one() and find()

`find_one()` returns the first matching document (or `None` if nothing matches). `find()` returns a cursor over all matches.

The query condition for `find_one()` and `find()` for an equality match on fields has the following form:
```python
{ <field1>: <value1>, <field2>: <value2>, ... } 
```

The following operation finds the first document whose `borough` field equals "Manhattan":

```python
document = db.restaurants.find_one({"borough": "Manhattan"})
```


In [59]:
# TODO: Find the course with the title "Data Science"
# save the result in a variable result
# and pprint the result.
result = db.courses.find_one({"title": "Data Science"})
pprint(result)

{'_id': ObjectId('5ed6c9e767601b13f002316d'),
 'lecturer': {'department': 'Math',
              'name': 'Markus Löcher',
              'status': 'Professor'},
 'semester': 1,
 'title': 'Data Science'}


In [60]:
print(result["_id"])
print(result["lecturer"]["name"])

5ed6c9e767601b13f002316d
Markus Löcher


In [69]:
# TODO: Find the first course (one course) in the second semester
# and print it
first_second_semester_course = db.courses.find_one({'semester': 2})
pprint(first_second_semester_course)

{'_id': ObjectId('5ed6c9e767601b13f0023171'),
 'lecturer': {'department': 'Math',
              'name': 'Markus Löcher',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Text, Web and Social Media Analytics Lab'}


In [70]:
# TODO: Find all courses in the second semester
# and print the course titles
second_semester_courses = db.courses.find({'semester': 2})

for course in second_semester_courses:
    pprint(course)

{'_id': ObjectId('5ed6c9e767601b13f0023171'),
 'lecturer': {'department': 'Math',
              'name': 'Markus Löcher',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Text, Web and Social Media Analytics Lab'}
{'_id': ObjectId('5ed6c9e767601b13f0023172'),
 'lecturer': {'department': 'Information Systems',
              'name': 'Roland M. Mueller',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Enterprise Architectures for Big Data'}
{'_id': ObjectId('5ed6c9e767601b13f0023173'),
 'lecturer': {'department': 'Information Systems',
              'name': 'Frank Habermann',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Business Process Integration Lab'}
{'_id': ObjectId('5ed6c9e767601b13f0023174'),
 'lecturer': {'department': 'Information Systems',
              'name': 'Dennis Uckel',
              'status': 'External'},
 'semester': 2,
 'title': 'IT-Security and Privacy'}
{'_id': ObjectId('5ed6c9e767601b13f0023175'),
 'lecturer': {'de

In [75]:
# TODO: Find all courses in the second semester 
# and print the lecturers names
second_semester_courses = db.courses.find({'semester': 2})

for course in second_semester_courses:
    pprint(course["lecturer"]["name"])

'Markus Löcher'
'Roland M. Mueller'
'Frank Habermann'
'Dennis Uckel'
'Marcus Birkenkrahe'


## Subelements

Documents can contain embedded documents (and arrays of embedded documents). To specify a condition on a field inside an embedded document, use dot notation; the whole dotted field name must be enclosed in quotes. The following query matches documents whose grades array contains an embedded document with a field grade equal to "B".

```python
cursor = db.restaurants.find({"grades.grade": "B"})
```
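To see which dotted field names a document offers, you can flatten its embedded documents in plain Python. A minimal sketch with the hypothetical helper `dotted_paths`; it only descends into embedded documents, not into arrays (MongoDB's dot notation also reaches into arrays, as the `grades.grade` example shows):

```python
def dotted_paths(document: dict, prefix: str = "") -> dict:
    """Map dotted field names (e.g. 'lecturer.name') to their values.

    Hypothetical helper: recurses into embedded documents (dicts) only;
    arrays are returned as-is.
    """
    paths = {}
    for key, value in document.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths.update(dotted_paths(value, prefix=f"{path}."))
        else:
            paths[path] = value
    return paths


course = {
    "title": "Business Process Management",
    "lecturer": {"name": "Frank Habermann", "department": "Information Systems"},
    "semester": 1,
}
for path, value in dotted_paths(course).items():
    print(path, "=", value)
```

Any of the printed keys, e.g. `"lecturer.name"`, can be used as a field in a `find()` filter.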

In [84]:
# TODO: Find all courses of Frank Habermann
# and print the title and the semester
courses_Frank_Habermann = db.courses.find({"lecturer.name": "Frank Habermann"})

for course in courses_Frank_Habermann:
    print(course['_id'], course["title"])

5ed6c9e767601b13f002316f Business Process Management
5ed6c9e767601b13f0023173 Business Process Integration Lab


## Logical AND

You can specify a logical conjunction (AND) for a list of query conditions by separating the conditions with a comma in the conditions document.

```python
cursor = db.restaurants.find({"cuisine": "Italian", "address.zipcode": "10075"})
```
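One pitfall with the comma-separated form: a Python dict literal silently keeps only the last value for a repeated key, so two conditions on the *same* field cannot be written that way. A plain-Python sketch (no server needed) showing the problem and the explicit `$and` form:

```python
# Two conditions on "semester" written as one dict literal:
# Python keeps only the last one, so the $gt condition is lost.
broken_filter = {"semester": {"$gt": 1}, "semester": {"$lt": 3}}
print(broken_filter)  # → {'semester': {'$lt': 3}}

# The $and operator keeps both conditions as separate clauses.
and_filter = {"$and": [{"semester": {"$gt": 1}}, {"semester": {"$lt": 3}}]}
print(and_filter)
# db.courses.find(and_filter) would apply both conditions.
```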

In [85]:
# TODO: Find all courses from Frank Habermann in the second semester
# and print the title and the semester
courses_second_semester_Frank_Habermann = db.courses.find({"lecturer.name": "Frank Habermann", "semester": 2})

for course in courses_second_semester_Frank_Habermann:
    print(course['_id'], course["title"])

5ed6c9e767601b13f0023173 Business Process Integration Lab


## Logical OR

You can specify a logical disjunction (OR) for a list of query conditions by using the `$or` query operator.

```python
cursor = db.restaurants.find({"$or": [{"cuisine": "Italian"}, {"address.zipcode": "10075"}]})
```
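When all alternatives test the *same* field for equality, the `$in` operator is a more compact equivalent of `$or`. A plain-Python sketch of both filter shapes, built from a list of names; the helpers `or_filter` and `in_filter` are hypothetical:

```python
def or_filter(field: str, values: list) -> dict:
    """Hypothetical helper: one equality clause per value, joined with $or."""
    return {"$or": [{field: value} for value in values]}


def in_filter(field: str, values: list) -> dict:
    """Hypothetical helper: the same selection expressed with $in."""
    # $in expects a flat list of values, not a list nested inside another list.
    return {field: {"$in": list(values)}}


names = ["Frank Habermann", "Markus Löcher"]
print(or_filter("lecturer.name", names))
print(in_filter("lecturer.name", names))
# Either dict can be passed to db.courses.find(...).
```

Wrapping the list once more (`{"$in": [names]}`) would instead look for a field whose value equals the whole list.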


In [92]:
# TODO: Find all courses from Frank Habermann or Markus Löcher
# and print the title and the semester

courses_profs_Löcher_Habermann = db.courses.find(
    {"$or": [{"lecturer.name": "Frank Habermann"}, {"lecturer.name": "Markus Löcher"}]}
)

for course in courses_profs_Löcher_Habermann:
    print("prof:",course["lecturer"]["name"], "course:",course['_id'], course["title"])

prof: Markus Löcher course: 5ed6c9e767601b13f002316d Data Science
prof: Frank Habermann course: 5ed6c9e767601b13f002316f Business Process Management
prof: Markus Löcher course: 5ed6c9e767601b13f0023171 Text, Web and Social Media Analytics Lab
prof: Frank Habermann course: 5ed6c9e767601b13f0023173 Business Process Integration Lab


## Greater than, Less than

MongoDB provides operators to specify query conditions, such as comparison operators. Query conditions using operators generally have the following form:
```python
{ <field1>: { <operator1>: <value1> } }
```

Greater Than Operator (`$gt`). Query for documents whose grades array contains an embedded document with a field score greater than 30.

```python
cursor = db.restaurants.find({"grades.score": {"$gt": 30}})
```

Less Than Operator (`$lt`). Query for documents whose grades array contains an embedded document with a field score less than 10.

```python
cursor = db.restaurants.find({"grades.score": {"$lt": 10}})
```
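Several operators can share one sub-document to express a range on a single field. A small sketch with a hypothetical helper `range_filter` that only adds the bounds you pass (`$gte` and `$lte` are the inclusive counterparts of `$gt` and `$lt`):

```python
from typing import Optional


def range_filter(field: str, low: Optional[int] = None, high: Optional[int] = None) -> dict:
    """Hypothetical helper: build {field: {"$gte": low, "$lte": high}}, skipping missing bounds."""
    conditions = {}
    if low is not None:
        conditions["$gte"] = low
    if high is not None:
        conditions["$lte"] = high
    # An empty filter {} matches every document in the collection.
    return {field: conditions} if conditions else {}


print(range_filter("semester", 2, 3))  # → {'semester': {'$gte': 2, '$lte': 3}}
print(range_filter("semester", low=2))  # → {'semester': {'$gte': 2}}
print(range_filter("semester"))  # → {}
```

For example, `db.courses.find(range_filter("semester", 2, 3))` would select the courses of the 2nd and 3rd semester.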



In [94]:
# TODO: Find all courses in semester greater than 1
# and print the title and the semester
courses_semester_greater_than1 = db.courses.find({"semester": {"$gt": 1}})

for course in courses_semester_greater_than1:
    pprint(course)

{'_id': ObjectId('5ed6c9e767601b13f0023171'),
 'lecturer': {'department': 'Math',
              'name': 'Markus Löcher',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Text, Web and Social Media Analytics Lab'}
{'_id': ObjectId('5ed6c9e767601b13f0023172'),
 'lecturer': {'department': 'Information Systems',
              'name': 'Roland M. Mueller',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Enterprise Architectures for Big Data'}
{'_id': ObjectId('5ed6c9e767601b13f0023173'),
 'lecturer': {'department': 'Information Systems',
              'name': 'Frank Habermann',
              'status': 'Professor'},
 'semester': 2,
 'title': 'Business Process Integration Lab'}
{'_id': ObjectId('5ed6c9e767601b13f0023174'),
 'lecturer': {'department': 'Information Systems',
              'name': 'Dennis Uckel',
              'status': 'External'},
 'semester': 2,
 'title': 'IT-Security and Privacy'}
{'_id': ObjectId('5ed6c9e767601b13f0023175'),
 'lecturer': {'de

## Counting

`count_documents()` takes the same filter as `find()` but returns the number of matching documents.

In [97]:
# TODO: How many courses are in the second semester?
num_courses = db.courses.count_documents({"semester": {"$gt": 1}})
print("Courses in the 2nd and 3rd semester:")
pprint(num_courses)

Courses in the 2nd and 3rd semester:
6


# Downloading Nobel Prize Winners with an API and storing them in MongoDB

![](https://upload.wikimedia.org/wikipedia/en/e/ed/Nobel_Prize.png)
The Nobel Prize offers a web API, documented at https://nobelprize.readme.io/docs/prize

Because the API returns JSON and MongoDB stores documents in a JSON-like format, a document database like MongoDB is a good fit for storing the results of the API. You can get all laureates at http://api.nobelprize.org/v1/laureate.json and all prizes at http://api.nobelprize.org/v1/prize.json

We will just download all laureates and prizes and store them in MongoDB!

In [98]:
# Create local "nobel" database on the fly 
db = client["nobel"]
db.prizes.drop()
db.laureates.drop()
# API documented at https://nobelprize.readme.io/docs/prize 
for collection_name in ["prizes", "laureates"]:
    singular = collection_name[:-1]  # the API uses the singular form in the URL
    response = requests.get(f"http://api.nobelprize.org/v1/{singular}.json", timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of failing later in .json()
    documents = response.json()[collection_name]
    # Collections are created on the fly by the first insert
    db[collection_name].insert_many(documents)

In [99]:
pprint(db.laureates.find_one())

{'_id': ObjectId('5ed7181a67601b13f00233fd'),
 'born': '1845-03-27',
 'bornCity': 'Lennep (now Remscheid)',
 'bornCountry': 'Prussia (now Germany)',
 'bornCountryCode': 'DE',
 'died': '1923-02-10',
 'diedCity': 'Munich',
 'diedCountry': 'Germany',
 'diedCountryCode': 'DE',
 'firstname': 'Wilhelm Conrad',
 'gender': 'male',
 'id': '1',
 'prizes': [{'affiliations': [{'city': 'Munich',
                               'country': 'Germany',
                               'name': 'Munich University'}],
             'category': 'physics',
             'motivation': '"in recognition of the extraordinary services he '
                           'has rendered by the discovery of the remarkable '
                           'rays subsequently named after him"',
             'share': '1',
             'year': '1901'}],
 'surname': 'Röntgen'}


In [106]:
# TODO: Print the first name of the first document
first_laureate = db.laureates.find_one()

pprint(first_laureate["firstname"])

'Wilhelm Conrad'


With `count_documents` you can count the number of matching documents. 

In [107]:
# How many female laureates exist?
female_laureates = db.laureates.count_documents({'gender': 'female'})
print("female laureates:")
pprint(female_laureates)

female laureates:
53


With the `$regex` operator you can match a field against a regular expression. `distinct()` returns only the distinct values of a field, optionally restricted by a filter.

In [108]:
db.laureates.distinct("bornCountry", {"bornCountry": {"$regex": "Germany"}})

['Bavaria (now Germany)',
 'East Friesland (now Germany)',
 'Germany',
 'Germany (now France)',
 'Germany (now Poland)',
 'Germany (now Russia)',
 'Hesse-Kassel (now Germany)',
 'Mecklenburg (now Germany)',
 'Prussia (now Germany)',
 'Schleswig (now Germany)',
 'West Germany (now Germany)',
 'Württemberg (now Germany)']
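`$regex` matches anywhere in the string unless the pattern is anchored, which is why the list above also contains values such as `Germany (now France)`. For simple patterns like these, Python's `re.search` behaves the same way, so a quick local check on the values above (copied from the output) illustrates how the choice of pattern changes what counts as "Germany":

```python
import re

born_countries = [
    "Bavaria (now Germany)",
    "East Friesland (now Germany)",
    "Germany",
    "Germany (now France)",
    "Germany (now Poland)",
    "Germany (now Russia)",
    "Hesse-Kassel (now Germany)",
    "Mecklenburg (now Germany)",
    "Prussia (now Germany)",
    "Schleswig (now Germany)",
    "West Germany (now Germany)",
    "Württemberg (now Germany)",
]

# Unanchored: "Germany" anywhere in the string, like {"$regex": "Germany"}.
anywhere = [c for c in born_countries if re.search("Germany", c)]
# Anchored at both ends: only the exact value "Germany".
exact = [c for c in born_countries if re.search("^Germany$", c)]
# Ends with "Germany" or "Germany)": excludes e.g. "Germany (now France)".
now_germany = [c for c in born_countries if re.search(r"Germany\)?$", c)]

print(len(anywhere), len(exact), len(now_germany))  # → 12 1 9
```

The same pattern strings can be passed to MongoDB, e.g. `db.laureates.count_documents({"bornCountry": {"$regex": r"Germany\)?$"}})`; which definition of "from Germany" is right depends on the question you want to answer.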

In [131]:
# TODO: How many laureates are from Germany?
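# $in expects a flat list of values. Wrapping the distinct() result in another
# list ([...]) makes $in look for a bornCountry equal to the whole list,
# which matches nothing and returns 0.
germany_variants = db.laureates.distinct("bornCountry", {"bornCountry": {"$regex": "Germany"}})
from_germany = db.laureates.count_documents({"bornCountry": {"$in": germany_variants}})
# Shorter equivalent: db.laureates.count_documents({"bornCountry": {"$regex": "Germany"}})
print(from_germany)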


In [50]:
# TODO: Find all physics nobel laureates that are from Germany
# print the year of the first prize, the first name, and surname

In [135]:
# TODO: find and print the document for "Malala" (firstname)
malala = db.laureates.find({ 'firstname': 'Malala'})

for doc in malala:
    pprint(doc)

{'_id': ObjectId('5ed7181a67601b13f002376d'),
 'born': '1997-07-12',
 'bornCity': 'Mingora',
 'bornCountry': 'Pakistan',
 'bornCountryCode': 'PK',
 'died': '0000-00-00',
 'firstname': 'Malala',
 'gender': 'female',
 'id': '914',
 'prizes': [{'affiliations': [[]],
             'category': 'peace',
             'motivation': '"for their struggle against the suppression of '
                           'children and young people and for the right of all '
                           'children to education"',
             'share': '2',
             'year': '2014'}],
 'surname': 'Yousafzai'}


Apparently there is only one matching document, so we could also use `find_one()`:

In [136]:
malala = db.laureates.find_one({ 'firstname': 'Malala'})
pprint(malala)

{'_id': ObjectId('5ed7181a67601b13f002376d'),
 'born': '1997-07-12',
 'bornCity': 'Mingora',
 'bornCountry': 'Pakistan',
 'bornCountryCode': 'PK',
 'died': '0000-00-00',
 'firstname': 'Malala',
 'gender': 'female',
 'id': '914',
 'prizes': [{'affiliations': [[]],
             'category': 'peace',
             'motivation': '"for their struggle against the suppression of '
                           'children and young people and for the right of all '
                           'children to education"',
             'share': '2',
             'year': '2014'}],
 'surname': 'Yousafzai'}


## Sort()

With `sort()` you can sort the documents returned by a cursor. The parameter of `sort()` is a list of `(field, direction)` tuples, where the direction is 1 for ascending or -1 for descending order (pymongo also defines the constants `pymongo.ASCENDING` and `pymongo.DESCENDING`).

Sort all restaurants by grade in ascending order:
```python
cursor = db.restaurants.find().sort([("grades.grade", 1)])
```
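A sort specification can list several keys; later keys only break ties in earlier ones. The following plain-Python sketch mimics what `.sort([("semester", -1), ("title", 1)])` would return for a few of the courses above, using Python's stable `sorted()`: sort by the secondary key first, then by the primary key. This is only a local analogy for simple scalar fields; MongoDB has its own comparison rules for arrays and mixed types.

```python
courses = [
    {"title": "Data Warehousing", "semester": 1},
    {"title": "Research Methods", "semester": 2},
    {"title": "Data Science", "semester": 1},
    {"title": "Master Thesis", "semester": 3},
]

# Secondary key first (title ascending), then primary key (semester descending).
# sorted() is stable, also with reverse=True, so equal semesters keep their title order.
by_title = sorted(courses, key=lambda c: c["title"])
result = sorted(by_title, key=lambda c: c["semester"], reverse=True)

print([c["title"] for c in result])
# → ['Master Thesis', 'Research Methods', 'Data Science', 'Data Warehousing']
```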

In [161]:
# TODO: Find only female nobel laureates
# and sort them according to the prize year in ascending order
# print year of the first prize, firstname, and surname

female_laureates = db.laureates.find({'gender': 'female'}).sort([("prizes.year", 1)])

for laureate in female_laureates:
    print(laureate['prizes'][0]["year"], laureate['firstname'], laureate['surname'])

1903 Marie Curie
1905 Bertha von Suttner
1909 Selma Lagerlöf
1926 Grazia Deledda
1928 Sigrid Undset
1931 Jane Addams
1935 Irène Joliot-Curie
1938 Pearl Buck
1945 Gabriela Mistral
1946 Emily Greene Balch
1947 Gerty Cori
1963 Maria Goeppert Mayer
1964 Dorothy Crowfoot Hodgkin
1966 Nelly Sachs
1976 Betty Williams
1976 Mairead Corrigan
1977 Rosalyn Yalow
1979 Anjezë Gonxhe Bojaxhiu
1982 Alva Myrdal
1983 Barbara McClintock
1986 Rita Levi-Montalcini
1988 Gertrude B. Elion


KeyError: 'surname'

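The `KeyError` means that at least one of the matching laureate documents has no `surname` field. Documents in a collection do not have to share the same schema, so a field can be missing in some of them. Reading the field with `laureate.get('surname', '')` instead of `laureate['surname']` avoids the error.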
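A minimal sketch of a loop body that tolerates missing fields by using `dict.get()` with defaults. The helper name `format_laureate` is hypothetical; the first sample document is a trimmed copy of the Malala document shown above, the second is made up for illustration:

```python
def format_laureate(laureate: dict) -> str:
    """Hypothetical helper: 'year firstname surname', tolerating missing fields."""
    prizes = laureate.get("prizes") or [{}]
    year = prizes[0].get("year", "????")
    firstname = laureate.get("firstname", "")
    surname = laureate.get("surname", "")
    # Join only the non-empty parts so a missing surname leaves no trailing space.
    return " ".join(part for part in (year, firstname, surname) if part)


malala = {"firstname": "Malala", "surname": "Yousafzai", "prizes": [{"year": "2014"}]}
no_surname = {"firstname": "Example", "prizes": [{"year": "1991"}]}  # made-up document

print(format_laureate(malala))  # → 2014 Malala Yousafzai
print(format_laureate(no_surname))  # → 1991 Example
```

In the cell above, the loop body would then become `print(format_laureate(laureate))`.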