A few things you should keep in mind when working on assignments:

1. Make sure you fill in any place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_, then restart the kernel and run all cells (_Restart & Run All_).

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.

# Problem 2. MongoDB

In this problem, we work with MongoDB from a Python program by using the `pymongo` database driver.

In [None]:
from pymongo import MongoClient
import random

from nose.tools import assert_equal, assert_true, assert_is_instance

Suppose we are given the following table:

<div class="row">
    <div class="col-md-2">
      <div align="center">
        <b>Midterm</b>
      </div>
    <table>
  <tr>
    <th>Id</th>
    <th>Name</th> 
    <th>Score</th>
  </tr>
  <tr>
    <td>1</td>
    <td>Alice</td> 
    <td>97.3</td>
  </tr>
  <tr>
    <td>2</td>
    <td>Bob</td> 
    <td>87.7</td>
  </tr>
  <tr>
    <td>3</td>
    <td>Chris</td> 
    <td>91.5</td>
  </tr>
</table>
    </div>
</div>

We can represent this table as a list of dictionaries, as shown in the following code cell.

In [None]:
midterm = [
    {"Id": 1, "Name": "Alice", "Score": 97.3},
    {"Id": 2, "Name": "Bob", "Score": 87.7},
    {"Id": 3, "Name": "Chris", "Score": 91.5}
]
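Each row becomes one dictionary whose keys are the column headers. If the table instead arrived as a header row plus value rows (a hypothetical input format, not something this assignment provides), a minimal sketch for building the same list is to zip each row with the header:

```python
# Hypothetical raw form of the Midterm table: one header row plus value rows.
header = ["Id", "Name", "Score"]
rows = [
    [1, "Alice", 97.3],
    [2, "Bob", 87.7],
    [3, "Chris", 91.5],
]

# Pair each value with its column name to get one dictionary (document) per row.
midterm_from_rows = [dict(zip(header, row)) for row in rows]
print(midterm_from_rows[0])  # {'Id': 1, 'Name': 'Alice', 'Score': 97.3}
```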

In this problem, we will store this data as documents in a MongoDB collection. This requires a MongoDB server, so we connect to the one running on the course's cloud computing system.

In [None]:
client = MongoClient("mongodb://192.168.100.23:27017")

Since we are using a shared server without authentication, we use your netid to create a separate database for each student.

In [None]:
# Filename containing user's netid
fname = '/home/data_scientist/users.txt'
with open(fname, 'r') as fin:
    netid = fin.readline().rstrip()

dbname = 'assignment-{0}'.format(netid)
print("Database name: {}".format(dbname))

## Inserting Data

- Write a function that inserts the documents in `data` into a MongoDB collection.
- `insert_data` takes two arguments: `collection` and `data`.
- `collection` is a `pymongo.collection.Collection` instance, e.g. `client[dbname]["Midterm"]`.
- `data` is a list of dictionaries.
- The `insert_data()` function returns `None` on success.

Hints:
- The `collection` parameter (or `client[dbname]["Midterm"]` when the function is called as `insert_data(client[dbname]["Midterm"], midterm)`) is a MongoDB collection. A `Collection` instance provides the [insert_one()](http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.insert_one) and [insert_many()](http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.insert_many) methods; use either one. The [Introduction to MongoDB notebook](https://github.com/UI-DataScience/accy571-fa16/blob/master/Week13/notebooks/intro2mongodb.ipynb) has some examples.
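The two methods differ mainly in what they accept: `insert_one()` takes a single dictionary, while `insert_many()` takes an iterable of dictionaries. Both add an `_id` key to any dictionary that lacks one, modifying your input in place (the driver generates an `ObjectId`). To illustrate this without a server, here is a rough sketch using a hypothetical in-memory stand-in. `InMemoryCollection` is not part of `pymongo`; it uses integer ids instead of `ObjectId`, and it returns ids directly where the real methods return `InsertOneResult` and `InsertManyResult` objects.

```python
import itertools


class InMemoryCollection:
    """Hypothetical stand-in for pymongo's Collection (illustration only)."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)  # pymongo would generate ObjectId values

    def insert_one(self, document):
        # Mimic the driver: add an _id to the caller's dict if it has none.
        if "_id" not in document:
            document["_id"] = next(self._ids)
        self._docs.append(dict(document))  # store a copy, as a server would
        return document["_id"]  # the real method returns an InsertOneResult

    def insert_many(self, documents):
        # Same effect as inserting each document in turn; the real driver
        # sends them to the server in batches and returns an InsertManyResult.
        return [self.insert_one(doc) for doc in documents]

    def __len__(self):
        return len(self._docs)


docs = [{"Name": "Alice"}, {"Name": "Bob"}]
coll = InMemoryCollection()
coll.insert_many(docs)
print(len(coll))  # 2
print(docs[0])    # {'Name': 'Alice', '_id': 1}
```

If your `insert_data()` passes the original dictionaries to the driver, the dictionaries in `midterm` will also carry `_id` keys afterwards.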

In [None]:
def insert_data(collection, data):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# we will delete our database if it exists before recreating
if dbname in client.list_database_names():
    client.drop_database(dbname)

# we now run our function to insert data
insert_data(client[dbname]["Midterm"], midterm)

print(
    "Existing databases:",
    [name for name in client.list_database_names() if netid in name]
)
print("Existing collections:", client[dbname].list_collection_names())
print("Number of documents:", client[dbname]["Midterm"].count_documents({}))
for student in client[dbname]["Midterm"].find():
    print(student)

In [None]:
client.drop_database(dbname)

db_midterm = client[dbname]["Midterm"]
insert_data(db_midterm, midterm)

assert_true(dbname in client.list_database_names())
assert_true("Midterm" in client[dbname].list_collection_names())

assert_equal(db_midterm.count_documents({}), len(midterm))

for m in midterm:
    row = db_midterm.find_one({"Id": m["Id"]})
    assert_equal(row["Name"], m["Name"])
    assert_equal(row["Score"], m["Score"])
    
# extra test
test_data = [
    {"Id": i,
     "Name": ''.join(random.choice("abcdefghijklmnopqrstuv") for _ in range(5)),
     "Score": random.randint(0, 100)}
    for i in range(10)
]

db_test = client[dbname]["test0"]
insert_data(db_test, test_data)

assert_equal(db_test.count_documents({}), len(test_data))
for d in test_data:
    row = db_test.find_one({"Id": d["Id"]})
    assert_equal(row["Name"], d["Name"])
    assert_equal(row["Score"], d["Score"])

## Advanced Querying

- Write a function that finds and returns the names of all students whose midterm score is greater than or equal to 90.
- `query()` takes one argument, `collection`.
- `collection` is a `pymongo.collection.Collection` instance, e.g. `client[dbname]["Midterm"]`.
- The `query()` function returns a list of strings.
- For example, if we search the `Midterm` collection,
  ```python
  >>> a_students = query(client[dbname]["Midterm"])
  >>> print(a_students)
  ```
  we should get
  ```
  ['Alice', 'Chris']
  ```
  
Hints:
- To query a MongoDB database in `pymongo`, we use the [find()](http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find) method.
- You will also need to use the [query operators](https://docs.mongodb.com/manual/reference/operator/query/). The [Introduction to MongoDB notebook](https://github.com/UI-DataScience/accy571-fa16/blob/master/Week13/notebooks/intro2mongodb.ipynb) has some examples, as does the [MongoDB documentation for `$gte`](https://docs.mongodb.com/manual/reference/operator/query/gte/#op._S_gte).
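A filter such as `{"Score": {"$gte": 90}}` selects documents whose `Score` field is greater than or equal to 90, and a projection limits which fields come back. The following pure-Python sketch shows which documents such a filter selects, so you can check expected results without a server. `matches` and `project` are hypothetical helpers, not `pymongo` functions. They support only `$gte` conditions and simple inclusion projections.

```python
def matches(doc, flt):
    """Return True if doc satisfies every {field: {"$gte": value}} condition in flt."""
    for field, condition in flt.items():
        if field not in doc:
            return False  # with a numeric bound, a document lacking the field does not match
        for op, value in condition.items():
            if op != "$gte":
                raise ValueError("this sketch only supports $gte")
            if not doc[field] >= value:
                return False
    return True


def project(doc, fields):
    """Keep only the listed fields (an inclusion projection that also drops _id)."""
    return {f: doc[f] for f in fields if f in doc}


midterm_docs = [
    {"Id": 1, "Name": "Alice", "Score": 97.3},
    {"Id": 2, "Name": "Bob", "Score": 87.7},
    {"Id": 3, "Name": "Chris", "Score": 91.5},
]

score_filter = {"Score": {"$gte": 90}}
selected = [project(d, ["Name"]) for d in midterm_docs if matches(d, score_filter)]
print(selected)  # [{'Name': 'Alice'}, {'Name': 'Chris'}]

names = [d["Name"] for d in selected]
```

In `pymongo`, the filter and projection dictionaries are the first two arguments of `find()`, which returns a cursor over matching documents rather than a list.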

In [None]:
def query(collection):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
a_students = query(client[dbname]["Midterm"])
print(a_students)

In [None]:
answer = sorted([m['Name'] for m in midterm if m["Score"] >= 90.0])
assert_equal(len(a_students), len(answer))
assert_equal(set(a_students), set(answer))

# extra test
test_names = query(client[dbname]["test0"])
test_answer = sorted([d["Name"] for d in test_data if d["Score"] >= 90.0])
assert_equal(len(test_names), len(test_answer))
assert_equal(set(test_names), set(test_answer))

## Cleanup

When you are done or if you want to start over with a clean database, run the following code cell.

In [None]:
if dbname in client.list_database_names():
    client.drop_database(dbname)

assert_true(dbname not in client.list_database_names())