# Mongo tutorial

## Prerequisites

### Server status

Check the status of your Mongo cluster: `!sudo service mongodb status`

If the cluster is inactive, run `!sudo service mongodb start` to start it.

In [3]:
# !sudo service mongodb status


### Import libraries

In [38]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [39]:
client = MongoClient('localhost', 27017)

In [40]:
# let's work in a test_database
db = client.test_database
posts = db.posts
users = db.users

In [41]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('631c5637c42f03d63340ccef')

In [42]:
db.list_collection_names()

['posts']

In [43]:
pprint(posts.find_one())

{'_id': ObjectId('631c5637c42f03d63340ccef'),
 'author': 'Mike',
 'date': datetime.datetime(2022, 9, 10, 9, 17, 43, 215000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a terminal alongside the notebook, connect to your server with the `mongo` client, and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q** : Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`. Fetch it back and display it. What is the `_id` field for?

NB : if the collection doesn't exist yet, MongoDB automatically creates it.

In [44]:
#db = client.test_database
tests = db.test
test = {
    "msg": "hello",
    "date": datetime.datetime.utcnow()
}
test_id = tests.insert_one(test).inserted_id
test_id

ObjectId('631c5637c42f03d63340ccf0')

**Q**: Display the number of documents inside the `test` collection

In [45]:
db.list_collection_names()
pprint(tests.find_one())

{'_id': ObjectId('631c5637c42f03d63340ccf0'),
 'date': datetime.datetime(2022, 9, 10, 9, 17, 43, 510000),
 'msg': 'hello'}
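
The cell above displays one document rather than a count. One possible way to get the number of documents is `count_documents()` with an empty filter, which matches every document. This is a minimal sketch, assuming `collection` is a pymongo collection such as `db.test`; the helper name `count_all` is hypothetical.

```python
def count_all(collection):
    """Return the number of documents in a pymongo collection."""
    # An empty filter {} matches every document in the collection
    return collection.count_documents({})


# Usage in the notebook (needs the running server from the Prerequisites):
# count_all(db.test)
```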


### Interacting with a database

We have 2 `.json` files we want to interact with inside the `data` folder. Let's first dump them into a `MovieLens` database, inside `users` and `movies` collections.

For this section, you will need to read a bit about [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most of the collection methods you will use take a `filter` as their first parameter, to which you pass a dictionary of query conditions.

**Q** : In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`. 

Use the dedicated shell command for this : `mongoimport --db <some_db> --collection <some_collection> --file <some_file>` 

In [46]:
#!sudo su

In [47]:
!mongoimport --db MovieLens --collection movies --file ../datasets/json/movielens_movies.json

2022-09-10T04:17:43.881-0500	connected to: mongodb://localhost/
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 2 }
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 10 }
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 12 }
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 13 }
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 14 }
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 16 }
2022-09-10T04:17:43.970-0500	continuing through error: E11000 duplicate k

In [48]:
!mongoimport --db MovieLens --collection users --file ../datasets/json/movielens_users.json

2022-09-10T04:17:44.504-0500	connected to: mongodb://localhost/
2022-09-10T04:17:46.641-0500	6040 document(s) imported successfully. 0 document(s) failed to import.


**Q** : How many users are in the `MovieLens` database?

In [49]:
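db = client.MovieLens
users = db.users
# Only the last expression of a cell is displayed: wrap this call in print() to see the user count
users.count_documents({})

movies = db.movies
# Last expression of the cell, so this movie count is the value shown below
movies.count_documents({})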

3883

**Q** : Display all comedies (the `genres` property equals `Comedy`). 

NB : `find()` returns a `Cursor`. You will need to find out how to iterate over it, then use the `pprint` function for a better display of those documents.

In [50]:
movies = db.movies
# Let the server do the filtering: only documents whose genres is exactly "Comedy" are returned
for movie in movies.find({"genres": "Comedy"}):
    pprint(movie)

{'_id': 19,
 'genres': 'Comedy',
 'title': 'Ace Ventura: When Nature Calls (1995)'}
{'_id': 38, 'genres': 'Comedy', 'title': 'It Takes Two (1995)'}
{'_id': 5, 'genres': 'Comedy', 'title': 'Father of the Bride Part II (1995)'}
{'_id': 52, 'genres': 'Comedy', 'title': 'Mighty Aphrodite (1995)'}
{'_id': 63,
 'genres': 'Comedy',
 'title': "Don't Be a Menace to South Central While Drinking Your Juice in the "
          'Hood (1996)'}
{'_id': 65, 'genres': 'Comedy', 'title': 'Bio-Dome (1996)'}
{'_id': 69, 'genres': 'Comedy', 'title': 'Friday (1995)'}
{'_id': 88, 'genres': 'Comedy', 'title': 'Black Sheep (1996)'}
{'_id': 96, 'genres': 'Comedy', 'title': 'In the Bleak Midwinter (1995)'}
{'_id': 104, 'genres': 'Comedy', 'title': 'Happy Gilmore (1996)'}
{'_id': 109, 'genres': 'Comedy', 'title': 'Headless Body in Topless Bar (1995)'}
{'_id': 115, 'genres': 'Comedy', 'title': 'Happiness Is in the Field (1995)'}
{'_id': 119, 'genres': 'Comedy', 'title': 'Steal Big, Steal Little (1995)'}
{'_id': 101

**Q** : Fetch and display the `name` and `occupation` for Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called the `projection` and limits which fields are returned by the query.

In [51]:
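users = db.users
# The projection keeps only name and occupation (and drops the default _id)
cursor = users.find(
    {"name": "Clifford Johnathan"},
    {"_id": 0, "name": 1, "occupation": 1},
)
for user in cursor:
    print(user.get("name"), user.get("occupation"))

Clifford Johnathan technician/engineer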


**Q**: How many minors (by `age`) have rated movies ?

In [52]:
# Cursor over the users younger than 18
minors = db.users.find({"age": {"$lt": 18}})

count = 0
for user in minors:
    # Count the user only if every entry of their movies array has a rating
    if all(movie["rating"] for movie in user["movies"]):
        count += 1
pprint(count)

222
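
The same count can probably be obtained on the server instead of looping in Python. The sketch below assumes that every entry of the `movies` array carries a rating, so "has rated movies" reduces to "has a non-empty `movies` array". The helper names are hypothetical.

```python
def minors_with_ratings_filter(max_age=18):
    """Build a filter for users younger than max_age with at least one movie entry."""
    return {
        "age": {"$lt": max_age},
        # "movies.0" exists only when the movies array has a first element
        "movies.0": {"$exists": True},
    }


def count_minors_with_ratings(users_collection):
    """Count matching users with a single server-side query."""
    return users_collection.count_documents(minors_with_ratings_filter())


# Usage in the notebook:
# count_minors_with_ratings(client.MovieLens.users)
```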


**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time you need to use a regex to parse genres and look for those values.

In [53]:

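movies = db.movies
# $regex matches any genres string that contains "Sci-Fi" or "Thriller"
cursor = movies.find({"genres": {"$regex": "Sci-Fi|Thriller"}})
# Iterate over the cursor to display the documents:
# for movie in cursor:
#     pprint(movie)
cursor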


<pymongo.cursor.Cursor at 0x7f703c146850>

**Q**: If we want more advanced textual search, we need a particular index. Use the `create_index()` method to index as [TEXT](https://docs.mongodb.com/manual/core/index-text/) the `genres` field of the `movies` collection.

**Q**: Repeat the search for science fiction and thriller movies, this time with the `$text` operator
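
A possible approach to the two text-search questions is sketched below. It assumes `movies_collection` is the pymongo `movies` collection; the helper names are hypothetical. The string `"text"` is the value of the `pymongo.TEXT` constant, written out here so the sketch does not need to import pymongo.

```python
TEXT = "text"  # same value as pymongo.TEXT


def create_genres_text_index(movies_collection):
    """Create a text index on genres; create_index returns the index name."""
    return movies_collection.create_index([("genres", TEXT)])


def text_search_filter(terms):
    """Build a $text filter; space-separated terms are combined with a logical OR."""
    return {"$text": {"$search": " ".join(terms)}}


def find_by_genre_text(movies_collection, terms):
    """Return a cursor over the movies matching any of the terms."""
    return movies_collection.find(text_search_filter(terms))


# Usage in the notebook (the text index must exist before $text is used):
# create_genres_text_index(db.movies)
# for movie in find_by_genre_text(db.movies, ["Sci-Fi", "Thriller"]):
#     pprint(movie)
```

Note that a hyphen inside a word such as `Sci-Fi` acts as a delimiter for `$text`, so the term is searched as two separate words.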

**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title
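
One way to combine `sort` and `limit` is to chain them on the cursor returned by `find()`. This is a sketch under the assumption that `movies_collection` is the pymongo `movies` collection; the helper name is hypothetical, and `1` is the value of `pymongo.ASCENDING`.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def first_movies_by_title(movies_collection, n=30):
    """Return the first n movies, sorted by title in ascending order."""
    cursor = movies_collection.find()
    # sort() and limit() both return the cursor, so the calls can be chained
    return list(cursor.sort("title", ASCENDING).limit(n))


# Usage in the notebook:
# for movie in first_movies_by_title(db.movies):
#     pprint(movie)
```

By default MongoDB compares strings byte by byte, so titles starting with digits, quotes or uppercase letters come before lowercase ones.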

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id` 1196)? The `movies` field is an array, so we should try the [elemMatch](https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/) operator here.

**Q**: And how many gave it a rating of 1 or 2 ?
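
The two questions about movie 1196 could be approached with `$elemMatch` used as a query operator. The sketch assumes each element of a user's `movies` array has `movieid` and `rating` fields, as the update exercise in this tutorial suggests; the helper names are hypothetical.

```python
def seen_movie_filter(movie_id, ratings=None):
    """Build a filter for users whose movies array has an entry for movie_id.

    If ratings is given, the same array entry must also have one of those ratings.
    """
    condition = {"movieid": movie_id}
    if ratings is not None:
        condition["rating"] = {"$in": list(ratings)}
    # $elemMatch requires all conditions to hold on the same array element
    return {"movies": {"$elemMatch": condition}}


def count_viewers(users_collection, movie_id, ratings=None):
    """Count the users matching the filter on the server."""
    return users_collection.count_documents(seen_movie_filter(movie_id, ratings))


# Usage in the notebook:
# count_viewers(db.users, 1196)          # users who have seen the movie
# count_viewers(db.users, 1196, [1, 2])  # users who rated it 1 or 2
```

`$elemMatch` matters mostly for the second question: without it, the `movieid` and `rating` conditions could be satisfied by two different entries of the array.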

### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

**Q**: Add a rating for a movie this user has watched with `update_one()`: add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).
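
Among the update operators, `$push` looks like a natural fit for this exercise: it appends a value to an array and creates the array if the field is absent. The sketch below assumes `users_collection` is the pymongo `users` collection and that the user is identified by `name`; the helper names are hypothetical.

```python
import datetime


def rating_update(movie_id, rating):
    """Build an update document that appends one rating to the movies array."""
    return {
        "$push": {
            "movies": {
                "movieid": movie_id,
                "rating": rating,
                "timestamp": datetime.datetime.utcnow(),
            }
        }
    }


def add_rating(users_collection, user_name, movie_id, rating):
    """Apply the update to the first user with this name; return how many documents changed."""
    result = users_collection.update_one(
        {"name": user_name}, rating_update(movie_id, rating)
    )
    return result.modified_count


# Usage in the notebook (the user name is whatever you inserted earlier):
# add_rating(db.users, "<your user name>", 1196, 5)
```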

**Q**: Find the number of users who have declared a `programmer` occupation. Modify them so that they are `developer`. Verify your update.
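
A possible sketch for this exercise uses `count_documents()` before and after an `update_many()` with `$set`. It assumes `users_collection` is the pymongo `users` collection; the helper name is hypothetical.

```python
def rename_occupation(users_collection, old="programmer", new="developer"):
    """Rename an occupation and return (count before, modified, count remaining)."""
    before = users_collection.count_documents({"occupation": old})
    # update_many changes every matching document, not only the first one
    result = users_collection.update_many(
        {"occupation": old}, {"$set": {"occupation": new}}
    )
    remaining = users_collection.count_documents({"occupation": old})
    return before, result.modified_count, remaining


# Usage in the notebook:
# rename_occupation(db.users)
```

If the update worked, the first two numbers are equal and the third one is 0.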

## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (`author` field, string type)
* The date (`date` field, string type in YYYY-MM-DD format)
* The content (`content` field)
* Tags (`tags` field, a string array)
* A list of comments (`comments` field), each containing:
  * The author (`author` field, string type)
  * The date (`date` field, string type in YYYY-MM-DD format)
  * The content (`content` field)
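
The structure described above could be built as plain Python dictionaries before calling `insert_one()`. This is only a sketch: the helper names `make_post` and `make_comment` are hypothetical, and the year used in the usage example is an assumption since the exercises give only a day and a month.

```python
def make_comment(author, date, content):
    """Build a comment sub-document; date is a YYYY-MM-DD string."""
    return {"author": author, "date": date, "content": content}


def make_post(author, date, content, tags, comments=None):
    """Build a blog post document with the fields listed above."""
    return {
        "author": author,
        "date": date,
        "content": content,
        "tags": list(tags),
        # Comments are embedded in the post as an array of sub-documents
        "comments": list(comments or []),
    }


# Usage in the notebook (the year 2022 and the content are assumptions):
# posts = client.Blog.posts
# posts.insert_one(make_post("rick", "2022-01-15", "...", ["mongodb", "nosql"]))
```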


**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

**Q**: Display the author of the last post with the tag `nosql`
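
Reading "last post" as the most recent one by `date`, a possible sketch uses `find_one()` with a `sort` argument. Because the dates are YYYY-MM-DD strings, sorting them as text also sorts them chronologically. The helper name is hypothetical, and `-1` is the value of `pymongo.DESCENDING`.

```python
DESCENDING = -1  # same value as pymongo.DESCENDING


def last_post_author(posts_collection, tag="nosql"):
    """Return the author of the most recent post carrying the tag, or None."""
    # A filter on an array field matches when any element equals the value
    post = posts_collection.find_one({"tags": tag}, sort=[("date", DESCENDING)])
    return post["author"] if post else None


# Usage in the notebook:
# last_post_author(client.Blog.posts)
```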

**Q**: Add a comment by `jack` on January 25, to `kate`'s post

**Q**: Display all comments by `kate`
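
Since comments are embedded in posts, one simple approach is to fetch the posts that contain at least one comment by the author, then keep only that author's comments in Python. This is a sketch with a hypothetical helper name; an aggregation pipeline would be another option.

```python
def comments_by(posts_collection, author):
    """Return every embedded comment written by the given author."""
    # Dot notation reaches into the sub-documents of the comments array
    cursor = posts_collection.find(
        {"comments.author": author}, {"_id": 0, "comments": 1}
    )
    return [
        comment
        for post in cursor
        for comment in post["comments"]
        if comment["author"] == author
    ]


# Usage in the notebook:
# pprint(comments_by(client.Blog.posts, "kate"))
```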

## Postquisites

In [54]:
!mongo test_database --eval 'db.dropDatabase()'

MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017/test_database?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("57ebf7ea-8f73-4d51-a7b2-f79458f28451") }
MongoDB server version: 4.4.1
{ "dropped" : "test_database", "ok" : 1 }


In [55]:
!mongo MovieLens --eval 'db.dropDatabase()'

MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017/MovieLens?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("8a2bd98e-7afd-449c-a819-26fe21fa025f") }
MongoDB server version: 4.4.1
{ "dropped" : "MovieLens", "ok" : 1 }


In [56]:
!mongo Blog --eval 'db.dropDatabase()'

MongoDB shell version v4.4.1
connecting to: mongodb://127.0.0.1:27017/Blog?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("044feb98-d239-43b1-9269-bc7913b0d9d8") }
MongoDB server version: 4.4.1
{ "ok" : 1 }
