# Mongo tutorial

## Prerequisites

### Documentation

The reference documentation is available here:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [1]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [2]:
client = MongoClient('localhost', 27017)

In [3]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [4]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('66b91e4d06a099c393b51db4')

In [5]:
db.list_collection_names()

['posts']

In [6]:
pprint(posts.find_one())

{'_id': ObjectId('66b91e4d06a099c393b51db4'),
 'author': 'Mike',
 'date': datetime.datetime(2024, 8, 11, 20, 25, 49, 300000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a terminal alongside the notebook, connect to your server with the `mongo` client, and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q** : Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`. Fetch it back to display it. What is the `_id` for ?

NB : if the collection doesn't exist yet, MongoDB automatically creates it.

In [7]:
test = db.test
test_id = test.insert_one({"msg":'hello'}).inserted_id
test_id

ObjectId('66b91e5806a099c393b51db5')
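As a partial answer to the `_id` question: `_id` is the primary key of the document, and when you do not supply one the driver generates an `ObjectId`. Its first 4 bytes store the creation time in seconds since the Unix epoch. The sketch below decodes that timestamp from the hex string shown above using only the standard library; the helper name `objectid_creation_time` is made up for this illustration (PyMongo exposes the same information through `ObjectId.generation_time`).

```python
from datetime import datetime, timezone


def objectid_creation_time(oid_hex: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    # 8 hex digits = 4 bytes, interpreted as a big-endian number of seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hex value copied from the output of the insert above.
created = objectid_creation_time("66b91e5806a099c393b51db5")
print(created.isoformat())
```

The decoded time is in UTC, which is why it differs from the local timestamps printed by `mongoimport` later in this notebook.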

In [8]:
db.list_collection_names()

['posts', 'test']

**Q**: Display the number of documents inside the `test` collection

In [9]:
test.find_one()

{'_id': ObjectId('66b91e5806a099c393b51db5'), 'msg': 'hello'}

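In [10]:
# count_documents({}) counts every document in the collection;
# len(test.find_one()) would instead count the fields of a single document.
test.count_documents({})

1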

### Interacting with a database

The `data` folder contains two `.json` files. Let's first load them into a `movielens` database, in the `users` and `movies` collections.

For this section, you will need to read up on [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most of the collection methods you will use take `filter` as their first parameter, to which you pass a dictionary describing the query.

**Q**: In the `movielens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`.

Use the dedicated shell command for this : `mongoimport --db <some_db> --collection <some_collection> --file <some_file>` 
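If `mongoimport` is not available, a rough Python alternative is to parse the file yourself and call `insert_many()`. The sketch below assumes the file holds one JSON document per line, which is what `mongoimport` reads by default; the actual layout of the data files is not shown here, so treat this as an assumption. The helper `read_json_lines` and the in-memory sample are hypothetical stand-ins so that the parsing step can run without a server.

```python
import io
import json


def read_json_lines(stream):
    """Yield one dict per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for open("data/movielens_movies.json"); the real file layout is assumed.
sample = io.StringIO(
    '{"_id": 19, "title": "Ace Ventura: When Nature Calls (1995)", "genres": "Comedy"}\n'
    '{"_id": 18, "title": "Four Rooms (1995)", "genres": "Thriller"}\n'
)
docs = list(read_json_lines(sample))
print(len(docs))

# With a live server, the parsed documents could then be inserted with:
# client["movielens"]["movies"].insert_many(docs)
```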

In [13]:
db = client["movielens"]

In [14]:
movies = db['movies']

In [15]:
users = db['users']

In [11]:
!mongoimport --db movielens --collection movies --file data/movielens_movies.json

2024-08-11T22:26:19.158+0200	connected to: localhost
2024-08-11T22:26:19.362+0200	imported 3883 documents


In [12]:
!mongoimport --db movielens --collection users --file data/movielens_users.json

2024-08-11T22:26:26.245+0200	connected to: localhost
2024-08-11T22:26:27.422+0200	imported 6040 documents


In [16]:
users.count_documents({})

6040

In [17]:
movies.count_documents({})

3883

In [18]:
db.list_collection_names()

['movies', 'users']


**Q**: How many users are in the `movielens` database?

In [28]:
users.count_documents({})

6040

**Q** : Display all comedies (the `genres` property equals `Comedy`). 

NB: `find()` returns a `Cursor`. You will need to work out how to iterate over it, then use the `pprint` function for a more readable display of the documents.

In [19]:
movies_cursor = movies.find({'genres':'Comedy'},limit = 10)

In [20]:
movies.find_one()

{'_id': 12,
 'title': 'Dracula: Dead and Loving It (1995)',
 'genres': 'Comedy|Horror'}

In [21]:
for i in movies_cursor:
    print(i)

{'_id': 19, 'title': 'Ace Ventura: When Nature Calls (1995)', 'genres': 'Comedy'}
{'_id': 5, 'title': 'Father of the Bride Part II (1995)', 'genres': 'Comedy'}
{'_id': 38, 'title': 'It Takes Two (1995)', 'genres': 'Comedy'}
{'_id': 65, 'title': 'Bio-Dome (1996)', 'genres': 'Comedy'}
{'_id': 63, 'title': "Don't Be a Menace to South Central While Drinking Your Juice in the Hood (1996)", 'genres': 'Comedy'}
{'_id': 69, 'title': 'Friday (1995)', 'genres': 'Comedy'}
{'_id': 52, 'title': 'Mighty Aphrodite (1995)', 'genres': 'Comedy'}
{'_id': 88, 'title': 'Black Sheep (1996)', 'genres': 'Comedy'}
{'_id': 96, 'title': 'In the Bleak Midwinter (1995)', 'genres': 'Comedy'}
{'_id': 101, 'title': 'Bottle Rocket (1996)', 'genres': 'Comedy'}
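Note that `genres` is a single pipe-delimited string, as the `find_one()` output above shows (`'Comedy|Horror'`), so the filter `{'genres': 'Comedy'}` only matches movies whose genre string is exactly `Comedy`. The plain-Python sketch below illustrates the difference on three sample documents taken from the outputs in this notebook; it mimics the comparison and is not how MongoDB evaluates the query.

```python
sample_movies = [
    {"_id": 12, "title": "Dracula: Dead and Loving It (1995)", "genres": "Comedy|Horror"},
    {"_id": 19, "title": "Ace Ventura: When Nature Calls (1995)", "genres": "Comedy"},
    {"_id": 18, "title": "Four Rooms (1995)", "genres": "Thriller"},
]

# Equality on the whole string, like {'genres': 'Comedy'}.
exact = [m["_id"] for m in sample_movies if m["genres"] == "Comedy"]

# Membership in the list of genres, which also keeps multi-genre comedies.
tagged = [m["_id"] for m in sample_movies if "Comedy" in m["genres"].split("|")]

print(exact, tagged)
```

To also retrieve multi-genre comedies from the collection, a `$regex` filter on `genres` is one option.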


**Q**: Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called the `projection` and limits which fields the query returns.

In [22]:
users.find_one()

{'_id': 6038,
 'name': 'Yaeko Hassan',
 'gender': 'F',
 'age': 95,
 'occupation': 'academic/educator',
 'movies': [{'movieid': 1419, 'rating': 4, 'timestamp': 956714815},
  {'movieid': 920, 'rating': 3, 'timestamp': 956706827},
  {'movieid': 3088, 'rating': 5, 'timestamp': 956707640},
  {'movieid': 232, 'rating': 4, 'timestamp': 956707640},
  {'movieid': 1136, 'rating': 4, 'timestamp': 956707708},
  {'movieid': 1148, 'rating': 5, 'timestamp': 956707604},
  {'movieid': 1183, 'rating': 5, 'timestamp': 956717204},
  {'movieid': 2146, 'rating': 4, 'timestamp': 956706909},
  {'movieid': 3548, 'rating': 4, 'timestamp': 956707604},
  {'movieid': 356, 'rating': 4, 'timestamp': 956707005},
  {'movieid': 1210, 'rating': 4, 'timestamp': 956706876},
  {'movieid': 1223, 'rating': 5, 'timestamp': 956707734},
  {'movieid': 1276, 'rating': 3, 'timestamp': 956707604},
  {'movieid': 1296, 'rating': 5, 'timestamp': 956714684},
  {'movieid': 1354, 'rating': 3, 'timestamp': 956714725},
  {'movieid': 1387, 

In [23]:
for j in users.find({'name':'Clifford Johnathan'}, projection={'name': True,'occupation':True}):
    pprint(j)

{'_id': 1276, 'name': 'Clifford Johnathan', 'occupation': 'technician/engineer'}
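The output above still contains `_id` even though the projection did not list it: an inclusion projection returns the listed fields plus `_id`, unless `_id` is explicitly excluded (in PyMongo, by adding `'_id': False` to the projection). The toy function below imitates that behaviour on a plain dictionary; `project` is a hypothetical helper written for this illustration, and the sample document is a shortened copy of the `find_one()` output above.

```python
def project(doc, fields, include_id=True):
    """Keep only the listed fields, plus `_id` unless it is excluded."""
    kept = {key: value for key, value in doc.items() if key in fields}
    if include_id and "_id" in doc:
        kept = {"_id": doc["_id"], **kept}
    return kept


user = {
    "_id": 6038,
    "name": "Yaeko Hassan",
    "gender": "F",
    "age": 95,
    "occupation": "academic/educator",
    "movies": [{"movieid": 1419, "rating": 4, "timestamp": 956714815}],
}

with_id = project(user, {"name", "occupation"})
without_id = project(user, {"name", "occupation"}, include_id=False)
print(with_id)
print(without_id)
```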


**Q**: How many minors (by `age`) have rated movies ?

In [25]:
users.find({'age': {'$lt':18} })

<pymongo.cursor.Cursor at 0x1cced405790>

In [26]:
users.count_documents({'age': {'$lt': 18}})

222

**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time you need to use a regex to parse genres and look for those values.

In [27]:
for i in movies.find({'genres':{'$regex':'Sci-Fi|Thriller'}}, limit = 10):
    print(i)

{'_id': 16, 'title': 'Casino (1995)', 'genres': 'Drama|Thriller'}
{'_id': 18, 'title': 'Four Rooms (1995)', 'genres': 'Thriller'}
{'_id': 23, 'title': 'Assassins (1995)', 'genres': 'Thriller'}
{'_id': 24, 'title': 'Powder (1995)', 'genres': 'Drama|Sci-Fi'}
{'_id': 29, 'title': 'City of Lost Children, The (1995)', 'genres': 'Adventure|Sci-Fi'}
{'_id': 6, 'title': 'Heat (1995)', 'genres': 'Action|Crime|Thriller'}
{'_id': 22, 'title': 'Copycat (1995)', 'genres': 'Crime|Drama|Thriller'}
{'_id': 10, 'title': 'GoldenEye (1995)', 'genres': 'Action|Adventure|Thriller'}
{'_id': 32, 'title': 'Twelve Monkeys (1995)', 'genres': 'Drama|Sci-Fi'}
{'_id': 47, 'title': 'Seven (Se7en) (1995)', 'genres': 'Crime|Thriller'}
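The `$regex` pattern is not anchored, so it matches anywhere inside the `genres` string. For a simple alternation like this one, Python's `re.search` behaves the same way, which makes it easy to try a pattern locally before sending it to the server. The genre strings below are copied from the outputs in this notebook; the comparison with `re` is only an approximation, since MongoDB uses its own regex engine.

```python
import re

genres = ["Drama|Thriller", "Drama|Sci-Fi", "Comedy", "Action|Crime|Thriller"]

# Unanchored alternation: a hit anywhere in the string counts as a match.
pattern = re.compile("Sci-Fi|Thriller")
hits = [g for g in genres if pattern.search(g)]
print(hits)
```

PyMongo also accepts a compiled pattern directly as a filter value, for example `{'genres': re.compile('Sci-Fi|Thriller')}`.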


**Q**: If we want more advanced textual search, we need a particular index. Use the `create_index()` method to index as [TEXT](https://docs.mongodb.com/manual/core/index-text/) the `genres` field of the `movies` collection.

In [28]:
movies.create_index([('genres', 'text')])

'genres_text'

**Q**: Repeat the search for science fiction and thriller movies, this time using the `$text` operator.

In [29]:
for i in movies.find({'$text':{'$search':'Sci-Fi Thriller'}}, limit = 10):
    print(i)

{'_id': 3934, 'title': 'Kronos (1957)', 'genres': 'Sci-Fi'}
{'_id': 3878, 'title': 'X: The Unknown (1956)', 'genres': 'Sci-Fi'}
{'_id': 3780, 'title': 'Rocketship X-M (1950)', 'genres': 'Sci-Fi'}
{'_id': 3779, 'title': 'Project Moon Base (1953)', 'genres': 'Sci-Fi'}
{'_id': 3687, 'title': 'Light Years (1988)', 'genres': 'Sci-Fi'}
{'_id': 3658, 'title': 'Quatermass and the Pit (1967)', 'genres': 'Sci-Fi'}
{'_id': 3486, 'title': 'Devil Girl From Mars (1954)', 'genres': 'Sci-Fi'}
{'_id': 3354, 'title': 'Mission to Mars (2000)', 'genres': 'Sci-Fi'}
{'_id': 3375, 'title': 'Destination Moon (1950)', 'genres': 'Sci-Fi'}
{'_id': 3032, 'title': 'Omega Man, The (1971)', 'genres': 'Sci-Fi'}


**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title

In [31]:
for i in movies.find().sort("title").limit(30):
    print(i)

{'_id': 2031, 'title': '$1,000,000 Duck (1971)', 'genres': "Children's|Comedy"}
{'_id': 3112, 'title': "'Night Mother (1986)", 'genres': 'Drama'}
{'_id': 779, 'title': "'Til There Was You (1997)", 'genres': 'Drama|Romance'}
{'_id': 2072, 'title': "'burbs, The (1989)", 'genres': 'Comedy'}
{'_id': 3420, 'title': '...And Justice for All (1979)', 'genres': 'Drama|Thriller'}
{'_id': 889, 'title': '1-900 (1994)', 'genres': 'Romance'}
{'_id': 2572, 'title': '10 Things I Hate About You (1999)', 'genres': 'Comedy|Romance'}
{'_id': 2085, 'title': '101 Dalmatians (1961)', 'genres': "Animation|Children's"}
{'_id': 1367, 'title': '101 Dalmatians (1996)', 'genres': "Children's|Comedy"}
{'_id': 1203, 'title': '12 Angry Men (1957)', 'genres': 'Drama'}
{'_id': 2826, 'title': '13th Warrior, The (1999)', 'genres': 'Action|Horror|Thriller'}
{'_id': 1609, 'title': '187 (1997)', 'genres': 'Drama'}
{'_id': 999, 'title': '2 Days in the Valley (1996)', 'genres': 'Crime'}
{'_id': 2492, 'title': '20 Dates (1998)
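The ordering above may look surprising: titles starting with `$`, `'` or `.` come before those starting with digits. Assuming the collection uses the default collation, strings are compared byte by byte, which for ASCII titles gives the same order as Python's built-in `sorted`. The sketch below reproduces the start of the listing from a shuffled sample of the titles shown above.

```python
titles = [
    "12 Angry Men (1957)",
    "'burbs, The (1989)",
    "1-900 (1994)",
    "$1,000,000 Duck (1971)",
    "101 Dalmatians (1961)",
    "...And Justice for All (1979)",
    "10 Things I Hate About You (1999)",
]

# Code-point order: '$' < "'" < '.' < digits < letters.
ordered = sorted(titles)
for title in ordered:
    print(title)
```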

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id 1196`)? The `movies` field is an array, so we should try the [elemMatch](https://docs.mongodb.com/manual/reference/operator/query/elemMatch/) query operator here.

In [36]:
print(users.count_documents({"movies":{"$elemMatch": {"movieid":1196}}}))

2990


**Q**: And how many gave it a rating of 1 or 2 ?

In [52]:
print(users.count_documents({'movies': {'$elemMatch': {'movieid': 1196,'rating': {'$in': [1, 2]}}}}))

105
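`$elemMatch` matters for this second question because both conditions must hold for the same array element. With two separate dot-notation conditions (`'movies.movieid'` and `'movies.rating'`), a user would match as soon as one element has the right `movieid` and any element, possibly another one, has the right `rating`. The plain-Python sketch below mimics both behaviours on a made-up user; the function names and the sample data are hypothetical.

```python
def elem_match(user, movieid, ratings):
    """Both conditions must be satisfied by the same array element."""
    return any(
        m["movieid"] == movieid and m["rating"] in ratings
        for m in user.get("movies", [])
    )


def separate_conditions(user, movieid, ratings):
    """Each condition may be satisfied by a different array element."""
    movies = user.get("movies", [])
    return any(m["movieid"] == movieid for m in movies) and any(
        m["rating"] in ratings for m in movies
    )


# Hypothetical user: loved movie 1196, disliked another movie.
user = {
    "name": "hypothetical user",
    "movies": [{"movieid": 1196, "rating": 5}, {"movieid": 920, "rating": 1}],
}

print(elem_match(user, 1196, {1, 2}))
print(separate_conditions(user, 1196, {1, 2}))
```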


### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [53]:
new_user = {
    "name": "Lurania",
    "gender": "F",
    "occupation": "witch",
    "age": 823,
}
users.insert_one(new_user)

pprint(users.find_one({"name": "Lurania"}))

{'_id': ObjectId('66b9282c06a099c393b51db6'),
 'age': 823,
 'gender': 'F',
 'name': 'Lurania',
 'occupation': 'witch'}


**Q**: Add a rating for a movie this user has seen with `update_one()`: add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).

In [54]:
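# Note: this rebinds the name `datetime` from the module (imported in the first cell) to the class.
from datetime import datetime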

In [55]:
users.update_one({"name": "Lurania"},
    {"$push": {
        "movies": {
            "movieid": 1196,
            "rating": 0,
            "timestamp": datetime.utcnow()
            }}})

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

**Q**: Find the number of users who have declared a `programmer` occupation. Modify them so that they are `developer`. Verify your update.

In [56]:
print(users.count_documents({'occupation': 'programmer'}))

388


In [57]:
users.update_many({'occupation': 'programmer'},
    {'$set': {'occupation': 'developer'}})

UpdateResult({'n': 388, 'nModified': 388, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [58]:
print(users.count_documents({'occupation': 'programmer'}))

0


In [59]:
print(users.count_documents({'occupation': 'developer'}))

388


## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (author field, string type)
* The date (date field, string type in YYYY-MM-DD format)
* The content (field content)
* Tags (field tags, a string array)
* A list of comments (field comments) containing:
  * The author (author field, string type)
  * The date (date field, string type in YYYY-MM-DD format)
  * The content (field content)
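Storing dates as `YYYY-MM-DD` strings works for sorting because, in that format, alphabetical order and chronological order coincide. The short sketch below checks this on a few sample dates and shows a day-first format for contrast; the sample values are made up for the illustration.

```python
from datetime import date

iso_dates = ["2024-01-21", "2023-12-31", "2024-01-15"]

# Sorting the strings gives the same order as sorting the parsed dates.
same_order = sorted(iso_dates) == sorted(iso_dates, key=date.fromisoformat)
latest_iso = max(iso_dates)

# Day-first strings do not have this property.
dmy_dates = ["21/01/2024", "31/12/2023", "15/01/2024"]
latest_dmy = max(dmy_dates)

print(same_order, latest_iso, latest_dmy)
```

With the day-first strings, the "latest" value found by string comparison is the December 2023 date, which is wrong chronologically.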


In [60]:
blog = client.Blog

posts = blog.posts

**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [62]:
post_1 = {
    'author': 'Rick',
    'date': '2024-01-15', 
    'content': 'Never let your grandchild adopt a dragon.',
    'tags': ['mongodb', 'nosql'],
    'comments': []
}

posts.insert_one(post_1)

InsertOneResult(ObjectId('66b92bd306a099c393b51db7'), acknowledged=True)

**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [63]:
post_2 = {
    'author': 'Kate',
    'date': '2024-01-21', 
    'content': 'Can you believe this ? I just met a grey talking cat !',
    'tags': ['nosql'],
    'comments': [
        {'author': 'Rick',
        'date': '2024-01-21', 
        'content': 'You should get rid of that thing as soon as possible.'}]
}

posts.insert_one(post_2)

InsertOneResult(ObjectId('66b92dda06a099c393b51db8'), acknowledged=True)

**Q**: Display the author of the last post with the tag `nosql`

In [64]:
pprint(posts.find_one(
    {'tags': 'nosql'},
    sort=[('date', -1)]))

{'_id': ObjectId('66b92dda06a099c393b51db8'),
 'author': 'Kate',
 'comments': [{'author': 'Rick',
               'content': 'You should get rid of that thing as soon as '
                          'possible.',
               'date': '2024-01-21'}],
 'content': 'Can you believe this ? I just met a grey talking cat !',
 'date': '2024-01-21',
 'tags': ['nosql']}


**Q**: Add a comment by `jack` on January 25, to `kate`'s post

In [65]:
posts.update_one(
    {'author': 'Kate'},
    {
        # Push the comment document itself: wrapping it in a list would
        # append that list as a single nested element.
        "$push": {
            'comments': {
                'author': 'Jack',
                'date': '2024-01-25',
                'content': "Don't listen to haters. Don't you want to know why that cat can speak ?"
            }
        }
    }
)

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

**Q**: Display all comments by `kate`

In [72]:
for i in posts.find({'author': 'Kate'}):
    pprint(i)

{'_id': ObjectId('66b92dda06a099c393b51db8'),
 'author': 'Kate',
 'comments': [{'author': 'Rick',
               'content': 'You should get rid of that thing as soon as '
                          'possible.',
               'date': '2024-01-21'},
              {'author': 'Jack',
               'content': "Don't listen to haters. Don't you want to know why "
                          'that cat can speak ?',
               'date': '2024-01-25'}],
 'content': 'Can you believe this ? I just met a grey talking cat !',
 'date': '2024-01-21',
 'tags': ['nosql']}
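A point worth remembering about `$push`: it appends its argument as one array element, so passing a list nests that list inside `comments`. To append several items at once, the `$each` modifier adds them one by one. In plain-Python terms the difference is roughly the one between appending a list and extending with it; the sketch below is only an analogy on in-memory lists, with shortened hypothetical comment documents.

```python
comments = [{"author": "Rick"}]
new_comments = [{"author": "Jack"}]

# Analogy for {"$push": {"comments": new_comments}}: the list becomes one element.
pushed_as_list = comments + [new_comments]

# Analogy for {"$push": {"comments": {"$each": new_comments}}}: items are added individually.
pushed_each = comments + new_comments

print(pushed_as_list)
print(pushed_each)
```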


## Cleanup

In [None]:
!mongo test_database --eval 'db.dropDatabase()'

In [None]:
!mongo movielens --eval 'db.dropDatabase()'

In [None]:
!mongo Blog --eval 'db.dropDatabase()'