# Mongo tutorial

## Prerequisites

### Documentation

You will find the reference documentation here:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [1]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [2]:
client = MongoClient('localhost', 27017)

In [3]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [4]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('65ae5d8a65925e6c68a9568b')

In [5]:
db.list_collection_names()

['posts']

In [6]:
pprint(posts.find_one())

{'_id': ObjectId('65ae5d8a65925e6c68a9568b'),
 'author': 'Mike',
 'date': datetime.datetime(2024, 1, 22, 12, 20, 26, 317000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a separate terminal, connect to your server with the `mongo` shell, and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q** : Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`. Fetch it back and display it. What is the `_id` field for?

NB : if the collection doesn't exist yet, MongoDB automatically creates it.

In [8]:
document = {'msg': 'hello'}

test_collection = db['test']

doc_inserted = test_collection.insert_one(document)

retrieved_doc = test_collection.find_one({'_id' : doc_inserted.inserted_id})

In [9]:
print(retrieved_doc)

{'_id': ObjectId('65ae611b65925e6c68a9568c'), 'msg': 'hello'}

NB : `_id` is the document's primary key. It must be unique within a collection, and an `ObjectId` is generated automatically when you do not provide one.


**Q**: Display the number of documents inside the `test` collection

In [11]:
nb_doc = test_collection.count_documents({})
print(nb_doc)

1


### Interacting with a database

We have 2 `.json` files we want to interact with inside the `data` folder. Let's first dump them into a `MovieLens` database, inside `users` and `movies` collections.

For this section, you will need to read a bit about [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most collection methods you will use take `filter` as their first parameter, to which you pass a dictionary describing the query.
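
As a minimal sketch of what such filter dictionaries look like, the snippet below builds a few of them as plain Python objects. It needs no running server; the field names (`genres`, `age`, `gender`) are the ones used in the MovieLens data of this tutorial, and the variable names are only illustrative.

```python
# Query filters are ordinary Python dictionaries.

# Equality: documents whose `genres` field equals "Comedy".
comedies_filter = {"genres": "Comedy"}

# Comparison operator: documents whose `age` is strictly below 18.
minors_filter = {"age": {"$lt": 18}}

# Several keys in one filter are combined with an implicit AND.
minor_women_filter = {"gender": "F", "age": {"$lt": 18}}

# An empty filter matches every document of the collection.
match_all = {}

for label, query in [
    ("comedies", comedies_filter),
    ("minors", minors_filter),
    ("minor women", minor_women_filter),
    ("everything", match_all),
]:
    print(label, "->", query)

# With a live connection, each dictionary is passed as the first argument, e.g.
#   movies.find(comedies_filter)
#   users.count_documents(minors_filter)
```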

**Q** : In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`. 

Use the dedicated shell command for this : `mongoimport --db <some_db> --collection <some_collection> --file <some_file>` 

In [13]:
db = client.MovieLens
movies = db['movies']
users = db['users']

In [15]:
!mongoimport --db MovieLens --collection movies --file data/movielens_movies.json
!mongoimport --db MovieLens --collection users --file data/movielens_users.json

2024-01-22T13:46:09.288+0100	connected to: localhost
2024-01-22T13:46:09.460+0100	imported 3883 documents
2024-01-22T13:46:09.628+0100	connected to: localhost
2024-01-22T13:46:10.838+0100	imported 6040 documents


**Q** : How many users are in the `MovieLens` database?

In [28]:
nb_users = users.count_documents({})
print(f"Number of users: {nb_users}")

Number of users: 6040


**Q** : Display all comedies (the `genres` property equals `Comedy`). 

NB : `find()` returns a `Cursor`. You will need to find out how to iterate over it, then use the `pprint` function for a more readable display of the documents.

In [25]:
comedy_movies = movies.find({"genres": "Comedy"})
for comedy_movie in comedy_movies:
    pprint(comedy_movie['title'])

'Ace Ventura: When Nature Calls (1995)'
'Father of the Bride Part II (1995)'
'It Takes Two (1995)'
("Don't Be a Menace to South Central While Drinking Your Juice in the Hood "
 '(1996)')
'Bio-Dome (1996)'
'Friday (1995)'
'Mighty Aphrodite (1995)'
'Black Sheep (1996)'
'In the Bleak Midwinter (1995)'
'Bottle Rocket (1996)'
'Mr. Wrong (1996)'
'Happiness Is in the Field (1995)'
'Steal Big, Steal Little (1995)'
'Flirting With Disaster (1996)'
'Happy Gilmore (1996)'
'Down Periscope (1996)'
'Headless Body in Topless Bar (1995)'
'Birdcage, The (1996)'
'Brothers McMullen, The (1995)'
'Blue in the Face (1995)'
'Jury Duty (1995)'
'Living in Oblivion (1995)'
'Mallrats (1995)'
'Love & Human Remains (1993)'
'Nine Months (1995)'
'Party Girl (1995)'
'Reckless (1995)'
'Jeffrey (1995)'
'To Wong Foo, Thanks for Everything! Julie Newmar (1995)'
'Bushwhacked (1995)'
'Billy Madison (1995)'
'Clerks (1994)'
'Destiny Turns on the Radio (1995)'
'Dumb & Dumber (1994)'
'Exit to Eden (1994)'
'Gordy (1995)'
'Houseg

**Q** : Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called the `projection` and limits which fields the query returns.

In [39]:
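# The projection keeps only `name` and `occupation` (and drops `_id`)
clifford = users.find_one(
    {"name": "Clifford Johnathan"},
    {"_id": 0, "name": 1, "occupation": 1}
)
pprint(f"Name : {clifford['name']}, Occupation : {clifford['occupation']}")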

'Name : Clifford Johnathan, Occupation : technician/engineer'


**Q**: How many minors (by `age`) have rated movies ?

In [42]:
minors_count = users.count_documents({"age": {"$lt": 18}})
print(minors_count)

222


**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time you need to use a regex to parse genres and look for those values.
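
To see why a pattern such as `Sci-Fi|Thriller` works, here is a small sketch that uses Python's `re` module on a few `genres` strings taken from this dataset. It assumes that, for a pattern this simple, `re.search` behaves like an unanchored `$regex` match; MongoDB uses its own PCRE-based engine, so more advanced patterns may differ.

```python
import re

# In the data, "|" is just a separator character inside the `genres` string.
# In the regex, "|" means alternation: match "Sci-Fi" OR "Thriller".
genres_regex = "Sci-Fi|Thriller"
pattern = re.compile(genres_regex)

sample_genres = [
    "Drama|Thriller",
    "Adventure|Sci-Fi",
    "Sci-Fi|Thriller",
    "Children's|Comedy",
]

# search() looks for the pattern anywhere in the string (no anchoring).
matching = [genres for genres in sample_genres if pattern.search(genres)]
print(matching)
# ['Drama|Thriller', 'Adventure|Sci-Fi', 'Sci-Fi|Thriller']
```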

In [44]:
# Define the regex pattern for genres
genres_regex = "Sci-Fi|Thriller"

# Find movies where the 'genres' property matches the regex pattern
science_fiction_and_suspense_movies = movies.find({"genres": {"$regex": genres_regex}})

# Display the results
for movie in science_fiction_and_suspense_movies:
    pprint(movie)

{'_id': 16, 'genres': 'Drama|Thriller', 'title': 'Casino (1995)'}
{'_id': 18, 'genres': 'Thriller', 'title': 'Four Rooms (1995)'}
{'_id': 22, 'genres': 'Crime|Drama|Thriller', 'title': 'Copycat (1995)'}
{'_id': 23, 'genres': 'Thriller', 'title': 'Assassins (1995)'}
{'_id': 24, 'genres': 'Drama|Sci-Fi', 'title': 'Powder (1995)'}
{'_id': 29,
 'genres': 'Adventure|Sci-Fi',
 'title': 'City of Lost Children, The (1995)'}
{'_id': 6, 'genres': 'Action|Crime|Thriller', 'title': 'Heat (1995)'}
{'_id': 10, 'genres': 'Action|Adventure|Thriller', 'title': 'GoldenEye (1995)'}
{'_id': 32, 'genres': 'Drama|Sci-Fi', 'title': 'Twelve Monkeys (1995)'}
{'_id': 47, 'genres': 'Crime|Thriller', 'title': 'Seven (Se7en) (1995)'}
{'_id': 51, 'genres': 'Action|Drama|Thriller', 'title': 'Guardian Angel (1994)'}
{'_id': 66,
 'genres': 'Sci-Fi|Thriller',
 'title': 'Lawnmower Man 2: Beyond Cyberspace (1996)'}
{'_id': 50, 'genres': 'Crime|Thriller', 'title': 'Usual Suspects, The (1995)'}
{'_id': 61, 'genres': 'Drama

**Q**: If we want more advanced textual search, we need a particular index. Use the `create_index()` method to index as [TEXT](https://docs.mongodb.com/manual/core/index-text/) the `genres` field of the `movies` collection.

In [59]:
# A list of (field, index type) pairs works across PyMongo versions
movies.create_index([("genres", pymongo.TEXT)])

'genres_text'

**Q**: Repeat the search for science fiction and thriller movies, this time with the `$text` operator.

In [66]:
# Find movies whose indexed 'genres' field matches the text search terms
science_fiction_and_suspense_movies = movies.find({
        "$text": {
            "$search": "Sci-Fi Thriller" 
        }
    }
)

# Display the results
for movie in science_fiction_and_suspense_movies:
    pprint(movie)

{'_id': 3934, 'genres': 'Sci-Fi', 'title': 'Kronos (1957)'}
{'_id': 3878, 'genres': 'Sci-Fi', 'title': 'X: The Unknown (1956)'}
{'_id': 3780, 'genres': 'Sci-Fi', 'title': 'Rocketship X-M (1950)'}
{'_id': 3779, 'genres': 'Sci-Fi', 'title': 'Project Moon Base (1953)'}
{'_id': 3687, 'genres': 'Sci-Fi', 'title': 'Light Years (1988)'}
{'_id': 3658, 'genres': 'Sci-Fi', 'title': 'Quatermass and the Pit (1967)'}
{'_id': 3486, 'genres': 'Sci-Fi', 'title': 'Devil Girl From Mars (1954)'}
{'_id': 3375, 'genres': 'Sci-Fi', 'title': 'Destination Moon (1950)'}
{'_id': 3354, 'genres': 'Sci-Fi', 'title': 'Mission to Mars (2000)'}
{'_id': 3032, 'genres': 'Sci-Fi', 'title': 'Omega Man, The (1971)'}
{'_id': 2698, 'genres': 'Sci-Fi', 'title': 'Zone 39 (1997)'}
{'_id': 2665,
 'genres': 'Sci-Fi',
 'title': 'Earth Vs. the Flying Saucers (1956)'}
{'_id': 2660,
 'genres': 'Sci-Fi',
 'title': 'Thing From Another World, The (1951)'}
{'_id': 2667, 'genres': 'Sci-Fi', 'title': 'Mole People, The (1956)'}
{'_id': 266

**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title

In [73]:
first_30_movies = movies.find().sort("title").limit(30)

for movie in first_30_movies:
    pprint(movie)

{'_id': 2031, 'genres': "Children's|Comedy", 'title': '$1,000,000 Duck (1971)'}
{'_id': 3112, 'genres': 'Drama', 'title': "'Night Mother (1986)"}
{'_id': 779, 'genres': 'Drama|Romance', 'title': "'Til There Was You (1997)"}
{'_id': 2072, 'genres': 'Comedy', 'title': "'burbs, The (1989)"}
{'_id': 3420,
 'genres': 'Drama|Thriller',
 'title': '...And Justice for All (1979)'}
{'_id': 889, 'genres': 'Romance', 'title': '1-900 (1994)'}
{'_id': 2572,
 'genres': 'Comedy|Romance',
 'title': '10 Things I Hate About You (1999)'}
{'_id': 2085,
 'genres': "Animation|Children's",
 'title': '101 Dalmatians (1961)'}
{'_id': 1367, 'genres': "Children's|Comedy", 'title': '101 Dalmatians (1996)'}
{'_id': 1203, 'genres': 'Drama', 'title': '12 Angry Men (1957)'}
{'_id': 2826,
 'genres': 'Action|Horror|Thriller',
 'title': '13th Warrior, The (1999)'}
{'_id': 1609, 'genres': 'Drama', 'title': '187 (1997)'}
{'_id': 999, 'genres': 'Crime', 'title': '2 Days in the Valley (1996)'}
{'_id': 2492, 'genres': 'Comedy

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id` 1196)? The `movies` field of a user is an array of embedded documents, so try the [$elemMatch](https://docs.mongodb.com/manual/reference/operator/query/elemMatch/) query operator here.

In [81]:
movies.find({'_id' : 1196})[0]

{'_id': 1196,
 'title': 'Star Wars: Episode V - The Empire Strikes Back (1980)',
 'genres': 'Action|Adventure|Drama|Sci-Fi|War'}

In [86]:
users.count_documents({
    "movies" : {
        "$elemMatch" : {
            "movieid" : { "$eq" : 1196}
        }
    }
})

2990

In [87]:
users.find_one()

{'_id': 6038,
 'name': 'Yaeko Hassan',
 'gender': 'F',
 'age': 95,
 'occupation': 'academic/educator',
 'movies': [{'movieid': 1419, 'rating': 4, 'timestamp': 956714815},
  {'movieid': 920, 'rating': 3, 'timestamp': 956706827},
  {'movieid': 3088, 'rating': 5, 'timestamp': 956707640},
  {'movieid': 232, 'rating': 4, 'timestamp': 956707640},
  {'movieid': 1136, 'rating': 4, 'timestamp': 956707708},
  {'movieid': 1148, 'rating': 5, 'timestamp': 956707604},
  {'movieid': 1183, 'rating': 5, 'timestamp': 956717204},
  {'movieid': 2146, 'rating': 4, 'timestamp': 956706909},
  {'movieid': 3548, 'rating': 4, 'timestamp': 956707604},
  {'movieid': 356, 'rating': 4, 'timestamp': 956707005},
  {'movieid': 1210, 'rating': 4, 'timestamp': 956706876},
  {'movieid': 1223, 'rating': 5, 'timestamp': 956707734},
  {'movieid': 1276, 'rating': 3, 'timestamp': 956707604},
  {'movieid': 1296, 'rating': 5, 'timestamp': 956714684},
  {'movieid': 1354, 'rating': 3, 'timestamp': 956714725},
  {'movieid': 1387, 

**Q**: And how many gave it a rating of 1 or 2 ?

In [98]:
users.count_documents({
    "movies" : {
        "$elemMatch" : {
            "movieid" : 1196,
            "rating" : { "$in" : [1, 2] }
        }
    }
})

105
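
The reason `$elemMatch` matters here is that both conditions must hold on the same array entry. The pure-Python sketch below mimics that behaviour on a hypothetical user document (it does not talk to MongoDB) and contrasts it with checking each condition independently, which is how two separate dotted conditions such as `movies.movieid` and `movies.rating` would be evaluated.

```python
def matches_elem_match(user, movieid, ratings):
    """True if a single entry of user['movies'] satisfies both conditions."""
    return any(
        entry["movieid"] == movieid and entry["rating"] in ratings
        for entry in user.get("movies", [])
    )


def matches_separately(user, movieid, ratings):
    """True if each condition holds somewhere, possibly on different entries."""
    entries = user.get("movies", [])
    has_movie = any(entry["movieid"] == movieid for entry in entries)
    has_rating = any(entry["rating"] in ratings for entry in entries)
    return has_movie and has_rating


# Hypothetical user: rated movie 1196 with a 5, and another movie with a 1.
user = {
    "name": "example user",
    "movies": [
        {"movieid": 1196, "rating": 5},
        {"movieid": 920, "rating": 1},
    ],
}

print(matches_elem_match(user, 1196, {1, 2}))  # False: 1196 was rated 5
print(matches_separately(user, 1196, {1, 2}))  # True: conditions met by different entries
```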

### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [100]:
user = {
    "name" : "Ndeye-Fatou Dieng",
    "gender" : 'F',
    "occupation" : "data scientist",
    "age" : 23
}
users.insert_one(user)

InsertOneResult(ObjectId('65ae760a65925e6c68a9568d'), acknowledged=True)

In [106]:
users.find_one({'name' : 'Ndeye-Fatou Dieng'})

{'_id': ObjectId('65ae760a65925e6c68a9568d'),
 'name': 'Ndeye-Fatou Dieng',
 'gender': 'F',
 'occupation': 'data scientist',
 'age': 23}

**Q**: Record a rating for a movie the new user has watched: with `update_one()`, add a `movies` property holding an array that contains one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).
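
As a rough sketch of how update documents are shaped, the snippet below builds a `$set` and a `$push` update as plain dictionaries, without contacting a server. The helper `make_rating` is hypothetical, and it uses `datetime.datetime.now(datetime.timezone.utc)` as an assumption-level substitute for `utcnow()`, which is deprecated since Python 3.12.

```python
import datetime


def make_rating(movieid, rating):
    """Build one rating document (hypothetical helper for illustration)."""
    return {
        "movieid": movieid,
        "rating": rating,
        # Timezone-aware equivalent of datetime.datetime.utcnow()
        "timestamp": datetime.datetime.now(datetime.timezone.utc),
    }


new_rating = make_rating(3088, 5)

# `$set` replaces the value of a field (or creates the field).
set_update = {"$set": {"occupation": "developer"}}

# `$push` appends one element to an array field,
# creating the array if the field does not exist yet.
push_update = {"$push": {"movies": new_rating}}

print(set_update)
print(push_update)

# With a live connection, an update document is the second argument, e.g.
#   users.update_one({"name": "..."}, push_update)
```

Using `$push` rather than `$set` keeps `movies` an array, which is what the earlier `$elemMatch` queries expect.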

In [107]:
newMovie = {
    'movieid': 3088, 
    'rating': 5, 
    'timestamp': datetime.datetime.utcnow()
}

In [110]:
# `$push` appends to the `movies` array (and creates it if it is missing)
users.update_one({'name' : 'Ndeye-Fatou Dieng'}, {"$push": { "movies": newMovie } })

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

**Q**: Find the number of users who have declared a `programmer` occupation. Modify them so that they are `developer`. Verify your update.

In [126]:
nb_programmers = users.count_documents({
    "occupation" : "programmer"
})
print(f"Number of programmers: {nb_programmers}")

Number of programmers: 388


In [128]:
users.update_many({
    "occupation" : "programmer"
}, {
    "$set": { "occupation": "developer" } })

UpdateResult({'n': 388, 'nModified': 388, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
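
The question also asks to verify the update. One possible check, sketched below, counts the documents that still carry the old value and those carrying the new one. `check_rename` is a hypothetical helper, not part of PyMongo; it only relies on the collection's `count_documents()` method.

```python
def check_rename(users, old="programmer", new="developer"):
    """Return (remaining_old, total_new) counts for the `occupation` field."""
    remaining_old = users.count_documents({"occupation": old})
    total_new = users.count_documents({"occupation": new})
    return remaining_old, total_new


# Usage with the notebook's `users` collection (requires a running server):
#   remaining, renamed = check_rename(users)
#   print(f"programmers left: {remaining}, developers: {renamed}")
```

After a successful `update_many()`, the first count should be 0.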

## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (`author` field, string type)
* The date (`date` field, string type in YYYY-MM-DD format)
* The content (`content` field)
* Tags (`tags` field, an array of strings)
* A list of comments (`comments` field), each containing:
  * The author (`author` field, string type)
  * The date (`date` field, string type in YYYY-MM-DD format)
  * The content (`content` field)


In [144]:
db = client.Blog
posts = db['posts']
comments = db['comments']

**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [145]:
post = {
    "author" : "rick",
    "date" : "2024-01-15",
    "content" : "",
    # tags are stored as an array of strings
    "tags" : ["mongodb", "nosql"],
    # no comments yet: start with an empty list
    "comments" : []
}

posts.insert_one(post)

InsertOneResult(ObjectId('65ae802b65925e6c68a9568e'), acknowledged=True)

**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [146]:
post = {
    "author" : "kate",
    "date" : "2024-01-21",
    "content" : "Hello everybody I'm Kate",
    "tags" : ["nosql"],
    # comments are a list of embedded documents
    "comments" : [{
        "author" : "rick",
        "date" : "2024-01-21",
        "content" : "Amazing"
    }]
}

posts.insert_one(post)

InsertOneResult(ObjectId('65ae807365925e6c68a9568f'), acknowledged=True)

**Q**: Display the author of the last post with the tag `nosql`

In [152]:
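# Sort matching posts by date, newest first, and keep the first one
last_nosql_post = posts.find_one({"tags" : "nosql"}, sort=[("date", pymongo.DESCENDING)])
print(last_nosql_post["author"])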

**Q**: Add a comment by `jack` on January 25, to `kate`'s post
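
One possible approach, sketched here, is to append the comment with the `$push` update operator, which also creates the `comments` array if the post does not have one yet. The helper `add_comment` and the comment text are hypothetical; the year 2024 is assumed to match the other posts.

```python
def add_comment(posts, post_author, comment_author, date, content):
    """Append one comment to the first post written by `post_author`."""
    comment = {"author": comment_author, "date": date, "content": content}
    return posts.update_one(
        {"author": post_author},           # which post to update
        {"$push": {"comments": comment}},  # append to its comments array
    )


# Usage with the notebook's `posts` collection (requires a running server):
#   add_comment(posts, "kate", "jack", "2024-01-25", "Nice post!")
```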

**Q**: Display all comments by `kate`
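
A possible sketch for this question: query the posts that contain at least one comment by the given author using dot notation (`comments.author`), then keep only that author's comments on the Python side, since a matching post may also hold comments from other people. `comments_by` is a hypothetical helper.

```python
def comments_by(posts, author):
    """Return every embedded comment written by `author`, across all posts."""
    found = []
    # Dot notation matches posts where at least one comment has this author.
    cursor = posts.find({"comments.author": author}, {"_id": 0, "comments": 1})
    for post in cursor:
        # A matching post can still contain comments from other authors.
        found.extend(
            comment
            for comment in post.get("comments", [])
            if comment.get("author") == author
        )
    return found


# Usage with the notebook's `posts` collection (requires a running server):
#   for comment in comments_by(posts, "kate"):
#       pprint(comment)
```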

## Cleanup

In [None]:
!mongo test_database --eval 'db.dropDatabase()'

In [None]:
!mongo MovieLens --eval 'db.dropDatabase()'

In [None]:
!mongo Blog --eval 'db.dropDatabase()'