---
<center><h1>Lesson 8.2 - NoSQL with Python: MongoDB</h1></center>

---

<img src="images/mongodb_logo.png" width="80%">

**MongoDB** is an open-source document database that provides high performance, high availability, and automatic scaling. MongoDB is written in C++.

MongoDB stores data in the form of documents, which are JSON-like field and value pairs. Documents are analogous to structures in programming languages that associate keys with values (e.g. dictionaries, hashes, maps, and associative arrays). _Documents are analogous to one row of a table in relational databases_. Formally, MongoDB documents are BSON documents. _BSON_ is a binary representation of JSON with additional type information. In the documents, the value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. 

MongoDB stores all documents in collections. A collection is a group of related documents that share a set of common indexes. _A collection is analogous to a table in a relational database_. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.

Documents have a dynamic schema: documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
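
To make the idea of a dynamic schema concrete, here is a minimal sketch using plain Python dictionaries, which is how PyMongo represents documents. The two sample documents are illustrative assumptions, not data from a real collection; they only show that documents meant for the same collection may have different fields and different types for the same field.

```python
import json

# Two hypothetical documents that could live in the same collection.
movie_a = {
    "title": "Forrest Gump",
    "released": 1994,                       # stored as a number here
    "persons": [{"name": "Tom Hanks", "role": "Forrest Gump"}],  # array of documents
}
movie_b = {
    "title": "Taxi",
    "released": "1998",                     # same field, different type (a string)
    "continuation": True,                   # field that movie_a does not have
}

# Fields present in one document but not in the other.
only_in_a = sorted(set(movie_a) - set(movie_b))
only_in_b = sorted(set(movie_b) - set(movie_a))

# Documents are JSON-like, so simple ones round-trip through JSON text.
as_json = json.dumps(movie_a, sort_keys=True)
```

A relational table would reject the second row or require a schema change; a MongoDB collection accepts both documents as they are.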

<img src="images/document.jpg">

### Advantages of MongoDB over RDBMS

* Schemaless: MongoDB is a document database in which one collection can hold documents of different shapes. The number of fields, the content, and the size of a document can differ from one document to another.
* Structure of a single object is clear.
* No complex joins.
* Deep query-ability. MongoDB supports dynamic queries on documents using a document-based query language that's nearly as powerful as SQL.
* Tuning.
* Ease of scale-out: MongoDB is easy to scale.
* Conversion / mapping of application objects to database objects not needed.
* Uses internal memory for storing the (windowed) working set, enabling faster access to data.
* Flexible schema - supports hierarchical data structures.
* Oriented toward programmers - it supports associative arrays such as PHP arrays, Python dictionaries, JSON objects, Ruby hashes, etc.
* Lots of MongoDB drivers and client libraries.
* Drivers in MongoDB are used for connectivity between client applications and the database. For example, to connect a Python program to MongoDB, we need to install the Python driver so that the program can work with the MongoDB database.
* Flexible deployment.
* Documents correspond to native data types in many programming languages.
* Dynamic schema supports fluent polymorphism. 

### Installing MongoDB

Instructions for installing MongoDB on different platforms are given in detail on the [official site](https://docs.mongodb.org/manual/installation/) of MongoDB. 

# Interaction of Python and MongoDB through PyMongo Library

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. The recommended way of installing PyMongo on your computer is with pip:

    pip install pymongo
    
Before starting, make sure that the [mongod](https://docs.mongodb.org/manual/reference/program/mongod/#bin.mongod) process is running:

* **For Windows OS** (assuming that you have installed MongoDB in the folder `C:\mongodb`):
    
    `C:\mongodb\bin\mongod.exe`
    
    
* **For Linux:**
    
    
    sudo service mongod start
    
    
* **For MacOS:**
   
   `mongod`
   
This material covers only basic MongoDB commands and basic usage of PyMongo. You can find more information on the [official site of MongoDB](https://docs.mongodb.org/manual/).

### Connection to the server

The first step when working with PyMongo is to create a `MongoClient` to the running mongod instance:

In [1]:
import pymongo
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

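# Connection to MongoDB
client = MongoClient()
try:
    # MongoClient() connects lazily in PyMongo 3, so run a cheap command
    # to verify that the server is actually reachable
    client.admin.command('ismaster')
    print "Connected successfully!!!"
except ConnectionFailure as e:
    print "Could not connect to MongoDB: %s" % e 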

# We can also specify the host and port explicitly, as follows:
# `client = MongoClient('localhost', 27017)`
# or MongoDB URI format:
# `client = MongoClient('mongodb://localhost:27017/')`
    
client

Connected successfully!!!


MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

### Quick start

MongoDB creates databases and collections automatically for you if they don't exist already. A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access:

In [2]:
db = client.my_database   
# If your database name is such that using attribute style access won’t work (like db-name), 
# you can use dictionary style access instead 
# db = client['my-database']
db

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'my_database')

In [3]:
# To know what databases are available:
client.database_names()

[u'local', u'cinema', u'admin']

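We have already created one new database. Why didn't it show up with the above command? MongoDB creates databases and collections lazily: a database or collection is only created when the first document is stored in it, so `my_database` is not listed by `database_names()` yet. The same goes for collections that have never held a document when we list the collections in a database.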

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:

In [4]:
collection = db.my_collection
collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'my_database'), u'my_collection')

In [5]:
# List the collections that already exist (a collection is created on its first insert)
db.collection_names()

[]

> However, one must be careful when accessing existing collections. For example, if you have a collection `db.user` and you type `db.usr`, this is clearly a mistake, but PyMongo will silently return a `Collection` object for `usr` (which is then created on the first insert). Unlike an RDBMS, MongoDB won't protect you from this class of mistake.
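
One way to guard against such typos is to check the name against the list of existing collections before using it. The sketch below assumes the `collection_names()` method used in this lesson (newer PyMongo releases call it `list_collection_names()`); the helper `get_existing_collection` and the `FakeDatabase` stand-in are hypothetical names introduced only so the example runs without a server.

```python
def get_existing_collection(db, name):
    """Return db[name], but raise KeyError if the collection does not exist yet."""
    existing = db.collection_names()
    if name not in existing:
        raise KeyError("no collection named %r (existing: %s)"
                       % (name, sorted(existing)))
    return db[name]


class FakeDatabase(dict):
    """Hypothetical in-memory stand-in for a PyMongo Database (demo only)."""

    def collection_names(self):
        return list(self.keys())


fake_db = FakeDatabase(user=[{"name": "Joe"}])

users = get_existing_collection(fake_db, "user")   # the collection exists

try:
    get_existing_collection(fake_db, "usr")        # the typo is caught
    typo_caught = False
except KeyError:
    typo_caught = True
```

With a real database the call would look like `get_existing_collection(db, "user")`; the check costs one extra round trip to the server, so it is best suited to start-up code rather than hot paths.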

To **insert** some data into MongoDB, all we need to do is create a dict and call `insert_one()` or `insert_many()` methods on the collection object:

In [6]:
collection.insert_one({"first_name": "Joe", "last_name": "Smith", "age": 45})
collection.insert_many(
    [
        {"first_name": "Rocky", "last_name": "Balboa", "age": 38},
        {"first_name": "Luke", "last_name": "Skywalker", "age": 32}
    ]
)

<pymongo.results.InsertManyResult at 0x7fc307324e60>

When a document is inserted a special key, `_id`, is automatically added if the document doesn’t already contain an `_id` key. The value of `_id` must be unique across the collection. `insert_one()` returns an instance of `InsertOneResult`.

In [7]:
# Let's see collections now
db.collection_names()

[u'my_collection', u'system.indexes']

In [8]:
# Get full name of collection including database name
db.my_collection.full_name

u'my_database.my_collection'

In [9]:
# Select a single document (the first one found, not necessarily the last one created)
my_document = collection.find_one()
my_document

{u'_id': ObjectId('5c517705648884071317eee6'),
 u'age': 45,
 u'first_name': u'Joe',
 u'last_name': u'Smith'}

You may **delete** documents, collections and databases. `delete_one()` deletes a single document matching the filter, and `delete_many()` deletes all documents matching it. (The older `remove()` method is deprecated in PyMongo 3 in favor of these two.)

In [10]:
# Delete a document
collection.delete_one(my_document)

<pymongo.results.DeleteResult at 0x7fc307324e10>

In [11]:
# Delete a collection
db.drop_collection('my_collection')

In [12]:
# Delete a database
client.drop_database("my_database")

In [13]:
# Let's see whether the database was removed:
client.database_names()

[u'local', u'cinema', u'admin']

### Basic usage

Let's create a new database "cinema" with the collection "movies", in which each document holds data about a film, its actors, director, etc. First, let's create the document for the film ["Forrest Gump"](https://en.wikipedia.org/wiki/Forrest_Gump). The following picture shows how SQL tables may be transformed into a MongoDB document, i.e. how a one-to-many relationship can be represented by embedding.

<img src="images/forrest_gump.jpg">

In [14]:
db = client.cinema
collection = db.movies

In [15]:
forrest_gump_dict = {
    "title": "Forrest Gump", 
    "released": 1994,
    "duration_min": 142,
    "country": "USA",
    "lang": "English",
    "persons": [
        {
            "name": "Tom Hanks",
            "born": 1956,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Forrest Gump"
        },
        {
            "name": "Gary Sinise",
            "born": 1955,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Lieutenant Dan Taylor"
        },
        {
            "name": "Robert Zemeckis",
            "born": 1952,
            "country": "USA",
            "relation": "DIRECTED",
        }
    ]
}

collection.insert_one(forrest_gump_dict)

<pymongo.results.InsertOneResult at 0x7fc307324d20>

The `find_one()` method **selects and returns** a single document from a collection (or `None` if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match:

In [16]:
forrest_gump = collection.find_one()
forrest_gump

{u'_id': ObjectId('5c5092b564888406f3e1bcf1'),
 u'country': u'USA',
 u'duration_min': 142,
 u'lang': u'English',
 u'persons': [{u'born': 1956,
   u'country': u'USA',
   u'films': [u'Forrest Gump', u'The Green Mile'],
   u'name': u'Tom Hanks',
   u'relation': u'ACTED_IN',
   u'role': u'Forrest Gump'},
  {u'born': 1955,
   u'country': u'USA',
   u'name': u'Gary Sinise',
   u'relation': u'ACTED_IN',
   u'role': u'Lieutenant Dan Taylor'},
  {u'born': 1952,
   u'country': u'USA',
   u'name': u'Robert Zemeckis',
   u'relation': u'DIRECTED'}],
 u'released': 1994,
 u'title': u'Forrest Gump'}

To **get more than a single document** as the result of a query we use the `find()` method. `find()` returns a Cursor instance, which allows us to iterate over all matching documents.

Let's create documents for a few more films:

In [17]:
green_mile_dict = {
    "title": "The Green Mile", 
    "released": 1999,
    "duration_min": 188,
    "country": "USA",
    "lang": "English",
    'box_office_Mdol': 290.7,
    "persons": [
        {
            "name": "Tom Hanks",
            "born": 1956,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Paul Edgecomb"
        },
        {
            "name": "Gary Sinise",
            "born": 1955,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Burt Hammersmith"
        },
        {
            "name": "Michael Clarke Duncan",
            "born": 1957,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "John Coffey"
        },
        {
            "name": "Frank Darabont",
            "born": 1959,
            "country": "USA",
            "relation": "DIRECTED",
        }
    ]
}

green_mile_id = collection.insert_one(green_mile_dict).inserted_id
green_mile_id

ObjectId('5c51787a648884071317eeea')

The `_id` of the inserted document, whether supplied by us or generated automatically, is available in the `inserted_id` attribute of the returned result.

In [18]:
inseption_dict = {
    "title": "Inseption", 
    "released": 2010,
    "duration_min": 148,
    "country": "USA",
    "lang": "English",
    "box_office_Mdol": 825.5,
    "persons": [
        {
            "name": "Leonardo DiCaprio",
            "born": 1974,
            "country": "USA"
        }
    ]
}

taxi_dict = {
    "title": "Taxi", 
    "released": 1998,
    "duration_min": 86,
    "country": "France",
    "lang": "French",
    "continuation": True,
    "persons": [
        {
            "name": "Samy Naceri",
            "born": 1961,
            "country": "France"
        }
    ]
}

matrix_dict = {
    "title": "The Matrix", 
    "released": 1999,
    "duration_min": 136,
    "country": "USA",
    "lang": "English",
    "box_office_Mdol": 463.5,
    "continuation": True,
    "persons": [
        {
            "name": "Keanu Reeves",
            "born": 1964,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Neo"
        },
        {
            "name": "Laurence Fishburne",
            "born": 1961,
            "country": "USA",
            "relation": "ACTED_IN",
            "role": "Morpheus"
        }
    ]
}

collection.insert_many([inseption_dict, taxi_dict, matrix_dict])

<pymongo.results.InsertManyResult at 0x7fc307324c80>

To **know how many documents** match a query we can perform a `count()` operation: 

In [19]:
collection.count()

10

In [20]:
db.collection_names(include_system_collections=False)
# `include_system_collections=False` omits system collections such as "system.indexes"

[u'movies']

In [21]:
for movie in collection.find():
    print movie['title']

Forrest Gump
The Green Mile
Inseption
The Matrix
Taxi
Forrest Gump
The Green Mile
Inseption
Taxi
The Matrix


MongoDB queries are represented as JSON-like structures, just like documents. To build a query, you specify a dictionary with the properties you wish the results to match. For example, this query will match all documents in the "movies" collection with `"country" == "USA"`:

In [22]:
for movie in collection.find({"country": "USA"}):
    print '{}, {}'.format(movie['title'], movie["country"])

Forrest Gump, USA
The Green Mile, USA
Inseption, USA
The Matrix, USA
Forrest Gump, USA
The Green Mile, USA
Inseption, USA
The Matrix, USA


We can also find a document by its `_id`:

In [23]:
collection.find_one({"_id": green_mile_id})

{u'_id': ObjectId('5c504cde64888406f3e1bc16'),
 u'box_office_Mdol': 290.7,
 u'country': u'USA',
 u'duration_min': 188,
 u'lang': u'English',
 u'persons': [{u'born': 1956,
   u'country': u'USA',
   u'name': u'Tom Hanks',
   u'relation': u'ACTED_IN',
   u'role': u'Paul Edgecomb'},
  {u'born': 1955,
   u'country': u'USA',
   u'name': u'Gary Sinise',
   u'relation': u'ACTED_IN',
   u'role': u'Burt Hammersmith'},
  {u'born': 1957,
   u'country': u'USA',
   u'name': u'Michael Clarke Duncan',
   u'relation': u'ACTED_IN',
   u'role': u'John Coffey'},
  {u'born': 1959,
   u'country': u'USA',
   u'name': u'Frank Darabont',
   u'relation': u'DIRECTED'}],
 u'released': 1999,
 u'title': u'The Green Mile'}

Queries can also use **special query operators**. These operators include `$gt`, `$gte`, `$lt`, `$lte`, `$ne`, `$nin`, `$exists`, `$size` (for arrays), `$not`, `$or` and many more. You can find the full list of query operators [here](https://docs.mongodb.org/manual/reference/operator/query/). The following queries show the use of some of these operators:

In [24]:
# Movies released after 1998
for movie in collection.find({"released": {"$gte":1999}}):
    print '{}, {}'.format(movie['title'], movie["released"])

The Green Mile, 1999
Inseption, 2010
The Matrix, 1999


In [25]:
# Display movies released in the USA in a year other than 1994 or 2010 
# OR which have the "continuation" field and data about exactly one person
q = {
    "$or":[
        {"country": "USA", "released": {"$not": {"$in": [1994, 2010]}}},
        {"continuation": {"$exists": True}, "persons": {"$size": 1}}
    ]
}
for movie in collection.find(q):
    print '{}, {}'.format(movie['title'], movie["released"])

The Green Mile, 1999
Taxi, 1998
The Matrix, 1999
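
To see how such a query document is evaluated, here is a toy matcher in pure Python. It is a simplified model written for this lesson, not MongoDB's implementation: it covers only top-level fields and the handful of operators listed in the code, and the names `matches` and `_check` are assumptions. The sample list is a trimmed-down version of the movies inserted above.

```python
import operator

# Comparison operators covered by this sketch (a subset of MongoDB's).
_COMPARE = {"$gt": operator.gt, "$gte": operator.ge,
            "$lt": operator.lt, "$lte": operator.le}
_MISSING = object()  # marks a field that is absent from the document


def _check(value, condition):
    """Test one field value against a literal or an operator dictionary."""
    is_operator_doc = isinstance(condition, dict) and any(
        key.startswith("$") for key in condition)
    if not is_operator_doc:
        return value is not _MISSING and value == condition
    for op, arg in condition.items():
        if op in _COMPARE:
            ok = value is not _MISSING and _COMPARE[op](value, arg)
        elif op == "$ne":
            ok = value is _MISSING or value != arg
        elif op == "$in":
            ok = value is not _MISSING and value in arg
        elif op == "$nin":
            ok = value is _MISSING or value not in arg
        elif op == "$exists":
            ok = (value is not _MISSING) == bool(arg)
        elif op == "$size":
            ok = isinstance(value, list) and len(value) == arg
        elif op == "$not":
            ok = not _check(value, arg)
        else:
            raise ValueError("operator not covered by this sketch: " + op)
        if not ok:
            return False
    return True


def matches(document, query):
    """Return True if every condition in the query holds for the document."""
    for key, condition in query.items():
        if key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif not _check(document.get(key, _MISSING), condition):
            return False
    return True


movies = [
    {"title": "Forrest Gump", "country": "USA", "released": 1994,
     "persons": ["Tom Hanks", "Gary Sinise", "Robert Zemeckis"]},
    {"title": "The Green Mile", "country": "USA", "released": 1999,
     "persons": ["Tom Hanks", "Gary Sinise"]},
    {"title": "Inseption", "country": "USA", "released": 2010,
     "persons": ["Leonardo DiCaprio"]},
    {"title": "Taxi", "country": "France", "released": 1998,
     "continuation": True, "persons": ["Samy Naceri"]},
    {"title": "The Matrix", "country": "USA", "released": 1999,
     "continuation": True, "persons": ["Keanu Reeves", "Laurence Fishburne"]},
]

q = {
    "$or": [
        {"country": "USA", "released": {"$not": {"$in": [1994, 2010]}}},
        {"continuation": {"$exists": True}, "persons": {"$size": 1}},
    ]
}
titles = [movie["title"] for movie in movies if matches(movie, q)]
```

Several conditions inside one dictionary are combined with a logical AND, while `$or` takes a list of alternative sub-queries; that is why "Taxi" is selected by the second branch even though it fails the first.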


> ### Exercise 1.1:

> Display the titles of movies which were either released before 2000 outside the USA, or released after 1995 with a box office over $500 M. Write the result to the Python list `results` as dictionaries containing the keys `"title"`, `"released"` and `"box_office_Mdol"`. 

> ***Please note:***

> A MongoDB collection may contain several records with identical fields (the same field names and values) that differ only in the `"_id"` field, the unique record identifier, which is displayed as 24 hexadecimal characters. If you run any of the above cells containing insertion commands two or more times, then two or more identical records (with the same properties but different IDs) will be created in the collection and treated as different records. Thus, search and filter queries may return duplicates. That is why, here and further on, we add code that resets the collection and so removes all duplicates (see below).
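
The cells below deal with duplicates by dropping and reloading the collection. As an alternative illustration, duplicates that differ only in `_id` can also be filtered on the client side. The following is a minimal sketch under stated assumptions: the helper `drop_duplicates` is a hypothetical name, the string `_id` values stand in for real `ObjectId` values, and the remaining field values are assumed to be JSON-serializable.

```python
import json


def drop_duplicates(documents, ignore=("_id",)):
    """Keep the first of each group of documents that are equal apart from `ignore`."""
    seen = set()
    unique = []
    for doc in documents:
        body = {key: value for key, value in doc.items() if key not in ignore}
        fingerprint = json.dumps(body, sort_keys=True)  # order-independent key
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique


docs = [
    {"_id": "a1", "title": "Taxi", "released": 1998},
    {"_id": "b2", "title": "Taxi", "released": 1998},        # same data, new _id
    {"_id": "c3", "title": "The Matrix", "released": 1999},
]
unique_docs = drop_duplicates(docs)
unique_ids = [doc["_id"] for doc in unique_docs]
```

This only hides duplicates from the result list; the duplicate records stay in the collection until they are deleted or the collection is rebuilt.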

In [77]:
# Encoded "movies" collection data that would be obtained if each command cell above were run only once 
# We use this format to compress data
data = ''.join([
        'x\x9c\xbd\x94Mo\xda@\x10\x86\xff\xca\xde\xb8D\x11\x10c\xa0\xb7\x94\xaf\xd0bT\x01\xbd4\x8a\xd0\xd8\x9e\xe0\x15\xf6,',
        '\x1a\xefV\xa0*\xff\xbd\x8bi*-&\x89-\xd4\xdeV3\xf2<3\xf3\xbe\xe3\xc7_\xa6\x91\x02m\x1a\x9f\x84i\x8ch\x93\xca<i\xdc\xd8', 
        '\xb7\x96:\xc5":V\xcc\x98k11\xd9\xaeHE\xca\x90\xe6C\x91\xfc\xbe\xbc/b;\xe4\\Qnc\x8f\xb6b\xa8\x98\xec\xb3\xd5\xef\xf8', 
        '\xc7$\xab\xb7J\x11d\xa7\xccJe\xe2\x01h\x9b\x17a\xc6\x14\xb4TT\xa4\xee\x07\xab\xd1p=\x9d_d\xbf\xdc\x08\x87\xd7qx3\x89F#',
        '\x01i1\x04\x12+8\xa4\x8a]\xf0\x04\xf8 \x96\x92d\x8eW\xa2\xdb\xa5\xcf\x87\xd3\xc5\xe8X\xc1%.T\x88\xac\xc5\x0f\xcc0\xda',
        '\xca\xfcr\xed\xa7c46\\\x14[g\xb2`x\xaf\x08\x84\x1c\xe3\x82\xda\xf7Nm|\xa0\xe1*A1aD\x12\x81L\xf1M\x15C\xb5_\xab\xe7g',
        '\x19\xe1:\x88Ujs\xed~\xf3\xb6[\xa6\xf6\xabk\xfe\rL*F\xf1\x06#\x95\x85\xffC\xf4\xcf\xc6n\xf7\x01\xb2\xcc\xb6\x97I\x9d',
        '\xfc+\xbd\xbb\x0e\xf5\x8bJH\x0c\xec\xf2\xf0\xe0\x02\x03\x19%\x80\xa9\x18\xa4\xc0[\x14CC\x11\xd0\x95\xe8~E\xab\x8d\xd9n',
        '\xd7Z\x9f!T\xa4k8\xad\xd7\xabd\xab)\xe5\xb8+z\xa8\xee\xa8^\xbbs\xdb9sT\xbb\xd9j\xbe\xe3\xa8\xae\xe7L5CE\xc0\xb1\x12C9',
        '\x80\x1dKU\xe7\x84\xce\x07\x1b3R\x944\xca\x16\xef\x9d\x17=.3\xc2\xf3\xc3\x82\xbd|\xef\x07\xe8\xb7\x9c\xd6\x97\x90\x1d',
        '\xc4\x1c"dYj\xfaO\xfdS\xdf\x91\xd5K\x92y\x15x\xc5\x06/\x8c\xd3\xf3+_\x7f\x00\x9a\xe5\xbe\x86N\x9e\x7fW\xd2\xe9\x83\xcb',
        '\xf7=\xe7$\xe6\xa8\\;~E #\x16\x88?\xf1\xca\x93?\xad\xf5/)P\xbcK\xd0\xe4.n\x06\xe6(.\x8a\xb1]Ih\x98j_|\r)Zw\xfe\xcb',
        '\xd3o\x14\xd7L\xa3'
    ])
# Return to the native format of MongoDB document
data = eval(data.decode('zip'))
# Drop the old "movies" collection
db.drop_collection('movies')
# Create an empty "movies" collection
collection = db.movies
# Fill it with records from "data"
collection.insert_many(data)
q = {
    "$or":[
        {"country": {"$not": {"$in": ["USA"]}}, "released": {"$lt": 2000}},
        {"released": {"$gt": 1995}, "box_office_Mdol": {"$gt": 500.0}}
    ]
}
newlist = []
# type your code here
for movie in collection.find(q):
    print movie.keys()
    k = 0
    if "box_office_Mdol" not in movie.keys():
        k = 44.4
    else:
        k = movie["box_office_Mdol"]
    r = {"title":movie["title"],
        "released":movie["released"],
        "box_office_Mdol":k
        }
    newlist.append(r)
results = newlist
print results

[u'lang', u'title', u'country', u'box_office_Mdol', u'released', u'persons', u'duration_min', u'_id']
[u'lang', u'title', u'continuation', u'persons', u'country', u'duration_min', u'_id', u'released']
[{'box_office_Mdol': 825.5, 'released': 2010, 'title': u'Inseption'}, {'box_office_Mdol': 44.4, 'released': 1998, 'title': u'Taxi'}]


In [73]:
from test_helper import Test

Test.assertEqualsHashed(results, '0707c1327594b72d0b8a9f451718e18b1dad57aa', 'Incorrect query', "Exercise 1.1 is successful")

NameError: name 'results' is not defined

MongoDB can sort query results for you on the server-side. Especially if you are sorting results on a property which has an index, it can sort these far more efficiently than your client program can:

In [60]:
for movie in collection.find().sort([("released", pymongo.DESCENDING)]):
    print '{}, {}'.format(movie['title'], movie["released"])

Inseption, 2010
The Green Mile, 1999
The Matrix, 1999
Taxi, 1998
Forrest Gump, 1994


Fetching every matching document is inefficient when you have large result sets. PyMongo cursors have `limit()` and `skip()` methods which let you fetch a limited number of results or skip some of them:

In [61]:
for movie in collection.find().sort([("released", pymongo.ASCENDING)]).limit(3):
    print '{}, {}'.format(movie['title'], movie["released"])

Forrest Gump, 1994
Taxi, 1998
The Green Mile, 1999


In [62]:
for movie in collection.find().sort([("released", pymongo.ASCENDING)]).skip(2).limit(2):
    print '{}, {}'.format(movie['title'], movie["released"])

The Green Mile, 1999
The Matrix, 1999
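
The combined effect of `sort()`, `skip()` and `limit()` can be modelled with plain Python lists: the results are ordered first, then the leading items are skipped, then the count is capped. The sketch below is only an in-memory analogy (the helper name `page` is an assumption); with a real cursor the work is done on the server.

```python
def page(documents, sort_key, descending=False, skip=0, limit=None):
    """Sort the documents, skip the first `skip` of them, return at most `limit`."""
    ordered = sorted(documents, key=lambda doc: doc[sort_key], reverse=descending)
    end = None if limit is None else skip + limit
    return ordered[skip:end]


movies = [
    {"title": "Forrest Gump", "released": 1994},
    {"title": "The Green Mile", "released": 1999},
    {"title": "Inseption", "released": 2010},
    {"title": "Taxi", "released": 1998},
    {"title": "The Matrix", "released": 1999},
]

first_three = [m["title"] for m in page(movies, "released", limit=3)]
second_page = [m["title"] for m in page(movies, "released", skip=2, limit=2)]
newest = [m["title"] for m in page(movies, "released", descending=True, limit=1)]
```

Python's sort is stable, so the two films from 1999 keep their input order here. MongoDB makes no such promise for documents with equal sort keys; adding a second sort key such as `_id` is a common way to get repeatable pages.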


The `distinct()` method returns only the unique values of a field:

In [63]:
for lang in collection.find().distinct("lang"):
    print lang

English
French


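PyMongo can **update documents** in a number of different ways. Let's start with the `forrest_gump` document we fetched earlier.

Since `find_one()` returns a plain Python dict, we can modify it with the dict's own `update()` method. Note that this changes only the local copy in Python, not the document stored in MongoDB: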

In [64]:
forrest_gump

{u'_id': ObjectId('5c504c4b64888406f3e1bc15'),
 u'country': u'USA',
 u'duration_min': 142,
 u'lang': u'English',
 u'persons': [{u'born': 1956,
   u'country': u'USA',
   u'name': u'Tom Hanks',
   u'relation': u'ACTED_IN',
   u'role': u'Forrest Gump'},
  {u'born': 1955,
   u'country': u'USA',
   u'name': u'Gary Sinise',
   u'relation': u'ACTED_IN',
   u'role': u'Lieutenant Dan Taylor'},
  {u'born': 1952,
   u'country': u'USA',
   u'name': u'Robert Zemeckis',
   u'relation': u'DIRECTED'}],
 u'released': 1994,
 u'title': u'Forrest Gump'}

In [65]:
forrest_gump.update({"box_office_Mdol": 177.9})
forrest_gump

{u'_id': ObjectId('5c504c4b64888406f3e1bc15'),
 'box_office_Mdol': 177.9,
 u'country': u'USA',
 u'duration_min': 142,
 u'lang': u'English',
 u'persons': [{u'born': 1956,
   u'country': u'USA',
   u'name': u'Tom Hanks',
   u'relation': u'ACTED_IN',
   u'role': u'Forrest Gump'},
  {u'born': 1955,
   u'country': u'USA',
   u'name': u'Gary Sinise',
   u'relation': u'ACTED_IN',
   u'role': u'Lieutenant Dan Taylor'},
  {u'born': 1952,
   u'country': u'USA',
   u'name': u'Robert Zemeckis',
   u'relation': u'DIRECTED'}],
 u'released': 1994,
 u'title': u'Forrest Gump'}

If a field already exists, you can change its value as in any Python dict (again, only in the local copy):

In [66]:
forrest_gump["box_office_Mdol"] = 677.9
forrest_gump

{u'_id': ObjectId('5c504c4b64888406f3e1bc15'),
 'box_office_Mdol': 677.9,
 u'country': u'USA',
 u'duration_min': 142,
 u'lang': u'English',
 u'persons': [{u'born': 1956,
   u'country': u'USA',
   u'name': u'Tom Hanks',
   u'relation': u'ACTED_IN',
   u'role': u'Forrest Gump'},
  {u'born': 1955,
   u'country': u'USA',
   u'name': u'Gary Sinise',
   u'relation': u'ACTED_IN',
   u'role': u'Lieutenant Dan Taylor'},
  {u'born': 1952,
   u'country': u'USA',
   u'name': u'Robert Zemeckis',
   u'relation': u'DIRECTED'}],
 u'released': 1994,
 u'title': u'Forrest Gump'}

### Updating commands:

The collection's `update()` method replaces the whole document when it is given a plain document, so be careful! (`replace_one()` does the same thing explicitly.) If instead we want to modify specific fields of the document, we can use MongoDB's update operators like `$set`, `$inc`, `$push`, `$pull` and many [more](https://docs.mongodb.org/manual/reference/operator/update/) together with the `update_one()` or `update_many()` methods.

**Update operator `$set`:**

The `$set` operator replaces the value of a field with the specified value. It adds the specified field or fields if they do not exist in the document, or replaces their existing values if they do.

In [67]:
collection.update_one({"title": "Taxi"}, {"$set":{"box_office_Mdol": 10, "continuation": None}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c5073cf64888406f3e1bca4'),
 u'box_office_Mdol': 10,
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'lang': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'}],
 u'released': 1998,
 u'title': u'Taxi'}

`update_one()` only modifies the first document that matches the query. If you want to modify all documents that match the query, use `update_many()` instead.

**Update operator `$inc`:**

The `$inc` operator increments a value by a specified amount if the field is present in the document. If the field does not exist, `$inc` sets the field to the specified amount.

In [68]:
collection.update_one({"title": "Taxi"}, {"$inc":{"box_office_Mdol": 100}})
# Look at how the value of "box_office_Mdol" changed
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c5073cf64888406f3e1bca4'),
 u'box_office_Mdol': 110,
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'lang': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'}],
 u'released': 1998,
 u'title': u'Taxi'}
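
The behaviour of these two operators can be mimicked on a plain dictionary. The function below is a toy model for top-level fields only (no dotted paths, no type checks), written to illustrate the rules described above; `apply_set_inc` is a hypothetical helper, not part of PyMongo.

```python
import copy


def apply_set_inc(document, update):
    """Return a copy of `document` with top-level $set and $inc applied."""
    result = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        result[field] = value                           # add or overwrite
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount   # a missing field starts at 0
    return result


taxi = {"title": "Taxi", "released": 1998}

step1 = apply_set_inc(taxi, {"$set": {"box_office_Mdol": 10, "continuation": None}})
step2 = apply_set_inc(step1, {"$inc": {"box_office_Mdol": 100}})
step3 = apply_set_inc(step2, {"$inc": {"sequels": 3}})  # "sequels" did not exist before
```

The original `taxi` dictionary is left untouched because the function works on a deep copy, whereas `update_one()` changes the stored document in place on the server.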

> ### Exercise 1.2:

> Using the `$set` and `$inc` operators, in a single action increase the duration of "The Matrix" by 25%, change the `"continuation"` value to "of course", and add a new field `"parts"` with value 3. 

In [87]:
data = ''.join([
        'x\x9c\xbdTMo\xda@\x10\xfd+{\xe3\x12E@\xcc\x87{K\xf9\n-\xa0\n\xdcK#\x84\xc6f\x82W\xd8\xb3h\xbc[\x81\xaa\xfc\xf7\xaeM', 
        '\xd2j\xb1\x83@\xa8\xbd\xadf\xbc\xfb\xde\x9b\xf7\xc6\xcf\xbfL-\x01\xda\xd4>\tS\x1b\xd0&\x91Y\\\xbb\xb3\xe7\x1dr\xa6(',
        '\xb3\xf5g\xfbI\xa8\x98\xec\xb1\xe1\xb7\xday\x93U\x82\xc5\x8d\xa1b\xc6L\x8b\x91Iw\xc55\x82\xf4\xd8\tT*\x9e\x80\xb6YQfL', 
        '@KEE\xeb\xb1\x17\x0c\xfa\xab\xf1\xac\xe8D\xca\x90\xe6C\xd1\xf8\xbex\xac\xbd\xde\t\x07\xaf\xe5\xe0M$\x1a\x8d\x04\xa4E',
        '\x1fH\x04pH\x14\xbb\xc0#\xe0\x83XH\x92\x19\xde\x08\xdd,]\xef\x8f\xe7\x83\xfc\x05\x17q\xaeBd-~`\x8a\xd1Vf\xd5o/\xab', 
        '\xaayMK\xfd\xd10-6B\x86\xeb\x82\x8e\xef\xe5\xa5\xb5\xe1\x82\xce*\x95\x05K\xafyd]\xe9\xe1\xdf\xa7\x83\x18\xc5\x88',
        '\x11ILe\x82\x95\x0c\xf3Z\xa8\xf6+\xf5\xf2"#\\M\xd7*\xb1\xbd\xa6_\xbf\xef\x94\xb9\xf8\x97G\xe4\x1b\x98D\x0c\xd6\x1b',
        '\x8cT\x1a\xfe\x8f\x8c|6\xd6\x8c\'HSK/\x95:\xfeW\xf1\xe88\xa8_TL\xa2g\x87\x87\x07\x17p*\xa3\x180\x11\xbd\x04x\x8b\xa2o(',
        '\x02\xba\x11\xda\xbf0\x99C\xb6\xd3\xb5\x9b\xc2\x10*\xd2g\x82y\x1a\xabn\xf7\xa2X\x8d)\xc3]\xc1\xe1\xf2Du\x9b\xad\xfb',
        '\xd6I\xa2\x9a\xf5F\xfdL\xa2:\x9e\xa3j\x82\x8a\x80\xd7J\xf4e\x0fv,\xd5\x15\xc2\xbcSaCF\x8aJ\xeb\x02{\xf9\xf6(iI\xe6}',
        '\xce3EX\xad\xaaqV@\xbb\xe1\x08X@z\x103\x88\x90e\x89znY\x84\x15\xff\x8b\xb7F\x85\xa8n\xbb\xbc\xa0\x97\xf9\x97\xff\x16',
        '\xa6\xa0Y\xee\xaf0\xd0k?\x1c\r\xfcH\xac\xe7,\xc6\x0c\x95\x1b\xca\xaf\x08d\xc4\x1c\xf1\'\xde\xb8\xf8\xc7\xb1\xfeA\x9a*',
        '\xde\xc5h2\x17n\x02&\xb7\x18\xc5\xd0\xea\x0f\r\xd3\xd5{\xbf\xacHB\xc0\x06\xab\xf2\xf5P\xe1\x85\xff\xba\xfc\r{\x89T\x8d'
    ])
data = eval(data.decode('zip'))
db.drop_collection('movies')
collection = db.movies
collection.insert_many(data)

# type your code here
# Note: all "$set" changes go into one sub-document; a repeated "$set" key
# in a dict literal would silently overwrite the earlier one
collection.update_one({"title": "The Matrix"},
                    {"$set": {"continuation": "of course", "parts": 3},
                     "$inc": {"duration_min": 136*0.25}}, upsert=True)
matrix = collection.find_one({"title": "The Matrix"})
matrix

{u'_id': ObjectId('5c507de664888406f3e1bce6'),
 u'box_office_Mdol': 463.5,
 u'continuation': u'of course',
 u'country': u'USA',
 u'duration_min': 170.0,
 u'lang': u'English',
 u'parts': 3,
 u'persons': [{u'born': 1964,
   u'country': u'USA',
   u'name': u'Keanu Reeves',
   u'relation': u'ACTED_IN',
   u'role': u'Neo'},
  {u'born': 1961,
   u'country': u'USA',
   u'name': u'Laurence Fishburne',
   u'relation': u'ACTED_IN',
   u'role': u'Morpheus'}],
 u'released': 1999,
 u'title': u'The Matrix'}

In [88]:
results = []
for i in collection.find():
    del(i['_id'])
    results.append(i)
    
Test.assertEqualsHashed(results, '3ac6069c33b1a1129ce1b48c8e0733bb85ae7a32','Incorrect query', "Exercise 1.2 is successful")

1 test failed. Incorrect query
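
One pitfall worth knowing when several operators go into a single update document: an update document is an ordinary Python dictionary, and a dictionary literal that repeats a key silently keeps only the last value. The short sketch below shows the effect with a repeated `"$set"` key; the variable names are illustrative.

```python
# Repeating "$set" does NOT merge the two sub-documents: the last one wins.
wrong = {
    "$set": {"continuation": "of course"},
    "$inc": {"duration_min": 34},
    "$set": {"parts": 3},
}

# All fields for one operator belong in a single sub-document.
right = {
    "$set": {"continuation": "of course", "parts": 3},
    "$inc": {"duration_min": 34},
}

lost_fields = sorted(set(right["$set"]) - set(wrong["$set"]))
```

Python raises no error for the first form, so the missing field usually shows up only when the stored document is inspected afterwards.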


**Update operator `$unset`:**

The `$unset` operator deletes a particular field. If a document matches the initial query but does not have the field specified in the `$unset` operation, then the statement has no effect on that document.

In [89]:
collection.update_one({"title": "Taxi"}, {"$unset":{"box_office_Mdol": ""}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'lang': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'}],
 u'released': 1998,
 u'title': u'Taxi'}

**Update operator `$rename`:**

The `$rename` operator updates the name of a field. The new field name must differ from the existing field name.

In [90]:
collection.update_one({"title": "Taxi"}, {"$rename":{"lang": "language"}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'}],
 u'released': 1998,
 u'title': u'Taxi'}
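
As a rough mental model, `$unset` and `$rename` behave like the dictionary operations below. This is a sketch for top-level fields only, and `apply_unset_rename` is a hypothetical helper introduced for illustration, not a PyMongo function.

```python
import copy


def apply_unset_rename(document, update):
    """Return a copy of `document` with top-level $unset and $rename applied."""
    result = copy.deepcopy(document)
    for field in update.get("$unset", {}):
        result.pop(field, None)              # no error if the field is absent
    for old_name, new_name in update.get("$rename", {}).items():
        if old_name in result:               # nothing to do if the field is absent
            result[new_name] = result.pop(old_name)
    return result


taxi = {"title": "Taxi", "lang": "French", "box_office_Mdol": 110}

without_box_office = apply_unset_rename(taxi, {"$unset": {"box_office_Mdol": ""}})
renamed = apply_unset_rename(without_box_office, {"$rename": {"lang": "language"}})
unchanged = apply_unset_rename(renamed, {"$unset": {"no_such_field": ""}})
```

The value given to `$unset` (an empty string here, as in the cell above) is ignored; only the field name matters.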

**Update operator `$push`:**

The `$push` operator appends a specified value to an array. Be aware of the following behaviors:

* If the field specified in the `$push` statement (e.g. `{$push: {field: value1}}`) does not exist in the matched document, the operation adds a new array field containing the value (e.g. `value1`) to the matched document.

* The operation will fail if the field specified in the `$push` statement exists but is not an array.

* If `value1` is an array itself, `$push` appends the whole array as a single element of the identified array. To add multiple items to an array, use `$push` with the `$each` modifier (the older `$pushAll` operator is deprecated).

In [91]:
# The next command will generate an error because "language" field is not an array
collection.update_one({"title": "Taxi"}, {"$push":{"language": "Hindi"}})
collection.find_one({"title": "Taxi"})

WriteError: The field 'language' must be an array but is of type String in document {_id: ObjectId('5c507de664888406f3e1bce5')}

In [92]:
collection.update_one({"title": "Taxi"}, {"$push":{"persons": {'name': 'Frédéric Diefenthal', 'born': 1968, 'country': 'France'} }})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

**Update operator `$pop`:**

The `$pop` operator removes the first or last element of an array. Pass `$pop` a value of 1 to remove the last element in an array and a value of -1 to remove the first element of an array. Be aware of the following `$pop` behaviors:

* The `$pop` operation fails if the field is not an array.
* If `$pop` removes the last remaining item in an array, the field will then hold an empty array.

In [93]:
# Let's create a new array field in the document for "Taxi" film 
import random 

collection.update_one({"title": "Taxi"}, {"$set":{"array": 
                                                  [{"item1": random.randint(0,10), "item2": random.choice('abcdef')} 
                                                   for i in range(5)]
                                                }
                                         }
                     )
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'},
  {u'item1': 8, u'item2': u'b'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

In [94]:
collection.update_one({"title": "Taxi"}, {"$pop":{"array": 1}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

**Update operator `pull`:**

The `pull` operator removes all instances of a value from an existing array: if the value occurs several times in the array, every occurrence is removed. It is very handy when you know exactly which value you want to remove.
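
The "all instances" behaviour can be checked with a small pure-Python model. The function `model_pull` is a hypothetical stand-in, not MongoDB code; the two update documents at the end show the shapes that would be passed to `update_one()`, the second one using a condition instead of a plain value.

```python
def model_pull(values, unwanted):
    """Pure-Python model of $pull with a plain value: drop every equal element."""
    return [value for value in values if value != unwanted]


episodes = ["Taxi", "Taxi 2", "Taxi 3", "Taxi 2"]
remaining = model_pull(episodes, "Taxi 2")  # both "Taxi 2" entries are removed

# Update documents as they would be passed to update_one()
pull_value = {"$pull": {"episodes": "Taxi 3"}}
# $pull also accepts a condition; this one targets embedded documents with item1 < 5
pull_condition = {"$pull": {"array": {"item1": {"$lt": 5}}}}
```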

In [95]:
collection.update_one({"title": "Taxi"}, {"$set":{"episodes": ["Taxi", "Taxi 2", "Taxi 3", "Taxi 4"] }})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 2', u'Taxi 3', u'Taxi 4'],
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

In [97]:
collection.update_one({"title": "Taxi"}, {"$pull":{"episodes": "Taxi 3"}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 2', u'Taxi 4'],
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

**Update operator `addToSet`:**

The `addToSet` operator adds a value to an array only if the value is not in the array already. If the value is in the array, `addToSet` returns without modifying the array. Otherwise, `addToSet` behaves the same as `push`.
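
The difference between the two operators is easiest to see side by side. The following sketch models both in plain Python; `model_push` and `model_add_to_set` are illustrative names and only imitate the documented behaviour.

```python
def model_push(values, new_value):
    """Model of $push: always append, duplicates allowed."""
    return values + [new_value]


def model_add_to_set(values, new_value):
    """Model of $addToSet: append only when the value is not present yet."""
    return list(values) if new_value in values else values + [new_value]


episodes = ["Taxi", "Taxi 2", "Taxi 4"]
pushed = model_push(episodes, "Taxi 2")           # "Taxi 2" now appears twice
unchanged = model_add_to_set(episodes, "Taxi 2")  # already present, nothing added
extended = model_add_to_set(episodes, "Taxi 3")   # new value is appended

# Several values can be added in one update with the $each modifier
add_many = {"$addToSet": {"episodes": {"$each": ["Taxi 3", "Taxi 5"]}}}
```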

In [98]:
collection.update_one({"title": "Taxi"}, {"$addToSet":{"episodes": "Taxi 2"}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 2', u'Taxi 4'],
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

In [99]:
collection.update_one({"title": "Taxi"}, {"$addToSet":{"episodes": "Taxi 3"}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 2', u'Taxi 4', u'Taxi 3'],
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

**Update operator `$`:**

The positional `$` operator identifies an element in an array field to update without explicitly specifying the position of the element in the array. When used with an update method such as `update_one()`, the positional operator acts as a placeholder for the first array element that matches the query selector.

In [100]:
# Update "Taxi 2" to "Taxi 5"
collection.update_one({"title": "Taxi", "episodes": "Taxi 2"}, {"$set":{"episodes.$": "Taxi 5"}})
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': u'e'},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 5', u'Taxi 4', u'Taxi 3'],
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}

In [101]:
# Use the positional $ operator to update the value of the "item2" field to zero 
# in the embedded document with the "item1" less than 9:
collection.update_one({"title": "Taxi", "array.item1": {"$lt": 9}}, {"$set":{"array.$.item2": 0}})
collection.find_one({"title": "Taxi"})
# As you can see, only the first matching element was updated

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'array': [{u'item1': 4, u'item2': 0},
  {u'item1': 4, u'item2': u'b'},
  {u'item1': 1, u'item2': u'e'},
  {u'item1': 6, u'item2': u'a'}],
 u'continuation': None,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 5', u'Taxi 4', u'Taxi 3'],
 u'language': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'},
  {u'born': 1968,
   u'country': u'France',
   u'name': u'Fr\xe9d\xe9ric Diefenthal'}],
 u'released': 1998,
 u'title': u'Taxi'}
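
Because `$` touches only the first match, updating every matching element needs a different operator. The sketch below assumes MongoDB 3.6 or later and PyMongo 3.6 or later, where the filtered positional operator `$[<identifier>]` and the `array_filters` argument are available. The helper names `zero_item2_below` and `model_filtered_update` and the identifier `elem` are illustrative choices, not part of the lesson.

```python
def zero_item2_below(collection, title, limit):
    """Set item2 to 0 in every embedded document of `array` whose item1 < limit."""
    return collection.update_one(
        {"title": title},
        {"$set": {"array.$[elem].item2": 0}},
        array_filters=[{"elem.item1": {"$lt": limit}}],
    )


def model_filtered_update(array, limit):
    """Pure-Python model of the update above, useful for checking the expected result."""
    return [
        dict(item, item2=0) if item["item1"] < limit else dict(item)
        for item in array
    ]


sample = [
    {"item1": 4, "item2": "e"},
    {"item1": 9, "item2": "b"},
    {"item1": 1, "item2": "e"},
]
updated = model_filtered_update(sample, 9)  # first and third elements change
```

Calling `zero_item2_below(collection, "Taxi", 9)` on the collection used in this lesson would apply the same change on the server.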

The `replace_one()` method replaces a single document. We can use it to restore the "Taxi" document to its initial form.

In [102]:
collection.replace_one(
    {"title": "Taxi"},
    {
        "title": "Taxi", 
        "released": 1998,
        "duration_min": 86,
        "country": "France",
        "lang": "French",
        "continuation": True,
        "persons": [
            {
                "name": "Samy Naceri",
                "born": 1961,
                "country": "France"
            }
        ],
        "episodes": ["Taxi", "Taxi 2", "Taxi 3", "Taxi 4"],
    }
)
collection.find_one({"title": "Taxi"})

{u'_id': ObjectId('5c507de664888406f3e1bce5'),
 u'continuation': True,
 u'country': u'France',
 u'duration_min': 86,
 u'episodes': [u'Taxi', u'Taxi 2', u'Taxi 3', u'Taxi 4'],
 u'lang': u'French',
 u'persons': [{u'born': 1961, u'country': u'France', u'name': u'Samy Naceri'}],
 u'released': 1998,
 u'title': u'Taxi'}

> ### Exercise 1.3:

> Using the operators above and Python syntax, find all movies in which Tom Hanks acted and add the list of these movie titles as a new field `"films"` of the `"persons"` entry that holds the Tom Hanks data.  

In [121]:
data = ''.join([
        'x\x9c\xbdT\xdb\x8e\xda@\x0c\xfd\x95y\xe3e\x85\xb8\x84[\xdf\xb6\xdc\x96\x16P\x05\xf4\xa5+\x84\x9c\xc4\x90\x11\x89',
        "\'rf*P\xb5\xff\xdedhE\x87\xcb\x16\x84\xda7\xcb\x9e\xf8\xd8\xe7\x1c\xe7\xf5\x87)\xc5@\x9b\xd2\x07aJ}\xda\xc42\x8bJOy",
        '\xac\xa5\x8e\xd1f\x07\x8a\x193-\x86&Im)P\x864\xefm\xf1\xeb\xfc\xd9\xe6R\xe4LQ\x96\xe7^\xf3\x8e\xbeb\xca\xc3j\xa7\xd1,',
        '\x8a\xac\xae\xb5"H\x0e\x95\x85J\xc4\x0b\xd06\xb3i\xc6\x18\xb4TdK\xcf\xddE\xbf\xb7\x1aM/b\xbf=\t\x07\xaf\xe1\xe0\x8d%',
        '\x1a\x8d\x04\xa4E\x0fH,`\x1f+v\x81\x87\xc0{1\x97$3|\x10\xbav\xf6yo4\xeb\x17\x1d\\\xc4\x99\xf2\x91\xb5\xf8\x86\t\x06[',
        '\x99]\xee\xbd,\xb2\xa1a\xdbl\x95H\x8b\xe1\xfd\x86@\xc80\xb4\xa8\x1d\xef0\xc6_4\\D(\x86\x8cHb"c\xbc\xaa\xa2\xafv+\xb5^',
        '\xcb\x00W\x93P\xc5y\xad\xd6\xa9\x94[\xe7\xa8\x9d\xdb5\xff\x02&\x16\xfdp\x83\x81J\xfc\xff!\xfaG\x93\xb3\xfb\x02I\x92',
        '\x8f\x97H\x1d\xfd+\xbd[\x0e\xea\'\x15\x91\xe8\xe6\xe4\xe1\xde\x05\x9c\xc8 \x02\x8cE7\x06\xde\xa2\xe8\x19\n\x80\x1e\x84',
        '\xee\xdch\xb5\x01\xe7\xec\xe6\xd6g\xf0\x15\xe9;\x9c\xd6n\xdfd\xab\x11e\x98\xda\x19nwT\xbb\xd6(7N\x1cU\xabT+\xef8\xaa',
        '\xe59[\x8dQ\x11p\xa8DOv!e\xa9\xee9\xa1\xdb\x16+\xeee\x02\x9a\xe5\xee\x8e\xcd\xbcf\xfd\xb0\xd9\x955\x9a\x9e\xe3\x98)*W',
        '\xad\xcf\x08d\xc4\x0c\xf1;>x\x11\xcd\xaa\x834Q\x9cFh2\x17n\x0c\x86\x91\x02\x14\x83|\x7f\xdf0\xdd}\x10\x96\xe1\x14X\x17',
        '\xab\xd6\x0fOHK2\xc7\x06j-\xf2\xcf\xf8\xd7\xb1\x9d\xaa\xd1\xaa\x94+\x17~.\'\x12\r\x8a1\xcf\x14\x82\x9d<\x9b\xaa\xf0{p',
        '\x80\xc2Tf*D+\xc2\x1f\xaf\x8b@\xd4\x8ea\xfd\x18z\xa5\xe5\xf9,\xed\xf7\xf4\xac:|\xce!\xd9\x8b)\x04\xc8W\x07;0vB\xd2\x82',
        '\r^ \xa7\xdd|[\xfe\x04\xcd\x0ed,'
    ])
data = eval(data.decode('zip'))
db.drop_collection('movies')
collection = db.movies
collection.insert_many(data)

<pymongo.results.InsertManyResult at 0x7f6bd45b9e60>

In [129]:
# type your code here
actor = collection.find({"person":{"name":"Tom Hanks"}})
list1 = actor
list1 = list(collection.aggregate([
            {"$match": {"persons.name": "Tom Hanks"}}
        ]))
#collection = db.persons
actor = collection.find({"name":"Tom Hanks"})
collection.update_one({"persons.name":"Tom Hanks"}, {"$set":{"films":list1}})
#collection = db.movies
print actor

<pymongo.cursor.Cursor object at 0x7f6bd4657450>


In [127]:
results = []
for i in collection.find():
    del(i['_id'])
    results.append(i)
    
Test.assertEqualsHashed(results, '0c7cd8639a01f99485544fc3fb0a5909cccc61a5','Incorrect query', "Exercise 1.3 is successful")

1 test failed. Incorrect query
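
One possible way to approach the exercise is sketched below. It is a hedged suggestion that has not been checked against the hashed test: it assumes that acting roles are marked with `"relation": "ACTED_IN"` inside the `persons` array of this dataset, and the function name `add_films_field` is hypothetical.

```python
def add_films_field(collection, name):
    """Store the titles of all movies `name` acted in on that person's embedded document."""
    acted_in = {"persons": {"$elemMatch": {"name": name, "relation": "ACTED_IN"}}}
    # Collect the titles first, fetching only the field that is needed
    titles = [movie["title"] for movie in collection.find(acted_in, {"title": 1})]
    # "persons.$" points at the array element matched by $elemMatch in each movie
    collection.update_many(acted_in, {"$set": {"persons.$.films": titles}})
    return titles
```

The key difference from writing `"films"` at the top level is the path `persons.$.films`, which places the list inside the matching person entry of every movie the query selects.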


### Aggregation:

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. 

`aggregate()` method performs an aggregation using the aggregation framework on this collection. The `aggregate()` method accepts as its argument an array of stages, where each stage, processed sequentially, describes a data processing step.

The basic pipeline stages are listed below; the examples that follow apply them to the sample data created earlier. Map-Reduce is not covered in this lesson.

* `$match`: similar to the `find_one()` and `find()` methods and to SQL's `WHERE` clause; it filters the documents that are passed on to the next stage. A pipeline may contain several `$match` stages.
* `$unwind`: deconstructs an array field. An array keeps related data pre-joined inside one document, and `$unwind` undoes this by emitting one document per array element.
* `$group`: similar to SQL's `GROUP BY` clause and comparable to the legacy `group()` method. The full list of group accumulator operators can be found [here](https://docs.mongodb.org/manual/reference/operator/aggregation/group/).
* `$skip`: skips the given number of documents; equivalent to the `skip()` method mentioned above.
* `$limit`: passes on only the given number of documents, starting from the current position; equivalent to the `limit()` method.
* `$sort`: sorts the documents; equivalent to the `sort()` method.
* `$project`: selects specific fields from the documents and can add computed ones.
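
The stages are usually chained, with each stage consuming the output of the previous one. The following sketch builds such a pipeline as plain Python data and adds a small pure-Python model of `$unwind`. The names `person_movie_counts_pipeline` and `model_unwind` are hypothetical, and the field names `persons`, `name` and `title` are assumed to match the movie documents used in this lesson.

```python
def person_movie_counts_pipeline(min_movies=1, top=5):
    """Build a pipeline that counts, per person, the movies they appear in."""
    return [
        {"$unwind": "$persons"},                        # one document per (movie, person)
        {"$group": {"_id": "$persons.name",             # group by the person's name
                    "movies": {"$sum": 1},              # count the grouped documents
                    "titles": {"$push": "$title"}}},    # collect the movie titles
        {"$match": {"movies": {"$gte": min_movies}}},   # filter on the grouped result
        {"$sort": {"movies": -1}},                      # most movies first
        {"$limit": top},
    ]


def model_unwind(documents, field):
    """Pure-Python model of $unwind: one output document per array element."""
    return [dict(doc, **{field: element}) for doc in documents for element in doc[field]]


movies = [
    {"title": "A", "persons": [{"name": "x"}, {"name": "y"}]},
    {"title": "B", "persons": []},
]
unwound = model_unwind(movies, "persons")  # "B" disappears: it has no persons
```

Against a live collection the pipeline would be run as `list(collection.aggregate(person_movie_counts_pipeline(min_movies=2)))`.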

In [122]:
list(collection.aggregate([{"$match": {"title": "Taxi"}}]))
# Returns the same document as collection.find_one({"title": "Taxi"}), wrapped in a list

[{u'_id': ObjectId('5c5090d364888406f3e1bcf0'),
  u'continuation': True,
  u'country': u'France',
  u'duration_min': 86,
  u'episodes': [u'Taxi', u'Taxi 2', u'Taxi 3', u'Taxi 4'],
  u'lang': u'French',
  u'persons': [{u'born': 1961,
    u'country': u'France',
    u'name': u'Samy Naceri'}],
  u'released': 1998,
  u'title': u'Taxi'}]

In [123]:
# Find all movies in English in which Tom Hanks acted
list(collection.aggregate([
            {"$match": {"lang": "English"}},
            {"$match": {"persons.name": "Tom Hanks"}}
        ]))

[{u'_id': ObjectId('5c5090d364888406f3e1bcec'),
  u'country': u'USA',
  u'duration_min': 142,
  u'lang': u'English',
  u'persons': [{u'born': 1956,
    u'country': u'USA',
    u'name': u'Tom Hanks',
    u'relation': u'ACTED_IN',
    u'role': u'Forrest Gump'},
   {u'born': 1955,
    u'country': u'USA',
    u'name': u'Gary Sinise',
    u'relation': u'ACTED_IN',
    u'role': u'Lieutenant Dan Taylor'},
   {u'born': 1952,
    u'country': u'USA',
    u'name': u'Robert Zemeckis',
    u'relation': u'DIRECTED'}],
  u'released': 1994,
  u'title': u'Forrest Gump'},
 {u'_id': ObjectId('5c5090d364888406f3e1bced'),
  u'box_office_Mdol': 290.7,
  u'country': u'USA',
  u'duration_min': 188,
  u'lang': u'English',
  u'persons': [{u'born': 1956,
    u'country': u'USA',
    u'name': u'Tom Hanks',
    u'relation': u'ACTED_IN',
    u'role': u'Paul Edgecomb'},
   {u'born': 1955,
    u'country': u'USA',
    u'name': u'Gary Sinise',
    u'relation': u'ACTED_IN',
    u'role': u'Burt Hammersmith'},
   {u'born':

In [124]:
# Count the movies per language using $group
list(collection.aggregate([
            {"$group": {"_id": "$lang", "count": {"$sum": 1}, "titles": {"$push": "$title"}}}
        ]))

[{u'_id': u'French', u'count': 1, u'titles': [u'Taxi']},
 {u'_id': u'English',
  u'count': 4,
  u'titles': [u'Forrest Gump',
   u'The Green Mile',
   u'Inseption',
   u'The Matrix']}]

In [125]:
# Include only "title", "released" and "name" of "persons", then sort by release year and number of persons
list(collection.aggregate([
            {"$project": {"title": 1, "persons.name": 1, "released": 1, "size": {"$size": "$persons"}}},
            {"$sort": {"released": -1,   # descending order
                       "size": 1}},      # ascending order
            {"$limit": 4}
        ]))

[{u'_id': ObjectId('5c5090d364888406f3e1bcee'),
  u'persons': [{u'name': u'Leonardo DiCaprio'}],
  u'released': 2010,
  u'size': 1,
  u'title': u'Inseption'},
 {u'_id': ObjectId('5c5090d364888406f3e1bcef'),
  u'persons': [{u'name': u'Keanu Reeves'}, {u'name': u'Laurence Fishburne'}],
  u'released': 1999,
  u'size': 2,
  u'title': u'The Matrix'},
 {u'_id': ObjectId('5c5090d364888406f3e1bced'),
  u'persons': [{u'name': u'Tom Hanks'},
   {u'name': u'Gary Sinise'},
   {u'name': u'Michael Clarke Duncan'},
   {u'name': u'Frank Darabont'}],
  u'released': 1999,
  u'size': 4,
  u'title': u'The Green Mile'},
 {u'_id': ObjectId('5c5090d364888406f3e1bcf0'),
  u'persons': [{u'name': u'Samy Naceri'}],
  u'released': 1998,
  u'size': 1,
  u'title': u'Taxi'}]

> ### Exercise 1.4:

> Calculate the average birth year of all persons, only for those movies that list at least two persons. Display the results in descending order of the movie's release year. Write the result to a Python list `results` of dictionaries with the keys `"title"`, `"released"` and `"born_avg"`.  

In [133]:
data = ''.join([
        'x\x9c\xbdUM\x8f\xda0\x10\xfd+\xbe\xe5\xb2B\x10\xc2Wo[\xbe\x96\x16P\xb5\xd0KW\x08\x99d \x16\xc98\x9a\xd8\x15h\xc5\x7f',
        '\xafc\x104!T\xa0\xd5\xf6\xe6\x8cg\xe6\xcd\x9by\xe3\xbc\xbdk\'\xe2\xb8q\xbe0\xed\xf4q\x13\x894t\x9e\xcc9\x01J%\xa6\xc6',
        '\xfef\\\x90\xc7`]\xe62f/\x1c\xb7\xa9u\xf2\xa5FE{{\xf3s\xf6lm+Ih\x0c\xb5N\xa3\x99}\x92\x8c\x8e\x91\x03I\x04\xa9bC\x1d',
        "\'\xd6q-\xa2\xd8\xe6/\xb9\x9b\x87\xc0\x86\x04" + '\x80l"L\xfc\xc2f\x82\x88+!\xd1f{\xee\xce\xfb\xbd\xe5h\xea\x1c\x9e',
        '\xd8\xfb\xdf\xa0\x8d\x1c\xe8X\x80V\x80\x1c\x15\xebqds\xbe\x8f$Y\x843\xa3!\xa7=\x9b\t\x14)8\xb7a\xca\xd8\x16\xa1\xdd',
        '\xab\xf0\xde\xe8\xb5\x9fe\xc8#\xbe\xca\x15\x90b\xbf \x06\x7f+\xca;yX\xdc\xea\xaf\x12\xeaVG\r6\xf0\x14\x02[N\xc7\xcbL',
        '\x81&[\xce2\x16\xb6J\xcf=V]:\xf3K\xeaB\xffo\xcfz\xb7\x94\xeb\xb5\xf0a9\tdd\xee\xdcN\xb5\xd2\xba\xae\xa5\xf3)\x92\xfa',
        '\xc1u\xc4\xfa\xc1\x06|\x19\xaf\xfe\x9b\xa6\xbej3\xbc\x17\x1e\xc7\x86N,T\xf8Yrj\xe5P\xbf\xc9\x10Y\xd74\x1b\xf6y\xc0\x89',
        '\xf0C\x0e\x11\xebF\x9c\xb6\xc0z\x1a}\x8e\x1f\x84\xee\xdc\xa9\xe4\x01\x99\xb9\x99\xcd"\xbe\x92\xa8\xfe!\xe4\xa2\x0c\xdb',
        '\xed\xbbd8\xc2\x14\x12[\xc3\xfd\nl\xbb\x8dJ\xa3\xa0@\xb7Z\xab^+\xf0\xcc\xb7\xe5\xe5X\x8dA"\xa7@\xb2\x9e\xe8\xf2\x84',
        '\x84|\x80\x98w\x1f\xb1L\x8b\x13\xaeH\xec\x1e`\xe65\xebGf7h4\xbd\x9cb\xa6 \xf3\xd3\xfa\x0e\x1c5{\x05\xf8\r\xe9\xc7\x04',
        '\xd2\xac\xe5\x90&\x92\x92\x10t\x9a\x87\x1bsM\x80>\xb0\x81\xe1\xbf\xd2\x84\x0f/\x84\xedp\xc2IeT\xebG\x17T\x02\xf5%\x81',
        '\\3\x13F\xa7e+N\xa3U\xadTK\x1e\xa3\xc2\x88\x06Y\x99e\x7f\xbd\x02\xdf3\xb3\x19\x8f\xf7l\xca} qUx\xb6\x12>\x94\xbc\xdf',
        '\xa7\x8b\x82\x0e\xf8N\x94<\xdd\xed\xcc\x04\x89He\x00\xa7\xe7\xec\xec\x99\x1d\x98{9\xd6/G\xcfY\x944iN\x1aJ\x9a\xd3n',
        '\x1e\x16\x7f\x00\xc8}\x836'
    ])
data = eval(data.decode('zip'))
db.drop_collection('movies')
collection = db.movies
collection.insert_many(data)

<pymongo.results.InsertManyResult at 0x7f6bd45b9f50>

In [None]:
# results = ...

In [None]:
#TODO check this answer hash
Test.assertEqualsHashed(results, 'fec785dfaa874101ef918e84466e309a28d3dfa4','Incorrect query', "Exercise 1.4 is successful")

### Some helpful commands:

1\) If you need to get some statistics about your databases.

In [130]:
db.command({'dbstats': 1})

{u'avgObjSize': 1527.111111111111,
 u'collections': 3,
 u'dataFileVersion': {u'major': 4, u'minor': 5},
 u'dataSize': 13744,
 u'db': u'cinema',
 u'extentFreeList': {u'num': 0, u'totalSize': 0},
 u'fileSize': 67108864,
 u'indexSize': 8176,
 u'indexes': 1,
 u'nsSizeMB': 16,
 u'numExtents': 4,
 u'objects': 9,
 u'ok': 1.0,
 u'storageSize': 90112}

2\) To get collection statistics use the collstats command:

In [131]:
db.command({'collstats': 'movies'})

{u'avgObjSize': 2697,
 u'count': 5,
 u'indexSizes': {u'_id_': 8176},
 u'lastExtentSize': 65536,
 u'nindexes': 1,
 u'ns': u'cinema.movies',
 u'numExtents': 2,
 u'ok': 1.0,
 u'paddingFactor': 1.004,
 u'size': 13488,
 u'storageSize': 73728,
 u'systemFlags': 1,
 u'totalIndexSize': 8176,
 u'userFlags': 1}

3\) To copy a database within a single mongod process, or between mongod servers, simply connect to the target mongod and use the `command()` method:

    client.admin.command('copydb',
                         fromdb='source_db_name',
                         todb='target_db_name')

4\) To copy from a different mongod server that is not password-protected:

    client.admin.command('copydb',
                         fromdb='source_db_name',
                         todb='target_db_name',
                         fromhost='source.example.com')
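
Note that the `copydb` command was deprecated in MongoDB 4.0 and removed in MongoDB 4.2, so the calls above only work against older servers. On newer servers a database can be copied collection by collection from Python. The sketch below is one hedged way to do that, assuming PyMongo 3.6 or later for `list_collection_names()`; the function name `copy_database` and the `batch_size` parameter are illustrative, and indexes are not copied.

```python
def copy_database(client, source_name, target_name, batch_size=1000):
    """Copy the documents of every collection from one database to another."""
    source, target = client[source_name], client[target_name]
    copied = {}
    for name in source.list_collection_names():
        if name.startswith("system."):
            continue  # skip internal collections
        batch, total = [], 0
        for document in source[name].find():
            batch.append(document)
            if len(batch) >= batch_size:
                # write in batches so large collections do not have to fit in memory
                target[name].insert_many(batch)
                total += len(batch)
                batch = []
        if batch:
            target[name].insert_many(batch)
            total += len(batch)
        copied[name] = total
    return copied
```

A call such as `copy_database(client, "cinema", "cinema_copy")` returns a dictionary mapping each collection name to the number of documents copied.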

5\) Indexes support the efficient resolution of queries. Without indexes, MongoDB must scan every document of a collection to select those documents that match the query statement. Such a scan is highly inefficient and requires MongoDB to process a large volume of data.

Indexes are special data structures that store a small portion of the data set in an easy-to-traverse form. An index stores the value of a specific field or set of fields, ordered by the value of the field as specified in the index.

Use the `create_index()` method to create an index on a collection. MongoDB automatically creates an index on the `_id` field when a collection is created.

In [134]:
collection.create_index([("films", pymongo.ASCENDING)])

u'films_1'
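
The returned string is the index name that MongoDB generates by default from the indexed fields and their directions. The sketch below shows a single-field and a compound index and models that naming rule. The helper names `ensure_movie_indexes` and `default_index_name` are hypothetical, and the choice of fields (`persons.name`, `released`, `title`) is only an example based on the movie documents in this lesson.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def ensure_movie_indexes(collection):
    """Create a single-field and a compound index and return the generated names."""
    by_person = collection.create_index([("persons.name", ASCENDING)])
    by_year_title = collection.create_index(
        [("released", DESCENDING), ("title", ASCENDING)]
    )
    return [by_person, by_year_title]


def default_index_name(keys):
    """Model of the default index name: each field and direction joined by underscores."""
    return "_".join("%s_%s" % (field, direction) for field, direction in keys)


films_name = default_index_name([("films", ASCENDING)])
compound_name = default_index_name([("released", DESCENDING), ("title", ASCENDING)])
```

After creating indexes, `collection.index_information()` lists them, and calling `explain()` on a `find()` cursor shows whether a query uses one.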

6\) `mongodump` is a utility for creating a binary export of the contents of a database. 

To dump your database for backup, run the following command in your terminal

    mongodump --db <database_name> --collection <collection_name>

This command dumps the given collection as BSON data files together with JSON metadata files. To import a backup into MongoDB, use the following command in your terminal

    mongorestore --db <database_name> <path_to_bson_file>

You can also back up one collection and compress the backup on the fly with gzip:

    mongodump --db <database_name> --collection <collection_name> --out - | gzip > <dump_name>.gz

or with a date in the file name:

    mongodump --db <database_name> --collection <collection_name> --out - | gzip > dump_`date "+%Y-%m-%d"`.gz
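
The same dated backup can be driven from Python. The sketch below only builds the argument list; it assumes a `mongodump` version that supports the `--gzip` and `--archive` options (3.2 or later), which replace the shell pipe through `gzip`. The function name `mongodump_command` and the `dump_<date>.gz` file name pattern are illustrative.

```python
import datetime


def mongodump_command(database, collection, when=None):
    """Build the mongodump argument list for a compressed, dated archive."""
    when = when or datetime.date.today()
    archive = "dump_%s.gz" % when.strftime("%Y-%m-%d")
    return [
        "mongodump",
        "--db", database,
        "--collection", collection,
        "--gzip",                     # compress the output
        "--archive=%s" % archive,     # write a single archive file
    ]


command = mongodump_command("cinema", "movies", datetime.date(2019, 1, 30))
```

The list can be executed with `subprocess.check_call(command)`; an archive made this way is restored with `mongorestore --gzip --archive=<file>`.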

> ### Exercise 1.5:

> Based on the material learned in the lesson devoted to the Twitter API you need to do the following tasks: 

> **1\.** Create a new MongoDB database "twitter\_db" with two new collections "tweets" and "users". The collection "tweets" will contain data about the `N = 2500` most recent tweets with the hashtags _"BigData"_ and _"DataScience"_. The collection "users" will contain data about the users who created those tweets, i.e. the authors. 

> **2\.** Using `requests` URL queries or `tweepy`'s `search()` method, collect the `N` most recent tweets with the hashtags _"BigData"_ and _"DataScience"_. This can be done as follows:

>    `tweets = []`<br></br>
>    `last_id = 0`<br></br>

>    `while len(tweets) < N:`<br></br>
>    <span style="margin-left:2em"></span>`response = api.search(q=['BigData', 'DataScience'], since_id=last_id, count=100)`<br></br>
>    <span style="margin-left:2em"></span>`last_id = str(response[-1].id)`<br></br>
>    <span style="margin-left:2em"></span>`tweets.extend(response)`

> where `api = tweepy.API(auth)` as written in the previous lesson **Lesson 8.1 - Work with Twitter API in Python.ipynb**. If the `tweets` list contains more than `N` records, take the first `N` of them. Try to get the same result using the `requests` library.

> **3\.** Collection "tweets" should contain the following fields from available tweet fields:

>    * `'created_at'`;
>    * `'author_id'` (corresponds to `author.id`);
>    * `'author_name'` (corresponds to `author.name`);
>    * `'retweet_count'`;
>    * `'id'`;
>    * `'lang'`;
>    * `'source'`;
>    * `'text'`.

> Necessary fields in the collection "users":

>    * `'created_at'`;
>    * `'id'` (means user's ID);
>    * `'name'`;
>    * `'description'`;
>    * `'followers_count'`;
>    * `'friends_count'`;
>    * `'lang'`;
>    * `'profile_image_url'`;
>    * `'location'`;
>    * `'time_zone'`;
>    * `'tweets'` (is an array of tweet IDs from the "tweets" collection).

> In this step you need to fill both collections with the respective data. Note that one user may have created more than one tweet in the "tweets" collection; the IDs of all these tweets should be written to the `'tweets'` field of the respective user.

> **4\.** Create a new collection `"bigdata_tweets_date1_date2"` with tweets that contain only the "#BigData" hashtag, are written in English, were not retweeted and were created during the last hour. `date1` is the full date, in the format `'%Y_%m_%d_%H_%M_%S'`, of the first created tweet and `date2` is the full date of the last created tweet in the resulting `"bigdata_tweets_date1_date2"` collection. <br></br>
> _**Hint:**_ The value of the `'created_at'` field is a `datetime.datetime` object. 

> **5\.** Find the top 5 tweets (from the "tweets" collection) with the largest number of retweets for each language. If several tweets have the same number of retweets, sort them by `"author_name"` in descending order. Display their text, author name, date of creation and number of retweets. Put the result into the Python list `result_5` containing dictionaries with the following structure 

> <span style="margin-left: 30px"></span>`{'language': `<br></br>
> <span style="margin-left: 100px"></span>`[`<br></br>
> <span style="margin-left: 110px"></span>`{`<br></br>
> <span style="margin-left: 115px"></span>`'author_name': ...,`<br></br>
> <span style="margin-left: 115px"></span>`'created_at': ...,`<br></br>
> <span style="margin-left: 115px"></span>`'retweet_count': ...,`<br></br>
> <span style="margin-left: 115px"></span>`'text': ....`<br></br>
> <span style="margin-left: 115px"></span>`}, `<br></br>
> <span style="margin-left: 110px"></span>`...`<br></br>
> <span style="margin-left: 105px"></span>`]`<br></br>
> <span style="margin-left: 35px"></span>`}`

> for each unique language.

> **6\.** For each timezone (when it is defined, i.e. is not `None`) find the user with the maximal average of the friends and followers counts among the users who specified "en", "es" or "fr" in the field `"lang"`. Put the result into the Python list `result_6` containing dictionaries with the following structure

> <span style="margin-left: 30px"></span>`{'timezone name':`<br></br>
> <span style="margin-left: 160px"></span>`{`<br></br>
> <span style="margin-left: 165px"></span>`'name': ...,`<br></br>
> <span style="margin-left: 165px"></span>`'profile_image_url': ...,`<br></br>
> <span style="margin-left: 165px"></span>`'tweets': `<br></br>
> <span style="margin-left: 235px"></span>`[`<br></br>
> <span style="margin-left: 245px"></span>`{`<br></br>
> <span style="margin-left: 250px"></span>`'created_at': ...,`<br></br>
> <span style="margin-left: 250px"></span>`'text': ...`<br></br>
> <span style="margin-left: 250px"></span>`},`<br></br>
> <span style="margin-left: 245px"></span>`...`<br></br>
> <span style="margin-left: 240px"></span>`]`<br></br>
> <span style="margin-left: 165px"></span>`},`<br></br>
> <span style="margin-left: 160px"></span>`...`<br></br>
> <span style="margin-left: 35px"></span>`}`

> for each unique timezone.

> Display the user's name, avatar and the list of tweets (text and date of creation of each tweet) from the "tweets" collection. <br></br>
> _**Hint:**_ You may display image by url in the following way:

> `In [1]: from IPython.display import HTML`<br></br>
> <span style="margin-left:4.5em"></span>`bg = api.get_user("BillGates")`<br></br>
> <span style="margin-left:4.5em"></span>`print bg.name`<br></br>
> <span style="margin-left:4.5em"></span>`HTML('<img src="' + bg.profile_banner_url + '" width="700">')`<br></br>
> <span style="margin-left:4.5em"></span>`Bill Gates`<br></br>
> `Out [1]:`
> <img src="images/bill_gates.jpg">

### Step 1

In [23]:
import tweepy

# type your code here
# Placeholders only: load the real credentials from a private source and never publish them
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
# Authorization to consumer key and consumer secret 
auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
# Access to user's access key and access secret 
auth.set_access_token(access_token, access_token_secret) 
# Calling api 
api = tweepy.API(auth)
print api
# If the authentication was successful, you should see the name of the account print out
print "My name is", api.me().name

<tweepy.api.API object at 0x7fc30561ac10>
My name is Dmitriy Kisil


In [26]:
client = MongoClient()
twitter_db = client["twitter_db"]
print client.database_names()
tweets = twitter_db.tweets
print tweets
users = twitter_db.users
print users
print twitter_db.collection_names()

[u'local', u'cinema', u'admin', u'twitter_db']
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'twitter_db'), u'tweets')
Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'twitter_db'), u'users')
[]


### Step 2

In [89]:
# type your code here

tweets_list = []
list1 = []
last_id = 0
while len(tweets_list) < 100:
    response = api.search(q=['BigData', 'DataScience'], count=1)
    #print response.id
    #last_id = str(response[-1].id)
    tweets_list.extend(response)
#print api.rate_limit_status()
print len(tweets_list)
print tweets_list[0].__dict__.keys()
print tweets_list[0].entities.keys()
#print tweets[0].user.keys()
#print tweets[0].author.keys()
print tweets_list[0].user.__dict__.keys()
print tweets_list[0].author.__dict__.keys()
print tweets_list[0].author.id
print tweets_list[0].author.name
print tweets_list[0].user.id
print tweets_list[0].user.name

for i in range(len(tweets_list)):
    list1.append(tweets_list[i].author.name)

print list1[0]

100
['contributors', 'truncated', 'text', 'is_quote_status', 'in_reply_to_status_id', 'id', 'favorite_count', '_api', 'author', '_json', 'coordinates', 'entities', 'in_reply_to_screen_name', 'id_str', 'retweet_count', 'in_reply_to_user_id', 'favorited', 'retweeted_status', 'source_url', 'user', 'geo', 'in_reply_to_user_id_str', 'lang', 'created_at', 'in_reply_to_status_id_str', 'place', 'source', 'retweeted', 'metadata']
[u'symbols', u'user_mentions', u'hashtags', u'urls']
['follow_request_sent', 'has_extended_profile', 'profile_use_background_image', '_json', 'time_zone', 'id', 'description', '_api', 'verified', 'profile_text_color', 'profile_image_url_https', 'profile_sidebar_fill_color', 'is_translator', 'geo_enabled', 'entities', 'followers_count', 'protected', 'id_str', 'default_profile_image', 'listed_count', 'lang', 'utc_offset', 'statuses_count', 'profile_background_color', 'friends_count', 'profile_link_color', 'profile_image_url', 'notifications', 'default_profile', 'profile_
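
Repeating the same `api.search` call returns the same tweets each time, so collecting `N` distinct tweets requires paging backwards through the results. The sketch below is a minimal version of that idea under two assumptions: the search function accepts a `max_id` argument (as the Twitter standard search API does) and returns tweets newest first. The function name `collect_tweets` and its `search` parameter are hypothetical; passing the search function in keeps the helper independent of a live API connection.

```python
def collect_tweets(search, query, n, page_size=100):
    """Collect up to `n` distinct tweets by paging towards older tweet ids."""
    tweets = []
    max_id = None
    while len(tweets) < n:
        kwargs = {"q": query, "count": page_size}
        if max_id is not None:
            kwargs["max_id"] = max_id   # only tweets with id <= max_id
        page = search(**kwargs)
        if not page:
            break                       # no older tweets are available
        tweets.extend(page)
        max_id = page[-1].id - 1        # continue just below the oldest tweet seen
    return tweets[:n]
```

With the `api` object from Step 1 the call would look like `collect_tweets(api.search, "#BigData OR #DataScience", 2500)`; the query string here is only an example.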

In [90]:
tweets = twitter_db.tweets

users = twitter_db.users

for i in range(len(tweets_list)):
    tweets.insert_one({"created_at": tweets_list[i].created_at, "author_id": tweets_list[i].author.id, 
                       "author_name": list1[i],"retweet_count": tweets_list[i].retweet_count,
                       "id": tweets_list[i].id, "lang": tweets_list[i].lang, "source": tweets_list[i].source,
                       "text": tweets_list[i].text})
    users.insert_one({"created_at": tweets_list[i].user.created_at, "id": tweets_list[i].user.id, "name": tweets_list[i].user.name,
                      "description": tweets_list[i].user.description, "followers_count":tweets_list[i].user.followers_count,
                      "friends_count": tweets_list[i].user.friends_count, "lang": tweets_list[i].user.lang, 
                      "profile_image_url": tweets_list[i].user.profile_image_url, "location": tweets_list[i].user.location,
                      "time_zone": tweets_list[i].user.time_zone, "tweets":tweets_list[i].id})
    


In [91]:
print tweets.find_one()
print users.find_one()
twitter_db.collection_names()

{u'lang': u'en', u'retweet_count': 562, u'text': u'RT @KirkDBorne: One of my all-time favorites &gt;&gt; The Most Complete List of the Best Cheat Sheets for #DataScientists covering #AI #NeuralNet\u2026', u'created_at': datetime.datetime(2019, 1, 30, 14, 2, 37), u'author_name': u'Megg', u'source': u'Twitter Web App', u'author_id': 3075551314L, u'_id': ObjectId('5c51aef4648884071317efbd'), u'id': 1090611241402490880L}
{u'lang': u'en', u'description': u'Sue\xf1a, Siente, Mira', u'friends_count': 8, u'created_at': datetime.datetime(2015, 3, 6, 18, 52, 47), u'time_zone': None, u'profile_image_url': u'http://pbs.twimg.com/profile_images/573922103767994368/CBXjlh4a_normal.jpeg', u'followers_count': 226, u'location': u'', u'tweets': 1090611241402490880L, u'_id': ObjectId('5c51aef4648884071317efbe'), u'id': 3075551314L, u'name': u'Megg'}


[u'system.indexes', u'tweets', u'users']
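
The `users` document printed above stores a single tweet id in `tweets`, and one such document is inserted per tweet, whereas the exercise asks for one document per user with an array of tweet ids. A hedged alternative is to upsert the user and collect the ids with `$addToSet`. In the sketch below the function name `store_tweet` and the two field tuples are illustrative; the attribute names are the ones already used in the cell above.

```python
USER_FIELDS = ("created_at", "name", "description", "followers_count", "friends_count",
               "lang", "profile_image_url", "location", "time_zone")
TWEET_FIELDS = ("created_at", "retweet_count", "id", "lang", "source", "text")


def store_tweet(tweets, users, tweet):
    """Upsert one tweet and attach its id to the author's single `users` document."""
    author = tweet.author
    tweet_doc = {field: getattr(tweet, field) for field in TWEET_FIELDS}
    tweet_doc.update(author_id=author.id, author_name=author.name)
    # Upsert keyed on the tweet id, so re-running the cell does not duplicate tweets
    tweets.update_one({"id": tweet.id}, {"$set": tweet_doc}, upsert=True)

    profile = {field: getattr(author, field) for field in USER_FIELDS}
    users.update_one(
        {"id": author.id},                                      # one document per user
        {"$set": profile, "$addToSet": {"tweets": tweet.id}},   # ids accumulate in an array
        upsert=True,
    )
```

The loop over the fetched tweets then reduces to `for tweet in tweets_list: store_tweet(tweets, users, tweet)`.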

In [88]:
twitter_db.drop_collection("tweets")
twitter_db.drop_collection("users")
tweets = twitter_db["tweets"]
users = twitter_db["users"]


### Step 3

In [86]:
# type your code here
print api.rate_limit_status()

{u'rate_limit_context': {u'access_token': u'<ACCESS_TOKEN>'}, u'resources': {u'feedback': {u'/feedback/show/:id': {u'reset': 1548857129, u'limit': 180, u'remaining': 180}, u'/feedback/events': {u'reset': 1548857129, u'limit': 1000, u'remaining': 1000}}, u'moments': {u'/moments/statuses/update': {u'reset': 1548857129, u'limit': 5, u'remaining': 5}, u'/moments/permissions': {u'reset': 1548857129, u'limit': 300, u'remaining': 300}}, u'oauth': {u'/oauth/invalidate_token': {u'reset': 1548857129, u'limit': 450, u'remaining': 450}}, u'tweet_prompts': {u'/tweet_prompts/show': {u'reset': 1548857129, u'limit': 180, u'remaining': 180}, u'/tweet_prompts/report_interaction': {u'reset': 1548857129, u'limit': 180, u'remaining': 180}}, u'live_pipeline': {u'/live_pipeline/events': {u'reset': 1548857129, u'limit': 180, u'remaining': 180}}, u'friendships': {u'/friendships/outgoing': {u'reset': 1548857129, u'limit': 15, u'remaining': 15}, u'/friendships/no_retweets/ids'

In [92]:
Test.existCollections(client, '"users" and/or "tweets" collections were not found', "Exercise 1.5.1 is successful")
Test.countRecord('id', client, 'Incorrect records amount in "twitter" collection', "Exercise 1.5.2 is successful")
Test.existField('id', client, 'Incorrect data in MongoDB', "Exercise 1.5.3 is successful")

1 test failed. "users" and/or "tweets" collections were not found
1 test failed. Incorrect records amount in "twitter" collection
1 test passed. Exercise 1.5.3 is successful


### Step 4

In [None]:
# type your code here

In [None]:
Test.bigDataTweets(client, 'Incorrect data in MongoDB', "Exercise 1.5.4 is successful")

### Step 5

In [None]:
# type your code here

# result_5 = ...

In [None]:
Test.top5Tweets(result_5, client, 'Incorrect data in MongoDB', "Exercise 1.5.5 is successful")

### Step 6

In [None]:
# type your code here

# result_6 = ...

In [None]:
Test.timeZoneTweets(result_6, client, 'Incorrect data in MongoDB', "Exercise 1.5.6 is successful")

<center><h3>Presented by <a target="_blank" rel="noopener noreferrer nofollow" href="http://datascience-school.com">datascience-school.com</a></h3></center>