# MongoDB Examples
## Map reduce
Here is an example of how to use map-reduce to count how many tweets are original ("unique") and how many are retweets. The tweets are downloaded from archive.org. In this example, the tweets are from 2020-12-31 at 01:50.
On the archive.org website, the tweets for each day of the year are provided as a zip file. Inside the zip file, the tweets for each minute are stored as a separate .bz2 file.
You can decompress these .bz2 files with the following command:
 
<p style="background:black"><code style="background:black;color:white"> $ bzip2 -dk [.bz2 file] </code></p>
    
The output of this command is a JSON file that contains the tweets from the given minute.
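
If you prefer to stay in Python, the same decompression can be sketched with the standard-library `bz2` module. The function name `decompress_bz2` is an illustrative assumption; it is not part of the archive.org tooling, and it assumes the input file name ends in `.bz2`.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, dest=None):
    """Decompress a .bz2 file and keep the original, similar to `bzip2 -dk`.

    Returns the path of the decompressed file.
    """
    src = Path(src)
    # Default output name: drop the trailing ".bz2" suffix.
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with bz2.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Stream in chunks so large archives do not have to fit in memory.
        shutil.copyfileobj(fin, fout)
    return dest
```
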
Now we need to add these tweets to MongoDB. To do that, run the following command in the terminal:
  
<p style="background:black"><code style="background:black;color:white"> $ mongoimport --db [database_name] --collection [collection_name] [json file] </code></p>
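
As a rough alternative to `mongoimport`, the import step can also be sketched in Python. This assumes the decompressed file holds one plain JSON document per line (the layout `mongoimport` reads by default) and that `collection` is a pymongo `Collection`. The helper name `import_tweets` and the `batch_size` parameter are hypothetical, and unlike `mongoimport` this sketch does not interpret MongoDB Extended JSON.

```python
import json


def import_tweets(json_path, collection, batch_size=1000):
    """Insert one JSON document per line from json_path into collection.

    `collection` is expected to be a pymongo Collection (any object with an
    insert_many method works). Returns the number of documents inserted.
    """
    inserted = 0
    batch = []
    with open(json_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                # Send documents in batches rather than one call per tweet.
                collection.insert_many(batch)
                inserted += len(batch)
                batch = []
    if batch:
        # Flush whatever is left after the last full batch.
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

With pymongo installed and a MongoDB server running locally, a call might look like `import_tweets(path, MongoClient().tweets_2020.m12_d31_h01_50)`, where `path` points at the decompressed JSON file.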

You can verify that the database was added to MongoDB by running the following command in the MongoDB shell:
<p style="background:black"><code style="background:black;color:white"> > show dbs </code></p>
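
The same check can be sketched from Python. The helper below is hypothetical; it relies only on the pymongo methods `list_database_names`, `list_collection_names`, and `count_documents`, and expects `client` to be a `MongoClient` connected to your server.

```python
def loaded_document_count(client, db_name, collection_name):
    """Return the number of documents in the collection, or None if the
    database or the collection does not exist."""
    if db_name not in client.list_database_names():
        return None
    db = client[db_name]
    if collection_name not in db.list_collection_names():
        return None
    # An empty filter counts every document in the collection.
    return db[collection_name].count_documents({})
```

For the names used in this example, the call would be `loaded_document_count(MongoClient(), "tweets_2020", "m12_d31_h01_50")`.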
 
Finally, we use the following map-reduce script to find out how many of these tweets are retweets. In this example, it is assumed that the database is called tweets_2020 and that the collection is named m12_d31_h01_50, which holds the JSON file of tweets for 01:50 on 2020-12-31.
The script requires pymongo. If you are using Anaconda, install it with conda as follows:

<p style="background:black"><code style="background:black;color:white"> conda install -c anaconda pymongo </code></p>


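from bson.code import Code
from pymongo import MongoClient

# Connect to the local MongoDB server and select the tweets_2020 database.
db = MongoClient().tweets_2020

# Map: emit 'retweet' for documents that have a retweeted_status field
# and 'unique' for all other documents.
map_fn = Code("""function () {
    if (!this.hasOwnProperty('retweeted_status')) {
        emit('unique', 1);
    } else {
        emit('retweet', 1);
    }
}""")

# Reduce: sum the emitted 1s for each key.
reduce_fn = Code("""function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}""")

# Note: Collection.map_reduce exists only in PyMongo 3.x; it was removed in PyMongo 4.0.
# The results are written to the "myresults" collection.
result = db.m12_d31_h01_50.map_reduce(map_fn, reduce_fn, "myresults")
for doc in result.find():
    print(doc)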

{'_id': 'unique', 'value': 2835.0}
{'_id': 'retweet', 'value': 1177.0}
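
`Collection.map_reduce` was removed in PyMongo 4.0, and MongoDB has deprecated the `mapReduce` command since version 5.0 in favor of the aggregation pipeline. The sketch below shows one possible aggregation-based equivalent of the script above; it is not the method used to produce the output shown. The names `RETWEET_PIPELINE` and `count_retweets` are assumptions made for illustration, and `collection` is expected to be a pymongo `Collection`.

```python
# Group key: 'unique' when retweeted_status is absent, 'retweet' otherwise.
# {"$type": <field>} evaluates to "missing" for a field that does not exist,
# which mirrors the hasOwnProperty check in the map function.
RETWEET_PIPELINE = [
    {
        "$group": {
            "_id": {
                "$cond": [
                    {"$eq": [{"$type": "$retweeted_status"}, "missing"]},
                    "unique",
                    "retweet",
                ]
            },
            # Add 1 for every document that falls into the group.
            "value": {"$sum": 1},
        }
    }
]


def count_retweets(collection):
    """Run the pipeline and return a dict keyed by 'unique' and 'retweet'."""
    return {
        doc["_id"]: doc["value"]
        for doc in collection.aggregate(RETWEET_PIPELINE)
    }
```

For the collection used above, the call would be `count_retweets(db.m12_d31_h01_50)`. Unlike the map-reduce version, this does not write a `myresults` collection; the counts are returned directly.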
