**Write Python or Java code that**
- creates a MongoDB DB called "amazon"
- reads "reviews_electronics.16.json" and uploads each review as a separate document to the collection "reviews" in the DB "amazon".
- uses MongoDB's map reduce function to build a new collection "avg_scores" that averages review scores by product ("asin"). Print the first 100 entries of "avg_scores" to screen.
- uses MongoDB's map reduce function to build a new collection "weighted_avg_scores" that averages review scores by product ("asin"), weighted by the number of votes + 1 (the second number in "helpful", plus 1). Print the first 100 entries of "weighted_avg_scores" to screen.

The format of "reviews_electronics.16.json" is:
- reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
- asin - ID of the product, e.g. 0000013714
- reviewerName - name of the reviewer
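- helpful - helpfulness rating of the review, e.g. 2/3 (stored in the file as a two-element list such as [2, 3]; the second number is the number of votes)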
- reviewText - text of the review
- overall - rating of the product
- summary - summary of the review
- unixReviewTime - time of the review
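
As a minimal sketch of how one line of the file maps onto these fields, the snippet below parses a single record and derives the weight used later (number of votes + 1). The record is illustrative only: it is assembled from the examples in the list above, and the name, text, summary, and timestamp values are placeholders, not data from the file.

```python
import json

# Illustrative record built from the field examples above; the text values
# and the timestamp are placeholders, not real data.
sample_line = (
    '{"reviewerID": "A2SUAM1J3GNN3B", "asin": "0000013714", '
    '"reviewerName": "example reviewer", "helpful": [2, 3], '
    '"reviewText": "example text", "overall": 5.0, '
    '"summary": "example summary", "unixReviewTime": 0}'
)

review = json.loads(sample_line)

# "helpful" is a two-element list; the second number is the number of votes.
first_number, number_of_votes = review["helpful"]
weight = number_of_votes + 1

print(review["asin"], review["overall"], weight)
```

Keeping "asin" as a string matters here, since IDs such as 0000013714 would lose their leading zeros if converted to integers.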

In [1]:
from pymongo import MongoClient
from bson.code import Code
from bson.son import SON

import json

In [2]:
# Read the first 3 lines of the JSON file
from itertools import islice

with open("reviews_electronics.16.json", "r", encoding="utf-8") as file:
    for line in islice(file, 3):
        print(line)

{"reviewerID": "AKM1MP6P0OYPR", "asin": "0132793040", "reviewerName": "Vicki Gibson \"momo4\"", "helpful": [1, 1], "reviewText": "Corey Barker does a great job of explaining Blend Modes in this DVD. All of the Kelby training videos are great but pricey to buy individually. If you really want bang for your buck just subscribe to Kelby Training online.", "overall": 5.0, "summary": "Very thorough", "unixReviewTime": 1365811200, "reviewTime": "04 13, 2013"}

{"reviewerID": "A2X8VX4DPMQFQQ", "asin": "B00E4KP4W6", "reviewerName": "lily68", "helpful": [1, 1], "reviewText": "I can't believe I waited to long to switch to a glass screen protector.  I love this.  It feels and looks like there is no protector on.  It does show fingerprints, which I think is inevitable unless you use a matte finish screen protector, but they wipe right away. I would definitely recommend this! Easier to apply than the films too!", "overall": 5.0, "summary": "LOVE this screen protector!!", "unixReviewTime": 139345920

In [3]:
# Connect to the server
client = MongoClient('localhost', 27017)

# Create the "amazon" database
db = client["amazon"]

# Create the "reviews" collection
collection_reviews = db["reviews"]

In [4]:
# Insert each review into the collection as a separate document
with open("reviews_electronics.16.json", "r", encoding="utf-8") as file:
    for line in file:
        doc = json.loads(line)
        collection_reviews.insert_one(doc)
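
Inserting one document per round trip can be slow for a large file. One possible alternative, sketched here under assumptions, is to group the parsed lines into batches and hand each batch to pymongo's `insert_many`. The helper name `iter_batches`, the batch size, and the in-memory stand-in file are all hypothetical; the snippet only demonstrates the batching and does not touch the database.

```python
import io
import json
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield lists of parsed JSON documents, at most batch_size per list."""
    # Skip blank lines so json.loads never sees an empty string.
    docs = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            return
        yield batch


# In-memory stand-in for the review file (hypothetical records).
fake_file = io.StringIO(
    '{"asin": "A", "overall": 5.0}\n'
    '{"asin": "B", "overall": 3.0}\n'
    '\n'
    '{"asin": "C", "overall": 4.0}\n'
)

batches = list(iter_batches(fake_file, batch_size=2))
print([len(batch) for batch in batches])
```

With a live connection, each batch could be passed to `collection_reviews.insert_many(batch)` in place of the per-document `insert_one` call.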

In [5]:
# Check that the reviews were inserted successfully
for doc in collection_reviews.find({}).limit(3):
    print(doc)

{'_id': ObjectId('640901613a4571503de5efc3'), 'reviewerID': 'AKM1MP6P0OYPR', 'asin': '0132793040', 'reviewerName': 'Vicki Gibson "momo4"', 'helpful': [1, 1], 'reviewText': 'Corey Barker does a great job of explaining Blend Modes in this DVD. All of the Kelby training videos are great but pricey to buy individually. If you really want bang for your buck just subscribe to Kelby Training online.', 'overall': 5.0, 'summary': 'Very thorough', 'unixReviewTime': 1365811200, 'reviewTime': '04 13, 2013'}
{'_id': ObjectId('640901613a4571503de5efc4'), 'reviewerID': 'A2X8VX4DPMQFQQ', 'asin': 'B00E4KP4W6', 'reviewerName': 'lily68', 'helpful': [1, 1], 'reviewText': "I can't believe I waited to long to switch to a glass screen protector.  I love this.  It feels and looks like there is no protector on.  It does show fingerprints, which I think is inevitable unless you use a matte finish screen protector, but they wipe right away. I would definitely recommend this! Easier to apply than the films too!",

In [8]:
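# Create the new collection "avg_scores" that averages review scores by product ("asin").
# MongoDB may call reduce several times for one key, feeding earlier results back in,
# so reduce returns the same {sum, count} shape that map emits and the division
# happens once in finalize. Averaging directly in reduce would average partial averages.
mapf = Code('''function() { emit(this.asin, {sum: this.overall, count: 1}); }''')
reducef = Code('''
function(key, values) {
    var out = {sum: 0, count: 0};
    values.forEach(function(v) {
        out.sum += v.sum;
        out.count += v.count;
    });
    return out;
}
''')
finalizef = Code('''function(key, reduced) { return reduced.sum / reduced.count; }''')

cmd = {
    'mapreduce': "reviews",
    'map': mapf,
    'reduce': reducef,
    'finalize': finalizef,
    'out': "avg_scores"
}

result = db.command(SON(cmd))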
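
MongoDB may invoke the reduce function more than once for the same key, passing the output of an earlier reduce call back in as one of the values. The pure-Python sketch below simulates that re-reduce step on hypothetical scores to show why a reduce that returns an average can drift from the true mean, while a reduce that carries a running sum and count does not. The function names and the scores are made up for illustration.

```python
def reduce_avg(values):
    """Reduce that returns an average directly (not safe to re-reduce)."""
    return sum(values) / len(values)


def reduce_sum_count(values):
    """Reduce that returns the same {sum, count} shape it receives."""
    return {
        "sum": sum(v["sum"] for v in values),
        "count": sum(v["count"] for v in values),
    }


scores = [5.0, 5.0, 2.0]  # hypothetical review scores for one product
true_avg = sum(scores) / len(scores)

# Simulate a re-reduce: the first two scores are reduced, then that partial
# result is reduced again together with the third score.
partial_avg = reduce_avg(scores[:2])
avg_of_avgs = reduce_avg([partial_avg, scores[2]])

emitted = [{"sum": s, "count": 1} for s in scores]
partial = reduce_sum_count(emitted[:2])
total = reduce_sum_count([partial, emitted[2]])
finalized = total["sum"] / total["count"]

print(true_avg, avg_of_avgs, finalized)
```

Here the average of averages comes out as 3.5, while the sum and count version recovers the true mean of 4.0.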

In [9]:
# Print the first 100 entries of "avg_scores" to screen.
collection_avg_scores = db['avg_scores']

# limit(100) caps the cursor at 100 documents
for doc in collection_avg_scores.find({}).limit(100):
    print(doc)

{'_id': 'B00JEH2P12', 'value': 5.0}
{'_id': 'B00JKPD3YQ', 'value': 3.0}
{'_id': 'B00JNUBSFO', 'value': 3.5}
{'_id': 'B00E4LAL82', 'value': 3.0}
{'_id': 'B00K2OEHJE', 'value': 1.0}
{'_id': 'B00FPG37R2', 'value': 3.1}
{'_id': 'B00GWF36LC', 'value': 4.666666666666667}
{'_id': 'B00GQTGQMK', 'value': 4.0}
{'_id': 'B00FWGU1CY', 'value': 4.4}
{'_id': 'B00IMFT1YG', 'value': 4.0}
{'_id': 'B00EN7IL2U', 'value': 4.0}
{'_id': 'B00HJYQE8C', 'value': 3.6923076923076925}
{'_id': 'B00HSYS3LO', 'value': 4.0}
{'_id': 'B00F6OCS6Y', 'value': 5.0}
{'_id': 'B00EEJPL8Y', 'value': 4.0}
{'_id': 'B00G5T8DJ0', 'value': 4.392156862745098}
{'_id': 'B00HX244TG', 'value': 4.666666666666667}
{'_id': 'B00IFB39EK', 'value': 1.0}
{'_id': 'B00J6JAZB0', 'value': 3.0}
{'_id': 'B00KQBUXAU', 'value': 5.0}
{'_id': 'B00HB5HS2K', 'value': 4.5}
{'_id': 'B00HKWFLGO', 'value': 3.0}
{'_id': 'B00EH2ZGBU', 'value': 4.0}
{'_id': 'B00ISWMMSA', 'value': 2.5}
{'_id': 'B00GWFWGTK', 'value': 5.0}
{'_id': 'B00EP9V9TS', 'value': 3.7142857142

In [28]:
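# Use MongoDB's map reduce function to build a new collection "weighted_avg_scores"
# that averages review scores by product ("asin"), weighted by the number of votes + 1
# (the second number in "helpful", plus 1).
# Map emits a partial weighted sum and weight; reduce returns the same shape so it can
# be re-applied safely, and finalize performs the division once per product.
mapf_w = Code('''
function() {
    var weight = this.helpful[1] + 1;
    emit(this.asin, {weighted_sum: this.overall * weight, weight_sum: weight});
}
''')
reducef_w = Code('''
function(key, values) {
    var out = {weighted_sum: 0, weight_sum: 0};
    values.forEach(function(v) {
        out.weighted_sum += v.weighted_sum;
        out.weight_sum += v.weight_sum;
    });
    return out;
}
''')
finalizef_w = Code('''function(key, reduced) { return reduced.weighted_sum / reduced.weight_sum; }''')

cmd_w = {
    'mapreduce': "reviews",
    'map': mapf_w,
    'reduce': reducef_w,
    'finalize': finalizef_w,
    'out': "weighted_avg_scores"
}

result_w = db.command(SON(cmd_w))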
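
To sanity-check the weighted averages without a database, the same formula can be computed in plain Python: for each product, the sum of score × (votes + 1) divided by the sum of (votes + 1). This is a rough reference sketch only; the function name `weighted_avg_by_asin` and the three sample reviews are hypothetical.

```python
from collections import defaultdict


def weighted_avg_by_asin(reviews):
    """Weighted average of "overall" per "asin", weight = helpful[1] + 1."""
    weighted_sums = defaultdict(float)
    weight_sums = defaultdict(float)
    for review in reviews:
        weight = review["helpful"][1] + 1
        weighted_sums[review["asin"]] += review["overall"] * weight
        weight_sums[review["asin"]] += weight
    return {
        asin: weighted_sums[asin] / weight_sums[asin]
        for asin in weighted_sums
    }


# Hypothetical reviews, not taken from the data file.
sample_reviews = [
    {"asin": "X", "overall": 5.0, "helpful": [1, 1]},  # weight 2
    {"asin": "X", "overall": 2.0, "helpful": [0, 0]},  # weight 1
    {"asin": "Y", "overall": 3.0, "helpful": [2, 3]},  # weight 4
]

reference = weighted_avg_by_asin(sample_reviews)
print(reference)
```

For product "X" this gives (5.0 × 2 + 2.0 × 1) / 3 = 4.0, and a product with a single review keeps that review's score whatever its weight.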

In [29]:
# Print the first 100 entries of "weighted_avg_scores" to screen.
collection_weighted_avg_scores = db['weighted_avg_scores']

# limit(100) caps the cursor at 100 documents
for doc in collection_weighted_avg_scores.find({}).limit(100):
    print(doc)

{'_id': 'B00JUFJ1GA', 'value': 5.0}
{'_id': 'B00HET69DC', 'value': 3.0}
{'_id': 'B00K5UV03Q', 'value': 5.0}
{'_id': 'B00GGLOGR0', 'value': 5.0}
{'_id': 'B00IIHO4H2', 'value': 4.333333333333333}
{'_id': 'B00IM7LD9U', 'value': 1.5}
{'_id': 'B00FQVRDMG', 'value': 5.0}
{'_id': 'B00EFD7TQQ', 'value': 5.0}
{'_id': 'B00HD77RPY', 'value': 4.375}
{'_id': 'B00FGFMN3G', 'value': 3.5}
{'_id': 'B00F3BPCJA', 'value': 1.0}
{'_id': 'B00FNNKRCU', 'value': 3.0}
{'_id': 'B00GYURLW0', 'value': 3.0}
{'_id': 'B00ESTXCU4', 'value': 2.0}
{'_id': 'B00EUE5228', 'value': 3.0}
{'_id': 'B00ENC9IVI', 'value': 5.0}
{'_id': 'B00H28S5N2', 'value': 2.0}
{'_id': 'B00GMV8N7I', 'value': 1.0}
{'_id': 'B00FY2TZOG', 'value': 4.642857142857143}
{'_id': 'B00J0A1REY', 'value': 3.0}
{'_id': 'B00JVVU0SQ', 'value': 3.9210526315789473}
{'_id': 'B00GJMBRWI', 'value': 5.0}
{'_id': 'B00H5HF6L4', 'value': 5.0}
{'_id': 'B00HHORYGA', 'value': 3.0}
{'_id': 'B00I4C0SJ4', 'value': 5.0}
{'_id': 'B00HIAS93U', 'value': 1.0}
{'_id': 'B00H1I0A4U

{'_id': 'B00HJW738O', 'value': 5.0}
{'_id': 'B00GLURSKS', 'value': 4.6}
{'_id': 'B00K8J53H8', 'value': 4.517241379310345}
{'_id': 'B00GXTO6GQ', 'value': 4.0}
{'_id': 'B00GNIH79A', 'value': 3.2}
{'_id': 'B00EQHCP38', 'value': 4.409090909090909}
{'_id': 'B00E8XD4D0', 'value': 5.0}
{'_id': 'B00GVG049E', 'value': 5.0}
{'_id': 'B00I7PAYYC', 'value': 5.0}
{'_id': 'B00JZDMVBO', 'value': 5.0}
{'_id': 'B00H58QH3E', 'value': 4.0}
{'_id': 'B00H6ELKCA', 'value': 5.0}
{'_id': 'B00KXQD7WY', 'value': 5.0}
{'_id': 'B00ELSXBE4', 'value': 5.0}
{'_id': 'B00G3GRM56', 'value': 5.0}
{'_id': 'B00HIYA3WQ', 'value': 3.590909090909091}
{'_id': 'B00HHGFZRS', 'value': 4.5}
{'_id': 'B00FG0LOLI', 'value': 5.0}
{'_id': 'B00GU421IE', 'value': 5.0}
{'_id': 'B00KDX55WM', 'value': 2.090909090909091}
{'_id': 'B00GWFWGCW', 'value': 3.95}
{'_id': 'B00GWX9CSU', 'value': 5.0}
{'_id': 'B00IOWAEQ6', 'value': 2.6666666666666665}
{'_id': 'B00G925LB6', 'value': 4.375}
{'_id': 'B00ISGCDC6', 'value': 5.0}
{'_id': 'B00F3Q15A0', 'val