## Geospatial Analysis in MongoDB

To perform geospatial analysis in MongoDB, each document must store its location in a format that MongoDB can index (here, a GeoJSON Point). After converting the file, import it into a new collection in the MongoDB database.

After importing the documents (records), create a geospatial (2dsphere) index on the location attribute (loc).
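
As a rough sketch of these two steps with pymongo, the helper below loads the converted JSON array, inserts it, and builds a 2dsphere index on the loc field. The function name load_and_index is hypothetical, and the database and collection names in the usage comment (yelp, businessGeoSp) are the ones used by the later cells; adjust them to your setup.

```python
import json

def load_and_index(collection, json_array_path):
    """Insert a JSON array of documents and index their 'loc' field.

    'collection' is expected to be a pymongo Collection (hypothetical helper).
    """
    with open(json_array_path) as infile:
        docs = json.load(infile)
    if docs:  # insert_many rejects an empty list
        collection.insert_many(docs)
    # "2dsphere" is the index type MongoDB uses for GeoJSON geometry
    return collection.create_index([("loc", "2dsphere")])

# Usage, assuming a local MongoDB server:
# from pymongo import MongoClient
# db = MongoClient("localhost:27017").yelp
# load_and_index(db.businessGeoSp, "../Data/business_GeoSphere.json")
```

The $geoNear stage used later requires such an index on the queried collection, so it is worth creating it right after the import.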

Import Statements

In [1]:
import pandas as pd
import numpy as np

import json
data_path = "../Data/business.json"
out_data_path = "../Data/business_GeoSphere.json"

## Preparing Data for Geospatial analysis

In [None]:
import json
def main(in_path, out_path):
    """
    Convert the line-delimited JSON file 'in_path' into a JSON array of
    GeoJSON-ready documents and save it to 'out_path'
    :param in_path: input file path as a string
    :param out_path: output file path as a string
    :return: None
    """
    res = []
    with open(in_path) as infile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            j = json.loads(line)
            d = {}
            d['_id'] = j['business_id']
            d['name'] = j['name']
            # GeoJSON points are ordered [longitude, latitude]
            d["loc"] = {
                "type": "Point",
                "coordinates": [j['longitude'], j['latitude']]
            }
            d['stars'] = j['stars']
            d['review_count'] = j['review_count']
            d['categories'] = j['categories']
            res.append(json.dumps(d))
    # write all converted records as a single JSON array
    with open(out_path, 'w') as out:
        out.write('[' + ',\n'.join(res) + ']')


if __name__ == '__main__':
    main(data_path, out_data_path)
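
To make the conversion concrete, here is a minimal sketch of the same per-record mapping applied to one made-up record. The helper name to_geo_doc and the sample values are hypothetical. The point it illustrates is that GeoJSON stores coordinates as [longitude, latitude], and that out-of-range values are better rejected before import than discovered when the index is built.

```python
def to_geo_doc(record):
    """Map one raw business record to the document shape used for indexing."""
    lon, lat = record["longitude"], record["latitude"]
    # valid GeoJSON positions: longitude in [-180, 180], latitude in [-90, 90]
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("coordinates out of range: %r, %r" % (lon, lat))
    return {
        "_id": record["business_id"],
        "name": record["name"],
        "loc": {"type": "Point", "coordinates": [lon, lat]},
        "stars": record["stars"],
        "review_count": record["review_count"],
        "categories": record["categories"],
    }

# made-up sample record, for illustration only
sample = {
    "business_id": "abc123", "name": "Example Cafe",
    "longitude": -79.3832, "latitude": 43.6532,
    "stars": 4.0, "review_count": 12, "categories": ["Food", "Cafes"],
}
print(to_geo_doc(sample)["loc"])
```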

## Connecting with database

In [2]:
def get_db():
    # For local use
    from pymongo import MongoClient
    client = MongoClient('localhost:27017')
    # 'yelp' here is the database name. It will be created if it does not exist.
    db = client.yelp
    return db

if __name__ == "__main__":
    # For local use
    db = get_db() 
    #extract the data stored in the database
    businessGeoin = db.businessGeoSp.find()

Checking if the data has been imported correctly

In [None]:
for d in db.businessGeoSp.find()[:3]:
    print(d)

## Performing Geospatial analysis on Data

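This pipeline keeps the records within a 15000-meter radius of Toronto's coordinates.

It then unwinds (splits) each record's list of categories into one document per category, groups the documents by category, and averages the star ratings and review counts. The final $out stage writes the result to the near_toronto_business collection.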

In [5]:
geo = db.businessGeoSp.aggregate([
    {
        "$geoNear": {
            # GeoJSON order is [longitude, latitude]
            "near": {"type": "Point", "coordinates": [-79.3832, 43.6532]},
            "distanceField": "dist.calculated",
            "minDistance": 0,
            "maxDistance": 15000,  # meters
            "includeLocs": "dist.location",
            "spherical": True  # a Python boolean, not the string "true"
        }
    },
    # one document per (business, category) pair
    {"$unwind": "$categories"},
    {"$group": {
        "_id": {"categories": "$categories"},
        "avgStars": {"$avg": "$stars"},
        "avgReviews": {"$avg": "$review_count"}
    }},
    # write the aggregated result to a new collection
    {"$out": "near_toronto_business"}
])

<pymongo.command_cursor.CommandCursor at 0x1063109b0>
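
For intuition about what the $geoNear stage selects, the sketch below computes great-circle distances with the haversine formula and keeps the points within 15000 meters of the same center. It only approximates the idea and is not MongoDB's implementation; the function names, the sample documents, and the Earth radius constant (mean radius of 6371 km) are assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (assumed constant)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(docs, center, max_distance_m):
    """Keep documents whose 'loc' point lies within max_distance_m of center."""
    lon0, lat0 = center
    kept = []
    for doc in docs:
        lon, lat = doc["loc"]["coordinates"]
        dist = haversine_m(lon0, lat0, lon, lat)
        if dist <= max_distance_m:
            # mirror the distanceField option by recording the distance
            kept.append(dict(doc, dist={"calculated": dist}))
    return kept

# hypothetical documents: one in Toronto, one in Montreal
docs = [
    {"name": "near", "loc": {"type": "Point", "coordinates": [-79.411079, 43.761539]}},
    {"name": "far", "loc": {"type": "Point", "coordinates": [-73.5673, 45.5017]}},
]
near = within_radius(docs, (-79.3832, 43.6532), 15000)
print([d["name"] for d in near])
```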

In [4]:
for d in geo:
    print(d)

{'_id': {'categories': 'Wine Bars'}, 'avgStars': 3.5, 'avgReviews': 23.0}
{'_id': {'categories': 'Diners'}, 'avgStars': 3.5, 'avgReviews': 23.0}
{'_id': {'categories': 'Leather Goods'}, 'avgStars': 1.5, 'avgReviews': 8.0}
{'_id': {'categories': 'Taxis'}, 'avgStars': 1.0, 'avgReviews': 9.0}
{'_id': {'categories': 'Candy Stores'}, 'avgStars': 5.0, 'avgReviews': 3.0}
{'_id': {'categories': 'Laser Hair Removal'}, 'avgStars': 3.1666666666666665, 'avgReviews': 14.666666666666666}
{'_id': {'categories': 'Chocolatiers & Shops'}, 'avgStars': 5.0, 'avgReviews': 3.0}
{'_id': {'categories': 'Hair Removal'}, 'avgStars': 3.1666666666666665, 'avgReviews': 14.666666666666666}
{'_id': {'categories': 'Grocery'}, 'avgStars': 3.5, 'avgReviews': 22.0}
{'_id': {'categories': 'Chinese'}, 'avgStars': 3.0, 'avgReviews': 32.5}
{'_id': {'categories': 'Massage'}, 'avgStars': 3.5, 'avgReviews': 10.0}
{'_id': {'categories': 'Hawaiian'}, 'avgStars': 4.0, 'avgReviews': 239.0}
{'_id': {'categories': 'Canadian (New)'},
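
The $unwind and $group stages can be mimicked in plain Python to see how per-category averages like these arise. This is a sketch on made-up documents, not the notebook's data, and unwind_and_average is a hypothetical helper name.

```python
from collections import defaultdict

def unwind_and_average(docs):
    """Emit one row per (document, category), then average per category."""
    totals = defaultdict(lambda: {"stars": 0.0, "reviews": 0.0, "n": 0})
    for doc in docs:
        # like $unwind by default, documents with no categories contribute nothing
        for category in doc.get("categories") or []:
            t = totals[category]
            t["stars"] += doc["stars"]
            t["reviews"] += doc["review_count"]
            t["n"] += 1
    return {
        cat: {"avgStars": t["stars"] / t["n"], "avgReviews": t["reviews"] / t["n"]}
        for cat, t in totals.items()
    }

docs = [
    {"stars": 4.0, "review_count": 10, "categories": ["Food", "Diners"]},
    {"stars": 3.0, "review_count": 30, "categories": ["Food"]},
    {"stars": 5.0, "review_count": 2, "categories": []},
]
result = unwind_and_average(docs)
print(result["Food"])  # {'avgStars': 3.5, 'avgReviews': 20.0}
```

A business listed under several categories is counted once in each of them, which is why closely related categories can end up with identical averages.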

## Left Join two collections for aggregations

Here I aggregate the reviews from 2000 - 2017 for businesses in a given category. This requires two separate collections, reviews and business.
A $lookup operation, which is similar to a left join in relational databases, then attaches the matching reviews to each business.

In [12]:
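from datetime import datetime

db.businessGeoSp.aggregate([
    {
        "$geoNear": {
            # GeoJSON order is [longitude, latitude]
            "near": {"type": "Point", "coordinates": [-79.411079, 43.761539]},
            "distanceField": "dist.calculated",
            "maxDistance": 15000,
            "includeLocs": "dist.location",
            "spherical": True
        }
    },
    {"$unwind": "$categories"},
    {"$match": {"categories": {"$in": ["Food"]}}},
    {"$lookup": {
        "from": "reviews",
        "localField": "_id",
        "foreignField": "business_id",
        "as": "business_review"
    }},
    # $lookup yields an array; unwind it so each review can be filtered by date
    {"$unwind": "$business_review"},
    # assumes review dates are stored as BSON dates; compare against ISO
    # strings instead if they were imported as text
    {"$match": {"business_review.date": {"$gte": datetime(2000, 1, 1),
                                         "$lt": datetime(2017, 6, 1)}}},
    # the business name comes from the business document itself
    {"$group": {"_id": {"name": "$name"}, "avgStars": {"$avg": "$business_review.stars"}}}
])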

<pymongo.command_cursor.CommandCursor at 0x1063c3668>
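
The $lookup stage can be pictured as a left outer join done in memory: every business is kept, and the reviews whose business_id equals the business's _id are attached as a list, which is empty when nothing matches. A minimal sketch with made-up documents follows; left_join is a hypothetical helper, not a pymongo function.

```python
from collections import defaultdict

def left_join(businesses, reviews, as_field="business_review"):
    """Attach to each business the list of reviews with a matching business_id."""
    by_business = defaultdict(list)
    for review in reviews:
        by_business[review["business_id"]].append(review)
    # every business survives, matched or not, as in a left outer join
    return [dict(b, **{as_field: by_business.get(b["_id"], [])}) for b in businesses]

businesses = [{"_id": "b1", "name": "Example Cafe"}, {"_id": "b2", "name": "Example Deli"}]
reviews = [
    {"business_id": "b1", "stars": 4},
    {"business_id": "b1", "stars": 2},
]
for row in left_join(businesses, reviews):
    stars = [r["stars"] for r in row["business_review"]]
    avg = sum(stars) / len(stars) if stars else None
    print(row["name"], avg)
```

Indexing the reviews once by business_id keeps the join linear in the number of documents; in MongoDB, an index on the foreign field of the joined collection serves a similar purpose.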