# Geospatial Index

---
### [Geospatial Index](https://docs.mongodb.com/manual/geospatial-queries/)

- Geospatial indexes support queries on geospatial data.
> - Think of GPS locations in Google Maps, Uber, Zomato, etc.

- They support GeoJSON objects such as points, lines, and polygons.

- They can find the points nearest to a given location, the points that lie within a bounding geometry, and more.
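As a rough illustration of the object types listed above, the sketch below writes a point, a line, and a polygon as GeoJSON-style Python dictionaries. The point coordinates are taken from the sample documents in this notebook; the polygon corners and the two helper functions (`is_valid_position`, `is_closed_ring`) are hypothetical and exist only for this example.

```python
# GeoJSON stores each position as [longitude, latitude], in that order.
point = {"type": "Point", "coordinates": [-73.98559, 40.75356]}

line = {
    "type": "LineString",
    "coordinates": [[-73.98559, 40.75356], [-73.93623, 40.82389]],
}

# A polygon is a list of rings; each ring ends on the position it started from.
# These corner coordinates are made up for illustration.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-74.0, 40.7], [-73.9, 40.7], [-73.9, 40.8], [-74.0, 40.8], [-74.0, 40.7],
    ]],
}


def is_valid_position(position):
    """Return True if a [longitude, latitude] pair is within valid bounds."""
    lon, lat = position
    return -180 <= lon <= 180 and -90 <= lat <= 90


def is_closed_ring(ring):
    """Return True if a ring has at least four positions and closes on itself."""
    return len(ring) >= 4 and ring[0] == ring[-1]


print(is_valid_position(point["coordinates"]))
print(is_closed_ring(polygon["coordinates"][0]))
```

Note that the bounds check cannot detect a swapped `[latitude, longitude]` pair when both values happen to fall inside the valid ranges, so the ordering still needs care.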

----

### Connect to local server

---

In [53]:
# Importing the required libraries
import pymongo

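import pprint as pp
# Shadow `sorted` inside the pprint module so dictionaries print in insertion order
# (on Python 3.8+, pp.pprint(obj, sort_dicts=False) has the same effect)
pp.sorted = lambda x, key=None: x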

In [54]:
# Connect to local host
client = pymongo.MongoClient("mongodb://localhost:27017/")

In [55]:
# Connect to database
db = client['nyc']

In [56]:
# Sample document
pp.pprint(db.airbnb.find_one())

{'_id': ObjectId('60c21bf5b653d40e79b4a7d0'),
 'accom_id': 2595,
 'description': 'Skylit Midtown Castle',
 'host': {'id': 2845,
          'name': 'Jennifer',
          'listings_count': 3,
          'neighbourhood_list': ['Midtown', "Hell's Kitchen"]},
 'neighbourhood': {'name': 'Midtown', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.98559, 40.75356]},
 'room_type': 'Entire home/apt',
 'price': 150,
 'minimum_nights': 30,
 'reviews': {'number_of_reviews': 48,
             'last_review': datetime.datetime(2019, 11, 4, 0, 0),
             'reviews_per_month': 0.35},
 'availability_365': 365}


---
**Drop previous indexes.**

---

In [57]:
db.airbnb.drop_indexes()

---
**Create geospatial index.**

----

In [58]:
# Create geospatial index
db.airbnb.create_index([
                        ('location', '2dsphere')
                    ])

'location_2dsphere'

----
### $nearSphere operator

Suppose you want to find an accommodation close to longitude -73.93414657 and latitude 40.82302903.

[$nearSphere](https://docs.mongodb.com/manual/reference/operator/query/nearSphere/#mongodb-query-op.-nearSphere) specifies a point for which a geospatial query returns the documents from nearest to farthest.

----

In [59]:
# Query using geospatial index
pp.pprint(
    db.airbnb.find_one({
                        'location': {
                                        '$nearSphere': {
                                                            '$geometry': {
                                                                            'type': 'Point',
                                                                            'coordinates': [-73.93414657, 40.82302903]
                                                                        }
                                                        }
                                                    }
                        })
)

{'_id': ObjectId('60c21bfdb653d40e79b51958'),
 'accom_id': 42402892,
 'description': 'Peaceful',
 'host': {'id': 337459786,
          'name': 'Empress',
          'listings_count': 1,
          'neighbourhood_list': ['Harlem']},
 'neighbourhood': {'name': 'Harlem', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.93623, 40.82389]},
 'room_type': 'Private room',
 'price': 49,
 'minimum_nights': 36,
 'reviews': {'number_of_reviews': 0,
             'last_review': nan,
             'reviews_per_month': nan},
 'availability_365': 365}
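
To get a feel for how close the returned listing is, the sketch below computes the great-circle distance between the query point and the listing's `location` with the haversine formula. It is only an approximation of what the server does: the Earth radius of 6,378.1 km is an assumption (it is the figure the MongoDB documentation quotes for radian conversions), and `haversine_m` is a hypothetical helper, not part of pymongo.

```python
import math

EARTH_RADIUS_M = 6378100  # assumed Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


query_point = (-73.93414657, 40.82302903)   # point passed to $nearSphere
nearest_listing = (-73.93623, 40.82389)     # 'location' of the returned document

distance = haversine_m(*query_point, *nearest_listing)
print(f"{distance:.1f} m")
```

Under these assumptions the listing comes out at roughly 200 meters from the query point.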


---
**Use minimum or maximum distances to narrow down the search.**

`$minDistance` limits the results to documents that are at least the specified distance from the center point, while `$maxDistance` limits them to documents that are at most the specified distance away. For a GeoJSON point, both values are in meters.
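
One way to keep these deeply nested filters readable is to build them with a small function. The sketch below is a hypothetical helper (`near_sphere_filter` is not a pymongo function) that returns a plain dictionary; it assumes the indexed field is called `location`, as in the airbnb collection.

```python
def near_sphere_filter(lon, lat, min_m=None, max_m=None, field="location"):
    """Build a $nearSphere filter around a GeoJSON point, with optional bounds in meters."""
    if min_m is not None and max_m is not None and min_m > max_m:
        raise ValueError("min_m must not exceed max_m")
    near = {"$geometry": {"type": "Point", "coordinates": [lon, lat]}}
    if min_m is not None:
        near["$minDistance"] = min_m
    if max_m is not None:
        near["$maxDistance"] = max_m
    return {field: {"$nearSphere": near}}


# Documents between 1000 and 1500 meters from the point
ring_filter = near_sphere_filter(-73.93414657, 40.82302903, min_m=1000, max_m=1500)
print(ring_filter)

# With a live connection, the filter can be passed to find():
# docs = list(db.airbnb.find(ring_filter))
```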

---

In [60]:
# Query using maxDistance
len(list(db.airbnb.find({
                'location': {
                                '$nearSphere': {
                                                    '$geometry': {
                                                                    'type': 'Point',
                                                                    'coordinates': [-73.93414657, 40.82302903]
                                                                },
                                                    '$maxDistance': 1500
                                                }
                                            }
            })))#.count()

1117
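
The `len(list(...))` pattern above pulls every matching document to the client just to count them. pymongo's `count_documents` does not accept `$nearSphere`, so a possible alternative is to express the same circle with `$geoWithin` and `$centerSphere`, whose radius is given in radians. The sketch below only builds the filter; the helper name and the Earth radius of 6,378.1 km are assumptions, and the count should be checked against the `$nearSphere` result before relying on it.

```python
EARTH_RADIUS_M = 6378100  # assumed Earth radius in meters, used to convert meters to radians


def within_radius_filter(lon, lat, radius_m, field="location"):
    """Build a $geoWithin/$centerSphere filter for a circle of radius_m meters."""
    radius_radians = radius_m / EARTH_RADIUS_M
    return {field: {"$geoWithin": {"$centerSphere": [[lon, lat], radius_radians]}}}


count_filter = within_radius_filter(-73.93414657, 40.82302903, 1500)
print(count_filter)

# With a live connection, the count is then computed on the server:
# db.airbnb.count_documents(count_filter)
```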

---
To exclude documents that are too close to the point, use `$minDistance`.

---

In [61]:
# Query minDistance and maxDistance
len(list(db.airbnb.find({
                'location': {
                                '$nearSphere': {
                                                    '$geometry': {
                                                        'type': 'Point',
                                                        'coordinates': [-73.93414657, 40.82302903]
                                                    },
                                                    '$minDistance': 1000,
                                                    '$maxDistance': 1500
                                                }
                            }
            })))#.count()

689

----
### Neighbourhood data

The neighbourhoods collection holds New York City neighbourhood boundary data.

Source: https://opendata.cityofnewyork.us/

----

In [71]:
# # Restore neighbourhoods data
# !mongorestore --db nyc --collection neighbourhoods "C:/Users/parin/Documents/AV/MongoDB/Course Handouts/Module 7 - Indexing in MongoDB/Notebook dataset/nyc_neighbourhoods/nyc/neighbourhoods.bson"


In [72]:
# Collections
db.list_collection_names()

['neighbourhood', 'airbnb', 'neighbourhoods']

In [73]:
# Sample document
pp.pprint(
    db.neighbourhoods.find_one()
)

{'_id': ObjectId('60bf31520914cc69b40dd71e'),
 'type': 'Feature',
 'properties': {'ntacode': 'BX63',
                'shape_area': '19379048.8116',
                'county_fips': '005',
                'ntaname': 'West Concourse',
                'shape_leng': '28500.1688303',
                'boro_name': 'Bronx',
                'boro_code': '2'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-73.9119181232027, 40.84325788668494],
                                [-73.9119464821854, 40.84317571365838],
                                [-73.91198635358123, 40.843057098087755],
                                [-73.91222180466877, 40.84235659120049],
                                [-73.91240381713898, 40.841816496608914],
                                [-73.91246930262152, 40.84162025687984],
                                [-73.91262256160424, 40.84115715928204],
                                [-73.91269769187517, 40.840930129302016],
                          

In [74]:
# Create geospatial index
# Index the 'neighbourhoods' collection (plural), which holds the boundary geometries
db.neighbourhoods.create_index([('geometry', '2dsphere')])

'geometry_2dsphere'

---
### $geoWithin operator

Suppose you need to find the number of accommodations within a specific neighbourhood.

You can use [$geoWithin](https://docs.mongodb.com/manual/reference/operator/query/geoWithin/#mongodb-query-op.-geoWithin), which selects documents whose geospatial data lies entirely within a specified shape.
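
For intuition about what "within a shape" means for a point, the sketch below runs a planar ray-casting test against a MultiPolygon laid out like the `geometry` field in the neighbourhoods collection. It is a simplification, not MongoDB's algorithm: it treats longitude and latitude as flat x/y coordinates, and the square used here is a made-up shape rather than a real neighbourhood boundary.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges a ray going east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(lon, lat, geometry):
    """True if the point is inside any polygon's outer ring and outside its holes."""
    for polygon in geometry["coordinates"]:
        outer, holes = polygon[0], polygon[1:]
        if point_in_ring(lon, lat, outer) and not any(
            point_in_ring(lon, lat, hole) for hole in holes
        ):
            return True
    return False


# Hypothetical square, in the same MultiPolygon layout as the 'geometry' field
square = {
    "type": "MultiPolygon",
    "coordinates": [[[
        [-74.0, 40.7], [-73.9, 40.7], [-73.9, 40.8], [-74.0, 40.8], [-74.0, 40.7],
    ]]],
}

print(point_in_multipolygon(-73.95, 40.75, square))
print(point_in_multipolygon(-73.85, 40.75, square))
```

The first point lies inside the square and the second lies east of it, so only the first would be counted by a containment test of this kind.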

Find how many accommodations fall in the `Upper West Side` neighbourhood.

---

In [75]:
# Query
pp.pprint(
        db.neighbourhoods.find_one({
                                'properties.ntaname': 'Upper West Side'
                            })
)

{'_id': ObjectId('60bf31520914cc69b40dd6c0'),
 'type': 'Feature',
 'properties': {'ntacode': 'MN12',
                'shape_area': '34379942.4562',
                'county_fips': '061',
                'ntaname': 'Upper West Side',
                'shape_leng': '29160.206532',
                'boro_name': 'Manhattan',
                'boro_code': '1'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-73.96003018437119, 40.798038107394326],
                                [-73.96052271735704, 40.79736846896001],
                                [-73.9607478509035, 40.79705597565531],
                                [-73.96097971807933, 40.796738644045305],
                                [-73.96144060721896, 40.79611082668159],
                                [-73.9618998546095, 40.79547927006113],
                                [-73.96235980150668, 40.794852060560665],
                                [-73.96282149803885, 40.794237235188895],
                     

In [76]:
# Neighbourhood
neighbourhood_loc = db.neighbourhoods.find_one({
                                                'properties.ntaname': 'Upper West Side'
                                            })['geometry']

In [77]:
# Neighbourhood geometry
neighbourhood_loc

{'type': 'MultiPolygon',
 'coordinates': [[[[-73.96003018437119, 40.798038107394326],
    [-73.96052271735704, 40.79736846896001],
    [-73.9607478509035, 40.79705597565531],
    [-73.96097971807933, 40.796738644045305],
    [-73.96144060721896, 40.79611082668159],
    [-73.9618998546095, 40.79547927006113],
    [-73.96235980150668, 40.794852060560665],
    [-73.96282149803885, 40.794237235188895],
    [-73.96297579484633, 40.794016181496374],
    [-73.96307931092772, 40.79386787632811],
    [-73.96321832678798, 40.793678573278086],
    [-73.96371100656353, 40.79300902804479],
    [-73.96380601043539, 40.79287990080324],
    [-73.96417598587449, 40.79236204502794],
    [-73.96468540673351, 40.791664026798635],
    [-73.96517705565138, 40.79099034109952],
    [-73.96562799538655, 40.79036611712901],
    [-73.96609500572444, 40.78973438976666],
    [-73.96655226678918, 40.78910715282552],
    [-73.96700977073424, 40.78847679023962],
    [-73.9674490837313, 40.787860721093054],
    [-73.9

----

Find all the documents that fall within the neighbourhood boundary in the airbnb collection. 

---

In [78]:
# Number of accommodations that fall within the neighbourhood
len(list(db.airbnb.find({
                'location': {
                                '$geoWithin': {
                                                '$geometry': neighbourhood_loc
                                            }
                            }
            })))#.count()

886

In [79]:
# Documents
cur = db.airbnb.find({
                        'location': {
                                        '$geoWithin': {
                                            '$geometry': neighbourhood_loc
                                                    }
                                    }
                    },
                    {
                        'neighbourhood':1,
                        '_id':0,
                        'accom_id':1
                    })

for doc in cur:
    pp.pprint(doc)

{'accom_id': 107895,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 2283143,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 6972192,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 16846366,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 6972089,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 5541897,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 5033691,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 5081207,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 4905248,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 5034136,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}
{'accom_id': 4905118,
 'neighbourhood': {'name': 'Upper West Side', 'group': 'Manhattan'}}

---
### Aggregation Pipeline

We can run `$nearSphere`-style queries in an aggregation pipeline using the [$geoNear](https://docs.mongodb.com/manual/reference/operator/aggregation/geoNear/#-geonear--aggregation-) stage.

It outputs documents in order of nearest to farthest from a specified point.

**Syntax -** `{ $geoNear: { <geoNear options> } }`

The `$geoNear` stage relies on a geospatial index and must appear as the first stage in an aggregation pipeline.
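
The first-stage rule can be checked before a pipeline is sent to the server. The sketch below builds a `$geoNear` stage as a plain dictionary and validates its position; `geo_near_stage` and `check_geo_near_first` are hypothetical helpers written for this example. `distanceMultiplier` is a documented `$geoNear` option, used here on the assumption that distances from a GeoJSON point are in meters, so multiplying by 0.001 reports kilometers.

```python
def geo_near_stage(lon, lat, distance_field="Distance", **options):
    """Build a $geoNear stage around a GeoJSON point; extra options are passed through."""
    stage = {
        "near": {"type": "Point", "coordinates": [lon, lat]},
        "distanceField": distance_field,
        "spherical": True,
    }
    stage.update(options)
    return {"$geoNear": stage}


def check_geo_near_first(pipeline):
    """Raise ValueError if $geoNear appears anywhere except the first stage."""
    for index, stage in enumerate(pipeline):
        if "$geoNear" in stage and index != 0:
            raise ValueError("$geoNear must be the first stage of the pipeline")
    return pipeline


pipeline = check_geo_near_first([
    geo_near_stage(-73.93414657, 40.82302903, maxDistance=5000, distanceMultiplier=0.001),
    {"$limit": 5},
])
print(pipeline)

# With a live connection:
# for doc in db.airbnb.aggregate(pipeline):
#     print(doc)
```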


----


For example, find the five private rooms in the airbnb collection that are nearest to `[-73.93414657, 40.82302903]` while lying between 1000 and 5000 meters from it.

---

In [80]:
# Aggregate pipeline
cur = db.airbnb.aggregate([
                        # geoNear
                        {
                            '$geoNear':{
                                            # Point
                                            'near': {
                                                        'type': 'Point',
                                                        'coordinates': [-73.93414657, 40.82302903]
                                                    },
                                            # Output field with calculated distance
                                            'distanceField': 'Distance',
                                            # Optional fields
                                            # Spherical geometry
                                            'spherical': True,
                                            # Maximum distance
                                            'maxDistance': 5000,
                                            # Minimum distance
                                            'minDistance': 1000, 
                                            # Query
                                            'query': {'room_type': 'Private room'},
                                            # Location of the matched document
                                            'includeLocs': 'Location'
                                        }
                        },
                        # Project
                        {
                            '$project':{
                                            '_id':0,
                                            'ID': '$accom_id',
                                            'Distance': 1,
                                            'Location': 1,
                                            'Room': '$room_type'
                                        }
                        },
                        # Limit
                        {
                            '$limit': 5
                       }
                ])

for doc in cur:
    pp.pprint(doc)

{'Distance': 1001.3033274308483,
 'Location': {'type': 'Point', 'coordinates': [-73.94601, 40.82359]},
 'ID': 310325,
 'Room': 'Private room'}
{'Distance': 1003.4215563304882,
 'Location': {'type': 'Point', 'coordinates': [-73.94601, 40.82384]},
 'ID': 24892670,
 'Room': 'Private room'}
{'Distance': 1004.2754543256573,
 'Location': {'type': 'Point', 'coordinates': [-73.94467, 40.81879]},
 'ID': 31621785,
 'Room': 'Private room'}
{'Distance': 1004.8304867337902,
 'Location': {'type': 'Point', 'coordinates': [-73.946, 40.82404]},
 'ID': 41840320,
 'Room': 'Private room'}
{'Distance': 1005.4592093980679,
 'Location': {'type': 'Point', 'coordinates': [-73.946, 40.82197]},
 'ID': 5192165,
 'Room': 'Private room'}


----
### Exercises

- Find the number of accommodations within 500 meters of `-73.9857, 40.7484` in the airbnb collection.

- How many accommodations in airbnb are within the neighbourhoods whose `boro_name` is `Manhattan` and `boro_code` is `1`? ***Use the neighbourhoods collection for this along with the $geoWithin operator.***


----