<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Prerequisites" data-toc-modified-id="Prerequisites-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Prerequisites</a></span></li><li><span><a href="#Aggregations" data-toc-modified-id="Aggregations-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Aggregations</a></span><ul class="toc-item"><li><span><a href="#unwind" data-toc-modified-id="unwind-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span><code>unwind</code></a></span><ul class="toc-item"><li><span><a href="#Trying-with-a-single-document" data-toc-modified-id="Trying-with-a-single-document-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>Trying with a single document</a></span></li><li><span><a href="#Scale-it-up" data-toc-modified-id="Scale-it-up-2.1.2"><span class="toc-item-num">2.1.2&nbsp;&nbsp;</span>Scale it up</a></span></li></ul></li></ul></li><li><span><a href="#GeoJSON" data-toc-modified-id="GeoJSON-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>GeoJSON</a></span></li><li><span><a href="#Index" data-toc-modified-id="Index-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Index</a></span></li><li><span><a href="#Geospatial-Queries" data-toc-modified-id="Geospatial-Queries-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Geospatial Queries</a></span><ul class="toc-item"><li><span><a href="#near" data-toc-modified-id="near-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span><code>near</code></a></span></li><li><span><a href="#geoWithin" data-toc-modified-id="geoWithin-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span><code>geoWithin</code></a></span></li></ul></li><li><span><a href="#Further-resources" data-toc-modified-id="Further-resources-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Further resources</a></span></li></ul></div>

# Geoqueries

Our objective is to be able to query our documents geospatially.
That is, if there is geographical data, such as latitude and longitude, we can look for documents near a certain point (e.g.: `Companies near Dallas...`) or, if we have access to polygon data, see whether there are documents inside an area (e.g.: `Power stations within the area of a thunderstorm`).

## Prerequisites

In order to work with geographical data in MongoDB, we must make sure the data is stored in a proper format and that it is indexed. Indexes are what allow for the efficient execution of queries in MongoDB.

Geospatial queries involve a lot of geometrical calculations, and the more documents we have, the more expensive they become, so an index is needed.

In [1]:
from pymongo import MongoClient

In [2]:
client = MongoClient()

In [3]:
companies = client.companies.companies
offices = client.companies.offices

## Aggregations

[Aggregations](https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/) are a special kind of query in MongoDB. They allow a `pipeline` of stages to be applied to the documents. We can perform calculations, generate new values, unwind arrays and also filter.

### `unwind`

First, however, we must check that our data is properly stored for indexing. 

Our documents have the offices (where the geodata is stored) as an array, because some companies have more than one office, in different locations.

Ideally, we want to separate each office into its own document, so we can treat them individually.

That's where the [`$unwind`](https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/) operator comes in.
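
To make the idea concrete before touching the database, here is a minimal pure-Python sketch of what `$unwind` does conceptually. The `unwind` function and the `sample` documents are hypothetical and only illustrate the default behaviour (documents whose array is missing, null or empty are dropped); it is not how MongoDB implements the stage.

```python
def unwind(documents, field):
    """Yield one copy of each document per element of its array `field`."""
    for doc in documents:
        value = doc.get(field)
        if value is None or value == []:
            continue  # missing, null or empty arrays produce no output by default
        items = value if isinstance(value, list) else [value]
        for item in items:
            # Every copy keeps the other attributes, including the original _id
            yield {**doc, field: item}


# Hypothetical sample data, shaped like the companies collection
sample = [
    {"_id": 1, "name": "Acme", "offices": [{"city": "Madrid"}, {"city": "Dublin"}]},
    {"_id": 2, "name": "NoOffice", "offices": []},
]

unwound = list(unwind(sample, "offices"))
print(unwound)
```

Note that both output documents share `_id` 1, which matters when inserting them into another collection.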

#### Trying with a single document

In [4]:
res = companies.find({"name":"Facebook"})
res = list(res)
#res[0]["offices"]
len(res)

1

In [5]:
res = companies.aggregate(
    [
        {"$match":{"name":"Facebook"}}, # Select only those who match {name:"Facebook"}
        {"$unwind":"$offices"}, # Separate into different documents based on the values on column "offices"
        {"$project":{"name":1,"offices":1}} # Choose which attributes we want on the final output
    ]
)
res = list(res)
len(res)

3

In [6]:
# One document turned into 3
res

[{'_id': ObjectId('52cdef7c4bab8bd675297d8e'),
  'name': 'Facebook',
  'offices': {'description': 'Headquarters',
   'address1': '1601 Willow Road',
   'address2': '',
   'zip_code': '94025',
   'city': 'Menlo Park',
   'state_code': 'CA',
   'country_code': 'USA',
   'latitude': 37.41605,
   'longitude': -122.151801}},
 {'_id': ObjectId('52cdef7c4bab8bd675297d8e'),
  'name': 'Facebook',
  'offices': {'description': 'Europe HQ',
   'address1': '',
   'address2': '',
   'zip_code': '',
   'city': 'Dublin',
   'state_code': None,
   'country_code': 'IRL',
   'latitude': 53.344104,
   'longitude': -6.267494}},
 {'_id': ObjectId('52cdef7c4bab8bd675297d8e'),
  'name': 'Facebook',
  'offices': {'description': 'New York',
   'address1': '340 Madison Ave',
   'address2': '',
   'zip_code': '10017',
   'city': 'New York',
   'state_code': 'NY',
   'country_code': 'USA',
   'latitude': 40.7557162,
   'longitude': -73.9792469}}]

#### Scale it up

Now that we have seen how it works on a single document, we can do the same for the whole collection.

- `Warning!⚠️:` Unwound documents keep the same `_id`. In order to insert them into a new collection, we must omit the `_id` so that new ObjectIds are created.

In [7]:
res = companies.aggregate([{"$unwind":"$offices"},{"$project":{"_id":0}}])

In [8]:
offices.insert_many(res)

<pymongo.results.InsertManyResult at 0x15520eb80>

As we can see, the insert was successful, but there is another obstacle: there can't be null values for latitude or longitude.

We must remove those documents so we can create the index. Luckily, it can all be done in the same aggregation.

In [9]:
res = companies.aggregate([
    {"$unwind":"$offices"},
    {"$match":{"offices.latitude":{"$ne":None}, "offices.longitude":{"$ne":None}}},
    {"$project":{"_id":0}}])

In [10]:
offices.drop()  # Deleting the previously inserted data, which contained nulls.
offices.insert_many(res)

<pymongo.results.InsertManyResult at 0x137ef8dc0>

## GeoJSON

Now that we have our data properly cleaned, we must make sure it is in a proper format.

There are many standards for coordinate data, but we will use GeoJSON.

GeoJSON objects are basically dictionaries with a `type` key indicating whether the geometry is a `Point`, a `Polygon` or a `MultiPolygon` (among others), and a `coordinates` key that contains a single point or an array of points.

- `NOTE 🌍:` In GeoJSON, the longitude must come before the latitude.

```js
{
    "type": "Point",
    "coordinates": [125.6, 10.1]
}
```

The excerpt above is only the `geometry` part of a GeoJSON feature (the other part being the `properties`), but it is all we need.
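
Because swapping the two coordinates is an easy mistake to make, a small helper can build the point and check the ranges. This is only a sketch: `to_geojson_point` is a hypothetical name, not part of pymongo, and the coordinates are those of the Facebook headquarters shown earlier.

```python
def to_geojson_point(longitude, latitude):
    """Build a GeoJSON Point geometry, longitude first."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


menlo_park = to_geojson_point(-122.151801, 37.41605)
print(menlo_park)
```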

In [11]:
res = offices.find({},{"offices":1})

In [12]:
for comp in res:
    geojson = {
        "type":"Point",
        "coordinates":[comp["offices"]["longitude"], comp["offices"]["latitude"]]
    } 
    offices.update_one({"_id": comp["_id"]}, {"$set": {"geojson": geojson}})  # Update this document with its GeoJSON point.
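
As an alternative to updating the documents one by one, the same field could probably be built on the server in a single call by passing an aggregation pipeline as the update. This is a sketch under assumptions: it requires a MongoDB server that supports update pipelines (4.2 or later) and a pymongo version that accepts them, and `geojson_update_pipeline` is a hypothetical helper name. The call against the database is left as a comment because it needs a live connection.

```python
def geojson_update_pipeline(lon_path="$offices.longitude", lat_path="$offices.latitude"):
    """Return an update pipeline that builds a GeoJSON Point from two field paths."""
    return [
        {
            "$set": {
                "geojson": {
                    "type": "Point",
                    # Field paths are resolved per document; longitude comes first
                    "coordinates": [lon_path, lat_path],
                }
            }
        }
    ]


pipeline = geojson_update_pipeline()
print(pipeline)

# With a live connection (not executed here):
# offices.update_many({}, pipeline)
```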

## Index

The coordinates are now ready. We just have to create an index on them, a `2dsphere` index.

This can be accomplished through pymongo:

In [13]:
offices.create_index([("geojson", "2dsphere")])

'geojson_2dsphere'

Or through MongoDB Compass:

![](images/index_menu.png)

![](images/index_create.png)

## Geospatial Queries

Now that we have it all set, we can begin filtering our documents with this data.
For example, we can find the points closest to and farthest from Ironhack.

### `near`

The `$near` operator finds documents close to a given point. You can also bound the search with `$minDistance` and `$maxDistance`, both expressed in meters when the point is GeoJSON.

- `Note:` The results will be sorted from closest to farthest.
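
The MongoDB documentation writes `$near` for GeoJSON points with the point wrapped in `$geometry` and the distance limits placed alongside it. The sketch below builds such a filter; `near_query` is a hypothetical helper name, the point uses the Ironhack coordinates from this notebook, and the 1000 m minimum is just an illustrative value. The resulting dictionary could be passed to `find` on a collection with a `2dsphere` index.

```python
def near_query(field, point, max_distance=None, min_distance=None):
    """Build a $near filter for a GeoJSON point; distances are in meters."""
    near = {"$geometry": point}
    if max_distance is not None:
        near["$maxDistance"] = max_distance
    if min_distance is not None:
        near["$minDistance"] = min_distance
    return {field: {"$near": near}}


point = {"type": "Point", "coordinates": [-3.6982891786021477, 40.39256209165716]}

# Documents between 1 km and 10 km away, closest first
query = near_query("geojson", point, max_distance=10000, min_distance=1000)
print(query)
```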


In [14]:
ironhack = {
    "type": "Point",
    "coordinates": [-3.6982891786021477, 40.39256209165716]
}

In [15]:
res = offices.find({"geojson":{"$near":ironhack,"$maxDistance":10000}}, {"name":1,"offices":1})

In [16]:
res = list(res)

In [17]:
res[:1]

[{'_id': ObjectId('601be1de944ade9fc75061b7'),
  'name': 'Daily Flat Rental',
  'offices': {'description': 'Central Office',
   'address1': 'Lavapies 26 1A',
   'address2': '',
   'zip_code': '28012',
   'city': 'Madrid',
   'state_code': None,
   'country_code': 'ESP',
   'latitude': 40.412323,
   'longitude': -3.703248}}]
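
As a sanity check, we can estimate the distance between Ironhack and this first result with the haversine formula. This is an approximation on a perfect sphere with an assumed mean Earth radius, so it may differ slightly from the distance MongoDB computes; `haversine_m` is a hypothetical helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius, in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Ironhack versus the office returned by the query above
distance = haversine_m(-3.6982891786021477, 40.39256209165716, -3.703248, 40.412323)
print(round(distance))
```

The estimate comes out at a little over 2 km, well inside the 10 km limit passed as `$maxDistance`.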

### `geoWithin`

You can also use this data to see which points are within a given area. 

We will, however, need the coordinates of the polygons (the perimeters of the areas).

In [18]:
import json

with open("../datasets/spain-communities.geojson") as file: # Opening some geodata for Spanish communities
    ca = json.load(file)

In [19]:
ca.keys() # It is a JSON file, loaded as a single dictionary.

dict_keys(['type', 'features'])

Since this JSON is of the type `FeatureCollection`, each of the elements in the `features` array represents an entity.

In [20]:
ca["features"][0].keys()

dict_keys(['type', 'geometry', 'properties'])

In [21]:
ca["features"][0]["properties"]

{'cod_ccaa': '07',
 'noml_ccaa': 'COMUNIDAD DE CASTILLA Y LEON',
 'name': 'Castilla-Leon',
 'cartodb_id': 7,
 'created_at': '2014-09-30T00:00:00Z',
 'updated_at': '2014-12-25T02:07:41Z'}

In [22]:
spain = [{"name":com["properties"]["name"],**com} for com in ca["features"]]
# We do a list comprehension in order to add the name of each Community on a 
# more visible place instead of into the `properties`

In [23]:
[ca["name"] for ca in spain]

['Castilla-Leon',
 'Cataluña',
 'Ceuta',
 'Murcia',
 'La Rioja',
 'Baleares',
 'Canarias',
 'Cantabria',
 'Andalucia',
 'Asturias',
 'Valencia',
 'Melilla',
 'Navarra',
 'Galicia',
 'Aragon',
 'Madrid',
 'Extremadura',
 'Castilla-La Mancha',
 'Pais Vasco']

We insert this data into a new collection so that each community can be selected easily.

In [24]:
client.companies.spain.insert_many(spain)

<pymongo.results.InsertManyResult at 0x1127ade80>

Whenever we want the data for a community, we simply query for it. 

In [25]:
res = client.companies.spain.find({"name":"Murcia"})

In [26]:
murcia = next(res)

When we want to check which documents are within a perimeter:
```js
   <collection>.find( { <attribute> : { "$geoWithin" : { "$geometry" : <geojson> } } } )
```
- `collection` : where the documents are.
- `attribute` : the name of the attribute that contains the geometry data in the documents of the collection.
- `$geometry` : this operator must be used to indicate that we are passing a GeoJSON geometry.
- `geojson` : the data of the polygon.
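
To build some intuition for what "within" means, here is a minimal pure-Python sketch of a point-in-polygon test using ray casting. It is a planar approximation that ignores holes and `MultiPolygon` geometries, whereas MongoDB works with spherical geometry on a `2dsphere` index, so this is not what the server runs. The `point_in_ring` name and the `square` ring are hypothetical; the square is a rough box, not the real boundary of Murcia.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: is (lon, lat) inside the closed ring of [lon, lat] pairs?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Hypothetical box; the first and last positions are equal, as in GeoJSON rings
square = [[-2.0, 37.0], [0.0, 37.0], [0.0, 39.0], [-2.0, 39.0], [-2.0, 37.0]]

in_box = point_in_ring(-1.1317041, 37.9928939, square)   # an office in Murcia
out_box = point_in_ring(-3.703248, 40.412323, square)    # an office in Madrid
print(in_box, out_box)
```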

In [27]:
res = offices.find({"geojson":{"$geoWithin":{"$geometry":murcia["geometry"]}}}, {"name":1,"offices":1})
res = list(res)

In [28]:
res

[{'_id': ObjectId('601be1de944ade9fc75074d9'),
  'name': 'Cokidoo',
  'offices': {'description': 'Murcia Office',
   'address1': '',
   'address2': '',
   'zip_code': '30008',
   'city': 'Murcia',
   'state_code': None,
   'country_code': 'ESP',
   'latitude': 37.9928939,
   'longitude': -1.1317041}}]

## Further resources

- [Lots of geojson files](https://github.com/codeforamerica/click_that_hood/tree/master/public/data)