---
# Data modeling, importing, Indexing and Querying Crane datasets

Date: 04-12-2019 <br>
Concept version: 0.9 <br>
Author: Pieter Lems  <br>

© Copyright 2019 Ministerie van Defensie

This notebook will provide information on creating data models for MongoDB.<br>
To create the data models we are going to use Python and MongoEngine. 

The notebook also shows how to import the data into the MongoDB datastores.<br>
A Python-only script is also available in the utilities folder ('~/Geostack-local/utilties/import_datasets_mongo/') located in the Geostack virtual machine.


## Contents of notebook
- Importing the required modules 
- Reading the datasets
- Validating the datasets
- Connecting to the database
    - Create Docker MongoDB database (if needed)
    - Connect
- Creating the model
- Loading the data using the model
    - Creating the import functions
    - Load the data
- Querying the data (pre-indexing)
- Indexing the data
- Querying the data (post-indexing)
- Loading GeoJSON data (not needed, but implemented to show how it's done)

### The datasets used in this notebook can be found in the folder ("../Data/Crane_JSON/")

## Importing the required modules

In [1]:
import pandas as pd # Used to read the JSON datasets
from mongoengine import * # Used to model, import and index the data in MongoDB 
from datetime import datetime # Used to transform timestamps into valid datetimes

##  Reading the datasets

In [8]:
SW_Crane_Agentha = pd.read_json('../Data/Crane_JSON/20181003_Dataset_SV_GPS_Crane_9407_STAW_Crane_RRW-BuGBk_Agnetha.json')
SW_Crane_Frida = pd.read_json('../Data/Crane_JSON/20181003_Dataset_SV_GPS_Crane_9381_STAW_Crane_RRW-BuGBk_Frida.json')
SW_Crane_Cajsa = pd.read_json('../Data/Crane_JSON/20181003_Dataset_SV_GPS_Crane_9472_STAW_Crane_RRW-BuGR_Cajsa.json')

GE_Crane_181527 = pd.read_json('../Data/Crane_JSON/20180928_Dataset_DE_GPS_Crane_181527_iCora_Crane_13_BuBuBr-YBuBk.json')
GE_Crane_181528= pd.read_json('../Data/Crane_JSON/20191103_Dataset_DE_GPS_Crane_181528_iCora_Crane_15_BuBuBr-WYW_Lotta.json')

LT_Crane_Zydelis = pd.read_json('../Data/Crane_JSON/20200103_Movebank_Common_Crane_Lithuania_GPS_2016_Dataset.json')




##  Validating the datasets
To validate that the correct datasets have been read, we print the first row of each dataset.

In [8]:
SW_Crane_Agentha[:1] # Show one row of dataframe

Unnamed: 0,event-id,study-name,timestamp,visible,ground-speed,heading,location-long,location-lat,height-above-ellipsoid,individual-taxon-canonical-name,sensor-type,tag-voltage,individual-local-identifier
0,1154832230,"GPS telemetry of Common Cranes, Sweden",2013-07-20 03:04:08,True,0.0,,13.27158,57.390537,168,Grus grus,gps,4180,9407


In [9]:
SW_Crane_Frida[:1] # Show one row of dataframe

Unnamed: 0,event-id,study-name,timestamp,visible,ground-speed,heading,location-long,location-lat,height-above-ellipsoid,individual-taxon-canonical-name,sensor-type,tag-voltage,individual-local-identifier
0,1154727247,"GPS telemetry of Common Cranes, Sweden",2013-07-21 03:06:32,True,0.0,,13.583908,57.503796,193,Grus grus,gps,4110.0,9381


In [10]:
SW_Crane_Cajsa[:1] # Show one row of dataframe

Unnamed: 0,event-id,study-name,timestamp,visible,ground-speed,heading,location-long,location-lat,height-above-ellipsoid,individual-taxon-canonical-name,sensor-type,tag-voltage,individual-local-identifier
0,1154936959,"GPS telemetry of Common Cranes, Sweden",2013-07-20 04:35:39,True,0.0,,13.316729,57.334858,175,Grus grus,gps,4180,9472


In [11]:
GE_Crane_181527[:1] # Show one row of dataframe

Unnamed: 0,event-id,study-name,timestamp,visible,ground-speed,heading,location-long,location-lat,height-above-msl,individual-taxon-canonical-name,sensor-type,tag-voltage,individual-local-identifier
0,6926595058,GPS 181527,2018-06-14 05:08:08,True,0.0,108,13.033207,54.238594,19,Grus grus,gps,4147,181527


In [12]:
GE_Crane_181528[:1] # Show one row of dataframe

Unnamed: 0,event-id,study-name,timestamp,visible,ground-speed,heading,location-long,location-lat,height-above-msl,individual-taxon-canonical-name,sensor-type,tag-voltage,individual-local-identifier
0,6926602555,GPS 181528 Updated,2018-06-15 06:24:55,True,0.0,237,13.429357,54.335003,-11,Grus grus,gps,4139,181528


In [13]:
LT_Crane_Zydelis[:1]

Unnamed: 0,event-id,study-name,timestamp,visible,ground-speed,heading,location-long,location-lat,height-above-msl,individual-taxon-canonical-name,sensor-type,tag-voltage,individual-local-identifier
0,1890956035,"Common Crane Lithuania GPS, 2016",2016-07-30 06:43:06,True,0.0,154,24.609076,54.59235,155,Grus grus,gps,4150,16121
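
Besides inspecting the first row by eye, the column names can be checked programmatically. The sketch below is a minimal example, not part of the original notebook: `COMMON_COLUMNS` and `missing_columns` are hypothetical names, and the expected columns are taken from the headers printed above. For a real DataFrame, pass `df.columns` as the first argument.

```python
# Columns shared by every crane dataset shown above (taken from the printed headers).
COMMON_COLUMNS = {
    "event-id", "study-name", "timestamp", "visible", "ground-speed",
    "heading", "location-long", "location-lat",
    "individual-taxon-canonical-name", "sensor-type", "tag-voltage",
    "individual-local-identifier",
}


def missing_columns(columns, altitude_column):
    """Return the sorted list of expected column names absent from `columns`."""
    expected = COMMON_COLUMNS | {altitude_column}
    return sorted(expected - set(columns))


# Header of a Swedish dataset, copied from the output above.
swedish_header = [
    "event-id", "study-name", "timestamp", "visible", "ground-speed",
    "heading", "location-long", "location-lat", "height-above-ellipsoid",
    "individual-taxon-canonical-name", "sensor-type", "tag-voltage",
    "individual-local-identifier",
]

# The Swedish sets use 'height-above-ellipsoid', the other sets 'height-above-msl'.
print(missing_columns(swedish_header, "height-above-ellipsoid"))
print(missing_columns(swedish_header, "height-above-msl"))
```

An empty list means every expected column is present. A non-empty list names the columns that would raise a `KeyError` during the import.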


##  Connecting to the database

#### Create Docker container

In [14]:
# Uncomment the last line if you don't have a MongoDB Docker container
# and you want to import the data into a Docker container.
# The leading "!" runs the command in a shell from the notebook.
# It downloads the MongoDB image and runs the container on port 27017 (localhost:27017).
#!docker run -d -p 27017:27017 mongo:latest

#### Connect

In [4]:
connect('Crane_Database')

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, read_preference=Primary())

---

# Creating the model

---

In [5]:
# Creating the Tracker document
class Tracker(Document):
    
    # Name of the study
    study_name = StringField()
    
    # Name of the bird, in latin.
    individual_taxon_canonical_name = StringField()
    
    # Id of the crane 
    individual_local_identifier = IntField()
    
    #Start date of the study
    start_date = DateTimeField()
    
    #End date of the study
    end_date = DateTimeField()

    #Name of the crane
    name = StringField()
    
    #Number of transmissions related to the tracker
    transmission_Count= IntField()
    
    
    
# Creating the Geometry document
class Geometry(EmbeddedDocument):
 
    # Coordinates of the transmission, e.g. coord=[1,2]
    # PointField automatically adds a 2dsphere index
    # A 2d index has to be added manually
    coord = PointField()
    
    # Altitude of the transmission
    alt = FloatField()

    
# Creating the Speed document    
class Speed(EmbeddedDocument):
    
    # Speed of the Crane
    ground_speed = FloatField()
    
    # Heading of the Crane in degrees
    heading = IntField()
    
# Creating the TrackerMetadata document
class TrackerMetadata(EmbeddedDocument):
    
    #Is the tracker still visible or not?
    visible = BooleanField()
    
    # Type of sensor used in tracker.
    sensor_type = StringField()
    
    # Voltage level of the tracker.
    tag_voltage = FloatField()
    
    
# Creating the Transmission document 
class Transmission(Document):
    
    # Identifier of the transmission
    event_id = IntField()
    
    # Timestamp of when the transmission was sent
    timestamp = DateTimeField()
    
    # Embedded geometry of transmission
    geometry = EmbeddedDocumentField(Geometry)
    
    # Embedded speed related data of transmission
    speed = EmbeddedDocumentField(Speed)
    
    # Embedded metadata of transmission
    metadata = EmbeddedDocumentField(TrackerMetadata)
    
    # Reference to the tracker the transmission belongs to
    tracker = ReferenceField(Tracker)
    

---
# Loading the data using the model
---

###  Creating the import functions 

In [6]:
def load_data(df,name,country):
    
    #Create metadata for the tracker 
    start_Date = df.at[0,'timestamp']
    end_Date = df.at[df.shape[0]-1,'timestamp']
    transmission_Count = df.shape[0]
    
    #Create a new tracker, this is only done once 
    tracker = Tracker(study_name = df.at[0,'study-name'],
                      individual_taxon_canonical_name = df.at[0,'individual-taxon-canonical-name'],
                      individual_local_identifier = df.at[0,'individual-local-identifier'],
                      start_date = start_Date,
                      end_date = end_Date,
                      name = name,
                      transmission_Count = transmission_Count)
    tracker.save()
    
    # Create an empty list of transmissions
    # We will append the new transmissions to 
    # this list. Then we will pass the list to 
    # the mongodb bulk insert feature
    transmissions = []
    
    # Print when list appending starts 
    print('Start appending transmissions to list from: ' + str(name) )
    
    # For each row in the dataframe
    for index,row in df.iterrows():
        
        if country == "sw":  
            # Create geometry document for Swedish sets
            # NOTE: To use geometry queries, list the longitude first
            # and then the latitude.
            geometry = Geometry(coord = [row['location-long'],row['location-lat']],
                                alt = row['height-above-ellipsoid'])
        else:
            # Create geometry document for German and Lithuanian sets
            # NOTE: To use geometry queries, list the longitude first
            # and then the latitude.
            geometry = Geometry(coord = [row['location-long'],row['location-lat']],
                                alt = row['height-above-msl'])
        
        # Create metadata document
        metadata = TrackerMetadata(visible = row['visible'],
                                   sensor_type = row['sensor-type'],
                                   tag_voltage = row['tag-voltage'])
        
        # Create speed document
        speed = Speed(ground_speed = row['ground-speed'])
        
        # Create transmission document and append them 
        # to the transmissions list.
        transmissions.append(Transmission(event_id = row['event-id'],
                                          timestamp = row['timestamp'],
                                          geometry = geometry,
                                          speed = speed,
                                          metadata = metadata,
                                          tracker = tracker))
        
    # Print when list appending is done.
    print('Bulk inserting: '+ str(transmission_Count) + ' transmissions from: ' + str(name) )
        
    # Bulk insert the populated transmissions list
    Transmission.objects.insert(transmissions,load_bulk=True)

    # Print when the insert has succeeded
    print("Done inserting "+ str(len(df.index)) + " transmissions")
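
The function above sends all transmissions of one crane in a single bulk insert. For the larger datasets it may be preferable to send them in smaller batches. The following is a minimal sketch under that assumption; `chunked` and `insert_in_batches` are hypothetical helpers, and the batch size of 10000 is an arbitrary choice, not a tuned value.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def insert_in_batches(documents, insert, batch_size=10000):
    """Call `insert` once per batch and return the number of documents sent."""
    total = 0
    for batch in chunked(documents, batch_size):
        insert(batch)
        total += len(batch)
    return total


# Hypothetical usage inside load_data, replacing the single bulk insert:
# insert_in_batches(transmissions, Transmission.objects.insert)

# Stand-alone demonstration with a fake insert function that records batch sizes.
batch_sizes = []
sent = insert_in_batches(list(range(25)), lambda b: batch_sizes.append(len(b)), 10)
print(sent, batch_sizes)
```

Note that this sketch still builds the full list in memory first. It only limits the size of each individual insert call.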

### Loading the data

In [9]:
# Call the load_data function and pass the dataframe, the crane name and the country code
load_data(SW_Crane_Agentha,"Agnetha",'sw')
load_data(SW_Crane_Frida,"Frida",'sw')
load_data(SW_Crane_Cajsa,"Cajsa",'sw')
load_data(GE_Crane_181527,"Nena",'ge')
load_data(GE_Crane_181528,"Lotta",'ge')

Start appending transmissions to list from: Agnetha
Bulk inserting: 44534 transmissions from: Agnetha
Done inserting 44534 transmissions
Start appending transmissions to list from: Frida
Bulk inserting: 123805 transmissions from: Frida
Done inserting 123805 transmissions
Start appending transmissions to list from: Cajsa
Bulk inserting: 67887 transmissions from: Cajsa
Done inserting 67887 transmissions
Start appending transmissions to list from: Nena
Bulk inserting: 11626 transmissions from: Nena
Done inserting 11626 transmissions
Start appending transmissions to list from: Lotta
Bulk inserting: 29934 transmissions from: Lotta
Done inserting 29934 transmissions


In [7]:
load_data(LT_Crane_Zydelis,"Zydelis",'lt')

Start appending transmissions to list from: Zydelis
Bulk inserting: 254228 transmissions from: Zydelis
Done inserting 254228 transmissions


---
# Querying the data pre-index

First we will run a couple of queries before we create the indexes on the database. By doing this, we can compare the time it takes to return a certain amount of data with and without an indexed database.

To find information about the execution of a query, append .explain() to the query.

---
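
The output of `.explain()` is a large nested dictionary. To compare runs before and after indexing, only a few values matter: the scan type, the number of returned documents, the number of examined documents and the execution time. The sketch below extracts them. It assumes the explain layout shown in this notebook (`queryPlanner.winningPlan` and `executionStats`), and the function names are hypothetical.

```python
def leaf_stage(plan):
    """Follow nested inputStage entries down to the stage that reads the data."""
    while "inputStage" in plan:
        plan = plan["inputStage"]
    return plan["stage"]


def summarize_explain(explain_output):
    """Reduce an explain() dictionary to the values used for comparison."""
    stats = explain_output["executionStats"]
    return {
        "scan": leaf_stage(explain_output["queryPlanner"]["winningPlan"]),
        "returned": stats["nReturned"],
        "docs_examined": stats["totalDocsExamined"],
        "millis": stats["executionTimeMillis"],
    }


# Trimmed example in the shape of the explain output printed in this notebook.
sample = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN", "direction": "forward"}},
    "executionStats": {
        "nReturned": 123805,
        "executionTimeMillis": 229,
        "totalDocsExamined": 532014,
    },
}
print(summarize_explain(sample))

# Hypothetical usage against the database:
# summarize_explain(Transmission.objects(tracker=tracker_id).explain())
```

An indexed plan nests the `IXSCAN` stage under a `FETCH` stage, which is why `leaf_stage` walks down the `inputStage` entries.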

In [10]:
#Query to find ID of crane Frida 
Tracker.objects(name = 'Frida').only('name','id').to_json()

'[{"_id": {"$oid": "5e1dbcae507872e91a9d5313"}, "name": "Frida"}]'

In [12]:
#Query to return the first 10 items related to Crane: Frida
Transmission.objects(tracker='5e1dbcae507872e91a9d5313')[:10].to_json()

'[{"_id": {"$oid": "5e1dbce4507872e91a9d5314"}, "event_id": 1154727247, "timestamp": {"$date": 1374375992000}, "geometry": {"coord": {"type": "Point", "coordinates": [13.583908, 57.503796]}, "alt": 193.0}, "speed": {"ground_speed": 0.0}, "metadata": {"visible": true, "sensor_type": "gps", "tag_voltage": 4110.0}, "tracker": {"$oid": "5e1dbcae507872e91a9d5313"}}, {"_id": {"$oid": "5e1dbce4507872e91a9d5315"}, "event_id": 1154727246, "timestamp": {"$date": 1374378694000}, "geometry": {"coord": {"type": "Point", "coordinates": [13.578312, 57.504063]}, "alt": 194.0}, "speed": {"ground_speed": 0.5144000000000001}, "metadata": {"visible": true, "sensor_type": "gps", "tag_voltage": 4100.0}, "tracker": {"$oid": "5e1dbcae507872e91a9d5313"}}, {"_id": {"$oid": "5e1dbce4507872e91a9d5316"}, "event_id": 1154727245, "timestamp": {"$date": 1374379629000}, "geometry": {"coord": {"type": "Point", "coordinates": [13.578205, 57.50415]}, "alt": 199.0}, "speed": {"ground_speed": 0.0}, "metadata": {"visible": 

In [13]:
#Query to check the execution speed of the query for transmissions related to Frida
Transmission.objects(tracker='5e1dbcae507872e91a9d5313').explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'tracker': {'$eq': ObjectId('5e1dbcae507872e91a9d5313')}},
  'winningPlan': {'stage': 'COLLSCAN',
   'filter': {'tracker': {'$eq': ObjectId('5e1dbcae507872e91a9d5313')}},
   'direction': 'forward'},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 123805,
  'executionTimeMillis': 229,
  'totalKeysExamined': 0,
  'totalDocsExamined': 532014,
  'executionStages': {'stage': 'COLLSCAN',
   'filter': {'tracker': {'$eq': ObjectId('5e1dbcae507872e91a9d5313')}},
   'nReturned': 123805,
   'executionTimeMillisEstimate': 210,
   'works': 532016,
   'advanced': 123805,
   'needTime': 408210,
   'needYield': 0,
   'saveState': 4156,
   'restoreState': 4156,
   'isEOF': 1,
   'invalidates': 0,
   'direction': 'forward',
   'docsExamined': 532014},
  'allPlansExecution': []},
 'serverInfo': {'host': 'geostack-system',
  'port': 27017,
 

In [14]:
#Query to find ID of crane lotta 
Tracker.objects(name = 'Lotta').only('name','id').to_json()

'[{"_id": {"$oid": "5e1dbd6e507872e91aa06d4c"}, "name": "Lotta"}]'

In [15]:
#Query to check the execution speed of the query for transmissions related to Crane: Lotta
Transmission.objects(tracker='5e1dbd6e507872e91aa06d4c').explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'tracker': {'$eq': ObjectId('5e1dbd6e507872e91aa06d4c')}},
  'winningPlan': {'stage': 'COLLSCAN',
   'filter': {'tracker': {'$eq': ObjectId('5e1dbd6e507872e91aa06d4c')}},
   'direction': 'forward'},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 29934,
  'executionTimeMillis': 239,
  'totalKeysExamined': 0,
  'totalDocsExamined': 532014,
  'executionStages': {'stage': 'COLLSCAN',
   'filter': {'tracker': {'$eq': ObjectId('5e1dbd6e507872e91aa06d4c')}},
   'nReturned': 29934,
   'executionTimeMillisEstimate': 200,
   'works': 532016,
   'advanced': 29934,
   'needTime': 502081,
   'needYield': 0,
   'saveState': 4156,
   'restoreState': 4156,
   'isEOF': 1,
   'invalidates': 0,
   'direction': 'forward',
   'docsExamined': 532014},
  'allPlansExecution': []},
 'serverInfo': {'host': 'geostack-system',
  'port': 27017,
  'v

It took 239 milliseconds to return 29934 results using a COLLSCAN (collection scan).

In [16]:
#Query to return all items related to Crane: Lotta, between 2018-06-01 and 2018-09-01
Transmission.objects(Q(tracker='5dde98e87990f3ac79500deb')&
                     Q(timestamp__gte=datetime(2018,6,1)) &
                     Q(timestamp__lte=datetime(2018,9,1))).explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'$and': [{'tracker': {'$eq': ObjectId('5dde98e87990f3ac79500deb')}},
    {'timestamp': {'$lte': datetime.datetime(2018, 9, 1, 0, 0)}},
    {'timestamp': {'$gte': datetime.datetime(2018, 6, 1, 0, 0)}}]},
  'winningPlan': {'stage': 'COLLSCAN',
   'filter': {'$and': [{'tracker': {'$eq': ObjectId('5dde98e87990f3ac79500deb')}},
     {'timestamp': {'$lte': datetime.datetime(2018, 9, 1, 0, 0)}},
     {'timestamp': {'$gte': datetime.datetime(2018, 6, 1, 0, 0)}}]},
   'direction': 'forward'},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 0,
  'executionTimeMillis': 228,
  'totalKeysExamined': 0,
  'totalDocsExamined': 532014,
  'executionStages': {'stage': 'COLLSCAN',
   'filter': {'$and': [{'tracker': {'$eq': ObjectId('5dde98e87990f3ac79500deb')}},
     {'timestamp': {'$lte': datetime.datetime(2018, 9, 1, 0, 0)}},
     {'timest

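The output above shows 0 results in 228 milliseconds using a COLLSCAN (collection scan). The tracker ID in this query differs from the ID found for Lotta in the earlier cell, so no documents match.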

In [17]:
# Query to return all items in a predefined bounding box (The Netherlands in this case)
# Bounds of the box can be found using the following website: https://www.keene.edu/campus/maps/tool/
Transmission.objects(geometry__coord__geo_within_box=[
    (3.2299835,50.7920471),(7.4926788,53.5729383)]).explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'geometry.coord': {'$geoWithin': {'$box': [[3.2299835,
       50.7920471],
      [7.4926788, 53.5729383]]}}},
  'winningPlan': {'stage': 'COLLSCAN',
   'filter': {'geometry.coord': {'$geoWithin': {'$box': [[3.2299835,
        50.7920471],
       [7.4926788, 53.5729383]]}}},
   'direction': 'forward'},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 503,
  'executionTimeMillis': 673,
  'totalKeysExamined': 0,
  'totalDocsExamined': 532014,
  'executionStages': {'stage': 'COLLSCAN',
   'filter': {'geometry.coord': {'$geoWithin': {'$box': [[3.2299835,
        50.7920471],
       [7.4926788, 53.5729383]]}}},
   'nReturned': 503,
   'executionTimeMillisEstimate': 640,
   'works': 532016,
   'advanced': 503,
   'needTime': 531512,
   'needYield': 0,
   'saveState': 4156,
   'restoreState': 4156,
   'isEOF': 1,
   'invalidates':

It took 673 milliseconds to return 503 results using a COLLSCAN (collection scan).

In [18]:
# Query to return all items in a predefined polygon (The Netherlands in this case)
# Bounds of the polygon can be found using the following website: https://www.keene.edu/campus/maps/tool/
Transmission.objects(geometry__coord__geo_within=[[
    [3.2409668,52.2395743],[3.8781738,51.1672889],
    [5.1443481,51.9950282],[3.2409668,52.2395743]]]).explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'geometry.coord': {'$geoWithin': {'$geometry': {'type': 'Polygon',
      'coordinates': [[[3.2409668, 52.2395743],
        [3.8781738, 51.1672889],
        [5.1443481, 51.9950282],
        [3.2409668, 52.2395743]]]}}}},
  'winningPlan': {'stage': 'FETCH',
   'filter': {'geometry.coord': {'$geoWithin': {'$geometry': {'type': 'Polygon',
       'coordinates': [[[3.2409668, 52.2395743],
         [3.8781738, 51.1672889],
         [5.1443481, 51.9950282],
         [3.2409668, 52.2395743]]]}}}},
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'geometry.coord': '2dsphere'},
    'indexName': 'geometry.coord_2dsphere',
    'isMultiKey': False,
    'multiKeyPaths': {'geometry.coord': []},
    'isUnique': False,
    'isSparse': False,
    'isPartial': False,
    'indexVersion': 2,
    'direction': 'forward',
    'indexBounds': {'geometry.coord': ['[5116089176692

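It took 720 milliseconds to return 131 results. Note that the winning plan above is already an IXSCAN on the `geometry.coord_2dsphere` index, which PointField creates automatically, rather than a COLLSCAN.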

# Indexing the database

There are two ways to create indexes on the data. 
1. Create an index when modeling the data.<br>
To create an index while creating the data model, we add a meta field to the document we want to create an index on. For example, if we want to create an index on the altitude field in the geometry document, we add the following meta field to our geometry document: <br> <br>
    meta = {<br>
    'collection': 'altitude',<br>
    'indexes': [
      {'fields': ['alt']}
    ]
  }

 
The cell below shows how to add the index to the altitude field.
 

In [None]:
# Creating the Geometry document with an index on the altitude field
class Geometry(EmbeddedDocument):
 
    # Coordinates of the transmission, e.g. coord=[1,2]
    # PointField automatically adds a 2dsphere index
    # A 2d index has to be added manually
    coord = PointField()
    
    # Altitude of the transmission
    alt = FloatField()
    
    meta = {
        'collection': 'altitude',
        'indexes': [
          {'fields': ['alt']}
        ]
    }


2. Create indexes after modeling the data. <br>
  We can also create the indexes after we have created the data model. This is the approach used below. For example, to create an index on the altitude field after creating the data model, we would run the following command: <br>
  Transmission.create_index(("geometry.alt"))


  

We want to create 4 indexes:
- 2D sphere index:
  This index will be used to query the coordinates of the crane.
  (It was created automatically when assigning PointField() to the coordinates entry in the database model.)
- 2D index:
  We need this index to be able to find coordinates in a certain box.
- timestamp index:
  We need this index because we will query on the timestamp often.
- tracker index (in the transmission collection):
  We need this index because we will query transmissions per tracker using the tracker id.

In [19]:
# Create an index on the tracker field in the transmission collection
Transmission.create_index(("tracker"))

'tracker_1'

In [20]:
# Create an index on the timestamp field in the transmission collection
Transmission.create_index(("timestamp"))

'timestamp_1'

In [21]:
# Create a 2D index on the coordinates field in the transmission collection
Transmission.create_index([("geometry.coord.coordinates","2d")])

'geometry.coord.coordinates_2d'
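
The three index cells above can also be collected in one place, so the same set of indexes is easy to recreate on a fresh database. This is a sketch only: `INDEX_SPECS` and `create_indexes` are hypothetical names, and the specs are the exact arguments passed to `create_index` in the cells above.

```python
# Index specifications copied from the create_index calls above.
INDEX_SPECS = [
    "tracker",
    "timestamp",
    [("geometry.coord.coordinates", "2d")],
]


def create_indexes(model, specs=INDEX_SPECS):
    """Create each index on `model` and return the index names reported back."""
    return [model.create_index(spec) for spec in specs]


# Hypothetical usage against the database:
# create_indexes(Transmission)
```

Creating an index that already exists with the same definition is not an error in MongoDB, so running such a helper repeatedly should be harmless.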

---
# Querying the data post-index
---

In [22]:
#Query to check executing speed after indexing 
Transmission.objects(tracker='5de04102b54094744cf72be1').explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'tracker': {'$eq': ObjectId('5de04102b54094744cf72be1')}},
  'winningPlan': {'stage': 'FETCH',
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'tracker': 1},
    'indexName': 'tracker_1',
    'isMultiKey': False,
    'multiKeyPaths': {'tracker': []},
    'isUnique': False,
    'isSparse': False,
    'isPartial': False,
    'indexVersion': 2,
    'direction': 'forward',
    'indexBounds': {'tracker': ["[ObjectId('5de04102b54094744cf72be1'), ObjectId('5de04102b54094744cf72be1')]"]}}},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 0,
  'executionTimeMillis': 1,
  'totalKeysExamined': 0,
  'totalDocsExamined': 0,
  'executionStages': {'stage': 'FETCH',
   'nReturned': 0,
   'executionTimeMillisEstimate': 0,
   'works': 1,
   'advanced': 0,
   'needTime': 0,
   'needYield': 0,
   'saveState': 0,
   'restoreState': 0,

In [24]:
#Query to check the execution speed of the query for transmissions related to Crane: Lotta
Transmission.objects(tracker='5e1dbd6e507872e91aa06d4c').explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'tracker': {'$eq': ObjectId('5e1dbd6e507872e91aa06d4c')}},
  'winningPlan': {'stage': 'FETCH',
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'tracker': 1},
    'indexName': 'tracker_1',
    'isMultiKey': False,
    'multiKeyPaths': {'tracker': []},
    'isUnique': False,
    'isSparse': False,
    'isPartial': False,
    'indexVersion': 2,
    'direction': 'forward',
    'indexBounds': {'tracker': ["[ObjectId('5e1dbd6e507872e91aa06d4c'), ObjectId('5e1dbd6e507872e91aa06d4c')]"]}}},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 29934,
  'executionTimeMillis': 37,
  'totalKeysExamined': 29934,
  'totalDocsExamined': 29934,
  'executionStages': {'stage': 'FETCH',
   'nReturned': 29934,
   'executionTimeMillisEstimate': 30,
   'works': 29935,
   'advanced': 29934,
   'needTime': 0,
   'needYield': 0,
   'saveState'

It took 37 milliseconds to return 29934 results using an IXSCAN (index scan).

In [25]:
# Query to return all items in a predefined bounding box (The Netherlands in this case)
# Bounds of the box can be found using the following website: https://www.keene.edu/campus/maps/tool/
Transmission.objects(geometry__coord__geo_within_box=[
      (3.2299835,50.7920471),(7.4926788,53.5729383)]).explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'geometry.coord': {'$geoWithin': {'$box': [[3.2299835,
       50.7920471],
      [7.4926788, 53.5729383]]}}},
  'winningPlan': {'stage': 'COLLSCAN',
   'filter': {'geometry.coord': {'$geoWithin': {'$box': [[3.2299835,
        50.7920471],
       [7.4926788, 53.5729383]]}}},
   'direction': 'forward'},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 503,
  'executionTimeMillis': 625,
  'totalKeysExamined': 0,
  'totalDocsExamined': 532014,
  'executionStages': {'stage': 'COLLSCAN',
   'filter': {'geometry.coord': {'$geoWithin': {'$box': [[3.2299835,
        50.7920471],
       [7.4926788, 53.5729383]]}}},
   'nReturned': 503,
   'executionTimeMillisEstimate': 610,
   'works': 532016,
   'advanced': 503,
   'needTime': 531512,
   'needYield': 0,
   'saveState': 4156,
   'restoreState': 4156,
   'isEOF': 1,
   'invalidates':

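It took 625 milliseconds to return 503 results, still using a COLLSCAN (collection scan). The query filters on `geometry.coord`, while the 2D index was created on `geometry.coord.coordinates`, so the index is not used for this query.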
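
The explain output above still reports a COLLSCAN for the box query. One possible cause is that the query filters on `geometry.coord` while the 2D index was created on `geometry.coord.coordinates`. The sketch below builds a raw filter that targets the indexed path instead. `box_filter` is a hypothetical helper, and whether the planner then selects the 2D index is an assumption that should be confirmed with `.explain()`.

```python
def box_filter(bottom_left, upper_right, field="geometry.coord.coordinates"):
    """Build a raw MongoDB $geoWithin/$box filter on the given field path."""
    return {field: {"$geoWithin": {"$box": [list(bottom_left), list(upper_right)]}}}


# Same bounding box as in the query above (longitude first, then latitude).
netherlands = box_filter((3.2299835, 50.7920471), (7.4926788, 53.5729383))
print(netherlands)

# Hypothetical usage with MongoEngine's raw query keyword:
# Transmission.objects(__raw__=netherlands).explain()
```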

In [27]:
#Query to return all items related to Crane: Lotta, between 2018-06-01 and 2018-09-01
Transmission.objects(Q(tracker='5e1dbd6e507872e91aa06d4c')&
                     Q(timestamp__gte=datetime(2018,6,1)) &
                     Q(timestamp__lte=datetime(2018,9,1))).explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'$and': [{'tracker': {'$eq': ObjectId('5e1dbd6e507872e91aa06d4c')}},
    {'timestamp': {'$lte': datetime.datetime(2018, 9, 1, 0, 0)}},
    {'timestamp': {'$gte': datetime.datetime(2018, 6, 1, 0, 0)}}]},
  'winningPlan': {'stage': 'FETCH',
   'filter': {'$and': [{'timestamp': {'$lte': datetime.datetime(2018, 9, 1, 0, 0)}},
     {'timestamp': {'$gte': datetime.datetime(2018, 6, 1, 0, 0)}}]},
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'tracker': 1},
    'indexName': 'tracker_1',
    'isMultiKey': False,
    'multiKeyPaths': {'tracker': []},
    'isUnique': False,
    'isSparse': False,
    'isPartial': False,
    'indexVersion': 2,
    'direction': 'forward',
    'indexBounds': {'tracker': ["[ObjectId('5e1dbd6e507872e91aa06d4c'), ObjectId('5e1dbd6e507872e91aa06d4c')]"]}}},
  'rejectedPlans': [{'stage': 'FETCH',
    'filter': {'tracker': {'$eq': Obj

It took 0 milliseconds to return 110 results using an IXSCAN (index scan).

In [28]:
# Query to return all items in a predefined polygon (The Netherlands in this case)
# Bounds of the polygon can be found using the following website: https://www.keene.edu/campus/maps/tool/
Transmission.objects(geometry__coord__geo_within=[[
    [3.2409668,52.2395743],[3.8781738,51.1672889],
    [5.1443481,51.9950282],[3.2409668,52.2395743]]]).explain()

{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'Crane_Database.transmission',
  'indexFilterSet': False,
  'parsedQuery': {'geometry.coord': {'$geoWithin': {'$geometry': {'type': 'Polygon',
      'coordinates': [[[3.2409668, 52.2395743],
        [3.8781738, 51.1672889],
        [5.1443481, 51.9950282],
        [3.2409668, 52.2395743]]]}}}},
  'winningPlan': {'stage': 'FETCH',
   'filter': {'geometry.coord': {'$geoWithin': {'$geometry': {'type': 'Polygon',
       'coordinates': [[[3.2409668, 52.2395743],
         [3.8781738, 51.1672889],
         [5.1443481, 51.9950282],
         [3.2409668, 52.2395743]]]}}}},
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'geometry.coord': '2dsphere'},
    'indexName': 'geometry.coord_2dsphere',
    'isMultiKey': False,
    'multiKeyPaths': {'geometry.coord': []},
    'isUnique': False,
    'isSparse': False,
    'isPartial': False,
    'indexVersion': 2,
    'direction': 'forward',
    'indexBounds': {'geometry.coord': ['[5116089176692

It took 3 milliseconds to return 131 results using an IXSCAN (index scan).

---
## More Queries

In [None]:
# Select all trackers by study name
# Parameters:
# - study_name

def select_Tracker_by_name(study_name):
    result = Tracker.objects(study_name__contains=study_name).to_json()
    return pd.read_json(result)


In [None]:
# All transmissions between a predefined DTG
# Parameters: 
# - Date time group 1
# - Date time group 2

def transmissions_between_dtg(dtg_1,dtg_2):
    result = Transmission.objects(Q(timestamp__gte=dtg_1) & 
                                  Q(timestamp__lte=dtg_2)).to_json()
    return pd.read_json(result)


In [None]:
# Select all transmissions in a predefined sphere
# Parameters (in call order):
# - lat
# - lon
# - radius (in radians)

def transmissions_in_sphere(lat,lon,radius):
    result = Transmission.objects(geometry__coord__geo_within_sphere=[(lon,lat),radius]).to_json()
    return pd.read_json(result)
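
The radius passed to `geo_within_sphere` is expressed in radians, not in kilometres. A distance can be converted by dividing it by the radius of the Earth. The sketch below uses 6378.1 km, an approximate equatorial radius; `km_to_radians` is a hypothetical helper.

```python
EARTH_RADIUS_KM = 6378.1  # approximate equatorial radius of the Earth


def km_to_radians(distance_km):
    """Convert a distance on the Earth's surface from kilometres to radians."""
    return distance_km / EARTH_RADIUS_KM


# A search radius of 50 km expressed in radians.
print(km_to_radians(50))

# Hypothetical usage with the function defined above:
# transmissions_in_sphere(lat, lon, km_to_radians(50))
```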

In [None]:
# Select all transmissions in a predefined polygon
# Use https://www.keene.edu/campus/maps/tool/ to find the desired polygon.
# Parameters:
# - point 1
# - point 2
# - point 3
# - point 4 (must equal point 1 to close the polygon)
def select_transmissions_in_polygone(p1,p2,p3,p4):
    result = Transmission.objects(geometry__coord__geo_within=[[p1,p2,p3,p4]]).to_json()
    return pd.read_json(result)
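
A GeoJSON polygon ring has to be closed: its last point must be identical to its first, as in the triangle used in the polygon queries earlier in this notebook. The helper below is a small sketch that closes a ring when needed, so polygons with any number of corners can be passed. `close_ring` is a hypothetical name.

```python
def close_ring(points):
    """Return the points as a closed ring, appending the first point if needed."""
    ring = [list(point) for point in points]
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three distinct points")
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return ring


# Three corners of the triangle used earlier (longitude first, then latitude).
corners = [(3.2409668, 52.2395743), (3.8781738, 51.1672889), (5.1443481, 51.9950282)]
print(close_ring(corners))

# Hypothetical usage:
# Transmission.objects(geometry__coord__geo_within=[close_ring(corners)])
```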

In [None]:
#Select all Transmissions in predefined box
# Use https://www.keene.edu/campus/maps/tool/ to find desired box.
# parameters:
# - <bottom left coordinates>
# - <upper right coordinates>

def select_transmissions_in_box(p1,p2):
    result = Transmission.objects(geometry__coord__geo_within_box=[p1,p2]).to_json()
    return pd.read_json(result)

---
## Load GeoJSON Data

This is implemented just to show how it's done.

---

In [None]:
import json # Used to read the GeoJSON file
from pymongo import MongoClient, GEOSPHERE # Used to connect to MongoDB and create the 2dsphere index

# Define the file which we want to load
inputfile = "../Data/Crane_GeoJSON/20181003_Dataset_SV_GPS_Crane_9381_STAW_Crane_RRW-BuGBk_Frida.json"

# Define the database in which the data will be loaded
to_database = 'GeoJSON_Database'

# Define the collection in which the data will be loaded
to_collection =  'Transmissions'

# Define the server to which we will connect
to_server = 'localhost'

# Define the port the server is running on
to_port = '27017'

# Create the MongoDB connection string
uri = 'mongodb://' + to_server + ':' + to_port +'/'

# Set user to False (if no user is needed)
# Set to the username if authentication is required
db_user = False

# If authentication is required, use the following code
if db_user:
  db_password = 'Your password'
  uri = 'mongodb://' + db_user + ':' + db_password + '@' + to_server + ':' + to_port +'/' + to_database


# Function for loading GeoJSON in MongoDB without a model
# Parameter 1 = Path of the GeoJSON file to insert
# Parameter 2 = Collection to insert to 
# Parameter 3 = Database to insert to 
# Parameter 4 = Server the database is running on
# Parameter 5 = Port the server is running on
# Parameter 6 = MongoDB connection string 

def load_geojson(inputfile, to_collection, to_database,
                 to_server, to_port, uri):

    # Read the GeoJSON file
    with open(inputfile,'r') as f:
        geojson = json.load(f)

    # Assign connection related values to variables
    client = MongoClient(uri)
    db = client[to_database]
    collection = db[to_collection]

    # Create a MongoDB index on the geometry feature
    # More info on indexing can be found in the cookbook:
    # "Data modeling in MongoDB"
    collection.create_index([("geometry", GEOSPHERE)])
    
    # Collect the features for the bulk insert
    features = []

    # For each item in the feature object of our GeoJSON
    for feature in geojson['features']:
        
        # Convert datetime to valid format if needed
        #timestamp = feature['properties']['timestamp']
        #feature['properties']['timestamp'] = datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%fZ')

        # Append the feature to the bulk insert
        features.append(feature)

    # Execute an unordered bulk insert
    # (initialize_unordered_bulk_op is no longer available in PyMongo 4)
    result = collection.insert_many(features, ordered=False)
    
    # Print when data is inserted
    print(str(len(result.inserted_ids)) + " features successfully inserted")
    

In [None]:
# Run the function
load_geojson(inputfile, to_collection, to_database, to_server, to_port, uri)
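
The loading function contains a commented-out timestamp conversion. If the timestamps in the GeoJSON properties are stored as strings, converting them to `datetime` objects before inserting lets MongoDB store them as real dates. The sketch below assumes the format used in that commented-out line (`%Y-%m-%dT%H:%M:%S.%fZ`); the actual format in the file may differ, and `parse_feature_timestamps` is a hypothetical helper.

```python
from datetime import datetime

# Format assumed from the commented-out conversion in the loading function.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_feature_timestamps(features, key="timestamp"):
    """Replace string timestamps in each feature's properties with datetimes."""
    for feature in features:
        properties = feature.get("properties") or {}
        value = properties.get(key)
        if isinstance(value, str):
            properties[key] = datetime.strptime(value, TIMESTAMP_FORMAT)
    return features


# Small hand-written feature for illustration, not taken from the dataset file.
example = [{
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.583908, 57.503796]},
    "properties": {"timestamp": "2013-07-21T03:06:32.000Z"},
}]
parse_feature_timestamps(example)
print(example[0]["properties"]["timestamp"])
```

Features are modified in place, so the function can be called on `geojson['features']` right before the bulk insert.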