# IoT Microdemos


## Indexing Strategy

A proper indexing strategy is key to querying data efficiently. Two indexes are created below. The first, on `device`, `min_ts` and `max_ts`, is mandatory for efficient time series queries on historical data. The second, on `device` and `cnt`, is needed for efficient retrieval of the current, i.e. open, bucket for each device. If all device types have the same bucket size, it can be created as a partial index, which keeps only the open buckets in the index. For varying bucket sizes, e.g. per device type, the type could be added to the index. For large implementations the savings from the partial index can be substantial, because closed buckets are not indexed at all.
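
The two variants of the open-bucket index can be sketched as a small helper that only builds the arguments for `create_index`. This is one possible reading of the text, not a definitive design: the field name `type` and the helper `open_bucket_index` are assumptions, and the non-partial variant simply adds the type as an extra key.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def open_bucket_index(bucket_size=None, type_field="type"):
    """Return (keys, options) for the open-bucket index.

    bucket_size given: all device types share one bucket size, so a
    partial index on (device, cnt) keeps only the open buckets.
    bucket_size None: bucket sizes vary, so the (hypothetical) type
    field is added to the key and no partial filter is applied.
    """
    if bucket_size is not None:
        keys = [("device", ASCENDING), ("cnt", ASCENDING)]
        options = {"partialFilterExpression": {"cnt": {"$lt": bucket_size}}}
    else:
        keys = [("device", ASCENDING), (type_field, ASCENDING), ("cnt", ASCENDING)]
        options = {}
    return keys, options


keys, options = open_bucket_index(bucket_size=3)
# With a live connection: collection.create_index(keys, **options)
```

Keeping the index definition in a pure function makes it easy to inspect without a running database.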

In [None]:
import pymongo
import os
import datetime
import bson
from bson.json_util import loads, dumps, RELAXED_JSON_OPTIONS
import random
from pprint import pprint

CONNECTIONSTRING = "localhost:27017"

# Establish Database Connection
client = pymongo.MongoClient(CONNECTIONSTRING)
db = client.iot
collection = db.iot_raw

In [None]:
# Efficient queries per device and timespan
result = collection.create_index([("device",pymongo.ASCENDING),
                         ("min_ts",pymongo.ASCENDING),
                         ("max_ts",pymongo.ASCENDING)])
print("Created Index: " + result)

# Efficient retrieval of open buckets per device
result = collection.create_index([("device",pymongo.ASCENDING),
                         ("cnt",pymongo.ASCENDING)],
                        partialFilterExpression={"cnt": {"$lt":3}})
print("Created Index: " + result)

### Index Usage During Ingestion

The execution trace below shows that the index on `device` and `cnt` is used: an exact match on `device` and a range scan on `cnt` over all values below 3:
```
'indexBounds': {
    'cnt': ['[-inf.0, 3)'], 
    'device': ['[4711, 4711]']
}
```

This is a very efficient operation, as there is usually only one open bucket per device. Typically a single key is examined in the index and exactly one document is returned:
```
'keysExamined': 1,
'nReturned': 1
```

In [None]:
result = db.command("explain", 
                    { 
                        "find": collection.name,
                        "filter":{
                            "device": 4711,
                            "cnt": { "$lt": 3 }
                        }
                    }, 
                    verbosity="executionStats"
                   )

pprint(result["executionStats"]["executionStages"])
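
The filter explained above is the one an ingestion upsert would use to find the open bucket. A minimal sketch of such an upsert, assuming the measurements are pushed into an array field named `m` with `ts` and `value` keys (these names and the helper `bucket_upsert` are assumptions, not taken from this notebook):

```python
import datetime

BUCKET_SIZE = 3  # matches the partial index filter {"cnt": {"$lt": 3}}


def bucket_upsert(device, ts, value, bucket_size=BUCKET_SIZE):
    """Build the filter and update documents for one measurement."""
    # Match the open bucket of the device, served by the (device, cnt) index
    filter_doc = {"device": device, "cnt": {"$lt": bucket_size}}
    update_doc = {
        "$push": {"m": {"ts": ts, "value": value}},  # "m" is an assumed field name
        "$min": {"min_ts": ts},   # keep the bucket's lower time bound
        "$max": {"max_ts": ts},   # keep the bucket's upper time bound
        "$inc": {"cnt": 1},       # the bucket closes once cnt reaches bucket_size
    }
    return filter_doc, update_doc


filter_doc, update_doc = bucket_upsert(
    4711, datetime.datetime(2020, 4, 17, 14, 35, 13), 21.5
)
# With a live connection:
# collection.update_one(filter_doc, update_doc, upsert=True)
```

With `upsert=True`, a new bucket is created when no open bucket exists for the device, so the filter either hits the single open bucket through the index or finds nothing.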

### Index Usage for Querying Data

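The same holds true for identifying and querying the buckets of interest: the index on `device`, `min_ts` and `max_ts` serves a filter that selects every bucket of a device whose time range overlaps the interval between a lower and an upper bound.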
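
The bucket selection is an interval overlap test: a bucket is of interest when it starts no later than the upper bound and ends no earlier than the lower bound. A small sketch of that predicate in plain Python, with made-up bucket boundaries for illustration (the helper name `overlaps` is an assumption):

```python
import datetime


def overlaps(min_ts, max_ts, lower_bound, upper_bound):
    """True if the bucket [min_ts, max_ts] intersects [lower_bound, upper_bound]."""
    return min_ts <= upper_bound and max_ts >= lower_bound


lower = datetime.datetime(2020, 4, 17, 14, 35, 13, 779000)
upper = datetime.datetime(2020, 4, 17, 14, 35, 18, 575000)


def at(second):
    """Illustrative timestamp on the same minute as the bounds."""
    return datetime.datetime(2020, 4, 17, 14, 35, second)


print(overlaps(at(10), at(14), lower, upper))  # bucket straddles the lower bound
print(overlaps(at(19), at(25), lower, upper))  # bucket starts after the upper bound
print(overlaps(at(0), at(13), lower, upper))   # bucket ends before the lower bound
```

The same two comparisons appear in the query filter as `min_ts: {"$lte": UPPER_BOUND}` and `max_ts: {"$gte": LOWER_BOUND}`.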

In [None]:
LOWER_BOUND = datetime.datetime(2020, 4, 17, 14, 35, 13, 779000) # Replace with lower bound (copy & paste from results above)
UPPER_BOUND = datetime.datetime(2020, 4, 17, 14, 35, 18, 575000) # Replace with upper bound (copy & paste from results above)

result = db.command("explain", 
                    { 
                        "find": collection.name,
                        "filter":{
                            "device": 4711,
                            "min_ts": { "$lte": UPPER_BOUND },
                            "max_ts": { "$gte": LOWER_BOUND }
                        }
                    }, 
                    verbosity="executionStats"
                   )

pprint(result["executionStats"]["executionStages"])
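
The stage tree printed by the explain calls can be long. A small helper can pull out just the index scans, assuming the classic plan layout in which each stage is a dict with a `stage` name and its children under `inputStage` or `inputStages`. The helper name `find_stages` and the sample tree are illustrative, not output from this notebook:

```python
def find_stages(stage, name="IXSCAN"):
    """Recursively collect all stages with the given name from a plan tree."""
    found = []
    if stage.get("stage") == name:
        found.append(stage)
    children = []
    if "inputStage" in stage:
        children.append(stage["inputStage"])
    children.extend(stage.get("inputStages", []))
    for child in children:
        found.extend(find_stages(child, name))
    return found


# Hand-written sample in the shape of an executionStages tree (not real output)
sample = {
    "stage": "FETCH",
    "nReturned": 1,
    "inputStage": {
        "stage": "IXSCAN",
        "indexName": "device_1_cnt_1",
        "keysExamined": 1,
        "nReturned": 1,
    },
}

for scan in find_stages(sample):
    print(scan["indexName"], scan["keysExamined"], scan["nReturned"])

# With a live connection:
# find_stages(result["executionStats"]["executionStages"])
```

Newer server versions may report plans in a different layout, in which case the traversal keys would need adjusting.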