# NoSQL Tools: Lightning Memory-Mapped Database (LMDB)

  
  
LMDB is a fast, lightweight, and efficient memory-mapped file interface that provides key-value data storage. Please take a look at the [documentation](https://lmdb.readthedocs.io/en/release/#).

#### LMDB is the SQLite3 equivalent for NoSQL


## From Wikipedia

---
_Lightning Memory-Mapped Database_ (LMDB) is a software library that provides a high-performance embedded transactional database in the form of a key-value store. 
LMDB is written in C with API bindings for several programming languages. 
LMDB stores arbitrary **key/data pairs** as byte arrays, has a range-based search capability, supports multiple data items for a single key and has a special mode for appending records at the end of the database (MDB_APPEND) which gives a dramatic write performance increase over other similar stores.
<span style="background:yellow">LMDB is not a relational database</span>, it is strictly a key-value store like Berkeley DB and dbm.

LMDB may also be used concurrently in a multi-threaded or multi-processing environment, with read performance scaling linearly by design. 
LMDB databases may have only one writer at a time, however unlike many similar key-value databases, write transactions do not block readers, nor do readers block writers. 
LMDB is also unusual in that multiple applications on the same system may simultaneously open and use the same LMDB store, as a means to scale up performance. 
Also, LMDB does not require a transaction log (thereby increasing write performance by not needing to write data twice) because it maintains data integrity inherently by design.

#### Technical Details

Internally LMDB uses B+Tree data structures. 
The efficiency of its design and small footprint had the unintended side-effect of providing good write performance as well. LMDB has an API similar to Berkeley DB and dbm. 
LMDB treats the computer's memory as a single address space, shared across multiple processes or threads using shared memory with copy-on-write semantics (known historically as a single-level store). 
Because most earlier computing architectures had 32-bit memory address spaces, which impose a hard limit of 4 GB on the size of any database mapped this way, directly mapping a database into a single-level store was of limited practical use. 
However, today's 64-bit processors mostly implement 48-bit address spaces, giving access to 47-bit addresses or 128 terabytes of database size, which makes databases using shared memory useful once again in real-world applications.
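
The two size limits quoted above follow directly from the pointer widths. A quick check of the arithmetic, using binary units (GiB and TiB), which is the assumption behind the "4GB" and "128 terabytes" figures:

```python
# Address-space limits, computed from the number of usable address bits.
GIB = 2**30
TIB = 2**40

limit_32bit = 2**32   # bytes addressable with 32-bit addresses
limit_47bit = 2**47   # bytes addressable with 47-bit addresses

print(limit_32bit // GIB, "GiB")   # 4 GiB
print(limit_47bit // TIB, "TiB")   # 128 TiB
```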

Specific noteworthy technical features of LMDB are:

  * Its use of B+Tree. With an LMDB instance being in shared memory and the B+Tree block size being set to the OS page size, access to an LMDB store is extremely memory efficient[7]
  * New data is written without overwriting or moving existing data. This results in guaranteed data integrity and reliability without requiring transaction logs or cleanup services.
  * The provision of a unique append-write mode (MDB_APPEND)[1] which is implemented by allowing the new record to be added directly to the end of the B+Tree. This reduces the number of reads and write page operations, resulting in greatly-increased performance but requiring that the programmer is responsible for ensuring keys are already in sorted order when storing into the DB.
  * Copy-on-write semantics help ensure data integrity as well as providing transactional guarantees and simultaneous access by readers without requiring any locking, even by the current writer. New memory pages required internally during data modifications are allocated through copy-on-write semantics by the underlying OS: the LMDB library itself never actually modifies older data being accessed by readers because it simply cannot do so: any shared-memory updates automatically create a completely independent copy of the memory-page being written to.
  * As LMDB is memory-mapped, it can return direct pointers to memory addresses of keys and values through its API, thereby avoiding unnecessary and expensive copying of memory. This results in greatly-increased performance (especially when the values stored are extremely large), and expands the potential use cases for LMDB.
  * LMDB also tracks unused memory pages, using a B+Tree to keep track of pages freed (no longer needed) during transactions. By tracking unused pages the need for garbage-collection (and a garbage collection phase which would consume CPU cycles) is completely avoided. Transactions which need new pages are first given pages from this unused free pages tree; only after these are used up will it expand into formerly unused areas of the underlying memory-mapped file. On a modern filesystem with sparse file support this helps minimise actual disk usage.
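
The zero-copy point in the list above can be illustrated with Python's standard `mmap` module. This is only a sketch of memory-mapped reads in general, not of LMDB's internals; the file name and its contents are made up for the illustration.

```python
import mmap
import os
import tempfile

# Write a small file to map (a hypothetical stand-in for a database file).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"key1=value1;key2=value2;")

with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapped)
    # Slicing a memoryview does not copy; it refers to the mapped pages.
    value_view = view[5:11]
    # A copy is made only when we explicitly ask for one.
    value = bytes(value_view)
    # Release the views before closing the mapping.
    value_view.release()
    view.release()
    mapped.close()

print(value)
```

The operating system pages the file in on demand, so only the parts of the file that are actually touched need to be read from disk.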

---

![MISSING ComparisonGrid.png](../images/ComparisonGrid.png)

---

## Using LMDB from Python

LMDB is used by various software libraries for high-performance key-value storage.
One example is the [Caffe deep learning library](http://caffe.berkeleyvision.org/).

In this notebook, we are going to use LMDB and JSON to look at small-scale NoSQL concepts and trade-offs.
Keep in mind that LMDB does not require JSON-formatted data; it can also store binary data such as [pickled](https://docs.python.org/3/library/pickle.html) objects.
It can also store other string formats, such as comma- and tab-delimited rows.
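
As a rough illustration of those options, the sketch below serializes one record three ways. The record is a hypothetical, shortened version of the rows used later, and the column order for the tab-delimited form is an assumption that reader and writer would have to share.

```python
import json
import pickle

# Hypothetical, shortened record for illustration
record = {"id": 42940, "company": "Loot Crate", "state_s": "CA"}

# JSON: text, readable from most languages
as_json = json.dumps(record).encode("utf-8")

# pickle: binary and Python-specific; only unpickle data you trust
as_pickle = pickle.dumps(record)

# Tab-delimited: compact, but the column order must be agreed on separately
columns = ["id", "company", "state_s"]
as_tsv = "\t".join(str(record[c]) for c in columns).encode("utf-8")

# All three are bytes objects, which is the form a key-value store holds
for name, blob in (("json", as_json), ("pickle", as_pickle), ("tsv", as_tsv)):
    print(name, type(blob).__name__, len(blob))
```

Note that the tab-delimited form loses type information: the `id` comes back as a string unless the reader converts it.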

As you read through this notebook, please read the code comments carefully.


In [1]:
# Import the LMDB Python interface (interacts with the system LMDB library)
# See: https://lmdb.readthedocs.io/en/release/#
import lmdb

# Import the JSON Python library
import json



### Use the help command on the library to get more information!

In [2]:
help(lmdb)

Help on package lmdb:

NAME
    lmdb - cffi wrapper for OpenLDAP's "Lightning" MDB database.

DESCRIPTION
    Please see https://lmdb.readthedocs.io/

PACKAGE CONTENTS
    __main__
    _config
    cffi
    cpython
    tool

CLASSES
    builtins.Exception(builtins.BaseException)
        Error
            BadDbiError
            BadRslotError
            BadTxnError
            BadValsizeError
            CorruptedError
            CursorFullError
            DbsFullError
            DiskError
            IncompatibleError
            InvalidError
            InvalidParameterError
            KeyExistsError
            LockError
            MapFullError
            MapResizedError
            MemoryError
            NotFoundError
            PageFullError
            PageNotFoundError
            PanicError
            ReadersFullError
            ReadonlyError
            TlsFullError
            TxnFullError
            VersionMismatchError
    builtins.object
        builtins.Cursor
 

#### Now we will load some JSON data into an LMDB database

In [3]:
# Read in some sample JSON data
array_of_dictionary = []
filename = "/dsa/data/all_datasets/inc5000_2016.json"

# Reading data
with open(filename, 'r') as f:
    array_of_dictionary = json.load(f)
        


In [6]:
#print(array_of_dictionary)

In [4]:

print("Number of lines: {}".format(len(array_of_dictionary)))

Number of lines: 5002


In [7]:

# Dump the first row to understand the data
for key in array_of_dictionary[0]:
    value = array_of_dictionary[0][key]
    print("{} -> {}".format(key,value))

# Alternative
#for key, value in array_of_dictionary[0].items():
#    print("{} -> {}".format(key,value))


url -> loot-crate
workers -> 218
ifmid -> 2
rank -> 1
metro -> Los Angeles
state_l -> California
yrs_on_list -> 1
growth -> 66788.5962
company -> Loot Crate
ifiid -> 4
state_s -> CA
revenue -> 116247698
city -> Los Angeles
industry -> Consumer Products & Services
id -> 42940


#### Note: The records have an `id` field

We will use this field as the LMDB key!
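
One consequence of this choice is worth seeing before loading the data. The cell below stores each `id` as its decimal text encoded to bytes, and LMDB by default orders keys by comparing their bytes. The sketch uses only the standard library to show what that ordering looks like; the fixed-width packing is an alternative shown for contrast, not something this notebook uses.

```python
import struct

ids = [42940, 2680, 922, 10061]

# Decimal-text keys, the encoding used in this notebook
as_text = sorted(str(i).encode("utf-8") for i in ids)

# Fixed-width big-endian keys (an alternative, shown only for comparison)
as_packed = sorted(struct.pack(">I", i) for i in ids)

print(as_text)
# [b'10061', b'2680', b'42940', b'922']
print([struct.unpack(">I", k)[0] for k in as_packed])
# [922, 2680, 10061, 42940]
```

For pure key lookups the difference does not matter. It only becomes relevant if you iterate and expect numeric order, or use range searches.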

In [8]:
# Open the DB; by default lmdb.open() treats the path as a directory
# The returned object is the LMDB (env)ironment
env = lmdb.open('./test_nosql.db')

#############################
## SAVING KEYS FOR LATER !
#############################
all_keys = []

# Begin a transaction to write data
with env.begin(write=True) as txn:
    # For every row in the array | file, references as json_row
    for json_row in array_of_dictionary:
        
        # Pull the ID out 
        key = json_row['id']
        all_keys.append(key)
        # Debug print; comment this out to skip the per-key output
        print("About to store key: {}".format(key))
        
        # LMDB keys and values are bytes, so encode the key and the JSON text
        txn.put(str(key).encode('UTF8'), json.dumps(json_row).encode('UTF8'))

print(env.stat())
env.close()

About to store key: 42940
About to store key: 42941
About to store key: 42942
About to store key: 36643
About to store key: 36639
About to store key: 42943
About to store key: 42944
About to store key: 42945
About to store key: 42946
About to store key: 42947
About to store key: 42948
About to store key: 42949
About to store key: 42950
About to store key: 42951
About to store key: 37197
About to store key: 42952
About to store key: 42953
About to store key: 42954
About to store key: 42955
About to store key: 42956
About to store key: 36809
About to store key: 42957
About to store key: 42958
About to store key: 42959
About to store key: 37267
About to store key: 42960
About to store key: 44578
About to store key: 42961
About to store key: 42962
About to store key: 42963
About to store key: 42964
About to store key: 42965
About to store key: 42966
About to store key: 42967
About to store key: 42968
About to store key: 42969
About to store key: 42970
About to store key: 42971
...
(remaining output truncated)

In [9]:
import time

env = lmdb.open('./test_nosql.db')
# A lookup only reads, so a read-only transaction is sufficient
with env.begin(write=False) as txn:
    # Start the Timer
    start = time.perf_counter()
    json_row = txn.get("42940".encode('UTF8'))
    # Stop the Timer
    end = time.perf_counter()
    
    # How long did this look up take?
    print("Time to Look Up Key in LMDB:")
    print(end - start)
    print('------------------')
    print(json_row)
env.close()

Time to Look Up Key in LMDB:
0.02399722402333282
------------------
b'{"url": "loot-crate", "workers": 218, "ifmid": 2, "rank": 1, "metro": "Los Angeles", "state_l": "California", "yrs_on_list": 1, "growth": 66788.5962, "company": "Loot Crate", "ifiid": 4, "state_s": "CA", "revenue": 116247698, "city": "Los Angeles", "industry": "Consumer Products & Services", "id": 42940}'


### We have loaded data and pulled it back out!

### How efficient is this key-value store?

This cell will randomly select 1000 keys from the `all_keys` list and measure the total time to retrieve them.

In [10]:
import numpy
# See: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html
random_keys = numpy.random.choice(all_keys, size=1000, replace=False)

In [11]:
# Timer variable
timer = 0.0
# Open the DB 
env = lmdb.open('./test_nosql.db')

# Set up a transaction
with env.begin(write=False) as txn:
    
    # for each key in the random key list
    for key in random_keys:
        # Format the key
        keyStr = "{}".format(key)
        # Start the Timer
        start = time.perf_counter()
        # Fetch the row from the key-value store
        json_row = txn.get(keyStr.encode('UTF8'))
        # Stop the Timer
        end = time.perf_counter()
        
        # Add the time to the accumulator
        timer += (end-start)
        
# close up shop, and print
env.close()
print("Total Time for {} key queries was {}".format(len(random_keys),round(timer,6)))

Total Time for 1000 key queries was 0.030731


## Building a Point of Comparison with SQL

The code below uses Pandas to create a DB table, then indexes the `id` column.

How much faster is LMDB than SQLite3, the equivalent file-based SQL API?

In [14]:
# Pandas should be familiar from the boot camps
# See: http://pandas.pydata.org/
import pandas
# Pandas loves a list of dictionaries!!!
df = pandas.DataFrame(array_of_dictionary)
df.head().transpose()

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| city | Los Angeles | Somerville | Visalia | Evansville | Atlanta |
| company | Loot Crate | Paint Nite | CalCom Solar | eLuxurySupply.com | Company.com |
| growth | 66788.6 | 36555.2 | 31633.5 | 23619.7 | 23486.9 |
| id | 42940 | 42941 | 42942 | 36643 | 36639 |
| ifiid | 4 | 4 | 16 | 18 | 17 |
| ifmid | 2 | 6 | 172 | 641 | 29 |
| industry | Consumer Products & Services | Consumer Products & Services | Energy | Retail | Business Products & Services |
| metro | Los Angeles | Boston | Visalia-Porterville, CA | Evansville, IN-KY | Atlanta |
| rank | 1 | 2 | 3 | 4 | 5 |
| revenue | 116247698 | 55018793 | 33507450 | 30695215 | 33370967 |


In [15]:
import sqlite3
import os

# We will create a DB file, similar to what
# was done in the DB/SQL boot camp
sql_filename = './test_sql.db'

# If the file exists, remove it!
if os.path.exists(sql_filename):
    os.remove(sql_filename)


# Open the file and initialize 
# See: https://docs.python.org/3/library/sqlite3.html

with sqlite3.connect(sql_filename) as conn:
    
    # Let pandas create and fill the table through the sqlite3 connection
    df.to_sql('inc5k',conn)

    # Get cursor
    c = conn.cursor()

    # Create a (non-unique) index on the id column
    c.execute('CREATE INDEX id_idx on inc5k(id)')
    
    # commit changes
    conn.commit()

conn.close()


### What have we done:

![MISSING sqlite3_inc5k.png](../images/sqlite3_inc5k.png)

In [16]:
with sqlite3.connect(sql_filename) as conn:

    # Get cursor
    c = conn.cursor()

    # Start the Timer
    start = time.perf_counter()
    
    # The LMDB equivalent was: txn.get("42940".encode('UTF8'))
    # Here the same lookup is expressed as a SQL query
    c.execute('SELECT * FROM inc5k WHERE id=42940')
    row = c.fetchone()
    
    # Stop the Timer
    end = time.perf_counter()
    
    # How long did this look up take?
    print("Time to Look Up Key in SQL:")
    print(end - start)
    print('------------------')
    print(row)

Time to Look Up Key in SQL:
0.002359249017899856
------------------
(0, 'Los Angeles', 'Loot Crate', 66788.5962, 42940, 4, 2, 'Consumer Products & Services', 'Los Angeles', 1, 116247698, 'California', 'CA', 'loot-crate', 218, 1)


### What is the 1000x `id` look up time in SQL?

In [17]:
timer = 0.0
with sqlite3.connect(sql_filename) as conn:
    c = conn.cursor()
    for key in random_keys:
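        # Convert the numpy integer to a plain Python int before binding it
        key_id = int(key)
        start = time.perf_counter()
        # Look up the current random key (not a fixed id) with a parameterized query
        c.execute('SELECT * FROM inc5k WHERE id=?', (key_id,))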
        row = c.fetchone()
        # Stop the Timer
        end = time.perf_counter()
        timer += (end-start)

print("Total Time for {} key queries was {}".format(len(random_keys),round(timer,6)))

Total Time for 1000 key queries was 0.099873


### This should be about 4-8 times longer than the total for the 1000 lookups in the NoSQL example above.

In the run recorded here, the ratio is roughly 3 (0.099873 s for SQLite3 versus 0.030731 s for LMDB); the exact figure varies from run to run.

Some of the time difference comes down to a few concepts:   
   * Flexibility: SQL tables support very flexible and dynamic access, primarily because they have a dedicated language for access. 
   * Query Parsing: Every command must be parsed against the SQL grammar, its objects verified (table names, column names, etc.), and then planned before it is finally executed.
   * Functional Access: The NoSQL store is accessed through function calls, which eliminates the need for a grammar; the functional interface is itself a highly constrained grammar.
   

#### So, why use SQL?  Access patterns dictate storage!

For example, what if we want to count the number of entries per state abbreviation? 
Looking back at the table above, that is the `state_s` column.

A NoSQL key-value store supports only two access methods: 
  1. key lookup - random access by key
  1. iteration - sequential access

Neither answers this question directly. In SQL, it is simply:

```SQL
SELECT state_s, count(*) FROM inc5k GROUP BY state_s
```

Another example: find only the entries within the state of `'MO'`.

```SQL
SELECT * FROM inc5k WHERE state_s = 'MO'
```

We can accelerate the SQL database's ability to answer these questions by using indexes.
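
One way to see what an index changes is to ask SQLite for its query plan before and after creating it. The sketch below is self-contained and uses a tiny in-memory table with made-up rows rather than the `inc5k` data loaded above; only the table, column, and index names are borrowed from this notebook. The exact wording of the plan text differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

# Tiny in-memory stand-in for the inc5k table (rows are made up)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inc5k (id INTEGER, company TEXT, state_s TEXT)")
conn.executemany(
    "INSERT INTO inc5k VALUES (?, ?, ?)",
    [(1, "A Co", "MO"), (2, "B Co", "CA"), (3, "C Co", "MO")],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM inc5k WHERE state_s = 'MO'"

# The last column of each plan row is a human-readable description
plan_before = [row[3] for row in conn.execute(query)]

conn.execute("CREATE INDEX state_code_idx ON inc5k(state_s)")
plan_after = [row[3] for row in conn.execute(query)]

print("before:", plan_before)
print("after: ", plan_after)
conn.close()
```

Before the index exists the plan should describe a scan of the whole table; afterwards it should mention `state_code_idx`, meaning the matching rows are located through the index.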


In [18]:
with sqlite3.connect(sql_filename) as conn:
    
    # Get cursor
    c = conn.cursor()

    # Create a (non-unique) index on the state_s column
    c.execute('CREATE INDEX state_code_idx on inc5k(state_s)')
    
    # commit changes
    conn.commit()

conn.close()

In [21]:
timer = 0.0
with sqlite3.connect(sql_filename) as conn:
    
    # Get cursor
    c = conn.cursor()

    # Run the aggregation ... 100 times
    for i in range(0,100):
        
        # Start the timer
        start = time.perf_counter()
        
        # run an aggregation, then pull all the data
        c.execute('SELECT state_s, count(*) FROM inc5k GROUP BY state_s')
        rows = c.fetchall()
        
        # Compute the time for this iteration, accumulate
        end = time.perf_counter()
        timer += (end-start)

    print("Total Time for 100 aggregation queries was {}".format(round(timer,6)))
    
    for r in rows:
        print(r)

Total Time for 100 aggregation queries was 0.163767
('AK', 1)
('AL', 54)
('AR', 8)
('AZ', 108)
('CA', 667)
('CO', 126)
('CT', 40)
('DC', 51)
('DE', 15)
('FL', 338)
('GA', 232)
('HI', 6)
('IA', 24)
('ID', 13)
('IL', 217)
('IN', 72)
('KS', 38)
('KY', 30)
('LA', 41)
('MA', 149)
('MD', 119)
('ME', 8)
('MI', 118)
('MN', 93)
('MO', 72)
('MS', 11)
('MT', 7)
('NC', 130)
('ND', 9)
('NE', 24)
('NH', 22)
('NJ', 161)
('NM', 7)
('NV', 21)
('NY', 335)
('OH', 177)
('OK', 25)
('OR', 61)
('PA', 185)
('PR', 1)
('RI', 5)
('SC', 56)
('SD', 9)
('TN', 80)
('TX', 395)
('UT', 105)
('VA', 328)
('VT', 3)
('WA', 140)
('WI', 53)
('WV', 10)
('WY', 2)


In [20]:
######################
# NoSQL equivalent
######################
timer = 0.0
env = lmdb.open('./test_nosql.db')
with env.begin(write=False) as txn:
    
    # Get an access cursor
    cursor = txn.cursor()
    
    # Run the aggregation ... 100 times
    for i in range(0,100):
        
        # Create an empty dictionary
        states = {}
        # Start timer
        start = time.perf_counter()
        
        # for each record in the database
        for key,value in cursor.iternext(keys=True, values=True):

            # Get the record's state_s value
            state_str = json.loads(value.decode('UTF8'))['state_s']

            # Build aggregation counts
            # This should seem a standard pattern by now, 
            # based on the Python boot camp
            if state_str in states:
                states[state_str] += 1
            else:
                states[state_str] = 1
                
        # Stop timer and accumulate
        end = time.perf_counter()
        timer += (end-start)
                
env.close()

for state,count in states.items():
    print("({},{})".format(state,count))

print("Total Time for 100 aggregation queries was {}".format(round(timer,6)))


(UT,105)
(CA,667)
(NC,130)
(MN,93)
(NY,335)
(MO,72)
(ME,8)
(OK,25)
(MD,119)
(IA,24)
(KY,30)
(NJ,161)
(KS,38)
(MI,118)
(HI,6)
(TX,395)
(MS,11)
(NM,7)
(ID,13)
(CT,40)
(GA,232)
(DC,51)
(PA,185)
(SC,56)
(NV,21)
(AZ,108)
(AR,8)
(AK,1)
(MT,7)
(DE,15)
(PR,1)
(RI,5)
(WV,10)
(OR,61)
(FL,338)
(VA,328)
(IN,72)
(WI,53)
(NH,22)
(OH,177)
(LA,41)
(WY,2)
(ND,9)
(WA,140)
(IL,217)
(VT,3)
(CO,126)
(SD,9)
(NE,24)
(MA,149)
(TN,80)
(AL,54)
Total Time for 100 aggregation queries was 10.92779


<span style="background:yellow">WOW!!!</span>
That was slow!!!!
The processing, in Python, to generate the aggregation is _usually_ 60-70 times slower than the SQL query (about 67 times in the run recorded above)!
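
As an aside, the if/else counting pattern in the NoSQL cell can be written more compactly with `collections.Counter`. The sketch below uses a short list of made-up byte strings as a stand-in for the values an LMDB cursor would yield. It is an assumption, not something measured here, that this changes the timing much, since every value still has to be decoded from JSON.

```python
import json
from collections import Counter

# Made-up stand-in for the values yielded by an LMDB cursor
values = [
    b'{"id": 1, "state_s": "MO"}',
    b'{"id": 2, "state_s": "CA"}',
    b'{"id": 3, "state_s": "MO"}',
]

# Count records per state while decoding each stored JSON value
states = Counter(json.loads(v.decode("utf-8"))["state_s"] for v in values)

print(states.most_common())
# [('MO', 2), ('CA', 1)]
```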

### Maybe that was just aggregations, what about column-based filtering?

In [22]:
timer = 0.0
with sqlite3.connect(sql_filename) as conn:
    
    # Get cursor
    c = conn.cursor()

    # Run the filter ... 100 times
    for i in range(0,100):
        start = time.perf_counter()
        c.execute("SELECT * FROM inc5k WHERE state_s='MO'")
        rows = c.fetchall()
        end = time.perf_counter()
        timer += (end-start)

    print("Total Time for 100 filter queries was {}".format(round(timer,6)))
    
#   for r in rows:
#       print(r)

Total Time for 100 filter queries was 0.082275


In [23]:
timer = 0.0
env = lmdb.open('./test_nosql.db')
with env.begin(write=False) as txn:
    cursor = txn.cursor()
    
    # Run the filter ... 100 times
    for i in range(0,100):
        rows = []
        start = time.perf_counter()
        for key,value in cursor.iternext(keys=True, values=True):

            # Decode the stored JSON text back into a dictionary
            row = json.loads(value.decode('UTF8'))

            # Keep only the Missouri records (exact match, not a substring test)
            if row['state_s'] == 'MO':
                rows.append(row)

        end = time.perf_counter()
        timer += (end-start)
                
env.close()

#for r in rows:
#       print(r)

print("Total Time for 100 filter queries was {}".format(round(timer,6)))


Total Time for 100 filter queries was 10.825851


<span style="background:yellow">**WOW!!!**</span>
That was even slower relative to SQL!!!!
The processing, in Python, to generate the filtering is _approximately_ 80-120 times slower than the SQL query (roughly 130 times in the run recorded above)!

---

## Takeaway:  NoSQL is superior for key-based lookups, but not arbitrary row filtering.

### So, if the simplest of analytics are faster with SQL than NoSQL, then why are we discussing NoSQL?

Key-value data stores can be distributed easily, because their initial designs typically did not need to support aggregations and analytics. 

The over-arching goal was always key look-up performance and distributed processing.
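
To give a feel for why key lookups distribute so naturally, here is a minimal sketch of hash partitioning, one common way to spread keys across machines. It is not taken from LMDB or from any particular system: the function name `shard_for`, the shard count, and the in-memory dictionaries standing in for separate servers are all hypothetical.

```python
import hashlib

def shard_for(key: bytes, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of its bytes."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 4
# Each dictionary stands in for a key-value store on a separate machine
shards = {n: {} for n in range(NUM_SHARDS)}

for record_id in (42940, 42941, 42942, 36643, 36639):
    key = str(record_id).encode("utf-8")
    shards[shard_for(key, NUM_SHARDS)][key] = b"...value bytes..."

# A lookup needs only the key to know which single shard to ask
lookup_key = b"42940"
found = lookup_key in shards[shard_for(lookup_key, NUM_SHARDS)]
print(found)
```

An aggregation such as the per-state count, by contrast, would have to visit every shard and combine the partial results.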

The next portions of this module will examine larger scale, distributed architectures for data.

# SAVE YOUR NOTEBOOK