# Biomedical Data Bases, 2020-2021
### NoSQL databases
These are the notes by prof. Davide Salomoni (d.salomoni@unibo.it) for the Biomedical Data Base course at the University of Bologna, academic year 2020-2021.

### Install the redis module and try the first commands

Remember that __you should have already started the Redis container__. Look up how to do it in the slides or in the main README page of this GitHub repository.

In [1]:
! pip install redis



In [2]:
import redis
r = redis.Redis(host="my_redis")
print(r.ping())

True


In [3]:
r.set('temperature', 18.5)

True

Note that the Python `redis` client returns strings as _bytes_ (notice the _b_ prefix before the value in the output below):

In [4]:
r.get('temperature')

b'18.5'
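
Since the reply is a `bytes` object, it usually has to be decoded before it can be used as text or as a number. The following is a minimal sketch in plain Python: it hard-codes the reply shown above instead of querying a live server, and the variable names (`raw`, `text`, `temperature`) are only illustrative. As an alternative, redis-py accepts `decode_responses=True` in the `redis.Redis(...)` constructor, in which case replies come back as `str`.

```python
# The reply shown above, hard-coded so that the example runs without a Redis server
raw = b'18.5'

# bytes -> str (UTF-8 is assumed here)
text = raw.decode('utf-8')

# str -> float: Redis stored the number as a string, so we convert it back ourselves
temperature = float(text)

print(type(raw), type(text), type(temperature))
print(temperature + 1.0)
```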

In [5]:
import time
r.flushall() # delete ALL keys in the DB
TTL = 5
r.set('temperature', 18.5)
r.expire('temperature', TTL)  # the key will be deleted after TTL seconds
print("Temperature =", r.get('temperature'))
print("now sleeping for %s seconds..." % (TTL+1))
time.sleep(TTL+1)
print("Temperature =", r.get('temperature'))

Temperature = b'18.5'
now sleeping for 6 seconds...
Temperature = None


## How fast is it?
### Performance measurements, test #1

Simple set and get of string items in Redis.

In [6]:
import time
r.flushall()  # delete ALL keys in the DB

start = time.time()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = "value%s" % i
    r.set(key, value)
delta = time.time() - start

print("set: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = r.get(key)
delta = time.time() - start

print("get: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

set: 20000 items in 8.34 seconds (2398.40 items/sec)
get: 20000 items in 8.33 seconds (2400.17 items/sec)


### Performance measurements, test #2

A marked improvement (more than an order of magnitude) can be had with _pipelines_.

In [7]:
import time
r.flushall()  # delete ALL keys in the DB

start = time.time()
pipe = r.pipeline()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = "value%s" % i
    pipe.set(key, value)
pipe.execute()
delta = time.time() - start

print("set: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
pipe = r.pipeline()
N = 20000
for i in range(N):
    key = "key%s" % i
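    pipe.get(key)  # this only queues the command; no value is returned here
values = pipe.execute()  # the list of replies, one per queued command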
delta = time.time() - start

print("get: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

set: 20000 items in 0.80 seconds (24869.23 items/sec)
get: 20000 items in 0.73 seconds (27321.01 items/sec)
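
A plausible explanation for the gap between tests #1 and #2 is that, without a pipeline, every command waits for its reply before the next one is sent, so the loop pays one network round trip per item. The sketch below is only back-of-the-envelope arithmetic on the timings measured above; treating the whole per-command time of test #1 as round-trip overhead is a simplifying assumption.

```python
N = 20000
t_plain = 8.34      # seconds, "set" loop in test #1
t_pipeline = 0.80   # seconds, "set" loop in test #2

# average cost of one non-pipelined command, in milliseconds
per_command_ms = t_plain / N * 1000

# how many times faster the pipelined version was
speedup = t_plain / t_pipeline

print("average time per non-pipelined set: %.3f ms" % per_command_ms)
print("pipeline speedup for set: %.1fx" % speedup)
```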


### Performance measurements, test #3

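Here we try the same set and get with SQLite, which in this run performs better than the Redis pipeline above. Note that the read is done with a single `SELECT` over the whole table rather than with one lookup per key.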

In [8]:
import sqlite3 as sql
conn = sql.connect('test_perf.sqlite')
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS Performance;''')
cur.execute('''CREATE TABLE Performance(
                key TEXT NOT NULL UNIQUE,
                value TEXT NOT NULL);
            ''')
conn.commit()
conn.close()

start = time.time()
conn = sql.connect('test_perf.sqlite')
cur = conn.cursor()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = "value%s" % i
    cur.execute('''INSERT INTO Performance VALUES(?, ?)''', (key,value))
conn.commit()
conn.close()
delta = time.time() - start

print("SQLite set: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
conn = sql.connect('test_perf.sqlite')
cur = conn.cursor()
cur.execute('''SELECT * from Performance''')
results = cur.fetchall()
for res in results:
    (key, value) = res
conn.close()
delta = time.time() - start

print("SQLite get: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

SQLite set: 20000 items in 0.44 seconds (45905.51 items/sec)
SQLite get: 20000 items in 0.20 seconds (102557.37 items/sec)
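
The read in test #3 fetches the whole table at once, so it is not quite the same operation as the per-key `get` of tests #1 and #2. As a hedged sketch of a closer comparison, the code below looks up each key individually. It uses an in-memory database (an assumption made so that the example is self-contained; test #3 uses a file), and no timing is quoted here because it depends on the machine.

```python
import sqlite3
import time

N = 20000
conn = sqlite3.connect(':memory:')  # in-memory DB, unlike the file used in test #3
cur = conn.cursor()
cur.execute('CREATE TABLE Performance(key TEXT NOT NULL UNIQUE, value TEXT NOT NULL)')
cur.executemany('INSERT INTO Performance VALUES(?, ?)',
                (("key%s" % i, "value%s" % i) for i in range(N)))
conn.commit()

start = time.time()
for i in range(N):
    # one query per key; the UNIQUE constraint gives SQLite an index to use
    cur.execute('SELECT value FROM Performance WHERE key = ?', ("key%s" % i,))
    (value,) = cur.fetchone()
delta = time.time() - start
conn.close()

print("SQLite keyed get: %d items in %.02f seconds" % (N, delta))
```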


### Performance measurements, test #4

However, by using Redis built-in features wisely (the _mset_ and _mget_ commands, which set and get multiple values in a single call), Redis outperforms SQLite in this test.

In [9]:
import time
r.flushall()  # delete ALL keys in the DB

start = time.time()
N = 20000
my_dict = {"key%s" % i: "value%s" % i for i in range(N)}
r.mset(my_dict)
delta = time.time() - start

print("mset: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
N = 20000
keys = ["key%s" % i for i in range(N)]
values = r.mget(keys)
results = list(zip(keys, values))
delta = time.time() - start

print("mget: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

mset: 20000 items in 0.19 seconds (103120.53 items/sec)
mget: 20000 items in 0.16 seconds (124019.55 items/sec)
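
With much larger dictionaries, sending everything in one `mset` means one very large command. One possible approach, sketched here under that assumption, is to split the dictionary into smaller batches and call `mset` once per batch. The helper `chunked` and the batch size of 5000 are hypothetical choices for illustration, and the code only builds the batches, so it runs without a Redis server.

```python
from itertools import islice

def chunked(mapping, size):
    """Yield successive dicts holding at most `size` items of `mapping`."""
    it = iter(mapping.items())
    while True:
        batch = dict(islice(it, size))
        if not batch:
            return
        yield batch

my_dict = {"key%s" % i: "value%s" % i for i in range(20000)}
batches = list(chunked(my_dict, 5000))

print(len(batches), [len(b) for b in batches])
# each batch could then be sent with r.mset(batch)
```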


## Some Redis types

Let's see how to set and get some of the Redis data types:

In [10]:
# string
r.set('Temperature', 18.5)
# get the string value
get_string = r.get('Temperature')

# list
# Note that we can create a Redis list by unpacking a Python list, or by passing the strings explicitly
names = ['Peter', 'Paul', 'John']
r.delete('Names')
r.lpush('Names', *names)
# get the list members
get_list = r.lrange('Names', start=0, end=-1)

# set
r.delete('Chapters')
r.sadd('Chapters', 'Chapter 1', 'Chapter 2')
# same thing, but passing a Python set to sadd
r.delete('Chapters')
chapters = {'Chapter 1', 'Chapter 2'}
r.sadd('Chapters', *chapters)
# get the set members
get_set = r.smembers('Chapters')

# hash
my_dict = {'buongiorno':'buenos dias', 'buonasera':'buenas noches'}
r.delete('ITES')
r.hset('ITES', mapping=my_dict)
# get the hash members
get_hash = r.hgetall('ITES')

# print types and values
print("STRING type in Redis: %s" % r.type('Temperature')) 
print("  Value: %s" % get_string)
print("  Type in Python: %s" % type(get_string))

print("LIST type in Redis: %s" % r.type('Names'))
print("  Value: %s" % get_list)
print("  Type in Python: %s" % type(get_list))

print("SET type in Redis: %s" % r.type('Chapters'))
print("  Value: %s" % get_set)
print("  Type in Python: %s" % type(get_set))

print("HASH type in Redis: %s" % r.type('ITES'))
print("  Value: %s" % get_hash)
print("  Type in Python: %s" % type(get_hash))

STRING type in Redis: b'string'
  Value: b'18.5'
  Type in Python: <class 'bytes'>
LIST type in Redis: b'list'
  Value: [b'John', b'Paul', b'Peter']
  Type in Python: <class 'list'>
SET type in Redis: b'set'
  Value: {b'Chapter 1', b'Chapter 2'}
  Type in Python: <class 'set'>
HASH type in Redis: b'hash'
  Value: {b'buongiorno': b'buenos dias', b'buonasera': b'buenas noches'}
  Type in Python: <class 'dict'>
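
Notice that the list came back as `John, Paul, Peter`, the reverse of the order in which the names were pushed. This is because `LPUSH` inserts each element at the head of the list. The sketch below models that behaviour with a Python `deque` (a stand-in for the Redis list, not the Redis implementation) and compares it with inserting at the tail, which is what `RPUSH` does.

```python
from collections import deque

names = ['Peter', 'Paul', 'John']

# model of LPUSH: each new element goes to the head of the list
lpush_model = deque()
for name in names:
    lpush_model.appendleft(name)

# model of RPUSH: each new element goes to the tail of the list
rpush_model = deque()
for name in names:
    rpush_model.append(name)

print(list(lpush_model))
print(list(rpush_model))
```

To keep the original order in Redis, `r.rpush('Names', *names)` can be used instead of `r.lpush`.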


## Introduction to PubSub

See the dedicated _Generator_ and _Consumer_ notebooks for a more complete test of PubSub.

In [11]:
# a subscriber subscribes to the "bdb" channel...
a_subscriber = redis.Redis(host="my_redis")
sub = a_subscriber.pubsub()
sub.subscribe('bdb')

# ... and then gets messages over that channel
print("First get: ", sub.get_message())
print("Second get: ", sub.get_message())

First get:  {'type': 'subscribe', 'pattern': None, 'channel': b'bdb', 'data': 1}
Second get:  None


In [12]:
# a publisher publishes something on the "bdb" channel
a_publisher = redis.Redis(host="my_redis")
a_publisher.publish('bdb', 'pubsub test')

# the subscriber gets another message... this time it can read it
print("Third get: ", sub.get_message())

Third get:  {'type': 'message', 'pattern': None, 'channel': b'bdb', 'data': b'pubsub test'}
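
The three replies above show that `get_message()` can return a subscription confirmation, `None` when nothing is pending, or an actual message. A consumer therefore has to filter on the `type` field before using `data`. The sketch below does this on the three replies shown above, hard-coded so that it runs without a Redis server; the helper name `payload` is hypothetical.

```python
def payload(message):
    """Return the decoded text of a published message, or None for anything else."""
    if message is None or message['type'] != 'message':
        return None
    return message['data'].decode('utf-8')

# the three replies printed above, copied verbatim
replies = [
    {'type': 'subscribe', 'pattern': None, 'channel': b'bdb', 'data': 1},
    None,
    {'type': 'message', 'pattern': None, 'channel': b'bdb', 'data': b'pubsub test'},
]

for reply in replies:
    print(payload(reply))
```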


## Porting SQL to Redis

A simple example.

In [13]:
# an example of mapping a relational DB to Redis
r.hset('id:1', mapping={'first':'John', 'last':'Doe', 'age': 21, 'email':'john@doe.com'})
r.hset('id:2', mapping={'first':'Alice', 'last':'Doe', 'age': 22, 'email':'alice@doe.com'})
r.hset('id:3', mapping={'first':'Rose', 'last':'Short', 'age': 21, 'email':'rose@short.com'})

# the Redis equivalent to the SQL 'SELECT * FROM Students WHERE ID=1' :
r.hgetall('id:1')

{b'first': b'John', b'last': b'Doe', b'age': b'21', b'email': b'john@doe.com'}

In [14]:
# create a sorted set with ages
r.zadd('age', mapping={'id:1':21, 'id:2':22, 'id:3':21})
print("all elements:", r.zrange('age', start=0, end=-1, withscores=True))

# the Redis equivalent to the SQL 'SELECT * FROM Students WHERE Age < 22' :
result = r.zrangebyscore('age', min=0, max=21)
print("age<22:", result)

# now get all info for the returned results:
for res in result:
    print(r.hgetall(res))

all elements: [(b'id:1', 21.0), (b'id:3', 21.0), (b'id:2', 22.0)]
age<22: [b'id:1', b'id:3']
{b'first': b'John', b'last': b'Doe', b'age': b'21', b'email': b'john@doe.com'}
{b'first': b'Rose', b'last': b'Short', b'age': b'21', b'email': b'rose@short.com'}
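
For comparison, here is a sketch of the relational side of the same example, using an in-memory SQLite database. The `Students` table name comes from the SQL quoted in the comments above; the column names and types are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # lets us turn each row into a dict
cur = conn.cursor()

# hypothetical schema mirroring the fields of the Redis hashes
cur.execute('''CREATE TABLE Students(
                ID INTEGER PRIMARY KEY,
                First TEXT, Last TEXT, Age INTEGER, Email TEXT)''')
cur.executemany('INSERT INTO Students VALUES(?, ?, ?, ?, ?)', [
    (1, 'John', 'Doe', 21, 'john@doe.com'),
    (2, 'Alice', 'Doe', 22, 'alice@doe.com'),
    (3, 'Rose', 'Short', 21, 'rose@short.com'),
])
conn.commit()

# the query that the sorted set plus the hashes reproduce in Redis
cur.execute('SELECT * FROM Students WHERE Age < 22 ORDER BY ID')
young = [dict(row) for row in cur.fetchall()]
conn.close()

for row in young:
    print(row)
```

In SQL the filter and the row retrieval are a single statement, while in Redis the sorted set acts as a hand-made index on age and each matching hash is fetched separately.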
