# Biomedical Data Bases, 2020-2021
### NoSQL databases
These are notes by prof. Davide Salomoni (d.salomoni@unibo.it) for the Biomedical Data Base course at the University of Bologna, academic year 2020-2021.

### Install the redis module and try out the first commands

Remember that __you should have already started the Redis container__. Look up how to do it in the course slides or in the main README page of this GitHub repository.

In [1]:
! pip install redis

Collecting redis
  Using cached redis-4.1.0-py3-none-any.whl (171 kB)
Collecting deprecated>=1.2.3
  Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting wrapt<2,>=1.10
  Using cached wrapt-1.13.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (81 kB)
Installing collected packages: wrapt, deprecated, redis
Successfully installed deprecated-1.2.13 redis-4.1.0 wrapt-1.13.3


In [2]:
import redis
r = redis.Redis(host="my_redis")
print(r.ping())

True


In [3]:
r.set('temperature', 18.5)

True

Note that, by default, the Python Redis client returns values as _bytes_ (indicated by the _b_ prefix before the quoted value in the output below). Also note that Redis stores the number 18.5 as the string '18.5':

In [4]:
r.get('temperature')

b'18.5'

You can convert bytes to strings using _decode_. You could also connect to the Redis server using the parameter _decode_responses=True_ to have all output automatically converted to strings.

In [5]:
r.get('temperature').decode()

'18.5'
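If you keep the default bytes replies, nested results such as those of _hgetall()_, _smembers()_ or _zrange(..., withscores=True)_ contain bytes at every level. The helper below is a minimal sketch, not part of redis-py (the name `to_str` is hypothetical), that decodes them recursively, assuming UTF-8 data. Since Redis stores numbers as strings, you still need `float()` or `int()` to get a number back.

```python
def to_str(value, encoding="utf-8"):
    """Recursively decode bytes in a redis-py reply into str (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_str(k, encoding): to_str(v, encoding) for k, v in value.items()}
    if isinstance(value, list):
        return [to_str(v, encoding) for v in value]
    if isinstance(value, tuple):
        return tuple(to_str(v, encoding) for v in value)
    if isinstance(value, set):
        return {to_str(v, encoding) for v in value}
    return value  # numbers (e.g. zrange scores), None, etc. are returned unchanged


print(to_str(b'18.5'))                                   # '18.5' as a str
print(float(to_str(b'18.5')))                            # 18.5 as a float
print(to_str({b'buongiorno': b'buenos dias'}))
print(to_str([(b'MIT', 100.0), (b'Stanford', 98.4)]))
```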

### Time-To-Live applied to keys

In [6]:
import time
r.flushall() # delete ALL keys in the DB
TTL = 5
r.set('temperature', 18.5)
r.expire('temperature', TTL)  # the key will be deleted after TTL seconds
print("Temperature =", r.get('temperature'))
print("now sleeping for %s seconds..." % (TTL+1))
time.sleep(TTL+1)
print("Temperature =", r.get('temperature'))

Temperature = b'18.5'
now sleeping for 6 seconds...
Temperature = None
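To make the expiry semantics concrete, here is a toy in-memory model (plain Python, not Redis and not its actual implementation) that stores a deadline per key and checks it lazily when the key is read. Redis itself combines this lazy check with a periodic background sweep of expired keys. With redis-py you can also set the value and the TTL in one command with `r.set('temperature', 18.5, ex=TTL)`, and `r.ttl('temperature')` reports the remaining seconds. The class name `ExpiringStore` and its injectable clock are assumptions made so the example runs without sleeping.

```python
import time


class ExpiringStore:
    """Toy in-memory key-value store with per-key TTL (an illustration, not Redis itself)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}       # key -> value
        self._deadline = {}   # key -> absolute time at which the key expires
        self._clock = clock   # injectable clock, so the example can run without sleeping

    def _purge_if_expired(self, key):
        deadline = self._deadline.get(key)
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]
            del self._deadline[key]

    def set(self, key, value, ex=None):
        self._data[key] = value
        self._deadline.pop(key, None)  # like Redis SET, discard any previous TTL
        if ex is not None:
            self._deadline[key] = self._clock() + ex

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self._data:
            return False               # like EXPIRE on a missing key
        self._deadline[key] = self._clock() + seconds
        return True

    def get(self, key):
        self._purge_if_expired(key)    # lazy expiration: checked only when the key is read
        return self._data.get(key)


# usage with a fake clock instead of time.sleep()
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set('temperature', 18.5)
store.expire('temperature', 5)
print(store.get('temperature'))  # 18.5
now[0] = 6.0                     # pretend 6 seconds have passed
print(store.get('temperature'))  # None
```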


## Working with some Redis types

Let's see how to set and get some of the Redis data types:

In [11]:
r.flushall()  # delete ALL keys in the DB

# string or number
r.set('Temperature', 18.5)

# list, unpacking a Python list into separate arguments
males = ['Peter', 'Paul', 'John']
r.lpush('Male_names', *males)
# we can also create a Redis list by passing the strings to lpush as explicit arguments
r.lpush('Female_names', 'Sarah', 'Mary', 'Elizabeth')

# set, unpacking a Python set into separate arguments
chapters = {'Chapter 1', 'Chapter 2'}
r.sadd('Chapters1:2', *chapters)
# we can also create a Redis set by passing the strings to sadd as explicit arguments
r.sadd('Chapters3:4', 'Chapter 3', 'Chapter 4')

# hash (corresponding to a Python dictionary)
my_dict = {'buongiorno':'buenos dias', 'buonasera':'buenas noches'}
r.hset('Italian:Spanish', mapping=my_dict)

# get the different data types from Redis
print('STRING type in Redis') 
print('  Key: %s --> Value: %s' % ('Temperature', r.get('Temperature')))

print('LIST type in Redis')
print('  Key: %s --> Value: %s' % ('Male_names', r.lrange('Male_names', start=0, end=-1)))
print('  Key: %s --> Value: %s' % ('Female_names', r.lrange('Female_names', start=0, end=-1)))

print('SET type in Redis')
print('  Key: %s --> Value: %s' % ('Chapters1:2', r.smembers('Chapters1:2')))
print('  Key: %s --> Value: %s' % ('Chapters3:4', r.smembers('Chapters3:4')))

print('HASH type in Redis')
print('  Key: %s --> Value: %s' % ('Italian:Spanish', r.hgetall('Italian:Spanish')))

STRING type in Redis
  Key: Temperature --> Value: b'18.5'
LIST type in Redis
  Key: Male_names --> Value: [b'John', b'Paul', b'Peter']
  Key: Female_names --> Value: [b'Elizabeth', b'Mary', b'Sarah']
SET type in Redis
  Key: Chapters1:2 --> Value: {b'Chapter 2', b'Chapter 1'}
  Key: Chapters3:4 --> Value: {b'Chapter 4', b'Chapter 3'}
HASH type in Redis
  Key: Italian:Spanish --> Value: {b'buongiorno': b'buenos dias', b'buonasera': b'buenas noches'}
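The lists come back in reverse order because LPUSH inserts its arguments at the head of the list one after the other, so the last name pushed ends up first; RPUSH appends at the tail and keeps the original order (e.g. `r.rpush('Male_names', *males)`). Sets, on the other hand, have no order at all, so the order printed by _smembers()_ may differ between runs. The model below uses `collections.deque` in plain Python (the `lpush`/`rpush` functions are hypothetical stand-ins, not Redis calls) to show the difference.

```python
from collections import deque


def lpush(lst, *values):
    """Mimic LPUSH: insert each value at the head, in argument order."""
    for v in values:
        lst.appendleft(v)
    return len(lst)  # LPUSH also returns the new list length


def rpush(lst, *values):
    """Mimic RPUSH: append each value at the tail, in argument order."""
    for v in values:
        lst.append(v)
    return len(lst)


male_names = deque()
lpush(male_names, 'Peter', 'Paul', 'John')
print(list(male_names))    # ['John', 'Paul', 'Peter'], the same order lrange printed above

male_names_r = deque()
rpush(male_names_r, 'Peter', 'Paul', 'John')
print(list(male_names_r))  # ['Peter', 'Paul', 'John']
```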


### Sorted sets

In [12]:
r.flushall()  # delete ALL keys in the DB

True

In [13]:
# create a sorted set with key 'universities'
r.zadd('universities', {'MIT':100, 'Stanford':98.4, 'Harvard':97.9, 'Caltech':97, 'Oxford':96.7})

5

In [14]:
# print the sorted set in ascending order
r.zrange('universities', start=0, end=-1)

[b'Oxford', b'Caltech', b'Harvard', b'Stanford', b'MIT']

In [15]:
# print the sorted set in descending order
r.zrange('universities', start=0, end=-1, desc=True)

[b'MIT', b'Stanford', b'Harvard', b'Caltech', b'Oxford']

In [16]:
# print the sorted set in descending order, including the score associated with each element
r.zrange('universities', start=0, end=-1, desc=True, withscores=True)

[(b'MIT', 100.0),
 (b'Stanford', 98.4),
 (b'Harvard', 97.9),
 (b'Caltech', 97.0),
 (b'Oxford', 96.7)]

In [17]:
# print the universities with a score between 97 and 98
r.zrange('universities', start=97, end=98, withscores=True, byscore=True)

[(b'Caltech', 97.0), (b'Harvard', 97.9)]
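The score-range query can be modelled in plain Python: Redis orders a sorted set by score, breaking ties by comparing members lexicographically, and a BYSCORE range with plain numbers is inclusive at both ends. The sketch below (the function `zrange_byscore` is hypothetical) re-sorts on every call and then uses binary search; Redis instead keeps the set ordered at all times, so a range query costs O(log N + M) for M returned elements.

```python
import bisect


def zrange_byscore(members, min_score, max_score, withscores=False):
    """Inclusive score range over a {member: score} dict, mimicking ZRANGE ... BYSCORE (sketch)."""
    # order by score, ties broken by member name, as in a Redis sorted set
    ordered = sorted((score, member) for member, score in members.items())
    scores = [score for score, _ in ordered]
    lo = bisect.bisect_left(scores, min_score)    # first index with score >= min_score
    hi = bisect.bisect_right(scores, max_score)   # first index with score > max_score
    selected = ordered[lo:hi]
    if withscores:
        return [(member, score) for score, member in selected]
    return [member for _, member in selected]


universities = {'MIT': 100, 'Stanford': 98.4, 'Harvard': 97.9, 'Caltech': 97, 'Oxford': 96.7}
print(zrange_byscore(universities, 97, 98, withscores=True))  # [('Caltech', 97), ('Harvard', 97.9)]
```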

## How fast is it?
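### Performance measurements, using just _set()_ and _get()_

Here each _set()_ and _get()_ call is a separate request: the client sends one command and waits for the server's reply before sending the next, so every item costs a full network round trip.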

In [7]:
import time
r.flushall()  # delete ALL keys in the DB

start = time.time()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = "value%s" % i
    r.set(key, value)
delta = time.time() - start

print("set: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = r.get(key)
delta = time.time() - start

print("get: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

set: 20000 items in 7.51 seconds (2662.52 items/sec)
get: 20000 items in 7.44 seconds (2689.17 items/sec)
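These figures correspond to roughly 7.51 s / 20000 ≈ 0.38 ms per call, most of which is likely spent on the round trip between the notebook and the Redis container. For interval measurements, `time.perf_counter()` is generally preferable to `time.time()`, since it is monotonic and has higher resolution. A small context manager (the name `timed` is hypothetical, not from these notes) avoids repeating the timing boilerplate in every cell; with a server you would write `with timed("set", N): ...` around the `r.set()` loop. The usage below times a plain dict so it runs without Redis.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, n_items):
    """Time the enclosed block and print items/sec (hypothetical helper)."""
    stats = {}
    start = time.perf_counter()  # monotonic, high-resolution clock intended for benchmarks
    try:
        yield stats
    finally:
        delta = time.perf_counter() - start
        rate = n_items / delta if delta > 0 else float('inf')
        stats.update(seconds=delta, rate=rate)
        print("%s: %d items in %.02f seconds (%.02f items/sec)" % (label, n_items, delta, rate))


# usage on a plain dict, so it runs without a Redis server
d = {}
with timed("dict set", 20000) as stats:
    for i in range(20000):
        d["key%s" % i] = "value%s" % i
```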


### Performance measurements, using pipelines

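A marked improvement (more than an order of magnitude) can be had with _pipelines_: the client queues many commands, sends them to the server in one go and then reads all the replies, so the round-trip latency is paid once per pipeline rather than once per command.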

In [8]:
import time
r.flushall()  # delete ALL keys in the DB

start = time.time()
pipe = r.pipeline()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = "value%s" % i
    pipe.set(key, value)
pipe.execute()
delta = time.time() - start

print("set: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
pipe = r.pipeline()
N = 20000
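for i in range(N):
    key = "key%s" % i
    pipe.get(key)  # only queues the command; it returns the pipeline, not the value
values = pipe.execute()  # sends the queued commands and returns the list of replies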
delta = time.time() - start

print("get: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

set: 20000 items in 0.62 seconds (32508.64 items/sec)
get: 20000 items in 0.51 seconds (39208.92 items/sec)
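Two practical notes, hedged as suggestions rather than part of the original exercise. First, redis-py's `pipeline()` defaults to `transaction=True`, which wraps the queued commands in MULTI/EXEC; `r.pipeline(transaction=False)` sends a plain pipeline when atomicity is not needed. Second, a single pipeline keeps all queued commands and all replies in client memory, so for very large loads you may prefer to send them in batches. The helper `chunked` and the batch size of 1000 below are assumptions for illustration; the redis-py part is left as a comment because it needs a running server.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Sketch of the intended use with redis-py (needs a running server, so not executed here):
#     for batch in chunked(range(N), 1000):
#         pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC wrapper
#         for i in batch:
#             pipe.set("key%s" % i, "value%s" % i)
#         pipe.execute()

print([batch for batch in chunked(range(7), 3)])  # [[0, 1, 2], [3, 4, 5], [6]]
```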


### Performance measurements, comparison with SQLite

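Here we run the same set and get test with SQLite, which in this test is faster than the Redis pipeline above. Keep in mind that SQLite runs inside the Python process, with no network hop, that all inserts are committed in a single transaction, and that the read test fetches every row with one SELECT rather than issuing 20000 separate lookups.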

In [9]:
import sqlite3 as sql
conn = sql.connect('test_perf.sqlite')
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS Performance;''')
cur.execute('''CREATE TABLE Performance(
                key TEXT NOT NULL UNIQUE,
                value TEXT NOT NULL);
            ''')
conn.commit()
conn.close()

start = time.time()
conn = sql.connect('test_perf.sqlite')
cur = conn.cursor()
N = 20000
for i in range(N):
    key = "key%s" % i
    value = "value%s" % i
    cur.execute('''INSERT INTO Performance VALUES(?, ?)''', (key,value))
conn.commit()
conn.close()
delta = time.time() - start

print("SQLite set: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
conn = sql.connect('test_perf.sqlite')
cur = conn.cursor()
cur.execute('''SELECT * from Performance''')
results = cur.fetchall()
for res in results:
    (key, value) = res
conn.close()
delta = time.time() - start

print("SQLite get: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

SQLite set: 20000 items in 0.21 seconds (95316.21 items/sec)
SQLite get: 20000 items in 0.11 seconds (175126.78 items/sec)
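For a like-for-like comparison with the Redis _get()_ loop, each key can be looked up individually. The sketch below uses an in-memory database (an assumption, so that no file is left behind) and `executemany()` for the inserts; the UNIQUE constraint makes SQLite build an index on `key`, so each lookup is an index search. Timings depend on the machine, so no figures are claimed here.

```python
import sqlite3
import time

N = 20000
conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("CREATE TABLE Performance(key TEXT NOT NULL UNIQUE, value TEXT NOT NULL)")

start = time.perf_counter()
with conn:  # one transaction, committed when the block exits without errors
    conn.executemany("INSERT INTO Performance VALUES(?, ?)",
                     (("key%s" % i, "value%s" % i) for i in range(N)))
print("SQLite executemany: %d items in %.02f seconds" % (N, time.perf_counter() - start))

start = time.perf_counter()
values = []
for i in range(N):
    # one indexed query per key, mirroring the r.get(key) loop
    row = conn.execute("SELECT value FROM Performance WHERE key = ?", ("key%s" % i,)).fetchone()
    values.append(row[0])
print("SQLite keyed lookups: %d items in %.02f seconds" % (N, time.perf_counter() - start))
conn.close()
```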


### Performance measurements, using _mset()_ and _mget()_

However, with a smarter use of Redis built-in features, namely the _mset_ and _mget_ commands, which set or get many keys with a single command, Redis outperforms SQLite in this test.

In [10]:
import time
r.flushall()  # delete ALL keys in the DB

start = time.time()
N = 20000
my_dict = {"key%s" % i: "value%s" % i for i in range(N)}
r.mset(my_dict)
delta = time.time() - start

print("mset: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

start = time.time()
N = 20000
keys = ["key%s" % i for i in range(N)]
values = r.mget(keys)
results = list(zip(keys, values))
delta = time.time() - start

print("mget: %d items in %.02f seconds"% (N, delta), end=' ')
print("(%.02f items/sec)" % (N/delta))

mset: 20000 items in 0.11 seconds (177219.37 items/sec)
mget: 20000 items in 0.09 seconds (230719.31 items/sec)
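To put the numbers from this notebook side by side, the speedup of each variant over the plain _set()_/_get()_ loop can be computed directly from the printed throughputs. This only reuses the figures above, which come from a single run and will differ on other hardware; recall also that the SQLite read figure comes from one SELECT of all rows.

```python
# items/sec printed by the cells above: (write, read), from one run on one machine
rates = {
    "set()/get() loop": (2662.52, 2689.17),
    "pipeline": (32508.64, 39208.92),
    "SQLite": (95316.21, 175126.78),
    "mset()/mget()": (177219.37, 230719.31),
}
base_set, base_get = rates["set()/get() loop"]
speedup = {}
for name, (set_rate, get_rate) in rates.items():
    speedup[name] = (set_rate / base_set, get_rate / base_get)
    print("%-17s write x%5.1f   read x%5.1f" % (name, *speedup[name]))
```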


## Porting SQL to Redis

Let's map a simple relational DB to Redis. Assume we have the following table in a relational DB:

<img src="https://github.com/dsalomoni/bdb-2022/raw/main/nosql/RDBMS_sample.png" alt="A sample RDBMS table" style="height: 120px;"/>

Note the unique key, represented by the Id column.

In [18]:
# create several redis hashes, each one corresponding to an RDBMS row
r.hset('id:1', mapping={'first':'John', 'last':'Doe', 'age': 21, 'email':'john@doe.com'})
r.hset('id:2', mapping={'first':'Alice', 'last':'Doe', 'age': 22, 'email':'alice@doe.com'})
r.hset('id:3', mapping={'first':'Rose', 'last':'Short', 'age': 21, 'email':'rose@short.com'})

# the Redis equivalent to the SQL 'SELECT * FROM Students WHERE ID=1' would then be:
r.hgetall('id:1')

{b'first': b'John', b'last': b'Doe', b'age': b'21', b'email': b'john@doe.com'}
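Redis stores hash field values as strings, which is why `age` comes back as `b'21'` rather than the integer 21. When porting rows, a pair of small helpers (hypothetical names, not part of redis-py) can make the conversion explicit in both directions. redis-py would already turn 21 into '21' on write; converting up front just makes the stored form visible. Which fields must be restored as integers (here only `age`) is an assumption you would adapt to your own schema.

```python
def row_to_hash(row_id, row):
    """Build the hash key and a str-valued mapping for one table row (hypothetical helper)."""
    key = "id:%s" % row_id  # same key scheme as the cell above
    return key, {field: str(value) for field, value in row.items()}


def hash_to_row(reply, int_fields=("age",)):
    """Decode an hgetall() reply from bytes and restore the integer fields (assumed: only age)."""
    row = {field.decode(): value.decode() for field, value in reply.items()}
    for field in int_fields:
        if field in row:
            row[field] = int(row[field])
    return row


key, mapping = row_to_hash(1, {'first': 'John', 'last': 'Doe', 'age': 21, 'email': 'john@doe.com'})
print(key, mapping)
# with a server: r.hset(key, mapping=mapping)

reply = {b'first': b'John', b'last': b'Doe', b'age': b'21', b'email': b'john@doe.com'}
print(hash_to_row(reply))  # {'first': 'John', 'last': 'Doe', 'age': 21, 'email': 'john@doe.com'}
```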

In [19]:
# create a sorted set acting as an index on age: each member is one of the hash keys above, scored by that student's age
r.zadd('age', mapping={'id:1':21, 'id:2':22, 'id:3':21})
print("all elements:", r.zrange('age', start=0, end=-1, withscores=True))

all elements: [(b'id:1', 21.0), (b'id:3', 21.0), (b'id:2', 22.0)]


In [20]:
# the Redis equivalent to the SQL 'SELECT * FROM Students WHERE Age < 22' would then be
# (ages are integers, so the inclusive score range 0..21 selects exactly the ages below 22):
result = r.zrange('age', start=0, end=21, byscore=True)
print("age<22:", result)

age<22: [b'id:1', b'id:3']


In [21]:
# now get all the info for the returned results:
for res in result:
    print(r.hgetall(res))

{b'first': b'John', b'last': b'Doe', b'age': b'21', b'email': b'john@doe.com'}
{b'first': b'Rose', b'last': b'Short', b'age': b'21', b'email': b'rose@short.com'}
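To check the mapping against the relational side, the original SQL query can be run on an in-memory SQLite copy of the table. The column names below are assumptions, since the table is only shown as an image; the rows are the ones stored in the hashes above. The query should return the same two students, with Id 1 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# column names are assumed; the original table is only available as an image
conn.execute("CREATE TABLE Students(Id INTEGER PRIMARY KEY, First TEXT, Last TEXT, Age INTEGER, Email TEXT)")
conn.executemany("INSERT INTO Students VALUES(?, ?, ?, ?, ?)", [
    (1, 'John', 'Doe', 21, 'john@doe.com'),
    (2, 'Alice', 'Doe', 22, 'alice@doe.com'),
    (3, 'Rose', 'Short', 21, 'rose@short.com'),
])
rows = conn.execute("SELECT * FROM Students WHERE Age < 22 ORDER BY Id").fetchall()
for row in rows:
    print(row)
conn.close()
```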
