
# Key-value Stores with TinyDB
This tutorial gives a basic introduction to working with key-value (KV) stores and their close relatives, document databases. We will be working with [TinyDB](https://tinydb.readthedocs.io/en/latest/index.html), a document-oriented database written in pure Python. As the name suggests, it is small and lightweight, which makes it well suited to a tutorial: it needs no server and, by default, keeps all of its data in a single JSON file.

We will begin with the relevant installs:

In [1]:
!pip install tinydb
!pip install faker
!pip install python-lorem

Collecting tinydb
  Downloading tinydb-4.8.2-py3-none-any.whl.metadata (6.7 kB)
Downloading tinydb-4.8.2-py3-none-any.whl (24 kB)
Installing collected packages: tinydb
Successfully installed tinydb-4.8.2
Collecting faker
  Downloading faker-40.5.1-py3-none-any.whl.metadata (16 kB)
Downloading faker-40.5.1-py3-none-any.whl (2.0 MB)
Installing collected packages: faker
Successfully installed faker-40.5.1
Collecting python-lorem
  Downloading python_lorem-1.3.0.post3-cp312-none-any.whl.metadata (3.7 kB)
Downloading python_lorem-1.3.0.post3-cp312-none-any.whl (9.1 kB)
Installing collected packages: python-lorem
Successfully installed python-lorem-1.3.0.post3


We will be building our database using fake data generated by Faker:

In [2]:
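import random
from faker import Faker
import pandas as pd
from lorem import paragraph
import itertools

fake = Faker()

def get_person():
  """Build one fake person record as a flat dictionary."""
  person = {}
  # random numeric id: not guaranteed unique, but a collision among 100 draws is very unlikely
  person['id'] = random.randrange(1000, 9999999999999)
  person['first_name'] = fake.first_name()
  person['last_name'] = fake.last_name()
  # fake.unique prevents the same email from being generated twice
  person['email'] = fake.unique.ascii_email()
  person['company'] = fake.company()
  person['phone'] = fake.phone_number()
  # paragraph() yields lorem ipsum paragraphs; islice keeps the first, stored as a one-element list
  person['review'] = list(itertools.islice(paragraph(count=1), 1))
  return person

# generate 100 fake people and view them as a DataFrame
personlist = [get_person() for _ in range(100)]

df = pd.DataFrame(personlist)
df.head()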

|   | id | first_name | last_name | email | company | phone | review |
|---|---|---|---|---|---|---|---|
| 0 | 7024631728926 | Kristin | Whitney | andrew25@thompson.com | Hopkins-Delgado | 001-286-956-4613x25474 | [Quis sed sint dolor. Fugiat eu aliquip esse. ... |
| 1 | 3647338498088 | Pamela | Koch | ckelley@hotmail.com | Ward-Miller | 441.974.5329 | [Sit nulla nisi tempor pariatur. Nisi occaecat... |
| 2 | 64997019465 | Katrina | Rush | ubrown@little-taylor.com | Sanchez PLC | 857-436-1973 | [Aute ullamco proident irure deserunt. Ea sit ... |
| 3 | 9337733896491 | Oscar | Wang | edward83@rivers-collins.com | Trujillo-Wagner | 942-872-9311x900 | [Anim exercitation irure ut adipiscing anim mo... |
| 4 | 7508733118717 | Kayla | Watts | apowell@hotmail.com | Williams-Alvarado | 001-771-310-2764x510 | [Reprehenderit est sint dolor. Enim excepteur ... |


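This gives us some fairly standard personal information plus an additional free-text column (`review`, filled with lorem ipsum). We created it as a Pandas DataFrame, but like most document stores, TinyDB stores each record as a dictionary (it rejects anything that is not a mapping), so we convert the DataFrame into a list of dictionaries, one per row: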

In [3]:
fake_data = df.to_dict(orient='records')
fake_data

[{'id': 7024631728926,
  'first_name': 'Kristin',
  'last_name': 'Whitney',
  'email': 'andrew25@thompson.com',
  'company': 'Hopkins-Delgado',
  'phone': '001-286-956-4613x25474',
  'review': ['Quis sed sint dolor. Fugiat eu aliquip esse. Est occaecat amet consequat irure. Est duis laborum nisi, minim aliqua ad labore id consectetur. Proident ipsum reprehenderit et qui ullamco magna veniam. Consequat nostrud occaecat anim non eu. Voluptate incididunt occaecat eu. Duis culpa excepteur in reprehenderit excepteur laborum. Consequat tempor enim ex laboris sint.']},
 {'id': 3647338498088,
  'first_name': 'Pamela',
  'last_name': 'Koch',
  'email': 'ckelley@hotmail.com',
  'company': 'Ward-Miller',
  'phone': '441.974.5329',
  'review': ['Sit nulla nisi tempor pariatur. Nisi occaecat et lorem laborum reprehenderit. Eu cillum elit consectetur ad, minim sed ea laboris id, irure non mollit commodo cillum. Eiusmod consequat mollit ad eu ipsum enim, elit est enim dolore aliquip aute ex. Ex exerc
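The `orient='records'` argument is what produces one dictionary per row. A small sketch on a two-row toy DataFrame (`small_df` is purely illustrative) contrasting it with the default `orient='dict'`, which groups values by column and so would not give TinyDB one document per person:

```python
import pandas as pd

# Two-row toy DataFrame with the default 0..n-1 index
small_df = pd.DataFrame([
    {'id': 1, 'first_name': 'Kristin'},
    {'id': 2, 'first_name': 'Pamela'},
])

# One dict per row: the shape TinyDB expects for each document
records = small_df.to_dict(orient='records')
print(records)  # [{'id': 1, 'first_name': 'Kristin'}, {'id': 2, 'first_name': 'Pamela'}]

# Default orient='dict' nests values as {column: {index: value}}, i.e. by column
by_column = small_df.to_dict()
```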

With the data in this form, we can load it into our database. Note that the database is backed by a JSON file (here `db.json`), which is TinyDB's default storage:

In [4]:
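from tinydb import TinyDB, Query

# TinyDB's default storage is a JSON file; db.json is created if it does not exist
db = TinyDB('db.json')

# empty the table first so re-running this cell does not duplicate the records
db.truncate()

# insert every dict in the list as a separate document (returns their document IDs)
doc_ids = db.insert_multiple(fake_data)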

We can check this has worked with a simple Python loop:

In [5]:
for item in db:
  print(item)

{'id': 7024631728926, 'first_name': 'Kristin', 'last_name': 'Whitney', 'email': 'andrew25@thompson.com', 'company': 'Hopkins-Delgado', 'phone': '001-286-956-4613x25474', 'review': ['Quis sed sint dolor. Fugiat eu aliquip esse. Est occaecat amet consequat irure. Est duis laborum nisi, minim aliqua ad labore id consectetur. Proident ipsum reprehenderit et qui ullamco magna veniam. Consequat nostrud occaecat anim non eu. Voluptate incididunt occaecat eu. Duis culpa excepteur in reprehenderit excepteur laborum. Consequat tempor enim ex laboris sint.']}
{'id': 3647338498088, 'first_name': 'Pamela', 'last_name': 'Koch', 'email': 'ckelley@hotmail.com', 'company': 'Ward-Miller', 'phone': '441.974.5329', 'review': ['Sit nulla nisi tempor pariatur. Nisi occaecat et lorem laborum reprehenderit. Eu cillum elit consectetur ad, minim sed ea laboris id, irure non mollit commodo cillum. Eiusmod consequat mollit ad eu ipsum enim, elit est enim dolore aliquip aute ex. Ex exercitation reprehenderit lorem

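With our database set up, we can start querying our records. In TinyDB we do this by creating a `Query` object, whose attributes refer to document fields; comparing an attribute to a value builds a condition, and `db.search()` returns a list of every document that satisfies it: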

In [7]:
User = Query() # query object

db.search(User.first_name == 'Chad') # adapt based on your data

[]

We can also add new data in dictionary/JSON-like format:

In [8]:
db.insert({'id': 123, 'first_name': 'Amir', 'star_sign': 'Dog', 'review': 'I do not speak Latin.'})

101

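The returned value, `101`, is the document ID that TinyDB assigned to the new record (the 100 generated people received IDs 1 to 100); it is separate from the `id` field we stored ourselves. We can retrieve the record with a query as before: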

In [9]:
db.search(User.id == 123)

[{'id': 123,
  'first_name': 'Amir',
  'star_sign': 'Dog',
  'review': 'I do not speak Latin.'}]

Note that our new record does not follow the schema we might infer from the original dataset, in which every record had the same seven fields. Here `last_name`, `email`, `company` and `phone` are missing, `review` is a plain string rather than a list, and there is a new field, `star_sign`.

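This illustrates the flexibility a KV (or document) store offers over the relational model: a relational table has a fixed set of columns, so adding `star_sign` would mean changing the table's schema, and the missing fields would have to be stored as NULLs. We can also query our database for all records that contain a specific field: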

In [10]:
db.search(User.star_sign.exists())

[{'id': 123,
  'first_name': 'Amir',
  'star_sign': 'Dog',
  'review': 'I do not speak Latin.'}]

This has been a basic introduction to KV (and document) stores. While there are many competing products, the common themes are a dictionary-like structure (key-value pairs) and the flexibility to accept any fields (keys).
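Those two themes fit in a few lines of plain Python. The class below is a hypothetical toy, not how TinyDB or any production store is implemented: documents are schemaless dictionaries stored under integer keys in an ordinary `dict`, so a lookup by key is a single dictionary access, while a field-based search has to scan every document.

```python
import json
from typing import Any, Callable, Dict, List, Optional

DocDict = Dict[str, Any]


class ToyDocStore:
    """Hypothetical minimal document store: integer keys map to schemaless dicts."""

    def __init__(self) -> None:
        self._docs: Dict[int, DocDict] = {}
        self._next_id = 1

    def insert(self, doc: DocDict) -> int:
        """Store a copy of doc under a new key and return that key."""
        key = self._next_id
        self._docs[key] = dict(doc)  # any set of fields is accepted; no schema check
        self._next_id += 1
        return key

    def get(self, key: int) -> Optional[DocDict]:
        """Key-value access: a single dictionary lookup (None if the key is absent)."""
        return self._docs.get(key)

    def search(self, predicate: Callable[[DocDict], bool]) -> List[DocDict]:
        """Field-based query: scans every stored document."""
        return [doc for doc in self._docs.values() if predicate(doc)]

    def to_json(self) -> str:
        """Serialise to JSON; keys become strings because JSON object keys must be strings."""
        return json.dumps({str(k): v for k, v in self._docs.items()})


store = ToyDocStore()
k1 = store.insert({'first_name': 'Kristin', 'email': 'andrew25@thompson.com'})
k2 = store.insert({'first_name': 'Amir', 'star_sign': 'Dog'})  # different fields, no error

print(store.get(k2))                             # {'first_name': 'Amir', 'star_sign': 'Dog'}
print(store.search(lambda d: 'star_sign' in d))  # [{'first_name': 'Amir', 'star_sign': 'Dog'}]
```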