<a href="https://colab.research.google.com/github/amkayhani/Big-Data-Data-Engineering/blob/main/2_01_key_value_stores.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Key-value Stores with TinyDB
This tutorial gives a basic introduction to working with key-value (KV) stores (or document DBs). We will be working with [TinyDB](https://tinydb.readthedocs.io/en/latest/index.html), a document-oriented database written in pure Python. It is particularly attractive here because, as the name suggests, it is small and lightweight: by default it stores data in a single JSON file, with in-memory storage available as an option.

We will begin with the relevant installs:

In [1]:
!pip install tinydb
!pip install faker
!pip install python-lorem

Collecting tinydb
  Downloading tinydb-4.8.0-py3-none-any.whl (24 kB)
Installing collected packages: tinydb
Successfully installed tinydb-4.8.0
Collecting faker
  Downloading Faker-25.3.0-py3-none-any.whl (1.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 16.2 MB/s eta 0:00:00
Installing collected packages: faker
Successfully installed faker-25.3.0
Collecting python-lorem
  Downloading python_lorem-1.3.0.post1-cp310-none-any.whl (9.1 kB)
Installing collected packages: python-lorem
Successfully installed python-lorem-1.3.0.post1


As you may infer from the packages installed, we will run something similar to one of our DuckDB examples: building a database from fake data generated by Faker.

In [2]:
import random
from faker import Faker
import pandas as pd
from lorem import paragraph
import itertools

fake = Faker()

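def get_person():
  """Build one fake person record as a dictionary."""
  person = {}
  # Random numeric ID; collisions are unlikely but not ruled out
  person['id'] = random.randrange(1000,9999999999999)
  person['first_name'] = fake.first_name()
  person['last_name'] = fake.last_name()
  # fake.unique ensures no email address is generated twice
  person['email'] = fake.unique.ascii_email()
  person['company'] = fake.company()
  person['phone'] = fake.phone_number()
  # Keep the first generated lorem ipsum paragraph, wrapped in a list
  person['review'] = list(itertools.islice(paragraph(count=1), 1))
  return person

# Generate 100 fake people
personlist = [get_person() for _ in range(100)]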

df = pd.DataFrame.from_dict(personlist)
df.head()

| | id | first_name | last_name | email | company | phone | review |
|---|---|---|---|---|---|---|---|
| 0 | 8769127434684 | John | Sanders | lcraig@gmail.com | Stone, Bennett and Foster | (649)565-8986x37555 | [Nisi exercitation dolore ullamco proident, la... |
| 1 | 8945484922479 | Stephanie | Cole | chelseajohnson@yahoo.com | Kirk Ltd | +1-370-836-1857x5271 | [Aliqua culpa quis amet adipiscing fugiat sint... |
| 2 | 4622531443155 | Megan | Payne | roblesfelicia@hotmail.com | Fitzpatrick PLC | (983)613-2242x3835 | [Commodo enim do amet aliquip est non. Reprehe... |
| 3 | 8097828854688 | Paul | Blair | brianjefferson@yahoo.com | Baker, Taylor and Whitehead | 001-723-448-7046x61600 | [Esse fugiat quis id velit consequat. Pariatur... |
| 4 | 103614249821 | Valerie | Coleman | beckerstephen@gmail.com | White, Richardson and Cook | (838)734-0789x89424 | [Non non ullamco eiusmod est. Eiusmod labore e... |


Everything here is the same as before, except that we have also added a text column (using lorem ipsum). As before, we have created the data as a Pandas dataframe, but like most KV stores, TinyDB expects each record as a dictionary, so we convert the dataframe to a list of dictionaries:

In [3]:
fake_data = df.to_dict(orient='records')
fake_data

[{'id': 8769127434684,
  'first_name': 'John',
  'last_name': 'Sanders',
  'email': 'lcraig@gmail.com',
  'company': 'Stone, Bennett and Foster',
  'phone': '(649)565-8986x37555',
  'review': ['Nisi exercitation dolore ullamco proident, laboris deserunt sit voluptate, adipiscing amet irure lorem anim laboris ullamco officia. Ea proident nisi incididunt veniam et amet, nostrud voluptate anim sed officia ullamco magna. Culpa voluptate deserunt sed. Consequat sed commodo culpa. Do proident pariatur ut qui quis ex. Sunt eu elit sed eiusmod incididunt eu ex, in sit commodo consequat. Nisi ipsum culpa qui do amet mollit. Exercitation sint sunt culpa. Occaecat anim deserunt enim, reprehenderit aute enim ad consectetur aute ex id. Voluptate voluptate elit aute sit esse id labore.']},
 {'id': 8945484922479,
  'first_name': 'Stephanie',
  'last_name': 'Cole',
  'email': 'chelseajohnson@yahoo.com',
  'company': 'Kirk Ltd',
  'phone': '+1-370-836-1857x5271',
  'review': ['Aliqua culpa quis amet ad
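To make the `orient='records'` conversion concrete, here is a minimal sketch in plain Python of the same idea: turning column-oriented data into one dictionary per row. The column values are made up for illustration, and this is only a conceptual stand-in, not how Pandas implements `to_dict`.

```python
# Column-oriented data, similar in shape to a small dataframe
# (all values here are made up for illustration)
columns = {
    "id": [1, 2],
    "first_name": ["Ada", "Grace"],
    "company": ["Analytical Engines Ltd", "Compilers Inc"],
}

# zip(*columns.values()) walks the columns row by row;
# pairing each row with the column names gives one dict per row
records = [
    dict(zip(columns.keys(), row))
    for row in zip(*columns.values())
]

print(records)
```

Each element of `records` is now a self-contained set of key-value pairs, which is the shape a document store works with.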

With this transform in place we can load the data into our database. Note that TinyDB persists the database as a JSON file (here `db.json`), so re-running the cell below appends the same records again:

In [4]:
from tinydb import TinyDB, Query

db = TinyDB('db.json')

for record in fake_data:
  db.insert(record)

We can check this has worked with a simple Python loop:

In [5]:
for item in db:
  print(item)

{'id': 8769127434684, 'first_name': 'John', 'last_name': 'Sanders', 'email': 'lcraig@gmail.com', 'company': 'Stone, Bennett and Foster', 'phone': '(649)565-8986x37555', 'review': ['Nisi exercitation dolore ullamco proident, laboris deserunt sit voluptate, adipiscing amet irure lorem anim laboris ullamco officia. Ea proident nisi incididunt veniam et amet, nostrud voluptate anim sed officia ullamco magna. Culpa voluptate deserunt sed. Consequat sed commodo culpa. Do proident pariatur ut qui quis ex. Sunt eu elit sed eiusmod incididunt eu ex, in sit commodo consequat. Nisi ipsum culpa qui do amet mollit. Exercitation sint sunt culpa. Occaecat anim deserunt enim, reprehenderit aute enim ad consectetur aute ex id. Voluptate voluptate elit aute sit esse id labore.']}
{'id': 8945484922479, 'first_name': 'Stephanie', 'last_name': 'Cole', 'email': 'chelseajohnson@yahoo.com', 'company': 'Kirk Ltd', 'phone': '+1-370-836-1857x5271', 'review': ['Aliqua culpa quis amet adipiscing fugiat sint laboru
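It can help to picture what a JSON-file-backed store is doing when we insert records. The sketch below is a toy written with only the standard library, assuming a layout in which a table name (`_default` here) maps string IDs to documents, which is the shape TinyDB's default JSON storage appears to use. The names `toy_insert` and `toy_db.json` are hypothetical, and this is not TinyDB's actual implementation.

```python
import json
import os
import tempfile

def toy_insert(path, record, table="_default"):
    """Append one record to a JSON file and return its new integer ID."""
    # Load the existing file, or start from an empty database
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    else:
        data = {}
    docs = data.setdefault(table, {})
    # JSON object keys are always strings, so IDs are stored as "1", "2", ...
    new_id = max((int(k) for k in docs), default=0) + 1
    docs[str(new_id)] = record
    # Rewrite the whole file with the new record included
    with open(path, "w") as f:
        json.dump(data, f)
    return new_id

# Use a temporary directory so the sketch leaves the working directory alone
path = os.path.join(tempfile.mkdtemp(), "toy_db.json")
first_id = toy_insert(path, {"first_name": "John", "last_name": "Sanders"})
second_id = toy_insert(path, {"first_name": "Stephanie"})

with open(path) as f:
    stored = json.load(f)
print(first_id, second_id)
print(stored)
```

The two records have different fields, yet both are stored without any schema declaration: the file is just nested key-value pairs.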

With our database set up, we can start to query our records. In TinyDB we do this by creating a `Query` object and comparing its fields against the values we are looking for:

In [6]:
User = Query() # query object

db.search(User.first_name == 'Chad') # returns [] if no record matches; adapt the name to your data

[]
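An expression such as `User.first_name == 'Chad'` may look odd at first, since it does not evaluate to `True` or `False`. One way such a query object can work is through operator overloading: the comparison builds a test function that is later applied to every record. The sketch below is a toy illustration of that idea in plain Python; `ToyQuery`, `ToyField`, and `toy_search` are hypothetical names, and TinyDB's real implementation is more elaborate.

```python
class ToyField:
    """A named field that builds predicates via operator overloading."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Return a test function rather than a bool;
        # records that lack the field never match
        return lambda doc: self.name in doc and doc[self.name] == value


class ToyQuery:
    def __getattr__(self, name):
        # Any attribute access, e.g. ToyQuery().first_name, yields a field
        return ToyField(name)


def toy_search(docs, predicate):
    """Return every document for which the predicate is true."""
    return [doc for doc in docs if predicate(doc)]


docs = [
    {"id": 1, "first_name": "John"},
    {"id": 2, "first_name": "Stephanie"},
    {"id": 3},  # no first_name field at all
]
User = ToyQuery()
matches = toy_search(docs, User.first_name == "Stephanie")
no_matches = toy_search(docs, User.first_name == "Chad")
print(matches)
print(no_matches)
```

As with the TinyDB query above, searching for a name that is not in the data simply yields an empty list rather than an error.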

We can also add new data in dictionary/JSON-like format:

In [7]:
db.insert({'id': 123, 'first_name': 'Amir', 'star_sign': 'Dog', 'review': 'I do not speak Latin.'})

101

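The value returned by `insert` is the ID TinyDB assigned to the new document (101 here, following the 100 generated records). We can retrieve the data as before: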

In [8]:
db.search(User.id == 123)

[{'id': 123,
  'first_name': 'Amir',
  'star_sign': 'Dog',
  'review': 'I do not speak Latin.'}]

One thing to note here is that our new record does not follow the schema implied by the original dataset, where every record used the same fields. Many of those fields are missing, there is a new field, 'star_sign', and 'review' is a plain string rather than a list.

This demonstrates the extra flexibility we get with a KV store over a relational model. We can also query our database to get all records that have a specific field:

In [9]:
db.search(User.star_sign.exists())

[{'id': 123,
  'first_name': 'Amir',
  'star_sign': 'Dog',
  'review': 'I do not speak Latin.'}]
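Because records no longer share a fixed set of fields, it is often useful to check which fields are actually present across a collection. The sketch below does this in plain Python on a small hand-written list of dictionaries (the values are made up). With TinyDB, the same counting could presumably be applied to the documents returned by `db.all()`, since they behave like dictionaries.

```python
from collections import Counter

# Three records with differing fields (made-up values for illustration)
docs = [
    {"id": 1, "first_name": "John", "email": "john@example.com"},
    {"id": 2, "first_name": "Stephanie", "email": "steph@example.com"},
    {"id": 123, "first_name": "Amir", "star_sign": "Dog"},
]

# Count how many documents contain each field name
field_counts = Counter(key for doc in docs for key in doc)

# Fields present in every document vs. fields that are optional in practice
common = sorted(k for k, n in field_counts.items() if n == len(docs))
optional = sorted(k for k, n in field_counts.items() if n < len(docs))
print(common)
print(optional)
```

A census like this is a lightweight substitute for the schema a relational database would enforce up front.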

This gives a basic introduction to KV (and document) stores. While there are many competing products, the common themes are the dictionary-like structure (key-value pairs) and the flexibility to accept any fields (keys).