<a href="https://colab.research.google.com/github/MJMortensonWarwick/data_engineering_for_data_scientists/blob/main/1_3_Key_Value_Stores.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1.3 Key-value Stores with TinyDB
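This tutorial gives a basic introduction to working with key-value (KV) stores (and the closely related document databases). We will be working with [TinyDB](https://tinydb.readthedocs.io/en/latest/index.html), a document-oriented database written in pure Python. By default it persists its data to a JSON file, though an in-memory storage option is also available. As the name suggests, it is small and lightweight, which makes it a good fit for a tutorial like this one.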

We will begin with the relevant installs:

In [1]:
!pip install tinydb
!pip install faker
!pip install python-lorem

Collecting tinydb
  Downloading tinydb-4.8.2-py3-none-any.whl.metadata (6.7 kB)
Downloading tinydb-4.8.2-py3-none-any.whl (24 kB)
Installing collected packages: tinydb
Successfully installed tinydb-4.8.2
Collecting faker
  Downloading faker-40.4.0-py3-none-any.whl.metadata (16 kB)
Downloading faker-40.4.0-py3-none-any.whl (2.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m37.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: faker
Successfully installed faker-40.4.0
Collecting python-lorem
  Downloading python_lorem-1.3.0.post3-cp312-none-any.whl.metadata (3.7 kB)
Downloading python_lorem-1.3.0.post3-cp312-none-any.whl (9.1 kB)
Installing collected packages: python-lorem
Successfully installed python-lorem-1.3.0.post3


We will build our database from fake data generated with Faker, plus some lorem ipsum text from python-lorem:

In [2]:
import random
from faker import Faker
import pandas as pd
from lorem import paragraph
import itertools

fake = Faker()

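def get_person():
  """Build one fake person record as a dictionary."""
  person = {}
  person['id'] = random.randrange(1000, 9999999999999)  # random integer ID (not guaranteed unique)
  person['first_name'] = fake.first_name()
  person['last_name'] = fake.last_name()
  person['email'] = fake.unique.ascii_email()  # .unique avoids repeating an email
  person['company'] = fake.company()
  person['phone'] = fake.phone_number()
  # paragraph() returns an iterator of lorem ipsum paragraphs; keep the first, wrapped in a list
  person['review'] = list(itertools.islice(paragraph(count=1), 1))
  return person

# Generate 100 fake people
personlist = []
for x in range(100):
  personlist.append(get_person())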

df = pd.DataFrame.from_dict(personlist)
df.head()

| | id | first_name | last_name | email | company | phone | review |
|---|---|---|---|---|---|---|---|
| 0 | 5871862414779 | James | Salas | kathrynmolina@yahoo.com | Townsend Ltd | 3269116366 | [Sit pariatur commodo esse ad ad nulla culpa. ... |
| 1 | 9050689350742 | Robin | Phillips | adamyork@gmail.com | Higgins-Johnson | +1-544-840-7872 | [Et duis minim culpa irure sed. Enim et est co... |
| 2 | 4336972193397 | Amy | Dyer | daniellemyers@hotmail.com | Smith, Allen and Taylor | (390)418-3352x6042 | [Excepteur minim cupidatat eu id consectetur a... |
| 3 | 5286625285948 | Pamela | Rivera | colleen11@mendez.com | Miller, Curry and Sullivan | 551-983-8488 | [Sunt laboris et eu ipsum irure laborum cillum... |
| 4 | 9759394151960 | Michael | Hayes | robert55@gmail.com | Golden Inc | 677.378.0573 | [Commodo sint dolor nisi. Adipiscing magna sit... |


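This gives us some fairly standard personal information plus an additional free-text column (`review`, filled with lorem ipsum). We have built this as a Pandas DataFrame, but TinyDB, like most KV and document stores, works with records as dictionaries, so we convert the DataFrame into a list of dictionaries with one dictionary per row: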

In [3]:
fake_data = df.to_dict(orient='records')
fake_data

[{'id': 5871862414779,
  'first_name': 'James',
  'last_name': 'Salas',
  'email': 'kathrynmolina@yahoo.com',
  'company': 'Townsend Ltd',
  'phone': '3269116366',
  'review': ['Sit pariatur commodo esse ad ad nulla culpa. Qui veniam ut lorem velit. Incididunt in occaecat cupidatat fugiat excepteur sint. Sint quis mollit nostrud ex. Ut dolor deserunt veniam aute pariatur aute. Aliqua qui lorem enim excepteur. In qui velit laboris tempor officia nulla.']},
 {'id': 9050689350742,
  'first_name': 'Robin',
  'last_name': 'Phillips',
  'email': 'adamyork@gmail.com',
  'company': 'Higgins-Johnson',
  'phone': '+1-544-840-7872',
  'review': ['Et duis minim culpa irure sed. Enim et est consectetur culpa proident incididunt. Pariatur nisi occaecat magna ad et ipsum proident, quis occaecat in do non ipsum duis. Cupidatat dolore elit eiusmod ex, aliqua minim aliqua reprehenderit. Nisi lorem ipsum enim non dolor ea velit. Est commodo nostrud occaecat cillum magna duis nisi.']},
 {'id': 43369721933
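To see what `orient='records'` does in isolation, here is a minimal sketch on a made-up two-row DataFrame (the names `small_df` and `rows` are just for illustration). Each row becomes one dictionary mapping column names to that row's values, which is the shape TinyDB's `insert` expects.

```python
import pandas as pd

# A made-up two-row DataFrame standing in for the Faker data
small_df = pd.DataFrame({'id': [1, 2], 'first_name': ['Ada', 'Alan']})

# orient='records' gives a list with one {column: value} dict per row
rows = small_df.to_dict(orient='records')
print(rows)
```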

With this transform in place we can load the data into our database. Note that the database is backed by a JSON file (`db.json`): TinyDB's default storage saves our documents to this file as JSON.

In [4]:
from tinydb import TinyDB, Query

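db = TinyDB('db.json')  # opens db.json, creating it if needed; existing documents are kept

for record in fake_data:
  db.insert(record)  # each dictionary becomes one document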

We can check that this has worked by looping over the database, which yields each stored document in turn:

In [5]:
for item in db:
  print(item)

{'id': 5871862414779, 'first_name': 'James', 'last_name': 'Salas', 'email': 'kathrynmolina@yahoo.com', 'company': 'Townsend Ltd', 'phone': '3269116366', 'review': ['Sit pariatur commodo esse ad ad nulla culpa. Qui veniam ut lorem velit. Incididunt in occaecat cupidatat fugiat excepteur sint. Sint quis mollit nostrud ex. Ut dolor deserunt veniam aute pariatur aute. Aliqua qui lorem enim excepteur. In qui velit laboris tempor officia nulla.']}
{'id': 9050689350742, 'first_name': 'Robin', 'last_name': 'Phillips', 'email': 'adamyork@gmail.com', 'company': 'Higgins-Johnson', 'phone': '+1-544-840-7872', 'review': ['Et duis minim culpa irure sed. Enim et est consectetur culpa proident incididunt. Pariatur nisi occaecat magna ad et ipsum proident, quis occaecat in do non ipsum duis. Cupidatat dolore elit eiusmod ex, aliqua minim aliqua reprehenderit. Nisi lorem ipsum enim non dolor ea velit. Est commodo nostrud occaecat cillum magna duis nisi.']}
{'id': 4336972193397, 'first_name': 'Amy', 'las

With our database set up, we can start to query our records. In TinyDB we do this by creating a `Query` object, whose attributes refer to fields in our documents:

In [6]:
User = Query() # query object

db.search(User.first_name == 'Chad') # Faker names are random, so swap in a name from your own data

[]

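We can also add new data in dictionary (JSON-like) format. `insert` returns the document ID that TinyDB assigns to the new record; here it is 101, because our 100 fake records already occupy IDs 1 to 100: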

In [7]:
db.insert({'id': 123, 'first_name': 'Amir', 'star_sign': 'Dog', 'review': 'I do not speak Latin.'})

101

And retrieve the data as before:

In [8]:
db.search(User.id == 123)

[{'id': 123,
  'first_name': 'Amir',
  'star_sign': 'Dog',
  'review': 'I do not speak Latin.'}]

One thing to note here is that our new record does not follow the schema we might infer from the original dataset, where every record used the same fields. Several of those fields (`last_name`, `email`, `company`, `phone`) are missing, `review` is a plain string rather than a list, and there is a new field, `star_sign`.

This demonstrates the extra flexibility we get with a KV store over a relational model, where adding a new column would first require altering the table's schema. We can also query our database for all records that have a specific field:

In [9]:
db.search(User.star_sign.exists())

[{'id': 123,
  'first_name': 'Amir',
  'star_sign': 'Dog',
  'review': 'I do not speak Latin.'}]

This gives a basic introduction to KV (and document) stores. While there are many competing products, the common themes are the dictionary-like structure (key-value pairs) and the flexibility to accept any fields (keys) in each record.
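To make those common themes concrete, here is a deliberately tiny, hypothetical key-value store built on a plain Python `dict`. The class `ToyKVStore` and its methods are illustrative names only, not any product's API. Keys map to schemaless documents, lookup by key is a single dictionary access, and searching by value means scanning every document, much as `db.search` did above.

```python
import json


class ToyKVStore:
    """Hypothetical minimal KV store: integer keys map to schemaless dict documents."""

    def __init__(self):
        self._data = {}       # key -> document
        self._next_key = 1    # auto-incrementing key, like TinyDB's doc_id

    def put(self, document):
        key = self._next_key
        self._data[key] = dict(document)  # store a copy; any fields are allowed
        self._next_key += 1
        return key

    def get(self, key):
        # Lookup by key is a single dict access; missing keys give None
        return self._data.get(key)

    def find(self, predicate):
        # Without an index, searching by value means scanning every document
        return [doc for doc in self._data.values() if predicate(doc)]

    def dumps(self):
        # JSON object keys must be strings, so integer keys are converted on export
        return json.dumps({str(k): v for k, v in self._data.items()})


store = ToyKVStore()
k1 = store.put({'first_name': 'James', 'last_name': 'Salas'})
k2 = store.put({'first_name': 'Amir', 'star_sign': 'Dog'})
print(store.get(k2))  # {'first_name': 'Amir', 'star_sign': 'Dog'}
print(store.find(lambda d: 'star_sign' in d))
print(store.dumps())
```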