# flash-flood User Vignette

[flash-flood](https://github.com/HumanCellAtlas/flash-flood) is an event recorder and streamer built on top of AWS S3, supporting distributed writes and fast distributed bulk reads. It can be used to store and retrieve information about transactions and events in JSON format, which can be quickly filtered with JMESPath. In this notebook we demonstrate basic usage of the flash-flood library.

Let's get started by creating a boto3 S3 resource and importing the FlashFlood class:

In [1]:
import boto3
s3 = boto3.resource('s3')

from flashflood import FlashFlood

flash-flood writes events to, and reads events from, journals stored in an S3 bucket. To instantiate the FlashFlood class, provide the S3 resource, the name of a bucket you have read/write access to, and a prefix:

In [2]:
ff = FlashFlood(s3, "my-flashflood-test-bucket", "my_prefix")

A flash-flood event consists of event data, a unique event identifier, and a timestamp. First, prepare these three values:

In [3]:
import datetime
import uuid

event_data = b'my event data'
event_uuid = str(uuid.uuid4())
event_date = datetime.datetime.now()
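
The journal names in this notebook's output end in `Z`, which suggests (but does not confirm) that timestamps are treated as UTC, whereas `datetime.datetime.now()` returns naive local time. The sketch below packages a JSON payload together with a naive UTC timestamp. `make_event` is a hypothetical helper that is not part of flash-flood, and whether the library expects UTC is an assumption.

```python
import datetime
import json
import uuid


def make_event(payload):
    """Hypothetical helper (not part of flash-flood): build (data, event_id, date)."""
    # Serialize the payload to bytes, as in the notebook's later JSON example.
    data = json.dumps(payload).encode("utf-8")
    # A random UUID serves as the unique event identifier.
    event_id = str(uuid.uuid4())
    # Assumption: a naive datetime expressed in UTC is what the journal names reflect.
    date = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return data, event_id, date


data, event_id, date = make_event({"foo": 1})
print(data, event_id, date)
```

The three returned values have the same types as the `event_data`, `event_uuid`, and `event_date` variables prepared in the cell above.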

flash-flood exposes a CRUD API to access event information:

In [4]:
# Create
ff.put(event_data, event_uuid, event_date)

# Read
event = ff.get_event(event_uuid)
print("This is the data: " + str(event.data))
print("This is the date: " + str(event.date))
print("This is the event ID: " + event.event_id)

# Update
new_event_data = b'i want to put new data'
ff.update_event(event_uuid, new_event_data)
print("This is the updated data: " + str(ff.get_event(event_uuid).data))

# Delete
ff.delete_event(event_uuid)

Uploaded journal 2020-03-23T165426.235201Z--2020-03-23T165426.235201Z--new--bb66c684-5592-48ce-a4bf-518331754812
new journal 2020-03-23T165426.235201Z--2020-03-23T165426.235201Z--new--bb66c684-5592-48ce-a4bf-518331754812
This is the data: b'my event data'
This is the date: 2020-03-23 16:54:26.235201
This is the event ID: 6438b1e0-4cde-4b39-81f3-098e30bc3ef3
This is the updated data: b'my event data'


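All events belong to a journal. As the output above shows, `put` uploads a new journal for the event automatically. Journals can also be created manually with `ff.journal`, which, judging by its log output, combines existing journals: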

In [5]:
ff.journal(minimum_number_of_events=1)

Found journal to combine 2020-03-23T165426.235201Z--2020-03-23T165426.235201Z--new--bb66c684-5592-48ce-a4bf-518331754812
combining journal 2020-03-23T165426.235201Z--2020-03-23T165426.235201Z--new--bb66c684-5592-48ce-a4bf-518331754812


Journals can also be listed with the `ff.list_journals()` method.
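
The journal names printed in this notebook appear to consist of four fields joined by `--`: two timestamps, a tag such as `new`, and a UUID. The sketch below splits one of the logged names into those fields. The field meanings and the timestamp format are inferred from the log output only, and `parse_journal_name` is a hypothetical helper rather than a flash-flood function.

```python
import datetime

# Timestamp layout inferred from names such as 2020-03-23T165426.235201Z.
JOURNAL_DATE_FORMAT = "%Y-%m-%dT%H%M%S.%fZ"


def parse_journal_name(name):
    """Hypothetical helper: split a logged journal name into its apparent fields."""
    # The UUID contains only single hyphens, so splitting on "--" yields four parts.
    start, end, tag, blob_id = name.split("--")
    return {
        "start": datetime.datetime.strptime(start, JOURNAL_DATE_FORMAT),
        "end": datetime.datetime.strptime(end, JOURNAL_DATE_FORMAT),
        "tag": tag,
        "blob_id": blob_id,
    }


info = parse_journal_name(
    "2020-03-23T165426.235201Z--2020-03-23T165426.235201Z"
    "--new--bb66c684-5592-48ce-a4bf-518331754812"
)
print(info)
```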

Every event is assigned a date when it is created, so you can stream all events that occurred between two given dates. The code below creates ten fake events, then replays the events that fall between two dates:

In [6]:
import json

for i in range(40, 50):
    event_data = json.dumps({'foo': i}).encode()
    event_uuid = str(uuid.uuid4())
    event_date = datetime.datetime.fromtimestamp(10000 * i)
    ff.put(event_data, event_uuid, event_date)

arbitrary_from_date = datetime.datetime.fromtimestamp(10000 * 42)
arbitrary_to_date = datetime.datetime.fromtimestamp(10000 * 48)

for event in ff.replay(from_date=arbitrary_from_date, to_date=arbitrary_to_date):
    print(event.data)

Uploaded journal 1970-01-05T070640.000000Z--1970-01-05T070640.000000Z--new--9cce73cc-c1a4-4bcd-84c0-a5ddb6ac715f
new journal 1970-01-05T070640.000000Z--1970-01-05T070640.000000Z--new--9cce73cc-c1a4-4bcd-84c0-a5ddb6ac715f
Uploaded journal 1970-01-05T095320.000000Z--1970-01-05T095320.000000Z--new--321bdb37-a911-4765-8340-70e4d093667e
new journal 1970-01-05T095320.000000Z--1970-01-05T095320.000000Z--new--321bdb37-a911-4765-8340-70e4d093667e
Uploaded journal 1970-01-05T124000.000000Z--1970-01-05T124000.000000Z--new--a5b94332-a395-4da1-a4c9-7a867ddc53f6
new journal 1970-01-05T124000.000000Z--1970-01-05T124000.000000Z--new--a5b94332-a395-4da1-a4c9-7a867ddc53f6
Uploaded journal 1970-01-05T152640.000000Z--1970-01-05T152640.000000Z--new--3288da4a-17f1-44d4-8d5c-865ec84fdfdd
new journal 1970-01-05T152640.000000Z--1970-01-05T152640.000000Z--new--3288da4a-17f1-44d4-8d5c-865ec84fdfdd
Uploaded journal 1970-01-05T181320.000000Z--1970-01-05T181320.000000Z--new--894d9c62-fdae-4f9b-ba2f-c0d8d09303e1
new

Since the event data is JSON, we can use JMESPath to filter it:

In [7]:
import jmespath

events = []
for event in ff.replay(from_date=arbitrary_from_date, to_date=arbitrary_to_date):
    events.append(json.loads(event.data))

expression = jmespath.compile('events[].foo')
expression.search({'events': events})

replaying from journal 1970-01-05T124000.000000Z--1970-01-05T124000.000000Z--new--a5b94332-a395-4da1-a4c9-7a867ddc53f6
replaying from journal 1970-01-05T152640.000000Z--1970-01-05T152640.000000Z--new--3288da4a-17f1-44d4-8d5c-865ec84fdfdd
replaying from journal 1970-01-05T181320.000000Z--1970-01-05T181320.000000Z--new--894d9c62-fdae-4f9b-ba2f-c0d8d09303e1
replaying from journal 1970-01-05T210000.000000Z--1970-01-05T210000.000000Z--new--5bfe0732-2fa8-483f-b7d0-9b83cf19f194
replaying from journal 1970-01-05T234640.000000Z--1970-01-05T234640.000000Z--new--05e2ab49-410c-405a-b3ab-7a2da4f2805d
replaying from journal 1970-01-06T023320.000000Z--1970-01-06T023320.000000Z--new--ab5ed194-92bc-448b-afee-7cdb22ba0ada
replaying from journal 1970-01-06T052000.000000Z--1970-01-06T052000.000000Z--new--59c3cad0-af15-41f4-b5d2-3a438841d3cd


[43, 44, 45, 46, 47, 48]
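
The result above can be reproduced without S3 to make the two steps explicit: select events by date, then project the `foo` field. The sketch below is plain Python with no flash-flood or JMESPath calls. `replay_in_memory` is a hypothetical stand-in for `ff.replay`, and its boundary handling (start date excluded, end date included) is an assumption inferred from the output above, where 42 is absent and 48 is present.

```python
import datetime
import json

# Rebuild the ten fake events in memory as (date, data) pairs.
stored = [
    (datetime.datetime.fromtimestamp(10000 * i), json.dumps({"foo": i}).encode())
    for i in range(40, 50)
]

from_date = datetime.datetime.fromtimestamp(10000 * 42)
to_date = datetime.datetime.fromtimestamp(10000 * 48)


def replay_in_memory(records, from_date, to_date):
    """Hypothetical stand-in for ff.replay: yield data in date order."""
    for date, data in sorted(records, key=lambda record: record[0]):
        # Assumed window: exclusive start, inclusive end (inferred from the output).
        if from_date < date <= to_date:
            yield data


events = [json.loads(data) for data in replay_in_memory(stored, from_date, to_date)]

# Plain-Python counterpart of the JMESPath projection 'events[].foo',
# which skips entries whose 'foo' value is missing or null.
foo_values = [event["foo"] for event in events if event.get("foo") is not None]
print(foo_values)
```

This prints `[43, 44, 45, 46, 47, 48]`, matching the JMESPath result above.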