# ZODB – the Python database

## TaskList example

* **Item** class: Task.
* **Container** class: TaskList.
* **Create** list with example tasks.
* **Sort** and **list** tasks by deadline.

In [1]:
import collections
import datetime
import uuid

class Task(object):

    def __init__(self, description, deadline):
        self.description = description
        self.deadline = deadline
        self.completed = False

    def __repr__(self):
        return '"{0:s}"\tby {1:s}'.format(
            self.description, self.deadline.strftime('%Y-%m-%d @ %H:%M'))

class TaskList(collections.UserDict):

    def add(self, description, deadline):
        self[str(uuid.uuid4())] = Task(description, deadline)

In [2]:
import operator

def expires_in(hours):
    return datetime.datetime.now() + datetime.timedelta(hours/24.)

tasks = TaskList()
tasks.add('Do the daily dishes', expires_in(3))
tasks.add('Read the latest news', expires_in(2))
tasks.add('Finish the slides', expires_in(1))

for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
    if not task.completed:
        print(task)

"Finish the slides"	by 2016-10-29 @ 23:55
"Read the latest news"	by 2016-10-30 @ 00:55
"Do the daily dishes"	by 2016-10-30 @ 01:55


## TaskList with ZODB support

* Inherit classes from ``persistent.Persistent``.
* Use ``persistent.list.PersistentList`` instead of a plain mutable ``list``.
* Use ``persistent.mapping.PersistentMapping`` instead of a plain mutable ``dict``.
* Use ``BTrees`` for storing data objects.
* Use class variables to provide default attribute values.
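The last bullet matters when a class gains an attribute after instances have already been stored: loading restores only what was in the instance state at save time, so a class-level default fills the gap. A minimal sketch in plain Python, assuming the default behaviour of restoring state into the instance ``__dict__``; ``old_state`` is a made-up stand-in for a record written by an older version of the class, and no ZODB is involved.

```python
class Task(object):
    # Class-level default: used whenever an instance has no
    # 'completed' entry of its own.
    completed = False

    def __init__(self, description):
        self.description = description

# Hypothetical state saved before 'completed' existed on the class.
old_state = {'description': 'Finish the slides'}

# Roughly what loading does: create the instance without calling
# __init__, then restore the saved attributes.
task = Task.__new__(Task)
task.__dict__.update(old_state)

default_seen = task.completed                  # falls back to the class default
stored_before = 'completed' in vars(task)      # nothing stored on the instance yet

task.completed = True                          # first write creates the instance attribute
stored_after = 'completed' in vars(task)

print(default_seen, stored_before, stored_after)
```

The default costs nothing in storage until an instance actually overrides it.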

In [3]:
import BTrees
import datetime
import persistent
import uuid

class Task(persistent.Persistent):

    def __init__(self, description, deadline):
        self.description = description
        self.deadline = deadline
        self.completed = False

    def __repr__(self):
        return '"{0:s}"\tby {1:s}'.format(
            self.description, self.deadline.strftime('%Y-%m-%d @ %H:%M'))

class TaskList(persistent.mapping.PersistentMapping):
    
    def __init__(self):
        self.data = BTrees.OOBTree.OOBTree()

    def add(self, description, deadline):
        self[str(uuid.uuid4())] = Task(description, deadline)

In [4]:
import operator

def expires_in(hours):
    return datetime.datetime.now() + datetime.timedelta(hours/24.)

tasks = TaskList()
tasks.add('Do the daily dishes', expires_in(3))
tasks.add('Read the latest news', expires_in(2))
tasks.add('Finish the slides', expires_in(1))

for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
    if not task.completed:
        print(task)

"Finish the slides"	by 2016-10-29 @ 23:55
"Read the latest news"	by 2016-10-30 @ 00:55
"Do the daily dishes"	by 2016-10-30 @ 01:55


## TaskList with ZODB support – diff
```diff
@@ -1,9 +1,9 @@
-import collections
+import BTrees
 import datetime
+import persistent
 import uuid
 
-class Task(object):
-
+class Task(persistent.Persistent):
     def __init__(self, description, deadline):
         self.description = description
         self.deadline = deadline
@@ -13,7 +13,9 @@
         return '"{0:s}"\tby {1:s}'.format(
             self.description, self.deadline.strftime('%Y-%m-%d @ %H:%M'))
 
-class TaskList(collections.UserDict):
+class TaskList(persistent.mapping.PersistentMapping):
+    def __init__(self):
+        self.data = BTrees.OOBTree.OOBTree()
 
     def add(self, description, deadline):
         self[str(uuid.uuid4())] = Task(description, deadline)
```

## Persisting TaskList with ZODB

### Creating new ZODB

1. Create / open / compose a **storage**.
2. Wrap storage into a **database** (API and connection pool).
3. Open a **connection**.
4. Get the **root object**.

In [5]:
import ZODB
import ZODB.FileStorage

ZODB_FILENAME = 'mydata.fs'

storage = ZODB.FileStorage.FileStorage(ZODB_FILENAME)
db = ZODB.DB(storage)
connection = db.open()
root = connection.root()

### Analyzing empty ZODB

* ``ZODB.FileStorage`` is an append-only log file of pickled objects.

In [6]:
from ZODB.scripts import analyze
analyze.report(analyze.analyze(ZODB_FILENAME))

Processed 1 records in 1 transactions
Average record size is   68.00 bytes
Average transaction size is   68.00 bytes
Types used:
Class Name                                       Count    TBytes    Pct AvgSize
---------------------------------------------- ------- ---------  ----- -------
persistent.mapping.PersistentMapping                 1        68 100.0%   68.00
                            Total Transactions       1                    0.07k
                                 Total Records       1        0k 100.0%   68.00
                               Current Objects       1        0k 100.0%   68.00


### Persisting TaskList

* Objects must be reachable from the root object to persist.
* The first persistable change implicitly starts a new **transaction**.
* A transaction must be explicitly **committed** or **aborted**.
* Mutating an object persists a complete new revision of that object.

In [7]:
root['tasks'] = TaskList()

import transaction
transaction.commit()

In [8]:
from ZODB.scripts import analyze
analyze.report(analyze.analyze(ZODB_FILENAME))

Processed 4 records in 2 transactions
Average record size is   74.75 bytes
Average transaction size is  149.50 bytes
Types used:
Class Name                                       Count    TBytes    Pct AvgSize
---------------------------------------------- ------- ---------  ----- -------
BTrees.OOBTree.OOBTree                               1        33  11.0%   33.00
__main__.TaskList                                    1        84  28.1%   84.00
persistent.mapping.PersistentMapping                 2       182  60.9%   91.00
                            Total Transactions       2                    0.15k
                                 Total Records       4        0k 100.0%   74.75
                               Current Objects       3        0k  77.3%   77.00
                                   Old Objects       1        0k  22.7%   68.00


### Purging old objects

* ``ZODB.FileStorage`` is an append-only log file of pickled objects.
* **Pack** creates a new version of the storage that contains only current objects.
* Pros: Save disk space and get faster startup. Cons: Lose history.
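To make the append-only idea concrete, here is a toy model in plain Python. It is not the FileStorage file format: ``ToyLog`` and its methods are hypothetical names, and the sketch ignores transactions, pack time and reachability. It only shows why old revisions pile up and what packing throws away.

```python
import pickle

class ToyLog:
    """Toy append-only store: every write appends a new (oid, pickle) record."""

    def __init__(self):
        self.records = []            # list of (oid, pickled bytes)

    def store(self, oid, obj):
        # Never overwrite: a changed object becomes a new record.
        self.records.append((oid, pickle.dumps(obj)))

    def load(self, oid):
        # The newest record for an oid is its current revision.
        for record_oid, data in reversed(self.records):
            if record_oid == oid:
                return pickle.loads(data)
        raise KeyError(oid)

    def pack(self):
        # Keep only the newest record per oid; history is lost.
        latest = {}
        for oid, data in self.records:
            latest[oid] = data
        self.records = list(latest.items())

log = ToyLog()
log.store(1, {'completed': False})
log.store(2, {'completed': False})
log.store(1, {'completed': True})    # new revision of object 1
records_before = len(log.records)

log.pack()
records_after = len(log.records)

print(records_before, records_after, log.load(1))
```

The old revision of object 1 corresponds to what the analyze report lists under "Old Objects".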

In [9]:
import time
import ZODB.FileStorage
import ZODB.serialize

storage = root._p_jar.db().storage
storage.pack(time.time(), ZODB.serialize.referencesf)

In [10]:
from ZODB.scripts import analyze
analyze.report(analyze.analyze(ZODB_FILENAME))

Processed 3 records in 1 transactions
Average record size is   77.00 bytes
Average transaction size is  231.00 bytes
Types used:
Class Name                                       Count    TBytes    Pct AvgSize
---------------------------------------------- ------- ---------  ----- -------
BTrees.OOBTree.OOBTree                               1        33  14.3%   33.00
__main__.TaskList                                    1        84  36.4%   84.00
persistent.mapping.PersistentMapping                 1       114  49.4%  114.00
                            Total Transactions       1                    0.23k
                                 Total Records       3        0k 100.0%   77.00
                               Current Objects       3        0k 100.0%   77.00


### Persisting Task objects

* It just works.

In [11]:
import operator
import transaction

def expires_in(hours):
    return datetime.datetime.now() + datetime.timedelta(hours/24.)

tasks = root['tasks']
tasks.add('Do the daily dishes', expires_in(3))
tasks.add('Read the latest news', expires_in(2))
tasks.add('Finish the slides', expires_in(1))

transaction.commit()

for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
    if not task.completed:
        print(task)

"Finish the slides"	by 2016-10-29 @ 23:55
"Read the latest news"	by 2016-10-30 @ 00:55
"Do the daily dishes"	by 2016-10-30 @ 01:55


In [12]:
from ZODB.scripts import analyze
analyze.report(analyze.analyze(ZODB_FILENAME))

Processed 8 records in 2 transactions
Average record size is  124.00 bytes
Average transaction size is  496.00 bytes
Types used:
Class Name                                       Count    TBytes    Pct AvgSize
---------------------------------------------- ------- ---------  ----- -------
BTrees.OOBTree.OOBTree                               2       276  27.8%  138.00
__main__.Task                                        3       434  43.8%  144.67
__main__.TaskList                                    2       168  16.9%   84.00
persistent.mapping.PersistentMapping                 1       114  11.5%  114.00
                            Total Transactions       2                    0.48k
                                 Total Records       8        0k 100.0%  124.00
                               Current Objects       6        0k  88.2%  145.83
                                   Old Objects       2        0k  11.8%   58.50


### Persisting Task objects – diff
```diff
@@ -1,13 +1,16 @@
 import operator
+import transaction
 
 def expires_in(hours):
     return datetime.datetime.now() + datetime.timedelta(hours/24.)
 
-tasks = TaskList()
+tasks = root['tasks']
 tasks.add('Do the daily dishes', expires_in(3))
 tasks.add('Read the latest news', expires_in(2))
 tasks.add('Finish the slides', expires_in(1))
 
+transaction.commit()
+
 for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
     if not task.completed:
         print(task)
```

### Mutating Task objects


In [13]:
import random

tasks = root['tasks']

for task in tasks.values():
    if random.random() > 0.5:
        task.completed = True

transaction.commit()

for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
    if not task.completed:
        print(task)

"Finish the slides"	by 2016-10-29 @ 23:55
"Read the latest news"	by 2016-10-30 @ 00:55
"Do the daily dishes"	by 2016-10-30 @ 01:55


### Deleting Task objects


In [14]:
for key in tuple(tasks.keys()):
    print('Deleted task {0:s}'.format(key))
    del tasks[key]

transaction.commit()

Deleted task 23b0bf55-9f48-4c43-b7ea-4e6ecb778524
Deleted task 9db2bc75-ad28-48a5-afff-33cc06fba80e
Deleted task d0609aa6-050b-4595-9a57-e2de809a0dc8


In [15]:
from ZODB.scripts import analyze
analyze.report(analyze.analyze(ZODB_FILENAME))

Processed 10 records in 3 transactions
Average record size is  110.90 bytes
Average transaction size is  369.67 bytes
Types used:
Class Name                                       Count    TBytes    Pct AvgSize
---------------------------------------------- ------- ---------  ----- -------
BTrees.OOBTree.OOBTree                               3       309  27.9%  103.00
__main__.Task                                        3       434  39.1%  144.67
__main__.TaskList                                    3       252  22.7%   84.00
persistent.mapping.PersistentMapping                 1       114  10.3%  114.00
                            Total Transactions       3                    0.36k
                                 Total Records      10        1k 100.0%  110.90
                               Current Objects       6        0k  60.0%  110.83
                                   Old Objects       4        0k  40.0%  111.00


### Undoing transactions

1. Get transaction timestamps from objects' ``_p_mtime``.
2. Map timestamps to undoable ids from ``undoLog`` (or ``undoInfo``, which filters the log by a specification).
3. Undo (with fingers crossed).
4. On failure, try again with revised transactions.
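Step 2 can be sketched as a plain function over the list of dicts that the undo log returns. The helper ``find_undo_ids`` and the sample entries are made up for illustration; the sketch assumes each entry carries an ``id`` and a float ``time`` that is comparable to ``_p_mtime``, and real entries carry further keys.

```python
def find_undo_ids(undo_entries, mtime, tolerance=0.001):
    """Return ids of undo entries committed at (about) the given time."""
    return [entry['id'] for entry in undo_entries
            if abs(entry['time'] - mtime) <= tolerance]

# Made-up entries, newest first; the ids are placeholders.
sample_entries = [
    {'id': b'id-of-delete', 'time': 1477778130.5},
    {'id': b'id-of-mutate', 'time': 1477778120.25},
    {'id': b'id-of-add', 'time': 1477778110.0},
]
sample_mtime = 1477778120.25     # stand-in for obj._p_mtime

matching = find_undo_ids(sample_entries, sample_mtime)
print(matching)
```

Against a real database the same helper would presumably be fed ``db.undoLog()`` and ``obj._p_mtime``, with the result passed to ``db.undoMultiple``.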

In [16]:
for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
    print(task, task.completed)

In [17]:
obj = root['tasks']
info = db.undoInfo(specification=dict(time=obj._p_mtime))
db.undoMultiple([info[0]['id']])
transaction.commit()

In [18]:
for task in sorted(tasks.values(), key=operator.attrgetter('deadline')):
    print(task, task.completed)

"Finish the slides"	by 2016-10-29 @ 23:55 False
"Read the latest news"	by 2016-10-30 @ 00:55 False
"Do the daily dishes"	by 2016-10-30 @ 01:55 False


#### Undoing all undoable transactions at once


In [19]:
# Build a real list: in Python 3, map() returns a one-shot iterator.
db.undoMultiple([entry['id'] for entry in db.undoLog()])
transaction.commit()

## Time travel with zc.beforestorage

* ZODB storages can be wrapped (stacked) for additional features.
* ``zc.beforestorage`` shows data as it was before a given transaction or timestamp.
* Get the transaction ID from an object's ``_p_serial``.
* Or give a timestamp in the format ``2016-10-31T12:00``.


In [20]:
from zc.beforestorage import Before

obj = root['tasks']

before_storage = Before(storage, obj._p_serial)
before_db = ZODB.DB(before_storage)
before_connection = before_db.open()
before_root = before_connection.root()

In [21]:
for task in sorted(before_root['tasks'].values(), key=operator.attrgetter('deadline')):
    print(task, task.completed)

## Indexing Tasks in ZODB

* ZODB does not provide any indexing of its own (objects are addressed only by OID).
* It's possible, though not recommended, to create index objects in ZODB.
* ``zope.index`` + ``repoze.catalog`` = ``hypatia``

In [22]:
import BTrees
import datetime
import persistent
import uuid
import hypatia.field
import ZODB.utils  # u64/p64 convert between OIDs and ints

class IndexedTaskList(persistent.mapping.PersistentMapping):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.data = BTrees.OOBTree.OOBTree()
        self.index = hypatia.field.FieldIndex('deadline')

    def add(self, description: str, deadline: datetime.datetime):
        uid = str(uuid.uuid4())
        task = Task(description, deadline)
        self[uid] = task
        
        self._p_jar.add(task)                # reserve OID
        docid = ZODB.utils.u64(task._p_oid)  # OID to DOCID int
        self.index.index_doc(docid, task)    # index with DOCID ints
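The OID-to-docid step works because a ZODB OID is an 8-byte string while the index expects integer docids. A minimal sketch of that conversion with the standard library, assuming ``ZODB.utils.u64`` and ``ZODB.utils.p64`` amount to unpacking and packing an 8-byte big-endian unsigned integer; the helper names below are hypothetical.

```python
import struct

def oid_to_docid(oid):
    """8-byte OID -> integer docid (assumed equivalent of ZODB.utils.u64)."""
    return struct.unpack('>Q', oid)[0]

def docid_to_oid(docid):
    """Integer docid -> 8-byte OID (assumed equivalent of ZODB.utils.p64)."""
    return struct.pack('>Q', docid)

sample_oid = b'\x00\x00\x00\x00\x00\x00\x01\x02'   # made-up OID
sample_docid = oid_to_docid(sample_oid)
round_trip = docid_to_oid(sample_docid)

print(sample_docid, round_trip == sample_oid)
```

Because the mapping is reversible, the index never needs to store object references: a docid is enough to fetch the object back through the connection.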

In [23]:
root['tasks'] = IndexedTaskList()
transaction.commit()

In [24]:
import operator
import transaction

tasks = root['tasks']
tasks.add('Do the daily dishes', expires_in(3))
tasks.add('Read the latest news', expires_in(2))
tasks.add('Finish the slides', expires_in(1))

transaction.commit()

ids = tasks.index.indexed()             # indexed DOCID ints
ids_sorted = tasks.index.sort(ids)      # sorted DOCID ints
oids = map(ZODB.utils.p64, ids_sorted)  # docids to OIDs
objs = map(connection.get, oids)        # OIDs to objects

for task in objs:
    if not task.completed:
        print(task)

"Finish the slides"	by 2016-10-29 @ 23:55
"Read the latest news"	by 2016-10-30 @ 00:55
"Do the daily dishes"	by 2016-10-30 @ 01:55


## Resetting the example

1. Close open connections.
2. Close open storages.
3. Remove the FileStorage file.

In [25]:
import os

ZODB_FILENAME = 'mydata.fs'

if 'connection' in locals():
    connection.transaction_manager.get().abort()
    connection.close()
    
if 'storage' in locals():
    storage.close()

In [26]:
if os.path.exists(ZODB_FILENAME):
    os.unlink(ZODB_FILENAME)
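A FileStorage usually leaves companion files next to the data file. The sketch below removes them as well, assuming the conventional suffixes ``.index``, ``.lock``, ``.tmp`` and ``.old`` (the last one left behind by packing); ``remove_filestorage`` is a hypothetical helper, and the demonstration runs on empty stand-in files in a temporary directory rather than on a real storage.

```python
import os
import tempfile

def remove_filestorage(path, suffixes=('', '.index', '.lock', '.tmp', '.old')):
    """Delete the data file and any sidecar files that exist; return their paths."""
    removed = []
    for suffix in suffixes:
        candidate = path + suffix
        if os.path.exists(candidate):
            os.unlink(candidate)
            removed.append(candidate)
    return removed

# Demonstration on empty stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, 'mydata.fs')
    for suffix in ('', '.index', '.lock'):
        open(demo_path + suffix, 'w').close()
    removed_names = [os.path.basename(p) for p in remove_filestorage(demo_path)]
    leftovers = os.listdir(workdir)

print(removed_names, leftovers)
```

Call it only after the connection and storage have been closed, as in the cells above.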