# A Zettelkasten in HDF5

This notebook demonstrates how to create a [Zettelkasten](https://kadavy.net/blog/posts/zettelkasten-method-slip-box-digital-example/) in HDF5. The Zettelkasten ("slip-box") is a method of note-taking and knowledge management that was developed by, among others, the German sociologist Niklas Luhmann. A physical Zettelkasten consists of a collection of notes, each stored on a separate index card. The notes are linked to one another through references, forming a kind of knowledge graph.

<div class="alert alert-block alert-info">
How do people <i>use</i> a Zettelkasten (as opposed to populating it)?

<ol>
<li>Depending on your research question, you start by looking for notes tagged with the relevant keywords.</li>
<li>You open a relevant note from the list of results.</li>
<li>You follow the links to other notes, or (less frequently) pick up new keywords along the way.</li>
</ol>
</div>
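The three steps above map directly onto `h5py` operations. The sketch below builds a tiny in-memory Zettelkasten (two made-up notes, one tag `method`, and mutual cross-reference links named `xref:<id>`, mirroring the layout used later in this notebook) and then looks up a tag, opens a note, and follows its links. The file name and note texts are placeholders, not part of the original notebook.

```python
import h5py

# throwaway in-memory file: driver='core' with backing_store=False never touches disk
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    dt = h5py.string_dtype(encoding='utf-8')
    # two made-up notes, stored as ZETTEL/<id> groups with a 'text' attribute
    a = f.create_group('ZETTEL/1')
    a.attrs.create('text', 'Set clear goals.', dtype=dt)
    b = f.create_group('ZETTEL/2')
    b.attrs.create('text', 'Use keywords sparingly.', dtype=dt)
    f['TAGS/method/1'] = a      # tag index entry: a hard link to note 1
    a['xref:2'] = b             # cross-reference from note 1 to note 2 ...
    b['xref:1'] = a             # ... and back

    # 1. look up the notes tagged 'method'
    hits = list(f['TAGS/method'])
    # 2. open the first note in the results
    note = f['TAGS/method'][hits[0]]
    first_text = note.attrs['text']
    # 3. follow its cross-references to related notes
    followed = {name: target.attrs['text']
                for name, target in note.items() if name.startswith('xref:')}

print(hits, first_text, followed)
```

Because a tag entry is a hard link, the object opened via `TAGS/method/1` *is* note 1; no lookup back into `ZETTEL` is needed.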

In this Jupyter notebook, we will create a Zettelkasten in HDF5 using the `h5py` module. Each note will be stored in an HDF5 file as a group decorated with attributes, and the references between notes will be stored as (hard) links between groups.

A note template is shown below (see [Boltze2023a](./zettel/6.txt)):

<pre>
TITLE OF ZETTEL: #ZETTELNUMBER

ONE THOUGHT, IN YOUR OWN, CLEAR WORDS (see #CITATION)

IF NECESSARY: EXPLANATION/ELABORATION

Created: DATE

Last Edited: DATE

Tags: #TEMPLATE

LINK TO AT LEAST ONE OTHER ZETTEL ON THE SAME OR A RELATED MATTER. FROM THERE, LINK BACK TO HERE.
</pre>
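Because the template puts the tags on a line of their own, they can be extracted from a note's text mechanically, for example to derive the `tags` argument when importing notes. A minimal sketch, assuming tags are written as `#word` tokens on a line starting with `Tags:`; the helper `extract_tags` and the sample note are hypothetical.

```python
import re

def extract_tags(note_text):
    '''Hypothetical helper: return the #-prefixed tags from the 'Tags:' line.'''
    for line in note_text.splitlines():
        if line.strip().startswith('Tags:'):
            # collect every '#word' token on this line, without the '#'
            return tuple(re.findall(r'#(\w+)', line))
    return ()  # no 'Tags:' line found

# a made-up note following the template
note = """Set clear goals: #3

Write down your goals before starting (see #Tracy2017a)

Tags: #method #productivity
"""
print(extract_tags(note))   # ('method', 'productivity')
```

Only the `Tags:` line is scanned, so the zettel number and the citation marker in the body are not mistaken for tags.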

The Zettelkasten will be stored in a file called `zettelkasten.h5`. The file will contain a group called `ZETTEL`, which will contain all the notes. Each note will be stored as a group within the `ZETTEL` group, named after the note's ID (a positive integer). Each note group will carry a `text` attribute holding the full text of the note. Storing the entire note as a single UTF-8 encoded string attribute preserves it verbatim, which makes import from and export to external tools easy.
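A quick check of this storage choice: a note containing non-ASCII characters survives the round trip through a variable-length UTF-8 string attribute, and `h5py` (version 3 or later) returns it as a Python `str`. The in-memory file and the note text are only for illustration.

```python
import h5py

text = 'Luhmanns Zettelkasten: Notizen über Notizen'   # non-ASCII on purpose

with h5py.File('utf8.h5', 'w', driver='core', backing_store=False) as f:
    group = f.create_group('ZETTEL/1')               # group name = note ID
    dt = h5py.string_dtype(encoding='utf-8')          # variable-length UTF-8
    group.attrs.create('text', text, dtype=dt)
    restored = f['ZETTEL/1'].attrs['text']
    # inspect the stored HDF5 string type: encoding and (None = variable) length
    info = h5py.check_string_dtype(f['ZETTEL/1'].attrs.get_id('text').dtype)

print(restored == text, type(restored).__name__)      # True str
```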

Some notes are special because they contain citations; they will also be linked into a group called `CITATIONS` under their citation key. Finally, a group called `TAGS` will contain one subgroup per tag, which links to all notes carrying that tag. The structure of the Zettelkasten will look like this:
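What makes this layout work is that HDF5 hard links are just additional names for the same object: a citation note reachable as `ZETTEL/2` and as `CITATIONS/Schmidt2018a` is stored only once. The sketch below (made-up note text, in-memory file) checks this in two ways: `h5py` objects compare equal when they refer to the same HDF5 object, and an attribute written through one path is visible through the other.

```python
import h5py

with h5py.File('links.h5', 'w', driver='core', backing_store=False) as f:
    f.create_group('CITATIONS')
    note = f.create_group('ZETTEL/2')
    note.attrs['text'] = 'Schmidt (2018): a made-up citation note'
    f['CITATIONS/Schmidt2018a'] = note        # second name for the same group

    same_object = f['ZETTEL/2'] == f['CITATIONS/Schmidt2018a']
    # an attribute written via one path ...
    f['CITATIONS/Schmidt2018a'].attrs['seen'] = 1
    # ... is visible via the other, because there is only one group
    seen_via_zettel = int(f['ZETTEL/2'].attrs['seen'])

print(same_object, seen_via_zettel)   # True 1
```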

<img src="img/zettelkasten.png" width="1280">

The figure might look more intimidating than it is, but making sense of it does require a firm grasp of HDF5 groups and links. You could also mimic this picture in a POSIX file system with directories, hard links, and extended attributes. Note, though, that most POSIX file systems do not permit hard links to directories, so the notes would have to be files (or the links symbolic). (I recommend trying it and comparing it with the HDF5-based solution!)
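If you take up the POSIX suggestion, one workable mapping is sketched below; it is a starting point under stated assumptions, not a full port. Since most POSIX systems refuse hard links to directories, each note here is a plain file `ZETTEL/<id>` holding the text, and the tag index uses `os.link` hard links to those files. Extended attributes (`os.setxattr`, Linux only) are left out because not every file system supports them. The note text is made up.

```python
import os
import tempfile

root = tempfile.mkdtemp()                          # scratch directory for the demo
os.makedirs(os.path.join(root, 'ZETTEL'))
os.makedirs(os.path.join(root, 'TAGS', 'method'))

# a note is a file whose name is its ID and whose content is its text
note_path = os.path.join(root, 'ZETTEL', '3')
with open(note_path, 'w', encoding='utf-8') as fh:
    fh.write('Write down your goals.\n')

# tag index entry: a hard link, i.e. a second name for the same inode
tag_path = os.path.join(root, 'TAGS', 'method', '3')
os.link(note_path, tag_path)

print(os.path.samefile(note_path, tag_path), os.stat(note_path).st_nlink)   # True 2
```

Compared with HDF5, the text has to live in the file body (or in an xattr), and the note-to-note links would need a convention of their own, since a file cannot contain links the way an HDF5 group can.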

OK, let's get on with HDF5 and `h5py`!

A few helpers let us read the text notes, create the notes/zettels, and create the links between them.

```python
def read_file_contents(file_path):
    '''
    A helper function to read the contents of our notes (text files).
    '''
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()
```

```python
import h5py

def create_zettel(file, count, text, tags, is_citation=False):
    '''
    A helper function to create a zettel in the HDF5 file:

    - Creates a group ZETTEL/<count> for the zettel
    - Stores the text of the zettel as a UTF-8 string attribute named 'text'
    - If it is not a citation, links the zettel into TAGS/<tag>/<count> for each tag
    - If it is a citation, links the zettel into CITATIONS/<key> (the "tags" are citation keys)
    - Returns the incremented count and the group created
    '''
    no = str(count)
    group = file.create_group(f'ZETTEL/{no}')
    # variable-length UTF-8 string attribute; h5py encodes the str for us
    dt = h5py.string_dtype(encoding='utf-8')
    group.attrs.create('text', text, dtype=dt)

    for t in tags:
        if not is_citation:
            # hard link; h5py creates the intermediate group TAGS/<t> if needed
            file[f'TAGS/{t}/{no}'] = group
        else:
            file[f'CITATIONS/{t}'] = group

    return count + 1, group
```
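To see what `create_zettel` leaves behind before touching real files, the sketch below calls a condensed copy of it (repeated so the block runs on its own) on an in-memory file with two made-up notes; `Example2020a` is a placeholder citation key. Note that `TAGS/method` and `TAGS/productivity` are never created explicitly: `h5py` creates intermediate groups when a link is assigned to a nested path.

```python
import h5py

def create_zettel(file, count, text, tags, is_citation=False):
    # condensed copy of the helper above (same behavior)
    no = str(count)
    group = file.create_group(f'ZETTEL/{no}')
    group.attrs.create('text', text, dtype=h5py.string_dtype(encoding='utf-8'))
    for t in tags:
        link = f'CITATIONS/{t}' if is_citation else f'TAGS/{t}/{no}'
        file[link] = group
    return count + 1, group

with h5py.File('try.h5', 'w', driver='core', backing_store=False) as f:
    count = 1
    count, _ = create_zettel(f, count, 'Goals first.', ('method', 'productivity'))
    count, _ = create_zettel(f, count, 'Example (2020).', ('Example2020a',), is_citation=True)
    top = sorted(f)                                      # top-level group names
    tags = {t: list(f['TAGS'][t]) for t in f['TAGS']}    # tag -> note IDs
    citations = list(f['CITATIONS'])

print(count, top, tags, citations)
```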

Now we can create the Zettelkasten.

```python
import h5py

with h5py.File("zettelkasten.h5", "w") as f:

    # create the top-level groups
    citations = f.create_group("CITATIONS")
    tags = f.create_group("TAGS")
    zettel = f.create_group("ZETTEL")

    # add a few zettels and keep track of the count (the next free ID)
    count = 1
    count, _ = create_zettel(f, count, read_file_contents('zettel/1.txt'), ('template',))

    # citations are special: their "tags" are citation keys
    count, _ = create_zettel(f, count, read_file_contents('zettel/2.txt'), ('Schmidt2018a',), is_citation=True)

    count, goals = create_zettel(f, count, read_file_contents('zettel/3.txt'), ('method', 'productivity'))
    count, tracy2017a = create_zettel(f, count, read_file_contents('zettel/4.txt'), ('Tracy2017a',), is_citation=True)

    # link zettel 3 to its citation (zettel 4) ...
    goals['cite:Tracy2017a'] = tracy2017a
    # ... and the citation back to the citing zettel
    tracy2017a['3'] = goals

    count, _ = create_zettel(f, count, read_file_contents('zettel/5.txt'), ('Fast2020a',), is_citation=True)
    count, _ = create_zettel(f, count, read_file_contents('zettel/6.txt'), ('Boltze2023a',), is_citation=True)
    count, kadavy2021a = create_zettel(f, count, read_file_contents('zettel/7.txt'), ('Kadavy2021a',), is_citation=True)
    count, kwds = create_zettel(f, count, read_file_contents('zettel/8.txt'), ('method',))
    # zettel 8 cites Kadavy2021a (zettel 7), and the citation links back
    kwds['cite:Kadavy2021a'] = kadavy2021a
    kadavy2021a['8'] = kwds

    # assume there is a cross-reference between zettel 3 and 8
    kwds['xref:3'] = goals
    goals['xref:8'] = kwds

    # store the number of zettels (= the highest ID used) for the next round
    f.attrs['zettel_count'] = count - 1
```
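The `zettel_count` attribute is what lets a later session pick up where this one stopped. A sketch, assuming the next ID is simply `zettel_count + 1`; it uses a temporary stand-in file (rather than `zettelkasten.h5`, which needs the `zettel/*.txt` notes) and a made-up note text.

```python
import os
import tempfile
import h5py

path = os.path.join(tempfile.mkdtemp(), 'zk.h5')   # stand-in for zettelkasten.h5

# session 1: pretend eight zettels were written, as in the cell above
with h5py.File(path, 'w') as f:
    for i in range(1, 9):
        f.create_group(f'ZETTEL/{i}')
    f.attrs['zettel_count'] = 8

# session 2: reopen in append mode and continue the numbering
with h5py.File(path, 'a') as f:
    next_id = int(f.attrs['zettel_count']) + 1
    note = f.create_group(f'ZETTEL/{next_id}')
    note.attrs['text'] = 'A new thought.'
    f.attrs['zettel_count'] = next_id               # keep the counter in sync

with h5py.File(path, 'r') as f:
    final_count = int(f.attrs['zettel_count'])
    n_notes = len(f['ZETTEL'])

print(final_count, n_notes)   # 9 9
```

Opening with mode `'a'` matters here: mode `'w'`, as used in the cell above, would truncate the existing file.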

Let's have a look at the Zettelkasten we just created using [H5Web](https://h5web.panosc.eu/)!

<div class="alert alert-block alert-warning">
<b>Caution:</b> Because of the double-linking, we are creating cycles in our knowledge graph, and not all tools handle that in an intuitive manner.
</div>
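If you write your own navigation code, the cycles are easy to handle: remember which objects you have already seen and never expand them twice. A minimal breadth-first sketch, relying on the fact that `h5py` objects compare equal and hash alike when they refer to the same HDF5 object, so a `set` detects revisits even when a note is reached through a different link name. The mini file mirrors the mutual `xref:` links between zettels 3 and 8 built above; `reachable` is a hypothetical helper.

```python
from collections import deque
import h5py

def reachable(start):
    '''Hypothetical helper: one path per group reachable from `start` via links.'''
    seen = {start}
    order = [start.name]
    queue = deque([start])
    while queue:
        group = queue.popleft()
        for child in group.values():
            # only groups are notes; skip anything else and anything already seen
            if isinstance(child, h5py.Group) and child not in seen:
                seen.add(child)
                order.append(child.name)
                queue.append(child)
    return order

with h5py.File('cycle.h5', 'w', driver='core', backing_store=False) as f:
    goals = f.create_group('ZETTEL/3')
    kwds = f.create_group('ZETTEL/8')
    goals['xref:8'] = kwds
    kwds['xref:3'] = goals                  # closes the cycle 3 -> 8 -> 3
    visited = reachable(goals)

print(len(visited))   # 2
```

Without the `seen` check, the loop would bounce between the two notes forever.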

All that's missing is a cool UI to edit and navigate the Zettelkasten. I'll leave that as an exercise for the reader. `;-)`

Notice that we have also glossed over the template's `created` and `last_edited` fields. HDF5 can optionally record object creation and modification times, but we have not used that feature here.
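A low-tech alternative to HDF5's built-in time tracking is to store the template's *Created* and *Last Edited* fields as ordinary attributes. A sketch, assuming ISO 8601 UTC timestamps in string attributes named `created` and `last_edited` (names of our choosing, not an HDF5 convention), with a hypothetical `touch` helper that refreshes `last_edited` whenever a note's text changes.

```python
from datetime import datetime, timezone
import h5py

def now_iso():
    # current UTC time as an ISO 8601 string, e.g. '2024-05-01T12:00:00+00:00'
    return datetime.now(timezone.utc).isoformat(timespec='seconds')

def touch(group, text):
    '''Hypothetical helper: set a note's text and update its timestamps.'''
    group.attrs['text'] = text
    if 'created' not in group.attrs:        # set only on first write
        group.attrs['created'] = now_iso()
    group.attrs['last_edited'] = now_iso()

with h5py.File('times.h5', 'w', driver='core', backing_store=False) as f:
    note = f.create_group('ZETTEL/1')
    touch(note, 'First draft.')
    created = note.attrs['created']
    touch(note, 'Second draft.')
    text, edited = note.attrs['text'], note.attrs['last_edited']

print(text, created <= edited)   # Second draft. True
```

Because both timestamps share one format and time zone, plain string comparison orders them correctly.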