Skip to content

Latest commit

 

History

History
156 lines (116 loc) · 10.7 KB

2022-07-14-vulpea-0.3.org

File metadata and controls

156 lines (116 loc) · 10.7 KB

/images/2022-07-14-vulpea-0.3/2022-07-20-09-51-46-vulpea-logo.webp

The time has come to release a new version of Vulpea, a collection of note-taking functions based on Org and Org Roam. Almost a year has passed since the release of v0.2. Over this time, Vulpea got several major features that deserve a dedicated version. In truth, I wanted to release it almost half a year ago, but the war kicked in, and I had more important things to focus on.

The main focus of this release is the performance of query operations. Expect a 4.5x speed boost for generic vulpea-db-query and even more with specialized queries. All that thanks to materialized view table for notes.

testresult sizeregularview tableratio
tags-some30 notes4.66934606509999951.01124787124.6174100
tags-every3168 notes4.73338444369999961.00598191764.7052381
links-some1657 notes4.80957712831.04622361289999994.5970833
links-every92 notes4.55174733379999951.02048330894.4603839
testresult sizegenericspecializedratio
tags-some30 notes1.01124787120.0066033426153.14182
tags-every3168 notes1.00598191760.57093929649999991.7619770
links-some1657 notes1.04622361289999990.42485805322.4625251
links-every92 notes1.02048330890.054531359618.713696

Checkout CHANGELOG for full list of changes. Read further to learn details.

Materialized view table

Materialized view table, huh? The concept is actually pretty simple. But first, you need to understand how Org Roam DB is designed.

Under the hood, Org Roam uses SQLite with 7 different tables. Each table stores various information about every single note. There is a table with ‘node’ information like title, level, properties, etc. But tags, aliases, and links are stored in separate tables. And you have to query data from all these tables to get a fully populated note. It becomes messy and slow if you need to get lots of fully populated notes.

I won’t go too much into detail explaining how it worked previously in Vulpea and how it still works in Org Roam. That deserves a separate post that can be based on my conversation with Jörg Volbers. In short, it uses a trick to multiply all needed tables into one big table. Definitely not my proudest contribution. But most importantly, it is error-prone and slow.

After adding links to the vulpea-note, I realised that vulpea-db-query became too slow for my collection of notes. And so I decided to add an extra table, where every row contains a fully materialised/populated note. This simple idea provided a 4.5x performance boost to all functions using vulpea-db-query.

This is a game changer with only one downside - write performance degradation. Of course, we write the same data twice - in Org Roam tables and then in materialised view table. But there is more to writing performance degradation. It is hard to hook into Org Roam parsing and writing routine, so I had to duplicate buffer parsing. And this is the biggest performance offender. In most cases, when you simply modify a single note, this is hardly noticeable. But syncing 9k notes takes x2 time.

testregularview tablediffratio
9554 notes172.79389154999998337.61603822164.822151.9538656
small note0.0003540798890.0004162621946.2182305e-51.1756166
medium note0.0004921994160.0005393899974.7190581e-51.0958770
huge note0.17328518480.22405082430.0507656401.2929601

This penalty can be omitted if materialised view table becomes part of Org Roam. Alternatively, the Org Roam routine can benefit from refactoring to enable this kind of hook without any performance loss. Do I plan to work in this direction? Definitely! But I feel a lack of motivation and resources to promise anything.

See vulpea#116 for benchmarks.

Specialized queries

As I already mentioned, all this performance mumbo jumbo started because of the inclusion of links to vulpea-note. I literally was disappointed with the performance. And I decided to take the path of least resistance and provided some specialized queries (because they kind of solve the problem for my most used patterns).

What are the most used patterns? Get the list of notes tagged by all/any given tags. Get the list of notes linking to all/any given notes.

  • vulpea-db-query-by-tags-some - return all notes tagged with one of the provided TAGS.
  • vulpea-db-query-by-tags-every - return all notes tagged by every tag from the list of provided TAGS.
  • vulpea-db-query-by-links-some - return all notes linking at least one of the provided DESTINATIONS.
  • vulpea-db-query-by-links-every - return all notes linking each and every provided DESTINATIONS.

These are benchmark results on a set of 8390 notes before the materialized view was introduced.

testorg-roam APIvulpea-db-queryspecialized
simple0.289743342.282821580.02278065
popular1.859300862.346369071.42250805
big intersection4.534201412.272493250.46843158

What exactly makes specialized queries so fast? We start by narrowing down all notes in the fastest possible way. And then, we exchange this list of ids to list of notes in a single transaction.

*-some

We use IN operator to get notes tagged by at least one of the provided tags.

select distinct node_id
from tags
where tag in ('"wine"', '"barberry/public"');

Ignore serialisation of the string data. It is related to emacsql limitations.

Now that we have the list of ids, we can get list of notes in one transaction by using vulpea-db-query-by-ids.

*-every

We use INTERSECT operator to get notes tagged by all provided tags. Looks more tricky than IN operator, but the idea is simple. We find a list of notes tagged by first tag, a list of notes tagged by second tag, a list of notes tagged by third tag, and so on… And then we calculate intersection of these lists. Voilà!

select distinct node_id
from tags
where tag = '"wine"'

INTERSECT

select distinct node_id
from tags
where tag = '"barberry/public"'

;

Now that we have the list of ids, we can get list of notes in one transaction by using vulpea-db-query-by-ids.

Other goodies

  • More data is available in vulpea-note: properties and links.
  • Note meta is persisted in org-roam-db in meta table.
  • API to access data from vulpea-note-meta:
    • vulpea-note-meta-get-list - to get all values of given PROP and TYPE;
    • vulpea-note-meta-get - to get the first value of given PROP and TYPE.
  • New function to remove buffer properties - vulpea-buffer-prop-remove.
  • Improve filetags handling:
    • Property format them with : as separator;
    • Remove property when setting them to empty list instead of leaving empty property.
  • Allow to configure candidates source for vulpea-find function via vulpea-find-default-candidates-source variable.
  • New function to select from arbitrary list of notes as opposed to relying on filter - vulpea-select-from.
  • Add shortcuts for checking tags on the note:
    • vulpea-note-tagged-all-p - return non-nil if a NOTE is tagged by all of the TAGS.
    • vulpea-note-tagged-any-p - return non-nil if a NOTE is tagged by any of the TAGS.

Next steps

Vulpea already offers a lot of features. While working on other projects that use Vulpea, I discover new missing features that I would love to implement. And issue tracker is not empty. Unless something changes, my plan is the following.

  1. Add attachment directory path to vulpea-note. This is needed to boost barberry.io construction, so I prioritise this simple feature.
  2. Provide a simple way to add more tables to org-roam-db and improve their initialisation flow.
  3. Provide outline-level metadata. See vulpea#75.

Stay tuned and safe travels! Remember to use Emacs responsibly.