Faster and more configurable memory store #862

hmottestad · 2017-07-15T09:44:15Z

I’m looking at creating a new Memory Store that is configurable so that it can be optimised for any scenario.

Here is what I’m thinking of doing:

B+ tree indexes (because they support range queries)
Thread based indexing (async)
Deletes in diff index (merged when index.size > some max)
Hashmap index for exists(triple1) queries
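To make the list above a bit more concrete, here is a rough sketch of how the delete diff and the hash-based exists() lookup could fit together. Everything in it is an assumption made for illustration: the class and method names are hypothetical, a TreeSet stands in for the B+ tree, indexing is synchronous rather than thread based, and records require Java 16 or later.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: TreeSet stands in for a B+ tree, all names are illustrative.
final class DiffIndexSketch {
    record Triple(String s, String p, String o) {}

    private static final Comparator<Triple> SPO = Comparator
            .comparing(Triple::s).thenComparing(Triple::p).thenComparing(Triple::o);

    private final TreeSet<Triple> ordered = new TreeSet<>(SPO); // ordered index, may hold stale entries
    private final Set<Triple> present = new HashSet<>();        // answers exists(triple) directly
    private final Set<Triple> deleted = new HashSet<>();        // delete diff, merged lazily
    private final int maxPendingDeletes;

    DiffIndexSketch(int maxPendingDeletes) {
        this.maxPendingDeletes = maxPendingDeletes;
    }

    void add(Triple t) {
        deleted.remove(t); // re-adding cancels a pending delete
        present.add(t);
        ordered.add(t);
    }

    void remove(Triple t) {
        if (present.remove(t)) {
            deleted.add(t); // only record the delete, leave the ordered index alone
            if (deleted.size() > maxPendingDeletes) {
                merge();
            }
        }
    }

    boolean exists(Triple t) {
        return present.contains(t);
    }

    // Applies all pending deletes to the ordered index in one pass.
    void merge() {
        ordered.removeAll(deleted);
        deleted.clear();
    }

    // Size of the ordered index, including entries that are deleted but not yet merged.
    int orderedSize() {
        return ordered.size();
    }
}
```

Any scan over the ordered index would also have to skip entries that are still in the delete diff; that part is left out here.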

I don’t want to have transactional support. I use transactions with disk-based databases, but I usually only use the memory store for embedded purposes that are single-threaded.

Scenarios I want to support:

read heavy
write heavy
transforms (deserialise, query, update, serialise)

Timeline for this task is 3-6 months.

hmottestad · 2017-07-15T09:44:26Z

In general, are there any recommendations or requirements that others might have before I get too committed?

I also know that a lot of triple stores convert IRIs and literals to a hash/integer so that the hash/integer is stored in the indexes and the IRIs/literals are stored in a lookup table. I’m considering doing this too, but it might not be of much benefit unless I migrate to manual memory management using the unsafe library. Any thoughts on using sun.misc.Unsafe?
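For reference, the lookup-table idea could look roughly like this on the heap, without sun.misc.Unsafe. This is only a sketch under my own assumptions: the class and method names are hypothetical, values are plain strings rather than real IRI or literal objects, and ids are simply assigned in insertion order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a value dictionary: indexes would store the int ids,
// while the strings live only in this lookup table.
final class ValueDictionarySketch {
    private final Map<String, Integer> idsByValue = new HashMap<>();
    private final List<String> valuesById = new ArrayList<>();

    // Returns the existing id for a value, or assigns the next free one.
    int encode(String value) {
        Integer existing = idsByValue.get(value);
        if (existing != null) {
            return existing;
        }
        int id = valuesById.size();
        idsByValue.put(value, id);
        valuesById.add(value);
        return id;
    }

    // Maps an id back to its value; throws IndexOutOfBoundsException for unknown ids.
    String decode(int id) {
        return valuesById.get(id);
    }

    // A triple becomes three ints, which is what an index would hold.
    int[] encodeTriple(String s, String p, String o) {
        return new int[] { encode(s), encode(p), encode(o) };
    }
}
```

With boxed Integer keys and ordinary collections the memory saving is limited, which is the concern raised above about needing manual memory management to get the full benefit.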

Tpt · 2017-07-15T13:18:39Z

A maybe not relevant suggestion: you should have a look at mapDB (and similar libraries), which could be helpful for implementing such things (a nice feature is that mapDB can be used with both memory-only and disk-based storage).

abrokenjester · 2021-04-08T06:30:12Z

@hmottestad is this still something that is on your radar?

hmottestad · 2021-04-08T08:17:05Z

My work from this resulted in the ExtensibleStore.

What I managed to implement:

A B+tree in-memory store. The issue here was that the comparator code we had was slow (it's now fast, but my B+tree code is stale as hell)
An Adaptive Radix Tree (ART), which was also very performant, but I couldn't get it quite as performant as I wanted

In general I was able to get better performance on loading data for the ART-based store, and for both stores I got better performance when selecting for more than one statement component at a time, e.g. something like ex:a foaf:knows ?b.
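To illustrate why an ordered structure helps with a pattern like ex:a foaf:knows ?b, here is a minimal sketch that is not the B+tree or ART code from this work. It assumes a TreeSet sorted in subject, predicate, object order, plain strings as values, and hypothetical class and method names; records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: with an SPO-ordered index, all matches for a fixed
// subject and predicate sit next to each other and can be read in one scan.
final class PrefixScanSketch {
    record Triple(String s, String p, String o) {}

    private final TreeSet<Triple> spo = new TreeSet<>(Comparator
            .comparing(Triple::s).thenComparing(Triple::p).thenComparing(Triple::o));

    void add(String s, String p, String o) {
        spo.add(new Triple(s, p, o));
    }

    // Returns every object for the given subject and predicate, in sorted order.
    List<String> objects(String s, String p) {
        List<String> result = new ArrayList<>();
        // The empty string sorts before every other object, so this is the
        // first possible entry for the (s, p) prefix.
        Triple start = new Triple(s, p, "");
        for (Triple t : spo.tailSet(start, true)) {
            if (!t.s().equals(s) || !t.p().equals(p)) {
                break; // left the (s, p) range, nothing further can match
            }
            result.add(t.o());
        }
        return result;
    }
}
```

The scan seeks to the first candidate and stops at the first non-match, so its cost depends on the number of results rather than on the size of the store.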

My biggest hurdle was the SPARQL engine which would need a lot of work to take advantage of ordered data structures.

Much of the configurable aspect of this task is now a lot easier to implement as new custom stores on top of the ExtensibleStore, e.g. if we wanted an in-memory store based on a TreeSet or one that was optimised for read-heavy workloads.

I'm closing this specific issue though since I'm not likely to keep looking into this anymore.

hmottestad self-assigned this Jul 15, 2017

hmottestad added the 📶 enhancement label Jul 15, 2017

barthanssens added the ⏩ performance label Oct 10, 2018

abrokenjester added this to To do in Project Progress May 1, 2020

hmottestad closed this as completed Apr 8, 2021

Project Progress automation moved this from 📋 Backlog to 🥳 Done Apr 8, 2021
