Faster and more configurable memory store #862

Closed
hmottestad opened this issue Jul 15, 2017 · 4 comments
Labels
📶 enhancement (issue is a new feature or improvement), ⏩ performance

Comments

@hmottestad
Contributor

hmottestad commented Jul 15, 2017

I’m looking at creating a new Memory Store that is configurable so that it can be optimised for any scenario.

Here is what I’m thinking of doing:

  • B+ tree indexes (because they support range queries)
  • Thread based indexing (async)
  • Deletes in diff index (merged when index.size > some max)
  • Hashmap index for exists(triple1) queries
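
As a rough illustration of how the last two bullets could fit together with an ordered index, here is a minimal sketch. It is not the proposed implementation: `DiffDeleteStore` and all of its members are hypothetical names, a `TreeSet` stands in for the B+ tree, and the merge threshold is assumed to apply to the size of the delete diff.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the project.
final class DiffDeleteStore {
    // Ordered index. A TreeSet stands in for the B+ tree here.
    private final TreeSet<List<String>> spo = new TreeSet<>(DiffDeleteStore::compareSpo);
    // Hash index that answers exists(triple) without touching the ordered index.
    private final Set<List<String>> present = new HashSet<>();
    // Diff index: triples deleted logically but not yet removed from the ordered index.
    private final Set<List<String>> deleted = new HashSet<>();
    // Assumption: the diff is merged once it holds more than this many triples.
    private final int maxDeleted;

    DiffDeleteStore(int maxDeleted) {
        this.maxDeleted = maxDeleted;
    }

    // Orders triples by subject, then predicate, then object.
    private static int compareSpo(List<String> a, List<String> b) {
        for (int i = 0; i < 3; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    void add(String s, String p, String o) {
        List<String> t = List.of(s, p, o);
        deleted.remove(t); // re-adding cancels a pending delete
        spo.add(t);
        present.add(t);
    }

    void remove(String s, String p, String o) {
        List<String> t = List.of(s, p, o);
        if (present.remove(t)) {
            deleted.add(t);
            if (deleted.size() > maxDeleted) {
                merge();
            }
        }
    }

    // Applies all pending deletes to the ordered index in one pass.
    private void merge() {
        spo.removeAll(deleted);
        deleted.clear();
    }

    boolean exists(String s, String p, String o) {
        return present.contains(List.of(s, p, o));
    }

    // Ordered scan that hides triples still waiting in the diff.
    List<List<String>> liveInOrder() {
        List<List<String>> out = new ArrayList<>();
        for (List<String> t : spo) {
            if (!deleted.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    int pendingDeletes() {
        return deleted.size();
    }
}
```

The trade-off in this sketch is that deletes stay cheap because they only touch the two hash sets, while every ordered scan pays a lookup in the diff until the next merge.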

I don’t want to have transactional support. I use transactions with disk-based databases, but I usually only use the memory store for embedded, single-threaded purposes.

Scenarios I want to support:

  • read heavy
  • write heavy
  • transforms (deserialise, query, update, serialise)

Timeline for this task is 3-6 months.

@hmottestad
Contributor Author

In general, are there any recommendations or requirements that others might have before I get too committed?

I also know that a lot of triple stores convert IRIs and literals to a hash/integer so that the hash/integer is stored in the indexes and the IRIs/literals are stored in a lookup table. I’m considering doing this too, but it might not be of much benefit unless I migrate to manual memory management using the unsafe library. Any thoughts on using sun.misc.Unsafe?
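
For readers unfamiliar with the lookup-table idea, a minimal on-heap sketch might look like the following. `TermDictionary`, `encode` and `decode` are hypothetical names chosen for illustration, terms are plain strings, and no `sun.misc.Unsafe` or off-heap memory is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a term dictionary; not an existing API.
final class TermDictionary {
    // Term (IRI or literal, as a string) to its integer id.
    private final Map<String, Integer> idByTerm = new HashMap<>();
    // Integer id back to the term; the id is the position in this list.
    private final List<String> termById = new ArrayList<>();

    // Returns the existing id for a term, or assigns the next free one.
    int encode(String term) {
        Integer id = idByTerm.get(term);
        if (id == null) {
            id = termById.size();
            idByTerm.put(term, id);
            termById.add(term);
        }
        return id;
    }

    // Looks up the original term for an id handed out by encode.
    String decode(int id) {
        return termById.get(id);
    }
}
```

With something like this, the indexes would only need to store and compare integers, and the full IRIs and literals would be resolved once, when results are returned.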

hmottestad self-assigned this Jul 15, 2017
hmottestad added the 📶 enhancement (issue is a new feature or improvement) label Jul 15, 2017
@Tpt
Contributor

Tpt commented Jul 15, 2017

A possibly irrelevant suggestion: you should have a look at MapDB (and similar libraries), which could be helpful for implementing such things. A nice feature is that MapDB can be used with both memory-only and disk-based storage.

@abrokenjester
Contributor

@hmottestad is this still something that is on your radar?

@hmottestad
Contributor Author

My work from this resulted in the ExtensibleStore.

What I managed to implement:

  • A B+tree in-memory store - the issue here was that the comparator code we had was slow (it's now fast, but my B+tree code is stale as hell)
  • An Adaptive Radix Tree (ART), which was also very performant, but I couldn't get it quite as performant as I wanted

In general I was able to get better performance on loading data for the ART-based store, and for both stores I got better performance when selecting on more than one statement component at a time, e.g. something like ex:a foaf:knows ?b.
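
To illustrate why an ordered structure helps with a pattern like ex:a foaf:knows ?b, here is a minimal sketch in which the two bound components become a prefix range over a sorted set. It is not the B+tree or ART code described above: `SpoPrefixIndex` and `objects` are hypothetical names, and a `TreeSet` of string triples is used as a stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch; a TreeSet stands in for the B+tree or ART.
final class SpoPrefixIndex {
    // Triples sorted by subject, then predicate, then object.
    private final NavigableSet<List<String>> spo = new TreeSet<>(SpoPrefixIndex::compareSpo);

    private static int compareSpo(List<String> a, List<String> b) {
        for (int i = 0; i < 3; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    void add(String s, String p, String o) {
        spo.add(List.of(s, p, o));
    }

    // Answers "s p ?o" with a single range scan over the (s, p) prefix.
    List<String> objects(String s, String p) {
        // Smallest possible triple with this subject and predicate.
        List<String> from = List.of(s, p, "");
        // p + "\0" is the smallest string greater than p, so this is an
        // exclusive upper bound for every triple that shares the (s, p) prefix.
        List<String> to = List.of(s, p + "\0", "");
        List<String> out = new ArrayList<>();
        for (List<String> t : spo.subSet(from, true, to, false)) {
            out.add(t.get(2));
        }
        return out;
    }
}
```

For example, after adding (ex:a, foaf:knows, ex:b) and (ex:a, foaf:knows, ex:c), calling `objects("ex:a", "foaf:knows")` visits only those two entries instead of scanning every statement about ex:a.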

My biggest hurdle was the SPARQL engine, which would need a lot of work to take advantage of ordered data structures.

Much of the configurable aspect of this task is now a lot easier to implement as new custom stores on top of the ExtensibleStore, e.g. if we wanted an in-memory store based on a TreeSet or one that was optimised for read-heavy workloads.

I'm closing this specific issue though, since I'm not likely to keep looking into this.

Project Progress automation moved this from 📋 Backlog to 🥳 Done Apr 8, 2021

4 participants