PostgresKeyValue

Key-Value storage for Posgresql

Performant and simple key-value storage in Posgresql. With a Hash-like interface. Only dependency is pg gem.

PostgresKeyValue tries to get out of your way, by being unopiniated small, and simple.

It works similar, but not compatible to, Hash. Some features from Hash are implemented, others deliberately omitted when they don't make sense or would make leaky abstractions.

PostgresKeyValue depends on the pg gem, but doesn't add this as requirement, so that you can provide your own, your version, fork or compatible gem instead.

Configuration and usage is done through dependency injection, which makes it easy for you to test, and to replace with mocks. The design aims to decouple as much as possible, allowing to integrate in the right place (and only there).

A few tools are included to prepare and optimize the database. Usable in e.g. your migrations or a deploy script.

PostgresKeyValue is not finished!

Work in Progress. Here are some evident TODOs (will be moved into github issues later)

Installation

Add this line to your application's Gemfile:

gem 'postgres_key_value'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install postgres_key_value

We don't install pg gem for you as dependency, so ensure you add it yourself. For example:

gem 'pg'

Usage

Steps are as follows:

Make a connection to a postgresql database.
Instantiate a PostgresKeyValue object by passing in this connection.
Write to-, and read from this database.

require 'pg'
require 'postgres_key_value'

connection = PG::Connection.open(:dbname => 'test')
greetings = PostgresKeyValue::Store.new(connection) 

greetings[:en] = "Hello World"
greetings[:nl] = "Hallo Wereld"

greetings[:en]                          #=> Hello World
greetings['DE-de']                      #=> nil
greetings.fetch('DE-de', 'No greeting') #=> No greeting
greetings.key?(:nl)                     #=> true

# Can be another process on another machine entirely.
Thread.new do
  other_greetings = PostgresKeyValue::Store.new(connection) 
  other_greetings[:en] = "Hello Mars!"
end.join

greetings[:en]                          #=> Hello Mars!

Utils

Utils to create and prepare the table are provided. For example in your migrations:

class CreateKVTableForCursors < ButtonShop::Migration
  include PostgresKeyValue::Utils

  def migrate_up
    create_table('cursors', 'buttonshop_kv_store')
  end

  def migrate_down
    drop_table('cursors', 'buttonshop_kv_store')
  end

  private

  def connection
    ButtonShop.config.primary_db_connection
  end
end

And in a hypthetical deployment or provisioning tool

class CursorsKvPreparator
  include PostgresKeyValue::Utils
  DB_NAME = 'buttonshop_kv_store'
  TABLE_NAME = 'cursors'

  def initialize(connection)
    @table_name = table_name
    @connection = connection
  end

  def prepare
    MyInfra::Databases::CreateDatabaseCommand.new(DB_NAME)
    create_table(TABLE_NAME, DB_NAME)
  end

  private

  attr_reader :connection
end

on :staging_server do
  CursorsKvPreparator.new(@pg_connection).prepare
end

Technical details

Keys can only be strings or symbols. So be sure to convert your object to a string explicitely before using.

greetings[nil]  #=> PostgresKeyValue::InvalidKey
greetings[42]   #=> PostgresKeyValue::InvalidKey
greetings['']   #=> nil
greetings['42'] #=> nil

TODO: write about

transactions
connection pools
read/write copies

Database is configured to store key/value in two columns: key is primary key, value of type json. Primary is of type string, so PG limitation on keys and string storage apply.

PostgresKeyValue deliberately tries not to be fully compatible with Hash. But it does offer a similar interface. Mainly because such an opaque abstraction is leaky:

A table with KV storages can, by design, grow very large, whereas a hash is memory bound. so features like iterators store.each {|k,v| ... } or store.to_a require the underlying limits to leak through. We'd then need logic, config, etc to handle when the database becomes too big for memory to hold.

We allow keys only to be strings, and not "anything" as hash does. The database stores keys as strings, so if we'd allow "anything" as key, the marshalling or serializing would not only become complex, it puts a performance hit on all usage: so the ones using it with strings as keys would become slower too.

The values are serialized using JSON. This is lossy. This is by-design, but for security reasons. Marshalling code object.marshall retains the entire state, including methods, or callbacks and allows the provider of data to even monkeypatch your ruby codebase. We chose for JSON, as that is simplest, and therefore secured from these attacks (unless JSON.parse is vulnarable, which is not unthinkable).

Many methods on Hash don't make a lot of sense either. E.g. most methods that operate on the entire hash, like transform_keys! or compact have little use in a pure KV lookup system. When in need of such operations, you probably need an actual database-table (which, not by coincidence, the connection already offers!)

Another reason for not wanting to have feature-parity with Hash, is that it would grow this gem far beyond "simple", without there being a clear need for all the added features. Hash is really large! Rather, if there are features you need, raise an issue (or write a patch) so we can determine if it fits the scope and is worth the extra code.

The keys are indexed, this is default for Postgresql for primary keys, using btree. This causes some slowdown on-insert but fast reads. In cases where you have a small dataset. and many writes but fewer reads, this causes notable performance penalties. Removing the index may help. But only for small datasets, as the upsert-behaviour causes every write to perform a lookup anyway: on large datasets, writing will become slow without this index.

Alternatives

Ruby core has Hash, and DBM. Redis is another alternative that may be better suited too.

PostgresKeyValue is most helpful when:

You are using Postgres for other storage already and want to avoid extra services (=complexity, cost, sysadmin)
You have your service spread out over multiple servers OR
You cannmot write files on disk
Your KV database is too big to fit in memory

An in-memory database is -by far- the fastest and easiest. Just a Hash.new and be done with it. Nothing beats this! Except when other processes, threads or servers need to access it. A singelton-pattern with Hash might help, but that comes with added complexity and potential race-conditions.

Storing the KV on disk is very easy too. Especially since Ruby comes with DBM in its stdlib. This is a dedicated key-value database stored in a file (berkely database). Downside is that the speed of the disk matters (much, slower than memory!) and that all services wanting to read and write need access to the disk. Race-conditions and locking issues must be solved too in a setup where multiple processes use one such database-file.

Redis is another obvious solution. Depending on the network, this is often faster than the RDM on disk. But Redis comes with its own swath of issues. Most of them are logical tradeoffs to keep things fast and simple. But the biggest Downside is that it requires you to manage (and/or pay for) an extra service, when you quite often have a postgres at hand already. Other downsides are that redis isn't fully acid-complient. It can be configured to write backups to disk, but it really isn't a good use-case to store data that cannot be re-created. It's perfect for things like cache, temporary tokens, or projections (which can be re-build from primary storage or other services) but not very solid for data that doesn't live elsewhere, like a queue with to-be-sent emails, events or commands: if the database crashes, there's no way to re-send those emails, re-emit the events or re-schedule the commands: the data is gone forever. Postgres has this solved with its WAL.

We have a benchmark, that compares Hash, RDM, Redis and PostgresKeyValue. You can run it with RUN_ALTERNATIVES_BENCHMARKS=true b rake benchmark.

It is clear that PostgresKeyValue is slowest. Especially when writing.

                     user     system      total        real
Write Hash       1.010123   0.000000   1.010123 (  1.010157)
Write DBM        9.648655   0.452455  10.101110 ( 10.104472)
Write Redis      4.492164   0.996256   5.488420 (  8.091726)
Write Postgres  17.056292   9.601811  26.658103 (293.735055)
Read Hash        0.227127   0.000000   0.227127 (  0.227968)
Read DBM        10.124688   0.118673  10.243361 ( 10.277904)
Read Redis       4.066872   1.013194   5.080066 (  7.109471)
Read Postgres    1.453755   0.966952   2.420707 (  9.943543)

This was run with postgresql and redis in a throttled (1 CPU - standard settings) docker on localhost. So network overhead can be neglegted, but CPU is a limiting factor. And writing to PostgresKeyValue database requires indexing, which is CPU-heavy. This is also the reason why real time is so much higher for postgres: the ruby side is waiting a lot for postgresql server to finish writing.

(Please note that for this benchmark you need a redis server running and must have compiled ruby with DBM support.)

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

In order to run the comparison benchmark, which compares this gem to Ruby Hash, YAML/DBM (both stdlib) and Redis, you need a ruby compiled with dbm, which is the berkely DB installed. For Ubuntu, this means installing libdb-dev (sudo apt install libdb-dev ) and then recompiling ruby (rbenv install --force if you use rbenv). And you need a redis server running.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/berkes/postgres_key_value. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the PostgresKeyValue project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.github/workflows		.github/workflows
benchmark		benchmark
bin		bin
lib		lib
test		test
.env.template		.env.template
.gitignore		.gitignore
.rubocop.yml		.rubocop.yml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
LICENSE.txt		LICENSE.txt
README.md		README.md
Rakefile		Rakefile
postgres_key_value.gemspec		postgres_key_value.gemspec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PostgresKeyValue

PostgresKeyValue is not finished!

Installation

Usage

Utils

Technical details

Alternatives

Development

Contributing

License

Code of Conduct

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

berkes/postgres_key_value

Folders and files

Latest commit

History

Repository files navigation

PostgresKeyValue

PostgresKeyValue is not finished!

Installation

Usage

Utils

Technical details

Alternatives

Development

Contributing

License

Code of Conduct

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages