Adapter for FoundationDB #51

Closed
fire opened this issue Jun 8, 2019 · 43 comments

@fire

fire commented Jun 8, 2019

Hi Cabol,

I was wondering what approach I should take to implement an adapter for FoundationDB?

Thanks.

@cabol
Owner

cabol commented Jun 8, 2019

Hey @fire, well I haven't worked with FoundationDB yet, but overall, I'd suggest:

  • Review the Nebulex.Adapter behaviour and try to map the basic cache callbacks to the provided FoundationDB API. I would start with init (this is where you initialize your client to connect to FoundationDB), then get, set, delete and so on. I don't know if there is an existing FoundationDB client for Elixir; if there isn't, I think you should start with that, at least a basic one.

  • Then, you should review some adapter implementations, like NebulexRedisAdapter and NebulexMemcachedAdapter, in order to have a better idea of how to implement your adapter. And again, try to implement the init callback, establish connection(s) with FoundationDB, etc. Then move ahead with get, set, delete, etc.

  • Once you have the Nebulex.Adapter behaviour implemented, continue with Nebulex.Adapter.Transaction behaviour (if it is possible of course).
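
As a rough illustration of the mapping described in the list above, here is a minimal sketch. It is not the real Nebulex.Adapter contract: `KVAdapterSketch` and `EtsBackedSketch` are hypothetical names, the callback signatures are simplified, and an ETS table stands in for the FoundationDB client so the code runs without a database.

```elixir
defmodule KVAdapterSketch do
  # Hypothetical, simplified behaviour; the real Nebulex.Adapter has
  # more callbacks and different signatures.
  @callback init(opts :: keyword()) :: {:ok, state :: term()}
  @callback get(state :: term(), key :: term()) :: term() | nil
  @callback set(state :: term(), key :: term(), value :: term()) :: :ok
  @callback delete(state :: term(), key :: term()) :: :ok
end

defmodule EtsBackedSketch do
  # ETS stands in for the FoundationDB client so the sketch runs anywhere.
  @behaviour KVAdapterSketch

  @impl true
  def init(opts) do
    name = Keyword.get(opts, :name, :kv_adapter_sketch)
    # init is where the client connection would be established.
    table = :ets.new(name, [:set, :public])
    {:ok, table}
  end

  @impl true
  def get(table, key) do
    case :ets.lookup(table, key) do
      [{^key, value}] -> value
      [] -> nil
    end
  end

  @impl true
  def set(table, key, value) do
    true = :ets.insert(table, {key, value})
    :ok
  end

  @impl true
  def delete(table, key) do
    true = :ets.delete(table, key)
    :ok
  end
end
```

Starting with a small surface like this makes it easier to get init and one round trip working before moving on to the remaining callbacks.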

If there is anything I can help you with, just let me know. Also, it is great to know you are interested in contributing an adapter like this 👍

@cabol
Owner

cabol commented Jun 13, 2019

Any progress with the adapter? Do you have a GH repo I can take a look at, or is it a private project?

@fire
Author

fire commented Jun 13, 2019

I only have limited bandwidth to work on this, so the plan is to get a few hours in on the weekend.

@cabol
Owner

cabol commented Jun 13, 2019

Cool, so in the meantime, I will close this issue!

@cabol cabol closed this as completed Jun 13, 2019
@fire
Author

fire commented Jun 23, 2019

I have today to play with this.

https://github.com/ananthakumaran/fdb is the Elixir client.

Question: Should I use FoundationDB directly or go through a MongoDB translation layer?

  • init
  • get
  • set
  • delete

Note: Everything in fdb is in a transaction. Batching?

@fire
Author

fire commented Jun 23, 2019

@fire
Author

fire commented Jun 23, 2019

init

# https://github.com/ananthakumaran/fdb#getting-started
:ok = FDB.start(610)

db = FDB.Database.create(cluster_file_path)

FDB.Database.transact(db, fn transaction ->
  value = FDB.Transaction.get(transaction, key)
  :ok = FDB.Transaction.set(transaction, key, value <> "hello")
end)

@fire
Author

fire commented Jun 23, 2019

Get and set reference code:

# https://github.com/ananthakumaran/fdb#coder
alias FDB.{Transaction, Database, KeySelectorRange}
alias FDB.Coder.{Integer, Tuple, NestedTuple, ByteString, Subspace}

coder =
  Transaction.Coder.new(
    Subspace.new(
      {"ts", ByteString.new()},
      Tuple.new({
        # date
        NestedTuple.new({
          # year, month, date
          NestedTuple.new({Integer.new(), Integer.new(), Integer.new()}),
          # hour, minute, second
          NestedTuple.new({Integer.new(), Integer.new(), Integer.new()})
        }),
        # website
        ByteString.new(),
        # page
        ByteString.new(),
        # browser
        ByteString.new()
      })
    ),
    Integer.new()
  )
db = Database.create(%{coder: coder})

Database.transact(db, fn t ->
  m = Transaction.get(t, {{{2018, 03, 01}, {1, 0, 0}}, "www.github.com", "/fdb", "mozilla"})
  c = Transaction.get(t, {{{2018, 03, 01}, {1, 0, 0}}, "www.github.com", "/fdb", "chrome"})
end)

range = KeySelectorRange.starts_with({{{2018, 03, 01}}})
result =
  Database.get_range_stream(db, range)
  |> Enum.to_list()

TODO: Convert to directory in FDB instead of subspaces.

@fire
Author

fire commented Jun 23, 2019

Created a git repo.

Install FDB on Windows.

https://www.foundationdb.org/download/

Install Elixir from Chocolatey.

Ensure Visual Studio 2017 is installed.

git clone https://github.com/fire/nebulex_fdb_adapter

Open the native command prompt for Visual Studio.

mix deps.get
iex --werl -S mix

@fire
Author

fire commented Jun 23, 2019

A complete example using the regular FoundationDB API without the directory layer.

:ok = FDB.start(600)
cluster_file_path = 'test'
db = FDB.Cluster.create(cluster_file_path) |> FDB.Database.create()
alias FDB.Coder.{Integer, Tuple, NestedTuple, ByteString, Subspace}
# set
FDB.Database.transact(db, fn transaction ->
  :ok = FDB.Transaction.set(transaction, "key", "hello")
end)
# get
FDB.Database.transact(db, fn transaction ->
  FDB.Transaction.get(transaction, "key")
end)
# delete
FDB.Database.transact(db, fn transaction ->
  FDB.Transaction.clear(transaction, "key")
end)
# get again, after the key has been cleared
FDB.Database.transact(db, fn transaction ->
  FDB.Transaction.get(transaction, "key")
end)

@fire
Author

fire commented Jun 23, 2019

Complete example with directories in FoundationDB.

alias FDB.{Directory, Transaction, Database, KeySelectorRange}
alias FDB.Coder.{Integer, Tuple, NestedTuple, ByteString, Subspace}
:ok = FDB.start(600)
cluster_file_path = 'test'
db = FDB.Cluster.create(cluster_file_path) |> FDB.Database.create()
root = Directory.new()
dir = Database.transact(db, fn tr ->
  Directory.create_or_open(root, tr, ["nebulex", "test"])
end)
test_dir = Subspace.new(dir)
coder = Transaction.Coder.new(test_dir)
test_db = FDB.Database.set_defaults(db, %{coder: coder})
# set
FDB.Database.transact(test_db, fn transaction ->
  :ok = FDB.Transaction.set(transaction, "key", "hello")
end)
# get
FDB.Database.transact(test_db, fn transaction ->
  FDB.Transaction.get(transaction, "key")
end)
# delete
FDB.Database.transact(test_db, fn transaction ->
  FDB.Transaction.clear(transaction, "key")
end)
# get again, after the key has been cleared
FDB.Database.transact(test_db, fn transaction ->
  FDB.Transaction.get(transaction, "key")
end)
# List directories
Database.transact(db, fn tr ->
  Directory.list(root, tr, ["nebulex"])
end)

@cabol
Owner

cabol commented Jun 23, 2019

Hey!

Question: Should I use FoundationDB directly or go through a MongoDB translation layer?

I don't understand the question. I haven't used FDB before, so I didn't know there was a kind of "mongodb translation". But overall, I'd try to use the fdb client directly, without a translation.

Note: Everything in fdb is in a transaction. Batching?

Not necessary; you can just wrap each callback implementation in an FDB transaction (this is internal to each function of the adapter implementation). For example:

@impl Nebulex.Adapter
def get(cache, key, opts) do
  # get the db instance (the init creates the cluster, perhaps you can create a module to hold the db)
  FDB.Database.transact(db, fn transaction ->
    value = FDB.Transaction.get(transaction, key)
    build_object(key, value, opts) # maybe a function to create Nebulex.Object.t()
  end)
end

The most important part at the beginning is the init, where you have to create the adapter's children, for example. For this case, you can create the DB and store it somewhere (an ETS table for metadata, a separate GenServer, etc.) so it can be re-used to perform the commands.

Also, check how the client works: whether it runs in a separate app, or whether you can start its supervision tree from the init callback to run everything in the same app.

Let me know once you have something so I can review it and help you more! Stay tuned!

@fire
Author

fire commented Jun 23, 2019

Since the database is loaded from config files, I would probably use a GenServer with initial settings loaded from a configuration file or from somewhere else, like a cluster key set dynamically. Although I think the database object isn't shareable between different computers. Advice?

I have to look up the GenServer API; I don't remember using it before.

Can you clarify?

The most important part at the beginning is the init, where you have to create the adapter's children, for example. For this case, you can create the DB and store it somewhere (an ETS table for metadata, a separate GenServer, etc.) so it can be re-used to perform the commands.

Also, check how the client works: whether it runs in a separate app, or whether you can start its supervision tree from the init callback to run everything in the same app.

@fire
Author

fire commented Jun 23, 2019

How would I add the concept of a database storage path to the nebulex api?

@cabol
Owner

cabol commented Jun 23, 2019

Although I think the database object isn't shareable between different computers. Advice?

Each cache should have its own DB object. So you can use either a GenServer or an ETS table. In order to avoid unnecessary message passing, I'd use an ETS table (called metadata or something) to store everything related to the DB (maybe other info aside from the DB instance/struct).

Can you clarify?

For example, within the __before_compile__ I'd create some helper function to retrieve info like the DB instance, like so:

@impl true
defmacro __before_compile__(_env) do
  ## ... maybe other things
  quote do
    # injected into the cache module; reads the DB handle stored by init
    def __db__ do
      :ets.lookup_element(:meta, :db, 2)
    end
  end
end

For the init:

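@impl true
def init(opts) do
  # assumption: the cluster file path is passed in the cache options
  cluster_file_path = Keyword.fetch!(opts, :cluster_file_path)
  :meta = :ets.new(:meta, [:named_table, :public, {:read_concurrency, true}])
  :ok = FDB.start(610)
  db = FDB.Database.create(cluster_file_path)
  true = :ets.insert(:meta, {:db, db})
  {:ok, []}
end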

And from the adapter's functions:

@impl Nebulex.Adapter
def get(cache, key, opts) do
  FDB.Database.transact(cache.__db__, fn transaction ->
    value = FDB.Transaction.get(transaction, key)
    build_object(key, value, opts) # maybe a function to create Nebulex.Object.t()
  end)
end
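
The ETS-based lookup can also be tried on its own, without FoundationDB. The following is a minimal sketch under stated assumptions: `MetaTableSketch`, `put_db/1` and `db/0` are hypothetical names, and an atom stands in for the real database handle.

```elixir
defmodule MetaTableSketch do
  # Hypothetical helper; the table name and the :db key are illustrative.
  @table __MODULE__

  # Create the table once (e.g. from the adapter's init) and store the handle.
  def put_db(db) do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, {:read_concurrency, true}])
    end

    true = :ets.insert(@table, {:db, db})
    :ok
  end

  # Any process can read the handle without messaging an owner process.
  def db do
    :ets.lookup_element(@table, :db, 2)
  end
end

:ok = MetaTableSketch.put_db(:fake_db_handle)
MetaTableSketch.db()
```

One thing to keep in mind is that an ETS table is owned by the process that created it and disappears when that process exits, so in a real adapter the table would probably be created by a long-lived supervised process.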

Is it clear?

How would I add the concept of a database storage path to the nebulex api?

Please elaborate more on this. What do you mean by adding the concept of a database storage path to the Nebulex API? (I think you don't need to add any concept to the Nebulex API.)

@cabol cabol reopened this Jun 23, 2019
@cabol cabol added the feature label Jun 23, 2019
@fire
Author

fire commented Jun 23, 2019

Should the ets table be namespaced with nebulex_fdb_meta?

So in Foundation Database I have this concept of a directory path. So for a particular key-value database it would be in alpha/db. My best guess is to put it in the init options to give a path, but I wasn't sure if there's a place to put stuff like that there.

@cabol
Owner

cabol commented Jun 23, 2019

Should the ets table be namespaced with nebulex_fdb_meta?

It doesn't matter, just use the name of the adapter module (__MODULE__).

So in Foundation Database I have this concept of a directory path. So for a particular key-value database it would be in alpha/db. My best guess is to put it in the init options to give a path, but I wasn't sure if there's a place to put stuff like that there.

OK, yes, that is something you can put in the config file and then load. For the important things you know you will need in the adapter, you can use __before_compile__, for example:

@impl true
defmacro __before_compile__(env) do
  cache = env.module
  config = Module.get_attribute(cache, :config) # your config
  path = Keyword.fetch!(config, :db_path)

  quote do
    def __db_path__, do: unquote(path)

    def __db__ do
      :ets.lookup_element(:meta, :db, 2)
    end
  end
end
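
To see the compile-time injection in isolation, here is a minimal, self-contained sketch. `ConfigInjectorSketch`, `MyCacheSketch` and the `@sketch_config` attribute are hypothetical names chosen for illustration; Nebulex wires the config into the cache module in its own way.

```elixir
defmodule ConfigInjectorSketch do
  # Hypothetical module; it mimics how an adapter can inject helper
  # functions into the module that uses it.
  defmacro __using__(opts) do
    quote do
      @sketch_config unquote(opts)
      @before_compile ConfigInjectorSketch
    end
  end

  defmacro __before_compile__(env) do
    config = Module.get_attribute(env.module, :sketch_config)
    path = Keyword.fetch!(config, :db_path)

    quote do
      # The path is resolved at compile time and baked into the function.
      def __db_path__, do: unquote(path)
    end
  end
end

defmodule MyCacheSketch do
  use ConfigInjectorSketch, db_path: ["nebulex", "test"]
end

MyCacheSketch.__db_path__()
```

Because `Keyword.fetch!/2` runs while the cache module is being compiled, a missing `:db_path` shows up as a compile error rather than a runtime failure.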

@fire
Author

fire commented Jun 23, 2019

How do you write tests for adapters?

@fire
Author

fire commented Jun 23, 2019

@fire
Author

fire commented Jun 23, 2019

Currently stuck on initializing the module in an iex console. Can't seem to call the init function.

defmodule NebulexFdbAdapter.TestCache do
  use Nebulex.Cache,
    otp_app: :nebulex_fdb_adapter,
    adapter: NebulexFdbAdapter,
    config: "test",
    cluster_file_path: "test",
    db_path: ["nebulex", "test"]
end
NebulexFdbAdapter.TestCache.get("hi")

@cabol
Owner

cabol commented Jun 23, 2019

You are calling :ok = FDB.start(600) within __before_compile__; it has to be within the init callback.

@fire
Author

fire commented Jun 23, 2019

I believe the FDB.start(600) is for initializing the shared library so before_compile is correct.

After switching the fdb database to save to ssd, the current code can now set, get and delete.

@fire
Author

fire commented Jun 24, 2019

What do I return for https://hexdocs.pm/nebulex/Nebulex.Adapter.html#c:get_many/3 inside of the adapter?

@cabol
Owner

cabol commented Jun 24, 2019

The __before_compile__ callback is for injecting code at compile time, not for initializing things. The best place to do what you are describing is the init callback, but it's your call!

What do I return for https://hexdocs.pm/nebulex/Nebulex.Adapter.html#c:get_many/3 inside of the adapter?

Returns a map: %{key => Nebulex.Object.t()}
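
A minimal sketch of building that return shape from single-key lookups is shown below. `GetManySketch` and the `fetch` function are hypothetical; plain values stand in for `Nebulex.Object.t()`, and the sketch assumes that keys which are not found are simply left out of the map.

```elixir
defmodule GetManySketch do
  # `fetch` is a hypothetical stand-in for the single-key lookup; it returns
  # the stored value or nil. Values stand in for Nebulex.Object structs.
  def get_many(keys, fetch) when is_function(fetch, 1) do
    Enum.reduce(keys, %{}, fn key, acc ->
      case fetch.(key) do
        nil -> acc
        value -> Map.put(acc, key, value)
      end
    end)
  end
end

store = %{"a" => 1, "b" => 2}
GetManySketch.get_many(["a", "b", "missing"], &Map.get(store, &1))
```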

@fire
Author

fire commented Jun 24, 2019

The problem with FDB.start(600) in the init call is that it raises an exception on the second call :|. According to the docs it can only be used once per elixir startup.

@cabol
Owner

cabol commented Jun 24, 2019

I understand. Perhaps you should consider initializing it from the app that uses the FDB adapter. I think there should be a better way to initialize this library (C NIF), so I'd suggest reviewing other options. Anyway, for the moment, if calling it within __before_compile__ works for you, I suppose it's OK until you find a better option.
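
One possible way to tolerate repeated init calls is to guard the one-time start behind a flag. This is only a sketch under stated assumptions: `StartOnceSketch` is a hypothetical module, `start_fun` stands in for the real initialization call, and the check is not safe against two processes racing on the very first call.

```elixir
defmodule StartOnceSketch do
  # Hypothetical guard; `start_fun` stands in for the one-time library
  # initialization (for example a call such as FDB.start/1).
  @key {__MODULE__, :started}

  def ensure_started(start_fun) when is_function(start_fun, 0) do
    case :persistent_term.get(@key, false) do
      true ->
        :already_started

      false ->
        :ok = start_fun.()
        :persistent_term.put(@key, true)
        :ok
    end
  end
end

# The stand-in start function only runs on the first call.
StartOnceSketch.ensure_started(fn -> :ok end)
StartOnceSketch.ensure_started(fn -> :ok end)
```

With a guard like this, init could be called for several caches in the same VM while the underlying start function runs only once.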

@fire
Author

fire commented Jun 24, 2019

A basic set_many and get_many are in.

I'll ask whether the library can return the standard :ok or {:error, exception} from the FDB.start(600) call.

@fire
Author

fire commented Jun 24, 2019

I'm having a hard time storing arbitrary data in the bytestring.

I was thinking of storing the keys and the values in Erlang term format or FlatBuffers. Thoughts?

@fire
Author

fire commented Jun 24, 2019

I solved the problem with key "1" and value 1 not being accepted by converting all stored values to erlang binary terms.

How do you recommend benchmarking?

@cabol
Owner

cabol commented Jun 24, 2019

I solved the problem with key "1" and value 1 not being accepted by converting all stored values to erlang binary terms.

Yes, when you have issues due to the data types, sometimes it is better to just use term_to_binary and binary_to_term.
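
A minimal round-trip sketch of that approach follows. `TermCodecSketch`, `encode/1` and `decode/1` are hypothetical helper names; only the `:erlang` calls are standard.

```elixir
defmodule TermCodecSketch do
  # Hypothetical helper names; only the :erlang calls are standard.
  def encode(term), do: :erlang.term_to_binary(term)

  # :safe refuses to create new atoms while decoding, which is a
  # sensible default when the stored bytes may come from elsewhere.
  def decode(binary) when is_binary(binary) do
    :erlang.binary_to_term(binary, [:safe])
  end
end

# The string key "1" and the integer value 1 both become plain binaries.
key = TermCodecSketch.encode("1")
value = TermCodecSketch.encode(1)
{"1", 1} = {TermCodecSketch.decode(key), TermCodecSketch.decode(value)}
```

One trade-off to be aware of: the byte order of encoded keys generally does not follow the ordering of the original terms, which may matter if the adapter ever relies on range reads over keys.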

How do you recommend benchmarking?

You can use benchee, that's the tool I've used to include bench tests in the project. Check out:

@fire
Author

fire commented Jun 24, 2019

I created a benchee benchmark, but I don't understand how to make the system use more than one core.

The nebulex system doesn't seem to use more than a core.

fdb-nebulex-elixir

@cabol
Owner

cabol commented Jun 24, 2019

The nebulex system doesn't seem to use a threadpool.

First of all, threadpool? If you meant being able to use the multicore architecture, that is not about Nebulex; you have to configure the Erlang VM (e.g.: -smp auto).
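
A quick way to check how many schedulers the VM is using, and to drive work from several processes at once, is sketched below. `ParallelLoadSketch` is a hypothetical helper, not part of Nebulex or benchee, and the function passed to it is a placeholder for a real cache call.

```elixir
defmodule ParallelLoadSketch do
  # Runs `fun` `iterations` times in each of `workers` processes and
  # returns the total number of calls made.
  def run(fun, workers \\ System.schedulers_online(), iterations \\ 1_000) do
    1..workers
    |> Task.async_stream(
      fn _worker ->
        Enum.each(1..iterations, fn _ -> fun.() end)
        iterations
      end,
      max_concurrency: workers,
      timeout: :infinity
    )
    |> Enum.reduce(0, fn {:ok, n}, acc -> acc + n end)
  end
end

IO.inspect(System.schedulers_online(), label: "schedulers online")
ParallelLoadSketch.run(fn -> :erlang.phash2(:rand.uniform()) end, 4, 100)
```

If a single process issues all the requests, only one scheduler is kept busy on the client side, regardless of how many are online.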

@fire
Author

fire commented Jun 24, 2019

I misspoke. I mean that when I run the benchmark, it only loads one core to 100% in the system monitor.

Not sure where in the system it is being limited.

@fire
Author

fire commented Jun 24, 2019

I was mistaken: I only started one FDB server process. I should start a number related to the number of cores on my computer.

@cabol
Owner

cabol commented Jun 24, 2019

Oh, I see, but IMHO that's something the library should handle. Anyway, I'm glad that it is working for you now 😄 !!

@fire
Author

fire commented Jun 26, 2019

Something is weird on benchee:

  • get does 2157 iterations per second
  • get_many does 247.64 iterations per second

Any ideas?

Guesses:

  • get_many does each fetch sequentially before returning

@cabol
Owner

cabol commented Jun 26, 2019

get_many does each fetch sequentially before returning

Indeed, that depends on the adapter implementation. The local adapter is based on ETS tables, so there is no way to perform a get_many in constant time; the complexity is O(N). The same happens in Redis: you can execute a single command, but internally the complexity is still O(N).
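
When each single-key fetch is dominated by network latency, one option is to issue the fetches concurrently rather than one after another. The sketch below is generic and not tied to the fdb client: `ConcurrentGetManySketch` and `fetch` are hypothetical, and whether this helps with FoundationDB depends on how the client and its transactions behave.

```elixir
defmodule ConcurrentGetManySketch do
  # `fetch` is a hypothetical single-key lookup returning a value or nil.
  def sequential(keys, fetch) do
    keys
    |> Enum.map(fn key -> {key, fetch.(key)} end)
    |> collect()
  end

  def concurrent(keys, fetch, max_concurrency \\ System.schedulers_online()) do
    keys
    |> Task.async_stream(fn key -> {key, fetch.(key)} end,
      max_concurrency: max_concurrency,
      ordered: false
    )
    |> Enum.map(fn {:ok, pair} -> pair end)
    |> collect()
  end

  # Keys whose lookup returned nil are left out of the result map.
  defp collect(pairs) do
    for {key, value} <- pairs, value != nil, into: %{}, do: {key, value}
  end
end

# Simulated lookup with a small fixed latency per key.
slow_fetch = fn key ->
  Process.sleep(20)
  if key <= 5, do: key * 10, else: nil
end

ConcurrentGetManySketch.concurrent(Enum.to_list(1..10), slow_fetch)
```

The total work is still O(N); the concurrent version only overlaps the waiting time of the individual lookups.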

@fire
Author

fire commented Jun 27, 2019

Why does your benchee get_many always fetch keys 1 to 10? Doesn't this cause problems?

@fire
Author

fire commented Jun 29, 2019

I integrated poolboy.

Benchee seems inaccurate. The database server's query stats are much higher than the operations per second reported by benchee.

@fire
Author

fire commented Jun 30, 2019

I will look for another benchmarking tool; benchee requires custom formatters to get accurate numbers and is hard to use. The fdb elixir library maintainer was able to explain why the fdbcli status numbers didn't match the benchee numbers.

EDITED:

Hard to use means that the library doesn't abstract the concept of an operation, and there don't seem to be inputs for a Zipfian distribution.
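
For reference, Zipfian-distributed keys can be generated by hand and fed to whatever benchmarking tool is used. The following is a minimal sketch: `ZipfSketch` is a hypothetical module, rank k gets weight 1/k^s, and sampling does a linear scan, which is fine for small key spaces only.

```elixir
defmodule ZipfSketch do
  # Builds a cumulative distribution over ranks 1..n with weight 1 / k^s.
  def build(n, s \\ 1.0) when is_integer(n) and n > 0 do
    weights = for k <- 1..n, do: 1.0 / :math.pow(k, s)
    total = Enum.sum(weights)

    weights
    |> Enum.scan(0.0, fn w, acc -> acc + w / total end)
    |> Enum.with_index(1)
  end

  # Draws one rank; low ranks (hot keys) come up far more often.
  def sample(cdf) do
    u = :rand.uniform()
    # The default guards against rounding leaving the last entry below 1.0.
    {_cum, rank} = Enum.find(cdf, List.last(cdf), fn {cum, _rank} -> u <= cum end)
    rank
  end
end

cdf = ZipfSketch.build(100)
keys = for _ <- 1..10, do: ZipfSketch.sample(cdf)
IO.inspect(keys, label: "sampled keys")
```

Using skewed keys like these, instead of always reading the same fixed range, gives a workload closer to a cache with a few hot entries.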

@cabol
Owner

cabol commented Jun 30, 2019

I'd suggest basho_bench; for load tests and benchmarks the tool is great, and I've used it several times. This is an example (to run load tests for Nebulex): nebulex_bench.

@fire
Author

fire commented Jul 29, 2019

The adapter works, but it's not ready for production. The documentation isn't great and the project using the FDB adapter is on standby.

However, the fdb adapter works!

@fire fire closed this as completed Jul 29, 2019
@cabol
Owner

cabol commented Jul 29, 2019

Awesome 😄 !!!
