Improve docs and fix dialyxir warn when running tests
cabol committed Jan 6, 2021
1 parent 7055b80 commit 1db0c9f
Showing 5 changed files with 147 additions and 45 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -17,18 +17,20 @@ Furthermore, it enables the implementation of different
[distributed cache topologies][cache_topologies],
and more.

[ecto]: https://github.com/elixir-ecto/ecto
[cache_patterns]: https://github.com/ehcache/ehcache3/blob/master/docs/src/docs/asciidoc/user/caching-patterns.adoc
[cache_topologies]: https://docs.oracle.com/middleware/1221/coherence/develop-applications/cache_intro.htm

Nebulex is commonly used to interact with different cache implementations and/or
stores (such as Redis, Memcached, or other implementations of cache in Elixir),
being completely agnostic from them, avoiding the vendor lock-in.
stores (such as Redis, Memcached, or even other Elixir cache implementations
like [Cachex][cachex]), while remaining completely agnostic of them and
avoiding vendor lock-in.

See the [getting started guide](http://hexdocs.pm/nebulex/getting-started.html)
and the [online documentation](http://hexdocs.pm/nebulex/Nebulex.html)
for more information.

[ecto]: https://github.com/elixir-ecto/ecto
[cachex]: https://github.com/whitfin/cachex
[cache_patterns]: https://github.com/ehcache/ehcache3/blob/master/docs/src/docs/asciidoc/user/caching-patterns.adoc
[cache_topologies]: https://docs.oracle.com/middleware/1221/coherence/develop-applications/cache_intro.htm

## Usage

You need to add `nebulex` as a dependency to your `mix.exs` file. However, in
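
For instance, a minimal `deps/0` entry in `mix.exs` might look like the sketch
below (the version constraint shown here is only an assumption; check Hex for
the current release):

```elixir
defp deps do
  [
    # Hypothetical version constraint; pin to the release you actually use.
    {:nebulex, "~> 2.0"}
  ]
end
```
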
9 changes: 6 additions & 3 deletions lib/nebulex/adapters/local.ex
@@ -15,16 +15,19 @@ defmodule Nebulex.Adapters.Local do
(which is more than enough), also referred to as the `newer` and
the `older`.
## Features
## Overall features
* Configurable backend (`:ets` or `:shards`).
* Expiration – A status based on the TTL (Time To Live) option. To maintain
cache performance, expired entries may not be immediately flushed or
evicted; instead, they are expired or evicted on-demand when the key is
read (see the sketch after this list).
* Eviction – [Generational Garbage Collection](http://hexdocs.pm/nebulex/Nebulex.Adapters.Local.Generation.html).
* Eviction – [Generational Garbage Collection][gc].
* Sharding – For intensive workloads, the Cache may also be partitioned
(by using `:shards` backend and specifying the `:partitions` option).
* Support for transactions via Erlang global name registration facility.
* Support for stats.
[gc]: http://hexdocs.pm/nebulex/Nebulex.Adapters.Local.Generation.html
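
As a quick sketch of the expiration behavior (assuming a cache module
`MyCache` like the one defined in the Usage section below; the key, value,
and timeout here are arbitrary):

```elixir
# Write an entry that expires 30 seconds after being stored. Expired entries
# are not removed eagerly; they are evicted on-demand when the key is read.
MyCache.put("session:123", %{user_id: 1}, ttl: :timer.seconds(30))

# Within the TTL window the value is returned; afterwards `get/1` returns nil.
MyCache.get("session:123")
```
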
## Options
@@ -80,7 +83,7 @@ defmodule Nebulex.Adapters.Local do
starts and there are few entries or the consumed memory is near `0`.
Defaults to `600_000` (10 minutes).
## Example
## Usage
`Nebulex.Cache` is the wrapper around the cache. We can define a
local cache as follows:
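
A minimal sketch of such a definition (the module and `:otp_app` names are
placeholders):

```elixir
defmodule MyCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Local
end
```
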
90 changes: 66 additions & 24 deletions lib/nebulex/adapters/partitioned.ex
@@ -2,41 +2,83 @@ defmodule Nebulex.Adapters.Partitioned do
@moduledoc ~S"""
Built-in adapter for partitioned cache topology.
A partitioned cache is a clustered, fault-tolerant cache that has linear
scalability. Data is partitioned among all the machines of the cluster.
For fault-tolerance, partitioned caches can be configured to keep each piece
of data on one or more unique machines within a cluster. This adapter
in particular hasn't fault-tolerance built-in, each piece of data is kept
in a single node/machine (sharding), therefore, if a node fails, the data
kept by this node won't be available for the rest of the cluster.
## Overall features
* Partitioned cache topology (Sharding Distribution Model).
* Configurable primary storage adapter.
* Configurable Keyslot to distribute the keys across the cluster members.
* Support for transactions via Erlang global name registration facility.
* Stats support relies on the primary storage adapter.
## Partitioned Cache Topology
There are several key points to consider about a partitioned cache:
* _**Partitioned**_: The data in a distributed cache is spread out over
all the servers in such a way that no two servers are responsible for
the same piece of cached data. This means that the size of the cache
and the processing power associated with the management of the cache
can grow linearly with the size of the cluster. Also, it means that
operations against data in the cache can be accomplished with a
"single hop," in other words, involving at most one other server.
* _**Load-Balanced**_: Since the data is spread out evenly over the
servers, the responsibility for managing the data is automatically
load-balanced across the cluster.
* _**Ownership**_: Exactly one node in the cluster is responsible for each
piece of data in the cache.
* _**Point-To-Point**_: The communication for the partitioned cache is all
point-to-point, enabling linear scalability.
* _**Location Transparency**_: Although the data is spread out across
cluster nodes, the exact same API is used to access the data, and the
same behavior is provided by each of the API methods. This is called
location transparency, which means that the developer does not have to
code based on the topology of the cache, since the API and its behavior
will be the same with a local cache, a replicated cache, or a distributed
cache.
* _**Failover**_: Failover of a distributed cache involves promoting backup
data to be primary storage. When a cluster node fails, all remaining
cluster nodes determine what data each holds in backup that the failed
cluster node had primary responsibility for when it died. That data becomes
the responsibility of whatever cluster node was the backup for the data.
However, this adapter does not provide a fault-tolerance implementation;
each piece of data is kept in a single node/machine (via sharding), so
if a node fails, the data kept by that node won't be available to the
rest of the cluster members.
> Based on **"Distributed Caching Essential Lessons"** by **Cameron Purdy**
and [Coherence Partitioned Cache Service][oracle-pcs].
[oracle-pcs]: https://docs.oracle.com/cd/E13924_01/coh.340/e13819/partitionedcacheservice.htm
## Additional implementation notes
`:pg2` or `:pg` (>= OTP 23) is used under the hood by the adapter to manage
the cluster nodes. When the partitioned cache is started in a node, it creates
a group and joins it (the cache supervisor PID is joined to the group). Then,
when a function is invoked, the adapter picks a node from the node list
(using the group members), and then the function is executed on that node.
In the same way, when the supervisor process of the partitioned cache
dies, the PID of that process is automatically removed from the PG group;
this is why it's recommended to use a consistent hashing algorithm for the
node selector.
when a function is invoked, the adapter picks a node from the group members,
and then the function is executed on that specific node. In the same way,
when a partitioned cache supervisor dies (the cache is stopped or killed for
some reason), the PID of that process is automatically removed from the PG
group; this is why it's recommended to use consistent hashing for distributing
the keys across the cluster nodes.
> **NOTE:** `pg2` will be replaced by `pg` in the future, since the `pg2` module
is deprecated as of OTP 23 and scheduled for removal in OTP 24.
This adapter depends on a local cache adapter (primary storage), it adds
a thin layer on top of it in order to distribute requests across a group
of nodes, where is supposed the local cache is running already. However,
you don't need to define or declare an additional cache module for the
local store, instead, the adapter initializes it automatically (adds the
local cache store as part of the supervision tree) based on the given
options within the `primary:` argument.
## Features
you don't need to define any additional cache module for the primary
storage; instead, the adapter initializes it automatically (it adds the
primary storage as part of the supervision tree) based on the given
options within the `primary_storage_adapter:` argument.
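
As a minimal sketch of what that looks like in practice (module and
application names are placeholders, and `Nebulex.Adapters.Local` is assumed
as the primary storage adapter):

```elixir
defmodule MyApp.PartitionedCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Partitioned,
    # The adapter starts this primary storage automatically as part of its
    # supervision tree; no extra cache module needs to be defined for it.
    primary_storage_adapter: Nebulex.Adapters.Local
end
```
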
* Support for partitioned topology (Sharding Distribution Model).
* Support for transactions via Erlang global name registration facility.
* Configurable primary storage adapter (local cache adapter).
* Configurable keyslot module to compute the node.
## Usage
When used, the Cache expects the `:otp_app` and `:adapter` as options.
The `:otp_app` should point to an OTP application that has the cache
@@ -112,7 +154,7 @@ defmodule Nebulex.Adapters.Partitioned do
* `:keyslot` - Defines the module implementing the `Nebulex.Adapter.Keyslot`
behaviour (see the sketch after this list).
* `task_supervisor_opts` - Start-time options passed to
* `:task_supervisor_opts` - Start-time options passed to
`Task.Supervisor.start_link/1` when the adapter is initialized.
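
As a sketch of a custom `:keyslot` module mentioned above (the `hash_slot/2`
callback name and the use of `:erlang.phash2/2` are assumptions about the
behaviour's contract, not taken from this diff):

```elixir
defmodule MyApp.Keyslot do
  @behaviour Nebulex.Adapter.Keyslot

  @impl true
  def hash_slot(key, range) do
    # Map the key to a slot in 0..range-1 using Erlang's portable hash.
    :erlang.phash2(key, range)
  end
end
```
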
## Shared options
77 changes: 66 additions & 11 deletions lib/nebulex/adapters/replicated.ex
@@ -2,18 +2,31 @@ defmodule Nebulex.Adapters.Replicated do
@moduledoc ~S"""
Built-in adapter for replicated cache topology.
The replicated cache excels in its ability to handle data replication,
concurrency control and failover in a cluster, all while delivering
in-memory data access speeds. A clustered replicated cache is exactly
what it says it is: a cache that replicates its data to all cluster nodes.
## Overall features
* Replicated cache topology.
* Configurable primary storage adapter.
* Cache-level locking when flushing cache or adding new nodes.
* Key-level (or entry-level) locking for key-based write-like operations.
* Support for transactions via Erlang global name registration facility.
* Stats support relies on the primary storage adapter.
## Replicated Cache Topology
A replicated cache is a clustered, fault-tolerant cache where data is fully
replicated to every member in the cluster. This cache offers the fastest read
performance with linear performance scalability for reads but poor scalability
for writes (as writes must be processed by every member in the cluster).
Because data is replicated to all servers, adding servers does not increase
aggregate cache capacity.
There are several challenges to building a reliably replicated cache. The
first is how to get it to scale and perform well. Updates to the cache have
to be sent to all cluster nodes, and all cluster nodes have to end up with
the same data, even if multiple updates to the same piece of data occur at
the same time. Also, if a cluster node requests a lock, ideally it should
not have to get all cluster nodes to agree on the lock or at least do it in
a very efficient way (`:global` is used for this), otherwise it will scale
a very efficient way (`:global` is used here), otherwise it will scale
extremely poorly; yet in the case of a cluster node failure, all of the data
and lock information must be kept safely.
@@ -25,16 +38,18 @@ defmodule Nebulex.Adapters.Replicated do
However, there are some limitations:
* <ins>Cost Per Update</ins> - Updating a replicated cache requires pushing
the new version of the data to all other cluster members, which will limit
scalability if there is a high frequency of updates per member.
* _**Cost Per Update**_ - Updating a replicated cache requires pushing
the new version of the data to all other cluster members, which will
limit scalability if there is a high frequency of updates per member.
* <ins>Cost Per Entry</ins> - The data is replicated to every cluster
member, so Memory Heap space is used on each member, which will impact
* _**Cost Per Entry**_ - The data is replicated to every cluster member,
so Memory Heap space is used on each member, which will impact
performance for large caches.
> Based on **"Distributed Caching Essential Lessons"** by **Cameron Purdy**.
## Usage
When used, the Cache expects the `:otp_app` and `:adapter` as options.
The `:otp_app` should point to an OTP application that has the cache
configuration. For example:
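
A minimal sketch of such a definition (module and application names are
placeholders):

```elixir
defmodule MyApp.ReplicatedCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated
end
```
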
@@ -87,7 +102,7 @@ defmodule Nebulex.Adapters.Replicated do
with the local primary storage. These options will depend on the local
adapter to use.
* `task_supervisor_opts` - Start-time options passed to
* `:task_supervisor_opts` - Start-time options passed to
`Task.Supervisor.start_link/1` when the adapter is initialized.
## Shared options
@@ -117,6 +132,46 @@ defmodule Nebulex.Adapters.Replicated do
MyCache.nodes()
MyCache.nodes(:cache_name)
## Caveats of replicated adapter
As explained at the beginning, a replicated topology brings not only
advantages (mostly for reads) but also some limitations and challenges.
This adapter uses global locks (via `:global`) for all operations that modify
or alter the cache in some way, to ensure as much consistency as possible across
all members of the cluster. These locks may be per key or for the entire cache,
depending on the operation taking place. For that reason, it is very important
to be aware of the operations that can potentially lead to performance and
scalability issues, so that you can make better use of the replicated
adapter. The following are the operations and aspects you should pay
attention to:
* Starting and joining a new replicated node to the cluster is the most
expensive action, because all write-like operations across all members of
the cluster are blocked until the new node completes the synchronization
process, which involves copying cached data from any of the existing
cluster nodes into the new node, and this could be very expensive
depending on the number of cache entries. For that reason, adding new
nodes is something exceptional and expected to happen once in a while.
* Flushing the cache. When the flush action is executed, as in the previous
case, all write-like operations across all members of the cluster are blocked
until the flush is completed (this implies flushing the cached data from
all cluster nodes). Therefore, flushing the cache is also considered an
exceptional case that happens only once in a while.
* Write-like operations based on a key only block operations related to
that key across all members of the cluster. This is not as critical as
the previous two cases but it is something to keep in mind anyway because
if there is a highly demanded key in terms of writes, that could also
become a potential bottleneck.
Summing up, the replicated cache topology along with this adapter should
be used mainly when reads clearly dominate over writes (e.g., 80% reads
and 20% writes or less). Also, flushing the cache and adding new nodes
must be exceptional cases happening only once in a while, to avoid
performance issues.
"""

# Provide Cache Implementation
2 changes: 1 addition & 1 deletion test/support/cluster.ex
@@ -54,7 +54,7 @@ defmodule Nebulex.Cluster do
rpc(node, Application, :ensure_all_started, [:mix])
rpc(node, Mix, :env, [Mix.env()])

for {app_name, _, _} <- Application.loaded_applications() do
for {app_name, _, _} <- Application.loaded_applications(), app_name not in [:dialyxir] do
rpc(node, Application, :ensure_all_started, [app_name])
end
end
