Improve docs and fix dialyxir warn when running tests
cabol committed Jan 6, 2021
1 parent 7055b80 commit 1db0c9f
Showing 5 changed files with 147 additions and 45 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -17,18 +17,20 @@ Furthermore, it enables the implementation of different
[distributed cache topologies][cache_topologies],
and more.

[ecto]: https://github.com/elixir-ecto/ecto
[cache_patterns]: https://github.com/ehcache/ehcache3/blob/master/docs/src/docs/asciidoc/user/caching-patterns.adoc
[cache_topologies]: https://docs.oracle.com/middleware/1221/coherence/develop-applications/cache_intro.htm

Nebulex is commonly used to interact with different cache implementations and/or
stores (such as Redis, Memcached, or other implementations of cache in Elixir),
being completely agnostic from them, avoiding the vendor lock-in.
stores (such as Redis, Memcached, or even other Elixir cache implementations
like [Cachex][cachex]), while remaining completely agnostic of them and
avoiding vendor lock-in.

See the [getting started guide](http://hexdocs.pm/nebulex/getting-started.html)
and the [online documentation](http://hexdocs.pm/nebulex/Nebulex.html)
for more information.

[ecto]: https://github.com/elixir-ecto/ecto
[cachex]: https://github.com/whitfin/cachex
[cache_patterns]: https://github.com/ehcache/ehcache3/blob/master/docs/src/docs/asciidoc/user/caching-patterns.adoc
[cache_topologies]: https://docs.oracle.com/middleware/1221/coherence/develop-applications/cache_intro.htm

## Usage

You need to add `nebulex` as a dependency to your `mix.exs` file. However, in
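
For instance, a minimal `deps/0` entry in `mix.exs` might look like the sketch
below (the version constraint shown here is only an assumption; check Hex for
the current release):

```elixir
defp deps do
  [
    # Hypothetical version constraint; pin to the release you actually use.
    {:nebulex, "~> 2.0"}
  ]
end
```
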
9 changes: 6 additions & 3 deletions lib/nebulex/adapters/local.ex
@@ -15,16 +15,19 @@ defmodule Nebulex.Adapters.Local do
(which is more than enough), also referred to as the `newer` and
the `older`.
## Features
## Overall features
* Configurable backend (`:ets` or `:shards`).
* Expiration – A status based on the TTL (Time To Live) option. To maintain
cache performance, expired entries may not be immediately flushed or
evicted; instead, they are expired or evicted on-demand when the key is
read (see the sketch after this list).
* Eviction – [Generational Garbage Collection](http://hexdocs.pm/nebulex/Nebulex.Adapters.Local.Generation.html).
* Eviction – [Generational Garbage Collection][gc].
* Sharding – For intensive workloads, the Cache may also be partitioned
(by using `:shards` backend and specifying the `:partitions` option).
* Support for transactions via Erlang global name registration facility.
* Support for stats.
[gc]: http://hexdocs.pm/nebulex/Nebulex.Adapters.Local.Generation.html
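
As a quick sketch of the expiration behavior (assuming a cache module
`MyCache` like the one defined in the Usage section below; the key, value,
and timeout here are arbitrary):

```elixir
# Write an entry that expires 30 seconds after being stored. Expired entries
# are not removed eagerly; they are evicted on-demand when the key is read.
MyCache.put("session:123", %{user_id: 1}, ttl: :timer.seconds(30))

# Within the TTL window the value is returned; afterwards `get/1` returns nil.
MyCache.get("session:123")
```
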
## Options
@@ -80,7 +83,7 @@ defmodule Nebulex.Adapters.Local do
starts and there are few entries or the consumed memory is near `0`.
Defaults to `600_000` (10 minutes).
## Example
## Usage
`Nebulex.Cache` is the wrapper around the cache. We can define a
local cache as follows:
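
A minimal sketch of such a definition (the module and `:otp_app` names are
placeholders):

```elixir
defmodule MyCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Local
end
```
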
90 changes: 66 additions & 24 deletions lib/nebulex/adapters/partitioned.ex
@@ -2,41 +2,83 @@ defmodule Nebulex.Adapters.Partitioned do
@moduledoc ~S"""
Built-in adapter for partitioned cache topology.
A partitioned cache is a clustered, fault-tolerant cache that has linear
scalability. Data is partitioned among all the machines of the cluster.
For fault-tolerance, partitioned caches can be configured to keep each piece
of data on one or more unique machines within a cluster. This adapter
in particular hasn't fault-tolerance built-in, each piece of data is kept
in a single node/machine (sharding), therefore, if a node fails, the data
kept by this node won't be available for the rest of the cluster.
## Overall features
* Partitioned cache topology (Sharding Distribution Model).
* Configurable primary storage adapter.
* Configurable Keyslot to distribute the keys across the cluster members.
* Support for transactions via Erlang global name registration facility.
* Stats support relies on the primary storage adapter.
## Partitioned Cache Topology
There are several key points to consider about a partitioned cache:
* _**Partitioned**_: The data in a distributed cache is spread out over
all the servers in such a way that no two servers are responsible for
the same piece of cached data. This means that the size of the cache
and the processing power associated with the management of the cache
can grow linearly with the size of the cluster. Also, it means that
operations against data in the cache can be accomplished with a
"single hop," in other words, involving at most one other server.
* _**Load-Balanced**_: Since the data is spread out evenly over the
servers, the responsibility for managing the data is automatically
load-balanced across the cluster.
* _**Ownership**_: Exactly one node in the cluster is responsible for each
piece of data in the cache.
* _**Point-To-Point**_: The communication for the partitioned cache is all
point-to-point, enabling linear scalability.
* _**Location Transparency**_: Although the data is spread out across
cluster nodes, the exact same API is used to access the data, and the
same behavior is provided by each of the API methods. This is called
location transparency, which means that the developer does not have to
code based on the topology of the cache, since the API and its behavior
will be the same with a local cache, a replicated cache, or a distributed
cache.
* _**Failover**_: Failover of a distributed cache involves promoting backup
data to be primary storage. When a cluster node fails, all remaining
cluster nodes determine what data each holds in backup that the failed
cluster node had primary responsibility for when it died. That data becomes
the responsibility of whatever cluster node was the backup for the data.
However, this adapter does not provide a fault-tolerance implementation;
each piece of data is kept in a single node/machine (via sharding), so
if a node fails, the data kept by that node won't be available to the
rest of the cluster members.
> Based on **"Distributed Caching Essential Lessons"** by **Cameron Purdy**
and [Coherence Partitioned Cache Service][oracle-pcs].
[oracle-pcs]: https://docs.oracle.com/cd/E13924_01/coh.340/e13819/partitionedcacheservice.htm
## Additional implementation notes
`:pg2` or `:pg` (>= OTP 23) is used under the hood by the adapter to manage
the cluster nodes. When the partitioned cache is started in a node, it creates
a group and joins it (the cache supervisor PID is joined to the group). Then,
when a function is invoked, the adapter picks a node from the node list
(using the group members), and then the function is executed on that node.
In the same way, when the supervisor process of the partitioned cache
dies, the PID of that process is automatically removed from the PG group;
this is why it's recommended to use a consistent hashing algorithm for the
node selector.
when a function is invoked, the adapter picks a node from the group members,
and then the function is executed on that specific node. In the same way,
when a partitioned cache supervisor dies (the cache is stopped or killed for
some reason), the PID of that process is automatically removed from the PG
group; this is why it's recommended to use consistent hashing for distributing
the keys across the cluster nodes.
> **NOTE:** `pg2` will be replaced by `pg` in the future, since the `pg2` module
is deprecated as of OTP 23 and scheduled for removal in OTP 24.
This adapter depends on a local cache adapter (primary storage), it adds
a thin layer on top of it in order to distribute requests across a group
of nodes, where is supposed the local cache is running already. However,
you don't need to define or declare an additional cache module for the
local store, instead, the adapter initializes it automatically (adds the
local cache store as part of the supervision tree) based on the given
options within the `primary:` argument.
## Features
you don't need to define any additional cache module for the primary
storage; instead, the adapter initializes it automatically (it adds the
primary storage as part of the supervision tree) based on the given
options within the `primary_storage_adapter:` argument.
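
As a minimal sketch of what that looks like in practice (module and
application names are placeholders, and `Nebulex.Adapters.Local` is assumed
as the primary storage adapter):

```elixir
defmodule MyApp.PartitionedCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Partitioned,
    # The adapter starts this primary storage automatically as part of its
    # supervision tree; no extra cache module needs to be defined for it.
    primary_storage_adapter: Nebulex.Adapters.Local
end
```
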
* Support for partitioned topology (Sharding Distribution Model).
* Support for transactions via Erlang global name registration facility.
* Configurable primary storage adapter (local cache adapter).
* Configurable keyslot module to compute the node.
## Usage
When used, the Cache expects the `:otp_app` and `:adapter` as options.
The `:otp_app` should point to an OTP application that has the cache
@@ -112,7 +154,7 @@ defmodule Nebulex.Adapters.Partitioned do
* `:keyslot` - Defines the module implementing the `Nebulex.Adapter.Keyslot`
behaviour (see the sketch after this list).
* `task_supervisor_opts` - Start-time options passed to
* `:task_supervisor_opts` - Start-time options passed to
`Task.Supervisor.start_link/1` when the adapter is initialized.
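
As a sketch of a custom `:keyslot` module mentioned above (the `hash_slot/2`
callback name and the use of `:erlang.phash2/2` are assumptions about the
behaviour's contract, not taken from this diff):

```elixir
defmodule MyApp.Keyslot do
  @behaviour Nebulex.Adapter.Keyslot

  @impl true
  def hash_slot(key, range) do
    # Map the key to a slot in 0..range-1 using Erlang's portable hash.
    :erlang.phash2(key, range)
  end
end
```
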
## Shared options
77 changes: 66 additions & 11 deletions lib/nebulex/adapters/replicated.ex
@@ -2,18 +2,31 @@ defmodule Nebulex.Adapters.Replicated do
@moduledoc ~S"""
Built-in adapter for replicated cache topology.
The replicated cache excels in its ability to handle data replication,
concurrency control and failover in a cluster, all while delivering
in-memory data access speeds. A clustered replicated cache is exactly
what it says it is: a cache that replicates its data to all cluster nodes.
## Overall features
* Replicated cache topology.
* Configurable primary storage adapter.
* Cache-level locking when flushing cache or adding new nodes.
* Key-level (or entry-level) locking for key-based write-like operations.
* Support for transactions via Erlang global name registration facility.
* Stats support relies on the primary storage adapter.
## Replicated Cache Topology
A replicated cache is a clustered, fault-tolerant cache where data is fully
replicated to every member in the cluster. This cache offers the fastest read
performance with linear performance scalability for reads but poor scalability
for writes (as writes must be processed by every member in the cluster).
Because data is replicated to all servers, adding servers does not increase
aggregate cache capacity.
There are several challenges to building a reliably replicated cache. The
first is how to get it to scale and perform well. Updates to the cache have
to be sent to all cluster nodes, and all cluster nodes have to end up with
the same data, even if multiple updates to the same piece of data occur at
the same time. Also, if a cluster node requests a lock, ideally it should
not have to get all cluster nodes to agree on the lock or at least do it in
a very efficient way (`:global` is used for this), otherwise it will scale
a very efficient way (`:global` is used here), otherwise it will scale
extremely poorly; yet in the case of a cluster node failure, all of the data
and lock information must be kept safely.
@@ -25,16 +38,18 @@ defmodule Nebulex.Adapters.Replicated do
However, there are some limitations:
* <ins>Cost Per Update</ins> - Updating a replicated cache requires pushing
the new version of the data to all other cluster members, which will limit
scalability if there is a high frequency of updates per member.
* _**Cost Per Update**_ - Updating a replicated cache requires pushing
the new version of the data to all other cluster members, which will
limit scalability if there is a high frequency of updates per member.
* <ins>Cost Per Entry</ins> - The data is replicated to every cluster
member, so Memory Heap space is used on each member, which will impact
* _**Cost Per Entry**_ - The data is replicated to every cluster member,
so Memory Heap space is used on each member, which will impact
performance for large caches.
> Based on **"Distributed Caching Essential Lessons"** by **Cameron Purdy**.
## Usage
When used, the Cache expects the `:otp_app` and `:adapter` as options.
The `:otp_app` should point to an OTP application that has the cache
configuration. For example:
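
A minimal sketch of such a definition (module and application names are
placeholders):

```elixir
defmodule MyApp.ReplicatedCache do
  use Nebulex.Cache,
    otp_app: :my_app,
    adapter: Nebulex.Adapters.Replicated
end
```
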
@@ -87,7 +102,7 @@ defmodule Nebulex.Adapters.Replicated do
with the local primary storage. These options will depend on the local
adapter to use.
* `task_supervisor_opts` - Start-time options passed to
* `:task_supervisor_opts` - Start-time options passed to
`Task.Supervisor.start_link/1` when the adapter is initialized.
## Shared options
@@ -117,6 +132,46 @@ defmodule Nebulex.Adapters.Replicated do
MyCache.nodes()
MyCache.nodes(:cache_name)
## Caveats of replicated adapter
As explained at the beginning, a replicated topology brings not only
advantages (mostly for reads) but also some limitations and challenges.
This adapter uses global locks (via `:global`) for all operations that modify
or alter the cache in some way, to ensure as much consistency as possible across
all members of the cluster. These locks may be per key or for the entire cache,
depending on the operation taking place. For that reason, it is very important
to be aware of the operations that can potentially lead to performance and
scalability issues, so that you can make better use of the replicated
adapter. The following are the operations and aspects you should pay
attention to:
* Starting and joining a new replicated node to the cluster is the most
expensive action, because all write-like operations across all members of
the cluster are blocked until the new node completes the synchronization
process, which involves copying cached data from any of the existing
cluster nodes into the new node, and this could be very expensive
depending on the number of cache entries. For that reason, adding new
nodes is something exceptional and expected to happen once in a while.
* Flushing the cache. When the flush action is executed, as in the previous
case, all write-like operations across all members of the cluster are blocked
until the flush is completed (this implies flushing the cached data from
all cluster nodes). Therefore, flushing the cache is also considered an
exceptional case that happens only once in a while.
* Write-like operations based on a key only block operations related to
that key across all members of the cluster. This is not as critical as
the previous two cases but it is something to keep in mind anyway because
if there is a highly demanded key in terms of writes, that could also
become a potential bottleneck.
Summing up, the replicated cache topology along with this adapter should
be used mainly when reads clearly dominate over writes (e.g., 80% reads
and 20% writes or less). Also, flushing the cache and adding new nodes
must be exceptional cases happening only once in a while, to avoid
performance issues.
"""

# Provide Cache Implementation
2 changes: 1 addition & 1 deletion test/support/cluster.ex
@@ -54,7 +54,7 @@ defmodule Nebulex.Cluster do
rpc(node, Application, :ensure_all_started, [:mix])
rpc(node, Mix, :env, [Mix.env()])

for {app_name, _, _} <- Application.loaded_applications() do
for {app_name, _, _} <- Application.loaded_applications(), app_name not in [:dialyxir] do
rpc(node, Application, :ensure_all_started, [app_name])
end
end
