readthedocs: finish documentation rework
- finish background section
- add developer section
- add section on scaled and weighted losses
Evizero committed Feb 9, 2017
1 parent 49ffad9 commit c1f204b
Showing 6 changed files with 604 additions and 268 deletions.
259 changes: 139 additions & 120 deletions docs/developer/design.rst
Developer Documentation
=========================

In this part of the documentation we will discuss some of the
internal design aspects of this library. Consequently, the target
audience of this section and its sub-sections is primarily people
interested in contributing to this package. As such, the
information provided here should be of little to no relevance for
users interested in simply applying the package.

Abstract Superclasses
--------------------------

We have seen in previous sections that many families of loss
functions are implemented as immutable types with free
parameters. An example of such a family is
:class:`L1EpsilonInsLoss`, which represents all the
:math:`\epsilon`-insensitive loss functions for each possible
value of :math:`\epsilon`.

Aside from these special families, there are a handful of more
generic families that between them contain almost all of the loss
functions this package implements. These families are defined as
abstract types in the type tree. Their main purpose is two-fold:

- From an end-user's perspective, they are most useful for
  dispatching on the particular kind of prediction problem that
  they are intended for (regression vs classification).

- From an implementation perspective, these abstract types allow
  us to implement shared functionality and fall-back methods,
  or even allow for a simpler implementation.

Most of the implemented loss functions fall under the umbrella of
supervised losses. As such, we barely mention other types of
losses anywhere in this documentation.

.. class:: SupervisedLoss

   Abstract subtype of :class:`Loss`.

   As mentioned in the background section, a supervised loss is a
   family of functions of two parameters, namely the true targets
   and the predicted outcomes. A loss is considered
   **supervised** if all the information needed to compute
   ``value(loss, features, target, output)`` is contained in
   ``target`` and ``output``, and thus allows for the
   simplification ``value(loss, target, output)``.
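
For example, a minimal sketch of this supervised form, using
:class:`L2DistLoss` and made-up numbers:

.. code-block:: julia

   using LossFunctions

   true_target = 1.0   # hypothetical true target
   pred_output = 0.7   # hypothetical predicted output

   # everything needed to compute the loss is in target and output
   value(L2DistLoss(), true_target, pred_output) # 0.09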

There are two interesting sub-families of supervised loss
functions. One of these families is called distance-based. All
losses that belong to this family are implemented as subtypes of
the abstract type :class:`DistanceLoss`, which itself is a
subtype of :class:`SupervisedLoss`.

.. class:: DistanceLoss

   Abstract subtype of :class:`SupervisedLoss`. A supervised loss
   that can be simplified to ``value(loss, target, output)`` =
   ``value(loss, output - target)`` is considered distance-based.
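
To illustrate this equivalence with a small sketch (using
:class:`L2DistLoss` as a stand-in for any distance-based loss):

.. code-block:: julia

   loss = L2DistLoss()
   target, output = 1.0, 0.7

   # the binary form reduces to the unary form on the difference
   value(loss, target, output) == value(loss, output - target) # true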

The second core sub-family of supervised losses is called
margin-based. All loss functions that belong to this family are
implemented as subtypes of the abstract type :class:`MarginLoss`,
which itself is a subtype of :class:`SupervisedLoss`.

.. class:: MarginLoss

   Abstract subtype of :class:`SupervisedLoss`. A supervised
   loss, where the targets are in :math:`\{1,-1\}`, and which can
   be simplified to ``value(loss, target, output)`` =
   ``value(loss, target * output)``, is considered margin-based.
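
Analogously, a sketch of the margin-based simplification, using
:class:`L1HingeLoss` as an example:

.. code-block:: julia

   loss = L1HingeLoss()
   target, output = -1.0, 0.4

   # the binary form reduces to the unary form on the agreement
   value(loss, target, output) == value(loss, target * output) # true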

Shared Interface
----------------------

Each of the three abstract types listed above serves a purpose
other than dispatch. All losses that belong to the same family
share functionality to some degree. For example, all subtypes of
:class:`SupervisedLoss` share the same implementations for the
vectorized versions of :func:`value` and :func:`deriv`.

More interestingly, the abstract types :class:`DistanceLoss` and
:class:`MarginLoss` serve an additional purpose aside from
shared functionality. We have seen in the background section what
it is that makes a loss margin-based or distance-based. Without
repeating the definition, let us state that it boils down to the
existence of a *representing function* :math:`\psi`, which makes
it possible to compute a loss using a unary function instead of a
binary one. Indeed, all the subtypes of :class:`DistanceLoss` and
:class:`MarginLoss` are implemented in terms of the unary form of
their representing function.
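
To make this concrete, here is a sketch of how a new
distance-based loss could be implemented through its representing
function alone. The type name ``MyAbsLoss`` is made up for
illustration, and the exact method names to extend may differ:

.. code-block:: julia

   import LossFunctions: value, deriv

   # a hypothetical absolute-distance loss, defined only in unary form
   struct MyAbsLoss <: LossFunctions.DistanceLoss end
   value(loss::MyAbsLoss, difference::Number) = abs(difference)
   deriv(loss::MyAbsLoss, difference::Number) = sign(difference)

   # the binary version should then fall back to the unary one
   value(MyAbsLoss(), 1.0, 0.7) # == value(MyAbsLoss(), -0.3) == 0.3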

Distance-based Losses
~~~~~~~~~~~~~~~~~~~~~~

Supervised losses that can be expressed as a univariate function
of ``output - target`` are referred to as distance-based losses.
Distance-based losses are typically utilized for regression
problems. That said, there are also other losses that are useful
for regression problems that don't fall into this category, such
as the :class:`PeriodicLoss`.

.. function:: value(loss, difference)

   Computes the value of the representing function :math:`\psi`
   of the given ``loss`` at ``difference``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`DistanceLoss`
   :param difference: The result of subtracting the true target
                      :math:`y` from the predicted output
                      :math:`\hat{y}`.
   :type difference: ``Number``
   :return: The value of the loss's representing function at
            the point ``difference``.
   :rtype: ``Number``
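
A brief usage sketch (the numbers are made up):

.. code-block:: julia

   value(L2DistLoss(), 0.7) # ψ(0.7) = 0.7^2 = 0.49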

.. function:: deriv(loss, difference)

   Computes the derivative of the representing function
   :math:`\psi` of the given ``loss`` at ``difference``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`DistanceLoss`
   :param difference: The result of subtracting the true target
                      :math:`y` from the predicted output
                      :math:`\hat{y}`.
   :type difference: ``Number``
   :return: The derivative of the loss's representing function at
            the point ``difference``.
   :rtype: ``Number``

.. function:: value_deriv(loss, difference)

   Returns the results of :func:`value` and :func:`deriv` as a
   tuple. In some cases this function can yield better
   performance, because the losses can make use of shared
   variables when computing the values.
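
For instance, with :class:`L2DistLoss` the derivative
``2 * difference`` shares the ``difference`` term with the value
``difference^2``:

.. code-block:: julia

   value_deriv(L2DistLoss(), 0.5) # (0.25, 1.0)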


.. note::

   In the literature that this package is partially based on,
   the convention for distance-based losses is ``target - output``
   (see [STEINWART2008]_ p. 38). We chose to diverge from this
   definition because it would force a difference between the
   results for the unary and the binary version of the
   derivative.

Margin-based Losses
~~~~~~~~~~~~~~~~~~~~~~~~~~

Margin-based losses are supervised losses where the values of the
targets are restricted to be in :math:`\{1,-1\}`, and which can
be expressed as a univariate function of ``output * target``.

.. function:: value(loss, agreement)

   Computes the value of the representing function :math:`\psi`
   of the given ``loss`` at ``agreement``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`MarginLoss`
   :param agreement: The result of multiplying the true target
                     :math:`y` with the predicted output
                     :math:`\hat{y}`.
   :type agreement: ``Number``
   :return: The value of the loss's representing function
            at the given point ``agreement``.
   :rtype: ``Number``

.. function:: deriv(loss, agreement)

   Computes the derivative of the representing function
   :math:`\psi` of the given ``loss`` at ``agreement``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`MarginLoss`
   :param agreement: The result of multiplying the true target
                     :math:`y` with the predicted output
                     :math:`\hat{y}`.
   :type agreement: ``Number``
   :return: The derivative of the loss's representing function
            at the given point ``agreement``.
   :rtype: ``Number``

.. function:: value_deriv(loss, agreement)

   Returns the results of :func:`value` and :func:`deriv` as a
   tuple. In some cases this function can yield better
   performance, because the losses can make use of shared
   variables when computing the values.
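
Again a small usage sketch, this time with :class:`L1HingeLoss`:

.. code-block:: julia

   value_deriv(L1HingeLoss(), 0.5) # (0.5, -1.0)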

Writing Tests
----------------

.. warning::

   This section is still under development and thus in an
   unfinished state.

15 changes: 15 additions & 0 deletions docs/index.rst

Common Meta Losses
----------------------

In some situations it can be useful to slightly alter an existing
loss function. We provide two general ways to accomplish that.
The first way is to scale a loss by a constant factor. This can,
for example, be useful to transform :class:`L2DistLoss` into
the least squares loss one knows from statistics. The second way
is to reweight the two classes of a binary classification loss.
This is useful for handling imbalanced class distributions.
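
As a brief sketch of the first approach, assuming losses can be
scaled by multiplying them with a number (the exact constructor
is described in the section linked below):

.. code-block:: julia

   # hypothetical: a scaled version of L2DistLoss
   lsloss = 1/2 * L2DistLoss()
   value(lsloss, 0.4) # 0.5 * 0.4^2 = 0.08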

.. toctree::
   :maxdepth: 2

   losses/scaledandweighted

Internals
--------------------------------