readthedocs: finish documentation rework
- finish background section
- add developer section
- add section on scaled and weighted losses
Evizero committed Feb 9, 2017
1 parent 49ffad9 commit c1f204b
Showing 6 changed files with 604 additions and 268 deletions.
259 changes: 139 additions & 120 deletions docs/developer/design.rst
Developer Documentation
=========================

In this part of the documentation we will discuss some of the
internal design aspects of this library. Consequently, the target
audience of this section and its sub-sections is primarily people
interested in contributing to this package. As such, the
information provided here should be of little to no relevance for
users interested in simply applying the package.

Abstract Superclasses
--------------------------

We have seen in previous sections that many families of loss
functions are implemented as immutable types with free
parameters. An example of such a family is
:class:`L1EpsilonInsLoss`, which represents all the
:math:`\epsilon`-insensitive loss functions for each possible
value of :math:`\epsilon`.

Aside from these special families, there are a handful of more
generic families that between them contain almost all of the loss
functions this package implements. These families are defined as
abstract types in the type tree. Their main purpose is two-fold:

- From an end-user's perspective, they are most useful for
  dispatching on the particular kind of prediction problem that
  they are intended for (regression vs classification).

- From an implementation perspective, these abstract types allow
  us to implement shared functionality and fall-back methods,
  or even allow for a simpler implementation.

Most of the implemented loss functions fall under the umbrella of
supervised losses. As such, we barely mention other types of
losses anywhere in this documentation.

.. class:: SupervisedLoss

   Abstract subtype of :class:`Loss`.

   As mentioned in the background section, a supervised loss is a
   family of functions of two parameters, namely the true targets
   and the predicted outcomes. A loss is considered
   **supervised** if all the information needed to compute
   ``value(loss, features, target, output)`` is contained in
   ``target`` and ``output``, and thus allows for the
   simplification ``value(loss, target, output)``.
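
For example, a minimal sketch of this supervised form, using
:class:`L2DistLoss` and made-up numbers:

.. code-block:: julia

   using LossFunctions

   true_target = 1.0   # hypothetical true target
   pred_output = 0.7   # hypothetical predicted output

   # everything needed to compute the loss is in target and output
   value(L2DistLoss(), true_target, pred_output) # 0.09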

There are two interesting sub-families of supervised loss
functions. One of these families is called distance-based. All
losses that belong to this family are implemented as subtypes of
the abstract type :class:`DistanceLoss`, which itself is a
subtype of :class:`SupervisedLoss`.

.. class:: DistanceLoss

   Abstract subtype of :class:`SupervisedLoss`. A supervised loss
   that can be simplified to ``value(loss, target, output)`` =
   ``value(loss, output - target)`` is considered distance-based.
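
To illustrate this equivalence with a small sketch (using
:class:`L2DistLoss` as a stand-in for any distance-based loss):

.. code-block:: julia

   loss = L2DistLoss()
   target, output = 1.0, 0.7

   # the binary form reduces to the unary form on the difference
   value(loss, target, output) == value(loss, output - target) # true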

The second core sub-family of supervised losses is called
margin-based. All loss functions that belong to this family are
implemented as subtypes of the abstract type :class:`MarginLoss`,
which itself is a subtype of :class:`SupervisedLoss`.

.. class:: MarginLoss

   Abstract subtype of :class:`SupervisedLoss`. A supervised
   loss, where the targets are in :math:`\{1,-1\}`, and which can
   be simplified to ``value(loss, target, output)`` =
   ``value(loss, target * output)``, is considered margin-based.
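
Analogously, a sketch of the margin-based simplification, using
:class:`L1HingeLoss` as an example:

.. code-block:: julia

   loss = L1HingeLoss()
   target, output = -1.0, 0.4

   # the binary form reduces to the unary form on the agreement
   value(loss, target, output) == value(loss, target * output) # true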

Shared Interface
----------------------

Each of the three abstract types listed above serves a purpose
other than dispatch. All losses that belong to the same family
share functionality to some degree. For example, all subtypes of
:class:`SupervisedLoss` share the same implementations for the
vectorized versions of :func:`value` and :func:`deriv`.

More interestingly, the abstract types :class:`DistanceLoss` and
:class:`MarginLoss` serve an additional purpose aside from
shared functionality. We have seen in the background section what
it is that makes a loss margin-based or distance-based. Without
repeating the definition, let us state that it boils down to the
existence of a *representing function* :math:`\psi`, which makes
it possible to compute a loss using a unary function instead of a
binary one. Indeed, all the subtypes of :class:`DistanceLoss` and
:class:`MarginLoss` are implemented in terms of the unary form of
their representing function.
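
To make this concrete, here is a sketch of how a new
distance-based loss could be implemented through its representing
function alone. The type name ``MyAbsLoss`` is made up for
illustration, and the exact method names to extend may differ:

.. code-block:: julia

   import LossFunctions: value, deriv

   # a hypothetical absolute-distance loss, defined only in unary form
   struct MyAbsLoss <: LossFunctions.DistanceLoss end
   value(loss::MyAbsLoss, difference::Number) = abs(difference)
   deriv(loss::MyAbsLoss, difference::Number) = sign(difference)

   # the binary version should then fall back to the unary one
   value(MyAbsLoss(), 1.0, 0.7) # == value(MyAbsLoss(), -0.3) == 0.3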

Distance-based Losses
~~~~~~~~~~~~~~~~~~~~~~

Supervised losses that can be expressed as a univariate function
of ``output - target`` are referred to as distance-based losses.
Distance-based losses are typically utilized for regression
problems. That said, there are also other losses that are useful
for regression problems that don't fall into this category, such
as the :class:`PeriodicLoss`.

.. function:: value(loss, difference)

   Computes the value of the representing function :math:`\psi`
   of the given ``loss`` at ``difference``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`DistanceLoss`
   :param difference: The result of subtracting the true target
                      :math:`y` from the predicted output
                      :math:`\hat{y}`.
   :type difference: ``Number``
   :return: The value of the loss's representing function at
            the point ``difference``.
   :rtype: ``Number``
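
A brief usage sketch (the numbers are made up):

.. code-block:: julia

   value(L2DistLoss(), 0.7) # ψ(0.7) = 0.7^2 = 0.49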

.. function:: deriv(loss, difference)

   Computes the derivative of the representing function
   :math:`\psi` of the given ``loss`` at ``difference``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`DistanceLoss`
   :param difference: The result of subtracting the true target
                      :math:`y` from the predicted output
                      :math:`\hat{y}`.
   :type difference: ``Number``
   :return: The derivative of the loss's representing function at
            the point ``difference``.
   :rtype: ``Number``

.. function:: value_deriv(loss, difference)

   Returns the results of :func:`value` and :func:`deriv` as a
   tuple. In some cases this function can yield better
   performance, because the losses can make use of shared
   variables when computing the values.
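
For instance, with :class:`L2DistLoss` the derivative
``2 * difference`` shares the ``difference`` term with the value
``difference^2``:

.. code-block:: julia

   value_deriv(L2DistLoss(), 0.5) # (0.25, 1.0)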


.. note::

   In the literature that this package is partially based on,
   the convention for distance-based losses is ``target - output``
   (see [STEINWART2008]_ p. 38). We chose to diverge from this
   definition because it would force a difference between the
   results for the unary and the binary version of the
   derivative.

Margin-based Losses
~~~~~~~~~~~~~~~~~~~~~~~~~~

Margin-based losses are supervised losses where the values of the
targets are restricted to be in :math:`\{1,-1\}`, and which can
be expressed as a univariate function of ``output * target``.

.. function:: value(loss, agreement)

   Computes the value of the representing function :math:`\psi`
   of the given ``loss`` at ``agreement``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`MarginLoss`
   :param agreement: The result of multiplying the true target
                     :math:`y` with the predicted output
                     :math:`\hat{y}`.
   :type agreement: ``Number``
   :return: The value of the loss's representing function
            at the given point ``agreement``.
   :rtype: ``Number``

.. function:: deriv(loss, agreement)

   Computes the derivative of the representing function
   :math:`\psi` of the given ``loss`` at ``agreement``.

   :param loss: An instance of the loss we are interested in.
   :type loss: :class:`MarginLoss`
   :param agreement: The result of multiplying the true target
                     :math:`y` with the predicted output
                     :math:`\hat{y}`.
   :type agreement: ``Number``
   :return: The derivative of the loss's representing function
            at the given point ``agreement``.
   :rtype: ``Number``

.. function:: value_deriv(loss, agreement)

   Returns the results of :func:`value` and :func:`deriv` as a
   tuple. In some cases this function can yield better
   performance, because the losses can make use of shared
   variables when computing the values.
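
Again a small usage sketch, this time with :class:`L1HingeLoss`:

.. code-block:: julia

   value_deriv(L1HingeLoss(), 0.5) # (0.5, -1.0)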

Writing Tests
----------------

.. warning::

   This section is still under development and thus in an
   unfinished state.

15 changes: 15 additions & 0 deletions docs/index.rst

Common Meta Losses
----------------------

In some situations it can be useful to slightly alter an existing
loss function. We provide two general ways to accomplish that.
The first way is to scale a loss by a constant factor. This can,
for example, be useful to transform :class:`L2DistLoss` into
the least squares loss one knows from statistics. The second way
is to reweight the two classes of a binary classification loss.
This is useful for handling imbalanced class distributions.
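
As a brief sketch of the first approach, assuming losses can be
scaled by multiplying them with a number (the exact constructor
is described in the section linked below):

.. code-block:: julia

   # hypothetical: a scaled version of L2DistLoss
   lsloss = 1/2 * L2DistLoss()
   value(lsloss, 0.4) # 0.5 * 0.4^2 = 0.08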

.. toctree::
   :maxdepth: 2

   losses/scaledandweighted

Internals
--------------------------------