# LossFunctions

_LossFunctions.jl is a Julia package that provides efficient and
well-tested implementations for a diverse set of loss functions
that are commonly used in Machine Learning._

Distance-based (Regression) | Margin-based (Classification)
:-------------------------------:|:----------------------------------:
![distance_losses](https://rawgithub.com/JuliaML/FileStorage/master/LossFunctions/distance.svg) | ![margin_losses](https://rawgithub.com/JuliaML/FileStorage/master/LossFunctions/margin.svg)

Others: `PeriodicLoss`, `PoissonLoss`, `ScaledLoss`,
`WeightedBinaryLoss`

## Introduction

Typically, the loss functions we work with in Machine Learning
fall into the category of supervised losses. These are
multivariate functions of two variables, the **true target** `y`,
which represents the "ground truth" (i.e. correct answer), and
the **predicted output** `ŷ`, which is what our model thinks the
truth is. A supervised loss function takes these two variables as
input and returns a value that quantifies how "bad" our
prediction is in comparison to the truth. In other words: *the
lower the loss, the better the prediction.*
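
To make this concrete, consider the familiar least squares loss,
provided here as `L2DistLoss`. It measures the squared difference
between target and prediction, i.e. `L(y, ŷ) = (ŷ - y)²`. A
minimal sketch of what that means in code (the numbers are only
illustrative):

```julia
using LossFunctions

y, ŷ = 1.0, 0.5           # true target and predicted output
value(L2DistLoss(), y, ŷ) # (0.5 - 1.0)^2 = 0.25
```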

This package provides a considerable number of carefully
implemented loss functions, as well as an API to query their
properties (e.g. convexity). Furthermore, we expose methods to
compute their values, derivatives, and second derivatives for
single observations as well as arbitrarily sized arrays of
observations. In the case of arrays, the user can additionally
specify whether and how the element-wise results are averaged or
summed.
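
As a small taste of that property-query API, one can ask a loss
directly whether it is, say, convex or differentiable. A minimal
sketch, assuming the queries `isconvex` and `isdifferentiable`
are among the exported functions:

```julia
using LossFunctions

isconvex(L2DistLoss())          # true: (ŷ - y)² is convex
isdifferentiable(L1HingeLoss()) # false: the hinge loss has a kink at a = 1
```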

## Example

The following code snippets show a simple "hello world" scenario
of how this package can be used to work with loss functions in
various ways.

```julia
using LossFunctions
```

All the concrete loss "functions" that this package provides are
actually defined as immutable types instead of native Julia
functions. We can compute the value of a loss using the function
`value`. Let us start with an example of how to compute the loss
for a group of three observations. By default the loss is
computed element-wise.

```julia
julia> true_targets = [ 1, 0, -2];

julia> pred_outputs = [0.5, 2, -1];

julia> value(L2DistLoss(), true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# 0.25
# 4.0
# 1.0
```

Alternatively, one can also use an instance of a loss just like
one would use any other Julia function. This can make the code
significantly more readable while not impacting performance, as
it is a zero-cost abstraction (i.e. it compiles down to the same
code).

```julia
julia> loss = L2DistLoss()
# LossFunctions.LPDistLoss{2}()

julia> loss(true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# 0.25
# 4.0
# 1.0

julia> loss(1, 0.5f0) # single observation
# 0.25f0
```
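
Note that the very same loss instance works on whole arrays as
well as on a single pair of numbers; in the scalar case the
result is a plain number of the promoted argument type (here
`Float32`).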

If you are not interested in the element-wise results
themselves, but in some accumulation of them (such as their mean
or sum), you can additionally specify an average mode. This
avoids allocating a temporary array and computes the result
directly.

```julia
julia> value(L2DistLoss(), true_targets, pred_outputs, AvgMode.Sum())
# 5.25

julia> value(L2DistLoss(), true_targets, pred_outputs, AvgMode.Mean())
# 1.75
```

Aside from these standard unweighted average modes, we also
provide weighted alternatives. These expect a weight factor for
each observation in the predicted outputs, which allows giving
certain observations a stronger influence on the result.

```julia
julia> value(L2DistLoss(), true_targets, pred_outputs, AvgMode.WeightedSum([2,1,1]))
# 5.5

julia> value(L2DistLoss(), true_targets, pred_outputs, AvgMode.WeightedMean([2,1,1]))
# 1.375
```
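
To see where these numbers come from: with the weights `[2,1,1]`
the element-wise results `[0.25, 4.0, 1.0]` combine to
`2⋅0.25 + 1⋅4.0 + 1⋅1.0 = 5.5` for the weighted sum, and the
weighted mean shown here divides that by the total weight,
`5.5 / (2+1+1) = 1.375`.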

We do not restrict the targets and outputs to be vectors, but
instead allow them to be arrays of any shape. The shape of an
array may or may not have an interpretation that is relevant for
computing the loss. It is possible to explicitly specify which
dimension denotes the observations. This is particularly useful
for multivariate regression, where one may want to accumulate
the loss per individual observation.

```julia
julia> A = rand(2,3)
# 2×3 Array{Float64,2}:
# 0.0939946 0.97639 0.568107
# 0.183244 0.854832 0.962534

julia> B = rand(2,3)
# 2×3 Array{Float64,2}:
# 0.0538206 0.77055 0.996922
# 0.598317 0.72043 0.912274

julia> value(L2DistLoss(), A, B, AvgMode.Sum())
# 0.420741920634

julia> value(L2DistLoss(), A, B, AvgMode.Sum(), ObsDim.First())
# 2-element Array{Float64,1}:
# 0.227866
# 0.192876

julia> value(L2DistLoss(), A, B, AvgMode.Sum(), ObsDim.Last())
# 3-element Array{Float64,1}:
# 0.1739
# 0.060434
# 0.186408
```
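
In the example above, `ObsDim.First()` treats each of the two
rows of `A` and `B` as one observation and therefore returns one
sum per row, while `ObsDim.Last()` does the same for the three
columns.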

All these function signatures of `value` also apply to computing
the derivatives using `deriv` and the second derivatives using
`deriv2`.

```julia
julia> deriv(L2DistLoss(), true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# -1.0
# 4.0
# 2.0

julia> deriv2(L2DistLoss(), true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# 2.0
# 2.0
# 2.0
```
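
These numbers follow from the closed form of the L2 distance
loss: its first derivative with respect to the output is
`2(ŷ - y)` (e.g. `2(2 - 0) = 4.0` for the second observation),
and its second derivative is the constant `2`.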

For computing the first and second derivatives we additionally
expose a convenience syntax that gives the code a more math-like
look.

```julia
julia> loss = L2DistLoss()
# LossFunctions.LPDistLoss{2}()

julia> loss'(true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# -1.0
# 4.0
# 2.0

julia> loss''(true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# 2.0
# 2.0
# 2.0
```

Additionally, we provide mutating versions for the subset of
methods that return an array. These have the same function
signatures, with the only difference being that they require an
additional parameter as the first argument: the preallocated
array that is to be used as storage.

```julia
julia> buffer = zeros(3)
# 3-element Array{Float64,1}:
# 0.0
# 0.0
# 0.0

julia> deriv!(buffer, L2DistLoss(), true_targets, pred_outputs)
# 3-element Array{Float64,1}:
# -1.0
# 4.0
# 2.0
```
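
Such a preallocated buffer is typically reused, for example
across the iterations of a training loop, so that no temporary
arrays need to be allocated on each pass.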

Note that this only shows a small part of the functionality this
package provides. For example, you can query the documentation
on `HingeLoss` within Julia's REPL:
```julia
?HingeLoss
```
```
search: HingeLoss L2HingeLoss L1HingeLoss SmoothedL1HingeLoss

  L1HingeLoss <: MarginLoss

  The hinge loss linearly penalizes every prediction where the
  resulting agreement a = y⋅ŷ < 1. It is Lipschitz continuous
  and convex, but not strictly convex.

  L(a) = \max \{ 0, 1 - a \}

  --------------------------------------------------------------------
             Lossfunction                      Derivative
      ┌────────────┬────────────┐       ┌────────────┬────────────┐
    3 │'\.                      │     0 │                  ┌------│
      │  ''_                    │       │                  |      │
      │     \.                  │       │                  |      │
      │       '.                │       │                  |      │
    L │         ''_             │    L' │                  |      │
      │            \.           │       │                  |      │
      │              '.         │       │                  |      │
    0 │                ''_______│    -1 │------------------┘      │
      └────────────┴────────────┘       └────────────┴────────────┘
      -2                      2         -2                      2
                 y ⋅ ŷ                             y ⋅ ŷ
```

## Installation

This package is registered in `METADATA.jl` and can be installed
as usual:

```julia
Pkg.add("LossFunctions")
```