Skip to content

Julia package of regularization functions for machine learning


Notifications You must be signed in to change notification settings


Folders and files

Last commit message
Last commit date

Latest commit


Repository files navigation


PenaltyFunctions.jl is a Julia package that provides generic implementations for a diverse set of penalty functions that are commonly used for regularization purposes in Machine Learning.

License Build Status Coverage Status


Many popular models in Machine Learning are parameterized by a set of real-valued coefficients θ (theta), which is usually stored in the form of an array. If our data set has k features, then θ would typically be a vector of k or k+1 numeric elements. Each individual feature x_i of our data set is assigned a corresponding coefficient θ_i, which is used to quantify the feature's influence on the prediction. The concrete values for the coefficient vector are learned by an optimization algorithm, which tries to select the "best" set of coefficients for the given data and model. Without any restriction on their values the optimization algorithm is free to choose the coefficients freely, which may result in overly complex prediction functions. This freedom is known to cause the optimization algorithm to overfit to the noise in the training data. This is where penalties come in!

A penalty is a function of the coefficients and only the coefficients. It associates the given set of coefficients with a cost without any regard for their influence on the predictive power of the prediction function. This cost is then is added to the overall cost of the prediction function. This way the optimization algorithm is encouraged to choose "simpler" coefficients. What exactly "simpler" means depends on the chosen penalty. In general terms: penalties help to reduce the possibility of overfitting.

Available penalties

This package implements a number of carefully crafted penalty functions, as well as an API to query their properties (e.g. convexity). Furthermore, we expose methods to compute their values and derivatives for a single value, coefficient vectors, and even arrays of arbitrary dimensionality. The provided penalty functions fall into one of two main families, namely Element Penalties and Array Penalties.

Element penalties

The first family of penalty functions contains all those that apply to to the individual elements of θ element-wise. The resulting cost of a coefficient array is then the sum of the element-wise results.

Univariate Parameter Bivariate Parameter
univariate_elem bivariate_elem
The cost-values of various penalties as a function of a single coefficient Cross sections of the cost-surfaces. This time for two coefficients

Every penalty that is of this family is subtype of ElementPenalty. From an implementation perspective these penalties are defined using the element-wise functions. The following table lists the implemented types and their definitions.

Penalty value on element
NoPenalty() g(θ) = 0
L1Penalty() g(θ) = abs(θ)
L2Penalty() g(θ) = 0.5 * θ ^ 2
ElasticNetPenalty(α = 0.5) g(θ) = (1 - α) * abs(θ) + α * .5 * θ ^ 2
SCADPenalty(a = 3.7, γ = 1.0) L1Penalty that blends to constant
MCPPenalty(γ = 2.0) g(θ) = abs(θ) < γ ? abs(θ) - θ ^ 2 / 2γ : γ / 2
LogPenalty(η = 1.0) g(θ) = log(1 + η * abs(θ))

The total cost for an array of coefficients is then defined as sum(g, θ).

using PenaltyFunctions
p = L1Penalty()
x = randn(5)
s = randn(5)
buffer = zeros(5)

# value
value(p, x[1])        # evaluate on element
value(p, x)           # evaluate on array
value.(p, x)          # broadcast is supported as well
value(p, x[1], s[1])  # evaluate on element, scaled by scalar
value(p, x, s[1])     # evaluate on array, scaled by scalar
value(p, x, s)        # evaluate on array, element-wise scaling

# value via calling the Penalty object
p = L1Penalty()

# derivatives and gradients
deriv(p, x[1])        # derivative
deriv(p, x[1], s[1])  # scaled derivative
grad(p, x)            # gradient
grad(p, x, s[1])      # scaled gradient
grad(p, x, s)         # element-wise scaled gradient
grad!(buffer, p, x)       # overwrite buffer with gradient
grad!(buffer, p, x, s[1]) # overwrite buffer with scaled gradient
grad!(buffer, p, x, s)    # overwrite buffer with element-wise scaled gradient

# prox operator
prox(p, x[1], s[1]) # prox on element
prox(p, x, s[1])    # prox on array, scaled by scalar
prox(p, x, s)       # prox on array, element-wise scaling
prox!(p, x, s[1])   # overwrite x, scaled by scalar
prox!(p, x, s)      # overwrite x, element-wise scaling

Array penalties

The second family of penalty functions contains all those that that need to be evaluated on the entire coefficient array θ at once. Every penalty that belongs to this family is subtype of ArrayPenalty. The following table outlines the implemented types and their definitions.

Penalty value on array
NuclearNormPenalty() sum of singular values of x
MahalanobisPenalty(C) g(x) = x' * C' * C * x
GroupLassoPenalty() g(x) = vecnorm(x)


The package is registered in METADATA.jl.



This code is free to use under the terms of the MIT "Expat" license.