Metric Learning to Rank
Matlab M C
Switch branches/tags
Nothing to show
Clone or download
Permalink
Failed to load latest commit information.
cuttingPlane simplified dual-mkl, fixed bug in semi-sup CP Feb 14, 2012
distance installed global switch for admm step limit Feb 4, 2012
dual simplified dual-mkl, fixed bug in semi-sup CP Feb 14, 2012
feasible updated feasible projections to use sparsity Aug 2, 2013
initialize added leave-one-out testing, safety check in init Feb 17, 2012
loss Cutting out DOD-MKL, and RCP implementations. They never worked anyway. Sep 1, 2011
metricPsi Cutting out DOD-MKL, and RCP implementations. They never worked anyway. Sep 1, 2011
regularize fixed bug in regularizeNone Jan 17, 2013
separationOracle initial checkin Sep 1, 2011
thresh Added qp_admm and proj_simplex (utility for qp_admm) May 15, 2013
util normalized row-sum by dimension Aug 2, 2013
.gitignore modified gitignore Jan 18, 2013
LICENSE Adding demo code, data, and license Sep 1, 2011
MANIFEST Updated manifest and readme for v1.1 Sep 1, 2011
README Update README Jan 18, 2013
Wine.mat Adding demo code, data, and license Sep 1, 2011
mlr_demo.m fixed a bug when batchSize < valid samples Dec 3, 2012
mlr_plot.m simplified dual-mkl, fixed bug in semi-sup CP Feb 14, 2012
mlr_test.m fixed a bug when batchSize < valid samples Dec 3, 2012
mlr_train.m Added qp_admm and proj_simplex (utility for qp_admm) May 15, 2013
mlr_train_primal.m fixed a bug when batchSize < valid samples Dec 3, 2012
rmlr_demo.m updated the rmlr demo script May 15, 2013
rmlr_train.m Merge https://github.com/khdlim/mlr May 15, 2013

README

Metric Learning to Rank (mlr-1.2)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

AUTHORS:
Brian McFee <bmcfee@cs.ucsd.edu>
Daryl Lim <dklim@ucsd.edu>

This code is distributed under the GNU GPL license.  See LICENSE for details
or http://www.gnu.org/licenses/gpl-3.0.txt



INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).  
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G.  Metric learning to rank.  
    Proceedings of the 27th annual International Conference 
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.

If you use the Robust MLR code (rmlr_train), please cite:

[2] Lim, D.K.H., McFee, B. and Lanckriet, G.R.G. Robust structural metric learning.
    Proceedings of the 30th annual International Conference on Machine Learning (ICML),
    2013.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later.  Because it makes extensive use
of the "bsxfun" function, earlier versions of Matlab will not work.

If you have an older Matlab installation, you can install an alternative bsxfun
implementation by going to

http://www.mathworks.com/matlabcentral/fileexchange/23005 ,

however, this may not work, and it will certainly be slower than the native
bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary functions "cummax" and "binarysearch" to accelerate
certain steps of the optimization procedure.  The source code for these
functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems), you can simply type

cd util
make
cd ..


3. Running the demo

To test the installation, first add the path to MLR to your Matlab environment:

    >> addpath(genpath('/path/to/mlr/'));

Then, run the demo script:

    >> mlr_demo

from within Matlab.  The demo generates a random training/test split of the Wine data set
(http://archive.ics.uci.edu/ml/datasets/Wine)
and learns a metric with MLR.  Both native and learned metrics are displayed in a figure
with a scatter plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR.  In the
simplest mode, training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training data
(see below).

   [W, Xi, D] = mlr_train(X, Y, C,...)

       X = d*n data matrix
       Y = either n-by-1 label of vectors
           OR
           n-by-2 cell array where 
               Y{q,1} contains relevant indices for q, and
               Y{q,2} contains irrelevant indices for q
       
       C >= 0 slack trade-off parameter (default=1)

       W   = the learned metric, i.e., the inner product matrix of the learned 
                space can be computed by X' * W * X
       Xi  = slack value on the learned metric (see [1])
       D   = diagnostics


By default, MLR optimizes for Area Under the ROC Curve (AUC).  This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

   [W, Xi, D] = mlr_train(X, Y, C, LOSS)
       where LOSS is one of:
           'AUC':      Area under ROC curve (default)
           'KNN':      KNN accuracy*
           'Prec@k':   Precision-at-k
           'MAP':      Mean Average Precision
           'MRR':      Mean Reciprocal Rank
           'NDCG':     Normalized Discounted Cumulative Gain

    *Note:  KNN is correct only for binary classification problems; in 
            practice, Prec@k is usually a better alternative.


For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation of
the ranked list.  Thiscan be done by setting the k parameter:

   [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)
       where k is the number of neighbors for Prec@k or NDCG
       (default=3)


By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of the
eigenvalues.  This can be changed to one of several alternatives:

   [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)
       where REG defines the regularization on W, and is one of:
           0:          no regularization
           1:          1-norm: trace(W)            (default)
           2:          2-norm: trace(W' * W)
           3:          Kernel: trace(W * X), assumes X is square 
                                and positive-definite

    The last setting, "kernel", is appropriate for regularizing metrics learned
    from kernel matrices.


W corresponds to a linear projection matrix (rotation and scaling).  To learn a
restricted model which may only scale, but not rotate the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

   [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)
       Diagonal = 0:   learn a full d-by-d W       (default)
       Diagonal = 1:   learn diagonally-constrained W (d-by-1)
       
    Note: the W returned in this case will be the d-by-1 vector corresponding
          to the main diagonal of a full metric, not the full d-by-d matrix.



Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems.  Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at each
call to the cutting plane subroutine.  This results in faster, but less
accurate, optimization:

   [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)
       where B > 0 enables stochastic optimization with batch size B



TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate performance
across all measures by using the "mlr_test" function:

   Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

       W       = d-by-d positive semi-definite matrix
       test_k  = vector of k-values to use for KNN/Prec@k/NDCG
       Xtrain  = d-by-n matrix of training data
       Ytrain  = n-by-1 vector of training labels
                   OR
                 n-by-2 cell array where
                   Y{q,1} contains relevant indices (in 1..n) for point q
                   Y{q,2} contains irrelevant indices (in 1..n) for point q
       Xtest   = d-by-m matrix of testing data
       Ytest   = m-by-1 vector of training labels, or m-by-2 cell array 
                   If using the cell version, indices correspond to
                   the training set, and must lie in the range (1..n)
               

   The output structure Perf contains the mean score for:
       AUC, KNN, Prec@k, MAP, MRR, NDCG,
   as well as the effective dimensionality of W, and
   the best-performing k-value for KNN, Prec@k, and NDCG.

   For information retrieval settings, consider Xtrain/Ytrain as the corpus
   and Xtest/Ytest as the queries.

By testing with 
    Wnative = []
you can quickly evaluate the performance of the native (Euclidean) metric.


MULTIPLE KERNEL LEARNING
------------------------

As of version 1.1, MLR supports learning (multiple) kernel metrics, using the method described in

[3] Galleguillos, Carolina, McFee, Brian, Belongie, Serge, and Lanckriet, G.R.G.  
    From region similarity to category discovery.
    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

For m kernels, input data must be provided in the form of an n-by-n-by-m matrix K, where K(:,:,i) 
contains the i'th kernel matrix of the training data.  Training the metric is similar to the 
single-kernel case, except that the regularization parameter should be set to 3 (kernel regularization), 
eg:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3);

returns an n-by-n-by-m matrix W which can be used directly with mlr_test.  Multiple kernel learning also supports
diagonally constrained learning, eg:
    
    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3, 1);

returns an n-by-m matrix W, where W(:,i) is the main diagonal of the i'th kernels metric.  Again, this W can be fed
directly into mlr_test.

For testing with multiple kernel data, Ktest should be an n-by-nTest-by-m matrix, where K(:,:,i) is the
training-by-test slice of the i'th kernel matrix.

FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to 

Brian McFee <bmcfee@cs.ucsd.edu>