The PyTorch Implementation of L-Softmax

This repository contains a new, clean, and enhanced PyTorch implementation of L-Softmax, proposed in the following paper:

Large-Margin Softmax Loss for Convolutional Neural Networks by Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang [PDF on arXiv] [original Caffe code by the authors]

L-Softmax modifies the softmax classification loss with an angular margin m to increase inter-class separability and intra-class compactness of the learned features.
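For readers who want to see what the margin does, here is a minimal sketch of the piecewise function psi(theta) from the paper, which replaces cos(theta) for the target class: psi(theta) = (-1)^k cos(m * theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]. It is written in plain Python for illustration only; the function name psi and its signature are assumptions and are not taken from this repository's code.

```python
import math


def psi(theta: float, m: int) -> float:
    """Large-margin replacement for cos(theta), with theta in [0, pi].

    psi(theta) = (-1)^k * cos(m * theta) - 2k, where k is the index of the
    interval [k*pi/m, (k+1)*pi/m] that contains theta.
    """
    if not 0.0 <= theta <= math.pi:
        raise ValueError("theta must lie in [0, pi]")
    # Index of the interval containing theta; capped so that theta = pi
    # falls into the last interval (k = m - 1).
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k


# With m = 1 the function reduces to the ordinary cosine (plain softmax).
# With larger m it drops faster, so the target class needs a smaller angle
# to reach the same logit.
for m in (1, 2, 3, 4):
    print(m, psi(math.pi / 4, m))
```

Because psi(theta) never exceeds cos(theta) and decreases monotonically on [0, pi], a larger m makes the classification condition stricter for the correct class.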

This re-implementation is based on the earlier PyTorch implementation here by jihunchoi and borrows some ideas from the TensorFlow implementation here by auroua. The main improvements are as follows:

  • Adds visualization of the learned features, as depicted in the original paper, via the vis argument in the code
  • Cleaner and more readable code
  • More comments in file for future readers
  • Variable names are now in better correspondence with the original paper
  • Using the updated PyTorch 0.4.1 syntax and API
  • Two models are provided: one to reproduce the visualization in Fig. 2 of the paper, and the original MNIST model
  • The lambda (beta variable in code) optimization missing in the earlier PyTorch code has been added (refer to section 5.1 in the original paper)
  • The numerical error of torch.acos has been addressed
  • Provided training logs in the Logs folder
  • Some other minor performance improvements
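Two of the points above, the torch.acos numerical error and the lambda (beta) optimization, can be sketched in plain Python. This README does not say how the repository handles either one, so the following is an assumption-laden illustration: safe_acos clamps its input before taking the arccosine, and blend_logit applies the weighting from section 5.1 of the paper, (lambda * |W||x|cos(theta) + |W||x|psi(theta)) / (1 + lambda). The names safe_acos and blend_logit and the eps value are hypothetical.

```python
import math


def safe_acos(cos_theta: float, eps: float = 1e-7) -> float:
    """acos with the input clamped to [-1 + eps, 1 - eps].

    eps is a hypothetical safety margin, not a value taken from the repository.
    """
    clamped = min(max(cos_theta, -1.0 + eps), 1.0 - eps)
    return math.acos(clamped)


def blend_logit(norm_product: float, cos_theta: float,
                psi_theta: float, beta: float) -> float:
    """Target-class logit: (beta*|W||x|cos(theta) + |W||x|psi(theta)) / (1 + beta)."""
    return norm_product * (beta * cos_theta + psi_theta) / (1.0 + beta)


# Rounding can push a computed cosine slightly past 1.0;
# math.acos would raise a ValueError on this input.
theta = safe_acos(1.0 + 1e-9)

# Example with m = 2 and theta = pi/3, which lies in the first interval,
# so psi(theta) = cos(2 * theta).
cos_theta = 0.5
psi_theta = math.cos(2 * safe_acos(cos_theta))
early = blend_logit(1.0, cos_theta, psi_theta, beta=1000.0)  # close to plain softmax
late = blend_logit(1.0, cos_theta, psi_theta, beta=0.0)      # pure L-Softmax margin
```

The clamp keeps the input away from exactly -1 and 1 because the derivative of acos is unbounded there, which is a common source of NaN gradients. A large beta early in training makes the loss behave like ordinary softmax, and reducing it gradually moves the loss toward the full margin, as the paper describes.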

Version compatibility

This code has been tested on Ubuntu 18.04 LTS using the PyCharm IDE and an NVIDIA 1080Ti GPU. The libraries and their corresponding versions are:

python = 3.6
pytorch = 0.4.1
torchvision = 0.2.1
matplotlib = 2.2.2
numpy = 1.14.3
scipy = 1.1.0

Network parameters

  • batch_size = 256
  • max epochs = 100
  • learning rate = 0.1 (0.01 at epoch 50 and 0.001 at epoch 65)
  • SGD with momentum = 0.9
  • weight_decay = 0.0005
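The learning-rate schedule above amounts to a step decay by a factor of 10 at epochs 50 and 65. A minimal sketch in plain Python follows. The function name learning_rate is hypothetical, and treating each milestone as the first epoch that uses the lower rate (with epochs counted from 0) is an assumption, since the list above does not specify the indexing.

```python
def learning_rate(epoch: int, base_lr: float = 0.1,
                  milestones=(50, 65), gamma: float = 0.1) -> float:
    """Step-decay schedule: multiply base_lr by gamma once per milestone reached."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** drops


# Print the rate on either side of each milestone.
for epoch in (0, 49, 50, 64, 65, 99):
    print(epoch, learning_rate(epoch))
```

In PyTorch, the same behaviour is usually obtained with torch.optim.SGD (lr=0.1, momentum=0.9, weight_decay=0.0005) combined with torch.optim.lr_scheduler.MultiStepLR (milestones=[50, 65], gamma=0.1); whether the repository uses this scheduler or adjusts the rate by hand is not stated here.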


Here are the test-set feature visualizations from training on MNIST with different margins: [figure]

  • This plot was generated using the smaller network proposed in the paper, for visualization purposes only, with batch size = 64, a constant learning rate of 0.01 for 10 epochs, and no weight decay regularization.

And here are the tabulated results of training on MNIST with the network proposed in the paper:

margin    test accuracy (this code)    test accuracy (paper)
m = 1     99.37%                       99.60%
m = 2     99.60%                       99.68%
m = 3     99.56%                       99.69%
m = 4     99.61%                       99.69%
  • The test accuracy values for this code are the maximum test accuracy from a single run with the network parameters above.