# **Optimizing Multi-label AUROC loss on Chest X-Ray Dataset (CheXpert)**

**Author**: Zhuoning Yuan

**Introduction**

In this tutorial, you will learn how to quickly train a DenseNet121 model by optimizing AUROC with our novel AUC-margin loss (`AUCM_MultiLabel`) and the PESG optimizer on a chest X-ray dataset, [CheXpert](https://stanfordmlgroup.github.io/competitions/chexpert/). After completing this tutorial, you should be able to use LibAUC to train your own models on your own datasets.



**Useful Resources**:
* Website: https://libauc.org
* Github: https://github.com/Optimization-AI/LibAUC

**Reference**:  

If you find this tutorial helpful in your work, please acknowledge our library and cite the following paper:

<pre>
@inproceedings{yuan2021large,
  title={Large-scale robust deep auc maximization: A new surrogate loss and empirical studies on medical image classification},
  author={Yuan, Zhuoning and Yan, Yan and Sonka, Milan and Yang, Tianbao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3040--3049},
  year={2021}
}
</pre>


# **Installing LibAUC**

In [None]:
!pip install libauc==1.2.0

# **Downloading CheXpert**
 
*   To request access to the dataset, apply on the CheXpert website: https://stanfordmlgroup.github.io/competitions/chexpert/
*   In this tutorial, we use the smaller version of the dataset with lower image resolution, i.e., *CheXpert-v1.0-small.zip*.



In [None]:
# Assumes Google Drive is mounted in Colab and the archive is stored at the path below
!cp /content/drive/MyDrive/chexpert-dataset/CheXpert-v1.0-small.zip /content/
!mkdir -p /content/CheXpert
!unzip CheXpert-v1.0-small.zip -d /content/CheXpert/
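Before building the datasets, it can help to confirm that the archive extracted to the layout the later cells expect. The sketch below is a minimal check that assumes only what those cells use: a root directory containing `train.csv` and `valid.csv`. The helper `missing_chexpert_files` is hypothetical and not part of LibAUC.

```python
import os

def missing_chexpert_files(root, required=("train.csv", "valid.csv")):
    """Return the required files that are absent under `root` (hypothetical helper)."""
    return [name for name in required if not os.path.isfile(os.path.join(root, name))]

root = './CheXpert/CheXpert-v1.0-small/'
missing = missing_chexpert_files(root)
if missing:
    print('Missing under %s: %s' % (root, ', '.join(missing)))
else:
    print('Found train.csv and valid.csv under', root)
```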


# **Importing LibAUC**

In [None]:
from libauc.losses import AUCM_MultiLabel, CrossEntropyLoss
from libauc.optimizers import PESG, Adam
from libauc.models import densenet121 as DenseNet121
from libauc.datasets import CheXpert
from libauc.metrics import auc_roc_score # for multi-task

from PIL import Image
import numpy as np
import torch 
import torchvision.transforms as transforms
from torch.utils.data import Dataset
import torch.nn.functional as F   

# **Reproducibility**

In [None]:
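import random

def set_all_seeds(SEED):
    # REPRODUCIBILITY: seed Python, NumPy and PyTorch (torch.manual_seed also seeds CUDA devices)
    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)
    # Use deterministic cuDNN kernels (may slow down training)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False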

# **Datasets, Loss and Optimizer**

In [None]:
root = './CheXpert/CheXpert-v1.0-small/'
# class_index=-1 loads all 5 disease labels (multi-label setting)
trainSet = CheXpert(csv_path=root+'train.csv', image_root_path=root, use_upsampling=False, use_frontal=True, image_size=224, mode='train', class_index=-1, verbose=False)
testSet = CheXpert(csv_path=root+'valid.csv', image_root_path=root, use_upsampling=False, use_frontal=True, image_size=224, mode='valid', class_index=-1, verbose=False)
trainloader = torch.utils.data.DataLoader(trainSet, batch_size=32, num_workers=2, shuffle=True)
testloader = torch.utils.data.DataLoader(testSet, batch_size=32, num_workers=2, shuffle=False)

# check imbalance ratio for each task
print(trainSet.imratio_list)

# parameters
SEED = 123
BATCH_SIZE = 32
lr = 0.1 
epoch_decay = 2e-3
weight_decay = 1e-5
margin = 1.0
total_epochs = 2

# model
set_all_seeds(SEED)
model = DenseNet121(pretrained=True, last_activation=None, activations='relu', num_classes=5)
model = model.cuda()

# define loss & optimizer
loss_fn = AUCM_MultiLabel(num_classes=5)
optimizer = PESG(model, 
                 loss_fn=loss_fn,
                 lr=lr, 
                 margin=margin, 
                 epoch_decay=epoch_decay, 
                 weight_decay=weight_decay)

[0.12241724991755092, 0.32190737435022276, 0.06796421448276946, 0.31190878776298636, 0.402555659671146]
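Each number in the list printed above is the imbalance ratio of one task. We read it as the fraction of positive labels for that task in the training set; this is our understanding of how LibAUC defines the ratio, so check the library source to confirm. The pure-Python sketch below computes that quantity from a 0/1 label matrix; `positive_ratios` and the toy labels are hypothetical.

```python
def positive_ratios(labels):
    """Fraction of positive (1) labels in each column of a list of 0/1 label rows."""
    n_rows = len(labels)
    n_tasks = len(labels[0])
    return [sum(row[t] for row in labels) / n_rows for t in range(n_tasks)]

# Toy label matrix: 4 images x 5 tasks
labels = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(positive_ratios(labels))  # [0.25, 0.25, 0.0, 0.5, 0.75]
```

Low values such as the third task's 0.068 above mean only a few positives per hundred images, i.e., a heavily imbalanced task.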


# **Multi-label Training**
Optimizing the multi-label AUROC loss (here, 5 tasks)
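The AUC-margin (AUCM) objective from the cited paper is a min-max problem over the model weights, two auxiliary scalars $a, b$, and a dual variable $\alpha \ge 0$; since these extra variables belong to the loss, this is presumably why `loss_fn` is passed to `PESG` above. The pure-Python sketch below evaluates one task's objective on a batch, with $p$ the task's positive ratio and $m$ the margin, written as batch averages of indicator-weighted terms. It then combines tasks by averaging. The function names and the averaging across tasks are illustrative assumptions; `AUCM_MultiLabel` keeps its own $a$, $b$, $\alpha$ per task and may aggregate tasks differently.

```python
def aucm_task_loss(scores, labels, a, b, alpha, p, margin=1.0):
    """AUC-margin objective for one task on a batch (illustrative sketch).

    scores: predicted probabilities in [0, 1]; labels: 0/1 targets.
    a, b: auxiliary scalars tracking positive / negative scores.
    alpha: dual variable (>= 0); p: positive ratio of the task.
    """
    n = len(scores)
    pos = [1.0 if y == 1 else 0.0 for y in labels]
    neg = [1.0 if y == 0 else 0.0 for y in labels]
    # Squared deviation of positive scores from a and of negative scores from b.
    var_pos = sum((s - a) ** 2 * ip for s, ip in zip(scores, pos)) / n
    var_neg = sum((s - b) ** 2 * ineg for s, ineg in zip(scores, neg)) / n
    # When the batch's positive fraction is close to p, p*(1-p)*margin + gap is roughly
    # p*(1-p)*(margin - (mean positive score - mean negative score)).
    gap = sum(p * s * ineg - (1 - p) * s * ip for s, ip, ineg in zip(scores, pos, neg)) / n
    return ((1 - p) * var_pos + p * var_neg
            + 2 * alpha * (p * (1 - p) * margin + gap)
            - p * (1 - p) * alpha ** 2)


def aucm_multilabel_loss(score_rows, label_rows, params, ratios, margin=1.0):
    """Average the per-task losses; params[t] = (a, b, alpha) for task t (assumption)."""
    n_tasks = len(ratios)
    total = 0.0
    for t in range(n_tasks):
        scores = [row[t] for row in score_rows]
        labels = [row[t] for row in label_rows]
        a, b, alpha = params[t]
        total += aucm_task_loss(scores, labels, a, b, alpha, ratios[t], margin)
    return total / n_tasks


# Toy batch: 2 images x 2 tasks, same (a, b, alpha) for both tasks
scores = [[0.9, 0.2], [0.1, 0.7]]
labels = [[1, 0], [0, 1]]
print(aucm_multilabel_loss(scores, labels, [(0.5, 0.5, 0.1)] * 2, ratios=[0.5, 0.5]))
```

Swapping the scores so that positives rank below negatives increases the loss, which is the behavior the margin term is meant to penalize.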




In [None]:
# training
print('Start Training')
print('-'*30)

best_val_auc = 0
for epoch in range(total_epochs):
    # after each epoch: divide the learning rate by 10 and update the regularizer
    if epoch > 0:
        optimizer.update_regularizer(decay_factor=10)

    for idx, data in enumerate(trainloader):
        train_data, train_labels = data
        train_data, train_labels = train_data.cuda(), train_labels.cuda()
        y_pred = model(train_data)
        y_pred = torch.sigmoid(y_pred)  # map logits to scores in [0, 1]
        loss = loss_fn(y_pred, train_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # validation every 400 iterations
        if idx % 400 == 0:
            model.eval()
            with torch.no_grad():
                test_pred = []
                test_true = []
                for jdx, data in enumerate(testloader):
                    test_data, test_labels = data
                    test_data = test_data.cuda()
                    y_pred = model(test_data)
                    y_pred = torch.sigmoid(y_pred)
                    test_pred.append(y_pred.cpu().detach().numpy())
                    test_true.append(test_labels.numpy())

                test_true = np.concatenate(test_true)
                test_pred = np.concatenate(test_pred)
                # mean AUROC over the 5 tasks
                val_auc_mean = np.mean(auc_roc_score(test_true, test_pred))
                model.train()

                # keep the checkpoint with the best mean validation AUROC
                if best_val_auc < val_auc_mean:
                    best_val_auc = val_auc_mean
                    torch.save(model.state_dict(), 'aucm_pretrained_model.pth')

                print('Epoch=%s, BatchID=%s, Val_AUC=%.4f, Best_Val_AUC=%.4f'%(epoch, idx, val_auc_mean, best_val_auc))

Start Training
------------------------------
Epoch=0, BatchID=0, Val_AUC=0.5558, Best_Val_AUC=0.5558
Epoch=0, BatchID=400, Val_AUC=0.8283, Best_Val_AUC=0.8283
Epoch=0, BatchID=800, Val_AUC=0.8074, Best_Val_AUC=0.8283
Epoch=0, BatchID=1200, Val_AUC=0.8528, Best_Val_AUC=0.8528
Epoch=0, BatchID=1600, Val_AUC=0.8337, Best_Val_AUC=0.8528
Epoch=0, BatchID=2000, Val_AUC=0.8420, Best_Val_AUC=0.8528
Epoch=0, BatchID=2400, Val_AUC=0.8589, Best_Val_AUC=0.8589
Epoch=0, BatchID=2800, Val_AUC=0.8475, Best_Val_AUC=0.8589
Epoch=0, BatchID=3200, Val_AUC=0.8702, Best_Val_AUC=0.8702
Epoch=0, BatchID=3600, Val_AUC=0.8453, Best_Val_AUC=0.8702
Epoch=0, BatchID=4000, Val_AUC=0.8552, Best_Val_AUC=0.8702
Epoch=0, BatchID=4400, Val_AUC=0.8366, Best_Val_AUC=0.8702
Epoch=0, BatchID=4800, Val_AUC=0.8603, Best_Val_AUC=0.8702
Epoch=0, BatchID=5200, Val_AUC=0.8700, Best_Val_AUC=0.8702
Epoch=0, BatchID=5600, Val_AUC=0.8842, Best_Val_AUC=0.8842
Reducing learning rate to 0.01000 @ T=5970!
Updating regularizer @ T=5970!
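The last two log lines come from `optimizer.update_regularizer(decay_factor=10)`: the learning rate drops from 0.1 to 0.01 and the regularizer is updated. As we understand PESG, the `epoch_decay` term pulls the weights toward a reference point fixed at the start of each stage, and updating the regularizer moves that reference to the current weights. The sketch below illustrates this on plain Python lists. `PesgSketch` is a hypothetical, simplified class, not LibAUC's implementation; it treats all primal variables (model weights and $a$, $b$) as one list `w`, uses the same step size for $\alpha$, and omits any momentum or other options the library may apply.

```python
class PesgSketch:
    """Simplified primal-dual step in the style of PESG (illustrative only)."""

    def __init__(self, w, lr=0.1, epoch_decay=2e-3, weight_decay=1e-5):
        self.w = list(w)          # all primal variables (weights, a, b)
        self.w_ref = list(w)      # reference point for the epoch-decay term
        self.alpha = 0.0          # dual variable, kept >= 0
        self.lr = lr
        self.epoch_decay = epoch_decay
        self.weight_decay = weight_decay

    def step(self, grad_w, grad_alpha):
        # Gradient descent on the primal variables, with the epoch-decay pull
        # toward w_ref and ordinary weight decay toward zero.
        self.w = [
            wi - self.lr * (gi + self.epoch_decay * (wi - ri) + self.weight_decay * wi)
            for wi, gi, ri in zip(self.w, grad_w, self.w_ref)
        ]
        # Gradient ascent on alpha, projected back onto alpha >= 0.
        self.alpha = max(0.0, self.alpha + self.lr * grad_alpha)

    def update_regularizer(self, decay_factor=None):
        if decay_factor is not None:
            self.lr /= decay_factor
        # New stage: anchor the epoch-decay term at the current weights.
        self.w_ref = list(self.w)


opt = PesgSketch([1.0, -2.0], lr=0.1)
opt.step(grad_w=[0.5, 0.5], grad_alpha=1.0)
opt.update_regularizer(decay_factor=10)  # mirrors the call made after each epoch in the training loop
print(opt.lr)  # 0.01
```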

# **Evaluation**

In [None]:
# show the AUROC for each task (from the most recent validation pass, not necessarily the saved best model)
auc_roc_score(test_true, test_pred)
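`auc_roc_score` returns one AUROC per task, and the training loop averages these with `np.mean` to pick the best checkpoint. To show what each per-task number measures, the sketch below computes AUROC as the fraction of (positive, negative) pairs ranked correctly, counting ties as one half (the Mann-Whitney formulation). It is an illustrative stand-in, not LibAUC's code; `auroc`, `per_task_auroc` and the toy data are hypothetical.

```python
import math

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError('AUROC needs at least one positive and one negative')
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def per_task_auroc(score_rows, label_rows):
    """One AUROC per column, analogous to the per-task output above."""
    n_tasks = len(label_rows[0])
    return [auroc([r[t] for r in score_rows], [r[t] for r in label_rows])
            for t in range(n_tasks)]

# Toy data: 4 images x 2 tasks
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
logits = [[2.0, -1.0], [0.5, 0.3], [1.0, 0.0], [-1.0, 0.0]]
probs = [[1 / (1 + math.exp(-z)) for z in row] for row in logits]
print(per_task_auroc(probs, y_true))  # [1.0, 0.875]
```

Because the sigmoid is monotone, the per-task AUROC is the same whether computed on logits or on probabilities. The sigmoid in the training loop matters for the AUCM loss, whose margin (as we understand the paper) is defined for scores in [0, 1], rather than for this evaluation metric.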