*   Author: Zhuoning Yuan
*   Project: https://github.com/yzhuoning/LibAUC


# **01. Installing LibAUC**

In [None]:
!pip install libauc

Processing ./libauc-1.0.7-py3-none-any.whl
Installing collected packages: libauc
Successfully installed libauc-1.0.7


# **02. Loading Datasets**

Loading the datasets requires [TensorFlow](https://www.tensorflow.org/install) (version > 2.0.0).
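The version requirement can be checked before loading anything. The snippet below is a minimal sketch using only the standard library; the helper names `parse_release`, `installed_version`, and `is_newer_than` are hypothetical and not part of LibAUC, and the check assumes the package is installed under the distribution name `tensorflow`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '2.10.0rc1' into (2, 10, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '2.4' to (2, 4, 0)
        parts.append(0)
    return tuple(parts)


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


def is_newer_than(found, minimum="2.0.0"):
    """True only if `found` is a version strictly newer than `minimum`."""
    return found is not None and parse_release(found) > parse_release(minimum)


# Assumption: the distribution is named 'tensorflow'; variants such as
# 'tensorflow-cpu' would need their own lookup.
tf_version = installed_version("tensorflow")
print("tensorflow:", tf_version, "| newer than 2.0.0:", is_newer_than(tf_version))
```

If the last line reports `False`, installing or upgrading TensorFlow before running the dataset cells avoids an import error later.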




### CIFAR10
* **Description**: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
* **Homepage:** https://www.cs.toronto.edu/~kriz/cifar.html



In [None]:
from libauc.datasets import CIFAR10
(train_data, train_label), (test_data, test_label) = CIFAR10()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


### CIFAR100
* **Description**: This dataset is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses.
* **Homepage:** https://www.cs.toronto.edu/~kriz/cifar.html


In [None]:
from libauc.datasets import CIFAR100
(train_data, train_label), (test_data, test_label) = CIFAR100()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz


### CAT_vs_DOG
* **Description**: The training archive contains 25,000 images of dogs and cats. The task is to train on these files and predict the label of each image (1 = dog, 0 = cat).
* **Homepage:** https://www.kaggle.com/c/dogs-vs-cats/data



In [None]:
from libauc.datasets import CAT_VS_DOG
(train_data, train_label), (test_data, test_label) = CAT_VS_DOG()


### STL10
* **Description**: The STL-10 dataset consists of 5000 96x96 colour training images in 10 classes, with 500 images per class. There are 8000 test images, with 800 images per class.
* **Homepage:** https://ai.stanford.edu/~acoates/stl10/



In [None]:
from libauc.datasets import STL10
(train_data, train_label), (test_data, test_label) = STL10()

Downloading data from http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz


# **03. Constructing Imbalanced Datasets**



Import the *imbalance_generator* function:

In [None]:
from libauc.datasets import imbalance_generator 

Set *random_seed=123* and *imratio=0.1* (the imbalance ratio):

In [None]:
SEED = 123
imratio = 0.1 # positive_samples/(total_samples)

The resulting imbalanced training set contains 2777 positive images and 25000 negative images. The test set is left unchanged in size and stays balanced (5000 positive, 5000 negative).

In [None]:
from libauc.datasets import CIFAR10
(train_data, train_label), (test_data, test_label) = CIFAR10()
(train_images, train_labels) = imbalance_generator(train_data, train_label, imratio=imratio, shuffle=True, random_seed=SEED)
(test_images, test_labels) = imbalance_generator(test_data, test_label, is_balanced=True, random_seed=SEED)

NUM_SAMPLES: [27777], POS:NEG: [2777 : 25000], POS_RATIO: 0.1000
NUM_SAMPLES: [10000], POS:NEG: [5000 : 5000], POS_RATIO: 0.5000
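The counts above follow from the ratio definition: with all 25000 negatives kept, the number of positives `n_pos` must satisfy `n_pos / (n_pos + 25000) = 0.1`, which gives `n_pos = 25000 * 0.1 / 0.9 ≈ 2777`. The snippet below is a rough NumPy sketch of that arithmetic on synthetic labels. The function `make_imbalanced` is hypothetical and is not LibAUC's implementation; in particular, mapping the lower half of the class ids to negative and the upper half to positive is an assumption made for illustration.

```python
import numpy as np


def make_imbalanced(labels, imratio=0.1, seed=123):
    """Hypothetical sketch: pick indices forming an imbalanced binary subset."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).reshape(-1)
    num_classes = int(labels.max()) + 1
    # Assumption: lower half of class ids -> negative (0), upper half -> positive (1).
    binary = (labels >= num_classes // 2).astype(np.int64)
    neg_idx = np.flatnonzero(binary == 0)
    pos_idx = np.flatnonzero(binary == 1)
    # Keep every negative; solve n_pos / (n_pos + n_neg) = imratio for n_pos.
    n_pos = int(len(neg_idx) * imratio / (1.0 - imratio))
    n_pos = min(n_pos, len(pos_idx))
    keep_pos = rng.choice(pos_idx, size=n_pos, replace=False)
    keep = np.concatenate([neg_idx, keep_pos])
    rng.shuffle(keep)  # mix positives and negatives
    return keep, binary[keep]


# Synthetic stand-in for the CIFAR-10 training labels: 5000 samples per class.
fake_labels = np.repeat(np.arange(10), 5000)
idx, y = make_imbalanced(fake_labels, imratio=0.1, seed=123)
num_pos, num_neg = int(y.sum()), int((y == 0).sum())
print(f"NUM_SAMPLES: {len(y)}, POS:NEG: {num_pos} : {num_neg}, "
      f"POS_RATIO: {num_pos / len(y):.4f}")
```

Under these assumptions the sketch selects 27777 samples in total, matching the training-set counts printed by *imbalance_generator* above.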


# **04. Preparing datasets for training with DataLoaders**

In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms
import numpy as np
from PIL import Image  # required by Image.fromarray in __getitem__

class ImageDataset(Dataset):
    def __init__(self, images, targets, image_size=32, crop_size=30, mode='train'):
        self.images = images.astype(np.uint8)
        self.targets = targets
        self.mode = mode
        # Training: random crop and horizontal flip, then resize back to image_size
        self.transform_train = transforms.Compose([
            transforms.ToTensor(),
            transforms.RandomCrop((crop_size, crop_size), padding=None),
            transforms.RandomHorizontalFlip(),
            transforms.Resize((image_size, image_size)),
        ])
        # Testing: no augmentation, only tensor conversion and resize
        self.transform_test = transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((image_size, image_size)),
        ])

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        target = self.targets[idx]
        image = Image.fromarray(image.astype('uint8'))
        if self.mode == 'train':
            image = self.transform_train(image)
        else:
            image = self.transform_test(image)
        return image, target
  

trainloader = DataLoader(ImageDataset(train_images, train_labels, mode='train'), batch_size=128, shuffle=True, num_workers=2, pin_memory=True)
testloader = DataLoader(ImageDataset(test_images, test_labels, mode='test'), batch_size=128, shuffle=False, num_workers=2,  pin_memory=True)
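Before training, it can help to pull a single batch and confirm its shape and positive fraction. The snippet below is a small sketch that uses synthetic tensors in place of the real images so it runs without downloading anything; the random data, the variable names, and the roughly 10% positive rate are assumptions for illustration only. The same `next(iter(...))` check should apply to `trainloader`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

rng = np.random.default_rng(123)

# Synthetic stand-ins: 1000 RGB images of size 32x32 and ~10% positive labels.
images = torch.from_numpy(rng.random((1000, 3, 32, 32), dtype=np.float32))
labels = torch.from_numpy((rng.random(1000) < 0.1).astype(np.float32))

# num_workers=0 keeps the example single-process.
loader = DataLoader(TensorDataset(images, labels), batch_size=128,
                    shuffle=True, num_workers=0)

# Fetch one batch and inspect it.
batch_images, batch_labels = next(iter(loader))
print("images:", tuple(batch_images.shape))
print("labels:", tuple(batch_labels.shape))
print("positive fraction in batch:", float(batch_labels.mean()))
```

With an imbalance ratio of 0.1 and a batch size of 128, a batch holds about 13 positives on average, so individual batches can vary noticeably in their positive fraction.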

Now, we are ready to train models using the new imbalanced dataset. 