## Importing libraries

From the *Ordinal Deep Learning* package (`dlordinal`), we import the dataset classes that allow us to work with ordinal datasets.

We also import utilities from *PyTorch*, *torchvision*, *scikit-learn*, and *NumPy* to load, transform, and split the data.


In [1]:
from dlordinal.datasets import FGNet, Adience
from torchvision.transforms import ToTensor, Compose
from torchvision.datasets import ImageFolder
from torch.utils.data import Subset
from sklearn.model_selection import StratifiedShuffleSplit
import numpy as np

## FGNet

To use the [FGNet dataset](https://yanweifu.github.io/FG_NET_data/), create an instance of `FGNet` with the following arguments:

* __root__: the path where the dataset will be downloaded and extracted.
* __download__: whether the dataset should be downloaded.
* __process_data__: whether the class should preprocess the data, for users who do not want to run their own preprocessing.

In [2]:
fgnet = FGNet(root='./datasets/fgnet', download=True, process_data=True)

Files already downloaded and verified
Files already processed and verified
Files already split and verified


Once the data has been downloaded, extracted, and preprocessed, we can load it to train and validate a model.

Extracting and processing the dataset creates a folder named *FGNET*, which contains the *train* and *test* folders.

In [3]:
train_data = ImageFolder(
    root="./datasets/fgnet/FGNET/train", transform=Compose([ToTensor()])
)
test_data = ImageFolder(
    root="./datasets/fgnet/FGNET/test", transform=Compose([ToTensor()])
)

As an additional data processing step, we show how to obtain the number of classes in the dataset and how to create a validation partition.

In [4]:
# Obtain the number of classes
num_classes = len(train_data.classes)

# Create a validation split
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.15, random_state=0)
sss_splits = list(sss.split(X=np.zeros(len(train_data)), y=train_data.targets))
train_idx, val_idx = sss_splits[0]

# Create subsets for training and validation.
# Build val_data first: both index arrays refer to the original train_data.
val_data = Subset(train_data, val_idx)
train_data = Subset(train_data, train_idx)
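
To illustrate what a stratified hold-out does, the following sketch reimplements the idea using only the standard library. It is not scikit-learn's implementation: the helper `stratified_holdout` and the toy labels are hypothetical, and the per-class rounding rule is an assumption. It shows that each class contributes roughly the same fraction to the validation set, and that both index lists refer to positions in the original dataset and do not overlap.

```python
import random
from collections import Counter, defaultdict


def stratified_holdout(targets, test_size=0.15, seed=0):
    """Split indices so each class keeps roughly the same proportion.

    Illustrative helper (hypothetical); not scikit-learn's algorithm.
    """
    rng = random.Random(seed)

    # Group sample positions by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Assumption: round per class, aim for at least one validation
        # sample, and never take a class's last sample out of training.
        n_val = min(max(1, round(len(indices) * test_size)), len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels: three ordinal classes with 100, 60 and 40 samples.
targets = [0] * 100 + [1] * 60 + [2] * 40
train_idx, val_idx = stratified_holdout(targets, test_size=0.15, seed=0)

# Count how many validation samples each class received.
val_counts = Counter(targets[i] for i in val_idx)
print(len(train_idx), len(val_idx), dict(val_counts))
```

Because the indices always point into the original dataset, any subset built from them should wrap the original dataset object rather than a previously created subset.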

## Adience

Unlike FGNet, the [Adience dataset](https://talhassner.github.io/home/projects/Adience/Adience-data.html) cannot be downloaded automatically, so the following files must be obtained manually:

* Download the files *fold_0_data.txt* through *fold_4_data.txt* and place them in a common folder.
* Download *aligned.tar.gz*.
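
Before creating the dataset instance, it may help to confirm that the manually downloaded files are in place. The following is a minimal sketch: the helper `missing_adience_files` is hypothetical (it is not part of `dlordinal`), and the two paths are assumptions that should be adjusted to wherever the files were saved.

```python
from pathlib import Path


def missing_adience_files(folds_dir, archive_path):
    """Return the expected Adience files that are not present on disk."""
    folds_dir = Path(folds_dir)
    # The five fold files: fold_0_data.txt ... fold_4_data.txt
    expected = [folds_dir / f"fold_{i}_data.txt" for i in range(5)]
    expected.append(Path(archive_path))
    return [path for path in expected if not path.is_file()]


# Paths are assumptions; adjust them to your own layout.
missing = missing_adience_files(
    "./datasets/adience/folds", "./datasets/adience/aligned.tar.gz"
)
if missing:
    print("Missing files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All Adience files found.")
```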

Once these instructions have been followed, create an instance of `Adience` with the following arguments:

* __extract_file_path__: the path where the file *aligned.tar.gz* is located.
* __extract__: whether the file *aligned.tar.gz* should be extracted.
* __folds_path__: the path to the text files containing the indices for the five-fold cross-validation tests using all faces.
* __images_path__: the path where the archive will be extracted.
* __transformed_images_path__: the path where the resized images will be stored. Images are resized to a height of 128 pixels, with the width adjusted automatically to maintain the original aspect ratio.
* __partition_path__: the path where the images will be stored, separated by age range.
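
The resizing rule described for `transformed_images_path` can be sketched as a small calculation. This is only an illustration of fixing the height and scaling the width proportionally: the function `resized_size` is hypothetical, the example image sizes are made up, and rounding to the nearest pixel is an assumption that may differ from what `dlordinal` does internally.

```python
def resized_size(width, height, target_height=128):
    """Return (new_width, new_height) with the height fixed and aspect ratio kept."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Assumption: round to the nearest pixel and never go below 1 pixel wide.
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


print(resized_size(816, 816))  # a square image stays square
print(resized_size(640, 480))  # a landscape image stays wider than tall
```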



In [5]:
adience = Adience(
    extract_file_path="./datasets/adience/aligned.tar.gz",
    extract=True,
    folds_path="./datasets/adience/folds",
    images_path="./datasets/adience/aligned",
    transformed_images_path="./datasets/adience/transformed_images",
    partition_path="./datasets/adience/partitions",
)

File already extracted.
Fold 0: discarding 104 entries (2.3%)
Fold 1: discarding 456 entries (12.2%)
Fold 2: discarding 594 entries (15.3%)
Fold 3: discarding 377 entries (10.9%)
Fold 4: discarding 137 entries (3.6%)
Resizing images...


100%|██████████| 17702/17702 [06:47<00:00, 43.44it/s]
20it [03:16,  9.84s/it]


After the dataset has been extracted and the images have been processed and partitioned, the data is loaded.

In [7]:
data = ImageFolder(
    root="./datasets/adience/partitions", transform=Compose([ToTensor()])
)

As you can see, the complete dataset has been loaded, so a short snippet is provided to partition it in a stratified way, making a *hold-out* split in which 80% of the images are used for training and 20% for testing.

In [9]:
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
sss_splits = list(sss.split(X=np.zeros(len(data)), y=data.targets))
train_idx, test_idx = sss_splits[0]

# Create subsets for training and test from the full Adience dataset
train_data = Subset(data, train_idx)
test_data = Subset(data, test_idx)