# 2. Load from Hub and transform

## Setup

In [None]:
from detection_datasets import DetectionDataset, available_in_hub

## Load from Hub

In [None]:
available_in_hub()

In [None]:
dd = DetectionDataset().from_hub(dataset_name='fashionpedia')

## Analyze the dataset

In [None]:
dd.data

In [None]:
dd.show(image_id=25)

In [None]:
dd.categories

## Transform the dataset

### Change the categories

This dataset has very fine-grained labels, so to create an easier task we may want to focus on higher-level labels, like "clothing" instead of "pants".  

For example, let us assume that we are only interested in detecting these 4 classes:  
- Clothing  
- Shoes  
- Bags  
- Accessories  

We can start by printing the list of existing category names and use it to create our mapping dictionary:

In [None]:
dd.category_names

In [None]:
mapping = {
    'shirt, blouse': 'clothing',
    'top, t-shirt, sweatshirt': 'clothing',
    'sweater': 'clothing',
    'cardigan': 'clothing',
    'jacket': 'clothing',
    'vest': 'clothing',
    'pants': 'clothing',
    'shorts': 'clothing',
    'skirt': 'clothing',
    'coat': 'clothing',
    'dress': 'clothing',
    'jumpsuit': 'clothing',
    'cape': 'clothing',
    'glasses': 'accessories',
    'hat': 'accessories',
    'headband, head covering, hair accessory': 'accessories',
    'tie': 'accessories',
    'glove': 'accessories',
    'belt': 'accessories',
    'tights, stockings': 'accessories',
    'sock': 'accessories',
    'shoe': 'shoes',
    'bag, wallet': 'bags',
    'scarf': 'accessories',
}

We then map the existing classes to the new classes using this Python dictionary.  

Three things happen when this new mapping is applied:  
- The existing categories are replaced by their mapped category names
- The existing categories that have no equivalent in this mapping are dropped  
- The category_id is recreated using the alphabetical order of the new categories (here 'accessories' gets id 0, 'bags' gets id 1, and so on)
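
To make these three effects concrete, here is a minimal pure-Python sketch on toy data. It does not use the library and is not its implementation: the `remap` helper, the tuple layout, and the toy annotations are assumptions made for illustration only.

```python
# Toy annotations as (image_id, category_name) pairs; illustrative data only.
annotations = [
    (1, "pants"),
    (1, "shoe"),
    (2, "bag, wallet"),
    (2, "sleeve"),  # has no entry in the mapping below, so it will be dropped
    (3, "hat"),
]

# A reduced version of the mapping dictionary defined above.
toy_mapping = {
    "pants": "clothing",
    "shoe": "shoes",
    "bag, wallet": "bags",
    "hat": "accessories",
}


def remap(annotations, mapping):
    """Hypothetical helper mimicking the three effects described above."""
    # Effects 1 and 2: rename mapped categories, drop unmapped ones.
    kept = [(img, mapping[name]) for img, name in annotations if name in mapping]
    # Effect 3: rebuild ids from the alphabetical order of the new names.
    new_ids = {name: i for i, name in enumerate(sorted(set(mapping.values())))}
    return [(img, name, new_ids[name]) for img, name in kept], new_ids


remapped, new_ids = remap(annotations, toy_mapping)
print(new_ids)   # {'accessories': 0, 'bags': 1, 'clothing': 2, 'shoes': 3}
print(remapped)  # the 'sleeve' annotation is gone
```

Comparing `dd.categories` before and after calling `map_categories` is the reliable way to check what the library actually produced.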

In [None]:
dd.map_categories(mapping=mapping)

In [None]:
dd.categories

In [None]:
dd.data

In [None]:
dd.show(image_id=25)

### Change the splits

The fashionpedia dataset comes with the following splits:

In [None]:
dd.splits

In [None]:
dd.split_proportions

We can redistribute the images across the splits by passing new proportions:

In [None]:
dd.split(splits=[0.9, 0.05, 0.05])

In [None]:
dd.splits

In [None]:
dd.split_proportions
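
As a rough illustration of what proportions such as `[0.9, 0.05, 0.05]` mean, the sketch below partitions a list of image ids in plain Python. It is independent of the library, which may shuffle or assign images differently. The `split_ids` helper, the split names, and the seed are hypothetical.

```python
import random


def split_ids(image_ids, proportions, names=("train", "val", "test"), seed=0):
    """Hypothetical helper: shuffle ids, then cut them by proportion."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n = len(ids)
    parts, start = {}, 0
    for i, (name, p) in enumerate(zip(names, proportions)):
        # The last split takes the remainder so that no image is lost to rounding.
        end = n if i == len(proportions) - 1 else start + round(p * n)
        parts[name] = ids[start:end]
        start = end
    return parts


parts = split_ids(range(1000), [0.9, 0.05, 0.05])
print({name: len(ids) for name, ids in parts.items()})
# {'train': 900, 'val': 50, 'test': 50}
```

Because every image lands in exactly one split, the proportions reported afterwards should sum to 1.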

## Push the new dataset to the Hub

In [None]:
dd.to_hub(dataset_name='fashionpedia_4_categories', repo_name='detection-datasets')