# What is pants?

We started this project to understand how segmentation models work. Identifying and segmenting pants in an image is an easy task for us humans: we can do it with almost 100% accuracy. But how well can machines do it?
To answer this existential question, we trained a segmentation model and ran some predictions. Our first choice was the Ultralytics YOLOv8 segmentation model, as it is well documented, open source, and, frankly, looks quite promising.


## Setting up Google Colab env

In [None]:
# clone the repo
!git clone https://github.com/LorenaDerezanin/WhatIsPants.git
# use the %cd magic: `!cd` runs in a subshell and does not change the notebook's working directory
%cd WhatIsPants

## Install requirements with pip

In [2]:
!pip install -r requirements.txt --no-cache-dir



## Prepare dataset

As our initial dataset we used the DeepFashion-MultiModal dataset: https://github.com/yumingj/DeepFashion-MultiModal

* of its 44,096 JPG images, 12,701 are annotated (classes, segmentation masks and bounding boxes)

In [8]:
# download image files
!wget --header 'Host: drive.usercontent.google.com' \
  --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' \
  --header 'Accept-Language: en-US,en;q=0.5' \
  --header 'Upgrade-Insecure-Requests: 1' \
  --header 'Sec-Fetch-Dest: document' \
  --header 'Sec-Fetch-Mode: navigate' \
  --header 'Sec-Fetch-Site: cross-site' \
  --header 'Sec-Fetch-User: ?1' 'https://drive.usercontent.google.com/download?id=1U2PljA7NE57jcSSzPs21ZurdIPXdYZtN&export=download&authuser=0&confirm=t&uuid=115a0cd6-8ddb-427b-9343-62b76c4d939c&at=APZUnTWiXg4LlG3A7QPA5DmjASX8%3A1715537567680' \
  --output-document 'images.zip'

--2024-05-12 20:19:36--  https://drive.usercontent.google.com/download?id=1U2PljA7NE57jcSSzPs21ZurdIPXdYZtN&export=download&authuser=0&confirm=t&uuid=115a0cd6-8ddb-427b-9343-62b76c4d939c&at=APZUnTWiXg4LlG3A7QPA5DmjASX8%3A1715537567680
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.250.181.193
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.250.181.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6822188210 (6,4G) [application/octet-stream]
Saving to: ‘images.zip’

images.zip            1%[                    ] 110,44M  12,2MB/s    eta 7m 16s ^C


In [9]:
# download annotation labels
!wget --header 'Host: drive.usercontent.google.com' \
  --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' \
  --header 'Accept-Language: en-US,en;q=0.5' \
  --header 'Upgrade-Insecure-Requests: 1' \
  --header 'Sec-Fetch-Dest: document' \
  --header 'Sec-Fetch-Mode: navigate' \
  --header 'Sec-Fetch-Site: cross-site' \
  --header 'Sec-Fetch-User: ?1' 'https://drive.usercontent.google.com/download?id=1r-5t-VgDaAQidZLVgWtguaG7DvMoyUv9&export=download&authuser=0&confirm=t&uuid=b445e6d2-634c-4b59-96c8-4455c6f117a5&at=APZUnTV7OltdPbT0OB1lUK1FhJO8%3A1715537716467' \
  --output-document 'segm.zip'

--2024-05-12 20:22:48--  https://drive.usercontent.google.com/download?id=1r-5t-VgDaAQidZLVgWtguaG7DvMoyUv9&export=download&authuser=0&confirm=t&uuid=b445e6d2-634c-4b59-96c8-4455c6f117a5&at=APZUnTV7OltdPbT0OB1lUK1FhJO8%3A1715537716467
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.250.181.193
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.250.181.193|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94904514 (91M) [application/octet-stream]
Saving to: ‘segm.zip’

segm.zip             29%[====>               ]  26,57M  11,8MB/s               ^C
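
The conversion step that follows reads masks from `datasets/deepfashion/segm`, so the downloaded archives have to be unpacked first. Below is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_archive` and the `images` target directory are our assumptions, and the internal layout of the archives is not verified here, so check the result after extraction (an archive with its own top-level folder would end up one level deeper).

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Unpack zip_path into target_dir and return the number of extracted files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # count regular files only; directory entries end with "/"
        return sum(1 for name in archive.namelist() if not name.endswith("/"))


# Usage (target paths are assumptions, adjust them to the archive layout):
# extract_archive("images.zip", "datasets/deepfashion/images")
# extract_archive("segm.zip", "datasets/deepfashion/segm")
```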


### Convert masks to the contour format that YOLO can process

In [10]:
# import modules
import os
import cv2
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# define directories
masks_dir = "datasets/deepfashion/segm"
labels_dir = "datasets/deepfashion/labels"

# define the mask color for pants
# pants are marked with a light gray color in mask files
pants_mask_color = np.array([211, 211, 211])

# import the mask2contour function 
from mask2contour import mask2contour

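# load labelled mask PNGs and convert them in parallel
os.makedirs(labels_dir, exist_ok=True)  # make sure the output directory exists
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [
        executor.submit(mask2contour, mask_filename, masks_dir, labels_dir, pants_mask_color)
        for mask_filename in os.listdir(masks_dir)
    ]

# surface any exception raised in a worker; submit() alone would hide it
for future in futures:
    future.result()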



No pants labelled in WOMEN-Skirts-id_00007878-02_4_full_segm.png
No pants labelled in WOMEN-Rompers_Jumpsuits-id_00000344-05_4_full_segm.png
No pants labelled in WOMEN-Dresses-id_00002656-02_2_side_segm.png
No pants labelled in WOMEN-Dresses-id_00006526-05_4_full_segm.png
No pants labelled in WOMEN-Dresses-id_00006244-04_3_back_segm.png
No pants labelled in WOMEN-Sweatshirts_Hoodies-id_00001326-04_7_additional_segm.png
No pants labelled in WOMEN-Rompers_Jumpsuits-id_00006748-01_1_front_segm.png
No pants labelled in WOMEN-Dresses-id_00007417-01_2_side_segm.png
No pants labelled in WOMEN-Dresses-id_00003397-02_2_side_segm.png
No pants labelled in WOMEN-Skirts-id_00004644-02_4_full_segm.png
No pants labelled in WOMEN-Blouses_Shirts-id_00004425-01_4_full_segm.png
No pants labelled in WOMEN-Dresses-id_00007940-02_4_full_segm.png
No pants labelled in WOMEN-Cardigans-id_00000559-01_1_front_segm.png
No pants labelled in WOMEN-Dresses-id_00005748-01_1_front_segm.png
No pants labelled in WOMEN-B
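
For reference, an Ultralytics YOLO segmentation label file holds one object per line: the class index followed by the polygon's `x y` pairs, normalised to the range 0 to 1 by image width and height. The repo's `mask2contour` function is not shown here, so the following is only a sketch of the final formatting step, assuming the contour has already been extracted from the mask (for example with `cv2.findContours`). The helper name `contour_to_yolo_line` is hypothetical.

```python
def contour_to_yolo_line(points, width, height, class_id=0):
    """Format one polygon as a YOLO segmentation label line.

    points: sequence of (x, y) pixel coordinates along the contour.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    coords = []
    for x, y in points:
        # normalise to [0, 1] so the label is independent of image size
        coords.append(f"{x / width:.6f}")
        coords.append(f"{y / height:.6f}")
    return f"{class_id} " + " ".join(coords)


# a triangle in a 200x100 image
print(contour_to_yolo_line([(0, 0), (100, 0), (100, 50)], width=200, height=100))
# 0 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000
```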

## Subset data into training, validation and test sets

The dataset containing all 12,701 labelled images was split into:

* train 80%
* val 10%
* test 10%

In [None]:
import os
from subset_training_data import setup_dirs, copy_files_in_parallel

# define working directory and subset size
WORKDIR = '.'
subset_size = 12702

# setup directories
dirs = setup_dirs(WORKDIR, subset_size)
labels_source_dir, images_source_dir, train_dir, val_dir, test_dir, num_train_labels, num_val_labels, num_test_labels = dirs

# copy files in parallel
copy_files_in_parallel(labels_source_dir, images_source_dir, train_dir, val_dir, test_dir, num_train_labels, num_val_labels, num_test_labels, subset_size)
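
The 80/10/10 split itself can be illustrated with a few lines of standard-library Python. This is a sketch, not the code in `subset_training_data`: the helper name `split_filenames`, the fixed seed, and the choice to give the rounding remainder to the test set are all assumptions.

```python
import random


def split_filenames(filenames, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle filenames and split them into train, val and test lists."""
    names = sorted(filenames)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(names)
    num_train = int(len(names) * train_frac)
    num_val = int(len(names) * val_frac)
    train = names[:num_train]
    val = names[num_train:num_train + num_val]
    test = names[num_train + num_val:]  # the remainder goes to test
    return train, val, test


train, val, test = split_filenames([f"img_{i}.jpg" for i in range(12701)])
print(len(train), len(val), len(test))  # 10160 1270 1271
```

Shuffling before slicing matters for this dataset, because the file names are grouped by garment category and a plain alphabetical split would put whole categories into a single subset.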


### Inspect annotations by plotting a 4x4 grid of labelled images

In [None]:
import supervision as sv
from inspect_annotations import load_and_annotate_images, plot_image_grid

# define images, labels and yaml paths, and sample size
IMAGES_DIRECTORY_PATH = "datasets/lvis_pants/images/train2017"
ANNOTATIONS_DIRECTORY_PATH = "datasets/lvis_pants/labels/train2017"
DATA_YAML_PATH = "lvis.yaml"
SAMPLE_SIZE = 16

# load and annotate images
images, image_names = load_and_annotate_images(
    images_directory_path=IMAGES_DIRECTORY_PATH,
    annotations_directory_path=ANNOTATIONS_DIRECTORY_PATH,
    data_yaml_path=DATA_YAML_PATH,
    sample_size=SAMPLE_SIZE
)

# plot images grid
plot_image_grid(
    images=images,
    titles=image_names,
    grid_size=(4, 4),
    size=(16, 16)
)
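
When a plotted annotation looks off, it can help to read a label line by hand and scale it back to pixel coordinates. A small sketch, assuming the standard YOLO segmentation label layout (class index followed by normalised `x y` pairs); the helper name `yolo_line_to_polygon` is hypothetical.

```python
def yolo_line_to_polygon(line, width, height):
    """Parse one YOLO segmentation label line into (class_id, pixel polygon)."""
    fields = line.split()
    class_id = int(fields[0])
    values = [float(v) for v in fields[1:]]
    if len(values) < 6 or len(values) % 2:
        raise ValueError("expected at least three x y pairs")
    # pair up x and y values, then scale back to pixel coordinates
    polygon = [(x * width, y * height) for x, y in zip(values[0::2], values[1::2])]
    return class_id, polygon


print(yolo_line_to_polygon("0 0 0 0.5 0 0.5 0.5", width=200, height=100))
# (0, [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)])
```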

## Find and prepare a more diverse dataset

The initial dataset is very uniform, so we supplemented it with the LVIS (Large Vocabulary Instance Segmentation) dataset (https://www.lvisdataset.org/dataset) to create a more diverse training set and reduce overfitting.

### Copy LVIS images into the directory to be subsetted

In [None]:
!cp -r ~/datasets/lvis/images datasets/lvis_pants/

### Subset only pants labels

In [None]:
%%bash
# Training set
python subset_lvis_pants_labels.py \
  --source_directory "$HOME/datasets/lvis/labels/train2017/" \
  --target_directory "datasets/lvis_pants/labels/train2017/"

# Validation set
python subset_lvis_pants_labels.py \
  --source_directory "$HOME/datasets/lvis/labels/val2017/" \
  --target_directory "datasets/lvis_pants/labels/val2017/"

# Check number of resulting non-empty labels
# Should be 4462 train and 184 val
find datasets/lvis_pants/labels/train2017 -type f -size +0c | wc -l 
find datasets/lvis_pants/labels/val2017 -type f -size +0c | wc -l
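
The core of this subsetting step is a per-file filter: keep only the label lines of the pants class and rewrite their class index to `0`, the single class used for training. The sketch below is not the code in `subset_lvis_pants_labels.py`; the helper name is hypothetical, and `PANTS_ID` is a placeholder value rather than the real LVIS class index, which has to be looked up in the dataset's class list.

```python
def keep_only_class(lines, source_class_id, target_class_id=0):
    """Keep label lines of one class and rewrite their class index."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if int(fields[0]) == source_class_id:
            kept.append(" ".join([str(target_class_id)] + fields[1:]))
    return kept


# PANTS_ID is a placeholder, not the real LVIS index
PANTS_ID = 7
labels = ["3 0.1 0.1 0.2 0.1 0.2 0.2", "7 0.5 0.5 0.6 0.5 0.6 0.6"]
print(keep_only_class(labels, PANTS_ID))  # ['0 0.5 0.5 0.6 0.5 0.6 0.6']
```

An image without any pants yields an empty list, which would correspond to an empty label file; the `find ... -size +0c` check above counts only the non-empty ones.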

### Keep only as many pantsless images as there are pantsful images

In [None]:
%%bash
# Training set
python remove_superfluous_empty_labels.py \
  --labels_directory datasets/lvis_pants/labels/train2017 \
  --images_directory datasets/lvis_pants/images/train2017
  
# Validation set
python remove_superfluous_empty_labels.py \
  --labels_directory datasets/lvis_pants/labels/val2017 \
  --images_directory datasets/lvis_pants/images/val2017
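
The balancing idea can be sketched as follows: count the non-empty label files, then pick at random the empty ones that exceed that count. This is an illustration under our own assumptions (label files end in `.txt`, an empty file means "no pants", selection is random with a fixed seed), not the code in `remove_superfluous_empty_labels.py`.

```python
import random
from pathlib import Path


def select_superfluous_empty_labels(labels_dir, seed=0):
    """Return the empty label files that exceed the number of non-empty ones."""
    label_files = sorted(Path(labels_dir).glob("*.txt"))
    empty = [p for p in label_files if p.stat().st_size == 0]
    num_non_empty = len(label_files) - len(empty)
    num_surplus = max(0, len(empty) - num_non_empty)
    # sample at random so the kept negatives are not biased by file order
    return random.Random(seed).sample(empty, num_surplus)


# Usage: delete each returned label file together with its image, e.g.
# for label in select_superfluous_empty_labels("datasets/lvis_pants/labels/val2017"):
#     label.unlink()
```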

### Remove images which have no corresponding label file

We observed that the LVIS dataset contains images that show pants but have no pants annotation. For example, `000000096670.jpg` shows a baseball player, and its labels include a baseball, a home base, a bat, and a belt, but no pants.

In [None]:
%%bash
# Training set
python delete_labelless_images.py \
  --images_directory datasets/lvis_pants/images/train2017 \
  --labels_directory datasets/lvis_pants/labels/train2017

# Validation set
python delete_labelless_images.py \
  --images_directory datasets/lvis_pants/images/val2017 \
  --labels_directory datasets/lvis_pants/labels/val2017
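
Finding the images to delete comes down to comparing file stems between the two directories. A minimal sketch with `pathlib`, assuming `.jpg` images and `.txt` labels that share a stem; the helper name `find_labelless_images` is hypothetical and this is not the code in `delete_labelless_images.py`.

```python
from pathlib import Path


def find_labelless_images(images_dir, labels_dir, image_suffix=".jpg"):
    """Return image paths that have no label file with the same stem."""
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(
        p for p in Path(images_dir).glob(f"*{image_suffix}")
        if p.stem not in label_stems
    )


# Usage: review the list first, then delete
# for image in find_labelless_images("datasets/lvis_pants/images/val2017",
#                                    "datasets/lvis_pants/labels/val2017"):
#     image.unlink()
```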

### Prepare `train` configuration yaml file 

In [None]:
%%writefile lvis.yaml
tensorboard: True

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /Users/lorenaderezanin/PycharmProjects/WhatIsPants/datasets/lvis_pants # dataset root dir
train: images/train2017 # train images (relative to 'path') 100170 images
val: images/val2017 # val images (relative to 'path') 19809 images

names:
  0: This is pants

## Run the training


In [None]:
!python lvis_yolo_train.py 50
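
The contents of `lvis_yolo_train.py` are not shown in this notebook. As a rough sketch of what such a script could look like with the Ultralytics Python API, the example below assumes that the argument `50` is the number of epochs, that the dataset config is `lvis.yaml`, and that training starts from the pretrained `yolov8n-seg.pt` checkpoint at an image size of 640. All of these are our assumptions, not confirmed settings.

```python
import sys


def parse_epochs(argv, default=50):
    """Read the epoch count from the first command-line argument."""
    return int(argv[0]) if argv else default


def train(epochs, data_yaml="lvis.yaml", weights="yolov8n-seg.pt"):
    """Fine-tune a pretrained YOLOv8 segmentation model (needs `pip install ultralytics`)."""
    from ultralytics import YOLO  # imported lazily so parse_epochs works without it

    model = YOLO(weights)  # load the pretrained segmentation checkpoint
    return model.train(data=data_yaml, epochs=epochs, imgsz=640)


# Usage: train(parse_epochs(sys.argv[1:]))
```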
