# Human parsing

- Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
!pip install ninja
!git clone https://github.com/PeikeLi/Self-Correction-Human-Parsing
%cd Self-Correction-Human-Parsing
!mkdir checkpoints
!mkdir inputs
!mkdir outputs

Collecting ninja
  Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
[?25l[K     |███                             | 10 kB 24.5 MB/s eta 0:00:01[K     |██████                          | 20 kB 11.4 MB/s eta 0:00:01[K     |█████████                       | 30 kB 9.4 MB/s eta 0:00:01[K     |████████████▏                   | 40 kB 8.6 MB/s eta 0:00:01[K     |███████████████▏                | 51 kB 5.6 MB/s eta 0:00:01[K     |██████████████████▏             | 61 kB 5.7 MB/s eta 0:00:01[K     |█████████████████████▏          | 71 kB 5.6 MB/s eta 0:00:01[K     |████████████████████████▎       | 81 kB 6.3 MB/s eta 0:00:01[K     |███████████████████████████▎    | 92 kB 6.3 MB/s eta 0:00:01[K     |██████████████████████████████▎ | 102 kB 5.5 MB/s eta 0:00:01[K     |████████████████████████████████| 108 kB 5.5 MB/s 
Installing collected packages: ninja
Successfully installed ninja-1.10.2.3
Cloning into 'Self-Correction-Human-Pa

In [None]:
base_path = "/content/gdrive/MyDrive/RA"

- Select a human-parsing dataset and download the matching pretrained checkpoint

In [None]:
dataset = 'atr'         #select from ['lip', 'atr', 'pascal']

In [None]:
import gdown

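# Google Drive link to the pretrained checkpoint for each dataset
if dataset == 'lip':
    url = 'https://drive.google.com/uc?id=1k4dllHpu0bdx38J7H28rVVLpU-kOHmnH'
elif dataset == 'atr':
    url = 'https://drive.google.com/uc?id=1ruJg4lqR_jgQPj-9K0PP-L2vJERYOxLP'
elif dataset == 'pascal':
    url = 'https://drive.google.com/uc?id=1E5YwNKW2VOEayK9mWCS3Kpsxf-3z04ZE'
else:
    raise ValueError("dataset must be one of 'lip', 'atr', 'pascal', got %r" % dataset)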

output = 'checkpoints/final.pth'
gdown.download(url, output, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1ruJg4lqR_jgQPj-9K0PP-L2vJERYOxLP
To: /content/Self-Correction-Human-Parsing/checkpoints/final.pth
100%|██████████| 267M/267M [00:01<00:00, 166MB/s]


'checkpoints/final.pth'

- Upload input images. In this run I use images that I had already uploaded to my own Google Drive.

In [None]:
# Select the images you want to upload
%cd inputs
from google.colab import files
uploaded = files.upload()
%cd ..

/content/Self-Correction-Human-Parsing/inputs


Saving 2.jpg to 2.jpg
/content/Self-Correction-Human-Parsing


- Run the extractor and save the processed images and the pixel-wise logits (the `--logits` flag writes them as `.npy` files) to Google Drive

In [None]:
!python3 simple_extractor.py --dataset 'atr' --model-restore 'checkpoints/final.pth' --input-dir "/content/gdrive/MyDrive/RA/HKTV_data"  --output-dir "/content/gdrive/MyDrive/RA/HKTV_data_process" --logits

Evaluating total class number 18 with ['Background', 'Hat', 'Hair', 'Sunglasses', 'Upper-clothes', 'Skirt', 'Pants', 'Dress', 'Belt', 'Left-shoe', 'Right-shoe', 'Face', 'Left-leg', 'Right-leg', 'Left-arm', 'Right-arm', 'Bag', 'Scarf']
100% 100/100 [02:58<00:00,  1.78s/it]


## Detection criteria
1. More than 5% of the image's pixels are classified as human-body labels (Hair, Face, Left-leg, Right-leg, Left-arm, Right-arm)
2. More than one distinct human-body label is detected
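Both criteria can be written as one vectorized check on the per-pixel label map (`np.argmax` of the logits). This is a minimal sketch, not the exact code used below: `contains_human` and `min_pct` are hypothetical names, and the indices are the ATR indices stored in `human_labels` in the detection cell. Summing the per-label fractions, as the detection cell does, gives the same percentage as `np.isin(...).mean()` here.

```python
import numpy as np

# ATR indices for Hair, Face, Left-leg, Right-leg, Left-arm, Right-arm
HUMAN_LABELS = np.array([2, 11, 12, 13, 14, 15])


def contains_human(label_map, min_pct=0.05):
    """Apply both detection criteria to an (H, W) map of predicted class indices.

    Returns (decision, fraction of pixels carrying a human-body label).
    """
    pct_human = float(np.isin(label_map, HUMAN_LABELS).mean())          # criterion 1
    n_labels = len(np.intersect1d(np.unique(label_map), HUMAN_LABELS))  # criterion 2
    return (pct_human > min_pct and n_labels > 1), pct_human


# Toy 100 x 100 label map: 200 Face pixels + 400 Left-arm pixels = 6% human
label_map = np.zeros((100, 100), dtype=int)
label_map[10:20, 40:60] = 11
label_map[20:60, 30:40] = 14
print(contains_human(label_map))  # (True, 0.06)

# Same number of human pixels, but all labelled Face: fails criterion 2
face_only = np.where(label_map == 14, 11, label_map)
print(contains_human(face_only))  # (False, 0.06)
```

On a real image the input would be `np.argmax(np.load(path), axis=2)` for one of the saved logits files.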

## Performance on 100 images
- Accuracy: 85% (15 of the 100 images misclassified). Total time: about 3 min (roughly 1.8 s per image)
- Failure reasons:
    - Most of the human body is covered by clothes: Images 73, 85, 141, 144, 151, 185, 203, 251, 258, 283, 757
    - Human figure is too small: Images 23, 24
    - Other unusual cases: Image 96 (part of the arm is not recognized); Image 102 (a sports bra is classified as a human face)
- **Pros**
  - Easy to compute and understand: the pixel-wise logits are a by-product of the human-parsing phase of the BMI estimation model
- **Cons**
  1. Impractical: the logits of every image (~100 MB for a large image) must be saved to disk, which takes a lot of space when there are many images
  2. The threshold is relative to the image size: if the human figure is small compared with the whole image, its pixels fall below the 5% threshold and detection fails
  3. When most of the human body is covered by clothes, only a small percentage of pixels is classified as human body, and the method fails. The ATR label set also contains clothing labels ('Hat', 'Upper-clothes', 'Skirt', 'Pants', 'Dress', 'Belt', 'Left-shoe', 'Right-shoe', 'Bag', 'Scarf'), so there is a trade-off: including them improves recall, while excluding them improves precision.
  4. It is purely rule-based and may not generalize to all cases
  5. The pixel threshold does not guarantee that the pixels detected as human are clustered in one region; they can be scattered anywhere in the image, in which case the detection is not meaningful
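Cons 1, 2 and 5 can be made concrete on synthetic label maps. This is a rough sketch under stated assumptions: `logit_storage_mb` assumes the logits are stored as float32 (4 bytes per value) with the 18 ATR classes, the toy maps are made up, and the connected-component check in `largest_component_fraction` is not part of my method; it only illustrates what "clustered" means in con 5. All helper names are hypothetical.

```python
import numpy as np
from collections import deque

# ATR indices used by the detection rule: Hair, Face, Left/Right-leg, Left/Right-arm
HUMAN_LABELS = [2, 11, 12, 13, 14, 15]


def logit_storage_mb(height, width, n_classes=18, bytes_per_value=4):
    """Approximate size in MB of one (H, W, n_classes) logits array (float32 assumed)."""
    return height * width * n_classes * bytes_per_value / 1e6


def human_fraction(label_map):
    """Fraction of pixels whose predicted class is one of HUMAN_LABELS."""
    return float(np.isin(label_map, HUMAN_LABELS).mean())


def largest_component_fraction(mask):
    """Share of True pixels lying in the largest 4-connected region (plain BFS)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    total = int(mask.sum())
    best = 0
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            # Flood-fill the region that starts at (r, c) and count its pixels
            size = 0
            seen[r, c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best / total if total else 0.0


# Con 1: a 1400 x 1000 image with 18 float32 channels needs about 100 MB
print(logit_storage_mb(1400, 1000))  # 100.8

# Con 2: the same 30 x 30 block of Face pixels (label 11) covers 9% of a
# 100 x 100 image but only 2.25% of a 200 x 200 image
small = np.zeros((100, 100), dtype=int)
small[:30, :30] = 11
large = np.zeros((200, 200), dtype=int)
large[:30, :30] = 11
print(human_fraction(small), human_fraction(large))  # 0.09 0.0225

# Con 5: 1156 Face pixels give the same fraction whether they form one block
# or are scattered as isolated pixels
clustered = np.zeros((100, 100), dtype=int)
clustered[:34, :34] = 11
scattered = np.zeros((100, 100), dtype=int)
scattered[::3, ::3] = 11
print(human_fraction(clustered), human_fraction(scattered))  # 0.1156 0.1156
print(largest_component_fraction(clustered == 11))            # 1.0
print(round(largest_component_fraction(scattered == 11), 5))  # 0.00087
```

The pure-Python flood fill is only meant for these small demos; on full-size images a library routine such as `scipy.ndimage.label` would be much faster.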

In [None]:
import pandas as pd
pd.set_option('display.max_rows', None)

import numpy as np
import glob

In [None]:
def count_pct_label_pixel(load_path, class_dict, label_name, print_shape=False):
    """
    Return the fraction of pixels in one image that are classified as `label_name`.

    load_path:  path to the saved .npy logits of shape (H, W, n_classes)
    class_dict: mapping from class name to class index
    label_name: a single class name, e.g. 'Face'; '' means no human label was found
    """
    # Predicted class of each pixel = index of its largest logit
    pixel_label = np.argmax(np.load(load_path), axis=2)

    if print_shape:
        print("Input shape is", pixel_label.shape)

    pct_pixel = 0
    if label_name == '':
        print("No classified human label.")
    else:
        n_pixel = np.count_nonzero(pixel_label == class_dict[label_name])
        total_pixel = pixel_label.shape[0] * pixel_label.shape[1]
        pct_pixel = n_pixel / total_pixel
        print("{:.2%} of pixels is classified as {}.".format(pct_pixel, label_name))

    return pct_pixel
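`count_pct_label_pixel` reloads the whole logits file every time it is called, so the detection loop reads each `.npy` file once per detected human label. A sketch of an alternative that takes already-loaded logits and counts every class in a single `np.bincount` pass; `label_fractions` is a hypothetical helper, not part of the Self-Correction-Human-Parsing repository.

```python
import numpy as np


def label_fractions(logits, class_labels):
    """Map each predicted class name to the fraction of pixels it covers.

    logits: array of shape (H, W, n_classes), e.g. np.load() of one saved .npy file.
    Classes that receive no pixels are left out.
    """
    pixel_label = np.argmax(logits, axis=2)
    counts = np.bincount(pixel_label.ravel(), minlength=len(class_labels))
    return {str(class_labels[k]): float(counts[k] / pixel_label.size)
            for k in np.flatnonzero(counts)}


# Toy example: one-hot "logits" for a 2 x 2 image with three classes
toy_labels = np.array([[0, 1], [1, 2]])
toy_logits = np.eye(3)[toy_labels]  # shape (2, 2, 3)
print(label_fractions(toy_logits, ['Background', 'Hat', 'Hair']))
# {'Background': 0.25, 'Hat': 0.5, 'Hair': 0.25}
```

With the notebook's variables, `label_fractions(np.load(output_paths[i]), class_labels)` would load each file once, and `pct_total` would be the sum of the values whose class index is in `human_labels`.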

In [None]:
# sort the logits files by image id
output_paths = glob.glob('%s/HKTV_data_process/*.npy' % base_path)
output_paths.sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))

# atr class labels
class_labels = np.array(['Background', 'Hat', 'Hair', 'Sunglasses', 'Upper-clothes', 'Skirt', 'Pants', 'Dress', 'Belt',
                       'Left-shoe', 'Right-shoe', 'Face', 'Left-leg', 'Right-leg', 'Left-arm', 'Right-arm', 'Bag', 'Scarf'])
class_dict = {name: label for label, name in enumerate(class_labels)}

# hair, face, left-leg, right-leg, left-arm, right-arm
human_labels = np.array([2, 11, 12, 13, 14, 15])

result_dict = {'image_id': [], 'contain_label': [], 'pct_pixel': [], 'par_pred': []}

for i in range(len(output_paths)):
    image_id = int(output_paths[i].split("/")[-1][:-4])
    print("Image %d" % image_id)
    pixel_label = np.argmax(np.load(output_paths[i]), axis=2)
    human_label = np.intersect1d(np.unique(pixel_label), human_labels)
    contain_label_list = class_labels[human_label]

    pct_total = 0
    for l in range(len(contain_label_list)):
        pct_total += count_pct_label_pixel(output_paths[i], class_dict, contain_label_list[l], print_shape=False)

    if pct_total > 0.05 and len(class_labels[human_label]) > 1:
        print("The image contains human. \n")
        result_dict["par_pred"].append(1)
    else:
        print("The image does not contains human. \n")
        result_dict["par_pred"].append(0)

    result_dict["image_id"].append(image_id)
    result_dict["contain_label"].append(", ".join(contain_label_list))
    result_dict["pct_pixel"].append(round(pct_total, 4))

result_df = pd.DataFrame.from_dict(result_dict, orient="columns")
result_df = result_df.sort_values("image_id").reset_index(drop=True)
result_df.to_csv("%s/human_parsing_result.csv" % base_path)


Image 0
0.01% of pixels is classified as Face.
The image does not contains human. 

Image 1
0.04% of pixels is classified as Face.
The image does not contains human. 

Image 2
0.89% of pixels is classified as Hair.
13.94% of pixels is classified as Face.
3.70% of pixels is classified as Left-leg.
3.50% of pixels is classified as Right-leg.
3.69% of pixels is classified as Left-arm.
4.36% of pixels is classified as Right-arm.
The image contains human. 

Image 3
0.67% of pixels is classified as Hair.
18.69% of pixels is classified as Face.
2.34% of pixels is classified as Left-leg.
2.17% of pixels is classified as Right-leg.
4.22% of pixels is classified as Left-arm.
5.31% of pixels is classified as Right-arm.
The image contains human. 

Image 5
0.93% of pixels is classified as Hair.
17.92% of pixels is classified as Face.
3.25% of pixels is classified as Left-leg.
3.27% of pixels is classified as Right-leg.
5.30% of pixels is classified as Left-arm.
5.03% of pixels is classified as Righ

In [None]:
result_df

| Unnamed: 0 | image_id | contain_label | pct_pixel | has_human |
|---|---|---|---|---|
| 0 | 0 | Face | 0.0001 | 0 |
| 1 | 1 | Face | 0.0004 | 0 |
| 2 | 2 | Hair, Face, Left-leg, Right-leg, Left-arm, Rig... | 0.3009 | 1 |
| 3 | 3 | Hair, Face, Left-leg, Right-leg, Left-arm, Rig... | 0.334 | 1 |
| 4 | 5 | Hair, Face, Left-leg, Right-leg, Left-arm, Rig... | 0.3569 | 1 |
| 5 | 6 | Hair, Face, Left-leg, Right-leg, Left-arm, Rig... | 0.5755 | 1 |
| 6 | 7 | Hair, Face, Left-arm, Right-arm | 0.5545 | 1 |
| 7 | 10 | Left-arm | 0.0 | 0 |
| 8 | 16 | Hair, Face, Right-arm | 0.0056 | 0 |
| 9 | 21 | Hair, Face | 0.165 | 1 |
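The accuracy and failure lists in *Performance on 100 images* need manual ground-truth labels, which are not stored in this notebook. A minimal sketch of that comparison, assuming a hypothetical 0/1 column `true_label` has been added to `result_df`. The prediction column is `par_pred` in the detection cell (it appears as `has_human` in the displayed table), so the column names are passed in explicitly. The toy values below are made up for illustration, not results.

```python
import pandas as pd


def evaluate(result_df, pred_col, true_col):
    """Return (accuracy, misclassified image ids) for 0/1 prediction and label columns."""
    correct = result_df[pred_col] == result_df[true_col]
    wrong_ids = result_df.loc[~correct, "image_id"].tolist()
    return float(correct.mean()), wrong_ids


# Toy frame: the true_label values are invented for illustration only
toy = pd.DataFrame({
    "image_id": [0, 1, 2, 3],
    "par_pred": [0, 0, 1, 1],
    "true_label": [0, 1, 1, 1],
})
print(evaluate(toy, "par_pred", "true_label"))  # (0.75, [1])
```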
