This kernel implements [the evaluation score](https://www.kaggle.com/c/imaterialist-fashion-2019-FGVC6/overview/evaluation).  
The evaluation scores are calculated with some pseudo data which is almost the same format of iMaterialist competition.  
I hope this kernel helps kagglers;)

In [91]:
from itertools import chain
from pathlib import Path
import random

import numpy as np
import pandas as pd


# Data path
DATA_DIR = Path("../input/imaterialist-fashion-2020-fgvc7")

## Load Data

`ClassId`, which may have any attributes, is converted into `CategoryId` which has no attributes for easy understood.

In [92]:
train_df = pd.read_csv(r'../input/imaterialist-fashion-2020-fgvc7/train.csv')
train_df['CategoryId'] = train_df['ClassId'].astype(str).str.split('_').str[0] # 10(classId)_1_2_3(attributeIds) => 10

In [93]:
train_df["AttributesIds"]

0         115,136,143,154,230,295,316,317
1         115,136,142,146,225,295,316,317
2                                     163
3                                 160,204
4                                     219
                       ...               
333396                                163
333397                                NaN
333398                                157
333399                                157
333400        102,128,142,150,295,308,317
Name: AttributesIds, Length: 333401, dtype: object

In [94]:
def extract(x):
    i=0
    num=""
    extract_list=[]

    while(i<=len(x)):
        if(i==len(x)):
            extract_list.append(num)
            break
        if(x[i]!=','):
            num=num+x[i]
            i=i+1
        else:
            extract_list.append(num)
            i=i+1
            num=""
    return extract_list

In [95]:
attributes_description_list=[]

for i in range(0,train_df.shape[0]):
    attributes_description_list.append(extract (str(train_df.loc[i,'AttributesIds'])))


In [96]:
train_df['AttributesIds']=attributes_description_list

In [97]:
train_df

Unnamed: 0,ImageId,EncodedPixels,Height,Width,ClassId,AttributesIds,CategoryId
0,00000663ed1ff0c4e0132b9b9ac53f6e,6068157 7 6073371 20 6078584 34 6083797 48 608...,5214,3676,6,"[115, 136, 143, 154, 230, 295, 316, 317]",6
1,00000663ed1ff0c4e0132b9b9ac53f6e,6323163 11 6328356 32 6333549 53 6338742 75 63...,5214,3676,0,"[115, 136, 142, 146, 225, 295, 316, 317]",0
2,00000663ed1ff0c4e0132b9b9ac53f6e,8521389 10 8526585 30 8531789 42 8537002 46 85...,5214,3676,28,[163],28
3,00000663ed1ff0c4e0132b9b9ac53f6e,12903854 2 12909064 7 12914275 10 12919485 15 ...,5214,3676,31,"[160, 204]",31
4,00000663ed1ff0c4e0132b9b9ac53f6e,10837337 5 10842542 14 10847746 24 10852951 33...,5214,3676,32,[219],32
...,...,...,...,...,...,...,...
333396,fffe20b555b98c3c1f26c8dfff275cbc,2712731 8 2715725 23 2718719 39 2721713 55 272...,3000,2001,28,[163],28
333397,ffffbf7014a9e408bfbb81a75bc70638,71179 1 71678 3 72178 4 72678 4 73178 5 73679 ...,500,375,33,[nan],33
333398,ffffbf7014a9e408bfbb81a75bc70638,116648 5 117148 16 117648 22 118148 26 118647 ...,500,375,31,[157],31
333399,ffffbf7014a9e408bfbb81a75bc70638,67711 1 68210 1 68709 2 69204 2 69208 3 69705 ...,500,375,31,[157],31


## Make ground truth data (training or validation) and its predictions

In [98]:
# Ground truth
gt_columns = ['ImageId', 'EncodedPixels', 'Height', 'Width','AttributesIds','CategoryId']
gt_df = train_df.sample(5000, random_state=777)[gt_columns]
gt_df.head()

Unnamed: 0,ImageId,EncodedPixels,Height,Width,AttributesIds,CategoryId
294600,e25a540df60806f649259ec7c6829ae6,147881 2 148530 4 149179 4 149828 4 150477 4 1...,650,438,[186],33
238585,b78423688872e43e2dc2c3d9cbaa4a65,8009601 2 8014412 6 8019224 9 8024035 12 80288...,4813,3209,[nan],35
62000,2f7ed67861442e509774b86455cc4615,107288 1 108486 4 109684 7 110882 10 112080 12...,1200,1200,"[157, 204]",31
294091,e1fd38a75d7ac773e2b8249f05bd7ed5,76415 6 77614 18 78813 28 80012 35 81210 43 82...,1200,1200,"[160, 204]",31
179361,8a33d8caefeab5795c99f015600f93cd,2177976 4 2180422 13 2182868 21 2185315 29 218...,2448,2448,[nan],19


In [99]:
# Prediction
pred_columns = ['ImageId', 'EncodedPixels', 'CategoryId',"AttributesIds"]
pred_df_ = train_df.sample(3000, random_state=111)[pred_columns]

# Prediction uses ground truth data partialy for simulation.
# The prediction has the perfect masks and classIDs of the ground truth data at least 2000.
pred_df = pd.concat([gt_df.iloc[:2000][pred_columns], pred_df_], axis=0, sort=False)
pred_df.head()

Unnamed: 0,ImageId,EncodedPixels,CategoryId,AttributesIds
294600,e25a540df60806f649259ec7c6829ae6,147881 2 148530 4 149179 4 149828 4 150477 4 1...,33,[186]
238585,b78423688872e43e2dc2c3d9cbaa4a65,8009601 2 8014412 6 8019224 9 8024035 12 80288...,35,[nan]
62000,2f7ed67861442e509774b86455cc4615,107288 1 108486 4 109684 7 110882 10 112080 12...,31,"[157, 204]"
294091,e1fd38a75d7ac773e2b8249f05bd7ed5,76415 6 77614 18 78813 28 80012 35 81210 43 82...,31,"[160, 204]"
179361,8a33d8caefeab5795c99f015600f93cd,2177976 4 2180422 13 2182868 21 2185315 29 218...,19,[nan]


In [100]:
print(train_df['AttributesIds'][333397][0]=='nan')

True


In [101]:
print(f"Ground truth: {len(gt_df)}")
print(f"Predictions: {len(pred_df)}")

Ground truth: 5000
Predictions: 5000


## Make pseudo predictions.

EncodedPixcels are randomly dropped and fluctuated to make various true positive samples for real simulation.

In [142]:
def drop_randomly(pixels):
    pixels_ = pixels.split()
    split_pixels = np.split(np.array(pixels_), len(pixels_)/2)

    # Drop pixels
    random.seed(7)
    remains = int(random.choice(np.arange(0.5, 1.1, 0.1)) * len(split_pixels))
    drop_pixels = random.sample(split_pixels, remains)

    # Fluctuate pixel length
    def fluc_pixel(arr, f):
        return np.array([arr[0], max(1, int(arr[1]) + f)])

    random.seed(7)
    fluc = np.random.randint(-10, 10, len(drop_pixels))
    dp_ = [fluc_pixel(arr, f) for arr, f in zip(drop_pixels, fluc)]

    dp = list(chain.from_iterable([dp.tolist() for dp in dp_]))

    return ' '.join(dp)

pred_df_pseudo = pred_df.copy()
pred_df_pseudo['EncodedPixels'] = pred_df['EncodedPixels'].apply(drop_randomly)
pred_df_pseudo.head()

Unnamed: 0,ImageId,EncodedPixels,CategoryId,AttributesIds
294600,e25a540df60806f649259ec7c6829ae6,150477 12 151773 13 154364 2 147881 1 148530 1...,33,[186]
238585,b78423688872e43e2dc2c3d9cbaa4a65,8380226 29 8962708 18 8125099 26 8187674 31 93...,35,[nan]
62000,2f7ed67861442e509774b86455cc4615,388034 6 152820 82 228307 234 306205 346 12166...,31,"[157, 204]"
294091,e1fd38a75d7ac773e2b8249f05bd7ed5,150605 383 282339 448 95734 48 106391 206 3686...,31,"[160, 204]"
179361,8a33d8caefeab5795c99f015600f93cd,2554898 162 3166890 158 3808228 131 2297868 14...,19,[nan]


In [103]:
pred_df

Unnamed: 0,ImageId,EncodedPixels,CategoryId,AttributesIds
294600,e25a540df60806f649259ec7c6829ae6,147881 2 148530 4 149179 4 149828 4 150477 4 1...,33,[186]
238585,b78423688872e43e2dc2c3d9cbaa4a65,8009601 2 8014412 6 8019224 9 8024035 12 80288...,35,[nan]
62000,2f7ed67861442e509774b86455cc4615,107288 1 108486 4 109684 7 110882 10 112080 12...,31,"[157, 204]"
294091,e1fd38a75d7ac773e2b8249f05bd7ed5,76415 6 77614 18 78813 28 80012 35 81210 43 82...,31,"[160, 204]"
179361,8a33d8caefeab5795c99f015600f93cd,2177976 4 2180422 13 2182868 21 2185315 29 218...,19,[nan]
...,...,...,...,...
145876,7014b9490ff55a1d7d67e184a9eb06ed,3871414 25 3873834 77 3876255 127 3878675 178 ...,31,"[159, 204]"
237975,b71ac155c67e3d2871fefb19009174b3,2607841 13 2610634 26 2613427 26 2616219 27 26...,42,[nan]
37528,1cc65314f6d7ea3bc678abde5b9f1063,588499 5 589772 16 591048 23 592329 25 593610 ...,23,[nan]
252017,c20449a9781000ed897aa4ca7da14038,3288097 2 3290545 5 3292993 9 3295440 14 32978...,32,[218]


## Evaluation

In [139]:
def calc_IoU(A,B):
    AorB = np.logical_or(A,B).astype('int')
    AandB = np.logical_and(A,B).astype('int')
    IoU = AandB.sum() / AorB.sum()
    return IoU



def rle_to_mask(rle_list, SHAPE):
    tmp_flat = np.zeros(SHAPE[0]*SHAPE[1])
    if len(rle_list) == 1:
        mask = np.reshape(tmp_flat, SHAPE).T
    else:
        strt = rle_list[::2]
        length = rle_list[1::2]
        for i,v in zip(strt,length):
            tmp_flat[(int(i)-1):(int(i)-1)+int(v)] = 255
        mask = np.reshape(tmp_flat, SHAPE).T
    return mask

def calc_IoU_threshold(data):
    # Note: This rle_to_mask should be called before loop below for speed-up! We currently implement here to reduse memory usage.
    mask_gt = rle_to_mask(data['EncodedPixels_gt'].split(), (int(data['Height']), int(data['Width'])))
    mask_pred = rle_to_mask(data['EncodedPixels_pred'].split(), (int(data['Height']), int(data['Width'])))
    return calc_IoU(mask_gt, mask_pred)

def calc_F1_threshold(gt_df,pred_df):

    fp=0
    fn=0
    tp=0
    # Note: This rle_to_mask should be called before loop below for speed-up! We currently implement here to reduse memory usage.
    f1_list=[]
    for i in range(0,gt_df.shape[0]):
        if len(gt_df.iloc[i])>len(pred_df.iloc[i]):
            fn=len(gt_df.iloc[i])-len(pred_df.iloc[i])
        elif len(gt_df.iloc[i])<len(pred_df.iloc[i]):
            fp=-len(gt_df.iloc[i])+len(pred_df.iloc[i])
        for j in range(min(len(gt_df.iloc[i]),len(pred_df.iloc[i]))):
            if(gt_df.iloc[i][j]==pred_df.iloc[i][j]):
                tp=tp+1
            else:
                fp=fp+1
                fn=fn+1
        precision=tp/(tp+fp)
        recall=tp/(tp+fn)
        if(precision==0 and recall==0):
                f1_list.append(0)
        else:
            f1_score=2*precision*recall/(precision+recall)
            f1_list.append(f1_score)
        fp=0
        fn=0
        tp=0
    return f1_list
            
        


def evaluation(gt_df, pred_df):
    eval_df = pd.merge(gt_df, pred_df, how='outer', on=['ImageId', 'CategoryId'], suffixes=['_gt', '_pred'])

    #IoU for True Positive
    idx_ = eval_df['EncodedPixels_gt'].notnull() & eval_df['EncodedPixels_pred'].notnull()
    check=[]
    print(len(check))

    IoU = eval_df[idx_].apply(calc_IoU_threshold, axis=1)
#     print(eval_df[idx_]['AttributesIds_pred'])
   
    f1_list=calc_F1_threshold(eval_df[idx_]['AttributesIds_gt'],eval_df[idx_]['AttributesIds_pred'])
    
    
    frame = { 'IOU': IoU, 'F1': f1_list } 
  
    result = pd.DataFrame(frame) 

#     print(result)
    
    
#     print(IoU)

    # False Positive
    fp = (eval_df['EncodedPixels_gt'].isnull() & eval_df['EncodedPixels_pred'].notnull()).sum()

#     # False Negative
#     fn = (eval_df['EncodedPixels_gt'].notnull() & eval_df['EncodedPixels_pred'].isnull()).sum()
    threshold_IoU = [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
    f1_th=[0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
    scores = []
    for th in threshold_IoU:
        for f1 in f1_th:
            tp1 = result['IOU']>th 
            tp2=result['F1']>f1
            tp=tp1&tp2
            tp=tp.sum()
        # True Positive
            fp1 = result['IOU']<=th 
            fp2=result['F1']<=f1
            fp_=fp1&fp2
            maybe_fp=fp_.sum()
            
        # False Positive (not Ground Truth) + False Positive (under IoU threshold)
            fp_IoU = fp + maybe_fp

        # Calculate evaluation score
            score = tp / (tp + fp_IoU)
            scores.append(score)
            print(f"Threshold: {th},F1: {f1}, Precision: {score}, TP: {tp}, FP: {fp_IoU}")

    mean_score = sum(scores) / len(scores)
    print(f"Mean precision score: {mean_score}")

**Using prediction of original EncodedPixels (IoU == 1)**  
This takes a bit time...

In [None]:
evaluation(gt_df, pred_df_pseudo)

0


In [None]:
f1_list

Precision (TP, FP, FN) is the same at each threshold because the prediction masks match the ground truth ones perfectly.  
It is found that when true positive is 2046 in 5000 ground truth data of mask and classID, we maybe get the precision score of ~0.2566.

**Using prediction of modified EncodedPixels (IoU != 1)**  
We calculate the evaluation score with modified prediction for reality simulation

In [None]:
evaluation(gt_df, pred_df_pseudo)

We get ~0.0775 score.  
Larger IoU threshold reduces the number of TP and increases the number of FP. After 0.8 IoU threshold, TP is 0. This means no masks hit the ground truth masks.

## Try it on your training!!
Make pred_df which has the rows that are the mask of **ONE** classID like above `pred_df`.   
`evaluation(gt_df, your_pred_df)`

This implementaion takes a bit long time because the code is not optimized.   
Welcome any comments and contributions!!  
Thanks.