# OVERVIEW

One of the unique rules of this competition is a requirement on label hierarchy consistency. We predict nine exam-level labels and one image-level label; some of these labels conflict with each other and must adhere to the hierarchy shown in the image below.

![hierarchy](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F115173%2Fa2a5ee66b5799274141dd547cc3ea466%2FPE%20figure.jpg?generation=1599575183749576&alt=media)

According to the data description page: 

> Winning submissions will be inspected to ensure label predictions adhere to the expected label hierarchy defined by the diagram on the Data page. The metric intends to heavily penalize submissions which mis-predict in this manner, however due to the complexity of predictions at both image and study levels and as an extra precaution, the host will verify that prospective winners have not made conflicting label predictions. The requirements which submissions will be held to are [specified by the host in this post](https://www.kaggle.com/c/rsna-str-pulmonary-embolism-detection/discussion/183473).

The goal of this notebook is to develop code that checks a submission file for label consistency to make sure it adheres to the competition rules. Please note that the code has not been fully verified with the organizers yet and may interpret some of the rules incorrectly. I will be happy to correct any inconsistencies you find if you point them out in the comments section :)

# VERSION HISTORY

- v1: first version of the notebook
- v2: corrected handling of `prediction == 0.5` for some labels as noted by [@anthracene](https://www.kaggle.com/anthracene) in the comments
- v3: wrapped consistency checks into `check_consistency()` function
- v4: text adjustments (no code changes)
- v5: added rule 1d (see the discussion)

# CHECKING CONSISTENCY

## PREPARATION

In [1]:
# LIBRARIES
import pandas as pd
import numpy as np

In [2]:
# IMPORT DATA
train = pd.read_csv('../input/rsna-str-pulmonary-embolism-detection/train.csv')
test  = pd.read_csv('../input/rsna-str-pulmonary-embolism-detection/test.csv')

Let's import one of the public kernel submission files and check it against the label consistency requirements. As an example, I am using the `submission.csv` file produced by [@seraphwedd18](https://www.kaggle.com/seraphwedd18) in [this kernel](https://www.kaggle.com/seraphwedd18/pe-detection-with-keras-model-creation/output?scriptVersionId=42782514&select=submission.csv).

In [3]:

# IMPORT EXAMPLE SUBMISSION
sub = pd.read_csv('../input/pe-detection-with-keras-model-creation/submission.csv')
sub.shape


(152703, 2)

## FORMALIZING CONSISTENCY RULES

The rules below represent my understanding of the label consistency requirements outlined by [@anthracene](https://www.kaggle.com/anthracene) in [this discussion topic](https://www.kaggle.com/c/rsna-str-pulmonary-embolism-detection/discussion/183473). I encourage you to read the topic before inspecting the rules below.

We implement two groups of exam-level rules. The first group specifies which PE characteristics conflict with each other when PE is detected on at least one image in the exam. The second group makes sure that, if no image in the exam has detected PE, we do not predict any of the PE characteristics to be present.

1. If there is at least one image per `StudyInstanceUID` with `pe_present_on_image` > 0.5, then:
    - (1a) exactly one of `rv_lv_ratio_lt_1` and `rv_lv_ratio_gte_1` must have p > 0.5; both cannot have p > 0.5.
    - (1b) at least one of `central_pe`, `rightsided_pe` and `leftsided_pe` must have p > 0.5; multiple having p > 0.5 is allowed.
    - (1c) `acute_and_chronic_pe` and `chronic_pe`: only one of them can have p > 0.5; neither having p > 0.5 is allowed.
    - (1d) neither `indeterminate` nor `negative_exam_for_pe` can have p > 0.5.
2. If there are no images per `StudyInstanceUID` with `pe_present_on_image` > 0.5, then:
    - (2a) exactly one of `indeterminate` and `negative_exam_for_pe` must have p > 0.5; both cannot have p > 0.5.
    - (2b) all positive-related labels: `rv_lv_ratio_lt_1`, `rv_lv_ratio_gte_1`, `central_pe`, `rightsided_pe`, `leftsided_pe`, `acute_and_chronic_pe` and `chronic_pe` must have p <= 0.5.
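
For a single exam, the rules above can be sketched as a small pure-Python function. This is only an illustration under my reading of the rules: the function name `exam_violations`, its arguments and the `THRESHOLD` constant are hypothetical and not part of the competition code. The rule labels match the `broken_rule` values used by `check_consistency()` later in the notebook, including rule 1d from v5.

```python
THRESHOLD = 0.5  # assumption: a label counts as "predicted" only when p > 0.5

POSITIVE_RELATED = [
    'rv_lv_ratio_lt_1', 'rv_lv_ratio_gte_1',
    'central_pe', 'rightsided_pe', 'leftsided_pe',
    'acute_and_chronic_pe', 'chronic_pe',
]

def exam_violations(exam, max_image_prob):
    """Return the labels of the rules broken by one exam (hypothetical helper).

    exam           = dict mapping exam-level label name -> predicted probability
    max_image_prob = highest pe_present_on_image prediction within the exam
    """
    on = {name: p > THRESHOLD for name, p in exam.items()}
    broken = []
    if max_image_prob > THRESHOLD:
        # positive exam: rules 1a-1d
        if on['rv_lv_ratio_lt_1'] == on['rv_lv_ratio_gte_1']:
            broken.append('1a')  # both or neither ratio label predicted
        if not (on['central_pe'] or on['rightsided_pe'] or on['leftsided_pe']):
            broken.append('1b')  # no PE location predicted
        if on['acute_and_chronic_pe'] and on['chronic_pe']:
            broken.append('1c')  # both chronicity labels predicted
        if on['indeterminate'] or on['negative_exam_for_pe']:
            broken.append('1d')  # negative-type label on a positive exam
    else:
        # negative exam: rules 2a-2b
        if on['indeterminate'] == on['negative_exam_for_pe']:
            broken.append('2a')  # both or neither negative-type label predicted
        if any(on[name] for name in POSITIVE_RELATED):
            broken.append('2b')  # positive-related label on a negative exam
    return broken

# made-up predictions for one exam without any positive image
exam = dict.fromkeys(POSITIVE_RELATED + ['indeterminate', 'negative_exam_for_pe'], 0.1)
exam['negative_exam_for_pe'] = 0.9
exam['central_pe'] = 0.7
print(exam_violations(exam, max_image_prob = 0.2))  # ['2b']
```

Note that a prediction of exactly 0.5 is treated as "not predicted" here, which mirrors the `> 0.5` and `<= 0.5` comparisons used in the notebook code.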

## CHECKING CONSISTENCY RULES

Let's start by checking if there is at least one image predicted as positive (`pe_present_on_image > 0.5`) in an exam and splitting our submission data into positive and negative exams.
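
A minimal sketch of this split on toy data is shown below. The exam and image identifiers and the probabilities are made up for illustration; the column names `pe_present_on_image` and `positive_images_in_exam` follow the ones used in `check_consistency()` later in the notebook.

```python
import pandas as pd

# toy image-level predictions: two exams with two images each (made-up values)
df = pd.DataFrame({
    'StudyInstanceUID':    ['exam_a', 'exam_a', 'exam_b', 'exam_b'],
    'SOPInstanceUID':      ['img_1', 'img_2', 'img_3', 'img_4'],
    'pe_present_on_image': [0.10, 0.80, 0.20, 0.50],
})

# highest image-level prediction per exam, broadcast back to every image row
df['positive_images_in_exam'] = (
    df.groupby('StudyInstanceUID')['pe_present_on_image'].transform('max')
)

# an exam is positive if at least one of its images has p > 0.5
df_pos = df.loc[df['positive_images_in_exam'] >  0.5]
df_neg = df.loc[df['positive_images_in_exam'] <= 0.5]

print(df_pos['StudyInstanceUID'].unique().tolist())  # ['exam_a']
print(df_neg['StudyInstanceUID'].unique().tolist())  # ['exam_b']
```

In this toy example, `exam_b` peaks at exactly 0.5 and therefore lands in the negative group.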

Now we can check label consistency rules separately for positive and negative exams. We will identify rows that violate at least one of the requirements and merge them into a data frame representing the inconsistent predictions.
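
One possible way to collect the violations is to build one boolean mask per rule and concatenate the matching rows. The sketch below uses made-up negative exams and, for brevity, only one positive-related label (`central_pe`); the `masks` dictionary is a hypothetical structure and not the code used in the function further down.

```python
import pandas as pd

# toy exam-level predictions for negative exams (made-up values)
df_neg = pd.DataFrame({
    'StudyInstanceUID':     ['exam_a', 'exam_b', 'exam_c'],
    'indeterminate':        [0.1, 0.2, 0.7],
    'negative_exam_for_pe': [0.9, 0.3, 0.8],
    'central_pe':           [0.1, 0.1, 0.6],
})

# one boolean mask per rule; True marks a row that breaks the rule
masks = {
    # 2a is broken when both or neither of the two labels have p > 0.5
    '2a': (df_neg['indeterminate'] > 0.5) == (df_neg['negative_exam_for_pe'] > 0.5),
    # 2b is broken when a positive-related label has p > 0.5
    '2b': df_neg['central_pe'] > 0.5,
}

# stack the offending rows and remember which rule each of them breaks
errors = pd.concat(
    [df_neg.loc[mask].assign(broken_rule = rule) for rule, mask in masks.items()],
    ignore_index = True,
)
print(errors[['StudyInstanceUID', 'broken_rule']])
```

A row that breaks several rules appears once per broken rule, so `exam_c` is listed twice in this example.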

In this example, the submission file has 26081 rows that do not satisfy the rule labeled as 2b. This means that, although no image in the exam has p(`pe_present_on_image`) > 0.5, some of the positive-related labels describing the characteristics of PE have p > 0.5. In my understanding, these inconsistencies must be fixed to avoid being disqualified from the leaderboard.
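
Because the exam-level labels are merged onto the image-level rows, the row count grows with the number of images in each inconsistent exam. To see how many rows and how many distinct exams break each rule, a summary along the following lines could help. The `errors` frame below is a made-up stand-in with the same key columns, and the `summary` name is hypothetical.

```python
import pandas as pd

# toy stand-in for the errors data frame: one row per image and broken rule
errors = pd.DataFrame({
    'StudyInstanceUID': ['exam_a', 'exam_a', 'exam_a', 'exam_b', 'exam_b'],
    'SOPInstanceUID':   ['img_1', 'img_2', 'img_3', 'img_4', 'img_4'],
    'broken_rule':      ['2b', '2b', '2b', '2a', '2b'],
})

# rows  = number of offending image rows per rule
# exams = number of distinct exams behind those rows
summary = errors.groupby('broken_rule').agg(
    rows  = ('SOPInstanceUID', 'count'),
    exams = ('StudyInstanceUID', 'nunique'),
)
print(summary)
```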

# WRAPPING IN A FUNCTION

The function below wraps the previous code blocks so that they can be applied to any submission file.

In [4]:
def check_consistency(sub, test):
    
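    '''
    Checks label consistency and returns the errors
    
    Args:
    sub   = submission dataframe (pandas) with columns `id` and `label`
    test  = test.csv dataframe (pandas) with columns `StudyInstanceUID`, `SeriesInstanceUID` and `SOPInstanceUID`
    
    Returns:
    errors = dataframe (pandas) with one row per image and broken rule; the rule is stored in `broken_rule`
    '''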
    
    # EXAM LEVEL
    # exam-level ids have the form '<StudyInstanceUID>_<label_type>': split on the first underscore
    parts   = sub['id'].str.partition('_')
    df      = sub.assign(StudyInstanceUID = parts[0], label_type = parts[2])
    # keep exam-level rows only (ids with a label suffix that belong to a test study)
    df      = df.loc[df['StudyInstanceUID'].isin(test['StudyInstanceUID']) & (df['label_type'] != '')]
    df_exam = df.pivot(index = 'StudyInstanceUID', columns = 'label_type', values = 'label')
    
    # IMAGE LEVEL
    df_image = sub.loc[sub.id.isin(test.SOPInstanceUID)].reset_index(drop = True)
    df_image = df_image.merge(test, how = 'left', left_on = 'id', right_on = 'SOPInstanceUID')
    df_image.rename(columns = {"label": "pe_present_on_image"}, inplace = True)
    del df_image['id']
    
    # MERGER
    df = df_exam.merge(df_image, how = 'left', on = 'StudyInstanceUID')
    ids    = ['StudyInstanceUID', 'SeriesInstanceUID', 'SOPInstanceUID']
    labels = [c for c in df.columns if c not in ids]
    df = df[ids + labels]
    
    # SPLIT NEGATIVE AND POSITIVE EXAMS
    df['positive_images_in_exam'] = df['StudyInstanceUID'].map(df.groupby(['StudyInstanceUID']).pe_present_on_image.max())
    df_pos = df.loc[df.positive_images_in_exam >  0.5]
    df_neg = df.loc[df.positive_images_in_exam <= 0.5]
    
    # CHECKING CONSISTENCY OF POSITIVE EXAM LABELS
    rule1a = df_pos.loc[((df_pos.rv_lv_ratio_lt_1  >  0.5)  & 
                         (df_pos.rv_lv_ratio_gte_1 >  0.5)) | 
                        ((df_pos.rv_lv_ratio_lt_1  <= 0.5)  & 
                         (df_pos.rv_lv_ratio_gte_1 <= 0.5))].reset_index(drop = True)
    rule1a['broken_rule'] = '1a'
    rule1b = df_pos.loc[(df_pos.central_pe    <= 0.5) & 
                        (df_pos.rightsided_pe <= 0.5) & 
                        (df_pos.leftsided_pe  <= 0.5)].reset_index(drop = True)
    rule1b['broken_rule'] = '1b'
    rule1c = df_pos.loc[(df_pos.acute_and_chronic_pe > 0.5) & 
                        (df_pos.chronic_pe           > 0.5)].reset_index(drop = True)
    rule1c['broken_rule'] = '1c'
    rule1d = df_pos.loc[(df_pos.indeterminate        > 0.5) | 
                        (df_pos.negative_exam_for_pe > 0.5)].reset_index(drop = True)
    rule1d['broken_rule'] = '1d'

    
    # CHECKING CONSISTENCY OF NEGATIVE EXAM LABELS
    rule2a = df_neg.loc[((df_neg.indeterminate        >  0.5)  & 
                         (df_neg.negative_exam_for_pe >  0.5)) | 
                        ((df_neg.indeterminate        <= 0.5)  & 
                         (df_neg.negative_exam_for_pe <= 0.5))].reset_index(drop = True)
    rule2a['broken_rule'] = '2a'
    rule2b = df_neg.loc[(df_neg.rv_lv_ratio_lt_1     > 0.5) | 
                        (df_neg.rv_lv_ratio_gte_1    > 0.5) |
                        (df_neg.central_pe           > 0.5) | 
                        (df_neg.rightsided_pe        > 0.5) | 
                        (df_neg.leftsided_pe         > 0.5) |
                        (df_neg.acute_and_chronic_pe > 0.5) | 
                        (df_neg.chronic_pe           > 0.5)].reset_index(drop = True)
    rule2b['broken_rule'] = '2b'
    
    # MERGING INCONSISTENT PREDICTIONS
    errors = pd.concat([rule1a, rule1b, rule1c, rule1d, rule2a, rule2b], axis = 0)
    
    # OUTPUT
    print('Found', len(errors), 'inconsistent predictions')
    return errors


In [5]:
# CHECK
errors = check_consistency(sub, test)
errors.head()

Found 0 inconsistent predictions


| StudyInstanceUID | SeriesInstanceUID | SOPInstanceUID | acute_and_chronic_pe | central_pe | chronic_pe | indeterminate | leftsided_pe | negative_exam_for_pe | rightsided_pe | rv_lv_ratio_gte_1 | rv_lv_ratio_lt_1 | pe_present_on_image | positive_images_in_exam | broken_rule |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
