This kernel shows how to visualize bounding boxes interactively using W&B and walks you through the process of logging them.

It can be considered a simple EDA kernel focused on deriving insights by interactively exploring the bounding boxes for each image.

![img](https://i.imgur.com/3vk2h3j.gif)

### Credits

* I am using [xhlulu's](https://www.kaggle.com/xhlulu) resized dataset. Check out his amazing [kernel](https://www.kaggle.com/xhlulu/siim-covid-19-convert-to-jpg-256px), which can be used to generate resized images at different resolutions. The uploaded 256x256 Kaggle dataset is [here](https://www.kaggle.com/xhlulu/siim-covid19-resized-to-256px-jpg).

# Imports and Setup

In [None]:
!pip install -q --upgrade wandb

import wandb
wandb.login()

In [None]:
import os
import gc
import cv2
import ast
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
%matplotlib inline

# Hyperparams

In [None]:
TRAIN_PATH = '../input/siim-covid19-resized-to-256px-jpg/train/'
IMG_SIZE = 256
NUM_SAMPLES_TO_VIZ = 32

# Load Dataset

This competition is unique in how the data is presented, and that shapes the problem statement. The DICOM-formatted chest radiographs are available in the directory structure `study/series/images`. What are `study` and `image`?
 
There are 6334 unique chest scans, or `images`, but only 6054 unique `study` directories. This means that some study directories contain more than one image.
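A quick way to check this kind of mismatch is to count the images in each study. The sketch below uses a toy DataFrame with made-up IDs. The names `toy_df`, `images_per_study`, and `multi_image_studies` are hypothetical, and I am assuming the real table exposes the same `id` and `StudyInstanceUID` columns.

```python
import pandas as pd

# Toy stand-in for the image-level table; all IDs are made up.
toy_df = pd.DataFrame({
    "id": ["img_a", "img_b", "img_c", "img_d"],
    "StudyInstanceUID": ["study_1", "study_1", "study_2", "study_3"],
})

# Count how many distinct images each study contains.
images_per_study = toy_df.groupby("StudyInstanceUID")["id"].nunique()

# Keep only the studies that contain more than one image.
multi_image_studies = images_per_study[images_per_study > 1]

print(f"Unique images:  {toy_df['id'].nunique()}")
print(f"Unique studies: {toy_df['StudyInstanceUID'].nunique()}")
print(f"Studies with more than one image: {len(multi_image_studies)}")
```

Running the same `groupby` on the merged training dataframe should show which studies account for the gap between the two counts.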

In this competition, the task is to provide image-level predictions (`none` vs `opacity`) as well as study-level predictions (`negative`, `typical`, `atypical`, `indeterminate`). Thus we will build two separate classifiers: one for study-level prediction and another for image-level prediction. So where do bounding boxes, i.e. localization, come in?

For images with the image-level label `opacity`, we also need to train an object detector for localization.

* `train_study_level.csv`: Study level labels.

* `train_image_level.csv`: Image level labels.

In this kernel we will try to better understand the relationship between the two.
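To make the image-level labels concrete, here is a minimal sketch of how one `label` string can be split into individual detections. It assumes each detection takes six space-separated tokens (class name, a value I read as a confidence score, then `xmin ymin xmax ymax`), and that images without findings carry a placeholder such as `none 1 0 0 1 1`. The helper `parse_image_label` and the example strings are hypothetical.

```python
def parse_image_label(label):
    """Split a label string into (class_name, confidence, [xmin, ymin, xmax, ymax]) tuples."""
    tokens = label.split(" ")
    detections = []
    # Each detection occupies six consecutive tokens.
    for start in range(0, len(tokens), 6):
        name, conf, *coords = tokens[start:start + 6]
        detections.append((name, float(conf), [float(c) for c in coords]))
    return detections


# Made-up label strings in the assumed layout of the `label` column.
opacity_label = "opacity 1 100.0 200.0 300.0 400.0 opacity 1 500.0 600.0 700.0 800.0"
none_label = "none 1 0 0 1 1"

print(parse_image_label(opacity_label))
print(parse_image_label(none_label))
```

The first token of the string is what ends up as the image-level label, while the study-level label comes from the separate study-level file.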

In [None]:
# Load image level csv file
df = pd.read_csv('../input/siim-covid19-detection/train_image_level.csv')
# Load study level csv file
label_df = pd.read_csv('../input/siim-covid19-detection/train_study_level.csv')

# Modify values in the id column
df['id'] = df.apply(lambda row: row.id.split('_')[0], axis=1)
# Add absolute path
df['path'] = df.apply(lambda row: TRAIN_PATH+row.id+'.jpg', axis=1)
# Get image level labels
df['image_level'] = df.apply(lambda row: row.label.split(' ')[0], axis=1)

# Modify values in the id column
label_df['id'] = label_df.apply(lambda row: row.id.split('_')[0], axis=1)
# Rename the columns so that id becomes StudyInstanceUID
label_df.columns = ['StudyInstanceUID', 'Negative for Pneumonia', 'Typical Appearance', 'Indeterminate Appearance', 'Atypical Appearance']

# Merge both dataframes
df = df.merge(label_df, on='StudyInstanceUID',how="left")
df.head(2)

In [None]:
print(f'Number of unique images in training dataset: {len(df)}')

bbox_nan_num = df['boxes'].isna().sum()
print(f'Number of images without any bbox annotation: {bbox_nan_num}')

In [None]:
# Label encode study-level labels
labels = df[['Negative for Pneumonia','Typical Appearance','Indeterminate Appearance','Atypical Appearance']].values
labels = np.argmax(labels, axis=1)

df['study_level'] = labels
df.head(2)

In [None]:
class_label_to_id = {
    'Negative for Pneumonia': 0,
    'Typical Appearance': 1,
    'Indeterminate Appearance': 2,
    'Atypical Appearance': 3
}

class_id_to_label = {val: key for key, val in class_label_to_id.items()}

In [None]:
# Load meta.csv file
meta_df = pd.read_csv('../input/siim-covid19-resized-to-256px-jpg/meta.csv')
# Copy the train split so that renaming columns does not touch a view of meta_df
train_meta_df = meta_df.loc[meta_df.split == 'train'].copy()
train_meta_df.columns = ['id', 'dim0', 'dim1', 'split']
train_meta_df.head(2)

In [None]:
df = df.merge(train_meta_df, on='id',how="left")
df.head(5)

## Visualize Bounding Boxes

This section will walk you through the steps required to use W&B to log bounding boxes. 

Notes:
* Since some of the images don't have bounding box coordinates, we will drop those rows.
* The steps below can be used as is in a training pipeline with very little modification.

Note: Although the true label of a box is `opacity` or `none`, I have logged the study-level label instead, for more insight. Every image with bounding box coordinates belongs to the `opacity` label.
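As a rough illustration of what gets logged, the sketch below rescales one box from the original resolution to the resized image and builds a single pixel-domain box entry captioned with a study-level label. The helpers `scale_box` and `to_box_data` and all the numbers are hypothetical. The dictionary keys mirror the ones used in the cells that follow, and I am assuming `dim0` is the original height and `dim1` the original width.

```python
def scale_box(box, orig_height, orig_width, img_size=256):
    """Rescale [xmin, ymin, xmax, ymax] from the original resolution to img_size x img_size."""
    scale_x = img_size / orig_width
    scale_y = img_size / orig_height
    xmin, ymin, xmax, ymax = box
    # Round to the nearest pixel; round() with one argument returns an int.
    return [round(xmin * scale_x), round(ymin * scale_y),
            round(xmax * scale_x), round(ymax * scale_y)]


def to_box_data(box, class_id, caption):
    """Build one entry of the box list in pixel coordinates."""
    return {
        "position": {"minX": box[0], "minY": box[1], "maxX": box[2], "maxY": box[3]},
        "class_id": int(class_id),
        "box_caption": caption,
        "domain": "pixel",
    }


# Hypothetical example: a 2000 x 4000 (height x width) radiograph.
scaled = scale_box([1000.0, 500.0, 3000.0, 1500.0], orig_height=2000, orig_width=4000)
box = to_box_data(scaled, class_id=1, caption="Typical Appearance")

print(scaled)
print(box)
```

Because the class ID and caption are passed in separately, switching between image-level and study-level labels only changes what is handed to the helper, not the box geometry.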

In [None]:
# Drop the rows without bounding box coordinates (there are over 2000 of them).
opacity_df = df.dropna(subset = ["boxes"], inplace=False)
opacity_df = opacity_df.reset_index(drop=True)

In [None]:
# Get the raw bounding box 
# Ref: https://www.kaggle.com/yujiariyasu/plot-3positive-classes
def get_bbox(row):
    bboxes = []
    bbox = []
    for i, l in enumerate(row.label.split(' ')):
        # Each box spans six tokens; skip the first two, which are not coordinates
        if i % 6 in (0, 1):
            continue
        bbox.append(float(l))
        if i % 6 == 5:
            bboxes.append(bbox)
            bbox = []  
            
    return bboxes

# Scale the bounding boxes.
def scale_bbox(row, bboxes):
    # Get scaling factor
    scale_x = IMG_SIZE/row.dim1
    scale_y = IMG_SIZE/row.dim0
    
    scaled_bboxes = []
    for bbox in bboxes:
        # Round to the nearest pixel before casting; int() alone would truncate
        x = int(np.round(bbox[0] * scale_x))
        y = int(np.round(bbox[1] * scale_y))
        x1 = int(np.round(bbox[2] * scale_x))
        y1 = int(np.round(bbox[3] * scale_y))

        scaled_bboxes.append([x, y, x1, y1]) # xmin, ymin, xmax, ymax
        
    return scaled_bboxes

# To log a bounding box, you'll need to provide a dictionary with 
# the following keys and values to the boxes keyword argument of wandb.Image.
def wandb_bbox(image, bboxes, true_label, class_id_to_label):
    all_boxes = []
    for bbox in bboxes:
        box_data = {
            "position": {
                "minX": bbox[0],
                "minY": bbox[1],
                "maxX": bbox[2],
                "maxY": bbox[3],
            },
            "class_id": int(true_label),
            "box_caption": class_id_to_label[true_label],
            "domain": "pixel",
        }
        all_boxes.append(box_data)

    return wandb.Image(image, boxes={
        "ground_truth": {
            "box_data": all_boxes,
            "class_labels": class_id_to_label,
        }
    })

In [None]:
sampled_df = opacity_df.sample(NUM_SAMPLES_TO_VIZ).reset_index(drop=True)

run = wandb.init(project='kaggle-covid', 
                 config={'competition': 'siim-fisabio-rsna', '_wandb_kernel': 'ayut'},
                 job_type='visualize_sample_bbox')

wandb_bbox_list = []
for i in tqdm(range(NUM_SAMPLES_TO_VIZ)):
    row = sampled_df.loc[i]
    # Load image
    image = cv2.imread(row.path)
    # Get bboxes
    bboxes = get_bbox(row)
    # Scale bounding boxes
    scale_bboxes = scale_bbox(row, bboxes)
    # Get ground truth label
    true_label = row.study_level
    
    wandb_bbox_list.append(wandb_bbox(image, 
                                      scale_bboxes, 
                                      true_label, 
                                      class_id_to_label))
    
wandb.log({"radiograph": wandb_bbox_list})

run.finish()

run

> 📌 Click on the [run page](https://wandb.ai/ayush-thakur/kaggle-covid/runs/3vxougc9?workspace=user-ayush-thakur) to interactively play with the bounding box coordinates. 

> 📌 Click on the ⚙️ icon to interact with the UI.

# Better Data Understanding using W&B Tables

In [None]:
# W&B image
def wandb_bbox(image, bboxes, true_label, class_id_to_label, class_set):
    all_boxes = []
    for bbox in bboxes:
        box_data = {
            "position": {
                "minX": bbox[0],
                "minY": bbox[1],
                "maxX": bbox[2],
                "maxY": bbox[3],
            },
            "class_id": int(true_label),
            "box_caption": class_id_to_label[true_label],
            "domain": "pixel",
        }
        all_boxes.append(box_data)

    return wandb.Image(image, boxes={
        "ground_truth": {
            "box_data": all_boxes,
            "class_labels": class_id_to_label,
        }
    }, classes=class_set)

In [None]:
run = wandb.init(project='kaggle-covid', 
                 config={'competition': 'siim-fisabio-rsna', '_wandb_kernel': 'ayut'},
                 job_type='visualize-everything')

class_set = wandb.Classes([{'id': id, 'name': name} for id, name in class_id_to_label.items()])


table = wandb.Table(columns=['ImageID', 'StudyInstanceUID', 'Radiogram', 'image-label', 'study-label',
                             'Negative', 'Typical', 'Indeterminate', 'Atypical',
                             'ori_dim0', 'ori_dim1', 'split'])

# create an artifact for all the raw data
viz_at = wandb.Artifact('eda', type="basic-eda")

for i in tqdm(range(len(df))):
    row = df.loc[i]
    # Load image
    image = cv2.imread(row.path)
    # Get bboxes
    bboxes = get_bbox(row)
    # Scale bounding boxes
    scale_bboxes = scale_bbox(row, bboxes)
    # Get ground truth label
    true_label = row.study_level
    # Get image with bounding boxes
    wandb_img = wandb_bbox(image, 
                           scale_bboxes, 
                           true_label, 
                           class_id_to_label,
                           class_set)
    
    # Add info in the table as new row
    table.add_data(row.id, row.StudyInstanceUID, wandb_img, row.image_level, row.study_level,
                   row['Negative for Pneumonia'], row['Typical Appearance'], row['Indeterminate Appearance'], row['Atypical Appearance'],
                   row.dim0, row.dim1, row.split)
    
    del row, wandb_img
    _ = gc.collect()
    
# wandb.log({'radiogram_eda': table})
viz_at.add(table, "Radiogram EDA")
run.log_artifact(viz_at)
run.finish()



## [Check out the Table $\rightarrow$](https://wandb.ai/ayush-thakur/kaggle-covid/artifacts/basic-eda/eda/c17893f765f142a4acbf/files/Radiogram%20EDA.table.json)

![img](https://i.imgur.com/zdRSgtn.gif)

# Work In Progress

Upcoming:
* ~Visualize the entire training dataset using W&B Tables.~
* ~Visualize Metadata~.
* Get insight from the tables.