In this kernel, we will use the Weights & Biases (W&B) semantic segmentation logger to visualize the dataset interactively. We will also visualize the entire dataset using W&B Tables.

If you like the work, consider upvoting. :D

## Imports and Setup

In [None]:
import os
import numpy as np
import pandas as pd
from tqdm import tqdm

import wandb
wandb.login()

## Load Dataset

In [None]:
df = pd.read_csv('../input/sartorius-cell-instance-segmentation/train.csv')
df.head()

## Utilities

In [None]:
# ref: https://www.kaggle.com/inversion/run-length-decoding-quick-start
def rle_decode(mask_rle, shape, color=1):
    '''
    mask_rle: run-length encoding as a string of "start length" pairs (starts are 1-indexed)
    shape: (height, width, channels) of array to return 
    color: color for the mask
    Returns numpy array (mask)

    '''
    s = mask_rle.split()
    
    starts = list(map(lambda x: int(x) - 1, s[0::2]))
    lengths = list(map(int, s[1::2]))
    ends = [x + y for x, y in zip(starts, lengths)]
    
    img = np.zeros((shape[0] * shape[1], shape[2]), dtype=np.float32)
            
    for start, end in zip(starts, ends):
        img[start : end] = color
    
    return img.reshape(shape)
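
To make the decoding concrete, here is a minimal sketch on a toy 3x4 grid. The helper `rle_decode_2d` and the RLE string are made up for illustration (they are not part of the competition data); the logic mirrors `rle_decode` above: 1-indexed starts, "start length" pairs, and a row-major reshape.

```python
import numpy as np

def rle_decode_2d(mask_rle, height, width):
    """Hypothetical 2D variant of rle_decode, for illustration only."""
    flat = np.zeros(height * width, dtype=np.uint8)
    tokens = [int(t) for t in mask_rle.split()]
    # Even positions hold 1-indexed starts, odd positions hold run lengths.
    for start, length in zip(tokens[0::2], tokens[1::2]):
        flat[start - 1 : start - 1 + length] = 1
    # Row-major reshape, same as img.reshape(shape) in rle_decode.
    return flat.reshape(height, width)

# "2 3 9 2": a run of 3 pixels starting at pixel 2, then a run of 2 starting at pixel 9.
toy_mask = rle_decode_2d("2 3 9 2", height=3, width=4)
print(toy_mask)
```

The first run fills the last three pixels of the top row, and the second run fills the first two pixels of the bottom row.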

In [None]:
class_id2label = {
    1: 'shsy5y',
    2: 'cort', 
    3: 'astro'
}

class_label2id = {v:k for k, v in class_id2label.items()}

# Note the use of wandb.Image
def wandb_mask(bg_img, gt_mask):
  return wandb.Image(bg_img, masks={
      "ground_truth" : {
          "mask_data" : gt_mask,
          "class_labels": class_id2label
      }
    }
  )

## Log the Masks

In [None]:
VISUALIZE_SAMPLES = 10

ids = df.id.unique()
# Sample without replacement so the same image is not logged twice
sample_idx = np.random.choice(len(ids), VISUALIZE_SAMPLES, replace=False)
sample_ids = ids[sample_idx]

# Initialize W&B
run = wandb.init(project='sartorius-viz', 
                 config={'competition': 'sartorius', '_wandb_kernel':'ayut'}) # The config variable is to show that you can pass in any dict (hyperparameters)

for i in range(VISUALIZE_SAMPLES):
    image_id = sample_ids[i]
    sample_df = df[df["id"] == image_id].reset_index(drop=True)
    # Empty mask
    mask = np.zeros((520, 704, 1))
    # Fill mask
    for j in range(len(sample_df)):
        row = sample_df.loc[j]
        mask += rle_decode(row.annotation, 
                           shape=(520, 704, 1))
        
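    # Overlapping instances sum to values > 1, so set every foreground pixel to the class id.
    # This uses the last row's cell_type as the label for the whole image.
    mask[np.where(mask>0)] = class_label2id[row.cell_type]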
    mask = np.squeeze(mask, axis=-1)
            
    # Log to W&B
    image_path = f"../input/sartorius-cell-instance-segmentation/train/{image_id}.png"
    wandb.log({"Segmentation Viz": [wandb_mask(image_path, mask)]})
    
# Close W&B run
wandb.finish()
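
The mask-building step in the loop above can be tried without W&B. Below is a minimal sketch, assuming two made-up 3x4 instance masks that overlap in one pixel; it shows why the summed mask is thresholded with `mask > 0` before the class id is assigned.

```python
import numpy as np

class_label2id = {'shsy5y': 1, 'cort': 2, 'astro': 3}

# Two hypothetical instance masks that share the pixel at row 1, column 1.
inst_a = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 0]], dtype=np.float32)
inst_b = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0]], dtype=np.float32)

# Accumulate instances the same way the logging loop does.
mask = np.zeros((3, 4), dtype=np.float32)
for inst in (inst_a, inst_b):
    mask += inst  # the shared pixel now holds 2, not 1

# Collapse every foreground pixel to a single class id (here 'astro').
semantic = np.where(mask > 0, class_label2id['astro'], 0).astype(np.uint8)
print(semantic)
```

The result is a semantic mask: the individual instances are no longer distinguishable, only foreground versus background for the given cell type.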

## [Check out the run page here $\rightarrow$](https://wandb.ai/ayut/sartorius-viz/runs/30jbuljv?workspace=user-ayut)

![img](https://i.imgur.com/5FDCLQK.gif)

## 🎉🎉 Visualizing dataset interactively with W&B Tables 🎆🎆

W&B Tables let you log, query, and analyze data interactively. This can help you understand your dataset, visualize model predictions, and share insights in a central dashboard.

The code cell below logs the entire dataset of this competition along with the metadata from the `train.csv` file. You can use this table to get useful insights.

### Why should you use W&B Tables?

* It is suited for quick EDA.
* It helps you understand the data better with a few lines of code. Here's a [quick colab notebook](http://wandb.me/tables-quickstart).
* It lets you see the "actual" data in its entirety. With matplotlib-based visualization, you have to plot everything in batches, which does not scale well.
* You can filter, sort, and group data, which can help answer some fundamental questions.
* It is well suited to visualizing model predictions and comparing models at the example level. You can check out [this Kaggle kernel](https://www.kaggle.com/ayuraj/better-data-understanding-with-w-b-tables) to learn more about model prediction visualization.

Read more about Tables [here](https://wandb.ai/wandb/posts/reports/Announcing-W-B-Tables-Iterate-on-Your-Data--Vmlldzo4NTMxNDU).
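
As a rough offline analogue of the filter, sort, and group operations mentioned above, here is a small pandas sketch. The rows are made up and only mimic the shape of `train.csv` (one row per annotation); the Tables UI offers the same kind of questions interactively.

```python
import pandas as pd

# Hypothetical rows: one row per annotation, several annotations per image id.
toy = pd.DataFrame({
    'id': ['a', 'a', 'b', 'c', 'c', 'c', 'd'],
    'cell_type': ['cort', 'cort', 'astro', 'shsy5y', 'shsy5y', 'shsy5y', 'cort'],
})

# Group: how many annotations does each image have, and what is its cell type?
per_image = (toy.groupby('id')
                .agg(cell_type=('cell_type', 'first'),
                     n_annotations=('cell_type', 'count'))
                .reset_index())

# Group again: how many images are there per cell type?
images_per_type = per_image.groupby('cell_type')['id'].nunique()

# Filter and sort: 'cort' images ordered by annotation count, largest first.
cort_sorted = (per_image[per_image['cell_type'] == 'cort']
               .sort_values('n_annotations', ascending=False))
print(images_per_type)
print(cort_sorted)
```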

### What are these metadata columns?

![img](https://i.imgur.com/UZtbwox.png) <br>
([Source](https://www.microscopyu.com/techniques/phase-contrast/introduction-to-phase-contrast-microscopy))

Check out the source link; it is a great introduction to phase contrast microscopy.

* `id` - unique identifier for object
* `cell_type` - the cell line 
* `plate_time` - time plate was created (The plate as shown in the figure is where the specimen is kept.)
* `sample_date` - date sample was created  
* `sample_id` - sample identifier
* `elapsed_timedelta` - time since first image taken of sample

❗ Note: I have excluded `width` and `height` since they are the same for every sample.

In [None]:
# Initialize a W&B run to log images
run = wandb.init(project='sartorius-viz', 
                 config={'competition': 'sartorius', '_wandb_kernel':'ayut'}) # W&B Code 1

# Initialize an empty W&B Table
data_at = wandb.Table(columns=['id', 'image', 'cell_type', 
                               'plate_time', 'sample_date', 
                               'sample_id', 'elapsed_timedelta']) # W&B Code 2

# Set up a W&B Classes object. This gives additional metadata for visuals.
# Note that we need to pass class_set to wandb.Image. In the future, this extra step might not be needed.
class_set = wandb.Classes([{'name': name, 'id': class_id}
                           for name, class_id in class_label2id.items()]) # W&B Code 3

for image_id, tmp_df in tqdm(df.groupby('id')):
    tmp_df = tmp_df.reset_index(drop=True)
    image_path = f"../input/sartorius-cell-instance-segmentation/train/{image_id}.png"
    
    # Create mask
    mask = np.zeros((520, 704, 1))
    for j in range(len(tmp_df)):
        row = tmp_df.loc[j]
        mask += rle_decode(row.annotation,
                           shape=(520, 704, 1))
        
    # Set every foreground pixel to the class id, using the last row's cell_type for the whole image
    mask[np.where(mask>0)] = class_label2id[row.cell_type]
    mask = np.squeeze(mask, axis=-1)
    
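    # Get W&B image (named wandb_img so it does not shadow the wandb_mask helper defined earlier)
    wandb_img = wandb.Image(image_path, classes=class_set, masks={
                      "ground_truth" : {
                          "mask_data" : mask
                      }})

    # Append data: the values from position 4 onward in the row fill the
    # remaining table columns (cell_type through elapsed_timedelta)
    data_at.add_data(image_id,
                     wandb_img,
                     *tuple(row)[4:]) # W&B Code 4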
    
wandb.log({'Sartorius Dataset': data_at}) # W&B Code 5
wandb.finish() # W&B Code 6
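
The positional slice `tuple(row)[4:]` depends on the column order of `train.csv`. As a minimal sketch of a more explicit alternative, the same fields can be selected by name. The one-row frame below uses column names taken from the table definition and the note about `width` and `height` above; all values are made up for illustration, and `META_COLS` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical helper: metadata columns in the order the W&B Table expects them.
META_COLS = ['cell_type', 'plate_time', 'sample_date',
             'sample_id', 'elapsed_timedelta']

# One made-up annotation row with the assumed train.csv column order.
toy = pd.DataFrame([{
    'id': 'img_0', 'annotation': '1 3', 'width': 704, 'height': 520,
    'cell_type': 'cort', 'plate_time': 'time_0', 'sample_date': 'date_0',
    'sample_id': 'sample_0', 'elapsed_timedelta': 'delta_0',
}])
row = toy.loc[0]

by_position = tuple(row)[4:]      # what the cell above does
by_name = tuple(row[META_COLS])   # same values, independent of column order
print(by_name)
```

Selecting by name keeps working if the CSV columns are reordered, at the cost of spelling the column names out.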

## [Check out the run page here $\rightarrow$](https://wandb.ai/ayut/sartorius-viz/runs/289xy46z?workspace=user-ayut)

![img](https://i.imgur.com/LQDRI25.gif)