# RANZCR CLiP - Catheter and Line Position Challenge - Exploratory Data Analysis

A quick exploratory data analysis for the [RANZCR CLiP - Catheter and Line Position Challenge](https://www.kaggle.com/c/ranzcr-clip-catheter-line-classification).

In this competition, you’ll detect the presence and position of catheters and lines on chest x-rays. Use machine learning to train and test your model on 40,000 images to categorize a tube that is poorly placed.

![](https://storage.googleapis.com/kaggle-competitions/kaggle/23870/logos/header.png?t=2020-12-01-04-28-05)

<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='color:white; background:#6E848D; border:0' role="tab" aria-controls="home"><center>Quick Navigation</center></h3>

* [Overview](#1)
    
* [Annotations](#2)
    
* [ETT - Abnormal](#4)
* [ETT - Borderline](#5)
* [ETT - Normal](#6)
* [NGT - Abnormal](#7)
* [NGT - Borderline](#8)
* [NGT - Incompletely Imaged](#9)
* [NGT - Normal](#10)
* [CVC - Abnormal](#11)
* [CVC - Borderline](#12)
* [CVC - Normal](#13)
* [Swan Ganz Catheter Present](#14)
    
    
* [Submission](#100)

<a id="1"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>Overview</center></h2>

In [None]:
import os
import ast
import random

import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
BASE_DIR = "../input/ranzcr-clip-catheter-line-classification/"
os.listdir(BASE_DIR)

In [None]:
df_train = pd.read_csv(os.path.join(BASE_DIR, "train.csv"), index_col=0)
df_train

In [None]:
# Positive count per label; the last column (PatientID) is excluded from the sum.
df_train.iloc[:, :-1].sum()
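
The column sums above give the raw number of positive rows per label. As a minimal sketch, the same counts can be turned into a sorted prevalence table. The helper name `label_prevalence` and the toy frame standing in for `df_train` are assumptions for illustration, not part of the competition data.

```python
import pandas as pd


def label_prevalence(df, label_cols):
    """Return the positive count and positive fraction for each binary label column."""
    counts = df[label_cols].sum()
    table = pd.DataFrame({"positives": counts, "fraction": counts / len(df)})
    return table.sort_values("positives", ascending=False)


# Toy stand-in for df_train: binary label columns followed by a PatientID column.
toy = pd.DataFrame(
    {
        "ETT - Normal": [1, 0, 1, 1],
        "CVC - Normal": [1, 1, 1, 1],
        "PatientID": ["a", "b", "a", "c"],
    },
    index=["uid1", "uid2", "uid3", "uid4"],
)

prevalence = label_prevalence(toy, ["ETT - Normal", "CVC - Normal"])
print(prevalence)
```

On the real data, `label_cols` would presumably be `df_train.columns[:-1]`, matching the slice used in the cell above.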

In [None]:
def visualize_batch(image_ids):
    """Show up to 12 training images in a 3x4 grid, given their StudyInstanceUIDs."""
    plt.figure(figsize=(16, 12))
    
    for ind, image_id in enumerate(image_ids):
        plt.subplot(3, 4, ind + 1)
        image = cv2.imread(os.path.join(BASE_DIR, "train", f"{image_id}.jpg"))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        plt.imshow(image)
        plt.axis("off")
    
    plt.show()

In [None]:
def print_statistics(df, col):
    """Print the value counts and the fraction of positive rows for a binary label column."""
    print("Distribution:")
    print(df[col].value_counts())
    print()
    print(f"Fraction of 1s: {df[col].mean():.5f}")
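
The per-label sections of this notebook draw 12 random positive rows with `random.sample`, which raises `ValueError` when a label has fewer than 12 positives. A small guard is sketched below. The helper `sample_ids` and its `seed` parameter are hypothetical names introduced here, not something the notebook defines.

```python
import random


def sample_ids(ids, k=12, seed=None):
    """Return up to k ids chosen without replacement (fewer if not enough are available)."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    ids = list(ids)
    return rng.sample(ids, min(k, len(ids)))


few = ["uid1", "uid2", "uid3"]  # made-up ids
picked = sample_ids(few, k=12, seed=0)
print(len(picked))
```

With such a helper, a call like `visualize_batch(sample_ids(tmp_df.index, 12))` would still work for a very rare label, leaving some grid cells empty.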

<a id="2"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>Annotations</center></h2>

**train_annotations.csv** contains segmentation annotations for the training samples that have them. They are included solely as additional information for competitors.

In [None]:
df_annot = pd.read_csv(os.path.join(BASE_DIR, "train_annotations.csv"))
df_annot.head()
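
The `data` column is stored as a string, and the plotting code in this notebook parses it with `ast.literal_eval` before indexing x and y coordinates. A minimal sketch of that parsing step follows, assuming each entry is a list of `[x, y]` pairs. The helper name `parse_points` and the example coordinates are made up for illustration.

```python
import ast

import numpy as np


def parse_points(data_str):
    """Parse a string such as "[[x1, y1], [x2, y2]]" into an (N, 2) array."""
    points = np.array(ast.literal_eval(data_str))
    if points.ndim != 2 or points.shape[1] != 2:
        raise ValueError(f"expected a list of [x, y] pairs, got shape {points.shape}")
    return points


example = "[[10, 20], [15, 28], [21, 40]]"  # made-up coordinates
points = parse_points(example)
print(points.shape)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for strings read from a CSV file.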

In [None]:
def plot_image_with_annotations(row_ind):
    """Plot the image for one annotation row, alone and with its annotated points overlaid."""
    row = df_annot.iloc[row_ind]
    image_path = os.path.join(BASE_DIR, "train", row["StudyInstanceUID"] + ".jpg")
    label = row["label"]
    data = np.array(ast.literal_eval(row["data"]))
    
    plt.figure(figsize=(10, 5))
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plt.subplot(1, 2, 1)
    plt.imshow(image)
    plt.subplot(1, 2, 2)
    plt.imshow(image)
    plt.scatter(data[:, 0], data[:, 1])
    
    plt.suptitle(label, fontsize=15)

In [None]:
plot_image_with_annotations(8)

In [None]:
for i in range(5):
    # Pick a random row from the full annotation table rather than a hardcoded upper bound.
    plot_image_with_annotations(random.randrange(len(df_annot)))

<a id="4"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>ETT - Abnormal</center></h2>

Endotracheal tube (ETT) placement is abnormal.

In [None]:
col_name = "ETT - Abnormal"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="5"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>ETT - Borderline</center></h2>

Endotracheal tube (ETT) placement is borderline abnormal.

In [None]:
col_name = "ETT - Borderline"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="6"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>ETT - Normal</center></h2>

Endotracheal tube (ETT) placement is normal.

In [None]:
col_name = "ETT - Normal"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="7"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>NGT - Abnormal</center></h2>

Nasogastric tube (NGT) placement is abnormal.

In [None]:
col_name = "NGT - Abnormal"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="8"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>NGT - Borderline</center></h2>

Nasogastric tube (NGT) placement is borderline abnormal.

In [None]:
col_name = "NGT - Borderline"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="9"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>NGT - Incompletely Imaged</center></h2>

Nasogastric tube (NGT) placement is inconclusive due to imaging.

In [None]:
col_name = "NGT - Incompletely Imaged"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="10"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>NGT - Normal</center></h2>

Nasogastric tube (NGT) placement is normal.

In [None]:
col_name = "NGT - Normal"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="11"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>CVC - Abnormal</center></h2>

Central venous catheter (CVC) placement is abnormal.

In [None]:
col_name = "CVC - Abnormal"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="12"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>CVC - Borderline</center></h2>

Central venous catheter (CVC) placement is borderline abnormal.

In [None]:
col_name = "CVC - Borderline"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="13"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>CVC - Normal</center></h2>

Central venous catheter (CVC) placement is normal.

In [None]:
col_name = "CVC - Normal"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="14"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>Swan Ganz Catheter Present</center></h2>

In [None]:
col_name = "Swan Ganz Catheter Present"
print_statistics(df_train, col_name)
tmp_df = df_train[df_train[col_name] == 1]
visualize_batch(random.sample(tmp_df.index.tolist(), 12))

<a id="100"></a>
<h2 style='background:#6E848D; border:0; color:white'><center>Submission</center></h2>

In [None]:
df_submission = pd.read_csv(os.path.join(BASE_DIR, "sample_submission.csv"), index_col=0)
df_submission

In [None]:
df_submission.to_csv("submission.csv")
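
The cell above writes the sample submission back out unchanged. Once real predictions replace the sample values, a quick sanity check before saving can catch formatting mistakes. The sketch below assumes that a submission carries one column per label and that predictions are scores in [0, 1]. The helper `check_submission` and the toy frame are hypothetical, not part of the competition tooling.

```python
import pandas as pd


def check_submission(df_sub, label_cols):
    """Raise ValueError if a label column is missing, has NaNs, or leaves [0, 1]."""
    missing = [c for c in label_cols if c not in df_sub.columns]
    if missing:
        raise ValueError(f"missing label columns: {missing}")
    values = df_sub[label_cols]
    if values.isna().any().any():
        raise ValueError("submission contains missing values")
    if ((values < 0) | (values > 1)).any().any():
        raise ValueError("predictions must lie in [0, 1]")
    return True


# Toy stand-ins; on the real data the label names would come from df_train's columns.
labels = ["ETT - Normal", "CVC - Normal"]
toy_sub = pd.DataFrame(
    {"ETT - Normal": [0.2, 0.9], "CVC - Normal": [0.5, 0.1]},
    index=pd.Index(["uid1", "uid2"], name="StudyInstanceUID"),
)

ok = check_submission(toy_sub, labels)
print(ok)
```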