## What are we predicting?

In this competition, you’ll detect the presence and position of catheters and lines on chest x-rays. Use machine learning to train and test your model on 40,000 images to categorize a tube that is poorly placed.

## Evaluation criteria?

Submissions are evaluated on the area under the ROC curve (AUC) between the predicted probability and the observed target.
AUC is computed separately for each of the 11 label columns, and the final score is the mean of those 11 values.
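
To make the averaging concrete, here is a minimal pure-Python sketch of a mean column-wise AUC. The helper names `binary_auc` and `mean_columnwise_auc` and the toy data are made up for illustration; in practice `sklearn.metrics.roc_auc_score` is the usual tool. The pairwise definition used here (the fraction of positive/negative pairs ranked correctly, with ties counting half) is equivalent to the area under the ROC curve.

```python
def binary_auc(y_true, y_score):
    """AUC for one label: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    # A correctly ordered pair counts 1, a tie counts 0.5, a wrong order counts 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(y_true, y_pred):
    """Average the per-label AUC over all label columns."""
    n_labels = len(y_true[0])
    aucs = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true]
        col_pred = [row[j] for row in y_pred]
        aucs.append(binary_auc(col_true, col_pred))
    return sum(aucs) / n_labels


# Toy data (hypothetical): 4 images, 2 labels.
y_true = [[0, 1], [1, 0], [0, 1], [1, 0]]
y_pred = [[0.1, 0.80], [0.9, 0.75], [0.4, 0.70], [0.6, 0.20]]

per_label = [binary_auc([r[j] for r in y_true], [r[j] for r in y_pred]) for j in range(2)]
score = mean_columnwise_auc(y_true, y_pred)
print(per_label, score)
```

The first label is ranked perfectly and the second has one misordered pair out of four, so the mean sits between the two per-label values. Because each label contributes equally, rare labels matter as much as common ones.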

## Train vs Test?

This is a code-only competition, so submissions are scored against a hidden test set (approximately 4x larger than the public test set, with ~14k images).

train.csv contains image IDs, binary labels, and patient IDs.
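
Since the file carries patient IDs, one possible use (an assumption on my part, not something this notebook does) is to keep all images from one patient on the same side of a validation split. The sketch below uses only the standard library; `split_by_patient`, its parameters, and the toy IDs are hypothetical.

```python
import random


def split_by_patient(patient_ids, valid_frac=0.2, seed=42):
    """Return (train_idx, valid_idx) so that no patient appears on both sides."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    # Hold out a fraction of patients (not of images), at least one patient.
    n_valid = max(1, int(len(unique_patients) * valid_frac))
    valid_patients = set(unique_patients[:n_valid])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in valid_patients]
    valid_idx = [i for i, p in enumerate(patient_ids) if p in valid_patients]
    return train_idx, valid_idx


# Toy patient IDs (hypothetical); each entry stands for one image row.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
train_idx, valid_idx = split_by_patient(patient_ids)

train_patients = {patient_ids[i] for i in train_idx}
valid_patients = {patient_ids[i] for i in valid_idx}
print(train_idx, valid_idx)
```

With the real data, the list of patient IDs would come from the patient ID column of train.csv, and the resulting `valid_idx` could be passed to a splitter such as fastai's `IndexSplitter`.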

TFRecords are available for both train and test. (They are also available for the hidden test set.)

train_annotations.csv contains segmentation annotations for the training samples that have them. These are provided purely as additional information.

## Similar Datasets & Competitions?

[RSNA Pneumonia Detection Challenge](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/notebooks)

[SIIM-ACR Pneumothorax Segmentation](https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation)

[XRay Lung Segmentation](https://www.kaggle.com/c/xray-lung-segmentation/data)

Have more?

## Read

In [None]:
import pandas as pd
train = pd.read_csv("../input/ranzcr-clip-catheter-line-classification/train.csv")

In [None]:
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import cv2

In [None]:
cols = [
    'ETT - Abnormal', 'ETT - Borderline', 'ETT - Normal', 
    'NGT - Abnormal', 'NGT - Borderline', 'NGT - Incompletely Imaged', 
    'NGT - Normal', 'CVC - Abnormal', 'CVC - Borderline', 
    'CVC - Normal', 'Swan Ganz Catheter Present'
]

fig = make_subplots(rows=4, cols=3)

traces = [
    go.Bar(
        x=[0, 1], 
        y=[
            len(train[train[col]==0]),
            len(train[train[col]==1])
        ], 
        name=col,
        text = [
            str(round(100 * len(train[train[col]==0]) / len(train), 2)) + '%',
            str(round(100 * len(train[train[col]==1]) / len(train), 2)) + '%'
        ],
        textposition='auto'
    ) for col in cols
]

for i in range(len(traces)):
    fig.append_trace(traces[i], (i // 3) + 1, (i % 3) + 1)

fig.update_layout(
    title_text='Train columns',
    height=1200,
    width=1000
)

fig.show()

[Creds](https://www.kaggle.com/isaienkov/ranzcr-clip-data-understanding)

In [None]:
import os
f, plots = plt.subplots(1, 5, sharex='col', sharey='row', figsize=(17, 17))
samples = train.sample(n=5, random_state=666)['StudyInstanceUID'].values

for i in range(5):
    image = cv2.imread(os.path.join("/kaggle/input/ranzcr-clip-catheter-line-classification/train/", f"{samples[i]}.jpg"))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plots[i].imshow(image)

In [None]:
train.iloc[:, 1:-1].sum()

In [None]:
plt.hist(train.iloc[:, 1:-1].sum())

In [None]:
import random, os
import numpy as np
import torch
from fastai.vision.all import *
from fastai.callback import mixup
path = Path('../input/ranzcr-clip-catheter-line-classification/')
path.ls()

In [None]:
train['path'] = train.StudyInstanceUID.map(lambda x:str(path / 'train' / x)+'.jpg')

In [None]:
labels = list(train.columns[1:12].values)
labels

In [None]:
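def get_y(row):
    # Each item is a DataFrame row; columns 1..11 hold the 11 binary labels.
    return row.iloc[1:12].values.astype(np.float32)

def get_x(row):
    # The image path was appended as the last column of the DataFrame.
    return row.iloc[-1]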

In [None]:
item_tfms = RandomResizedCrop(256, min_scale=0.75, ratio=(1.,1.))

dls = DataBlock(blocks=(ImageBlock, MultiCategoryBlock(encoded=True, vocab=labels)),
                        get_x = get_x,get_y = get_y,  
                        item_tfms = item_tfms)

In [None]:
dls = dls.dataloaders(train)
dls.show_batch()

In [None]:
learn = cnn_learner(dls, resnet152, metrics = [accuracy_multi], cbs=[mixup.MixUp()], model_dir="/tmp/model/").to_native_fp16()

In [None]:
# Note: this reassigns `learn`, so only the resnet50 learner is used from here on.
learn = cnn_learner(dls, resnet50, metrics = [accuracy_multi], cbs=[mixup.MixUp()], model_dir="/tmp/model/").to_native_fp16()

In [None]:
learn.lr_find()

In [None]:
learn.fine_tune(3, 2e-2)

In [None]:
learn = learn.to_native_fp32()

In [None]:
path = Path('../input/ranzcr-clip-catheter-line-classification')

In [None]:
submission_df = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/sample_submission.csv')
submission_df.iloc[:,1:] = submission_df.iloc[:,1:].astype(float)

In [None]:
test_data_path = submission_df['StudyInstanceUID'].apply(lambda x: path/'test'/(x+'.jpg'))
tst_dl = learn.dls.test_dl(test_data_path)
preds,targs = learn.tta(dl = tst_dl)

In [None]:
columns = list(train.columns[1:12])

In [None]:
# Convert the prediction tensor to a NumPy array before building the DataFrame.
submission_df[columns] = pd.DataFrame(preds.numpy(), columns=columns)
# submission_df

In [None]:
submission_df.to_csv('submission.csv',index=False)  