### **Background Information:**

Traumatic injury is a significant global health concern, especially affecting individuals in the first four decades of life. It is responsible for millions of annual deaths worldwide and poses a substantial public health challenge. Prompt and accurate diagnosis of traumatic injuries is crucial for improving patient outcomes and increasing survival rates. Among various diagnostic tools, computed tomography (CT) has emerged as a vital technology for evaluating individuals suspected of having abdominal injuries. CT scans provide detailed cross-sectional images of the abdomen, aiding in the detection and assessment of traumatic injuries.


Interpreting CT scans for abdominal trauma can be a complex and time-consuming task, particularly when dealing with multiple injuries or subtle areas of active bleeding. This complexity often requires the expertise of medical professionals, and even for them, it can be challenging to make rapid and precise diagnoses. The need for timely intervention and appropriate treatment underscores the importance of improving the diagnostic process.

Artificial intelligence (AI) and machine learning have demonstrated significant potential in assisting medical professionals in diagnosing and grading the severity of traumatic injuries. Advanced AI algorithms can enhance the speed and accuracy of detecting injuries, leading to improved trauma care and better patient outcomes on a global scale.

### **Problem Statement:**

Traumatic injury is a leading cause of death globally, particularly affecting individuals in their prime years. Timely and accurate diagnosis of traumatic injuries is essential for improving patient outcomes and reducing mortality rates. Computed tomography (CT) scans have emerged as a critical tool for evaluating individuals with suspected abdominal injuries due to their ability to provide detailed cross-sectional images. However, interpreting these CT scans, especially in cases of multiple injuries or subtle internal bleeding, poses a significant challenge for healthcare professionals, often leading to delays in diagnosis and treatment.

To address this pressing healthcare issue, the RSNA Abdominal Trauma Detection AI Challenge aims to harness the potential of artificial intelligence and machine learning. The challenge calls upon researchers to develop advanced AI algorithms capable of rapidly and precisely detecting severe injuries to internal abdominal organs, including the liver, kidneys, spleen, and bowel, as well as identifying active internal bleeding.


### **Objectives:**

* To develop advanced AI algorithms capable of accurately detecting traumatic injuries in CT scans of the abdomen.
* To classify the severity of detected injuries, providing valuable information for medical professionals.
* To expedite the diagnostic process, enabling rapid identification of injuries and active bleeding.
* To improve trauma care and patient outcomes by providing AI tools that assist medical professionals in making more accurate and timely diagnoses.

### **Research Questions:**

* How can traumatic injuries to internal abdominal organs be detected effectively in CT scans?
* Which features and patterns in CT images are indicative of different types and severities of abdominal injuries?
* How do the speed and accuracy of trauma diagnosis impact patient outcomes and survival rates?




In [None]:
import numpy as np 
import pandas as pd
import pydicom
import matplotlib.pyplot as plt
import cv2
import seaborn as sns
import tensorflow as tf
import os

In [None]:
train = pd.read_csv('/kaggle/input/rsna-2023-abdominal-trauma-detection/image_level_labels.csv')
labels = pd.read_csv('/kaggle/input/rsna-2023-abdominal-trauma-detection/train.csv')
train_meta = pd.read_csv('/kaggle/input/rsna-2023-abdominal-trauma-detection/train_series_meta.csv')
test_meta = pd.read_csv('/kaggle/input/rsna-2023-abdominal-trauma-detection/test_series_meta.csv')



In [None]:
#Displaying the first few rows of each dataset
train.head(), labels.head(), train_meta.head()

**train (image_level_labels.csv):**

* patient_id: The unique identifier for each patient.
* series_id: Identifier for the series of images for the patient.
* instance_number: The specific image instance number within the series.
* injury_name: The name or type of injury detected in the image.

**labels (train.csv):**

* This dataset provides the labels for different types of injuries for each patient.
* Columns like bowel_healthy, bowel_injury, extravasation_healthy, etc., indicate the health status or injury severity of various organs for each patient.

**train_meta (train_series_meta.csv):**

* patient_id: The unique identifier for each patient.
* series_id: Identifier for the series of images for the patient.
* aortic_hu: The attenuation of the aorta in Hounsfield units (HU), which serves as a proxy for the timing of the scan (contrast phase).
* incomplete_organ: A binary indicator specifying whether one or more organs are not fully covered by the scan.







**Basic Understanding**

In [None]:
# Basic information for the 'train' dataset
train_info = {
    "Number of Rows": train.shape[0],
    "Number of Columns": train.shape[1],
    "Columns": train.columns.tolist(),
    "Data Types": train.dtypes.tolist(),
    "Unique Values per Column": train.nunique().tolist()
}

# Basic information for the 'labels' dataset
labels_info = {
    "Number of Rows": labels.shape[0],
    "Number of Columns": labels.shape[1],
    "Columns": labels.columns.tolist(),
    "Data Types": labels.dtypes.tolist(),
    "Unique Values per Column": labels.nunique().tolist()
}

# Basic information for the 'train_meta' dataset
train_meta_info = {
    "Number of Rows": train_meta.shape[0],
    "Number of Columns": train_meta.shape[1],
    "Columns": train_meta.columns.tolist(),
    "Data Types": train_meta.dtypes.tolist(),
    "Unique Values per Column": train_meta.nunique().tolist()
}

train_info, labels_info, train_meta_info


**1. train (image_level_labels.csv) Dataset:**
* Number of Rows: 12,029
* Number of Columns: 4
* Columns:
  * patient_id: Identifier for the patient.
  * series_id: Identifier for the series of images for the patient.
  * instance_number: Specific image instance number within the series.
  * injury_name: Type of injury detected in the image.
* Data Types: The data types are appropriate with integer types for identifiers and object (string) type for the injury name.
* Unique Values: There are 246 unique patients, 330 unique series, and 925 unique instance numbers. The injury_name column has 2 unique values, indicating two types of injuries.

**2. labels (train.csv) Dataset:**

* Number of Rows: 3,147
* Number of Columns: 15
* Columns:
  * Various columns representing the health status or injury severity of various organs for each patient.
* Data Types: All columns are of integer type.
* Unique Values: There are 3,147 unique patients. The injury-related columns have binary values (0 or 1), indicating the absence or presence of a specific injury type.

**3. train_meta (train_series_meta.csv) Dataset:**

* Number of Rows: 4,711
* Number of Columns: 4
* Columns:
  * patient_id: Identifier for the patient.
  * series_id: Identifier for the series of images for the patient.
  * aortic_hu: The attenuation of the aorta in Hounsfield units (HU), which serves as a proxy for the timing of the scan (contrast phase).
  * incomplete_organ: Binary indicator specifying whether one or more organs are not fully covered by the scan.
* Data Types: The data types are appropriate with integer and float types.
* Unique Values: There are 3,147 unique patients and 4,711 unique series. The incomplete_organ column has 2 unique values.






In [None]:
# Displaying missing values for each dataset separately

missing_values_train = train.isnull().sum().to_frame(name='Missing Values (train)')
missing_values_labels = labels.isnull().sum().to_frame(name='Missing Values (labels)')
missing_values_train_meta = train_meta.isnull().sum().to_frame(name='Missing Values (train_meta)')

missing_values_train, missing_values_labels, missing_values_train_meta


**1. train (image_level_labels.csv) Dataset:**

No missing values are present in any columns.

**2. labels (train.csv) Dataset:**

No missing values are present in any columns.

**3. train_meta (train_series_meta.csv) Dataset:**

No missing values are present in any columns.

In [None]:
# Checking for duplicates in each dataset
duplicates_train = train.duplicated().sum()
duplicates_labels = labels.duplicated().sum()
duplicates_train_meta = train_meta.duplicated().sum()

duplicates_summary = {
    "Dataset": ["train", "labels", "train_meta"],
    "Number of Duplicates": [duplicates_train, duplicates_labels, duplicates_train_meta]
}

pd.DataFrame(duplicates_summary)


There are no duplicate rows in any of the datasets:

* train (image_level_labels.csv) Dataset: 0 duplicates.
* labels (train.csv) Dataset: 0 duplicates.
* train_meta (train_series_meta.csv) Dataset: 0 duplicates.

## **Exploratory Data Analysis**

In [None]:
# Visualizing the distribution of injury types in the 'train' dataset
plt.figure(figsize=(10, 6))
sns.countplot(data=train, x='injury_name')
plt.title('Distribution of Injury Types in train Dataset')
plt.ylabel('Count')
plt.xlabel('Injury Type')
plt.show()


The data suggests that extravasation (active bleeding) is more frequently identified in the provided images than bowel injuries.

In [None]:
# Visualizing the distribution of injury-related columns in the 'labels' dataset
injury_columns = [col for col in labels.columns if col != "patient_id"]
injury_counts = labels[injury_columns].sum()

plt.figure(figsize=(14, 8))
injury_counts.sort_values().plot(kind='barh')
plt.title('Distribution of Injury-Related Columns in labels Dataset')
plt.xlabel('Count')
plt.ylabel('Injury Type / Health Status')
plt.show()


In [None]:
# Visualizing the distribution of the 'aortic_hu' column in the 'train_meta' dataset
plt.figure(figsize=(10, 6))
sns.histplot(train_meta['aortic_hu'], bins=50, kde=True)
plt.title('Distribution of Aortic HU in train_meta Dataset')
plt.xlabel('Aortic HU')
plt.ylabel('Count')
plt.show()


Hounsfield Units (HU) are a measure used in CT scans to describe radiodensity, and the distribution gives us an idea of the variation in these values across different images.
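
As a hedged illustration of how HU values relate to the pixel data stored in a DICOM file: the stored integers are usually mapped to HU with a linear rescale, HU = pixel × RescaleSlope + RescaleIntercept. The helper name `to_hounsfield` and the sample values below are assumptions made for illustration; they are not part of the original analysis.

```python
import numpy as np

def to_hounsfield(pixel_array, slope=1.0, intercept=0.0):
    """Map raw stored pixel values to Hounsfield units via the linear DICOM rescale."""
    return pixel_array.astype(np.float32) * slope + intercept

# Hypothetical raw values. With pydicom, these would typically come from
# ds.pixel_array, ds.RescaleSlope and ds.RescaleIntercept.
raw = np.array([[0, 1024], [1124, 2024]], dtype=np.int16)
hu = to_hounsfield(raw, slope=1.0, intercept=-1024.0)
print(hu)
```

With an intercept of -1024, a stored value of 1024 maps to 0 HU (water) and a stored value of 0 maps to about -1024 HU (close to air).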

**Relationship Analysis:**

In [None]:
# Visualizing the relationship between 'aortic_hu' and 'incomplete_organ' in the 'train_meta' dataset
plt.figure(figsize=(10, 6))
sns.boxplot(data=train_meta, x='incomplete_organ', y='aortic_hu')
plt.title('Relationship between Aortic HU and Incomplete Organ in train_meta Dataset')
plt.xlabel('Incomplete Organ (0 = Complete, 1 = Incomplete)')
plt.ylabel('Aortic HU')
plt.show()


The box plot suggests that there might be some relationship between organ completeness in the scan and the aortic_hu values, although the plot alone does not establish one.

**Outliers Analysis:**

In [None]:
# Outlier analysis for the 'aortic_hu' column using the IQR method

# Calculate Q1, Q3, and IQR
Q1 = train_meta['aortic_hu'].quantile(0.25)
Q3 = train_meta['aortic_hu'].quantile(0.75)
IQR = Q3 - Q1

# Define bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify outliers
outliers = train_meta[(train_meta['aortic_hu'] < lower_bound) | (train_meta['aortic_hu'] > upper_bound)]

# Percentage of data points that are outliers
outlier_percentage = (len(outliers) / len(train_meta)) * 100

outlier_summary = {
    "Lower Bound": lower_bound,
    "Upper Bound": upper_bound,
    "Number of Outliers": len(outliers),
    "Percentage of Outliers": outlier_percentage
}

outlier_summary


In [None]:
# Visualizing outliers for the 'aortic_hu' column
plt.figure(figsize=(12, 8))
# Draw a vertical box plot so that the horizontal bound lines share its axis
sns.boxplot(y=train_meta['aortic_hu'])
plt.axhline(lower_bound, color='r', linestyle='--', label=f"Lower Bound: {lower_bound}")
plt.axhline(upper_bound, color='g', linestyle='--', label=f"Upper Bound: {upper_bound}")
plt.title('Boxplot of Aortic HU with Outliers Highlighted')
plt.ylabel('Aortic HU')
plt.legend()
plt.show()


From the plot, we can observe a cluster of data points above the upper bound, indicating potential outliers with higher aortic_hu values.

**Relationship Analysis:**

**1. Injury Type vs. Aortic HU:**

In [None]:
# Merging the 'train' and 'train_meta' datasets on 'patient_id' and 'series_id'
merged_data = pd.merge(train, train_meta, on=['patient_id', 'series_id'])

# Visualizing the distribution of 'aortic_hu' based on 'injury_name'
plt.figure(figsize=(12, 8))
sns.boxplot(data=merged_data, x='injury_name', y='aortic_hu')
plt.title('Distribution of Aortic HU based on Injury Type')
plt.xlabel('Injury Type')
plt.ylabel('Aortic HU')
plt.show()


For bowel_injury, the distribution appears to have a slightly higher median and is more compact in terms of the interquartile range (IQR) compared to extravasation.
The extravasation injury (which represents active bleeding) has a broader IQR, indicating more variability in the aortic_hu values for this injury type. There are also a few potential outliers present for this injury type.

**2. Injury Type vs. Completeness of Organ:**

In [None]:
# Visualizing the relationship between 'injury_name' and 'incomplete_organ'
plt.figure(figsize=(10, 6))
sns.countplot(data=merged_data, x='injury_name', hue='incomplete_organ')
plt.title('Injury Type vs. Completeness of Organ')
plt.xlabel('Injury Type')
plt.ylabel('Count')
plt.legend(title='Incomplete Organ (0 = Complete, 1 = Incomplete)')
plt.show()


For both bowel_injury and extravasation injury types, the majority of the organs in the images are complete (incomplete_organ = 0).
The number of images with incomplete organs (incomplete_organ = 1) is relatively lower for both injury types, with extravasation having a slightly higher count of incomplete organs compared to bowel_injury.

## Importing Images 

In [None]:
# Adjusting the path generation function to exclude instance_number
def test_img_path(row):
    return f"/kaggle/input/rsna-2023-abdominal-trauma-detection/test_images/{row['patient_id']}/{row['series_id']}/"

test_meta['test_img_path'] = test_meta.apply(test_img_path, axis=1)

# Display the first few rows of the test_meta dataframe with the new 'test_img_path' column
test_meta.head()



In [None]:
def img_path(row):
    return f"/kaggle/input/rsna-2023-abdominal-trauma-detection/train_images/{row['patient_id']}/{row['series_id']}/{row['instance_number']}.dcm"

train['img_path'] = train.apply(img_path, axis=1)

**DICOM Image Visualization:**

In [None]:
# Displaying the first few rows of the 'train' dataset with the new 'img_path' column
train.head()


In [None]:
def read_dicom_image(path):
    """
    Reads a DICOM image and returns its pixel array.
    """
    dicom_img = pydicom.dcmread(path)
    return dicom_img.pixel_array

# Sample 20 rows from the train dataset
sample_data = train.sample(20)

# Extract the img_paths and corresponding injury names for labeling
sample_img_paths = sample_data['img_path'].tolist()
sample_labels = sample_data['injury_name'].tolist()

# Set up the figure for visualization
plt.figure(figsize=(15, 30))

# Loop through the sampled image paths and display them in rows of 3 with labels.
# The loop variable is named 'path' so it does not shadow the img_path() function.
for idx, (path, label) in enumerate(zip(sample_img_paths, sample_labels), start=1):
    plt.subplot(7, 3, idx)  # 7 rows x 3 columns = 21 slots for 20 images
    plt.imshow(read_dicom_image(path), cmap='gray')
    plt.title(label)
    plt.axis('off')

plt.tight_layout()
plt.show()



**Comparison of Images for Each Injury Type**

In [None]:
# Sample one image path for each injury type
# (read_dicom_image is defined in the previous cell)
sample_img_paths = train.groupby('injury_name')['img_path'].apply(lambda s: s.sample(1).iloc[0])
sample_labels = sample_img_paths.index.tolist()

# Set up the figure for visualization
plt.figure(figsize=(15, 5))

# Loop through the sampled image paths and display them side by side with labels.
# The loop variable is named 'path' so it does not shadow the img_path() function.
for idx, (path, label) in enumerate(zip(sample_img_paths, sample_labels), start=1):
    plt.subplot(1, len(sample_img_paths), idx)
    plt.imshow(read_dicom_image(path), cmap='gray')
    plt.title(label)
    plt.axis('off')

plt.tight_layout()
plt.show()


**Preprocessing:**


* Rescaling: Adjusting the intensity values to a standard scale, e.g., between 0 and 1.
* Resizing: Making sure all images have the same size, especially if they are being fed into a neural network.
* Histogram Equalization: Enhancing the contrast of images.
* Normalization: Removing the mean and scaling to unit variance.
* Data Augmentation: Techniques such as rotation, zooming, and flipping to artificially increase the size of the dataset (useful for training deep learning models).
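
Two of the steps listed above, normalization and a simple flip augmentation, can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: the helper name `standardize`, the `eps` guard, and the tiny sample array are hypothetical and chosen only for illustration.

```python
import numpy as np

def standardize(img, eps=1e-7):
    """Remove the mean and scale to (approximately) unit variance."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + eps)

# Hypothetical 2x2 "image" standing in for a CT slice
img = np.array([[0, 50], [100, 200]], dtype=np.int16)

normed = standardize(img)   # zero mean, roughly unit variance
flipped = np.fliplr(img)    # horizontal flip, a basic augmentation

print(normed.mean(), normed.std())
print(flipped)
```

The small `eps` term only guards against division by zero for constant images; it has a negligible effect otherwise.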

In [None]:
import pydicom
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load a sample DICOM image
sample_path = train['img_path'].iloc[0]
dicom_img = pydicom.dcmread(sample_path).pixel_array

# Rescale the image to the range [0, 1]
rescaled_img = cv2.normalize(dicom_img, None, 0, 1, cv2.NORM_MINMAX, dtype=cv2.CV_32F)

# Apply histogram equalization
equalized_img = cv2.equalizeHist((rescaled_img * 255).astype(np.uint8))

# Plot original and preprocessed images side by side
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(dicom_img, cmap='gray')
plt.title('Original Image')
plt.subplot(1, 2, 2)
plt.imshow(equalized_img, cmap='gray')
plt.title('Preprocessed Image')
plt.show()


**Modelling**

**Steps for Model Building:**
* Data Preparation: Split the data into training and validation sets.
* Data Augmentation: Use data augmentation techniques to artificially increase the size of the training dataset.
* Model Architecture: Define the CNN architecture.
* Model Compilation: Specify the loss function, optimizer, and metrics.
* Model Training: Train the model using the training data.
* Model Evaluation: Evaluate the model's performance on the validation data.

**1. Data Preparation**

In [None]:
from sklearn.model_selection import train_test_split

# Splitting the image-level rows into training (80%) and validation (20%) sets, stratified by injury type
train_data, val_data = train_test_split(train, test_size=0.2, stratify=train['injury_name'], random_state=42)

# Extracting image paths and labels for training and validation sets
X_train = train_data['img_path'].tolist()
y_train = train_data['injury_name'].tolist()

X_val = val_data['img_path'].tolist()
y_val = val_data['injury_name'].tolist()
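
Because the split above operates on image rows, slices from the same patient can end up in both the training and validation sets, which may make validation scores look better than they would on unseen patients. One possible alternative is to split at the patient level. The sketch below uses only the standard library; the function name `split_by_patient` and the sample IDs are hypothetical, and unlike the split above it does not stratify by `injury_name`.

```python
import random

def split_by_patient(patient_ids, val_fraction=0.2, seed=42):
    """Return (train_ids, val_ids) as disjoint sets of patient IDs."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_val = max(1, round(len(unique_ids) * val_fraction))
    return set(unique_ids[n_val:]), set(unique_ids[:n_val])

# Hypothetical patient IDs, one entry per image row
rows = [101, 101, 102, 103, 103, 103, 104, 105, 105, 106]
train_ids, val_ids = split_by_patient(rows)

# Row indices belonging to each side of the split
train_rows = [i for i, pid in enumerate(rows) if pid in train_ids]
val_rows = [i for i, pid in enumerate(rows) if pid in val_ids]
print(len(train_rows), len(val_rows))
```

With the notebook's dataframe, the same ID sets could be applied as `train[train['patient_id'].isin(train_ids)]` and `train[train['patient_id'].isin(val_ids)]`.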




**2. Data Augmentation**

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data Augmentation for the training set
train_datagen = ImageDataGenerator(
    rescale=1./255,           # Rescale pixel values to [0,1]
    rotation_range=20,        # Randomly rotate the image (degrees, 0 to 180)
    width_shift_range=0.2,    # Randomly shift images horizontally (fraction of total width)
    height_shift_range=0.2,   # Randomly shift images vertically (fraction of total height)
    horizontal_flip=True      # Randomly flip images horizontally
)

# Only rescaling for the validation set
val_datagen = ImageDataGenerator(rescale=1./255)


In [None]:
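# Setting up generators to read images from the dataframe.
# Note: flow_from_dataframe loads files with PIL and, by default, keeps only
# filenames with standard image extensions (png, jpg, jpeg, bmp, ppm, tif, tiff),
# so the .dcm paths in 'img_path' must first be converted to one of those formats.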
train_generator = train_datagen.flow_from_dataframe(
    dataframe=train_data,
    x_col="img_path",
    y_col="injury_name",
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical'
)

val_generator = val_datagen.flow_from_dataframe(
    dataframe=val_data,
    x_col="img_path",
    y_col="injury_name",
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical'
)

# To check class indices
print(train_generator.class_indices)
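
Since the generators expect ordinary image files with 8-bit pixel values (hence `rescale=1./255`), one way to bridge the gap from DICOM is to convert each slice to an 8-bit array and save it as a PNG before building the generators. The following is a minimal sketch of the conversion step only; the helper name `to_uint8` and the sample values are assumptions, and a min-max scale is just one of several reasonable choices (CT windowing is another).

```python
import numpy as np

def to_uint8(pixel_array):
    """Min-max scale an array of any integer depth to the 0-255 uint8 range."""
    arr = pixel_array.astype(np.float32)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant image: avoid division by zero
        return np.zeros(arr.shape, dtype=np.uint8)
    return np.round((arr - lo) / (hi - lo) * 255.0).astype(np.uint8)

# Hypothetical 16-bit values standing in for a DICOM pixel_array
raw = np.array([[-1000, 0], [500, 1000]], dtype=np.int16)
img8 = to_uint8(raw)
print(img8.dtype, img8.min(), img8.max())
```

The resulting array could then be written with `cv2.imwrite(png_path, img8)`, where `png_path` is a hypothetical output path, and that path stored in the dataframe column passed as `x_col`.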



**3. Model Architecture**

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(len(train_generator.class_indices), activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
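
To make the size of this architecture concrete, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used above ('valid' padding and stride 1 for `Conv2D`, non-overlapping 2x2 pooling) and two output classes, matching the two `injury_name` values reported earlier; the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size, channels, total_params = 150, 3, 0
for filters in (32, 64, 128):
    # 3x3 kernel weights per input channel, plus one bias per filter
    total_params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters

flat = size * size * channels           # features entering the Dense layers
n_classes = 2                           # assumption: two injury_name classes
total_params += (flat + 1) * 512        # Dense(512)
total_params += (512 + 1) * n_classes   # softmax output layer
print(size, flat, total_params)
```

Under these assumptions the final feature map is 17x17x128, which flattens to 36,992 values, so almost all of the roughly 19 million parameters sit in the `Dense(512)` layer.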


**4. Model Compilation**