# Introduction to Medical Imaging Data Analysis

This notebook is a starting point for newcomers to medical imaging data analysis. It surveys the main imaging modalities and file formats, then walks through practical Python examples for reading and displaying images. It covers:

### Table of Contents:
1. **Types of Medical Imaging Data**
2. **Medical Imaging Data Formats**
3. **Typical Dataset Sizes**
4. **Visualizing Medical Imaging Data**
5. **Reading Medical Imaging Data**
6. **Preprocessing Medical Imaging Data**
7. **Feature Extraction**
8. **Simple Analysis Pipeline**
9. **Resources for Further Learning**
10. **References**
11. **Acknowledgements**
12. **Homework**

Each section builds on the previous ones, introducing progressively complex concepts and techniques. By the end, you will have a solid foundation in medical imaging data analysis and be well-prepared for more advanced topics.


## 1. Types of Medical Imaging Data

Medical imaging is essential for diagnosing and treating various medical conditions. Different imaging modalities provide unique insights into the body's structure and function. Here are the most common types:

### X-ray
- **Principle**: Passes X-rays (ionizing electromagnetic radiation) through the body; dense tissue such as bone absorbs more of the beam and stands out clearly in the image.
- **Uses**: Detecting fractures, dental imaging, chest exams.
- **Limitations**: Limited soft tissue detail, radiation exposure.

### Computed Tomography (CT)
- **Principle**: Combines multiple X-ray images taken from different angles to create cross-sectional views.
- **Uses**: Identifying tumors, guiding biopsies, assessing injuries.
- **Limitations**: Radiation exposure, high cost, less soft tissue detail than MRI.

### Magnetic Resonance Imaging (MRI)
- **Principle**: Uses strong magnets and radio waves to generate detailed images of soft tissues.
- **Uses**: Brain, spinal cord, muscle, and joint imaging.
- **Limitations**: High cost, longer scan times, unsafe or restricted for patients with certain metallic implants or electronic devices (e.g., some pacemakers).

### Ultrasound
- **Principle**: Uses high-frequency sound waves to produce real-time images.
- **Uses**: Monitoring pregnancy, examining organs, guiding procedures.
- **Limitations**: Operator-dependent, limited penetration in dense tissues, cannot image air-filled or bony structures well.

### Nuclear Medicine (including PET)
- **Principle**: Uses radioactive tracers that emit gamma rays or positrons to visualize physiological processes.
- **Uses**: Thyroid scans, bone scans, PET for cancer detection, heart disease, brain disorders.
- **Limitations**: Radiation exposure, high cost, lower spatial resolution.

### Mammography
- **Principle**: Low-dose X-rays specifically for breast imaging.
- **Uses**: Early detection of breast cancer.
- **Limitations**: Discomfort, radiation exposure, potential for false positives/negatives.


## 2. Medical Imaging Data Formats

Medical imaging data comes in various formats, each designed to store different types of images and associated metadata. Understanding these formats is crucial for handling and processing medical imaging data effectively.

### DICOM (Digital Imaging and Communications in Medicine)
- **Description**: The standard format for storing and transmitting medical images and related information.
- **Features**:
  - Includes metadata such as patient information, imaging parameters, and study details.
  - Supports a wide range of imaging modalities (e.g., X-ray, CT, MRI, Ultrasound).
  - Facilitates interoperability between different medical imaging devices and systems.
- **File Extension**: `.dcm` (by convention; many DICOM files carry no extension at all)

### NIfTI (Neuroimaging Informatics Technology Initiative)
- **Description**: Commonly used format for storing brain imaging data, particularly in research.
- **Features**:
  - Stores multi-dimensional arrays of data (e.g., 3D MRI or CT scans).
  - Includes metadata for image orientation and scaling.
  - Widely used in neuroimaging studies and analysis tools.
- **File Extensions**: `.nii`, `.nii.gz`

### Analyze
- **Description**: An older format for storing biomedical images, primarily used in brain imaging.
- **Features**:
  - Consists of two files: a header file (`.hdr`) and an image file (`.img`).
  - Stores 3D image data and metadata.
  - Largely replaced by NIfTI but still encountered in legacy data.
- **File Extensions**: `.hdr`, `.img`

### NRRD (Nearly Raw Raster Data)
- **Description**: Flexible format for storing n-dimensional raster data, often used in medical imaging research.
- **Features**:
  - Stores a human-readable text header and the data either together in one `.nrrd` file or as a detached header (`.nhdr`) that points to a separate data file.
  - Supports various types of medical imaging data, including MRI and CT scans.
  - Facilitates data exchange and processing in research environments.
- **File Extensions**: `.nrrd`, `.nhdr`

### MHA/MHD (MetaImage)
- **Description**: Format for storing medical images and associated metadata, commonly used in research.
- **Features**:
  - Stored either as a single `.mha` file (text header followed by the image data) or as a text header (`.mhd`) that points to a separate binary data file (commonly `.raw`).
  - Supports large datasets and various imaging modalities.
  - Allows easy manipulation and processing of medical images.
- **File Extensions**: `.mhd`, `.mha`, `.raw`
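
Because a `.mhd` header is plain `Key = Value` text, it can be inspected without any imaging library. The sketch below parses an illustrative header; the header values and the `parse_mhd_header` helper are assumptions made up for this example, and in practice a library such as SimpleITK would normally read the header and data together.

```python
# Illustrative MetaImage (.mhd) header text; the values are made up for this sketch.
EXAMPLE_MHD_HEADER = """\
ObjectType = Image
NDims = 3
DimSize = 512 512 120
ElementSpacing = 0.7 0.7 2.5
ElementType = MET_SHORT
ElementDataFile = volume.raw
"""


def parse_mhd_header(text):
    """Parse 'Key = Value' lines of a MetaImage header into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


header = parse_mhd_header(EXAMPLE_MHD_HEADER)
dims = [int(n) for n in header["DimSize"].split()]            # voxels per axis
spacing = [float(s) for s in header["ElementSpacing"].split()]  # spacing per axis
print("Dimensions:", dims)
print("Spacing:", spacing)
print("Data file:", header["ElementDataFile"])
```

The `ElementDataFile` entry is what links the header to the separate binary file holding the voxel values.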

### JPEG/PNG/TIFF
- **Description**: Common image formats used for medical images, particularly for reports and documentation.
- **Features**:
  - Suitable for 2D images like X-rays or ultrasound snapshots.
  - Limited metadata support compared to specialized formats.
  - Widely compatible with general-purpose image viewers and editing tools.
- **File Extensions**: `.jpg`, `.jpeg`, `.png`, `.tiff`
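
As a quick way to triage a folder of mixed files, the extensions listed in this section can be turned into a lookup table. This is only a sketch: `guess_format` is a hypothetical helper, and an extension is a hint rather than proof of the format (DICOM files in particular often have no extension, and `.hdr`/`.img` pairs may also hold NIfTI data).

```python
from pathlib import Path

# Extension-to-format table built from the formats described in this section.
# '.raw' is left out on purpose: it is generic and says nothing about the format.
EXTENSION_FORMATS = {
    ".dcm": "DICOM",
    ".nii": "NIfTI",
    ".nii.gz": "NIfTI",
    ".hdr": "Analyze",
    ".img": "Analyze",
    ".nrrd": "NRRD",
    ".nhdr": "NRRD",
    ".mhd": "MetaImage",
    ".mha": "MetaImage",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".png": "PNG",
    ".tif": "TIFF",
    ".tiff": "TIFF",
}


def guess_format(path):
    """Guess the imaging format from the file name; return 'unknown' if no match."""
    name = Path(path).name.lower()
    # Check longer extensions first so '.nii.gz' is matched as a whole.
    for ext in sorted(EXTENSION_FORMATS, key=len, reverse=True):
        if name.endswith(ext):
            return EXTENSION_FORMATS[ext]
    return "unknown"


for example in ["brain_t1.nii.gz", "chest/IM0001.DCM", "IM0002", "report.png"]:
    print(f"{example}: {guess_format(example)}")
```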



## 3. Typical Dataset Sizes

Understanding the typical sizes of medical imaging datasets is crucial for planning storage, processing, and analysis. Sizes vary greatly with the modality, resolution, bit depth, compression, and number of images acquired, so treat the figures below as rough orders of magnitude rather than fixed values.

### X-ray
- **Typical Size**: 10 MB - 50 MB per image.
- **Details**:
  - Single 2D images.
  - File sizes are small relative to CT or MRI because each study holds a few 2D images rather than a 3D volume, even though the detector resolution is high.

### Computed Tomography (CT)
- **Typical Size**: 200 MB - 2 GB per scan.
- **Details**:
  - Comprises multiple cross-sectional images (slices).
  - High resolution and 3D volumetric data contribute to larger file sizes.

### Magnetic Resonance Imaging (MRI)
- **Typical Size**: 100 MB - 500 MB per sequence.
- **Details**:
  - Consists of multiple sequences (e.g., T1, T2, FLAIR) with various parameters.
  - Each sequence includes multiple slices or volumes.

### Ultrasound
- **Typical Size**: 5 MB - 30 MB per image or video clip.
- **Details**:
  - 2D images are relatively small.
  - 3D/4D ultrasound and video clips increase the data size.

### Nuclear Medicine (including PET)
- **Typical Size**: 50 MB - 500 MB per scan.
- **Details**:
  - Includes multiple frames over time for dynamic studies.
  - PET/CT or PET/MRI combinations further increase data size.

### Mammography
- **Typical Size**: 20 MB - 50 MB per image.
- **Details**:
  - High-resolution 2D images.
  - Digital Breast Tomosynthesis (3D mammography) can generate larger datasets.
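
The ranges above follow from simple arithmetic: uncompressed size is roughly rows × columns × slices × bytes per pixel. The sketch below works through that estimate; the matrix sizes, slice count, and 16-bit depth are illustrative assumptions, not properties of any particular scanner, and real files add header overhead and may be compressed.

```python
def uncompressed_size_bytes(rows, cols, slices=1, bits_allocated=16):
    """Estimate raw pixel-data size, ignoring headers and compression."""
    bytes_per_pixel = bits_allocated // 8
    return rows * cols * slices * bytes_per_pixel


def to_mib(num_bytes):
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


# Assumed example acquisitions (illustrative values only)
ct_bytes = uncompressed_size_bytes(rows=512, cols=512, slices=400)
xray_bytes = uncompressed_size_bytes(rows=3000, cols=3000)

print(f"CT volume, 512 x 512 x 400, 16-bit: {to_mib(ct_bytes):.1f} MiB")
print(f"X-ray image, 3000 x 3000, 16-bit:   {to_mib(xray_bytes):.1f} MiB")
```

With these assumptions the CT volume comes to 200 MiB, which matches the lower end of the CT range listed above.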



## 4. Visualizing Medical Imaging Data

Visualizing medical imaging data is a crucial step in analysis and diagnosis. Effective visualization helps in understanding complex anatomical structures and identifying abnormalities.

### Visualization Tools

#### 3D Slicer
- **Description**: 3D Slicer is an open-source software platform for medical image informatics, image processing, and three-dimensional visualization. It supports a variety of imaging modalities and provides a wide range of tools for analysis and visualization.
- **Features**:
  - 3D volume rendering
  - Slice views (axial, sagittal, coronal)
  - Interactive segmentation
  - Registration and fusion of different imaging modalities
- **Download**: You can download 3D Slicer from [slicer.org](https://www.slicer.org/).

#### ITK-SNAP
- **Description**: ITK-SNAP is a software application used to segment structures in 3D medical images. It is widely used for manual and semi-automatic segmentation and provides easy-to-use tools for navigating and annotating medical images.
- **Features**:
  - Manual and automatic segmentation tools
  - Interactive 3D navigation
  - Intuitive user interface
  - Supports a variety of medical image formats
- **Download**: You can download ITK-SNAP from [itksnap.org](http://www.itksnap.org/pmwiki/pmwiki.php).

These tools are essential for visualizing and analyzing medical imaging data, providing powerful capabilities for both clinical and research applications.


## 5. Reading Medical Imaging Data

Reading medical imaging data involves accessing both the image and its associated metadata. Below are Python examples for DICOM and NIfTI, two of the most widely used formats. Each example runs once `file_path` points to a real file on your machine.



### DICOM (Digital Imaging and Communications in Medicine)

In [None]:
# Install the libraries needed to read and display DICOM files
!pip install pydicom matplotlib

import pydicom  # Library for working with DICOM files
import matplotlib.pyplot as plt  # Library for plotting

# Load a DICOM file (replace the placeholder with a real path)
file_path = 'path_to_dicom_file.dcm'
dicom_data = pydicom.dcmread(file_path)

# Access metadata; .get() avoids an AttributeError when a tag is absent,
# for example in de-identified files
print(f"Patient's Name: {dicom_data.get('PatientName', 'N/A')}")
print(f"Study Date: {dicom_data.get('StudyDate', 'N/A')}")

# Access pixel data as a NumPy array
# (compressed files may need an additional decoder package)
image_array = dicom_data.pixel_array

# Display the image (assumes a single-frame 2D image)
plt.imshow(image_array, cmap='gray')
plt.title('DICOM Image')
plt.axis('off')
plt.show()
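
A CT or MRI volume is often stored as one DICOM file per slice, so the files need to be put in spatial order before they can be treated as a 3D volume. The sketch below shows one common approach, sorting by the z component of `ImagePositionPatient` and falling back to `InstanceNumber`. The helper names, the `*.dcm` file pattern, and the assumption of axial slices are all illustrative; oblique acquisitions would need the slice normal from `ImageOrientationPatient`. The demonstration uses stand-in objects so it runs without any files.

```python
from pathlib import Path
from types import SimpleNamespace


def slice_position(ds):
    """Return a sort key for a slice: z position if present, else InstanceNumber."""
    position = getattr(ds, "ImagePositionPatient", None)
    if position is not None:
        return float(position[2])  # z coordinate (assumes axial slices)
    return float(getattr(ds, "InstanceNumber", 0))


def sort_slices(datasets):
    """Return the slices ordered along the patient z axis."""
    return sorted(datasets, key=slice_position)


def load_series(folder):
    """Read every .dcm file in a folder and return the slices in spatial order."""
    import pydicom  # imported here so the demo below runs without pydicom

    datasets = [pydicom.dcmread(path) for path in Path(folder).glob("*.dcm")]
    return sort_slices(datasets)


# Demonstration with stand-in objects that mimic the two attributes used above
fake_slices = [
    SimpleNamespace(ImagePositionPatient=[0.0, 0.0, 5.0], InstanceNumber=3),
    SimpleNamespace(ImagePositionPatient=[0.0, 0.0, -10.0], InstanceNumber=1),
    SimpleNamespace(ImagePositionPatient=[0.0, 0.0, 0.0], InstanceNumber=2),
]
ordered_z = [slice_position(s) for s in sort_slices(fake_slices)]
print("Slice z positions in order:", ordered_z)
```

With real data, `load_series('path_to_dicom_folder')` would return the sorted datasets, whose `pixel_array` values can then be stacked into a volume.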


### NIfTI (Neuroimaging Informatics Technology Initiative)

In [None]:
# Install the libraries needed to read and display NIfTI files
!pip install nibabel matplotlib

import nibabel as nib  # Library for working with NIfTI files
import matplotlib.pyplot as plt  # Library for plotting

# Load a NIfTI file (replace the placeholder with a real path; .nii.gz also works)
file_path = 'path_to_nifti_file.nii'
nifti_data = nib.load(file_path)

# Access metadata
print(f"Image Shape: {nifti_data.shape}")  # Dimensions of the image array
# The 4x4 affine maps voxel indices to world (scanner) coordinates
print(f"Affine Matrix:\n{nifti_data.affine}")

# Access voxel data as a floating-point NumPy array
image_array = nifti_data.get_fdata()

# Display the middle slice along the third axis (assumes a 3D volume);
# transposing with origin='lower' is a common display convention
middle = image_array.shape[2] // 2
plt.imshow(image_array[:, :, middle].T, cmap='gray', origin='lower')
plt.title('NIfTI Image')
plt.show()
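
To make the role of the affine matrix concrete, the sketch below applies a 4x4 affine to a voxel index by hand. The matrix values (2 mm isotropic voxels with a shifted origin) and the `voxel_to_world` helper are assumptions chosen for illustration; with a real image you would pass `nifti_data.affine`, and nibabel's `nibabel.affines.apply_affine` performs the same mapping.

```python
def voxel_to_world(affine, ijk):
    """Map a voxel index (i, j, k) to world coordinates (x, y, z) in mm."""
    i, j, k = ijk
    homogeneous = (i, j, k, 1.0)  # append 1 so the translation column applies
    # Only the first three rows of the affine produce x, y, z
    return tuple(
        sum(a * h for a, h in zip(row, homogeneous)) for row in affine[:3]
    )


# Assumed example affine: 2 mm voxels, origin offset by (-90, -126, -72) mm
example_affine = [
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]

print("Voxel (0, 0, 0)    ->", voxel_to_world(example_affine, (0, 0, 0)))
print("Voxel (45, 63, 36) ->", voxel_to_world(example_affine, (45, 63, 36)))
```

In this example, voxel (0, 0, 0) sits at the translation offset, and voxel (45, 63, 36) lands on the world origin because each index is scaled by 2 mm before the offset is added.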
