# Extract NIfTIs to create a dataset for segmentation


In [1]:
import pandas as pd
import csv
import re
import shutil
import os
import numpy as np
import nibabel as nib
import skimage.transform as skTrans
import SimpleITK as sitk

## Create dataframes from the DISTANT and DEEP-RISK logs

In [7]:
# Raw strings keep the Windows backslashes literal and avoid invalid-escape warnings
distant_log_path = r"L:\basic\Personal Archive\E\emquist\parsing_MRI\LOG\final_header_df_DISTANT.csv"
deeprisk_log_path = r"L:\basic\Personal Archive\E\emquist\parsing_MRI\LOG\final_header_df_DEEP-RISK.csv"

In [8]:
# Return the first substring of input_string that matches pattern (here: the study ID), or None
def extract_number_string(pattern, input_string):
    match = re.search(pattern, input_string)
    return match.group() if match else None
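
As a quick check of the helper above, the sketch below applies it to the example entry quoted later in this notebook (`DRAUMC0001_DERI200909030911`) and to two made-up strings. The made-up strings are assumptions for illustration only, not real log entries.

```python
import re

def extract_number_string(pattern, input_string):
    # Same helper as in the notebook: first match of `pattern`, or None
    match = re.search(pattern, input_string)
    return match.group() if match else None

# Entry format taken from the comment in the DEEP-RISK cell
deeprisk_id = extract_number_string(r"DRAUMC\d{4}", "DRAUMC0001_DERI200909030911")

# Hypothetical strings, only to show the edge cases
no_id = extract_number_string(r"DIST_\d{4}", "no study id in this string")
long_id = extract_number_string(r"DIST_\d{4}", "DIST_00421_scan")

print(deeprisk_id, no_id, long_id)
```

Note that `\d{4}` stops after four digits, so an ID with more digits would be silently truncated, and rows without a match get `None` in the `StudyID` column.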

### Distant dataframe

In [9]:
dist_log = pd.read_csv(distant_log_path,header=None, sep=";")
column_name = 0

# Define a regular expression pattern to match the desired pattern.
pattern = r"DIST_\d{4}"

# Applying the function to the DataFrame's column
dist_log['StudyID'] = dist_log[column_name].apply(lambda x: extract_number_string(pattern, x))


### Deeprisk dataframe

In [10]:

dr_log = pd.read_csv(deeprisk_log_path,header=None, sep=";")
column_name = 0
# Example entry: DRAUMC0001_DERI200909030911
# Define a regular expression pattern that matches the study ID.
pattern = r"DRAUMC\d{4}"

# Applying the function to the DataFrame's column
dr_log['StudyID'] = dr_log[column_name].apply(lambda x: extract_number_string(pattern, x))

In [11]:
# Display the extracted study IDs (a notebook cell only shows the last expression, i.e. dr_log)
dist_log['StudyID']
dr_log['StudyID']

0        DRAUMC0001
1        DRAUMC0001
2        DRAUMC0001
3        DRAUMC0001
4        DRAUMC0001
            ...    
30446    DRAUMC1261
30447    DRAUMC1261
30448    DRAUMC1261
30449    DRAUMC1261
30450    DRAUMC1261
Name: StudyID, Length: 30451, dtype: object

## Get LGE-SAX from the logs

In [12]:
# LGE_SAX
mri_type = 'LGE_SAX'


# Cine SAX
# mri_type = 'SAX'

In [13]:
# Make a copy of the DataFrame using the copy() method
dist_df = dist_log.loc[dist_log[2] == mri_type].copy()
dr_df = dr_log.loc[dr_log[2] == mri_type].copy()

# Make unique series
dist_df['Series'] = dist_df[0]
dist_df.drop_duplicates(subset=['StudyID', 'Series'], keep='first', inplace=True)
dist_df['StudyDateMRI'] = pd.to_datetime(dist_df[5], format='%Y%m%d').dt.strftime('%Y-%m-%d')

# Make unique series
dr_df['Series'] = dr_df[0]
dr_df.drop_duplicates(subset=['StudyID', 'Series'], keep='first', inplace=True)
dr_df['StudyDateMRI'] = pd.to_datetime(dr_df[5], format='%Y%m%d').dt.strftime('%Y-%m-%d')

print(dist_df.shape)
print(dr_df.shape)

(1201, 19)
(939, 19)


### Merge logs and add clinical data

In [14]:
clinical_data_path = r"\\amc.intra\opslag\basic\diva1\Onderzoekers\DEEP-RISK\All clinical information.csv"

In [15]:
clinical_df = pd.read_csv(clinical_data_path)
implantdetails = clinical_df[['StudyID', 'ImplantationDate']]
dr__ = dr_df[['StudyID', 'StudyDateMRI', 0]]
dist__ = dist_df[['StudyID','StudyDateMRI', 0]]

print(dr__.shape)
print(dist__.shape)
mri = pd.concat([dist__, dr__], axis=0)

mri['Filename'] = mri[0]

clin_mri = pd.merge(mri, implantdetails, on='StudyID', how='left')

print(clin_mri.shape)
clin_mri = clin_mri[['StudyID', 'StudyDateMRI','ImplantationDate', 'Filename']]

#Calculate days between implant and MRI
clin_mri['StudyDateMRI'] = pd.to_datetime(clin_mri['StudyDateMRI'])
clin_mri['ImplantationDate'] = pd.to_datetime(clin_mri['ImplantationDate'])

# Calculate the time difference in days between 'StudyDateMRI' and 'ImplantationDate'.
clin_mri['Days_MRI_to_Implant'] = (clin_mri['StudyDateMRI'] - clin_mri['ImplantationDate']).dt.days

#Baseline MRI
#indices = clin_mri[(clin_mri['Days_MRI_to_Implant'] <0) & (clin_mri['Days_MRI_to_Implant'] >-365)].index.tolist()
#clin_mri = clin_mri[clin_mri.index.isin(indices)]

clin_mri_onemriperpatient = clin_mri

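#Select one row per patient: after the ascending sort, keep='last' keeps the MRI with the largest Days_MRI_to_Implant (the latest MRI relative to implantation)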
clin_mri_onemriperpatient = clin_mri_onemriperpatient.sort_values(by='Days_MRI_to_Implant')
clin_mri_onemriperpatient = clin_mri_onemriperpatient.drop_duplicates(subset='StudyID', keep='last')

print("number of MRI total", clin_mri.shape)
print("number of patients with baseline MRI", (clin_mri_onemriperpatient['StudyID'].nunique()))

subject_counts = clin_mri.groupby('StudyID').size()

# Count the occurrences of each unique value in the subject_counts Series
subject_counts_summary = subject_counts.value_counts()

print("Number of subjects with 1 MRI:", subject_counts_summary.get(1, 0))
print("Number of subjects with 2 MRI:", subject_counts_summary.get(2, 0))
print("Number of subjects with 3 MRI:", subject_counts_summary.get(3, 0))
print("Number of subjects with 4 MRI:", subject_counts_summary.get(4, 0))
print("Number of subjects with 5 MRI:", subject_counts_summary.get(5, 0))
print("Number of subjects with 6 MRI:", subject_counts_summary.get(6, 0))

(939, 3)
(1201, 3)
(2140, 5)
number of MRI total (2140, 5)
number of patients with baseline MRI 1722
Number of subjects with 1 MRI: 1432
Number of subjects with 2 MRI: 210
Number of subjects with 3 MRI: 55
Number of subjects with 4 MRI: 16
Number of subjects with 5 MRI: 5
Number of subjects with 6 MRI: 0
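
The one-row-per-patient step above sorts ascending on `Days_MRI_to_Implant` and keeps the last row per `StudyID`, which is the MRI with the largest value. If a patient also has MRIs after implantation, that is not necessarily the MRI closest to the implant. The sketch below uses made-up data to compare the two selections; the `abs_days` column and the `closest` variant are assumptions, not part of the original notebook.

```python
import pandas as pd

# Made-up example: patient A has two MRIs before and one long after implantation
toy = pd.DataFrame({
    "StudyID": ["A", "A", "A", "B"],
    "Days_MRI_to_Implant": [-300, -20, 400, -5],
})

# Notebook approach: ascending sort, keep the last (largest) value per patient
latest = (toy.sort_values(by="Days_MRI_to_Implant")
             .drop_duplicates(subset="StudyID", keep="last"))

# Possible alternative: keep the MRI with the smallest absolute distance to implantation
closest = (toy.assign(abs_days=toy["Days_MRI_to_Implant"].abs())
              .sort_values(by="abs_days")
              .drop_duplicates(subset="StudyID", keep="first")
              .drop(columns="abs_days"))

latest_days = latest.set_index("StudyID")["Days_MRI_to_Implant"].to_dict()
closest_days = closest.set_index("StudyID")["Days_MRI_to_Implant"].to_dict()
print(latest_days)
print(closest_days)
```

With the commented-out baseline filter enabled (only MRIs in the year before implantation), all values are negative and both selections pick the same row.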


# Extract the NIfTIs from the Emma pipeline

In [16]:
source_paths_niftis = [
    r'\\amc.intra\opslag\basic\diva1\Onderzoekers\DEEP-RISK\DEEP-RISK\emma_cmr_toolkit\final_DR_output_dir\niftis',
    r'\\amc.intra\opslag\basic\diva1\Onderzoekers\DISTANT\emma_cmr_toolkit\final_DIST_output_dir\niftis'
]

In [17]:
def find_matching_folder(folder_name, folder_names):
    # Look for a matching folder in the list of folder names
    for name in folder_names:
        if name in folder_name:
            return name
    return None

def extract_niftis_with_keyword(keyword, source_paths_niftis, dst_path, folder_names):
    for source_path in source_paths_niftis:
        for root, dirs, _ in os.walk(source_path):
            # Check if the current folder contains a subfolder named `keyword` (e.g. 'LGE_SAX')
            if keyword in dirs:
                matching_name = find_matching_folder(root, folder_names)
                if matching_name:
                    source_subfolder = os.path.join(root, keyword)
                    destination_folder = os.path.join(dst_path, matching_name)
                    # copytree raises FileExistsError if the destination folder already exists
                    shutil.copytree(source_subfolder, destination_folder)
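
To see what the extraction does without touching the network share, the sketch below runs the same walk-and-copy logic on a temporary directory tree. The folder names, the file name, and the helper `copy_keyword_subfolders` are hypothetical; `dirs_exist_ok=True` (Python 3.8+) is an added assumption so that the copy can be rerun.

```python
import os
import shutil
import tempfile

def find_matching_folder(folder_name, folder_names):
    # Same substring match as in the notebook
    for name in folder_names:
        if name in folder_name:
            return name
    return None

def copy_keyword_subfolders(keyword, source_path, dst_path, folder_names):
    # Hypothetical single-source variant of extract_niftis_with_keyword
    copied = []
    for root, dirs, _ in os.walk(source_path):
        if keyword in dirs:
            name = find_matching_folder(root, folder_names)
            if name:
                shutil.copytree(os.path.join(root, keyword),
                                os.path.join(dst_path, name),
                                dirs_exist_ok=True)
                copied.append(name)
    return copied

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "niftis")
    dst = os.path.join(tmp, "out")
    # Made-up layout: one scan with an LGE_SAX subfolder, one with only SAX
    os.makedirs(os.path.join(src, "DIST_0001_scanA", "LGE_SAX"))
    os.makedirs(os.path.join(src, "DIST_0002_scanB", "SAX"))
    with open(os.path.join(src, "DIST_0001_scanA", "LGE_SAX", "img.nii.gz"), "w") as f:
        f.write("")
    copied = copy_keyword_subfolders("LGE_SAX", src, dst,
                                     ["DIST_0001_scanA", "DIST_0002_scanB"])
    copied_files = sorted(os.listdir(os.path.join(dst, "DIST_0001_scanA")))

print(copied, copied_files)
```

Because the match is a substring test against the full path, a short folder name can also match longer ones that contain it, so the names in `folder_names` should be specific enough to be unique.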


In [19]:
# Folder names of the selected MRIs (one per patient)
folder_names = list(clin_mri_onemriperpatient['Filename'])

# # LGE_SAX dataset
# keyword = 'LGE_SAX'
# lge_dst_path = r'\\amc.intra\opslag\basic\\Personal Archive\E\emquist\parsing_MRI\final_test_seg\lge_sax_raw_mri'
# extract_niftis_with_keyword(keyword, source_paths_niftis, lge_dst_path, folder_names)

# Cine SAX dataset
keyword = 'SAX'
cine_dst_path = r'\\amc.intra\opslag\basic\\Personal Archive\E\emquist\parsing_MRI\final_test_seg\cine_sax_raw_mri'
extract_niftis_with_keyword(keyword, source_paths_niftis, cine_dst_path, folder_names)

# Extract processed dataset with at least 7 slices into one folder, keep track of files in a log

In [35]:
def write_log(log_file, filepath, slices, img_shape, n):
    with open(log_file, mode='a', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow([filepath, slices, img_shape, n])
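
A small round trip shows what ends up in the log file. The sketch below writes a header and one made-up row to a temporary file and reads them back; the file name and row values are assumptions for illustration.

```python
import csv
import os
import tempfile

def write_log(log_file, filepath, slices, img_shape, n):
    # Same helper as in the notebook: append one row to the CSV log
    with open(log_file, mode='a', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        csv_writer.writerow([filepath, slices, img_shape, n])

with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "demo_log.csv")
    write_log(log_file, "Processed_Paths", "Slices", "Shape", "number")
    write_log(log_file, "scan_a.nii.gz", 12, (256, 256, 12), 0)
    with open(log_file, newline='') as f:
        rows = list(csv.reader(f))

print(rows)
```

Two things follow from this: the shape tuple is stored as a quoted string such as `(256, 256, 12)`, and because the file is opened in append mode, rerunning the pipeline adds a second header row unless the old log is removed first.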

In [67]:
def orthogonal_check_and_save(file, sitk_img, n_good, n_few, dst_path, too_few_slices_path):

    img_shape = sitk_img.GetSize()
    img = sitk_img
    # Permute axes so the slice axis (fewer than 40 voxels) comes last
    if img_shape[0] < 40:
        img = sitk.PermuteAxes(sitk_img, [1,2,0])
        img_shape = img.GetSize()
    elif img_shape[1] < 40:
        img = sitk.PermuteAxes(sitk_img, [0,2,1])
        img_shape = img.GetSize()
    if img_shape[0] < 40:
        img = sitk.PermuteAxes(img, [1,2,0])

    img_shape = img.GetSize()

    minimum_slices = 7

    if img_shape[2] >= minimum_slices: 
        
        non_orthogonal_image = img #sitk.ReadImage(filepath)
        # Create an identity transformation
        identity_transform = sitk.Euler3DTransform()

        # Set up the registration
        registration_method = sitk.ImageRegistrationMethod()

        # Set the similarity metric to mean squares
        registration_method.SetMetricAsMeanSquares()

        # Set the optimizer to gradient descent
        registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100,
                                                        convergenceMinimumValue=1e-6,
                                                        convergenceWindowSize=10)

        # Set the interpolator to linear
        registration_method.SetInterpolator(sitk.sitkLinear)

        # Set the initial transform as the identity transform
        registration_method.SetInitialTransform(identity_transform)

        # Perform the registration
        final_transform = registration_method.Execute(sitk.Cast(non_orthogonal_image, sitk.sitkFloat32),
                                                    sitk.Cast(non_orthogonal_image, sitk.sitkFloat32))

        # Resample the non-orthogonal image to the orthogonal image grid using the final transform
        orthogonal_image = sitk.Resample(non_orthogonal_image, sitk.Cast(non_orthogonal_image, sitk.sitkFloat32),
                                        final_transform, sitk.sitkLinear, 0.0)

        # Save the orthogonal image as a NIfTI file
        out_path = os.path.join(dst_path, f'{n_good}_{file}')
        sitk.WriteImage(orthogonal_image, out_path)
        return True, img_shape
    else:
        # Too few slices: save the original image to the too-few-slices folder
        out_path = os.path.join(too_few_slices_path, f'{n_few}_{file}')
        sitk.WriteImage(sitk_img, out_path)

        return False, img_shape
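
The axis handling at the top of `orthogonal_check_and_save` can be traced without SimpleITK. The sketch below is a pure-Python stand-in that applies the same three conditions to a shape tuple and returns the resulting axis order and shape. It assumes the `sitk.PermuteAxes` convention that output axis `i` is input axis `order[i]`; the function name `slice_axis_last` and the example shapes are hypothetical.

```python
def slice_axis_last(shape, threshold=40):
    # Mirror of the notebook's permutation logic, on plain tuples
    order = (0, 1, 2)
    shape = tuple(shape)

    def permute(order, shape, perm):
        # Output axis i takes input axis perm[i]
        return tuple(order[p] for p in perm), tuple(shape[p] for p in perm)

    if shape[0] < threshold:
        order, shape = permute(order, shape, (1, 2, 0))
    elif shape[1] < threshold:
        order, shape = permute(order, shape, (0, 2, 1))
    if shape[0] < threshold:
        order, shape = permute(order, shape, (1, 2, 0))
    return order, shape

print(slice_axis_last((12, 256, 256)))
print(slice_axis_last((256, 12, 256)))
print(slice_axis_last((256, 256, 12)))
```

In all three cases the short axis ends up last, so `img_shape[2]` is the slice count that is compared against `minimum_slices`. The threshold of 40 assumes that in-plane dimensions are always at least 40 voxels and that no stack has 40 or more slices.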

In [97]:
def cine_orthogonal_check_and_save(file, sitk_img, n_good, n_few, dst_path, too_few_slices_path):

    img_shape = sitk_img.GetSize()
    img = sitk_img

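    # A 3D image has no time dimension: permute axes so the slice axis comes last (for the logged shape) and save it with the too-few-slices files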
    if len(img_shape) == 3:
        if img_shape[0] < 40:
            img = sitk.PermuteAxes(sitk_img, [1,2,0])
            img_shape = img.GetSize()
        elif img_shape[1] < 40:
            img = sitk.PermuteAxes(sitk_img, [0,2,1])
            img_shape = img.GetSize()
        if img_shape[0] < 40:
            img = sitk.PermuteAxes(img, [1,2,0])
        img_shape = img.GetSize()
        # Save the original 3D image to the too-few-slices folder
        out_path = os.path.join(too_few_slices_path, f'{n_few}_{file}')
        sitk.WriteImage(sitk_img, out_path)

        return False, img_shape

    minimum_slices = 7

    if img_shape[2] >= minimum_slices: 

        # Read the 4D non-orthogonal image (CINE CMR) as a series of 3D volumes
        non_orthogonal_image = img

        # Create an identity transformation for 3D registration
        identity_transform = sitk.Euler3DTransform()

        # Set up the registration for 3D volumes
        registration_method = sitk.ImageRegistrationMethod()
        registration_method.SetMetricAsMeanSquares()
        registration_method.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100,
                                                        convergenceMinimumValue=1e-6,
                                                        convergenceWindowSize=10)
        registration_method.SetInterpolator(sitk.sitkLinear)
        registration_method.SetInitialTransform(identity_transform)

        # Create an empty 4D image to store the registered volumes
        registered_4d_image = sitk.Image(non_orthogonal_image.GetSize(), sitk.sitkFloat32)
        registered_4d_image.CopyInformation(non_orthogonal_image)

        # Iterate over each 3D volume in the 4D image
        for t in range(non_orthogonal_image.GetSize()[3]):
            # Perform 3D registration for each volume
            fixed_image = non_orthogonal_image[:, :, :, t]  # Extract the t-th volume
            moving_image = non_orthogonal_image[:, :, :, t]  # You can modify this if needed

            # Perform the registration
            final_transform = registration_method.Execute(sitk.Cast(fixed_image, sitk.sitkFloat32),
                                                        sitk.Cast(moving_image, sitk.sitkFloat32))

            # Resample the moving image to the fixed image grid using the final transform
            registered_image = sitk.Resample(moving_image, fixed_image, final_transform,
                                            sitk.sitkLinear, 0.0)

            # Insert the registered volume into the 4D image
            registered_4d_image[:, :, :, t] = registered_image

        # Save the registered 4D image as a NIfTI file
        out_path = os.path.join(dst_path, f'{n_good}_{file}')
        sitk.WriteImage(registered_4d_image, out_path)

        return True, img_shape
    else:
        # Too few slices: save the original image to the too-few-slices folder
        out_path = os.path.join(too_few_slices_path, f'{n_few}_{file}')
        sitk.WriteImage(sitk_img, out_path)

        return False, img_shape

In [98]:
def get_processed_datasets_and_log(root, keywords, not_keywords, mri_type):

    # Input folder, the three output folders and the log folder
    main_path = os.path.join(root, f"{mri_type}_raw_mri")
    good_path = os.path.join(root, f"{mri_type}_mri_good")
    non_orthonormal_path = os.path.join(root, f"{mri_type}_mri_non_orthonormal")
    too_few_slices_path = os.path.join(root, f"{mri_type}_mri_too_few")
    log_dir = os.path.join(root, "LOG")

    for path in (good_path, non_orthonormal_path, too_few_slices_path, log_dir):
        os.makedirs(path, exist_ok=True)

    log_good = os.path.join(log_dir, f"{mri_type}_log_good.csv")
    log_non_orthonormal = os.path.join(log_dir, f"{mri_type}_log_non_orthonormal.csv")
    log_too_few_slices = os.path.join(log_dir, f"{mri_type}_log_too_few.csv")

    # Write the header row of each log (write_log appends, so a rerun adds another header)
    for log_file in (log_good, log_non_orthonormal, log_too_few_slices):
        write_log(log_file, 'Processed_Paths', 'Slices', 'Shape', 'number')

    # Pick the check that matches the MRI type
    is_lge = mri_type.startswith('lge')
    if is_lge:
        check_and_save = orthogonal_check_and_save
    elif mri_type.startswith('cine'):
        check_and_save = cine_orthogonal_check_and_save
    else:
        return

    n_good = 0
    n_wrong = 0
    n_few = 0

    for folder, _, files in sorted(os.walk(main_path)):
        for file in files:
            # For LGE, keep only files with every keyword (e.g. 'psir') and no excluded keyword (e.g. 'mag')
            if is_lge and (not all(k in file for k in keywords) or any(k in file for k in not_keywords)):
                continue
            filepath = os.path.join(folder, file)
            try:
                og_img = sitk.ReadImage(filepath)
                good, img_shape = check_and_save(file, og_img, n_good, n_few, good_path, too_few_slices_path)
                # Log the actual slice count (last spatial axis) instead of the header label
                slices = img_shape[2]
                if good:
                    write_log(log_good, filepath, slices, img_shape, n_good)
                    n_good += 1
                else:
                    write_log(log_too_few_slices, filepath, slices, img_shape, n_few)
                    n_few += 1
            except Exception as e:
                # SimpleITK could not read or process the file: copy it unchanged with nibabel
                print(e)
                non_nii = nib.load(filepath)
                # The slice axis is unknown here, so the Slices column is left empty
                write_log(log_non_orthonormal, filepath, '', non_nii.shape, n_wrong)
                out_path = os.path.join(non_orthonormal_path, f'{n_wrong}_{file}')
                non_nii.to_filename(out_path)
                n_wrong += 1
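
Files end up in the non-orthonormal folder when SimpleITK refuses to read them, typically with the ITK message that only orthonormal direction cosines are supported. A rough pre-check on the affine can flag such files before reading. The sketch below is an assumption-laden illustration using NumPy only: the function name `direction_is_orthonormal`, the tolerance `tol`, and the two example affines are hypothetical, and the tolerance is not ITK's actual threshold.

```python
import numpy as np

def direction_is_orthonormal(affine, tol=1e-4):
    # Split the 3x3 part of the affine into voxel spacing and direction cosines
    affine = np.asarray(affine, dtype=float)
    rot = affine[:3, :3]
    spacing = np.linalg.norm(rot, axis=0)
    if np.any(spacing == 0):
        return False
    direction = rot / spacing
    # Orthonormal columns satisfy D^T D = I
    return bool(np.allclose(direction.T @ direction, np.eye(3), atol=tol))

# Made-up affines: an axis-aligned one and one with a shear term
ok_affine = np.diag([1.5, 1.5, 8.0, 1.0])
sheared_affine = ok_affine.copy()
sheared_affine[0, 1] = 0.5

print(direction_is_orthonormal(ok_affine))
print(direction_is_orthonormal(sheared_affine))
```

With nibabel, the affine of a file is available as `nib.load(filepath).affine`, so the check could be applied to each file before calling `sitk.ReadImage`.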
                    

                            
                            

                

In [99]:
# cine or lge and view (sax/ 2ch/ 4ch etc )
mri_type = 'cine_sax'

# for lge
keywords = ["psir"]
not_keywords = ["mag"]
root_path = r'L:\basic\Personal Archive\E\emquist\parsing_MRI\final_test_seg'


get_processed_datasets_and_log(root_path, keywords, not_keywords, mri_type)

Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk-build\ITK\Modules\IO\NIFTI\src\itkNiftiImageIO.cxx:2016:
ITK ERROR: ITK only supports orthonormal direction cosines.  No orthonormal definition found!
Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk-build\ITK\Modules\IO\NIFTI\src\itkNiftiImageIO.cxx:2016:
ITK ERROR: ITK only supports orthonormal direction cosines.  No orthonormal definition found!
Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk-build\ITK\Modules\IO\NIFTI\src\itkNiftiImageIO.cxx:2016:
ITK ERROR: ITK only supports orthonormal direction cosines.  No orthonormal definition found!
Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk-build\ITK\Modules\IO\NIFTI\src\itkNiftiImageIO.cxx:2016:
ITK ERROR: ITK only supports orthonormal direction cosines.  No orthonormal definition found!
Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk-build\ITK\Modules\IO\NIFTI\src\itkNiftiImageIO.cxx:2016:
ITK 