# RSNA2024 LSDC Making Dataset
In this competition, handling the dataset images is a little tricky.

In my method, the instance numbers specified for each condition and level are used as part of the input image, and a total of 25 channels of input images are collected for each study ID.

### My other Notebooks
- [RSNA2024 LSDC Making Dataset](https://www.kaggle.com/code/itsuki9180/rsna2024-lsdc-making-dataset) <- you're reading now
- [RSNA2024 LSDC Training Baseline](https://www.kaggle.com/code/itsuki9180/rsna2024-lsdc-training-baseline)
- [RSNA2024 LSDC Submission Baseline](https://www.kaggle.com/code/itsuki9180/rsna2024-lsdc-submission-baseline)



# Import Libraries

In [1]:
import pydicom
import glob, os
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import cv2
from tqdm import tqdm
import re



In [2]:
rd = 'rsna-2024-lumbar-spine-degenerative-classification'

In [3]:
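def atoi(text):
    # Convert purely numeric chunks to int so they compare numerically
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Sort key for "natural" order: 'img2' < 'img10' (a plain string sort gives the opposite)
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]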
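Plain `sorted()` compares strings character by character, so `10.dcm` lands before `2.dcm`. `natural_keys` splits a name into alternating text and digit runs and turns the digit runs into ints, so instance numbers sort numerically. A quick check on made-up file names (the two helpers are repeated so the snippet runs on its own):

```python
import re

def atoi(text):
    # Digit runs become ints, everything else stays a string
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # '12.dcm' -> ['', 12, '.dcm']
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# Hypothetical instance-number file names, in the order glob might return them
names = ['10.dcm', '2.dcm', '1.dcm']
print(sorted(names))                    # ['1.dcm', '10.dcm', '2.dcm']
print(sorted(names, key=natural_keys))  # ['1.dcm', '2.dcm', '10.dcm']
```

Because `re.split` with a capturing group always alternates text and digit chunks, strings sit at even positions and ints at odd positions of every key, so comparing two keys never mixes types, even for full paths.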

# Reading and Taking a Look at the CSVs

In [4]:
dfc = pd.read_csv(f'{rd}/train_label_coordinates.csv')

In [5]:
df = pd.read_csv(f'{rd}/train_series_descriptions.csv')
df.head()

|   | study_id | series_id | series_description |
|---|---|---|---|
| 0 | 4003253 | 702807833 | Sagittal T2/STIR |
| 1 | 4003253 | 1054713880 | Sagittal T1 |
| 2 | 4003253 | 2448190387 | Axial T2 |
| 3 | 4646740 | 3201256954 | Axial T2 |
| 4 | 4646740 | 3486248476 | Sagittal T1 |


In [6]:
# Map every DICOM file to the list of labelled [x, y, level] points drawn on it.
# Key format: '{study_id}_{series_id}_{instance_number}'
coords = pd.read_csv(os.path.join(rd, 'train_label_coordinates.csv'))
dick = {}
for path in glob.glob(f'{rd}/train_images/*/*/*.dcm'):
    # path = <rd>/train_images/<study_id>/<series_id>/<instance_number>.dcm
    splitted = path.split('/')
    si, series_id, fn = splitted[2], splitted[3], splitted[4][:-4]
    dick[f'{si}_{series_id}_{fn}'] = []

# Attach each labelled coordinate to the slice it was annotated on
for row in coords.itertuples(index=False):
    key = f'{row.study_id}_{row.series_id}_{row.instance_number}'
    dick[key].append([row.x, row.y, row.level])

For each study_id, we can observe 3 to 6 series_ids.
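The number of series per study can be counted with a `groupby`. The sketch below runs on a toy frame built from the rows printed in this notebook (studies 4003253 and 4096820034); on the full `df`, `series_per_study.value_counts()` would show how many studies have each count.

```python
import pandas as pd

# Toy rows copied from studies 4003253 and 4096820034 as printed in this notebook
df_toy = pd.DataFrame({
    'study_id': [4003253] * 3 + [4096820034] * 6,
    'series_id': [702807833, 1054713880, 2448190387, 300517765, 2097107888,
                  2602265508, 2679683906, 3114813181, 3236751045],
})

# Number of distinct series per study (the index is sorted by study_id)
series_per_study = df_toy.groupby('study_id')['series_id'].nunique()
print(series_per_study.tolist())  # [3, 6]
print(series_per_study.min(), series_per_study.max())
```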

In [7]:
df['series_description'].value_counts()

series_description
Axial T2            2340
Sagittal T1         1980
Sagittal T2/STIR    1974
Name: count, dtype: int64

We note that most study_ids with four or more series_ids have two or more Axial T2s.
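To put a number on "most", one could count the Axial T2 series in every study with four or more series. A minimal sketch, assuming `df` has the `study_id`, `series_id` and `series_description` columns shown above; it is demonstrated on a toy frame copied from the printed rows, and `axial_counts_for_large_studies` is a hypothetical helper name:

```python
import pandas as pd

# Toy rows copied from studies 4003253 and 4096820034 as printed in this notebook
df_toy = pd.DataFrame({
    'study_id': [4003253] * 3 + [4096820034] * 6,
    'series_id': [702807833, 1054713880, 2448190387, 300517765, 2097107888,
                  2602265508, 2679683906, 3114813181, 3236751045],
    'series_description': ['Sagittal T2/STIR', 'Sagittal T1', 'Axial T2', 'Axial T2', 'Axial T2',
                           'Sagittal T2/STIR', 'Axial T2', 'Axial T2', 'Sagittal T1'],
})

def axial_counts_for_large_studies(frame, min_series=4):
    # Hypothetical helper: number of Axial T2 series in every study with >= min_series series
    n_series = frame.groupby('study_id')['series_id'].nunique()
    large = n_series[n_series >= min_series].index
    axial = frame[frame['series_description'] == 'Axial T2']
    counts = axial.groupby('study_id')['series_id'].nunique()
    # Large studies with no Axial T2 at all would be missing from counts, so fill them with 0
    return counts.reindex(large, fill_value=0)

axial_counts = axial_counts_for_large_studies(df_toy)
share = (axial_counts >= 2).mean()  # fraction of large studies with two or more Axial T2 series
print(axial_counts.tolist(), share)  # [4] 1.0
```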

In [8]:
df[df['study_id']==4096820034]

|   | study_id | series_id | series_description |
|---|---|---|---|
| 5997 | 4096820034 | 300517765 | Axial T2 |
| 5998 | 4096820034 | 2097107888 | Axial T2 |
| 5999 | 4096820034 | 2602265508 | Sagittal T2/STIR |
| 6000 | 4096820034 | 2679683906 | Axial T2 |
| 6001 | 4096820034 | 3114813181 | Axial T2 |
| 6002 | 4096820034 | 3236751045 | Sagittal T1 |


In [9]:
dfc[dfc['study_id']==4096820034]

|   | study_id | series_id | instance_number | condition | level | x | y |
|---|---|---|---|---|---|---|---|
| 46392 | 4096820034 | 300517765 | 5 | Right Subarticular Stenosis | L4/L5 | 305.5 | 372.444444 |
| 46393 | 4096820034 | 300517765 | 11 | Right Subarticular Stenosis | L5/S1 | 298.388889 | 358.222222 |
| 46394 | 4096820034 | 2097107888 | 10 | Left Subarticular Stenosis | L1/L2 | 348.379062 | 378.070442 |
| 46395 | 4096820034 | 2097107888 | 16 | Left Subarticular Stenosis | L2/L3 | 344.193491 | 369.699299 |
| 46396 | 4096820034 | 2097107888 | 22 | Left Subarticular Stenosis | L3/L4 | 342.100705 | 359.235371 |
| 46397 | 4096820034 | 2602265508 | 7 | Spinal Canal Stenosis | L1/L2 | 268.706316 | 88.525822 |
| 46398 | 4096820034 | 2602265508 | 7 | Spinal Canal Stenosis | L2/L3 | 257.288476 | 135.399061 |
| 46399 | 4096820034 | 2602265508 | 7 | Spinal Canal Stenosis | L3/L4 | 247.673452 | 180.469484 |
| 46400 | 4096820034 | 2602265508 | 7 | Spinal Canal Stenosis | L4/L5 | 244.668757 | 220.732394 |
| 46401 | 4096820034 | 2602265508 | 7 | Spinal Canal Stenosis | L5/S1 | 253.081903 | 260.394366 |


Each series description corresponds to one group of conditions. Counting one class per condition and level (L1/L2 to L5/S1), this gives:

- Axial T2 => (Left|Right) Subarticular Stenosis (10 classes)
- Sagittal T2/STIR => Spinal Canal Stenosis (5 classes)
- Sagittal T1 => (Left|Right) Neural Foraminal Narrowing (10 classes)
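This correspondence can be checked by merging the coordinate labels with the series descriptions and cross-tabulating. A sketch on toy rows copied from study 4096820034 above; run on the full `dfc` and `df`, each condition would be expected to fall into a single description column if the mapping holds across the whole dataset:

```python
import pandas as pd

# Toy subsets copied from the study 4096820034 rows printed above
series = pd.DataFrame({
    'study_id': [4096820034] * 3,
    'series_id': [300517765, 2097107888, 2602265508],
    'series_description': ['Axial T2', 'Axial T2', 'Sagittal T2/STIR'],
})
labels = pd.DataFrame({
    'study_id': [4096820034] * 5,
    'series_id': [300517765, 300517765, 2097107888, 2602265508, 2602265508],
    'condition': ['Right Subarticular Stenosis', 'Right Subarticular Stenosis',
                  'Left Subarticular Stenosis', 'Spinal Canal Stenosis', 'Spinal Canal Stenosis'],
    'level': ['L4/L5', 'L5/S1', 'L1/L2', 'L1/L2', 'L2/L3'],
})

# Attach each labelled point to the description of the series it was drawn on
merged = labels.merge(series, on=['study_id', 'series_id'], how='left')
# Rows: conditions, columns: series descriptions, cells: number of labelled points
table = pd.crosstab(merged['condition'], merged['series_description'])
print(table)
```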

I found it difficult to order the Axial T2 slices consistently, so I decided to select them randomly during training. For the other two descriptions, I save images at equal intervals.
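The two selection strategies might look like the sketch below, assuming a fixed number of slices is kept per stack (5 here is arbitrary). The helper names are hypothetical; note that the export cell in this notebook uses a step of 1, i.e. it writes every slice.

```python
import numpy as np

def equal_interval_indices(n_slices, n_keep):
    # Hypothetical helper: n_keep slice indices spread evenly over [0, n_slices - 1]
    return np.linspace(0, n_slices - 1, n_keep).round().astype(int)

def random_indices(n_slices, n_keep, rng):
    # Hypothetical helper: up to n_keep distinct random slice indices, returned in order
    return np.sort(rng.choice(n_slices, size=min(n_keep, n_slices), replace=False))

print(equal_interval_indices(17, 5))  # [ 0  4  8 12 16]
rng = np.random.default_rng(0)
print(random_indices(43, 5, rng))     # five distinct sorted indices in [0, 42]
```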

# Export png from dcm
.dcm files vary in pixel-value range and image shape. To use them in a deep learning framework, we min-max scale each slice to the 0-255 range and resize it to 256x256 px.

In [10]:
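def imread_and_imwirte(src_path, dst_path):
    """Read a DICOM slice, min-max scale it to 0-255, resize it to 256x256 and save it.

    Returns [x_ratio, y_ratio] for mapping label coordinates onto the resized image.
    """
    dicom_data = pydicom.dcmread(src_path)
    image = dicom_data.pixel_array
    shape_original = image.shape  # (rows, cols) = (height, width)
    # Min-max scale to [0, 255]; the epsilon guards against constant slices
    image = (image - image.min()) / (image.max() - image.min() + 1e-6) * 255
    img = cv2.resize(image, (256, 256))  # dsize is (width, height)
    assert img.shape == (256, 256)
    cv2.imwrite(dst_path, img)
    # x runs along the columns (width) and y along the rows (height)
    return [256 / shape_original[1], 256 / shape_original[0]]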
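Since `pixel_array.shape` is `(rows, cols)`, a label's x must be scaled by 256 / cols and its y by 256 / rows; on non-square slices, swapping the two would move the points. A numpy-only sketch (no DICOM file or OpenCV needed) of the scaling and coordinate mapping; `minmax_to_uint8` and `scale_point` are hypothetical helpers, and the rounding mimics OpenCV's conversion to 8-bit on write:

```python
import numpy as np

def minmax_to_uint8(image):
    # Same min-max scaling as imread_and_imwirte, then rounded and cast to 8-bit
    scaled = (image - image.min()) / (image.max() - image.min() + 1e-6) * 255
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

def scale_point(x, y, shape_original, size=256):
    # Hypothetical helper: map a label from the original (rows, cols) grid to size x size.
    # x is a column position, so it scales with the width shape_original[1];
    # y is a row position, so it scales with the height shape_original[0].
    rows, cols = shape_original
    return x * size / cols, y * size / rows

# On a non-square 384 x 512 (rows x cols) slice, the centre stays at the centre
print(scale_point(256.0, 192.0, (384, 512)))  # (128.0, 128.0)

dummy = np.array([[0, 800], [1600, 4000]], dtype=np.uint16)
print(minmax_to_uint8(dummy))  # [[  0  51] [102 255]]
```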

In [11]:
st_ids = df['study_id'].unique()
st_ids[:3], len(st_ids)

(array([4003253, 4646740, 7143189]), 1975)

In [12]:
desc = list(df['series_description'].unique())
desc

['Sagittal T2/STIR', 'Sagittal T1', 'Axial T2']

In [13]:
num_images_in_si = {si: {'Axial T2':0, 'Sagittal T1':0, 'Sagittal T2/STIR':0}
                    for si in st_ids}
or_names_to_new = {}
for idx, si in enumerate(tqdm(st_ids, total=len(st_ids))):
    pdf = df[df['study_id']==si]
    for ds in desc:
        ds_ = ds.replace('/', '_')
        pdf_ = pdf[pdf['series_description']==ds]
        os.makedirs(f'cvt_png/{si}/{ds_}', exist_ok=True)
        allimgs = []
        for i, row in pdf_.iterrows():
            pimgs = glob.glob(f'{rd}/train_images/{row["study_id"]}/{row["series_id"]}/*.dcm')
            pimgs = sorted(pimgs, key=natural_keys)
            allimgs.extend(pimgs)
            
        if len(allimgs)==0:
            print(si, ds, 'has no images')
            continue

        if ds == 'Axial T2':
            for j, impath in enumerate(allimgs):
                splitted = impath.split('/')
                sii = splitted[2]
                sri = splitted[3]
                fn = splitted[4][:-4]
                full = str(sii) + '_' + str(sri) + '_' + str(fn)
                dst = f'cvt_png/{si}/{ds}/{j:03d}.png'
                ratios = imread_and_imwirte(impath, dst)
                or_names_to_new[full] = str(si) + '_' + str(ds) + '_' + '0' * (3 - len(str(j))) + str(j)
                for i in range(len(dick[full])):
                    dick[full][i][0] *= ratios[0]
                    dick[full][i][1] *= ratios[1]
                
            num_images_in_si[si][ds] += len(allimgs)
                
        elif ds == 'Sagittal T2/STIR':
            for j, i in enumerate(np.arange(0, len(allimgs), 1)):
                dst = f'cvt_png/{si}/{ds_}/{j:03d}.png'
                ind2 = i
                splitted = allimgs[ind2].split('/')
                sii = splitted[2]
                sri = splitted[3]
                fn = splitted[4][:-4]
                full = str(sii) + '_' + str(sri) + '_' + str(fn)
                or_names_to_new[full] = str(si) + '_' + str(ds_) + '_' + '0' * (3 - len(str(j))) + str(j)
                ratios = imread_and_imwirte(allimgs[ind2], dst)
                
                for i in range(len(dick[full])):
                    dick[full][i][0] *= ratios[0]
                    dick[full][i][1] *= ratios[1]
                    
            num_images_in_si[si][ds] += len(allimgs)
            
#             assert len(glob.glob(f'cvt_png/{si}/{ds_}/*.png'))==15
                
        elif ds == 'Sagittal T1':

            for j, i in enumerate(np.arange(0, len(allimgs), 1)):
                dst = f'cvt_png/{si}/{ds}/{j:03d}.png'
                ind2 = i
                splitted = allimgs[ind2].split('/')
                sii = splitted[2]
                sri = splitted[3]
                fn = splitted[4][:-4]
                full = str(sii) + '_' + str(sri) + '_' + str(fn)
                or_names_to_new[full] = str(si) + '_' + str(ds) + '_' + '0' * (3 - len(str(j))) + str(j)
                ratios = imread_and_imwirte(allimgs[ind2], dst)
                
                for i in range(len(dick[full])):
                    dick[full][i][0] *= ratios[0]
                    dick[full][i][1] *= ratios[1]
                
            num_images_in_si[si][ds] += len(allimgs)


#             assert len(glob.glob(f'cvt_png/{si}/{ds}/*.png'))==15

2492114990 Sagittal T1 has no images
2780132468 Sagittal T1 has no images
3008676218 Sagittal T2/STIR has no images
100%|███████████████████████████████████████| 1975/1975 [07:45<00:00,  4.25it/s]


# Continuing with the [Training Baseline...](https://www.kaggle.com/code/itsuki9180/rsna2024-lsdc-training-baseline)

In [14]:
num_images_in_si

{4003253: {'Axial T2': 43, 'Sagittal T1': 15, 'Sagittal T2/STIR': 15},
 4646740: {'Axial T2': 54, 'Sagittal T1': 17, 'Sagittal T2/STIR': 17},
 7143189: {'Axial T2': 23, 'Sagittal T1': 17, 'Sagittal T2/STIR': 17},
 8785691: {'Axial T2': 21, 'Sagittal T1': 15, 'Sagittal T2/STIR': 15},
 10728036: {'Axial T2': 82, 'Sagittal T1': 19, 'Sagittal T2/STIR': 19},
 11340341: {'Axial T2': 45, 'Sagittal T1': 18, 'Sagittal T2/STIR': 18},
 11943292: {'Axial T2': 55, 'Sagittal T1': 15, 'Sagittal T2/STIR': 15},
 13317052: {'Axial T2': 27, 'Sagittal T1': 17, 'Sagittal T2/STIR': 17},
 22191399: {'Axial T2': 58, 'Sagittal T1': 18, 'Sagittal T2/STIR': 18},
 26342422: {'Axial T2': 70, 'Sagittal T1': 17, 'Sagittal T2/STIR': 17},
 29931867: {'Axial T2': 70, 'Sagittal T1': 23, 'Sagittal T2/STIR': 23},
 33736057: {'Axial T2': 22, 'Sagittal T1': 15, 'Sagittal T2/STIR': 15},
 38281420: {'Axial T2': 31, 'Sagittal T1': 17, 'Sagittal T2/STIR': 17},
 40745534: {'Axial T2': 57, 'Sagittal T1': 18, 'Sagittal T2/STIR': 1

In [15]:
dick

{'435973854_579171714_36': [[102.74893617021276, 171.02978723404257, 'L1/L2']],
 '435973854_579171714_45': [],
 '435973854_579171714_35': [],
 '435973854_579171714_40': [],
 '435973854_579171714_20': [],
 '435973854_579171714_39': [],
 '435973854_579171714_44': [],
 '435973854_579171714_27': [],
 '435973854_579171714_18': [],
 '435973854_579171714_25': [],
 '435973854_579171714_4': [],
 '435973854_579171714_11': [],
 '435973854_579171714_16': [],
 '435973854_579171714_6': [],
 '435973854_579171714_9': [[106.74326241134753, 162.6780141843972, 'L5/S1']],
 '435973854_579171714_22': [[111.82695035460993, 144.88510638297876, 'L3/L4']],
 '435973854_579171714_28': [],
 '435973854_579171714_42': [],
 '435973854_579171714_17': [],
 '435973854_579171714_41': [],
 '435973854_579171714_3': [],
 '435973854_579171714_5': [],
 '435973854_579171714_34': [],
 '435973854_579171714_15': [[116.18439716312058, 148.8794326241135, 'L4/L5']],
 '435973854_579171714_37': [],
 '435973854_579171714_32': [],
 '435

In [16]:
# Keep only slices that carry labels, keyed by their exported PNG name
dick_for_train = {or_names_to_new[name]: points
                  for name, points in dick.items() if len(points) > 0}
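The re-keying relies on the naming scheme applied during export: each slice goes from its DICOM identity to its position in the per-description stack, and the index keeps counting across series that share a description. A sketch of that mapping, with a hypothetical `rename_slices` helper and made-up instance numbers; like the export loop, it assumes the relative `<rd>/train_images/...` layout, since an absolute path would shift the split indices:

```python
def rename_slices(study_id, description, paths):
    # Hypothetical helper mirroring the export loop's naming:
    # '{study}_{series}_{instance}' -> '{study}_{description}_{index:03d}'
    desc_ = description.replace('/', '_')  # '/' cannot appear in a directory name
    mapping = {}
    for j, path in enumerate(paths):
        # path = <root>/train_images/<study_id>/<series_id>/<instance_number>.dcm
        _, _, study, series, fname = path.split('/')
        mapping[f'{study}_{series}_{fname[:-4]}'] = f'{study_id}_{desc_}_{j:03d}'
    return mapping

paths = ['root/train_images/4003253/702807833/1.dcm',
         'root/train_images/4003253/702807833/2.dcm']
print(rename_slices(4003253, 'Sagittal T2/STIR', paths))
# {'4003253_702807833_1': '4003253_Sagittal T2_STIR_000',
#  '4003253_702807833_2': '4003253_Sagittal T2_STIR_001'}
```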

In [17]:
dick_for_train

{'435973854_Axial T2_035': [[102.74893617021276, 171.02978723404257, 'L1/L2']],
 '435973854_Axial T2_008': [[106.74326241134753, 162.6780141843972, 'L5/S1']],
 '435973854_Axial T2_021': [[111.82695035460993, 144.88510638297876, 'L3/L4']],
 '435973854_Axial T2_014': [[116.18439716312058, 148.8794326241135, 'L4/L5']],
 '435973854_Axial T2_028': [[111.10070921985817, 154.32624113475177, 'L2/L3']],
 '435973854_Sagittal T2_STIR_008': [[153.77433004231312,
   97.48942172073343,
   'L1/L2'],
  [143.30324400564174, 127.81946403385048, 'L2/L3'],
  [137.52609308885755, 160.67700987306065, 'L3/L4']],
 '435973854_Sagittal T2_STIR_007': [[143.30324400564174,
   188.84062059238366,
   'L4/L5'],
  [152.69111424541606, 211.94922425952046, 'L5/S1']],
 '435973854_Axial T2_082': [[133.34658548233048, 170.91117478510026, 'L1/L2']],
 '435973854_Axial T2_068': [[134.8136341929322, 145.48233046800382, 'L3/L4']],
 '435973854_Axial T2_061': [[138.97027220630375, 149.39446036294177, 'L4/L5']],
 '435973854_Axial

In [18]:
# Build a box-detection table from the Sagittal T1 labels only: each labelled
# point becomes a width x height box centred on it, clipped to the 256x256 image.
width, height = 90, 90
records = []
for key, points in dick_for_train.items():
    if 'T2' in key:  # skips the 'Axial T2' and 'Sagittal T2_STIR' keys
        continue
    # '435973854_Sagittal T1_003' -> '435973854/Sagittal T1/003.png'
    splitted = key.split('_')
    fn = splitted[0] + '/' + '_'.join(splitted[1:-1]) + '/' + splitted[-1] + '.png'
    for x, y, level in points:
        records.append({
            'filename': fn,
            'x1': max(0.0, min(x - width / 2, 255.0)),
            'y1': max(0.0, min(y - height / 2, 255.0)),
            'x2': min(255.0, x + width / 2) + 1,
            'y2': min(255.0, y + height / 2) + 1,
            'labels': level,
        })

df_for_train = pd.DataFrame(records, columns=['filename', 'x1', 'y1', 'x2', 'y2', 'labels'])
df_for_train

|   | filename | x1 | y1 | x2 | y2 | labels |
|---|---|---|---|---|---|---|
| 0 | 435973854/Sagittal T1/003.png | 95.172695 | 168.432905 | 186.172695 | 256.000000 | L5/S1 |
| 1 | 435973854/Sagittal T1/010.png | 108.492734 | 47.322325 | 199.492734 | 138.322325 | L1/L2 |
| 2 | 435973854/Sagittal T1/010.png | 95.980185 | 78.772787 | 186.980185 | 169.772787 | L2/L3 |
| 3 | 435973854/Sagittal T1/010.png | 90.231176 | 108.194188 | 181.231176 | 199.194188 | L3/L4 |
| 4 | 435973854/Sagittal T1/010.png | 86.849406 | 138.968296 | 177.849406 | 229.968296 | L4/L5 |
| ... | ... | ... | ... | ... | ... | ... |
| 19719 | 3207960359/Sagittal T1/011.png | 90.937571 | 12.825199 | 181.937571 | 103.825199 | L1/L2 |
| 19720 | 3207960359/Sagittal T1/011.png | 85.997730 | 46.677639 | 176.997730 | 137.677639 | L2/L3 |
| 19721 | 3207960359/Sagittal T1/011.png | 81.639047 | 79.948922 | 172.639047 | 170.948922 | L3/L4 |
| 19722 | 3207960359/Sagittal T1/013.png | 77.135074 | 146.055619 | 168.135074 | 237.055619 | L5/S1 |
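The box rule explains two details of this table: boxes are 91 px wide rather than 90 because the far edges get `+ 1`, and boxes near the border are truncated rather than shifted, which is why row 0 has `y2 = 256`. A sketch of the same rule for a single point, with `point_to_box` as a hypothetical helper:

```python
def point_to_box(x, y, width=90, height=90, size=256):
    # Hypothetical helper reproducing the box rule used for detect.csv:
    # centre a width x height box on (x, y), clip the near edges to [0, size - 1]
    # and the far edges to size - 1, then add 1 to the far edges.
    x1 = max(0.0, min(x - width / 2, size - 1.0))
    y1 = max(0.0, min(y - height / 2, size - 1.0))
    x2 = min(size - 1.0, x + width / 2) + 1
    y2 = min(size - 1.0, y + height / 2) + 1
    return x1, y1, x2, y2

print(point_to_box(100.0, 240.0))  # (55.0, 195.0, 146.0, 256.0)
print(point_to_box(10.0, 10.0))    # (0.0, 0.0, 56.0, 56.0)
```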


In [19]:
df_for_train.to_csv('detect.csv', index=False)