Format dicom #52

tcpan · 2024-05-04T04:30:55Z

Updated read and write. Better standard compliance for write. Reading now allows random data access.

…, and chunked

…for the gain. ENH: combined loop to reduce file opens.

tompollard · 2024-05-04T22:33:06Z

Thanks @tcpan. I think we should go ahead and merge, but we will have to do some restructuring afterwards because our current framework expects each module in the formats folder to represent a separate format.

@bemoody @briangow my suggestion would be either:

allowing formats to include utilities in a subfolder within formats. e.g.

waveform_benchmark/formats/dicom.py -> leave in current location

waveform_benchmark/formats/dcm_rand_access.py -> waveform_benchmark/formats/dicom/rand_access.py

waveform_benchmark/formats/dcm_waveform_reader.py -> waveform_benchmark/formats/dicom/waveform_reader.py

waveform_benchmark/formats/dcm_waveform_writer.py ->
waveform_benchmark/formats/dicom/waveform_writer.py
Store rand_access.py, waveform_reader.py, waveform_writer.py in a separate repository, which is then added as a dependency here. This might make sense if we consider the modules to be implementing the format, rather than just applying it.

briangow

@tcpan , thanks so much for this - preparing the DICOM format for benchmarking clearly took a lot of work!

I've left a couple of comments in the code. One other thought is that it might be cleaner to move the dcm_* files under a separate folder (perhaps formats/dcm_utils/dcm_*) such that the only files at the formats/ level are the ones that get called for benchmarking. This is completely up to you though, no worries if you prefer to leave them where they are. @tompollard beat me to this suggestion above.

briangow · 2024-05-04T22:28:41Z

waveform_benchmark/formats/dicom.py

+        # import json
+
+        # ======== organize the waveform chunks  (tested)
+        print("INPUT pleth", waveforms['Pleth'])


Perhaps this line should get commented out.

briangow · 2024-05-04T22:31:22Z

waveform_benchmark/formats/dicom.py

+# MultichannelRespiratoryWaveform: RESP, >1, , unconstrained, DCID 3005 “Respiration Waveform” , SS/SL
+# 
+
+CHANNEL_TO_DICOM_IOD = {


Do we need to add the other channel types from the proposed suite of waveforms for benchmarking? See Waveform suite characterization here: #11 (comment) .

yes, this needs to be updated.

@tcpan , I've started an issue to track this: #53

tcpan · 2024-05-05T12:40:23Z

Hi, Brian, sorry for the delay. Was working on house stuff all day. Will do re subfolder. Is formats the right place to put that? Get Outlook for iOS<https://aka.ms/o0ukef>

________________________________ From: Brian Gow ***@***.***> Sent: Saturday, May 4, 2024 6:40:15 PM To: chorus-ai/chorus_waveform ***@***.***> Cc: Tony Pan ***@***.***>; Mention ***@***.***> Subject: Re: [chorus-ai/chorus_waveform] Format dicom (PR #52) @briangow commented on this pull request. @tcpan<https://github.com/tcpan> , thanks so much for this - preparing the DICOM format for benchmarking clearly took a lot of work! I've left a couple of comments in the code. One other thought is that it might be cleaner to move the dcm_* files under a separate folder (perhaps formats/dcm_utils/dcm_*) such that the only files at the formats/ level are the ones that get called for benchmarking. This is completely up to you though, no worries if you prefer to leave them where they are.

________________________________ In waveform_benchmark/formats/dicom.py<#52 (comment)>:

- # dicom.save_as("/mnt/c/Users/tcp19/Downloads/Compressed/test_waveform.dcm", write_like_original=False)

+ def write_waveforms(self, path, waveforms): + fs = FileSet() + + # one series per modality + # as many multiplexed groups as allowed by modality + # one instance per chunk + + # one dicomdir per study? or series? + studyInstanceUID = uid.generate_uid() + seriesInstanceUID = uid.generate_uid() + prefix, ext = os.path.splitext(path) + # import json + + # ======== organize the waveform chunks (tested) + print("INPUT pleth", waveforms['Pleth']) Perhaps this line should get commented out.

________________________________ In waveform_benchmark/formats/dicom.py<#52 (comment)>:

+# name: modality name, maximum sequences, grouping, SOPUID, max samples, sampling frequency, source, datatype,

+# TwelveLeadECGWaveform: ECG, 1-5, {1: I,II,III; 2: aVR, aVL, aVF; 3: V1, V2, V3; 4: V4, V5, V6; 5: II}, 16384, 200-1000, DCID 3001 “ECG Lead”, SS +# GeneralECGWaveform: ECG, 1-4, 1-24 per serquence, ?, 200-1000, DCID 3001 “ECG Lead”, SS +# General32BitECGWaveform: ECG, 1-4, 1-24 per serquence, ?, by confirmance statement, DCID 3001 “ECG Lead”, SL +# AmbulatoryECGWaveform: ECG, 1, 1-12, maxsize of wvaeform data attribute, 50-1000, DCID 3001 “ECG Lead”, SB/SS +# HemodynamicWaveform: HD, 1-4, 1-8, maxsize of wvaeform data attribute, <400, , SS +# CardiacElectrophysiologyWaveform: EPS, 1-4, , <=20000, DCID 3011 “Electrophysiology Anatomic Location” , SS +# ArterialPulseWaveform: HD, 1, 1, ? , <600, DCID 3004 “Arterial Pulse Waveform” , SB/SS +# RespiratoryWaveform: RESP, 1, 1, ? , <100, DCID 3005 “Respiration Waveform”, SB/SS +# ScalpEEGWaveform: EEG, 1 (interruption as separate instances), 1-64, , unconstrained, DCID 3030 “EEG Lead”, SS/SL +# ElectromyogramWaveform: EMG, unconstrained, 1-64, , unconstrained, DCID 3031 “Lead Location Near or in Muscle” or DCID 3032 “Lead Location Near Peripheral Nerve”, SS/SL +# SleepEEGWaveform: EEG, unconstrained, 1-64, , unconstrained, DCID 3030 “EEG Lead” , SS/SL +# MultichannelRespiratoryWaveform: RESP, >1, , unconstrained, DCID 3005 “Respiration Waveform” , SS/SL +# + +CHANNEL_TO_DICOM_IOD = { Do we need to add the other channel types from the proposed suite of waveforms for benchmarking? See Waveform suite characterization here: #11 (comment)<#11 (comment)> . — Reply to this email directly, view it on GitHub<#52 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAHFAKKH54S6CYQJY6LHO4LZAVPU7AVCNFSM6AAAAABHGQEHJCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMZZGU2TMMZTGA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

briangow · 2024-05-05T22:45:08Z

Hi, Brian, sorry for the delay. Was working on house stuff all day. Will do re subfolder. Is formats the right place to put that?

@tcpan , it's up to you. If you feel that these files are useful beyond this repository perhaps they deserve a dedicated place as @tompollard suggested in this 2nd thought on the topic. It would probably make sense to create a general utils module in this repository at some point. If you prefer that option feel free to set it up or let me know and I'd be happy to do so. As far as I'm concerned, having them in a subfolder under formats/ is also fine.

…sing of specified tags only. FIX: input chunks of the same channels were recorded as separate channels in the channel definition. ENH: optimization with deferred dataset reading, and avoid using pydicom's reading functions where possible. ENH: moved support files to format/dcm_utils

tcpan · 2024-05-06T17:23:27Z

I have checked in some updates, with some fixes and some new tests for different chunking strategy (likely not to make a huge difference). Note that to get the channel metadata, each file is opened and scanned. A more efficient way to do this would be to add the channel metadata as a private tag on the DICOMDIR file. This would still be standards compliant, but I have not implemented this.

briangow · 2024-05-06T20:20:46Z

Thanks @tcpan !

tompollard · 2024-05-07T13:20:18Z

Hi, Brian, sorry for the delay. Was working on house stuff all day. Will do re subfolder. Is formats the right place to put that?

@tcpan , it's up to you. If you feel that these files are useful beyond this repository perhaps they deserve a dedicated place as @tompollard suggested in this 2nd thought on the topic. It would probably make sense to create a general utils module in this repository at some point. If you prefer that option feel free to set it up or let me know and I'd be happy to do so. As far as I'm concerned, having them in a subfolder under formats/ is also fine.

I think the cleanest option is to store rand_access.py, waveform_reader.py, waveform_writer.py in a separate "DICOM-waveform" repository, because (as far as I understand it) the code implements the functionality for writing waveforms to DICOM. If we choose DICOM as the format, we will need to have a package that enables these functions (and actually, I assume there will be demand by the community for this functionality regardless). I'd be happy to help set up a pip installable version of this repo if you like this option.

If we want to go for the quick and easy option, my preference would be to move them to a submodule of the formats folder. I don't particularly like the idea of a generic utils submodule, because it has potential to become a dumping ground for random code.

tcpan added 5 commits May 2, 2024 17:00

ENH: more proper dicom compliance: channels are grouped appropriately…

09d97d1

…, and chunked

Merge branch 'main' into format-dicom

bacf261

FIX: value mismatch. ENH: reduce running time for metadata access.

619f7d1

ENH: clean up commented out code.

e59b6b9

FIX: lower bit rate version was using 8 bit which was not sufficient …

0b2ee6b

…for the gain. ENH: combined loop to reduce file opens.

briangow reviewed May 4, 2024

View reviewed changes

tcpan added 3 commits May 6, 2024 10:39

Merge branch 'main' into format-dicom

b262064

ENH: removed unused method

86131d9

tcpan mentioned this pull request May 6, 2024

Tidying DICOM format for the purpose of testing #42 #50

Closed

briangow merged commit 51f49b1 into main May 6, 2024
1 of 2 checks passed

tcpan mentioned this pull request May 7, 2024

updated DICOM to use private metadata tags to speed up file load. #54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Format dicom #52

Format dicom #52

tcpan commented May 4, 2024

tompollard commented May 4, 2024

briangow left a comment •

edited

Loading

briangow May 4, 2024

tcpan May 6, 2024

briangow May 4, 2024

tcpan May 6, 2024

briangow May 6, 2024

tcpan commented May 5, 2024 via email

briangow commented May 5, 2024

tcpan commented May 6, 2024

briangow commented May 6, 2024

tompollard commented May 7, 2024

Format dicom #52

Format dicom #52

Conversation

tcpan commented May 4, 2024

tompollard commented May 4, 2024

briangow left a comment • edited Loading

Choose a reason for hiding this comment

briangow May 4, 2024

Choose a reason for hiding this comment

tcpan May 6, 2024

Choose a reason for hiding this comment

briangow May 4, 2024

Choose a reason for hiding this comment

tcpan May 6, 2024

Choose a reason for hiding this comment

briangow May 6, 2024

Choose a reason for hiding this comment

tcpan commented May 5, 2024 via email

briangow commented May 5, 2024

tcpan commented May 6, 2024

briangow commented May 6, 2024

tompollard commented May 7, 2024

briangow left a comment •

edited

Loading