David Gobbi edited this page Oct 4, 2017 · 12 revisions

Scan a directory tree for DICOM files, and print the metadata in a format usable for a spreadsheet or database.

Usage

dicomtocsv [options] <directory>

Options

-k tag=value      Provide a key to be queried and matched.
-q <query.txt>    Provide a file to describe the find query.
-u <uids.txt>     Provide a file that contains a list of UIDs.
-o <data.csv>     Provide a file for the query results.
--first-nonzero   Search series for first nonzero value of each key.
--all-unique      Report all unique values within each series.
--min-value       Report the minimum value within each series.
--max-value       Report the maximum value within each series.
--directory-only  Use directory scan only, do not re-scan files.
--ignore-dicomdir Ignore the DICOMDIR file even if it is present.
--charset <cs>    Charset to use if SpecificCharacterSet is missing.
--images-only     Only list files that have PixelData or equivalent.
--noheader        Do not print the csv header.
--study           Print one record for each study.
--series          Print one record for each series (default).
--image           Print one record for each image.
--silent          Do not report any progress information.
--help            Print a brief help message.
--version         Print the software version.

Description

This program will create a .csv file in accordance with the supplied query information. If no query is specified, then a default query will be used. The .csv file will list the attributes of each DICOM series that is found within a directory tree.

Details

For each attribute to be extracted, the tag can be given with "-k", for example "-k PatientName". The attributes can also be specified with "-q query.txt" where each line of "query.txt" gives one attribute. For detailed information on how to specify a query, see Command Line Tools.

The output file is formatted as follows, with one header line followed by the comma-separated, quote-enclosed values.

PatientName,PatientBirthDate,PatientSex,StudyDate,StudyTime,SeriesNumber,SeriesDescription
"TEST^PATIENT","19360703","M","20140603","105200","2","Cerebral  4.0  H31s"

The following command, which searches the current directory for DICOM files, produces this output:

dicomtocsv . -k PatientName -k PatientBirthDate -k PatientSex -k StudyDate -k StudyTime \
    -k SeriesNumber -k SeriesDescription

The order of the fields in the .csv file is the same as given on the command line. If the command line repeats an attribute, then that field will still only be listed once, with its first appearance on the command line dictating the order in the .csv file.
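The ordering rule can be illustrated with a short Python sketch. This is not code from dicomtocsv; the function name `field_order` is hypothetical, and it only mimics the documented behavior of keeping each attribute at the position of its first appearance.

```python
def field_order(keys):
    """Return keys in command-line order, keeping only the first appearance of each."""
    # A dict preserves insertion order (Python 3.7+), so repeated keys are
    # dropped while the position of the first occurrence is retained.
    return list(dict.fromkeys(keys))


# Hypothetical command line: -k PatientName -k StudyDate -k PatientName -k Modality
columns = field_order(["PatientName", "StudyDate", "PatientName", "Modality"])
print(columns)
```

Here the repeated "PatientName" is listed once, in the first column.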

All output is given in UTF-8, with conversion from the original character set to UTF-8 if necessary. If any value contains a quotation mark, the quotation mark will be doubled as per RFC 4180. The file will use <CR><LF> to end each line.
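Because the output follows RFC 4180 conventions, it can be read with a standard CSV parser. The sketch below uses Python's `csv` module; the helper name `read_records` and the second sample record (with the embedded quotation marks) are invented for illustration and are not real dicomtocsv output.

```python
import csv
import io


def read_records(stream):
    """Parse dicomtocsv-style text into a list of dicts keyed by the header fields."""
    # csv.DictReader un-doubles RFC 4180 quotation marks by default and
    # accepts <CR><LF> line endings when the stream does no newline translation.
    return list(csv.DictReader(stream))


# Sample text in the documented format (the second record is made up).
sample = (
    'PatientName,SeriesNumber,SeriesDescription\r\n'
    '"TEST^PATIENT","2","Cerebral  4.0  H31s"\r\n'
    '"TEST^PATIENT","3","Say ""hello"""\r\n'
)
records = read_records(io.StringIO(sample, newline=""))
for record in records:
    print(record["SeriesNumber"], record["SeriesDescription"])
```

When reading a real output file, open it with `open(path, newline="", encoding="utf-8")` so that the `csv` module sees the original line endings and the text is decoded as UTF-8.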

By default, the .csv file will provide one record per series, but with the "--image" option it will provide a record for every file. Similarly, with the "--study" option it will provide only one record per study. The records are sorted first by patient, then by date, and finally by instance number.

The use of "-o" to give the name of the output file is optional. If no output file is given, then the output will be written to stdout. However, one advantage of using "-o" to specify the output file is that this allows dicomtocsv to print progress information to the terminal, which is useful during a long scan. This progress information can be turned off with the "--silent" option.
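When dicomtocsv is driven from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the options documented on this page; the function `build_command` and its parameter names are hypothetical, and the function only builds the list without running anything.

```python
def build_command(directory, keys, output=None, level=None, silent=False):
    """Assemble an argument list for dicomtocsv from options described on this page."""
    args = ["dicomtocsv"]
    for key in keys:
        # One "-k" per attribute, in the order the fields should appear.
        args += ["-k", key]
    if output is not None:
        # With "-o", progress information can go to the terminal.
        args += ["-o", output]
    if level is not None:
        if level not in ("study", "series", "image"):
            raise ValueError("level must be 'study', 'series', or 'image'")
        args.append("--" + level)
    if silent:
        args.append("--silent")
    # Follows the usage line: dicomtocsv [options] <directory>
    args.append(directory)
    return args


command = build_command(".", ["PatientName", "StudyDate"], output="data.csv", level="image")
print(" ".join(command))
```

Assuming dicomtocsv is installed and on the PATH, the list could then be passed to `subprocess.run(command, check=True)`.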

The default series-level query is as follows:

# patient-level information
PatientName
PatientID
PatientBirthDate
PatientSex
# study-level information
StudyDate
StudyTime
StudyID
AccessionNumber
StudyDescription
StudyInstanceUID
# series-level information
Modality
SeriesNumber
SeriesDescription
SeriesInstanceUID
Rows
Columns
NumberOfReferences

The "NumberOfReferences" attribute will add a field for the number of files in the series.

When using the "--image" option, the above query will be expanded to include the following fields:

# image-level information
InstanceNumber
SOPClassUID
SOPInstanceUID
ReferencedFileID

The "ReferencedFileID" attribute will add a field for the file name.

If "--directory-only" is given, then the results will be limited to what is present in the DICOMDIR file for the directory (or, in the absence of a DICOMDIR file, to the information that a DICOMDIR typically contains). This option is useful when scanning a CD, since it allows a summary to be provided by scanning just a single file on the CD. The information in the default queries listed above will be provided by most DICOMDIR files.

The "--first-nonzero" option can be used when writing one record per series. For each attribute, it causes dicomtocsv to scan the entire series and print the first nonzero value of that attribute. It has no effect on non-numeric values. If dicomtocsv is writing one record per series and this option is not used, then the record will show the values from the first file in the series.
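The selection rule for "--first-nonzero" can be sketched in Python as follows. This is an illustration of the behavior described above, not the tool's implementation: the function name `first_nonzero` is hypothetical, values are modeled as strings, and the fallback to the first file's value when every value is zero is an assumption.

```python
def first_nonzero(values):
    """Pick the value to report for one attribute across the files of a series."""
    for value in values:
        try:
            if float(value) != 0:
                # First numeric value that is not zero.
                return value
        except (TypeError, ValueError):
            # Non-numeric value: the option has no effect, so stop scanning
            # and fall back to the value from the first file.
            break
    # Assumption: if nothing nonzero is found, report the first file's value.
    return values[0] if values else None


print(first_nonzero(["0", "0", "12.5", "3"]))
print(first_nonzero(["HEAD", "NECK"]))
```

The first call skips the two leading zeros, while the second returns the value from the first file because the values are not numeric.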

See also

  • dicomdump: dump all the attributes from a DICOM file
  • dicomfind: find DICOM files that match a query