## Adding metadata when converting to DICOM


When converting data to DICOM (as described in the conversion tools notebook), DICOM tags can be inserted during conversion so that experimental metadata is included in the output DICOM dataset. This eliminates the need for a separate post-processing step to add metadata, which reduces the total time needed to create a complete DICOM dataset.

See also the relevant [Bio-Formats documentation](https://bio-formats.readthedocs.io/en/stable/users/comlinetools/conversion.html#cmdoption-bfconvert-extra-metadata).

### Recap of required packages

In [None]:
# Required for downloading data from IDC
!pip install idc-index

# Install bfconvert via bftools
!wget https://downloads.openmicroscopy.org/bio-formats/7.3.1/artifacts/bftools.zip
!unzip bftools.zip

### Download SVS input data

In [None]:
# Download sample data from OpenSlide
!wget https://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/CMU-1-Small-Region.svs

### Write supplemental metadata file

DICOM tags to be written are provided as a JSON file.

The structure of the JSON file is based on that used by [dcmqi](https://github.com/QIICR/dcmqi/tree/master/doc/examples), but with several additions.

Additional technical discussion of how to represent DICOM tags in JSON is available [here](https://github.com/ome/bioformats/pull/4016).

#### Basic tag structure

Each DICOM tag is a single JSON object, e.g.:

```
{
   "BodyPartExamined": {
     "Value": "BRAIN",
     "VR": "CS",
     "Tag": "(0018,0015)"
   }
}
```

The object's name (`BodyPartExamined`) should be the name of the tag in the DICOM dictionary, with spaces removed.
There is only one required key/value pair:

- `Value` (here, `BRAIN`), which is the tag's value

There are also 3 optional key/value pairs:

- `Tag` (here, `(0018,0015)`), which is the tag corresponding to the object name in the DICOM dictionary. If not defined, this will be looked up automatically.
- `VR` (here, `CS`), which is the value representation to use when writing the tag. If not defined, the default VR will be looked up in the DICOM dictionary.
- `ResolutionStrategy`, which defines what to do with this tag if it was defined multiple times. Valid values are `IGNORE`, `APPEND`, and `REPLACE`. The default is `APPEND` if the `VR` is `SQ` (a sequence), and `REPLACE` for all other VRs.
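
The key/value rules above can be sketched as a small Python helper that builds one tag entry and leaves out any optional key that was not supplied. The `make_tag` function is hypothetical and only illustrates the JSON structure; it is not part of Bio-Formats.

```python
import json

# Hypothetical helper (not a Bio-Formats API): builds one tag entry.
VALID_STRATEGIES = ("IGNORE", "APPEND", "REPLACE")

def make_tag(value, vr=None, tag=None, resolution_strategy=None):
    """Return a dict for one DICOM tag; only "Value" is always present."""
    entry = {"Value": value}
    if vr is not None:
        entry["VR"] = vr
    if tag is not None:
        entry["Tag"] = tag
    if resolution_strategy is not None:
        if resolution_strategy not in VALID_STRATEGIES:
            raise ValueError(f"unknown ResolutionStrategy: {resolution_strategy}")
        entry["ResolutionStrategy"] = resolution_strategy
    return entry

# The object name is the DICOM dictionary name with spaces removed.
metadata = {
    "BodyPartExamined": make_tag("BRAIN", vr="CS", tag="(0018,0015)"),
}
print(json.dumps(metadata, indent=2))
```

Omitting `vr` and `tag` produces an entry containing only `Value`, which relies on the automatic dictionary lookup described above.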


#### Writing values for different VRs

The `Value` is interpreted according to the VR that was either defined or looked up in the dictionary.

For VRs representing a string of characters (e.g. `SH`), the `Value` is used directly. It is not necessary to ensure that `Value` contains an even number of characters. If needed, Bio-Formats' DICOM writer will pad the string to the correct width.

For VRs representing a numeric type (e.g. `US`), the `Value` is parsed and then saved to DICOM as the correct type (e.g. uint16 for `US`). When a value multiplicity greater than 1 (i.e. an array of values) is needed, the values should be separated by commas:


```
{
    "ReferencedFrameNumber": {
        "Value": "1,3,5,9",
        "VR": "IS",
        "Tag": "(0008,1160)"
    }
}
```
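
When the values already exist as a Python list, the comma-separated `Value` string can be generated rather than typed by hand. This is a minimal sketch; the `multi_value` helper is a hypothetical name used only for illustration.

```python
# Hypothetical helper: join several values into one comma-separated string.
def multi_value(values):
    """Format a sequence of values as a multi-valued "Value" string."""
    return ",".join(str(v) for v in values)

entry = {
    "ReferencedFrameNumber": {
        "Value": multi_value([1, 3, 5, 9]),
        "VR": "IS",
        "Tag": "(0008,1160)",
    }
}
print(entry["ReferencedFrameNumber"]["Value"])  # prints 1,3,5,9
```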

#### Handling duplicate or conflicting tags

In the first example above, tag `(0018,0015)` (`BodyPartExamined`) would always be set to `BRAIN`. In this example, though:

```
{
   "BodyPartExamined": {
     "Value": "BRAIN",
     "VR": "CS",
     "Tag": "(0018,0015)",
     "ResolutionStrategy": "IGNORE"
   }
}
```

tag `(0018,0015)` (`BodyPartExamined`) would only be set to `BRAIN` if the tag wasn't previously defined.
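
The three strategies and their defaults can be pictured with a simplified Python model. This is not Bio-Formats' implementation: it treats a tag as a plain list of values, and it assumes that `APPEND` adds the new value after any existing ones. Both function names are hypothetical.

```python
# Simplified illustration only; not how Bio-Formats stores or merges tags.
def default_strategy(vr):
    """Default described above: APPEND for sequences, REPLACE otherwise."""
    return "APPEND" if vr == "SQ" else "REPLACE"

def resolve(existing, new_value, strategy):
    """Return the values a tag would hold after applying a strategy.

    existing: list of values already defined for the tag (empty if none).
    """
    if strategy == "IGNORE":
        # Keep what is already there; write only if nothing was defined.
        return existing if existing else [new_value]
    if strategy == "REPLACE":
        return [new_value]
    if strategy == "APPEND":
        # Assumption: the new value is added after the existing ones.
        return existing + [new_value]
    raise ValueError(f"unknown ResolutionStrategy: {strategy}")

print(resolve(["HEAD"], "BRAIN", "IGNORE"))   # existing value is kept
print(resolve(["HEAD"], "BRAIN", "REPLACE"))  # existing value is overwritten
print(resolve([], "BRAIN", "IGNORE"))         # nothing defined, so BRAIN is written
```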

`ResolutionStrategy` is particularly useful when trying to alter metadata that Bio-Formats' DICOM writer already writes. For example, Bio-Formats will automatically write an `OpticalPathSequence` with the appropriate number of channels, but may have missing wavelengths or other metadata. To fully replace the default `OpticalPathSequence`, the entire sequence can be defined with a `ResolutionStrategy` of `REPLACE`:

```
{
  "OpticalPathSequence": {
    "VR": "SQ",
    "Tag": "(0048,0105)",
    "Sequence": {
      "IlluminationTypeCodeSequence": {
        "VR": "SQ",
        "Tag": "(0022,0016)",
        "Sequence": {
          "CodeValue": {
            "VR": "SH",
            "Tag": "(0008,0100)",
            "Value": "111743"
          },
          "CodingSchemeDesignator": {
            "VR": "SH",
            "Tag": "(0008,0102)",
            "Value": "DCM"
          },
          "CodeMeaning": {
            "VR": "LO",
            "Tag": "(0008,0104)",
            "Value": "Epifluorescence illumination"
          }
        }
      },
      "IlluminationWaveLength": {
        "VR": "FL",
        "Tag": "(0022,0055)",
        "Value": "488.0"
      },
      "OpticalPathIdentifier": {
        "VR": "SH",
        "Tag": "(0048,0106)",
        "Value": "1"
      },
      "OpticalPathDescription": {
        "VR": "ST",
        "Tag": "(0048,0107)",
        "Value": "replacement channel"
      }
    },
    "ResolutionStrategy": "REPLACE"
  }
}
```
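
Nested sequences can be hard to review by eye, so it may help to list every tag in a metadata dictionary before writing it to disk. The sketch below assumes the structure shown above, where sequence entries hold their children under a `Sequence` key; `walk_tags` is a hypothetical helper and the sample is a shortened version of the example.

```python
# Hypothetical helper: list every tag in a nested metadata dictionary.
def walk_tags(entries, depth=0):
    """Yield (depth, name, VR, Value) for each tag, descending into sequences."""
    for name, entry in entries.items():
        yield depth, name, entry.get("VR"), entry.get("Value")
        if "Sequence" in entry:
            yield from walk_tags(entry["Sequence"], depth + 1)

# Shortened version of the OpticalPathSequence example.
sample = {
    "OpticalPathSequence": {
        "VR": "SQ",
        "Tag": "(0048,0105)",
        "Sequence": {
            "OpticalPathIdentifier": {"VR": "SH", "Tag": "(0048,0106)", "Value": "1"},
            "OpticalPathDescription": {
                "VR": "ST",
                "Tag": "(0048,0107)",
                "Value": "replacement channel",
            },
        },
        "ResolutionStrategy": "REPLACE",
    }
}

for depth, name, vr, value in walk_tags(sample):
    print("  " * depth + f"{name} [{vr}] = {value}")
```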

### Convert SVS to DICOM with supplemental metadata

In [None]:
# Save one of the JSON examples to a file.
# Edit this as needed, or paste a different example from above.
# The variable is not called `json`, to avoid shadowing Python's json module.
metadata_json = '''{
   "BodyPartExamined": {
     "Value": "BRAIN",
     "VR": "CS",
     "Tag": "(0018,0015)"
   }
}'''
with open('supplemental-metadata.json', 'w') as f:
    f.write(metadata_json)

In [None]:
!cat supplemental-metadata.json
!./bftools/bfconvert -noflat -precompressed CMU-1-Small-Region.svs CMU-1.dcm -extra-metadata supplemental-metadata.json
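
To check that the supplemental tag reached the output, the converted files can be read back. The sketch below assumes that `pydicom` is installed (e.g. `!pip install pydicom`) and that the output file names start with `CMU-1` and end with `.dcm`; adjust the pattern if the conversion names its output differently. `read_tag_from_outputs` is a hypothetical helper.

```python
import glob

# Hypothetical helper: read one tag from every DICOM file matching a pattern.
def read_tag_from_outputs(pattern, keyword):
    """Return {path: value} for the given DICOM keyword in each matching file."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        return {}
    import pydicom  # third-party package, assumed to be installed

    results = {}
    for path in paths:
        # Skip pixel data; only the header is needed to check a tag.
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        results[path] = ds.get(keyword)
    return results

# Assumed output naming: files beginning with "CMU-1" and ending in ".dcm".
for path, value in read_tag_from_outputs("CMU-1*.dcm", "BodyPartExamined").items():
    print(path, value)
```

An empty result means that no files matched the pattern, and a value of `None` means the file does not contain the tag.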