# Read Varian DVH Text Files

## Imports

In [14]:
from typing import Callable, List

from pathlib import Path
import re
from functools import partial

import pandas as pd
import text_reader as tp
from sections import Rule, RuleSet, SectionBreak, Section, ProcessingMethods

## Demo File Path

In [8]:
demo_dvh_folder = Path.cwd() / r'./References/Text Files/DVH files'
demo_dvh_1 = demo_dvh_folder / 'Breast CHWR Relative Dose Relative Volume 1 cGy Step Size.dvh'
#demo_dvh_1.exists()
#demo_dvh_folder.exists()


## DVH File Sections
<style type="text/css">
.tg  {text-align:center;font-family:Arial, sans-serif;font-size:14px;font-weight:bold;}
.th{text-align:center;background-color:#38fff8;font-weight:bold;}
</style>
<table>
<thead><tr><th class="th">Section</th><th class="th">Description</th></tr></thead>
<tbody>
    <tr>
        <td>Patient Information</td>
        <td>
            First Section in file.<br>
            Occurs only Once.
            </td>
        </tr>
    <tr>
        <td>DVH Information</td>
        <td>
            Information about the DVH source and format.<br>
            Occurs only Once.
            </td>
        </tr>
    <tr>
        <td>Plan Information</td>
        <td>
            Information about the plan generating the DVH.<br>
            Repeated for each plan included in the DVH file.
            </td>
        </tr>
    <tr>
        <td>Structure Information</td>
        <td>
            Information about each structure in the plan DVH.<br>
            Repeated for each Structure included in the DVH file.
            </td>
        </tr>
    <tr>
        <td>Structure Dose Summary</td>
        <td>
            A Summary of the DVH Dose information for the structure.<br>
            Repeated for each Structure included in the DVH file.
            </td>
        </tr>
    <tr>
        <td>DVH Data</td>
        <td>
            Columns containing DVH data for the structure.<br>
            Occurs immediately after the related Structure Information 
            for each Structure included in the DVH file.      
            </td>
        </tr>
</tbody>
</table>

## Initial Header
The initial header contains two parts:
- Patient Information
- DVH Information
  
The information is formatted as two fixed-width columns
- The first column is left-justified with trailing spaces and a final _':'_. 
- The second column is also left-justified but without trailing spaces.
- The second column begins with a space.
- The _'Description'_ value extends over multiple lines with spaces filling the entire first column.
- A blank line occurs at the end of the initial header

<style type="text/css">
    .tg {text-align:left;background-color:#e6ffff;font-family:Arial, sans-serif;font-size:14px;font-weight:bold;border-style: solid dotted solid dotted}
    .th {text-align:center;background-color:#38fff8;font-weight:bold;}
    .tc {text-align:left;background-color:#e0ebeb;font-family: Terminal, monospace;font-size:11px;font-weight:normal;}
</style>
<table>
    <thead><tr>
        <th class="th">Export File Entry</th>
        <th class="th">Value</th>
        <th class="th">Example</th>
    </tr></thead>
<tbody>
    <tr><td colspan="3" class="tg">Patient Information</td></tr>
    <tr>
        <td>Patient Name:</td>
        <td>Name of the patient</td>
        <td class="tc">AXIR, CHWR</td>
        </tr>
    <tr>
        <td>Patient ID:</td>
        <td>Identification of the patient</td>
        <td class="tc">TEST CHWR</td>
        </tr>
    <tr><td colspan="3" class="tg">DVH Information</td></tr>
        <tr>
            <td>Comment:</td>
            <td>User defined comment</td>
            <td class="tc">DVHs for a plan sum</td>
            </tr>
        <tr>
            <td>Date:</td>
            <td>Date and time as defined in the Windows operating system</td>
            <td class="tc">March 17, 2023 11:40:38 AM</td>
            </tr>
        <tr>
            <td>Exported by:</td>
            <td>User name of the user who exported the DVH</td>
            <td class="tc">GS MP</td>
            </tr>
        <tr>
            <td>Type:</td>
            <td>Type of the DVH. One of:<br>
                Cumulative<br>
                Differential<br>
                Natural<br>
                </td>
            <td class="tc">Cumulative Dose Volume Histogram</td>
            </tr>
        <tr>
            <td>Description:</td>
            <td>Description of the DVH type exported</td>
            <td class="tc">
                The cumulative DVH displays the percentage (relative)<br>
                or volume (absolute) of structures that receive a dose<br>
                equal to or greater than a given dose.<br>
                </td>
            </tr>
</tbody>
</table>

__Example Initial Header__ (Numbers at the top are for reference only)

> ```
>00000000001111111111222222222233333333334444444444555555555566666666667777777777
>01234567890123456789012345678901234567890123456789012345678901234567890123456789
>
>Patient Name         : AXIR, CHWR
>Patient ID           : TEST CHWR
>Comment              : DVHs for a plan sum
>Date                 : March 17, 2023 11:40:38 AM
>Exported by          : GS MP
>Type                 : Cumulative Dose Volume Histogram
>Description          : The cumulative DVH displays the percentage (relative)
>                       or volume (absolute) of structures that receive a dose
>                       equal to or greater than a given dose.
>
> ```

- Combine *Patient Information* and *DVH Information* into one section.
- Drop the *Description* since it doesn't contain useful information and is 
  more complicated to deal with because it spans multiple lines. 

**Section Parameters**
<table>
<tr><td>name</td><td>Information</td><td>Section name for reference</td></tr>
<tr><td>start_section</td><td>None</td><td>Beginning of file</td></tr>
<tr><td>start_search</td><td>False</td><td>Start section immediately</td></tr>
<tr><td>end_section</td><td>('Description', 'START', 'Before')</td>
  <td>End the section just before the line that begins with 'Description'</td>
  </tr>
<tr><td>processor</td>
  <td>[partial(str.split, sep=':', maxsplit=1),<br>
       tp.trim_items,<br>
       tp.drop_blanks]</td>
  <td>Split each line at the first ':',<br>
      remove leading and trailing whitespace from each part of the split,<br>
      then drop any empty rows.</td>
  </tr>
  <tr><td>assemble</td><td>tp.to_dict</td>
  <td>Convert the split rows into dictionary items, with the first item as 
      the key and the second as the value.</td>
  </tr>
  </table>

In [15]:
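# Split each line at the first ':' only, so colons inside values
# (such as the time in 'Date') are left intact.
info_split = partial(str.split, sep=':', maxsplit=1)
dvh_info_section = Section(
    name='Information',
    start_section=None,  # Start at the beginning of the file.
    # End just before the line that begins with 'Description'.
    end_section=('Description', 'START', 'Before'),
    processor=[info_split,
               tp.trim_items,
               tp.drop_blanks],
    assemble=tp.to_dict
    )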

In [16]:
demo_dvh_text = demo_dvh_1.read_text(encoding='utf_8_sig').splitlines()
dvh_info_section.read(demo_dvh_text)

{'Patient Name': 'AXIR, CHWR',
 'Patient ID': 'TEST CHWR',
 'Comment': 'DVHs for one plan',
 'Date': 'March 17, 2023 11:40:38 AM',
 'Exported by': 'GS MP',
 'Type': 'Cumulative Dose Volume Histogram'}
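
For readers without the `text_reader` and `sections` modules at hand, the same steps can be approximated in plain Python. This is only a sketch under the layout shown in the example header: `parse_info_header` is a hypothetical helper, not part of either module, and the sample lines are copied from the demo output rather than read from the file.

```python
def parse_info_header(lines):
    """Collect 'label : value' pairs up to the 'Description' line."""
    info = {}
    for line in lines:
        # Stop just before the multi-line Description entry.
        if line.startswith('Description'):
            break
        # Split at the first ':' only; the Date value contains more colons.
        label, sep, value = line.partition(':')
        if not sep:
            continue  # blank line or line without a delimiter
        info[label.strip()] = value.strip()
    return info


header_lines = [
    'Patient Name         : AXIR, CHWR',
    'Patient ID           : TEST CHWR',
    'Comment              : DVHs for one plan',
    'Date                 : March 17, 2023 11:40:38 AM',
    'Exported by          : GS MP',
    'Type                 : Cumulative Dose Volume Histogram',
    'Description          : The cumulative DVH displays the percentage (relative)',
    '                       or volume (absolute) of structures that receive a dose',
]
header = parse_info_header(header_lines)
print(header['Date'])
```

Using `str.partition` rather than `str.split` avoids an unpacking error on lines that have no ':' at all.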

### Plan Information
- _Plan Information_ occurs immediately after the _Initial Header_.
- _Plan Information_ is repeated for each plan in the DVH file.
   Often there will be only one plan, but DVH files generated from plan 
   comparison DVHs will contain multiple plans.
- The `Plan` and `Course` information in the _Plan Information_ is also 
  contained in each _Structure Information_ section, and can be used to link the 
  `Total dose` to the appropriate _DVH Dose Data_.
- `Total dose` may be important if conversions between absolute and relative 
  dose are required.
- The Plan Header information is formatted as delimited text, with ': ' as 
  the delimiter.
- For a _Plan sum_ DVH `Total dose [cGy]` and `% for dose (%)` will have 
  values of: `not defined`
- A blank line occurs at the end of the plan header
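
Where a conversion between absolute and relative dose is needed, one plausible form is a simple scaling by `Total dose`. The sketch below assumes that 100 % relative dose corresponds to the plan's `Total dose [cGy]`, which is consistent with the example values in this notebook (1 cGy corresponds to 0.015625 % for a total dose of 6400.0 cGy with `% for dose (%)` at 100.0). The function names are hypothetical, and other normalizations may need a different reference dose.

```python
def relative_to_absolute(relative_dose_pct, total_dose_cgy):
    """Convert relative dose [%] to absolute dose [cGy]."""
    return relative_dose_pct / 100.0 * total_dose_cgy


def absolute_to_relative(dose_cgy, total_dose_cgy):
    """Convert absolute dose [cGy] to relative dose [%]."""
    return dose_cgy / total_dose_cgy * 100.0


total_dose = 6400.0  # 'Total dose [cGy]' from the example plan header
relative = absolute_to_relative(1.0, total_dose)
absolute = relative_to_absolute(relative, total_dose)
print(relative, absolute)
```

For a _Plan sum_ the total dose is `not defined`, so no conversion of this kind is available.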

<style type="text/css">
    .tg {text-align:left;background-color:#e6ffff;
        font-family:Arial, sans-serif;font-size:14px;font-weight:bold;
        border-style: solid dotted solid dotted}
    .th {text-align:center;background-color:#38fff8;font-weight:bold;}
    .tc {text-align:left;background-color:#e0ebeb;
        font-family: Terminal, monospace;font-size:11px;font-weight:normal;}
</style>
<table>
<thead><tr>
    <th class="th">Export File Entry</th>
    <th class="th">Value</th>
    <th class="th">Example</th>
    </tr></thead>
<tbody>
    <tr>
        <td>Plan:</td>
        <td>Plan ID</td>
        <td class="tc">Plan sum: Plan Sum</td>
        </tr>
    <tr>
        <td>Uncertainty plan:</td>
        <td>Plan ID (variation of plan: Plan ID)<br>
            Not present if Plan Uncertainty is not calculated.
            </td>
        <td class="tc">N/A</td>
        </tr>
    <tr>
        <td>Course:</td>
        <td>Course ID</td>
        <td class="tc">Course: C1</td>
        </tr>
    <tr>
        <td>Plan Status:</td>
        <td>One of:
            <ul>
                <li>Approved</li>
                <li>Reviewed</li>
                <li>Unapproved</li>
                <li>Rejected</li>
                </ul>
            Not present if the DVH is for a <i>Plan sum</i>.
            </td>
        <td class="tc">
            Plan Status: Treatment Approved 
            Thursday, January 02, 2020 12:55:56 
            by gsal
            </td>
        </tr>
    <tr>
        <td>Total dose [Gy]:</td>
        <td>Total dose in Gray</td>
        <td class="tc">Total dose [cGy]: 6400.0</td>
        </tr>
    <tr>
        <td>% for dose (%):</td>
        <td>Treatment prescription percentage</td>
        <td class="tc">% for dose (%): 100.0</td>
        </tr>
</tbody>
</table>

__Example Plan Header__

> ```
> Plan: EARR
> Course: C2
> Plan Status: Completed
> Total dose [cGy]: 6400.0
> % for dose (%): 100.0
> 
> ```
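
A plan header of this shape could be read with a few lines of plain Python. The sketch below is an illustration only: `parse_plan_header` is a hypothetical helper, and mapping `not defined` to `None` is an assumed convention for handling _Plan sum_ files.

```python
def parse_plan_header(lines):
    """Parse 'label: value' lines, stopping at the first blank line."""
    plan = {}
    for line in lines:
        if not line.strip():
            break  # a blank line ends the plan header
        label, sep, value = line.partition(': ')
        if not sep:
            continue  # no delimiter on this line
        value = value.strip()
        # Plan sums report 'not defined' for the dose entries.
        plan[label.strip()] = None if value == 'not defined' else value
    return plan


plan_lines = [
    'Plan: EARR',
    'Course: C2',
    'Plan Status: Completed',
    'Total dose [cGy]: 6400.0',
    '% for dose (%): 100.0',
    '',
]
plan = parse_plan_header(plan_lines)
total_dose = float(plan['Total dose [cGy]'])
print(plan['Plan'], total_dose)
```

Splitting on the first `': '` keeps values such as a `Plan Status` timestamp intact, since the time contains colons but no colon followed by a space.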

## DVH Dose Data
DVH Dose Data is repeated for each structure in the plan that has a calculated DVH.

Each section of DVH data consists of three parts:
- Structure Information
- Structure Dose Summary
- DVH Curve Data


### _Structure Information_ and _Structure Dose Summary_
- _Structure Information_ and _Structure Dose Summary_ are both formatted as 
  delimited text, with '_: _' as the delimiter.
- The `Plan` and `Course` information are also contained in the _Plan Header_, 
  except here it is always labeled `Plan` never `Plan sum` even if the DVH is 
  for a Plan sum.
- Units for values are found at the end of the label in square brackets. e.g.
  > `Mean Dose [cGy]: 505.1`
- Units will depend on the DVH style chosen:
  
    <Table>
    <tr><th>Measurement Type</th><th>Absolute</th><th>Relative</th></tr>
    <tr><th>Dose</th><td>[cGy]</td><td>[%]</td></tr>
    <tr><th>Volume</th><td>[cm³]</td><td>[%]</td></tr>
    </Table>

- Some labels will not contain corresponding values. e.g.:
  > `Paddick CI: `
- Some labels will contain an `N/A` value. e.g.:
  > `D98.0% [%]: N/A`
- Sometimes a line will contain only a delimiter with neither a label nor a value:
  > `: `
- A blank line occurs between the end of the _Structure Dose Summary_ and 
  the start of the _DVH Curve Data_.



<style type="text/css">
    .tg {text-align:left;background-color:#e6ffff;
        font-family:Arial, sans-serif;font-size:14px;font-weight:bold;
        border-style: solid dotted solid dotted}
    .th {text-align:center;background-color:#38fff8;font-weight:bold;}
    .tc {text-align:left;background-color:#e0ebeb;
        font-family: Terminal, monospace;font-size:11px;font-weight:normal;}
</style>
<table>
    <thead><tr>
        <th class="th">Export File Entry</th>
        <th class="th">Value</th>
        <th class="th">Example</th>
        </tr></thead>
    <tbody>
        <tr><td colspan="3" class="tg">Structure Information</td></tr>
        <tr>
            <td>Structure:</td>
            <td>ID of the structure</td>
             <td class="tc">Structure: Cricoid</td>
            </tr>
    <tr>
        <td>Approval Status:</td>
        <td>One of:
            <ul>
                <li>Approved</li>
                <li>Reviewed</li>
                <li>Unapproved</li>
                <li>Rejected</li>
                </ul>
            </td>
        <td class="tc">Approval Status: Approved</td>
        </tr>
    <tr>
        <td>Plan:</td>
        <td>ID of the plan</td>
        <td class="tc">Plan: EARR</td>
        </tr>
    <tr>
        <td>Course:</td>
        <td>ID of the course</td>
        <td class="tc">Course: C2</td>
        </tr>
    <tr>
        <td>Volume [cm³]:</td>
        <td>Volume. Value shown when dose is absolute.</td>
        <td class="tc">Volume [cm³]: 30.3</td>
        </tr>
    <tr><td colspan="3" class="tg">Structure Dose Summary</td></tr>
    <tr>
        <td>Dose Cover. [%]:</td>
        <td>Percentage of the dose coverage</td>
        <td class="tc">Dose Cover.[%]: 100.0</td>
        </tr>
    <tr>
        <td>Sampling Cover. [%]:</td>
        <td>Percentage of the structure volume used in DVH calculation</td>
        <td class="tc">Sampling Cover.[%]: 100.0</td>
        </tr>
    <tr>
        <td>Min Dose [%]:/Min Dose [Gy]:</td>
        <td>Dose minimum in percentage (relative dose) or Gray (absolute dose)</td>
        <td class="tc">Min Dose [cGy]: 272.3</td>
        </tr>
    <tr>
        <td>Max Dose [%]:/Max Dose [Gy]:</td>
        <td>Dose maximum in percentage (relative dose) or Gray (absolute dose)</td>
        <td class="tc">Max Dose [%]: 106.9</td>
        </tr>
    <tr>
        <td>Mean Dose [%]:/Mean Dose [Gy]:</td>
        <td>Dose mean in percentage (relative dose) or Gray (absolute dose)</td>
        <td class="tc">Mean Dose [%]: 101.6</td>
        </tr>
    <tr>
        <td>Modal Dose [%]:/Modal Dose [Gy]:</td>
        <td>Dose modal in percentage (relative dose) or Gray (absolute dose)</td>
        <td class="tc">Modal Dose [%]: 101.5</td>
        </tr>
    <tr>
        <td>Median Dose [%]:/Median Dose [Gy]:</td>
        <td>Dose median in percentage (relative dose) or Gray (absolute dose)</td>
        <td class="tc">Median Dose [%]: 101.6</td>
        </tr>
    <tr>
        <td>STD [%]:</td>
        <td>Standard deviation</td>
        <td class="tc">STD [%]: 1.1</td>
        </tr>
    <tr>
        <td>NDR:</td>
        <td>Natural dose ratio</td>
        <td class="tc"></td>
        </tr>
    <tr>
        <td>Equiv. Sphere Diam. [cm]:</td>
        <td>Equivalent sphere diameter value</td>
        <td class="tc">Equiv. Sphere Diam. [cm]: 4.2</td>
        </tr>
    <tr>
        <td>Conformity Index:</td>
        <td>Conformity index value</td>
        <td class="tc">Conformity Index: 1.00</td>
        </tr>
    <tr>
        <td>Gradient Measure [cm]:</td>
        <td>Gradient measure value</td>
        <td class="tc">Gradient Measure [cm]: 0.76</td>
        </tr>
    <tr>
        <td>Dose Level [cGy]:</td>
        <td></td>
        <td class="tc">Dose Level [cGy]: </td>
        </tr>
    <tr>
        <td>RTOG CI:</td>
        <td>Radiation Therapy Oncology Group Conformity Index value</td>
        <td class="tc">RTOG CI: </td>
        </tr>
    <tr>
        <td>Paddick CI:</td>
        <td>Paddick Conformity Index value</td>
        <td class="tc">Paddick CI: </td>
        </tr>
    <tr>
        <td>GI:</td>
        <td>Gradient Index value</td>
        <td class="tc">GI: </td>
        </tr>
    <tr>
        <td>ICRU83 HI:</td>
        <td>ICRU 83 Homogeneity Index value</td>
        <td class="tc">ICRU83 HI: </td>
        </tr>
    <tr>
        <td>d<sub>Volume</sub> <b>OR</b> d<sub>Dose</sub> [cm³ <b>OR</b> Gy]:</td>
        <td>Differential DVH curve values</td>
        <td class="tc">
            D95.0% [cGy]: 6400.0<br>
            V95.0% [cm³]: 38.3086
            </td>
        </tr>
    <tr>
        <td>d<sub>Volume</sub>  <b>OR</b> d<sub>U</sub> 
            [cm³ * Gy<sup>1.5</sup>] <b>OR</b> U(Dose) = Dose<sup>-1.5</sup>:</td>
        <td>Natural DVH curve values</td>
        <td class="tc"></td>
        </tr>
</tbody>
</table>

**Example _Structure Information_ and _Structure Dose Summary_**

> ```
> Structure: opt PTV
> Approval Status: Approved
> Plan: EARR
> Course: C2
> Volume [cm³]: 38.3
> Dose Cover.[%]: 100.0
> Sampling Cover.[%]: 100.0
> Min Dose [%]: 91.9
> Max Dose [%]: 106.9
> Mean Dose [%]: 101.6
> Modal Dose [%]: 101.5
> Median Dose [%]: 101.6
> STD [%]: 1.1
> Equiv. Sphere Diam. [cm]: 4.2
> Conformity Index: 1.00
> Gradient Measure [cm]: 0.76
> Dose Level [cGy]: 
> RTOG CI: 
> Paddick CI: 
> GI: 
> ICRU83 HI: 
> D95.0% [cGy]: 6400.0
> D98.0% [%]: 99.0
> D99.0% [cGy]: 6279.5
> V95.0% [cm³]: 38.3086
> V100.0% [cm³]: 36.4045
> : 
> 
> ```
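
The quirks listed above (units inside the label, empty values, `N/A`, and the bare `: ` line) can all be handled in one pass. The following is a minimal sketch, not the notebook's own method: `LABEL_PATTERN` and `parse_structure_lines` are hypothetical names, and treating both empty values and `N/A` as `None` is an assumption.

```python
import re

# Label text, then an optional unit in square brackets at the end of the label.
LABEL_PATTERN = re.compile(r'^(?P<name>.*?)\s*(?:\[(?P<unit>[^\]]*)\])?$')


def parse_structure_lines(lines):
    """Return {name: (value, unit)} for 'label [unit]: value' lines."""
    items = {}
    for line in lines:
        label, sep, value = line.partition(':')
        label, value = label.strip(), value.strip()
        if not sep or not label:
            continue  # skips blank lines and the bare ': ' line
        match = LABEL_PATTERN.match(label)
        name, unit = match.group('name'), match.group('unit')
        # Empty values and 'N/A' are both treated as missing.
        if value in ('', 'N/A'):
            parsed = None
        else:
            try:
                parsed = float(value)
            except ValueError:
                parsed = value  # keep text such as the structure name
        items[name] = (parsed, unit)
    return items


structure_lines = [
    'Structure: opt PTV',
    'Volume [cm³]: 38.3',
    'Mean Dose [%]: 101.6',
    'Paddick CI: ',
    'D98.0% [%]: N/A',
    ': ',
]
summary = parse_structure_lines(structure_lines)
print(summary['Mean Dose'])
```

Keeping the unit separate from the name means the same key (for example `Mean Dose`) can be used whether the file was exported with absolute or relative dose.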

### DVH Curve Data
- Columns containing DVH data for the structure occur immediately after the 
  related Structure Information.
- The number of lines of data will depend on:
    - The selected resolution of the DVH (0.1 cGy step size is the default)
    - The maximum dose to any structure included in the DVH.
    - The minimum and maximum dose range selected for the DVH.
- Columns are formatted in right-justified, fixed-width style 
- The data is contained in two or three of the following columns:
    - Dose [cGy]
    - Relative dose [%]
    - Structure Volume [cm³]
    - Ratio of Total Structure Volume [%]
    - dVolume / dDose [cm³ / cGy]
- The rightmost header sometimes extends past the fixed-width column end.
- Plan sum DVH tables (where relative dose is not defined) only have two columns, 
  and have shorter column widths.
    
<style type="text/css">
    th {text-align:center;background-color:#38fff8;font-weight:bold;}
    .tg {text-align:left;background-color:#e6ffff;
        font-family:Arial, sans-serif;font-size:14px;font-weight:bold;
        border-style: solid dotted solid dotted}
    .tc {text-align:left;background-color:#e0ebeb;
        font-family: Terminal, monospace;font-size:11px;font-weight:normal;}
    .tb {border-width: medium thin thin thin;}
</style>
<table>
    <thead><tr><th>Selected DVH units</th> <th>Data Columns</th></tr></thead>
    <tbody>
    <tr><td rowspan="3">
            Plan,<br>
            Relative Dose,<br>
            Relative Volume
            </td>
        <td>Relative dose [%]</td></tr>
    <tr><td>Dose [cGy]</td></tr>
    <tr><td>Ratio of Total Structure Volume [%]</td></tr>
    <tr><td rowspan="3">
            Plan,<br>
            Absolute Dose,<br>
            Relative Volume
            </td>
        <td>Dose [cGy]</td></tr>
    <tr><td>Relative dose [%]</td></tr>
    <tr><td>Ratio of Total Structure Volume [%]</td></tr>
    <tr><td rowspan="3">
            Plan,<br>
            Relative Dose,<br>
            Absolute Volume
            </td>
        <td>Relative dose [%]</td></tr>
    <tr><td>Dose [cGy]</td></tr>
    <tr><td>Structure Volume [cm³]</td></tr>
    <tr><td rowspan="3">
            Plan,<br>
            Absolute Dose,<br>
            Absolute Volume
            </td>
        <td>Dose [cGy]</td></tr>
    <tr><td>Relative dose [%]</td></tr>
    <tr><td>Structure Volume [cm³]</td></tr>
    <tr><td rowspan="3" class="tb">
            Plan, Differential DVH<br>
            Relative Dose,<br>
            Absolute Volume
            </td>
        <td class="tb">Relative dose [%]</td></tr>
    <tr><td>Dose [cGy]</td></tr>
    <tr><td>dVolume / dDose [cm³ / %]</td></tr>
    <tr><td rowspan="3">
            Plan, Differential DVH<br>
            Absolute Dose,<br>
            Absolute Volume
            </td>
        <td>Dose [cGy]</td></tr>
    <tr><td>Relative dose [%]</td></tr>
    <tr><td>dVolume / dDose [cm³ / cGy]</td></tr>
    <tr><td rowspan="2" class="tb">
            Plan sum,<br>
            Absolute Dose,<br>
            Relative Volume
            </td>
        <td class="tb">Dose [cGy]</td></tr>
    <tr><td>Ratio of Total Structure Volume [%]</td></tr>        
    <tr><td rowspan="2">
            Plan sum,<br>
            Absolute Dose,<br>
            Absolute Volume
            </td>
        <td>Dose [cGy]</td></tr>
    <tr><td>Structure Volume [cm³]</td></tr>        
    </tbody>
    </table>
    
**Column Widths**
<table>
    <thead><tr>
        <th>DVH plan type</th><th>Column Widths</th>
        </tr></thead>
    <tbody>
    <tr>
        <td>Normal plan</td>
        <td>
            0 - 16<br>
            17 - 36<br>
            37 - end
            </td>
        </tr>
    <tr>
        <td>Plan Comparison</td>
        <td>
            0 - 16<br>
            17 - 36<br>
            37 - end
            </td>
        </tr>
    <tr>
        <td>Plan Sum</td>
        <td>
            0 - 9<br>
            10 - end
            </td>
        </tr>       
    </tbody>
    </table>
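
The column widths in the table translate directly into string slices. The sketch below is one way to apply them; the constant and function names are hypothetical, and the widths are taken from the table above, so they may not hold for other export settings.

```python
# Column boundaries as (start, end) slices; None means "to end of line".
# These follow the Column Widths table and are assumptions for other exports.
NORMAL_PLAN_SLICES = [(0, 17), (17, 37), (37, None)]
PLAN_SUM_SLICES = [(0, 10), (10, None)]


def slice_columns(line, col_slices):
    """Cut one fixed-width line into stripped text fields."""
    return [line[start:end].strip() for start, end in col_slices]


# Right-justified sample rows built to match the fixed widths.
row = f"{'6841':>17}{'106.891':>20}{'0':>26}"
fields = slice_columns(row, NORMAL_PLAN_SLICES)
values = [float(item) for item in fields]
print(values)

plan_sum_row = f"{'1':>10}{'81.5558':>26}"
print(slice_columns(plan_sum_row, PLAN_SUM_SLICES))
```

Leaving the last slice open-ended accommodates a rightmost column that runs past its nominal width.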


**Example _DVH Curve Data_** (Numbers at the top are for reference only)

> ```
>00000000001111111111222222222233333333334444444444555555555566666666667777777777
>01234567890123456789012345678901234567890123456789012345678901234567890123456789
>
>       Dose [cGy]   Relative dose [%] Ratio of Total Structure Volume [%]
>                0                   0                       100
>                1            0.015625                       100
>                2             0.03125                       100
>                3            0.046875                       100
> ...
>             6841             106.891                         0
>             6842             106.906                         0
>             6843             106.922                         0
>             6844             106.938                         0
>
>```

 > ```
>00000000001111111111222222222233333333334444444444555555555566666666667777777777
>01234567890123456789012345678901234567890123456789012345678901234567890123456789
>
>Dose [cGy] Ratio of Total Structure Volume [%]
>         0                       100
>         1                   81.5558
>         2                   79.9258
>         3                   78.2589
>...
>      4693                         0
>      4694                         0
>      4695                         0
>      4696                         0
>      4697                         0
>
>```

To parse the curve data:
- Split the header by looking for ']'.
- Split the data on spaces.
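
A minimal sketch of that approach follows. `split_header` and `parse_curve` are hypothetical helpers, and ending the table at the first blank line is an assumption based on the examples above.

```python
def split_header(header_line):
    """Split a column header line after each ']' that ends a unit."""
    parts = header_line.split(']')
    # Restore the ']' removed by split and drop the empty trailing piece.
    return [part.strip() + ']' for part in parts if part.strip()]


def parse_curve(lines):
    """Return (column names, rows of floats) for one DVH curve table."""
    columns = split_header(lines[0])
    rows = []
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the curve data
        rows.append([float(item) for item in line.split()])
    return columns, rows


curve_lines = [
    'Dose [cGy] Ratio of Total Structure Volume [%]',
    '         0                       100',
    '         1                   81.5558',
    '         2                   79.9258',
    '',
]
columns, rows = parse_curve(curve_lines)
print(columns)
print(rows[1])
```

Because the header is split on ']' and the data on whitespace, the same code handles both the three-column plan tables and the two-column plan sum tables without needing the column widths. The result can be passed to `pd.DataFrame(rows, columns=columns)` if a table is wanted.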