docs(formats): e2e documentation
Oli4 committed Feb 6, 2023
1 parent 9094890 commit 059d67b
Showing 10 changed files with 583 additions and 250 deletions.
108 changes: 0 additions & 108 deletions docs/formats.md

This file was deleted.

152 changes: 152 additions & 0 deletions docs/formats/he_e2e.md
@@ -0,0 +1,152 @@
# Heidelberg Engineering E2E Format

The lack of official documentation for the Heidelberg E2E format has frustrated many people working with OCT data, and several projects have tried to make the data accessible. Our understanding is still far from complete, but this document aims to describe everything we know and everything our E2E file reader (`HeE2eReader`) builds on. If you have any questions or suggestions, please open an issue on GitHub.


## File Structure
An E2E file starts with a version structure, followed by a header structure. The rest of the file is organized in chunks of 512 folders each.


| Structure | Size (bytes) | Comment |
| :---------- | -----------: | :--------------------------------------------------------------------------------- |
| `Version` | 36 | |
| `Header` | 52 | The `prev` field in the main header refers to the start position of the last chunk |
| `ChunkList` | varies | The number and size of the chunks depends on the data |
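
As a rough illustration of the layout above, the following sketch decodes the 36-byte version structure using the field list from the `Version` diagram at the end of this document (`name(12) ascii`, `version(4) float32`, `unknown(20)`). Little-endian byte order, null padding of the name, and the absence of padding between the structures are assumptions, and `parse_version` and `first_chunk_offset` are hypothetical helpers, not part of `HeE2eReader`.

```python
import struct
from typing import NamedTuple

VERSION_SIZE = 36  # bytes, from the table above
HEADER_SIZE = 52   # bytes, from the table above


class Version(NamedTuple):
    name: str
    version: float
    unknown: bytes


def parse_version(buf: bytes) -> Version:
    """Decode name(12) ascii, version(4) float32 and unknown(20)."""
    if len(buf) < VERSION_SIZE:
        raise ValueError("buffer too short for a version structure")
    # "<" (little-endian, no alignment) is an assumption of this sketch.
    name, version, unknown = struct.unpack("<12sf20s", buf[:VERSION_SIZE])
    # Assumes the name is padded with null bytes.
    return Version(name.rstrip(b"\x00").decode("ascii"), version, unknown)


def first_chunk_offset() -> int:
    """Offset of the first chunk, assuming no padding after the file header."""
    return VERSION_SIZE + HEADER_SIZE


# Synthetic bytes, not taken from a real E2E file.
example = struct.pack("<12sf20s", b"TEST", 100.0, bytes(20))
print(parse_version(example), first_chunk_offset())
```

With a real file, the first 36 bytes read in binary mode would be passed to `parse_version` instead of the synthetic buffer.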

### Chunk
Every chunk has a header similar to the file header. The chunk then holds the headers of all its folders sequentially, followed by the data containers that the folder headers reference. A chunk can contain folders with data of different patients, studies, series, slices and types. Each folder contains data for a single (patient, study, series, slice, type) combination, which is given in both the folder header and the data container header. So that the last chunk also has 512 folders, it is padded with empty folders of `type=0`.

| Structure | Size (bytes) | Comment |
| :--------------- | -----------: | ---------------------------------------------------------------------------- |
| `ChunkHeader` | 52 | Each chunk refers to the start position of the previous chunk (`prev` field) |
| `FolderHeaders` | 44*512=22528 | Each chunk has 512 folders with headers of size 44 |
| `DataContainers` | varies | |
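
The fixed sizes in this table make the position of every folder header computable from the chunk start alone. The sketch below shows that arithmetic; the function names are hypothetical, and it assumes the folder headers follow the chunk header without padding. Data containers are located through the folder headers, so the end offset is only where they can begin at the earliest.

```python
from typing import List

CHUNK_HEADER_SIZE = 52
FOLDER_HEADER_SIZE = 44
FOLDERS_PER_CHUNK = 512


def folder_header_offsets(chunk_start: int) -> List[int]:
    """Absolute file offsets of the 512 folder headers of one chunk."""
    first = chunk_start + CHUNK_HEADER_SIZE
    return [first + i * FOLDER_HEADER_SIZE for i in range(FOLDERS_PER_CHUNK)]


def folder_headers_end(chunk_start: int) -> int:
    """Offset just past the last folder header of the chunk."""
    return chunk_start + CHUNK_HEADER_SIZE + FOLDER_HEADER_SIZE * FOLDERS_PER_CHUNK


# A chunk starting right after the version (36) and file header (52).
offsets = folder_header_offsets(88)
print(offsets[0], offsets[-1], folder_headers_end(88))
```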

### Data Container
| Structure | Size (bytes) | Comment |
| ----------------- | ------------ | ------------------------------------------------------------------------------------------------ |
| `ContainerHeader` | 60 | |
| `DataItem` | varies | There are many kinds of DataItems indicated by different type IDs in the folder/container header |


## Data Items

While the most important data (images and annotations) has been identified, many data items are still not understood. We sort the data items we found by the level of information they are likely to describe, using the IDs provided in the `ContainerHeader` (patient ID, study ID, series ID and slice ID). We assume that, as in our test data, these IDs are hierarchical: a study is only meaningful if there is a patient, a series requires a study, and a slice requires a series to be contained in.

1. Hence, if a slice ID is given we assume that the data is slice specific.
2. If this is not the case, but a series ID is given, the data is series specific.
3. If this is not the case, but a study ID is given, the data is study specific.
4. If this is not the case, but a patient ID is given, the data is patient specific.
5. If no ID is given, the data is general or a filler.
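
The five rules above can be sketched as a small function. How a missing ID is encoded in the file is not specified here, so `None` stands in for "no ID given"; that choice and the name `data_level` are assumptions of this sketch.

```python
from typing import Optional


def data_level(patient_id: Optional[int], study_id: Optional[int],
               series_id: Optional[int], slice_id: Optional[int]) -> str:
    """Return the most specific level for which an ID is given."""
    # Checked from most to least specific, mirroring rules 1 to 5.
    if slice_id is not None:
        return "slice"
    if series_id is not None:
        return "series"
    if study_id is not None:
        return "study"
    if patient_id is not None:
        return "patient"
    return "general"


print(data_level(1, 2, 3, None))
```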

In the following sections we describe the data items we found. If you have any further information, please open an issue on GitHub and let us know.

### Slice Data

| type ID | size | name | notes |
| ---------: | ------: | :-------- | :------------------------------- |
| 2 | 2377.62 | unknown2 | Preview? contains letters (JFIF) |
| 3 | 96 | | |
| 5 | 59 | | |
| 39 | 497 | | |
| 40 | 28 | | |
| 10004 | 428 | bscanmeta | |
| 10012 | 100 | | |
| 10019 | 2825.84 | layer | 1 folder per layer per slice |
| 10032 | 92 | | |
| 1073741824 | 738008 | image | |


### Series Data

| type ID | size | name | notes |
| ---------: | ------: | :----------------- | :------------------------------- |
| 2 | 2509.38 | unknown2 | Preview? contains letters (JFIF) |
| 3 | 96 | | |
| 11 | 27 | laterality | |
| 54 | 97 | | |
| 59 | 27 | | |
| 61 | 4 | | |
| 62 | 228 | | |
| 1000 | 51 | | |
| 1001 | 54.75 | | |
| 1003 | 17 | | |
| 1008 | 8 | | |
| 9005 | 264 | examined_structure | |
| 9006 | 520 | scanpattern | |
| 9007 | 520 | enface_modality | |
| 9008 | 520 | oct_modality | |
| 10005 | 24 | | |
| 10009 | 4 | | |
| 10010 | 4112 | | |
| 10011 | 4 | | |
| 10013 | 11284 | | |
| 10025 | 100 | slodata | |
| 1073741824 | 589844 | image | |
| 1073751824 | 51220 | | |
| 1073751825 | 51220 | | |
| 1073751826 | 24596 | | |

### Study Data

| type ID | size | name | notes |
| ------: | ---: | :----------- | ------------------------- |
| 7 | 68 | measurements | |
| 10 | 91 | | |
| 13 | 200 | unknown | Contains OCT + HRA string |
| 30 | 2 | unknown5 | |
| 53 | 51 | | |
| 58 | 91 | | |
| 1000 | 51 | | |
| 9000 | 264 | studyname | |
| 9001 | 776 | device | |

### Patient Data

| type ID | size | name | notes |
| ------: | -----: | :------- | :---- |
| 9 | 131 | patient | |
| 17 | 2 | diagnose | |
| 29 | 2 | unknown4 | |
| 31 | 217 | | |
| 52 | 97 | | |
| 9010 | 269064 | unknown1 | |

### General Data

| type ID | size | name | notes |
| ------: | ----: | :----------- | :-------------------------------------------- |
| 0 | 0 | empty_folder | Filler at the end of the last chunk |
| 9011 | 64664 | | Last element in the last chunk before fillers |


## Further observations

+ Folders in the chunks are grouped. The first chunk starts with 10 study-specific folders, followed by series-specific folders and then the folders for all slices of all series. After the last slice, 6 patient-specific folders follow. The last non-filler folder in the last chunk is of type 9011; after it there are only filler folders. Whether every E2E file has this structure, and whether it could be used for faster access to specific data, is not known.
+ Some type IDs, for example the laterality type, are repeated. The only difference between the folder headers is the value of the `unknown2` field, which takes the values 0, 1 and 65535.

## Unsupported data
There are several data folder types that `HeE2eReader` does not currently support.
If you believe the data you are looking for is among them, you can reach it through the hidden `_unknown_folders` attribute of the `HeE2eReader`. It is a dictionary with keys of the form (PatientID, StudyID, SeriesID, SliceID, Type) whose values are the parsed objects. The unparsed binary data is available via each object's `.data_container.item` attribute.
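
A minimal sketch of how such a dictionary could be filtered by type ID. To stay runnable without an E2E file it uses a made-up stand-in with the key shape and the `.data_container.item` attribute described above; `raw_items_of_type` and `make_fake_folder` are hypothetical helpers, and the IDs and byte contents are invented.

```python
from types import SimpleNamespace


def raw_items_of_type(unknown_folders, type_id):
    """Map each key of the given type ID to the raw bytes of its data item."""
    return {
        key: folder.data_container.item
        for key, folder in unknown_folders.items()
        # key = (PatientID, StudyID, SeriesID, SliceID, Type)
        if int(key[4]) == type_id
    }


def make_fake_folder(raw):
    """Stand-in for a parsed folder; it only mimics `.data_container.item`."""
    return SimpleNamespace(data_container=SimpleNamespace(item=raw))


# Invented dictionary shaped like `HeE2eReader(path)._unknown_folders`.
fake_unknown_folders = {
    (1, 2, 3, 0, 10012): make_fake_folder(b"\x00" * 100),
    (1, 2, 3, 1, 10012): make_fake_folder(b"\x01" * 100),
    (1, 2, 3, 0, 10032): make_fake_folder(b"\x02" * 92),
}

for key, raw in raw_items_of_type(fake_unknown_folders, 10012).items():
    print(key, len(raw), raw[:8].hex())
```

With a real file, `HeE2eReader(path)._unknown_folders` (imported from `eyepy.io`) would take the place of `fake_unknown_folders`.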

From here it is up to you to figure out the meaning of the data. I would appreciate it if you shared your findings.


## Acknowledgements
While building the E2E file reader and investigating the format, we took inspiration from several existing projects, which we would like to thank:

+ [OCT-Converter](https://github.com/marksgraham/OCT-Converter)
+ [LibE2E](https://github.com/neurodial/LibE2E)
+ [uocte](https://bitbucket.org/uocte/uocte/wiki/Home)
+ [RETIMAT](https://github.com/drombas/retimat)


??? Version
    ```mermaid
    classDiagram
        class Version{
            name(12) ascii
            version(4) float32
            unknown(20)
        }
    ```
1 change: 1 addition & 0 deletions docs/formats/he_vol.md
@@ -0,0 +1 @@
# Heidelberg Engineering VOL Format
1 change: 1 addition & 0 deletions docs/formats/he_xml.md
@@ -0,0 +1 @@
# Heidelberg Engineering XML Format
5 changes: 3 additions & 2 deletions docs/gen_ref_pages.py
@@ -6,12 +6,13 @@

nav = mkdocs_gen_files.Nav()

-for path in sorted(Path("./eyepy").rglob("*.py")): #
+for path in sorted(Path("src/eyepy").rglob("*.py")): #
+    print(path)
    module_path = path.relative_to(".").with_suffix("") #
    doc_path = path.relative_to(".").with_suffix(".md") #
    full_doc_path = Path("reference", doc_path) #

-    parts = list(module_path.parts)
+    parts = list(module_path.parts)[1:]

    if parts[-1] == "__init__": #
        parts = parts[:-1]
34 changes: 34 additions & 0 deletions docs/tools/analyze_e2e.py
@@ -0,0 +1,34 @@
import click
import pandas as pd

from eyepy.io import HeE2eReader


def main(path):
    folder_headers = []

    for chunk in HeE2eReader(path).parsed_file.chunks:
        for folder in chunk.folders:
            folder_headers.append(folder.header)

    # Get all folder headers but only the second slice if the data is assigned to a slice
    folder_meta = [(f.patient_id, f.study_id, f.series_id, f.slice_id,
                    int(f.type), f.size) for f in folder_headers
                   if ((0 > f.slice_id) or (f.slice_id > 1000)) or (f.slice_id == 2)]

    data = pd.DataFrame.from_records(folder_meta,
                                     columns=[
                                         "patient_id", "study_id", "series_id",
                                         "slice_id", "type", "size"
                                     ])

    gr = data.groupby(
        ["patient_id", "study_id", "series_id", "slice_id", "type",
         "size"]).count()


if __name__ == "__main__":
    # get the path to the file to be analyzed
    #path = click.prompt("Enter path to file to be analyzed")
    path = "/home/morelle/Data/MACUSTAR-313-001-0001-V1-SDOCT_cSLO.e2e"
    main(path)
49 changes: 48 additions & 1 deletion mkdocs.yml
@@ -1,9 +1,14 @@
site_name: eyepy
site_url: https://example.com/
copyright: Copyright © 2019 - 2023 Olivier Morelle
nav:
  - Home: index.md
  - Cookbook: cookbook.md
-  - File formats: formats.md
+  - File formats:
+    #- formats.md
+    - Heidelberg E2E: formats/he_e2e.md
+    - Heidelberg VOL: formats/he_vol.md
+    - Heidelberg XML: formats/he_xml.md
  - Reference: reference/
  - Changelog: changelog.md

@@ -21,6 +26,48 @@ plugins:
paths: [.]

markdown_extensions:
  - admonition
  - pymdownx.details
  - pymdownx.magiclink
  - pymdownx.snippets:
      check_paths: true
  - pymdownx.superfences:
      #custom_fences:
      #  - name: mermaid
      #    class: mermaid
      #    format: !!python/name:pymdownx.superfences.fence_code_format

repo_url: https://github.com/MedVisBonn/eyepy

theme:
  name: material
  features:
    - navigation.instant
    - navigation.tracking
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.sections
    - navigation.expand
    - navigation.footer
    #- navigation.indexes # not compatible with toc.integrate
    - toc.follow
    - toc.integrate
    - navigation.top
    - content.code.copy
    - header.autohide
  icon:
    repo: fontawesome/brands/git-alt
  palette:
    # Palette toggle for light mode
    - media: "(prefers-color-scheme: light)"
      scheme: default
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode

    # Palette toggle for dark mode
    - media: "(prefers-color-scheme: dark)"
      scheme: slate
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
