# Analysis of *.mha* image headers

*Temporary license. Licenses are still to be discussed.*  
Content under Creative Commons Attribution license CC-BY 4.0   
Code under Apache License  
© 2020 Serena Bonaretti for JC|MSK

---

The aim of this notebook is to extract image spacing, size, and pixel type from a group of `.mha` images.  
This notebook can be attached as supplementary material to the *Materials* section of your paper.

**What you do:**
- Add the path to the folder containing your `.mha` files in the code below (look for the arrow ->). Note that all `.mha` files in the folder will be read.

**What the notebook does:**
- Gets the list of `.mha` files in the folder
- For each file:
    - Reads the image header and extracts spacing, size, and pixel (voxel) type
- Creates a dataframe (=table) with all the information of all images
- Queries the table to extract how many images have a certain spacing, size, and pixel type
- Prints out dependencies for reproducibility

To read `.mha` headers, it uses the Python package `SimpleITK`.  
To create and query the dataframe, it uses the Python package `pandas`.  

---

In [1]:
```python
import os
import pandas as pd
import SimpleITK as sitk
```

## Read image headers and create the dataframe


Read the folder content  
-> Add your folder path to the variable `folder`

In [2]:
```python
# path to the folder containing the .mha images
folder = "./data/images/mha"

# make sure there is "/" or "\" at the end of the folder name
if folder[-1] != os.sep:
    folder = folder + os.sep

# get the folder content
folder_content = os.listdir(folder)
```
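The folder handling above (appending `os.sep` by hand, then filtering by extension inside the reading loop) can also be written with the standard-library `pathlib` module, which joins paths without manual separators. This is a minimal sketch. The helper name `list_mha_files` is hypothetical. Unlike `os.listdir`, it sorts the result and matches the extension case-insensitively, so the file order and selection may differ from the notebook's.

```python
from pathlib import Path


def list_mha_files(folder):
    """Return the .mha files directly inside `folder`, sorted by name.

    Hypothetical helper: Path takes care of the separator, so there is
    no need to append "/" or "\\" to the folder name.
    """
    folder = Path(folder)
    # keep regular files whose extension is .mha (case-insensitive here,
    # unlike the exact ".mha" comparison in the notebook loop)
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() == ".mha")
```

For example, `[p.name for p in list_mha_files("./data/images/mha")]` would give the sorted file names. Each returned `Path` can be passed as `str(p)` to `reader.SetFileName`.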

Extract image information from the header

In [3]:
```python
# variables for the loop
file_names = []
spacing    = []
size       = []
pixel_type = []

# create the reader; ReadImageInformation() below reads only the header,
# not the pixel data
reader = sitk.ImageFileReader()
reader.LoadPrivateTagsOn()

for file_name in folder_content:

    # make sure you are loading an .mha image
    if os.path.splitext(file_name)[1] != ".mha":
        continue

    # print out the name and add it to the list
    print(file_name)
    file_names.append(file_name)

    # read the header
    reader.SetFileName(folder + file_name)
    reader.ReadImageInformation()

    # get spacing, rounded to 3 decimals (works for 2D and 3D images)
    spacing.append(tuple(round(s, 3) for s in reader.GetSpacing()))

    # get size
    size.append(reader.GetSize())

    # get pixel type as a readable string, e.g. "32-bit float"
    pixel_type.append(sitk.GetPixelIDValueAsString(reader.GetPixelID()))
```

```
T2_0_orig.mha
T2_1_orig.mha
DESS_prep.mha
01_DESS_01_prep.mha
```
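An `.mha` (MetaImage) file starts with a plain-text header of `Key = Value` lines. `ElementDataFile` is the last header field, and the pixel data follows it. This is why `reader.ReadImageInformation()` can return spacing, size and pixel type without loading the image. The sketch below makes the header structure concrete by parsing it with the standard library only. It is illustrative and not a replacement for SimpleITK. It assumes the header contains `NDims`, `DimSize` and `ElementType`. The mapping from `MET_*` codes to SimpleITK's pixel-type strings covers only common scalar types and is an assumption to check against SimpleITK's own output. The names `read_mha_header`, `header_summary` and `MET_TO_SITK` are hypothetical.

```python
# Assumed mapping from MetaIO ElementType codes to the strings returned by
# sitk.GetPixelIDValueAsString (common scalar types only).
MET_TO_SITK = {
    "MET_CHAR": "8-bit signed integer",
    "MET_UCHAR": "8-bit unsigned integer",
    "MET_SHORT": "16-bit signed integer",
    "MET_USHORT": "16-bit unsigned integer",
    "MET_INT": "32-bit signed integer",
    "MET_UINT": "32-bit unsigned integer",
    "MET_FLOAT": "32-bit float",
    "MET_DOUBLE": "64-bit float",
}


def read_mha_header(path):
    """Parse the "Key = Value" header lines of a MetaImage file into a dict.

    Stops at ElementDataFile (the last header field), so the binary pixel
    data that follows is never decoded.
    """
    header = {}
    with open(path, "rb") as f:
        for raw_line in f:
            key, sep, value = raw_line.decode("ascii", errors="replace").partition("=")
            if not sep:
                continue  # skip lines without "=" (not expected in a header)
            key, value = key.strip(), value.strip()
            header[key] = value
            if key == "ElementDataFile":
                break  # pixel data starts right after this line
    return header


def header_summary(header):
    """Return (spacing, size, pixel_type) in the same form as the notebook loop."""
    ndims = int(header["NDims"])
    # assumption: fall back to a spacing of 1 per axis if ElementSpacing is absent
    spacing_text = header.get("ElementSpacing", " ".join(["1"] * ndims))
    spacing = tuple(round(float(v), 3) for v in spacing_text.split())
    size = tuple(int(v) for v in header["DimSize"].split())
    # unknown codes are returned unchanged
    pixel_type = MET_TO_SITK.get(header["ElementType"], header["ElementType"])
    return spacing, size, pixel_type
```

For example, `header_summary(read_mha_header(folder + "T2_0_orig.mha"))` should agree with the row SimpleITK produces for that file, provided the file's `ElementType` is in the mapping.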


Create the dataframe:

In [4]:
```python
# create the dataframe: one column per header field, one row per image
df = pd.DataFrame({"file_name":  file_names,
                   "spacing":    spacing,
                   "size":       size,
                   "pixel_type": pixel_type})
df
```

|   | file_name | spacing | size | pixel_type |
|---|---|---|---|---|
| 0 | T2_0_orig.mha | (4.06, 0.43, 0.43) | (33, 384, 384) | 32-bit float |
| 1 | T2_1_orig.mha | (4.06, 0.43, 0.43) | (33, 384, 384) | 32-bit float |
| 2 | DESS_prep.mha | (0.75, 0.427, 0.427) | (160, 384, 384) | 8-bit signed integer |
| 3 | 01_DESS_01_prep.mha | (1.5, 0.312, 0.312) | (80, 512, 512) | 8-bit signed integer |
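The `spacing` column shows the values rounded to 3 decimals in the reading loop. The rounding matters for the grouping queries below. `groupby` puts two tuples in the same group only if every float matches exactly, so spacings that differ only by floating-point noise would be counted separately. This minimal standard-library sketch uses `collections.Counter` and hypothetical spacing values to show the effect. The function name `count_spacings` is also hypothetical.

```python
from collections import Counter


def count_spacings(spacings, decimals=None):
    """Count identical spacing tuples, optionally rounding each component first."""
    if decimals is not None:
        spacings = [tuple(round(s, decimals) for s in sp) for sp in spacings]
    return Counter(spacings)


# hypothetical spacings: the first two differ only by floating-point noise
raw = [(4.0599999, 0.4296875, 0.4296875),
       (4.0600001, 0.4296875, 0.4296875),
       (1.5, 0.3125, 0.3125)]

print(len(count_spacings(raw)))               # 3 distinct groups
print(len(count_spacings(raw, decimals=3)))   # 2 groups after rounding
```

Three decimals is a judgment call. Too few decimals could merge genuinely different protocols, and too many would keep the noise.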


## Get number of images
The number of images equals the number of rows:

In [5]:
```python
n_of_rows = df.shape[0]
print(n_of_rows)
```

```
4
```


## Get spacing
Show the number of images for each spacing:

In [6]:
```python
# double brackets [[...]] return a DataFrame, which displays as a table
df.groupby('spacing')[["file_name"]].count()
```

| spacing | file_name |
|---|---|
| (0.75, 0.427, 0.427) | 1 |
| (1.5, 0.312, 0.312) | 1 |
| (4.06, 0.43, 0.43) | 2 |


## Get size  
Show the number of images for each size:

In [7]:
```python
df.groupby('size')[["file_name"]].count()
```

| size | file_name |
|---|---|
| (33, 384, 384) | 2 |
| (80, 512, 512) | 1 |
| (160, 384, 384) | 1 |


## Get pixel type  
Show the number of images for each pixel type:

In [8]:
```python
df.groupby('pixel_type')[["file_name"]].count()
```

| pixel_type | file_name |
|---|---|
| 32-bit float | 2 |
| 8-bit signed integer | 2 |
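The three queries above count each field separately. To see how many images share the same spacing, size *and* pixel type at once, for example to report acquisition groups in a paper, the same idea extends to a combined key. In pandas this would be `df.groupby(["spacing", "size", "pixel_type"])[["file_name"]].count()`. The sketch below does the same with the standard library only. It uses hypothetical records shaped like the dataframe rows, and the function name `count_combinations` is also hypothetical.

```python
from collections import Counter


def count_combinations(records, keys=("spacing", "size", "pixel_type")):
    """Count records that share the same values for all `keys`."""
    return Counter(tuple(rec[k] for k in keys) for rec in records)


# hypothetical records, one dict per image (same columns as the dataframe)
records = [
    {"file_name": "a.mha", "spacing": (1.0, 0.5, 0.5), "size": (40, 256, 256), "pixel_type": "32-bit float"},
    {"file_name": "b.mha", "spacing": (1.0, 0.5, 0.5), "size": (40, 256, 256), "pixel_type": "32-bit float"},
    {"file_name": "c.mha", "spacing": (1.0, 0.5, 0.5), "size": (40, 256, 256), "pixel_type": "8-bit signed integer"},
]

# one line per distinct (spacing, size, pixel_type) combination, most frequent first
for combination, n in count_combinations(records).most_common():
    print(n, combination)
```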


--- 
## Dependencies
Listing the dependencies records the computational environment, so that the workflow can be reproduced.  
Here we use the package `watermark`. If you haven't installed it yet, open a terminal and type `pip install watermark`.

In [9]:
```python
%load_ext watermark
%watermark -v -m -p pandas,SimpleITK,watermark
```

```
CPython 3.7.6
IPython 7.13.0

pandas 1.0.3
SimpleITK 1.2.4
watermark 2.0.2

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 19.4.0
machine    : x86_64
processor  : i386
CPU cores  : 4
interpreter: 64bit
```
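If `watermark` is not available, a rough equivalent can be assembled from the standard library. `platform` provides the interpreter and system details, `os.cpu_count()` the number of cores, and `importlib.metadata` the package versions. Note that `importlib.metadata` requires Python 3.8 or later, so it would not run on the 3.7.6 interpreter shown above. This is a minimal sketch rather than a reproduction of watermark's output format, and the function name `environment_report` is hypothetical.

```python
import os
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages):
    """Collect interpreter, system and package versions in a dict."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "cpu_cores": os.cpu_count(),
    }
    for name in packages:
        try:
            report[name] = version(name)  # installed distribution version
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report(["pandas", "SimpleITK"]).items():
    print(f"{key:15}: {value}")
```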
