# Image metadata

In this notebook, we:
- Read the headers of .isq images in a folder and extract the information (metadata)
- Organize the metadata in a table (Pandas dataframe)
- Save the metadata in a tabular format (.xlsx or .csv)

Note: Run a separate copy of this notebook for each image folder

By Serena Bonaretti

---

Imports and variables:

In [1]:
import os
import pandas as pd
from pymsk import scanco_read_files

In [2]:
isq_folder = "./images/images_3309/"

---
## 1. Getting the names of the *.isq* files in the folder:

In [3]:
# getting the folder content
folder_content = os.listdir(isq_folder)

# creating the list for .isq file names
isq_file_names = [] 

# getting only .isq files
for file in folder_content: 
    
    # getting the file extension
    file_extension = os.path.splitext(file)[1]
    
    # keep only the files whose extension contains "isq" or "ISQ" (e.g. ".isq" or ".ISQ;1")
    if "isq" in file_extension or "ISQ" in file_extension:
        isq_file_names.append(file)
        
print("-> Found " + str(len(isq_file_names)) + " .isq files in folder:")

# for filename in isq_file_names:
#     print (filename)

-> Found 3 .isq files in folder:
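
The check above uses `in` rather than `==` because an extension can carry a suffix after the letters, such as `;1`. The following minimal sketch shows what `os.path.splitext` returns in that case; the names `scan.isq` and `notes.txt` are made up for illustration.

```python
import os

# example names: one with a ";1" suffix, one plain .isq file, one unrelated file
names = ["C0008472.ISQ;1", "scan.isq", "notes.txt"]

matches = []
for name in names:
    # splitext splits at the last dot, so the suffix stays in the extension
    stem, extension = os.path.splitext(name)
    print(name, "->", repr(extension))

    # case-insensitive variant of the check used in the cell above
    if "isq" in extension.lower():
        matches.append(name)

print(matches)  # ['C0008472.ISQ;1', 'scan.isq']
```

Lowercasing the extension first also accepts mixed-case spellings such as `.Isq`, which the original two-part check would skip.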


---
## 2. Extracting information from image headers  

- To read the .isq file headers, we use the function `read_isq_header()` from `pymsk.scanco_read_files`  
- For each image, we get two lists: 
    - `keys`, containing the information labels (e.g. *pixel_size_um*)
    - `values`, containing the corresponding values (e.g. *82*)  
- We save the `keys` of the first image into the list `all_keys`. We do not need to save the keys of every image because they are the same for all images   
  The entries of `all_keys` will become the column names of the table
- Finally, for each image, we add the list `values` to the list of lists `all_values`  
  Each entry of `all_values` will become one row of the table
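
The accumulation pattern described above can be sketched without any image files. In this minimal sketch, `read_header_stub` is a hypothetical stand-in for the real header reader, and the file names and values are invented for illustration.

```python
# hypothetical stand-in for a header reader: returns (keys, values) for one image
def read_header_stub(file_name):
    keys = ["pixel_size_um", "n_voxels_z"]
    values = {"image_a.isq": [82, 220], "image_b.isq": [82, 330]}[file_name]
    return keys, values

file_names = ["image_a.isq", "image_b.isq"]

all_keys = []    # column names, taken from the first image only
all_values = []  # one list of values per image

for i, file_name in enumerate(file_names):
    current_keys, current_values = read_header_stub(file_name)

    # keys are identical across images, so keep them once
    if i == 0:
        all_keys = current_keys

    all_values.append(current_values)

print(all_keys)    # ['pixel_size_um', 'n_voxels_z']
print(all_values)  # [[82, 220], [82, 330]]
```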

In [4]:
# initializing list containing keys and values
all_keys = []
all_values = []

# for each .isq file in the folder
for i, isq_file_name in enumerate(isq_file_names): 
       
    # get keys and values from the header of the current image
    current_keys, current_values = scanco_read_files.read_isq_header(os.path.join(isq_folder, isq_file_name)) 

    # save the keys of the first image in the variable all_keys
    if i == 0:
        all_keys = current_keys

    # add the values of the current image header into all_values
    all_values.append(current_values)

--- 
## 3. Creating a metadata table 

- We want to collect the metadata from the *.isq* headers in a single table, with one row per image    
- To handle tables, we use the Python package [Pandas](https://pandas.pydata.org/), imported at the beginning of the notebook

In [5]:
# display all pandas columns and rows 
pd.options.display.max_rows    = None
pd.options.display.max_columns = None

In [6]:
# create dataframe (=table)
isq_headers = pd.DataFrame(all_values, columns = all_keys)

# adding column with file names in position 0
isq_headers.insert(0, "file_name", isq_file_names)

# delete column "fill" because it just contains zeros
isq_headers = isq_headers.drop(columns = ["fill"])

# show dataframe
isq_headers

,file_name,check,data_type,nr_of_bytes,nr_of_blocks,pat_no,scanner_id,date,n_voxels_x,n_voxels_y,n_voxels_z,total_size_um_x,total_size_um_y,total_size_um_z,slice_thickness_um,pixel_size_um,slice_1_pos_um,min_intensity,max_intensity,mu_scaling,nr_of_samples,nr_of_projections,scan_dist_um,scanner_type,exposure_time,meas_no,site,reference_line_um,recon_algo,pat_name,energy_V,intensity_uA,data_offset
0,C0008472.ISQ;1,CTDATA-HEADER_V1,3,0,506886,2745,3309,2011_12_22,768,768,220,62976,62976,18040,82,82,107081,-1695,10801,8192,1536,750,125952,9,100000,9538,4,0,3,EUA_001,59400,1000,0
1,C0010013.ISQ;1,CTDATA-HEADER_V1,3,1557138432,3041286,2746,3309,2012_02_10,1536,1536,330,125952,125952,27060,82,82,88161,-2913,11085,8192,1536,750,125952,9,100000,11111,4,0,3,EUA_002,59400,1000,5
2,CJS_R_C0012934.ISQ;1,CTDATA-HEADER_V1,3,1038093312,2027526,3643,3309,2013_10_01,1536,1536,220,125952,125952,18040,82,82,128525,-12332,20959,8192,1536,750,125952,9,100000,13628,4,0,3,CJS_3043R,59400,900,5


---
## 4. Saving the table to a *.csv* or *.xlsx* file  

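We can save the dataframe in several file formats. Here we consider:  
- *.csv* (plain text, open format), which is the format saved by default
- *.xlsx* (Microsoft Excel format), whose export line is commented out in the code; uncomment it to use it  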
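
To check what a saved table looks like when loaded again, the following minimal sketch writes a small dataframe with `to_csv` and reads it back with `read_csv`. The table content is invented for illustration, and an in-memory buffer stands in for a file on disk.

```python
import io
import pandas as pd

# toy table standing in for the metadata table (values are illustrative only)
toy_headers = pd.DataFrame(
    [["image_a.isq", 82, 220], ["image_b.isq", 82, 330]],
    columns=["file_name", "pixel_size_um", "n_voxels_z"],
)

# write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
toy_headers.to_csv(buffer, index=False)

# rewind the buffer and read the text back into a new dataframe
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# the column names and the number of rows survive the round trip
print(list(reloaded.columns))
print(reloaded.shape)
```

If `index = False` is omitted, `to_csv` also writes the row index as a first column without a name, which `read_csv` then labels `Unnamed: 0`.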


In [7]:
# save to csv
isq_headers.to_csv("images_3309.csv", index = False)

# save to excel (requires an Excel writer package such as openpyxl)
# isq_headers.to_excel("images_3309.xlsx", index = False)

---
## Dependencies

In [8]:
%load_ext watermark
%watermark -v -m -p pymsk,pandas

Python implementation: CPython
Python version       : 3.8.5
IPython version      : 7.22.0

pymsk : 0.1.3
pandas: 1.2.4

Compiler    : Clang 10.0.0 
OS          : Darwin
Release     : 20.5.0
Machine     : x86_64
Processor   : i386
CPU cores   : 4
Architecture: 64bit

