## Create a dataset of *.png* files from a dataset of *.mat* files
Magnetic resonance imaging (MRI) uses a strong magnetic field to create detailed images of the organs and tissues within the body (our dataset consists mostly of images from 1.5T machines, which are the most common). <br>
A whole MRI scan is a 3D array, which is usually stored in PACS systems as a sequence of 2D slices in DICOM format (by convention, let `x` and `y` denote the axes stored within each DICOM file). Each DICOM file contains metadata: patient information, hospital details, the date of acquisition, and the MRI machine settings. For anonymization and to reduce the storage footprint, a set of DICOM files may be converted to `.mat` files that contain only the 3D array data.<br>
For mathematical analysis of spatial features it is vital to have as many slices as possible, i.e. a small step `dz` between slices along the third axis `z`. Few datasets have a small `dz`, because acquiring such images increases the acquisition time and does not improve the quality of any individual `xy` slice. Radiologists and other medical doctors work mostly with 2D images, so they generally prefer not to 'waste' time during screening to obtain a larger number of slices.
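To make the `.mat` storage concrete, here is a minimal sketch of stacking 2D `xy` slices along `z` and saving only the resulting 3D array. The helper names and the variable key `"volume"` are assumptions made for illustration; the actual key used in this dataset is not stated here, and reading the pixel data out of the DICOM files is left out.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def save_volume_as_mat(slices, path, key="volume"):
    """Stack 2D xy slices along z and store only the array (no metadata).

    `key` is a hypothetical variable name inside the .mat file.
    """
    volume = np.stack(slices, axis=-1)  # shape: (x, y, z)
    savemat(path, {key: volume})
    return volume


def load_volume_from_mat(path, key="volume"):
    """Read the 3D array back from the .mat file."""
    return loadmat(path)[key]


# Usage with synthetic slices standing in for DICOM pixel data.
slices = [np.full((4, 5), i, dtype=np.uint16) for i in range(3)]
with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = os.path.join(tmp_dir, "scan.mat")
    saved = save_volume_as_mat(slices, mat_path)
    restored = load_volume_from_mat(mat_path)
```

Only the array is written, so no patient or hospital information ends up in the file.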

Steps for converting the `.mat` files into a set of `.png` files:
- [x] Extract one bounding box (patch) from the image using the mask. Even for a multifocal slice there is still a single bounding box;
- [x] Multiply the patch by the mask. In the multifocal case this replaces non-tumor tissue inside the bounding box with `0`s;
- [x] Place the patch at the center of a fixed-size image, e.g. `224 x 224`, applying zero-padding to reach the required size.
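The three steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code in `dataset_creators`: the function name `crop_mask_center` is hypothetical, the slice is assumed to be a 2D array, and the mask is assumed to be nonzero wherever the tissue is non-healthy.

```python
import numpy as np


def crop_mask_center(image, mask, size=224):
    """Hypothetical helper: bounding-box crop, mask multiplication, centering."""
    image = np.asarray(image)
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        raise ValueError("mask is empty, no bounding box can be extracted")

    # Step 1: a single bounding box around all masked pixels,
    # even when the mask has several disconnected regions.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1

    # Step 2: multiply by the mask so pixels outside it become 0.
    patch = image[r0:r1, c0:c1] * mask[r0:r1, c0:c1]

    # Step 3: zero-pad by placing the patch at the center of the canvas.
    height, width = patch.shape
    if height > size or width > size:
        raise ValueError("patch does not fit into the target image")
    canvas = np.zeros((size, size), dtype=image.dtype)
    top = (size - height) // 2
    left = (size - width) // 2
    canvas[top:top + height, left:left + width] = patch
    return canvas


# Usage: a 10 x 10 slice with two separate lesions (multifocal case).
image = np.arange(1, 101, dtype=np.float32).reshape(10, 10)
mask = np.zeros((10, 10), dtype=bool)
mask[2:4, 2:4] = True
mask[6:8, 5:9] = True
centered = crop_mask_center(image, mask, size=16)
```

Patches larger than the target size raise an error in this sketch; how the original pipeline treats that case is not specified here.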

In [1]:
%reload_ext autoreload
%autoreload 2

# Make modules in the parent directory importable
import inspect
import os
import sys
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
sys.path.insert(0, parentdir)

from pathlib import Path
import time
from dataset_creators import create_imagenet_dataset

Specify the paths to the data.

In [4]:
path_from = Path("/storage_1/003_raw_gbm_met_classifier/")
path_split_csv = Path("/storage_1/003_raw_gbm_met_classifier/split.csv")

The following two cells create a dataset of `.png` files from the dataset of `.mat` files: one with multiplication by the masks and centering of the image, and one without such preprocessing. From a practical point of view, only relatively big tumors are of interest for glioblastoma vs. brain metastases characterization, since multiple small tumors are almost always metastases. Therefore we consider only those images that contain at least 50 non-healthy pixels (tumor or necrosis).

In [5]:
path_to = Path("/storage_1/dataset_classification_threshold_50/")
create_imagenet_dataset(path_from=path_from, path_to=path_to, path_split_csv=path_split_csv, tol_threshold=50);

100%|██████████| 848/848 [16:31<00:00,  1.17s/it]


In [7]:
path_to = Path("/storage_1/dataset_classification_threshold_50_whole/")
create_imagenet_dataset(path_from=path_from, path_to=path_to, path_split_csv=path_split_csv, slice_func="mat2png", tol_threshold=50)

100%|██████████| 848/848 [21:47<00:00,  1.54s/it]
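The 50-pixel filter described above can be illustrated with a short sketch. It assumes that the mask volume is nonzero for non-healthy pixels (tumor or necrosis) and that slices lie along the last axis; the function name `select_slices` is hypothetical, and the exact comparison used inside `create_imagenet_dataset` for `tol_threshold` is not shown in this notebook, so "at least" is taken from the text.

```python
import numpy as np


def select_slices(mask_volume, tol_threshold=50):
    """Return indices of z slices with at least `tol_threshold` nonzero mask pixels.

    Assumes `mask_volume` has shape (x, y, z).
    """
    mask_volume = np.asarray(mask_volume)
    counts = np.count_nonzero(mask_volume, axis=(0, 1))  # one count per slice
    return np.flatnonzero(counts >= tol_threshold)


# Usage: three slices with 49, 50 and 0 non-healthy pixels.
mask_volume = np.zeros((10, 10, 3), dtype=np.uint8)
mask_volume[:7, :7, 0] = 1   # 49 pixels, below the threshold
mask_volume[:5, :, 1] = 1    # 50 pixels, kept
kept = select_slices(mask_volume, tol_threshold=50)
```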
