# Module: processing

This notebook tests the `processing` module and demonstrates how to use its functions in a pipeline.

In [37]:
from src import processing

import os
from glob import glob
from re import sub
import numpy as np

## Extracting compressed images

The first part of this pipeline is to extract the images from compressed file formats. First we specify the directory containing the images we want to extract, then we generate a list of files to decompress. 

In [7]:
# Directory containing compressed images
input_dir = 'data/human/registration/jacobians/absolute/smooth/'
input_dir = os.path.join(input_dir, '')

# List of paths to compressed images
input_files = glob(input_dir+'*.gz')
input_files[:5]

['data/human/registration/jacobians/absolute/smooth/d8_0148_01.extracted_fwhm_4vox.nii.gz',
 'data/human/registration/jacobians/absolute/smooth/41014_T1.extracted_fwhm_4vox.nii.gz',
 'data/human/registration/jacobians/absolute/smooth/d8_0393_01.extracted_fwhm_4vox.nii.gz',
 'data/human/registration/jacobians/absolute/smooth/21054_T1.extracted_fwhm_4vox.nii.gz',
 'data/human/registration/jacobians/absolute/smooth/d8_0204_01.extracted_fwhm_4vox.nii.gz']

We can decompress a list of input files using the `gunzip_files()` function.

In [8]:
# This seems to work as intended
# unzipped_files = processing.gunzip_files(infiles = input_files, 
#                                          keep = True,
#                                          parallel = True,
#                                          nproc = 8)

unzipped_files = glob(input_dir+'*.nii')
unzipped_files[:5]

['data/human/registration/jacobians/absolute/smooth/d8_0096_01.extracted_fwhm_4vox.nii',
 'data/human/registration/jacobians/absolute/smooth/12013_T1.extracted_fwhm_4vox.nii',
 'data/human/registration/jacobians/absolute/smooth/21045_T1.extracted_fwhm_4vox.nii',
 'data/human/registration/jacobians/absolute/smooth/d8_0673_01.extracted_fwhm_4vox.nii',
 'data/human/registration/jacobians/absolute/smooth/sub-1050792_ses-01_T1w.extracted_fwhm_4vox.nii']

The function returns a list containing the paths to the unzipped images. By default, the output directory is the same as the input directory.
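The decompression step can be pictured with the standard library alone. The following is a minimal sketch under stated assumptions, not the implementation of `gunzip_files()`: the names `gunzip_file_sketch` and `gunzip_files_sketch` are hypothetical, and a thread pool stands in for whatever parallel backend the module uses.

```python
import gzip
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def gunzip_file_sketch(infile, keep=True):
    """Decompress one .gz file next to the original and return the new path.

    Hypothetical helper for illustration only.
    """
    if not infile.endswith('.gz'):
        raise ValueError('Expected a .gz file: {}'.format(infile))
    outfile = infile[:-len('.gz')]
    with gzip.open(infile, 'rb') as src, open(outfile, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    if not keep:
        # Remove the compressed original once the copy has succeeded
        os.remove(infile)
    return outfile


def gunzip_files_sketch(infiles, keep=True, nproc=1):
    """Decompress several files using a pool of worker threads (an assumption)."""
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        return list(pool.map(lambda f: gunzip_file_sketch(f, keep=keep), infiles))


# Small demonstration on a temporary file rather than real image data
demo_dir = tempfile.mkdtemp()
demo_gz = os.path.join(demo_dir, 'example.nii.gz')
with gzip.open(demo_gz, 'wb') as f:
    f.write(b'not a real image')

demo_out = gunzip_files_sketch([demo_gz], keep=True, nproc=2)
print(demo_out)
```

As in the notebook, the sketch writes each output file to the directory of its input and returns the list of output paths.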

***
## Converting image formats

The compressed files were in NIfTI format. We can keep them as they are or convert them to MINC format. The `processing` module provides a function for converting between these image formats: `convert_images()`. Like `gunzip_files()`, this function takes a list of images to convert. We can also specify an output directory for the converted images.

In [9]:
# Directory in which to save the converted images
outdir = 'data/human/registration/jacobians/absolute/smooth_minc/'

# This seems to work as intended
# imgfiles = processing.convert_images(infiles = unzipped_files,
#                                      input_format = 'nifty',
#                                      output_format = 'minc',
#                                      outdir = outdir,
#                                      keep = True,
#                                      parallel = True,
#                                      nproc = 8)

imgfiles = glob(outdir+'*.mnc')
imgfiles[:5]

['data/human/registration/jacobians/absolute/smooth_minc/sub-1050811_ses-01_T1w.extracted_fwhm_4vox.mnc',
 'data/human/registration/jacobians/absolute/smooth_minc/d8_0216_01.extracted_fwhm_4vox.mnc',
 'data/human/registration/jacobians/absolute/smooth_minc/sub-1050869_ses-01_run-01_T1w.extracted_fwhm_4vox.mnc',
 'data/human/registration/jacobians/absolute/smooth_minc/31030_T1.extracted_fwhm_4vox.mnc',
 'data/human/registration/jacobians/absolute/smooth_minc/d8_0713_02.extracted_fwhm_4vox.mnc']

Once again, the output of the function is a list containing the paths to the converted images. 

*** 

## Computing effect sizes

For this project, once we have extracted the images, we want to calculate effect size images for the human participants in our study. We do so by identifying propensity-matched controls for each participant based on a number of features, and then computing a voxel-wise z-score with respect to these controls.

The `processing` module provides a function to do just this: `calculate_human_effect_sizes()`. This function takes a number of input parameters, including a CSV file containing the demographic information for the participants, a mask image, the number of controls to use, and more.
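To make the z-score step concrete, here is a minimal NumPy sketch. It assumes the matched controls have already been selected and flattened into one row per control; the function name `voxelwise_zscore_sketch` and the use of the sample standard deviation (`ddof = 1`) are assumptions, not details taken from `calculate_human_effect_sizes()`.

```python
import numpy as np


def voxelwise_zscore_sketch(participant, controls):
    """Z-score a participant's voxels against a set of control images.

    participant: 1-D array of voxel values, shape (n_voxels,)
    controls:    2-D array with one row per matched control,
                 shape (n_controls, n_voxels)
    Hypothetical helper for illustration only.
    """
    ctrl_mean = controls.mean(axis=0)
    ctrl_sd = controls.std(axis=0, ddof=1)  # sample SD (an assumption)
    return (participant - ctrl_mean) / ctrl_sd


# Synthetic data: 10 controls, 4 voxels
rng = np.random.default_rng(0)
controls = rng.normal(size=(10, 4))

# Construct a participant that sits two control SDs above the control mean
participant = controls.mean(axis=0) + 2 * controls.std(axis=0, ddof=1)

z = voxelwise_zscore_sketch(participant, controls)
print(z)
```

By construction, every voxel of the synthetic participant has a z-score of approximately 2.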

In [10]:
# The directory containing the images to use is the output directory from the previous step
imgdir = outdir

# Path to the demographics file
demographics = 'data/human/registration/DBM_input_demo_passedqc.csv'

# Path to the mask image to use
maskfile = 'data/human/registration/reference_files/mask.mnc'

# Option to specify which data to use
dataset = 1

# Number of controls to use for propensity matching
ncontrols = 10

# Lower bound on the number of matched controls before the matching criteria are relaxed
threshold = 5

Let's create an output directory in which to store these effect sizes. We'll write some of the parameters into the name of the directory, to keep track of what we've done.

In [12]:
# Output directory
es_dir = ('data/human/effect_sizes/absolute/'
          'resolution_{}_dataset_{}_ncontrols_{}_threshold_{}'
          .format(0.5, dataset, ncontrols, threshold))
es_dir = os.path.join(es_dir, '')
es_dir

'data/human/effect_sizes/absolute/resolution_0.5_dataset_1_ncontrols_10_threshold_5/'

We then use the `calculate_human_effect_sizes` function to generate these effect size images. 

In [14]:
# This seems to work as intended
# es_files = processing.calculate_human_effect_sizes(demographics = demographics,
#                                                    imgdir = imgdir,
#                                                    maskfile = maskfile,
#                                                    outdir = es_dir, 
#                                                    ncontrols = ncontrols,
#                                                    threshold = threshold,
#                                                    parallel = True,
#                                                    nproc = 4)

es_files = glob(es_dir+'*.mnc')
es_files = es_files[:10]
es_files[:5]

['data/human/effect_sizes/absolute/resolution_0.5_dataset_1_ncontrols_10_threshold_5/sub-1050158_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5.mnc',
 'data/human/effect_sizes/absolute/resolution_0.5_dataset_1_ncontrols_10_threshold_5/sub-1050100_ses-01_run-02_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5.mnc',
 'data/human/effect_sizes/absolute/resolution_0.5_dataset_1_ncontrols_10_threshold_5/sub-1050135_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5.mnc',
 'data/human/effect_sizes/absolute/resolution_0.5_dataset_1_ncontrols_10_threshold_5/sub-1050172_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5.mnc',
 'data/human/effect_sizes/absolute/resolution_0.5_dataset_1_ncontrols_10_threshold_5/sub-1050084_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5.mnc']

The function returns a list of paths to the effect size images.

---

## Resampling images

When we computed the effect sizes above, we created images at the native resolution of 0.5 mm. These images are quite large, so it is useful to have a tool to resample them as desired. In the `processing` module, this is done with the `resample_images()` function. All we need is a set of input files, an output directory, and the new resolution in millimeters.

In [15]:
# Resolution to which we want to resample
isostep = 3.0

# Output directory
# The dot is escaped so that it matches a literal '.' rather than any character
es_dir_downsampled = sub(r'resolution_0\.5',
                         'resolution_{}'.format(isostep),
                         es_dir)

# Downsample images
es_files_downsampled = processing.resample_images(infiles = es_files,
                                                  isostep = isostep,
                                                  outdir = es_dir_downsampled,
                                                  parallel = True,
                                                  nproc = 2)
es_files_downsampled

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.45s/it]


['data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/sub-1050158_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5_resampled_3.0.mnc',
 'data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/sub-1050100_ses-01_run-02_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5_resampled_3.0.mnc',
 'data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/sub-1050135_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5_resampled_3.0.mnc',
 'data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/sub-1050172_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5_resampled_3.0.mnc',
 'data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/sub-1050084_ses-01_T1w.extracted_ES_res_0.5_data_1_nc_10_thresh_5_resampled_3.0.mnc',
 'data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/sub-1050027_ses-01_run-03_T1w.extracted_ES_res_0.5_da

The file paths are returned as before.
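To get a feel for how much smaller the downsampled images are, the arithmetic can be sketched as follows. This assumes isotropic voxels and ignores rounding at the edges of the field of view; `voxel_reduction_factor` is a hypothetical helper, not part of the `processing` module.

```python
def voxel_reduction_factor(old_step, new_step, ndim=3):
    """Approximate factor by which the voxel count shrinks when resampling.

    Assumes isotropic voxels; hypothetical helper for illustration only.
    """
    return (new_step / old_step) ** ndim


# Going from 0.5 mm to 3.0 mm voxels in three dimensions
factor = voxel_reduction_factor(old_step=0.5, new_step=3.0)
print(factor)  # 216.0
```

Each 3.0 mm voxel covers the volume of 6 x 6 x 6 = 216 voxels at 0.5 mm, which is why the downsampled images are so much cheaper to load.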

We can also resample individual images using the `resample_image()` function.

In [16]:
# File to resample: The human template file
infile = 'data/human/registration/reference_files/model.mnc'

# New resolution 
isostep = 3.0

# Resample the image
model_downsampled = processing.resample_image(infile = infile,
                                              isostep = isostep)
model_downsampled

'data/human/registration/reference_files/model_resampled_3.0.mnc'

In [17]:
# Resample the human mask file as well
infile = 'data/human/registration/reference_files/mask.mnc'
mask_downsampled = processing.resample_image(infile = infile,
                                             isostep = isostep)
mask_downsampled

'data/human/registration/reference_files/mask_resampled_3.0.mnc'

In [18]:
# infiles = ['data/human/registration/reference_files/model.mnc',
#            'data/human/registration/reference_files/mask.mnc']
# isostep = 1.0
# processing.resample_images(infiles = infiles,
#                            isostep = isostep)

---

## Importing images

Having lower-resolution images allows us to work with voxel-wise data more efficiently, in particular when we import many images at once into a list or a matrix. We can import a single image into Python using `processing.import_image()`.

In [38]:
img = processing.import_image(img = es_files_downsampled[0])
print(img)
print(type(img))
print(img.shape)

[0.00021018 0.00021018 0.00021018 ... 0.00021018 0.00021018 0.00021018]
<class 'numpy.ndarray'>
(298220,)


By default, the image is imported as a 1-dimensional NumPy array. We can prevent the function from flattening the image by setting `flatten = False`:

In [39]:
img = processing.import_image(img = es_files_downsampled[0],
                              flatten = False)
print(type(img))
print(img.shape)

<class 'numpy.ndarray'>
(65, 74, 62)


In this case, the image is imported as a 3-dimensional NumPy array. 

We can also provide a mask to filter the image upon import:

In [40]:
img = processing.import_image(img = es_files_downsampled[0],
                              mask = mask_downsampled,
                              flatten = True)
print(type(img))
print(img.shape)

<class 'numpy.ndarray'>
(53408,)


When a mask is provided with `flatten = True`, only the voxels within the mask are returned. If we instead set `flatten = False` with a mask, the function returns the full image grid, with the voxels outside the mask set to 0:

In [42]:
img = processing.import_image(img = es_files_downsampled[0],
                              mask = mask_downsampled,
                              flatten = False)
print(type(img))
print(img.shape)
print(np.sum(img != 0))

<class 'numpy.ndarray'>
(65, 74, 62)
53408
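The two masking behaviours shown above can be mimicked with plain NumPy. This is only a sketch on a small synthetic array, assuming a boolean mask with the same shape as the image; it does not reproduce how `import_image()` reads MINC files.

```python
import numpy as np

# Synthetic 3-D "image" and a boolean mask of the same shape
rng = np.random.default_rng(0)
image = rng.normal(size=(4, 5, 3))
mask = np.zeros(image.shape, dtype=bool)
mask[1:3, 1:4, :] = True  # 2 * 3 * 3 = 18 voxels inside the mask

# Analogue of flatten = True with a mask:
# keep only the voxels inside the mask, as a 1-D array
flat_masked = image[mask]

# Analogue of flatten = False with a mask:
# keep the full grid and set voxels outside the mask to 0
full_masked = np.where(mask, image, 0.0)

print(flat_masked.shape)  # (18,)
print(full_masked.shape)  # (4, 5, 3)
print(np.sum(full_masked != 0))
```

In both cases the number of informative voxels is the number of voxels inside the mask, which is why the two cells above report the same count (53408) in different shapes.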


To import a number of images, we can use the `import_images()` function, which wraps around `import_image()`. This function takes in a list of images to import. Images are flattened by default, but can also be imported without flattening. 

In [61]:
imgs = processing.import_images(infiles = es_files_downsampled,
                                mask = mask_downsampled,
                                parallel = True,
                                nproc = 2)

print(imgs[:5])
print(type(imgs))
print(len(imgs))

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 129.62it/s]

[array([ 0.03929304, -0.59906298, -0.3775884 , ...,  0.28202843,
        0.68806983, -0.14135567]), array([-0.55086186, -0.35133998, -0.63647806, ...,  0.27981543,
       -0.64250263, -1.14555966]), array([1.30499114, 1.30078428, 1.50229257, ..., 0.37850737, 2.30401992,
       0.72099068]), array([1.85059184, 1.82599449, 1.5494367 , ..., 0.50809773, 0.20749898,
       0.64822664]), array([-0.93247427, -1.58743018, -0.55716518, ...,  1.85937816,
        0.82512881,  2.80038987])]
<class 'list'>
10





By default, the imported images are stored in a list. We can change the output format using the `output_format` argument. Allowed values are 'list', 'numpy', and 'pandas'; the latter two return the images as a NumPy array and a Pandas DataFrame, respectively.

In [62]:
imgs = processing.import_images(infiles = es_files_downsampled,
                                mask = mask_downsampled,
                                output_format = 'numpy',
                                parallel = True,
                                nproc = 2)

print(imgs[:5])
print(type(imgs))
print(imgs.shape)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 135.15it/s]

[[ 0.03929304 -0.59906298 -0.3775884  ...  0.28202843  0.68806983
  -0.14135567]
 [-0.55086186 -0.35133998 -0.63647806 ...  0.27981543 -0.64250263
  -1.14555966]
 [ 1.30499114  1.30078428  1.50229257 ...  0.37850737  2.30401992
   0.72099068]
 [ 1.85059184  1.82599449  1.5494367  ...  0.50809773  0.20749898
   0.64822664]
 [-0.93247427 -1.58743018 -0.55716518 ...  1.85937816  0.82512881
   2.80038987]]
<class 'numpy.ndarray'>
(10, 53408)





In [68]:
imgs = processing.import_images(infiles = es_files_downsampled,
                                mask = mask_downsampled,
                                output_format = 'pandas',
                                parallel = True,
                                nproc = 2)

print(type(imgs))
print(imgs.shape)
imgs.head()

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 129.82it/s]

<class 'pandas.core.frame.DataFrame'>
(10, 53408)





Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,53398,53399,53400,53401,53402,53403,53404,53405,53406,53407
0,0.039293,-0.599063,-0.377588,-0.788395,-0.403643,-0.493537,-0.491797,0.359342,0.10921,0.104862,...,-0.515249,1.347266,0.644209,0.395816,0.039717,0.909528,0.929512,0.282028,0.68807,-0.141356
1,-0.550862,-0.35134,-0.636478,-0.381057,-0.400159,-0.213358,-0.141545,0.129444,-0.427039,-0.283058,...,-0.842354,1.537093,0.911292,1.037576,-1.883201,1.662702,0.707169,0.279815,-0.642503,-1.14556
2,1.304991,1.300784,1.502293,1.72331,1.426885,1.25194,1.485532,1.476679,1.128628,1.438059,...,1.485051,0.705651,0.484607,-0.304579,1.367329,2.495751,1.322206,0.378507,2.30402,0.720991
3,1.850592,1.825994,1.549437,1.646103,1.693011,1.110983,1.465058,1.331495,0.745751,0.90507,...,0.243547,0.926227,0.105105,-0.189767,-0.153433,0.752616,0.48721,0.508098,0.207499,0.648227
4,-0.932474,-1.58743,-0.557165,-1.202698,-0.556656,-1.355891,-1.348442,-1.724766,-0.207126,-0.701943,...,-1.029177,0.667458,1.240109,2.151893,-0.141185,0.103745,0.614438,1.859378,0.825129,2.80039


Note that images will always be flattened if the output format is 'numpy' or 'pandas', even when `flatten = False` is passed. In that case the function emits a warning:

In [64]:
imgs = processing.import_images(infiles = es_files_downsampled,
                                mask = mask_downsampled,
                                output_format = 'pandas',
                                flatten = False,
                                parallel = True,
                                nproc = 2)

  warn(msg_warn)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 132.29it/s]
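The relationship between the three output formats can be sketched with NumPy and pandas directly. This is an illustration on toy data, assuming each image has already been flattened to a 1-D array of equal length; it is not the code behind `import_images()`.

```python
import numpy as np
import pandas as pd

# Three toy "images", each already flattened to 4 voxels
images = [np.arange(4, dtype=float) + i for i in range(3)]

# 'list': one 1-D array per image
as_list = images

# 'numpy': stack the images into a matrix with one row per image
as_numpy = np.stack(images)

# 'pandas': the same matrix wrapped in a DataFrame
as_pandas = pd.DataFrame(as_numpy)

print(type(as_list), len(as_list))
print(as_numpy.shape)   # (3, 4)
print(as_pandas.shape)  # (3, 4)
```

A two-dimensional table needs one row of values per image, which is consistent with the module flattening images whenever a matrix or DataFrame is requested.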


The module also provides a convenience function to create a DataFrame of voxel values: `build_voxel_matrix()`.

In [70]:
# Calculate the voxel matrix
df_imgs = processing.build_voxel_matrix(infiles = es_files_downsampled,
                                        mask = mask_downsampled,
                                        parallel = True,
                                        nproc = 2)

df_imgs.head()

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 131.64it/s]


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,53399,53400,53401,53402,53403,53404,53405,53406,53407,file
0,0.039293,-0.599063,-0.377588,-0.788395,-0.403643,-0.493537,-0.491797,0.359342,0.10921,0.104862,...,1.347266,0.644209,0.395816,0.039717,0.909528,0.929512,0.282028,0.68807,-0.141356,data/human/effect_sizes/absolute/resolution_3....
1,-0.550862,-0.35134,-0.636478,-0.381057,-0.400159,-0.213358,-0.141545,0.129444,-0.427039,-0.283058,...,1.537093,0.911292,1.037576,-1.883201,1.662702,0.707169,0.279815,-0.642503,-1.14556,data/human/effect_sizes/absolute/resolution_3....
2,1.304991,1.300784,1.502293,1.72331,1.426885,1.25194,1.485532,1.476679,1.128628,1.438059,...,0.705651,0.484607,-0.304579,1.367329,2.495751,1.322206,0.378507,2.30402,0.720991,data/human/effect_sizes/absolute/resolution_3....
3,1.850592,1.825994,1.549437,1.646103,1.693011,1.110983,1.465058,1.331495,0.745751,0.90507,...,0.926227,0.105105,-0.189767,-0.153433,0.752616,0.48721,0.508098,0.207499,0.648227,data/human/effect_sizes/absolute/resolution_3....
4,-0.932474,-1.58743,-0.557165,-1.202698,-0.556656,-1.355891,-1.348442,-1.724766,-0.207126,-0.701943,...,0.667458,1.240109,2.151893,-0.141185,0.103745,0.614438,1.859378,0.825129,2.80039,data/human/effect_sizes/absolute/resolution_3....


The function can also write the DataFrame to a CSV file if desired. This is specified using the `save` and `outfile` arguments.

In [71]:
# Get the output directory from the input files
es_dirpaths = [os.path.dirname(file) for file in es_files_downsampled]
es_dir_downsampled = list(set(es_dirpaths))[0]
es_dir_downsampled

# Output CSV file for the DataFrame
es_csv = 'ES_data_{}_nc_{}_threshold_{}_3.0mm.csv'.format(dataset, ncontrols, threshold)
es_csv = os.path.join(es_dir_downsampled, es_csv)
es_csv

'data/human/effect_sizes/absolute/resolution_3.0_dataset_1_ncontrols_10_threshold_5/ES_data_1_nc_10_threshold_5_3.0mm.csv'

In [72]:
# Calculate the voxel matrix
df_imgs = processing.build_voxel_matrix(infiles = es_files_downsampled,
                                        mask = mask_downsampled,
                                        save = True,
                                        outfile = es_csv,
                                        parallel = True,
                                        nproc = 2)

df_imgs.head()

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 133.11it/s]


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,53399,53400,53401,53402,53403,53404,53405,53406,53407,file
0,0.039293,-0.599063,-0.377588,-0.788395,-0.403643,-0.493537,-0.491797,0.359342,0.10921,0.104862,...,1.347266,0.644209,0.395816,0.039717,0.909528,0.929512,0.282028,0.68807,-0.141356,data/human/effect_sizes/absolute/resolution_3....
1,-0.550862,-0.35134,-0.636478,-0.381057,-0.400159,-0.213358,-0.141545,0.129444,-0.427039,-0.283058,...,1.537093,0.911292,1.037576,-1.883201,1.662702,0.707169,0.279815,-0.642503,-1.14556,data/human/effect_sizes/absolute/resolution_3....
2,1.304991,1.300784,1.502293,1.72331,1.426885,1.25194,1.485532,1.476679,1.128628,1.438059,...,0.705651,0.484607,-0.304579,1.367329,2.495751,1.322206,0.378507,2.30402,0.720991,data/human/effect_sizes/absolute/resolution_3....
3,1.850592,1.825994,1.549437,1.646103,1.693011,1.110983,1.465058,1.331495,0.745751,0.90507,...,0.926227,0.105105,-0.189767,-0.153433,0.752616,0.48721,0.508098,0.207499,0.648227,data/human/effect_sizes/absolute/resolution_3....
4,-0.932474,-1.58743,-0.557165,-1.202698,-0.556656,-1.355891,-1.348442,-1.724766,-0.207126,-0.701943,...,0.667458,1.240109,2.151893,-0.141185,0.103745,0.614438,1.859378,0.825129,2.80039,data/human/effect_sizes/absolute/resolution_3....
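The structure of the resulting table, one row per image with a trailing `file` column, can be sketched with pandas as follows. The helper `build_voxel_matrix_sketch` is hypothetical and takes already-imported voxel arrays; whether the module writes the row index to the CSV is not shown in this notebook, so `index = False` here is an assumption.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def build_voxel_matrix_sketch(voxels, files, save=False, outfile=None):
    """Assemble flattened images into a DataFrame with a 'file' column.

    Hypothetical helper for illustration only.
    """
    df = pd.DataFrame(np.stack(voxels))
    df['file'] = files
    if save:
        if outfile is None:
            raise ValueError('outfile must be given when save is True')
        df.to_csv(outfile, index=False)  # dropping the index is an assumption
    return df


# Toy data standing in for masked, flattened images
voxels = [np.array([0.1, 0.2, 0.3]), np.array([1.0, 2.0, 3.0])]
files = ['img_a.mnc', 'img_b.mnc']

demo_csv = os.path.join(tempfile.mkdtemp(), 'voxels.csv')
df_demo = build_voxel_matrix_sketch(voxels, files, save=True, outfile=demo_csv)

# Read the file back to confirm the round trip
df_back = pd.read_csv(demo_csv)
print(df_back.shape)  # (2, 4)
```

Keeping the file path alongside the voxel values makes it possible to join the matrix back to the demographics table later.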
