Skip to content

Pore throat analysis execution

VasiliBaranov edited this page Mar 27, 2017 · 14 revisions

This page describes execution of the pore-throat analysis, its API and several implications. It shall be read after the accompanying Pore-throat analysis description. For sample configs and results, see the Example folder (the Expected results from -all subfolder represents the pore-throat analysis). Scripts to interpret the results can be found in the corresponding folder.

Table of contents:

  1. Processing in chunks
  2. Euclidean distance transform
  3. Finding containing ball radii
  4. Initial pore propagation
  5. Watershed through containing ball radii
  6. Parallelization
  7. Monolith resampling
  8. Program API
  9. Program execution
  10. Euclidean Distance Transform
  11. Remove Inner Balls
  12. Assign Pores and Throats
  13. Remarks
  14. Configuration file
  15. Processing of actual monoliths

1. Processing in chunks

If the monolith is too large, the first three algorithm steps may process the monolith in chunks. The last step, Watershed through containing ball radii, cannot do that. The amount of memory available to the program is specified in the configuration file (see below).

1.1. Euclidean distance transform

The algorithm works in two steps: (i) determining the EDT in XY planes separately; (ii) determining final EDT along the Z lines for fixed X and Y values. Each step can thus process the monolith in chunks: (i) at the first step, chunks shall cover entire XY plane; (ii) at the second step, chunks shall cover the entire monolith in Z direction.

1.2. Finding containing ball radii

Containing ball radii can also be assigned chunk by chunk, we only need to load margins around chunks, so that the ball at the boundary of a chunk may set pixels belonging to it in the margin of the chunk. I.e., if we select the width of the margin as the maximum Euclidean distance from the previous step, we never miss a pixel.

1.3. Initial pore propagation

Initial pore propagation procedure allows processing the image pore by pore instead of setting all the pores simultaneously. It is also becomes possible to process large monoliths in chunks:

  1. Split the 3D image into “cells”. For each of the cells
  2. Load the cell into memory, include some margins around the cell
  3. For each of the maximum balls in the current cell, do the propagation. Cell margins shall be selected large enough to include all the pores that start in the cell
  4. After the entire image is processed, search for the pores that were completely rewritten with larger pores from neighboring cells, rewrite the pore id of the “rewritten” pore with the id of the larger pore, update shared pixel types correspondingly

Margin is selected in a way to ensure that none of the pores that starts in the chunk propagates outside of the margin. I usually selected four maximum Euclidean distances as margin width. The exact value is adjustable in the configuration file of the program.

1.4. Watershed through containing ball radii

As far as watershed boundaries are determined by propagating all the pores simultaneously, I did not find a fast and easy to understand version of the algorithm to process the monolith in chunks. Unfortunately, this step requires the entire monolith to be loaded in memory.

2. Parallelization

I implemented parallelization of several steps, i.e., computation is done in several threads. All threads use common (shared) memory. The number of threads available to the program is specified in the configuration file (see below).

Parallelization works for Euclidean distance transform and containing ball radii computation. It is possible to do this for initial pore assignment and probably for watershed segmentation, but I did not do this. Parallelization is implemented using the OpenMP technology.

3. Monolith resampling

If pixels in the initial monolith are not cubic, the monolith is resampled to have cubic pixels. Cubic pixel side is equal to the minimum side of the initial pixel. Pixels are resampled using nearest-neighbor interpolation. Initial pixel size is specified in the configuration file (see below).

4. Program API

The program is a console C++ program.

The following console parameters are supported:

Console parameter Short meaning or execution stage Meaning
-edt Euclidean Distance Transform Calculates Euclidean distance transform
-rib Remove Inner Balls Computes containing ball radii for each pixel. In other words, removes inner balls, i.e., balls fully contained in other balls
-apt Assign Pores and Throats Assigns pores and shared pixels: (i) does initial pore propagation; then (ii) does watershed segmentation
-all Do all of the above Does all the steps above (in top to bottom order)

5. Program execution

The program searches recursively files “config.txt” in the working folder (the folder, from where the executable was called). The working folder is included in search. The structure of the configuration file is described later.

The program expects a folder “Images” along with each configuration file. It shall contain images of monolith slices along the Z direction. Images shall be 8-bit tiff files. Value 255 (white) shall encode monolith pixels and value 0 shall encode void pixels. The tiff images shall have the normal lookup table, not the inverted one (see the inverted solid/void problem).

Next sections specify what happens on each stage of program execution (each corresponds to a single console parameter). Stage “Remove Inner Balls” assumes that “Euclidean Distance Transform” has already been performed. Stage “Assign Pores and Throats” assumes that “Euclidean Distance Transform” and “Remove Inner Balls” have been performed.

The next table specifies which folders and files will be created for different console parameters. They will be described below

Console parameter Created folders Created files
-edt Original, if resampling occurs EuclideanDistanceSquares
-rib ContainingBallRadiiSquares, ContainingBallCoordinatesX, ContainingBallCoordinatesY, ContainingBallCoordinatesZ, IsMaximumBall
-apt PoreThroatIds, PoreThroatIdsWatershed pores.txt, sharedPixelTypes.txt, poresWatershed.txt, sharedPixelTypesWatershed.txt

5.1. Euclidean Distance Transform

If pixels are not cubic, the monolith is first resampled. Initial images are moved to a folder “Original” (created alongside the config.txt file and Images folder) and the folder “Images” is populated with resampled images. If the Original folder already exists, the program assumes that the Images folder contains the resampled monolith and the monolith is not resampled again.

EDT calculation results are stored in the “EuclideanDistanceSquares” folder. It contains images with the same names as in the “Images” folder. Each image in the EuclideanDistanceSquares folder contains Euclidean distance squares for pixels in a corresponding file from the Images folder.

EDT and square values are stored directly in images. Image metadata only specify that these values shall be interpreted as CMYK colors. It allows inspecting the results visually and avoids any additional visualization overhead.

EDT values are stored as unsigned integers (four bytes per pixel). CMYK color coding also requires exactly four bytes per pixel. Each byte represents a corresponding color: the first byte is cyan, the second one is magenta, the third one is yellow, and the last one is black. CMYK is subtractive scheme. I.e., zero value in a byte corresponds to a white pixel. The larger is the value, the more you “subtract” from the white color. So the value 255 in the byte for magenta is the strongest magenta. The value 255 in the entire integer (the byte for cyan is covered with ones, all other bytes are zero) corresponds to the pure cyan.

This scheme leads to a certain peculiarity: the value 256 is represented as 1 in the byte for magenta + 1 in the byte for cyan (256 = 255 * 1 + 1), which means that neither magenta nor cyan are pronounced, and the pixel is almost white.

Void pixels always have EDT value larger than zero. Solid pixels have exactly zero EDT. Thus, they are white in EDT images.

5.2. Remove Inner Balls

At this stage, containing balls are found (or “inner” balls are removed). This stage creates several folders along with the config.txt files: ContainingBallRadiiSquares, ContainingBallCoordinatesX, ContainingBallCoordinatesY, and ContainingBallCoordinatesZ. Each of the folders has images with the same names as in the Images folder.

For each pixel, ContainingBallRadiiSquares stores a value of a maximum radius square of a ball that covers this pixel. For each pixel, ContainingBallCoordinatesX, ContainingBallCoordinatesY, and ContainingBallCoordinatesZ store X-, Y-, and Z-coordinate of a center of this ball, respectively. Color coding for ContainingBallRadiiSquares is the same as for EuclideanDistanceSquares.

Containing ball coordinates are stored as integer values, each one consuming four bytes. They are also coded as CMYK colors. Different from EuclideanDistanceSquares, solid pixels are encoded with black colors (all the four bytes contain the value 255). This is necessary, as coordinates of containing ball centers can actually be zero, so we cannot encode solid pixels with zero ContainingBallCoordinatesX (Y or Z).

On this stage, one more folder is created, IsMaximumBall. It contains images with the same names as “Images” folder. These images are grey-scale 8-bit images. Each pixel in these images may have one of three values:

  1. zero (black)—corresponds to solid pixels
  2. 255 (white)—corresponds to void pixels that are centers of containing balls
  3. 127 (grey)—corresponds to other void pixels

There might be a certain confusion in naming, as under maximum balls I mean here containing balls, not the maximum balls in a slightly more general sense of (Dong & Blunt 2009). TODO: maybe I shall improve it.

5.3. Assign Pores and Throats

This execution stage will perform both the initial pore propagation and watershed segmentation. This is merely for historical reasons, though may lead to some confusion.

5.3.1. Initial pore propagation

At this point, PoreThroatIds folder will be created. It contains images with the same names as “Images” folder. Each pixel contains the entity id, which this pixel was assigned to. Entity can be either a pore or a shared pixel (pixel, shared among several pores). Solid pixels contain zero values. IDs are integer numbers. As before, image metadata specifies that these integers shall be treated like CMYK images.

To make pore ids and shared pixel ids distinguishable, pore ids start from 127 (strong cyan) and are increased. When pore id reaches 255, the next one is 256 + 127 (strong cyan and strong magenta). It makes all the pores visible. Shared pixel ids are started from the maximum value of unsigned integers and are decreased. It corresponds to black colors. Thus, pores and shared pixels are well distinguishable from each other and from white solid pixels.

Shared pixels have the same ids, if they are shared among the same pores.

This stage creates two additional files, pores.txt and sharedPixelTypes.txt. Both files are simple text (csv) files. The first one specifies pore parameters and statistics (so that it is possible to retrieve pore starting pixel by pore id, for example). The second one specifies parameters of shared pixel types and which pores share this pixel type. Each line corresponds to a single pore or shared pixel type and the first line specifies the format of lines.

pores.txt file format in the first line looks like: “PoreId SeedX SeedY SeedZ SeedRadiusSquare IsBoundaryPore Volume”. Here,

  1. PoreId—pore id, as found in pixels of tiff images in the PoreThroatIds folder
  2. SeedX SeedY SeedZ— the center of the containing ball, from which pore propagation started (the seed pixel and the seed ball)
  3. SeedRadiusSquare—radius square (in pixels) of the seed containing ball
  4. IsBoundaryPore—value of zero or one. One is used for pores, whose pixels reach the boundaries of the monolith. Thus, their coordination number and volume may be determined incorrectly.
  5. Volume—volume of a pore in pixels. Pixels that are shared with other pores are not included.

sharedPixelTypes.txt format in the first line looks like: “SharedPixelTypeId; Volume; PoreId1 PoreId2 ...”. Here,

  1. SharedPixelTypeId—shared pixel type id, as found in pixels of tiff images in the PoreThroatIds folder
  2. Volume—the number of pixels that belong to this shared pixel type
  3. PoreId1 PoreId2… — ids of pores that are shared with pixels of this type (these ids are separated with spaces)

5.3.2. Watershed segmentation

At this point, PoreThroatIdsWatershed folder will be created, as well as poresWatershed.txt and sharedPixelTypesWatershed.txt. These files and folders have the same meaning and format as in the Initial pore propagation section, but only for another segmentation algorithm.

5.4. Remarks

If you run a program stage without running all the needed previous stages, you will get errors.

Currently there is no logic in the program to check existing folders and to check if the stage being run has already been executed (during a previous program run), so it will be executed again. I may add this check in the future.

There is no automatic cleaning up before stage execution as well. I.e., if your previous program run finished unexpectedly and some of the intermediate results are stored on disk in folders, it is necessary to delete these folders manually before running the stage again. You may find the folders you have to delete in the table above.

If monolith resampling was finished, you do not have to delete the Original folder, because then the images will be resampled twice.

The very first stage, -edt, creates one more additional file for internal usage, intermediateStatistics.txt. It is then updated by other stages. It shall not be deleted.

6. Configuration file

Each monolith to be processed shall be accompanied with a config.txt file, which shall look somewhat like this (see also the example):

Resolution of initial images: 30 30 125.9
Available memory in megabytes: 2000
Max ratio of pore radius to seed radius: 4.0
Use D3Q7 stencil: 0
Number of threads: 64

Here,

  1. Resolution of initial images—three floating-point or integer values. Size of pixels in initial images. This size can be in arbitrary units (e.g., nanometers). The monolith will be resampled so that pixels are cubic, one side of a cube will be equal to the minimum of these values. X is the axis from top to bottom in images, Y is the axis from left to right, Z is the axis along which different images are placed. XY axes do not follow usual convention for images, but they follow a usual matrix notation: a pixel with X=5 and Y=10 can be accessed as image[5, 10].
  2. Available memory in megabytes—integer. Amount of random access memory available to the program. All the data arrays (parts of images) that will be read by the program will not consume altogether at any point of time more than this amount or memory. Some additional data structures may nevertheless slightly overcome this threshold.
  3. Max ratio of pore radius to seed radius—integer of floating-point value. It specifies what is the expected ratio of distance from pore center to the last pore pixel (pore radius) to the radius of the containing ball from which the pore starts (seed radius). After detecting maximum Euclidean distance, we can estimate maximum pore radius (by multiplying maximum Euclidean distance to this ratio). Then, we can select margins of image chunks for initial pore propagation: even if a pore seed is on the boundary of a chunk, this pore will never spread outside the margins of this chunk, so we will detect all the pore pixels correctly. If you specified enough memory to process all the images in one chunk at all the program stages, this parameter is not important, as margins for chunks are not needed
  4. Use D3Q7 stencil—a value of zero or one. “One” specifies that pixels with numbers from 1 to 6 in a D3Q7 lattice will be considered as neighbors of a pixel with number zero. By default, pixels with numbers from 1 to 27 in a D3Q27 lattice are considered as neighbors of a pixel with number zero. Neighboring pixels are searched for during initial pore propagation and watershed segmentation.
  5. Number of threads—integer. This value specifies, how many threads the program shall use in those steps that support multithreading (currently these are Euclidean distance transform and Finding containing balls)

The last stage, watershed segmentation, does not produce meaningful results if all the necessary data arrays for the entire monolith cannot be loaded into memory (i.e., not images that store pixel types, but containing ball radii, Euclidean distance transforms, “IsMaximumBall” mask, and “PoreThroatIdsWatershed”).

Currently, there is no way to omit the “Available memory in megabytes” option. If you are sure that you have enough random access memory to process the monolith, just specify an arbitrary very large integer number.

The only easy way to determine how much memory is needed for a given program step is to look at the console output of the program. At the beginning of each stage the program writes what is the amount of memory to perform a stage in one chunk. The line looks like:

“Current available memory in Mb is 200000, optimal is 106101.490707”

Please note that, as mentioned, program stages shall be run sequentially: execution of stages, as well as memory calculations of a current stage depend on execution results of previous stages.

7. Processing of actual monoliths

Actual monoliths from examples (polymer and silica) require a lot of memory for the last stage (watershed segmentation). The largest one (2000x2000x818 pixels), requires ~70Gb. Thus, the last stage for these monoliths can only be run on a high-performance computing cluster. Usually there are restrictions on execution time on HPC clusters. Using many cores allows doing all the processing faster, because some steps are parallelized.