
Boolean Mask ROI Tagging for Images #290

Open
adityachivu opened this issue Mar 26, 2018 · 11 comments
@adityachivu

Hopefully I read the documentation thoroughly enough, but I was wondering: does nix support boolean-mask ROI tagging for images? Thanks!

@jgrewe
Member

jgrewe commented Mar 26, 2018

Hi @adityachivu, I am sorry to admit that nix at the moment does not support tagging using ROI masks. Storing this information is easily possible, but it cannot be used for tagging and automated data retrieval as supported by the Tag and MultiTag entities.
At the moment we can only tag rectangular segments of n-dimensional data and, accordingly, we can only return rectangular slices of the data.

We are discussing changes to the linking mechanism and have discussed exactly the use case you describe.
What features would you like to have? What should the library provide? It would be great if you shared your thoughts/requirements on this.
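Until mask-based tagging is supported, one workaround is to tag the bounding box of the ROI and apply a stored boolean mask to the retrieved slice with numpy. A minimal sketch (plain numpy only; the small array here stands in for a rectangular segment retrieved via a Tag, and the mask stands in for one stored as its own DataArray):

```python
import numpy as np

# stand-in for a rectangular segment retrieved via a Tag/MultiTag
segment = np.arange(16.0).reshape(4, 4)

# boolean ROI mask of the same shape, stored alongside the data
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

# boolean indexing keeps only the pixels inside the ROI, as a 1D array
roi_values = segment[mask]
print(roi_values)  # [ 5.  6.  9. 10.]
```

The mask itself is just data, so it can live in the file today; only the automated retrieval step is manual.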

@adityachivu
Author

Thank you for your prompt response!
I am trying to use nix to efficiently store a time series of 3D image data (so effectively 4D). I am completely new to this type of data handling/versioning and still wrapping my head around how to use nix effectively. Hopefully I will soon have a better grasp of the available features in the toolbox and of additional ones that would be interesting for our project.

I would love to have your suggestion on how to do the above. I was thinking of using a unique Block for each 3D image and storing the boolean mask as a separate DataArray of the same dimensionality as my image. My HDF structure would then consist of multiple Blocks, where the Block index corresponds to the timestamp and each Block contains two DataArrays, one with the actual data and the other with the ROI mask. What could be a better alternative? An immediate drawback I see is that different "sessions" cannot be distinguished at the Block level of the hierarchy.

Thank you for your time!

@jgrewe
Member

jgrewe commented Apr 8, 2018

Hi @adityachivu, that would be one option. Does your ROI mask change over time? If it does not, you could do it a bit more efficiently.

Am I correct to assume that the first three dimensions are the x-, y-, and z-extent, and that the fourth dimension is time? Or do you also have a number of color channels? That would make it 5D, right?
You could store the whole time series in a single 5D DataArray. For the sake of simplicity, I assume a time stack of 2D data:

import nixio as nix
import numpy as np

f = nix.File.open('ImageStack.nix', nix.FileMode.Overwrite)
b = f.create_block('imaging session', 'nix.recording')

# 1024 times 1024 pixels, 100 images
data = np.random.randn(1024, 1024, 100)
time_stamps = np.linspace(0.0, 1.0, 100)
image_data_array = b.create_data_array('image data', 'nix.image_sequence', data=data)

# kind of critical: the dimension descriptors, we need 3
x_dim = image_data_array.append_sampled_dimension(1)
x_dim.label = 'x-axis'

y_dim = image_data_array.append_sampled_dimension(1)
y_dim.label = 'y-axis'

t_dim = image_data_array.append_range_dimension(time_stamps)
t_dim.label = 'time'
t_dim.unit = 's'

# the region of interest
roi = np.zeros((1024, 1024))
roi[256:768, 256:768] = 1

roi_data_array = b.create_data_array('roi', 'nix.bitmask', data=roi)

x_dim = roi_data_array.append_sampled_dimension(0.01)
x_dim.label = 'x-axis'
x_dim.unit = 'mm'

y_dim = roi_data_array.append_sampled_dimension(0.01)
y_dim.label = 'y-axis'
y_dim.unit = 'mm'

f.close()

Given this, you have all the data in one Block, respectively one DataArray. If you then want to tag regions in the image data, you can use the Tag or MultiTag entities for that, e.g. to tag a certain time span of the recording:

tag = b.create_tag('condition 1', 'nix.segment', position=[0, 0, 0])
tag.extent = [1024, 1024, 0.5]
tag.references.append(image_data_array)

You could then retrieve the tagged data from the tag during analysis:

segment = tag.retrieve_data(0)[:]
print(segment.shape)

If you want to keep order in your file, have lots of entities, or at least want to make it easy to find the entities that belong together, you can use Group entities for that.

I hope this helps; if you need more information, let me know. Also feel free to post questions in our IRC channel gnode at freenode.

@adityachivu
Author

adityachivu commented May 4, 2018

Hi @jgrewe. I was able to convert our data to NIX following your detailed explanation, thank you very much for it. The data shape is 4D, i.e. I have stacks of 2D b/w images (single channel) over time. I have now defined a generator to sequentially read each 3D image, i.e. one timestep. I found this surprisingly slow and wanted to know if I was doing something incorrectly:

# data.shape = [num_images, num_slices, height, width]

def img_generator(data_arr):
    shape = data_arr.shape  # [1500, 20, 1628, 1292], dtype = np.uint16
    for i in range(shape[0]):
        yield data_arr[i, :, :, :]


data_arr = nix_file.blocks['test'].data_arrays['test_data']
img_gen = img_generator(data_arr)

for img in img_gen:
    pass

# Total time: 21m

@jgrewe
Member

jgrewe commented May 4, 2018

Hi @adityachivu, I do not think you are doing anything wrong. This is a huge dataset. I just played with creating data of that size in C++, with and without storing it in nix.

#include <cstdint>
#include <boost/multi_array.hpp>

int main() {
    typedef boost::multi_array<int16_t, 4> array_type;
    typedef array_type::index index;
    array_type data(boost::extents[1][20][1628][1292]);
    for (int16_t value = 0; value < 10; ++value) {
        for (index i = 0; i < 20; ++i)
            for (index j = 0; j < 1628; ++j)
                for (index k = 0; k < 1292; ++k)
                    data[0][i][j][k] = value;
    }
    return 0;
}

This runs for about 50 seconds on my laptop. Writing it to nix adds a few seconds, and switching on compression adds another 4 s. I did not try the reading so far. Would you mind doing a similar test with Python, just creating a few arrays of that shape with numpy.ones()?
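For reference, the in-memory part of such a comparison in Python might look like the following (one stack of the shape from the thread; timings vary by machine, and the dtype/timing code is our addition):

```python
import time
import numpy as np

shape = (20, 1628, 1292)  # one 3D stack, as in the thread

t0 = time.time()
ones = np.ones(shape, dtype=np.uint16)  # constant fill: essentially a memset, very fast
t1 = time.time()
rand = np.random.randn(*shape)          # random generation dominates the runtime
t2 = time.time()

print(ones.shape, ones.dtype)
print('ones: %.2fs, randn: %.2fs' % (t1 - t0, t2 - t1))
```

This separates pure allocation cost from value generation, which is why `numpy.ones` looks so much faster than random data in the comparison below.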

@jgrewe
Member

jgrewe commented May 4, 2018

Well, numpy.ones is much faster; with random values it takes about 15 s. I guess this needs some research...
@adityachivu: yet another question: which backend are you using, h5py or hdf5? The latter uses the nix c++ library. If you installed nixio via pip without installing the nix cpp lib, it is h5py.

@adityachivu
Author

adityachivu commented May 8, 2018

  • Installation: I followed the nix cpp lib installation. I didn't realise that h5py and hdf5 were exclusive of each other. Is there any way I can explicitly check? The file handling errors I get are always from h5py.

  • Run time: I was comparing read times with numpy memory maps, but found an error in my comparison where I wasn't reading the entire image. Now memory maps take 19m where NIX takes 21m for the same task (reading all 3D images sequentially). I was wondering if the order of the dimensions makes any difference. As of now, the write operation for one 3D image, i.e. [20, 1628, 1292], takes ~65 seconds. This seems to be in line with your predicted duration.

  • Data access: I'm getting errors while trying to read from the DataArray along particular dimensions. I tested all possible index combinations and couldn't understand the source of the errors. It would be great if you could provide some input on this. I ran each case as a new script.

img = data[0, 0, 0, 0] 
img = data[0, 0, 0, :] 
img = data[0, 0, :, 0] 
img = data[0, 0, :, :] 
img = data[0, :, 0, 0] # Error
img = data[0, :, 0, :] # Error
img = data[0, :, :, 0] 
img = data[0, :, :, :]
img = data[:, 0, 0, 0] # Error
img = data[:, 0, 0, :] # Error
img = data[:, 0, :, :] # Too large, didn't try
img = data[:, :, 0, 0] # TypeError: Can't broadcast (1500, 20, 1, 1) -> (1500, 20, 1)

The error traceback is the following:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    img = data[0,:,0,0]
  File "/path/to/env/lib/python3.5/site-packages/nixio/data_array.py", line 136, in __getitem__
    self._read_data(raw, count, offset)
  File "/path/to/env/lib/python3.5/site-packages/nixio/pycore/data_array.py", line 82, in _read_data
    super(DataArray, self)._read_data(data, count, offset)
  File "/path/to/env/lib/python3.5/site-packages/nixio/pycore/data_array.py", line 28, in _read_data
    dataset.read_data(data, count, offset)
  File "/path/to/env/lib/python3.5/site-packages/nixio/pycore/h5dataset.py", line 58, in read_data
    self.dataset.read_direct(data, sl)
  File "/path/to/env/lib/python3.5/site-packages/h5py/_hl/dataset.py", line 657, in read_direct
    self.id.read(mspace, fspace, dest, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 181, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw
  File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
OSError: Can't read data (src and dest data spaces have different sizes)

@jgrewe
Member

jgrewe commented May 8, 2018

They are not mutually exclusive. When opening the file you can specify which backend to use. From the stack trace it is obvious that you are using h5py directly, not the nix cpp lib.

Do I understand correctly that your NIX vs. numpy test reads from file in the NIX case and creates the data in memory in the numpy case? If so, then the file io is not too bad...
Regarding the errors, I would like to draw @achilleas-k into the discussion :) Maybe @gicmo also has a few ideas.

@adityachivu
Author

Sorry for being unclear, I compared only read times between numpy memory maps and NIX (19m vs. 21m respectively in my case). I haven't compared write times, but NIX takes around 65 seconds per image for writing to file, about 1600 minutes in total; this is just for your information. It seems that read times are near optimal. Further, could you please tell me how to specify the backend? I wasn't able to find out how.

Also, would you like me to open a different issue for the read errors as it is unrelated to the original issue?

Thank you very much for your prompt responses!

@achilleas-k
Member

Hello @adityachivu.

You can specify a backend when opening a file:

f = nix.File.open('ImageStack.nix', nix.FileMode.Overwrite, backend='h5py')  # or backend='hdf5'

Also, would you like me to open a different issue for the read errors as it is unrelated to the original issue?

That would be great, thanks. If you could include a minimal working example to help me investigate the issue, I'd greatly appreciate it.

@jgrewe
Member

jgrewe commented May 11, 2018

The write times are indeed very long. Would you mind pasting the code? As I said, in the C++ example above, writing the 10 x 20 x 1628 x 1292 array takes roughly one minute including compression, and most of that time is spent creating the dataset. So it should end up at about 15 mins total, assuming linearity.

One issue might be the chunking of the data. Depending on the way you created the DataArray, the chunk size will differ. For example, if you initialize the DataArray with an initial size of (1, 1, 1, 1), then writing a single stack of (1, 20, 1628, 1292) will require a huge number of io operations.
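The chunking point can be illustrated with h5py directly (the nixio Python backend stores data in HDF5 datasets). This is a scaled-down sketch with invented file/dataset names; the chunk shape is chosen to match one full stack, so each stack write touches exactly one chunk:

```python
import numpy as np
import h5py

stack = (1, 4, 128, 128)  # scaled-down stand-in for one (1, 20, 1628, 1292) stack

with h5py.File('chunk_demo.h5', 'w') as f:
    # chunks=stack: one chunk per stack, so writing a stack is one contiguous io action
    good = f.create_dataset('good', shape=(10,) + stack[1:],
                            dtype=np.uint16, chunks=stack)
    for i in range(10):
        good[i, ...] = np.random.randint(0, 2**16, stack[1:], dtype=np.uint16)
    print(good.chunks)  # (1, 4, 128, 128)
```

With a tiny chunk shape instead (e.g. derived from an initial size of (1, 1, 1, 1)), the same stack write would be scattered over a very large number of chunks, which matches the slowdown described above.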
