Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random permutation of FCMA #217

Merged
merged 10 commits into from Apr 28, 2017
Merged

Conversation

@yidawang
Copy link
Member

@yidawang yidawang commented Apr 27, 2017

  1. add random voxel permutation to FCMA;
  2. add more debugging and information logs;
  3. minor tweak for docstring format, etc.

Thanks in advance for reviewing.

@yidawang yidawang requested a review from mihaic Apr 27, 2017
Copy link
Contributor

@xoltar xoltar left a comment

Thanks, Yida! Looks pretty good, just a few small changes needed.

raw_data2, _ = _separate_epochs(activity_data2, conditions)
else:
activity_data1 = list(mask_images(images, mask1, np.float32))
_randomize_subject_list(activity_data1, random)

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

_randomize_single_subject modifies data in place (and then returns the same array). _randomize_subject_list returns a new list, containing old arrays that have been modified in place, which is why these lines work even though you're ignoring the return value. I would say either write them both to work in-place, or write them both to return new values. Mixing the two makes for confusion. If you decide you want to have _randomize_single_subject return a new array instead of working in-place, consider numpy.random.permutation instead of shuffle.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

Thanks for the suggestion! Somehow I ignored the return value here although I should put it as activity_data1 = _randomize_subject_list(activity_data1, random). I would prefer working in-place, let me modify _randomize_subject_list accordingly.

Parameters
----------
data: 2D array in shape [nVxels, nTRs]

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

nVoxels

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

Seed for random state used implicitly for shuffling.
Returns
data: 2D array in shape [nVxels, nTRs]

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

nVoxels

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@@ -292,8 +374,20 @@ def prepare_searchlight_mvpa_data(images, conditions, data_type=np.float32):

for sid, f in enumerate(images):
data = f.get_data().astype(data_type)
[d1, d2, d3, d4] = data.shape
if random == RandomType.REPRODUCIBLE:
data = _randomize_single_subject(data.reshape((d1 * d2 * d3, d4)),

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

You could make this reshaping logic part of _randomize_single_subject (i.e. have it flatten, randomize, then reshape into whatever shape the original array was).

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

I didn't do this because I would like to make the input argument data of _randomize_single_subject an nVoxels x nTRs 2D array for clarity purpose.

@@ -303,7 +397,7 @@ def prepare_searchlight_mvpa_data(images, conditions, data_type=np.float32):
processed_data[:, :, :, idx] = \
np.mean(data[:, :, :, epoch[2]:epoch[3]], axis=3)

logger.info(
logger.debug(
'file %s is loaded and processed, with data shape %s' %

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

logger does formatting itself, you can just do:
logger.debug('file %s is loaded and processed, with data shape $s', f.get_filename(), data.shape)

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@@ -163,7 +163,8 @@ def _master(self):
the length of array equals the number of voxels
"""
logger.info(
'Master starts to allocate tasks'
'Master at rank %d starts to allocate tasks' %

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

Same logging comments apply to this file.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@@ -59,6 +62,10 @@ def load_images_from_dir(in_dir: Union[str, Path], suffix: str = "nii.gz",
files = sorted(in_dir.glob("*" + suffix))
for f in files:
yield nib.load(str(f))
logger.info(
'file %s is read' %

This comment has been minimized.

@xoltar

xoltar Apr 27, 2017
Contributor

Here too. No need for str(), either.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

f is a path here, which is not supported to be the argument of numpy.load(), and presumably nibabel.load(), for numpy version before 1.12. @mihaic should be able to further justify this.

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

Right. However, Paths work with string formatting without explicit casting to str, e.g.:
"test %s" % pathlib.Path("/")

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

OK, just realized that Bryn was referring the str usage below this line :) Done.

@@ -266,10 +345,13 @@ def prepare_searchlight_mvpa_data(images, conditions, data_type=np.float32):
Condition specification.
data_type
Type to cast image to.
random: Optional[RandomType]
Randomize the data within subject or not.
Default NORANDOM

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

You should not specify the default because you are duplicating the information in the function definition, which is visible both in the code and in the generated docs. (Of course, if you use None as a placeholder, you should explain what it stands for.)

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@@ -25,12 +25,15 @@

import nibabel as nib
import numpy as np
import logging

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

Use alphabetical order, please. I will introduce a checker real soon now®.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

logger.info(
'file %s is read',
f
)

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

I would say you should use the debug level for the messages you added in "io.py".

Also, do not break lines if they do not exceed the maximum line length.

Finally, the other message placement, before yield, makes more sense, but I would change the wording to "Starting to read [...]" and put a full stop at the end of the message.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done, but I would put some brief logger info to indicate in the beginning of some stages (reading file, apply masking, separating epochs, etc.)

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

Sure, but not in "io.py".

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

Why not? I think within the io methods are the best places to go.
BTW, I am feeling that io.load_images_from_dir and io.load_images should be merged.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

anyway, I will leave io.py intact

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

Because they are not useful. Why would a function log an info-level message saying it has been called every time it is called? Such messages could be emitted by the caller, if relevant.

We can discuss merging the load functions, but I would rather not, because "_dir" requires a suffix as well, so we would need to explain that.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

well, the suffix can be an optional argument. Let's discuss this in the next standup.

@@ -53,11 +53,24 @@
'mask size: %d' %
np.sum(mask)
)
images = io.load_images_from_dir(data_dir, extension)
images = io.load_images_from_dir(data_dir, suffix=extension)

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

It makes more sense to rename extension to suffix here.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@@ -41,16 +41,30 @@
extension = sys.argv[2]
mask_file = sys.argv[3]
epoch_file = sys.argv[4]
images = io.load_images_from_dir(data_dir, extension)
images = io.load_images_from_dir(data_dir, suffix=extension)

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

Same.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@@ -58,6 +61,9 @@ def load_images_from_dir(in_dir: Union[str, Path], suffix: str = "nii.gz",
in_dir = Path(in_dir)
files = sorted(in_dir.glob("*" + suffix))
for f in files:
logger.debug(
'starts to read file %s', f
)

This comment has been minimized.

@mihaic

mihaic Apr 27, 2017
Contributor

Please write sentences in log messages, at least in this file.

This comment has been minimized.

@yidawang

yidawang Apr 27, 2017
Author Member

done

@mihaic
mihaic approved these changes Apr 27, 2017
data_list[i] = _randomize_single_subject(data_list[i], seed=i)
elif random == RandomType.UNREPRODUCIBLE:
for i in range(len(data_list)):
data_list[i] = _randomize_single_subject(data_list[i])

This comment has been minimized.

@xoltar

xoltar Apr 28, 2017
Contributor

This is better, but still mixed usage. If you're going to do the change in place, don't return a value. If you're going to return a value, it should be a new value. So if you want to use in-place, it should look like:

for subject in data_list:
    _randomize_single_subject(subject)

So neither _randomize_single_subject nor this function should return a value. Also, note since you're modifying in place, you don't need to assign to indexes, so you can use the simpler for loop.

This comment has been minimized.

@yidawang

yidawang Apr 28, 2017
Author Member

done

seed: Optional[int]
Seed for random state used implicitly for shuffling.
Returns

This comment has been minimized.

@xoltar

xoltar Apr 28, 2017
Contributor

Good! Now just need to remove the Returns section from docs of both of these functions since they don't return anything anymore.

yidawang added 2 commits Apr 28, 2017
@xoltar
xoltar approved these changes Apr 28, 2017
@xoltar xoltar merged commit 3a4c566 into brainiak:master Apr 28, 2017
3 checks passed
3 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
linux Build finished.
Details
macos Build finished.
Details
@yidawang yidawang deleted the yidawang:fcma-random-permutation branch Apr 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants