Random permutation of FCMA #217
Conversation
Thanks, Yida! Looks pretty good, just a few small changes needed. |
raw_data2, _ = _separate_epochs(activity_data2, conditions) | ||
else: | ||
activity_data1 = list(mask_images(images, mask1, np.float32)) | ||
_randomize_subject_list(activity_data1, random) |
xoltar
Apr 27, 2017
Contributor
_randomize_single_subject modifies data in place (and then returns the same array). _randomize_subject_list returns a new list, containing old arrays that have been modified in place, which is why these lines work even though you're ignoring the return value. I would say either write them both to work in-place, or write them both to return new values. Mixing the two makes for confusion. If you decide you want to have _randomize_single_subject return a new array instead of working in-place, consider numpy.random.permutation instead of shuffle.
_randomize_single_subject modifies data in place (and then returns the same array). _randomize_subject_list returns a new list, containing old arrays that have been modified in place, which is why these lines work even though you're ignoring the return value. I would say either write them both to work in-place, or write them both to return new values. Mixing the two makes for confusion. If you decide you want to have _randomize_single_subject return a new array instead of working in-place, consider numpy.random.permutation instead of shuffle.
yidawang
Apr 27, 2017
Author
Member
Thanks for the suggestion! Somehow I ignored the return value here although I should put it as activity_data1 = _randomize_subject_list(activity_data1, random)
. I would prefer working in-place, let me modify _randomize_subject_list
accordingly.
Thanks for the suggestion! Somehow I ignored the return value here although I should put it as activity_data1 = _randomize_subject_list(activity_data1, random)
. I would prefer working in-place, let me modify _randomize_subject_list
accordingly.
Parameters | ||
---------- | ||
data: 2D array in shape [nVxels, nTRs] |
xoltar
Apr 27, 2017
Contributor
nVoxels
nVoxels
yidawang
Apr 27, 2017
Author
Member
done
done
Seed for random state used implicitly for shuffling. | ||
Returns | ||
data: 2D array in shape [nVxels, nTRs] |
xoltar
Apr 27, 2017
Contributor
nVoxels
nVoxels
yidawang
Apr 27, 2017
Author
Member
done
done
@@ -292,8 +374,20 @@ def prepare_searchlight_mvpa_data(images, conditions, data_type=np.float32): | |||
|
|||
for sid, f in enumerate(images): | |||
data = f.get_data().astype(data_type) | |||
[d1, d2, d3, d4] = data.shape | |||
if random == RandomType.REPRODUCIBLE: | |||
data = _randomize_single_subject(data.reshape((d1 * d2 * d3, d4)), |
xoltar
Apr 27, 2017
Contributor
You could make this reshaping logic part of _randomize_single_subject (i.e. have it flatten, randomize, then reshape into whatever shape the original array was).
You could make this reshaping logic part of _randomize_single_subject (i.e. have it flatten, randomize, then reshape into whatever shape the original array was).
yidawang
Apr 27, 2017
Author
Member
I didn't do this because I would like to make the input argument data
of _randomize_single_subject
an nVoxels x nTRs 2D array for clarity purpose.
I didn't do this because I would like to make the input argument data
of _randomize_single_subject
an nVoxels x nTRs 2D array for clarity purpose.
@@ -303,7 +397,7 @@ def prepare_searchlight_mvpa_data(images, conditions, data_type=np.float32): | |||
processed_data[:, :, :, idx] = \ | |||
np.mean(data[:, :, :, epoch[2]:epoch[3]], axis=3) | |||
|
|||
logger.info( | |||
logger.debug( | |||
'file %s is loaded and processed, with data shape %s' % |
xoltar
Apr 27, 2017
Contributor
logger does formatting itself, you can just do:
logger.debug('file %s is loaded and processed, with data shape $s', f.get_filename(), data.shape)
logger does formatting itself, you can just do:
logger.debug('file %s is loaded and processed, with data shape $s', f.get_filename(), data.shape)
yidawang
Apr 27, 2017
Author
Member
done
done
@@ -163,7 +163,8 @@ def _master(self): | |||
the length of array equals the number of voxels | |||
""" | |||
logger.info( | |||
'Master starts to allocate tasks' | |||
'Master at rank %d starts to allocate tasks' % |
xoltar
Apr 27, 2017
Contributor
Same logging comments apply to this file.
Same logging comments apply to this file.
yidawang
Apr 27, 2017
Author
Member
done
done
@@ -59,6 +62,10 @@ def load_images_from_dir(in_dir: Union[str, Path], suffix: str = "nii.gz", | |||
files = sorted(in_dir.glob("*" + suffix)) | |||
for f in files: | |||
yield nib.load(str(f)) | |||
logger.info( | |||
'file %s is read' % |
xoltar
Apr 27, 2017
Contributor
Here too. No need for str(), either.
Here too. No need for str(), either.
yidawang
Apr 27, 2017
Author
Member
f
is a path here, which is not supported to be the argument of numpy.load()
, and presumably nibabel.load()
, for numpy version before 1.12. @mihaic should be able to further justify this.
f
is a path here, which is not supported to be the argument of numpy.load()
, and presumably nibabel.load()
, for numpy version before 1.12. @mihaic should be able to further justify this.
mihaic
Apr 27, 2017
Contributor
Right. However, Path
s work with string formatting without explicit casting to str
, e.g.:
"test %s" % pathlib.Path("/")
Right. However, Path
s work with string formatting without explicit casting to str
, e.g.:
"test %s" % pathlib.Path("/")
yidawang
Apr 27, 2017
Author
Member
OK, just realized that Bryn was referring the str
usage below this line :) Done.
OK, just realized that Bryn was referring the str
usage below this line :) Done.
@@ -266,10 +345,13 @@ def prepare_searchlight_mvpa_data(images, conditions, data_type=np.float32): | |||
Condition specification. | |||
data_type | |||
Type to cast image to. | |||
random: Optional[RandomType] | |||
Randomize the data within subject or not. | |||
Default NORANDOM |
mihaic
Apr 27, 2017
Contributor
You should not specify the default because you are duplicating the information in the function definition, which is visible both in the code and in the generated docs. (Of course, if you use None
as a placeholder, you should explain what it stands for.)
You should not specify the default because you are duplicating the information in the function definition, which is visible both in the code and in the generated docs. (Of course, if you use None
as a placeholder, you should explain what it stands for.)
yidawang
Apr 27, 2017
Author
Member
done
done
@@ -25,12 +25,15 @@ | |||
|
|||
import nibabel as nib | |||
import numpy as np | |||
import logging |
mihaic
Apr 27, 2017
Contributor
Use alphabetical order, please. I will introduce a checker real soon now®.
Use alphabetical order, please. I will introduce a checker real soon now®.
yidawang
Apr 27, 2017
Author
Member
done
done
logger.info( | ||
'file %s is read', | ||
f | ||
) |
mihaic
Apr 27, 2017
Contributor
I would say you should use the debug level for the messages you added in "io.py".
Also, do not break lines if they do not exceed the maximum line length.
Finally, the other message placement, before yield
, makes more sense, but I would change the wording to "Starting to read [...]" and put a full stop at the end of the message.
I would say you should use the debug level for the messages you added in "io.py".
Also, do not break lines if they do not exceed the maximum line length.
Finally, the other message placement, before yield
, makes more sense, but I would change the wording to "Starting to read [...]" and put a full stop at the end of the message.
yidawang
Apr 27, 2017
Author
Member
done, but I would put some brief logger info
to indicate in the beginning of some stages (reading file, apply masking, separating epochs, etc.)
done, but I would put some brief logger info
to indicate in the beginning of some stages (reading file, apply masking, separating epochs, etc.)
mihaic
Apr 27, 2017
Contributor
Sure, but not in "io.py".
Sure, but not in "io.py".
yidawang
Apr 27, 2017
Author
Member
Why not? I think within the io methods are the best places to go.
BTW, I am feeling that io.load_images_from_dir
and io.load_images
should be merged.
Why not? I think within the io methods are the best places to go.
BTW, I am feeling that io.load_images_from_dir
and io.load_images
should be merged.
yidawang
Apr 27, 2017
Author
Member
anyway, I will leave io.py
intact
anyway, I will leave io.py
intact
mihaic
Apr 27, 2017
Contributor
Because they are not useful. Why would a function log an info-level message saying it has been called every time it is called? Such messages could be emitted by the caller, if relevant.
We can discuss merging the load functions, but I would rather not, because "_dir" requires a suffix as well, so we would need to explain that.
Because they are not useful. Why would a function log an info-level message saying it has been called every time it is called? Such messages could be emitted by the caller, if relevant.
We can discuss merging the load functions, but I would rather not, because "_dir" requires a suffix as well, so we would need to explain that.
yidawang
Apr 27, 2017
Author
Member
well, the suffix can be an optional argument. Let's discuss this in the next standup.
well, the suffix can be an optional argument. Let's discuss this in the next standup.
@@ -53,11 +53,24 @@ | |||
'mask size: %d' % | |||
np.sum(mask) | |||
) | |||
images = io.load_images_from_dir(data_dir, extension) | |||
images = io.load_images_from_dir(data_dir, suffix=extension) |
mihaic
Apr 27, 2017
Contributor
It makes more sense to rename extension
to suffix
here.
It makes more sense to rename extension
to suffix
here.
yidawang
Apr 27, 2017
Author
Member
done
done
@@ -41,16 +41,30 @@ | |||
extension = sys.argv[2] | |||
mask_file = sys.argv[3] | |||
epoch_file = sys.argv[4] | |||
images = io.load_images_from_dir(data_dir, extension) | |||
images = io.load_images_from_dir(data_dir, suffix=extension) |
mihaic
Apr 27, 2017
Contributor
Same.
Same.
yidawang
Apr 27, 2017
Author
Member
done
done
@@ -58,6 +61,9 @@ def load_images_from_dir(in_dir: Union[str, Path], suffix: str = "nii.gz", | |||
in_dir = Path(in_dir) | |||
files = sorted(in_dir.glob("*" + suffix)) | |||
for f in files: | |||
logger.debug( | |||
'starts to read file %s', f | |||
) |
mihaic
Apr 27, 2017
Contributor
Please write sentences in log messages, at least in this file.
Please write sentences in log messages, at least in this file.
yidawang
Apr 27, 2017
Author
Member
done
done
data_list[i] = _randomize_single_subject(data_list[i], seed=i) | ||
elif random == RandomType.UNREPRODUCIBLE: | ||
for i in range(len(data_list)): | ||
data_list[i] = _randomize_single_subject(data_list[i]) |
xoltar
Apr 28, 2017
Contributor
This is better, but still mixed usage. If you're going to do the change in place, don't return a value. If you're going to return a value, it should be a new value. So if you want to use in-place, it should look like:
for subject in data_list:
_randomize_single_subject(subject)
So neither _randomize_single_subject nor this function should return a value. Also, note since you're modifying in place, you don't need to assign to indexes, so you can use the simpler for loop.
This is better, but still mixed usage. If you're going to do the change in place, don't return a value. If you're going to return a value, it should be a new value. So if you want to use in-place, it should look like:
for subject in data_list:
_randomize_single_subject(subject)
So neither _randomize_single_subject nor this function should return a value. Also, note since you're modifying in place, you don't need to assign to indexes, so you can use the simpler for loop.
yidawang
Apr 28, 2017
Author
Member
done
done
seed: Optional[int] | ||
Seed for random state used implicitly for shuffling. | ||
Returns |
xoltar
Apr 28, 2017
Contributor
Good! Now just need to remove the Returns section from docs of both of these functions since they don't return anything anymore.
Good! Now just need to remove the Returns section from docs of both of these functions since they don't return anything anymore.
Thanks in advance for reviewing.