Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated how drift is calculated and how masking is done (Ready for review) #244

Merged
merged 53 commits into from Nov 10, 2017

Conversation

CameronTEllis
Copy link
Contributor

No description provided.

@mihaic mihaic requested a review from cbaldassano July 21, 2017 23:17
@mihaic
Copy link
Member

mihaic commented Jul 25, 2017

@CameronTEllis, if you plan to continue making changes to this PR, it would be useful to add a tag to the title, such as "WIP" (work in progress), to signal to the reviewers it is not ready for review. I will add this suggestion to CONTRIBUTING.rst.

@CameronTEllis CameronTEllis changed the title Updated how drift is calculated and how masking is done Updated how drift is calculated and how masking is done (Work in Progress) Jul 25, 2017
@CameronTEllis CameronTEllis changed the title Updated how drift is calculated and how masking is done (Work in Progress) Updated how drift is calculated and how masking is done (ready for review) Jul 28, 2017
@mihaic
Copy link
Member

mihaic commented Aug 8, 2017

@CameronTEllis, feel free to contact @cbaldassano or ask others to review if you want this merged faster.

@CameronTEllis CameronTEllis changed the title Updated how drift is calculated and how masking is done (ready for review) Updated how drift is calculated and how masking is done (work in progress) Sep 18, 2017
CameronTEllis added 6 commits September 21, 2017 19:32
@CameronTEllis CameronTEllis changed the title Updated how drift is calculated and how masking is done (ready for review) Updated how drift is calculated and how masking is done (work in progress) Oct 16, 2017
@CameronTEllis CameronTEllis changed the title Updated how drift is calculated and how masking is done (work in progress) Updated how drift is calculated and how masking is done (Ready for review) Oct 22, 2017
@CameronTEllis
Copy link
Contributor Author

@mihaic I am unsure what the error with Travis is. It says I am using the wrong version of Brew. Please advise the appropriate next steps.

Otherwise, this is ready for review e.g. @cbaldassano @lcnature

@mihaic
Copy link
Member

mihaic commented Oct 23, 2017

@CameronTEllis, thanks for pointing out the Travis error. It not related to your PR. I am looking for a workaround (issue #274), but we can merge your PR irrespective of it, as soon as we get a review.

Copy link
Collaborator

@cbaldassano cbaldassano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comments cover most of the main changes. I also looked at the notebook and that looks good to me. Congrats Cameron on finally getting this PR together!

""" Output an epoch file, necessary for some inputs into brainiak

This takes in the time course of stimulus events and outputs the epoch
file used in Brainiak.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say here what the definition of an epoch is in this context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, Thank you so much for looking over this. Your comments are fantastic and much appreciated.

Secondly, I have added more description

How big is the response relative to the peak
# Cycle through the participants, different entries in the list
epoch_file = [0] * len(stimfunction)
for participant_counter in list(range(0, len(stimfunction))):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confused by the syntax here and in other loops like one below, is this different than just:

for participant_counter in range(len(stimfunction)):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified syntax here and below

----------

multi dimensional array
The time course of the HRF convolved with the stimulus function.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? It looks like this function is just the HRF itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, corrected

Parameters
----------

stimfunction : timepoint by timecourse array
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does stimfunction have one number per TR, or is it at the temporal resolution specified in the function (e.g. 1000Hz)? If it is at a different temporal resolution than the HRF, I don't understand how you can convolve them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, made simpler


float

What is the max activity measured here, a point of reference for masking
float

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird character here, maybe a Windows carriage return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find this issue- as you say, it might be Windows specific.


# Calculate the fwhm on a subset of volumes

if volume.shape[3] > 100:
# Take only 100 shuffled TRs
trs = np.arange(volume.shape[3])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can replace these three lines with
trs = np.random.choice(volume.shape[3], size=100. replace=False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thank you

@@ -959,77 +1223,129 @@ def calc_noise(volume,
else:
trs = list(range(0, volume.shape[3]))

# Go through the trs and pull out the fwhm
# Go through the trs and pull out the fwhm and snr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused by this comment - looks like snr is being computed after this loop, not just on the sampled trs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, didn't update from previous edits

# Since you are combining spatial and temporal noise, you need to
# subtract the variance of the two to get the spatial sd
if spatial_sd > temporal_sd:
spatial_sd = np.sqrt(spatial_sd ** 2 - temporal_sd ** 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reference for this? It makes intuitive sense but it is not totally obvious to me, e.g. should there be a factor here for the fact that there are 3 spatial dims and only 1 temporal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No there is no reference here. This is part of the tool that I am least sure about. It was a very difficult problem to model noise that corresponded to the brain (things like autoregression and spatial noise) while independently manipulating system noise. The solution I had was to distinguish them as here but I am not confident this is the optimal way. I played with a number of alternative procedures and they all seemed sub optimal but I am open to suggestions.

@@ -1444,42 +1789,56 @@ def _generate_noise_temporal(stimfunction_tr,
# Finally, z score each voxel so things mix nicely
noise_temporal = stats.zscore(noise_temporal, 3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zscoring when some variables have 0 variance may cause some warnings to be thrown - not necessarily a problem, but could be solved by using a masked array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point. I will put this on the list for future things to fix

@@ -1524,48 +1882,77 @@ def mask_brain(volume,
zoom_factor = (brain_dim[0] / mask_dim[0],
brain_dim[1] / mask_dim[1],
brain_dim[2] / mask_dim[2],
1)
)

# Scale the mask according to the input brain
# You might get a warning but ignore it
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which part is throwing a warning? If it is not worth fixing, we should at least be more specific so that meaningful warnings don't get mistakenly ignored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is when zoom factor values are non integers. I have checked that the functionality is still present even with the warnings so added a comment describing this and saying why they can be safely ignored

Copy link
Contributor

@lcnature lcnature left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @CameronTEllis Sorry for my delayed review.
All these are great works! Just a few comments for your consideration.


generate_stimfunction
Create a function to describe the stimulus onsets/durations
Create a timecourse of the signal activation. This can be specified using event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth mentioning this function is before convolution with HRF, and has higher temporal resolution than TR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, thank you so so much for all of these sophisticated and in depth comments. You really dived deep into the code and I appreciate the scrutiny greatly!

Secondly, good suggestion, added


double_gamma_hrf
Convolve the stimulus function with the HRF to model when events are expected.
export_epoch_file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth explaining what an epoch file is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more detail

@@ -85,18 +107,17 @@ def _generate_feature(feature_type,
thickness=1):
"""Generate features corresponding to signal

Generate signal in specific regions of the brain with for a single
volume. This will then be convolved with the HRF across time
Generate a single feature, that can be inserted into the signal volume
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the feature means uniform activation in a region of specified shape here. The word "feature" itself is a bit vague to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, added detail

@@ -109,7 +130,7 @@ def _generate_feature(feature_type,
----------

3 dimensional array
The volume representing the signal to be outputed
The volume representing the signal

"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 145 signal = np.ones(np.power(feature_size, 3)) to line 150 may be reduced to one command of np.ones. You can directly create a 3-d array with np.ones (by making the size a tuple or list). The same applies for other feature_types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth throwing some warning message to log files when loop reduces to a disk or when a cavity reduces to a sphere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions, added both

idx_n = round(event_durations[onset_counter] * temporal_resolution)
stimfunction[onset_idx:offset_idx] = [weights[onset_counter]] * idx_n
idx = offset_idx - onset_idx
stimfunction[onset_idx:offset_idx, 0] = [weights[onset_counter]] * idx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the multiplication by idx might not be necessary.
stimfunction[onset_idx:offset_idx, 0] = weights[onset_counter] might just work

@@ -1189,6 +1510,7 @@ def _generate_noise_temporal_phys(timepoints,
----------
one dimensional array, float
Generates the physiological temporal noise timecourse

"""

noise_phys = [] # Preset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (not critical): random phases may be added to the respiratory radian and heart radian.
I am personally against z-scoring. Although it makes the sample variance equal across any time series generated, essentially every time you generate noise, a different coefficient is multiplied due to the normalization. So there is no guarantee that the sinusoidal component has the same magnitude across different samples drawn. But this is personal flavor.
I also wonder whether the addition of Gaussian noise here is necessary. Isn't Gaussian noise also added somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, good find, that was actually what I intended to do but did it wrong. In terms of z scoring, I do it in order to make the combination of noise types more tractable. When they are z scored I know how to mix their relative values in order to fit SFNR and SNR

Copy link
Contributor

@lcnature lcnature Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For periodic signal, you might get away with analytically calculating the expected standard deviation with magnitude/sqrt(2) according to this wiki page:
https://en.wikipedia.org/wiki/Root_mean_square
That way you do not need to z-score. But of course there may be issue of aliasing. But I think as long as ratio between the frequency of breathing and the fMRI sampling frequency is not a regular number, this estimate should OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the current issues are with zscoring: at the moment it doesn't really matter that I do it because already the mean is ~0 and std is ~1 but I could remove it if you think it is necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more for reproducibility concern - that you always have the same expected summary statistics for any given type of noise. (i.e., when a user say I simulated certain noise from certain generative model with certain parameters, the parameters that the user put in indeed governs how the samples drawn). I agree it is not a critical issue. So it is up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will leave it as is but I appreciate the concern. Given that the critical output of this is the summed response of a respiratory and heart rate oscillation I think the code does what is necessary

Returns the FWHM of each TR in mm"""
Returns the FWHM of each TR in mm

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the math of calculating fwhm. But I am a bit uncertain of the meaning of taking absolute value in Line 952 v_sum += np.abs(volume[x, y, z]). And also I do not understand the last three equations before return. Of course if you are confident this is correct, then you can ignore this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fairly confident that this is correct and I have checked it using external fwhm calculators

temporal_noise noise_dict entry. To change the relative mixing of the
noise components, change the sigma's specified below.
Generate the time course of the average brain voxel. To increase or
decrease the amount of total noise then change the SFNR noise_dict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems "then" can be replaced by "," instead. But actually are SFNR or noise_dict inputs to this function? Where should the user change them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this was confusing. THose values aren't inputs. I had that sentence there to say what you can/should do with the output of this function but I removed it for clarity

What is the full width half max of the gaussian fields being created
to model spatial noise.

motion_sigma : float
Copy link
Contributor

@lcnature lcnature Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making the description of this type of noise a bit clearer. For example, making it clear that the noise is assumed to only occur during a task event and its magnitude is modulated by the weight of the task event time course. The users might not immediately understand why the variable name has a "motion" in it, and what "pre-processing" here means (aka., I am a bit confused by these :P).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, made the change

noise_system = _generate_noise_system(dimensions_tr=dimensions_tr)
# Calculate the sd that is necessary to be combined with itself in order
# to generate the temporal_sd
temporal_sd_element = np.sqrt(temporal_sd ** 2 / 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no concrete suggestion in this comment. It is just suggestion for future consideration.
It seems intuitive here to divide half of the temporal noise variance to noise_temporal and half to the temporal part of noise_system if we do not have much information. But this division of half and half seems a bit arbitrary. Is it helpful to actually allow the user to specify the magnitudes of the noise_temporal and noise_system separately? Maybe very few users would have a good intuition of what number they should put in if they have a choice. For future development, I think it may be possible to actually estimate these magnitudes from data. I think if you specify the belief of the distribution of the two types of noise, there may be ways to estimate their magnitudes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an astute point to bring up. This is related to my last comment from Chris above. Basically I agree with you and think that what I am doing is definitely a massive simplification. Assuming equality between the system and the non-system noise is a big (fallacious assumption). This is something I would love to improve upon in the future

@mihaic
Copy link
Member

mihaic commented Nov 8, 2017

@CameronTEllis, I assume you will need some time to address the helpful comments you received, so I am already preparing a new release in PR #295, without this code. Please let me know if I'm making a wrong assumption.

@CameronTEllis
Copy link
Contributor Author

CameronTEllis commented Nov 8, 2017 via email

@mihaic mihaic mentioned this pull request Nov 8, 2017
@@ -180,6 +175,10 @@ def _generate_feature(feature_type,
# Subtract the two disks to get a loop
loop = outer != inner

# Check if the loop is a disk
if np.all(inner is False):
logger.warn('Loop feature is actually a disk')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be logger.warning or logger.warn?
I would make the message something like "Loop feature reduces to a disk because the inner radius is too small"
Similarly for the others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestions

file used in Brainiak. The epoch file is a way to structure the timing
information in fMRI that allows you to flexibly input different stimulus
sequences. This is a list with each entry a 3d matrix corresponding to a
participant. The dimensions condition by epoch by time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence may be changed to "The dimensions of the 3d matrix are ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion

different condition. For this to be able to partition the start and
end of an epoch it is necessary that there is a change in value in
the function between the two epochs: if they are coded with the same
and weight and there is no time between blocks then this won't be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "and" may be redundant. Please double-check the grammar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, made this section more clear

# Down sample the stim function
stride = tr_duration * temporal_resolution
stimfunction_temp = stimfunction_ppt[:, condition_counter]
stimfunction_temp = stimfunction_temp[::int(stride)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't think this is a problem. A 0.01 s (I assume you meant s instead of ms) stimulus would predict an HRF almost indistinguishable from that of a 1 s stimulus multiplied by 1/100 because of the slow HRF (you can simulate this with your own functions). Such small a magnitude wouldn't influence GLM result in general. But if a 0.5 s stimulus get undetected, then there may be a problem. This might be left for future due to time pressure. But I highly suggest fixing this.

Takes in a string describing the hrf that ought to be created.
Can instead take in a vector describing the HRF as it was
specified by any function
hrf_type : str or list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there may be some requirement of brainiak code formatting that default values should be explicitly documented. But for this case, I suggest at least documenting the possible values that can be accepted. For example, which strings are acceptable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some details

@@ -1476,7 +1496,7 @@ def _generate_noise_temporal_autoregression(timepoints,
random = np.random.normal(0, 1)
temp.append(past_trs * past_reg + random)

noise_autoregression.append(np.mean(temp))
noise_autoregression.append(np.mean(temp))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that there is some issue here by adding Gaussian noise multiple times and taking the mean of temp. I think if you generate AR(2) noise with the current equation and recover the rhos with functions in nitime package, you might find both the rhos and variance being half of the true values.
It should be done by summing the product of the past signals and their corresponding rho coefficients, and adding only the Gaussian noise only once, but no averaging. The way to generate first p (p being the order) sample is more complex for p>1.
I think this document is correct that you can refer to:
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0ahUKEwjZ7bu-yLDXAhUCziYKHUwxAaQQFghAMAQ&url=http%3A%2F%2Fwww.stat.washington.edu%2Fcourses%2Fstat520%2Fwinter99%2Far-recipe%2Far-recipe.ps&usg=AOvVaw2wGB5KOz-LlYyNxTjLitFy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha good find, I now see the problem. I did some texting with the updated code and it seems improved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CameronTEllis One thing I noticed is the default rho is 1. You probably do not want this default value because you will end up generating Brownian motion with the variance ever growing. Some number smaller than 1 is perhaps more sensible. Please also keep in mind the issue with AR(p) and fix it in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CameronTEllis, this seems to be the last sticking point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the value to 0.5. It is a good compromise to avoid drift while still retaining autoregressive properties.

@@ -1189,6 +1510,7 @@ def _generate_noise_temporal_phys(timepoints,
----------
one dimensional array, float
Generates the physiological temporal noise timecourse

"""

noise_phys = [] # Preset
Copy link
Contributor

@lcnature lcnature Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For periodic signal, you might get away with analytically calculating the expected standard deviation with magnitude/sqrt(2) according to this wiki page:
https://en.wikipedia.org/wiki/Root_mean_square
That way you do not need to z-score. But of course there may be issue of aliasing. But I think as long as ratio between the frequency of breathing and the fMRI sampling frequency is not a regular number, this estimate should OK.

@CameronTEllis
Copy link
Contributor Author

CameronTEllis commented Nov 10, 2017 via email

@lcnature
Copy link
Contributor

@mihaic Given the time pressure, I think it is fine to leave the code as is for the concerns I have raised and in the expectation that @CameronTEllis will improve it in future PR.

@mihaic
Copy link
Member

mihaic commented Nov 10, 2017

@lcnature, thanks for your input. @cbaldassano, you also requested changes; what do you think about the code?

Copy link
Collaborator

@cbaldassano cbaldassano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.