
NF : nlmeans now support arrays of noise std #572

Merged
merged 4 commits into dipy:master on Sep 21, 2015

Conversation

samuelstjean
Contributor

So that's it: the upper wrapper function now supports 3D arrays in addition to a single value for the noise standard deviation. All the inner functions now also work on arrays, but that is transparent to the user. In addition, one can now supply their own noise variance estimation, such as a volume computed with piesno for example.

All the tests pass, for whatever it's worth, and I also ran it on a T2 image I had lying around; it seems to work nicely, in the same way as the old one.
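The scalar-or-array handling the wrapper does could be sketched roughly as follows (a minimal numpy sketch only, not the actual dipy code; `as_sigma_array` is a hypothetical helper name):

```python
import numpy as np

def as_sigma_array(sigma, spatial_shape):
    # Hypothetical helper: broadcast a scalar noise std to a full 3D
    # volume, or validate a user-supplied 3D estimate (e.g. from piesno).
    if np.isscalar(sigma):
        return np.full(spatial_shape, sigma, dtype=np.float64)
    sigma = np.asarray(sigma, dtype=np.float64)
    if sigma.shape != spatial_shape:
        raise ValueError("sigma must be a scalar or match the data's "
                         "spatial shape %s" % (spatial_shape,))
    return sigma

# Single value, as before:
s1 = as_sigma_array(15.0, (20, 20, 20))
# Spatially varying estimate, same shape as the data:
s2 = as_sigma_array(np.full((20, 20, 20), 15.0), (20, 20, 20))
```

Either way the inner functions only ever see a full array, which is why the change is transparent to them.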

@@ -18,14 +18,14 @@ def test_nlmeans_padding():

 def test_nlmeans_static():
     S0 = 100 * np.ones((20, 20, 20), dtype='f8')
-    S0n = nlmeans(S0, sigma=1, rician=False)
+    S0n = nlmeans(S0, sigma=np.ones((20, 20, 20)), rician=False)
Contributor

I wouldn't replace the test here. Instead add the new test, together with the old one.

Contributor Author

The tests are defined on the inner nlmeans Cython function, while the wrapper nlmeans sets everything up for you. Because of this, I had to change the tests, or they would simply not work, since the inner mechanics are different.

The 3D/4D caller function still works as-is, which is what people are supposed to be calling.

@stefanv stefanv modified the milestone: 0.9 Mar 4, 2015
@@ -36,7 +36,7 @@ def nlmeans_3d(arr, mask=None, sigma=None, patch_radius=1,
     """

     if arr.ndim != 3:
-        raise ValueError('arr needs to be a 3D ndarray')
+        raise ValueError('data needs to be a 3D ndarray', arr.shape)
Contributor

No need to replace arr with data here, unless you also change arr to data in the parameters. But if you do that, then we will need to write it down in the API changes.

@Garyfallidis
Contributor

Hello, sorry for the delay in reviewing this. I think an important issue here is that there is no benchmark.

I would like to know how much performance we lose by adding a block of sigmas by default. Can you write a benchmark where you compare the execution speed of a single sigma versus an array of sigmas? Please use a full brain (DWI and T1) to get more realistic timings.

If the speed gets worse, we need to support both options: sigma as one number or as an array, so that at least the people who were already using nlmeans will not see any delays in execution time.
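A benchmark along those lines could be as simple as timing the same call with both sigma flavours (a sketch only; the toy `smooth` kernel below merely stands in for nlmeans, and `bench` is a made-up helper):

```python
import time
import numpy as np

def bench(fn, *args, repeat=3):
    # Best-of-N wall-clock timing; taking the best run reduces
    # scheduler and cache noise.
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

# Toy stand-in kernel: sigma is read once (scalar) vs per voxel (array).
def smooth(data, sigma):
    return data / (1.0 + np.asarray(sigma) ** 2)

data = np.random.rand(64, 64, 64)
t_scalar = bench(smooth, data, 1.0)
t_array = bench(smooth, data, np.ones(data.shape))
print("scalar: %.4fs, array: %.4fs" % (t_scalar, t_array))
```

The same harness applied to the real nlmeans on a full-brain DWI and a T1 would give the timings requested here.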

@samuelstjean
Contributor Author

For a 2 mm stack of 10 DWIs, the old runtime is 2 min and the new runtime is 2 min 40 s. For a T1 it's something like 1 min 20 s vs 1 min 40 s, I think. So it would be better to have two code paths, but that adds a lot of code duplication, of course.

@fmorency

Since #576 should be merged soon, this PR is the next step toward being able to use piesno easily. I will review the code, and I would like to relaunch the discussion about the benchmark results. @Garyfallidis, what's your call on this?

@arokem
Contributor

arokem commented Mar 19, 2015

Closed in favor of #576

@arokem arokem closed this Mar 19, 2015
@samuelstjean
Contributor Author

Why was this closed? These two issues are mostly unrelated. So far (to my knowledge), no function accepts a variable noise estimation array as provided by piesno. Enhancing nlmeans with this fix would actually make it the first one to use such an estimation, and we could then add such support to things like the restore fitting algorithm.

@arokem arokem reopened this Mar 19, 2015
@arokem
Contributor

arokem commented Mar 19, 2015

Sorry - my bad.

@arokem
Contributor

arokem commented Mar 19, 2015

I meant to close #390, of course.


@samuelstjean
Contributor Author

Oh well, it's only the first step in refactoring a bunch of stuff, but it will at least provide a basic example of how to do it and the possible improvements it can bring. This may be less apparent on 1.5T data, but you can actually see the noise varying spatially on 3T data as you move away from the coils toward the deep nuclei ;)

@Garyfallidis
Contributor

Hi @samuelstjean, because the delay is significant, the "code duplication" makes sense. Please create a variable that checks whether a single sigma value or an array of sigmas is provided, and execute the relevant code accordingly.

Also, while you are at it, can you address issue #608?
The default behaviour should be to use all cores.

Keep it up!
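The check being asked for could look like this (a sketch under the assumption that the two Cython code paths are exposed as separate inner functions; `_nlmeans_scalar` and `_nlmeans_array` are hypothetical names, stubbed out here):

```python
import numpy as np

def _nlmeans_scalar(data, sigma):
    # Stand-in for the fast single-sigma Cython path.
    return data - sigma

def _nlmeans_array(data, sigma):
    # Stand-in for the per-voxel-sigma Cython path.
    return data - sigma

def nlmeans_dispatch(data, sigma):
    # Take the fast path when sigma is a single number, so users
    # passing one value keep the original performance.
    if np.isscalar(sigma):
        return _nlmeans_scalar(data, float(sigma))
    sigma = np.asarray(sigma, dtype=np.float64)
    if sigma.shape != data.shape[:3]:
        raise ValueError("sigma must be a scalar or a 3D array matching "
                         "the data's spatial shape")
    return _nlmeans_array(data, sigma)
```

Both paths must of course stay in lockstep, which is the code-duplication worry raised later in this thread.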

@samuelstjean
Contributor Author

Maybe someday; it would require two code paths, and I don't have time to do this properly right now, with the dicom parser being more useful at the moment. Besides, adding the variable to keep track of the number of cores should be relatively easy, so @fmorency could do it.

@Garyfallidis
Contributor

Look, if you don't fix this problem of calling different code for different sigma inputs, I will not merge this PR. I thought it would be trivial for you to also fix what @fmorency raised, as you are in this codebase right now and you know what this algorithm does. So it would be much easier and faster for you than for @fmorency. If you think that it is too time consuming and difficult for you, that is okay. But you should definitely fix the issue of calling different code when there is one sigma or many. Otherwise, this PR is incomplete.

@samuelstjean
Contributor Author

Yes, I agree; I did not mean that it is good as it is right now. I'll just let it simmer and come back to it later, like end of April/May/this summer maybe.

In the meantime, adding the number-of-cores selection variable could be done separately and is easily doable in my opinion. Plus, there are a bunch of people doing little fixes for GSoC eligibility; you could have one of them do it to familiarize themselves with the codebase.

@arokem
Contributor

arokem commented Jun 24, 2015

Is this PR still a thing? Any chance to rebase it and revisit?

@samuelstjean
Contributor Author

I don't know if it should be split in two or what; it's pretty much the same except for the array access.
Making it explicit as two functions would duplicate stuff, but it would also be more flexible later for adding improvements and whatnot, so I don't know.

@arokem
Contributor

arokem commented Jun 25, 2015

See @Garyfallidis's comments above.

@samuelstjean
Contributor Author

I still think the small amount of added computing time is not worth splitting the code in two and risking changing one version without the other. While it would give more flexibility, it could also just be a lot of work for not much gain after all, hence I'm still undecided.

@samuelstjean samuelstjean deleted the nlmean_array branch July 13, 2015 13:18
@Garyfallidis
Contributor

@samuelstjean did you close this PR intentionally or accidentally?

@samuelstjean
Contributor Author

Intentional; it has been simmering since February, so I'll just restart fresh another day instead, by splitting stuff in two.

@Garyfallidis
Contributor

ok

@arnaudbore

I think nlmeans, piesno and even estimate_sigma are not well integrated as a whole.
Is there any update on this part of the code, @samuelstjean, @Garyfallidis? What about NLSAM?

@arokem
Contributor

arokem commented Aug 31, 2015

Thanks for the feedback! Could you please be a bit more specific? What difficulties have you encountered? What would you propose as a rational organization of this part (these parts) of the library? Feel free to give an example of the kind of code you would like to write when using these elements, to demonstrate how you would like to see these things better integrated.

Also - what's NLSAM? Link?

@samuelstjean
Contributor Author

Well, easy: this PR adds support for arrays of variance (which is the output of piesno), while nlmeans only supports single values. So if estimate_sigma fails, the user is stuck and cannot use piesno directly, because of limitations in the nlmeans function itself. So basically piesno serves no direct purpose unless you use advanced/custom functions not in mainline dipy. Well, it's just that people who want to use more features usually don't like pulling random people's branches, I guess.

NLSAM is the other denoising algorithm I'm developing, but that's going to come 2-3 PRs later, as I build upon those I made recently (the stabilisation PR for example) instead of putting 2000+ lines in at once.
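For the piesno integration discussed here: assuming piesno yields one noise estimate per axial slice, expanding that into the full 3D sigma volume this PR makes nlmeans accept is essentially a broadcast (illustrative glue code with a made-up name, not dipy API):

```python
import numpy as np

def sigma_volume_from_slices(sigma_per_slice, spatial_shape):
    # Expand a per-slice noise estimate into a full 3D sigma volume
    # (slices assumed to run along the last axis).
    sigma_per_slice = np.asarray(sigma_per_slice, dtype=np.float64)
    if sigma_per_slice.shape != (spatial_shape[2],):
        raise ValueError("expected one sigma per slice on the last axis")
    return np.broadcast_to(sigma_per_slice, spatial_shape).copy()

# e.g. noise rising from 10 to 20 across 30 slices:
sigma_vol = sigma_volume_from_slices(np.linspace(10, 20, 30), (64, 64, 30))
```

The resulting volume can then be passed as the sigma argument, which is exactly the use case this PR enables.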

@arnaudbore

Thank you @samuelstjean for the quick answer. Is there anything to do/code/merge to add this feature to nlmeans?

Is there an issue (labelled as enhancement) about NLSAM to follow its development?
NLSAM: http://samuelstjean.github.io/pages/downloads/st-jean_et_al_ismrm_2014.pdf

@samuelstjean
Contributor Author

Nah, it's only in my personal (private) repo and in 2-3 abstracts + my master's thesis so far. Until a bunch of other functions are in, it will be missing stuff, and I hate doing WIP PRs as they get drenched with commits which are probably not interesting.

@arnaudbore

First things first: nlmeans should be able to use piesno's output, otherwise it doesn't make sense to have a different way to compute sigma and not be able to use it automatically. I don't really understand what is missing here to make it work, @arokem, @Garyfallidis, @samuelstjean?

@arokem
Contributor

arokem commented Aug 31, 2015

Thanks - that really clarifies things for me. A similar request came up in the context of PR #677; see the post here:
http://gsoc2015dipydki.blogspot.co.uk/2015/08/rnh-post-12-attempt-to-further-improve.html

IIUC, you mean that you want to be able to do something like:

noise = piesno(data)
denoised = nlmeans(noise, data)

Seems like it should be possible to make that work, at least under some circumstances (see the post from Rafael, and the discussion on #677).


@arnaudbore

Thank you @arokem.
This is exactly what I was talking about. Like @RafaelNH in #677, I think we really need to work on the denoising part so people don't get confused about the way to use these functions (and @samuelstjean, if you could keep working on the NLSAM algorithm, it would be G.R.E.A.T).

@samuelstjean, is the support of arrays for nlmeans ready for merging?

@samuelstjean
Contributor Author

It has always been, since February.

@Garyfallidis
Contributor

Hi all. About the array support: as far as I know, Sam hasn't yet made the effort to keep the initial performance when a single sigma is provided rather than an entire array. What I mean is that his introduction of arrays makes the general algorithm slower even if only one sigma is provided. This is bad because we often use nlmeans for very large datasets, and it's bad to go from 4 hours to 6 hours even though we don't use multiple sigmas.

Also, @arnaudbore, I think NLSAM will probably be added after the journal paper has been published.

But yes we should improve what we already have. And do it before the coming release.

@arnaudbore

Thanks @Garyfallidis, I'm going to follow the development of NLSAM!!
@samuelstjean, can you find a way to make it run faster? Do you need help? Anything that can help you on this?

@samuelstjean
Contributor Author

The only speed drawback seems to be that accessing arrays in C is slower than having a single variable; that seems to be the only performance hit incurred. That being said, I also think it's a terrible idea to duplicate the whole thing just because of that, since if anything needs to be fixed/improved in the future, I'm pretty sure one version or the other will be left behind and produce different results because of it.

Now, I have no idea how the two could be used without major copy-pasting of stuff while keeping only a single version which would branch out in each case as needed, so if you have suggestions, feel free to post them. And about the speed performance, I don't think it can be made faster, as the change is only a few lines about memcpy stuff and whatnot.

@arnaudbore

@samuelstjean, are you 100% sure it will end up with different results? Did you run some tests? I understand that it's never a good feeling to have to duplicate code. If it's the only way to avoid going from 4 to 6 hours when you have only one sigma, let's go for it. I think @Garyfallidis will agree with me if I say that it is better to have this in the next release, even with duplicated code, than not having it at all.

@samuelstjean
Contributor Author

Not sure what you mean; I said the problem with two different code paths is that one is bound to be forgotten when the other is changed, thus leading to various problems. Just changing the function to accept an array will produce exactly the same result if the array is constant (I tested it).

About runtime, I recall something more like 2 min vs 2 min 40 s on a simple T1 image; no idea if that is 40 s per volume, though. Let's say a T1 has 8 times more voxels than regular DWIs (1 mm iso vs 2 mm iso). So even taking into account that the change applies to all volumes, that would rather be 40 s / 8 × 65 DWIs ≈ 5.4 min more than the regular version, which is not that bad in my opinion.

@Garyfallidis
Contributor

Okay, 2' (120 s) vs 2' 40" (160 s): this is 1/4 more time and not 1/3 more time, which is what I said. So, Sam, why don't you recalculate the time to make sure that the difference is indeed that small, and then reopen the PR?

@samuelstjean
Contributor Author

All done, and it does not seem to add much time on a 1.8 mm dataset with 65 DWIs.

stjs2902 ~/frank $ [frank] time run_nlmeans.py dwi.nii.gz runtime.nii.gz --noise_est basic --mask b0_brain_mask.nii.gz -N 1
real 5m58.365s
user 21m50.078s
sys 0m1.568s

stjs2902 ~/frank $ [frank] time run_nlmeans.py dwi.nii.gz runtime.nii.gz --noise_est basic --mask b0_brain_mask.nii.gz -N 1
real 6m50.071s
user 25m6.314s
sys 0m3.144s

So, roughly 50 s more in all, which is almost nothing.

@Garyfallidis
Contributor

Okay, this is good news. Reopen the PR. Let's have a quick look and then merge ASAP.


# this line takes most of the time! mem access
d = cache[(B + a) * BS * BS + (B + b) * BS + (B + c)] - cache[(m + a) * BS * BS + (n + b) * BS + (o + c)]
summ += d * d
sigm += sigma_block[(m + a) * BS * BS + (n + b) * BS + (o + c)]
Contributor

There is a lot of computation going on here to obtain the index: (m + a) * BS * BS does not change when b and c do. The same goes for (n + b) * BS w.r.t. c. (See also line 169.)

I don't know if the compiler already optimizes this for you, but if you need a speed-up it's worth trying.
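The suggested hoisting, shown in plain Python for illustration (the real kernel is Cython; `patch_sum` and the 5-element block side are made up for the example):

```python
# Loop-invariant hoisting of the flat-index arithmetic discussed above.
# `cache` is a flattened BS*BS*BS block; a, b, c scan a patch of the
# given radius around voxel (m, n, o).
BS = 5

def patch_sum(cache, m, n, o, radius):
    total = 0.0
    for a in range(-radius, radius + 1):
        row = (m + a) * BS * BS          # invariant w.r.t. b and c
        for b in range(-radius, radius + 1):
            col = row + (n + b) * BS     # invariant w.r.t. c
            for c in range(-radius, radius + 1):
                total += cache[col + (o + c)]
    return total

def patch_sum_naive(cache, m, n, o, radius):
    # Original form: the full index is recomputed in the inner loop.
    total = 0.0
    for a in range(-radius, radius + 1):
        for b in range(-radius, radius + 1):
            for c in range(-radius, radius + 1):
                total += cache[(m + a) * BS * BS + (n + b) * BS + (o + c)]
    return total
```

Both versions compute the same sum; whether the hoisted one is faster depends on how well the C compiler already does this, which is what gets tested below in the thread.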

Contributor Author

Does it really change that much for 5-6 additions and multiplications?

Contributor

Why don't you try it out and tell me if it does :P

Contributor Author

Saves like 4 seconds after all, so not worth the trouble.

Contributor

Ok, the compiler must already do that trivial optimization then. Thanks for testing it.

@samuelstjean samuelstjean reopened this Sep 1, 2015
@Garyfallidis
Contributor

sure okay.

@Garyfallidis
Contributor

@samuelstjean, you have forgotten to update the docstring for the sigma parameter in nlmeans. It still says that the expected parameter is a float, but it can now also be an array of floats, correct?

Apart from this, this PR is failing because of the rbf problem, which is irrelevant to this PR. So it feels like I should go ahead and merge it, unless other people think that I should wait first for the rbf problem. @arokem and @stefanv, any input on this? It seems to need an asap resolution. I am open to suggestions.

@arokem
Contributor

arokem commented Sep 12, 2015

Yes - fine for a merge on my end.


@samuelstjean
Contributor Author

I can correct the doc when I get access to a computer.

@arnaudbore

Ready for merging ;) ?

Garyfallidis added a commit that referenced this pull request Sep 21, 2015
NF : nlmeans now support arrays of noise std
@Garyfallidis Garyfallidis merged commit f0d46ce into dipy:master Sep 21, 2015
@Garyfallidis
Contributor

Done!
