Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unimodal components of S matrix from MCRALS #9

Closed
lnacquaroli opened this issue Jul 25, 2019 · 23 comments
Closed

Unimodal components of S matrix from MCRALS #9

lnacquaroli opened this issue Jul 25, 2019 · 23 comments
Assignees
Labels
enhancement New feature or request

Comments

@lnacquaroli
Copy link

Is there a way to force the S matrix returned from MCRALS to be unimodal?

@caseykneale
Copy link
Owner

Currently this is unsupported. I could probably write the code for that though! Is this important for your work here?

@lnacquaroli
Copy link
Author

I noticed that when I do use it (tested with a Matlab toolbox) it does a great job indeed. Otherwise, I need to struggle to get the results I intend. But no worries, I understand priorities!

@caseykneale
Copy link
Owner

I'll look into this for you and see how difficult it is to enforce and see if I can get it on-line this weekend. Unfortunately scientific literature is behind a pay-wall for me right now so I may need to do some poking around.

I checked your github repo's. We actually have some similar interests. I'll see what I can find and get back to you

@caseykneale caseykneale added the enhancement New feature or request label Jul 26, 2019
@caseykneale caseykneale self-assigned this Jul 26, 2019
@lnacquaroli
Copy link
Author

Sounds great! If you need some specific literature, I can manage to get it. Just let me know.

@lnacquaroli
Copy link
Author

Thanks again!

@caseykneale
Copy link
Owner

I did find a highly relavent paper by one of the world experts in this sort of thing: "Least squares algorithms under unimodality and non‐negativity constraints". I may be able to message him to see if he can share the requisite information. If not I'll pester you for the details :)

@lnacquaroli
Copy link
Author

I read some of his work. I recently used the drEEM toolbox they developed. Notice that the paper you are looking for (telling by the name) is available here for download.

@caseykneale
Copy link
Owner

Thanks, shouldn't be toooo bad to implement. Very interesting work here, I never bothered to look into how this was done. I should be able to implement it - it's really nice they actually have matlab code for this available! So I can make sure whatever I cook up matches their work. You may be my guinea pig though.

@caseykneale
Copy link
Owner

caseykneale commented Jul 28, 2019

Have it working on some dumby data going to try to implement it and pop it on master for testing. Why on master? because it's easier to try out and it's my repo so I can do that sort of thing :D

Do you have a set of nonproprietary data to test this with? I'm kind of flying blind here. What I'll do is make sure that I can get no errors in running the code then I'll post it for you to test with. I can't do much more than that right now...

@caseykneale
Copy link
Owner

caseykneale commented Jul 28, 2019

Okay the constraint is "tested" and now it's on master so feel free to git clone and try it. The docs aren't updated yet because I want to clean this thing up, but here's how they will read
"""
MCRALS(X, C, S = nothing; norm = (false, false), Factors = 1, maxiters = 20, constraintiters = 500, nonnegative = (false, false), unimodalS = false, fixedunimodal = false )
Performs Multivariate Curve Resolution using Alternating Least Squares on X taking initial estimates for S or C.
S or C can be constrained by their norm, or by nonnegativity using nonnegative arguments. S can be constrained by unimodality(EXPERIMENTAL).

The number of resolved Factors can also be set.

Tauler, R. Izquierdo-Ridorsa, A. Casassas, E. Simultaneous analysis of several spectroscopic titrations with self-modelling curve resolution.Chemometrics and Intelligent Laboratory Systems. 18, 3, (1993), 293-300.
"""
Please set constraintiters to a small number if you have a lot of variables (maybe 10?). The current algorithm is really slow. Using the fixedunimodal algorithm should be MUCH faster, but I still don't know if this thing is working at all... Please try it and let me know if its really painful to use or fails. On my data it appears to work...

@lnacquaroli
Copy link
Author

I will give a try tomorrow and let you know. Thanks a lot!

@caseykneale
Copy link
Owner

No worries about thanking me until the thing works and runs fast enough to be used. But if it works (fingers crossed) I'd be really happy. Usually this sort of thing takes 2-3 cracks I'd call this Version 0.5, I think my SIMPLISMA algorithm is a little wonky too I may have to touch that up when I get a chance as well...

@lnacquaroli
Copy link
Author

lnacquaroli commented Jul 29, 2019

So, I tried the master and it run well. No problems so far, I get good results. I ran the NMF to get the initial conditions instead of the SIMPLISMA.

FYI:

@time ( C_MCRALS, S_MCRALS, err ) = MCRALS(X', W_NMF, H_NMF; norm=(true, false), Factors=2,
                                     maxiters=10, constraintiters=5,
                                     nonnegative = (true, true), unimodalS = true,
                                     fixedunimodal = false )

98.860419 seconds (1.06 G allocations: 24.060 GiB, 3.33% gc time)
julia> size(X')
(11, 1000)

For improving the code (I am far from an expert here, so disregard this if you see anything foolish):

  • Notice the amount of allocations in the algorithms, this may actually take a lot of time in there. Maybe BenchmarkTools can give better insight. Also, I have not tried @code_warntype.
  • For the nonnegativity constraint,
    S = map(x -> (x < 0.0) ? 0.0 : x, S)

I have seen this constraint written this way in some code from Bro for instance, so I guess is fine. You may just consider changing

julia> S = randn(1000,11);
julia> @time S = map(x -> (x < 0.0) ? 0.0 : x, S);
  0.037114 seconds (67.65 k allocations: 3.390 MiB)

to:

julia> S = randn(1000,11);
julia> @time S[S.<0.0] .= 0.0;
  0.000098 seconds (20 allocations: 48.516 KiB)
  • Consider using @inbounds (for loops) to speed up iterations, ONCE YOU KNOW THEY WORK
  • Use abs2.(X .- D) instead of ( X .- D ) .^ 2, since it is optimised
  • Blas and LAPACK functions from LinearAlgebra sometimes are a bit more optimised as well, but you may need to rearrange a good deal of the code. Not needed now I guess.

One thing left in the list is, the fixedunimodal argument that I did not understand. I have not had enough time to get to all the code, but it is robust so far. There is always room for improvements.

Adding types to the code may also help to efficiency.

Hope that helps.

@caseykneale
Copy link
Owner

I'm really glad that it actually ran and seems to be giving the correct results.

You said nothing foolish whatsoever! I'm still learning myself - thanks for the tips. I'll pop those fixes into place - except the @inbounds. That one I may save for after I get good unit tests in place. I need to pepper the entire package with that. I've been waiting for later versions of the package before doing lots of optimization, but the MCR-ALS routine I wrote could use some attention - its not performant and it's an important tool to have to get people to make the switch to Julia.

So "fixedunimodal" assumes that at a given maximum position the left-hand-side and right-hand-side of the spectra are increasing and decreasing. For simple line-shapes this is likely fine with good peak reproducibility this should hold up well enough. While the other approach is my attempt at solving for the optimal place to put the two isotonic regressions together via least squares using some hueristics. I think Bro et al did something a bit more fancy, but for me this was "good enough" to try out. I really need to buff the documentation.

Thanks for testing and the advice. Alright I'll do a little fine tuning, then I'll probably cut a new release if everything looks okay. I've added a bit of functionality since my last release that makes it worth doing.

@caseykneale
Copy link
Owner

Ah yea, so I made those changes and it did reduce the load/time but only marginally so.
Using fixedunimodal reduces the run time by 100x for my data.
I now know where the slow step is and I think I can fix it by making the isotonic regression "online" I expect we'll see a 50x gain in performance if I can make that happen... I won't push/cut a release until I at least try to write that.

@lnacquaroli
Copy link
Author

Great you spot that out. Indeed it was extremely slow compared to the Matlab toolbox I mentioned above, but it does not seem to be so different though. I will give another try with fixedunimodal=true.

Just another quirk: given the times slices of arrays are used in the code, using @view may also help to speed it up. I have not tried this anyway.

@caseykneale
Copy link
Owner

I removed all of the copy() statements after seeing your hint with the @view decorators and am seeing huge speed ups. I've been writing in a language outside of Julia lately for my day job so I'm a little rusty. Interestingly, the first run takes about 10x the subsequent ones(even if the function arguments are changed). So stating types and helping the compiler out a bit will probably speed things up to what it needs to be competitive.

Can you check again that everything is working? I'm getting reasonable results, but I haven't had time to check for bugs yet.

@lnacquaroli
Copy link
Author

I removed all of the copy() statements after seeing your hint with the @view decorators and am seeing huge speed ups.

Great! Yup, Julia is quite something on this.

Interestingly, the first run takes about 10x the subsequent ones(even if the function arguments are changed). So stating types and helping the compiler out a bit will probably speed things up to what it needs to be competitive.

The type definition may help as well, but I would not count much on it regarding the first run of the code, since is mostly due to the JIT overhead of the precompilation every time you run the code the first time. This is a bit of a curse in Julia and they are trying to handle it (see here, here and in the discourse site also).

So I tried again the new master and it works fine with the unimodalS = false, fixedunimodal = true. Whenever I set the unimodalS = true, things get weird.

With unimodalS = false:

( C_MCRALS, S_MCRALS, err ) = MCRALS(X), W_NMF, H_NMF; norm=(true, false), Factors=2, maxiters=20, constraintiters=5, nonnegative = (true, true), unimodalS = false, fixedunimodal = true )

Correct results: err, C_MCRALS and S_MCRALS, respectively:

Figure_1
Figure_2
Figure_3

With unimodalS = true:

( C_MCRALS, S_MCRALS, err ) = MCRALS(X), W_NMF, H_NMF; norm=(true, false), Factors=2, maxiters=20, constraintiters=5, nonnegative = (true, true), unimodalS = true, fixedunimodal = true )

Weird results: err, C_MCRALS and S_MCRALS, respectively:

Figure_1b
Figure_2b
Figure_3b

Not sure I am using them correctly.

@caseykneale
Copy link
Owner

caseykneale commented Jul 30, 2019

Ah okay, then I do need a copy in there... The question is which ones .. let me poke around and get it back to working order. Thanks again for working with me.

Let me cook up a dumby example very similar to yours and get it working in that way. Forgot I have this whole section of the package for making kernel density line shapes and things.

@caseykneale
Copy link
Owner

Let me clarify something:
unimodalS tells MCR-ALS to use a unimodal constraint
fixedunimodal if MCR-ALS is using a unimodal constraint we can change the algorithm to fixed(true) or not fixed(false)

So in the example you gave not using the unimodal constraint worked best. Which is funny, but I think I am seeing the issue.

I am running into issues similar to what you're depicting with your images. I think these issues run deeper then the performance changes I made. Unless you can confirm everything worked great prior to the last commit to master - if so I will just revert back in the interim time.

Sorry I need more time to debug this thing maybe by Thursday I'll have a working version that's somewhat performant. Alas - Day job begins.

@lnacquaroli
Copy link
Author

So in the example you gave not using the unimodal constraint worked best. Which is funny, but I think I am seeing the issue.

Yes, was confusing.

I would not revert the master, I think is fine. Just the clarification could be done as well in the docs for information.

@caseykneale
Copy link
Owner

caseykneale commented Jul 30, 2019

Well I got it working again the bug was pretty goofy I won't even bore you with it. Should be a good deal faster then before, but still nothing amazing. Here's the code I used to test it:

SpectraBank = [ Universe(1, 300; bins = 300) for i in 1:10 ];
for i in 1:10
    SpectraBank[i](GaussianBand(20, i, 222))
    SpectraBank[i](GaussianBand(30, (10-i), 101))
end

RawSpec = SpectralArray(SpectraBank);
RawSpec = map(X -> (X < 1e-3) ? 0.0 : X, RawSpec );
Plots.plot(RawSpec', legend = false)

(C_Simplisma, S_Simplisma, vars) = SIMPLISMA(RawSpec; Factors = 5)
cuts = S_Simplisma[ [3,5], :]./ [1.0,20.0];
#Plots.plot(cuts')
#Find purest variables that are not neighbors with one another
@time ( C_MCRALS, S_MCRALS, err ) = MCRALS(RawSpec, nothing, cuts;
                                    Factors = 2, maxiters = 100, constraintiters = 200,
                                    norm = (true, false),
                                    nonnegative = (true, false),
                                    unimodalS = true, fixedunimodal = false )
Plots.plot(err)
Plots.plot(S_MCRALS')
Plots.plot(C_MCRALS)

The latest version should be on Master. Please let me know if anything funky happens. I will update the docs for MCR-ALS once I get a chance too. You're definitely right the documentation is confusing. I have a feature or two to add then I'll cut a new release.

@lnacquaroli
Copy link
Author

I just tried again, looks good to me now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants