Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some VNL solvers are not thread-safe #230

Open
nslay opened this issue Nov 27, 2018 · 20 comments
Open

Some VNL solvers are not thread-safe #230

nslay opened this issue Nov 27, 2018 · 20 comments
Labels
area:Core Issues affecting the Core module status:Backlog Postponed without a fixed deadline type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances

Comments

@nslay
Copy link

nslay commented Nov 27, 2018

Description

Contrary to v3p's README, some VNL netlib optimizers still have static variables declared in functions. The differences are especially easy to spot when comparing the ITK repository with the VXL repository.

https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/ThirdParty/VNL/src/vxl/v3p/netlib/README.md

The v3p_netlib library contains updated netlib code converted with f2c in a thread-safe manner. Sources have been modified to use the v3p_*.h headers in this directory. Types and function names have been mangled to start with a v3p_netlib_ prefix. This distinguishes them from other netlib libraries and keeps the code grouped and isolated.

Example non-thread-safe netlib code in ITK:
https://github.com/InsightSoftwareConsortium/ITK/blob/master/Modules/ThirdParty/VNL/src/vxl/v3p/netlib/opt/lbfgsb.c

Example thread-safe netlib code in VXL
https://sourceforge.net/p/vxl/svn/HEAD/tree/trunk/v3p/netlib/opt/lbfgsb.c

Here's a thread-safe VNL netlib library fix announcement on the VXL mailing list:
https://sourceforge.net/p/vxl/mailman/vxl-maintainers/thread/44CF9607.6090804%40kitware.com/

Maybe someone re-ran f2c and clobbered those changes in ITK?

Steps to Reproduce

  1. In several threads, instantiate one thread-local vnl_lbfgsb solver (vnl_lbfgs is actually thread-safe!).
  2. Run the solver in each thread and hope it crashes by chance.
  3. Or... check that deterministic code is not giving the same output for the same inputs when running multiple threads (example attached).

See attached.
LetsCrashVNL.zip

Try the following:
vnl_lfbgs and observe that it works (line 39)
vnl_lfbgsb and observe that it's not working (lots of different answers).
uncomment num_threads(1) (line 32) directive and observe vnl_lfbgsb works now.

When it works, there should be no "Different answer" messages as this is deterministic code being fed the same input over and over again.

Expected behavior

When presenting the user with a class as the interface to a solver, I would expect isolation between multiple instantiations of the solver.

Actual behavior

Using one instantiation of a solver may have side effects on a different instantiation of the solver.

Reproducibility

Random. May even appear to work!

Versions

ITK 4.13, but I've seen mysterious crashes when using OpenMP with VNL solvers (one per thread) as far back as ITK 4.7.

Environment

Windows 7, 10, Ubuntu 14.04, FreeBSD 11.2

Additional Information

https://sourceforge.net/p/vxl/mailman/vxl-maintainers/thread/44CF9607.6090804%40kitware.com/
https://sourceforge.net/p/vxl/svn/HEAD/tree/trunk/v3p/netlib/opt/lbfgsb.c

This problem may not be unique to vnl_lbfgsb. This problem will obviously affect users of std::thread, pthreads, etc... L-BFGS-B is especially nice as it performs constrained optimization while L-BFGS does not.

@hjmjohnson
Copy link
Member

@nslay vxl has moved to github several years ago. Would you please reference the code on the github vxl site?

https://github.com/vxl/vxl

@nslay
Copy link
Author

nslay commented Nov 27, 2018

Thanks for the prompt response!

I'm quite surprised because the github VXL repository has the same discrepancy as ITK while the sourceforge one is thread-safe. Here is the offending non-threadsafe lbfgsb.c code:

https://github.com/vxl/vxl/blob/master/v3p/netlib/opt/lbfgsb.c

I've downloaded the sourceforge one and attached it.

lbfgsb.txt

Sorry, github doesn't support C files. I renamed it with a .txt extension.

I wonder what happened in the migration to github!

@nslay
Copy link
Author

nslay commented Nov 27, 2018

The very first commit on github is threadsafe. I'll try to find where it changed.
https://github.com/vxl/vxl/blob/ae9c7b5a9253f4ac14f5b247a816b5a1a559d80a/v3p/netlib/opt/lbfgsb.c

@hjmjohnson
Copy link
Member

My guess is : vxl/vxl@3cdc40c

@nslay
Copy link
Author

nslay commented Nov 27, 2018

The massive amount of static variables got reintroduced here (to fix another bug):
vxl/vxl@3cdc40c#diff-b59cc8496f016fff270b123014286c74

@nslay
Copy link
Author

nslay commented Nov 27, 2018

I'm glad VXL is still alive and kicking! It's my favorite numerics library. I thought it was abandoned or was just extremely mature when I saw that the last commit was in 2013 on sourceforge.

Google pointed me to VXL's sourceforge (first result). Github wasn't anywhere in the search results!

@thewtex thewtex added type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances area:Core Issues affecting the Core module labels Nov 27, 2018
@dzenanz
Copy link
Member

dzenanz commented Nov 27, 2018

If somebody has write access to vxl.sourceforge.net, they could add a link to GitHub and state that VXL has moved there.

@hjmjohnson
Copy link
Member

I think that would be @bradking

@nslay
Copy link
Author

nslay commented Nov 27, 2018

What I don't understand is why f2c emits static variables for otherwise local variables in the original Fortran code. I'd expect this for, say, Fortran's common blocks (yuck!), but not these variables!

Does Fortran have some sort of guarantee about the states of local variables in functions called multiple times?

@bradking
Copy link
Member

I've opened vxl/vxl#555 to discuss the future of the vxl homepage.

@hjmjohnson
Copy link
Member

@dzenanz FYI: vxl/vxl#555 . VXL homepage moved to github.

@hjmjohnson
Copy link
Member

@nslay f2c probably does not emit static, but f2c often produces a draft of code that subsequently requires modification in order to meet coding standards/ performance/linkage constraints not supported in the original implementation.

It would be fabulous if you could provide a pull request for this codebase that fixes the problem you identified.

both f2c, and the developers that attempted to fix this file are imperfect.

Thanks,
Hans

@nslay
Copy link
Author

nslay commented Dec 8, 2018

Hi Hans,
ITK/VXL developers are not perfect!? I am outraged!

OK. I will work on this problem. I also apparently need to tidy up ITKIOOpenSlide!

Nate

@hjmjohnson
Copy link
Member

@nslay 👍 It was almost certainly me who made those static. Just be careful not to blindly move the old code forward. It may fix thread safety at the expense of re-introducing other problems.

Thanks for looking into this.

@stale
Copy link

stale bot commented Apr 7, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status:Backlog Postponed without a fixed deadline label Apr 7, 2019
@stale stale bot closed this as completed May 7, 2019
@pieper
Copy link
Contributor

pieper commented Sep 12, 2019

hey, shouldn't this be reopened? Sounds like it's not fixed and people have told me it's still and issue. @hjmjohnson @thewtex

@nslay
Copy link
Author

nslay commented Sep 13, 2019

Yeah yeah, I intended to fix this. I will have another look this weekend.

@hjmjohnson
Copy link
Member

@nslay THANK YOU!!!

You may want to look at recent upstream changes to this code base L-BFGS-B-C for fixing this+potentially other bugs. If you find additional bugs, perhaps we should submit them upstream.

@pieper
Copy link
Contributor

pieper commented Sep 14, 2019

@nslay, ditto on @hjmjohnson's THANK YOU!!!

We have some code that can be used to help test when you are ready.

@thewtex thewtex reopened this Oct 1, 2019
@stale stale bot removed the status:Backlog Postponed without a fixed deadline label Oct 1, 2019
@stale
Copy link

stale bot commented Jan 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@stale stale bot added the status:Backlog Postponed without a fixed deadline label Jan 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Core Issues affecting the Core module status:Backlog Postponed without a fixed deadline type:Bug Inconsistencies or issues which will cause an incorrect result under some or all circumstances
Projects
None yet
Development

No branches or pull requests

6 participants