Shutdown issues with exceptions #15
Can you elaborate on what you mean by this? Maybe with a code example. I don't quite see the problem with exceptions, and I also don't understand how that quote relates.

I do see your point concerning the exit code. Currently, a script like this one

```python
import mph, sys
client = mph.Client()
sys.exit(1)
```

exits with code 0, and not 1 as intended. It surprises me a bit, as I didn't think Python would pay any attention to the exit code passed to the JVM inside the exit handler, which is indeed 0. But apparently it does. Maybe some JPype magic takes care of that? Anyway, according to this Stack Overflow question, this issue can be fixed with a little bit of monkey-patching.
I have to patch up some code for examples, but I will need a few days, sorry. To give you a quick answer: running something like the above, but raising an exception instead of calling `sys.exit()`, and connecting a subprocess will not only yield exit code 0 but also an empty stderr. Leaving the client away will give you an exception in stderr and a nonzero exit code. The empty stderr and exit code 0 make it virtually impossible to check whether a subprocess raised an exception after the client was started.
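The detection problem can be illustrated with plain Python (no Comsol involved; the child code here is a stand-in): a normally failing child yields a nonzero exit code and a traceback on stderr, which is exactly what a parent process checks for.

```python
import subprocess
import sys

# Stand-in child script that fails. Without mph.Client() started,
# Python reports the failure through the exit code and stderr.
child = "raise RuntimeError('boom')"
result = subprocess.run([sys.executable, '-c', child],
                        capture_output=True, text=True)

print(result.returncode)                # 1 for an uncaught exception
print('RuntimeError' in result.stderr)  # True: traceback was captured
```

With `mph.Client()` started in the child, both signals were reported to vanish (exit code 0, empty stderr), so neither check can distinguish failure from success.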
No worries, take your time. I was planning on adding an example to the documentation that demonstrates running multiple processes anyway.

As hinted at in #10, I have built a job scheduler based on MPh before. (I'm not at liberty to publish it, as I coded that on company time.) It's basically a Qt front-end for running simulations in parallel. The different "workers" were started as separate processes.

Regarding the JPype issue you opened: It remains a mystery why JPype 0.7.5 worked so much better with Comsol. But, as explained in #1, going back to the old JPype isn't really an option, as it won't work with the latest Python versions.
For reference, here are the differences in question, from native/common/jp_context.cpp.

In the old version, there is a message to Java which tells it to free resources, then it goes straight to unloading the library, which effectively halts a running JVM without any clean-up. Crashes could happen because any calls between the deletion of the resources and the terminate would access dead objects. In the new version, we first issue a destroy, which must not return until all Java threads and proxies are terminated. Then, once everything is dead, we clean up the resources and finally unload the JVM. There is a possible argument that we should unload before we kill the resources. That would have made it a bit safer in the old version, but there is still the possibility of an active proxy which would crash if it were waiting with a Java resource on the call stack.
@Thrameos: I don't really know what proxies Comsol might create. (Or what proxies really are, due to my very limited knowledge of Java.) But I did try to terminate the non-daemon threads that Comsol creates. Only that didn't solve the issue either. Here's how far I got. I took this test script and added the following debug code right before shutdown:

```python
# List all JVM threads, sorted by thread ID.
threads = jpype.java.lang.Thread.getAllStackTraces().keySet()
current = jpype.java.lang.Thread.currentThread()
threads = sorted(threads, key=lambda thread: thread.getId())
for thread in threads:
    id     = int(thread.getId())
    name   = str(thread.getName())
    daemon = '(daemon)' if thread.isDaemon() else ''
    print(f'{id:2}: {name:<25} {daemon}', flush=True)

# Force all non-daemon threads (except the current one) to stop.
for thread in threads:
    if thread != current and not thread.isDaemon():
        print(f'Stopping thread {thread.getId()}.')
        thread.stop()
        while not thread.isInterrupted():
            pass

# Report which threads are still alive.
for thread in threads:
    id    = int(thread.getId())
    name  = str(thread.getName())
    state = '(alive)' if thread.isAlive() else '(stopped)'
    print(f'{id:2}: {name:<25} {state}', flush=True)
```

If we comment out the lines in the test script where the Comsol client loads the simulation model from a file, the output is this:

So here the shutdown works fine. But once the model is actually loaded, I get this:

So there are three extra non-daemon threads that Comsol apparently creates when it loads the model into memory, and two daemon threads as well. However, forcing the non-daemon threads to stop doesn't eliminate the shutdown delay. I would conclude from this that the threads as such are not the issue. It must be some other kind of resource.
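For readers less familiar with the distinction being inspected above, the same daemon/non-daemon semantics exist in plain Python: a non-daemon thread keeps the process alive at exit, which is why the script singles them out. A small illustration, no Comsol or JPype involved:

```python
import threading

stop = threading.Event()

def worker():
    stop.wait()     # block until told to stop

# A non-daemon thread: the interpreter waits for it at exit,
# just like the JVM waits for non-daemon Java threads.
thread = threading.Thread(target=worker, daemon=False)
thread.start()
print(thread.daemon)       # False

stop.set()                 # let the worker finish so we can exit cleanly
thread.join()
print(thread.is_alive())   # False
```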
Seems like a deadlock issue. There is a Java utility to dump all thread states. Unfortunately, as this is commercial software, it would be difficult for me to replicate.
I'll check the other options and the detailed discussion later on, but just to give you a quick reproducer of what my main concern is, run the following as a script:

```python
import sys
from mph import Server, Client

raise RuntimeError('Im very lazy today')
print('Starting')
cServer = Server()
port = cServer.port
host = 'localhost'
print('Local server created')
raise RuntimeError('Im lazy today')
cClient = Client()
raise RuntimeError('I am at least trying')
model = cClient.load('testVNew_caeModel.mph')
raise RuntimeError('I loaded a model')
```

Depending on where the exception is raised (specifically, at any point after the client is started), the exception gets lost: exit code 0 and empty stderr.
I had a call with Support. They will report the bug, but state that this is neither a commitment to a solution nor a timeframe for one.
Yeah, we shouldn't be holding our breath. There is no money in it for Comsol, not even in the long run, because there are certain "contractual reasons" why they cannot work on a Python API.

There are clearly differences in behavior between Windows and Unix. The issue you report here is a good example: I can reproduce it on Linux, but not on Windows. If I run this test

```python
import mph
client = mph.Client()
raise RuntimeError
```

on Windows, the traceback shows up in the captured stderr. On Linux, it does not.

So this (part of the) issue cannot be fixed with a little bit of monkey-patching, it seems. And it's almost certainly related to the JVM shutdown, as starting the client is definitely what triggers this behavior, and it's also at that point that we register the hard exit via `atexit`.
I guess MathWorks has got them by their best parts... a last attempt to stand and fight against Python...

So, with all the information collected, I forked JPype and built in a flag which allows the old shutdown behavior. This actually works quite well and does not interfere too much with the rest, since you have to register an `atexit` handler to set the flag.
Yes, that's true. I myself refuse anything that requires a C compiler on Windows. It's just too much of a headache.

I saw your JPype PR and read Thrameos's excellent explanation. I finally understand what the issue is: it's about who goes down first, Python or Java. If they do add some way to configure the shutdown behavior, then that's a pretty easy choice for us, since MPh only accesses Java objects from the Python side, not the other way around, so we want Python to die last.

Also, in my last comment, I spoke too soon. There is actually a very easy fix for that too. The reason we see no output in the file (but do see it in the console) is the "lazy writing" on Unix-like systems. So all it takes is to flush `sys.stderr`.

Edit: I've just committed this fix.
Yes, I was aiming for a sort of fallback solution for locked Java processes, and by hiding it like this it should not interfere too much. If there's a superior solution down the road, on either end, changing back would be very easy.

...I think I tried that during debugging, to no avail, however I am not 100% sure. I'll check that on Monday. I have left my work for the weekend and intend not to code too much... we will see... ;) If this works, shouldn't it be possible to check if something is in stderr and, if so, set the exit code accordingly?
I agree, this is not a clean solution. But see if it works for you when you get a chance next week. I've also just committed the monkey-patch for the exit-code issue: 7feee56. This only hooks into `sys.exit`.
I did forget about that use of stderr. Not a good idea then. Terminating a script with an uncaught exception is a separate case that the exit-code patch does not yet cover.
By the way, Thrameos offered a fix over at JPype. I will test it on Monday too.
Great. I'll happily revert all changes if we get a clean solution in the end. Though in the meantime, I have committed the missing monkey-patch with the exception hook: a88cf7f.
First of all, here too: thanks for the effort and the great work! I managed to run some tests via VPN, which confirmed that the hooks and the exit codes are working. I will do some more testing on Monday. If this is confirmed, my suggestion would be to leave this in as a fix, and if/when JPype adds a shutdown config, it can be rolled back. This would prevent having to deal with custom dependencies and compile issues. Also, I feel like Thrameos's solution might be vastly superior to my patching.
Thanks to you for getting the ball rolling again on this issue. I've also run some tests with the latest JPype PR. As far as I can tell (by running the test suite on Windows and Linux), it solves the issue of extended shutdown delays, at least in the scenarios I tested.

I've created a branch for testing against that JPype version. Yes, vendoring in whatever version of JPype is a lot of hassle and also not future-proof. We're better off waiting for its next release. I could release the work-arounds earlier if you want. Or we just discard them and wait for JPype.
I think this fix is okay and will not interfere too much with anything else, so I suggest you release. With a package as complex as JPype, the next release could take a while.

If you want to wait a few days, I have some convenience functions in the model class for common tasks, like toggling boundary conditions or adding new file data to interpolation functions. I think I could offer a PR starting next week... However, I have to work out how to catch exceptions from Java in Python. I am still looking for the appropriate classes...
Okay, I released 0.8.2 with the work-arounds. Yeah, the next JPype release may be a while, especially as it would introduce new API features. Speaking of new API features, I'm open to that, but also want to keep the scope manageable. Like anything needed for general scripting, yes. But nothing too project-specific. So importing data for interpolation functions is a definite yes, but changing boundary conditions maybe not. But we can discuss/negotiate that in a new issue one of these days. |
I have to "reopen" the issue with the JVM shutdown again. Partly to document an underlying problem which might help someone else with debugging, partly to stress the importance of fixing the current solution.

With the current setup, the `atexit` handler calls `exit(0)`. This shuts down the Python interpreter with exit status 0. Additionally, as stated in the `atexit` docs, any exception raised by actually ANY code after the `client` is created will be printed, but is lost afterwards. A standard use case, at least for me, is building a Python script which then runs a model by loading it, doing some adaptations, solving, etc. Another case would be running a script using subprocesses. In both cases, neither stderr nor the exit code are set. This is quite drastic...
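The lost exit code can be reproduced with plain Python, no Comsol needed; here an `atexit` handler that hard-exits with status 0 stands in for the JVM's exit handler:

```python
import subprocess
import sys
import textwrap

# Child script: registers a hard exit(0) at interpreter shutdown,
# then raises. The traceback is printed first, but the exit status
# of the process is forced to 0, so the failure is invisible to
# anything that checks the return code.
child = textwrap.dedent('''
    import atexit, os
    atexit.register(lambda: os._exit(0))  # stand-in for the JVM exit handler
    raise RuntimeError('this failure is lost')
''')

result = subprocess.run([sys.executable, '-c', child],
                        capture_output=True, text=True)
print(result.returncode)   # 0, even though the child raised
```

`os._exit` also skips flushing buffered streams, which is the same mechanism that can swallow stderr on Unix-like systems.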
Have you had any issues running on JPype related to the changes between 0.7.5 and >= 1.0? I did a backward test, and with 0.7.5 everything works smoothly, as you stated in the original issues. (This also holds on macOS.)