Octoprint just stop mid print #2677
Comments
The interesting thing in the screenshot is that it shows off in the table but on the graph it's still showing a target temperature. Which is weird since the data for both comes from the same source. Anything in the browser's error console? Does the |
Oh right, this time I didn't close the browser, so here are the two errors shown in the console:
It says that the connection with eventsource and websocket was lost during the page load. And yes, the serial.log just stops here, with nothing else |
What happens when you reload the page in this mode? If anything happens at all that is. It feels a bit like something might be crashing, but in your |
Except for the last time, the server is responding and the UI is loading fine. But I can't disconnect from the printer and I can't set temperatures |
I tried again with the same GCode, and it still hangs (but never at the same spot). Browser console (as soon as I refresh the page, not before): Last lines of serial.log:
Last lines of octoprint.log:
I've truncated the beginning of the logs, but there really is nothing else important; they're just like the previous full log I uploaded earlier in this thread. I tried some actions in the UI (setting temps, moving axes, manually sending GCode...) but nothing works. Actually, it seems that the server, for whatever reason, stops responding but stays active (= still sends its heartbeat). Note: in my first message I said that the server was still OK after a forced refresh. That was true, but maybe that was a quirk of the crash? :D |
Well, I finally tried in safe mode and the print went fine. I'll investigate my plugins more deeply, beginning with Octolapse because it's the only one which messes with the GCode while printing (but the last Octolapse log I got has nothing weird inside) |
Octolapse at least is the only plugin that I so far know of that utilizes the |
Yeah, that's why I think there should be an obscure race condition involved here with Octolapse (and maybe with this particular GCode, because until now I didn't have any problem). As recommended in issue #2424, I ran OctoPrint in a screen to catch any exception, but nothing...
For now I will just disable Octolapse ( 😢 ) and stay tuned for future updates |
This could have something to do with Octolapse for sure, though I would expect the print to continue after a timeout period has expired (it's a few minutes). You might want to try sending an M114 through the console when it locks up. If printing resumes it's probably caused by Octolapse. I should have a chance to take a look at this after I get back from vacation. @gege2b, if you don't hear from me by July 1, please send me a message to remind me about this, or open an issue in the Octolapse repository where I'll be sure to see it. |
Hey, hi. I also waited half an hour (to see if the server was still sending heartbeats) and it was still locked. I'll keep following the progress. Have a nice vacation! :) |
Just for the record, I discovered the M85 GCode, which acts like a watchdog and turns off the printer in case of inactivity. I've added this to my start GCode with a timeout of 5 minutes, just to be sure |
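For anyone wanting to script the watchdog idea above, here is a minimal sketch of adding it to a start GCode script. It assumes Marlin's `M85 S<seconds>` form for the inactivity timeout (5 minutes = 300 seconds); the helper name `with_inactivity_watchdog` is made up for this example and is not part of OctoPrint or Marlin.

```python
def _is_m85(line):
    """True if the command part of the line (before any ';' comment) is an M85."""
    tokens = line.split(";", 1)[0].split()
    return bool(tokens) and tokens[0].upper() == "M85"


def with_inactivity_watchdog(start_gcode, timeout_seconds=300):
    """Return start_gcode with a single M85 inactivity timeout as its first line.

    Hypothetical helper: assumes the firmware accepts "M85 S<seconds>".
    Any M85 line already present is dropped so only one timeout applies.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    watchdog = "M85 S{}".format(int(timeout_seconds))
    kept = [line for line in start_gcode.splitlines() if not _is_m85(line)]
    return "\n".join([watchdog] + kept)
```

Whether the firmware really powers down the heaters on timeout depends on how it was built, so it is worth checking on the bench before relying on it.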
OK, so today I ran into this problem BUT Octolapse was disabled... (I mean really disabled, from the plugin manager, not from the ON/OFF button in Octolapse). Unfortunately, I had disabled serial logging thinking the problem wouldn't occur anymore, but here are the last lines of the terminal (I snipped the beginning because really there was nothing weird)
M85 saved the day here :) @foosel just a question: can the "!" in my trigger GCode for Octolapse actually break something? Even though Octolapse was disabled, I was printing a GCode file where my trigger word was inserted ( Interestingly, there were a lot more triggers before this one, and I don't know why this particular one seems to have made OctoPrint lose its mind (this was about 15% from the end of the print). Right now I'm stripping all the triggers from this GCode and will run another print to see if the problem still occurs |
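Stripping the triggers by hand gets tedious on a long file, so here is a rough sketch of how it could be scripted. The function name `strip_trigger_lines` is made up for illustration, and the trigger is passed in as a parameter because the exact trigger word is not shown in this thread. It only removes lines whose command part (everything before a `;` comment) is exactly the trigger.

```python
def strip_trigger_lines(gcode_text, trigger):
    """Drop every line whose command part equals `trigger`.

    Hypothetical helper, not part of OctoPrint or Octolapse.
    Returns (cleaned_text, removed_count).
    """
    kept = []
    removed = 0
    for line in gcode_text.splitlines():
        # Ignore a trailing ';' comment when comparing against the trigger.
        command = line.split(";", 1)[0].strip()
        if command == trigger:
            removed += 1
        else:
            kept.append(line)
    return "\n".join(kept), removed
```

Returning the removed count makes it easy to confirm that the number of stripped lines matches the number of snapshots expected in the file.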
That might be the reason. "At commands" (like |
Seems legit. So, I tried with two GCode variant :
My interpretation is that my snapshot commands are probably the culprit here, rather than Octolapse alone. For now, I'll use an "M118" command, so Marlin and OctoPrint shouldn't complain. I'll keep this thread updated according to the results |
I have experienced this same issue. I can provide logs.. sample code.. I did try to fiddle with the exact command to trigger snapshots and it didn't seem to make much difference. An M118 or an @snap produced the same result. When it does "hang", it hangs hard. I can't always recover without a reboot of the raspberry pi. |
in my case, a restart of the octoprint service is sufficient |
@gege2b, try updating Pip and reinstalling Octolapse. I have been able to get it working on 1.3.9 rc2 after updating, but I too am now seeing this hard-lock issue. To double check, I downgraded to 1.3.8, and the issue went away, or was greatly reduced. I have noticed that when Octoprint locks up it seems to occur after I call set_job_on_hold(False). Here is a simplified version of the code I'm using:
So I dug a bit deeper into the set_job_on_hold function and found that it is hanging when calling _send_from_job_queue from the _continue_sending function. Within _send_from_job_queue, it attempts to call self._sendCommand, which cannot acquire the _sendingLock, and it just stalls there. Here is the exact place where the killer stall seems to occur with comm.py:
I will keep playing with it over the next few days to see if I can come up with any additional info. I'll also see if I can figure out which commit caused this issue, but that's a slow process so it might take me a while. |
I've got confirmation from a few users now that downgrading to 1.3.8 fixes the issue, at least when using the devel branch of Octolapse. I'll update again if I learn otherwise. |
Great progress !! |
Sounds like a deadlock here, but of course one that doesn't always trigger. I did have to change some things there to get the script holding to work, and also did some things to remove the race condition on pause/cancel, so maybe one of these changes introduced this issue. |
I would be happy to help with any testing. |
I'm not sure this will help but I've been debugging for a long time and need to tell somebody what I've been seeing :) I've been digging into the comm.py file, logging and watching threads get and release locks, and I can tell you that the typical pattern is this: Octolapse calls set_job_on_hold(True) at some point, then calls set_job_on_hold(False). At some point _handle_ok is called on the main thread (not the one running Octolapse) and it acquires the sending lock. It will never release this lock. Now the thread running Octolapse picks up and is running _continue_sending and makes it all the way to _sendCommand, where it is waiting to acquire the lock which is not released by _handle_ok. Meanwhile the main thread is also running the _continue_sending function but stops executing once it runs out of items in the _send_from_command_queue function (within the except queue.Empty) and returns false. For some reason the logs just stop there, even though I can see no reason why I wouldn't be seeing evidence that other functions within _continue_sending are executed (I added logging to practically every line). I have checked and double checked and can't see what I'm missing (but will keep looking). This might not be useful, but if I remove the sending lock from _handle_ok, things start to work. Is it possible that a _sendCommand from one thread is being handled by _handle_ok from another, before the lock is released? Also, it's worth noting that the UI continues to respond even after the lockup, and I can see logs from the UI being generated. It seems that only the comm library is locking up. |
Wow, seems to be a beast hidden in the shadows :wow: About the UI still responding, I think the only things going bad are the commands to/from the printer (and this makes sense given what you found here). As mentioned earlier, the problem also occurred once while Octolapse was disabled, but it almost always happens when it's enabled. So maybe the problem isn't Octolapse-only, but it is a catalyst at least |
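To make the pattern described above easier to picture, here is a toy reproduction of the stall. This is NOT OctoPrint's comm.py: the names `handle_ok_like` and `send_command_like` are stand-ins invented for this sketch. One thread takes a plain `threading.Lock` and keeps holding it, and a second caller tries to take the same lock. A timeout is used so the demo reports the stall instead of hanging forever the way the real print does.

```python
import threading


def demo(timeout=0.2):
    """Show a sender stalling on a lock that another thread never releases."""
    sending_lock = threading.Lock()      # stand-in for the sending lock
    holder_ready = threading.Event()     # set once the holder owns the lock
    release_holder = threading.Event()   # lets the demo end the "stuck" state

    def handle_ok_like():
        # Acquires the lock and sits on it, like the handler that never releases.
        with sending_lock:
            holder_ready.set()
            release_holder.wait()

    def send_command_like():
        # A real blocking acquire() would wait here indefinitely.
        if not sending_lock.acquire(timeout=timeout):
            return False
        sending_lock.release()
        return True

    holder = threading.Thread(target=handle_ok_like, daemon=True)
    holder.start()
    holder_ready.wait()

    stalled = not send_command_like()    # lock is held elsewhere, so this stalls
    release_holder.set()                 # simulate the lock finally being released
    holder.join()
    recovered = send_command_like()      # now the send goes through
    return stalled, recovered
```

Calling `demo()` returns a pair of flags: whether the send stalled while the lock was held, and whether it succeeded once the lock was released. Swapping a blocking `acquire()` for one with a timeout plus a log line is also a cheap way to find which call site is stuck in a case like this.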
I'm trying to wrap my head around how _handle_ok can lock here and am so far failing. The only way that could happen as far as I understand is if the call to continue_sending would keep on looping. It should exit the loop however since the job_on_hold flag gets set. Unless that locks (which I'm unsure right now if it can). My guess is, this is another of these cases where the fix will be one line at most, which will take ages to figure out however 😕 |
I agree. I'm going to switch gears and see if I can figure out which commit started this issue. |
My money would be on d124efd since that introduced the lock inside _handle_ok to work around a race condition under specific resend scenarios. |
You both rock! That being said, if I can be of any help by testing/logging/making some ritual sacrifice to the dev gods, feel free to ask |
Good news I think! I managed to repro the issue (after upgrading my Octolapse install to the devel branch, for some reason I couldn't get it to run at all with 1.3.9rc3.dev otherwise - my fault? Do I need to fix anything there @FormerLurker?) and I can say, it doesn't look like that stalling after the
If push comes to shove, you could ritually prepare a plate of fried calamari rings and sacrifice them to your stomach, that will threaten the OctoPrint mascot and maybe force it into compliance ;) |
I'm not sure what's going on with the current release and 1.3.9x, but I will take a look ASAP, probably tomorrow. When you think you have everything else working you might want to test cancelling the print while Octolapse is taking a snapshot. It's easy to do if you crank the camera delay up to 5000 ms. I find this to be a very useful test myself. |
The issue with 1.3.9x might also have been caused by my current config, I'm not sure. It simply didn't start - interface was there, config made sense, but it would neither show a snapshot nor do anything on start of a print. Then I switched to the devel branch and it worked as expected. I haven't yet had a chance to dig deeper, BUT I've now hopefully found a different solution for #2632 which will allow to |
Our original one causes dead locks as reported in #2677
Ok, so, I haven't been able to trigger any deadlocks with the current state on I also tried cancelling multiple times with a snapshot timeout of 5s and so far could not trigger any weird issues, deadlocks or stalls either. I got briefly confused when I noticed that temperature commands were no longer sent from the job, but then noticed that I had selected "Full Diagnostic - Test Mode" ;) Since I've now been looking exclusively at this code for the past couple of hours and am not seeing the forest for the trees by now, "third party tests" are highly welcome (regardless of outcome ;)) |
I got the same problem too (I mentioned that on the feedback issue for rc1 & 2). I did a test print right now with 1.4-dev and 0.3.2 and got the same problem
|
I haven't merged those changes yet to 1.4.0 aka devel, they are so far only available on staging/maintenance, so I'm not surprised you are still seeing the same problem there. |
So far I haven't encountered any deadlocks or stalls! Cancel seems to work. Cancel when paused seems to work. Cancel during a snapshot seems to work. I tried with and without start/end gcode in Octoprint, and that worked as well (hooray!). I'm running a longer file through the virtual printer, then later today I'll try it out on my prusa and see how things go! |
Hey, I'm having a bit of trouble after updating my octopi instance. The OctoPrint version is showing as 0+unknown, which triggers some version checking within Octolapse to prevent print start. I installed this way:
But maybe there is a better way, or a way to change the current version number? |
For development versions best use a git checkout. The direct pip install has - as you noticed - issues resolving the correct version (since it depends on the git history on the development branches). You could also try the correct git+https url, but to be honest I can't get that together from the top of my head and I'm not 100% sure it will work. edit Seems to be |
Yes, the checkout worked as expected. My first very small test print finished as expected! I'll do a bit more testing and will get back with you. |
A longer print (about an hour) finished without issue! I've got lots of testing to do, and will report any additional issues immediately, but it's really looking good! |
I'll dare to be carefully optimistic ;) |
Just for the record (although slightly off topic):
I can no longer reproduce this with 0.3.1 after nuking the config and reinstalling, it seems, so it might just have been my local configuration being wonky in some way. |
I just had this happen today where the print stopped at 50% (approx.) and the server locked up, but the heatbed and hotend were still going. I am now running in safe mode to see what happens. Does it matter if Octolapse is turned off (not removed)? This was the first print after installing 1.3.9rc2. |
@GeekDad63, disabling octolapse and removing it should be roughly equivalent. Were you running octolapse when it locked up? If so, the issue may be resolved in the next Octoprint rc, and with the devel branch of octolapse. |
Tentatively marked as solved |
Fixing commit is now also available on |
Ahh, I ran into this issue when I was having issues with the 3.3.0 prusa firmware and running 1.3.9rc2. I didn't realize it was unrelated. I'll snag 1.3.9rc3 and give it a go! |
Did a real condition test print and all went fine with rc3 and octolapse dev |
Just realized that I forgot to close this now that rc3 (and in a couple minutes even rc4) is out. |
What were you doing?
Note: I looked at similar issues first and found #2647, but the difference with my problem is that the OctoPrint server was still up (I didn't try to log in over SSH, but a forced refresh worked)
Maybe it's a different issue, maybe it's a variant, I don't know; feel free to close if needed
It's a really random stop; I've only run into this problem once as far as I remember, and maybe I'll never face it again...
Just in case, here is the GCode where the problem occurred (I have successfully printed these objects before, but not this particular file)
20180607-154849-PhoneSupport_Base_0.2-15%.gcode.zip
What did you expect to happen?
Print to complete without problem
What happened instead?
Printer stopped
Also, the temperature graph was showing "Off" in both the "actual" and the "target" column, but the bed and the hotend were still heating
As far as I know, the only state in which this is shown is when OctoPrint is disconnected from the printer
Did the same happen when running OctoPrint in safe mode?
Not tried
Version of OctoPrint
Operating System running OctoPrint
Raspbian Stretch 9.4 on RPi 3
Printer model & used firmware incl. version
prusa mendel with marlin 1.1.8
Browser and version of browser, operating system running browser
firefox 60 on windows 10
Link to octoprint.log
octoprint.log
Link to contents of terminal tab or serial.log
Terminal tab
Serial logging wasn't enabled :-(
Link to contents of Javascript console in the browser
Unfortunately I accidentally closed the browser before copying, but there was only an error about the connection to the websocket; I don't know if it's related to the issue
I have read the FAQ.