Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCS 2.0.0 Occasional hangs during printing #119

Closed
gtjoseph opened this issue Apr 17, 2020 · 18 comments
Closed

DCS 2.0.0 Occasional hangs during printing #119

gtjoseph opened this issue Apr 17, 2020 · 18 comments

Comments

@gtjoseph
Copy link
Contributor

gtjoseph commented Apr 17, 2020

Original report but see below...

You might want to put in the documentation somewhere that running DCS with debug logging from systemd is NOT a good idea for real prints. I was just printing a long running job and noticed that the print was pausing every few minutes for up to 30 seconds with the nozzle still in contact with the part. I realized that at the same time, I was seeing this message in the journalctl -f...

systemd-journald[3401]: Suppressed 6463 messages from duetcontrolserver.service

I think it was flushing the journal buffer to disk and blocking stdout while it was doing it. That in turn blocked any threads that generated output.

Not much you can do about it other than documentation I think.

@gtjoseph
Copy link
Contributor Author

https://forum.duet3d.com/topic/15674/rrf-3-01-rc8-dwc-2-1-3-dsf-2-0-0-mid-print-hanging

More hangs running DCS in foreground without debug.

@gtjoseph gtjoseph changed the title DCS Pre-2.0.0 Minor documentation/implementation issue DCS 2.0.0 Occasional hangs during printing Apr 18, 2020
@chrishamm
Copy link
Contributor

In fact I never mentioned or instructed people to change their systemd service to run DCS with the log level debug. Even the README provides instructions to run DCS in standalone operation without systemd when debug logs are supposed to be created.

The hang may be related to a deadlock that I've fixed in the upcoming DSF 2.1.0 but we'll need to confirm that once it has been released (with a bit of luck later today).

@gtjoseph
Copy link
Contributor Author

I'll be testing shortly.

@gtjoseph
Copy link
Contributor Author

gtjoseph commented Apr 20, 2020

Not fixed with DSF commit d46965d
and RRF commit 56c8e1d583d7f17e5c4266c7ce8cc227d40e3bdd (SBC improvements)

Still hanging for many seconds. No messages in DCS log.

EDIT: This is without Chrome and my video feed running. 95% Idle.

@chrishamm
Copy link
Contributor

Hmm, do you still see "System time has been changed" in the logs when it hangs? If yes, please have a look at your config.json and enable the deadlock detection (set MaxMachineLockTime from -1 to 2000). I wonder if that will reveal anything useful.

@gtjoseph
Copy link
Contributor Author

No this time I saw no messages at all while printing except for normal ones...

[info] Selected file /opt/dsf/sd/gcodes//Mask7.gcode
[info] Starting file print
[info] Starting macro file start.g on channel File
[info] File: Finished macro file start.g
[info] Starting macro file /macros/start_print.g on channel File
[info] File: Optional macro file tpre0.g not found
[info] File: Finished intermediate macro file tpre0.g
[info] File: Finished macro file tpre0.g
[info] File: Optional macro file tpost0.g not found
[info] File: Finished macro file tpost0.g
[info] File: Finished macro file /macros/start_print.g
[info] Starting macro file /macros/end_print.g on channel File
[info] File: Finished macro file /macros/end_print.g
[info] Finished job file
[info] Finished printing file 0:/gcodes//Mask7.gcode, print time was 1h 51m

@chrishamm
Copy link
Contributor

Hmm, did you start seeing this problem after commit d1c7875? If yes, I suspect that the I/O read call blocks from time to time. Can you - just for testing - move your G-code file to /tmp, create a symink in /opt/dsf/sd/gcodes to it and check if the hangs still persist?

@gtjoseph
Copy link
Contributor Author

It could have been around that time. I run from a USB3/nvme SSD but I'll try from /tmp.
I'm going to have to start doing "air" prints for this. Using to much filament. :)

@gtjoseph
Copy link
Contributor Author

Got a pause even while printing from tmp. Strangely, it happened at exactly the top of the hour and I got the 'System time" message....

Apr 20 15:00:13 duet3j.f5.int DuetControlServer[29699]: [info] System time has been changed

@chrishamm
Copy link
Contributor

I just managed to reproduce this once. Did that same message occur without an exception after setting MaxMachineLockTime to a different value (e.g. 2000)?

@gtjoseph
Copy link
Contributor Author

I didn't set it the last time. I'll set it now and try again.

@gtjoseph gtjoseph reopened this Apr 20, 2020
@gtjoseph
Copy link
Contributor Author

Hit the wrong button. :) Testing with MaxMachineLockTime = 2000.

@chrishamm
Copy link
Contributor

I'm about to push another commit, I hope I have figured out why the hangs occurred. Still finishing a few tests though..

@gtjoseph
Copy link
Contributor Author

OK. FYI... Just hung with MaxMachineLockTime = 2000. No messages or backtraces. Not even the "System time" message.

@chrishamm
Copy link
Contributor

OK, I've just pushed another commit. I haven't been able to reproduce the hang with this one so please let me know if you notice anything.

@gtjoseph
Copy link
Contributor Author

testing in progress...

@gtjoseph
Copy link
Contributor Author

My test print went through without any pauses so I think it's good to go!

@chrishamm
Copy link
Contributor

Great, thanks once again for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants