Serial communication is disrupted and does not recover #3917

Closed
Haasje opened this issue Dec 31, 2020 · 8 comments
Labels
approved Issue has been approved by the bot or manually for further processing not octoprint Issue is not on OctoPrint's end

Comments

@Haasje

Haasje commented Dec 31, 2020

What were you doing?

This issue has been happening for a while now on different versions of OctoPrint and Marlin.
I start a print from the UI and the printer gets to work. It runs for a random amount of time (it can be 20 minutes, it can be hours, and sometimes the problem doesn't happen at all) and then the printer just freezes. The heating (bed and extruder) stays on and the fans keep going, but there is no movement.

From what I can gather from the serial.log, a line is sent to the printer but is either not sent correctly or not received correctly, and there is no answer from the printer. OctoPrint then tries to trigger a response after the timeout. Then something weird happens: the printer answers with an "Unknown command" message containing two commands garbled together. OctoPrint then sends the next line, to which the printer answers with a line number error. But instead of resending the right line, OctoPrint just continues with the next line, and the next, and so on. This is what freezes the printer.

I have the Raspberry Pi and the SKR 1.4 Turbo board in a custom case. Because I did not want to run a USB cable inside my case, I connected them using a serial cable. I have OctoPrint set to use /dev/ttyAMA0. I disabled Bluetooth on the Pi to use the 'good' serial port. It's running at 500,000 baud.
My best guess is that the serial data somehow gets corrupted by interference from the stepper wires or something, but the recovery fails because OctoPrint just keeps sending the next line instead of sending the requested line.
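
For anyone who wants to check whether a line in the log was framed correctly before it went onto the wire: the numbered lines in the serial.log end in `*<number>`, which is the usual RepRap-style checksum (an XOR over every character before the `*`). Below is a minimal Python sketch of that calculation. The function names are made up for illustration and this is not taken from OctoPrint's or Marlin's source.

```python
def gcode_checksum(payload: str) -> int:
    """XOR of all bytes that precede the '*' in a numbered G-code line."""
    checksum = 0
    for byte in payload.encode("ascii"):
        checksum ^= byte
    return checksum


def frame_line(line_number: int, command: str) -> str:
    """Build 'N<line> <command>*<checksum>' (hypothetical helper)."""
    payload = f"N{line_number} {command}"
    return f"{payload}*{gcode_checksum(payload)}"


def is_intact(framed: str) -> bool:
    """True if the text before '*' still matches the checksum after it."""
    payload, sep, claimed = framed.rpartition("*")
    return bool(sep) and claimed.isdigit() and gcode_checksum(payload) == int(claimed)


if __name__ == "__main__":
    print(frame_line(108289, "M105"))
    # Two lines merged on the wire no longer match the trailing checksum.
    print(is_intact("108288 G1 X126.N108289 M105*29"))
```

A line that gets merged with its neighbour, like the one the printer echoes back as an unknown command, fails this check, which is how the firmware can notice the corruption in the first place.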

I have tried the same thing in safe mode and it also happens there. The model has no influence either, because it happened with several different prints/models. I reprinted the same G-code and it froze at different points in the print.

I have included the serial.log and the octoprint.log of the latest failed print.
I also included a text file with extracts from previous failed prints, but these are not complete files, just the error parts of the serial.log.
https://pastebin.com/p7CUj1Nm

If you need any additional information, just let me know.

What did you expect to happen?

I expect octoprint to recover by sending the right line to the printer.

What happened instead?

OctoPrint just continues sending the next line, and the next, and so on, without regard to the printer's answer, which causes the printer to freeze and the print to fail.

Did the same happen when running OctoPrint in safe mode?

Yes

Version of OctoPrint

Octoprint 1.5.1
Python 2.7.16
OctoPi 0.17.0

Operating System running OctoPrint

OctoPi 0.17.0 on a Raspberry Pi 4

Printer model & used firmware incl. version

Creality Ender 3 pro
BigTreeTech SKR 1.4 Turbo
Firmware 2.0.6.1 self compiled
BigTreeTech TFT32 E3 self compiled

Browser and version of browser, operating system running browser

Google Chrome version 88.0.4324.50 beta

Link to octoprint.log

https://pastebin.com/w4cyKmGr

Link to contents of terminal tab or serial.log

The file is too big to upload to Gist or Pastebin, so I uploaded it to Google Drive.
https://drive.google.com/file/d/1s0oRjx1DtgQ5OGW29Xx007LzgBZv2-T5/view?usp=sharing
Error starts at line 527843

Link to contents of Javascript console in the browser

Not applicable I think

Screenshot(s)/video(s) showing the problem:

Not applicable

I have read the FAQ.

@github-actions github-actions bot added the triage This issue needs triage label Dec 31, 2020
@GitIssueBot GitIssueBot added the approved Issue has been approved by the bot or manually for further processing label Dec 31, 2020
@warrengray

warrengray commented Jan 2, 2021

Seeing similar behaviour with OctoPi version 0.17.0 (OctoPrint 1.5.2), running on a Raspberry Pi 3 Model B Rev 1.2 using default settings.

The printer runs just fine for an arbitrary period of time (in my case it's always been beyond an hour) and then the print halts and the printer is left in the heated state. Terminal logs indicate a SerialException and OctoPrint continues through the GCode at a rapid pace.

Printer is a MonoPrice Maker Select V2 running Repetier. I haven't seen this enough to track down where it's happening, but the issue sounds similar.

@cp2004
Member

cp2004 commented Jan 3, 2021

@warrengray SerialException errors are a different problem from the one reported above. I haven't had time to analyse it, but it is related to timeouts rather than to that problem. Please look for support on the community forums or Discord server.

@Haasje Thanks for access to the serial.log, I've had a look, and the snippets provided tell most of the story.

It appears that the firmware is not sending resend requests. When the communication breaks down, the printer misses what looks to be part of the line, complains, and says the last line was X, but there is no request to resend. I'm not an expert in the firmware communication protocol, but I think this is a bug in Marlin in this case. You could try updating again (2.0.7.2, or bugfix for the latest code), since this may have been fixed. This is a snippet, which is repeated many times:

2020-12-30 16:19:37,503 - Send: N108288 G1 X126.71 Y126.998 E1793.94096*89
2020-12-30 16:19:38,565 - Recv:  T:244.37 /245.00 B:70.00 /70.00 @:101 B@:51
2020-12-30 16:19:41,565 - Recv:  T:245.00 /245.00 B:70.00 /70.00 @:23 B@:42
2020-12-30 16:19:44,569 - Recv:  T:245.00 /245.00 B:70.00 /70.00 @:76 B@:44
2020-12-30 16:19:44,581 - Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
2020-12-30 16:19:44,585 - Send: N108289 M105*29
2020-12-30 16:19:44,592 - Recv: echo:Unknown command: "108288 G1 X126.N108289 M105"
2020-12-30 16:19:44,595 - Recv: ok
2020-12-30 16:19:44,601 - Send: N108290 G1 X126.502 Y126.924 E1793.94673*107
2020-12-30 16:19:44,605 - Recv: Error:Line Number is not Last Line Number+1, Last Line: 108287
2020-12-30 16:19:47,568 - Recv:  T:245.00 /245.00 B:70.00 /70.00 @:70 B@:49
2020-12-30 16:19:50,568 - Recv:  T:245.00 /245.00 B:70.00 /70.00 @:88 B@:78
2020-12-30 16:19:50,579 - Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
2020-12-30 16:19:50,587 - Send: N108291 M105*20

There is no resend request for line 108288 as there should be, and no OK either. Not much that OctoPrint can do here, since it hasn't got a definitive response from the firmware.
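
To find every occurrence of this pattern in a large serial.log without scrolling through it, something like the following Python sketch could help. It assumes the `<timestamp> - Send:` / `<timestamp> - Recv:` layout seen in the snippet above and a resend request of the form `Resend: <number>`. The function name and regular expressions are illustrative only, not part of OctoPrint.

```python
import re

# Patterns assume the "<timestamp> - Send:/Recv:" layout of the snippet above.
SEND = re.compile(r" - Send: ")
LINE_ERROR = re.compile(r" - Recv: Error:.*Last Line: (\d+)")
RESEND = re.compile(r" - Recv: Resend:\s*\d+", re.IGNORECASE)


def errors_without_resend(log_lines):
    """Return the 'Last Line' value of every line-number error that was
    not followed by a resend request before the host's next Send."""
    unanswered = []
    pending = None  # 'Last Line' value of an error still awaiting a resend
    for line in log_lines:
        match = LINE_ERROR.search(line)
        if match:
            if pending is not None:
                unanswered.append(pending)
            pending = int(match.group(1))
        elif RESEND.search(line):
            pending = None  # the firmware did ask for a resend
        elif SEND.search(line) and pending is not None:
            unanswered.append(pending)  # host moved on without a request
            pending = None
    if pending is not None:
        unanswered.append(pending)
    return unanswered
```

Since the function accepts any iterable of lines, an open file object for the serial.log can be passed to it directly.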

I'll leave this open for now, since there may be someone else with better insight. Please try and update your firmware in the meantime, and let us know if this fixes the problem.

@Haasje
Author

Haasje commented Jan 3, 2021

I will update the Marlin firmware to see if that changes anything.
I don't agree that there is nothing OctoPrint can do. Marlin sends back an error: "Line Number is not Last Line Number+1, Last Line: 108287". Should this not be a trigger for OctoPrint to send line 108288 instead of just blindly going forward and receiving the same error over and over?

@cp2004
Member

cp2004 commented Jan 3, 2021

I believe the firmware is supposed to send a resend request, rather than just an error. I'll have to see if I can find an example of what it's supposed to look like, but there should be another line like Resend: 1000. Resend requests can arrive at strange places due to buffering etc.; I've seen them arrive out of the usual 'sync', and in that case it picks up just fine.

Like I said, I'm not the expert here, but we'll have to wait for at least next week to get official word on this 😉.

@foosel
Member

foosel commented Jan 4, 2021

The protocol requires a resend request to be sent, the error line is merely of informative nature and the messages therein are not even remotely standardized. Unless there's a resend request, OctoPrint won't and must not resend. If the firmware doesn't send resend requests on communication errors it detects, it is buggy and needs to be fixed.
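
To illustrate the rule described above, here is a minimal Python sketch of the host-side decision. It only recognises the `Resend: <number>` form mentioned earlier in this thread, and the function names are hypothetical. It is not OctoPrint's actual implementation, just the idea that an `Error:` line on its own never triggers a resend.

```python
import re
from typing import Optional

# Only the "Resend: <number>" form quoted in this thread is handled here.
RESEND_REQUEST = re.compile(r"^Resend:\s*(\d+)", re.IGNORECASE)


def requested_resend(response: str) -> Optional[int]:
    """Return the requested line number, or None if this is not a resend request."""
    match = RESEND_REQUEST.match(response.strip())
    return int(match.group(1)) if match else None


def next_line_to_send(response: str, next_unsent: int) -> int:
    """Pick the line number to send after reading one firmware response.

    Error lines are treated as informational only: without an explicit
    resend request the host simply carries on with the next unsent line.
    """
    wanted = requested_resend(response)
    return wanted if wanted is not None else next_unsent


if __name__ == "__main__":
    error = "Error:Line Number is not Last Line Number+1, Last Line: 108287"
    print(next_line_to_send(error, 108291))
    print(next_line_to_send("Resend: 108288", 108291))
```

With the error line from the log in this issue, the sketch keeps going with the next unsent line, which matches what the serial.log shows.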

@foosel foosel added not octoprint Issue is not on OctoPrint's end and removed triage This issue needs triage labels Jan 11, 2021
@foosel
Member

foosel commented Jan 11, 2021

After another look at the logs in question now that I'm back in office, yes, this is indeed a firmware issue. It should be sending a resend request, it doesn't, and without a resend request the communication can't recover. Closing as this is nothing that can be fixed from OctoPrint's side.

@foosel foosel closed this as completed Jan 11, 2021
@Haasje
Author

Haasje commented Jan 11, 2021

I will update the firmware to the latest version this week to see if that fixes anything and if not, I will open a request on the Marlin github.

Thank you for your time!

@GitIssueBot

This issue has been mentioned on OctoPrint Community Forum. There might be relevant details there:

https://community.octoprint.org/t/print-freezes-due-to-checksum-mismatch/31425/6
