System freezes (& fan stops!) after same line of gcode #5660

Closed
yonkiman opened this issue Jan 5, 2017 · 34 comments
Labels
Bug: Potential ?, Needs: More Data (We need more data in order to proceed)

Comments

@yonkiman

yonkiman commented Jan 5, 2017

Been running 1.1.0 RCBugFix for a few days. Have had no problem with small prints. Now trying to print something larger, and printer is freezing (stops moving, hot end fan stops, stepper motors lose power) on the exact same line of gcode, a few minutes into the first layer. The uC doesn't reset (watchdog fails or isn't enabled) so the printer stays in this 'dead' state as long as power stays on. Restarting Octoprint doesn't wake it up either - a power cycle seems to be the only thing that works.

After it happened twice I disabled DEBUG_LEVELING_FEATURE since it "Requires a lot of PROGMEM!" and recompiled, but it still froze on the same line. And that line is N1762 G1 X-0.432 Y15.215 E4.5184*118.

I shifted the parts on the bed, resliced, and it fails on the same edge of the same shape (N1756 G1 X-3.930 Y8.108 E4.5185*79). It's not a particularly high extrusion value (E12.1892 worked earlier in the sequence).
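As a side check, the `N...*nn` framing on the two failing lines can be verified on the host. The sketch below assumes the usual RepRap-style convention, in which the number after `*` is the XOR of every byte before it; `gcode_checksum` and `is_well_formed` are hypothetical helper names, not part of Marlin or OctoPrint. A matching checksum suggests the line was not mangled before sending, though it says nothing about what happens once the firmware has accepted it.

```python
def gcode_checksum(payload: str) -> int:
    """XOR of all bytes in the text that precedes the '*'."""
    checksum = 0
    for byte in payload.encode("ascii"):
        checksum ^= byte
    return checksum


def is_well_formed(line: str) -> bool:
    """Check that the value after '*' matches the XOR of the text before it."""
    payload, _, claimed = line.rpartition("*")
    return claimed.isdigit() and gcode_checksum(payload) == int(claimed)


# The two lines quoted above, exactly as they were sent to the printer.
for line in ("N1762 G1 X-0.432 Y15.215 E4.5184*118",
             "N1756 G1 X-3.930 Y8.108 E4.5185*79"):
    print(line, "->", "ok" if is_well_formed(line) else "checksum mismatch")
```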

Not sure what to look at next...

EDIT: Tried printing just the part it freezes on, printer made it past the trouble spot.

Also verified that I have USE_WATCHDOG defined and WATCHDOG_RESET_MANUAL undefined. Reading the comments about the watchdog: "If you have a watchdog reboot in an ArduinoMega2560 then the device will hang forever". I have a Mega2560, so I'm guessing the uC is crashing, the watchdog is firing, and I'm in the "hang forever" state.

Going to try with M100_FREE_MEMORY_WATCHER enabled.

EDIT: Inserting M100 F before the troublesome line makes the problem go away. Of course. Response is Found 886 bytes free at 0x1C9B.

Might be able to make it print now but want to find root cause...

@shacal

shacal commented Jan 5, 2017

Same problem here: it freezes as if it were out of memory. Also investigating.

@manianac
Contributor

manianac commented Jan 6, 2017

Post your config files (link them rather than copy/pasting, please) and a snippet of about 10 lines around the failing gcode line.

Any serial output with debugging on would be helpful too (M111 S23 should output enough)

@yonkiman
Author

yonkiman commented Jan 6, 2017

Embarrassingly I still haven't figured out how to upload my changes to my public github branch, so I put the files here.

EDIT: I'm now going to try to trim the gcode to the minimum that still causes the fail. I removed a huge chunk earlier today and it still failed - maybe I can remove most of what's left.

@yonkiman
Author

yonkiman commented Jan 6, 2017

This is all the gcode needed to trigger the crash:

G28 ; home all axes
M420 S1; re-enable bed leveling ;; ***no failure unless bed leveling is enabled***
;M111 S23; enable debug output
;G92 E0
;G1 Z1.200 F7000
G1 X-0.432 Y-61.985 E1.9508 F700
G1 X-0.432 Y15.215 E4.5184
; MARLIN FAILS AFTER RECEIVING THE ABOVE GCODE
G28 ; home all axes (if Marlin has not crashed)

Additional data:

Send: M501
Recv: echo:V29 stored settings retrieved (445 bytes)
Recv: echo:Steps per unit:
Recv: echo:  M92 X160.00 Y160.00 Z160.00 E97.00
Recv: echo:Maximum feedrates (mm/s):
Recv: echo:  M203 X500.00 Y500.00 Z500.00 E200.00
Recv: echo:Maximum Acceleration (mm/s2):
Recv: echo:  M201 X4000 Y4000 Z4000 E4000
Recv: echo:Accelerations: P=printing, R=retract and T=travel
Recv: echo:  M204 P500.00 R3000.00 T1000.00
Recv: echo:Advanced variables: S=Min feedrate (mm/s), T=Min travel feedrate (mm/s), B=minimum segment time (ms), X=maximum XY jerk (mm/s),  Z=maximum Z jerk (mm/s),  E=maximum E jerk (mm/s)
Recv: echo:  M205 S0.00 T0.00 B20000 X15.00 Y15.00 Z15.00 E20.00
Recv: echo:Home offset (mm)
Recv: echo:  M206 X0.00 Y0.00 Z0.00
Recv: Auto Bed Leveling:
Recv: echo:  M420 S0
Recv: echo:Endstop adjustment (mm):
Recv: echo:  M666 X0.00 Y0.00 Z0.00
Recv: echo:Delta settings: L=diagonal_rod, R=radius, S=segments_per_second, ABC=diagonal_rod_trim_tower_[123]
Recv: echo:  M665 L265.00 R109.13 S160.00 A0.00 B0.00 C0.00
Recv: echo:PID settings:
Recv: echo:  M301 P22.00 I2.00 D60.00
Recv: echo:Filament settings: Disabled
Recv: echo:  M200 D3.00
Recv: echo:  M200 D0
Recv: echo:Z-Probe Offset (mm):
Recv: echo:  M851 Z0.20
Recv: ok


Send: M420 V
Recv: echo:Bed Leveling Off
Recv: Bilinear Leveling Grid:
Recv:       0     1     2     3     4
Recv:  0 -4.20 -4.13 -4.09 -4.03 -4.19
Recv:  1 -4.16 -4.25 -4.27 -4.25 -4.08
Recv:  2 -4.13 -4.26 -4.39 -4.31 -4.18
Recv:  3 -4.09 -4.20 -4.26 -4.21 -4.18
Recv:  4 -4.13 -3.99 -4.03 -4.06 -4.32
Recv: ok

@yonkiman
Author

yonkiman commented Jan 7, 2017

To be more succinct, this is all the code required on my system to kill Marlin:

G28 ; home all axes
M420 S1; re-enable bed leveling ;; no failure unless bed leveling is enabled
G1 X-0.432 Y-61.985 E1.9508 F700
G1 X-0.432 Y15.215 E4.5184

@manianac
Contributor

manianac commented Jan 8, 2017

Could you try a few different things? For instance, remove the E axis moves. Does it still crash? What about the Y axis moves? Very interesting that you have no Z moves and it still crashes with ABL on.

I'm also following an interesting bug on another thread. Could you try disabling leveling fade?

@yonkiman
Author

yonkiman commented Jan 9, 2017

Well, while G1 X-0.432 Y-61.985 E1.9508 F700 doesn't contain a Z-axis coordinate, when executed my (delta) printer slowly moves down to the build plate. Seems like Z = 0 is implied when not initially specified...

However Z=0 is scraping my bed, so I'm changing that line to G1 X-0.432 Y-61.985 Z1.2 E1.9508 F700 to prevent that. Marlin still crashes after adding Z1.2. Also, the initial move takes forever at F700, so I'm moving the F700 to line 4 and using F7000 on line 3. Here's the final code that the subsequent experiments are based on (this code still causes Marlin to crash):

  1. G28 ; home all axes
  2. M420 S1; re-enable bed leveling ;; no failure unless bed leveling is enabled
  3. G1 X-0.432 Y-61.985 Z1.2 E1.9508 F7000
  4. G1 X-0.432 Y15.215 E4.5184 F700

Trying various F values in line 4:
PASS: 7000 2000 1000 950 925 913 910
FAIL: 700 701 800 900 907 909

So F <= 909: FAIL
F >= 910: PASS
(So if you're a Beatles fan you can say the problem goes away on the One After 909.)
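The manual search above is effectively a bisection over feedrate. A minimal sketch of that procedure follows, assuming the failure is monotonic in F (crashes at or below some value, passes above it). `crashes_at` is a hypothetical stand-in for actually running the four-line test and power-cycling the printer after a freeze; here it simply replays the reported result.

```python
from typing import Callable


def lowest_passing_feedrate(fail_f: int, pass_f: int,
                            crashes_at: Callable[[int], bool]) -> int:
    """Bisect between a known-failing and a known-passing feedrate.

    Assumes every F at or below the threshold crashes and every F above it passes.
    """
    while pass_f - fail_f > 1:
        mid = (fail_f + pass_f) // 2
        if crashes_at(mid):
            fail_f = mid   # still crashes: the boundary is above mid
        else:
            pass_f = mid   # passes: the boundary is at or below mid
    return pass_f


def reported_result(feedrate: int) -> bool:
    """Stand-in for a real test run, replaying the result reported above."""
    return feedrate <= 909


print(lowest_passing_feedrate(700, 7000, reported_result))
```

Because the E and X results in this thread were not fully repeatable, a real run would probably need to repeat each probe from a freshly reset printer before trusting the boundary.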

Next, if I delete the E parameter from line 4 (G1 X-0.432 Y15.215 F700), it passes.

E threshold search (* means passed one time, failed another):
FAIL: 1.0, 1.0000, 2.0, 3.0 3.5 3.8 3.82, 3.83, 3.835, 3.85, 3.87 3.882 3.883 3.8832 3.8833 3.8834 3.8835 3.8837 3.8838 3.8839 3.884* 3.885*

PASS: 5.5184 4.5184 4.0 3.85 3.84 3.9 3.89 3.885* 3.884* 3.9

Numbers with asterisks failed at first but passed later. Maybe the threshold drifts depending on previous commands. When a test value caused a crash, the printer had to be reset, so the next test value started with a reinitialized system. To save time, I did not reset the printer after a value that passed. Still, the E threshold appears to be around 3.885.
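To see how fuzzy the E boundary is, the two lists can be compared directly. This is a small sketch using only the values reported above (asterisks dropped); the variable names are illustrative.

```python
# E values from the FAIL and PASS lists above, in the order reported.
fail_e = [1.0, 1.0000, 2.0, 3.0, 3.5, 3.8, 3.82, 3.83, 3.835, 3.85, 3.87,
          3.882, 3.883, 3.8832, 3.8833, 3.8834, 3.8835, 3.8837, 3.8838,
          3.8839, 3.884, 3.885]
pass_e = [5.5184, 4.5184, 4.0, 3.85, 3.84, 3.9, 3.89, 3.885, 3.884, 3.9]

# Values that both crashed and passed at least once.
inconsistent = sorted(set(fail_e) & set(pass_e))
print("seen in both lists:", inconsistent)

# Values with a single, unambiguous outcome.
only_fail = sorted(set(fail_e) - set(pass_e))
only_pass = sorted(set(pass_e) - set(fail_e))
print("highest value that only failed:", only_fail[-1])
print("lowest value that only passed:", only_pass[0])
```

As reported, 3.85 appears in both lists without an asterisk, and the lowest value that only passed (3.84) sits below several values that only failed, so the outcome does not look like a clean function of E alone.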

X value sensitivity on line 4:
PASS at -50
FAIL at -0.432 and +20
Not repeatable around +24
PASS at +25, +30, +50

Raw data (* means passed one time, failed another)
FAIL: -0.432 0 20 22 23 24* 24.5* 20

PASS: 50 -50 30 25 24.8 24.7 24.6 24.5* 24.0* (after a few passes) 24.5* (immediately after a reboot)

Hope this helps.

EDIT: Also, I successfully printed 8 objects yesterday, each taking between 20 minutes and 2 hours, with no freezes. It only seems to fail under a surprisingly narrow set of conditions...

@shacal

shacal commented Jan 14, 2017

N135978 G1 X47.221 Y128.563 E11.2216*109

And boom...

@yonkiman
Author

yonkiman commented Jan 14, 2017

This week I printed about 10 more parts (one at a time) over about 15 hours print time with this exact configuration - no problems. I haven't re-run the bed leveling (changing the leveling matrix) because I think there's a good chance it would make the problem disappear for these coordinates and maybe be impossible to recreate. But I just realized that I could re-run the leveling but not save the matrix - I'll try that when I get home in a few days...

EDIT: I'll also try disabling leveling fade as manianac suggested.

@yonkiman
Author

yonkiman commented Jan 16, 2017

Not sure I have leveling fade enabled; there's no #define ENABLE_LEVELING_FADE_HEIGHT in my config as this page indicates there should be.

Added Z0 to M420 command:

 G28 ; home all axes
 M420 S1 Z0; re-enable bed leveling, set fade to 0
 G1 X-0.432 Y-61.985 Z1.2 E1.9508 F7000
 G1 X20.0 Y15.215 E4.5184 F700

Still crashes uC. M420 seems to ignore Z parameter.

Re-leveled bed with G28 / G29. Leveling matrix was different, but still crashed on the above code.

@yonkiman
Author

I've been printing a lot over the last month with RCBugFix and this bug did not appear. In fact I forgot this problem even existed... until it crashed again today, 10 minutes into a 13 hour print. I rotated the model 90 degrees and resliced, and it froze on the exact same spot on the model. I'm using the RCBugFix changes as of 2/16/2017.

Please let me know if there's anything else I can do to help find the problem.

@Sebastianv650
Contributor

Sebastianv650 commented Feb 22, 2017

If the date of RCBugFix is right, you have #5829 included so we can eliminate that as a fault.

@yonkiman, @shacal, is the following true for both of you:

  • You are not using LIN_ADVANCE
  • You are using bed leveling
  • You can repeat the error with an exact gcode combination
  • The error doesn't occur when bed leveling is completely disabled (commented out with // in the config)?

I'm wondering whether the cause of this problem may be related to #5699, where bed leveling might also be the cause of the issue.

@yonkiman
Author

Correct on all 4 except I disabled leveling in gcode, not config file. Will try that now.

@Sebastianv650
Contributor

If it works when leveling is disabled via gcode, that's enough to show there is something wrong with bed leveling.

@thinkyhead, who is the right person to name here regarding bed leveling code?

@psavva
Contributor

psavva commented Feb 23, 2017

Following

@thinkyhead
Member

@Sebastianv650 For this issue, as it causes an obvious crash, it should be fairly easy to narrow down, whether specific to bed leveling or only ancillary to it. I'll do some testing and see if I can locate the root cause.

@thinkyhead
Member

thinkyhead commented Feb 26, 2017

"M420 seems to ignore Z parameter."

If you add ENABLE_LEVELING_FADE_HEIGHT to your config it will set the fade height from that parameter. As it is off by default, setting it to 0 leaves it unchanged anyway.

#if ENABLED(ENABLE_LEVELING_FADE_HEIGHT)
  if (code_seen('Z')) set_z_fade_height(code_value_linear_units());
#endif

@thinkyhead
Member

@shacal Are you also running a delta with bilinear bed leveling?

@thinkyhead
Member

@yonkiman @shacal Are you running any of your steppers with a micro-stepping of 32x?

@yonkiman
Author

yonkiman commented Feb 26, 2017

"Are you running any of your steppers with a micro-stepping of 32x?"

Yes, delta with 32x microstepping on all three (of course) axes.

@thinkyhead
Member

thinkyhead commented Mar 1, 2017

@yonkiman You're probably overtaxing your 16MHz AVR processor with so many steps-per-mm. Reduce your micro-steps to 16 and the problem should not appear.

@yonkiman
Author

yonkiman commented Mar 1, 2017

"You're probably overtaxing your 16MHz AVR processor with so many steps-per-mm."

Maybe - it just seems odd that I'm able to print dozens of different objects with hundreds of thousands of lines of gcode but it's just this particular line (and one other that I didn't isolate) that causes a lockup. And why does it fail with slower feedrates (F <= 909) and work with faster rates (F >= 910) - shouldn't a slower feedrate increase the amount of time available to calculate each step?
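For a rough sense of scale, the step rate implied by a feedrate can be estimated from the `M92` value of 160 steps/mm reported earlier. This is a back-of-the-envelope sketch that treats the move as a single linear axis; on a delta the tower carriages move at speeds that differ from the nozzle speed, so the real per-stepper rates will differ. `step_rate_hz` is a hypothetical helper.

```python
def step_rate_hz(feedrate_mm_per_min: float, steps_per_mm: float) -> float:
    """Steps per second for one axis moving at the given feedrate."""
    return feedrate_mm_per_min / 60.0 * steps_per_mm


STEPS_PER_MM = 160.0  # from the M92 line in the M501 output above

for f in (700, 909, 910, 7000):
    print(f"F{f}: about {step_rate_hz(f, STEPS_PER_MM):.0f} steps/s")
```

Under this simplification the failing feedrates correspond to lower step rates than the passing ones, which fits the doubt raised above that raw step-rate overload explains the freeze.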

"Reduce your micro-steps to 16 and the problem should not appear."

I'll give that a try tomorrow.

@yonkiman
Author

yonkiman commented Mar 1, 2017

@thinkyhead Went to 16 microsteps and it still crashed with the same code:

G28 ; home all axes
M420 S1; re-enable bed leveling ;; no failure unless bed leveling is enabled
G1 X-0.432 Y-61.985 Z1.2 E1.9508 F7000
G1 X20.0 Y15.215 E4.5184 F700
; MARLIN FAILS AFTER RECEIVING THE ABOVE GCODE
G28 ; home all axes (if Marlin has not crashed)

@yonkiman
Author

yonkiman commented Mar 1, 2017

I pulled the latest changes from RCBugFix before I compiled. All code (including the config files I'm using) is on my fork.

thinkyhead added the "Bug: Potential ?" and "Needs: More Data" labels Mar 1, 2017
@thinkyhead
Member

What happens if you insert M400 before the G28 at the end?

@yonkiman
Author

yonkiman commented Mar 1, 2017

Still crashes.

@thinkyhead
Member

thinkyhead commented Mar 1, 2017

Sketch uses 76,944 bytes (30%) of program storage space. Maximum is 253,952 bytes.
Global variables use 6,846 bytes (83%) of dynamic memory, leaving 1,346 bytes for local variables. Maximum is 8,192 bytes.
Low memory available, stability problems may occur.

With a BLOCK_BUFFER_SIZE of 64, the stack is probably overflowing into the global area. Try changing it back to 32 or 16.

16: 5,142 bytes for local variables.
32: 3,878 bytes for local variables.
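The figures quoted above allow a rough estimate of what each planner block costs in RAM. The sketch below assumes free RAM falls linearly with `BLOCK_BUFFER_SIZE` and that nothing else changed between the builds; the variable names are illustrative.

```python
RAM_TOTAL = 8192        # bytes of dynamic memory reported by the compiler
GLOBALS_AT_64 = 6846    # global variables with BLOCK_BUFFER_SIZE 64

FREE_AT_16 = 5142       # bytes left for locals with BLOCK_BUFFER_SIZE 16
FREE_AT_32 = 3878       # bytes left for locals with BLOCK_BUFFER_SIZE 32

# Free RAM as reported for the 64-entry build.
free_at_64_reported = RAM_TOTAL - GLOBALS_AT_64

# Cost of one planner block, estimated from the 16 -> 32 difference.
bytes_per_block = (FREE_AT_16 - FREE_AT_32) // (32 - 16)

# Linear extrapolation from 32 to 64 entries.
free_at_64_estimated = FREE_AT_32 - (64 - 32) * bytes_per_block

print("reported free at 64:", free_at_64_reported)
print("bytes per block:", bytes_per_block)
print("estimated free at 64:", free_at_64_estimated)
```

The estimate lands within a few bytes of the reported figure, which suggests that going from 32 to 64 entries costs roughly 2.5 KB of the 8 KB available and leaves the stack very little room.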

@thinkyhead
Member

thinkyhead commented Mar 1, 2017

I've re-jiggered the code so that SOLENOID_PROBE is a standalone option, not connected to Z_PROBE_SLED (which is a very specific design). That will make it more clear that it can be used with DELTA, etc. I'll submit that as a PR pretty soon.

Meanwhile I've redone your changes at rc_broken_abl_test (full diff), if you want to test it.

Note that if you simply switch the connections on Z_MIN and Z_MAX pins, you can leave pins_RAMPS.h as it was.

@yonkiman
Author

yonkiman commented Mar 1, 2017

"With a BLOCK_BUFFER_SIZE of 64, the stack is probably overflowing into the global area. Try changing it back to 32 or 16."

That did it.

Thank you so much for tracking this down. I feel pretty stupid for ignoring the low memory warning - my printer came with a BLOCK_BUFFER_SIZE of 64 (Marlin 1.0), so I just blindly copied it over, forgot about it, and then ignored the warning since everything had been working fine for so long.

I also really appreciate you cleaning up SOLENOID_PROBE.

@thinkyhead
Member

W00t! Happy to help. Freezes are almost always a case of buffers / stack / vars getting corrupted, and low memory can certainly lead to that!

@yonkiman
Author

yonkiman commented Mar 24, 2017

D'OH! The original problem (with a slightly different gcode causing it) has come up again. It originally would fail with the original G1 X20.0 Y15.215 E4.5184 F700 command when BLOCK_BUFFER_SIZE was 16. When I increased BLOCK_BUFFER_SIZE to 32, the problem went away. However I had since found a second gcode command that would cause it to fail, and this second command still failed after I increased BLOCK_BUFFER_SIZE to 32. So then I increased it to 64, and both known-failing commands started working, so I thought the problem was fixed (though in the back of my mind I knew it didn't make sense that doubling the available RAM for a marginal memory problem wouldn't be enough to fix another marginal memory problem).

But tonight I was printing another object and it failed in the exact same way, but with BLOCK_BUFFER_SIZE = 64.

This is using the build from 22 days ago. I pulled the latest RCBugFix changes into my repo but can't enable bed leveling, so I haven't been able to check with the latest fixes.

This is just a heads-up that I don't think the original fix worked...

@thinkyhead
Member

How about a BLOCK_BUFFER_SIZE of 16? As far as we know the GCode parser is in good working order.

@yonkiman
Author

yonkiman commented Apr 3, 2017

Thanks. Will give it a shot when I have more time to fix whatever I've done to my repository. I can't compile the month-old build I've been using anymore, and I haven't been able to make my printer work with the latest RCBugFix changes (I think due to a conflict between the hack I did to make the Z SLED probe work without sliding for my pivot/touch probe and the changes the community made to properly support pivot/touch probing). Maybe in a few weeks.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 29, 2022