[FR] Crash Detection #8765

Open
psavva opened this issue Dec 12, 2017 · 61 comments
Labels
F: Trinamic T: Feature Request Features requested by users.

Comments

@psavva
Contributor

psavva commented Dec 12, 2017

Dear Marlin Devs,

This is a feature request to let a print re-home once a TMC2130 (or similar) driver detects a missed step, which the driver reports.

The idea is that when the TMC drivers detect missed steps, the printer will automatically re-home the X or the Y, depending on which axis skipped a step.

This will be extremely useful to recover what would have been a failed print due to any skipped steps.

@teemuatlut do you think you could do this?

@teemuatlut
Member

It's mostly a matter of Marlin reacting to a triggered endstop signal on moves that are not deliberate homing moves. But since I personally use stealthChop all the time, it's a low priority for me. Maybe I'll look into it when I get more into playing with the TMC2660 drivers.
Just don't expect this to happen anytime soon.

@DavidBjerreBjoerklund
Contributor

I agree, this would be an awesome feature and save great amounts of plastic.

@alexborro
Contributor

alexborro commented Dec 12, 2017 via email

@thinkyhead thinkyhead added the T: Feature Request Features requested by users. label Dec 13, 2017
@psavva
Contributor Author

psavva commented Dec 13, 2017

@alexborro, Do you have any reference documentation/datasheets showing how the driver reports how many steps were missed?

As far as I know, it's impossible to determine how many steps were missed unless you have a closed loop system.

@alexborro
Contributor

alexborro commented Dec 13, 2017 via email

@psavva
Contributor Author

psavva commented Dec 14, 2017

@alexborro Do I understand correctly that dcStep only supports full steps? I.e., it cannot detect skipped microsteps, only full steps? We lose resolution in this case.

I also see that under certain conditions it may lose positional accuracy when stalling, and when that happens stallGuard2 is activated.

It seems that re-homing a stalled axis would be beneficial in both cases, with and without dcstep.

Did you manage to implement anything in Marlin from your research and build?

@teemuatlut
Member

teemuatlut commented Dec 14, 2017

I've looked into it a bit more and it is possible to use microstepping along with dcStep. However, it's likely this line on page 74 was what initially made me not consider it more thoroughly: "dcStep operates the motor in fullstep mode at the target velocity or at reduced velocity if the motor becomes overloaded"
Fullstep operation is also hinted at on p. 77, where vhighfs and vhighchm are required for dcStep: "As soon as VDCMIN becomes exceeded, the chopper becomes switched to fullstepping"

@alexborro
Contributor

alexborro commented Dec 14, 2017

@psavva In dcStep mode the driver drives the motor like a DC motor. There are no more "steps", although the driver still keeps the current position in its internal step table. LOST_STEPS has microstep resolution, so it can be used to re-send the lost microsteps afterwards.

Although you can also re-home the axis in case of lost steps, I think the best approach would be to monitor the motor torque (using the stallGuard2 feature) and avoid missing steps in the first place. I need to read the manual again to check whether the driver can report how many steps were missed when not in dcStep. I guess it cannot.

I have the TMC2130 development kit from Trinamic. I just need to set it up and run some tests, and make it worth the 200 bucks I spent on it!

Cheers.

Alex.

@AnHardt
Member

AnHardt commented Dec 14, 2017

Is there a way to lose microsteps? I don't think so.
With an external torque you can displace the rotor by microsteps, but once the torque is gone the rotor returns to its place. If the torque is too high, the motor can't snap in again before it has turned at least 4 full steps. But when it does snap in and the torque is gone, it will be at the correct microstep (not full step) again.

Ah, I see (18.3.1). They have another definition of lost (delayed) (micro)steps.
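
The snap-in argument above can be made concrete with an idealized model: stable rotor positions repeat every 4 full steps (one electrical period), so once the external torque is removed the rotor settles at the nearest such position. A rough sketch under that assumption (ideal motor, no friction or load; `settled_error_microsteps` is a hypothetical name used only for illustration):

```cpp
#include <cmath>

// Idealized model (an assumption, not taken from the datasheet): the rotor
// settles at the nearest stable position, and stable positions are spaced
// 4 full steps apart.
// displacement_full_steps: how far the rotor was pushed from its commanded position.
// microsteps: microsteps per full step (for example 16).
// Returns the residual position error in microsteps once the torque is removed.
long settled_error_microsteps(double displacement_full_steps, int microsteps) {
  const long periods = std::lround(displacement_full_steps / 4.0);
  return periods * 4L * microsteps;
}
```

With 16 microsteps, a push of 1.5 full steps leaves no residual error, while a push of 3 full steps leaves an error of 64 microsteps. In this model the error is never a fraction of a full step.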

@lonelymyp

lonelymyp commented Dec 14, 2017

In the Prusa MK3 this function is already implemented (or will be), so there is no reason why it should not be done.
Here's a video showing the feature running:
https://www.youtube.com/watch?v=sPvTB3irCxQ&feature=youtu.be&t=4m12s

@psavva
Contributor Author

psavva commented Dec 15, 2017

@alexborro, I can confirm that the skipped steps are only recorded in the dcStep mode.

On page 30 of the datasheet:
LOST_STEPS -
"Number of input steps skipped due to higher load in dcStep operation, if step input does not stop when DC_OUT is low. This counter wraps around after 2^20 steps. Counts up or down depending on direction. Only with SDMODE=1."
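
Because LOST_STEPS is described as a counter that wraps after 2^20 steps and counts in either direction, code polling it would presumably need to take the difference between two readings modulo 2^20 and interpret it as signed. A minimal sketch, assuming the raw register value has already been read over SPI (the read itself is not shown, and `lost_steps_delta` is a hypothetical helper, not part of any Trinamic library):

```cpp
#include <cstdint>

// Width of the LOST_STEPS counter per the datasheet excerpt (wraps after 2^20).
constexpr uint32_t kLostStepsBits = 20;
constexpr uint32_t kLostStepsMask = (1u << kLostStepsBits) - 1;

// Hypothetical helper: signed change between two raw LOST_STEPS readings.
// Assumes fewer than 2^19 steps were lost between the two polls, so the
// wrapped difference is unambiguous.
int32_t lost_steps_delta(uint32_t previous_raw, uint32_t current_raw) {
  const uint32_t diff = (current_raw - previous_raw) & kLostStepsMask;
  // Sign-extend the 20-bit difference to 32 bits.
  if (diff & (1u << (kLostStepsBits - 1)))
    return static_cast<int32_t>(diff) - static_cast<int32_t>(1u << kLostStepsBits);
  return static_cast<int32_t>(diff);
}
```

The sign of the result follows the counting direction, so a negative value means steps were lost while moving in the direction that counts down.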

I think the best option would be to configure StallGuard and in case of a stall, re-home.

StallGuard would minimise skipped steps, and if a step is still skipped for a mechanical reason, the printer would re-home and continue...

@teemuatlut,
I wanted to attempt the implementation, but this would be my first time in C and at this level... I will need help :)

I'm a Database programmer using Oracle...

@psavva psavva closed this as completed Dec 15, 2017
@psavva psavva reopened this Dec 15, 2017
@psavva
Contributor Author

psavva commented Dec 15, 2017

Oops..

@Hywelmartin

A while ago I managed to hear when my printer skipped a step (or a lot of steps). I paused, re-homed, and then continued the print. It finished nicely.

One thing is that the babysteps got lost because I re-homed...

@thinkyhead
Member

no reason why it should not be done.

It's cool that vendors who have very specific architectures are implementing this for their machines. Just be aware that for the main fork we must implement it in a way that works for Cartesian, CoreXY, Delta, and SCARA.

@dedalodaelus

@teemuatlut could this lost-step check be placed in the monitor_tmc_driver() function in tmc_utils.cpp?

@teemuatlut
Member

No, it needs to be a reaction to the stallGuard signal.

@Robotec101

Hi, looking at your comments and at how the MK3 works, it's obvious that on the MK3 the printer is put on hold by the stallGuard signal and executes a macro that raises Z, re-homes X and Y, and continues printing.

I'm trying to do the same on a RAMPS 1.4 by feeding the X and Y signals of the two drivers through an OR gate into an interrupt pin. Then I have to pause the print and execute the macro mentioned above.

The problem comes when I have to add an interrupt to Marlin. I don't know how it will affect other subsystems or how to execute a G-code macro from the code. I'm looking at the code, but it's huge at first glance, and I was looking for something relatively easy to avoid layer faults.

In short, I need a way to write an interrupt routine that executes G-code, or sets a flag to execute it, without disrupting timers etc. Any ideas are welcome.

@Roxy-3D
Member

Roxy-3D commented Apr 29, 2018

In short, I need a way to write an interrupt routine that executes G-code, or sets a flag to execute it, without disrupting timers etc. Any ideas are welcome.

If the signal is latched (not a pulse... does not go away...) you can do it similar to how the filament run out sensor does its work. No interrupt needed. The signal is checked in the manage_inactivity() code and action is taken if the signal shows up...

class FilamentRunoutSensor {
  public:
    FilamentRunoutSensor() {}

    static void setup();

    FORCE_INLINE static void reset() { runout_count = 0; filament_ran_out = false; }

    FORCE_INLINE static void run() {
      if ((IS_SD_PRINTING || print_job_timer.isRunning()) && check() && !filament_ran_out) {
        filament_ran_out = true;
        enqueue_and_echo_commands_P(PSTR(FILAMENT_RUNOUT_SCRIPT));
        stepper.synchronize();
      }
    }
  private:
    FORCE_INLINE static bool check() {
      #if NUM_RUNOUT_SENSORS < 2
        // A single sensor applying to all extruders
        const bool is_out = READ(FIL_RUNOUT_PIN) == FIL_RUNOUT_INVERTING;
      #else
        . . .
      #endif
      return (is_out ? ++runout_count : (runout_count = 0)) > FIL_RUNOUT_THRESHOLD;
    }
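
The same polling pattern could be applied to a latched stall signal. The following is only a sketch of that idea, written as standalone C++ rather than Marlin code: `StallMonitor` and its `read_pin` callback are hypothetical names, and the threshold plays the role of FIL_RUNOUT_THRESHOLD above.

```cpp
#include <functional>
#include <utility>

// Hypothetical polled monitor modeled on the runout sensor above; not Marlin code.
class StallMonitor {
 public:
  // read_pin: returns true while the (latched) stall signal is asserted.
  // threshold: number of consecutive asserted polls that must be exceeded.
  StallMonitor(std::function<bool()> read_pin, int threshold)
      : read_pin_(std::move(read_pin)), threshold_(threshold) {}

  // Call once per idle-loop pass. Returns true exactly once per stall event.
  bool poll() {
    count_ = read_pin_() ? count_ + 1 : 0;
    if (count_ > threshold_ && !triggered_) {
      triggered_ = true;
      return true;
    }
    return false;
  }

  // Re-arm the monitor after the stall has been handled.
  void reset() { count_ = 0; triggered_ = false; }

 private:
  std::function<bool()> read_pin_;
  int threshold_;
  int count_ = 0;
  bool triggered_ = false;
};
```

A caller would run `poll()` from the idle loop and start whatever recovery action it chooses when it returns true. This only covers detection; it says nothing about moves that are already planned.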

@teemuatlut
Member

Doesn't work that way.
Think of a full-bed-sized rectangle being printed without bed leveling (so movements don't get split). Each side of the rectangle is then one block in the planner queue. With a typical queue size of 16 you'd get 4 full rounds of the print before reacting to anything. Even more if you enqueue the actions at the end of the G-code buffer.
Even if you pause the print when the endstop (stallGuard) is triggered, you still can't enqueue the commands because you still have a full queue of movements ahead of them.
If you discard the full queue to execute your commands, you also lose all those commands and you no longer know the position where the skipped step happened, so you can't return to that point.
You also shouldn't execute the motion block where the skipped step happened to its end. Again, if that particular motion is a long one, it'll definitely show in the print.
Then suppose you manage to pause the print, halt the queue, raise Z, home XY, return to the XY position where the skip happened, and lower Z. You still can't just return to executing the interrupted block, because it was likely in the nominal-rate phase of the motion. You need to recalculate the block with new acceleration and deceleration ramps.

So there are many more things to consider than just if this then insert this.
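
To put a rough number on the queue-latency point: the time the already-planned moves keep running is about the queued distance divided by the feedrate. A back-of-the-envelope sketch, where the 200 mm side length and 50 mm/s feedrate are purely illustrative assumptions and `queued_motion_seconds` is a hypothetical helper:

```cpp
// Rough estimate of how long already-planned moves keep running before a
// command appended behind them can execute. All inputs are illustrative.
double queued_motion_seconds(int queued_blocks, double block_length_mm,
                             double feedrate_mm_per_s) {
  // Ignores acceleration ramps, so the real time would be somewhat longer.
  return queued_blocks * block_length_mm / feedrate_mm_per_s;
}
```

With 16 queued blocks of 200 mm each at 50 mm/s this gives 64 seconds, which corresponds to the four full laps of the rectangle mentioned above.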

@thinkyhead
Member

thinkyhead commented Apr 30, 2018

If you discard the full queue to execute your commands, you also lose all those commands

It's true! If steps get lost, the firmware has to:

  • save the command queue in a secondary buffer,
  • clear the command buffer (head = tail),
  • wait for the planner to run out its moves (stepper.synchronize),
  • save the current_position in a temporary variable,
  • execute the G28 XY (delta: G28)
  • move back to the old current_position,
  • put the saved commands back into the queue, and
  • carry on with the print job.
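
The steps above can be sketched roughly as follows. This is standalone C++ with stand-in types, not Marlin code: `Printer`, `Position`, and `recover_from_lost_steps` are hypothetical, and the motion calls only record what would be executed.

```cpp
#include <deque>
#include <string>
#include <utility>
#include <vector>

struct Position { double x, y, z; };

// Hypothetical stand-in for the firmware state touched by the steps above.
struct Printer {
  std::deque<std::string> command_queue;   // pending G-code lines
  Position current_position{0.0, 0.0, 0.0};
  std::vector<std::string> log;            // records what would be executed

  void synchronize() { log.push_back("synchronize"); }  // wait for planner to drain
  void home_xy()     { log.push_back("G28 X Y"); }
  void move_to(const Position& p) {
    log.push_back("move back");
    current_position = p;
  }
};

void recover_from_lost_steps(Printer& printer) {
  // Save the command queue in a secondary buffer.
  std::deque<std::string> saved = printer.command_queue;
  // Clear the command buffer.
  printer.command_queue.clear();
  // Wait for the planner to run out its moves.
  printer.synchronize();
  // Save the position the firmware believes it is at.
  const Position resume_at = printer.current_position;
  // Re-home X and Y, then move back to the saved position.
  printer.home_xy();
  printer.move_to(resume_at);
  // Put the saved commands back so the job carries on.
  printer.command_queue = std::move(saved);
}
```

The copy of the queue is where the extra memory goes, which matters on boards with little SRAM.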

@Robotec101

So there are many more things to consider than just if this then insert this.

Yeah, I was afraid of that, and it was the reasoning behind my post. Then again, if it were that easy someone would have done it already.

Roxy-3D, thanks for the tip. I'm looking at the handle_filament_runout routine, and it looks like it takes the approach of putting the necessary G-code in the G-code queue to solve the problems described by teemuatlut.

Well, looking at the available options (without redeveloping the motion planner), the easy way is to put the G-code in the queue like that routine does. For a first implementation, enqueue_and_echo_commands_P should put the homing as the first action in the planner. It will drain other commands, but for a first approach I think it's the simplest solution.

The right one, from looking at the code and translating what thinkyhead was saying, would be (correct me if I'm mistaken):

- Clone command_queue and current_command into a variable.
- Clear command_queue (all empty? or zeros?).
- stepper.synchronize(); (pauses the print in conjunction with the buffer clear, clearing the remaining steps).
- Save the current position.
- Insert G28 XY as the current command, then G1 to the saved position, and add the saved commands to command_queue. The problem here is that the queue would be 2 commands longer than the maximum.
- Resuming the job would then happen automatically, right?

@teemuatlut
Member

Due to the issues I described you can't use the same method as in filament runout.
A simple solution is not a sufficient one.

The way I've been thinking about it is

  • In the ISR:
    • Save current step count, steps left, etc to store current position as well as target position.
    • Set a flag
    • Disable interrupts so we can populate secondary queue outside of ISR
    • Discard current block
  • Target secondary queue to add
    • G91
    • G1 Z10
    • G28 XY
    • G1 skipped_position_XY
    • G1 Z-10
    • G90 (if absolute when triggered originally)
    • G1 target position
  • Enable interrupts to begin executing the secondary buffer
  • Some condition to unset flag and tell Marlin to target the original planner queue
  • Continue with original queue from block after the skipped step block

I don't think we can use stepper.synchronize as that will clean out the planner queue and execute blocks when we especially don't want to do that.
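
The contents of the secondary queue in the outline above could be generated along these lines. This is a sketch only: `build_recovery_script` and `XY` are hypothetical, and unlike the outline it switches to absolute mode (G90) before the XY moves so that the return coordinates are not read as relative offsets. It also assumes the target is given in absolute coordinates and restores relative mode at the end if that was the original mode.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct XY { double x, y; };

// Format a Z-only move.
static std::string z_move(double dz) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "G1 Z%.2f", dz);
  return buf;
}

// Format an XY move.
static std::string xy_move(XY p) {
  char buf[48];
  std::snprintf(buf, sizeof(buf), "G1 X%.2f Y%.2f", p.x, p.y);
  return buf;
}

// Hypothetical helper that builds the recovery script for the secondary queue.
std::vector<std::string> build_recovery_script(XY skipped_at, XY target,
                                               double z_lift_mm,
                                               bool was_absolute) {
  std::vector<std::string> script;
  script.push_back("G91");               // relative mode for the Z lift
  script.push_back(z_move(z_lift_mm));   // raise Z clear of the print
  script.push_back("G90");               // absolute mode for the XY moves
  script.push_back("G28 X Y");           // re-home X and Y only
  script.push_back(xy_move(skipped_at)); // return to where the skip happened
  script.push_back("G91");
  script.push_back(z_move(-z_lift_mm));  // lower Z back down
  script.push_back("G90");
  script.push_back(xy_move(target));     // finish the interrupted move
  if (!was_absolute) script.push_back("G91");  // restore the original mode
  return script;
}
```

The interrupted move is simply re-issued to its target here, so the planner would compute fresh acceleration and deceleration ramps for it.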

@Roxy-3D
Member

Roxy-3D commented Apr 30, 2018

Also... Consider this: Homing is a very dangerous operation if there is a partial print on the bed. If on your printer you home to Z-Max, that will help a lot. But in general, most printers home to Z-Min and there is no way to know what is in the way of the Z-Axis homing operation.

@stevereno30

This is true, but in my experience skipped steps are not an issue with Z position or Z motion but with X or Y motion. With this in mind, the printer could raise Z 5 mm, home X and Y, return to the expected XY position, drop Z 5 mm, and continue printing. There would be no risk of crashing unless the printer was printing multiple parts in sequence.

@thinkyhead
Member

@teemuatlut — I realize that to be 100% thorough and respond as quickly as possible you would need to halt the stepper ISR and copy all the planner blocks, but that would require even more SRAM and meticulous adjustment. To continue a block from the middle, with a ramp up in speed from zero, requires reconstructing the partial move and re-sending it to the planner anew. Certainly possible, just very challenging.

We don't actually need to formally populate a command queue with the various commands for recovery, as we now have methods that can go straight to the parser, and/or we can just do direct planner moves, removing some of the complications at that level.

My suggestion to let the planner queue run out does mean a slower recovery and more errant printing than a very fast stop, but it gives a simpler implementation and less SRAM usage than saving off the planner blocks, and SRAM usage is one of our major concerns since we have so little to spare.

@p3p
Member

p3p commented Apr 30, 2018

I'd be a bit concerned with how long the planner buffer can take to empty given simple geometry.

@teemuatlut
Member

teemuatlut commented Apr 30, 2018

halt the stepper ISR and copy all the planner blocks

I'm more thinking of: halt stepper ISR and then execute from secondary buffer but do not mess with the main one (other than discarding the current one where the skipped step happened).
Then return to main block queue once the secondary is empty.

An alternative option: because the needed steps for correction are roughly the same every time, the secondary buffer could only contain one block that would each time be populated by a state machine. This could reduce memory usage as we wouldn't have to allocate memory for a full buffer size.
Or use a linked list. I know you'd like those.
Execution speed really doesn't matter during the whole process as long as it starts at the right time.
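
The single-block alternative could look something like the state machine below, which hands out one recovery action at a time so that only one slot has to be kept in memory. It is a sketch under that assumption; `RecoveryStateMachine` and its state names are hypothetical, and the actions are returned as plain descriptions rather than real planner blocks.

```cpp
#include <string>

// Hypothetical states, one per recovery action in the sequence discussed above.
enum class RecoveryState { LiftZ, HomeXY, ReturnXY, LowerZ, ResumeTarget, Done };

class RecoveryStateMachine {
 public:
  bool done() const { return state_ == RecoveryState::Done; }

  // Returns a description of the next action and advances to the following
  // state. Returns an empty string once the sequence is finished.
  std::string next() {
    switch (state_) {
      case RecoveryState::LiftZ:
        state_ = RecoveryState::HomeXY;
        return "raise Z";
      case RecoveryState::HomeXY:
        state_ = RecoveryState::ReturnXY;
        return "home XY";
      case RecoveryState::ReturnXY:
        state_ = RecoveryState::LowerZ;
        return "return to skipped XY";
      case RecoveryState::LowerZ:
        state_ = RecoveryState::ResumeTarget;
        return "lower Z";
      case RecoveryState::ResumeTarget:
        state_ = RecoveryState::Done;
        return "move to original target";
      case RecoveryState::Done:
        break;
    }
    return "";
  }

 private:
  RecoveryState state_ = RecoveryState::LiftZ;
};
```

The caller would ask for `next()` each time the single slot becomes free and switch back to the main queue once `done()` is true.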

@Robotec101

Robotec101 commented Aug 26, 2018

Did you have both DIAG1 and DIAG0 connected in your testing?

These pins can be configured by SPI commands to act as diagnostic pins. DIAG0 is the one connected to the stallGuard system, which is why it is the only one used in our configurations.

You can see the possible warning outputs on page 25 of the TMC2130 datasheet (https://www.trinamic.com/fileadmin/assets/Products/ICs_Documents/TMC2130_datasheet.pdf) and teemuatlut made a good library to configure these pins.

I hope you will be more successful than me!

@Grapsus

Grapsus commented Oct 22, 2018

Would it be possible to merge those functions from the Prusa firmware?

The ISR handling routine is here:
https://github.com/prusa3d/Prusa-Firmware/blob/e6c80eaa0e5a94614dfcb63cec292dfc7fa8913c/Firmware/tmc2130.cpp#L239

And the save and restore print functionality is over there:
https://github.com/prusa3d/Prusa-Firmware/blob/bbd4f70f41bfcb69be0494bca8a6052e429632f8/Firmware/Marlin_main.cpp#L8612

@labotecno

Hello everyone, has anything been implemented for this lost-step function in the new firmware? Thank you.

@MrStump

MrStump commented Apr 16, 2019

any news?

@kingofl337

I was just wondering: is it a resources issue, or do the developers of Marlin not like Prusa's implementation of re-homing? I realize it's for a Cartesian printer, but if someone were to take the time to migrate the code, would it be accepted?

@InsanityAutomation
Contributor

Part resources, part looking for a better solution. More recently we added the ability to inject new commands at the front of the queue, so the Prusa method is less objectionable.
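
As a generic illustration of front-of-queue injection (not Marlin's actual queue implementation; `inject_at_front` is a hypothetical helper), the recovery commands go ahead of everything already waiting while keeping their own order:

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical command queue helper: inserts the given commands before all
// pending ones, preserving the order of the inserted commands.
void inject_at_front(std::deque<std::string>& queue,
                     const std::vector<std::string>& commands) {
  queue.insert(queue.begin(), commands.begin(), commands.end());
}
```

This only reorders commands that have not been planned yet. Moves already sitting in the planner would still run first, which is the concern raised earlier in the thread.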

@stantond

stantond commented Feb 7, 2020

Just came here to say that this is a very interesting feature Prusa have developed, and as StallGuard becomes more common as the 2209 drivers grow in adoption, this would now benefit even more Marlin users.

Can I suggest renaming this issue to "Accommodate for skipped steps with StallGuard"? Automatically re-homing is only part of the solution, you also want it to resume the print, and it now applies to multiple TMC drivers.

There's also an important caveat highlighted 4 minutes into this video that should be highlighted in any documentation for this feature. StallGuard will not detect skipped steps that occur in the current direction of travel. There must be an opposing force to trigger detection. This feature is still useful and worth implementing, but it's worth understanding exactly what problems it does and does not solve.

@NovaViper

Hey, is there any update on this? I'm willing to give any demo code a try! I just got some TMC2209s installed on my Ender.

@TazerReloaded

I can understand that the whole deal with homing and resuming the print is very complicated, but I'd like an option to pause the printer if a stepper skips steps. It could be stuck filament or a broken part, in which case I don't want any automatic recovery attempt anyway.
This could also be solved with a current limit, but I'd rather have my printer pause than rattle about until I notice it.

@YaronSoffer

YaronSoffer commented Jun 25, 2020 via email

@rallegade

I too would be very interested in a feature like this for Marlin.

Has this feature been totally dropped?

@Fusseldieb

Just installed four TMC2209's on my RAMPS printer and it works awesome, but we can't really use its functions...

Would like to see that implementation :)

@3DCoded

3DCoded commented Feb 26, 2023

+1

@GadgetNutt
Contributor

I'm still interested in this feature btw.

@thisiskeithb thisiskeithb changed the title [FR] TMC2130 - Automatic Re-Homing on lost steps when printing [FR] Crash Detection May 5, 2023
@William-Baker

This still hasn't been implemented!? that's a shame

@Fusseldieb

This still hasn't been implemented!? that's a shame

Same!

@GadgetNutt
Contributor

ping

@psavva
Contributor Author

psavva commented Feb 22, 2024

@thinkyhead @teemuatlut any options to get this feature developed?

@MarlinFirmware MarlinFirmware deleted a comment from Fusseldieb Feb 29, 2024
@ellensp
Contributor

ellensp commented Feb 29, 2024

Reality check.

Full time Marlin developers < 1
Funds available = 0
Feature requests outstanding = 521
Other issues of greater importance than feature requests = 183

Unhelpful, disparaging remarks about the lack of progress on this FR have been removed.

@vovodroid
Contributor

Besides recovering from skipped steps, does the current MONITOR_DRIVER_STATUS behavior detect them as an error and halt the printer?
