Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed limiting to reduce load #4931

Closed
thinkyhead opened this issue Oct 2, 2016 · 20 comments
Closed

Speed limiting to reduce load #4931

thinkyhead opened this issue Oct 2, 2016 · 20 comments
Labels
T: Design Concept Technical ideas about ways and methods. T: Development Makefiles, PlatformIO, Python scripts, etc.

Comments

@thinkyhead
Copy link
Member

thinkyhead commented Oct 2, 2016

As more end-users upgrade to 32x micro-stepping with drivers like the DRV8825 we're seeing more issues where innocuous activities like homing the Z axis will cause the CPU to become so overtaxed that the watchdog timer isn't getting reset. The end result is that sometimes during a move that is too taxing for the CPU the machine will simply reboot.

This comes principally from tying the watchdog to the period of temperature reading. But of course that's the point of the watchdog, to ensure that temperature readings are not so out of date as to become meaningless and throw the PID tuner into a tailspin. The watchdog is a necessary thing. Delaying heater readings for more than 4 seconds is dangerous. The other readings —filament width, etc.— are also time-sensitive and useless once they become stale.

But the watchdog wasn't necessarily designed with Stepper ISR overdrive in mind. It's not intended that stepping should eat up so much time that sensors cannot be read.

The cause of reset is that updateTemperaturesFromRawValues isn't getting called often enough, because temp_meas_ready isn't set for a long period of time… because set_current_temp_raw isn't called nearly often enough… because temp_count hasn't counted up to OVERSAMPLENR (which is 16). And, temp_count itself only increases once per 12 invocations of Temperature::isr – after all the ADC sensors have been read. The temperature ISR runs at 62.5 KHz, so normally the temperature readings are updated 325 times per second. The temperature ISR would need to be blocked pretty heavily to be delayed by over 1300 invocations.

The temperature ISR also manages heater PWM, so there's another good reason it shouldn't get blocked or slowed significantly.

Is it possible to apply a speed limit to the Stepper ISR, perhaps predicting the maximum load and capping movement speed? We need to make sure the ISR finishes quickly and that there's a minimum period of time between invocations, to leave enough time for other activities, prevent the steppers from stuttering, etc.

Even with improvements in performance, it's still easy to overload the CPU by trying to move too fast. So we need to place a cap on movement speed based on some assessment of load. If the planner senses that the stepper ISR is getting too close to the limit, it could apply a speed limit to the feed rate on blocks.

If the planner takes care of the speed limiting, then the Stepper ISR only needs to count up its total number of cycles. The planner can divide by millis() to get a sense of the average number of cycles per millisecond, and the speed will just be capped to something like 12 million cycles per second — or 75% of the available CPU.

In the documentation it should be mentioned that every time you double your micro-stepping or your gearing it also doubles the amount of CPU load. I'd like to evaluate how close we are to performance limits at this point and come up with a good margin so we know what's supportable.

@thinkyhead thinkyhead added T: Development Makefiles, PlatformIO, Python scripts, etc. T: Design Concept Technical ideas about ways and methods. labels Oct 2, 2016
@emartinez167
Copy link
Contributor

Would a sensible thing to do then would be to do a sanity check on startup
(or even better, at compile time) to determine if configuration settings
are just too aggressive? How would we go about that?
On Sun., 2 Oct. 2016 at 11:14, Scott Lahteine notifications@github.com
wrote:

As more end-users upgrade to 32x micro-stepping with drivers like the
DRV8825 we're seeing more issues where innocuous activities like homing the
Z axis will cause the CPU to become so overtaxed that the watchdog timer
isn't getting reset.

The end result is that sometimes during a move that is too taxing for the
CPU the machine will simply reboot. This comes principally from tying the
watchdog to the period of temperature reading. But of course that's the
point of the watchdog, to ensure that temperature readings are not so out
of date as to become meaningless and throw the PID tuner into a tailspin.
The watchdog is a necessary thing. Delaying heater readings for more than 4
seconds is dangerous. The other readings —filament width, etc.— are also
time-sensitive and useless once they become stale.

But the watchdog wasn't necessarily designed with Stepper ISR overdrive in
mind. It's not intended that stepping should eat up so much time that
sensors cannot be read.

The cause of reset is that updateTemperaturesFromRawValues isn't getting
called often enough, because temp_meas_ready isn't set for a long period
of time… because set_current_temp_raw isn't called nearly often enough…
because temp_count hasn't counted up to OVERSAMPLENR (which is 16). And,
temp_count itself only increases once per 12 invocations of
Temperature::isr – after all the ADC sensors have been read. The
temperature ISR runs at 62.5 KHz, so normally the temperature readings are
updated 325 times per second. The temperature ISR would need to be blocked
pretty heavily to be delayed by over 1300 invocations.

The temperature ISR also manages heater PWM, so there's another good
reason it shouldn't get blocked or slowed significantly.

Is it possible to apply a speed limit to the Stepper ISR, perhaps
predicting the maximum load and capping movement speed? We need to make
sure the ISR finishes quickly and that there's a minimum period of time
between invocations, to leave enough time for other activities, prevent the
steppers from stuttering, etc.

Even with improvements in performance, it's still easy to overload the CPU
by trying to move too fast. So we need to place a cap on movement speed
based on some assessment of load. If the planner senses that the stepper
ISR is getting too close to the limit, it could apply a speed limit to the
feed rate on blocks.

If the planner takes care of the speed limiting, then the Stepper ISR only
needs to count up its total number of cycles. The planner can divide by
millis() to get a sense of the average number of cycles per millisecond,
and the speed will just be capped to something like 12 million cycles per
second — or 75% of the available CPU.

In the documentation it should be mentioned that every time you double
your micro-stepping or your gearing it also doubles the amount of CPU load.
I'd like to evaluate how close we are to performance limits at this point
and come up with a good margin so we know what's supportable.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#4931, or mute the thread
https://github.com/notifications/unsubscribe-auth/AGcPb-stKgPi69Fa8f8TEaYz9tk3wGY0ks5qvyGJgaJpZM4KL697
.

@Roxy-3D
Copy link
Member

Roxy-3D commented Oct 2, 2016

@gruvin @AndyZuerich @Rerouter

This comes principally from tying the watchdog to the period of temperature reading. But of course that's the point of the watchdog, to ensure that temperature readings are not so out of date as to become meaningless and throw the PID tuner into a tailspin. The watchdog is a necessary thing. Delaying heater readings for more than 4 seconds is dangerous. The other readings —filament width, etc.— are also time-sensitive and useless once they become stale.

I pretty much agree with everything else you said. But please let me nit-pick on this. I'm wondering if we can relax some constraints.

I honestly don't believe delaying heater readings by 4 seconds is dangerous. Let's take a worst case example. Suppose I'm doing ABS at 240 C. And suppose because of what was going on like a very high extrusion rate (where I needed lots of power applied to the heater to keep up) and the last few readings make the PID feel like it was falling behind. In that scenario, maybe the PID puts out a 'Full Power' request. Even if the PID was putting out a max heat request... 4 seconds is only going to be a few degrees. And who knows, maybe it over shoots a few more degrees after the PID backs off. I'm kind of skeptical. I don't believe that is going to happen. (I'm just trying to help you under stand my motivation for the next part.)

This might be a good time to ask the question: When doing a PID algorithm, do we need a uniform time difference between the readings? It would seem to me we don't need that. It would seem mathematically that we can just take the time from the last reading and use that when we update our results. Specifically... The P part is what ever it is when we take the reading. That doesn't change with the time period. The D part may be bigger if our time period stretches because we were busy doing other things. But that can be compensated by just dividing by the longer time period. And similarly, the I part is going to need to be bigger because we are integrating over a longer time period. We just multiply that by the longer time period.

Admit-ably, we lose a small amount of resolution because we are not tracking the temperature curve quite as accurately. But really... my guess is it will hardly matter.

Is it worth letting the PID update function just run when ever it can run, and make it smart so it scales each incremental update to how much time has passed? Or, what I'm really asking is this: Can't we just run the PID algorithm when we have time to run it? Do we need to be a slave to it and make sure it runs at uniform time increments? If the printer is busy doing other important things, can't we wait 500 ms to run the PID algorithm and just compensate for the delay?

My gut hunch is the answer is "Yes!"

@Rerouter
Copy link
Contributor

Rerouter commented Oct 2, 2016

I've not had a chance to dig too deep into the stepper library, but it seems there is a massive amount going on in the ISR, which is very different to what i am used to (get in, do its thing, get out as soon as possible)

I have been looking at the PID and been thinking of making it use a discrete time implementation so being delayed by a block of moments causes less issues than the current approach that assumes its fired off exactly on time every time, (it already calculates the time scaling each loop because you can change the variables)

In other words the code is already in place, i'll draw up another pull request if you guys are happy with how that sounds,

As far as the temperature goes, we would need some filtering in any case, but we could likley get away with less than 325 reading required per second, I'm having a dig through the datasheet to see what options we have available

@Rerouter
Copy link
Contributor

Rerouter commented Oct 2, 2016

To see what i mean about the PID time, the variable used is "PID_dT" , currently defined as
#define PID_dT ((OVERSAMPLENR * 12.0)/(F_CPU / 64.0 / 256.0)) e.g. assuming a fixed quantity,

@thinkyhead
Copy link
Member Author

thinkyhead commented Oct 2, 2016

Of course I mention the thermalManager not getting enough time as the most prominent feature of CPU overload. I seem to recall at some point there was some kind of speed throttle on the stepper ISR based on load. But perhaps I mis-remember.

Shaving cycles throughout the planner and stepper ISR is going to help, of course. But that still leaves open the possibility of allowing configured speeds to go beyond rated limits. The most obvious case is when upgrading to 32x micro-stepping drivers, immediately doubling the load. The old speed limits may be too high. Understandably, nobody wants to lower their feed rates after an upgrade. But 32x only gives you quieter steppers, and are only worth it if you can afford the extra CPU overhead.

It would be nice if we could use static_assert to test configured speeds and catch cases where a max feed rate is generally impossible, taking account of the number of presumed ISR calls per second and the duration of the ISR, etc. But the duration of the ISR is nigh impossible to figure in advance.

So it seems to make the most sense to dynamically throttle the stepper ISR speed.

@Rerouter
Copy link
Contributor

Rerouter commented Oct 2, 2016

It would more be a case of adding a debug variables to count how much time the ISR consumes at a given feedrate at a given steps per mm, we can then assume the worst case for 5 axis's all moving at once and add it to sanity checking, we would then relax it when we later improve things in other areas

@Blue-Marlin
Copy link
Contributor

Shouldn't be too difficult to maintain a minimum time between the stepper interrupts.
At the end of the stepper interrupt:
If the time to the next stepper interrupt is below a minimum and we are not in quad stepping increase the multi-stepping, else increase the time to the next interrupt to the minimum time (to do something else in between)

@AndyZuerich
Copy link

Delaying heater readings for more than 4 seconds is dangerous. The other readings —filament width, etc.— are also time-sensitive and useless once they become stale.

This depends highly on the hardware setup. For example with my heated bed, within 4 seconds, not much will happen. But my hotend is quick - very quick. Within 4 seconds and full power, I will have a temperature overshoot of 30 degrees or more, which would be unacceptable.

Maybe I did not understand the problem in total: The 32 Microstepping with high speeds consumes most of the computation power. But the thermal manager is not the limiting process, i.e. the thermal manager is in general fast enough. The only problem is that the thermal manager is not getting called at all when the Arduino is driven at its limit.

Concerning PID with e.g. a 4 seconds (or variable) delay: Well, if it is known that the sampling time is variable, this could be considered in the thermal manager algorithms. Currently, I only see the following problems:

  • Bang-Bang thermal control: With bang-bang, and a 4 second delay, this means automatically a 4 second full power on or 4 second full power off period, which is the worst case for over/undershoot. I think for the hotend thermal control, bang-bang will become unusable.
  • PID: At the moment, the PID PWM algorithm is implemented in software (am I correct, here?) , and not in the hardware -> The on/off times are set manually with a digital write to the output pins. When the thermal manager is not called at all, then we have the same situation as with bang-bang: The heater will be constantly on or off during such a period. Hardware PWM for the heaters is only used for a maximum value limitation:
#define BANG_MAX 90 // limits current to nozzle while in bang-bang mode; 255=full current
#define PID_MAX BANG_MAX // limits current to nozzle while PID is active (see PID_FUNCTIONAL_RANGE below); 255=full current

The default frequency for this maximum value limitation is (as far as I remember 490 Hz)

Therefore, the first solution approach for fixing this would be to use pure hardware PWM for all heaters. I am doing this anyways, since I control my heaters with a 31 kHz buck converter and hardware PWM.
In fact, I don't really know why, at the moment, the PWM for our heaters is done in software. Maybe it is a low-frequency issue: For the simple MOSFETS on the RAMPS board, people wanted to have the PWM frequency as low as possible so that switching losses are no concern. I think by default, the sampling time for the PWM routines (done in software) is something as 8 Hz? You can even see the LED blinking on the PCB when the heater is in action.
When we do pure hardware PWM instead of software PWM, then the issue with the heater manager not getting called regularly is much more relaxed. Therefore, in the Arduino code, we would need to merge the output of the PID_MAX value and the actual control value that the PID algorithm creates to a single value (which is quite easy and would even simplify the code). The disadvantage: The PWM output resolution would suffer. For instance, when PID_MAX is set to 40, we only have 40 discrete output values for the heater control. Just recently, re-router improved the PID output resolution from 7 bit to 8 bits. In most of the cases, people have a well dimensioned heater system and they don't need to limit the PID_MAX value. So, for most people changing from software PWM to hardware PWM would not have any disadvantage.

The cause of reset is that updateTemperaturesFromRawValues isn't getting called often enough, because temp_meas_ready isn't set for a long period of time… because set_current_temp_raw isn't called nearly often enough… because temp_count hasn't counted up to OVERSAMPLENR (which is 16). And, temp_count itself only increases once per 12 invocations of Temperature::isr – after all the ADC sensors have been read. The temperature ISR runs at 62.5 KHz, so normally the temperature readings are updated 325 times per second. The temperature ISR would need to be blocked pretty heavily to be delayed by over 1300 invocations.

Did I understand this right: By default, the temperature readings are set 325 times per second? Why don't we just update the temperature readings once every time before the thermal manager is called? All the other temperature readings are then anyways thrown away.

To do a short summary of my text above:

  • I think having something as a 4 seconds delay in the thermal manager might be a big issue for some hardware configurations (over-dimensioned heater power, especially hotends).
  • At the moment, the outer-loop PWM is done in software. This fact would totally kill the thermal control performance of the heaters when delays in the range of seconds arise due to CPU overload. Switching to pure hardware PWM would help here a lot (bang-bang would still be suffering, so PID is then much more superior=.
  • My impression is: The thermal manager code is not computationally intensive. It should always be possible to call the thermal manager within reasonable time. From my personal feeling/estimation, I would say that the thermal manager should be called at least every 1 second.
  • In case it is known that the thermal manager is called in irregular time periods, this can be accounted for in the thermal control routines. I think at the moment, the code assumes a fixed sampling frequency and would result in some "unexplanable" ringing when not called with fixed sampling frequency.

@thinkyhead Would it hurt a lot for the stepper action to call the temperature reading just before the thermal manager, and then call the thermal manager with a fixed sampling period? Or is the reading of ADC values very time consuming, or does it include another delay time that we have to consider?

I have just recently upgraded my delta printer to the DRV8825. My printer is at the moment not in operation (only the heaters work). Additionally, I have upgraded to a 128x64 display. Therefore, I expect that I would run into the same issue when I finish my hardware within the next weeks.

@thinkyhead
Copy link
Member Author

Why don't we just update the temperature readings once every time before the thermal manager is called?

It's an ADC thing. You need to start ADC, wait a short period, then fetch the result. Only one ADC may be active at a time. So we start an ADC on one pass, then read the ADC result on the next pass. As there may be up to 6 different sensors, there are 12 passes before all sensors are ready. We can drop the extra passes for sensors that don't exist. On most setups this would drop from 12 down to 4, so the sensor values could be read at 976Hz instead of 325Hz.

@thinkyhead
Copy link
Member Author

Would it hurt a lot for the stepper action to call the temperature reading just before the thermal manager, and then call the thermal manager with a fixed sampling period?

Stepper::isr runs at many rates and sometimes not (or hardly) at all, so it makes sense that Temperature::isr should be used. It's the canary in the coal mine to show us these limitations.

I thought that what @Blue-Marlin described was basically in place already. The code that sets the rate also limits it. Take a look at calc_timer. How should we modify this?

static FORCE_INLINE unsigned short calc_timer(unsigned short step_rate) {
  unsigned short timer;

  NOMORE(step_rate, MAX_STEP_FREQUENCY);

  if (step_rate > 20000) { // If steprate > 20kHz >> step 4 times
    step_rate >>= 2;
    step_loops = 4;
  }
  else if (step_rate > 10000) { // If steprate > 10kHz >> step 2 times
    step_rate >>= 1;
    step_loops = 2;
  }
  else {
    step_loops = 1;
  }

  NOLESS(step_rate, F_CPU / 500000);
  step_rate -= F_CPU / 500000; // Correct for minimal speed
  if (step_rate >= (8 * 256)) { // higher step rate
    unsigned short table_address = (unsigned short)&speed_lookuptable_fast[(unsigned char)(step_rate >> 8)][0];
    unsigned char tmp_step_rate = (step_rate & 0x00ff);
    unsigned short gain = (unsigned short)pgm_read_word_near(table_address + 2);
    MultiU16X8toH16(timer, tmp_step_rate, gain);
    timer = (unsigned short)pgm_read_word_near(table_address) - timer;
  }
  else { // lower step rates
    unsigned short table_address = (unsigned short)&speed_lookuptable_slow[0][0];
    table_address += ((step_rate) >> 1) & 0xfffc;
    timer = (unsigned short)pgm_read_word_near(table_address);
    timer -= (((unsigned short)pgm_read_word_near(table_address + 2) * (unsigned char)(step_rate & 0x0007)) >> 3);
  }
  if (timer < 100) { // (20kHz - this should never happen)
    timer = 100;
    MYSERIAL.print(MSG_STEPPER_TOO_HIGH);
    MYSERIAL.println(step_rate);
  }
  return timer;
}

@thinkyhead
Copy link
Member Author

thinkyhead commented Oct 9, 2016

Synopsis:

  • The initial step rate is for accelerating, cruising, decelerating
  • Step rate gets an upper limit. 10000 with Toshiba controllers, 40000 otherwise.
  • Step rate 20000-40000 is divided by 4 (5KHz-10KHz), step_loops is 4.
  • Step rate 10000-20000 is divided by 2 (5KHz-10KHz), step_loops is 2.
  • For rates under 10000 (Toshiba) step_loops will always be 1.
  • Lastly, step_rate may not be less than 32 (or 40) steps per second
    • The final subtraction is a correction for all speeds

The timer value is produced with help from speed lookup tables. At the end, there's a sanity-check of the timer to make sure it's not under 100 - which is 20KHz with a prescaler of 1/8. So the Stepper::isr will never run faster than 20KHz. From previous limits, it should in reality be no more than 10KHz.

After OCR1A is set to the timer value, at the end of the stepper ISR, there's one more patch to the timer compare register.

  NOLESS(OCR1A, TCNT1 + 16);

This limits the timer compare so the stepper ISR won't run again immediately — or at least, within 128 CPU cycles.

This correction will only apply at the highest step rates, which have already been capped to 20KHz – more likely to 10KHz. When a stepper is using 32x micro-stepping and 160 steps per mm, even with quad stepping this seems to make 250mm/s the upper speed limit.

How long does the stepper ISR need to be before this cap is invoked? By my calculation, only 784 CPU cycles when running at 20KHz, or 1584 cycles at 10KHz.

Increasing this to TCNT1 + 24 or TCNT1 + 32 might be one way to open up a little more time for other code to run, but will limit the speed further. If realistically a count of 200 (1600 cycles) allows 40K steps, then these options reduce top speed to 38400 and 36800 steps/s. (Or, as low as 230mm/s for 32x and 460mm/s for 16x.)

(All given a 500mm/s inherent speed limit with a common belt driven axis at 80 steps/mm, or 250mm/s inherent limit if you go to 32x or 160 steps/mm.)

@boelle
Copy link
Contributor

boelle commented Jun 22, 2019

i agree with @emartinez167

some sort of check at boot that either warns or just cap the speed would be ideal so that bad prints could be avoided

@boelle boelle mentioned this issue Jun 22, 2019
20 tasks
@RaabenF
Copy link

RaabenF commented Jul 13, 2019

This limits the timer compare so the stepper ISR won't run again immediately — or at least, within 128 CPU cycles.

dont know the code at all, but:
if he's right, that sounds like the temperature isr may be to heavy to execute in that 128 cycles window

at least the ISR should be able to run completely in that window once. or can you swear, you did all push/pop stack savings right?
i like such dusty problems, and started reading a bit code, but im not actually able to find out where the ISRs are built!?
doesnt seem to be in the HAL section

if all the other issues mentioned are new on v2 im 90% sure its an ISR problem:

The thing is, if you have multiple IRs at once, you have to save registers and stuff at the IRs beginning and restore them at end. otherwise you get that unexpacted behaviour. of course it worked long time, but everytime this values are not saved right at IRs you get a more ore less failure on some values

or its just a timer issue, if some timers stop working while executing IRs, you find about that in the arduino reference


i didnt really get all that calculations, but wonder if that 20kHz really is enough for over 100mm/s

but practically i think too, its not acceptable to check temps less then 0.5s / or pid bouncing will surely ruin prints at 300° PC :-/

@RaabenF
Copy link

RaabenF commented Jul 14, 2019

PS: really great work so far! that HALL thing is great and would be a huge effort for other projects.
i first thought you would discontinue the 8 bit branch, which would be naturally.
i had my printer slowing down even on 16th steppers, at a fast test with spiral vase and complex forms...
and needed days to notice, that it wasnt slicers fault. (on avr 1280)
so any advice to the user would be helpfull

@Rerouter you seem to think in the same direction with IRs
i think youre wasting much precious time talking about many issues and looking for that needle in the haystack. that came while migrating to AVR HALL. i used platform IO first time yesterday and am really happy about that advice.
but even if i could find some way to improve it, i have no idear how really to test, or prove to you

as said above: i belief its either:
-too long temp IR
-or wrong register saving while IRs

othe workarounds would be really nasty

@boelle
Copy link
Contributor

boelle commented Oct 24, 2019

@thinkyhead is this one still relavant or has it served its purpose?

@boelle
Copy link
Contributor

boelle commented Nov 24, 2019

another stale one

and we could as well use discord:

Marlin Discord server. Join link: https://discord.gg/n5NJ59y

@boelle boelle closed this as completed Nov 24, 2019
@github-actions
Copy link

github-actions bot commented Jul 3, 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

3 similar comments
@github-actions
Copy link

github-actions bot commented Oct 3, 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions
Copy link

github-actions bot commented Dec 3, 2020

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T: Design Concept Technical ideas about ways and methods. T: Development Makefiles, PlatformIO, Python scripts, etc.
Projects
None yet
Development

No branches or pull requests

8 participants