Commits on Feb 28, 2017
  1. DDA: Fix slow travel speed

    @phord summarizes this as: This happens only when !recalc_speed,
    meaning we are cruising, not accelerating or decelerating. So it
    pegs our dda->c at c_min if it never made it as far as c_min.
    
    This commit fixes #69.
    Wurstnase committed Oct 25, 2016
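
The pegging described in this commit message can be pictured with a minimal sketch. It is not the actual dda.c code: the struct, the function name `peg_cruise_interval()` and the meaning of the fields beyond `c` and `c_min` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Trimmed-down, hypothetical movement record. Only c and c_min are
   taken from the commit message. */
typedef struct {
  uint32_t c;      /* current step interval in timer ticks */
  uint32_t c_min;  /* shortest allowed interval, i.e. cruising speed */
} move_t;

/* Meant to run once per step. recalc_speed is true while the move is
   still accelerating or decelerating. */
void peg_cruise_interval(move_t *m, bool recalc_speed) {
  assert(m);
  if (!recalc_speed && m->c > m->c_min) {
    /* Cruising, but the ramp stopped short of c_min: peg it there. */
    m->c = m->c_min;
  }
}
```

While ramping, the interval is left alone; once cruising, a move that stopped short of `c_min` is set to it, which is what removes the slow travel speed in this sketch.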
Commits on Feb 24, 2017
  1. dda.h: steps also for acceleration reprap

    During the rework and revert of some commits, we forgot to bring this part back for acceleration reprap.
    Wurstnase committed Feb 24, 2017
Commits on Feb 13, 2017
  1. Revert "dda.c: let's save 3 divisions."

    delta_um can become very small, while maximum_feedrate_P is constant.
    When this division is moved out of the loop, the result can be wrong.
    dda->total_steps also becomes very small together with delta_um, so these two fit perfectly.
    
    This reverts commit cd66feb.
    Wurstnase committed Feb 11, 2017
  2. DDA: The test of total_steps against step_no was a good idea.

    So let's bring this part back.
    
    We save 35 clock cycles at 'LED on time maximum'
    
    ATmega sizes               '168   '328(P)   '644(P)     '1280
    Program:  18038 bytes      126%       59%       29%       14%
       Data:   1936 bytes      190%       95%       48%       24%
     EEPROM:     32 bytes        4%        2%        2%        1%
    
    short-moves.gcode statistics:
    LED on occurences: 888.
    LED on time minimum: 217 clock cycles.
    LED on time maximum: 520 clock cycles.
    LED on time average: 249.626 clock cycles.
    
    smooth-curves.gcode statistics:
    LED on occurences: 22589.
    LED on time minimum: 217 clock cycles.
    LED on time maximum: 537 clock cycles.
    LED on time average: 284.747 clock cycles.
    
    triangle-odd.gcode statistics:
    LED on occurences: 1636.
    LED on time minimum: 217 clock cycles.
    LED on time maximum: 520 clock cycles.
    LED on time average: 270.933 clock cycles.
    Wurstnase committed Jan 26, 2017
  3. dda.c: simply resort the values and save up to 8 clocks

    ATmega sizes               '168   '328(P)   '644(P)     '1280
    Program:  18266 bytes      128%       60%       29%       15%
       Data:   1936 bytes      190%       95%       48%       24%
     EEPROM:     32 bytes        4%        2%        2%        1%
    
    short-moves.gcode statistics:
    LED on occurences: 888.
    LED on time minimum: 243 clock cycles.
    LED on time maximum: 555 clock cycles.
    LED on time average: 250.375 clock cycles.
    
    smooth-curves.gcode statistics:
    LED on occurences: 22589.
    LED on time minimum: 243 clock cycles.
    LED on time maximum: 572 clock cycles.
    LED on time average: 292.139 clock cycles.
    
    triangle-odd.gcode statistics:
    LED on occurences: 1636.
    LED on time minimum: 243 clock cycles.
    LED on time maximum: 555 clock cycles.
    LED on time average: 275.699 clock cycles.
    Wurstnase committed Jan 3, 2017
  4. DDA:testing steps

    Start the simulation with ./parse_clean xyz, where 'xyz' can be anything; it is used to name the created files.
    
    In the end you will get 3 pictures:
    swan-reference-xyz.png shows how it should look.
    swan-current-xyz.png shows how it looks now.
    swan-diff-xyz.png is the difference.
    
    These 3 pictures show only the X axis.
    
    You will also get a fourth file, pp-xyz.asc. You can open this file with, for example, MeshLab to see the current model in 3D.
    
    If you want to use your own G-code, please do the following:
    Create a normal G-code file. Delete any M116 (temperature waits). You may also want to delete comments.
    Then add an M114 after every x-th line.
    I do this for swan-test.gcode with:
    sed '1~2 s/$/\nM114/g' < swan.gcode > swan-test.gcode
    Wurstnase committed Jan 3, 2017
Commits on Feb 12, 2017
  1. dda.c: when we have no move on the cartesian axes, we have a move on E.

    But we can also have very short moves with only 1 step, without E. So include these moves, too.
    Wurstnase committed Jan 2, 2017
Commits on Feb 11, 2017
  1. run-in-simulavr.sh: repair report.

    Use the CONFIG set up a few lines earlier, so it also works when running this script on its own.
    Wurstnase committed Feb 11, 2017
Commits on Feb 1, 2017
  1. dda->id is needed even when !LOOKAHEAD

    In `ACCELERATION_RAMPING` code we use the dda->id field even when we do
    not enable `LOOKAHEAD`. Expose the variable and its related `idcnt`
    when `ACCELERATION_RAMPING` is used.
    
    Add a regression-test to catch this in the future.
    phord committed Feb 1, 2017
Commits on Dec 21, 2016
Commits on Dec 15, 2016
  1. dda.c: move code to reduce size.

    No functional change. Reduces program size by 2 bytes:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  17942 bytes      126%       59%       29%       14%
         Data:   1920 bytes      188%       94%       47%       24%
       EEPROM:     32 bytes        4%        2%        2%        1%
    Wurstnase committed with Dec 12, 2016
  2. Testcases: run them faster.

    Simple trick: raise the feedrate, no need to care about a milling
    bit when running a simulation. This reduces simulated time and as
    such, duration of the simulation (by about 50%).
    
    Also remove G-code which was never executed because simulations
    are chopped at 1 minute of simulation time and smooth-curves.gcode
    took about 1.5 minutes.
    
    Step pulse measurements remain about the same:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  17944 bytes      126%       59%       29%       14%
         Data:   1920 bytes      188%       94%       47%       24%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 202 clock cycles.
      LED on time maximum: 380 clock cycles.
      LED on time average: 232.092 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 22589.
      LED on time minimum: 194 clock cycles.
      LED on time maximum: 423 clock cycles.
      LED on time average: 254.425 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 220 clock cycles.
      LED on time maximum: 380 clock cycles.
      LED on time average: 245.575 clock cycles.
    Wurstnase committed with Dec 12, 2016
  3. DDA: get rid of dda->delta_um[].

    These values were queued up just for finding out individual axis
    speeds in dda_find_crossing_speed(). Let's do this calculation
    with other available movement properties and save 16 bytes of RAM
    per movement queue entry.
    
    First version of this commit forgot to take care of the feedrate
    sign (prevF, currF). Lack of that found by @Wurstnase. Idea of
    tweaking calculation of 'dv' to achieve this also by @Wurstnase.
    
    Setting the sign immediately after calculating the absolute
    values was tried, but that resulted in larger ( = slower) code.
    
    Binary size down 132 bytes, among that two loops. RAM usage down
    256 bytes for the standard test case:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  17944 bytes      126%       59%       29%       14%
         Data:   1920 bytes      188%       94%       47%       24%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Dec 10, 2016
Commits on Dec 9, 2016
  1. dda_lookahead.c: reduce size of ATOMIC section.

    We calculate a safe join speed in dda_join_moves using data from
    two source DDA movements. We ensure the DDA values we use are sane
    by atomically copying them to local variables before beginning our
    calculation. But later we discard all our results if the DDA went
    live in the meantime, as evidenced by changes in `DDA->live` or
    `DDA->id`.
    
    Since we will not use the results of our calculations if either of
    these change, we can safely reference all the other DDA values
    non-atomically. Change the ATOMIC section to protect only the
    `DDA->id` values at the start.
    
    Added by Traumflug: this costs a negligible 4 bytes binary size:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  18082 bytes      127%       59%       29%       15%
         Data:   2176 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    phord committed with Nov 16, 2016
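
The pattern described in this commit message (snapshot the ids atomically, calculate without locking, then discard the result if a move changed meanwhile) could look roughly like the sketch below. It is an illustration under assumptions, not the real dda_lookahead.c: the struct layout, the speed fields, `irq_lock()`/`irq_unlock()` and the test hook are hypothetical, and the locking functions are no-ops so the sketch runs on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down DDA. Only id and live are named in the
   commit message; the speed fields are assumptions. */
typedef struct {
  volatile uint8_t id;
  volatile bool live;
  uint32_t max_speed;
  uint32_t start_speed;
  uint32_t end_speed;
} dda_t;

/* Stand-ins for disabling/enabling interrupts; no-ops on a host. */
static void irq_lock(void) {}
static void irq_unlock(void) {}

/* Hook to simulate the stepper interrupt firing mid-calculation. */
typedef void (*interrupt_hook_t)(dda_t *prev, dda_t *curr);

void simulate_prev_going_live(dda_t *prev, dda_t *curr) {
  (void)curr;
  prev->live = true;
}

/* Returns true if the join speed was written to both moves. */
bool join_moves(dda_t *prev, dda_t *curr, interrupt_hook_t hook) {
  assert(prev && curr);

  /* Short atomic part: only the ids are snapshotted. */
  irq_lock();
  uint8_t prev_id = prev->id;
  uint8_t curr_id = curr->id;
  irq_unlock();

  /* Long part, deliberately not atomic. */
  uint32_t speed = prev->max_speed < curr->max_speed
                     ? prev->max_speed : curr->max_speed;
  if (hook != NULL)
    hook(prev, curr);

  /* Commit only if nothing went live or got recycled meanwhile. */
  bool joined = false;
  irq_lock();
  if (!prev->live && prev->id == prev_id && curr->id == curr_id) {
    prev->end_speed = speed;
    curr->start_speed = speed;
    joined = true;
  }
  irq_unlock();
  return joined;
}
```

Values read in the unlocked part may be stale, but that does no harm here because a stale calculation is thrown away by the final check.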
  2. dda_lookahead.c: remove unneeded assignments.

    Gcc optimizes them out anyway. No functional change. No surprise,
    same binary size.
    Wurstnase committed with Nov 16, 2016
Commits on Dec 8, 2016
  1. AVR: turn on link time optimisation (LTO).

    Following the resounding success on ARMs, let's try LTO on AVRs,
    too. The advantage isn't all that great, binary size increases by 462
    bytes and even an additional byte of RAM is needed.
    
    According to @Wurstnase's research, this size increase is pretty
    unique to the config.h.Profiling configuration. All other
    configurations he tried actually showed a size drop.
    
    Still, we have 15 to 17 clock cycles less on any step, so a
    general stepping performance increase of about 7%.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  18078 bytes      127%       59%       29%       15%
         Data:   2176 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 202 clock cycles.
      LED on time maximum: 380 clock cycles.
      LED on time average: 232.092 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 220 clock cycles.
      LED on time maximum: 423 clock cycles.
      LED on time average: 255.22 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 220 clock cycles.
      LED on time maximum: 380 clock cycles.
      LED on time average: 245.575 clock cycles.
    committed Dec 7, 2016
Commits on Dec 7, 2016
  1. Makefile-AVR: solve the .siminfo section problem properly.

    After researching this issue for the third time, I finally found
    a proper solution: one can't keep an entire section without
    rewriting the entire link script, but one can keep individual
    symbols. That's what we do now, so we can use --gc-sections when
    linking with SimulAVR support.
    
    The problem came up again because -flto drops unused symbols, too.
    
    This commit changes binary size drastically (1654 bytes less), so
    let's take a new performance measurement snapshot:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  17616 bytes      123%       58%       28%       14%
         Data:   2175 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 218 clock cycles.
      LED on time maximum: 395 clock cycles.
      LED on time average: 249.051 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 438 clock cycles.
      LED on time average: 272.216 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 395 clock cycles.
      LED on time average: 262.572 clock cycles.
    committed Dec 7, 2016
  2. ARM: turn on link time optimisation (LTO).

    Suggested by @Wurstnase. Apparently gcc got better, so it's
    actually an advantage now.
    
    Actually a pretty big advantage. While binary size decreases some
    200 bytes, pulse length of the debug LED is a lot shorter
    (measured on the scope):
    
      without LTO:  4.59 us
      with LTO:     3.65 us
    
    That's a 25% performance increase by just turning on a flag!
    committed Dec 7, 2016
Commits on Dec 6, 2016
  1. dda.c: pretty-format dda_start().

    Formatting was messed up during all the recent changes.
    
    Only whitespace and comment changes, no functional change.
    committed Dec 6, 2016
  2. DDA: revert recent dda_start() changes.

    Neither of them brought a performance improvement, so we revert
    both. The commits as well as the revert are kept to preserve the
    knowledge gained.
    
    This reverts commits
    
      "DDA, dda_start(): use mb_tail_dda directly." and
      "DDA, dda_start(): don't pass mb_tail_dda as parameter."
    
    Performance and binary size are back to what we had before:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19270 bytes      135%       63%       31%       15%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 218 clock cycles.
      LED on time maximum: 395 clock cycles.
      LED on time average: 249.051 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 438 clock cycles.
      LED on time average: 272.216 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 395 clock cycles.
      LED on time average: 262.572 clock cycles.
    committed Dec 6, 2016
  3. DDA, dda_start(): use mb_tail_dda directly.

    Just avoiding passing mb_tail_dda as a parameter didn't work out,
    so how about using it directly? This is what this commit does.
    
    Result: binary size another 32 bytes bigger, slowest step another
    16 clock cycles slower. No dice.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19306 bytes      135%       63%       31%       15%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 218 clock cycles.
      LED on time maximum: 414 clock cycles.
      LED on time average: 249.436 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 457 clock cycles.
      LED on time average: 272.256 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 414 clock cycles.
      LED on time average: 262.595 clock cycles.
    committed Nov 27, 2016
  4. DDA, dda_start(): don't pass mb_tail_dda as parameter.

    Instead, read the global variable directly.
    
    The idea is that reading the global variable directly removes
    the effort to build up a parameter stack, making things faster.
    
    Actually, binary size increases by 4 bytes and the slowest step
    takes 3 clock cycles longer. D'oh.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19274 bytes      135%       63%       31%       15%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 218 clock cycles.
      LED on time maximum: 398 clock cycles.
      LED on time average: 249.111 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 441 clock cycles.
      LED on time average: 272.222 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 398 clock cycles.
      LED on time average: 262.576 clock cycles.
    committed Dec 6, 2016
  5. DDA: avoid looking up the movebuffer array.

    As we have mb_tail_dda now, that's no longer necessary. Using
    something like movebuffer[mb_tail] is more expensive than
    dereferencing mb_tail_dda directly.
    
    This is the first time we see a stepping performance improvement
    since introducing mb_tail_dda. 13 clock cycles faster on the
    slowest step, which is 9 cycles faster than before that
    introduction.
    
    Binary size also a nice 94 bytes down.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19270 bytes      135%       63%       31%       15%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 218 clock cycles.
      LED on time maximum: 395 clock cycles.
      LED on time average: 249.051 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 438 clock cycles.
      LED on time average: 272.216 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 237 clock cycles.
      LED on time maximum: 395 clock cycles.
      LED on time average: 262.572 clock cycles.
    committed Nov 27, 2016
  6. dda_queue.c/.h: eliminate queue_current_movement().

    Again no stepping performance improvement, but another 34 bytes
    off the binary size:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19364 bytes      136%       64%       31%       16%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Nov 27, 2016
  7. dda_queue.c/.h: eliminate queue_empty().

    This is no longer needed, because mb_tail_dda gives the same
    information, just faster. Welcome side effect: better encapsulation.
    
    No stepping performance improvement, but binary size 36 bytes
    smaller:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19398 bytes      136%       64%       31%       16%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Nov 27, 2016
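
How a pointer like mb_tail_dda can stand in for a separate emptiness test might be sketched as follows. This is a simplified host-side model, not the real dda_queue.c: `QUEUE_SIZE`, `mb_count`, `enqueue()` and `finish_current()` are hypothetical, and it assumes the pointer is NULL whenever no movement is queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 4  /* hypothetical size for this sketch */

typedef struct { uint32_t steps_left; } dda_t;  /* trimmed down */

dda_t movebuffer[QUEUE_SIZE];
uint8_t mb_head = 0;   /* next slot to write */
uint8_t mb_tail = 0;   /* slot of the movement being executed */
uint8_t mb_count = 0;  /* hypothetical fill counter */

/* Assumed semantics: points to the current movement, NULL if idle. */
dda_t *mb_tail_dda = NULL;

/* Returns 1 if the movement was queued, 0 if the queue is full. */
int enqueue(uint32_t steps) {
  if (mb_count == QUEUE_SIZE)
    return 0;
  movebuffer[mb_head].steps_left = steps;
  if (mb_count == 0)
    mb_tail_dda = &movebuffer[mb_head];  /* queue was idle, start here */
  mb_head = (uint8_t)((mb_head + 1) % QUEUE_SIZE);
  mb_count++;
  return 1;
}

/* Called when the current movement is done. */
void finish_current(void) {
  assert(mb_tail_dda != NULL);
  mb_count--;
  mb_tail = (uint8_t)((mb_tail + 1) % QUEUE_SIZE);
  mb_tail_dda = mb_count ? &movebuffer[mb_tail] : NULL;
}

/* In the step code, one NULL test replaces queue_empty() plus the
   movebuffer[mb_tail] lookup. */
void step(void) {
  if (mb_tail_dda != NULL && mb_tail_dda->steps_left > 0)
    mb_tail_dda->steps_left--;
}
```

The pointer is updated only where the queue changes, so the frequently run step code gets away with a single comparison and a dereference.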
  8. dda_queue.c/.h: introduce mb_tail_dda.

    For now, this costs 2 bytes RAM, 8 bytes binary size and slows
    down the slowest step by 4 clock cycles. We expect opportunities
    for improvements elsewhere, of course.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19434 bytes      136%       64%       31%       16%
         Data:   2179 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 230 clock cycles.
      LED on time maximum: 407 clock cycles.
      LED on time average: 263.008 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 251 clock cycles.
      LED on time maximum: 450 clock cycles.
      LED on time average: 286.212 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 251 clock cycles.
      LED on time maximum: 407 clock cycles.
      LED on time average: 276.568 clock cycles.
    committed Nov 27, 2016
  9. dda_queue.c: take advantage of a special case.

    No functional change or stepping performance improvement, but a
    14 bytes smaller binary:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19426 bytes      136%       64%       31%       16%
         Data:   2177 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Nov 27, 2016
  10. dda_queue.c: eliminate next_move() entirely.

    All the simplifications before led to a simple three-line
    function, one of which happened to duplicate a line of the calling
    code. Also update comments mentioning this former function.
    
    No stepping performance improvement, but cleaner code and 32 bytes
    less binary size:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19440 bytes      136%       64%       31%       16%
         Data:   2177 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Nov 22, 2016
  11. dda_queue.c: inline a simplified version of next_move().

    As we're in an interrupt already, we can simplify the test for an
    empty queue. Slowest step down to 446 clock cycles, another 26
    ticks less. Binary size only 36 bytes up:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19472 bytes      136%       64%       31%       16%
         Data:   2177 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 226 clock cycles.
      LED on time maximum: 403 clock cycles.
      LED on time average: 262.922 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 251 clock cycles.
      LED on time maximum: 446 clock cycles.
      LED on time average: 286.203 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 251 clock cycles.
      LED on time maximum: 403 clock cycles.
      LED on time average: 276.561 clock cycles.
    committed Nov 22, 2016
Commits on Dec 5, 2016
  1. DDA: don't queue up heater waits.

    Not queuing up waits for the heaters in the movement queue removes
    some code in performance-critical paths. How lucky that we just
    implemented an alternative M116 functionality with the previous
    commit :-)
    
    The slowest step is a nice 29 clock cycles faster and binary
    size decreased by a whopping 472 bytes. That's
    still 210 bytes less than before implementing the alternative
    heater wait.
    
    Best of all, average step time is down some 21 clock cycles, too,
    so we increased general stepping performance by no less than 5%.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19436 bytes      136%       64%       31%       16%
         Data:   2177 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 259 clock cycles.
      LED on time maximum: 429 clock cycles.
      LED on time average: 263.491 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 251 clock cycles.
      LED on time maximum: 472 clock cycles.
      LED on time average: 286.259 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 251 clock cycles.
      LED on time maximum: 429 clock cycles.
      LED on time average: 276.616 clock cycles.
    committed Nov 22, 2016
  2. Make temperature waiting independent from the movement queue.

    The plan is to remove this stuff from the movement queue.
    
    We still accept additional G-code ... until a G0 or G1 appears.
    This allows, for example, homing or reading temperature reports
    while waiting.
    
    Keep messages exactly as they were before, perhaps some Host
    applications try to parse this.
    
    This needs 2 bytes RAM and 138 bytes binary size. Performance is
    unchanged. Let's see how this compares to the size reduction when
    we remove the temperature handling code from the movement queue.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19646 bytes      138%       64%       31%       16%
         Data:   2177 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 280 clock cycles.
      LED on time maximum: 458 clock cycles.
      LED on time average: 284.653 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 501 clock cycles.
      LED on time average: 307.275 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 458 clock cycles.
      LED on time average: 297.625 clock cycles.
    committed Nov 27, 2016
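
The rule "accept additional G-code until a G0 or G1 appears" can be illustrated with a small sketch. It is not the code from gcode_process.c: the flag `wait_for_temp`, the type `gcode_t` and the function `may_execute()` are hypothetical names, and heater state is passed in as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: set by M116, cleared once heaters are on target. */
bool wait_for_temp = false;

typedef struct {
  char letter;     /* 'G' or 'M' */
  uint8_t number;  /* e.g. 1 for G1, 116 for M116 */
} gcode_t;

/* Returns true if the command may run now, false if it has to be
   held back until the heaters have reached their targets. */
bool may_execute(const gcode_t *cmd, bool temps_achieved) {
  assert(cmd);
  if (cmd->letter == 'M' && cmd->number == 116) {
    wait_for_temp = true;  /* M116 only arms the wait and returns */
    return true;
  }
  if (wait_for_temp && temps_achieved)
    wait_for_temp = false;

  bool is_move = cmd->letter == 'G' &&
                 (cmd->number == 0 || cmd->number == 1);
  /* Only G0/G1 are held back; homing, reports etc. pass through. */
  return !(wait_for_temp && is_move);
}
```

In this model the wait lives entirely in the G-code layer, so nothing about it has to be stored in a movement queue entry.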
Commits on Nov 28, 2016
  1. gcode_process.c: remove G30.

    This was "Go home via point". The RepRap community has apparently
    decided on a super complex Z probing command with this number:
    
      http://reprap.org/wiki/G-code#G30:_Single_Z-Probe
    
    This reduces binary size by 18 bytes:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19508 bytes      137%       64%       31%       16%
         Data:   2175 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Nov 28, 2016
Commits on Nov 27, 2016
  1. dda.c: simplify copy of startpoint.

    This reduces binary size by 26 bytes without drawback.
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19526 bytes      137%       64%       31%       16%
         Data:   2175 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    committed Nov 27, 2016
  2. DDA: don't queue up nullmoves.

    Nullmoves are movements which don't actually move a stepper. For
    example because it's a velocity change only or the movement is
    shorter than a single motor step.
    
    Not queueing them up removes the necessity to check for them,
    which reduces code in critical areas. It also removes the
    necessity to run dda_start() twice to get past a nullmove.
    
    Best of all, it also makes lookahead perform better. Before,
    a nullmove just changing speed interrupted the lookahead chain,
    now it no longer does. See straight-speeds.gcode and
    ...-Fsep.gcode, which produced different timings before; now
    the results are identical.
    
    Also update the function description for dda_create().
    
    Performance increase is impressive: another 75 clock cycles off
    the slowest step, only 36 bytes binary size increase:
    
      ATmega sizes               '168   '328(P)   '644(P)     '1280
      Program:  19652 bytes      138%       64%       31%       16%
         Data:   2175 bytes      213%      107%       54%       27%
       EEPROM:     32 bytes        4%        2%        2%        1%
    
      short-moves.gcode statistics:
      LED on occurences: 888.
      LED on time minimum: 280 clock cycles.
      LED on time maximum: 458 clock cycles.
      LED on time average: 284.653 clock cycles.
    
      smooth-curves.gcode statistics:
      LED on occurences: 23648.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 501 clock cycles.
      LED on time average: 307.275 clock cycles.
    
      triangle-odd.gcode statistics:
      LED on occurences: 1636.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 458 clock cycles.
      LED on time average: 297.625 clock cycles.
    
    Performance of straight-speeds{-Fsep}.gcode before:
    
      straight-speeds.gcode statistics:
      LED on occurences: 32000.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 586 clock cycles.
      LED on time average: 298.75 clock cycles.
    
      straight-speeds-Fsep.gcode statistics:
      LED on occurences: 32000.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 672 clock cycles.
      LED on time average: 298.79 clock cycles.
    
    Now:
    
      straight-speeds.gcode statistics:
      LED on occurences: 32000.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 501 clock cycles.
      LED on time average: 298.703 clock cycles.
    
      straight-speeds-Fsep.gcode statistics:
      LED on occurences: 32000.
      LED on time minimum: 272 clock cycles.
      LED on time maximum: 501 clock cycles.
      LED on time average: 298.703 clock cycles.
    
    There we even save 171 clock cycles :-)
    committed Nov 21, 2016
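
The idea of not queueing nullmoves can be sketched like this. It is a simplified model rather than the real dda_create(): positions are kept directly in steps, and the names `target_t`, `prepare_move()` and `AXES` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define AXES 4  /* X, Y, Z, E */

/* Simplification: positions in steps, feedrate F in arbitrary units. */
typedef struct {
  int32_t axis[AXES];
  uint32_t F;
} target_t;

typedef struct { uint32_t total_steps; } dda_t;  /* trimmed down */

/* Fills in dda and returns true if the move should occupy a queue
   slot. For a nullmove only the start point's feedrate is updated
   and false is returned, so the caller does not advance the queue. */
bool prepare_move(dda_t *dda, target_t *startpoint, const target_t *target) {
  assert(dda && startpoint && target);

  uint32_t longest = 0;
  for (int i = 0; i < AXES; i++) {
    uint32_t d = (uint32_t)labs((long)target->axis[i] -
                                (long)startpoint->axis[i]);
    if (d > longest)
      longest = d;
  }
  dda->total_steps = longest;

  if (longest == 0) {
    /* Nullmove: keep the new feedrate, but queue nothing. */
    startpoint->F = target->F;
    return false;
  }
  *startpoint = *target;
  return true;
}
```

Because a speed-only command never reaches the queue in this model, the step code has no nullmove case to check and neighbouring real moves stay adjacent for lookahead.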