[Experimental] Direct stepper chunk support #2 #7047

Closed


colinrgodsey
Contributor

@colinrgodsey colinrgodsey commented Jun 13, 2017

Continuation of #7012. Big thanks to @thinkyhead for helping me clean this up. Still more work to do, but it's at least updated, and I did a quick test to make sure it runs 👍

[...]

This feature addition allows an external device to concurrently upload chunks of 1024 steps to the device and to trigger their sequencing with a new G-code command (currently 'C0'). Chunk buffers are updated through a binary protocol whose packets start with the control character '!'. This character is not used elsewhere in G-code (on input, anyway), which allows low-level processing of the serial stream and enables buffering independent of the command parser (all handled in the ISR).
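As a rough illustration of how a '!' control character can split one serial stream into text commands and binary chunk data, here is a minimal sketch. The class name `SerialDemux`, the fixed payload length, and the absence of any header or checksum are all assumptions made for the example; the actual packet layout is not described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical demultiplexer: bytes after '!' are binary payload,
// everything else is ordinary G-code text for the command parser.
class SerialDemux {
 public:
  // payload_size is an assumed fixed packet length, not the real format.
  explicit SerialDemux(std::size_t payload_size) : payload_size_(payload_size) {}

  // Feed one received byte, as a serial receive ISR would.
  void on_byte(uint8_t byte) {
    if (remaining_ > 0) {            // inside a packet: store raw data,
      chunk_.push_back(byte);        // even if the byte happens to be '!'
      --remaining_;
    } else if (byte == '!') {        // control character starts a packet
      chunk_.clear();
      remaining_ = payload_size_;
    } else {                         // plain text goes to the parser side
      text_.push_back(static_cast<char>(byte));
    }
  }

  bool chunk_complete() const {
    return remaining_ == 0 && chunk_.size() == payload_size_;
  }
  const std::vector<uint8_t>& chunk() const { return chunk_; }
  const std::string& text() const { return text_; }

 private:
  std::size_t payload_size_;
  std::size_t remaining_ = 0;
  std::vector<uint8_t> chunk_;
  std::string text_;
};
```

Because the payload is counted rather than scanned, a data byte equal to '!' inside a packet does not start a new packet in this sketch.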

C0 command format:
C0 I[chunk start index] R[number of chunks, defaults to 1] S[steps per second]
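To make the command format concrete, here is a minimal sketch of parsing a C0 line. The names `ChunkCommand` and `parse_c0` are hypothetical and this is not Marlin's parser. Only the default of 1 for R comes from the format above; treating I and S as required is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical holder for the three C0 parameters.
struct ChunkCommand {
  uint16_t start_index = 0;    // I: chunk start index
  uint16_t count = 1;          // R: number of chunks (defaults to 1)
  uint32_t steps_per_sec = 0;  // S: steps per second
};

// Parse a line such as "C0 I3 R2 S25000" into `cmd`.
// Returns false if the line is not C0 or if I or S is missing (assumed required).
bool parse_c0(const std::string& line, ChunkCommand& cmd) {
  if (line.rfind("C0", 0) != 0) return false;               // must start with "C0"
  if (line.size() > 2 && line[2] != ' ') return false;      // reject e.g. "C01"
  ChunkCommand parsed;
  bool has_i = false, has_s = false;
  for (std::size_t pos = 2; pos < line.size();) {
    if (line[pos] == ' ') { ++pos; continue; }
    const char letter = line[pos++];
    const char* begin = line.c_str() + pos;
    char* end = nullptr;
    const unsigned long value = std::strtoul(begin, &end, 10);
    if (end == begin) return false;                          // letter without a number
    pos += static_cast<std::size_t>(end - begin);
    switch (letter) {
      case 'I': parsed.start_index = static_cast<uint16_t>(value); has_i = true; break;
      case 'R': parsed.count = static_cast<uint16_t>(value); break;
      case 'S': parsed.steps_per_sec = static_cast<uint32_t>(value); has_s = true; break;
      default: return false;                                 // unknown parameter letter
    }
  }
  if (!has_i || !has_s) return false;
  cmd = parsed;
  return true;
}
```

For example, `parse_c0("C0 I5 S10000", cmd)` leaves `cmd.count` at its default of 1.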

The chunks are executed by extending the Marlin block format with a field and a flag that make it execute the buffered chunk instead of looking for the normal trapezoid-related parameters. The step speed is configurable. I've had success with 10k-30k steps/s, although 30k seems to starve the temperature ISR, causing runaway errors.

[...]

My external test planner: https://github.com/colinrgodsey/step-daemon

RepRap docs: http://reprap.org/wiki/Stepper_Chunks

More details in #7012.

chunk_response[current_block->chunk_idx] = CHUNK_RESPONSE_NONE;

//keep iterating until we eventually reach our total step_events_completed
current_block->chunk_idx = (current_block->chunk_idx + 1) % (NUM_CHUNK_BUFFERS - 1);
Contributor Author

hmm, I think I confused a modulo and a bitwise &... not sure why I'm subtracting one here.


//loop through every bucket, start 1 after the index of the last response we gave
for(uint8_t i = 1 ; i < (NUM_CHUNK_BUFFERS + 1) ; i++) {
const uint8_t idx = (last_response_index_start + i) % (NUM_CHUNK_BUFFERS - 1);
Contributor Author

hmm same thing here. I should maybe just change these to &'s and force the pow2 requirement.

Member

Yes, simply enforcing powers of 2 will work. Just add the necessary test to SanityCheck.h.
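The difference between the two wrap-around expressions discussed above can be sketched as follows. The value of `NUM_CHUNK_BUFFERS` and the helper names are assumptions for illustration, and `static_assert` stands in for whatever preprocessor test SanityCheck.h would actually use.

```cpp
#include <cassert>
#include <cstdint>

// Assumed buffer count for illustration; the real value is a build setting.
constexpr uint8_t NUM_CHUNK_BUFFERS = 8;

// Compile-time stand-in for the sanity check: the count must be a power of 2.
static_assert(NUM_CHUNK_BUFFERS > 0 &&
              (NUM_CHUNK_BUFFERS & (NUM_CHUNK_BUFFERS - 1)) == 0,
              "NUM_CHUNK_BUFFERS must be a power of 2");

// Wrap with a bitmask. This visits every index 0..N-1, but only
// because N is a power of two, so N-1 is an all-ones mask.
constexpr uint8_t next_chunk_masked(uint8_t idx) {
  return static_cast<uint8_t>((idx + 1) & (NUM_CHUNK_BUFFERS - 1));
}

// The expression from the diff: modulo by (N - 1) can never
// produce index N - 1, so the last buffer is skipped.
constexpr uint8_t next_chunk_modulo_minus_one(uint8_t idx) {
  return static_cast<uint8_t>((idx + 1) % (NUM_CHUNK_BUFFERS - 1));
}
```

With 8 buffers, the masked version advances 6 to 7 and 7 to 0, while the `% (N - 1)` version advances 6 straight to 0. A plain `% NUM_CHUNK_BUFFERS` would also be correct, at the cost of a division on an 8-bit MCU.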

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 6, 2017

Pushed some optimizations and cleanup that have been going well so far.

No sign of the temperature ISR issues anymore. I got rid of the per-tick updates to count_position and now only do them every segment (8 ticks), which cuts the amount of 64-bit long emulation there by a factor of 8. It seems to work really well: the machine sounds a lot more consistent, and 25k steps/s works great.
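The batching idea can be sketched like this. It is not the code from this PR; `AxisState` and `on_tick` are hypothetical names, and the counter width is chosen only for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t TICKS_PER_SEGMENT = 8;

// Hypothetical per-axis state for the sketch.
struct AxisState {
  int32_t count_position = 0;   // wide counter, costly to update on an 8-bit MCU
  int8_t pending = 0;           // cheap 8-bit accumulator for the current segment
  uint8_t tick_in_segment = 0;  // position inside the current segment
};

// Called once per tick with step = -1, 0 or +1 for this axis.
inline void on_tick(AxisState& axis, int8_t step) {
  assert(step >= -1 && step <= 1);
  axis.pending = static_cast<int8_t>(axis.pending + step);
  if (++axis.tick_in_segment == TICKS_PER_SEGMENT) {
    axis.count_position += axis.pending;  // one wide add per 8 ticks
    axis.pending = 0;
    axis.tick_in_segment = 0;
  }
}
```

The trade-off is that `count_position` lags the true position by up to one segment between updates.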

Finishing up my first full test print with OctoPrint + StepD finally (and these new Marlin changes). Frog model at 120mm/s with tons of curves, curved surface infill too. Constant 25k steps/s (haven't tried 30k again). 500 kbps USB (not really needed, but works fine). ATmega2560 @ 16 MHz (MKS Base v1.4).

Zero stutter from OctoPrint on this model, which would normally stutter. I'm posting the video mostly as some "proof" that it actually functions, but also to showcase the full range of noise. There's one wave table (or combo?) that seems to produce an interesting sound, but for the most part my retraction and z-hop cause the most noise, which they always have.

https://youtu.be/vQWB500AFjg

[Images: frog1, frog2]

* producing and triggering direct stepper chunks using the
* described protocol:
*
* <!insert links to docs here!>
Contributor Author

@colinrgodsey colinrgodsey Jul 7, 2017

@thinkyhead do you know where would be a good place to write a draft specification for this support? I'll have to dig around the RepRap wiki and see if there's a good dumping ground somewhere, but that might not be the right place...

EDIT: actually, that looks like it might be the right place

Member

I agree. Use the RepRap wiki. All kinds of draft specifications there!

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 7, 2017

Also just finished my first print with OctoPrint + Marlin + StepD all on the RPi 3. Works great, zero stutter. Both services take less than 10% total CPU combined on the device.

@Roxy-3D
Member

Roxy-3D commented Jul 10, 2017

And... does this mean all current GCodes and MCodes need to give up any 'C' parameters they have?
After all... I was told again and again that GCode can not have ANY REPETITION of a letter on a given line.

 // Handle a known C, G, M, or T
 switch (parser.command_letter) {
+
+    #if ENABLED(CHUNK_SUPPORT)
+
+      case 'C': switch (parser.codenum) {
+        case 0:
+          gcode_C0();
+          break;
+      }
+      break;
+    #endif // CHUNK_SUPPORT
+
 case 'G': switch (parser.codenum) {

@colinrgodsey
Contributor Author

@Roxy-3D ahhh, OK, I didn't really think about that. It probably makes great sense for simplifying parsers and such. It can be another G-code.

@Roxy-3D
Member

Roxy-3D commented Jul 10, 2017

I actually don't really care... G29 and G26 are not going to give up the use of their 'C' parameter. But it seems a fair amount of the arguments against me using G and M as easy to remember parameters just got thrown in the trash heap.

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 10, 2017

hah, yeah, I feel like that's a whole debate I should just stay out of ;) Either way, it's easily changed. I'll see if any other input trickles in. I honestly expected that to be the first point of contention.

EDIT: @Roxy-3D hah, just noticed the firestorm going on in #7131. Yeah, G-code is pretty janky.

@thinkyhead
Member

thinkyhead commented Jul 10, 2017

@colinrgodsey Given that this is a "proprietary extension" that you might say "bypasses" G-code, and is not widely supported (yet) by slicers and hosts, I'm not worried about the extension of G-code for this specific feature while it's being evaluated. However, I agree with Roxy that we should pick wisely. So, I propose for this extension the initial code "letter" should not be a letter at all, but we should use another character, such as one of these: !@#$%

As we are past the 1.1 release and have mailed that package, I feel that Marlin 2.0.x can be more "outside the box" with its protocols. I think that insofar as we have a G-Code protocol, it should try to adhere to the CNC methodologies. But, for binary, compressed, or otherwise optimized protocols like Stepper Chunks, we should feel more free to experiment and settle on what works best technically.

Obviously, we can provide pre-processor scripts to convert normal G-code into chunks or binary, or any mixture…

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 10, 2017

Small update from yesterday. Did a test print at 30k ticks/steps per second all from the RPi, worked great, and no sign of temperature ISR issues. Today I'll make an attempt at 40k, but mostly just for fun.

Also wanted to showcase some of how step-daemon was working on the RPi 3 alongside OctoPrint:

[Image: stepd top]

So that's 30k ticks/s, with full per-segment bed leveling. It's really not using a ton of CPU, so there's still plenty of headroom for timelapses or whatnot, and the multi-threaded pipeline for step-daemon happily chugs away on all cores.

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 10, 2017

@thinkyhead I'm already using '!' as a submission control character, suppose I could use any of those other ones for 'triggering' the buffers. Hmm and yea, I agree you would never see this protocol in any g-code file, nor would you be able to really trigger it manually.

Not a huge fan of the "proprietary" description though, although it is definitely a non-standard transmission protocol. I'd love for this to be used... wherever, and I'm hoping to get a full proposal up on RepRap in the next day or two. Just want to clarify that I don't benefit from this, lol. This doesn't seem like an area of machining where people go to get rich ;)

EDIT: sorry, reflecting too heavily on @bobc 's comments on the first PR

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 10, 2017

@thinkyhead ah, just saw your edits. Yeah, this extension could technically be done entirely from the file. I was fiddling around with a web version of step-d that could do all the conversion just from the browser (normal JS app). It would be easy to have it write back to a file, but there are some other semantics I need to think over.

Marlin 2.x makes sense, definitely interested in taking a more technical deep dive with whoever wants to get involved. I've been really pleased with the results from this proof of concept so far, but there's no real rush or demand for this yet. If step daemon (or some other external planner or preprocessor) really showcases its worth then there might be, but that's not realistically going to happen soon without some sort of solid firmware support and candidate spec that's beneficial on a more fundamental level.

But at this point, I've at least gotten over my OctoPrint stutter and can print reliably at 120mm/s (at 30k/s). So I'm happy for now.

@colinrgodsey colinrgodsey changed the title from "[Expo] Direct stepper chunk support #2" to "[Experimental] Direct stepper chunk support #2" on Jul 10, 2017
@colinrgodsey
Contributor Author

Started a RepRap article for the actual protocol, which could be a good place for high-level protocol talk: http://reprap.org/wiki/Stepper_Chunks

@davidelang

arriving at the discussion a bit late, but I've read through the docs linked, and skimmed through the prior issue. I've also been looking at how grbl and maslow handle motion.

I think that wave tables are a bad option for the communication. They are going to be large, and one of the limiting factors is the communication speed (serial port).

Thinking about the requirements to define the motion, grbl has good information about the output of their planner, look at around line 172 at https://github.com/gnea/grbl/blob/master/grbl/stepper.c

They break things down to ramp-up, constant speed, ramp-down sections, as the output of their planner. That seems like a MUCH more compact representation than wave tables, while still doing all the hard math on a faster external system.

@davidelang

Also, if you are going to do an external planner, you should also figure that you can have more than one board driving things.

This means that you need a way to sync multiple boards together (since you cannot simultaneously transmit a 'go' signal via serial or network).

I would suggest something like what I2C and USB use: have a common line that has a pull-down (or pull-up) resistor on it, and have each device on the bus pull it the other direction shortly after it starts a sequence. This allows you to have a 'sync/start' command that releases the local pull-up on the line and waits for the line to go low. Once all devices have hit the sync/start command, and the master has also released the line, it will drop and all boards can start simultaneously.
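The shared-line barrier described above can be modeled in a few lines. This is only a software simulation of the idea; the class `SyncLine` is hypothetical, and the electrical polarity (which direction counts as "held") is left abstract rather than matched to a particular wiring.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulated shared line: it stays in the "held" state while at least one
// device is still holding it, and changes state only when all have released.
class SyncLine {
 public:
  explicit SyncLine(std::size_t devices) : holding_(devices, true) {}

  // A device calls this when it reaches its sync/start command.
  void release(std::size_t device) {
    assert(device < holding_.size());
    holding_[device] = false;
  }

  // True once every device (and the master, if counted) has released.
  bool all_released() const {
    for (bool held : holding_) {
      if (held) return false;
    }
    return true;
  }

 private:
  std::vector<bool> holding_;
};
```

Each board would poll (or take an interrupt on) the line after releasing it and start its queued motion when `all_released()` becomes true, so the start is triggered by the slowest board.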

As long as the motion commands sent to the various boards are supposed to take the same amount of time, and you put a sync in every few seconds, the drift in time between the different boards can be kept to an acceptable level.

If we are going to define a new protocol for controlling CNC machines, let's make it something that can control any type of machine with any number of axes rather than something limited to 3D printing with 1-2 extruders. Let's design something that would support, for example, having the 4-filament extruder on one controller board with a 4+ axis motion control system on another controller board, or a 5-axis milling machine with another axis or two to control tool selection.

@colinrgodsey
Contributor Author

colinrgodsey commented Jul 11, 2017

@davidelang ah, so the wave tables for the CH-4x4-128 format are actually fixed tables; they never get sent over the wire. They're totally optional, but for that particular format the wave table is rather small (128 bytes) and is used just to avoid doing the Bresenham line math in the stepper ISR. For other formats, the wave tables might not make sense or might not have any use (with 1-tick segment encodings, for example, there are no lines to interpolate over). I might also go back to looking at Bresenham lines for this format, but I'm not sure I can do it with pure 8-bit math (the stepper ISR has to be really lean to keep up with the constant high tick rate; 30k/s is the highest I've gone so far). The chunk format should have a constant data transmission rate for a given tick rate (unless we start talking about compression, which is something to discuss down the road). So for 30k ticks/s, there's a constant 30 chunks/s, about 61kbps over serial.
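The data-rate figures can be checked with a little arithmetic. The struct and function names below are hypothetical; the format parameters (4-bit segments, 4 motors, 128 segments, 8 ticks per segment) are the ones given in this thread. The sketch counts payload bits only, so any packet header, checksum, or serial framing overhead is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a chunk format such as CH-4x4-128.
struct ChunkFormat {
  uint32_t bits_per_segment;
  uint32_t motors;
  uint32_t segments_per_chunk;
  uint32_t ticks_per_segment;
};

constexpr ChunkFormat CH_4x4_128{4, 4, 128, 8};

// Payload size of one chunk in bytes.
constexpr uint32_t chunk_payload_bytes(const ChunkFormat& f) {
  return f.bits_per_segment * f.motors * f.segments_per_chunk / 8;
}

// Number of stepper ticks covered by one chunk.
constexpr uint32_t ticks_per_chunk(const ChunkFormat& f) {
  return f.segments_per_chunk * f.ticks_per_segment;
}

// Chunks consumed per second at a given tick rate (ticks/s).
constexpr double chunks_per_second(const ChunkFormat& f, double tick_rate) {
  return tick_rate / ticks_per_chunk(f);
}

// Payload-only data rate in bits per second.
constexpr double payload_bits_per_second(const ChunkFormat& f, double tick_rate) {
  return chunks_per_second(f, tick_rate) * chunk_payload_bytes(f) * 8.0;
}
```

For CH-4x4-128 this gives 256 bytes and 1024 ticks per chunk, so 30k ticks/s consumes roughly 29.3 chunks/s, or 60,000 payload bits per second. That is close to the quoted 61kbps; the remainder is presumably per-packet overhead.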

That encoding basically breaks all moves down into 8-tick 'lines' or segments. So each 4 bits of the encoding describes a line that's either... +1 step over 8 ticks, -2 steps over 8 ticks, etc., up to +/-7 steps over 8 ticks. In other words, it describes the velocity for that stepper for those 8 ticks. The wave tables are there in that format just for a quick lookup so the device knows how to distribute the... +3 steps over 8 ticks, etc.
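Here is a minimal sketch of how a 4-bit segment value could be turned into per-tick steps with a small lookup table. Several things are assumptions made for the example: the offset-by-7 nibble encoding, the table layout (one 8-bit tick mask per step magnitude, so 8 bytes rather than the 128-byte table mentioned earlier), and all the names. The real CH-4x4-128 table layout is not described in this thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t TICKS_PER_SEGMENT = 8;

// Assumed encoding: stored nibble 0..14 maps to -7..+7 steps per segment.
// The value 15 is treated as unused in this sketch.
inline int8_t decode_nibble(uint8_t nibble) {
  return static_cast<int8_t>((nibble & 0x0F) - 7);
}

// Spread `steps` (0..7) evenly over 8 ticks with a Bresenham-style error
// accumulator. Bit t of the result is set if a step fires on tick t.
inline uint8_t build_wave(uint8_t steps) {
  uint8_t mask = 0, error = 0;
  for (uint8_t t = 0; t < TICKS_PER_SEGMENT; ++t) {
    error = static_cast<uint8_t>(error + steps);
    if (error >= TICKS_PER_SEGMENT) {
      error = static_cast<uint8_t>(error - TICKS_PER_SEGMENT);
      mask = static_cast<uint8_t>(mask | (1u << t));
    }
  }
  return mask;
}

// Precomputed table so the per-tick lookup needs no line math.
struct WaveTable {
  uint8_t mask[8];  // index = step magnitude 0..7
  WaveTable() {
    for (uint8_t s = 0; s < 8; ++s) mask[s] = build_wave(s);
  }
};

// Returns -1, 0 or +1: the step one motor takes on the given tick.
inline int8_t step_for_tick(const WaveTable& table, uint8_t nibble, uint8_t tick) {
  assert(tick < TICKS_PER_SEGMENT);
  const int8_t velocity = decode_nibble(nibble);
  const uint8_t magnitude = static_cast<uint8_t>(velocity < 0 ? -velocity : velocity);
  const bool fires = ((table.mask[magnitude & 7] >> tick) & 1u) != 0;
  return fires ? static_cast<int8_t>(velocity < 0 ? -1 : 1) : static_cast<int8_t>(0);
}
```

Under these assumptions, a segment of +3 steps fires on ticks 2, 5 and 7, and summing `step_for_tick` over the 8 ticks of a segment returns the decoded velocity.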

Transmitting pure "GRBL style" trapezoids (physics-adjusted movements) is definitely something I considered; in fact, I always wondered why there wasn't just an extension to G0 commands that let you do that already. In this case, we could save total machine cycles by limiting the amount of planning math, but the stepper ISR would still be just as busy as it was (preventing higher consistent tick rates). Plus, with the chunk format you're able to do more non-trapezoidal work externally, like advanced bed leveling or linear advance, where you need influence at a per-step level (technically per 8 ticks with CH-4x4-128) and not just per trapezoid. For example, step-daemon does bed leveling per segment (8 ticks) vs. per trapezoid like it's generally done on the device. One reason I started on all this was that I wanted to do a more 'realistic' simulation of linear advance that would be basically impossible on an 8-bit machine.

The multi-device thing is very interesting, haven't really put much thought into it, but you're definitely right. I think there should be some way for controlling multiple devices and syncing them, which should be very possible with the external planner.

As far as multiple steppers go, the formats should have a pretty standard way of modifying them, plus there should be a few "standards" that are optimized for various machines. So CH-4x4-128 (4-bit segment, 4 motors, 128 segments) could rather easily be expanded out to CH-4x8-64 (4-bit segment, 8 motors, 64 segments) or some other variant. The one I provided is an "optimized 4 motor format for 8-bit control boards". The format, chunk size, wave tables, etc. were all done with 8-bit optimization, 4 motors, and a high tick rate in mind. But there should definitely be some standards for other cases. So, for example, with Marlin you would want to compile the firmware with the format that works best for your configuration.

One example I can throw out there is webcams. Webcams support certain color spaces and encodings for their video output. Most webcam host software is generic enough to support most or all of these formats and color spaces. The device itself however is normally very limited, and generally only supports the formats etc that work best with the hardware, while maintaining a set quality and data rate.

So what I basically see happening, is that your host device can probably support any format out of the box, at any time. But the control device itself will probably be limited to just a few formats that the hardware can handle, and depending on desired tick-rate, motors, etc.

EDIT: I also understand this is a hard departure from GRBL style trapezoids. GRBL is based around moving in fixed-speed lines, but adjusting the time domain (tick rate) to simulate acceleration. This implementation flips that and keeps a constant tick rate but modifies the segments (essentially velocity) to simulate the same physics. At this point I can confirm the physics are just as accurate either way, while maybe reducing total noise with the high-rate chunk format.
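The "constant tick rate, varying segment velocity" idea can be illustrated from the planner side. This is a sketch under stated assumptions, not step-daemon's algorithm: the function name `ramp_to_segments` is hypothetical, velocity is expressed in steps per tick, acceleration in steps per tick squared, and each segment covers 8 ticks with at most 7 steps in either direction.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert a constant-acceleration move into per-segment step counts.
// Each entry is the signed number of steps to take during one 8-tick segment.
std::vector<int> ramp_to_segments(double v0, double accel, int segments) {
  std::vector<int> out;
  long previous = 0;  // steps emitted so far
  for (int i = 1; i <= segments; ++i) {
    const double t = 8.0 * i;  // ticks elapsed at the end of segment i
    // Ideal position from the usual kinematics, rounded to whole steps.
    const long target = std::lround(v0 * t + 0.5 * accel * t * t);
    const int delta = static_cast<int>(target - previous);
    assert(delta >= -7 && delta <= 7);  // must fit the 4-bit segment range
    out.push_back(delta);
    previous = target;
  }
  return out;
}
```

Because each segment is derived from the rounded absolute position, rounding error does not accumulate across segments. Starting from rest with an acceleration of 1/32 steps per tick squared, the first four segments come out as 1, 3, 5 and 7 steps, while a constant 0.375 steps per tick yields 3 steps in every segment.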

@davidelang

OK, I don't understand what the wave tables are. I thought they were the specific step pattern to be used to drive the steppers for all axes at once. That would require changing the wave table every time the relationship between the different axes changes.

So if your method of syncing the different axes is not to change the wave tables to reflect the different speeds of each axis, how do you sync them?

@thinkyhead
Member

At this point Marlin 1.1.x is end-of-life, so this PR will need to be re-done starting from bugfix-2.0.x and targeting that branch. Sorry for the inconvenience!

@thinkyhead thinkyhead closed this Oct 24, 2018
@GMagician
Contributor

I'm wondering why this has been lost... It was a good idea, helpful for laser systems. I don't need it, but I think Marlin lost a good feature.

This reminds me of a famous line in Blade Runner:

[cut]. All those moments will be lost in time, like tears in rain. Time to die.

@thinkyhead
Member

There are not too many changes here. Bringing them over to 2.0.x should only take a day. The final PR would have to be accompanied by a pre-processor to convert G-code to C-code for various kinematics and planner settings.

As for laser systems, I'm working on code to change PWM DC faster and in sync with the stepper ISR which should help a lot. After that, the laser raster buffer feature of certain other firmwares also needs to be ported over so that we can do raster scaling and mirroring on the fly and other cool laser tricks.

@colinrgodsey
Contributor Author

revived this in #17853
