
[Support] Still getting pauses on commRefactoring branch #568

Closed
BJClark opened this issue Aug 29, 2014 · 74 comments

BJClark commented Aug 29, 2014

  1. What were you doing?
    Printing.
  2. What did you expect to happen?
    Smooth, continuous printing.
  3. What happened instead?
    Lots of pausing mid-print.
  4. Branch & Commit or Version:
    HEAD on commRefactoring
  5. Printer model & used firmware incl. version
    Raspberry Pi to a Printrbot (Plus v2 with a rev D board) original 2013 firmware
  6. Link to octoprint.log on gist.github.com or pastebin.com:
  7. Link to contents of terminal tab on gist.github.com or pastebin.com:
    https://gist.github.com/BJClark/47740fdaf14f6fdca775

Is this maybe another issue not fixed in the commRefactoring branch? Is there any way to verify in the UI that I'm actually running on the commRefactoring branch? I followed the steps to change branches, but I'm not 100% sure the install worked.

Here is the output (partial, in the middle of a print) with what I'm seeing:

Communication timeout during printing, forcing a line
Send: M105
Communication timeout during printing, forcing a line
Send: M105
Recv: ok T:199.52 B:49.98 @:92
Send: N49513 G1 X58.827 Y169.757 E11.05820*108
Communication timeout during printing, forcing a line
Send: M105
Recv: ok T:199.97 B:49.46 @:84
Send: N49514 G1 X57.929 Y170.031 E11.10783*102
Recv: Error:Line Number is not Last Line Number+1, Last Line:49511
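For context on the log format: the `*108`-style suffix on each numbered line is the standard RepRap/Marlin line checksum, an XOR over every byte of the command before the `*`. A quick sketch to verify the numbers seen in the log:

```python
# Standard RepRap/Marlin line checksum: XOR of all bytes before the '*'.
def checksum(line):
    cs = 0
    for ch in line:
        cs ^= ord(ch)
    return cs & 0xFF

cmd = "N49513 G1 X58.827 Y169.757 E11.05820"
print("%s*%d" % (cmd, checksum(cmd)))  # -> N49513 G1 X58.827 Y169.757 E11.05820*108
```

The firmware recomputes this on its side; a mismatch (or a gap in the N line numbers) triggers the "Line Number is not Last Line Number+1" resend request shown above.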

@GitIssueBot

Hi @BJClark,

It looks like there is some information missing from your ticket that will be needed in order to process it properly. Please take a look at the Contribution Guidelines and the page How to file a bug report on the project wiki, which will tell you exactly what your ticket has to contain in order to be processable.

If you did not intend to report a bug, please take special note of the title format to use as described in the Contribution Guidelines.

I'm marking this one now as needing some more information. Please understand that if you do not provide that information within the next two weeks (until 2014-09-12 20:00) I'll close this ticket so it doesn't clutter the bug tracker.

Best regards,
~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.


BJClark commented Aug 30, 2014

To update this: about 6 hours into the print, the printer is barely moving, spending more time paused than actually extruding.


foosel commented Aug 30, 2014

  1. You didn't read the guidelines completely
  2. You are not running the commRefactoring branch, the "forcing a line" stuff isn't even in that anymore
  3. In fact you are running some comparatively old something, since 3ee745d isn't even in there, which would make sure those M105s at least don't get sent without a line number (which is the reason for the resend directly after each timeout message)

Please make sure you have removed all old versions of OctoPrint (something might be interfering in your case, causing the issues you are observing), try updating again (git pull && git checkout commRefactoring), verify that the version is indeed a recent one (python setup.py version should show something along the lines of 1.2.0-dev-156-g5ffcd78, that version is now also logged in octoprint.log upon startup).

BJClark changed the title from "Still getting pauses on commRefactoring branch" to "[Support] Still getting pauses on commRefactoring branch" Sep 2, 2014

BJClark commented Sep 2, 2014

I've verified I'm now running the new commRefactoring branch as of Sunday (Version: 1.2.0-dev-131-g854387b). I am also still getting communication timeouts.

Serial log: https://gist.github.com/BJClark/059aeca207ec8b6d70a2
octoprint.log: https://gist.github.com/BJClark/43556873ac5ded7e3ad2


foosel commented Sep 3, 2014

Something is still wrong there. Again, the "forcing a line" stuff doesn't exist in the commRefactoring branch, so if you are seeing it (it's in the log again), it's the wrong branch.

Are you running OctoPi by any chance? If so, what does ~/oprint/bin/octoprint --version show you? Otherwise: How did you install OctoPrint and how do you start it?

Additionally, it looks like your problem can be solved by just increasing the communication timeout. Take a look into the Settings, you can change it directly on the Serial page for anything that is not commRefactoring. For the commRefactoring branch you need to edit ~/.octoprint/config.yaml, example:

communication:
  transportOptions:
    timeout:
      communication: 10
      temperature: 10

Please keep in mind that I'm still working on the new branch (although right now I have to do some higher-priority stuff first), so the settings layout might still change a bit.


BJClark commented Sep 3, 2014

Hmmm..

$ ~/oprint/bin/octoprint --version
Traceback (most recent call last):
File "/home/pi/oprint/bin/octoprint", line 5, in <module>
from pkg_resources import load_entry_point
File "/home/pi/oprint/local/lib/python2.7/site-packages/distribute-0.6.24-py2.7.egg/pkg_resources.py", line 2711, in <module>
parse_requirements(requires), Environment()
File "/home/pi/oprint/local/lib/python2.7/site-packages/distribute-0.6.24-py2.7.egg/pkg_resources.py", line 584, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: OctoPrint==1.0.0

So it looks like the setup script isn't working? How is the stuff in ~/Octoprint/ supposed to land in ~/oprint?

Here's the log of running the setup script: https://gist.github.com/BJClark/819c6538991c8d743f32

Also, why does the interface say one thing while the ~/oprint says something else?
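For context on the traceback above (this is generic setuptools behaviour, not OctoPrint-specific): the `~/oprint/bin/octoprint` file is just a small stub that asks pkg_resources to resolve the installed OctoPrint distribution before running it, and it dies exactly like this when that distribution is missing from the virtualenv. A minimal reproduction with a deliberately bogus package name:

```python
import pkg_resources

# Resolving a requirement that isn't installed raises DistributionNotFound,
# which is what the ~/oprint/bin/octoprint stub hits for OctoPrint==1.0.0.
try:
    pkg_resources.require("definitely-not-installed-dummy==1.0.0")
except pkg_resources.DistributionNotFound:
    print("DistributionNotFound")
```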


foosel commented Sep 3, 2014

pi@octopi ~/OctoPrint $ sudo python setup.py install

There's your mistake; compare that to the instructions for updating the installation on OctoPi. You installed OctoPrint globally, but you need to install it into the virtualenv. The init script on OctoPi uses ~/oprint/bin/octoprint, so you were running a vanilla master branch the whole time.

Also, the octoprint executable in the oprint virtualenv apparently doesn't find its installation anymore. Suggestion: first uninstall OctoPrint globally:

sudo pip uninstall OctoPrint

Then reinstall following the guide linked above into the oprint virtualenv (please repull, I just pushed a bunch of updates from the devel branch to keep stuff in sync):

cd
cd ~/OctoPrint/
git pull
git checkout commRefactoring
~/oprint/bin/python setup.py install

After that a

~/oprint/bin/octoprint --version

should hopefully work and print something along the lines of OctoPrint version 1.2.0-dev-198-g48d6a27

Depending on your OctoPi version, you'll either have to reboot to get the daemon running on the new version (OctoPi < 0.9.0, released in June) or issue a sudo service octoprint restart (OctoPi >= 0.9.0, released in June).


BJClark commented Sep 4, 2014

I was able to finally get switched to the commRefactoring branch, but I was unable to test it because the ability to start a print seems to be broken. Clicking the "Print" button gave no response, and logged nothing.

I'm currently running on the devel branch and have updated the timeouts to 10 as suggested, but still get significant pausing towards the end of prints.

Someone on IRC suggested watching top on the Raspberry Pi to see if there was an issue with overloading the Pi, but there doesn't seem to be: load never goes above 0.8 and it's not using any swap.

$ free -m
                     total     used     free   shared  buffers   cached
Mem:                   373      211      162        0       35      114
-/+ buffers/cache:               61      312
Swap:                   99        0       99


foosel commented Sep 4, 2014

Messaged you on IRC, but apparently that didn't reach you...

Anyways, are there any errors in the JS console? Have you tried clearing your browser cache?


BJClark commented Sep 4, 2014

These are the JS warnings I'm getting:

[Log] Drawing layer 0 from 0 to 2615 (current: true) (renderer.js, line 345)
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (None.js, line 0)
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (jquery.min.map, line 0)
[Error] Failed to load resource: the server responded with a status of 404 (Not Found) (underscore-min.map, line 0)
[Log] Drawing layer 0 from 0 to 2615 (current: true) (renderer.js, line 345)

[Error] ReferenceError: Can't find variable: printed
    _onModelLoaded (gcode.js, line 335)
    processMessage (ui.js, line 65)


BJClark commented Sep 15, 2014

Just wanted to check in and see if there's been any progress made on this branch or if there's more I can do to debug this.


foosel commented Sep 15, 2014

Argh, sorry...

The ReferenceError above should be solved (I fixed this in devel and just now merged the changes). Can you check whether this solves your Print button issue?


BJClark commented Sep 15, 2014

It didn't seem to fix my print button issue. I no longer get those errors (in fact, I don't get any errors at all), but the print button still doesn't seem to do anything, while everything else seems to work fine.

I do see a request returning (in what I think is a happy state) but it doesn't cause the file to go to print.

Remote Address:10.1.10.63:80
Request URL:http://octopi.local/api/job
Request Method:POST
Status Code:204 NO CONTENT
Request Headers:
Accept:application/json, text/javascript, */*; q=0.01
Accept-Encoding:gzip,deflate
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Connection:keep-alive
Content-Length:19
Content-Type:application/json; charset=UTF-8
Cookie:session="xxxx"
Host:octopi.local
Origin:http://octopi.local
Pragma:no-cache
Referer:http://octopi.local/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36
X-Api-Key:xxxxx
X-Requested-With:XMLHttpRequest
Request Payload:
{"command": "start"}
Response Headers:
Content-Length:0
Content-Type:text/html; charset=utf-8
Server:TornadoServer/4.0.1
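For reference, the request above can be reproduced outside the browser. This is a sketch using only the standard library (host and API key are placeholders, mirroring the redacted values above); it constructs, but does not send, the same POST:

```python
import json
import urllib.request

# Sketch: the same POST the UI issues when clicking "Print".
# Host and API key below are placeholders, not real values.
req = urllib.request.Request(
    "http://octopi.local/api/job",
    data=json.dumps({"command": "start"}).encode("utf-8"),
    headers={
        "Content-Type": "application/json; charset=UTF-8",
        "X-Api-Key": "xxxxx",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
# A 204 NO CONTENT reply only means the server accepted the command,
# not that the printer actually started a print.
```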


foosel commented Sep 15, 2014

I really don't get it... I just tried it again, it works just fine for me. Only idea I have left is: do you have access control enabled or disabled?

For me it's enabled. I connected to the (virtual; don't have the real one at hand right now) printer, uploaded a file, selected it, clicked print: commands were sent just fine, the Print button changed state, everything :/


BJClark commented Sep 15, 2014

Unfortunately, the logging.yaml file you gave me over IRC causes the server to not start up properly. Not sure what the correct format is.

I do have access control enabled.

I don't think the issue is actually in the javascript since I'm seeing the correct 204 request come back when I click print. It seems like the printer isn't starting the print on the backend after it receives the command. I'm not sure how to debug that since I can't seem to get logging to work properly.


foosel commented Sep 16, 2014

I'm sorry, I made a typo in the original version. Please try the logging.yaml again.


BJClark commented Sep 16, 2014

I was able to get debugging logs, and there didn't seem to be much in there except for a couple things.

  1. Clicking the print button didn't cause any log messages, but I did get messages for every other action and I could still see the web requests going through and an HTTP 204 status come back.
  2. I see lots of log entries like this:
    2014-09-16 20:14:53,626 - octoprint.comm.protocol - WARNING - Printer requested line 981 but no sufficient history is available, can't resend

Another data point to consider: Dattas on IRC also has a Printrbot printer, is experiencing the pausing issue, and switched to the commRefactoring branch. "Print" also doesn't work for them. I'm wondering if this is some kind of issue with the Printrbot and this branch?
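The "no sufficient history" warning in point 2 suggests the firmware asked for a resend of a line that had already fallen out of the host's bounded history of sent lines. A rough sketch of that mechanism (class name, method names, and history size are illustrative, not OctoPrint's actual code):

```python
from collections import deque

class SentHistory:
    """Bounded history of (line number, command) pairs kept for firmware resends."""

    def __init__(self, size=50):
        self._lines = deque(maxlen=size)

    def record(self, number, command):
        self._lines.append((number, command))

    def resend_from(self, number):
        # If the requested line already dropped off the left end, we can't help.
        if not self._lines or self._lines[0][0] > number:
            raise LookupError("no sufficient history is available, can't resend")
        return [cmd for n, cmd in self._lines if n >= number]
```

With a 50-entry history and hundreds of lines sent, a resend request for a much older line is unanswerable, which would produce exactly the WARNING quoted above.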


BJClark commented Sep 16, 2014

Here's my octopi.log: https://gist.github.com/BJClark/01f376d7605cfced28e7
Here's serial.log for the same time period: https://gist.github.com/BJClark/174b5001fae95699c8c9


dattas commented Sep 17, 2014

The problem that I've found so far is that the printer is stuck in the Connected state and never actually makes it to Operational.
https://github.com/foosel/OctoPrint/blob/commRefactoring/src/octoprint/printer.py on line 413 is where it falls through and just blindly returns, because it is not in the Operational state.
When I add this before line 412:

logging.info("self._protocol.get_state: %s"%self._protocol.get_state())

I see this output:

2014-09-16 23:47:42,601 - root - INFO - self._protocol.get_state: Connected

When I throw this into src/octoprint/comm/protocol/__init__.py after line 184

self._logger.info("Changing protocol state from {0} to {1}".format(oldState,self._state))

I see this output:

2014-09-17 00:04:28,684 - octoprint.comm.protocol - INFO - Changing protocol state from Offline to Offline
2014-09-17 00:04:28,691 - octoprint.comm.protocol - INFO - Changing protocol state from Offline to Connecting
2014-09-17 00:04:28,893 - octoprint.comm.protocol - INFO - Changing protocol state from Connecting to Connected

This shows that we never actually get to an Operational state. I'll see if I can dig in some more to help figure out why we never reach it.
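The debug output above boils down to a tiny state machine. A sketch (the logger and state names mirror the log lines; the class shape is otherwise illustrative, not the actual OctoPrint code):

```python
import logging

logging.basicConfig(level=logging.INFO)

class Protocol:
    """Illustrative comm-layer state machine with state-change logging."""

    STATES = ("Offline", "Connecting", "Connected", "Operational")

    def __init__(self):
        self._state = "Offline"
        self._logger = logging.getLogger("octoprint.comm.protocol")

    def change_state(self, new_state):
        old_state, self._state = self._state, new_state
        self._logger.info("Changing protocol state from %s to %s", old_state, new_state)

p = Protocol()
for state in ("Connecting", "Connected"):
    p.change_state(state)
# The bug traced here: the handshake never drives the machine to "Operational",
# so starting a print falls through the is-operational check and returns silently.
```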


foosel commented Sep 17, 2014

Do you both have feature.alwaysSendChecksum enabled in your settings by chance?


BJClark commented Sep 17, 2014

I have tried it with "Send a checksum with every command" turned on and off. I think mine was on to begin with. Doesn't change anything.


BJClark commented Sep 18, 2014

@foosel @dattas

Thanks again for getting the "print" bug worked out.

We were able to make 2 really really nice prints yesterday. However today, we're experiencing lots of problems with octoprint losing the connection with the printer mid print.

https://gist.github.com/BJClark/1d038a5ee37f7ca260be

I did pull the latest code Version: 1.2.0-dev-245-g36c8f25 (commRefactoring branch), but it's still happening there.

[image: img_4202]

Feels like we're so close! The prints are coming out so much better (when they finish).


BJClark commented Oct 3, 2014

@foosel Just wanted to check in and see if there was anything else I could do to help debug this issue.


foosel commented Oct 6, 2014

@BJClark sorry, I'm currently deep-diving into slicing. It looks like something is still going wrong with the timeouts there (it should wait longer than 1s before "giving up after 20 retries"). I think I already mentioned that in the IRC channel a while back too, while we were trying to debug your issue.

I simply can't take a proper look right now though, since I first have to finish something that colleagues are waiting for.

@presslab-us

I have an R-Pi B+ and am trying to improve the speed by optimizing the code.

By playing around with the commRefactoring branch I've managed to utilize the 127 byte input buffer on the motherboard (as opposed to the ping-pong it does now). I've also done a few other things with the send queue. So far I'm seeing up to a 3x speed improvement. There's more testing to be done, but I'll push my changes to my GitHub in the next few days if anyone wants to give it a shot.
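The idea being described can be sketched like this (an assumption-laden illustration, not presslab-us's actual code): track unacknowledged bytes and keep sending while the next command still fits in the firmware's RX buffer, instead of strict send-one/wait-for-ok:

```python
class RxBufferWindow:
    """Track in-flight bytes against the firmware's serial RX buffer size."""

    def __init__(self, buffer_size=127):
        self.buffer_size = buffer_size
        self._inflight = []  # byte counts of commands not yet acknowledged

    def can_send(self, command):
        # +1 accounts for the trailing newline each command carries.
        return sum(self._inflight) + len(command) + 1 <= self.buffer_size

    def sent(self, command):
        self._inflight.append(len(command) + 1)

    def ack_received(self):
        if self._inflight:
            self._inflight.pop(0)
```

Each "ok" frees the oldest command's bytes; with a 127-byte window several short moves can be in flight at once, which is where the speedup over ping-pong comes from.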

@brandonjank

@presslab-us that would be awesome. Managed to get OctoPrint on the Intel Edison working. Would be awesome to see if the single-core 700 can beat the dual 500.


BJClark commented Oct 31, 2014

I'm showing
Version: 1.2.0-dev-320-gd0ff6d1 (commRefactoring-rxcache branch) in the UI. Is that not the latest?


BJClark commented Oct 31, 2014

Seems to be working slightly better than in the video but still lots of errors.
https://gist.github.com/BJClark/e516c88c3728bd82cd36


foosel commented Oct 31, 2014

That looks like the latest. Sorry then, I just fixed an issue where the temperature was not updated correctly for Marlin, and I thought that and your statement that the temp readings were not working correctly correlated ;)


foosel commented Oct 31, 2014

Hm... It would be helpful if the start of the problem was in there too. It probably rolled over just at the wrong moment again. I'd expect there to be some long parts where everything is fine, and then suddenly everything goes wrong and resends start piling up. Or is it like this right from the very beginning?

Btw, for comparison with what I'm seeing: https://www.youtube.com/watch?v=Rc2z53iJuQc


foosel commented Oct 31, 2014

(Triple post, sorry) I just finished printing that one:

So far that model with all its curves and retracts and whatnot was something that regularly slowed down to a crawl. Now, ignore for a second that it looks quite blobby and ribbed: I printed too hot, which is quite visible right at the top, and I still have some stupid z-ribbing issue with that particular printer (also I probably should re-tram the bed and tighten the spectra lines; I've been lugging that thing around on travels for quite some time now without any calibration at all). It finished and it didn't do any weird slowdowns in the middle.

@presslab-us

@BJClark It looks like when it sends multiple commands that the printer reports the wrong line number. Can you check your firmware to see how large the rx cache buffer is? Try setting the buffer size to 63 instead of 127 in config.yaml.


foosel commented Nov 1, 2014

@BJClark it also looks like it's resending repetitions of the same lines over and over again, but with different line numbers, example:

Send: N3192 G1 X87.341 Y87.790 E3.34770*87
Send: N3193 G1 X89.555 Y86.143 E3.39756*91
Recv: Resend: 3191
Recv: ok
Recv: Error:Line Number is not Last Line Number+1, Last Line: 3190
Send: N3191 G1 X87.341 Y87.790 E3.34770*84
Send: N3192 G1 X87.341 Y87.790 E3.34770*87

It appears to jump into the wrong location in the line history for the resends and jumble up things all together.

I saw this a couple of times yesterday before I added a couple of my changes to the branches, but those should already be part of your version (1.2.0-dev-320-gd0ff6d1). Now the only idea I have left right now (besides the suggestion from @presslab-us, but I think that still won't solve the doubled lines and wrong resends) is to stop OctoPrint, do a pip uninstall OctoPrint, then reinstall with python setup.py install (or ~/oprint/bin/pip uninstall OctoPrint and ~/oprint/bin/python setup.py install if you're running OctoPi) to make sure that there are no leftovers interfering (although I wouldn't know how there could be leftovers for the comm layer but not for the version module...) and finally restart.

If that doesn't help we'll have to dive deeper into the communication, but for that we'll need a full serial.log. In order to make sure that it doesn't get truncated again in the middle, create a file ~/.octoprint/logging.yaml (or - if it already exists - extend it) with these contents:

handlers:
  serialFile:
    maxBytes: 104857600 # 100 MB

and then restart. This should set the log file's maximum size to 100MB, which hopefully will be enough (default is 2MB). If you have the hard disk space for this and want to be really sure it's not going to be truncated, you can of course add another 0 to make it 1000MB.

@presslab-us

@foosel Looks like _last_line.append() is not synchronous with the _current_line += 1 increment, they are in different threads. Probably best to move the append to where the increment is, and wrap all access with _line_lock.
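What's being described is a classic check-then-act race between threads. A sketch (with illustrative names, not the actual OctoPrint internals) of doing the increment and the history append atomically under one lock, as suggested:

```python
import threading

class LineNumbering:
    """Assign line numbers and record history atomically so resends stay consistent."""

    def __init__(self):
        self._lock = threading.RLock()
        self._current = 0
        self._history = []

    def send(self, command):
        with self._lock:
            # Increment and append together: no other thread can observe
            # a line number without its matching history entry.
            self._current += 1
            self._history.append((self._current, command))
            return self._current
```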


foosel commented Nov 1, 2014

@presslab-us Meh. The increment happens in the preprocessing method (which you moved outside of the send loop :P), while the appending must happen after the line was actually sent (otherwise everything becomes even more mind-boggling).

Maybe kill two birds with one stone and remove the _last_lines buffer altogether, substituting it with an extended _nack_lines which not only contains the size but also the line number the command was sent as and the whole preprocessed command (and maybe additional information, like a flag along the lines of "part of the print progress", so we can increment the progress there and not when enqueueing)? I'll look into that (I've already got something worked out on the whiteboard actually), but since it's already past 11pm over here I'll probably not whip up something functional today.

PS: I feel stupid for asking this but -- how does one write "enqueueing" properly?


foosel commented Nov 1, 2014

I also just realized that moving the preprocessing outside of the sending queue also broke high-priority commands (because they'll now be sent first but might have a higher line number). Argh. @presslab-us I know you are arguing against rolling the preprocessing back into the sending queue due to performance reasons, but I think we need to think some more before that can really work with the whole feature set. E.g. I want/need to be able to manually emit an M112 (emergency stop) right now and not wait until the current queue is processed (admittedly, manual commands from the user right now don't get sent with high prio, but that's just not implemented yet).
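A sketch of the kind of two-level send queue being discussed (illustrative, not the actual implementation), where an emergency M112 overtakes queued print moves; the catch noted above is that line numbers and checksums must only be assigned at dequeue time, or reordering produces out-of-order line numbers:

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1

class PrioritySendQueue:
    """Commands dequeue by priority first, FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # stable FIFO tiebreaker

    def put(self, command, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._counter), command))

    def get(self):
        return heapq.heappop(self._heap)[2]

q = PrioritySendQueue()
q.put("G1 X10 Y10")
q.put("G1 X20 Y10")
q.put("M112", priority=HIGH)  # emergency stop jumps the queue
print(q.get())  # -> M112
```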

@presslab-us

@foosel The way Repetier-Host handles e-stop is to toggle the DTR line causing a reset of the motherboard.

But yes if changing the order of commands on the fly is needed then the line numbers will need to change as well as the checksum. Might as well move everything back to the _handle_send_queue thread.


BJClark commented Nov 4, 2014

I have a print going right now and so far it's looking really really nice. I assume it's setting the cache size to 63, but I'm not sure. I'm not seeing any resends in the logs either.


foosel commented Nov 4, 2014

Great! I'll need some help with test prints once I find and remove the deadlock I somehow produced yesterday (which I'll hopefully do today).


foosel commented Nov 5, 2014

I just pushed a big bunch of changes to the commRefactoring-rxcache branch. Resends are now handled completely via the nack_lines and some logic/magic; line numbers and checksums are created only within the send loop; gcodes can be processed in three phases: when added to the send queue, when sent over the transport, and when acknowledged. I moved the existing handling of gcodes where necessary for the various purposes they fulfill.

I'm still experiencing short stalls from time to time when actually printing the dragon above (many curves and sudden direction changes, so a lot of traffic on the serial line). I've set the rx buffer size to 63 bytes, since apparently on the Printrboard (AT90USB based) the HardwareSerial module from Arduino is used within Marlin, which defaults to an rx buffer size of 64 bytes (-1 as a safety margin).

I think when it stalls, it does this because the write to serial is actually blocking. I'm not entirely sure how that can happen, since the buffer should never be completely full and hence there should be no blocking. If any of you could try to reproduce this, it would be greatly appreciated. I'm starting to go crazy over here due to this, I can't pinpoint why it's stalling...

foosel added a commit that referenced this issue Nov 6, 2014

foosel commented Nov 6, 2014

Found the reason for the stalling.

For anyone interested, the reason was clearing the clear_to_send flag when a command couldn't be sent due to not fitting in the buffer anymore: because that happened asynchronously, oks that came in between would be swallowed. I've now turned that flag into something like an inverse semaphore (and fixed a couple of other problems I encountered while hunting for that issue). I'd be happy about any reports on how the current state works for others (for me it was a full-out success: no more stalling, instantly visible in the print quality too).
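A sketch of the described fix, under my own naming assumptions (not the actual OctoPrint code): instead of a boolean clear-to-send flag that can be cleared asynchronously (swallowing "ok"s that race in), use a counter where every "ok" deposits one send credit and every send withdraws one:

```python
import threading

class SendBudget:
    """Counting send credits: each firmware 'ok' grants one, each send consumes one."""

    def __init__(self, initial=1):
        self._credits = threading.Semaphore(initial)

    def ok_received(self):
        # release() just increments the counter, so no 'ok' is ever lost,
        # no matter when it arrives relative to the sender.
        self._credits.release()

    def wait_to_send(self, timeout=None):
        return self._credits.acquire(timeout=timeout)

budget = SendBudget(initial=1)
print(budget.wait_to_send(timeout=0))   # True: initial credit available
print(budget.wait_to_send(timeout=0))   # False: no credit until the next 'ok'
budget.ok_received()
print(budget.wait_to_send(timeout=0))   # True again
```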

@presslab-us

@foosel I tested your latest out and it seems to be working fine. I did a performance test on a particular test piece (lots of tight turns, small moves): on your branch it took 62 seconds to get to 1.9 mm height; my old branch took 44 seconds. Both have checksums disabled; with checksums enabled your branch was 71 seconds (I ran this first, before adding the checksum option). I know mine does not support the priority feature, so maybe that's just how it's going to be. Do you think your code could be further optimized?


foosel commented Nov 6, 2014

@presslab-us My guess is that it's not actually the priority queueing but rather the pre/postprocessing. So maybe there's optimization potential there to make it a bit faster (I'm actually not sure how fast reflection in Python is, so maybe something could be done there with some kind of lookup table created in __init__). Might be worth a try. But not today. Nearly 12h of straight coding now, I really need to take a break ;)

@presslab-us

@foosel Your and my version both use the same reflection code, right? So I would think it is something else that causes yours to take 40% longer. It could be a latency issue, but the CPU usage is maxed out, indicating that it's not a blocking problem (I/O wait) but something else. And because it doesn't seem to be a latency problem, I don't think the slowdown is necessarily related to moving the preprocessing into the send thread.

I played with using a regex for the OK response and I saw a 6% performance boost. But that still leaves 34%...
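For reference, this is the kind of precompiled pattern meant (the exact regex is my assumption, not the one actually used): matching Marlin's "ok" replies, with an optional temperature report, in one pass instead of string-splitting per line:

```python
import re

# Assumed pattern for replies like "ok T:199.52 B:49.98 @:92".
OK_LINE = re.compile(
    r"^ok(\s+T:(?P<tool>\d+(?:\.\d+)?))?(\s+B:(?P<bed>\d+(?:\.\d+)?))?"
)

match = OK_LINE.match("ok T:199.52 B:49.98 @:92")
print(match.group("tool"), match.group("bed"))  # -> 199.52 49.98

# A bare acknowledgement still matches, with both groups absent:
print(OK_LINE.match("ok").group("tool"))  # -> None
```

Precompiling once and matching per line avoids repeated parsing work in the hot receive loop, which is plausibly where the measured boost comes from.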


foosel commented Nov 6, 2014

But didn't your code only do the preprocessing before entering the send
loop? Mine also does one of the three phases within the send loop.

@presslab-us

@foosel Yes but the overall CPU usage should be the same when printing; it's processing the same g-codes. My version only sought to reduce latency (and subsequent rxcache underflow) by having the two threads.


foosel commented Nov 7, 2014

@presslab-us please pull. The CPU usage was a good hint (I hadn't taken a look at top yet): it was caused by clear_for_send being decremented at the wrong location, causing unthrottled looping in the send loop while the buffer was full (of course, trying to fix this I reintroduced the stalling issue, but this time I at least knew where to look and it should already be ironed out again).

I also took the liberty to change the fill queue handler in such a way that it now uses a semaphore limited to 20 acquires and an event for signaling state changes to the interesting state. This way the thread can just wait until notified that the current situation has changed for it, instead of doing "if - else - sleep - loop" continuously.

I'm currently seeing CPU usage of 4-5% on the Pi while printing with these changes.

@presslab-us

@foosel I pulled it; it's maybe 1% faster than it was before. In my case the buffer is probably hardly ever full because I'm maxing out the CPU. You need to test it with more of a load. To generate a tricky gcode file, I use Cura and I slice this: http://www.thingiverse.com/thing:99338. My perimeter speeds are 100mm/s. I disable the heaters and extruder. This generates a ton of small moves which easily consumes all the CPU of my Pi. Then I can measure the time it takes, and this gives a relative indication of performance.


foosel commented Nov 8, 2014

@presslab-us I am testing it with a gcode file that previously maxed out the existing code (pretty much not printable with devel and master). Are you measuring this against the real printer (just with the movements) or against the virtual printer? I assume the former, I just want to check. That you are still seeing a maxed-out CPU on the Pi doesn't fit my own observations (after the most recent changes, that is) at all, which is why I'm a bit puzzled.

@presslab-us

@foosel Yes I'm sure your new version is faster than devel, but my whole point is why is my version faster than yours? I am testing with my real printer, yes, with a RAMBo board. I'm purposefully giving it a difficult g-code file to max out the CPU so I can measure a difference in speed.

Not sure what the best way to say this is, but we have different priorities for how we want the software to operate. I prioritize speed and quality over (arguably unnecessary) features. Of course it's your project so you can do it any way you want! Thanks for considering my modifications in any case.


foosel commented Nov 8, 2014

@presslab-us No worries, I got your point, I was just surprised by your results. I sliced the file linked, 0.2mm layer height, no heat up, 100mm/s perimeter speed, removed heatups and extrusions and limited to 5cm height. I "printed" both via a Pi on a Printrbot Simple (which was capped speed wise at under 100mm/s I think) and an Ultimaker. I did not see that max out the CPU. Printing to the Printrbot I got up to just under 50%, for the Ultimaker I maxed out at 70%. So unless I misunderstand you, you are seeing higher CPU loads than me, and that troubles me.

Now, as for how to proceed here, you see it correctly that we have somewhat different priorities. I need to build something here that fulfills certain expectations from the user base - working error correction, reaction to commands sent mid print without too much latency, monitoring, etc (I built on your branch to reintroduce these things, because in your branch a lot of these are broken, e.g. resends, priority queue handling, etc, plus I'm fairly sure you also got that stalling issue in there - yes, I know, you don't use these things really, but others do and I have to think of a wider audience here). I also think that while processing speed is an important factor in host software, there's a limit as to how much one should focus there as long as other options/lower hanging fruit aren't already utilized (like e.g. optimized slicing utilizing G2/G3 or possibly spline interpolation which I saw someone working on, don't know how far that has gone). And there's always the option of attaching a 15€ SD card reader to your board and streaming directly from that (which gets also rid of that horrible serial protocol and the 250k baud bottleneck altogether while still allowing tight monitoring of what's happening with your print).

That being said, if you have further ideas where to optimize, given that basically stuff that works today has to continue to work, I'm all ears, and I'll also merge my version of the -rxcache branch (including your baseline) on the "real" commRefactoring branch once I've taken another look at some issues that are still in there (like peek & popleft in the send queue processing running unsynchronized and hence a command that got added to the queue with high priority could be mistakenly removed from the queue instead of the sent one if the stars are just right). But for now I will not (and honestly can not) spend more time on trying to optimize for the rare (although valid) examples where the code is still too slow (on a Pi) - at least not for now.

Now, the new communication layer - as you might have noticed - is modular, and my next step in getting this stuff ready to be merged onto devel is to make protocol and transport properly configurable. So - no problem really for you to keep using your version, if that's what works best for you! My plan is to make protocol and transport implementations part of the plugin system for exactly these kinds of reasons.


foosel commented Jan 23, 2015

There are more than enough tickets already that are related to the communication stuff, closing this one.

@foosel foosel closed this as completed Jan 23, 2015
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 30, 2020