[Support] Still getting pauses on commRefactoring branch #568
Comments
Hi @BJClark, It looks like there is some information missing from your ticket that will be needed in order to process it properly. Please take a look at the Contribution Guidelines and the page How to file a bug report on the project wiki, which will tell you exactly what your ticket has to contain in order to be processable. If you did not intend to report a bug, please take special note of the title format to use as described in the Contribution Guidelines. I'm marking this one now as needing some more information. Please understand that if you do not provide that information within the next two weeks (until 2014-09-12 20:00) I'll close this ticket so it doesn't clutter the bug tracker. Best regards, PS: I'm just an automated script, not a human being. |
To update this, about 6 hours into the print, the printer is barely moving, spending more time paused than actually extruding. |
Please make sure you have removed all old versions of OctoPrint (something might be interfering in your case, causing the issues you are observing), try updating again ( |
I've verified I'm now running the new commRefactoring branch as of Sunday (Version: 1.2.0-dev-131-g854387b). I am also still getting communication timeouts. Serial log: https://gist.github.com/BJClark/059aeca207ec8b6d70a2 |
Something is still wrong there, again, the "forcing a line" stuff doesn't exist in the Are you running OctoPi by any chance? If so, what does Additionally, it looks like your problem can be solved by just increasing the communication timeout. Take a look into the Settings, you can change it directly on the Serial page for anything that is not
Please keep in mind that I'm still working on the new branch, although right now I have to do some higher prioritized stuff first, the settings layout might still change a bit. |
Hmmm.. $ ~/oprint/bin/octoprint --version So it looks like the setup script isn't working? How is the stuff in ~/Octoprint/ supposed to land in ~/oprint? Here's the log of running the setup script: https://gist.github.com/BJClark/819c6538991c8d743f32 Also, why does the interface say one thing while the ~/oprint says something else? |
There's your mistake, compare that to the instructions for updating the installation on OctoPi. You just installed OctoPrint globally, you need to install it into the virtualenv though. The init script on OctoPi is using Also the
Then reinstall following the guide linked above into the
After that a
should hopefully work and print something along the lines of Depending on your OctoPi version, you'll either have to reboot to get the daemon running on the new version (OctoPi < 0.9.0 released in June) or issue a |
I was able to finally get switched to the commRefactoring branch, but I was unable to test it because the ability to start a print seems to be broken. Clicking the "Print" button gave no response, and logged nothing. I'm currently running on the devel branch and have updated the timeouts to 10 as suggested, but still get significant pausing towards the end of prints. Someone on IRC suggested watching top on the Raspberry Pi to see if there was an issue with overloading the Pi, but there doesn't seem to be. Load never goes above 0.8 and it's not using any swap. $ free -m |
Messaged you on IRC, but apparently that didn't reach you... Anyways, are there any errors in the JS console? Have you tried clearing your browser cache? |
These are the JS warnings I'm getting:
|
Just wanted to check in and see if there's been any progress made on this branch or if there's more I can do to debug this. |
Argh, sorry... The ReferenceError above should be solved (I fixed this in |
It didn't seem to fix my print button issues. I no longer get those errors, in fact I don't get any errors, but the print button still doesn't seem to do anything while everything else seems to work fine. I do see a request returning (in what I think is a happy state) but it doesn't cause the file to go to print.
|
I really don't get it... I just tried it again, it works just fine for me. Only idea I have left is: do you have access control enabled or disabled? For me it's enabled, I connected to the (virtual, don't have the real one at hand right now) printer, uploaded a file, selected it, clicked print, commands were sent just fine, Print button changed state, everything :/ |
Unfortunately, the logging.yaml file you gave me over IRC causes the server to not start up properly. Not sure what the correct format is. I do have access control enabled. I don't think the issue is actually in the javascript since I'm seeing the correct 204 request come back when I click print. It seems like the printer isn't starting the print on the backend after it receives the command. I'm not sure how to debug that since I can't seem to get logging to work properly. |
I'm sorry, I made a typo in the original version, try the logging.yaml again please |
I was able to get debugging logs, and there didn't seem to be much in there except for a couple things.
Another data point to consider is that Dattas on IRC also has a Printrbot printer, is experiencing the pausing issue, and switched to the commRefactoring branch. "Print" also doesn't work for them. I'm wondering if this isn't some kind of issue with the Printrbot and this branch? |
Here's my octopi.log: https://gist.github.com/BJClark/01f376d7605cfced28e7 |
The problem that I've found so far is the printer is stuck in connected mode and never actually makes it to operational logging.info("self._protocol.get_state: %s"%self._protocol.get_state()) I see this output:
When I throw this into self._logger.info("Changing protocol state from {0} to {1}".format(oldState,self._state)) I see this output:
Showing that we never actually get to an Operational state. I'll see if I can do some more to help out as to why we never get to this operational state. |
Do you both have |
I have tried it with "Send a checksum with every command" turned on and off. I think mine was on to begin with. Doesn't change anything. |
Thanks again for getting the "print" bug worked out. We were able to make 2 really really nice prints yesterday. However today, we're experiencing lots of problems with octoprint losing the connection with the printer mid print. https://gist.github.com/BJClark/1d038a5ee37f7ca260be I did pull the latest code Version: 1.2.0-dev-245-g36c8f25 (commRefactoring branch), but it's still happening there. Feels like we're so close! The prints are coming out so much better (when they finish). |
@foosel Just wanted to check in and see if there was anything else I could do to help debug this issue. |
@BJClark sorry, currently deep diving into slicing. It looks like something is still going wrong with the timeouts there (it should wait longer than 1s before "giving up after 20 retries"), I think I already mentioned that in the IRC channel a while back too while we were trying to debug your issue. I simply can't take a proper look right now though since I first have to finish something that colleagues are waiting for. |
I have an R-Pi B+ and am trying to improve the speed by optimizing the code. By playing around with the commRefactoring branch I've managed to utilize the 127 byte input buffer on the motherboard (as opposed to the ping-pong it is now). I've also done a few other things with the send queue. So far I'm seeing up to 3x speed improvement. There's more testing to be done, but I'll push my changes to my github in the next few days if anyone wants to give it a shot. |
@presslab-us that would be awesome. Managed to get octoprint on the intel edison working. Would be awesome to see if the single core 700 can beat the dual 500. |
I'm showing |
Seems to be working slightly better than in the video but still lots of errors. |
That looks like the latest. Sorry then, I just fixed an issue where the temperature was not updated correctly for Marlin and I thought that and your statement that the temp readings were not working correctly correlated ;) |
Hm... It would be helpful if the start of the problem was in there too. Probably it rolled over just at the wrong moment again. I'd expect there to be some long parts where everything is fine, and then suddenly everything goes wrong and resends start piling up. Or is it like this right from the very very beginning? Btw, for comparison with what I'm seeing: https://www.youtube.com/watch?v=Rc2z53iJuQc |
(Triple post, sorry) I just finished printing that one: So far that model with all its curves and retracts and what not was something that regularly slowed down to a crawl. Now, ignore for a second that he looks quite blobby and ribbed, I printed too hot which is quite visible right at the top and I still have some stupid z-ribbing issue with that particular printer (also I probably should re-tram the bed and tighten the spectra lines, I've been lugging that thing around on travels for quite some time now without any calibration at all). It finished and it didn't do any weird slow downs in the middle. |
@BJClark It looks like when it sends multiple commands that the printer reports the wrong line number. Can you check your firmware to see how large the rx cache buffer is? Try setting the buffer size to 63 instead of 127 in |
@BJClark it also looks like it's resending repetitions of the same lines over and over again, but with different line numbers, example:
It appears to jump into the wrong location in the line history for the resends and jumble up things all together. I saw this a couple of times yesterday before I added a couple of my changes to the branches, but those should already be part of your version (1.2.0-dev-320-gd0ff6d1). Now the only idea I have left right now (besides the suggestion from @presslab-us, but I think that still won't solve the doubled lines and wrong resends) is to stop OctoPrint, do a If that doesn't help we'll have to dive deeper into the communication but for that we'll need a full
and then restart. This should set the log file's maximum size to 100MB, which hopefully will be enough (default is 2MB). If you have the hard disk space for this and want to be really sure it's not going to be truncated you can of course add another 0 to make it 1000MB. |
@foosel Looks like |
@presslab-us Meh. The increment happens in the preprocessing method (which you moved outside of the send loop :P), the appending must happen after the line was actually sent (otherwise everything becomes even more mind boggling). Maybe kill two birds with one stone and remove the PS: I feel stupid for asking this but -- how does one write "enqueueing" properly? |
I also just realized that moving the preprocessing outside of the sending queue also broke high prioritized commands (because they'll now be sent first but might have a higher line-number). Argh. @presslab-us I know you are arguing against rolling the preprocessing back into the sending queue due to performance reasons, but I think we need to think some more before that really can work with the whole feature set. E.g. I want/need to be able to manually emit an |
@foosel The way Repetier-Host handles e-stop is to toggle the DTR line causing a reset of the motherboard. But yes if changing the order of commands on the fly is needed then the line numbers will need to change as well as the checksum. Might as well move everything back to the |
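The ordering problem discussed in the last few comments can be illustrated with a small sketch. It assumes a hypothetical `SendQueue` class (not OctoPrint's actual code) in which line numbers and checksums are assigned only at the moment a command leaves the queue, and the history used for resends is filled only after that. `M112` (Marlin's emergency stop) is used purely as an example of an urgent command.

```python
import itertools
import queue


class SendQueue:
    """Hypothetical send queue: numbering happens at dequeue time."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._order = itertools.count()  # keeps FIFO order within a priority
        self._line_number = 0
        self.history = {}                # line number -> framed line, for resends

    def enqueue(self, command, high_priority=False):
        priority = 0 if high_priority else 1
        self._queue.put((priority, next(self._order), command))

    def next_framed_line(self):
        """Pop the next command and frame it as N<line> <command>*<checksum>."""
        _, _, command = self._queue.get_nowait()
        self._line_number += 1
        payload = f"N{self._line_number} {command}"
        checksum = 0
        for byte in payload.encode("ascii"):
            checksum ^= byte             # XOR of every byte before the '*'
        framed = f"{payload}*{checksum}"
        # Record the line only once it is about to go out on the wire.
        self.history[self._line_number] = framed
        return framed


q = SendQueue()
q.enqueue("G1 X1")
q.enqueue("G1 X2")
q.enqueue("M112", high_priority=True)
print(q.next_framed_line())  # the urgent command jumps ahead and still gets N1
```

Because the urgent command is numbered when it is dequeued, it can overtake queued moves without producing a gap or an out-of-order line number on the firmware side. The cost is that framing runs inside the send path, which is the performance trade-off argued about above.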
I have a print going right now and so far it's looking really really nice. I assume it's setting the cache size to 63, but I'm not sure. I'm not seeing any resends in the logs either. |
Great! I'll need some help with test prints once I find and remove the dead |
I just pushed a big bunch of changes to the I'm still experiencing short stalls from time to time when actually printing the dragon above (many curves and sudden direction changes, so a lot of traffic on the serial line). I've set the rx buffer size to 63 bytes, since apparently on the Printrboard (AT90USB based) the HardwareSerial module from Arduino is used within Marlin which defaults to an rx buffer size of 64 bytes (-1 as a safety margin). I think when it stalls, it does this because the write to serial is actually blocking. I'm not entirely sure how that can happen, since the buffer should never be completely full and hence there should be no blocking. If any of you could try to reproduce this, it would be greatly appreciated. I'm starting to go crazy over here due to this, I can't pinpoint why it's stalling... |
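For readers following along, the idea of filling the firmware's receive buffer without overflowing it can be sketched roughly as below. This is a minimal illustration, not the branch's code: the `RxBufferTracker` class and its method names are hypothetical, and it assumes every sent line is acknowledged by exactly one `ok`, in order.

```python
from collections import deque


class RxBufferTracker:
    """Hypothetical accounting of bytes sitting unacknowledged in the rx buffer."""

    def __init__(self, capacity=63):
        self.capacity = capacity      # 64 byte buffer minus 1 as a safety margin
        self._in_flight = deque()     # sizes of sent but not yet acknowledged lines
        self.used = 0

    @staticmethod
    def _size(line):
        return len(line.encode("ascii")) + 1   # +1 for the trailing newline

    def can_send(self, line):
        return self.used + self._size(line) <= self.capacity

    def sent(self, line):
        size = self._size(line)
        self._in_flight.append(size)
        self.used += size

    def acknowledged(self):
        """Call once per 'ok' received: frees the oldest in-flight line."""
        if self._in_flight:
            self.used -= self._in_flight.popleft()


tracker = RxBufferTracker()
line = "N1 G1 X10.000 Y10.000 E1.00*99"   # 30 characters, checksum value is made up
for _ in range(3):
    if tracker.can_send(line):
        tracker.sent(line)
print(tracker.used)   # 62: two lines fit, the third has to wait for an 'ok'
```

With this kind of bookkeeping the host never writes more than the buffer can hold, so a blocking serial write would point at something other than a full buffer.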
Found the reason for the stalling. For whom it may be interesting, the reason was clearing the |
@foosel I tested your latest out and it seems to be working fine. I did do a performance test on a particular test piece (lots of tight turns, small moves) and on your branch it took 62 seconds to get to 1.9 mm height. I tested my old branch and it took 44 seconds. Both have checksum disabled, with checksum enabled your branch was 71 seconds (I ran this first before adding the |
@presslab-us My guess is that it's not actually the priority queueing but rather the pre/postprocessing. So maybe there's optimization potential there to make it a bit faster (I'm actually not sure how fast reflection in python is, so maybe something could be done there with some kind of lookup table created in |
@foosel Your and my version both use the same reflection code, right? So I would think it is something else that causes yours to take 40% longer. It could be a latency issue but the CPU usage is maxed out indicating that it's not a blocking problem (I/O wait) but something else. And because it doesn't seem to be a latency problem I don't think the slowdown is necessarily related to moving the preprocessing into the send thread. I played with using a regex for the OK response and I saw a 6% performance boost. But that still leaves 34%... |
But didn't your code only do the preprocessing before entering the send |
@foosel Yes but the overall CPU usage should be the same when printing; it's processing the same g-codes. My version only sought to reduce latency (and subsequent rxcache underflow) by having the two threads. |
@presslab-us please pull. The CPU usage was a good hint (I hadn't taken a look at top yet), that was caused by I also took the liberty to change the fill queue handler in such a way that it now uses a semaphore limited to 20 acquires and an event for signaling state changes to the interesting state. This way the thread can just wait until notified that the current situation has changed for it instead of doing "if - else - sleep - loop" continuously. I'm currently seeing CPU usage of 4-5% on the Pi while printing with these changes. |
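The semaphore-plus-event pattern described in this comment could look roughly like the following. It is a sketch under assumptions: `QueueFiller`, `set_printing` and `line_sent` are hypothetical names, and only the limit of 20 is taken from the comment.

```python
import queue
import threading


class QueueFiller:
    """Hypothetical fill-queue handler that blocks instead of polling."""

    def __init__(self, lines, max_pending=20):
        self._lines = iter(lines)
        # At most `max_pending` lines may sit in the send queue at once.
        self._slots = threading.BoundedSemaphore(max_pending)
        # Set while the job is in the state the filler cares about (printing).
        self._printing = threading.Event()
        self.send_queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def set_printing(self, printing):
        if printing:
            self._printing.set()
        else:
            self._printing.clear()

    def line_sent(self):
        # Called by the sender once it has taken a line off the queue.
        self._slots.release()

    def _run(self):
        for line in self._lines:
            self._printing.wait()    # sleeps until printing, no busy loop
            self._slots.acquire()    # sleeps while max_pending lines are queued
            self.send_queue.put(line)
```

A sender thread would call `send_queue.get()`, write the line to the serial port and then call `line_sent()`. While paused or while the queue is full, the filler thread is parked inside `wait()` or `acquire()` and uses no CPU, which is the point of replacing the "if - else - sleep - loop" approach.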
@foosel I pulled it, it's maybe 1% faster than it was before. In my case the buffer is probably hardly full because I'm maxing out the CPU. You need to test it with more of a load. To generate a tricky gcode file, I use Cura and I slice this: http://www.thingiverse.com/thing:99338. My perimeter speeds are 100mm/s. I disable the heaters and extruder. This generates a ton of small moves which easily consumes all the CPU of my Pi. Then I can measure the time it takes, and this gives a relative indication of performance. |
@presslab-us I am testing it with a gcode file that previously maxed out the existing code (pretty much not printable with devel and master). Are you measuring this against the real printer (just with the movements) or against the virtual printer? I assume the former, I just want to check. That you are still seeing a maxed out CPU on the Pi doesn't fit my own observations (after the most recent changes that is) at all, which is why I'm a bit puzzled. |
@foosel Yes I'm sure your new version is faster than devel, but my whole point is why is my version faster than yours? I am testing with my real printer, yes, with a RAMBo board. I'm purposefully giving it a difficult g-code file to max out the CPU so I can measure a difference in speed. Not sure what the best way to say this is, but we have different priorities for how we want the software to operate. I prioritize speed and quality over (arguably unnecessary) features. Of course it's your project so you can do it any way you want! Thanks for considering my modifications in any case. |
@presslab-us No worries, I got your point, I was just surprised by your results. I sliced the file linked, 0.2mm layer height, no heat up, 100mm/s perimeter speed, removed heatups and extrusions and limited to 5cm height. I "printed" both via a Pi on a Printrbot Simple (which was capped speed wise at under 100mm/s I think) and an Ultimaker. I did not see that max out the CPU. Printing to the Printrbot I got up to just under 50%, for the Ultimaker I maxed out at 70%. So unless I misunderstand you, you are seeing higher CPU loads than me, and that troubles me. Now, as for how to proceed here, you see it correctly that we have somewhat different priorities. I need to build something here that fulfills certain expectations from the user base - working error correction, reaction to commands sent mid print without too much latency, monitoring, etc (I built on your branch to reintroduce these things, because in your branch a lot of these are broken, e.g. resends, priority queue handling, etc, plus I'm fairly sure you also got that stalling issue in there - yes, I know, you don't use these things really, but others do and I have to think of a wider audience here). I also think that while processing speed is an important factor in host software, there's a limit as to how much one should focus there as long as other options/lower hanging fruit aren't already utilized (like e.g. optimized slicing utilizing G2/G3 or possibly spline interpolation which I saw someone working on, don't know how far that has gone). And there's always the option of attaching a 15€ SD card reader to your board and streaming directly from that (which gets also rid of that horrible serial protocol and the 250k baud bottleneck altogether while still allowing tight monitoring of what's happening with your print). 
That being said, if you have further ideas where to optimize, given that basically stuff that works today has to continue to work, I'm all ears, and I'll also merge my version of the Now, the new communication layer - as you might have noticed - is modular, and my next step in getting this stuff ready to be merged onto |
There are more than enough tickets already that are related to the communication stuff, closing this one. |
Printing.
Smooth continuous printing.
Lots of pausing mid print
HEAD on commRefactoring
Raspberry Pi to a Printrbot (Plus v2 with a rev D board) original 2013 firmware
https://gist.github.com/BJClark/47740fdaf14f6fdca775
Is this maybe another issue not fixed in the commRefactoring branch? Is there any way to verify in the UI that I'm actually running on the commRefactoring branch? I followed the steps to change branches, but I'm not 100% sure the install worked.
Here is the output (partial, in the middle of a print) with what I'm seeing:
Communication timeout during printing, forcing a line
Send: M105
Communication timeout during printing, forcing a line
Send: M105
Recv: ok T:199.52 B:49.98 @:92
Send: N49513 G1 X58.827 Y169.757 E11.05820*108
Communication timeout during printing, forcing a line
Send: M105
Recv: ok T:199.97 B:49.46 @:84
Send: N49514 G1 X57.929 Y170.031 E11.10783*102
Recv: Error:Line Number is not Last Line Number+1, Last Line:49511
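As a reading aid for the terminal output above: the number after `*` is the usual G-code line checksum, the XOR of every byte before the `*`. The short sketch below recomputes it for the two numbered lines in the log. The function names are made up for illustration, and this is not OctoPrint's code.

```python
from functools import reduce


def gcode_checksum(payload):
    """XOR of all bytes of the payload (everything before the '*')."""
    return reduce(lambda acc, byte: acc ^ byte, payload.encode("ascii"), 0)


def frame_line(line_number, command):
    """Build 'N<line> <command>*<checksum>' as seen in the serial log."""
    payload = f"N{line_number} {command}"
    return f"{payload}*{gcode_checksum(payload)}"


print(frame_line(49513, "G1 X58.827 Y169.757 E11.05820"))
# N49513 G1 X58.827 Y169.757 E11.05820*108
print(frame_line(49514, "G1 X57.929 Y170.031 E11.10783"))
# N49514 G1 X57.929 Y170.031 E11.10783*102
```

Both checksums match the log, so the lines themselves were framed correctly. The firmware's complaint is about ordering: it reports 49511 as the last line it accepted, so it expected N49512 next, while the host had already moved on to N49513 and N49514.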