-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Palettes: Bad performance on 0.14.* and later #3978
Comments
@blazoncek , why did you add the "cannot reproduce" label? It is super simple to reproduce this! wled_presets.json 600 LEDs, WS281X, using Glitter effect, on a regular ESP8266EX D1 mini module. Using the release binaries provided here, I get the following results: WLED_0.13.3_ESP8266.bin => 21 FPS (self compiled WLED_0.13.3_ESP8266_160.bin => 25 FPS) This is not just a minor performance issue, I hope you agree? |
One further update, also checked the performance of the first release after 0.13.3, which would be 0.14.0-b1: WLED_0.14.0-b1_ESP8266.bin => 16 FPS But then I made a nice discovery: When changing to a different palette, FPS went up to 20 FPS, so similar to 0.13.3. Analyzed this further, turns out that the framerate drops tremendously once the "Power Limiter" kicks in! This was not the case in earlier versions. @blazoncek , any idea? Next I'm going to check the source code changes of that part. |
The power limiter seems not be the reason, it just causes a drop of about 2 FPS, but that's the same for 0.13.3 and 0.14.0-b1.
On 0.13.3, the selected palette has zero influence on the FPS, all run with 21 FPS. |
I'm only considering current 0_15 branch for anything that would need investigation. |
Ok, I will do that! Btw, for 0.14.0-b1, I have found the issue: The performance issue is caused by Segment::color_from_palette in FX_fcn.cpp. It calls loadPalette for each single call of color_from_palette, which is causing a high CPU load (except for palettes 6, 7, 8, 9, 10, 11 and 12, as those are always kept in memory)! Quick fix, using a paletteCache, renders the same performance as the 0.13 series!
|
Basically same problem in 0.15 and main branch. To give you an idea, just using the cache made it go up from 12 FPS to 24 FPS. So the current implementation spends half of its time in loadPalette calls :-/ |
Well for me that's a debateable (and acceptable) option 😉 I'm not saying everyone has to agree. But here's my personal reasoning:
So that's my personal perspective as a developer in a small team, and we devs usually do not receive anything in return for our commitment to WLED. At the same time, everybody wants to have a say about "what is the correct way". Sorry you are not the first (or the last) - but I really had to blow off some steam. tl;dr you've hit a nerve. Feeels better now 🤓 So back to technical... Technicially, the palettes problem you have discovered seems to affect all platforms. Maybe it's worse on 8266, but esp32 could also benefit from an improvement 👍 so thanks for your investigations. |
…from_palette creating loading/creating palettes on each call.
I understand the reasoning from a personal pov, but don't necessarily agree to all points, as my personal requirements might differ a lot. I really appreciate all the work you put into WLED, and all the new features it got so far! Thanks a lot! I've created a pull request for 0_15, but compared to earlier versions, that branch does not work nicely on ESP8266 (yet). One more performance issue with 0.14.3 (might be the same case for 0.15): When the power limiter forces a lower brightness, it loses about 50% of its FPS. I have found the reason already, and a quick hack resolves it, but still working on a clean solution. Hope that helps a bit! |
@softhack007 I want to thank you for all your efforts and I totally respect your opinion. Replacing tens of bulbs can cost too much for the whole house and that isn't a practical fix in terms of both convenience and cost.. unless of course to replace only the dead ones, for my case I do have bulbs that are hard to reach and needs very high ladder size that I usually rent and tell the workers nearby for help since it's not safe distance. I know that many things can be fixed in a hacky way and this isn't very practical or actually need to sacrifice some features to make something else works, I agree this takes time and doesn't help the software as a whole, just certain devices but.. in the same time It's kinda definitive that maxing the possibilities of an old device is challenging and most likely will open the eyes to few things that we are not aware of without facing problems such as new ways of better memory management and optimizations which normally might not come to mind with powerful devices such as esp32 because everything seems to work flawlessly.. unless we are facing issues like this one now that in return bring optimizations to the whole software. @jw2013 Special thanks for your great find. I do really appreciate all of your guys efforts. |
@softhack007 : "programming for 8266 is like torturing yourself" - totally understood and agreed. Heroic efforts to keep bringing new features to old hardware just has to stop at some point. They're not rewarded. At some point, it just doesn't fit. At the risk of being one of those that wants a say (I don't think I have any code in this project), I'll offer what's worked for many projects: Pick a point in the project and formally (and loudly) pull the plug. Version numbers are cheap. Make one that counts. Add #if defined esp8266 (or whatever) #error "Here's a nickel kid, get yourself a better computer" and link to the Dilbert cartoon. You can't stop anyone from removing that and trying to build it on their own or forking it, but after that, you can just aggressively close things as 'unsupported platform' and it's no longer YOUR problem. As devs, we've all had do to this for VAX, Win95, NT3, Qt3, SunOS 4, K&R C, i386, Python 2, and other 'OMG, this HAS to live forever' platforms. Even non-free software development doesn't have infinite resources to keep things alive forever. I get the arguments about e-waste and $3 boards (do yourself a favor and choose replacements with PSRAM.) needing techs (that usually cost > $3) to install them, changes in assembly lines, SKUs, or whatever, but 2016 boards are happiest running 2016 code. It just can't fit forever - whether that's bytes of SRAM, Flash, microseconds in an interrupt handler, jitter for frame refreshes, or whatever. Staying on old code is an option. Sometimes, we must trim the branches of the shrubs of the gardens we maintain so that overall growth and life can thrive. Happy Hacking, P.S. That said, the analysis and premise of #3979 seems pretty sound...and since a message came in while I was typing, I'll clarify I'm not unsympathetic to that POV, either. You can make tiny changes to old branches (for a little while) that can improve those and hopefully have enough resources that everything isn't the proverbial straw on the camel's back. Need 32 bytes for an extra block? Hopefully you blew the whistle in time to still have it and BEFORE that branch picked up some new feature that they may not need anyway. My fundamental point is that there needs to be a reasonable feature/functionality freeze so new LED support and new JSON support doesn't take those last straws. |
As someone with about 20 ESP8266's in my environment be sure I consider 8266 development seriously. But, there is a point in time (or lifecycle) when program can no longer run efficiently on an ancient hardware due to all the bells and whistles it has accumulated in that time. Consider what features have been added to the code since 0.13.3. The features affecting this issue are mode (AKA effect) blending, palette blending and custom palettes which were inexistent in 0.13.3. If you do not need those features then there is no reason to upgrade, but if you need them you should also consider upgrading your hardware platform if you find performance lacking. P.S. I am guessing that you are no longer using iPhone 6 (if you ever were; released at the same time as ESP8266) which cost several hundred times more and has long been made obsolete. That's for comparison. |
@vPrapo , @softhack007 , @robertlipe , @blazoncek , thank you all for sharing your thoughts and comments! First of all, none of the performance fixes that I work on are related to ESP8266 in any way. But slow hardware in general acts as an early warning system, in case performance gets unnoticedly sacrificed. The fact that @blazoncek added the 'cannot reproduce' label right away further proves that point. A software upgrade reduces the performance by 50%? My first thought would not be: I need new hardware, NOW. My personal intentions here are most likely similar to those of @vPrapo: Public art installation with tons of waterproof sealed controller boxes (custom hardware, not cheap). No way those get replaced soon. That being said, I have been able to fix the performance issue caused by the power limiter, too. Those two updates would make 0.14.3 the best performing version for ESP8266. Will commit this fix in a minute. |
This fix works by restoring the brightness in a 'smarter' way, similar to what 0.13.* and earlier did. ORIG:
FIX:
ORIG renders each frame with _brightness, and changes all pixels to lower brightness in each call of show(). This creates a worst-case scenario, performance-wise. FIX only renders the first frame with _brightness, and change all pixels to lower brightness. All subsequent frames would get rendered with the last calculated brightness. It reduces brightness of all pixels immediately if required, and increases the brightness if possible, delayed by one frame. Let's assume effect Palette, with a color palette and LED setting that would consume more power than available in budget. The internal brightness for all frames remains constant, nearly no loss of performance. Effect breath would be the opposite, but even that one gains more FPS. Furthermore, the change in BusDigital::setBrightness makes sure, that only decrements in brightness change pixels immediately. Hope that helps! |
Just did a final performance test with 0.14.3 (settings attached): WLED_0.14.3_ESP8266.bin => 9 FPS |
For your convenience/testing, please find a compiled binary for ESP8266 (160 MHz) attached. (maintainer edit: binary file removed, as we cannot verify its integrity) |
@jw2013 Your proposal is (unfortunately) in conflict with design decisions made earlier for 0.14 and 0.15. To achieve better color quality in getPixelColor(), it was decided to let effects render all full brightness, and then do the brightness reduction as a post-process before passing pixels to the LED driver. In 0.15 with per-output ABL, this post-process was moved entirely into the BusManager. To keep the two topics (ABL and Palettes) separated, I'd propose you either create PR for 0_15 with your proposed changes, or you open a new "enhancement" ticket as a place to discuss and evaluate design options. Actually the "brightness limiter" is a safety feature and the code is very tricky - so I would even ask if it's worth to touch the ABL code AT ALL - and risk safety of user's equipment - for just gaining 2fps on 8266. |
- fixes #3978 - FX: Firenoise can use selected palette
Slightly different approach taken to keep it simple. |
@softhack007 , I haven't checked the ABL code in 0_15 yet, but focused on 0.14.3. Pretty selfish, I know ;-) Regarding my ABL patch for 0.14.3, I didn't really touch the ABL code itself, but optimized the logic how it gets applied. My fix might only delay the raise of the brightness, so in fact, it will never use more energy that the original one. Created some new stats with ABL turned off (on):
Worst impact can been seen with the unpatched 160 MHz version. |
Elegant approach @blazoncek . I thought about something like that too (most of the 'old' strip stuff would actually belong into segments), my fear was that it would use way more memory (one palette object per segment). That was also the reason, why I 'abused' strip._currentPalette for caching. |
- fixes Aircoookie#3978 - FX: Firenoise can use selected palette
So, give 8266 binaries with limited features, many dont need all the bling that can be crammed there. Even make multiple different binaries where each has some of the heavy stuff but not all. Im agreeing that 8266 should not hinder esp32 possibilities. |
@blazoncek , thank you for fixing the palette issue in 0.14.4! |
Many do not even know what version do they have let alone features they have unless you strip their feature away and then they come complaining. Unfortunately limited features binaries are not happening with 0.14 and most likely not with 0.15 as well. But as the code size increases we may see specific builds at some point in the future. If you need trimmed down 0.14 use custom compile on your own. |
No. ABL as of 0.14 is EOL. It has been moved to bus (output) logic in 0.15 to allow per-output (multiple) limitation. There is also work with @Makuna to modify NeoPixelBus so that repainting pixels will not be necessary regardless of global buffer. If you need performance disable ABL and provide adequate power to your LEDs or use global buffer to reduce number of pixel paints. |
The patch was meant for 0.14 only. Even if you consider it EOL, it is still in widespread use, and I think many users will already appreciate the release of 0.14.4. @blazoncek, just to be clear, I don't want to interfere with your 0.15 developments, and understand the design decisions made there. Actually I want you to be able to fully focus on 0.15! I help fixing 0.14 for now, as it clearly has advantages over 0.13 for older devices, despite its performance issues (that can all be fixed as shown). |
we had "fun" with ABL last summer after 0.14.0-b3, because the function was not working any more (side-effect after upgrading NPB) and we had to alert users for protecting their hardware against overcurrent. Lessons learned - don't touch anything related to ABL, unless you have an important reason and 6 month of beta testing afterwards. Not sure if I'm repeating myself now: |
@jw2013 Do you have a binary for 8266 with your custom settings at all? I Installed new 14.4 and tried 15.b3 and the 160 also makes my devices slower than the 80, its a newer model also, but 80 build is overall slow performance. |
@Doyle4, could you please try the binary I uploaded above: #3978 (comment) (that binary already included my palette and ABL optimizations, 160 MHz) |
@softhack007, I just ran the tests on your current 0.14.4 release on ESP8266, ABL turned off (on): WLED_0.14.4_ESP8266.bin: 24 (16) FPS The values without ABL match the ones of my versions, so @blazoncek's changes for the palette perform the same. Without my ABL optimization for 0.14, ABL reduces framerate by 8 FPS, with my patches just by 2 FPS, for both 80 and 160 MHz. Btw, all my benchmarks are easy to reproduce using my cfg.json and presets.json above, no need to connect an actual strip or PSU. |
@jw2013 did you try with global buffer enabled? |
For the 600 LEDs in my case, the global buffer cannot be enabled (LED memory (ERROR: Using over 4000B!)) When I reduce it to 570 LEDs, it works though, giving the same 27 FPS with or without ABL. Understood, without the global buffer, it needs to call multiple functions per pixel (CPU caching, stack...), while with the global buffer, it's a simple operation on memory. |
Installed, 15b3 160 using Hiphotic effect was average 21fps, suing your build Im getting 40! |
Please also try with "use global buffer" (LED settings) enabled, using 0.14.4 160 mhz build. |
@blazoncek, could you please spare some of your valuable time to elaborate, why it is incorrect in your opinion? |
How that works? Im used to doing compiles on ESPHome, so thats how far my knowledge goes. Assuming WLED doesnt have a interface similar to it, though would bring a different world to the whole WLED business. I'd drop everything else but the essentials i need. |
What happened?
I discovered, that previous WLED binaries on ESP8266 were running at 80 MHz instead of 160 MHz.
Using it a lot, so I tested the newer _160 releases for my d1-mini's, but they were actually much slower!
String with 600 LEDs (WS2815) was now running at 13 FPS, instead of the 20 FPS using e.g. the 0.13.3 release.
So I curiously compiled 0.13.3 using board_build.f_cpu = 160000000L, and performance went up to 25 FPS. Nice!
But why do the new releases perform so badly? I compiled basically all 0.14.* sources, and discovered that performance went downhill going from 0.14.1-b2 to 0.14.1-b3. Only real code change here was in json.cpp, and that one indeed causes the performance issue.
To Reproduce Bug
Install firmware 0.14.1-b2 and 0.14.1-b3 to compare.
Setup LEDs with power limiter (WS2815), 600 LEDs (WS281x), Effect Glitter. Check FPS in info.
Expected Behavior
Expected to keep the nice FPS rate.
Install Method
Self-Compiled
What version of WLED?
WLED 0.14.1-b2 and WLED-0.14.1-b3
Which microcontroller/board are you seeing the problem on?
ESP8266
Relevant log/trace output
No response
Anything else?
No response
Code of Conduct
The text was updated successfully, but these errors were encountered: