I need to try to optimize this port.
Will you port this game to Flixel?
Are you kidding me?
@elsassph recommended me:
"Don't set tiles RGB if it's not really necessary for tinting.
I don't think you're using .splice efficiently: you should reuse arrays and .splice out the extra items (if any). The way you say you're using it means you're reallocating the array on each iteration.
Try removing scrollRects too."
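A minimal sketch of the array-reuse advice above, written in TypeScript rather than Haxe so it runs stand-alone. The `Sprite` shape, the `[x, y, tileId]` layout and the `fillTileData` helper are assumptions for illustration, not code from the port: the same array is overwritten every frame and only the leftover tail is spliced off.

```typescript
// Hypothetical sprite shape, for illustration only.
interface Sprite {
  x: number;
  y: number;
  tileId: number;
  visible: boolean;
}

// One long-lived buffer, reused on every frame instead of being reallocated.
const tileData: number[] = [];

// Overwrites `out` in place with [x, y, tileId] triples for visible sprites,
// then splices off whatever is left over from a previous, longer frame.
function fillTileData(sprites: Sprite[], out: number[]): number[] {
  let n = 0;
  for (const s of sprites) {
    if (!s.visible) continue;
    out[n++] = s.x;
    out[n++] = s.y;
    out[n++] = s.tileId;
  }
  if (out.length > n) {
    out.splice(n); // remove only the extra items, keep the same array object
  }
  return out;
}
```

The point of the design is that the array object survives between frames, so a steady sprite count produces no new allocations for the garbage collector to clean up.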
The recommendations from the previous comment didn't help, but they will be in the next commits anyway.
Finally found the reason: when compiling from Xcode you get a "debug", considerably slower, version of your haxe-compiled code.
The Flixel BunnyMark actually runs at 60fps when you select "Archive" compilation in Xcode (ie. IPA).
Still, overall Flixel is over-architected and has many slow paths. Profiling with Instruments' "Time Profiler" lets you see the most expensive methods.
@elsassph thank you for this info. Will try it, but I'm using Windows.
There are profilers on Windows too.
Either use Visual Studio Ultimate, or Google turns up: http://www.codersnotes.com/sleepy and http://www.softwareverify.com/
@elsassph Oh, THANK YOU!!! I haven't used any profiler before. Will try that.
I confirm that Sleepy works nicely for timing the CPU. However, it doesn't give much insight into how wasteful HaxeFlixel is regarding garbage collection.
Need to work with the hxcpp built-in debugger: http://gamehaxe.com/2012/09/14/hxcpp-built-in-debugging/
crazysam on NME forum wrote: "With that said, I really think Flixel is a very bloated engine, and as it stands, not very good for mobile. Optimizing the rendering to use drawTiles() was a huge step in the right direction, but the underlying update and Camera structures are very slow (calling preUpdate, update and postUpdate for every FlxBasic is wasteful since they're mostly used to update animation, and not every object needs to be animated). I'm interested in this engine (I love the FlxGroup and the recycle paradigm), so I will continue to use it and improve it, and I hope in some months it will be the best option for the small developer that wants to hit the ground running."
I realize my statement on the forums wasn't very helpful. I'm very interested in getting into the HTML5 market, so I'll be running perf analysis and trying to identify bottlenecks in performance. It's likely I will be heavily modifying the core of Flixel, so I don't know how much Zaphod will want to keep and integrate into the main branch. I'll post updates when I achieve something notable.
@crazysam I think that html5 could be fast, but this target really needs "heavy modifying": html5 needs a whole new renderer based on sprites instead of blitting (I believe so), but that will make multicamera support almost impossible.
Multicamera support is likely one of the first things to be gutted. It doesn't make much sense for the html5 target, or for mobile platforms for that matter (I can see a potential application for tablet based games, but we can refactor it out, and leave it as an optional bonus instead of a constant perf concern).
@Beeblerox I got HXCPP profiler working on my code. It's actually really nice! I can show you what I did.
Gotta figure out how to get a profile from within Android. It seems to not support full paths, which makes it hard to get it off the phone.
@crazysam I'm very interested in your results and think that profiling on PC (or Mac) should give us useful information about the primary directions for optimization.
Sorry I kinda forgot about this thread. Here's a simple profile of my project: http://pastie.org/5100846
FlxGame::onEnterFrame is what we care most about, and looking in there, we can see that 25% of the time is spent in FlxGame::updateSoundTray. This is very strange; it seems getting the member data of a DisplayObject is very expensive in NME. We should bring this up in the Haxe Google group. I committed a simple fix to avoid updating the sound tray unless we have to.
Looking at FlxGame::step we can see that we're spending a lot of time updating the mouse, as well as the JoystickManager. We might want to make the JoystickManager a plugin. As for the mouse... I committed a change that will avoid updating the DisplayObject container unless it's visible, but we should disable it for mobile targets altogether.
I also found that Lib.getTimer was a very expensive operation in onEnterFrame, so I committed an optimization for that issue.
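One common way to cut the cost of a slow timer call is to read the clock once per frame and hand the cached value to everything else. The TypeScript sketch below illustrates that general idea only; it is not the committed fix, and `FrameTimer`, `beginFrame` and the injected `clock` function are hypothetical names.

```typescript
// Any function returning the current time in milliseconds.
type Clock = () => number;

class FrameTimer {
  // Timestamp cached at the start of the current frame.
  now: number;
  // Milliseconds between the two most recent beginFrame() calls.
  elapsedMs = 0;
  private last: number;
  private clock: Clock;

  constructor(clock: Clock = () => Date.now()) {
    this.clock = clock;
    this.last = clock();
    this.now = this.last;
  }

  // Call exactly once at the top of the frame handler: this is the only
  // place where the (potentially expensive) clock is actually read.
  beginFrame(): void {
    this.now = this.clock();
    this.elapsedMs = this.now - this.last;
    this.last = this.now;
  }
}
```

Code that runs later in the same frame reads `timer.now` or `timer.elapsedMs` instead of asking the platform for the time again.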
Here's a profile with my optimizations: http://pastie.org/5101182
It's harder to find good spots to optimize now. Input.update() seems to be a pretty expensive operation, but it probably wouldn't be trivial or worth it to make it more efficient. Cheers!
Nice work crazysam. Yes, I don't think it's the best practice that HaxeFlixel updates all kinds of inputs even though the game might not require them.
As for moving JoystickManager to a plugin, it makes sense, but if we decide on that, it may be worth thinking about doing the same for Mouse, Touch etc. The biggest issue there might be protecting legacy code. So we could do a static var in FlxG with an override with get/set, like https://github.com/Beeblerox/HaxeFlixel/blob/FlxLayer/src/org/flixel/FlxTimer.hx#L178 does with TimerManager.
If we do this for all inputs, they will be initialized only if the game specifies FlxG.joystickManager = new JoystickManager(); and likewise for the Keyboard etc. (something we would have to remind people of when porting games).
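A rough TypeScript sketch of the property idea, under stated assumptions: `G` stands in for FlxG, the `JoystickManager` here is a dummy class, and creating the manager lazily in the getter is my own guess at how legacy code could be protected, not what the linked FlxTimer code necessarily does.

```typescript
// Dummy stand-in for an input manager; counts how many were ever created.
class JoystickManager {
  static created = 0;
  updates = 0;
  constructor() {
    JoystickManager.created++;
  }
  update(): void {
    this.updates++; // a real manager would poll devices here
  }
}

// Hypothetical stand-in for FlxG.
class G {
  private static _joystickManager: JoystickManager | null = null;

  // Legacy code that reads the property still works: the manager is
  // created on first access instead of unconditionally at startup.
  static get joystickManager(): JoystickManager {
    if (G._joystickManager === null) {
      G._joystickManager = new JoystickManager();
    }
    return G._joystickManager;
  }

  // Games can also opt in explicitly by assigning their own manager.
  static set joystickManager(value: JoystickManager) {
    G._joystickManager = value;
  }

  // Per-frame input step: managers that were never requested cost nothing.
  static updateInputs(): void {
    if (G._joystickManager !== null) {
      G._joystickManager.update();
    }
  }
}
```

A game that never touches `G.joystickManager` pays only for a null check per frame, while a game that does use it gets the same object back on every access.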
Developing this engine for mobile with the intention of maintaining 100% backwards compatibility seems silly. I'm hoping HaxeFlixel will be an evolution of Flixel capable of targeting as many platforms as possible (with much better performance than regular Flixel), not just a simple port of Flixel to run on mobile phones. With that said, I think your suggestion of using properties to get input plugins is a very good way to handle this issue.
For now my project is very simple and the optimizations I could identify are only the parts of the engine that are slow even when nothing particularly heavy is happening. When I have dozens of sprites moving about and animating I will likely have to be more aggressive in optimizations.
Right now I don't have time to optimize Flixel input handling, so maybe Zaphod will want to look at making input Plugin based after he's done with his Texture Atlas project.
I don't disagree, sam. I also think that changes to the API are OK if we have good reasons for them; we would just have to explain this to game makers. The goal must also be to take Flixel where AS3 couldn't, and yes, mobile is just one part.
I'll look at separating the input stuff after the next version. It may be more sensible to also replicate the plugin system specifically for input so we can make sure it gets updated at precisely the right point in the stack.
This weekend I made some comparisons between HaxePunk and HaxeFlixel. Matt has done great work on optimizing HaxePunk: http://forum.haxepunk.com/index.php?topic=299.msg782#msg782 and I was interested to see the results. I used BunnyMark as a test for both engines, and here is what I got on my PC:
| Bunnies | HaxePunk fps | HaxeFlixel fps |
|---------|--------------|----------------|
| 15k     | 59           | 47             |
| 20k     | 46           | 37             |
| 25k     | 38           | 29             |
| 30k     | 32           | 25             |
On mobile the results are similar.
As you can see, HaxePunk has become much faster, and I was curious why HaxeFlixel is so slow. After some digging I found that method calls (even to empty functions) are pretty expensive, and HaxeFlixel has three "update" methods (preUpdate(), update() and postUpdate()) while HaxePunk has only one. I tried removing these method calls and moving their code into a single method, and then I saw almost the same fps as in the HaxePunk version of BunnyMark.
So I need to think about some refactoring or a change in the engine's architecture (maybe merge these methods and make some of them switchable).
These update functions definitely are the bottleneck - also don't forget that, to call these functions, the engine has to iterate over all the entities 3 times.
So yes, this should be refactored, probably using a more explicit event registration mechanism.
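To make the per-frame call counts concrete, here is a small TypeScript sketch (hypothetical classes, not engine code). It makes no claim about which loop shape Flixel actually uses: three separate passes and one pass with three hooks both cost three calls per object, while a merged update costs one.

```typescript
// Counts hook invocations; stands in for real per-object work.
let hookCalls = 0;

class ThreeHooks {
  preUpdate(): void { hookCalls++; }
  update(): void { hookCalls++; }
  postUpdate(): void { hookCalls++; }
}

class OneHook {
  update(): void { hookCalls++; }
}

// Variant A: iterate over all objects three times, one hook per pass.
function frameThreePasses(objs: ThreeHooks[]): void {
  for (const o of objs) o.preUpdate();
  for (const o of objs) o.update();
  for (const o of objs) o.postUpdate();
}

// Variant B: iterate once, calling all three hooks on each object.
function frameOnePass(objs: ThreeHooks[]): void {
  for (const o of objs) {
    o.preUpdate();
    o.update();
    o.postUpdate();
  }
}

// Variant C: hooks merged into a single update, one call per object.
function frameMerged(objs: OneHook[]): void {
  for (const o of objs) o.update();
}

// Runs one frame and returns how many hook calls it made.
function countCalls(frame: () => void): number {
  hookCalls = 0;
  frame();
  return hookCalls;
}
```

With 20k objects, variants A and B both make 60k calls per frame and variant C makes 20k, which is the saving the merge is after.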
Aren't preUpdate() and postUpdate() called in the same update iteration? Like:
If so they shouldn't cause that much burden, and anyway they may well be removed o.O
@ProG4mr I am not sure how that works either. Maybe it's because we can't make them inline and override them at the same time?
Interesting, I hacked together a raw merge of the update functions and saw:
20k bunnies went from 45 up to 52 fps, on Windows at the same resolution/scale as the HaxePunk version.
It means every update override needs to call super, the order of the update steps can no longer be controlled in the same way, and it may have broken legacy Flixel code. The Mode demo seems to work, however.
Of course I imagine there must be some differences elsewhere between the engines, eg their core motion code and extra things flixel does, HaxePunk seems a lot more "barebones".
Maybe it has to do with inlining. From what I know, the pre and post update are just 2 extra calls per update; they don't force more iterations (that's why they are supposedly badly implemented), so it's strange that it hurts performance so much. Maybe I am wrong, I haven't looked at the code or made any tests so far, I'm just like a sports commentator ^^.
When you iterate over thousands of objects every frame, every additional function call on each object will have an impact on performance.
This seems like a good change, since it's kind of redundant that we call preUpdate, and postUpdate, since they happen immediately before and after update (users could just implement their own preUpdate and call it before doing super.update(), and do postUpdate at the end of their update function).
Inlining these functions isn't an option since users are supposed to override them to add custom behaviors, and in haxe you can't override inlined functions.
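A sketch of the "do it yourself around super.update()" pattern described above, in TypeScript with made-up class names (`Basic`, `MovingSprite`, `Player`); the engine keeps a single overridable update, and a subclass that wants pre/post steps adds them itself.

```typescript
// Engine-side base class: a single overridable hook per object.
class Basic {
  update(): void {
    // base behaviour would live here
  }
}

// Engine-side sprite: motion that used to sit in a separate post step
// is folded into the one update method.
class MovingSprite extends Basic {
  x = 0;
  velocityX = 0;

  update(): void {
    super.update();
    this.x += this.velocityX;
  }
}

// Game-side class: custom pre/post steps are ordinary private methods
// called before and after super.update().
class Player extends MovingSprite {
  log: string[] = [];

  update(): void {
    this.beforeUpdate();
    super.update();
    this.afterUpdate();
  }

  private beforeUpdate(): void {
    this.log.push("pre x=" + this.x);
  }

  private afterUpdate(): void {
    this.log.push("post x=" + this.x);
  }
}
```

Only classes that actually need the extra steps pay for them; every other object is updated with one call per frame.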
Thanks sam, yes, it's the context of "iterate" that elsassph mentioned that confused me. It's the cost of the 3 function calls themselves; I think my hack shows it. Sports commentators in programming, we have a bigger problem :)
It would be very simple to remove both functions; they are not useful anyway. For any sports commentator that would be wise.
Anyways there are probably 3 options:
1 - Remove the functions
2 - Disguise the problem:
eg: change preUpdate and postUpdate to function callbacks, and verify if they are null before calling them.
3 - Correct pre and post update:
These are presented to you by:
@elsassph I also choose this option.
Seems like impaler already made a commit for this fix. Could just create a pull request impaler@3e47516.
That was a very rough hack. Only saw the update for the particles, groups etc to see mode demo work. If Beeblerox thinks this is the best way I'll clean it up and continue with this raw merge of update functions.
It does seem more like these three functions only served as a "cosmetic" API for update after all.
I'm completely willing to sacrifice "cosmetic niceties" for performance. This might be the first big change that breaks backwards compatibility with vanilla Flixel projects. Maybe it's a good idea to update haxelib to v1.09 and save this change for v1.10.
Good idea about v1.09
So you are proposing to release the current version on haxelib and then continue the work on optimizations, bug fixes and new features?
Yes, I propose to freeze the current code base, fix bugs and make a new release. After that, continue to work on optimizations and new features in dev, and fix bugs in v1.09 (and dev).
I think there has been more than enough work on 1.09 for a release; it seems to solve a lot of bug reports from the forum. I suggest we release ASAP and make it the last release for Haxe 2, so we can do the next 1.10, or do a 2.x for Haxe 3 compliance and whatever else we work on.
What bugs do you want fixed before the 1.09 release?
I mean if you have any bugs in your todo that you want to fix before release.
There are too many items in my todo list and some of them will require a lot of work (I expect so) :( but I'll release the current version anyway. Bug-fixing will continue after it.