
A bit of NumPy #132

Merged: 9 commits merged into Mindwerks:master from perf_opt on Oct 17, 2015
Conversation

tcld (Contributor) commented Oct 14, 2015

I played around with a profiler, picked out one of the slowest functions and took a look at what could be improved - I found a couple of smaller things, but nothing major. I then proceeded to see what numpy could do - but it actually doesn't gain much here. Speed improvements are within the margin of error, and memory improvements weren't measurable because I don't know how to properly use memory_profiler.

The generated maps are 100% the same as before, and the test results haven't changed either.

Here is some more information, not very useful and quite boring: http://pastebin.com/8p1vwaV1

tcld commented Oct 14, 2015

Sh**, I accidentally added the World_pb2.py I got from @psi29a. Could somebody tell me how I can undo that?

EDIT: Ok, at the cost of some extra commits that problem seems to be solved.

tcld commented Oct 14, 2015

Oh, the build failed. When I return that numpy array for the irrigation, it causes problems whenever something like "world1 == world2" is evaluated, since a numpy array compares differently than a standard array: "==" is element-wise and no longer yields a single boolean.

I will think about it.
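For illustration, the difference looks something like this (a minimal sketch, not the actual worldengine code):

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([1, 2, 3])

    # With plain lists, == yields a single bool.
    # With numpy arrays, == yields an element-wise boolean array:
    print(a == b)                 # [ True  True  True]

    # So "if a == b:" raises "ValueError: The truth value of an
    # array with more than one element is ambiguous."
    # The unambiguous whole-array test is:
    print(np.array_equal(a, b))  # True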

tcld commented Oct 14, 2015

I pushed another commit; now it hopefully works. I tried to test the addition to the protobuf code (in world.py), but I am not sure what triggers it. "worldengine --protocol-buffer" ran fine.

I also did some testing regarding the memory consumption of a numpy array vs. a standard array. The numbers are read off the Ubuntu Task Manager, so they are imprecise for small arrays; for some reason there are two values, private and (virtual). The number on the left is the size of the array (e.g. 1000*1000).

old:
    1000^2:  5 (40) MB ->   14 (  48) MB
    4000^2:  5 (40) MB ->  135 ( 169) MB
    10000^2: 5 (40) MB ->  808 ( 876) MB
    20000^2: 5 (40) MB -> 3134 (3466) MB

new:
    1000^2:  5 (40) MB -> 17 (  93) MB
    4000^2:  5 (40) MB -> 17 ( 207) MB
    10000^2: 5 (40) MB -> 17 ( 848) MB
    20000^2: 5 (40) MB -> 17 (3137) MB

Creation and destruction are a lot faster for the numpy arrays. Memory consumption is slightly better than standard arrays for large sizes. As far as I understand, it's the iteration that isn't much faster - but there are some optimized iteration possibilities; maybe they will prove useful in other places in worldengine.
It was certainly worth a try. Depending on what you think about this, and if the tests start passing at some point, I might look at some of the other code files and add some numpy code - hints on where it might be most profitable are welcome. (I don't have to go all out as with this commit; I just wanted to give it a fair shot.)
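In case somebody wants to reproduce the numbers above, the comparison boils down to roughly the following (a sketch; the exact results depend on platform and numpy version):

    import time
    import numpy as np

    n = 4000

    # "old" style: a nested list of Python floats
    start = time.time()
    rows = [[0.0] * n for _ in range(n)]
    print("list creation:  %.3fs" % (time.time() - start))

    # "new" style: one contiguous block of float64
    start = time.time()
    grid = np.zeros((n, n))
    print("numpy creation: %.3fs" % (time.time() - start))

    # the numpy payload is exactly n * n * 8 bytes
    print("numpy payload: %d MB" % (grid.nbytes / 1024 ** 2))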

tcld commented Oct 14, 2015

Yippie, the tests passed! :) (Except for the lottery-test...)

psi29a (Member) commented Oct 14, 2015

Nice work! We've had this on our todo list for a while. :)

I think there are more places in the code where the 2d arrays should be numpy.

tcld commented Oct 14, 2015

Yes, there are. It would be great if all of them were; that might eliminate some overhead from converting between the two types, give a very clean memory footprint, and bring small speed improvements.

When I started editing the function, I just wanted to exchange the array for a numpy array because of the TODO remark. When I dug into the documentation, I noticed that there is a lot more that could be done with numpy - sadly the improvements are pretty minor; I was hoping for at least a factor of two in speed.

EDIT: Although I have to add that I had never used numpy before, so it is very likely that more improvements can be made.

tcld commented Oct 15, 2015

I'm still polishing this, so please nobody merge it. numpy is kind of insane. :)

psi29a commented Oct 15, 2015

I'm a little worried about your revert of World_pb2.py, since what was in master was compatible with python2 and python3.

Did it not work with your version of protobuf?

tcld commented Oct 15, 2015

Oh, no! It worked perfectly. I guess I used "commit -a" when I should have used "commit". I'm sorry. :(
I will remove it. Again.

psi29a commented Oct 15, 2015

No worries, nothing that can't be undone. I was just reading through your change-set and noticed it. :)

When you are ready for us to really look at it (and merge), just say so. We'll likely have you do a:

git rebase master -i

which will allow you to squash all your commits into one. This makes your commit history super clean as a result. :)
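The todo list it opens looks roughly like this (hypothetical hashes and messages; keep "pick" on the first commit and change the rest to "squash"):

    pick   a1b2c3d A bit of NumPy
    squash e4f5a6b remove World_pb2.py again
    squash c7d8e9f vectorize irrigation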

tcld commented Oct 15, 2015

Thanks for the hint. :)

The code as it stands is ready. But numpy seems to be very powerful, and I am currently trying to find a way to get rid of most of the iterations in the modified file. It would be nice to have a really polished example of numpy usage in the code; maybe somebody else will decide to put more numpy into worldengine and need a reference. I, for one, had no idea that numpy would prove to be such a deep pond.

(It's a shame I will have to throw away my coverage-lottery ticket with the next commit, though.^^)

tcld commented Oct 15, 2015

I might be done. Offline tests ran fine, the result of the generation is exactly the same as with the old algorithm, and the code now looks quite beautiful to me.^^
I desperately need to run some performance tests on this, because it now needs two orders of magnitude fewer iterations and the operations that remain should be doable very efficiently. The old "new" method saved about 10% of the time; hopefully the current one is at least as good.
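To give an idea of the kind of rewrite (a toy sketch, not the actual irrigation code): instead of touching every cell from Python, whole arrays are combined in a handful of operations:

    import numpy as np

    height, width = 256, 256
    precipitation = np.random.rand(height, width)
    ocean = np.random.rand(height, width) < 0.3  # hypothetical ocean mask

    # loop version: width * height Python-level iterations
    result_loop = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            if not ocean[y, x]:
                result_loop[y, x] = precipitation[y, x] * 2.0

    # vectorized version: two whole-array operations
    result_vec = np.where(ocean, 0.0, precipitation * 2.0)

    assert np.allclose(result_loop, result_vec)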

tcld commented Oct 15, 2015

Yippie, cProfile reports about 98% less time spent in irrigation.py - which is about 25% less time overall! 👯

(I also won the lottery again, this patch seems to be ripe for a merge. :) )

EDIT: Here are some numbers: http://pastebin.com/gdQviHa4
I won't run line_profiler after this, and the memory footprint of numpy in general was already covered in one of my earlier messages here. I declare testing done. :)

PS: This was done using a compiled master build versus an uncompiled (?) test build. I'm not sure when Python decides to compile code, but there could be even more of a speed-up hidden somewhere.

EDIT2: I noticed that for the small test map (256x256) this saves 15 million (!) function calls (about 1/3 of the total).

psi29a commented Oct 15, 2015

Python, unlike C/C++ (compiled languages), is an interpreted language. It writes bytecode out to *.pyc files when a file is first run. Python is smart enough to invalidate the pyc files when the py files change, but what it actually runs are the pyc files. The interpreter takes the pyc files and does what is required of it... typical of a scripting language like lua, php or worse... javascript. ;)

So I'm not really sure what you mean by compiled master-build and uncompiled test-build.

If you mean the one-file builds found on our github releases page, that is a whole other ball of yarn having to do with pyinstaller black magic. It assembles all the different libraries and Python bytecode into one file that resembles a self-extracting zip file.
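(If you ever want to force the bytecode step by hand, the standard library exposes it; a quick sketch:)

    import py_compile

    # Compile one module to a .pyc by hand; normally the interpreter
    # does this automatically on first import and caches the result.
    py_compile.compile("worldengine/world.py")

    # The same for a whole tree, from the command line:
    #   python -m compileall worldengine/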

tcld commented Oct 15, 2015

I meant: I had properly installed the master build, then uninstalled it, removed all traces of .pyc files (I don't trust Python because of some mishap in the past) and reran the script.

But it might not matter whether any more time can be shaved off of the 0.2 seconds that are left. :P

psi29a commented Oct 15, 2015

For the sake of nailing down definitions... let's just name things by branch: how about master vs. perf_opt? Or, if you are talking about a release, use the tag name, like 0.18 vs. perf_opt. It is safe to compare one branch or tag against another because they are unique. :)

tcld commented Oct 15, 2015

That seems clearer, I will remember it.^^

Meanwhile I am looking through the profiler output to find some more time sinks. There are a few that might be worth looking into, but nothing really prominent.

This one is interesting, though:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1000    8.378    0.008    8.378    0.008 {built-in method step}

Is that what is done when iterating through a range? I have never seen a step()-method - and a quick look into the Python documentation didn't help either.

EDIT:
The following take more than half of the total runtime:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    8.378    0.008    8.378    0.008 {built-in method step}
   289987    3.789    0.000    5.212    0.000 worldengine/world.py:376(tiles_around)
      462    2.237    0.005    2.237    0.005 worldengine/simulations/basic.py:54(count)
   655360    1.742    0.000    1.742    0.000 worldengine/common.py:97(anti_alias_point)
 12389829    1.503    0.000    1.503    0.000 worldengine/world.py:340(is_land)

Everything else is really minor in comparison. Maybe if world.ocean could be made a numpy array, the last entry could be reduced (I don't know whether accessing a numpy array is quicker than accessing a standard array).

anti_alias_point(), count() and tiles_around() are called very often, so even slight improvements would be useful. That doesn't seem to have much to do with numpy, though - it looks like there is not that much room left for improving speed overall. Memory might be good to look into, but I don't know how to properly use memory_profiler.
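If world.ocean ever becomes a numpy array, those ~12 million is_land() calls could collapse into a couple of array operations - roughly like this (a sketch, assuming ocean is a 2D boolean mask):

    import numpy as np

    ocean = np.random.rand(256, 256) < 0.3  # hypothetical ocean mask

    # per-tile version: one Python-level check per cell
    land_count = sum(1 for y in range(256) for x in range(256)
                     if not ocean[y, x])

    # numpy version: no per-cell Python calls at all
    assert land_count == np.count_nonzero(~ocean)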

tcld mentioned this pull request Oct 16, 2015
# -method is run once per generation
# -iterations : width * height
# -memory consumption: width * height * 8 Byte (numpy 1.9.2)

psi29a (Member) commented on the diff:

This is nice, I love documentation and I love precise memory allocation prediction and measurements. :)
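The width * height * 8 Byte figure matches numpy's default float64 dtype, which is easy to check (a sketch, assuming the layer is created with the default dtype):

    import numpy as np

    width, height = 512, 512
    layer = np.zeros((height, width))  # float64 by default
    assert layer.nbytes == width * height * 8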

psi29a commented Oct 17, 2015

Looks great, merging.

psi29a added a commit that referenced this pull request Oct 17, 2015
psi29a merged commit 13086bb into Mindwerks:master Oct 17, 2015
tcld deleted the perf_opt branch Oct 17, 2015