
A bit of NumPy #132

Merged: 9 commits merged into Mindwerks:master from perf_opt on Oct 17, 2015
Conversation

tcld (Contributor) commented Oct 14, 2015

I played around with a profiler, picked out one of the slowest functions and took a look at what could be improved - I found a couple of smaller things, but nothing major. I then proceeded to see what numpy could do - but it actually doesn't gain much here. Speed improvements are within the margin of error, and memory improvements weren't measurable because I don't know how to properly use memory_profiler.

The generated maps are 100% the same as before, and the test results haven't changed either.

Here is some more information, not very useful and quite boring: http://pastebin.com/8p1vwaV1

tcld commented Oct 14, 2015

Sh**, I accidentally added the World_pb2.py I got from @psi29a. Could somebody tell me how I can undo that?

EDIT: Ok, at the cost of some extra commits that problem seems to be solved.

tcld commented Oct 14, 2015

Oh, the build failed. When I return that numpy array for the irrigation, it causes problems whenever something like "world1 == world2" is evaluated, since a numpy array compares differently than a standard array: "==" is element-wise and no longer yields a single boolean.

I will think about it.
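For illustration, the difference looks something like this (a minimal sketch, not the actual worldengine code):

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([1, 2, 3])

    # With plain lists, == yields a single bool.
    # With numpy arrays, == yields an element-wise boolean array:
    print(a == b)                 # [ True  True  True]

    # So "if a == b:" raises "ValueError: The truth value of an
    # array with more than one element is ambiguous."
    # The unambiguous whole-array test is:
    print(np.array_equal(a, b))  # True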

tcld commented Oct 14, 2015

I pushed another commit; now it hopefully works. I tried to test the addition to the protobuf code (in world.py), but I am not sure what triggers it. "worldengine --protocol-buffer" ran fine.

I also did some testing regarding the memory consumption of a numpy array vs. a standard array. The numbers are read off the Ubuntu Task Manager, so they are imprecise for small arrays; for some reason there are two values, private and (virtual). The number on the left is the size of the array (e.g. 1000*1000).

old:
    1000^2:  5 (40) MB ->   14 (  48) MB
    4000^2:  5 (40) MB ->  135 ( 169) MB
    10000^2: 5 (40) MB ->  808 ( 876) MB
    20000^2: 5 (40) MB -> 3134 (3466) MB

new:
    1000^2:  5 (40) MB -> 17 (  93) MB
    4000^2:  5 (40) MB -> 17 ( 207) MB
    10000^2: 5 (40) MB -> 17 ( 848) MB
    20000^2: 5 (40) MB -> 17 (3137) MB

Creation and destruction are a lot faster for the numpy arrays. Memory consumption is slightly better than standard arrays for large sizes. As far as I understand, it's the iteration that isn't much faster - but there are some optimized iteration possibilities; maybe they will prove useful in other places in worldengine.
It was certainly worth a try. Depending on what you think about this, and if the tests start passing at some point, I might look at some of the other code files and add some numpy code - hints on where it might be most profitable are welcome. (I don't have to go all out as with this commit; I just wanted to give it a fair shot.)
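In case somebody wants to reproduce the numbers above, the comparison boils down to roughly the following (a sketch; the exact results depend on platform and numpy version):

    import time
    import numpy as np

    n = 4000

    # "old" style: a nested list of Python floats
    start = time.time()
    rows = [[0.0] * n for _ in range(n)]
    print("list creation:  %.3fs" % (time.time() - start))

    # "new" style: one contiguous block of float64
    start = time.time()
    grid = np.zeros((n, n))
    print("numpy creation: %.3fs" % (time.time() - start))

    # the numpy payload is exactly n * n * 8 bytes
    print("numpy payload: %d MB" % (grid.nbytes / 1024 ** 2))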

tcld commented Oct 14, 2015

Yippie, the tests passed! :) (Except for the lottery-test...)

psi29a (Member) commented Oct 14, 2015

Nice work! We've had this on our todo list for a while. :)

I think there are more places in the code where the 2d arrays should be numpy.

tcld commented Oct 14, 2015

Yes, there are. It would be great if all of them were; that might eliminate some overhead from converting between the two types, give a very clean memory footprint, and bring small speed improvements.

When I started editing the function, I just wanted to exchange the array for a numpy array because of the TODO remark. When I dug into the documentation, I noticed that there is a lot more that could be done with numpy - sadly the improvements are pretty minor; I was hoping for at least a factor of two in speed.

EDIT: Although I have to add that I had never used numpy before, so it is very likely that more improvements can be made.

tcld commented Oct 15, 2015

I'm still polishing this, so please nobody merge it. numpy is kind of insane. :)

psi29a commented Oct 15, 2015

I'm a little worried about your revert of World_pb2.py, since what was in master was compatible with python2 and python3.

Did it not work with your version of protobuf?

tcld commented Oct 15, 2015

Oh, no! It worked perfectly. I guess I used "commit -a" when I should have used "commit". I'm sorry. :(
I will remove it. Again.

psi29a commented Oct 15, 2015

No worries, nothing that can't be undone. I was just reading through your change-set and noticed it. :)

When you are ready for us to really look at it (and merge), just say so. We'll likely have you do a:

git rebase master -i

which will allow you to squash all your commits into one. This makes your commit history super clean as a result. :)
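The todo list it opens looks roughly like this (hypothetical hashes and messages; keep "pick" on the first commit and change the rest to "squash"):

    pick   a1b2c3d A bit of NumPy
    squash e4f5a6b remove World_pb2.py again
    squash c7d8e9f vectorize irrigation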

tcld commented Oct 15, 2015

Thanks for the hint. :)

The code as it stands is ready. But numpy seems to be very powerful, and I am currently trying to find a way to get rid of most of the iterations in the modified file. It would be nice to have a really polished example of numpy usage in the code; maybe somebody else will decide to put more numpy into worldengine and need a reference. I, for one, had no idea that numpy would prove to be such a deep pond.

(It's a shame I will have to throw away my coverage-lottery ticket with the next commit, though.^^)

tcld commented Oct 15, 2015

I might be done. Offline tests ran fine, the result of the generation is exactly the same as with the old algorithm, and the code now looks quite beautiful to me.^^
I desperately need to run some performance tests on this, because it now needs two orders of magnitude fewer iterations and the operations that remain should be doable very efficiently. The old "new" method saved about 10% of the time; hopefully the current one is at least as good.
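To give an idea of the kind of rewrite (a toy sketch, not the actual irrigation code): instead of touching every cell from Python, whole arrays are combined in a handful of operations:

    import numpy as np

    height, width = 256, 256
    precipitation = np.random.rand(height, width)
    ocean = np.random.rand(height, width) < 0.3  # hypothetical ocean mask

    # loop version: width * height Python-level iterations
    result_loop = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            if not ocean[y, x]:
                result_loop[y, x] = precipitation[y, x] * 2.0

    # vectorized version: two whole-array operations
    result_vec = np.where(ocean, 0.0, precipitation * 2.0)

    assert np.allclose(result_loop, result_vec)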

tcld commented Oct 15, 2015

Yippie, cProfile reports about 98% less time spent in irrigation.py - which is about 25% less time overall! 👯

(I also won the lottery again, this patch seems to be ripe for a merge. :) )

EDIT: Here are some numbers: http://pastebin.com/gdQviHa4
I won't run line_profiler after this, and the memory footprint of numpy in general was already covered in one of my earlier messages here. I declare testing done. :)

PS: This was done using a compiled master build versus an uncompiled (?) test build. I'm not sure when Python decides to compile code, but there could be even more of a speed-up hidden somewhere.

EDIT2: I noticed that for the small test map (256x256) this saves 15 million (!) function calls (about 1/3 of the total).

psi29a commented Oct 15, 2015

Python, unlike C/C++ (compiled languages), is an interpreted language. It writes bytecode out to *.pyc files when a file is first run. Python is smart enough to invalidate the pyc files when the py files change, but what it actually runs are the pyc files. The interpreter takes the pyc files and does what is required of it... typical of a scripting language like lua, php or worse... javascript. ;)

So I'm not really sure what you mean by compiled master-build and uncompiled test-build.

If you mean the one-file builds found on our github releases page, that is a whole other ball of yarn having to do with pyinstaller black magic. It assembles all the different libraries and Python bytecode into one file that resembles a self-extracting zip file.
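(If you ever want to force the bytecode step by hand, the standard library exposes it; a quick sketch:)

    import py_compile

    # Compile one module to a .pyc by hand; normally the interpreter
    # does this automatically on first import and caches the result.
    py_compile.compile("worldengine/world.py")

    # The same for a whole tree, from the command line:
    #   python -m compileall worldengine/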

tcld commented Oct 15, 2015

I meant: I had properly installed the master build, then uninstalled it, removed all traces of .pyc files (I don't trust Python because of some mishap in the past) and reran the script.

But it might not matter whether any more time can be shaved off of the 0.2 seconds that are left. :P

psi29a commented Oct 15, 2015

For the sake of nailing down definitions... let's just name things by branch: how about master vs. perf_opt? Or, if you are talking about a release, use the tag name, like 0.18 vs. perf_opt. It is safe to compare one branch or tag against another because they are unique. :)

tcld commented Oct 15, 2015

That seems clearer, I will remember it.^^

Meanwhile I am looking through the profiler output to find some more time sinks. There are a few that might be worth looking into, but nothing really prominent.

This one is interesting, though:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1000    8.378    0.008    8.378    0.008 {built-in method step}

Is that what is done when iterating through a range? I have never seen a step()-method - and a quick look into the Python documentation didn't help either.

EDIT:
The following take more than half of the total runtime:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    8.378    0.008    8.378    0.008 {built-in method step}
   289987    3.789    0.000    5.212    0.000 worldengine/world.py:376(tiles_around)
      462    2.237    0.005    2.237    0.005 worldengine/simulations/basic.py:54(count)
   655360    1.742    0.000    1.742    0.000 worldengine/common.py:97(anti_alias_point)
 12389829    1.503    0.000    1.503    0.000 worldengine/world.py:340(is_land)

Everything else is really minor in comparison. Maybe if world.ocean could be made a numpy array, the last entry could be reduced (I don't know whether accessing a numpy array is quicker than accessing a standard array).

anti_alias_point(), count() and tiles_around() are called very often, so even slight improvements would be useful. That doesn't seem to have much to do with numpy, though - it looks like there is not that much room left for improving speed overall. Memory might be good to look into, but I don't know how to properly use memory_profiler.
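If world.ocean ever becomes a numpy array, those ~12 million is_land() calls could collapse into a couple of array operations - roughly like this (a sketch, assuming ocean is a 2D boolean mask):

    import numpy as np

    ocean = np.random.rand(256, 256) < 0.3  # hypothetical ocean mask

    # per-tile version: one Python-level check per cell
    land_count = sum(1 for y in range(256) for x in range(256)
                     if not ocean[y, x])

    # numpy version: no per-cell Python calls at all
    assert land_count == np.count_nonzero(~ocean)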

tcld mentioned this pull request Oct 16, 2015
# -method is run once per generation
# -iterations : width * height
# -memory consumption: width * height * 8 Byte (numpy 1.9.2)

psi29a (Member) commented on the diff:

This is nice, I love documentation and I love precise memory allocation prediction and measurements. :)
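The width * height * 8 Byte figure matches numpy's default float64 dtype, which is easy to check (a sketch, assuming the layer is created with the default dtype):

    import numpy as np

    width, height = 512, 512
    layer = np.zeros((height, width))  # float64 by default
    assert layer.nbytes == width * height * 8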

psi29a commented Oct 17, 2015

Looks great, merging.

psi29a added a commit that referenced this pull request Oct 17, 2015
psi29a merged commit 13086bb into Mindwerks:master Oct 17, 2015
tcld deleted the perf_opt branch Oct 17, 2015