Allow singletonizing Symbols #6
Conversation
Sure, but next week sometime.
Just to evaluate the possible advantage of using the singletonized version of symbols, here is a comparison of how long it takes to compare strings versus objects on my computer. This is the code I used for the comparison:
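The original snippet was not captured in this thread; a minimal stand-in benchmark (names and sizes are illustrative, not the actual code) might look like:

```python
import timeit

# Illustrative micro-benchmark: compare the cost of string equality against
# object identity, which is what singletonized Symbols would let us use.
a = "Global`SomeLongSymbolName"
b = "".join(["Global`", "SomeLongSymbolName"])  # equal value, distinct object

t_string = timeit.timeit("a == b", globals={"a": a, "b": b}, number=1_000_000)

class Symbol:
    """Plain object standing in for a singletonized symbol."""

s = Symbol()
t_identity = timeit.timeit("s is s", globals={"s": s}, number=1_000_000)

print(f"string ==: {t_string:.3f}s   identity is: {t_identity:.3f}s")
```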
Ok. This is for Python (what version?) versus Cython. Now how about a comparison using Pyston and PyPy? I hear that Microsoft is also coming out with a performance-enhanced version as well. And does the performance change with the Python version?
@rocky if Cython becomes mandatory, PyPy won't be supported anymore.
And maybe other JITs and Python implementations, current and future. That is why we need to look at benchmarks on other systems and not be so cavalier about rushing off to solutions, especially ones that change the code base a lot, before we have carefully considered the possibilities and made little benchmark tests of them first. Customizing for a specific kind of speedup complicates code (which, when not documented, is a mess) and sometimes makes it harder to undo when the current assumptions are superseded in light of changing understanding and technology. Lastly, I should note that code can be written in such a way that we get Cython speedups when desired, without Cython being mandatory.
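One common way to keep Cython optional is the import-with-fallback pattern. A minimal sketch (the module name `_expression_cy` and the function are hypothetical):

```python
# Sketch of optional Cython: use the compiled extension when it exists,
# and fall back to the pure-Python implementation otherwise, so PyPy
# and other interpreters keep working.
try:
    from _expression_cy import same_symbol  # Cython-built module, if present
except ImportError:
    def same_symbol(a, b):
        # Pure-Python fallback; identical behavior, just slower.
        return a == b
```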
Yes, an even better solution is to mix .py files carrying the code with .pxd files carrying the custom type declarations. Today I'm going to see what needs type declarations (they speed up the code a bit [when Cython is enabled]; it can reach 20% in some cases...).
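As a sketch of that approach (file names and declarations are hypothetical), Cython's "augmenting .pxd" mode lets a plain .py module stay runnable under any interpreter, while a sibling .pxd adds C type declarations that are only used when the module is compiled:

```python
# mymodule.py -- plain Python: runs unchanged on CPython, PyPy, etc.
def norm2(x, y):
    return x * x + y * y
```

```cython
# mymodule.pxd -- read only when Cython compiles mymodule.py,
# adding C type declarations without touching the .py source.
cpdef double norm2(double x, double y)
```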
My idea is that, for the most critical parts of the code, we could have different versions depending on which interpreter is available.
@rocky , I didn't check that. |
@mmatera |
This is going the wrong way. We don't want to have tests that start failing depending on options. There is some flakiness somewhere, and we don't want that. We have had that in the past because it was a given and we didn't know how to remove it. However, if something works intermittently, in my opinion it should be treated as a failure, unless there is a very good reason why we need to tolerate it.
Yes, I believe so. I am saying that performance gains from Cython are not a reason to lock us into Cython. Comparing object locations (identities) should be faster in straight Python too.
Totally agree: the other option would be just to skip the failing test, but that is not the idea. The good thing about having this extensive set of tests is that we can discover this flakiness when we change something a little bit.
I don't understand, |
Yep, but the result of the test shouldn't depend on the number of threads. I think that with
The tests also pass with The
@mmatera I tested it manually.
Edit: I ran GitHub Actions again, and it's failing; what am I doing wrong?
I tested it. A few type declarations don't change anything; in truth, half of them just slow down the code (to be more specific, the complex ones, e.g.:). Cython is more efficient when used completely, but that is very hard to do (malloc, C++ vectors, and more non-Python things).
Maybe the compilation of the Cython modules?
I ran
Here are the results of profiling the run; these are the 100 calls that take the most time:
ncalls | tottime (s) | percall (s) | cumtime (s) | percall (s) | filename:lineno(function)
---|---|---|---|---|---
8005288 | 21.473 | 0.0 | 359.838 | 0.0 | pattern.py:176(match) |
260903 | 15.363 | 0.0 | 16.168 | 0.0 | clusters.py:830(remove) |
10 | 14.251 | 1.425 | 31.244 | 3.124 | clusters.py:860(reduce) |
625952 | 12.986 | 0.0 | 387.027 | 0.022 | expression.py:1351(evaluate_next) |
4264462 | 12.627 | 0.0 | 331.853 | 0.0 | pattern.py:200(yield_choice) |
60565005 | 9.108 | 0.0 | 9.264 | 0.0 | {built-in method builtins.isinstance} |
4323487 | 8.047 | 0.0 | 364.302 | 0.0 | rules.py:22(apply) |
1659121 | 7.772 | 0.0 | 24.832 | 0.0 | expression.py:720(new) |
61017 | 6.952 | 0.0 | 7.227 | 0.0 | libintmath.py:231(isqrt_fast_python) |
12000024 | 6.282 | 0.0 | 12.864 | 0.0 | definitions.py:430(get_definition) |
876955 | 5.796 | 0.0 | 299.649 | 0.004 | pattern.py:471(match_leaf) |
8018506 | 5.757 | 0.0 | 8.316 | 0.0 | expression.py:2091(sameQ) |
1838672 | 5.263 | 0.0 | 13.676 | 0.0 | expression.py:95(from_python) |
4389077 | 5.242 | 0.0 | 39.355 | 0.0 | pattern.py:72(does_match) |
1311476 | 4.916 | 0.0 | 8.92 | 0.0 | expression.py:1247(sameQ) |
6085 | 4.861 | 0.001 | 10.745 | 0.002 | gammazeta.py:1512(gamma_fixed_taylor) |
8005288 | 4.821 | 0.0 | 17.731 | 0.0 | pattern.py:114(get_attributes) |
5356087 | 4.777 | 0.0 | 342.096 | 0.0 | pattern.py:145(match) |
564735 | 4.702 | 0.0 | 387.401 | 0.036 | expression.py:1295(evaluate) |
1361889 | 4.497 | 0.0 | 6.539 | 0.0 | expression.py:862(_rebuild_cache) |
2957 | 4.425 | 0.001 | 4.425 | 0.001 | {method 'encode' of 'ImagingEncoder' objects} |
8459165 | 4.295 | 0.0 | 13.313 | 0.0 | expression.py:2056(get_attributes) |
9728569 | 4.04 | 0.0 | 5.328 | 0.0 | expression.py:44(ensure_context) |
8461710 | 3.833 | 0.0 | 9.03 | 0.0 | definitions.py:507(get_attributes) |
3340 | 3.808 | 0.001 | 3.834 | 0.001 | :1(hypsum_0_3__QRR_R) |
22221142 | 3.726 | 0.0 | 5.727 | 0.0 | {method 'get' of 'dict' objects} |
5843890 | 3.38 | 0.0 | 6.515 | 0.0 | expression.py:330(get_head_name) |
4779243 | 3.333 | 0.0 | 8.886 | 0.0 | expression.py:1436(rules) |
4267359 | 3.244 | 0.0 | 336.165 | 0.0 | pattern.py:275(yield_head) |
6089 | 3.223 | 0.001 | 5.31 | 0.001 | gammazeta.py:1455(gamma_taylor_coefficients) |
1495612 | 2.942 | 0.0 | 5.9 | 0.0 | expression.py:1999(new) |
25 | 2.787 | 0.111 | 7.766 | 0.311 | plot.py:1625(apply) |
1496102 | 2.733 | 0.0 | 3.261 | 0.0 | util.py:172(subranges) |
1456118 | 2.722 | 0.0 | 297.731 | 0.001 | patterns.py:867(match) |
3673175 | 2.691 | 0.0 | 4.086 | 0.0 | expression.py:260(new) |
4264462 | 2.671 | 0.0 | 333.902 | 0.0 | pattern.py:335(get_pre_choices) |
450884 | 2.548 | 0.0 | 290.736 | 0.005 | rules.py:31(yield_match) |
9 | 2.53 | 0.281 | 2.53 | 0.281 | {method 'quantize' of 'ImagingCore' objects} |
1838672 | 2.39 | 0.0 | 5.177 | 0.0 | numbers.py:78(get_type) |
939456 | 2.284 | 0.0 | 3.81 | 0.0 | definitions.py:691(get_config_value) |
4999090 | 2.225 | 0.0 | 2.225 | 0.0 | {built-in method new of type object at 0x55ccb0870020} |
38952 | 2.135 | 0.0 | 3.318 | 0.0 | libelefun.py:1011(exponential_series) |
444 | 2.068 | 0.005 | 2.073 | 0.005 | :1(hypsum_1_1_R_R_R) |
18662350 | 2.058 | 0.0 | 2.058 | 0.0 | {built-in method builtins.id} |
1654266 | 1.958 | 0.0 | 3.94 | 0.0 | expression.py:992(has_form) |
814530 | 1.936 | 0.0 | 296.54 | 0.004 | pattern.py:642(yield_wrapping) |
12702763 | 1.907 | 0.0 | 1.907 | 0.0 | expression.py:956(get_head) |
9 | 1.89 | 0.21 | 1.89 | 0.21 | {method 'rankfilter' of 'ImagingCore' objects} |
10329258 | 1.843 | 0.0 | 1.843 | 0.0 | evaluation.py:568(check_stopped) |
804435 | 1.736 | 0.0 | 294.435 | 0.003 | pattern.py:621(match_yield) |
322710 | 1.645 | 0.0 | 287.352 | 0.007 | rules.py:118(do_replace) |
14358974 | 1.622 | 0.0 | 1.626 | 0.0 | {built-in method builtins.len} |
913639 | 1.616 | 0.0 | 4.439 | 0.0 | expression.py:65() |
818490 | 1.559 | 0.0 | 297.281 | 0.003 | pattern.py:446(get_wrappings) |
10990548 | 1.543 | 0.0 | 1.543 | 0.0 | expression.py:2059(get_name) |
1101608 | 1.533 | 0.0 | 289.956 | 0.002 | patterns.py:1072(match) |
1272269 | 1.507 | 0.0 | 16.499 | 0.0 | cache.py:69(wrapper) |
3392765 | 1.468 | 0.0 | 13.082 | 0.0 | expression.py:725() |
8 | 1.466 | 0.183 | 7.126 | 0.891 | libelefun.py:538(agm_fixed) |
526947 | 1.344 | 0.0 | 240.705 | 0.004 | expression.py:1367(eval_range) |
241681 | 1.322 | 0.0 | 4.949 | 0.0 | numbers.py:1031(new) |
4 | 1.29 | 0.322 | 1.492 | 0.373 | gammazeta.py:1374(zeta_array) |
2292319 | 1.288 | 0.0 | 6.059 | 0.0 | base.py:773(get_attributes) |
1234 | 1.21 | 0.001 | 1.22 | 0.001 | :1(hypsum_0_1__R_R) |
875727 | 1.18 | 0.0 | 1.71 | 0.0 | expression.py:906(_timestamp_cache) |
4913518 | 1.157 | 0.0 | 1.157 | 0.0 | pattern.py:32(init) |
1761399 | 1.1 | 0.0 | 1.578 | 0.0 | patterns.py:864(get_match_count) |
2016262 | 1.082 | 0.0 | 1.528 | 0.0 | random.py:237(_randbelow_with_getrandbits) |
49444 | 1.078 | 0.0 | 2.512 | 0.0 | facts.py:499(deduce_all_facts) |
1003944 | 1.066 | 0.0 | 2.251 | 0.0 | sympify.py:92(sympify) |
5657541 | 1.051 | 0.0 | 1.051 | 0.0 | expression.py:738(leaves) |
256502 | 1.046 | 0.0 | 2.047 | 0.0 | expression.py:663(__cmp) |
944155 | 1.038 | 0.0 | 4.694 | 0.0 | definitions.py:364(lookup_name) |
1096334 | 1.006 | 0.0 | 1.537 | 0.0 | expression.py:25(fully_qualified_symbol_name) |
2726608 | 1.002 | 0.0 | 1.37 | 0.0 | expression.py:322(get_lookup_name) |
738110 | 0.998 | 0.0 | 2.659 | 0.0 | sympify.py:486(_sympify) |
1892276 | 0.996 | 0.0 | 0.996 | 0.0 | expression.py:196(init) |
454524 | 0.967 | 0.0 | 2.125 | 0.0 | libmpf.py:291(from_man_exp) |
812974 | 0.964 | 0.0 | 1.685 | 0.0 | patterns.py:885(get_match_candidates) |
7305 | 0.956 | 0.0 | 1.251 | 0.0 | libelefun.py:634(log_taylor_cached) |
617686 | 0.923 | 0.0 | 3.101 | 0.0 | evaluation.py:572(inc_recursion_depth) |
74089 | 0.916 | 0.0 | 3.187 | 0.0 | convert.py:114(from_sympy) |
869153 | 0.914 | 0.0 | 0.914 | 0.0 | libmpf.py:153(_normalize) |
4701140 | 0.914 | 0.0 | 0.925 | 0.0 | {built-in method builtins.hasattr} |
702701 | 0.909 | 0.0 | 4.439 | 0.0 | expression.py:2065(get_sort_key) |
261120 | 0.877 | 0.0 | 1.776 | 0.0 | linalg.py:2362(norm) |
777395 | 0.86 | 0.0 | 1.195 | 0.0 | expression.py:57(strip_context) |
824500 | 0.844 | 0.0 | 38.54 | 0.0 | expression.py:2106(evaluate) |
49688 | 0.839 | 0.0 | 54.457 | 0.001 | arithmetic.py:92(apply) |
833677 | 0.826 | 0.0 | 4.107 | 0.0 | expression.py:1955(get_head) |
271706 | 0.822 | 0.0 | 3.293 | 0.0 | expression.py:1116(get_sort_key) |
407297 | 0.799 | 0.0 | 0.989 | 0.0 | numbers.py:151(mpf_norm) |
123881 | 0.795 | 0.0 | 2.333 | 0.0 | random.py:348(shuffle) |
175833 | 0.782 | 0.0 | 3.556 | 0.0 | evalf.py:1425(evalf) |
1583173 | 0.78 | 0.0 | 14.247 | 0.0 | {built-in method builtins.all} |
826647 | 0.757 | 0.0 | 1.674 | 0.0 | :1033(_handle_fromlist) |
133550 | 0.756 | 0.0 | 0.88 | 0.0 | definitions.py:762(init) |
1687352 | 0.756 | 0.0 | 1.549 | 0.0 | expression.py:989(get_lookup_name) |
23596 | 0.754 | 0.0 | 8.05 | 0.0 | mul.py:178(flatten) |
156054 | 0.727 | 0.0 | 9.503 | 0.001 | assumptions.py:464(_ask) |
100 most called functions

100 functions that cost the most "total time" (time inside the function plus the functions it calls)
@mmatera Thanks! This is useful and interesting. It doesn't seem at all different from when I last looked at things. It would be nice to have the code that was used to generate this, or the basic data from it, somewhere, so that we can rerun it over time or as we adapt it to narrow the focus.
Here are the raw data: |
The code to produce the profile
This prints the profile to stdout. I collected the output and did some formatting by hand (changing spaces to "\t").
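The actual driver was attached elsewhere; a hypothetical reconstruction of that kind of script, using the standard `cProfile`/`pstats` modules, looks like:

```python
import cProfile
import pstats
import sys

# Hypothetical profiling driver: profile a call and print the top 100
# entries sorted by tottime to stdout, which is the shape of the table above.
def work():
    # Stand-in for the Mathics evaluation that was actually profiled.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

stats = pstats.Stats(profiler, stream=sys.stdout)
stats.sort_stats("tottime").print_stats(100)
```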
The code to read the "almost raw" data:
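That snippet is not reproduced in this thread; a minimal reader for a tab-separated profile dump (field names are my guess, mirroring cProfile's columns) could look like:

```python
import csv

# Hypothetical reader for the "\t"-formatted profile dump described above.
FIELDS = ("ncalls", "tottime", "percall", "cumtime", "percall_cum", "location")

def read_profile(path):
    rows = []
    with open(path, newline="") as f:
        for record in csv.reader(f, delimiter="\t"):
            if len(record) != len(FIELDS):
                continue  # skip the header and malformed lines
            row = dict(zip(FIELDS, record))
            for key in ("tottime", "percall", "cumtime", "percall_cum"):
                row[key] = float(row[key])
            rows.append(row)
    return rows
```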
The code to build the tables
The code to build the summary tables:
(I could also upload the ipynb file where I did all the experiments, but I was not sure where.)
Thanks - I will delve into this more deeply next week sometime.
Maybe Gist? |
(commits 3c2168e to c919dfc)
What is failing here is that
for some reason returns
@rocky, @TiagoCavalcante, finally I made it work. To do that, I had to change the code in several places. However, please notice that many of the changes come from #44. Maybe the best thing would be to start a new PR. What do you think?
I suspect a new PR would be easier. In the past, looking at your PRs feels like: let me try this; no, that didn't work, okay let me try something else; no, not that either; okay, now I'll merge in code from the main branch; okay, now let me add some debug stuff; ok, I'll remove it now; nope, didn't remove all of it; etc. These kinds of things make it very hard to follow what the essence is. Bonus points if you want to start filling out docstrings.
@rocky, maybe you can help me with this. For some reason, all the tests pass if I enable the line that singletonizes the `Symbol` class, except for one test in `combinatorica`. As you know that code better, could you help me investigate why this happens? If this starts to work, I guess we can get a performance improvement by using the `id()` of the symbols instead of the string name of the symbol in the pattern-matching routines.
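The singletonization idea can be sketched as follows (this is a minimal illustration, not the actual Mathics implementation): `__new__` hands back one cached instance per name, so equal-named symbols are the *same* object and can be compared by `id()`/`is` instead of by string name.

```python
# Minimal sketch of a singletonized Symbol class (hypothetical).
class Symbol:
    _instances = {}

    def __new__(cls, name):
        try:
            # Reuse the unique instance for this name, if it exists.
            return cls._instances[name]
        except KeyError:
            self = super().__new__(cls)
            self.name = name
            cls._instances[name] = self
            return self

a = Symbol("Global`x")
b = Symbol("Global`x")
assert a is b  # identity comparison substitutes for string comparison
```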