
Made PrimePY a LOT faster #9

Closed
wants to merge 2 commits into from

Conversation

rhbvkleef
Contributor

You probably don't want to merge this as it significantly changes the spirit of the piece of code, but I wanted to show that Python can be VERY fast in some cases.
@cristi-neagu

This closes #3 by being almost twice as fast. Nice job!

@cristi-neagu

Btw, would have been nice to post the numbers you were getting with C# and C++ as well, for comparison. All machines are different, so we don't know what "25939 passes" actually means.

@rhbvkleef
Contributor Author

rhbvkleef commented Mar 28, 2021

This is tested on Arch Linux with kernel 5.11.6, 32GB DDR4 RAM, i7-9750H.

Language          Iterations
C++ (GCC)              11556
C++ (Clang)            11267
C# (Mono)               1521
C# (DotNet Core)        3317
Python                 10345

C++

  • GCC 10.2.0
  • g++ -O3 PrimeCPP.cpp
  • Clang 11.1.0
  • clang++ -O3 PrimeCPP.cpp

C#

  • Mono 6.12.0
  • csc -o+ -debug- PrimeCS.cs
  • DotNet Core 5.0.104
  • dotnet run --configuration=release

Python

  • CPython 3.9.2
  • python PrimePY.py

@JL102

JL102 commented Mar 28, 2021

This is quite impressive. But here's a question. It's using multiple libraries instead of using native Python. Do those additional libraries use native code instead of pure python? If they are, it feels a bit like cheating, and perhaps there should be two versions - one with pure Python and built-in libraries, and the other which takes advantage of extra libraries for better performance.

@rhbvkleef
Contributor Author

If they are, it feels a bit like cheating, and perhaps there should be two versions

Both Numpy and Numba use native code (although none of Numpy's native code is actually executed here). Numba contains quite a significant amount of native code, and actually produces machine code at runtime. Is it cheating? Maybe, but not because the libraries use C code. The whole runtime is written in C, so using that as a qualifier is either meaningless or incredibly ambiguous. I think the benchmark for that should be whether we use special-purpose native code, and that is NOT the case.

I do, however, see a case for keeping two separate solutions: one using only the pure standard library (in which case we can expect performance that is on par with the dotnet performance) and one where we use Numba's JIT (or another Python JIT, for that matter).

@JL102

JL102 commented Mar 28, 2021

Fair reasoning!
I'm still quite unfamiliar with python at the moment, so I apologize if my question seemed accusatory. I see now that the biggest improvement was by using a JIT compiler, correct? Does that mean that Python does not do JIT compiling by default?

@cristi-neagu

Well, think of Numba kind of like the old days when people were writing C or C++ but the compilers weren't as well optimized as today: if they had something performance-critical, they would insert some assembly code to make it as fast as possible. It's sort of like that, except code written for Numba is much, much closer to Python than assembly is to C. It's basically just Python with types.

So in this respect I wouldn't consider using Numba cheating. It's still Python. It's like choosing a different compiler for your C code because it's a bit better optimized.

@rhbvkleef
Contributor Author

Fair reasoning!

Thanks! I was afraid it would come across as a bit rant-ey. I'm happy my point got across despite that.

I apologize if my question seemed accusatory

Don't worry about it! It didn't. I just wanted to make sure that I explained my position properly, and wanted to explain why I see it differently.

I see now that the biggest improvement was by using a JIT compiler, correct?

Yes, that's where the largest speed gain is.

Does that mean that Python does not do JIT compiling by default?

CPython doesn't by default. There are interpreters (like PyPy and Jython) that do, but they are oftentimes slower than using Numba. They do provide a more "Pythonic" experience, though: Numba restricts the subset of Python we can write so that it can optimize it better.

@knowlen

knowlen commented Apr 12, 2021

This is quite impressive. But here's a question. It's using multiple libraries instead of using native Python. Do those additional libraries use native code instead of pure python? If they are, it feels a bit like cheating, and perhaps there should be two versions - one with pure Python and built-in libraries, and the other which takes advantage of extra libraries for better performance.

Python built-ins are implemented in C. The source code for Python can be found here. Many Numpy ops just link to the Python implementation (e.g., numpy.sum() is just sum()). Numpy still gets called through the interpreter, though, which means everything has to happen in memory and you cannot optimize across multiple / overlapping functions. For example,

import numpy as np

def f(n):
    return np.sum(np.arange(n))

print(f(10000000))

will literally create a 10,000,000-element array in memory without any consideration of how sum will use it. This is pretty expensive, but if you decorate with Numba,

import numpy as np
from numba import njit

@njit
def f(n):
    return np.sum(np.arange(n))

print(f(10000000))

f() gets JIT compiled and no array generation occurs.
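The memory point above can be seen even in pure Python, without Numba: a lazy `range` streams values instead of materializing a list, which is loosely analogous to the compiler fusing the `arange` into the `sum`. A minimal illustration:

```python
# Pure-Python illustration of eager vs. lazy evaluation (no Numba
# required). sum(list(range(n))) allocates the whole n-element list
# before summing, much like NumPy eagerly building the arange array;
# sum(range(n)) streams values one at a time in constant extra memory.
n = 10_000_000

eager = sum(list(range(n)))  # materializes the full list first
lazy = sum(range(n))         # streams values, no big allocation

assert eager == lazy == n * (n - 1) // 2
print(lazy)  # 49999995000000
```

Same result either way; the difference is only where the intermediate sequence lives.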

The Numba JIT compilation is potentially cheating in the sense that Numba is doing some extra manual translation work in the frontend. It's sort of comparable to Clang (which is also just an LLVM frontend), but I think the translation from C/C++ -> LLVM IR in Clang is more deterministic / automatic. The Numba team literally re-implements builtins / Numpy functions in their own language and then maps the Python code to "something they know how to compile" before translating the Numba IR to LLVM IR. So there is probably a side effect here where Numba implicitly rewrites poor Python implementations to be optimal before lowering to LLVM IR, in a way that is more rigid than Clang. I could be wrong though.


Generally, the name of the game for compiling Python is either translating Python -> C (which can then be compiled by clang/gcc) or translating Python -> LLVM IR (which can be compiled directly by an LLVM backend). There are dozens of mature libraries / tools for JIT and AOT compilation, but the methodology is essentially bimodal: usually the former for AOT and the latter for JIT.

While it is possible to make Python as fast as the C languages in 2021 using JIT compilers, I concede that the comparison is somewhat beyond the scope of Dan's video. The C languages are designed to be compiled, and Python is designed to be interpreted. It would be equivalent to benchmarking native Python vs. C++ run through a third-party C interpreter.

rhbvkleef pushed a commit that referenced this pull request Apr 29, 2021
@rbergen
Contributor

rbergen commented Jun 25, 2021

I'm wondering how this relates to #2?

@rbergen
Contributor

rbergen commented Jul 21, 2021

Closing due to lack of contributor response. Can be reopened on drag-race branch if desired.

@rbergen rbergen closed this Jul 21, 2021