
Rationalise Python platforms benchmarks #8055

Closed

Conversation


@gi0baro gi0baro commented Mar 19, 2023

The main rationale behind this is to avoid mixing framework tests and platform ones in Python.

We can equate platforms with servers, and so testing several different servers on each framework makes little sense to me, because:

  • if we define a framework's response time as RT, and we summarise tests with avg(RT), we can decompose RT = PT + FT, where PT is the platform (server) time and FT is the actual framework time
  • since the final RT is always a composition of the server used and the framework, such a combined benchmark adds no useful information once we also have dedicated platform benchmarks
  • put another way, if Uvicorn is faster than Hypercorn, any framework test run on both servers will show the same ∂T already produced by the original benchmarks of the two servers (see the sketch after this list)
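
To make this concrete, here is a tiny sketch with made-up numbers (the frameworks and timings below are purely illustrative, not measured data):

```python
# Illustration of the RT = PT + FT argument; all numbers are hypothetical.
platform_time = {"uvicorn": 0.10, "hypercorn": 0.15}   # PT, ms per request
framework_time = {"starlette": 0.05, "quart": 0.20}    # FT, ms per request

for fw, ft in framework_time.items():
    rt_uvicorn = platform_time["uvicorn"] + ft
    rt_hypercorn = platform_time["hypercorn"] + ft
    # The ∂T between the two servers is the same for every framework,
    # so framework-on-server combinations repeat what the platform
    # benchmarks already show.
    print(fw, round(rt_hypercorn - rt_uvicorn, 2))      # always 0.05
```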

Also, since "Composite scores" are grouped by framework, those values become very hard to interpret, as they can come from different implementations.

Skipping these benchmarks will:

  • save CPU time, energy and thus the planet
  • make the entire benchmarks suite faster

Details of changes:

  • Drop socketify tests from all frameworks
  • Drop fastwsgi tests from all frameworks
  • Display socketify results with PyPy as explicit and CPython as implicit, to align with common usage
  • Add plain gunicorn test
  • Add plain hypercorn test (a minimal sketch of such "plain" apps follows this list)
  • Review "Framework", "Platform" and "Webserver" labels for all tests
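
For reference, a "plain" server test here just means benchmarking the server with the thinnest possible application, roughly along these lines (a sketch, not the exact handlers added in this PR):

```python
# Plain WSGI app for the gunicorn test (illustrative sketch).
def wsgi_app(environ, start_response):
    body = b"Hello, World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Plain ASGI app for the hypercorn test (illustrative sketch).
async def asgi_app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello, World!"})
```

They would be run with something like `gunicorn app:wsgi_app` and `hypercorn app:asgi_app` (module and app names here are only examples).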

This will stay a draft until I have checked all the involved points.
In the meantime, a discussion can be started; I would like opinions from @cirospaciari, @remittor and @nbrady-techempower.

@gi0baro force-pushed the rationalise-python-platforms branch from f43f870 to 095a86a on March 19, 2023 23:53

cirospaciari commented Mar 20, 2023

I like this rationale, but not every web framework drives WSGI and ASGI the same way; for example, Quart performs worse on socketify than on uvicorn, even though pure socketify ASGI is way faster than uvicorn (almost double).

So not every framework gets the same percentage/average uplift from socketify.
socketify.py is the only one using PyPy as the implicit runtime, because it was originally a PyPy-first framework (and still is), but I agree and will change that.

I can remove all the tests and only keep pure ASGI, pure WSGI, and socketify itself, but people should be able to compare different servers on popular Python web frameworks. As I said, not every framework uses ASGI/WSGI in the same way, and may not show the same average difference when leveraging the faster server.

PyPy is another matter: some web frameworks have their overhead reduced and can perform much better than on CPython, while most servers do not run well (or at all) on PyPy. Take Django, which reaches 288,565 on PyPy vs 92,043 on CPython with socketify and about 70k with meinheld, while meinheld on Falcon is equal to or faster than socketify on CPython. Raw socketify WSGI is 1,561,530 on PyPy and 697,312 on CPython; meinheld should be close to or better than socketify on CPython, and it is not compatible with PyPy.

We can limit the number of benchmarks for each web framework (only keep the fastest); I think this is fine, but people should be able to know which server it is running on and why.

And composite scores should only be grouped when the entries use the same server + runtime.
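
Concretely, that means keying the grouping on the server/runtime pair rather than on the framework alone; a rough sketch (the records and field names are hypothetical, not the real TFB results schema):

```python
from collections import defaultdict

# Hypothetical result records; the actual TFB results format differs.
results = [
    {"framework": "flask", "server": "gunicorn", "runtime": "CPython", "score": 100},
    {"framework": "flask", "server": "socketify", "runtime": "PyPy", "score": 290},
    {"framework": "falcon", "server": "meinheld", "runtime": "CPython", "score": 310},
]

composite = defaultdict(list)
for r in results:
    # Group by (server, runtime) so scores from different stacks are not mixed.
    composite[(r["server"], r["runtime"])].append(r["score"])

for key, scores in composite.items():
    print(key, sum(scores) / len(scores))
```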

@gi0baro force-pushed the rationalise-python-platforms branch from 095a86a to f21cbe6 on March 20, 2023 00:35

remittor commented Mar 20, 2023

We can equate platforms with servers, and so testing several different servers on each framework makes little sense to me, because: ....

It's not so simple. Some WSGI/ASGI servers have tweaks that let them work more efficiently with a particular framework.

For example, look at my tweak for Flask: https://github.com/jamesroberts/fastwsgi/blob/796e5b70bbb20d5411df4b7fa1b19fdb17ef10d0/fastwsgi/request.c#L852-L871
By default, Flask returns data from file objects in very small chunks as PyBytes. That is, to transfer a file, the Python interpreter (CPython) has to create a lot of PyBytes objects and copy the data from the file stream into them.
Therefore, it was decided to force Flask to read from the file in chunks of the desired size, which greatly reduces the number of PyBytes objects created (a generic sketch of this idea follows below).
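
The general idea can be expressed through the standard WSGI `wsgi.file_wrapper` hook; the following is only a Python sketch of that mechanism, not the fastwsgi C code linked above:

```python
# Sketch: a server-provided file wrapper that reads file responses in
# large blocks instead of the framework's small default chunks.
class FileWrapper:
    def __init__(self, filelike, block_size=256 * 1024):  # block size is an example value
        self.filelike = filelike
        self.block_size = block_size

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self.filelike.read(self.block_size)
        if not chunk:
            raise StopIteration
        return chunk

# A WSGI server exposes it to the application before each request:
#     environ["wsgi.file_wrapper"] = FileWrapper
# so frameworks that honour wsgi.file_wrapper stream files in big chunks.
```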

A similar tweak in the bjoern server, which reads data directly from the file descriptor: https://github.com/jonashaag/bjoern/blob/25b14e5042f51eb869d6bfb67fe7e6213c9747ee/bjoern/server.c#L306-L328

All these tweaks give a significant speed boost in some use cases.


cirospaciari commented Mar 22, 2023

As I promised, the naming issue was addressed here: https://github.com/TechEmpower/FrameworkBenchmarks/pull/8058/files

My opinion about rationalizing the benchmarks is that, unfortunately, it is not possible: web frameworks diverge a lot in overhead and implementation, and if you include PyPy it becomes even harder to rationalize.
We can take all the rounds and try to normalize over uvicorn or meinheld or anything else, and we will never get precise results.

Adding different frameworks helps me a lot in finding bugs in WSGI and ASGI implementations (I even opened issues on granian using this information, but I never posted granian on TFB with other frameworks because I know you do not approve of this, and I respect your decision).

I also want to state here that I disagree with keeping old/dead projects like vibora and japronto in the benchmarks. Meinheld is not maintained either, but at least it is used by a lot of people.

The only prize we get from doing better over time in TFB is a better understanding of the behavior and scaling of our applications, and being able to compare other implementations on the same hardware. So keeping dead projects only hurts the benchmark run time.

I still want to create a cloud environment (12 vcores or more) to run the different benchmarks, tracking CPU, memory, and IO usage in each benchmark to identify bottlenecks, and also to add more payload types (different sizes).

For payloads, my idea was:

'Hello, World!' (13 bytes)
HTML (1KB)
HTML (8KB)
HTML (16KB)
HTML (64KB)
Streaming (1MB)
Streaming (10MB)
Streaming (100MB)
Streaming (1GB)

I would avoid HTTP/1.1 pipelining, add POST data benchmarks, and in the future add WebSockets and others.

In this case, I would not test JSON performance or databases, but instead create separate benchmarks for the different JSON serializers/deserializers and database connectors.

I would also only include Python benchmarks, with a few other languages as references: Express and Fastify on Node.js, Go's gnet, fiber, and gin, ASP.NET Core, and Rust's ntex.

But for this, I need more planning and time.


gi0baro commented Mar 22, 2023

Given the comments, I am going to close this.

@nbrady-techempower feel free to continue the discussion, re-open this, or extract parts from it.
