Run load tests #275
Comments
I have some basic performance data available. I can give more specific timings on sections if you tell me what you're interested in profiling. Last week, I built an API that ingests a file with 256,000 rows and 21 columns into Postgres via Flask. It's about 100MB of plaintext CSV data. This data is inserted into three tables:
Here's an anecdotal version of performance:
I had two versions implemented:
We're okay with 2 for now, but we might ultimately move to a raw INSERT + COPY query inside a single transaction for the performance win.
Wow! Thank you for the very detailed report. It will certainly be helpful.
I'm most interested in profiling the parts of the code base that we can improve; things like building the internal Prisma query, sending the query to the internal engine, deserialising records, etc.
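As an aside, for anyone reproducing these measurements, here is a minimal stdlib-only sketch of how an individual section could be timed under the profiler (the callable being profiled is a placeholder, not a Prisma internal):

```python
import cProfile
import pstats


def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and print the 10 most expensive calls."""
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    pstats.Stats(prof).sort_stats("cumulative").print_stats(10)
    return result


# Example: profile any callable, e.g. a function that builds a query.
total = profile_call(sum, range(1_000_000))
```

The same wrapper can be pointed at query building or deserialisation once those entry points are identified.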
Just to double check: by batched rows, are you referring to the query batcher? e.g.

```py
async with client.batch_() as batcher:
    batcher.user.create({'name': 'Robert'})
```

If not, I wonder if it would be possible to ingest all the data in a single batched query? This would give you the benefit of running in a transaction and hopefully performance benefits too. You would obviously still be bottlenecked by httpx.

This has given me an idea for a potential performance improvement: Prisma Python will still convert the data that Prisma gives us into pydantic models even when the returned models are redundant. Pydantic validation (while still fast) can incur significant performance costs, especially when deserialising hundreds of thousands of records. We should add a parameter to actions to disable returning records.
Looking into this a bit myself with a dataset of 500,000 records with 14 columns. On my local machine:
@tday So if I'm understanding your use case correctly, it looks like your best bet would be to wrap multiple inserts within a batched query:

```py
async with client.batch_() as batcher:
    batcher.table1.create_many(...)
    batcher.table2.create_many(...)
    batcher.table3.create_many(...)
```

Bearing in mind that
I actually did something a bit different: I chunked the create_many call into multiple create_many calls and then used asyncio.gather to maximise the IO-bound time. This gave about the same time as your batch version, but without a transaction. Unfortunately, I do need the relationship, which the batcher doesn't support. It seems like a nested write with a one-to-many relation executes many creates rather than a create_many. Is that a potential improvement? Alternatively, it'd be nice if you could do something like:
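A minimal sketch of the chunked approach described above, assuming a Prisma Client Python client with a hypothetical `table1` model (the model name and chunk size are illustrative):

```python
import asyncio


def chunked(rows: list, size: int) -> list:
    """Split rows into consecutive chunks of at most `size` items."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


async def ingest(client, rows, chunk_size=10_000):
    # One create_many per chunk, run concurrently via asyncio.gather.
    # Note this is NOT transactional: if one chunk fails, chunks that
    # already completed remain committed.
    await asyncio.gather(
        *(client.table1.create_many(data=chunk)
          for chunk in chunked(rows, chunk_size))
    )
```

The trade-off versus the batcher is concurrency without atomicity, which matches the behaviour described in the comment above.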
Prisma supports this with interactive transactions.
That is a smart solution 👍
Could you elaborate more on what you mean please? Are you talking about the actual generated SQL queries?
I do have provisional support for this in a WIP branch; you could try it out by installing like so: Unfortunately, when I started working on this there were a couple of kinks that still needed to be ironed out on the Prisma side before I could officially add support. This has been improved in the latest releases, I just haven't gotten around to finishing up the implementation yet: #53
Yeah, the actual generated queries are different, which seems to have a performance impact at a glance. I need to do more profiling to prove out this theory. Here is what it looks like at the database:
I see. I wonder if it would be feasible for Prisma to be smart enough to create a single query from those batched queries instead of creating multiple queries. I do not know how the SQL queries are generated internally (that's all handled by the Prisma engines team), so it might just not be possible; I'll have a look.
Problem
We have not yet tested how well Prisma Client Python scales. Scalability is a core concern with ORMs / database clients.
Suggested solution
Use Locust to write tests.
We should also include these tests in CI so that we can fail checks if there are any regressions and upload the results so they can be viewed on GitHub pages.