Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up Range.product_queryset() #4100

Merged
merged 7 commits into from Jun 2, 2023

Conversation

gwaidacher
Copy link
Contributor

@gwaidacher gwaidacher commented May 9, 2023

Speeding up Range.product_queryset()

see conversation in #4066 - created new PR with this issue.

Speeding up Range.product_queryset()
@codecov
Copy link

codecov bot commented May 9, 2023

Codecov Report

Merging #4100 (11a6f12) into master (eca7cee) will increase coverage by 0.07%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4100      +/-   ##
==========================================
+ Coverage   87.34%   87.41%   +0.07%     
==========================================
  Files         291      291              
  Lines       15825    15848      +23     
==========================================
+ Hits        13822    13854      +32     
+ Misses       2003     1994       -9     
Impacted Files Coverage Δ
src/oscar/apps/offer/abstract_models.py 88.98% <100.00%> (+0.19%) ⬆️

... and 8 files with indirect coverage changes

@gwaidacher gwaidacher changed the title Update abstract_models.py Speeding up Range.product_queryset() May 9, 2023
Copy link
Member

@specialunderwear specialunderwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look really good. I have on more suggestion, butt you need to verify if my remark makes sence, thanks a lot!

src/oscar/apps/offer/abstract_models.py Show resolved Hide resolved
@gwaidacher
Copy link
Contributor Author

gwaidacher commented May 16, 2023

It was not possible to save the queries for parent__categories and parent__excludes according to your suggestion, the changes failed in two different tests at self.assertTrue(self.range.contains_product(self.child1).

But with an additional check if Product.objects.filter(parent__excludes=self).exists(): it was possible to remove ~Q(parent__excludes=self) from the query, reducing query execution time from > 400 ms to < 4 ms:
We have more than 3,000.000 Products and a Range without classes and children; the additional check and the final query both had less than 2 ms execution time each.

Copy link
Member

@specialunderwear specialunderwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't read the code correctly. Ignore this ...

@specialunderwear specialunderwear dismissed their stale review May 19, 2023 07:16

bad reading of code

Copy link
Member

@specialunderwear specialunderwear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm going to test this on a site with lots of products and which uses child products.
There are a lot of queries added that do checks, so I'd like to know what the impact is on worst case performance before merging this.

I also have 1 more idea, maybe it is possible to do all the checks in a single query, using aggregations or something. Maybe you can look into that, in the meanwhile next week I'll try to find out how this works on a site with lot's of child products. Thanks for your effort!

@gwaidacher
Copy link
Contributor Author

Ok I'm going to test this on a site with lots of products and which uses child products. There are a lot of queries added that do checks, so I'd like to know what the impact is on worst case performance before merging this.

I also have 1 more idea, maybe it is possible to do all the checks in a single query, using aggregations or something. Maybe you can look into that, in the meanwhile next week I'll try to find out how this works on a site with lot's of child products. Thanks for your effort!

I executed the checks in shell, they take no time (< 2ms) each. I tested it with a Range having included_products and a Range having included_categories.

I could not find any useful information on howto perform multiple checks for existence on different fields using aggregate functions.

Joining the checks on different fields again would create something like the original query over categories, parents and classes I want to avoid with this PR. Remember, we had a response time over 10 seconds, having a simple range without classes and included_categories, because the Products table was joined multiple times (... from catalogue_product T1 join catalogue_product T2 on ... ) although these joins were not necessary.

Conclusion: In my opinion, even if it would be possible to do all the checks in a single query (e.g. with a raw query) this would not be faster than the single checks with a huge product database.

I expect you will get some positive results concerning performance with your tests, the query having child products should also be faster if there are no classes or no included_categories in the range.

@gwaidacher
Copy link
Contributor Author

Ok I'm going to test this on a site with lots of products and which uses child products. There are a lot of queries added that do checks, so I'd like to know what the impact is on worst case performance before merging this.

The two checks if the range has classes or included_categories can be ignored, they check if range_id exists on indexed field range_id on rather small tables offer_range_classes and offer_range_included_categories, so these checks will always be very fast.

The other checks join the product_ids from the range's included or excluded products, it's classes or it's included_categories on catalogue_product.parent_id which is also an indexed field - in each single case it takes only a few milliseconds in my database.

... FROM "catalogue_product" INNER JOIN "catalogue_product" T2 ON ("catalogue_product"."parent_id" = T2."id") INNER JOIN "offer_rangeproduct" ON (T2."id" = "offer_rangeproduct"."product_id") ...

The worst case, having a range with child products, classes and categories where all checks are performed might consume an average additional time between 10 and 20 ms for all checks, which can be ignored because the worst case overall query will take multiple seconds to be executed if there are millions of products in the database.

@specialunderwear
Copy link
Member

@gwaidacher I just tested this in a site that is very heavy on child products and I got very impressive results, I'm going to try to do all the checks in one query. But this is very good allready!

@specialunderwear
Copy link
Member

I decided the added complexity of doing conditional queries is not advantageous and to leave you PR as is, with minor readability changes.

@specialunderwear specialunderwear merged commit 8aa90a8 into django-oscar:master Jun 2, 2023
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants