composite_primary_keys subquery_for_count is slow on mysql #98

Closed
codeodor opened this Issue Apr 16, 2012 · 10 comments

Comments

Projects
None yet
2 participants
@codeodor
Contributor

codeodor commented Apr 16, 2012

Perhaps the issue is more with the choice of mysql, and I'm not entirely sure the issue is composite_primary_keys' fault. But here is my symptom.

In AR model Student, add

self.primary_keys = :id, :district_id

Run my queries and see this slow one:

(1723.0ms)  SELECT COUNT(*) FROM (SELECT DISTINCT `students`.* FROM `students` WHERE `students`.`district_id` = 3 
AND ((home_campus_id in (1,2,3,4,5,6,7,8,9,10,11,12,13) or home_campus_id is null) and (choice_campus_id in 
(1,2,3,4,5,6,7,8,9,10,11,12,13) or choice_campus_id is null) and grade_level <= '10' and grade_level >= '09')) 
subquery_for_count 

Remove the self.primary_keys = :id, :district_id from Student model and see this query in it's place:

(8.0ms)  SELECT COUNT(*) FROM `students` WHERE `students`.`district_id` = 3 AND ((home_campus_id in 
(1,2,3,4,5,6,7,8,9,10,11,12,13) or home_campus_id is null) and (choice_campus_id in (1,2,3,4,5,6,7,8,9,10,11,12,13) or 
choice_campus_id is null) and grade_level <= '10' and grade_level >= '09')

Same query except for the subquery.

So my question is:

Is there a reason to go with the subquery?

If not, could I remove it and submit a pull request? (note, I haven't yet looked through the code, I'm just wondering out loud about what's going on, and if removing the subquery is a valid solution you'd be willing to accept).

Thanks for your input!

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor Apr 16, 2012

Contributor

I'm on 5.0.4 by the way.

Contributor

codeodor commented Apr 16, 2012

I'm on 5.0.4 by the way.

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor Apr 16, 2012

Contributor

Looks like the code is in /lib/composite_primary_keys/relation/calculations.rb ...

Line 41 builds the subquery, while line 27 calls it.

For example of a "fix" for this issue, if I change line 42 (first line of the build_count_subquery to just return the relation it's passed:

 return relation

Then the issue seems goes away, but my code still works.

What's the count being used for, and can it safely be changed?

Thoughts?

Update: now that I'm not looking at this with sleepy eyes, I found in #execute_simple_calculation it's doing something special for the case when you want to aggregate the count -- it's building this subquery and querying it instead.

If I just change it to do the normal operation_over_aggregate_column instead of using this special case for count, the query becomes instantaneous and is the correct query.

So my only question remains: is there a reason it builds a subquery for the count? If not, I'll gladly clean it up and send you a pull request.

Contributor

codeodor commented Apr 16, 2012

Looks like the code is in /lib/composite_primary_keys/relation/calculations.rb ...

Line 41 builds the subquery, while line 27 calls it.

For example of a "fix" for this issue, if I change line 42 (first line of the build_count_subquery to just return the relation it's passed:

 return relation

Then the issue seems goes away, but my code still works.

What's the count being used for, and can it safely be changed?

Thoughts?

Update: now that I'm not looking at this with sleepy eyes, I found in #execute_simple_calculation it's doing something special for the case when you want to aggregate the count -- it's building this subquery and querying it instead.

If I just change it to do the normal operation_over_aggregate_column instead of using this special case for count, the query becomes instantaneous and is the correct query.

So my only question remains: is there a reason it builds a subquery for the count? If not, I'll gladly clean it up and send you a pull request.

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor Apr 16, 2012

Contributor

PS: I notice another strange thing. In my table students I have id (the student's ID) and district_id as the PK, and now when I output student.id it outputs both, comma delimited.

I gather that's by design for people who don't use ID as part of the PK, but can that be changed?

Contributor

codeodor commented Apr 16, 2012

PS: I notice another strange thing. In my table students I have id (the student's ID) and district_id as the PK, and now when I output student.id it outputs both, comma delimited.

I gather that's by design for people who don't use ID as part of the PK, but can that be changed?

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor Apr 16, 2012

Contributor

I do see that the project tests fail if that code is simply removed, so I'm looking into that now.

Contributor

codeodor commented Apr 16, 2012

I do see that the project tests fail if that code is simply removed, so I'm looking into that now.

@cfis

This comment has been minimized.

Show comment Hide comment
@cfis

cfis Apr 20, 2012

Contributor

Right, the subquery was intentional and was done because the query generated by Rails for postgesql doesn't work with composite keys. Its been a while since I looked at it, but if I remember, the query used postgresql's propietary distinct on syntax so something like this:

select distinct on (field) a, b, c

See 7.3.3 in http://www.postgresql.org/docs/9.1/static/queries-select-lists.html.

So the only thing that I could figure out that was returned the same results was a subquery. I'd imagine your query is slow because of the SELECT DISTINCT students.* - you have to go through all the fields for all the records to figure that one out.

The idea was to find all the unique values, then count them. Your alternative query does something that is a different - they won't return the same results.

Contributor

cfis commented Apr 20, 2012

Right, the subquery was intentional and was done because the query generated by Rails for postgesql doesn't work with composite keys. Its been a while since I looked at it, but if I remember, the query used postgresql's propietary distinct on syntax so something like this:

select distinct on (field) a, b, c

See 7.3.3 in http://www.postgresql.org/docs/9.1/static/queries-select-lists.html.

So the only thing that I could figure out that was returned the same results was a subquery. I'd imagine your query is slow because of the SELECT DISTINCT students.* - you have to go through all the fields for all the records to figure that one out.

The idea was to find all the unique values, then count them. Your alternative query does something that is a different - they won't return the same results.

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor Apr 20, 2012

Contributor

That's a good point, but perhaps something to think about -- in the code I never could isolate where the count was being requested, but the alternative query is the one rails generated without CPK, so I wonder if line 60 in https://github.com/drnic/composite_primary_keys/blob/master/lib/composite_primary_keys/relation/calculations.rb

relation.distinct(true)

should be changed to

relation.distinct(distinct) #using the parameter passed in

because if it's always selecting distinct, it would think it's changing the expected results if the original query was not distinct.

Contributor

codeodor commented Apr 20, 2012

That's a good point, but perhaps something to think about -- in the code I never could isolate where the count was being requested, but the alternative query is the one rails generated without CPK, so I wonder if line 60 in https://github.com/drnic/composite_primary_keys/blob/master/lib/composite_primary_keys/relation/calculations.rb

relation.distinct(true)

should be changed to

relation.distinct(distinct) #using the parameter passed in

because if it's always selecting distinct, it would think it's changing the expected results if the original query was not distinct.

@cfis

This comment has been minimized.

Show comment Hide comment
@cfis

cfis Apr 21, 2012

Contributor

Maybe. Would have to step through the code in a debugger again - it takes me a bit of time to wrap my head around this code each time I look into it. Probably good to look at it again since I haven't for a while, but unfortunately don't have time now.

If you have time and want to test with postgresql, that would be great. Take out the monkey patch, run the tests, and you'll see how they break...

Contributor

cfis commented Apr 21, 2012

Maybe. Would have to step through the code in a debugger again - it takes me a bit of time to wrap my head around this code each time I look into it. Probably good to look at it again since I haven't for a while, but unfortunately don't have time now.

If you have time and want to test with postgresql, that would be great. Take out the monkey patch, run the tests, and you'll see how they break...

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor Apr 21, 2012

Contributor

I'm heading to RailsConf this weekend and next week, then I'll be going fishing for a week.

I have a small window of time between the two where I might be able to do it, but more than likely, it'll be after I get back from fishing.

I've already put it on my calendar for when I return. I hate to let it sit that long, but the timing is poor for me right now.

Sorry about that. =(

Contributor

codeodor commented Apr 21, 2012

I'm heading to RailsConf this weekend and next week, then I'll be going fishing for a week.

I have a small window of time between the two where I might be able to do it, but more than likely, it'll be after I get back from fishing.

I've already put it on my calendar for when I return. I hate to let it sit that long, but the timing is poor for me right now.

Sorry about that. =(

@cfis

This comment has been minimized.

Show comment Hide comment
@cfis

cfis Apr 22, 2012

Contributor

Ah no problem, I know the feeling!

Contributor

cfis commented Apr 22, 2012

Ah no problem, I know the feeling!

@codeodor

This comment has been minimized.

Show comment Hide comment
@codeodor

codeodor May 19, 2012

Contributor

To answer the question I raised about should it use relation.distinct(true) or relation.distinct(distinct) (use the parameter that got passed in), the answer is "it doesn't make a difference."

I also did some poking around and removed the monkey patches like you mentioned, to see what all happened.

It was a good exercise, because I found a bug in the code I submitted before, and actually realized we could remove that code and call Rails' version of the method unless CPK needed to deal with an array of values.

Anyway, I'm submitting a pull request in a sec for that.

Contributor

codeodor commented May 19, 2012

To answer the question I raised about should it use relation.distinct(true) or relation.distinct(distinct) (use the parameter that got passed in), the answer is "it doesn't make a difference."

I also did some poking around and removed the monkey patches like you mentioned, to see what all happened.

It was a good exercise, because I found a bug in the code I submitted before, and actually realized we could remove that code and call Rails' version of the method unless CPK needed to deal with an array of values.

Anyway, I'm submitting a pull request in a sec for that.

@codeodor codeodor closed this May 19, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment