count on a collection from ActiveRecord::Base.near returns an ordered hash #86

Closed
mguterl opened this Issue Jul 14, 2011 · 57 comments

Projects

None yet
@mguterl
Contributor
mguterl commented Jul 14, 2011

Trying to figure out what is going on here:

>> Recipient.near("Cincinnati", 150)
=> [#<Recipient id: 1, list_id: 1, email: "ssmith@smith.com", created_at: "2011-06-20 16:01:04", updated_at: "2011-07-13 21:30:39", first_name: "Sam", last_name: "Smith", unsubscribed_at: "2011-07-13 21:30:39", latitude: 39.1, longitude: -84.5, location: "Cincinnati, OH", secret_id: "Ki1-a_R3mP-ohS5_LyYY">]

When trying to get the count, it does not return an integer as I would expect.

>> Recipient.near("Cincinnati", 150).count
=> #<OrderedHash {"Ki1-a_R3mP-ohS5_LyYY"=>1}>
@nlsrchtr

Just a quick thought: This one could be related to the problem, that pagination plugins like will_paginate or kaminari can't calculate the number of pages correctly.

See also related post at Ryan Bates Screencast by vincentpaca:

@jobs = Job.near(@address, @miles).paginate(:page => params[:page], :per_page => 10)
@mguterl
Contributor
mguterl commented Jul 15, 2011

I've done more investigation, I'm not sure, but it looks like AR might be screwing up the query when converting it to a COUNT(*).

>> Recipient.near("Cincinnati")
  Recipient Load (0.5ms)  SELECT *, 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((39.1031182 - latitude) * PI() / 180 / 2), 2) + COS(39.1031182 * PI() / 180) * COS(latitude * PI() / 180) * POWER(SIN((-84.5120196 - longitude) * PI() / 180 / 2), 2) )) AS distance, CAST(DEGREES(ATAN2( RADIANS(longitude - -84.5120196), RADIANS(latitude - 39.1031182))) + 360 AS decimal) % 360 AS bearing FROM `recipients` WHERE (latitude BETWEEN 38.8136546337783 AND 39.3925817662217 AND longitude BETWEEN -84.8850338515311 AND -84.1390053484689) GROUP BY recipients.id,recipients.list_id,recipients.email,recipients.created_at,recipients.updated_at,recipients.first_name,recipients.last_name,recipients.unsubscribed_at,recipients.latitude,recipients.longitude,recipients.location,recipients.secret_id HAVING 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((39.1031182 - latitude) * PI() / 180 / 2), 2) + COS(39.1031182 * PI() / 180) * COS(latitude * PI() / 180) * POWER(SIN((-84.5120196 - longitude) * PI() / 180 / 2), 2) )) <= 20 ORDER BY 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((39.1031182 - latitude) * PI() / 180 / 2), 2) + COS(39.1031182 * PI() / 180) * COS(latitude * PI() / 180) * POWER(SIN((-84.5120196 - longitude) * PI() / 180 / 2), 2) )) ASC

Compare that to:

>> Recipient.near("Cincinnati").count
  SQL (2.4ms)  SELECT COUNT(*) AS count_all, recipients.id,recipients.list_id,recipients.email,recipients.created_at,recipients.updated_at,recipients.first_name,recipients.last_name,recipients.unsubscribed_at,recipients.latitude,recipients.longitude,recipients.location,recipients.secret_id AS recipients_id_recipients_list_id_recipients_email_recipients_created_at_recipients_updated_at_recipients_first_name_recipients_last_name_recipients_unsubscribed_at_recipients_latitude_recipients_longitude_recipients_location_recipients_secret_id FROM `recipients` WHERE (latitude BETWEEN 38.8136546337783 AND 39.3925817662217 AND longitude BETWEEN -84.8850338515311 AND -84.1390053484689) GROUP BY recipients.id,recipients.list_id,recipients.email,recipients.created_at,recipients.updated_at,recipients.first_name,recipients.last_name,recipients.unsubscribed_at,recipients.latitude,recipients.longitude,recipients.location,recipients.secret_id HAVING 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((39.1031182 - latitude) * PI() / 180 / 2), 2) + COS(39.1031182 * PI() / 180) * COS(latitude * PI() / 180) * POWER(SIN((-84.5120196 - longitude) * PI() / 180 / 2), 2) )) <= 20 ORDER BY 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((39.1031182 - latitude) * PI() / 180 / 2), 2) + COS(39.1031182 * PI() / 180) * COS(latitude * PI() / 180) * POWER(SIN((-84.5120196 - longitude) * PI() / 180 / 2), 2) )) ASC

I think it is related to something in ActiveRecord::Calculations#execute_grouped_calculation. I'm still trying to gain a deeper understanding of what is happening here, I'll post back if I find anything else.

@mguterl
Contributor
mguterl commented Jul 15, 2011

ActiveRecord::Calculations#execute_grouped_calculation is used any time the query contains columns to group by. This makes ActiveRecord assume that the result should be returned in an OrderedHash.

I modified my local copy of ActiveRecord::Calculations#perform_calculation to avoid the #execute_grouped_calculation path and use #execute_simple_calculation and everything works as expected. I doubt I know enough about AR to be able to come up with a real solution.

@mguterl
Contributor
mguterl commented Jul 15, 2011

@alexreisner or anyone else, do you have any idea if it is possible to generate this query without using GROUP BY and HAVING? As long as GROUP BY is used (which is a requirement for HAVING) I don't think count chained onto the near scope will work.

I think we rolled this by hand at one point in another project, I'm going to go digging.

@alexreisner
Owner

I think the way to fix this is to add a count method to the near scope. For instance, you can do this:

class Venue < ActiveRecord::Base
  scope :blah, where("id > 10") do
    def count
      "HI THERE"
    end
  end
end

and then Venue.blah.count will return "HI THERE". The trick is figuring out how to get it to return the proper count.

@mguterl
Contributor
mguterl commented Jul 21, 2011

Thanks for the insight @alexreisner. I think this will work for my simple case, but I wonder if it will work when combined with additional scopes:

Venue.near("Cincinnati, OH").active.count

In our application we can easily account for this by applying the location filter last. We're going to eventually need pagination also, so I'd like for the solution to take that into account.

I'll try and take a look at how the methods defined in the scope work, it looks like a module is created containing them and the relation is extended with those methods.

@alexreisner
Owner

I've never used the technique for anything practical but I do believe the method will be available regardless of scope order (unless another scope also defines a custom count method).

@hairyheron

I also ran across this problem, which made using will_paginate impossible for me. I finally fixed this problem for me by modifying the near scope afterwards (in a pretty nasty way):

q = MyModel.near(origin, distance, options)

# remove GROUP BY
q.instance_variable_set('@group_values', [])

# transform HAVING into WHERE
where_values  = q.instance_variable_get('@where_values')
having_values = q.instance_variable_get('@having_values')

q.instance_variable_set('@where_values',  (where_values + having_values))
q.instance_variable_set('@having_values', [])

q

This works for me. However, I wonder whethere there are any downsides in building the query like this. What were your intentions in using HAVING here, performance reasons?

@hoverlover

Any word on this guys? I'm having problems using the near scope with pagination (kaminari) as well. Need to get this figured out soon.

@nlsrchtr

School.near(current_user, 250).page(params[:page]) works as expected with kaminari (0.12.4) and geocoder (1.0.2). And also total_count returns the correct number of selected elements. I'm using MySQL version 5.1.50.

@kot-begemot

Which version of geocoder you are running??

@mguterl
Contributor
mguterl commented Sep 22, 2011

It has been awhile since I have looked at this, but if I recall correctly, this was an issue with MySQL.

@hoverlover

@kot-begemot, I am using 1.0.4, but also tried 1.0.2 as @nlsrchtr suggested.

@mguterl I am using Postgres, not MySQL.

@nlsrchtr, what do you mean it "works as expected"?:

Business.near('90210', 100).page(1).count #=> {[nil, nil]=>1}
Business.near('90210', 100).page(1).total_count #=> 1
Business.near('90210', 100).count #=> {nil=>1}
businesses = Business.near('90210', 100)
businesses.size #=> 453

I don't know what #total_count does, and not sure what should be returned from #page(1) (I think page is just a scope, so it should be an ActiveRecord::Relation object that is returned). But, at the very least, I think the results from the third line should match what is returned on the fifth line, right?

The hash returned from these calls to #count may or may not be why Geocoder isn't playing nicely with pagination libraries such as Kaminari or will_paginate. Either way, I don't think it's right for them to return a hash. They should be returning the actual count.

@nlsrchtr

@hoverlover: As I mentioned in my first post I had a problem with using geocoder with kaminari. kaminari is a gem for easy pagination and I had problems with the number of pages, calculated by kaminari. That's why I have a .page scope on my .near relation. And kaminari is also offering the total_count method. And for me the number of pages is correct now - and as I thought - the .count method is also fixed now.

So sorry for the noise, also for me the .count returns a hash - but my app no longer depends on it.

@hoverlover

One thing to note: Business.near('90210', 100).count does in fact execute the sql statement and does a count(*). For some reason, an ActiveSupport::OrderedHash is returned for some reason instead of the actual record count.

@hoverlover

Sorry @nlsrchtr, I don't follow what you are trying to say. I am using Kaminari v0.12.4 and also tried Geocoder v1.0.2. I still couldn't get the #paginate of Kaminari to return anything other than "\n".

That's why I have a .page scope on my .near relation.

You mean like this?

Business.near('90210', 100).page(params[:page])

This is what I am doing in my controller, and the call to #paginate in my view still outputs nothing.

the .count method is also fixed now.

Where? In Gecoder? Doesn't appear to be for me.

@hairyheron

@hoverlover: Just to be sure, have you given my workaround a try (see above)?

@nlsrchtr

No problem @hoverlover, I'll try to explain again:

@schools = School.near(current_user, 250, { :units => :km }).page(params[:page])

@schools.count returns a hash {["school1", "school1"]=>1, ["school2", "school2"]=>1 ...
which is not correct - as in your case. Should be equal to number of schools per page - in my case 15.

@schools.total_count returns 26, which is correct for me.

For getting the total number I'm not using .count, because this would result in the number of elements per page. Therefor I need to use .total_number which is offered by kaminari. To sum it up: .count is still not working correct, but I came around it by using kaminari, because I needed a paginated view in any case.

I think I can't help with your problem, because in my case <%= render @schools %> and <%= paginate @schools %> are working as expect.

@hoverlover hoverlover added a commit to hoverlover/geocoder that referenced this issue Sep 22, 2011
@hoverlover hoverlover Fixes issue #86. 96da29c
@swrobel
swrobel commented Sep 22, 2011

I believe this is an ActiveRecord bug. I just upgraded my project to Rails 3.1 to test, but alas, it still seems to be a problem. Is anyone aware whether there is a bug filed in the rails project for this?

@hoverlover

@hairyheron, I didn't try your solution because I assumed it worked, but wanted try to figure out how to truly fix it rather than do that nasty (by your words :) workaround. The fix was essentially what you did in your workaround, except I made the change in the source.

My fork now works with Kaminari, and Business.near('90210', 100).count now works as expected, returning a Fixnum. Hopefully @alexreisner will get this pulled in quickly.

@hoverlover

@swrobel, try my fork. I am using Rails 3.1 with this and it is working fine.

@hairyheron

@hoverlover: Thanks for patching geocoder. Would love to see this in the next release.

@hoverlover

@hairyheron Np. I would too. It's a great gem!

@alexreisner
Owner

Sorry, @hoverlover, but other people have suggested that fix in the past and there's a performance issue.

Just to confirm for everyone who's concerned: this IS a bug and I do see it as high priority but I can't make any promises as to when I'll have time to fix it. If anyone wants to give it a shot I'd suggest looking at how Kaminari implements total_count to see if it's relevant.

@swrobel
swrobel commented Sep 22, 2011

@alexreisner is this an ActiveRecord bug, a geocoder bug, or both? It seems like it affects a lot of pagination gems as well as geocoder, so I'd imagine AR, and if so, is the rails team aware?

@alexreisner
Owner

That's a good question, and I'm not familiar enough with AR to know for sure. I know that the various DB drivers are inconsistent in how they cast results so, for example, you'll sometimes get strings instead of integers or floats with Postgres. I'm sure this is a challenge for AR developers but I don't know if this is a manifestation of that (or similar) problem, or something else. Clearly, fixing this properly is going to take some research. Understanding exactly how a hash gets returned is probably a good first step.

@mguterl
Contributor
mguterl commented Sep 22, 2011

@alexreisner - I did do some digging into the AR internals, if you look at my 2nd and 3rd comments they may be helpful. The project that encountered this bug has stalled for a bit or I would have gone further.

@hairyheron

@alexreisner: You mentioned performance issues as a reason not to pull the given patch. Could you go into details about these issues? My feeling was that the database should optimize the query execution in any way to evaluate the bounding box conditions first before calculating the real distance on the remaining rows - assuming that you have indexed your lat and lng fields. Am I wrong about this?

@hoverlover

Let me qualify what I'm about to say by letting everyone know that I'm not in any shape or form a database admin/guru. Having said that, this gist contains the results of running Postgres's EXPLAIN command against the two queries. The first is the current query with the group by and having. The second is without them, and what is in my fork that I am proposing. If I read the results correctly, the query without the group by and having is actually slightly faster. Am I reading this correctly? Of course, lots of things will affect either query, such as indexes on the table, etc. My particular table only has the following index:

businesses_pkey" PRIMARY KEY, btree (id)

@alexreisner what are your thoughts on all of this?

Edit:

Also, let me say that while I agree that if in fact this is an ActiveRecord bug, why not go ahead and make the change here that fixes it until the AR bug can be fixed. As long as it doesn't impact performance, of course.

@hoverlover

The difference in query cost actually gets worse as the radius is increased. I won't paste the entire contents of the results here for brevity's sake. The queries were ran in the same order as in the supplied gist. The radius was increased to 10,000 miles.

Sort  (cost=121020.99..121118.98 rows=39196 width=1289)

Sort  (cost=70291.00..70388.99 rows=39196 width=1289)

The first line is the query with the group by and having. The second line is without. Seems like the query without has the potential to be more performant.

@alexreisner
Owner

@hoverlover: Very interesting findings. I'm not sure MySQL is as smart as Postgres when it comes to optimizing different parts of a WHERE clause but I'm not sure it matters anyway. More complete response coming when I have some time to take a look at this more carefully. Thanks!

@alexreisner
Owner

OK, everyone, I've finally decided to get rid of the grouping. Can you please test by using the no_grouping branch of the gem? In your Gemfile:

gem 'geocoder', :git => 'git://github.com/alexreisner/geocoder.git', :branch => 'no_grouping'

Let me know how this works for you.

@mguterl
Contributor
mguterl commented Nov 1, 2011

@alexreisner - this branch passes all of our tests.

@atsuya
atsuya commented Nov 2, 2011

i was having the same issue and tried "no_grouping" branch, then now i get following error:

ActiveRecord::StatementInvalid: Mysql2::Error: Unknown column 'distance' in 'order clause'

@swrobel
swrobel commented Nov 3, 2011

Everything seems to work fine for me, and doing User.near("Los Angeles, CA",50).count returns a fixnum now as expected.

@alexreisner
Owner

Thanks for testing, guys. The no_grouping branch will be merged into master and included in gem version 1.1 (hopefully within a few weeks).

@jerrett
jerrett commented Nov 24, 2011

on no grouping branch i get an error ordering by unknown column 'distance' :|

@dgilperez
Contributor

Hi Alex,
do you have a time estimate for this merger? We're experiencing same problems after upgrading to rails3.1
Thanks !!

@alexreisner
Owner

Unfortunately I can't tell you exactly when I'm going to merge no_grouping into master. It's going to be part of gem version 1.1 but people are still reporting some problems I want to get straightened out before the release. I really hope to get it done in the next couple of weeks but I can't promise anything. For now please just use the no_grouping branch (see Gemfile instructions above).

@alexreisner
Owner

@jerrett and @atsuya: Can you please tell me more about your errors? Are you combining near with other scopes? Can you post your Ruby code and the generated SQL queries? I need enough information that I can duplicate the problem.

@jerrett
jerrett commented Nov 28, 2011

I got it working (but not entirely sure how... there was a lot of desperate changes and debugging and then at some point it started working, one of those days), using that branch

@jerrett
jerrett commented Nov 28, 2011

Oh wait, my bad - it's still there. Digging around to try to find out why, but yeah, distance is used by default as the order by, but it's not actually selected anymore (maybe because of some of the joins/includes?)

Will update if I can narrow down what causes it to not get selected ...

The distance calculation is being used, but it's being used as part of the WHERE and not doing 'distance AS ' anymore

@jerrett
jerrett commented Nov 28, 2011

Selecting on a Schedule model, including Location

geocoding in Schedule via: reverse_geocoded_by "locations.latitude", "locations.longitude"

error: Mysql2::Error: Unknown column 'distance' in 'order clause':

SELECT `schedules`.`id` AS t0_r0, `schedules`.`location_id` AS t0_r1, `schedules`.`provider_id` AS t0_r2, `schedules`.`facility_id` AS t0_r3, `schedules`.`deleted_at` AS t0_r4, `locations`.`id` AS t1_r0, `locations`.`facility_id` AS t1_r1, `locations`.`name` AS t1_r2, `locations`.`address` AS t1_r3, `locations`.`city` AS t1_r4, `locations`.`state` AS t1_r5, `locations`.`zip` AS t1_r6, `locations`.`phone` AS t1_r7, `locations`.`latitude` AS t1_r8, `locations`.`longitude` AS t1_r9, `locations`.`deleted_at` AS t1_r10, `locations`.`permalink` AS t1_r11 FROM `schedules` LEFT OUTER JOIN `locations` ON `locations`.`id` = `schedules`.`location_id` WHERE `schedules`.`deleted_at` IS NULL AND (3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((36.0853996276855 - locations.latitude) * PI() / 180 / 2), 2) + COS(36.0853996276855 * PI() / 180) * COS(locations.latitude * PI() / 180) * POWER(SIN((-86.8377990722656 - locations.longitude) * PI() / 180 / 2), 2) )) <= 14000) ORDER BY distance

from Schedule.includes( :location ).near( 'Somewhere', 200 )

@jerrett
jerrett commented Nov 28, 2011

Basically what is happening is that when you use includes in AR, it clobbers any selects you have setup .. so having this all done as a select breaks when you use includes.

It looks like it was always using select, and maybe was always a bit broken (but not as obviously?), however before when it was :having at least the query would execute (except when paired with something trying to do a count with the query like will_paginate)

not sure what the answer is :(

@jerrett
jerrett commented Nov 28, 2011

As a temporary fix for people experiencing this, using joins instead of includes does fix the problem, however that has obvious performance impacts. I'd be nice to keep the ability to use includes as well (it used to work with includes)

@atsuya
atsuya commented Nov 28, 2011

i can't give you the exact sql statement, but i have includes and where then near at the end. i looked at the sql statement that errors out, and it refers to 'distance' but it doesn't have 'distance' defined in the sql statement.

@jerrett
jerrett commented Nov 28, 2011

Atsuya, if you change the includes to joins does the problem go away?

@atsuya
atsuya commented Nov 28, 2011

ah, it indeed goes away when i use joins instead of includes.

@jerrett
jerrett commented Nov 29, 2011

Just for fun, it seems like the select *, is causing grief with associations, too. If you use joins, id is ambigious and the id of the object you are looking for becomes overidden with the id of whatever it's joining.

I think doing "table.*" would fix this, but i wonder what other issues there are using "select" - AR doesn't seem to be very happy with it

@alexreisner
Owner

@jerrett: Yes, the next version of Geocoder will actually use the table.* syntax so joins work properly, but it still doesn't allow includes. I'm still looking for the correct way to solve that.

Thanks for the feedback, guys. I'm going to merge no_grouping into master soon.

@alexreisner
Owner

I've merged no_grouping and am closing this issue (finally!). Thanks again to everyone who contributed. If you have any more comments on the :includes problem please open a new issue. Also, if you're curious, check out Issue #99 for an interesting (though not yet complete) solution.

@jerrett
jerrett commented Nov 29, 2011

Sweet, I see that's already in master. When are you planning a release? I'm a little uncomfortable with locking to "master" ;]

@alexreisner
Owner

As you should be! Definitely not a long-term solution.

Please don't hold me to it as I've been really busy, but I'm hoping to release 1.1 this week.

@jerrett
jerrett commented Nov 29, 2011

No worries, just curious if it's close - we're gearing up to release a significant new release that's using geocoder in the next couple weeks ;]

thanks for the handy plugin and for the quick responses, refreshing to know it's an active project ;]

@hoverlover

@jarret No need to feel uncomfortable. Just include the commit sha where Alex merged and you will stay locked to that commit. I do it all the time. @alex thanks for working so hard on this!

@alexreisner
Owner

@hoverlover: Thanks, and good point about locking to a commit--totally slipped my mind. The exact Gemfile line would be:

gem 'geocoder', :git => 'git://github.com/alexreisner/geocoder.git', :ref => '4592b73'
@nickaknudson

I was only able to get geocoder to work after pointing my Gemfile to the no_grouping branch, even though I had geocoder gem version 1.1.1

Not sure what to make of this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment