_geo_distance sort: Support for many-to-many geo distance sort #3926

Closed
bobrik opened this issue Oct 17, 2013 · 10 comments

Comments

@bobrik
Contributor

bobrik commented Oct 17, 2013

What we'd like to see:

Ability to specify many geo points in geo distance sort, like this:

{
    "sort": [
        {
            "_geo_distance": {
                "geo_points.point": [
                    {
                        "lat": 59.959946,
                        "lon": 30.313819
                    },
                    {
                        "lat": 59.979788,
                        "lon": 30.304513
                    }
                ]
            }
        }
    ]
}

Use case: each user has several points of interest and wants to find other users who are close to these points. For example, I have work, home, and a favourite breakfast cafe. I'd like to find people who hang out near any of those places. Right now I have to fire one query per point of interest and then merge the results outside of Elasticsearch. It would be good to be able to use a single query for this.

It looks like this could be a fairly trivial change: an extra loop that iterates over the requested points. I might be wrong about this.

I understand that this could be very CPU-intensive when querying with many points, but the intent here is to use a small number of points.
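
For reference, the current workaround (one query per point of interest, then a merge outside Elasticsearch) might look roughly like the sketch below. The function name `merge_by_min_distance` and the shape of the per-query results, lists of `(user_id, distance_km)` pairs, are assumptions made for illustration; running the actual queries is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape: each single-point query yields (user_id, distance_km) pairs.
Hit = Tuple[str, float]


def merge_by_min_distance(results_per_point: Iterable[List[Hit]]) -> List[Hit]:
    """Merge the hits of several single-point queries.

    Keeps, for every user, the smallest distance seen across all queries
    and returns the users sorted by that distance, closest first.
    """
    best: Dict[str, float] = {}
    for hits in results_per_point:
        for user_id, distance_km in hits:
            if user_id not in best or distance_km < best[user_id]:
                best[user_id] = distance_km
    return sorted(best.items(), key=lambda item: item[1])


# Made-up results for two points of interest.
work_hits = [("alice", 0.4), ("bob", 1.1)]
home_hits = [("bob", 0.2), ("carol", 0.9)]
merged = merge_by_min_distance([work_hits, home_hits])
```

With N points of interest this costs N round trips plus the merge, which is the overhead a single multi-point sort would avoid.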

@ghost ghost assigned chilling Oct 17, 2013
@bobrik
Contributor Author

bobrik commented Oct 22, 2013

Any feedback about this?

@brwe
Contributor

brwe commented Oct 22, 2013

Have you considered using the function_score query?
You can configure it so that people returned by a query are sorted by their closest distance to any number of defined points. Is this what you want?

Here is roughly what the query would look like:

"query": {
   "function_score": {
      "query": {
         put here which kinds of people you are looking for, and maybe also distance filters to get rid of people who are too far away anyway
      },
      "functions": [
         {
            "gauss": {
               "name of geo location field for person": {
                  "origin": "geo point of favorite cafe 1",
                  "scale": "..."
               }
            }
         },
         {
            "gauss": {
               "name of geo location field for person": {
                  "origin": "geo point of favorite cafe 2",
                  "scale": "..."
               }
            }
         },
         ...put even more places...
      ],
      ...use the score of the nearest place...
      "score_mode": "max"
   }
}
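
To make the scoring idea concrete, here is a minimal Python sketch of how a Gaussian decay combined with `"score_mode": "max"` behaves. It assumes the decay formula as documented for function_score, with the documented default `decay` of 0.5 and no offset by default; the helper names `gauss_decay` and `max_score` and the sample distances are made up for illustration.

```python
import math
from typing import List


def gauss_decay(distance_km: float, scale_km: float,
                decay: float = 0.5, offset_km: float = 0.0) -> float:
    """Gaussian decay: 1.0 at the origin, `decay` at `scale_km` from it."""
    # sigma^2 is chosen so that the score equals `decay` at distance `scale_km`.
    sigma_sq = -(scale_km ** 2) / (2.0 * math.log(decay))
    d = max(0.0, distance_km - offset_km)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def max_score(distances_km: List[float], scale_km: float) -> float:
    """Mimic score_mode "max": the best score over all origins wins."""
    return max(gauss_decay(d, scale_km) for d in distances_km)


# One person, with made-up distances to two favourite cafes.
score = max_score([3.0, 1.2], scale_km=1.2)
```

Because the decay shrinks as distance grows, taking the maximum over the functions amounts to scoring each person by their nearest origin.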

@bobrik
Contributor Author

bobrik commented Oct 23, 2013

@brwe I think it could work too, but _geo_distance would be easier to debug, because it tells you the distance. I also think _geo_distance should be faster in my case.

I tried function_score with a query like this:

{
  "fields": [
    "geo_points"
  ],
  "size": 10000,
  "filter": {
    "and": [
      {
        "or": [
          {
            "geo_distance": {
              "geo_points.point": {
                "lat": 59.953478,
                "lon": 30.315557
              },
              "distance": "1.2km"
            }
          },
          {
            "geo_distance": {
              "geo_points.point": {
                "lat": 60.002,
                "lon": 30.298391
              },
              "distance": "1.2km"
            }
          }
        ]
      },
      {
        "terms": {
          "user_id": [
            51933602,
            63087823,
            45214178
          ]
        }
      }
    ]
  },
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "score_mode": "max",
      "functions": [
        {
          "gauss": {
            "geo_points.point": {
              "origin": {
                "lat": 59.953478,
                "lon": 30.315557
              },
              "scale": "1.2km"
            }
          }
        },
        {
          "gauss": {
            "geo_points.point": {
              "origin": {
                "lat": 60.002,
                "lon": 30.298391
              },
              "scale": "1.2km"
            }
          }
        }
      ]
    }
  }
}

Note that I filtered down to only 3 users, because their score is 0 for some reason. The response looks like this:

{
  "total": 3,
  "max_score": 0,
  "hits": [
    {
      "_index": "female",
      "_type": "users",
      "_id": "45214178",
      "_score": 0,
      "fields": {
        "geo_points": [
          {
            "point": "59.957,30.303"
          },
          {
            "point": "0.004,0.004"
          },
          {
            "point": "59.948,30.276"
          },
          {
            "point": "59.944,30.272"
          }
        ]
      }
    },
    {
      "_index": "female",
      "_type": "users",
      "_id": "63087823",
      "_score": 0,
      "fields": {
        "geo_points": [
          {
            "point": "59.956,30.318"
          },
          {
            "point": "0,0"
          }
        ]
      }
    },
    {
      "_index": "female",
      "_type": "users",
      "_id": "51933602",
      "_score": 0,
      "fields": {
        "geo_points": [
          {
            "point": "59.956,30.314"
          },
          {
            "point": "0,-0.004"
          }
        ]
      }
    }
  ]
}

Some points are far away from the requested ones, but there are closer points in the same objects too. I thought that the best match would be used. Am I missing something?

@brwe
Contributor

brwe commented Oct 23, 2013

No, you did not miss anything; I did. I completely misunderstood your request. Unfortunately, function_score does not work for fields with multiple values. I hope you did not waste too much time on it.

@imotov
Contributor

imotov commented Oct 24, 2013

Perhaps you can try implementing it as a script-based sort that calculates the distances to all points of interest and then returns the shortest one.
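
The logic such a script would need can be sketched in plain Python as below. This is only an illustration of the computation, not Elasticsearch script code; it assumes the haversine formula with a mean Earth radius of 6371 km, and the names `arc_distance_km` and `shortest_distance_km` are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def arc_distance_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def shortest_distance_km(doc_points: Iterable[Point],
                         points_of_interest: Iterable[Point]) -> float:
    """Smallest distance between any document point and any point of interest."""
    pois = list(points_of_interest)
    return min(arc_distance_km(p, q) for p in doc_points for q in pois)


# Points taken from the example response and query in this thread.
doc_points = [(59.956, 30.318), (0.0, 0.0)]
pois = [(59.953478, 30.315557), (60.002, 30.298391)]
closest = shortest_distance_km(doc_points, pois)
```

For this document the closest pair is well within the 1.2km radius used in the filter, even though its other point (0,0) is thousands of kilometres away. Note that `min` raises `ValueError` when either list is empty, so a real script would need a fallback value for documents without points.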

@bobrik
Contributor Author

bobrik commented Oct 24, 2013

@brwe maybe function_score should throw an exception if it is called against a field with multiple values?

@imotov yep, that should work, but scripting is much slower than native support. I'll try it anyway.

@bobrik
Contributor Author

bobrik commented Oct 24, 2013

@imotov It looks like I cannot iterate over an array of objects in Elasticsearch. I found this thread: https://groups.google.com/forum/#!topic/elasticsearch/8Z2KwuPlyas

My mapping looks like this:

{
  "users" : {
    "properties" : {
      "geo_points" : {
        "properties" : {
          "point" : {
            "type" : "geo_point"
          }
        }
      }
    }
  }
}

and the script looks like this:

current_to_any_geo_distance = 100;
if (!doc["geo_points"].empty && !current_geo.empty) {
    foreach (point : doc["geo_points"].values) {
        distance = point["point"].arcDistanceInKm(current_geo.lat, current_geo.lon);
        if (distance < current_to_any_geo_distance) {
            current_to_any_geo_distance = distance;
        }
    }
}

and Elasticsearch complains like this:

{
  "took": 44,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 1,
    "failures": [
      {
        "index": "female",
        "shard": 3,
        "status": 500,
        "reason": "RemoteTransportException[[search04][inet[/192.168.1.91:9300]][search/phase/query]]; nested: QueryPhaseExecutionException[[female][3]: query[ConstantScore(cache(_type:users))],from[0],size[20],sort[<custom:\"_script\": org.elasticsearch.index.fielddata.fieldcomparator.DoubleScriptDataComparator$InnerSource@1dd5b162>!]: Query Failed [Failed to execute main query]]; nested: CompileException[[Error: No field found for [geo_points] in mapping with types [users]]\n[Near : {... order = OLDER_ORDER; ....}]\n             ^\n[Line: 1, Column: 1]]; nested: ElasticSearchIllegalArgumentException[No field found for [geo_points] in mapping with types [users]]; "
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Is that kind of iteration really supported in Elasticsearch? It looks like a bug to me, but I couldn't find an existing issue.

cc @dadoonet

@imotov
Contributor

imotov commented Oct 25, 2013

@bobrik - such iteration only works on the source. A doc field can only be obtained by its actual field name, geo_points.point in your case. doc["geo_points"] simply doesn't exist in the index. Here is an example that demonstrates the idea: https://github.com/imotov/elasticsearch-test-scripts/blob/master/multi_geo_search.sh
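
As a rough mental model of why doc["geo_points"] has no values, the Python sketch below flattens a source document into dotted field names the way a plain (non-nested) object mapping does. It is a simplified illustration, not Elasticsearch code; the helper name `flatten_fields` and the sample document are assumptions.

```python
from typing import Any, Dict, List


def flatten_fields(source: Dict[str, Any], prefix: str = "") -> Dict[str, List[Any]]:
    """Collect leaf values under their full dotted field name.

    Arrays of objects are not kept as units: the values of each leaf are
    pooled into one multi-valued field.
    """
    fields: Dict[str, List[Any]] = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the object and merge its leaves under "path.".
                for sub_path, sub_values in flatten_fields(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


# Illustrative source document, shaped like the users in this thread.
source = {
    "user_id": 63087823,
    "geo_points": [{"point": "59.956,30.318"}, {"point": "0,0"}],
}
doc = flatten_fields(source)
```

Here `doc` has a multi-valued "geo_points.point" entry but no "geo_points" key at all, which matches the "No field found for [geo_points]" error above.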

@bobrik
Contributor Author

bobrik commented Oct 25, 2013

@imotov looks like I forgot to submit my solution yesterday :)

I looked at the source code and found that I can actually iterate over doc["geo_points.point"].values, but the result is an array of objects with lat and lon properties, not geo_points, so I cannot compute the distance directly.

It looks like I can compute the distance in a pretty hardcore way:

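import org.elasticsearch.common.geo.GeoDistance;
import org.elasticsearch.common.unit.DistanceUnit;

// Keep the shortest distance from any of the document's points to current_geo.
current_to_any_geo_distance = 100;
foreach (point : doc["geo_points.point"].values) {
    distance = GeoDistance.ARC.calculate(point.lat, point.lon, current_geo.lat, current_geo.lon, DistanceUnit.KILOMETERS);
    if (distance < current_to_any_geo_distance) {
        current_to_any_geo_distance = distance;
    }
}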

I'm not sure if this is friendly. It's not in the docs for sure :)

@clintongormley clintongormley assigned brwe and unassigned chilling Jul 25, 2014
brwe added a commit to brwe/elasticsearch that referenced this issue Jul 30, 2014
Add computation of distance to many geo points. Example request:

```
{
  "sort": [
    {
      "_geo_distance": {
        "location": [
          {
            "lat": 1.2,
            "lon": 3
          },
          {
            "lat": 1.2,
            "lon": 3
          }
        ],
        "order": "desc",
        "unit": "km",
        "sort_mode": "max"
      }
    }
  ]
}
```

closes elastic#3926
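
A request of this shape could be assembled on the client side roughly as follows. This is a hedged sketch: the keys (`order`, `unit`, `sort_mode`) are taken from the example request in the commit message, while the helper name `geo_distance_sort` and its default values are assumptions for illustration, and sending the body to a cluster is left out.

```python
import json
from typing import Any, Dict, List, Tuple


def geo_distance_sort(field: str, points: List[Tuple[float, float]],
                      order: str = "asc", unit: str = "km",
                      sort_mode: str = "min") -> Dict[str, Any]:
    """Build a search body with a multi-point _geo_distance sort clause."""
    return {
        "sort": [
            {
                "_geo_distance": {
                    # One {"lat", "lon"} object per requested point.
                    field: [{"lat": lat, "lon": lon} for lat, lon in points],
                    "order": order,
                    "unit": unit,
                    "sort_mode": sort_mode,
                }
            }
        ]
    }


# The two points from the original request in this issue.
body = geo_distance_sort(
    "geo_points.point",
    [(59.959946, 30.313819), (59.979788, 30.304513)],
)
print(json.dumps(body, indent=2))
```

For the use case in this issue (people near any of my places), ascending order with a "min" sort mode is presumably the combination to use, which is why the sketch picks those as defaults.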
@brwe brwe closed this as completed in fe86c8b Jul 31, 2014
brwe added a commit that referenced this issue Jul 31, 2014
Add computation of distance to many geo points (same example request as in the commit referenced above).

closes #3926
@bobrik
Contributor Author

bobrik commented Jul 31, 2014

Nice, thanks!

@jpountz jpountz removed the review label Aug 1, 2014
@brwe brwe changed the title from "Support for many-to-many geo distance sort" to "_geo_distance sort: Support for many-to-many geo distance sort" Aug 4, 2014
brwe added a commit that referenced this issue Sep 8, 2014
Add computation of distance to many geo points (same example request as in the commit referenced above).

closes #3926