[Question] How can I improve performance ? #37

Open
Crystark opened this issue Jan 13, 2016 · 3 comments
@Crystark

Hi David,

So I've been playing with the benchmarks changing some stuff to comply with my use case and see how this goes:

  • As you did with the greek points, I created entries composed of circles (geocircles, most of which have a radius of 0.1 km). My data set contains 85k geopoints, each with a radius.
  • In the search benchmarks, instead of searching with a rectangle, I'm searching with a point ("is that point in any of the circles?").

Well, the bench ran for about an hour and the results were really bad: I barely reach 100 ops / sec, whatever maxChildren value I use.

When I take a look at the visualization, it seems a bit weird:
starm4: [image: starm4]

defaultm4: [image: defaultm4]

Maybe it will mean something to you, but it looks nothing like what you're showing on your main page.

So I tried replacing my circles with rectangles (I did that really quickly, just using the circle's center from my data and creating a square by adding +0.5 to the lat and lon). Now I'm starting to get decent results: 5k ops / sec. I also tried increasing the rectangle size (+1.0 instead of +0.5) and got down to 4k ops / sec.

The visualisation seems a bit more coherent there:
starM4: [image: rect-starm4]

defaultM4: [image: rect-defaultm4]

So here are the things I'd like to ask:

  • Are circles that bad?
  • When using Geometries.circle(x, y, radius), can you confirm that x is the lon and y is the lat (provided they've been normalized beforehand)? Also, what is the unit of the radius? I assumed it was km, but now that I'm looking at it, it seems it's not and is more of an x-y variation value.

It feels like I'm doing something wrong.

Just to be clear, my ultimate goal is to find the best compromise between precision and speed when searching all 85k (but I'm targeting 1 million) geocircles for those that match a specific geopoint.

Thanks for your insight on the matter.

@davidmoten
Owner

Hi, sorry I missed this question! I'm getting a lot of github issues from projects I'm interested in and it was lost in the noise.

> When using Geometries.circle(x, y, radius), can you confirm that x is the lon and y is the lat (provided they've been normalized beforehand)? Also, what is the unit of the radius? I assumed it was km, but now that I'm looking at it, it seems it's not and is more of an x-y variation value.

The unit of the radius is that of the x and y points so it's in degrees.

Ultimately the performance of the rtree will be driven by searching with rectangles. Searching using other geometries still finds points in the bounding rectangle of a circle then filters the points for membership in the circle.
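
The two-phase idea described above (cheap bounding-rectangle test first, exact circle membership second) can be sketched in plain Java for the "is this point in any of the circles?" case. This is only an illustration, not the library's code: the `Circle` class and all method names are hypothetical, a linear scan stands in for the tree traversal, and x, y and radius are assumed to share the same units (degrees here).

```java
import java.util.ArrayList;
import java.util.List;

final class CircleFilterSketch {

    /** Hypothetical circle type: centre (x, y) and radius, all in the same units. */
    static final class Circle {
        final double x, y, radius;

        Circle(double x, double y, double radius) {
            this.x = x;
            this.y = y;
            this.radius = radius;
        }
    }

    /** Phase 1: cheap test against the circle's bounding rectangle. */
    static boolean boundingBoxContains(Circle c, double px, double py) {
        return px >= c.x - c.radius && px <= c.x + c.radius
                && py >= c.y - c.radius && py <= c.y + c.radius;
    }

    /** Phase 2: exact test, planar distance from the centre is at most the radius. */
    static boolean circleContains(Circle c, double px, double py) {
        double dx = px - c.x;
        double dy = py - c.y;
        return dx * dx + dy * dy <= c.radius * c.radius;
    }

    /** Linear scan standing in for the tree traversal (assumption, for brevity). */
    static List<Circle> circlesContaining(List<Circle> circles, double px, double py) {
        List<Circle> hits = new ArrayList<>();
        for (Circle c : circles) {
            if (boundingBoxContains(c, px, py) && circleContains(c, px, py)) {
                hits.add(c);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Circle> circles = new ArrayList<>();
        circles.add(new Circle(0, 0, 1));
        circles.add(new Circle(5, 5, 1));
        // (0.9, 0.9) passes the bounding-box test of the first circle
        // but fails the exact test, so it is filtered out in phase 2.
        System.out.println(circlesContaining(circles, 0.9, 0.9).size());
        System.out.println(circlesContaining(circles, 0.5, 0.5).size());
    }
}
```

The corner case in `main` shows why the second phase is needed: a bounding rectangle always covers more area than its circle, so some candidates from phase 1 are false positives.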

A couple of small test cases (with circles and without) would be great to benchmark if you can.

Even with rectangles your run still sounds a bit slow, so it would be nice to narrow down the cause.

@Crystark
Author

Crystark commented Feb 4, 2016

Thanks for your answer :)
I'll be doing a few more tests. I don't suppose you have a util class that could convert a radius from km to degrees? :)

I'll do a bit more testing once I've been able to create proper circles. What do you mean by "narrow down" the test with rectangles? Do you mean use smaller rectangles?

Thanks again for your help.

@davidmoten
Owner

I do have a util class in https://github.com/davidmoten/grumpy/blob/master/grumpy-core/src/main/java/com/github/davidmoten/grumpy/core/Position.java. The grumpy-core module is available on Maven Central.

The catch is that a degree of longitude varies in size depending on how close to the equator/poles it is. As a consequence, converting a radius from km to degrees will only work on a local level (say, where your range of data is within a few degrees of longitude). Over any larger range the result will be a distorted circle. However, what you can do is use a rectangle that definitely contains the data, then filter the data using the Point class to check distance from a centre.
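
A minimal sketch of that "containing rectangle, then distance filter" approach, in plain Java. It does not use the grumpy `Position` class or any rtree API; the class name, method names and the spherical Earth model with a mean radius of 6371.0088 km are all assumptions made for illustration. The longitude half-width uses the spherical bounding formula `asin(sin(r/R) / cos(lat))`, which grows towards the poles.

```java
final class KmRadiusSketch {

    // Assumption: spherical Earth with this mean radius in km.
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Half-height of the containing rectangle, in degrees of latitude. */
    static double latDeltaDegrees(double radiusKm) {
        return Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
    }

    /** Half-width of the containing rectangle, in degrees of longitude, at the given latitude. */
    static double lonDeltaDegrees(double radiusKm, double latDegrees) {
        double ratio = Math.sin(radiusKm / EARTH_RADIUS_KM)
                / Math.cos(Math.toRadians(latDegrees));
        if (ratio >= 1.0) {
            return 180.0; // the circle reaches a pole, so every longitude is covered
        }
        return Math.toDegrees(Math.asin(ratio));
    }

    /** Great-circle distance in km between two lat/lon points (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Exact filter applied to candidates returned by the rectangle query. */
    static boolean withinKm(double centreLat, double centreLon, double radiusKm,
                            double lat, double lon) {
        return haversineKm(centreLat, centreLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        double lat = 48.0, lon = 2.0, radiusKm = 0.1;
        double dLat = latDeltaDegrees(radiusKm);
        double dLon = lonDeltaDegrees(radiusKm, lat);
        System.out.printf("rectangle: lat %.6f..%.6f, lon %.6f..%.6f%n",
                lat - dLat, lat + dLat, lon - dLon, lon + dLon);
        // A corner of the rectangle is a candidate but lies outside the circle.
        System.out.println(withinKm(lat, lon, radiusKm, lat + dLat, lon + dLon));
    }
}
```

The rectangle from `latDeltaDegrees` and `lonDeltaDegrees` would be what goes into the index, with `withinKm` applied afterwards to the candidates it returns. Circles that cross the antimeridian would need extra handling that this sketch leaves out.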

> I'll do a bit more testing once I've been able to create proper circles. What do you mean by "narrow down" the test with rectangles? Do you mean use smaller rectangles?

I meant don't test with circles initially, because Rtree works by running queries based on rectangles and then filtering the results based on circles. I'd just like to see whether your queries are performant with rectangles only, to isolate any problems. 4K ops/s sounds slow.
