
Integrate lat/lon BKD and spatial3d [LUCENE-6699] #7757

Closed
asfimport opened this issue Jul 28, 2015 · 220 comments

asfimport commented Jul 28, 2015

I'm opening this for discussion, because I'm not yet sure how to do
this integration, because of my ignorance about spatial in general and
spatial3d in particular :)

Our BKD tree impl is very fast at doing lat/lon shape intersection
(bbox, polygon, soon distance: LUCENE-6698) against previously indexed
points.

I think to integrate with spatial3d, we would first need to record
lat/lon/z into doc values. Somewhere I saw discussion about how we
could stuff all 3 into a single long value with acceptable precision
loss? Or, we could use BinaryDocValues? We need all 3 dims available
to do the fast per-hit query time filtering.

But, second: what do we index into the BKD tree? Can we "just" index
earth surface lat/lon, and then at query time is spatial3d able to
give me an enclosing "surface lat/lon" bbox for a 3d shape? Or
... must we index all 3 dimensions into the BKD tree (seems like this
could be somewhat wasteful)?


Migrated from LUCENE-6699 by Michael McCandless (@mikemccand), 1 vote, resolved Sep 02 2015
Attachments: Geo3DPacking.java, LUCENE-6699.patch (versions: 26)


Karl Wright (@DaddyWri) (migrated from JIRA)

Hi Mike,

The coordinates you'd need to store for geospatial3d are (x,y,z), rather than (lat,lon,z). The values are floating-point numbers that range between -1.0 and 1.0 for a sphere, and slightly more than that in absolute value for a WGS84 ellipsoid.

There was a ticket where I attached code for a packing scheme that would ram all three values into a 64-bit long; I'll see if I can find it. That packing scheme basically gave you resolution of about a meter. However, as you have pointed out, it's not strictly necessary to stick to 64-bit longs either, so you're free to propose anything that makes sense. ;-)
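The packing scheme Karl mentions could look like the following sketch. This is a hypothetical illustration, not the attached Geo3DPacking code: the class name, the exact coordinate bound, and the 21-bits-per-dimension split are all assumptions. Each of x, y, z is quantized over a range slightly wider than [-1, 1] (to cover the WGS84 case) and the three 21-bit values are concatenated into one long:

```java
// Hypothetical sketch: pack (x, y, z) into one 64-bit long, 21 bits per
// dimension. MIN/MAX are an assumed bound slightly beyond the WGS84 extremes.
public class PackXYZ {
  static final double MIN = -1.0012;
  static final double MAX = 1.0012;
  static final int BITS = 21;
  static final long MASK = (1L << BITS) - 1;
  static final double STEP = (MAX - MIN) / (MASK + 1.0);

  static long encode(double x, double y, double z) {
    return (quantize(x) << (2 * BITS)) | (quantize(y) << BITS) | quantize(z);
  }

  static long quantize(double v) {
    long q = (long) ((v - MIN) / STEP);
    return Math.min(MASK, Math.max(0, q)); // clamp into the 21-bit range
  }

  static double decode(long packed, int dim) { // dim: 0=x, 1=y, 2=z
    long q = (packed >>> ((2 - dim) * BITS)) & MASK;
    return MIN + (q + 0.5) * STEP; // decode to the midpoint of the cell
  }
}
```

Round-trip error is at most one quantization step per dimension; whether that step size counts as "acceptable precision loss" on an earth-radius scale is exactly the open question in this thread.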


Karl Wright (@DaddyWri) (migrated from JIRA)

Can we "just" index earth surface lat/lon, and then at query time is spatial3d able to give me an enclosing "surface lat/lon" bbox for a 3d shape?

@mikemccand Ok, so now you have me confused a bit as to what your requirements are for BKD. If you want to split the globe up into lat/lon rectangles, and use BKD that way to descend, then obviously you'd need points to be stored in lat/lon. But that would make less sense for geospatial3d, because what you're really trying to do is assess membership in a shape, or distance relative to a shape, both of which require (x,y,z), not lat/lon. Yes, you can convert to (x,y,z) from lat/lon, but the conversion is relatively expensive.

Instead, I could imagine just staying natively in (x,y,z), and doing your splits in that space, e.g. split in x, then in y, then in z. So you'd have a GeoPoint3D which would pack (x,y,z) in a format you could rapidly extract, and a splitting algorithm that would use the known ranges for these values. Does that make sense to you? Would that work with BKD?


Michael McCandless (@mikemccand) (migrated from JIRA)

OK, thanks @DaddyWri, I think I understand.

Yes BKD tree can easily split on the 3 (x,y,z) dimensions ... I just need to generalize it (it's currently "hardwired" for just 2 dimensions).

So for this to work, I need to be able to efficiently ask the query shape whether it separately overlaps with a range in each of the 3 dimensions. E.g. "do you intersect 0.3 <= x < 0.7", and same for y and z. I assume this is fast/simple to do with the spatial3d APIs?

And then of course with each indexed x,y,z point I also need to test whether the shape includes it.

Hmm, one source of efficiency for the BKD tree is it can recognize at search time when a given indexed cell is fully contained by the query shape, and then it doesn't need to test every point against the shape (it just blindly collects all docIDs in that cell). But in the 3D case, I think that optimization would never apply? I.e, the leaf cells would all be little cubes on the earth's surface, which include a volume (above and below earth's surface) that the shape does not accept?
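The contained-cell optimization described above can be illustrated with a toy example. This is a sketch, not Lucene's BKD implementation: points in a cube are recursively split, the query shape is a sphere around the origin standing in for an arbitrary shape, and a cell fully inside the shape collects all its points with no per-point tests. Mike's concern is that in the (x,y,z) case such fully-contained cells may be rare, since leaf cells enclose volume off the earth's surface:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of BKD-style cell pruning/collection (not the Lucene code).
public class ToyBKD {
  enum Rel { INSIDE, OUTSIDE, CROSSES }

  // Relate an axis-aligned cell [min, max] to a sphere of radius r at the origin.
  static Rel relate(double[] min, double[] max, double r) {
    double nearest = 0, farthest = 0;
    for (int d = 0; d < 3; d++) {
      double near = min[d] > 0 ? min[d] : (max[d] < 0 ? -max[d] : 0);
      double far = Math.max(Math.abs(min[d]), Math.abs(max[d]));
      nearest += near * near;
      farthest += far * far;
    }
    if (farthest <= r * r) return Rel.INSIDE;  // farthest corner within the sphere
    if (nearest > r * r) return Rel.OUTSIDE;   // closest cell point already outside
    return Rel.CROSSES;
  }

  static void search(List<double[]> pts, double[] min, double[] max, double r,
                     int depth, List<double[]> out) {
    Rel rel = relate(min, max, r);
    if (rel == Rel.OUTSIDE) return;            // prune: nothing in this cell matches
    if (rel == Rel.INSIDE) {                   // the optimization discussed above:
      out.addAll(pts);                         // collect blindly, no per-point tests
      return;
    }
    if (depth == 0) {                          // leaf: test each point individually
      for (double[] p : pts)
        if (p[0] * p[0] + p[1] * p[1] + p[2] * p[2] <= r * r) out.add(p);
      return;
    }
    int d = depth % 3;                         // cycle the split dimension x, y, z
    double mid = (min[d] + max[d]) / 2;
    List<double[]> lo = new ArrayList<>(), hi = new ArrayList<>();
    for (double[] p : pts) (p[d] < mid ? lo : hi).add(p);
    double[] loMax = max.clone(); loMax[d] = mid;
    double[] hiMin = min.clone(); hiMin[d] = mid;
    search(lo, min, loMax, r, depth - 1, out);
    search(hi, hiMin, max, r, depth - 1, out);
  }
}
```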


Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand

So for this to work, I need to be able to efficiently ask the query shape whether it separately overlaps with a range in each of the 3 dimensions. E.g. "do you intersect 0.3 <= x < 0.7", and same for y and z. I assume this is fast/simple to do with the spatial3d APIs?

Hmm. There's a "bounds()" method which obtains the lat/lon bounds of a shape. But for 3d we'd not be able to use that. However:

Hmm, one source of efficiency for the BKD tree is it can recognize at search time when a given indexed cell is fully contained by the query shape, and then it doesn't need to test every point against the shape (it just blindly collects all docIDs in that cell). But in the 3D case, I think that optimization would never apply?

Well, IF we presume that the records all lie on the surface of the world, then you can ask the question, "do these x, y, z bounds, when intersected with the world surface, all lie within the shape, or overlap the shape?" I'll have to think about how efficient that can be made to work, but you ARE after all describing a second surface shape by your x-y-z ranges, so in theory this should be doable. In particular, the GeoBBox construct allows you to find the relationship between a specific kind of surface shape (currently only lat/lon rectangles and their degenerate 3d equivalents) and any other GeoShape. We could try to introduce x-y-z 3d rectangles as shapes; they'd be completely describable by planes, so really all the same logic would apply. I would need to create a new GeoAreaFactory method, and a family of x-y-z bounding shapes, e.g. GeoXYZArea, and it should all work.

To be sure, have a look at the functionality that GeoArea gives you. If that looks like what you need, and you can convince yourself that it would be reasonably efficient as far as BKD is concerned, then I'll go ahead and create a patch that does what you need.

Thinking in more detail about BKD efficiency, I realize that GeoXYZArea could describe a huge number of 3D rectangles that have NO intersection with the world surface. That's worrisome because it implies that BKD over the (x,y,z) space may not be a very good way of organizing records? Or does it?


Karl Wright (@DaddyWri) (migrated from JIRA)

More analysis:

(1) Your BKD search would need to construct a GeoArea object each time it descends a level.
(2) Construction of an XYZ area will definitely involve construction of up to six additional objects (all planes), but should otherwise be computationally cheap.
(3) To minimize work on both construction and comparison, I'd need to create the following new classes, where Dg indicates "degenerate" in that dimension, e.g. "DgX" means "degenerate in X":

XYZArea
DgXYZArea
XDgYZArea
XYDgZArea
DgXDgYZArea
DgXYDgZArea
XDgYDgZArea
DgXDgYDgZArea

Note: these are not named "Geo" objects, because they do not have anything to do with the surface of the world.
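Which of the eight classes applies follows mechanically from which dimensions are degenerate (min == max in that dimension). A dispatch sketch, with illustrative naming that mirrors the list above (the names and factory shape are not the actual API):

```java
// Hypothetical dispatch sketch: pick the specialized area class based on
// which of the three dimensions are degenerate (min == max).
public class SolidDispatch {
  static String solidClassFor(double xMin, double xMax,
                              double yMin, double yMax,
                              double zMin, double zMax) {
    return (xMin == xMax ? "DgX" : "X")
         + (yMin == yMax ? "DgY" : "Y")
         + (zMin == zMax ? "DgZ" : "Z")
         + "Area";
  }
}
```

A real GeoAreaFactory method would construct the corresponding object instead of returning a name; the point is that the eight cases partition cleanly, so no per-comparison conditional logic is needed afterwards.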


Michael McCandless (@mikemccand) (migrated from JIRA)

you ARE after all describing a second surface shape by your x-y-z ranges

That's exactly it. Essentially the BKD tree needs a way to recursively chop up the earth surface into smaller and smaller curved-rectangle-like (I think?) shapes on the earth's surface, both at indexing time and at search time.

At search time, for a given cell, it needs to "relate" to the query shape to know if the query shape fully contains the cell, does not overlap with the cell, or partially overlaps.

But I don't understand the "degenerate in X/Y" Area objects... it seems like GeoArea would be used for the lat/lon "approximation" (outer bounding box?) to a GeoShape? Seems like it's better if we can do all functions using proper earth-surface shapes? Sorry this is still all very new to me :)


Karl Wright (@DaddyWri) (migrated from JIRA)

Essentially the BKD tree needs a way to recursively chop up the earth surface into smaller and smaller curved-rectangle-like (I think?) shapes on the earth's surface, both at indexing time and at search time.

With x/y/z splitting, actually you're not quite doing that. Instead, you are chopping up the space that the entire world lives in (not just its surface). One of these 3d rectangles may or may not actually intersect the surface (which is where all the geo shapes actually lie). If it does intersect, it might intersect on only one side of the world, or it might intersect on two sides of the world. A long, thin 3d rectangle could well encompass a little bit of territory in (say) the UK as well as Alaska, for instance. But as you describe the algorithm, I'm not sure that this is important at all to know.

At search time, for a given cell, it needs to "relate" to the query shape to know if the query shape fully contains the cell, does not overlap with the cell, or partially overlaps.

The GeoArea interface gives you all of that, which is why I wanted to implement objects that aren't limited to the surface but do implement GeoArea.

But I don't understand the "degenerate in X/Y" Area objects... it seems like GeoArea would be used for the lat/lon "approximation" (outer bounding box?) to a GeoShape? Seems like it's better if we can do all functions using proper earth-surface shapes?

GeoArea objects are not constrained to be surface objects. Right now the only implementers of GeoArea are bounding boxes, bounded in latitude and longitude, but that's merely due to lack of need for anything else. The relationship types GeoArea objects can determine are against general GeoShape objects (which are surface objects), so the semantics are perfect for what you are trying to do. You can determine whether a lat/lon rectangle overlaps, contains, is within, or doesn't intersect at all with, any arbitrary spatial3d surface shape.

But let's be clear: for BKD in the x,y,z space, GeoArea objects bounding in lat/lon are useless. So we need to invent GeoArea objects that represent 3d rectangles. If we try to talk about a general XYZ-bounded area, then there will be up to two bounds for X (min and max), two bounds for Y (min and max), and two bounds for Z (min and max). The degenerate cases come into play when min-X == max-X, or min-Y == max-Y, etc, or when there is no min-X bound but just a max-X one, for instance. This can be conditional logic but the whole thing is faster and more efficient if there's an object for each funky case.

Hope this helps.

@mikemccand If this all is clear now, and you want to proceed, let me know and I can start working on it tonight.


Nick Knize (@nknize) (migrated from JIRA)

I had thrown this code together a while ago before I put the GeoPointField / Geo3d integration work on hold. This rough draft converts 3D LLA coordinates into ECEF cartesian coordinates using the WGS84 based non-iterative approach in GeoProjectionUtils (derived from the conversion approach illustrated in the Manual of Photogrammetry using 2 sin/cos and 1 sqrt). The ECEF cartesian coordinate is scaled to a unit spheroid (since this is presumably what Geo3D requires) and each of the 32 bits are interleaved (XYZ) akin to MortonEncoding. Decoding procedures are also provided. While this encoding is not nicely represented as a long (not yet convinced 21 bits per dimension will preserve "acceptable" precision, though we could allocate bits differently since larger altitudes degrade w/ conversion) it is BinaryDocValue friendly and enables a space partitioning/prefix coded approach similar to the way GeoPointField currently works. The most-significant 3 bits represent an Oct-Cell at the first level, next 3 for level 2, etc.
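The interleaving idea described above can be sketched in simplified form. Assumptions: 21 bits per dimension packed into a single long, rather than the 3 × 32 = 96-bit BitSet encoding Nick describes, and illustrative names throughout. The key property survives: each interleaved triple of bits, most significant first, selects one oct-cell per tree level:

```java
// Simplified 3-way Morton interleave sketch (21 bits per dimension into one
// long; not the attached 96-bit code). Bits 62..60 form the level-1 oct-cell,
// bits 59..57 level 2, and so on, enabling prefix-coded traversal.
public class Morton3 {
  static final int BITS = 21;

  static long interleave(int x, int y, int z) { // each input must fit in 21 bits
    long r = 0;
    for (int i = 0; i < BITS; i++) {
      r |= ((long) ((x >>> i) & 1)) << (3 * i + 2)  // x takes the top bit of each triple
         | ((long) ((y >>> i) & 1)) << (3 * i + 1)
         | ((long) ((z >>> i) & 1)) << (3 * i);
    }
    return r;
  }

  static int deinterleave(long m, int dim) { // dim: 0=x, 1=y, 2=z
    int v = 0;
    for (int i = 0; i < BITS; i++) {
      v |= (int) ((m >>> (3 * i + 2 - dim)) & 1) << i;
    }
    return v;
  }
}
```

The per-bit loop is the slow part, which is exactly the cost Nick benchmarks below; a magic-number bit twiddle removes the loop at the price of readability.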


Michael McCandless (@mikemccand) (migrated from JIRA)

Instead, you are chopping up the space that the entire world lives in (not just its surface).

Right, but, that approach seems wasteful? I mean it should work correctly, but as you said some cells will span strange parts of the surface. It should work fine but seems inefficient ...

Like, if there is a way instead to make smaller and smaller surface shapes in the BKD tree that would be best? I.e. fix BKD to operate entirely on the surface ...

Michael McCandless If this all is clear now, and you want to proceed, let me know and I can start working on it tonight.

Thank you for being so eager and volunteering here but maybe wait until we figure out the likely best approach here! I am still a newbie/lost in the details... I lack intuition.


Karl Wright (@DaddyWri) (migrated from JIRA)

Right, but, that approach seems wasteful? I mean it should work correctly, but as you said some cells will span strange parts of the surface. It should work fine but seems inefficient ... Like, if there is a way instead to make smaller and smaller surface shapes in the BKD tree that would be best? I.e. fix BKD to operate entirely on the surface ...

So, if you turn the BKD descent into something other than (x,y,z), it would imply that you store points for records in something other than (x,y,z) too, no? I am unsure whether the additional cost of representing records with, say, lat/lon, and doing the conversion to (x,y,z) at search time for every record encountered would be more or less expensive than having a somewhat wasteful representation for the tree itself. I expect the conversion for every record would turn out to be more expensive, but maybe this is something we'd just have to try. The obvious alternative would be to store just lat/lon, exactly as is done in Nicholas's code, and use the same BKD tree. The only difference would be: (a) you'd need to construct a GeoBBox, using GeoBBoxFactory, representing your area, and then use getRelationship() to determine intersection, contains, within, or disjoint for the shape, and (b) you'd have to create a GeoPoint at search time for every record encountered.


Nick Knize (@nknize) (migrated from JIRA)

The obvious alternative would be to store just lat/lon, exactly as is done in Nicholas's code

Just to clarify, so there's no confusion, the attached code converts from lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon (if that's what you were referring to). The attached was intended to be used for a Geo3d integration w/ GeoPointField.

I expect the conversion for every record would turn out to be more expensive.

Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt). Conversion cost vs. wasteful representation is the interesting question, so I threw together a quick encoding/decoding benchmark (on my i7 16GB System76) and ran it on 100M points (converting from LLA to ECEF and back). The law of large numbers took over at around 35M. For interest, the results are provided:

Avg computation: 620.4319757666667 ns  Trials: 30000000  Total time: 18612.959273 ms
Avg computation: 621.3008362285715 ns  Trials: 35000000  Total time: 21745.529268 ms
Avg computation: 621.582647925 ns  Trials: 40000000  Total time: 24863.305917 ms
Avg computation: 621.3724589555555 ns  Trials: 45000000  Total time: 27961.760653 ms
Avg computation: 621.16271364 ns  Trials: 50000000  Total time: 31058.135682 ms
Avg computation: 621.3857686909091 ns  Trials: 55000000  Total time: 34176.217278 ms
Avg computation: 621.8110524 ns  Trials: 60000000  Total time: 37308.663144 ms
Avg computation: 621.6768083230769 ns  Trials: 65000000  Total time: 40408.992541 ms
Avg computation: 621.5324306714285 ns  Trials: 70000000  Total time: 43507.270147 ms
Avg computation: 621.4440536933333 ns  Trials: 75000000  Total time: 46608.304027 ms
Avg computation: 621.7594845875 ns  Trials: 80000000  Total time: 49740.758767 ms
Avg computation: 621.9327540705882 ns  Trials: 85000000  Total time: 52864.284096 ms
Avg computation: 621.9429434555556 ns  Trials: 90000000  Total time: 55974.864911 ms
Avg computation: 621.8868688947368 ns  Trials: 95000000  Total time: 59079.252545 ms
Avg computation: 621.98037608 ns  Trials: 100000000  Total time: 62198.037608 ms

Again, those are both conversions to ECEF and back to LLA. So roughly 1 minute for 100M points. Halve those numbers and you have the cost for converting either direction.

Next we could benchmark tree access and compare? I suspect traversal cost should be a function of the on-disk representation and chosen split method (that's where RTrees tend to become costly). Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward?
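For reference, the non-iterative geodetic-to-ECEF conversion being timed above can be sketched as below. This is the standard WGS84 closed-form formula, not the attached GeoProjectionUtils code; per point it costs exactly the 2 sines, 2 cosines, and 1 sqrt noted in the thread. Dividing each component by the semi-major axis would scale to the unit-spheroid coordinates geo3d presumably expects:

```java
// Standard non-iterative WGS84 geodetic (lat, lon, alt) -> ECEF (x, y, z).
// Cost per point: 2 sines, 2 cosines, 1 sqrt.
public class Ecef {
  static final double A = 6378137.0;          // WGS84 semi-major axis, meters
  static final double E2 = 0.00669437999014;  // first eccentricity squared

  static double[] toEcef(double latDeg, double lonDeg, double altMeters) {
    double lat = Math.toRadians(latDeg), lon = Math.toRadians(lonDeg);
    double sinLat = Math.sin(lat), cosLat = Math.cos(lat);
    double sinLon = Math.sin(lon), cosLon = Math.cos(lon);
    // Prime vertical radius of curvature at this latitude.
    double n = A / Math.sqrt(1.0 - E2 * sinLat * sinLat);
    return new double[] {
        (n + altMeters) * cosLat * cosLon,
        (n + altMeters) * cosLat * sinLon,
        (n * (1.0 - E2) + altMeters) * sinLat
    };
  }
}
```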


Karl Wright (@DaddyWri) (migrated from JIRA)

Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt).

Right, but you'd be comparing 2 sines, 2 cosines, and 1 sqrt against only the cost of unpacking, which for 1 billion records would be a noticeable amount of time.

Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward?

Many Lucene systems (including ours) keep the index in memory using memory mapping, and SSDs would be expected even when they don't, so I'd think a benchmark should not overemphasize seeks as being costly. But Mike is the expert as far as that is concerned.


Nick Knize (@nknize) (migrated from JIRA)

Right, but you'd be comparing 2 sines, 2 cosines, and 1 sqrt against only the cost of unpacking

Encoding/Decoding ECEF into 96 Bits:

Avg computation: 664.6969666857143 ns  Trials: 35000000  Total time: 23264.393834 ms
Avg computation: 664.829008375 ns  Trials: 40000000  Total time: 26593.160335 ms
Avg computation: 667.3625471333334 ns  Trials: 45000000  Total time: 30031.314621 ms
Avg computation: 668.46880436 ns  Trials: 50000000  Total time: 33423.440218 ms
Avg computation: 667.8703028909091 ns  Trials: 55000000  Total time: 36732.866659 ms
Avg computation: 669.3753888666666 ns  Trials: 60000000  Total time: 40162.523332 ms
Avg computation: 668.4362739230769 ns  Trials: 65000000  Total time: 43448.357805 ms
Avg computation: 667.9539851 ns  Trials: 70000000  Total time: 46756.778957 ms
Avg computation: 667.3638297333333 ns  Trials: 75000000  Total time: 50052.28723 ms
Avg computation: 675.024778375 ns  Trials: 80000000  Total time: 54001.98227 ms
Avg computation: 674.1673578352941 ns  Trials: 85000000  Total time: 57304.225416 ms
Avg computation: 673.4723439777778 ns  Trials: 90000000  Total time: 60612.510958 ms
Avg computation: 673.0372402842105 ns  Trials: 95000000  Total time: 63938.537827 ms
Avg computation: 672.55224382 ns  Trials: 100000000  Total time: 67255.224382 ms

Compared to packing/unpacking lat/lon into 64 bits using GeoPointField morton bit twiddling:

Avg computation: 60.397136 ns  Trials: 35000000  Total time: 2113.89976 ms
Avg computation: 61.6391708 ns  Trials: 40000000  Total time: 2465.566832 ms
Avg computation: 62.744074222222224 ns  Trials: 45000000  Total time: 2823.48334 ms
Avg computation: 63.51111108 ns  Trials: 50000000  Total time: 3175.555554 ms
Avg computation: 64.18207294545455 ns  Trials: 55000000  Total time: 3530.014012 ms
Avg computation: 64.73684656666667 ns  Trials: 60000000  Total time: 3884.210794 ms
Avg computation: 65.18073341538461 ns  Trials: 65000000  Total time: 4236.747672 ms
Avg computation: 65.5902512 ns  Trials: 70000000  Total time: 4591.317584 ms
Avg computation: 65.02902253333333 ns  Trials: 75000000  Total time: 4877.17669 ms
Avg computation: 63.6249806 ns  Trials: 80000000  Total time: 5089.998448 ms
Avg computation: 62.4193206 ns  Trials: 85000000  Total time: 5305.642251 ms
Avg computation: 61.344433977777776 ns  Trials: 90000000  Total time: 5520.999058 ms
Avg computation: 61.402236642105265 ns  Trials: 95000000  Total time: 5833.212481 ms
Avg computation: 61.10019762 ns  Trials: 100000000  Total time: 6110.019762 ms

So the 3D BitSet approach takes 10 times longer, with the obvious culprit being the for loop over each bit. This can be optimized, though, using a 3-way bit twiddle and 2 longs if a 64 bit 3D packing yields unacceptable loss of precision.

so I'd think a benchmarking should not overemphasize seeks as being costly

Maybe not. But it does depend on the on-disk representation of the tree (and I typically don't use SSDs as an excuse for not paying attention to good file layout). I was mentioning this in the context of index size as a function of a "wasteful" vs. efficient encoding.


Karl Wright (@DaddyWri) (migrated from JIRA)

If the decision is made to use (x,y,z) encoding rather than (lat,lon), then FWIW I think the right way forward with spatial3d would be to extend the current Bounds infrastructure to allow the finding of x,y,z bounds for a shape, rather than just lat/lon. I've thought about this for a day and have concluded it would be far and away the fastest solution, since computing the Bounds for any shape would need to be done only once.

Once x, y, and z bounds are known for a shape, the BKD code itself can make most of its decisions itself, without much computation at all, for most of the recursive steps. Only when descending within the computed bounds would it be necessary to construct a GeoArea XYZArea object to determine the exact relationship between a 3d rectangle and the specified GeoShape.


Michael McCandless (@mikemccand) (migrated from JIRA)

I can't follow all the spatial-heavy discussion here, but:

Since BKD splits a sorted space, and it can exploit file system cache?

BKD tree's "tree" (the internal nodes) is already forced into heap, at least in the current impl. So walking that tree should always be fast, and then there's only seeking once we hit the leaf nodes. But that seeking should be sequential walk through the file...

If the decision is made to use (x,y,z) encoding rather than (lat,lon),

So, if you turn the BKD descent into something other than (x,y,z), it would imply that you store points for records in something other than (x,y,z) too, no?

I think the encoding for each point's value can be different from what the BKD tree uses?

I suspect the points should use (x,y,z) encoding when stored in doc-values per document, because when we need to filter hits for the BKD leaf cells overlapping the shape's boundary, we want to take advantage of the fast math in (x,y,z) space that spatial3d gives us?

But then, for the inner-nodes (split values) for the BKD tree, we could use either (lat,lon) or (x,y,z), whichever gives us fastest shape relation computations? At each step in the recursion, BKD tree must check a split in one dimension against the query shape.


Karl Wright (@DaddyWri) (migrated from JIRA)

Patch to geo3d creating an XYZArea object, which can be used to determine intersection with any GeoShape. This is a preliminary patch; I will be including unit tests as well as a more general class of XYZArea degenerate objects if that should prove necessary.


Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @DaddyWri, I'll try to generalize BKD to 3 dimensions soon ...


Karl Wright (@DaddyWri) (migrated from JIRA)

@mikemccand: Turns out that my class is having problems properly relating to GeoWorld. I'm going to have to make some modifications to make things play nicely. Stay tuned.


Karl Wright (@DaddyWri) (migrated from JIRA)

Revised patch that fixes some of the corner cases, especially interactions with the GeoWorld shape.


Michael McCandless (@mikemccand) (migrated from JIRA)

Initial dirty patch, but its testBasic and testRandom seem to be passing! I just made a 3D version of BKD, and the random test tests across the full cube of +/- 1.022.

I did a naive double -> int encoding for each dimension, storing in 12 byte binary DV per doc. Multi-valued docs aren't supported yet but I think it shouldn't be too hard.

There are tons of nocommits because I don't know how to tie into the geo3d APIs...

I think we should make a branch here, commit the two patches, and then iterate?


Karl Wright (@DaddyWri) (migrated from JIRA)

If you create a branch I can generate patches against it.


ASF subversion and git services (migrated from JIRA)

Commit 1695368 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1695368

LUCENE-6699: make branch


ASF subversion and git services (migrated from JIRA)

Commit 1695369 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1695369

LUCENE-6699: Karl's patch to extend geo3d apis to 3d rectangles


ASF subversion and git services (migrated from JIRA)

Commit 1695370 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1695370

LUCENE-6699: initial 3D BKD implementation


Karl Wright (@DaddyWri) (migrated from JIRA)

Ok, for a start – the way you get X, Y, and Z ranges for a given planet model is via PlanetModel.getMinimumX(), getMaximumX(), getMinimumY(), getMaximumY(), getMinimumZ(), getMaximumZ(). The GeoShape interface does not provide a means of obtaining the PlanetModel, so you will need to pass this in to your constructor in addition to what you currently have.

Second, the following code:

+  double x = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset));
+  double y = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset+4));
+  double z = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset+8));
+  //return GeoUtils.pointInPolygon(polyLons, polyLats, lat, lon);
+  // nocommit fixme!
+  return true;

... should call GeoShape.isWithin(x,y,z) to determine membership within the shape.

Finally,

+  public BKD3DTreeReader.Relation compare(int xMin, int xMax, int yMin, int yMax, int zMin, int zMax) {
+    // nocommit fixme!
+    return BKD3DTreeReader.Relation.INSIDE;
+  }

... should do the following:

GeoArea xyzSolid = new XYZSolid(planetModel, xMin, xMax, yMin, yMax, zMin, zMax);
return xyzSolid.getRelationship(geoShape) == GeoArea.<WHATEVER>?xxx:yyy
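Filled in, the compare() callback would amount to mapping geo3d's relation constants onto BKD relations. A sketch: GeoArea's int constants (CONTAINS=0, WITHIN=1, OVERLAPS=2, DISJOINT=3) are real spatial3d API, but apart from INSIDE (visible in the patch excerpt above) the Relation names and the exact mapping are assumptions, stubbed here so the idea is self-contained:

```java
// Sketch of mapping geo3d GeoArea relations to BKD cell relations.
public class RelationMap {
  // Mirrors spatial3d's GeoArea int constants.
  static final int CONTAINS = 0, WITHIN = 1, OVERLAPS = 2, DISJOINT = 3;

  // Stand-in for BKD3DTreeReader.Relation; only INSIDE appears in the patch,
  // the other names are assumed.
  enum Relation { INSIDE, CROSSES, OUTSIDE }

  static Relation compare(int geoAreaRelation) {
    switch (geoAreaRelation) {
      case WITHIN:   return Relation.INSIDE;  // cell entirely within the shape: collect blindly
      case DISJOINT: return Relation.OUTSIDE; // no intersection: prune the cell
      default:       return Relation.CROSSES; // CONTAINS or OVERLAPS: recurse / test points
    }
  }
}
```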


Michael McCandless (@mikemccand) (migrated from JIRA)

OK I created the branch and committed our two latest patches!

https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6699


Karl Wright (@DaddyWri) (migrated from JIRA)

ok, I'll try to work my proposed changes into them. The biggest thing, though, is that we need access to a PlanetModel instance inside the compare() method. So some plumbing will be necessary to set that up. Stay tuned. ;-)


Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @DaddyWri, I started on one part, lemme go commit so we don't stomp on each other!


ASF subversion and git services (migrated from JIRA)

Commit 1695377 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1695377

LUCENE-6699: fold in some feedback


Michael McCandless (@mikemccand) (migrated from JIRA)

OK I committed my change, hack away!


ASF subversion and git services (migrated from JIRA)

Commit 1698232 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1698232

LUCENE-6699: store planet's max in the index, and validate at search time


ASF subversion and git services (migrated from JIRA)

Commit 1698321 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1698321

LUCENE-6699: fix encode to round instead of truncate to reduce quantization error; fix quantized bbox decode


ASF subversion and git services (migrated from JIRA)

Commit 1698454 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1698454

LUCENE-6699: add test case


ASF subversion and git services (migrated from JIRA)

Commit 1698455 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1698455

LUCENE-6699: fix rename


ASF subversion and git services (migrated from JIRA)

Commit 1700049 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700049

LUCENE-6699: iterate


ASF subversion and git services (migrated from JIRA)

Commit 1700102 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700102

LUCENE-6699: iterate


ASF subversion and git services (migrated from JIRA)

Commit 1700165 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700165

LUCENE-6699: randomize other shapes too; fix a nocommit


ASF subversion and git services (migrated from JIRA)

Commit 1700313 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700313

LUCENE-6699: make encode/decodeValue symmetric


ASF subversion and git services (migrated from JIRA)

Commit 1700450 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700450

LUCENE-6699: increase bounds fudge factor


ASF subversion and git services (migrated from JIRA)

Commit 1700457 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700457

LUCENE-6699: only print verbose details of the failing iter


ASF subversion and git services (migrated from JIRA)

Commit 1700459 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700459

LUCENE-6699: include the shape in the failure message


ASF subversion and git services (migrated from JIRA)

Commit 1700587 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700587

LUCENE-6699: increase fudge factor further


ASF subversion and git services (migrated from JIRA)

Commit 1700788 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700788

LUCENE-6699: remove nocommits


ASF subversion and git services (migrated from JIRA)

Commit 1700800 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700800

LUCENE-6699: merge trunk


ASF subversion and git services (migrated from JIRA)

Commit 1700820 from @mikemccand in branch 'dev/branches/lucene6699'
https://svn.apache.org/r1700820

LUCENE-6699: fix precommit


ASF subversion and git services (migrated from JIRA)

Commit 1700883 from @mikemccand in branch 'dev/trunk'
https://svn.apache.org/r1700883

LUCENE-6699: add geo3d + BKD for fast, accurate earth-surface point-in-shape queries


ASF subversion and git services (migrated from JIRA)

Commit 1700887 from @mikemccand in branch 'dev/branches/branch_5x'
https://svn.apache.org/r1700887

LUCENE-6699: add geo3d + BKD for fast, accurate earth-surface point-in-shape queries


Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @DaddyWri!


ASF subversion and git services (migrated from JIRA)

Commit 1703537 from @mikemccand in branch 'dev/trunk'
https://svn.apache.org/r1703537

LUCENE-6699: fix silly test bug


ASF subversion and git services (migrated from JIRA)

Commit 1703538 from @mikemccand in branch 'dev/branches/branch_5x'
https://svn.apache.org/r1703538

LUCENE-6699: fix silly test bug


Terry Smith (migrated from JIRA)

Karl, were you able to find that packing scheme? I'm interested in poking the x,y,z values into a SortedNumericDocValuesField to see how well it would perform.


Karl Wright (@DaddyWri) (migrated from JIRA)

no time, I'm afraid...


asfimport commented Sep 23, 2015

Nick Knize (@nknize) (migrated from JIRA)

Terry Smith I updated the Geo3DPacking code some time ago to avoid the overhead of BitSet and use raw morton bit twiddling. The intent is to use it in #7539. Since that issue has stalled a bit I went ahead and attached the standalone class (with benchmarks) to the #7539 issue if you're interested in tinkering.


Terry Smith (migrated from JIRA)

Thanks guys. I was hoping to squeeze those x,y,z values into 64 bits instead of 96. I'm not a bit twiddler but I'll take a look at Nicholas' patch and see if I can adapt it.


Nick Knize (@nknize) (migrated from JIRA)

I started by packing all 3 values into a 64 bit long - in fact those MAGIC numbers are still there (MAGIC[7:12]). The problem with this is precision loss from compressing each value into 21 bits. The decoded values will give, at best, 3 decimal places of accuracy - which is ~110m at the equator. If you're fine with coarse-grained accuracy then copy the BitUtil.{interleave | deinterleave} methods and use MAGIC[7:12]. The code is much simpler with the accuracy trade-off.
