New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
builtins: Implement ST_GeneratePoints function #58288
builtins: Implement ST_GeneratePoints function #58288
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening as a draft for now, I have several items I would like to discuss.
pkg/geo/geomfn/generate_points.go
Outdated
bboxWidth := bbox.HiX - bbox.LoX | ||
bboxHeight := bbox.HiY - bbox.LoY | ||
bboxArea := bboxHeight * bboxWidth | ||
if area == 0.0 || bboxArea == 0.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how it works in PostGIS, so I wanted to double-check - Can the bboxArea
be zero when area
is not zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I removed bboxArea
check here.
pkg/geo/geomfn/generate_points.go
Outdated
return nil, errors.Newf("could not set X and Y for coord sequence: %v", err) | ||
} | ||
defer func() { | ||
// err = geos.CoordSequenceDestroy(gseq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I know that
gseq
should be destroyed, but when I uncomment this line I get a really weird error and can't figure out why:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x5fd1210]
runtime stack:
runtime.throw(0x4a2e35f, 0x2a)
/usr/local/Cellar/go/1.15.5/libexec/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
/usr/local/Cellar/go/1.15.5/libexec/src/runtime/signal_unix.go:726 +0x48c
I compared the C API handler code to the one for PreparedGeometry
, I have noticed that the destroy method for CoordSequence
has no const
argument type:
extern void GEOS_DLL GEOSPreparedGeom_destroy_r(GEOSContextHandle_t handle,
const GEOSPreparedGeometry* g);
extern void GEOS_DLL GEOSCoordSeq_destroy_r(GEOSContextHandle_t handle,
GEOSCoordSequence* s);
Could you help to debug it, please? 🙏
2) Can you recommend any better way to handle errors in the defer
statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) looks like when you use GeometryPointCreate
, gseq
is owned by the gpt
object. you have to free that. basing that off reading libgeos/geos#247.
(2) use something similar to the above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still on (1) - sorry, but how do I do that? I have tried to add a new function like this:
CR_GEOS_Status CR_GEOS_GeometryDestroy(CR_GEOS* lib, CR_GEOS_Slice a) {
std::string error;
auto handle = initHandleWithErrorBuffer(lib, &error);
auto wkbReader = lib->GEOSWKBReader_create_r(handle);
auto geomA = lib->GEOSWKBReader_read_r(handle, wkbReader, a.data, a.len);
lib->GEOSWKBReader_destroy_r(handle, wkbReader);
if (geomA != nullptr) {
lib->GEOSGeom_destroy_r(handle, geomA);
}
lib->GEOS_finish_r(handle);
return toGEOSString(error.data(), error.length());
}
But in the end, I still got the same error.
I could make this function expect a CR_GEOS_Geometry
type input but in that case, I would probably need another way of keeping a pointer to CR_GEOS_Geometry
GeometryPoint and transforming it to a CR_GEOS_Slice
(so that it can be used in PrepareGeometry
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need to call GEOSCoordSeq_destroy_r
at all, since GeometryCreate
makes it the owner of the Seq object.
However, I think we should avoid doing this GeometryCreate altogether. Can we just create a using go-geom
and use that EWKB?
Also for simplicity, I'm happy for you to just use geos.Intersects
instead of PreparedGeometry
. We're planning on introducing this later with a more managed library, for now you can leave it as a TODO for us to optimise later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. I will go with a simple point then 👍
pkg/geo/geomfn/generate_points.go
Outdated
defer func() { | ||
// err = geos.CoordSequenceDestroy(gseq) | ||
}() | ||
gpt, err := geos.GeometryPointCreate(gseq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does gpt
need to be destroyed explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think so.
6c58962
to
9fd9a16
Compare
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
aa5578d
to
4e6cd6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not getting back to you sooner - new year break.
Looking good, responded to your help comments!
pkg/geo/geomfn/generate_points.go
Outdated
case geopb.ShapeType_MultiPolygon: | ||
generatePointsFunction = multipolygonGeometryToPoints | ||
default: | ||
return geo.Geometry{}, errors.New("unsupported type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: print the type in the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/geo/geomfn/generate_points.go
Outdated
// that are covered by the geometry provided. | ||
// npoints is the number of points to return. | ||
// seed is the seed used by the random numbers generator. | ||
func geometryToPoints(g geo.Geometry, npoints int, seed int64) (geo.Geometry, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to something to more descriptive. I think we should rename this generateRandomPoints
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/geo/geos/geos.cc
Outdated
if (geom != nullptr) { | ||
*ret = lib->GEOSPrepare_r(handle, geom); | ||
// TODO: make sure underlying geom is freed too. | ||
// lib->GEOSGeom_destroy_r(handle, geom); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to return a struct, so that we free the GEOSGeometry
underneath.
Consider something like:
struct PreparedGeometry {
CR_GEOS_Geometry *g;
CR_GEOS_PreparedInternalGeometry *p; // this corresponds to PreparedGeometry in GEOS.
}
CR_GEOS_Status CR_GEOS_PrepareGeometry(CR_GEOS* lib, CR_GEOS_Slice a, CR_GEOS_PreparedGeometry* ret) {
....
*ret = {.g = geom, .p = lib->GEOSPrepare_r(handle, geom)}
}
So that in CR_GEOS_PreparedGeometryDestroy
, we call GEOSGeom_destroy_r
on g
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I changed the PreparedGeometry
accordingly.
Please just note that in the updated Go code geos.PrepareGeometry
on error returns an empty PreparedGeometry
instead of nil
, I hope I got it right.
pkg/geo/geomfn/generate_points.go
Outdated
defer func() { | ||
// err = geos.CoordSequenceDestroy(gseq) | ||
}() | ||
gpt, err := geos.GeometryPointCreate(gseq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think so.
4e6cd6b
to
fa94001
Compare
fa94001
to
ddee8c8
Compare
ddee8c8
to
f456ea3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think I still need some comments to understand the algorithm, still confused about the iterations bit.
pkg/geo/geomfn/generate_points.go
Outdated
// Generate points and test them. | ||
nPointsGenerated := 0 | ||
iterations := 0 | ||
for nPointsGenerated < nPoints || iterations <= 100 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm still not sure why 100 is here. This <=
would also give 101 iterations; we could do for nPointsGenerated, iterations := 0, 0; nPointsGenerated < nPoints || iterations <= 100; iterations++
as well
Do we stop generating points after 100? Do we really need to cap this? What if nPoints > 100? Should the cap be 10 * nPoints instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with different geometries, I think I understand your concern now.
For each iteration
here we are still going through each cell in the grid, for which we are generating a point. The purpose of limiting iterations
is to avoid going through the grid over and over if there is a little likelihood of generating a point that will intersect with our geometry.
However, from the above, you can see that the real complexity here is iterations * len(cells)
. The number of cells is calculated from sampleNPoints := float64(nPoints) * bboxArea / area
so, the more points we are supposed to generate and the higher bounding box area / geometry area
ratio, the more complex calculations.
For sampleNPoints
of ~14000000 it took about 18 seconds to generate points on my machine.
I suggest we get rid of iterations
cap at all. If we need a complexity limitation, keeping in mind differences in execution time between different machines, I can make the code throw a timeout error after some time. What's your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big fan of timeouts as they are non deterministic. if we need something like this (not opposed but this has to be clear in the function Info
to the user) i'd rather it be a function of the nPoints as input, e.g. nPoints * 10
. curious where you got the idea from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, so you'd prefer to keep the condition, but instead of 100
use nPoints * 10
?
I do not have much experience with optimization like this, I just thought that in theory, a user running CRDB on a very fast computer could want to handle much more complexity. This solution sounds good, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah prefer the latter if we decide we need this.
f456ea3
to
a04b881
Compare
pkg/geo/geos/geos.go
Outdated
@@ -59,6 +59,9 @@ var geosOnce struct { | |||
once sync.Once | |||
} | |||
|
|||
// PreparedGeometry is an instance of a GEOS PreparedGeometry. | |||
type PreparedGeometry C.CR_GEOS_PreparedGeometry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, GitHub lost a conversation so I only just got around to this.
I think this should be a pointer. Not sure how well "copy" plays around with cgo, and it seems natural, i.e. type PreparedGeometry *C.CR_GEOS_PreparedGeometry
You mean need to use double pointer CR_GEOS_Prepare
, i.e. CR_GEOS_Prepare(CR_GEOS_PreparedGeometry **g, ....)
.
You can return nil, err below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to edit it that way and I have changed the related functions:
CR_GEOS_Status CR_GEOS_Prepare(CR_GEOS* lib, CR_GEOS_Slice a, CR_GEOS_PreparedGeometry** ret) {
std::string error;
auto handle = initHandleWithErrorBuffer(lib, &error);
auto wkbReader = lib->GEOSWKBReader_create_r(handle);
auto geom = lib->GEOSWKBReader_read_r(handle, wkbReader, a.data, a.len);
lib->GEOSWKBReader_destroy_r(handle, wkbReader);
if (geom != nullptr) {
auto preparedGeom = lib->GEOSPrepare_r(handle, geom);
CR_GEOS_PreparedGeometry tmp = {.g = geom, .p = preparedGeom};
CR_GEOS_PreparedGeometry *tmp2 = &tmp;
*ret = tmp2;
}
lib->GEOS_finish_r(handle);
return toGEOSString(error.data(), error.length());
}
CR_GEOS_Status CR_GEOS_PreparedGeometryDestroy(CR_GEOS* lib, CR_GEOS_PreparedGeometry* a) {
std::string error;
auto handle = initHandleWithErrorBuffer(lib, &error);
lib->GEOSPreparedGeom_destroy_r(handle, a->p);
lib->GEOSGeom_destroy_r(handle, a->g);
lib->GEOS_finish_r(handle);
return toGEOSString(error.data(), error.length());
}
CR_GEOS_Status CR_GEOS_PreparedIntersects(CR_GEOS* lib, CR_GEOS_PreparedGeometry* a, CR_GEOS_Slice b, char* ret) {
return CR_GEOS_PreparedBinaryPredicate(lib, lib->GEOSPreparedIntersects_r, a->p, b, ret);
}
I managed to get no compilation errors, but an execution one:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1 pc=0x5fd5861]
runtime stack:
runtime.throw(0x4a2ecae, 0x2a)
/usr/local/Cellar/go/1.15.5/libexec/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
/usr/local/Cellar/go/1.15.5/libexec/src/runtime/signal_unix.go:726 +0x48c
goroutine 15 [syscall]:
runtime.cgocall(0x48ae000, 0xc000080928, 0x4010100)
/usr/local/Cellar/go/1.15.5/libexec/src/runtime/cgocall.go:133 +0x5b fp=0xc0000808e0 sp=0xc0000808a8 pc=0x400875b
github.com/cockroachdb/cockroach/pkg/geo/geos._Cfunc_CR_GEOS_PreparedIntersects(0xd004190, 0x7ffeefbff348, 0xc000013600, 0x15, 0xc0000af1c0, 0x0, 0x0)
_cgo_gotypes.go:724 +0x4d fp=0xc000080928 sp=0xc0000808e0 pc=0x47acfed
github.com/cockroachdb/cockroach/pkg/geo/geos.PreparedIntersects.func1(0xd004190, 0x7ffeefbff348, 0xc000013600, 0x15, 0x40, 0xc0000af1c0, 0x0, 0x0)
/Users/marcinknychala/go/src/github.com/cockroachdb/cockroach/pkg/geo/geos/geos.go:766 +0xc5 fp=0xc000080980 sp=0xc000080928 pc=0x47b8c65
Could you help me out, please?
Should I somehow explicitly allocate memory for this CR_GEOS_PreparedGeometry?
I can also change ST_GeneratePoints
to work without PreparedGeometry
, but I really want to make sense out of it and deliver something optimized :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would need to malloc
space for tmp
beforehand I think:
if (geom != nullptr) {
auto preparedGeom = lib->GEOSPrepare_r(handle, geom);
*ret = new CR_GEOS_PreparedGeometry(); // or define initializer on this method
ret->g = geom;
ret -> p = preparedGeom;
}
and to free the pointer in CR_GEOS_PreparedGeometryDestroy
.
this does not work because the data is only saved on the stack, and will be destroyed once the function ends, hence giving the seg fault:
CR_GEOS_PreparedGeometry tmp = {.g = geom, .p = preparedGeom};
CR_GEOS_PreparedGeometry *tmp2 = &tmp;
not like go, which promotes&tmp
to the heap, so you don't have to worry about this kind of thing ;)
I can also change ST_GeneratePoints to work without PreparedGeometry, but I really want to make sense out of it and deliver something optimized :)
Of course! Unfortunately it means more pain, but if you like it I'm only happy to help...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, it's working now!
name string | ||
args args | ||
want geo.Geometry | ||
wantErr bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you separate error cases into another struct like
errorTestCases := []struct {
desc string
args string
errMatch string
}
and catch error messages for error cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, changed
a04b881
to
6c6b479
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks awesome! thanks for all the hard work you've put into this, and going through the hard work to get it nice :)
bors r+
Build failed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
bors r- looks like you need to run |
Canceled. |
Not at all |
6c6b479
to
40cdd9d
Compare
Release note (sql change): Implement geo builtin ST_GeneratePoints
40cdd9d
to
1af7a03
Compare
I am sorry @miretskiy, I uploaded too many changes by an accident causing additional reviewers to be assigned. Fixed now, please disregard. @otan Please note that in the last push I also removed assigning NULL to a deleted pointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
bors r+
Build failed (retrying...): |
Build succeeded: |
This function generates pseudo-random points until the requested number are found within the input area.
Release note (sql change): Implement geo builtin ST_GeneratePoints
Resolves: #48942
Dependent on: #54476