Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtins: Implement ST_GeneratePoints function #58288

Merged
merged 1 commit into from Jan 14, 2021

Conversation

mknycha
Copy link
Contributor

@mknycha mknycha commented Dec 27, 2020

This function generates pseudo-random points until the requested number are found within the input area.

Release note (sql change): Implement geo builtin ST_GeneratePoints

Resolves: #48942

Dependent on: #54476

@blathers-crl
Copy link

blathers-crl bot commented Dec 27, 2020

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Dec 27, 2020
@blathers-crl blathers-crl bot requested a review from otan December 27, 2020 20:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@mknycha mknycha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opening as a draft for now, I have several items I would like to discuss.

bboxWidth := bbox.HiX - bbox.LoX
bboxHeight := bbox.HiY - bbox.LoY
bboxArea := bboxHeight * bboxWidth
if area == 0.0 || bboxArea == 0.0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how it works in PostGIS, so I wanted to double-check - Can the bboxArea be zero when area is not zero?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds impossible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I removed bboxArea check here.

return nil, errors.Newf("could not set X and Y for coord sequence: %v", err)
}
defer func() {
// err = geos.CoordSequenceDestroy(gseq)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I know that gseq should be destroyed, but when I uncomment this line I get a really weird error and can't figure out why:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x5fd1210]

runtime stack:
runtime.throw(0x4a2e35f, 0x2a)
	/usr/local/Cellar/go/1.15.5/libexec/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
	/usr/local/Cellar/go/1.15.5/libexec/src/runtime/signal_unix.go:726 +0x48c

I compared the C API handler code to the one for PreparedGeometry, I have noticed that the destroy method for CoordSequence has no const argument type:

extern void GEOS_DLL GEOSPreparedGeom_destroy_r(GEOSContextHandle_t handle,
                                                const GEOSPreparedGeometry* g);

extern void GEOS_DLL GEOSCoordSeq_destroy_r(GEOSContextHandle_t handle,
                                            GEOSCoordSequence* s);

Could you help to debug it, please? 🙏
2) Can you recommend any better way to handle errors in the defer statement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) looks like when you use GeometryPointCreate, gseq is owned by the gpt object. you have to free that. basing that off reading libgeos/geos#247.
(2) use something similar to the above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still on (1) - sorry, but how do I do that? I have tried to add a new function like this:

CR_GEOS_Status CR_GEOS_GeometryDestroy(CR_GEOS* lib, CR_GEOS_Slice a) {
  std::string error;
  auto handle = initHandleWithErrorBuffer(lib, &error);

  auto wkbReader = lib->GEOSWKBReader_create_r(handle);
  auto geomA = lib->GEOSWKBReader_read_r(handle, wkbReader, a.data, a.len);
  lib->GEOSWKBReader_destroy_r(handle, wkbReader);
  if (geomA != nullptr) {
    lib->GEOSGeom_destroy_r(handle, geomA);
  }
  lib->GEOS_finish_r(handle);
  return toGEOSString(error.data(), error.length());
}

But in the end, I still got the same error.
I could make this function expect a CR_GEOS_Geometry type input but in that case, I would probably need another way of keeping a pointer to CR_GEOS_Geometry GeometryPoint and transforming it to a CR_GEOS_Slice (so that it can be used in PrepareGeometry).

Copy link
Contributor

@otan otan Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need to call GEOSCoordSeq_destroy_r at all, since GeometryCreate makes it the owner of the Seq object.

However, I think we should avoid doing this GeometryCreate altogether. Can we just create a using go-geom and use that EWKB?

Also for simplicity, I'm happy for you to just use geos.Intersects instead of PreparedGeometry. We're planning on introducing this later with a more managed library, for now you can leave it as a TODO for us to optimise later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. I will go with a simple point then 👍

defer func() {
// err = geos.CoordSequenceDestroy(gseq)
}()
gpt, err := geos.GeometryPointCreate(gseq)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does gpt need to be destroyed explicitly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think so.

@blathers-crl
Copy link

blathers-crl bot commented Dec 27, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@mknycha mknycha force-pushed the implement-st-generate-points branch 2 times, most recently from aa5578d to 4e6cd6b Compare December 28, 2020 17:14
Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not getting back to you sooner - new year break.
Looking good, responded to your help comments!

pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
case geopb.ShapeType_MultiPolygon:
generatePointsFunction = multipolygonGeometryToPoints
default:
return geo.Geometry{}, errors.New("unsupported type")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: print the type in the error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// that are covered by the geometry provided.
// npoints is the number of points to return.
// seed is the seed used by the random numbers generator.
func geometryToPoints(g geo.Geometry, npoints int, seed int64) (geo.Geometry, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to something to more descriptive. I think we should rename this generateRandomPoints.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
pkg/geo/geomfn/generate_points_test.go Show resolved Hide resolved
if (geom != nullptr) {
*ret = lib->GEOSPrepare_r(handle, geom);
// TODO: make sure underlying geom is freed too.
// lib->GEOSGeom_destroy_r(handle, geom);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to return a struct, so that we free the GEOSGeometry underneath.

Consider something like:

struct PreparedGeometry {
  CR_GEOS_Geometry *g;
   CR_GEOS_PreparedInternalGeometry *p; // this corresponds to PreparedGeometry in GEOS.
}

CR_GEOS_Status CR_GEOS_PrepareGeometry(CR_GEOS* lib, CR_GEOS_Slice a, CR_GEOS_PreparedGeometry* ret) {
    ....
    *ret = {.g = geom, .p = lib->GEOSPrepare_r(handle, geom)}
}

So that in CR_GEOS_PreparedGeometryDestroy, we call GEOSGeom_destroy_r on g as well.

Copy link
Contributor Author

@mknycha mknycha Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I changed the PreparedGeometry accordingly.
Please just note that in the updated Go code geos.PrepareGeometry on error returns an empty PreparedGeometry instead of nil, I hope I got it right.

defer func() {
// err = geos.CoordSequenceDestroy(gseq)
}()
gpt, err := geos.GeometryPointCreate(gseq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think so.

@mknycha mknycha force-pushed the implement-st-generate-points branch from 4e6cd6b to fa94001 Compare January 6, 2021 12:01
@blathers-crl blathers-crl bot requested a review from otan January 6, 2021 12:01
@mknycha mknycha marked this pull request as ready for review January 6, 2021 12:04
@mknycha mknycha force-pushed the implement-st-generate-points branch from fa94001 to ddee8c8 Compare January 6, 2021 13:02
pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
pkg/geo/geomfn/generate_points.go Outdated Show resolved Hide resolved
pkg/geo/geos/geos.cc Outdated Show resolved Hide resolved
@mknycha mknycha force-pushed the implement-st-generate-points branch from ddee8c8 to f456ea3 Compare January 8, 2021 21:06
@blathers-crl blathers-crl bot requested a review from otan January 8, 2021 21:06
Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think I still need some comments to understand the algorithm, still confused about the iterations bit.

// Generate points and test them.
nPointsGenerated := 0
iterations := 0
for nPointsGenerated < nPoints || iterations <= 100 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm still not sure why 100 is here. This <= would also give 101 iterations; we could do for nPointsGenerated, iterations := 0, 0; nPointsGenerated < nPoints || iterations <= 100; iterations++ as well
Do we stop generating points after 100? Do we really need to cap this? What if nPoints > 100? Should the cap be 10 * nPoints instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around with different geometries, I think I understand your concern now.
For each iteration here we are still going through each cell in the grid, for which we are generating a point. The purpose of limiting iterations is to avoid going through the grid over and over if there is a little likelihood of generating a point that will intersect with our geometry.
However, from the above, you can see that the real complexity here is iterations * len(cells). The number of cells is calculated from sampleNPoints := float64(nPoints) * bboxArea / area so, the more points we are supposed to generate and the higher bounding box area / geometry area ratio, the more complex calculations.
For sampleNPoints of ~14000000 it took about 18 seconds to generate points on my machine.

I suggest we get rid of iterations cap at all. If we need a complexity limitation, keeping in mind differences in execution time between different machines, I can make the code throw a timeout error after some time. What's your opinion?

Copy link
Contributor

@otan otan Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big fan of timeouts as they are non deterministic. if we need something like this (not opposed but this has to be clear in the function Info to the user) i'd rather it be a function of the nPoints as input, e.g. nPoints * 10. curious where you got the idea from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, so you'd prefer to keep the condition, but instead of 100 use nPoints * 10?
I do not have much experience with optimization like this, I just thought that in theory, a user running CRDB on a very fast computer could want to handle much more complexity. This solution sounds good, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah prefer the latter if we decide we need this.

pkg/geo/geomfn/generate_points.go Show resolved Hide resolved
@mknycha mknycha force-pushed the implement-st-generate-points branch from f456ea3 to a04b881 Compare January 11, 2021 15:46
@blathers-crl blathers-crl bot requested a review from otan January 11, 2021 15:46
@@ -59,6 +59,9 @@ var geosOnce struct {
once sync.Once
}

// PreparedGeometry is an instance of a GEOS PreparedGeometry.
type PreparedGeometry C.CR_GEOS_PreparedGeometry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, GitHub lost a conversation so I only just got around to this.

I think this should be a pointer. Not sure how well "copy" plays around with cgo, and it seems natural, i.e. type PreparedGeometry *C.CR_GEOS_PreparedGeometry

You mean need to use double pointer CR_GEOS_Prepare, i.e. CR_GEOS_Prepare(CR_GEOS_PreparedGeometry **g, ....).

You can return nil, err below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to edit it that way and I have changed the related functions:

CR_GEOS_Status CR_GEOS_Prepare(CR_GEOS* lib, CR_GEOS_Slice a, CR_GEOS_PreparedGeometry** ret) {
  std::string error;
  auto handle = initHandleWithErrorBuffer(lib, &error);

  auto wkbReader = lib->GEOSWKBReader_create_r(handle);
  auto geom = lib->GEOSWKBReader_read_r(handle, wkbReader, a.data, a.len);
  lib->GEOSWKBReader_destroy_r(handle, wkbReader);
  if (geom != nullptr) {
    auto preparedGeom = lib->GEOSPrepare_r(handle, geom);
    CR_GEOS_PreparedGeometry tmp = {.g = geom, .p = preparedGeom};
    CR_GEOS_PreparedGeometry *tmp2 = &tmp;
    *ret = tmp2;
  }
  lib->GEOS_finish_r(handle);
  return toGEOSString(error.data(), error.length());
}

CR_GEOS_Status CR_GEOS_PreparedGeometryDestroy(CR_GEOS* lib, CR_GEOS_PreparedGeometry* a) {
  std::string error;
  auto handle = initHandleWithErrorBuffer(lib, &error);
  lib->GEOSPreparedGeom_destroy_r(handle, a->p);
  lib->GEOSGeom_destroy_r(handle, a->g);
  lib->GEOS_finish_r(handle);
  return toGEOSString(error.data(), error.length());
}

CR_GEOS_Status CR_GEOS_PreparedIntersects(CR_GEOS* lib, CR_GEOS_PreparedGeometry* a, CR_GEOS_Slice b, char* ret) {
  return CR_GEOS_PreparedBinaryPredicate(lib, lib->GEOSPreparedIntersects_r, a->p, b, ret);
}

I managed to get no compilation errors, but an execution one:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1 pc=0x5fd5861]

runtime stack:
runtime.throw(0x4a2ecae, 0x2a)
	/usr/local/Cellar/go/1.15.5/libexec/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
	/usr/local/Cellar/go/1.15.5/libexec/src/runtime/signal_unix.go:726 +0x48c

goroutine 15 [syscall]:
runtime.cgocall(0x48ae000, 0xc000080928, 0x4010100)
	/usr/local/Cellar/go/1.15.5/libexec/src/runtime/cgocall.go:133 +0x5b fp=0xc0000808e0 sp=0xc0000808a8 pc=0x400875b
github.com/cockroachdb/cockroach/pkg/geo/geos._Cfunc_CR_GEOS_PreparedIntersects(0xd004190, 0x7ffeefbff348, 0xc000013600, 0x15, 0xc0000af1c0, 0x0, 0x0)
	_cgo_gotypes.go:724 +0x4d fp=0xc000080928 sp=0xc0000808e0 pc=0x47acfed
github.com/cockroachdb/cockroach/pkg/geo/geos.PreparedIntersects.func1(0xd004190, 0x7ffeefbff348, 0xc000013600, 0x15, 0x40, 0xc0000af1c0, 0x0, 0x0)
	/Users/marcinknychala/go/src/github.com/cockroachdb/cockroach/pkg/geo/geos/geos.go:766 +0xc5 fp=0xc000080980 sp=0xc000080928 pc=0x47b8c65

Could you help me out, please?
Should I somehow explicitly allocate memory for this CR_GEOS_PreparedGeometry?
I can also change ST_GeneratePoints to work without PreparedGeometry, but I really want to make sense out of it and deliver something optimized :)

Copy link
Contributor

@otan otan Jan 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need to malloc space for tmp beforehand I think:


  if (geom != nullptr) {
    auto preparedGeom = lib->GEOSPrepare_r(handle, geom);
*ret = new CR_GEOS_PreparedGeometry(); // or define initializer on this method
ret->g = geom;
ret -> p = preparedGeom;
}

and to free the pointer in CR_GEOS_PreparedGeometryDestroy.


this does not work because the data is only saved on the stack, and will be destroyed once the function ends, hence giving the seg fault:

    CR_GEOS_PreparedGeometry tmp = {.g = geom, .p = preparedGeom};
    CR_GEOS_PreparedGeometry *tmp2 = &tmp;

not like go, which promotes&tmp to the heap, so you don't have to worry about this kind of thing ;)


I can also change ST_GeneratePoints to work without PreparedGeometry, but I really want to make sense out of it and deliver something optimized :)

Of course! Unfortunately it means more pain, but if you like it I'm only happy to help...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, it's working now!

name string
args args
want geo.Geometry
wantErr bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you separate error cases into another struct like

errorTestCases := []struct {
   desc string
   args string
   errMatch string
}

and catch error messages for error cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, changed

pkg/geo/geomfn/generate_points_test.go Show resolved Hide resolved
@mknycha mknycha force-pushed the implement-st-generate-points branch from a04b881 to 6c6b479 Compare January 13, 2021 12:42
@blathers-crl blathers-crl bot requested a review from otan January 13, 2021 12:42
Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks awesome! thanks for all the hard work you've put into this, and going through the hard work to get it nice :)

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 14, 2021

Build failed:

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

@otan
Copy link
Contributor

otan commented Jan 14, 2021

bors r-

looks like you need to run make bazel-generate. mind doing that?

@craig
Copy link
Contributor

craig bot commented Jan 14, 2021

Canceled.

@mknycha
Copy link
Contributor Author

mknycha commented Jan 14, 2021

looks like you need to run make bazel-generate. mind doing that?

Not at all

@mknycha mknycha force-pushed the implement-st-generate-points branch from 6c6b479 to 40cdd9d Compare January 14, 2021 10:36
@mknycha mknycha requested a review from a team January 14, 2021 10:36
@mknycha mknycha requested review from a team as code owners January 14, 2021 10:36
@mknycha mknycha requested review from miretskiy and removed request for a team January 14, 2021 10:36
Release note (sql change): Implement geo builtin ST_GeneratePoints
@mknycha mknycha force-pushed the implement-st-generate-points branch from 40cdd9d to 1af7a03 Compare January 14, 2021 12:57
@mknycha
Copy link
Contributor Author

mknycha commented Jan 14, 2021

I am sorry @miretskiy, I uploaded too many changes by an accident causing additional reviewers to be assigned. Fixed now, please disregard.

@otan Please note that in the last push I also removed assigning NULL to a deleted pointer a = NULL; that used to be here. I think it will be ok as long as a is not used anymore, besides I believe it should be handled one level abstraction higher.

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 14, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 14, 2021

Build succeeded:

@craig craig bot merged commit 4b79aab into cockroachdb:master Jan 14, 2021
@mknycha mknycha deleted the implement-st-generate-points branch January 15, 2021 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

geo/geomfn: implement ST_GeneratePoints({geometry,int4})
3 participants