Skip to content

Complex Geospatial Query Excessive Time #7190

@tmichaud314

Description

@tmichaud314

Hello,

I have been testing the same ST_Contains(<complex WKT>, my_st_point) transformation function on a single machine (i.e., 8 core laptop with 32GB memory and SSD) with varying table size on a basic cluster of 1 broker/controller/server container. This "kick the tires" experiment shows these rough results with the PQL submitted using the Pinot "Query Console" UI.

Table Size (records) Query Response Time
300 10ms
1K 8.5s
10K 85s
100K 80s
1M 82s
10M 84s

The query with much of the WKT omitted:

select * from "test" where st_contains(st_geomfromtext("MULTIPOLYGON (((-107.888838605 41.0180204540003,-107.888889102 41.014255698,-107.888937695 41.0106329080003,-107.879372778 41.0106025780002,-107.879471012 41.0033569920004,-107.879532005 41.0019229880002,-107.886612003 41.0019229880002,-107.889095926 41.0019260450003,-107.898804012 41.001937991,-107.898712006 41.0034180120001,-107.89853021 41.010663326,-107.898439312 41.0142859800001,-107.893671822 41.0142708630003,-107.893621903 41.018012684,-107.888838605 41.0180204540003))
,((-107.716352303276 41.0103825046501,-107.716384379463 41.0178298208872,-107.716384385044 41.0178683989511,-107.716384903785 41.0214527614842,-107.716384907739 41.0214800849727,-107.716434979282 41.0214799794529,-107.71642 41.01785,-107.71644 41.01419,-107.71646 41.01053,-107.72601 41.0105,-107.725951222677 41.0141697813734,-107.726046524359 41.010343653198,-107.716352303276 41.0103825046501,-107.716336739313 41.006759586392,-107.721119085977 41.0067409198176,-107.721103342127 41.0031367730833,-107.721103260149 41.0031180002171,-107.726227 41.003098,-107.726089 41.002134,-107.735016 41.002106,-107.744965 41.00209,-107.75386 41.002014,-107.754567531742 41.0020150243201,-107.754567531701 41.0031125077887,-107.764236 41.003082,-107.764186305586 41.0067049235521,-107.764235740826 41.006704971405,-107.773751023988 41.0067129977712,-107.773799937838 41.0067130392442,-107.783203258294 41.0067209711681,-107.78323724287 41.006720995628,-107.783271484865 41.0103438986861,-107.783271488706 41.0103438986892,-107.783271488852 41.0103438986893,-107.788054130892 41.0103438986892,-107.788088635321 41.0139668446042,-107.792837035325 41.0139668446042,-107.792871541576 41.0139668403091,-107.792906342492 41.0175933869572,-107.792906346898 41.0176543356283,-107.78812317235 41.0176236678071,-107.788123177873 41.0177000134088,-107.79049 41.01771,-107.79286 41.01772,-107.792906356429 41.0177201953907,-107.792906346898 41.0176543356283,-107.802368 41.017715,-107.80233465258 41.0222567938367,-107.802325423139 41.0249610203361,-107.802366952339 41.0249606317072,-107.811882536138 41.0248699982772,-107.821499976649 41.0247783946893,-107.82158005355 41.0247787066638,-107.831153965817 41.0248160060118,-107.83124001457 41.0248163412523,-107.840913046047 41.024854862997,-107.841081266021 41.0248555046931,-107.850648608141 41.0248927920441,-107.850723990499 41.032028385696,-107.850720631233 41.0322115398222,-107.850419408319 41.0394545118551,-107.850359645336 41.0394533651647,-107.85035 41.03971,-107.85021 41.04343,-107.85007 41.04716,-107.84046 41.04716,-107.840465610547 41.0470281521572,-107.831057975786 41.0470503830851,-107.826272489108 41.0470616915401,-107.821487 41.047073,-107.816633174369 41.0470462167871,-107.811847776745 41.0470198111573,-107.807062383101 41.0469934055503,-107.802277 41.046967,-107.802200780865 41.046965525206,-107.8022 41.04708,-107.78329 41.0467,-107.7833 41.04308,-107.76412 41.043,-107.74499 41.04314,-107.73548 41.04324,-107.73549 41.03959,-107.726 41.03968,-107.726000233382 41.0395945822185,-107.725937524623 41.0395951398342,-107.725967762302 41.035972071063,-107.725997907355 41.0323601007092,-107.71647 41.03239,-107.71169 41.0324,-107.71169 41.02876,-107.711680000035 41.0251200186679,-107.711601716503 41.0251201831678,-107.711602238542 41.032333099295,-107.706818 41.032333,-107.706818 41.025113877756,-107.70693002276 41.0251134307264,-107.70694 41.01785,-107.711601190522 41.01785,-107.711569771936 41.0104199376802,-107.711569692699 41.01040117122,-107.716352303276 41.0103825046501),(-107.73536 41.01778,-107.735360710429 41.0177149957248,-107.733035499832 41.0177338437896,-107.733035684596 41.0177532543502,-107.733036118699 41.0177996939266,-107.73536 41.01778,-107.7354 41.02141,-107.735439124196 41.0249605208045,-107.735496236022 41.0249600578882,-107.735427741691 41.0177923919791,-107.735427619439 41.0177795720291,-107.73536 41.01778),(-107.74483 41.01772,-107.754302936927 41.0176705587843,-107.75441 41.01767,-107.754421873654 41.0193893050818,-107.754422319046 41.0194537978134,-107.756711735459 41.0194535424082,-107.756710979 41.019374837,-107.756693714 41.0175785620001,-107.756693568001 41.0175634211329,-107.754302056033 41.0175785368961,-107.749519442183 41.0176087658301,-107.744830447107 41.0176384030389,-107.74483 41.01772,-107.744736769464 41.0177205913142,-107.744736767554 41.0177205913264,-107.744804506 41.0249025060002,-107.744821633 41.0267183825002,-107.74245804714 41.0267380463908,-107.742441762687 41.0249220620627,-107.742441598615 41.0249037652974,-107.740279754227 41.024921287179,-107.74028012832 41.0249601532001,-107.740297194947 41.0267326835592,-107.740314635771 41.028544077821,-107.740314905465 41.0285720881768,-107.744838750085 41.0285342590831,-107.744838496824 41.0285074041022,-107.744948974833 41.0285067098399,-107.74493 41.02678,-107.74491 41.02497,-107.74483 41.01772),(-107.759137508 41.022982551,-107.759154985 41.0247939660001,-107.761546757 41.0247788510001,-107.761623346706 41.0247783669648,-107.761606138562 41.0230507839221,-107.759203458625 41.0230559047191,-107.75918565325 41.0212545865627,-107.761588220125 41.0212518921564,-107.763990787 41.0212491977501,-107.764017368868 41.0238975301362,-107.764009752022 41.0211402319999,-107.764009752 41.0211402320001,-107.761511673 41.0211560200001,-107.759120032 41.0211711360001,-107.759137508 41.022982551),(-107.75919012756 41.0284167910028,-107.759190938978 41.028500434346,-107.759190939003 41.0285004343459,-107.759190939 41.0285004340001,-107.759190128 41.0284167910001,-107.75919012756 41.0284167910028),(-107.77368 41.0176449819309,-107.777813703862 41.0176599204581,-107.773604134696 41.0175359320092,-107.773603880341 41.0175546648492,-107.768820943628 41.017535832438,-107.768821199286 41.0175171012671,-107.768845088124 41.0157662762011,-107.768761516443 41.0157662054667,-107.768737179347 41.0176271193885,-107.77368 41.0176449819309),(-107.777813703877 41.017659920458,-107.783340005109 41.0176798916034,-107.78334 41.017593,-107.783339971649 41.0175900003588,-107.777813703877 41.017659920458),(-107.74485559241 41.0303189018951,-107.744873013886 41.0321659999003,-107.744989394083 41.0321651645489,-107.74497 41.03041,-107.744968985473 41.0303181853234,-107.74485559241 41.0303189018951),(-107.81182935217 41.032098939299,-107.811829586023 41.0322884053301,-107.81183829932 41.039344712499,-107.821542874068 41.0391529814501,-107.821568441546 41.03552885118,-107.821593612505 41.0319609265038,-107.82148 41.03196,-107.821480143951 41.0319082496773,-107.81182935217 41.032098939299),(-107.740349931308 41.0322098527985,-107.744863289912 41.032166094143,-107.740349877078 41.0322042204942,-107.740349931308 41.0322098527985),(-107.744820022997 41.010389015655,-107.735611540695 41.0103667791125,-107.735609217379 41.0104592829774,-107.74016 41.01044,-107.744819546529 41.0104301280795,-107.744820022997 41.010389015655),(-107.77370156287 41.0103573398623,-107.778452947844 41.0103399327397,-107.773701853509 41.0103359253012,-107.77370156287 41.0103573398623)), ... )))"),
location_st_point) = 1

The MULTIPOLYGON WKT contains 374 polygons of varying size. Some contain holes.

Config

The test table config:

{
  "tableName": "test",
  "tableType": "OFFLINE",
  "isDimTable": false,
  "segmentsConfig": {
    "replication": 1,
    "timeColumnName": "CreateDateInEpoch",
    "timeType": "MILLISECONDS",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": 365
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "fieldConfigList": [{
    "name": "location_st_point",
    "encodingType":"RAW",
    "indexType":"H3",
    "properties": {
      "resolutions": "5"
    }
  }],
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "noDictionaryColumns": [
      "location_st_point"
    ],
    "startTreeIndexConfigs": [{
      "dimensionsSplitOrder": [
	"ID"
      ],
      "skipStarNodeCreationForDimensions": [
      ],
      "functionColumnPairs": [
        "SUM__Area",
	"MIN__Area",
	"MAX__Area"
      ],
      "maxLeafRecords": 1
    }]
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    },
    "transformConfigs": [{
      "columnName": "CreateDateInEpoch",
      "transformFunction": "fromEpochHours(CreateDate)"
    }]
  },
  "metadata": {}
}

The test schema:

{
  "schemaName": "test",
  "dimensionFieldSpecs": [
    {
      "name": "ID",
      "dataType": "STRING"
    },
    {
      "name": "latitude",
      "dataType": "DOUBLE"
    },
    {
      "name": "longitude",
      "dataType": "DOUBLE"
    },
    {
      "name": "location_st_point",
      "dataType": "BYTES",
      "transformFunction": "stPoint(longitude,latitude)"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "Area",
      "dataType": "DOUBLE"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "CreateDateInEpoch",
    "dataType": "LONG",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}

Memory/GC JVM flags

  • Server - -Xms4G -Xmx4G -XX:+UseG1GC
  • Broker - -Xms4G -Xmx16G -XX:+UseG1GC
  • Controller - -Xms1G -Xmx4G -XX:+UseG1GC

Questions
The results were surprising in the following ways:

  1. The 1000 record table took 85 seconds to query where the slightly smaller 300 record table took sub-second time.
  2. The query returned in 80-85 seconds irrespective of table size.

Are these results consistent with the experience of others here?

Do the developers or users here with large clusters supporting geospatial queries expect Pinot's (org.locationtech) ST_Contains() function to be able to handle a MULTIPOLYGON geometry consisting of 100s of polygons?

I'm wondering if Pinot is the right or wrong tool for this job?

Thanks in advance.

Tom

Metadata

Metadata

Assignees

No one assigned

    Labels

    geoRelated to geospatial features

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions