Skip to content

feat: change underlying geospatial index to RBush#1

Merged
sanDer153 merged 5 commits into
mainfrom
sander/change-underlying-index-to-rbush
Aug 17, 2025
Merged

feat: change underlying geospatial index to RBush#1
sanDer153 merged 5 commits into
mainfrom
sander/change-underlying-index-to-rbush

Conversation

@sanDer153
Copy link
Copy Markdown
Member

This PR changes the underlying geospatial index of clustering library from a KD-Tree to an R-Tree. This is done as a first step towards making the cluster mutable after it is built.

Unfortunately, using the new index comes with a cost as well. These are the results form the benchmark that is included in the project:

  • Before:
prepare 1000000 points: 471.137ms
z6: 230371 clusters in 416ms
z5: 64804 clusters in 150ms
z4: 16648 clusters in 49ms
z3: 4212 clusters in 7ms
z2: 1074 clusters in 2ms
z1: 276 clusters in 0ms
z0: 72 clusters in 0ms
total time: 1.110s
memory used: 66452 KB 
  • After:
prepare 1000000 points: 1.317s
z6: 230065 clusters in 1584ms
z5: 64701 clusters in 312ms
z4: 16633 clusters in 58ms
z3: 4185 clusters in 11ms
z2: 1057 clusters in 3ms
z1: 267 clusters in 1ms
z0: 73 clusters in 1ms
total time: 3.294s
memory used: 205868 KB

Copy link
Copy Markdown

@Max-Verbinnen Max-Verbinnen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, lgtm! feel free to merge after addressing my small remarks

Comment thread README.md Outdated
| radius | 40 | Cluster radius, in pixels. |
| extent | 512 | (Tiles) Tile extent. Radius is calculated relative to this value. |
| nodeSize | 64 | Size of the KD-tree leaf node. Affects performance. |
| nodeSize | 9 | Size of the R-tree nodes. Affects performance. |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 9 as default here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 is the default nodeSize for the RBush library as well. Different numbers result in either higher query time or build time, and I guess this number strikes a balance.

Comment thread package.json Outdated
"name": "mutable-supercluster",
"version": "1.0.0",
"description": "A library for fast and mutable geospatial point clustering.",
"main": "dist/supercluster.js",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's maybe use mutable-supercluster.js everywhere in this file instead to avoid confusion

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Comment thread index.js
map: props => props, // props => ({sum: props.my_value})

// a function that maps a point to its Id, which should be numerical
getId: point => point.geometry.coordinates[0]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also mention getId in the readme

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot to do this as this option was only added as preparation for the new features.

Comment thread index.js
if (log) console.time(timerId);

this.points = points;
this.points.sort((a, b) => this.getId(a) - this.getId(b));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have to sort here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the remove and update features we need a way to quickly access points by their id. However, I came to the conclusion that sorting also won't really work, so something has to be found to index over the point id.

Comment thread index.js
_rbushWithin(ax, ay, zoom, radius) {
const result = [];
const r2 = radius * radius;
const pointsInSquare = this.trees[zoom].search({minX: ax - radius, minY: ay - radius, maxX: ax + radius, maxY: ay + radius});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice trick :)

Comment thread index.js Outdated
const pointsInSquare = this.trees[zoom].search({minX: ax - radius, minY: ay - radius, maxX: ax + radius, maxY: ay + radius});
for (const [bx, by, idx] of pointsInSquare) {
if (sqDist(ax, ay, bx, by) <= r2) {
result.push(idx);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo it would be more readable to do a filter on pointsInSquare here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@sanDer153 sanDer153 merged commit 142fb0b into main Aug 17, 2025
2 checks passed
@sanDer153 sanDer153 deleted the sander/change-underlying-index-to-rbush branch August 17, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants