docs: document betweenness sampling config by shahar-biron · Pull Request #380 · FalkorDB/docs

shahar-biron · 2026-02-27T10:30:28Z

Documents Betweenness Centrality config options:

- Add samplingSize and samplingSeed to the parameters section (including defaults and behavior).
- Add usage notes around reproducibility and performance/accuracy tradeoffs.
- Add the terms to the docs wordlist for spellcheck.

Co-Authored-By: Oz <oz-agent@warp.dev>

Summary by CodeRabbit

New Features
- Betweenness-centrality accepts optional sampling configuration (samplingSize, samplingSeed) to tune sampling behavior and reproducibility.
Documentation
- Procedure docs and examples updated to show new sampling options, defaults, call syntax, and guidance on accuracy vs. performance.
Chores
- Updated internal word list to include the new sampling tokens.

coderabbitai · 2026-02-27T10:30:48Z

📝 Walkthrough

Adds two new sampling configuration options—samplingSize and samplingSeed—to the betweenness centrality documentation and the repository word list; updates CALL signature, parameter table, yield/notes, and examples. No implementation code changes.

Changes

Cohort / File(s)	Summary
Wordlist `./.wordlist.txt`	Adds two new tokens: `samplingSize`, `samplingSeed`.
Betweenness docs `algorithms/betweenness-centrality.md`	Updates procedure CALL signature and examples to accept an optional configuration map including `samplingSize` and `samplingSeed`; expands parameter table with defaults and descriptions; updates yield/notes to describe sampling effects and reproducibility.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A seed, a size, I nibble the line,
New tokens tucked in the doc's design.
I hop through examples, tidy and neat,
Sampling whispers make the notes complete,
Tiny changes — a rabbit's small rhyme.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: documenting betweenness sampling configuration parameters.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.wordlist.txt (1)
541-542: Reverse the order of these two entries.

The entries are not in alphabetical order. samplingSeed should come before samplingSize (alphabetically 'e' comes before 'i').
📝 Proposed fix
-samplingSize
 samplingSeed
+samplingSize
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.wordlist.txt around lines 541 - 542, The two wordlist entries are out of
alphabetical order: swap the positions of the tokens "samplingSize" and
"samplingSeed" so that "samplingSeed" appears before "samplingSize" in the
.wordlist.txt file; locate the two entries (the strings samplingSize and
samplingSeed) and reverse their order.
algorithms/betweenness-centrality.md (1)
100-102: LGTM! Clear and helpful usage notes.

The usage notes effectively communicate:

- Performance considerations
- The accuracy/performance trade-off
- Reproducibility requirements

Consider adding a second example demonstrating the sampling parameters:
// Example: Using sampling for better performance on large graphs
CALL algo.betweenness({
    'nodeLabels': ['Person'], 
    'relationshipTypes': ['FRIEND'],
    'samplingSize': 16,
    'samplingSeed': 42
})
YIELD node, score
RETURN node.name AS person, score
ORDER BY score DESC
This would help users understand how to apply the sampling configuration in practice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@algorithms/betweenness-centrality.md` around lines 100 - 102, Add a second
usage example showing how to enable sampling for performance: include a short
cypher example that calls algo.betweenness with samplingSize and samplingSeed
alongside nodeLabels and relationshipTypes so readers can see practical usage of
the sampling parameters (use the same parameter names samplingSize and
samplingSeed and demonstrate reproducibility by setting a fixed seed).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@algorithms/betweenness-centrality.md`:
- Around line 39-44: Update the parameter table entry for samplingSeed to show
the actual default value instead of the word "time-based": change the Default
cell to `0` (or `null` if code uses null) and add a parenthetical note such as
"`0` triggers time-based seeding" so readers know the concrete default; ensure
the description for samplingSeed remains consistent with this change (it already
mentions "If omitted (or set to `0`)") and reference the samplingSeed parameter
name so the table and description align.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b21c87 and 2c25bd2.

📒 Files selected for processing (2)

.wordlist.txt
algorithms/betweenness-centrality.md

swilly22 · 2026-02-27T10:57:59Z

algorithms/betweenness-centrality.md

+|---------------------|---------|-------------|---------|
+| `nodeLabels`        | Array   | *(Optional)* List of node labels to include in the computation. | `[]` (all labels) |
+| `relationshipTypes` | Array   | *(Optional)* List of relationship types to traverse. | `[]` (all relationship types) |
+| `samplingSize`      | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. The value is clamped to the number of reachable nodes in the graph. | `32` |


Please rephrase: "The value is clamped to the number of reachable nodes in the graph."
It's a bit unclear: does that mean the maximum value equals the number of nodes in the graph?
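One plausible reading of "clamped", assuming source nodes are drawn without replacement from the nodes eligible under `nodeLabels`: the effective sample size is `min(samplingSize, eligible nodes)`, so the cap is the number of eligible nodes rather than a fixed maximum. The Python helper below is a hypothetical sketch of that reading, not FalkorDB's implementation.

```python
import random


def pick_sources(eligible_nodes, sampling_size, seed):
    """Hypothetical sketch: draw source nodes without replacement.

    If sampling_size exceeds the number of eligible nodes, the sample is
    clamped so that every eligible node is used exactly once.
    """
    nodes = list(eligible_nodes)
    k = max(0, min(sampling_size, len(nodes)))  # clamp to [0, len(nodes)]
    return random.Random(seed).sample(nodes, k)
```

Under this reading, `pick_sources(range(5), 32, 7)` returns all five nodes (in a seed-dependent order), while `pick_sources(range(1000), 32, 7)` returns 32 of them.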

Co-Authored-By: Oz <oz-agent@warp.dev>

coderabbitai

♻️ Duplicate comments (1)

algorithms/betweenness-centrality.md (1)
44-44: ⚠️ Potential issue | 🟡 Minor

Use a concrete default value for samplingSeed in the table.

time-based describes behavior, not the actual default input value. Since the description already states “omitted (or set to 0)”, set the Default cell to 0 (or null, if that is the real default) and keep the time-based behavior in the description.
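The Default cell and the Description describe the same rule from two sides: `0` is the concrete input value, and time-based seeding is the behavior it triggers. A minimal Python sketch of that rule, assuming the seed feeds a standard pseudo-random generator (`resolve_seed` and `sample_sources` are hypothetical names, not FalkorDB functions):

```python
import random
import time


def resolve_seed(sampling_seed=None):
    """Hypothetical sketch of the documented rule: an omitted seed or 0
    falls back to a time-based seed; any other value is used as-is."""
    if sampling_seed is None or sampling_seed == 0:
        return time.time_ns()  # time-based fallback, differs between runs
    return sampling_seed


def sample_sources(nodes, k, sampling_seed=None):
    """Return the seed actually used plus k sampled source nodes."""
    seed = resolve_seed(sampling_seed)
    return seed, random.Random(seed).sample(list(nodes), k)
```

Returning the resolved seed is a design choice of this sketch: it lets a time-seeded run be replayed by passing the same value back. Whether FalkorDB reports the seed it used is not stated in this thread.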
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@algorithms/betweenness-centrality.md` at line 44, Update the table row for
the parameter samplingSeed so the Default column contains a concrete value (`0`)
instead of the phrase "time-based"; keep the explanatory text in the Description
(which already notes "omitted (or set to `0`), a time-based seed is used") so
behavior is preserved while the Default cell shows the actual input value
consumers should use.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c25bd2 and e6d8d78.

📒 Files selected for processing (1)

algorithms/betweenness-centrality.md

swilly22 · 2026-02-27T16:46:24Z

algorithms/betweenness-centrality.md

 | `nodeLabels`        | Array   | *(Optional)* List of node labels to include in the computation. | `[]` (all labels) |
 | `relationshipTypes` | Array   | *(Optional)* List of relationship types to traverse. | `[]` (all relationship types) |
-| `samplingSize`      | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. The value is clamped to the number of reachable nodes in the graph. | `32` |
+| `samplingSize`      | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. For similar accuracy across different graph sizes, keep `samplingSize` proportional to the number of input nodes (i.e., sample a similar fraction of nodes). If `samplingSize` is larger than the number of eligible nodes, all eligible nodes are used. | `32` |


This is too long; there's no need to go into so much detail. Keep it simple.

Co-Authored-By: Oz <oz-agent@warp.dev>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@algorithms/betweenness-centrality.md`:
- Around line 100-102: Clarify that when samplingSize is used the algorithm is
approximate: update the three bullets to state that with the default
samplingSize=32 the betweenness score is an approximation (not exact
all-shortest-paths), and add a note that to obtain exact betweenness you must
disable sampling (i.e., set samplingSize to disable sampling / full mode) so the
implementation computes all shortest paths; also keep the existing guidance
about samplingSeed for reproducibility. Reference samplingSize and samplingSeed
and the “all shortest paths” claim so the wording reconciles both behaviors.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6d8d78 and 981b9b4.

📒 Files selected for processing (1)

algorithms/betweenness-centrality.md

coderabbitai · 2026-03-01T06:33:35Z

algorithms/betweenness-centrality.md

+- Betweenness Centrality can be computationally expensive on large, dense graphs.
+- Use `samplingSize` to trade accuracy for performance (larger samples are slower but usually more accurate).
+- Set `samplingSeed` to a fixed value to make runs reproducible; if you omit it, results may vary between runs due to random sampling.


⚠️ Potential issue | 🟡 Minor

Clarify that sampling produces an approximation by default.

With default samplingSize=32, results are typically approximate, so the usage notes should explicitly qualify this to avoid conflicting with the earlier “all shortest paths” statement (Line 98).

✏️ Suggested wording

-- Betweenness Centrality can be computationally expensive on large, dense graphs.
-- Use `samplingSize` to trade accuracy for performance (larger samples are slower but usually more accurate).
-- Set `samplingSeed` to a fixed value to make runs reproducible; if you omit it, results may vary between runs due to random sampling.
+- Betweenness Centrality can be computationally expensive on large, dense graphs.
+- By default, scores are estimated using sampled source nodes (`samplingSize=32`); increase `samplingSize` for higher fidelity at higher cost.
+- Set `samplingSeed` to a fixed value to make sampled runs reproducible; if omitted (or `0`), a time-based seed is used.
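The distinction between sampled estimates and exact all-shortest-paths scores can be illustrated with Brandes' algorithm restricted to a random subset of source nodes. The Python sketch below is hypothetical and rests on assumptions this thread does not confirm: an unweighted directed graph held in an adjacency dict, sources drawn without replacement from a seeded generator, and partial sums scaled by `n / k`, a common estimator for source-sampled betweenness. When `sampling_size` reaches the node count, every node is a source, the scale factor is 1, and the result matches exact Brandes.

```python
import random
from collections import deque


def sampled_betweenness(adj, sampling_size, sampling_seed):
    """Approximate betweenness with Brandes' algorithm run from a random
    subset of source nodes (hypothetical sketch, not FalkorDB's code).

    adj: dict mapping each node to a list of out-neighbours (unweighted,
    directed; list undirected edges both ways and halve the scores).
    """
    nodes = list(adj)
    k = min(sampling_size, len(nodes))  # clamp to the available nodes
    sources = random.Random(sampling_seed).sample(nodes, k)
    score = dict.fromkeys(nodes, 0.0)

    for s in sources:
        # Forward BFS: count shortest paths (sigma) and record predecessors.
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Backward pass: accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]

    # Scale the partial sums from k sources up to all n sources.
    scale = len(nodes) / k if k else 0.0
    return {v: c * scale for v, c in score.items()}
```

For example, on the directed path `a → b → c`, `sampled_betweenness({"a": ["b"], "b": ["c"], "c": []}, 32, 42)` uses all three nodes as sources and returns `{'a': 0.0, 'b': 1.0, 'c': 0.0}`, the exact scores. With `sampling_size` below the node count, the scores become seed-dependent estimates.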

docs: document betweenness sampling config

2c25bd2

Co-Authored-By: Oz <oz-agent@warp.dev>

shahar-biron requested review from AviAvni and swilly22 February 27, 2026 10:31

coderabbitai bot reviewed Feb 27, 2026

algorithms/betweenness-centrality.md (outdated, resolved)

swilly22 requested changes Feb 27, 2026

docs: clarify samplingSize accuracy note

e6d8d78

Co-Authored-By: Oz <oz-agent@warp.dev>

coderabbitai bot reviewed Feb 27, 2026

swilly22 requested changes Feb 27, 2026

docs: clarify betweenness sampling defaults

981b9b4

Co-Authored-By: Oz <oz-agent@warp.dev>

shahar-biron requested a review from swilly22 March 1, 2026 06:33

coderabbitai bot reviewed Mar 1, 2026

swilly22 approved these changes Mar 1, 2026

swilly22 merged commit 78db3fd into main Mar 1, 2026
4 checks passed

swilly22 deleted the docs-betweenness-sampling-config branch March 1, 2026 09:21

swilly22 merged 3 commits into main from docs-betweenness-sampling-config