Skip to content

docs: document betweenness sampling config#380

Merged
swilly22 merged 3 commits intomainfrom
docs-betweenness-sampling-config
Mar 1, 2026
Merged

docs: document betweenness sampling config#380
swilly22 merged 3 commits intomainfrom
docs-betweenness-sampling-config

Conversation

@shahar-biron
Copy link
Contributor

@shahar-biron shahar-biron commented Feb 27, 2026

Documents Betweenness Centrality config options:

  • Add samplingSize and samplingSeed to the parameters section (including defaults and behavior).
  • Add usage notes around reproducibility and performance/accuracy tradeoffs.
  • Add the terms to the docs wordlist for spellcheck.

Co-Authored-By: Oz oz-agent@warp.dev

Summary by CodeRabbit

  • New Features

    • Betweenness-centrality accepts optional sampling configuration (samplingSize, samplingSeed) to tune sampling behavior and reproducibility.
  • Documentation

    • Procedure docs and examples updated to show new sampling options, defaults, call syntax, and guidance on accuracy vs. performance.
  • Chores

    • Updated internal word list to include the new sampling tokens.

Co-Authored-By: Oz <oz-agent@warp.dev>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 27, 2026

📝 Walkthrough

Walkthrough

Adds two new sampling configuration options—samplingSize and samplingSeed—to the betweenness centrality documentation and the repository word list; updates CALL signature, parameter table, yield/notes, and examples. No implementation code changes.

Changes

Cohort / File(s) Summary
Wordlist
./.wordlist.txt
Adds two new tokens: samplingSize, samplingSeed.
Betweenness docs
algorithms/betweenness-centrality.md
Updates procedure CALL signature and examples to accept an optional configuration map including samplingSize and samplingSeed; expands parameter table with defaults and descriptions; updates yield/notes to describe sampling effects and reproducibility.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A seed, a size, I nibble the line,
New tokens tucked in the doc's design.
I hop through examples, tidy and neat,
Sampling whispers make the notes complete,
Tiny changes — a rabbit's small rhyme.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: documenting betweenness sampling configuration parameters.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch docs-betweenness-sampling-config

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
.wordlist.txt (1)

541-542: Reverse the order of these two entries.

The entries are not in alphabetical order. samplingSeed should come before samplingSize (alphabetically 'e' comes before 'i').

📝 Proposed fix
-samplingSize
 samplingSeed
+samplingSize
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.wordlist.txt around lines 541 - 542, The two wordlist entries are out of
alphabetical order: swap the positions of the tokens "samplingSize" and
"samplingSeed" so that "samplingSeed" appears before "samplingSize" in the
.wordlist.txt file; locate the two entries (the strings samplingSize and
samplingSeed) and reverse their order.
algorithms/betweenness-centrality.md (1)

100-102: LGTM! Clear and helpful usage notes.

The usage notes effectively communicate:

  • Performance considerations
  • The accuracy/performance trade-off
  • Reproducibility requirements

Consider adding a second example demonstrating the sampling parameters:

// Example: Using sampling for better performance on large graphs
CALL algo.betweenness({
    'nodeLabels': ['Person'], 
    'relationshipTypes': ['FRIEND'],
    'samplingSize': 16,
    'samplingSeed': 42
})
YIELD node, score
RETURN node.name AS person, score
ORDER BY score DESC

This would help users understand how to apply the sampling configuration in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@algorithms/betweenness-centrality.md` around lines 100 - 102, Add a second
usage example showing how to enable sampling for performance: include a short
cypher example that calls algo.betweenness with samplingSize and samplingSeed
alongside nodeLabels and relationshipTypes so readers can see practical usage of
the sampling parameters (use the same parameter names samplingSize and
samplingSeed and demonstrate reproducibility by setting a fixed seed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@algorithms/betweenness-centrality.md`:
- Around line 39-44: Update the parameter table entry for samplingSeed to show
the actual default value instead of the word "time-based": change the Default
cell to `0` (or `null` if code uses null) and add a parenthetical note such as
"`0` triggers time-based seeding" so readers know the concrete default; ensure
the description for samplingSeed remains consistent with this change (it already
mentions "If omitted (or set to `0`)") and reference the samplingSeed parameter
name so the table and description align.

---

Nitpick comments:
In @.wordlist.txt:
- Around line 541-542: The two wordlist entries are out of alphabetical order:
swap the positions of the tokens "samplingSize" and "samplingSeed" so that
"samplingSeed" appears before "samplingSize" in the .wordlist.txt file; locate
the two entries (the strings samplingSize and samplingSeed) and reverse their
order.

In `@algorithms/betweenness-centrality.md`:
- Around line 100-102: Add a second usage example showing how to enable sampling
for performance: include a short cypher example that calls algo.betweenness with
samplingSize and samplingSeed alongside nodeLabels and relationshipTypes so
readers can see practical usage of the sampling parameters (use the same
parameter names samplingSize and samplingSeed and demonstrate reproducibility by
setting a fixed seed).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b21c87 and 2c25bd2.

📒 Files selected for processing (2)
  • .wordlist.txt
  • algorithms/betweenness-centrality.md

|---------------------|---------|-------------|---------|
| `nodeLabels` | Array | *(Optional)* List of node labels to include in the computation. | `[]` (all labels) |
| `relationshipTypes` | Array | *(Optional)* List of relationship types to traverse. | `[]` (all relationship types) |
| `samplingSize` | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. The value is clamped to the number of reachable nodes in the graph. | `32` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rephrase: "The value is clamped to the number of reachable nodes in the graph."
It's a bit unclear, does that mean the max value equals to the number of nodes in the graph ?

Co-Authored-By: Oz <oz-agent@warp.dev>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
algorithms/betweenness-centrality.md (1)

44-44: ⚠️ Potential issue | 🟡 Minor

Use a concrete default value for samplingSeed in the table.

time-based describes behavior, not the actual default input value. Since the description already states “omitted (or set to 0)”, set the Default cell to 0 (or null, if that is the real default) and keep the time-based behavior in the description.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@algorithms/betweenness-centrality.md` at line 44, Update the table row for
the parameter samplingSeed so the Default column contains a concrete value (`0`)
instead of the phrase "time-based"; keep the explanatory text in the Description
(which already notes "omitted (or set to `0`), a time-based seed is used") so
behavior is preserved while the Default cell shows the actual input value
consumers should use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@algorithms/betweenness-centrality.md`:
- Line 44: Update the table row for the parameter samplingSeed so the Default
column contains a concrete value (`0`) instead of the phrase "time-based"; keep
the explanatory text in the Description (which already notes "omitted (or set to
`0`), a time-based seed is used") so behavior is preserved while the Default
cell shows the actual input value consumers should use.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c25bd2 and e6d8d78.

📒 Files selected for processing (1)
  • algorithms/betweenness-centrality.md

| `nodeLabels` | Array | *(Optional)* List of node labels to include in the computation. | `[]` (all labels) |
| `relationshipTypes` | Array | *(Optional)* List of relationship types to traverse. | `[]` (all relationship types) |
| `samplingSize` | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. The value is clamped to the number of reachable nodes in the graph. | `32` |
| `samplingSize` | Integer | *(Optional)* Number of randomly sampled **source nodes** used to approximate betweenness centrality. Larger values improve accuracy but increase runtime. For similar accuracy across different graph sizes, keep `samplingSize` proportional to the number of input nodes (i.e., sample a similar fraction of nodes). If `samplingSize` is larger than the number of eligible nodes, all eligible nodes are used. | `32` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too long, no need to go into so much details, keep this simple.

Co-Authored-By: Oz <oz-agent@warp.dev>
@shahar-biron shahar-biron requested a review from swilly22 March 1, 2026 06:33
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@algorithms/betweenness-centrality.md`:
- Around line 100-102: Clarify that when samplingSize is used the algorithm is
approximate: update the three bullets to state that with the default
samplingSize=32 the betweenness score is an approximation (not exact
all-shortest-paths), and add a note that to obtain exact betweenness you must
disable sampling (i.e., set samplingSize to disable sampling / full mode) so the
implementation computes all shortest paths; also keep the existing guidance
about samplingSeed for reproducibility. Reference samplingSize and samplingSeed
and the “all shortest paths” claim so the wording reconciles both behaviors.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6d8d78 and 981b9b4.

📒 Files selected for processing (1)
  • algorithms/betweenness-centrality.md

Comment on lines +100 to +102
- Betweenness Centrality can be computationally expensive on large, dense graphs.
- Use `samplingSize` to trade accuracy for performance (larger samples are slower but usually more accurate).
- Set `samplingSeed` to a fixed value to make runs reproducible; if you omit it, results may vary between runs due to random sampling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify that sampling produces an approximation by default.

With default samplingSize=32, results are typically approximate, so the usage notes should explicitly qualify this to avoid conflicting with the earlier “all shortest paths” statement (Line 98).

✏️ Suggested wording
-- Betweenness Centrality can be computationally expensive on large, dense graphs.
-- Use `samplingSize` to trade accuracy for performance (larger samples are slower but usually more accurate).
-- Set `samplingSeed` to a fixed value to make runs reproducible; if you omit it, results may vary between runs due to random sampling.
+- Betweenness Centrality can be computationally expensive on large, dense graphs.
+- By default, scores are estimated using sampled source nodes (`samplingSize=32`); increase `samplingSize` for higher fidelity at higher cost.
+- Set `samplingSeed` to a fixed value to make sampled runs reproducible; if omitted (or `0`), a time-based seed is used.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@algorithms/betweenness-centrality.md` around lines 100 - 102, Clarify that
when samplingSize is used the algorithm is approximate: update the three bullets
to state that with the default samplingSize=32 the betweenness score is an
approximation (not exact all-shortest-paths), and add a note that to obtain
exact betweenness you must disable sampling (i.e., set samplingSize to disable
sampling / full mode) so the implementation computes all shortest paths; also
keep the existing guidance about samplingSeed for reproducibility. Reference
samplingSize and samplingSeed and the “all shortest paths” claim so the wording
reconciles both behaviors.

@swilly22 swilly22 merged commit 78db3fd into main Mar 1, 2026
4 checks passed
@swilly22 swilly22 deleted the docs-betweenness-sampling-config branch March 1, 2026 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants