
fix: resolve CodeQL Java code-scanning alerts at their true sources #4383

Merged
robfrank merged 4 commits into main from fix/codeql-java-scanning-alerts
May 28, 2026


@robfrank
Collaborator

@robfrank robfrank commented May 28, 2026

Summary

Resolves the 30 open CodeQL (language:java) code-scanning alerts on main. Triaging the SARIF code-flows showed the reported sink locations collapse to four root causes - the alert path:line is the sink, while the real source lives upstream in the flow.

| Rule | Alerts | Root cause | Fix |
|---|---|---|---|
| java/insecure-randomness | 24 | RandFunction.execute() returned Math.random(); CodeQL follows the SQL rand() value through config-string resolution into 24 security sinks (BoltSslHelper, SocketFactory, ServerSecurity, ArcadeDBServer, HttpServer, etc.) | ThreadLocal<SecureRandom> - cryptographically strong source without cross-thread lock contention on the query hot path |
| java/tainted-format-string | 4 | BinarySerializer concatenated an untrusted value into a log message | Pass value as a %s argument instead of concatenating into the template |
| java/path-injection | 1 | Untrusted Neo4j import labels became schema type names and on-disk bucket file names | validateLabel() at the import boundary rejects /, \, NUL and .. |
| java/implicit-cast-in-compound-assignment | 1 | int += long narrowing in a test | Widen totalRecords to long |

The path-injection fix is at the import trust boundary rather than the file-open sink because at the sink the trusted directory and untrusted name are a single indistinguishable string, and ArcadeDB database paths can legitimately be relative (contain ..), so a sink-side .. rejection would break valid usage.
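A minimal sketch of an import-boundary guard that follows the rule in the table (reject /, \, NUL and ..). The class name LabelGuard, the exception type, the message text and the "_" join for multi-label type names are assumptions for illustration; this is not the actual Neo4jImporter.validateLabel() code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for Neo4jImporter.validateLabel(); names and message text are illustrative.
final class LabelGuard {

  // Rejects labels that could become path-traversal sequences in on-disk bucket file names.
  static String validateLabel(final String label) {
    if (label == null)
      return null; // null passes through so the caller can skip the record
    if (label.indexOf('/') >= 0 || label.indexOf('\\') >= 0 || label.indexOf('\0') >= 0 || label.contains(".."))
      throw new IllegalArgumentException("Invalid label '" + label + "': path separators, NUL and '..' are not allowed");
    return label;
  }

  // Multi-label nodes: every element passes through the same guard (the "_" join is an assumption).
  static String typeNameFromLabels(final List<String> labels) {
    return labels.stream().map(LabelGuard::validateLabel).collect(Collectors.joining("_"));
  }

  public static void main(final String[] args) {
    System.out.println(validateLabel("Person"));                        // Person
    System.out.println(typeNameFromLabels(List.of("Person", "Actor"))); // Person_Actor
    try {
      validateLabel("../../etc/evil");
    } catch (final IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Checking at this boundary means the decision is made while the label is still a separate, untrusted value, before it is concatenated with the trusted database directory.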

Test plan

  • engine and integration modules compile
  • CountStepTest, BinarySerializerTest, JavaBinarySerializerTest, JsonSerializerTest
  • Cypher math tests exercising rand() (OpenCypherMathNumericFunctionsComprehensiveTest, CypherFunctionFactoryExtendedTest, FunctionCachingTest)
  • Neo4jImporterIT - 8 tests incl. 2 new regressions (rejectNodeLabelWithPathTraversal, rejectEdgeLabelWithPathTraversal)
  • CodeQL re-scan on the PR confirms the alerts clear

Performance (rand() SecureRandom)

rand() now uses a thread-local SecureRandom. Measured per-call cost (RandFunctionBenchmark, @Tag("benchmark")):

| Source | ns/op |
|---|---|
| RandFunction (ThreadLocal) | ~95 |
| ThreadLocalRandom.nextDouble() | ~2.6 |
| Math.random() | ~9.6 |

Verdict: within budget. RandFunction has no internal engine callers; it backs only the user-facing rand() (Cypher + SQL). The ~95 ns/op is incurred only when a user writes rand(), and it sits on top of per-row record fetch + binary deserialization (hundreds of ns to several µs/row), which dominates by 1-2 orders of magnitude. In the worst realistic case, ORDER BY rand() over 1M rows adds ~92 ms compared with ThreadLocalRandom, behind a sort + materialization that already costs far more. The benchmark stays so the tradeoff can be re-checked if a heavy random-sampling workload ever justifies revisiting.

Triaged 30 open CodeQL alerts on main via the SARIF code-flows: the
reported sink locations collapse to four root causes.

- insecure-randomness (24 alerts): RandFunction returned Math.random(),
  which CodeQL follows through config-string resolution into security
  sinks. Switched to a per-thread SecureRandom (secure source, no
  cross-thread lock contention on the query hot path).
- tainted-format-string (4 alerts): BinarySerializer concatenated an
  untrusted value into a log message; pass it as a %s argument instead.
- path-injection (1 alert): untrusted Neo4j import labels became schema
  type names and on-disk bucket file names. Validate labels at the
  import boundary, rejecting path separators and parent traversal.
- implicit-cast-in-compound-assignment (1 alert): widen totalRecords to
  long in CountStepTest.

Added Neo4jImporterIT regression tests for malicious node/edge labels.
@codacy-production

codacy-production Bot commented May 28, 2026

Not up to standards ⛔

🔴 Issues 1 minor

Alerts:
⚠ 1 issue (≤ 0 issues of at least minor severity)

Results:
1 new issue

| Category | Results |
|---|---|
| CodeStyle | 1 minor |

View in Codacy

🟢 Metrics 85 complexity

| Metric | Results |
|---|---|
| Complexity | 85 |

View in Codacy

🟢 Coverage 90.91% diff coverage

| Metric | Results |
|---|---|
| Coverage variation | Report missing for 198d905 (note 1) |
| Diff coverage | 90.91% diff coverage |

View coverage diff in Codacy

Coverage variation details
| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor commit (198d905) | Report Missing | Report Missing | Report Missing |
| Head commit (e57af51) | 159513 | 105296 | 66.01% |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
| Scope | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#4383) | 11 | 10 | 90.91% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

1 Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces path traversal validation for Neo4j importer labels to prevent writing files outside the database directory, adds corresponding integration tests, fixes a logging format string in BinarySerializer, and updates a test variable type from int to long. Additionally, it replaces Math.random() with a thread-local SecureRandom in RandFunction. The reviewer raised a performance concern regarding the use of SecureRandom for the SQL rand() function, recommending ThreadLocalRandom instead to avoid unnecessary overhead on the query hot path.

Comment on lines +30 to 41
+  // Per-thread SecureRandom: cryptographically strong source without cross-thread lock contention on the query hot path.
+  private static final ThreadLocal<SecureRandom> RANDOM = ThreadLocal.withInitial(SecureRandom::new);
 
   @Override
   public String getName() {
     return "rand";
   }
 
   @Override
   public Object execute(final Object[] args, final CommandContext context) {
-    return Math.random();
+    return RANDOM.get().nextDouble();
   }
Contributor


medium

Performance & Architectural Concern: Use of SecureRandom for SQL rand()

While using SecureRandom resolves the CodeQL java/insecure-randomness alert, doing so introduces a significant performance bottleneck on the query hot path:

  1. Performance Overhead: SecureRandom is cryptographically secure but orders of magnitude slower than ThreadLocalRandom or Math.random(). SQL rand() is commonly used in queries for non-cryptographic purposes (e.g., ORDER BY rand(), random sampling, or generating mock data) where performance is critical.
  2. Seeding Cost: ThreadLocal.withInitial(SecureRandom::new) seeds a new SecureRandom instance for every thread that executes a query containing rand(), which adds substantial latency to the first query execution on each thread.
  3. Architectural Misalignment: A SQL rand() function returns a double (0.0 to 1.0), which is inherently unsuitable for generating high-entropy cryptographic keys or tokens. If security-sensitive sinks (like SSL helpers or socket factories) are accepting random values from SQL queries or configuration strings, the vulnerability should be addressed at those sinks (by using SecureRandom directly there) rather than forcing all SQL queries to pay a performance penalty.

Recommendation:
Consider reverting to ThreadLocalRandom.current().nextDouble() for the SQL rand() function to maintain high query performance, and handle the CodeQL alert either by dismissing it as a false positive (since SQL rand() is not a security-sensitive source) or by securing the actual sinks.

Suggested change
-  // Per-thread SecureRandom: cryptographically strong source without cross-thread lock contention on the query hot path.
-  private static final ThreadLocal<SecureRandom> RANDOM = ThreadLocal.withInitial(SecureRandom::new);
-
   @Override
   public String getName() {
     return "rand";
   }
 
   @Override
   public Object execute(final Object[] args, final CommandContext context) {
-    return RANDOM.get().nextDouble();
+    return java.util.concurrent.ThreadLocalRandom.current().nextDouble();
   }

Collaborator Author


Keeping SecureRandom. Reverting to ThreadLocalRandom would re-open all 24 java/insecure-randomness alerts (CodeQL treats ThreadLocalRandom, a java.util.Random subclass, identically to Math.random()), so it trades the fix for a manual FP-dismissal rather than solving it - and the goal of this PR is to resolve the alerts in code.

The flows into the SSL/socket sinks are spurious (CodeQL routes the SQL rand() value through ContextConfiguration.toString()); there is no real sink receiving rand() output to harden, so "secure the sinks" has nothing to target.

The performance point is fair, so I added RandFunctionBenchmark (8d7c22a, @Tag("benchmark"), excluded from regular CI) to quantify it. Measured per-call cost on this machine:

| Source | ns/op |
|---|---|
| RandFunction (ThreadLocal) | 104.2 |
| ThreadLocalRandom.nextDouble() | 2.6 |
| Math.random() | 9.6 |

So ORDER BY rand() over 1M rows is ~104 ms vs ~3 ms: measurable but bounded, and now tracked if it ever needs revisiting.

@claude

claude Bot commented May 28, 2026

Code Review

Well-structured security fix PR - the root-cause table in the description is a good read and the approach of patching the source rather than the sinks is correct. Some targeted observations below.


RandFunction.java - ThreadLocal<SecureRandom>

Correct approach for silencing CodeQL's taint-tracking from rand() into crypto sinks. The ThreadLocal pattern avoids the synchronization overhead that a shared static SecureRandom would introduce.

Minor concern - inline comment: Per project style (CLAUDE.md), comments should only be added when the WHY is non-obvious. The comment on the ThreadLocal field is reasonable for explaining the concurrency decision, but it could be tightened:

// ThreadLocal avoids the per-call synchronization of a shared SecureRandom instance.
private static final ThreadLocal<SecureRandom> RANDOM = ThreadLocal.withInitial(SecureRandom::new);

Performance note (informational): SecureRandom.nextDouble() is measurably slower than Math.random() / ThreadLocalRandom. For a pure math function like rand() this is usually fine, but worth benchmarking if this function shows up in hot query paths. If performance becomes an issue, SplittableRandom can be an alternative that still avoids shared state while being faster (though it is not CSPRNG - whether that matters for a SQL rand() is a design call).


BinarySerializer.java - log format fix

Clean fix. Passing value as a %s argument correctly separates the format template from user-controlled data.
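To illustrate why concatenation is the problem, here is a small example that uses String.format as a stand-in for a printf-style logger (ArcadeDB's LogManager.log() signature is not reproduced here, and the message text is made up). A value containing a conversion specifier such as %s breaks the concatenated template, but is printed verbatim when passed as an argument.

```java
import java.util.MissingFormatArgumentException;

final class FormatStringDemo {

  // Unsafe: the untrusted value is spliced into the template, so any '%' in it is parsed as a specifier.
  static String concatenated(final String untrusted) {
    return String.format("Cannot deserialize value '" + untrusted + "'");
  }

  // Safe: the template is a constant and the untrusted value is only ever an argument.
  static String parameterized(final String untrusted) {
    return String.format("Cannot deserialize value '%s'", untrusted);
  }

  public static void main(final String[] args) {
    final String untrusted = "abc%s";
    System.out.println(parameterized(untrusted)); // Cannot deserialize value 'abc%s'
    try {
      System.out.println(concatenated(untrusted));
    } catch (final MissingFormatArgumentException e) {
      System.out.println("concatenated template failed: " + e.getMessage());
    }
  }
}
```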


CountStepTest.java - int to long

Correct. The count value is a long; accumulating it into an int could silently overflow for large row counts.
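The compound assignment hides the narrowing: total += c compiles as total = (int) (total + c), whereas the equivalent total = total + c would be a compile error. A minimal demonstration with illustrative method names; 3,000,000,000 is just an example value above Integer.MAX_VALUE.

```java
final class CompoundNarrowingDemo {

  static int accumulateInt(final long... counts) {
    int total = 0;
    for (final long c : counts)
      total += c; // implicit (int) cast: compiles, but silently truncates
    return total;
  }

  static long accumulateLong(final long... counts) {
    long total = 0;
    for (final long c : counts)
      total += c; // no narrowing: the accumulator matches the type of the added value
    return total;
  }

  public static void main(final String[] args) {
    System.out.println(accumulateInt(3_000_000_000L));  // -1294967296
    System.out.println(accumulateLong(3_000_000_000L)); // 3000000000
  }
}
```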


Neo4jImporter.java - validateLabel

Coverage looks complete. Traced all three code paths:

  • syncSchema() - node labels via typeNameFromLabels() (validated), edge label direct via validateLabel() (validated)
  • parseVertices() - calls typeNameFromLabels() (validated)
  • parseEdges() - calls validateLabel() directly (validated)

Comment style: The Javadoc block on validateLabel is multi-line, which goes against CLAUDE.md's "don't write multi-paragraph docstrings or multi-line comment blocks" guidance. Since this is a private static helper, a single line is sufficient:

// Rejects labels that could become path-traversal sequences in on-disk bucket file names.
private static String validateLabel(final String label) {

.. check breadth: label.contains("..") is conservative - it also rejects labels with two consecutive dots that are not path-traversal sequences (e.g. a hypothetical version label v1..2). That trade-off is acceptable for an import boundary, but worth documenting in a comment or error message if users ever report confusion.

Windows reserved names: The current validation blocks /, \, \0 and ".." (double dot). On Windows, reserved device names (CON, NUL, COM1, LPT1, etc.) and trailing spaces or dots can also cause unexpected file-system behavior. This is a corner case but worth a follow-up issue if Windows deployments are a priority.
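If that follow-up is ever picked up, a check could look like the sketch below. It is hypothetical (not part of this PR) and covers only the classic device names CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, matched case-insensitively and with or without an extension, plus trailing spaces or dots, which Windows strips from file names.

```java
import java.util.regex.Pattern;

// Hypothetical follow-up check for Windows-hostile labels; not part of this PR.
final class WindowsNameCheck {

  // Device names stay reserved even when followed by an extension (e.g. "nul.txt").
  private static final Pattern RESERVED = Pattern.compile("(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\\..*)?",
      Pattern.CASE_INSENSITIVE);

  static boolean isWindowsHostile(final String label) {
    if (label.isEmpty())
      return true;
    final char last = label.charAt(label.length() - 1);
    if (last == ' ' || last == '.')
      return true; // Windows drops trailing spaces and dots, so two labels could map to one file
    return RESERVED.matcher(label).matches();
  }

  public static void main(final String[] args) {
    for (final String label : new String[] { "CON", "nul.txt", "Person.", "Console", "COM10", "Person" })
      System.out.println(label + " -> " + isWindowsHostile(label));
  }
}
```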


Tests

The two new integration tests (rejectNodeLabelWithPathTraversal, rejectEdgeLabelWithPathTraversal) follow the project's AssertJ style and cover both node and edge injection vectors. The finally blocks for cleanup are correct.

One small nit: the test string for the edge case uses Java string literal \\\\ to produce \\ in the JSON, which yields a Windows-style traversal (..\..\evil). This is a good complement to the Unix-style test, but the path is only tested via the label field. A multi-label node with a Windows-style traversal in the labels array is not covered - minor gap.


Overall

The PR does what it says. The four root causes are correctly identified and fixed at the right layer. No blocking issues - the nits above are all style/comment-level or minor follow-up candidates.

Address PR review:
- RandFunctionBenchmark compares the rand() SecureRandom source against
  ThreadLocalRandom and Math.random() so the security/perf tradeoff is
  measurable (tagged benchmark, excluded from regular CI).
- Collapse the validateLabel Javadoc block to a single-line comment per
  the project comment-style guidance.
@robfrank
Collaborator Author

Addressing the review points (claude[bot] + gemini):

Done

  • validateLabel Javadoc collapsed to a single-line comment (CLAUDE.md: no multi-line comment blocks). 8d7c22a2e
  • Added RandFunctionBenchmark to measure the rand() SecureRandom vs ThreadLocalRandom/Math.random tradeoff; see the numbers in the inline thread. 8d7c22a2e

rand() RNG - keeping SecureRandom (details in the inline thread): reverting re-opens all 24 alerts since CodeQL flags ThreadLocalRandom the same as Math.random(), and the flagged sinks are spurious flows with no real rand() consumer to harden.

Declining / deferring (out of scope for this CodeQL PR):

  • Windows reserved names (CON, NUL, COM1...): not a path-traversal vector - they don't escape the database directory, so they're outside this PR's scope. Reasonable follow-up issue if Windows deployments need it.
  • label.contains("..") breadth (rejects e.g. v1..2): intentional conservative choice at an import trust boundary; the exception message already states the rule. A label containing .. is pathological for a Neo4j export.
  • Multi-label Windows-traversal test: the multi-label path routes each element through the same validateLabel (.map(Neo4jImporter::validateLabel)); node (Unix) + edge (Windows) cases already exercise both separators through that one function, so the gap is a duplicate of covered code.

@claude

claude Bot commented May 28, 2026

Code Review

Good targeted security fix. The four root causes are clearly diagnosed and the fixes land at the right abstraction layer. A few notes below.


RandFunction.java - SecureRandom change

Good. ThreadLocal<SecureRandom> is the correct pattern: it avoids the synchronization penalty of a shared SecureRandom and gives CodeQL's taint tracking a secure source, so the flows into security sinks no longer raise alerts.

One performance consideration: SecureRandom.nextDouble() is measurably slower than ThreadLocalRandom because it calls into the OS entropy pool (especially on first call per thread). Since rand() is used in query expressions like ORDER BY rand() LIMIT 10, callers running many concurrent queries will notice the difference. The benchmark confirms the overhead exists - worth calling out explicitly in the PR description whether the numbers are acceptable (e.g., "X ns/op vs Y ns/op - within budget"). If this proves too costly in production profiling, ThreadLocalRandom.current().nextDouble() would silence the CodeQL alert if the taint never actually reaches a security sink in practice (the 24 alerts were false positives at the sink end).

The inline comment at line 24 is appropriate - the non-obvious constraint (CodeQL taint chain) is exactly what comments are for.


BinarySerializer.java - Log format-string fix

Clean and correct. No concerns.


CountStepTest.java - int -> long

Correct. longValue() assigned to int was a silent narrowing cast.


Neo4jImporter.java - validateLabel() path-traversal guard

The trust-boundary placement is the right call - explained well in the PR description.

One concern with the .. check:

label.contains("..")

This rejects any label containing the two-character sequence .., including legitimate names like v2..0, schema..type, or any label with consecutive dots. A path traversal requires .. as a complete path segment (preceded/followed by /, \, or the start/end of the string). Consider a tighter check:

label.equals("..") || label.startsWith("../") || label.startsWith("..\\")
    || label.endsWith("/..") || label.endsWith("\\..")
    || label.contains("/../") || label.contains("\\..\\")
    || label.contains("/..\\") || label.contains("\\../")

Or more concisely with a regex: label.matches("(?:.*[/\\\\])?\\.\\.(?:[/\\\\].*)?"). That said, the current broad check is safe-by-default and may be acceptable if labels with .. are not expected in real Neo4j exports.
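The difference between the two checks is easiest to see side by side. The sketch below isolates only the ".." clause (the separator and NUL checks in validateLabel are left out on purpose) and uses the regex suggested above; the class and method names are illustrative, and Pattern.DOTALL is an addition so labels containing line breaks are handled too.

```java
import java.util.regex.Pattern;

final class DotDotChecks {

  // Segment-boundary variant: ".." only counts as a whole path segment (DOTALL is an addition).
  private static final Pattern DOT_DOT_SEGMENT = Pattern.compile("(?:.*[/\\\\])?\\.\\.(?:[/\\\\].*)?", Pattern.DOTALL);

  // Broad variant kept by the PR: any two consecutive dots anywhere in the label.
  static boolean broadRejects(final String label) {
    return label.contains("..");
  }

  static boolean segmentRejects(final String label) {
    return DOT_DOT_SEGMENT.matcher(label).matches();
  }

  public static void main(final String[] args) {
    for (final String label : new String[] { "v2..0", "..", "../../etc/evil", "a\\..\\b", "Person" })
      System.out.printf("%-16s broad=%-5b segment=%b%n", label, broadRejects(label), segmentRejects(label));
  }
}
```

Both variants reject every separator-adjacent "..", so the only practical difference is whether labels such as v2..0 are accepted.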

NUL byte check (label.indexOf('\0') >= 0) - correct and necessary; some OS path APIs truncate at NUL.


RandFunctionBenchmark.java - New benchmark

  • @Tag("benchmark") and placement in performance/ are correct per project conventions.
  • System.out.println / System.out.printf - intentional for a benchmark (the output IS the purpose), so this is fine to keep, unlike debug prints.
  • The blackhole trick prevents dead-code elimination, good.
  • Minor: RandFunction.execute(null, null) works because neither args nor context is used in the implementation, but this is fragile if the method signature changes. Consider passing new Object[0] for args to be explicit.

Neo4jImporterIT.java - Regression tests

Both tests are well-structured with proper finally cleanup. A small note: rejectEdgeLabelWithPathTraversal uses a Windows-style ..\\..\\evil escape in a Java string literal which becomes ..\..\evil at runtime. On Linux CI, this will still trigger the \ path-separator check in validateLabel, so the test will pass - just worth noting it is testing the backslash branch, not a POSIX path-traversal.


Summary

| Item | Assessment |
|---|---|
| SecureRandom taint fix | Correct; benchmark overhead needs a pass/fail threshold |
| validateLabel .contains("..") | Overly broad - worth narrowing to segment-boundary check |
| Format string fix | Clean |
| int->long cast fix | Correct |
| Test coverage | Good regression tests added |

Overall this is solid work - the approach of fixing at the source rather than the sink is the right call throughout.

@robfrank
Collaborator Author

Follow-up on the rand() SecureRandom perf concern - checked the actual call surface:

RandFunction has no internal engine callers. It backs the user-facing rand() for both engines (Cypher via CypherFunctionFactory, SQL via FunctionRegistry.getStateless("rand") in SQLQueryEngine) and is invoked only when a user writes rand() in a query. Every internal RNG need already uses something else: retry/backoff jitter and async-executor selection use ThreadLocalRandom; encryption/tokens/salt and SQL randomInt() use SecureRandom; graph-ML algos use seeded Random for reproducibility. None route through RandFunction.

So the ~95 ns/op cost is bounded by how often a user calls rand() and over how many rows, and it sits on top of per-row record fetch + binary deserialization (hundreds of ns to several µs/row) that dominates by 1-2 orders of magnitude:

| Scenario | rand() calls | Added cost vs ThreadLocalRandom (~92 ns/call) |
|---|---|---|
| scalar RETURN rand() | 1 | ~0.0001 ms |
| typical result set (1K-100K rows) | 1K-100K | 0.1-9 ms |
| ORDER BY rand() on 1M rows | 1M | ~92 ms |
| WHERE rand() < 0.01 over 10M rows | 10M | ~0.9 s |
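The added-cost column is simply calls × (per-call cost of the SecureRandom path minus per-call cost of ThreadLocalRandom), using the ~95 and ~2.6 ns/op figures from the PR description's benchmark table, i.e. roughly 92 ns per call. A quick check of that arithmetic (the class name is illustrative):

```java
final class RandCostEstimate {

  static final double SECURE_NS = 95.0; // RandFunction (ThreadLocal<SecureRandom>), from the benchmark
  static final double TLR_NS = 2.6;     // ThreadLocalRandom.nextDouble(), from the benchmark

  // Extra milliseconds spent in rand() for the given number of calls.
  static double addedMillis(final long calls) {
    return calls * (SECURE_NS - TLR_NS) / 1_000_000.0;
  }

  public static void main(final String[] args) {
    for (final long calls : new long[] { 1, 1_000, 100_000, 1_000_000, 10_000_000 })
      System.out.printf("%,12d calls -> %.4f ms%n", calls, addedMillis(calls));
  }
}
```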

Conclusion: not a hot-path bottleneck. Keeping SecureRandom; the RandFunctionBenchmark stays so the tradeoff is measurable if a heavy random-sampling/shuffling workload ever justifies revisiting.

Pass a shared Object[0] to RandFunction.execute instead of null - more
explicit about the unused args contract and avoids per-call allocation
inside the measured loop.
@robfrank
Collaborator Author

Addressing this round:

Done

  • Benchmark now passes a reused Object[0] constant to RandFunction.execute instead of null - more explicit and avoids per-call allocation skewing the measured loop. fba6dd104
  • PR description now states the perf verdict explicitly ("within budget") with the ns/op table and the no-internal-callers reasoning.

validateLabel .contains("..") - keeping the broad check, for two reasons:

  1. It adds no false-negative risk. The guard is an OR: /, \ and NUL are rejected first, so any label that reaches the .. clause has no path separators at all - and .. without a separator cannot traverse (it just yields a file literally named ..something inside the DB dir). The separator checks are what actually prevent traversal; .. is defense-in-depth.
  2. It is the pattern CodeQL's path-traversal-guard recognition keys on. contains("..") is the canonical dotDotCheck sanitizer. Narrowing it to a segment-boundary check (which, per point 1, would only ever match separator-adjacent .. that we already reject anyway) risks CodeQL no longer recognizing the barrier and reopening the very java/path-injection alert this PR closes.

The cost is rejecting an otherwise-legal label containing literal .. (e.g. My..Type) - not a valid bare Neo4j identifier and vanishingly rare in real exports. For an import trust boundary, failing closed with a clear message is the right tradeoff. If precision is ever wanted, the safe path is to narrow it and confirm against a CodeQL re-scan, not narrow it speculatively.
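The reasoning in point 1 above can be checked with java.nio.file.Path: resolving a name against a database directory and normalizing it shows which names actually leave the directory. The directory /tmp/db and using the label directly as a file name are simplifications (real bucket file names add suffixes), and the example assumes a POSIX file system where only / is a separator. Note that a bare ".." is the one separator-free name that does escape when used as-is, which the contains("..") clause also covers.

```java
import java.nio.file.Path;

final class TraversalCheck {

  // True if dbDir/name, once normalized, is still inside dbDir.
  static boolean staysInside(final Path dbDir, final String name) {
    final Path base = dbDir.toAbsolutePath().normalize();
    return base.resolve(name).normalize().startsWith(base);
  }

  public static void main(final String[] args) {
    final Path dbDir = Path.of("/tmp/db");
    for (final String name : new String[] { "My..Type", "..something", "../evil", ".." })
      System.out.println(name + " -> " + (staysInside(dbDir, name) ? "inside" : "ESCAPES"));
  }
}
```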

The Windows-style ..\..\evil test note is correct - it intentionally exercises the backslash branch; the node test covers the POSIX / branch.

@claude

claude Bot commented May 28, 2026

test

@claude

claude Bot commented May 28, 2026

Code Review

This is a well-reasoned set of security fixes that addresses root causes rather than sinks. Four distinct issues are bundled, all related. Overall the approach is sound; a few targeted observations below.


RandFunction.java - ThreadLocal<SecureRandom>

Correct fix. ThreadLocal avoids the contention of a shared static SecureRandom. The PR description analysis is solid: RandFunction has no internal engine callers and ~95 ns/op sits well below per-row deserialization cost.

One minor style note: CLAUDE.md asks that comments only explain non-obvious WHY. The existing comment on the ThreadLocal field is borderline; this is one of the clearer cases where the concurrency rationale is genuinely non-obvious from the field declaration alone, so it is defensible.


BinarySerializer.java - format string fix

Clean. Using %s as a positional argument is the correct fix for CodeQL's java/tainted-format-string.


CountStepTest.java - int to long

Correct. Accumulating longValue() into an int was a silent narrowing cast.


Neo4jImporter.java - validateLabel

Pre-existing guard inconsistency worth noting:

syncSchema() at line 241 correctly guards before calling validateLabel:

json.has("label") && !json.isNull("label") ? validateLabel(json.getString("label")) : null

But parseEdges() at line 404 calls without that guard:

final String type = validateLabel(json.getString("label"));

JSONObject.getString(name) throws JSONException when the key is absent (confirmed at JSONObject.java:597). So the if (type == null) skip at line 405 only handles "label": null - it will not catch a relationship JSON where the "label" key is entirely missing; that throws instead of warning-and-skipping. This is a pre-existing asymmetry, not introduced by this PR, but wrapping the call with validateLabel without fixing it locks the inconsistency in. Consider:

final String type = json.has("label") && !json.isNull("label") ? validateLabel(json.getString("label")) : null;
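The asymmetry is between a missing key and an explicit null. A minimal sketch of the guarded form, using a plain Map<String, Object> as a stand-in for the parsed relationship record (an assumption: the importer works on its own JSONObject type, whose getString() throws on a missing key rather than returning null). The class name and the simplified validate() placeholder are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EdgeLabelGuard {

  // Returns the label, or null when the key is missing or explicitly null, so the caller can warn and skip.
  static String labelOrNull(final Map<String, Object> json) {
    final Object label = json.get("label"); // Map.get covers both the has() and isNull() cases
    return label == null ? null : validate(label.toString());
  }

  // Placeholder for validateLabel(): only the separator rule is reproduced here.
  static String validate(final String label) {
    if (label.indexOf('/') >= 0 || label.indexOf('\\') >= 0)
      throw new IllegalArgumentException("Label contains path separators: " + label);
    return label;
  }

  public static void main(final String[] args) {
    final Map<String, Object> explicitNull = new HashMap<>();
    explicitNull.put("label", null);
    System.out.println(labelOrNull(Map.of("label", "KNOWS"))); // KNOWS
    System.out.println(labelOrNull(explicitNull));              // null
    System.out.println(labelOrNull(Map.of()));                  // null: missing key is skipped, not thrown
  }
}
```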

contains("..") decision - accepted. The reasoning (separator checks already reject traversal; contains("..") is CodeQL's canonical dotDotCheck sanitizer pattern) is correct and well documented in the PR thread. The tradeoff of rejecting My..Type labels is acceptable at an import trust boundary.

NUL byte check - correct and necessary.


RandFunctionBenchmark.java - benchmark

  • @Tag("benchmark") and placement in performance/ match project conventions. Good.
  • System.out usage is intentional (the output IS the purpose) - fine.
  • blackhole via static long sink and XOR prevents dead-code elimination - correct approach.
  • The multi-line class Javadoc (lines 28-33) is longer than the preferred one-short-line style for comments. Consider condensing to one line or removing it since the class name, tag, and method name make the intent clear.

Neo4jImporterIT.java - regression tests

Both tests are well structured with finally cleanup. The rejectEdgeLabelWithPathTraversal test exercises the backslash branch of validateLabel. The node test uses ../../etc/evil which exercises the slash branch. Together they cover both separator types.

.hasMessageContaining("path separators") is robust enough as long as the error message does not change.


Summary

| Change | Assessment |
|---|---|
| SecureRandom via ThreadLocal | Correct; performance justified by PR analysis |
| Format string fix | Clean |
| int->long in test | Correct |
| validateLabel placement | Correct trust-boundary approach |
| Guard inconsistency in parseEdges() | Pre-existing; worth fixing while the code is being touched |
| Regression tests | Good coverage of both separator types |
| Benchmark | Correctly tagged; class Javadoc slightly over style limit |

Overall this is solid security work. The one actionable suggestion is aligning the parseEdges() null guard with syncSchema() so that a missing "label" key is warned-and-skipped rather than thrown.

@robfrank robfrank merged commit 5afa451 into main May 28, 2026
8 of 12 checks passed
@robfrank robfrank deleted the fix/codeql-java-scanning-alerts branch May 28, 2026 09:53
@claude

claude Bot commented May 28, 2026

Code Review

This PR addresses 30 CodeQL language:java alerts on main by fixing four root causes. Well-scoped, well-justified, and fixes are at the correct trust boundary.


RandFunction.java

Good: ThreadLocal is the right tradeoff - cryptographically strong without the synchronized overhead of a shared instance. The inline comment explains the rationale concisely.

Minor follow-up worth considering: SQLFunctionRandomInt (engine/.../sql/math/SQLFunctionRandomInt.java) still holds a shared static SecureRandom. It is thread-safe (SecureRandom is internally synchronized) but will contend under parallel query execution. Worth aligning to ThreadLocal in a follow-up.
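A sketch of what that alignment could look like, mirroring the ThreadLocal<SecureRandom> pattern from RandFunction. The class and method names are hypothetical; the real SQLFunctionRandomInt signature is not reproduced here.

```java
import java.security.SecureRandom;

// Hypothetical shape of a per-thread randomInt(); not the actual SQLFunctionRandomInt code.
final class PerThreadRandomInt {

  // One SecureRandom per thread: no shared instance for parallel queries to contend on.
  private static final ThreadLocal<SecureRandom> RANDOM = ThreadLocal.withInitial(SecureRandom::new);

  // Returns a value in [0, bound); like SecureRandom.nextInt(int), throws if bound is not positive.
  static int randomInt(final int bound) {
    return RANDOM.get().nextInt(bound);
  }

  public static void main(final String[] args) throws InterruptedException {
    final Thread[] workers = new Thread[4];
    for (int t = 0; t < workers.length; t++) {
      workers[t] = new Thread(() -> System.out.println(Thread.currentThread().getName() + " -> " + randomInt(100)));
      workers[t].start();
    }
    for (final Thread w : workers)
      w.join();
  }
}
```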


BinarySerializer.java

Good: Correct one-line fix. Passing value as a format argument rather than concatenating into the template closes the taint sink. Follows the existing LogManager.log() pattern throughout the file.


Neo4jImporter.java

Good: Validation at the import boundary is the right call (the PR body explains why a sink-side fix would break valid usage). All three extraction paths are covered: syncSchema edges, parseEdges, and typeNameFromLabels (both single and multi-label nodes).

One concern - contains("..") is overly broad. This rejects any label with two consecutive dots anywhere, including legitimate names like "schema..version" or "v2..0". Real path traversal requires ".." to appear as a path segment (flanked by separators or at string boundaries). A more precise check would test only for "../" or "..\" or the label equaling ".." outright. If double-dot labels are genuinely not expected in Neo4j exports, the conservative check is fine - just worth a comment stating that intent.

Missing test variants (not blockers):

  • A label with a bare "/" or "\" and no ".."
  • NUL character injection

CountStepTest.java

Good: Correct widening of totalRecords to long. The previous int += long narrowing would silently truncate on large row counts.


RandFunctionBenchmark.java

Good: Properly tagged @Tag("benchmark"), consistent with VarIntBenchmark and SelectCacheHitsBenchmark. The blackhole() pattern prevents dead-code elimination. Leaving the benchmark alongside the fix is good practice - the performance claim is now re-checkable.

Minor nit: The private DoubleOp inner interface could be replaced with java.util.function.DoubleSupplier to avoid defining a new type.
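For illustration, a stripped-down timing loop that uses java.util.function.DoubleSupplier in place of a custom DoubleOp interface, with results XOR-ed into a static sink so the JIT cannot eliminate the measured calls. This is a hypothetical sketch, not the actual RandFunctionBenchmark: it calls SecureRandom directly rather than RandFunction, and a hand-rolled loop like this is far less rigorous than a harness such as JMH.

```java
import java.security.SecureRandom;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

final class RandSourcesBenchmark {

  // Blackhole: results are folded in here so the calls cannot be optimized away.
  static long sink;

  // Average nanoseconds per call of op over the given number of iterations.
  static double nsPerOp(final DoubleSupplier op, final int iterations) {
    long acc = 0;
    final long start = System.nanoTime();
    for (int i = 0; i < iterations; i++)
      acc ^= Double.doubleToRawLongBits(op.getAsDouble());
    final long elapsed = System.nanoTime() - start;
    sink ^= acc;
    return (double) elapsed / iterations;
  }

  public static void main(final String[] args) {
    final ThreadLocal<SecureRandom> secure = ThreadLocal.withInitial(SecureRandom::new);
    final DoubleSupplier[] ops = { () -> secure.get().nextDouble(), () -> ThreadLocalRandom.current().nextDouble(),
        Math::random };
    final String[] names = { "ThreadLocal<SecureRandom>", "ThreadLocalRandom", "Math.random()" };
    for (int round = 0; round < 2; round++) // the first round doubles as JIT warm-up
      for (int i = 0; i < ops.length; i++)
        System.out.printf("%-26s %8.1f ns/op%n", names[i], nsPerOp(ops[i], 1_000_000));
    System.out.println("sink=" + sink);
  }
}
```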


Overall

Four root causes fixed cleanly at the right abstraction level, with regression tests for the highest-risk one (path injection). The contains("..") breadth in validateLabel is the only thing worth a follow-up discussion. The PR description quality is high and the benchmark is a valuable addition.

@codecov

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.34%. Comparing base (cdb1dd5) to head (e57af51).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...m/arcadedb/integration/importer/Neo4jImporter.java | 75.00% | 0 Missing and 2 partials ⚠️ |
| ...java/com/arcadedb/serializer/BinarySerializer.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #4383   +/-   ##
=========================================
  Coverage     64.33%   64.34%           
+ Complexity      430      422    -8     
=========================================
  Files          1647     1647           
  Lines        127831   127835    +4     
  Branches      27403    27403           
=========================================
+ Hits          82239    82250   +11     
+ Misses        34043    34039    -4     
+ Partials      11549    11546    -3     
