Conversation
|
@copilot see description - can you help? |
There was a problem hiding this comment.
Pull request overview
This PR introduces group operations (groupSum, groupCumSum, groupDiff) and a high-performance Tower.splitAmntFast implementation optimized with SIMD vectorization. The changes include refactoring group operations into separate files with inline modifiers, implementing a fast path using Java's Vector API, reorganizing tests, and adding comprehensive benchmarks.
Key changes:
- New inline group operation functions for better performance through method inlining
- High-performance
splitAmntFastmethod with SIMD optimizations using column-major matrix storage - Benchmark harness to compare original vs fast implementations
- Test suite reorganization into separate files for better maintainability
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| vecxtensions/src/groupSum.scala | Refactored to add inline modifier and restructure control flow with if-else instead of early return |
| vecxtensions/src/groupCumSum.scala | Extracted groupCumSum into separate file with inline modifier for better inlining |
| vecxtensions/src/groupDiff.scala | Extracted groupDiff into separate file with inline modifier for better inlining |
| vecxtensions/src/Tower.scala | New Tower implementation with splitAmnt and splitAmntFast methods using SIMD vectorization |
| vecxtensions/src/Layer.scala | Removed Tower case class definition (moved to Tower.scala) |
| vecxtensions/src-jvm/rpt.scala | Added Matrix extension methods for reinsurance and franchise functions |
| vecxtensions/src-native/rpt.scala | Added Matrix extension methods for reinsurance and franchise functions |
| vecxtensions/package.mill | Added references to shared source directories (src-js-native, src-jvm-native) |
| vecxtensions/test/src/tower.test.scala | New comprehensive test suite for Tower and group operations |
| vecxtensions/test/src/groupSum.test.scala | Removed tests for groupCumSum and groupDiff (moved to separate files) |
| vecxtensions/test/src/groupCumul.test.scala | New test file for groupCumSum |
| vecxtensions/test/src/groupDiff.test.scala | New test file for groupDiff |
| benchmark/src/splitAmnt.scala | New JMH benchmark comparing splitAmnt vs splitAmntFast implementations |
| benchmark/package.mill | Added vecxtensions.jvm module dependency |
| vecxt/src-jvm-native/arrayutil.scala | Consolidated arrayUtil from separate platform implementations |
| vecxt/package.mill | Added jvmNativeSharedSources configuration |
| vecxt/src-native/array.scala | Removed arrayUtil (moved to shared src-jvm-native) |
| vecxt/jvm-native/src/* | Deleted old unused files (doublematrix.scala, dimCheck.scala, array.scala) |
| numLayers: Int, | ||
| species: VectorSpecies[java.lang.Double] | ||
| ): Unit = | ||
| val vectorSize = species.length() |
There was a problem hiding this comment.
Unused variable: vectorSize is computed on line 226 but never used in this method. Either remove the variable or check if there's missing SIMD optimization that should be applied here.
| val vectorSize = species.length() |
| numLayers: Int, | ||
| species: VectorSpecies[java.lang.Double] | ||
| ): Unit = | ||
| val vectorSize = species.length() |
There was a problem hiding this comment.
Unused variable: vectorSize is computed on line 443 but never used in this method. Remove this computation to reduce overhead.
| val vectorSize = species.length() |
|
|
||
| // SIMD species for vectorization | ||
| val species = DoubleVector.SPECIES_PREFERRED | ||
| val vectorSize = species.length() |
There was a problem hiding this comment.
Unused variable: vectorSize is computed on line 198 but never used in splitAmntFast. Remove this computation.
| val vectorSize = species.length() |
| val mask = vector.compare(VectorOperators.GT, retVector) | ||
| val result = vector.min(limVector).blend(zeroVector, mask.not()) |
There was a problem hiding this comment.
Potential SIMD inefficiency: The mask logic on line 338-339 uses blend with mask.not(). This creates an inverted mask and then blends. Consider restructuring to avoid the mask inversion by using: vector.min(limVector).blend(vector, mask) instead of vector.min(limVector).blend(zeroVector, mask.not()), which would blend the original vector when the condition is false, avoiding the not() operation.
| val mask = vector.compare(VectorOperators.GT, retVector) | ||
| val result = vector.blend(zeroVector, mask.not()) |
There was a problem hiding this comment.
Potential SIMD inefficiency: The mask logic on line 357-358 uses blend with mask.not(). Consider restructuring to avoid the mask inversion by using: zeroVector.blend(vector, mask) instead of vector.blend(zeroVector, mask.not()), which would be semantically clearer and avoid the not() operation.
|
|
||
| // SIMD species for vectorization | ||
| val species = DoubleVector.SPECIES_PREFERRED | ||
| val vectorSize = species.length() |
There was a problem hiding this comment.
Potential optimization: The species and vectorSize are computed before the early return check. Consider moving these computations after the isEmpty check on line 185 to avoid unnecessary work when losses is empty.
| val vectorSize = species.length() |
| def jsNativeSharedSources = Task.Sources{ | ||
| (os.sub / "src-js-native") | ||
| } | ||
|
|
||
| def jvmNativeSharedSources = Task.Sources{ | ||
| (os.sub / "src-jvm-native") | ||
| } |
There was a problem hiding this comment.
Build configuration references non-existent directories: The package.mill file references src-js-native and src-jvm-native directories on lines 25 and 29, but these directories don't exist in the vecxtensions module. This will cause build failures. Either create these directories or remove the references from the build configuration.
| Layer( | ||
| occLimit = Some(25.0), | ||
| occRetention = Some(10.0), | ||
| aggLimit = Some(50), |
There was a problem hiding this comment.
Type mismatch: aggLimit is declared as Option[Double] in Layer, but Some(50) is being passed, which is an Int. This should be Some(50.0) to maintain type consistency with Double.
| aggLimit = Some(50), | |
| aggLimit = Some(50.0), |
… operations (#59) * Initial plan * Optimize group operations: eliminate array copies, add in-place variants - Eliminate unnecessary array copies in applyGroupCumSumFast and applyGroupDiffFast - Add groupCumSumInPlace and groupDiffInPlace for 34-53% performance gains - Optimize group lookups by caching values to avoid redundant array access - Add isolated benchmark for group operations performance testing Performance improvements (10k elements, 100 groups): - groupCumSum: 34% faster with in-place variant (11.6μs → 7.6μs) - groupDiff: 53% faster with in-place variant (10.3μs → 4.9μs) Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com> * Remove ineffective group lookup optimization from code review feedback Code review correctly identified that caching group values in the while condition doesn't reduce array accesses since the array is still accessed on every iteration. Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Quafadas <24899792+Quafadas@users.noreply.github.com>
|
@copilot can you write up a very detailed algorithmic write up which could be used to re implement this. Include three sections:
|
@copilot please review the “fast” implementation.
Run a benchmark for it in isolation. Look for any performance wins. Ensure method dispatch is inlined where possible.
is there anywhere we are leaving performance on the table? This is kind of a hot path so Micro optimisation is acceptable