
perf: QR sampling and finder detection optimizations #18

Merged
PrzemyslawKlys merged 48 commits into master from perf/qr-optimizations-20260123
Jan 25, 2026

Conversation

@PrzemyslawKlys
Member

## Summary

- optimize QR sampling hot paths (bilinear/nearest, early exits, row-incremental transforms)
- speed up finder-pattern scanning and cross-check loops
- refresh benchmark reporting/data outputs (including Δ and the latest compare run)

## Testing

- Build\Run-Benchmarks-Compare.ps1
- Build\Generate-BenchmarkReport.ps1 -ArtifactsPath Build\BenchmarkResults\windows-20260123-172458 -RunMode quick -Framework net8.0 -Configuration Release -FailOnMissingCompare
- dotnet test CodeGlyphX.Tests\CodeGlyphX.Tests.csproj -c Release

Copilot AI review requested due to automatic review settings January 23, 2026 16:29
@claude

claude bot commented Jan 23, 2026

Claude finished @PrzemyslawKlys's task — View job


PR Review: QR Sampling and Finder Detection Optimizations

I've completed a comprehensive review of this performance optimization PR. This is an impressive optimization effort with significant technical depth, but there are several important issues that should be addressed.

Review Tasks

  • Analyze core QR sampling optimizations
  • Review finder pattern detection improvements
  • Examine rendering optimizations and memory management
  • Investigate SonarQube security hotspot and code duplication
  • Assess test coverage and benchmark changes
  • Review API changes and backward compatibility
  • Provide overall assessment and recommendations

Code Quality Assessment

Strengths

1. Performance Optimizations are Well-Implemented

  • QR Sampling (QrPixelSampling.cs): Excellent hot-path optimizations with early exits, row-incremental transform calculations, and aggressive inlining. The bilinear/nearest neighbor sampling modes with MethodImpl(MethodImplOptions.AggressiveInlining) are well-designed.
  • Finder Pattern Detection: Smart algorithmic improvements in QrFinderPatternDetector.cs with direct array access, adaptive row stepping based on image size, and early termination conditions.
  • Memory Management: Good use of stackalloc for small arrays, ArrayPool<T> for larger buffers, and elimination of intermediate allocations in rendering pipelines.
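
The stackalloc/ArrayPool split noted above follows a common .NET idiom: small scratch buffers live on the stack, larger ones are rented from the shared pool and returned in a `finally` block. A minimal sketch of that pattern — the threshold constant and method names here are illustrative, not taken from the PR:

```csharp
using System;
using System.Buffers;

static class BufferPatternSketch
{
    // Hypothetical cutoff; the real code's stackalloc limit may differ.
    private const int StackAllocLimit = 256;

    public static int SumPixels(ReadOnlySpan<byte> pixels)
    {
        int length = pixels.Length;
        int[] rented = null;

        // Small buffers: stack-allocated, no GC pressure.
        // Large buffers: rented from the shared pool.
        Span<int> scratch = length <= StackAllocLimit
            ? stackalloc int[StackAllocLimit]
            : (rented = ArrayPool<int>.Shared.Rent(length));

        try
        {
            int sum = 0;
            for (int i = 0; i < length; i++)
            {
                scratch[i] = pixels[i]; // widen once into scratch
                sum += scratch[i];
            }
            return sum;
        }
        finally
        {
            // Pooled arrays must always be returned, even on exceptions.
            if (rented != null)
                ArrayPool<int>.Shared.Return(rented);
        }
    }
}
```

Note that a pooled array may be larger than requested and may contain stale data, which is why code using this pattern must track the logical length separately.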

2. Code Structure & Patterns

  • Proper use of Span<T> and modern .NET performance patterns
  • Good separation of concerns with specialized sampling modes
  • Consistent error handling and bounds checking

⚠️ Critical Issues

1. Code Duplication (High Priority)

  • Luma calculation duplication: The same luma tables (299, 587, 114 factors) are implemented in both LumaTables.cs and QrGrayImage.cs lines 366, 395, 444, 489
  • Impact: Memory waste, maintenance burden, potential inconsistency
  • Recommendation: Consolidate into LumaTables.Luma() method and use throughout codebase
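
A consolidated `LumaTables` could look roughly like the sketch below. The member names mirror those quoted later in this review (`LumaR`/`LumaG`/`LumaB`, `BuildLumaTable` with BT.601 integer factors 299/587/114); the `Luma()` signature is an assumption, not the actual API:

```csharp
// Hypothetical consolidated luma helper; QrGrayImage would call
// LumaTables.Luma(...) instead of keeping its own copies of the tables.
internal static class LumaTables
{
    private static readonly int[] LumaR = BuildLumaTable(299);
    private static readonly int[] LumaG = BuildLumaTable(587);
    private static readonly int[] LumaB = BuildLumaTable(114);

    private static int[] BuildLumaTable(int factor)
    {
        var table = new int[256];
        for (int i = 0; i < 256; i++)
            table[i] = i * factor; // pre-multiplied channel contribution
        return table;
    }

    // Integer BT.601 luma: (299*R + 587*G + 114*B) / 1000.
    // Three lookups and one divide replace three multiplies per pixel.
    public static byte Luma(byte r, byte g, byte b)
        => (byte)((LumaR[r] + LumaG[g] + LumaB[b]) / 1000);
}
```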

2. Encoding Issues in PowerShell Script

  • File: Build/Generate-BenchmarkReport.ps1:158-164
  • Issue: Character replacement patterns show UTF-8 encoding corruption (µs, �s)
  • Impact: Benchmark reports may display garbled text
  • Recommendation: Fix file encoding and replace with proper Unicode microsecond symbol (μs)

3. Hash Function Concerns

  • File: CodeGlyphX/Internal/ByteArrayComparer.cs:16-28
  • Issue: FNV-1a hash can have poor distribution for structured QR code data patterns
  • Security: Not a security issue (non-cryptographic use case), but potential performance impact
  • Recommendation: Consider using .NET's HashCode.Combine() for better distribution
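
A `HashCode`-based comparer along the lines of that recommendation could be sketched as below. Note the caveat: `HashCode.AddBytes` requires .NET 6+, so a library that also targets netstandard2.0/net472 (as this one does) would need a per-byte fallback or a conditional compilation path:

```csharp
using System;
using System.Collections.Generic;

// Sketch only: the actual ByteArrayComparer in this PR uses FNV-1a.
internal sealed class ByteArrayComparerSketch : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.AsSpan().SequenceEqual(y);
    }

    public int GetHashCode(byte[] obj)
    {
        var hash = new HashCode();
#if NET6_0_OR_GREATER
        hash.AddBytes(obj); // bulk span hashing, .NET 6+
#else
        foreach (var b in obj)
            hash.Add(b);    // per-byte fallback for older targets
#endif
        return hash.ToHashCode();
    }
}
```

`System.HashCode` uses a randomized seed per process, which also hardens the `HashSet` against adversarial collision patterns, at the cost of hash codes not being stable across runs.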

📊 Performance Considerations

Positive Optimizations:

  • Early exit patterns in sampling (lines 110-111, 119-120 in QrPixelSampling.cs)
  • Pre-calculated transforms and row-incremental updates
  • Direct array access patterns in finder detection
  • Lookup tables for luma calculations
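
The row-incremental transform idea in the list above can be illustrated with an affine case: instead of applying the full matrix per pixel, compute the source coordinates at the start of each row and advance them by a constant per-column step. This sketch is a simplified illustration, not the PR's code; a true perspective transform needs a per-pixel divide, so the real implementation can only apply this fully where the transform is (locally) affine:

```csharp
// Affine mapping: srcX = m00*col + m01*row + m02
//                 srcY = m10*col + m11*row + m12
// Walking a row, col increases by 1, so (srcX, srcY) advance by (m00, m10).
static void SampleRow(float m00, float m01, float m02,
                      float m10, float m11, float m12,
                      int row, int width, float[] srcX, float[] srcY)
{
    // Source coordinates at column 0 of this row.
    float x = m01 * row + m02;
    float y = m11 * row + m12;

    for (int col = 0; col < width; col++)
    {
        srcX[col] = x;
        srcY[col] = y;
        // One add per axis per pixel instead of two multiplies plus two adds.
        x += m00;
        y += m10;
    }
}
```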

Potential Issues:

  • Large stackalloc usage (could cause stack overflow on deep recursion)
  • FNV-1a hash collisions could impact HashSet performance for large decode operations

🔒 Security Review

SonarQube Security Hotspot Analysis:
The security hotspot is likely the byte array hash function in ByteArrayComparer. This is not a security vulnerability - it's a non-cryptographic hash used for deduplication, which is appropriate. However, the hash quality could be improved for performance.

📈 Test Coverage & Benchmarks

Good Additions:

  • Comprehensive benchmark suite with realistic test cases (clean, noisy, screenshot scenarios)
  • New stress test comparisons in QrDecodeStressCompareBenchmarks.cs
  • External sample testing infrastructure

Missing Test Cases:

  • No specific tests for edge cases in the new sampling optimizations
  • Limited coverage of error conditions in optimized paths

🔄 API Compatibility

Public API Changes:

  • New methods in BitMatrix: Invert(), CopyFrom(), Words accessor
  • New properties in QrEasyOptions
  • Enhanced QrPixelDecodeOptions

Assessment: Changes appear backward compatible with existing public APIs.

Recommendations

High Priority

  1. Fix code duplication: Consolidate luma calculation into single implementation
  2. Fix PowerShell encoding: Correct UTF-8 encoding issues in benchmark script
  3. Improve hash function: Replace FNV-1a with HashCode.Combine() for better distribution

Medium Priority

  1. Add edge case tests: Test boundary conditions in optimized sampling paths
  2. Document performance characteristics: Add XML docs explaining when to use different sampling modes
  3. Consider stack usage: Review stackalloc patterns for deeply nested scenarios

Low Priority

  1. JSON formatting: Standardize benchmark JSON output formatting
  2. Memory diagnostics: Add more granular memory allocation tracking

Overall Assessment

This is a high-quality performance optimization PR that demonstrates deep understanding of the QR decoding algorithms and .NET performance patterns. The optimizations are well-implemented and should provide significant performance improvements.

Recommendation: Approve with requested changes - address the code duplication and encoding issues before merging.

Performance Impact: Expected significant improvements in QR decode performance, especially for large images and complex scenarios.

Copilot AI left a comment

Pull request overview

This pull request implements significant performance optimizations for QR code sampling, finder pattern detection, and various rendering operations. The changes focus on reducing allocations, improving hot-path performance, and optimizing cross-check loops in the decoder.

Changes:

  • Optimized rendering hot paths by eliminating intermediate buffer allocations across XPM, XBM, PPM, PGM, PBM, and PAM writers
  • Introduced LumaTables for lookup-table-based luma calculations
  • Optimized QR sampling with row-incremental transform calculations and mode-based early exits
  • Improved finder pattern detection and cross-check performance with direct array access
  • Reduced allocations in QR encoder by reusing BitMatrix instances
  • Added ByteArrayComparer for efficient deduplication
  • Optimized tile scanning to use direct slicing instead of buffer copying

Reviewed changes

Copilot reviewed 63 out of 63 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| CodeGlyphX/Rendering/LumaTables.cs | New lookup table class for optimized luma calculations |
| CodeGlyphX/Rendering/Xpm/XpmWriter.cs | Added scanline buffer support to eliminate intermediate allocations |
| CodeGlyphX/Rendering/Png/PngWriter.cs | Optimized PNG encoding with pre-calculated chunk types and improved Adler/CRC |
| CodeGlyphX/Rendering/Png/PngRenderHelpers.cs | New helper class for shared rendering optimizations |
| CodeGlyphX/Qr/QrPixelDecoder.Part5.cs | Optimized sampling with row-incremental transforms and mode switching |
| CodeGlyphX/Qr/QrFinderPatternDetector.cs | Improved finder detection with direct array access |
| CodeGlyphX/Qr/QrGrayImage.cs | Added luma table implementation (duplicates LumaTables) |
| CodeGlyphX/BitMatrix.cs | Added Invert(), CopyFrom(), and Words accessor |
| CodeGlyphX/Qr/QrEncoder.cs | Reduced allocations by reusing BitMatrix instances |
| CodeGlyphX/Internal/ByteArrayComparer.cs | New comparer for byte array deduplication |
| Build/Generate-BenchmarkReport.ps1 | Updated benchmark report generation with encoding issues |

Comment on lines 158 to 161
$normalized = $normalized -replace "µs", "μs"
$normalized = $normalized -replace "�s", "μs"
$normalized = $normalized -replace "µs", "μs"
$normalized = $normalized -replace "Âμs", "μs"
Copilot AI Jan 23, 2026
The character replacements in the PowerShell script appear to have encoding issues. The replacements are converting to multi-byte UTF-8 sequences (µ, μ, etc.) instead of proper single-character replacements. This suggests the file may have been saved with incorrect encoding. The original intent was likely to normalize various microsecond symbol representations (µs, μs) to a consistent format. Verify the file encoding is UTF-8 with BOM or UTF-8, and ensure the replacement strings are correct single-character representations of the microsecond symbol.

Suggested change
$normalized = $normalized -replace "µs", "μs"
$normalized = $normalized -replace "�s", "μs"
$normalized = $normalized -replace "µs", "μs"
$normalized = $normalized -replace "Âμs", "μs"
$normalized = $normalized -replace "µs", "µs"
$normalized = $normalized -replace "�s", "µs"
$normalized = $normalized -replace "µs", "µs"
$normalized = $normalized -replace "Âμs", "µs"

Comment on lines 2 to 65
"schemaVersion": 1,
"entries": [
{
"os": "windows",
"runMode": "full",
"generatedUtc": "2026-01-22T15:35:36.4149651Z",
"publish": true,
"framework": "net8.0",
"configuration": "Release",
"artifacts": "C:\\Support\\GitHub\\CodeMatrix\\Build\\BenchmarkResults\\windows-20260122-155044",
"meta": {
"commit": null,
"branch": null,
"dotnetSdk": "10.0.102",
"runtime": null,
"osDescription": "Microsoft Windows 10.0.26200",
"osArchitecture": null,
"processArchitecture": null,
"machineName": "EVOMONSTER",
"processorCount": 32
}
},
{
"os": "linux",
"runMode": "quick",
"generatedUtc": "2026-01-22T23:04:32.829577+01:00",
"publish": false,
"framework": "net8.0",
"configuration": "Release",
"artifacts": "/mnt/c/Support/GitHub/CodeMatrix/Build/BenchmarkResults/linux-20260122-230352",
"meta": {
"commit": null,
"branch": null,
"dotnetSdk": null,
"runtime": null,
"osDescription": "Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39",
"osArchitecture": "x86_64",
"processArchitecture": "x86_64",
"machineName": "EVOMONSTER",
"processorCount": 32
}
},
{
"os": "windows",
"runMode": "quick",
"generatedUtc": "2026-01-23T16:27:37.1081965Z",
"publish": false,
"framework": "net8.0",
"configuration": "Release",
"artifacts": "Build\\BenchmarkResults\\windows-20260123-172458",
"meta": {
"commit": null,
"branch": null,
"dotnetSdk": "10.0.102",
"runtime": ".NET Framework 4.8.9221.0",
"osDescription": "Microsoft Windows 10.0.26200 ",
"osArchitecture": "X64",
"processArchitecture": "X64",
"machineName": "EVOMONSTER",
"processorCount": 32
},
"runModeSource": "explicit"
}
]
Copilot AI Jan 23, 2026
The JSON formatting has changed from compact to a verbose format with excessive indentation and spaces. This significantly increases file size without providing any functional benefit. The schemaVersion and entries keys now have 4 spaces after the colon, and nested objects have inconsistent indentation patterns. Consider using a standard JSON formatter with consistent, compact formatting (e.g., 2-space indentation without trailing spaces) to reduce file size and improve readability.

Suggested change
"schemaVersion": 1,
"entries": [
{
"os": "windows",
"runMode": "full",
"generatedUtc": "2026-01-22T15:35:36.4149651Z",
"publish": true,
"framework": "net8.0",
"configuration": "Release",
"artifacts": "C:\\Support\\GitHub\\CodeMatrix\\Build\\BenchmarkResults\\windows-20260122-155044",
"meta": {
"commit": null,
"branch": null,
"dotnetSdk": "10.0.102",
"runtime": null,
"osDescription": "Microsoft Windows 10.0.26200",
"osArchitecture": null,
"processArchitecture": null,
"machineName": "EVOMONSTER",
"processorCount": 32
}
},
{
"os": "linux",
"runMode": "quick",
"generatedUtc": "2026-01-22T23:04:32.829577+01:00",
"publish": false,
"framework": "net8.0",
"configuration": "Release",
"artifacts": "/mnt/c/Support/GitHub/CodeMatrix/Build/BenchmarkResults/linux-20260122-230352",
"meta": {
"commit": null,
"branch": null,
"dotnetSdk": null,
"runtime": null,
"osDescription": "Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39",
"osArchitecture": "x86_64",
"processArchitecture": "x86_64",
"machineName": "EVOMONSTER",
"processorCount": 32
}
},
{
"os": "windows",
"runMode": "quick",
"generatedUtc": "2026-01-23T16:27:37.1081965Z",
"publish": false,
"framework": "net8.0",
"configuration": "Release",
"artifacts": "Build\\BenchmarkResults\\windows-20260123-172458",
"meta": {
"commit": null,
"branch": null,
"dotnetSdk": "10.0.102",
"runtime": ".NET Framework 4.8.9221.0",
"osDescription": "Microsoft Windows 10.0.26200 ",
"osArchitecture": "X64",
"processArchitecture": "X64",
"machineName": "EVOMONSTER",
"processorCount": 32
},
"runModeSource": "explicit"
}
]

Comment on lines 8 to 10
private static readonly int[] LumaR = BuildLumaTable(299);
private static readonly int[] LumaG = BuildLumaTable(587);
private static readonly int[] LumaB = BuildLumaTable(114);
Copilot AI Jan 23, 2026
Duplicate luma table implementation detected. The same luma lookup tables (LumaR, LumaG, LumaB with factors 299, 587, 114) are now defined in both CodeGlyphX/Rendering/LumaTables.cs and CodeGlyphX/Qr/QrGrayImage.cs. This creates unnecessary code duplication and wastes memory by storing identical tables twice. Consider consolidating these into a single shared implementation, perhaps by having QrGrayImage use the LumaTables class from the CodeGlyphX.Rendering namespace.

Comment on lines +119 to +127
Array.Clear(integral, 0, stride);

try {
for (var y = 1; y <= h; y++) {
var rowSum = 0;
var row = (y - 1) * w;
var baseIdx = y * stride;
var prevIdx = (y - 1) * stride;
integral[baseIdx] = 0;
Copilot AI Jan 23, 2026
The optimization to Array.Clear(integral, 0, stride) instead of clearing the entire array is good, but the subsequent integral[baseIdx] = 0; assignment inside the loop (line 127) is redundant since the first element (index 0) was already cleared. This creates unnecessary writes in every iteration. Consider removing the explicit zero assignment or documenting why it's needed if there's a specific reason.
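
For context, the loop quoted above builds a summed-area (integral) image with a one-element border of zeros. A self-contained sketch of the same construction — simplified to a freshly allocated array, so variable names match the quoted diff but this is not the PR's exact code:

```csharp
// Integral image with stride = w + 1; row 0 and column 0 stay zero so that
// rectangle sums need no boundary checks. With a zero-initialized array the
// per-row "integral[baseIdx] = 0" write is indeed redundant; it only matters
// if the buffer can hold stale data (e.g. when rented from ArrayPool).
static int[] BuildIntegral(byte[] pixels, int w, int h)
{
    int stride = w + 1;
    var integral = new int[(h + 1) * stride]; // zero-initialized by the CLR

    for (int y = 1; y <= h; y++)
    {
        int rowSum = 0;
        int row = (y - 1) * w;       // start of source row y-1
        int baseIdx = y * stride;    // start of integral row y
        int prevIdx = (y - 1) * stride;

        for (int x = 1; x <= w; x++)
        {
            rowSum += pixels[row + x - 1];
            // Cell = everything above + running sum of this row.
            integral[baseIdx + x] = integral[prevIdx + x] + rowSum;
        }
    }
    return integral;
}
```

If the real buffer is pooled, documenting the zero write as "clears stale column-0 data from the pool" would address the reviewer's concern without changing behavior.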

Comment on lines 18 to 25
unchecked {
uint hash = 2166136261;
for (var i = 0; i < obj.Length; i++) {
hash ^= obj[i];
hash *= 16777619;
}
return (int)hash;
}
Copilot AI Jan 23, 2026
The GetHashCode implementation for byte arrays uses FNV-1a hash, which is a simple non-cryptographic hash. While this is appropriate for HashSet usage, be aware that this hash function can have poor distribution for certain data patterns. For QR code byte data which may have structured patterns, consider profiling hash collisions if performance issues arise with large numbers of decoded codes. The implementation is correct but could be a performance bottleneck with many similar codes.

Suggested change
unchecked {
uint hash = 2166136261;
for (var i = 0; i < obj.Length; i++) {
hash ^= obj[i];
hash *= 16777619;
}
return (int)hash;
}
var hash = new HashCode();
foreach (var b in obj) {
hash.Add(b);
}
return hash.ToHashCode();

- Create benchmark.html template with honest, transparent presentation
- Add benchmark.js to load and render JSON benchmark data
- Add CSS styles for benchmark tables, cards, and mode selector
- Update navigation to include Benchmarks link
- Update Generate-StaticPages.ps1 to generate /benchmarks/ page

The page:
- Explains Quick vs Full benchmark modes clearly
- Shows limitations and methodology transparently
- Handles missing data gracefully (e.g., Full mode not yet run)
- Loads data from existing benchmark JSON files at runtime
* Introduced a new section for visual comparison of CodeGlyphX against the fastest competitor.
* Added loading text and chart container in `benchmark.html`.
* Implemented chart rendering logic in `benchmark.js` to display execution times.
* Styled new chart elements in `app.css` for better visual representation.
* Updated benchmark timestamps and artifact paths.
* Revised performance metrics for various encoding scenarios.
* Improved clarity in benchmark comparisons for better analysis.
* No code changes were made in this commit.
* This serves as a placeholder for future updates.
…culation

* Introduced caching for computed divisors to improve performance.
* Refactored `ComputeRemainder` to use a more efficient approach with spans.
* Enhanced readability and maintainability of the code.

feat(QrPngRenderer): 🎨 Improve PNG rendering efficiency

* Replaced for-loops with while-loops for better performance in `QrPngRenderer`.
* Implemented run-length encoding for dark modules to reduce processing time.
* Enhanced pixel filling logic to minimize redundant operations.
* Refactored multiple code segments to enhance readability.
* Improved organization of files and directories.
* Removed redundant code snippets to streamline functionality.
* Added a script to manage theme settings based on local storage.
* Introduced CSS variables for dark and light themes.
* Updated navigation links for brevity and added a new "Benchmarks" section.
* Improved loading of CSS for better performance.

feat(qr): 🚀 Optimize QR processing with aggressive inlining

* Added `[MethodImpl(MethodImplOptions.AggressiveInlining)]` to critical methods in `QrGrayImage`, `QrMath`, and `QrPerspectiveTransform` for performance improvements.
* Refactored pixel decoding logic in `QrPixelDecoder` and `QrPixelSampling` to enhance efficiency and readability.
* Updated checks for finite values in pixel decoding to improve robustness.

docs: 📝 Update README with target framework feature matrix

* Added a feature matrix for `netstandard2.0`, `net472`, `net8.0`, and `net10.0`.
* Clarified feature availability across different frameworks.
* Refactored the pixel decoding process to enhance clarity and efficiency.
* Simplified the handling of threshold maps and inversion logic.
* Optimized stack management during area calculations to prevent overflow.
* Improved boundary checks for pixel sampling to ensure robustness.
* Enhanced readability by reducing nested loops and redundant calculations.
…nd sampling logic

* Consolidated budget expiration checks across multiple methods for improved readability and performance.
* Enhanced sampling logic in `QrPixelSampling` to reduce redundant calculations and improve clarity.
* Introduced new variables to streamline calculations and minimize repeated expressions.
* Improved handling of threshold checks to ensure consistent behavior across different sampling scenarios.
* Refactored checks for finite values to consolidate conditions.
* Enhanced handling of small denominators to prevent division errors.
* Optimized budget checks by reducing redundant calls within loops.
* Introduced early exit for budget expiration to improve performance.
* Utilized stackalloc for pixel coordinate storage to reduce heap allocations.
* Changed the failure reason from `DecodeFailureReason.Error` to `DecodeFailureReason.PlatformNotSupported` in `QrImageDecoder` methods.
* Updated README to clarify that `QrImageDecoder.TryDecodeImage(...)` and `QrDecoder.TryDecode(...)` are only available in .NET 8 and later.
* Added information about runtime checks via `CodeGlyphXFeatures` for better diagnostics.
* Set up the project structure.
* Prepare for future code changes.
… clarity

* Updated feature availability descriptions in documentation.
* Added detailed notes on benchmark scenarios and results.
* Introduced new options for QR rendering, including module shape and logo customization.
* Enhanced README with target framework guidance and benchmark focus.
* Improved XML documentation for QR builder methods.
…mance

* Introduced new benchmarks for resampled QR images and images without quiet zones.
* Added methods to build resampled and no-quiet QR samples.
* Enhanced the QR decoding process by utilizing `ArrayPool` for memory management.
* Updated documentation to clarify benchmark run differences between quick and full runs.
* Improved error correction logic in the Reed-Solomon decoder for better performance.
* Introduced `Download-ImageSamples.ps1` for fetching image samples.
* Created `manifest.json` to define sample entries with metadata.
* Added `README.md` for instructions on using the download script.
* Implemented `ImageDecodeSamplesTests` to validate image decoding functionality.
…allation

* Introduced a new job step to download image samples on Linux runners.
* Updated Playwright installation to include the `serve` package.
* Enhanced `.gitignore` to exclude various image formats used in tests.
* Improved image decoding tests for better error handling and validation.
@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot
18.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@PrzemyslawKlys PrzemyslawKlys merged commit a8c9da9 into master Jan 25, 2026
6 of 7 checks passed
@PrzemyslawKlys PrzemyslawKlys deleted the perf/qr-optimizations-20260123 branch January 25, 2026 08:55