Enhancements
Implemented an ultra-fast WER-only path that uses a space-optimized, two-row dynamic programming algorithm and reuses buffers across a batch. Added four new functions (`calculations_wer_only()`, `_calculations_wer_only_reuse_ptr()`, `_metrics_batch_wer_only()`, `metrics_wer_only()`) that skip the backtrace entirely and use O(n) memory instead of O(m×n) for sequences of length m and n. The optimization swaps row pointers instead of copying values and reuses the DP buffers across an entire batch. This provides significant performance gains for `wer()` and `wers()`, since these functions need only the WER metric, not error counts or word lists.

Fixed a portability issue in WER-only batch processing by replacing platform-dependent `int*` pointers with guaranteed 32-bit `cnp.int32_t*` pointers. This ensures correct behavior on platforms where `sizeof(int)` may differ from 4 bytes, and it removes unnecessary type casts, so the code follows NumPy/Cython best practices.

Expanded benchmarking support by adding optional third-party WER libraries (`pywer`, `evaluate`, `universal-edit-distance`, `torchmetrics`) to `pyproject.toml` under the `benchmarks` extra. Updated `benchmark_synthetic_data_local.py` to import optional dependencies safely, keep every benchmark function defined whether or not its package is installed, and enforce consistent numeric return types. This fixes static-analysis warnings, prevents runtime errors when optional packages are missing, and makes cross-package performance comparisons more comprehensive and reliable.

Standardized all Levenshtein dynamic programming buffers and memoryviews to use `cnp.int32_t` instead of platform-dependent `int`. This keeps their dtype strictly aligned with NumPy `int32` arrays, removes undefined behavior on platforms where `sizeof(int) != 4`, and improves type safety without affecting performance.
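The two-row WER-only idea can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the project's Cython implementation: `wer_only`, `wers_batch`, and `alloc_rows` are hypothetical names, `ctypes.c_int32` arrays stand in for the fixed-width `cnp.int32_t` buffers, and an empty reference is assumed to raise `ValueError`. Only the previous and current rows of the Levenshtein table are kept, the rows are swapped by reference after each reference word, and no backtrace is stored, so memory grows with the hypothesis length rather than with the product of both lengths.

```python
import ctypes
from typing import List, Optional, Sequence, Tuple


def alloc_rows(width: int) -> Tuple[ctypes.Array, ctypes.Array]:
    """Allocate two zero-filled DP rows of fixed-width 32-bit integers.

    ctypes.c_int32 is always 4 bytes (like cnp.int32_t), unlike ctypes.c_int,
    which follows the platform's C `int`.
    """
    row_type = ctypes.c_int32 * width
    return row_type(), row_type()


def wer_only(ref: Sequence[str], hyp: Sequence[str],
             prev: Optional[ctypes.Array] = None,
             curr: Optional[ctypes.Array] = None) -> float:
    """WER from a two-row word-level Levenshtein DP; no backtrace is kept.

    `prev` and `curr` may be preallocated rows with at least len(hyp) + 1 cells.
    """
    m, n = len(ref), len(hyp)
    if m == 0:
        # Assumed convention for this sketch: WER is undefined for an empty reference.
        raise ValueError("reference must contain at least one word")
    if prev is None or curr is None:
        prev, curr = alloc_rows(n + 1)
    for j in range(n + 1):
        prev[j] = j  # empty reference prefix -> j insertions
    for i in range(1, m + 1):
        curr[0] = i  # i deletions against an empty hypothesis prefix
        ref_word = ref[i - 1]
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (ref_word != hyp[j - 1])  # +0 on a match
            curr[j] = min(prev[j] + 1,       # deletion
                          curr[j - 1] + 1,   # insertion
                          substitute)
        prev, curr = curr, prev  # swap row references; no values are copied
    return prev[n] / m  # after the final swap, `prev` holds the last computed row


def wers_batch(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> List[float]:
    """Per-pair WER for a batch; one pair of rows is allocated and reused throughout."""
    width = max((len(hyp) for _, hyp in pairs), default=0) + 1
    prev, curr = alloc_rows(width)
    return [wer_only(ref, hyp, prev, curr) for ref, hyp in pairs]
```

Sizing the rows once for the longest hypothesis in the batch is what lets every pair share the same buffers; stale values from a previous pair are harmless because each cell is written before it is read.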
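The guarded-import pattern described for the benchmark script might look like the following sketch. `optional_import`, `make_benchmark`, and `Benchmark` are hypothetical helpers, not the script's actual code, and returning `math.nan` for a missing package is an assumed way to keep the return type numeric. The library-specific WER call is passed in as `compute`, so the sketch itself does not depend on any third-party API.

```python
import importlib
import math
from types import ModuleType
from typing import Callable, NamedTuple, Optional, Sequence


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, else None (never raises ImportError)."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class Benchmark(NamedTuple):
    name: str
    available: bool
    run: Callable[[Sequence[str], Sequence[str]], float]


def make_benchmark(
    module_name: str,
    compute: Callable[[ModuleType, Sequence[str], Sequence[str]], float],
) -> Benchmark:
    """Build a benchmark that is always defined, whether or not its package is installed."""
    module = optional_import(module_name)

    def run(refs: Sequence[str], hyps: Sequence[str]) -> float:
        if module is None:
            return math.nan  # package missing: still a float, so callers need no special case
        # Coerce library-specific scalar types to a plain float for consistent comparisons.
        return float(compute(module, refs, hyps))

    return Benchmark(module_name, module is not None, run)
```

A caller can then skip entries whose `available` flag is false instead of wrapping each library call in its own `try`/`except`.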