Describe the enhancement requested
The lpad_utf8_int32_utf8 and rpad_utf8_int32_utf8 functions have a memory safety issue and two performance inefficiencies.
Memory safety issue:
When the fill string is longer than the padding space needed, the initial memcpy writes more bytes than allocated, causing a buffer overflow.
Performance issues:
- Single-byte fill: Iterates character-by-character even for single-byte fills like space padding, when a single memset call would suffice.
- Multi-byte fill: Copies the fill pattern character-by-character in O(n) iterations instead of using a doubling strategy with O(log n) memcpy calls.
Proposed fixes:
- Use std::min(fill_text_len, total_fill_bytes) for the initial copy to prevent overflow
- Add single-byte fill fast path using memset
- Replace character-by-character loop with doubling strategy for multi-byte fills
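
The three fixes above could be combined roughly as in the sketch below. This is an illustration only, not the actual Gandiva code: the helper name fill_padding and its signature are hypothetical, and it works purely at the byte level, assuming the caller has already computed total_fill_bytes so that the padding ends on a UTF-8 character boundary.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of Gandiva): writes exactly total_fill_bytes
// bytes of repeated fill_text into dst. Assumes dst has room for
// total_fill_bytes bytes and does not overlap fill_text.
inline void fill_padding(char* dst, std::size_t total_fill_bytes,
                         const char* fill_text, std::size_t fill_text_len) {
  if (total_fill_bytes == 0 || fill_text_len == 0) return;

  // Fast path: a single-byte fill (e.g. space padding) is one memset.
  if (fill_text_len == 1) {
    std::memset(dst, static_cast<unsigned char>(fill_text[0]), total_fill_bytes);
    return;
  }

  // Clamp the initial copy so a fill string longer than the padding
  // space cannot write past the end of the buffer.
  std::size_t copied = std::min(fill_text_len, total_fill_bytes);
  std::memcpy(dst, fill_text, copied);

  // Doubling: copy the already-written prefix onto the region right after
  // it. The chunk never exceeds `copied`, so source and destination do not
  // overlap, and the number of memcpy calls grows as O(log n).
  while (copied < total_fill_bytes) {
    std::size_t chunk = std::min(copied, total_fill_bytes - copied);
    std::memcpy(dst + copied, dst, chunk);
    copied += chunk;
  }
}
```

Because the written prefix is always a whole number of fill patterns until the final, clamped copy, the repeated pattern stays aligned across each doubling step.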
Component(s)
C++, Gandiva