Implement language-aware word counting #10284
Conversation
Have you done any benchmarks to see how much slower this is? This code is executed on each source string change. Maybe we want to use this for a few affected languages only?
Can you please add tests for the East Asian languages, so that we can verify it works as expected? I've just added a test for the current implementation in f790db2.
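A test along these lines could cover the East Asian cases. This is only a sketch: `count_words` is a hypothetical stand-in for whatever helper the PR ends up with, and the CJK ranges are illustrative, not the final set.

```python
import re

# Hypothetical helper standing in for Weblate's word-counting implementation;
# the CJK ranges below are illustrative only (Hiragana/Katakana, CJK Extension A,
# CJK Unified Ideographs, Hangul Syllables).
CJK = re.compile(r"([\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uAC00-\uD7AF]+)")

def count_words(text):
    count = 0
    for i, part in enumerate(CJK.split(text)):
        # Even indices are non-CJK runs (counted as whitespace-split words),
        # odd indices are CJK runs (counted one word per character).
        count += len(part) if i % 2 else len(part.split())
    return count

assert count_words("Hello world") == 2
assert count_words("你好世界") == 4            # four CJK characters
assert count_words("Weblate 是 翻译 工具") == 6  # one Latin word + five CJK characters
```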
```diff
@@ -40,6 +40,7 @@ pyparsing>=3.1.1,<3.2
 python-dateutil>=2.8.1
 python-redis-lock[django]>=4,<4.1
 rapidfuzz>=2.6.0,<3.5
+regex>=1.0
```
There is no such version; please choose something reasonably recent as the lower bound. Adding an upper bound is also a good idea to avoid accidental breakage.
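For example, the pin might look like the following (the version numbers are illustrative; regex uses date-based releases, so any reasonably recent release works as the lower bound):

```text
regex>=2023.10.3,<2024
```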
@nijel Hi, as I'm not familiar enough with Python to benchmark a specific method in Django, I made a small script to test each piece of core logic. Also thanks for your suggestion in #10278 (comment).

```python
import regex  # re does not support script extensions
import unicodedataplus  # unicodedata does not support script extensions
import pyperf

en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""

monogram = r"[\p{scx=Hani}\p{scx=Hang}\p{scx=Hira}\p{scx=Kana}\p{scx=Bopo}]"
splitter = regex.compile(
    rf"(?<!^)(?:\s+|(?<={monogram})(?=\S)|(?={monogram}))(?!$)", flags=regex.U | regex.V1
)
monolist = set(['Hani', 'Hang', 'Hira', 'Kana', 'Bopo'])


def simple_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        len(txt.split())
    return pyperf.perf_counter() - t0


def regex_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        len(splitter.split(txt))
    return pyperf.perf_counter() - t0


def loop_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        was_asian = False
        was_space = True
        for ch in txt:
            scx = unicodedataplus.script_extensions(ch)
            asian = not monolist.isdisjoint(scx)
            space = ch.isspace()
            if asian or ((was_asian or was_space) and not space):
                count += 1
            was_asian = asian
            was_space = space
        # return count
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func('simple split en', simple_split, en)
    runner.bench_time_func('simple split zh (wrong)', simple_split, zh)
    runner.bench_time_func('regex split en', regex_split, en)
    runner.bench_time_func('regex split zh', regex_split, zh)
    runner.bench_time_func('loop split en', loop_split, en)
    runner.bench_time_func('loop split zh', loop_split, zh)
```

Result on my machine (WSL2):

It seems that script extension lookup is quite heavyweight anyway, and I wonder if hardcoding a code point range would improve anything.
Thanks for the benchmark! I've added the code from SO as well to the test (see below) and here are my results:

So the unicodedata-only solution seems the fastest out of these, but it is still 30× slower than the simple split. We can still fall back to the current implementation for most of the languages if that is an issue.

benchmark.py:

```python
import regex  # re does not support script extensions
import unicodedataplus  # unicodedata does not support script extensions
import pyperf
import unicodedata

en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""

monogram = r"[\p{scx=Hani}\p{scx=Hang}\p{scx=Hira}\p{scx=Kana}\p{scx=Bopo}]"
splitter = regex.compile(
    rf"(?<!^)(?:\s+|(?<={monogram})(?=\S)|(?={monogram}))(?!$)", flags=regex.U | regex.V1
)
monolist = set(['Hani', 'Hang', 'Hira', 'Kana', 'Bopo'])


def simple_split(txt):
    return len(txt.split())


def regex_split(txt):
    return len(splitter.split(txt))


def loop_split(txt):
    count = 0
    was_asian = False
    was_space = True
    for ch in txt:
        scx = unicodedataplus.script_extensions(ch)
        asian = not monolist.isdisjoint(scx)
        space = ch.isspace()
        if asian or ((was_asian or was_space) and not space):
            count += 1
        was_asian = asian
        was_space = space
    return count


def loop_native(txt):
    wordcount = 0
    start = True
    for c in txt:
        cat = unicodedata.category(c)
        if cat == 'Lo':  # Letter, other
            wordcount += 1  # each letter counted as a word
            start = True
        elif cat[0] == 'Z':  # Some kind of separator
            start = True
        elif cat[0] != 'P':  # Anything that is not punctuation
            # Everything else
            if start:
                wordcount += 1  # Only count at the start
            start = False
    return wordcount


def wrapper(loops, callback, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        callback(txt)
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    for func in (simple_split, regex_split, loop_split, loop_native):
        name = func.__name__
        runner.bench_time_func(f'{name} en', wrapper, func, en)
        runner.bench_time_func(f'{name} zh', wrapper, func, zh)
```
I also tried two versions:

The two strategies (block and scx) differ in details, but that should be fine, as it only changes the count of modifier characters between two CJK characters.

test_split.py:

```python
import unicodedataplus  # unicodedata supports neither script extensions nor blocks
import pyperf

en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""

monolist = set(['Hani', 'Hang', 'Hira', 'Kana', 'Bopo'])
monoset = set([chr(m) for m in range(0x110000) if not monolist.isdisjoint(unicodedataplus.script_extensions(chr(m)))])
cjkset = set([
    'CJK Unified Ideographs',
    'CJK Unified Ideographs Extension A',
    'CJK Unified Ideographs Extension B',
    'CJK Unified Ideographs Extension C',
    'CJK Unified Ideographs Extension D',
    'CJK Unified Ideographs Extension E',
    'CJK Unified Ideographs Extension F',
    'CJK Unified Ideographs Extension G',
    'CJK Unified Ideographs Extension H',
    'CJK Unified Ideographs Extension I',
    'CJK Compatibility',
    'CJK Compatibility Forms',
    'CJK Compatibility Ideographs',
    'CJK Compatibility Ideographs Supplement',
    'CJK Radicals Supplement',
    'CJK Strokes',
    'CJK Symbols and Punctuation',
    'Hiragana',
    'Katakana',
    'Katakana Phonetic Extensions',
    'Kana Extended-A',
    'Kana Extended-B',
    'Kana Supplement',
    'Small Kana Extension',
    'Hangul Jamo',
    'Hangul Compatibility Jamo',
    'Hangul Jamo Extended-A',
    'Hangul Jamo Extended-B',
    'Hangul Syllables',
    'Halfwidth and Fullwidth Forms',
    'Enclosed CJK Letters and Months',
    'Enclosed Ideographic Supplement',
    'Kangxi Radicals',
    'Ideographic Description Characters',
])


def simple_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        len(txt.split())
    return pyperf.perf_counter() - t0


def loop_scxset(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        was_break = True
        for ch in txt:
            asian = ch in monoset
            space = ch.isspace()
            if asian or (was_break and not space):
                count += 1
            was_break = asian or space
        # return count
    return pyperf.perf_counter() - t0


def loop_block(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        was_break = True
        for ch in txt:
            asian = unicodedataplus.block(ch) in cjkset
            space = ch.isspace()
            if asian or (was_break and not space):
                count += 1
            was_break = asian or space
        # return count
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func('simple split en', simple_split, en)
    runner.bench_time_func('simple split zh (wrong)', simple_split, zh)
    runner.bench_time_func('loop scxset en', loop_scxset, en)
    runner.bench_time_func('loop scxset zh', loop_scxset, zh)
    runner.bench_time_func('loop block en', loop_block, en)
    runner.bench_time_func('loop block zh', loop_block, zh)
```

Note that the specific logic from SO is not accurate, mostly because it also counts each piece of Asian punctuation as a single word. The category-based approach also unintentionally affects a wide range of unrelated Arabic and Indic characters. With the same examples:

```python
en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""
```
I doubt that the current word counting is 100% compatible with LibreOffice, and I don't know whether we should aim at that. Honestly, my original expectation was to find a Python package implementing this, but apparently none exists. If we were that concerned about performance, implementing it in C or interfacing the Rust crate words-count would be the way to go.
I would be fine with tens of microseconds each time a string is saved, unless it runs for all strings every time. So, if it is acceptable to carry around an additional large (>5 MB) object in the program, I think the precomputation approach is more efficient; otherwise matching by block seems better.
Speaking of memory usage, both regex and unicodedataplus take about 2 MB, and the pre-computed list about 4 MB. In the end, I think the best approach would be to pre-compute the list statically and include it in the code. We're already doing that for other purposes (though with a smaller set).
But anyway, we should first focus on how to actually count the words and then look at the implementation. Let's do this in #10278.
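Pre-computing the list statically could look roughly like the following sketch. The predicate and output format are assumptions; in practice the predicate would call unicodedataplus.script_extensions in a one-off generation script, and the emitted ranges would be committed to the repository so the runtime needs no extra dependency.

```python
# One-off generator sketch: collapse code points matching a predicate into
# contiguous (start, end) ranges that can be committed as static data.
def contiguous_ranges(predicate, limit=0x110000):
    ranges = []
    start = prev = None
    for cp in range(limit):
        if predicate(cp):
            if start is None:
                start = cp
            prev = cp
        elif start is not None:
            # The run just ended; record it.
            ranges.append((start, prev))
            start = None
    if start is not None:
        ranges.append((start, prev))
    return ranges

# Toy predicate for illustration only: treat the Hiragana block and the
# CJK Unified Ideographs block as "Asian".
def toy(cp):
    return 0x3040 <= cp <= 0x309F or 0x4E00 <= cp <= 0x9FFF

assert contiguous_ranges(toy) == [(0x3040, 0x309F), (0x4E00, 0x9FFF)]
```

The resulting tuple of ranges could then be rendered as Python source and compiled into a plain `re` character class at import time.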
Closing this PR as this won't be the final solution.
I found out that lookbehind (

test_split.py:

```python
import regex
import pyperf

en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""

splitter1 = regex.compile(r"([\p{scx=Hani}\p{scx=Hang}\p{scx=Hira}\p{scx=Kana}\p{scx=Bopo}\p{InHalfAndFullForms}]+)")
splitter2 = regex.compile(r"(\s+)|[\p{scx=Hani}\p{scx=Hang}\p{scx=Hira}\p{scx=Kana}\p{scx=Bopo}\p{InHalfAndFullForms}]+")


def simple_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        len(txt.split())
    return pyperf.perf_counter() - t0


def loop_split_both(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        even = True
        for sec in splitter1.split(txt):
            if even:
                count += len(sec)
            else:
                count += len(sec.split())
            even = not even
        # return count
    return pyperf.perf_counter() - t0


def loop_split_noncjk(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = len(txt)
        even = True
        for sec in splitter2.split(txt):
            if even and sec:
                count -= len(sec) - 1
            else:
                count -= len(sec or '')
            even = not even
        # return count
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func('simple split en', simple_split, en)
    runner.bench_time_func('simple split zh (wrong)', simple_split, zh)
    runner.bench_time_func('regex partition en', loop_split_both, en)
    runner.bench_time_func('regex partition zh', loop_split_both, zh)
    runner.bench_time_func('regex & subtract en', loop_split_noncjk, en)
    runner.bench_time_func('regex & subtract zh', loop_split_noncjk, zh)
```
Anyway, most of the word counting will happen on English strings (as that is typically the source language), so the performance there should be the main focus. Counting Chinese strings fast is nice, but not that relevant for Weblate. Performance-wise, using the native `re`:

```python
import re

splitter_re = re.compile(r"([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\U00020000-\U0002F7FF\uFF00-\uFFEF\u3000-\u303F]+)")


def loop_split_both_re(txt):
    # logic here
    count = 0
    even = True
    for sec in splitter_re.split(txt):
        count += len(sec.split()) if even else len(sec)
        even = not even
    return count
```

Do you see any issues with this approach?
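The re-based approach above can be sanity-checked on a small mixed string like this (a quick illustration of the even/odd split semantics, using the same pattern):

```python
import re

# Same pattern and counting logic as in the comment above.
splitter_re = re.compile(r"([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\U00020000-\U0002F7FF\uFF00-\uFFEF\u3000-\u303F]+)")

def loop_split_both_re(txt):
    count = 0
    even = True
    for sec in splitter_re.split(txt):
        # Even sections are non-CJK text (split on whitespace),
        # odd sections are CJK runs (one word per character).
        count += len(sec.split()) if even else len(sec)
        even = not even
    return count

print(loop_split_both_re("hello world"))  # → 2 (two Latin words)
print(loop_split_both_re("hello 世界"))   # → 3 (one Latin word + two CJK characters)
```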
Thank you for the correction and suggestion. It seems translating to

test_split.py:

```python
import re
import regex
import pyperf

en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""

splitter1 = regex.compile(r"([\p{scx=Hani}\p{scx=Hang}\p{scx=Hira}\p{scx=Kana}\p{scx=Bopo}\p{InHalfAndFullForms}]+)")
splitter2 = regex.compile(r"(\s+)|[\p{scx=Hani}\p{scx=Hang}\p{scx=Hira}\p{scx=Kana}\p{scx=Bopo}\p{InHalfAndFullForms}]+")
splitter_re = re.compile(r"([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\U00020000-\U0002F7FF\uFF00-\uFFEF\u3000-\u303F]+)")

cjkcps = [c for c in range(0x110000) if splitter1.fullmatch(chr(c))]
cjkranges = []
prev = None
start = None
for cp in cjkcps:
    if start is None:
        start = cp
        prev = cp
    elif prev == cp - 1:
        prev = cp
    else:
        cjkranges.append((start, prev))
        start = cp
        prev = cp
if start is not None:
    cjkranges.append((start, prev))
splitter_nonscx = re.compile(rf"([{''.join(['-'.join([chr(c) for c in tup]) for tup in cjkranges])}]+)")


def simple_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        len(txt.split())
    return pyperf.perf_counter() - t0


def loop_split_both(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        even = True
        for sec in splitter1.split(txt):
            if even:
                count += len(sec.split())
            else:
                count += len(sec)
            even = not even
        # return count
    return pyperf.perf_counter() - t0


def loop_split_both_re(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        even = True
        for sec in splitter_re.split(txt):
            if even:
                count += len(sec.split())
            else:
                count += len(sec)
            even = not even
        # return count
    return pyperf.perf_counter() - t0


def loop_split_both_nonscx(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        even = True
        for sec in splitter_nonscx.split(txt):
            if even:
                count += len(sec.split())
            else:
                count += len(sec)
            even = not even
        # return count
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func('simple split en', simple_split, en)
    runner.bench_time_func('simple split zh (wrong)', simple_split, zh)
    runner.bench_time_func('regex partition en', loop_split_both, en)
    runner.bench_time_func('regex partition zh', loop_split_both, zh)
    runner.bench_time_func('re given pat en', loop_split_both_re, en)
    runner.bench_time_func('re given pat zh', loop_split_both_re, zh)
    runner.bench_time_func('re equivalent en', loop_split_both_nonscx, en)
    runner.bench_time_func('re equivalent zh', loop_split_both_nonscx, zh)
```

where my pattern should translate roughly to:

```python
r"([\U000002EA-\U000002EB\U00001100-\U000011FF\U00002E80-\U00002E99\U00002E9B-\U00002EF3\U00002F00-\U00002FD5\U00003001-\U00003003\U00003005-\U00003011\U00003013-\U0000301F\U00003021-\U00003035\U00003037-\U0000303F\U00003041-\U00003096\U00003099-\U000030FF\U00003105-\U0000312F\U00003131-\U0000318E\U00003190-\U000031E3\U000031F0-\U0000321E\U00003220-\U00003247\U00003260-\U0000327E\U00003280-\U000032B0\U000032C0-\U000032CB\U000032D0-\U00003370\U0000337B-\U0000337F\U000033E0-\U000033FE\U00003400-\U00004DBF\U00004E00-\U00009FFF\U0000A700-\U0000A707\U0000A960-\U0000A97C\U0000AC00-\U0000D7A3\U0000D7B0-\U0000D7C6\U0000D7CB-\U0000D7FB\U0000F900-\U0000FA6D\U0000FA70-\U0000FAD9\U0000FE45-\U0000FE46\U0000FF00-\U0000FFEF\U00016FE2-\U00016FE3\U00016FF0-\U00016FF1\U0001AFF0-\U0001AFF3\U0001AFF5-\U0001AFFB\U0001AFFD-\U0001AFFE\U0001B000-\U0001B122\U0001B132-\U0001B132\U0001B150-\U0001B152\U0001B155-\U0001B155\U0001B164-\U0001B167\U0001D360-\U0001D371\U0001F200-\U0001F200\U0001F250-\U0001F251\U00020000-\U0002A6DF\U0002A700-\U0002B739\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002EBF0-\U0002EE5D\U0002F800-\U0002FA1D\U00030000-\U0003134A\U00031350-\U000323AF]+)"
```

If it is a sign that fragmented code point ranges slow down the regexp, we can probably consider
Okay... so reducing range complexity does mean a lot. Manually collapsing the block ranges generated by #10284 (comment) into:

```python
splitter_temp_unicodedataplus = re.compile(r"([\U00001100-\U000011FF\U00002E80-\U00002FDF\U00002FF0-\U00009FFF\U0000A960-\U0000A97F\U0000AC00-\U0000D7FF\U0000F900-\U0000FAFF\U0000FE30-\U0000FE4F\U0000FF00-\U0000FFEF\U0001AFF0-\U0001B16F\U0001F200-\U0001F2FF\U00020000-\U0003FFFF]+)")
```

runs like:

I believe the equivalent result can somehow be computed automatically.
I have figured out a couple of things:

Current best:

test_split.py:

```python
import re
import tinyunicodeblock
import pyperf

en = """Cortana was demonstrated for the first time at the Microsoft Build developer conference in San Francisco in April 2014. It was launched as a key ingredient of Microsoft's planned "makeover" of future operating systems for Windows Phone and Windows."""
zh = """小娜在2014年4月2日举行的微软Build开发者大会上正式展示并发布。2014年中旬,微软发布了“小娜”这一名字,作为Cortana在中国大陆使用的中文名。与这一中文名一起发布的是小娜在中国大陆的另一个形象。“小娜”一名源自微软旗下知名FPS游戏《光环》中的同名女角色。"""

cjkset = set([
    'CJK Unified Ideographs',
    'CJK Unified Ideographs Extension A',
    # 'CJK Unified Ideographs Extension B',  # assumes entire Plane 2-3 would be CJK
    # 'CJK Unified Ideographs Extension C',
    # 'CJK Unified Ideographs Extension D',
    # 'CJK Unified Ideographs Extension E',
    # 'CJK Unified Ideographs Extension F',
    # 'CJK Unified Ideographs Extension G',
    # 'CJK Unified Ideographs Extension H',
    # 'CJK Unified Ideographs Extension I',
    'CJK Compatibility',
    'CJK Compatibility Forms',
    'CJK Compatibility Ideographs',
    # 'CJK Compatibility Ideographs Supplement',
    'CJK Radicals Supplement',
    'CJK Strokes',
    'CJK Symbols and Punctuation',
    'Hiragana',
    'Katakana',
    'Katakana Phonetic Extensions',
    'Kana Extended-A',
    'Kana Extended-B',
    'Kana Supplement',
    'Small Kana Extension',
    'Hangul Jamo',
    'Hangul Compatibility Jamo',
    'Hangul Jamo Extended-A',
    'Hangul Jamo Extended-B',
    'Hangul Syllables',
    'Halfwidth and Fullwidth Forms',
    'Enclosed CJK Letters and Months',
    'Enclosed Ideographic Supplement',
    'Kangxi Radicals',
    'Ideographic Description Characters',
    'Kanbun',
    'Yijing Hexagram Symbols',  # not strictly necessary, but included for the sake of range continuity
    'Bopomofo',
    'Bopomofo Extended',
])
cjkranges = [(b[0], b[1]) for b in tinyunicodeblock.BLOCKS if b[2] in cjkset]
cjkranges.sort(key=lambda r: ord(r[0]))
cjkmerged = []
prev = None
for r in cjkranges:
    if prev is None:
        prev = r
    elif ord(prev[1]) == ord(r[0]) - 1:
        prev = (prev[0], r[1])
    else:
        cjkmerged.append(prev)
        prev = r
cjkmerged.append(prev)
splitter_nonscx = re.compile(rf"([{''.join([f'{r[0]}-{r[1]}' for r in cjkmerged])}\U00020000-\U0003FFFF]+)")


def simple_split(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        len(txt.split())
    return pyperf.perf_counter() - t0


def loop_split_both_nonscx(loops, txt):
    loops = range(loops)
    t0 = pyperf.perf_counter()
    for _ in loops:
        # logic here
        count = 0
        even = True
        for sec in splitter_nonscx.split(txt):
            if even:
                count += len(sec.split())
            else:
                count += len(sec)
            even = not even
        # return count
    return pyperf.perf_counter() - t0


if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func('simple split en', simple_split, en)
    runner.bench_time_func('simple split zh (wrong)', simple_split, zh)
    runner.bench_time_func('re tinyunicodeblock en', loop_split_both_nonscx, en)
    runner.bench_time_func('re tinyunicodeblock zh', loop_split_both_nonscx, zh)
```
Yes, I tried to collapse the ranges, but I really didn't attempt to make the set complete, so some ranges were most likely missing. On the other hand, I intentionally included some reserved blocks, because it really doesn't matter in this case (the behavior of reserved blocks is not defined, so let's choose what performs better).
My last attempt automatically consolidates the described blocks into consecutive ranges as much as possible, with some educated heuristics, so what you see is the performance of a feature-complete version (although the generation logic should be written more cleanly). Do you see any more room for optimization?
I think this is fine performance-wise. I'd like to avoid
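For reference, the dependency-free direction discussed in this thread might be sketched as follows, using only the standard library `re`. The character ranges are the illustrative collapsed ones from the earlier benchmarks, not a vetted final set, and `count_words` is a hypothetical name, not the function ultimately merged into Weblate.

```python
import re

# Collapsed CJK block ranges taken from the benchmark discussion above;
# the final set used by Weblate may differ.
CJK_PATTERN = re.compile(
    r"([\u1100-\u11FF\u2E80-\u2FDF\u2FF0-\u9FFF\uA960-\uA97F"
    r"\uAC00-\uD7FF\uF900-\uFAFF\uFE30-\uFE4F\uFF00-\uFFEF"
    r"\U0001AFF0-\U0001B16F\U0001F200-\U0001F2FF\U00020000-\U0003FFFF]+)"
)

def count_words(text):
    """Count CJK characters as one word each; split the rest on whitespace."""
    count = 0
    for i, chunk in enumerate(CJK_PATTERN.split(text)):
        # Even chunks are non-CJK text, odd chunks are CJK runs.
        count += len(chunk) if i % 2 else len(chunk.split())
    return count

print(count_words("Hello world"))  # → 2
print(count_words("你好 world"))   # → 3
```

English-only strings take the fast path (a single failed regex scan plus `str.split`), which matches the observation above that English source strings dominate.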
Proposed changes
See #10278.
Any suggestions are welcome. Especially whether:
Checklist
Other information