Limit index fields in hhk file of chm file. #9919

albert-github · 2023-03-15T14:34:12Z

Based on some comments in #9894 where a warning was shown like

Warning: Keyword string:
...
is too long.  The maximum size is 488 characters.

it was found that this was due to an extremely long value field in index.hhk (the index file). This change limits that field and appends an ellipsis when necessary.


doxygen · 2023-03-18T11:57:41Z

@albert-github Kept the changes a bit more local, see b8db19d

albert-github · 2023-03-18T12:10:28Z

@doxygen,

I've been thinking about this as well, but it might lead to problems when s.left(400) contains characters that the conversion code transforms into &lt;, &gt;, &amp; etc., so that the result might still overflow.
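
To illustrate the concern, here is a minimal sketch. It assumes a simplified `escapeHtml` helper (hypothetical, standing in for doxygen's `convertToHtml`) and uses `std::string` instead of `QCString`. It shows that truncating first and escaping afterwards can still produce a string well over the 488-character limit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for convertToHtml: escapes only <, > and &.
std::string escapeHtml(const std::string &s)
{
  std::string out;
  for (char c : s)
  {
    switch (c)
    {
      case '<': out += "&lt;";  break;
      case '>': out += "&gt;";  break;
      case '&': out += "&amp;"; break;
      default:  out += c;       break;
    }
  }
  return out;
}

// Truncate to 400 bytes first, then escape: the order questioned above.
std::string truncateThenEscape(const std::string &s)
{
  return escapeHtml(s.substr(0, 400));
}
```

With an input that consists only of `<` characters, each of the 400 kept bytes becomes the four-character `&lt;`, so the escaped result is 1600 characters long even though the input was truncated.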

albert-github · 2023-03-18T12:28:36Z

Previous comment has been reformulated.

doxygen · 2023-03-18T12:53:39Z

@albert-github Good point. I made this fix 0b5c6f6, which assumes this is a very rare case so it does not have to be very efficient.

albert-github · 2023-03-18T14:56:12Z

@doxygen I think this is better, but it is still not good.
Say the length of s is 400 or less: then we directly run return convertToHtml(s,true);, but convertToHtml can produce a string that is longer than 488 characters. This will be a rare case, but it can still happen.

doxygen · 2023-03-18T15:02:12Z

@albert-github Correct, so when that happens now, the do..while() loop will start reducing the input size by 10 characters at a time, until the result after conversion is less than 450 characters.

albert-github · 2023-03-18T15:09:28Z

@doxygen, I don't think your remark "correct" is correct as we will never get into the do ... while() loop but directly jump to the else part.

doxygen · 2023-03-18T15:43:47Z

@albert-github Ah now I see what you mean. Indeed, that's not ok. We should also use the length after conversion. Second try f482317.

albert-github · 2023-03-18T16:24:24Z

@doxygen I think this will work. For a small efficiency improvement I think it would even be better to change the lines:

      result = convertToHtml(s.left(maxLen));
      maxLen-=20;

into

      maxLen-=20;
      result = convertToHtml(s.left(maxLen));

as otherwise the initial conversion will be done twice when the initial conversion is too long. Alternatively, the while loop should have its condition at the end (so a do ... while() loop).

doxygen · 2023-03-19T11:08:14Z

@albert-github Doing the maxLen-=20 earlier would be the same as starting with maxLen=380. The idea is that, by truncating at maxLen and comparing the length after conversion against maxExpandedLen, the result is very likely already below maxExpandedLen after the first iteration of the while loop (that is, the expansion adds less than 50 extra characters). Only if this is not the case is the gap between maxLen and maxExpandedLen increased in steps of 20 bytes, until the expansion no longer exceeds the gap. The initial gap should be such that it is likely already enough the first time. What the right value is to cover, say, >95% of the cases, I don't know, so I assumed 50.
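
The strategy described in this comment can be sketched roughly as follows. This is not the code from the commit: `escapeHtml` is a hypothetical, simplified stand-in for `convertToHtml`, `std::string` replaces `QCString`, and the three-dot ellipsis appended on truncation is an assumption based on the pull request description.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for convertToHtml: escapes only <, > and &.
std::string escapeHtml(const std::string &s)
{
  std::string out;
  for (char c : s)
  {
    switch (c)
    {
      case '<': out += "&lt;";  break;
      case '>': out += "&gt;";  break;
      case '&': out += "&amp;"; break;
      default:  out += c;       break;
    }
  }
  return out;
}

// Escape s and keep the result below the 488-character keyword limit.
std::string escapeAndTruncate(const std::string &s)
{
  std::size_t maxLen = 400;               // input bytes kept when truncating
  const std::size_t maxExpandedLen = 450; // allowed length after escaping

  std::string result = escapeHtml(s);     // first try: the full string
  bool truncated = false;
  while (result.size() > maxExpandedLen && maxLen > 0)
  {
    // substr counts bytes; this sketch does not guard against
    // cutting a multi-byte UTF-8 sequence in half.
    result = escapeHtml(s.substr(0, maxLen));
    truncated = true;
    maxLen = (maxLen >= 20) ? maxLen - 20 : 0; // widen the gap by 20 bytes
  }
  if (truncated) result += "...";         // assumed ellipsis marker
  return result;
}
```

Because the length check is always made on the escaped result, an input shorter than 400 bytes that expands heavily is shortened as well, which is the case raised earlier in the thread. In the common case a single truncation to 400 bytes is enough and the loop body runs once.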

albert-github · 2023-03-19T11:25:42Z

I see; indeed the first conversion takes the full string, and on failure the first 400 bytes are taken (in the back of my head I had, incorrectly, that the first conversion was already done on the first 400 bytes).

The limit of 400 characters was an arbitrary number (it should be a number less than 488), so even the 450 could be raised to a larger number and the 400 could also be set higher, but I think this is not worth the effort as

- the estimate of 95% is probably even an underestimate; I think it is more like 99%
- a string this long is probably a generated string, and nobody will read / study it

so the current choices are OK for me.

albert-github added bug HTML HTML / XHTML output labels Mar 15, 2023

albert-github mentioned this pull request Mar 15, 2023

config.xml to Chinese #9894

Closed

doxygen merged commit 1b7ea9b into doxygen:master Mar 18, 2023

albert-github deleted the feature/bug_chk_hhk_length branch March 18, 2023 12:12

albert-github added the fixed but not released Bug is fixed in github, but still needs to make its way to an official release label Mar 18, 2023

doxygen removed the fixed but not released Bug is fixed in github, but still needs to make its way to an official release label May 18, 2023
