Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit index fields in hhk file of chm file. #9919

Merged
merged 1 commit into from
Mar 18, 2023

Conversation

albert-github
Copy link
Collaborator

@albert-github albert-github commented Mar 15, 2023

Based on some comments in #9894 where a warning was shown like

Warning: Keyword string:
...
is too long.  The maximum size is 488 characters.

it was found that this was due to an extremely long value field in the index.hhk (index file), limiting this field and placing an ellipse when necessary

Based on some comments in doxygen#9894 where a warning was shown like
```
Warning: Keyword string:
...
is too long.  The maximum size is 488 characters.
```
tit was found that this was due to an extremely long `value` field in the `index.hhk` (index file), limiting this field and placing an ellipse when necessary
@albert-github albert-github added bug HTML HTML / XHTML output labels Mar 15, 2023
@doxygen doxygen merged commit 1b7ea9b into doxygen:master Mar 18, 2023
@doxygen
Copy link
Owner

doxygen commented Mar 18, 2023

@albert-github Kept the changes a bit more local, see b8db19d

@albert-github
Copy link
Collaborator Author

albert-github commented Mar 18, 2023

@doxygen,

I've been thinking about this as well but it might lead to problems in case s.left(400) contains characters in the form that run into the code that transform it into <, &gt, & etc, so that the result might still overflow.

@albert-github albert-github deleted the feature/bug_chk_hhk_length branch March 18, 2023 12:12
@albert-github albert-github added the fixed but not released Bug is fixed in github, but still needs to make its way to an official release label Mar 18, 2023
@albert-github
Copy link
Collaborator Author

Previous comment has been reformulated.

@doxygen
Copy link
Owner

doxygen commented Mar 18, 2023

@albert-github Good point. I made this fix 0b5c6f6, which assumes this is a very rare case so it does not have to be very efficient.

@albert-github
Copy link
Collaborator Author

@doxygen I think this is better not it is not good.
Say that the length of s is 400 or less then we will directly run return convertToHtml(s,true);, but the convertToHtml can result in a string that is larger than 488 characters, this will be a rare case but still it can happen.

@doxygen
Copy link
Owner

doxygen commented Mar 18, 2023

@albert-github correct, so when it now happens, the do..while() loop will start reducing the input size with 10 characters at a time, until the result after conversion is less than 450 characters.

@albert-github
Copy link
Collaborator Author

@doxygen, I don't think your remark "correct" is correct as we will never get into the do ... while() loop but directly jump to the else part.

@doxygen
Copy link
Owner

doxygen commented Mar 18, 2023

@albert-github Ah now I see what you mean. Indeed, that's not ok. We should also use the length after conversion. Second try f482317.

@albert-github
Copy link
Collaborator Author

albert-github commented Mar 18, 2023

@doxygen I think this will work. For a small efficiency improvement I think it would even be better to change the lines:

      result = convertToHtml(s.left(maxLen));
      maxLen-=20;

into

      maxLen-=20;
      result = convertToHtml(s.left(maxLen));

as otherwise the initial conversation will be done twice in case of an initial conversation that is to long or the while loop should have the condition at the end (so a do ... while() loop).

@doxygen
Copy link
Owner

doxygen commented Mar 19, 2023

@albert-github Doing the maxLen-=20 earlier would be the same as starting with maxLen=380. The idea is that by truncating with maxLen and comparing the resulting length after conversion against maxExpandedLen that after the first iteration of the while loop the result is very likely already below maxExandedLen (so the expansion adds less than 50 extra characters). Only if this is not the case, the gap between maxLen and maxExpandedLen is increased in steps of 20 bytes until the expansion no longer exceeds the gap. The initial gap should be such that it is likely already enough the first time. What the right value is to cover say >95% of the cases, I don't know, so I assumed 50.

@albert-github
Copy link
Collaborator Author

I see indeed the first conversion takes the full string and on failure the first 400 bytes are taken (back in my head was, incorrectly, that the first conversion was already with the first 400 bytes).

The limit of 400 characters was an arbitrary number (its should be a number less than 488), so even the 450 could be expanded to a larger number and the 400 could also at a higher number but I think this is not worth the effort as

  • the estimate of 95% is probably even an underestimation I think that it is more like 99%
  • when having such a long string probably it is a generated string and nobody will read / study it

so the current choices are OK for me.

@doxygen doxygen removed the fixed but not released Bug is fixed in github, but still needs to make its way to an official release label May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug HTML HTML / XHTML output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants