Tabs (in input) and multibytes characters (Origin: bugzilla #564462) #3198

Closed
doxygen opened this Issue Jul 2, 2018 · 0 comments

doxygen commented Jul 2, 2018

status RESOLVED severity minor in component general for ---
Reported in version 1.5.7.1 on platform Other
Assigned to: Dimitri van Heesch

Original attachment names and IDs:

On 2008-12-14 10:40:22 +0000, Gingko wrote:

Hello,

I want to report a bug in the Doxygen feature that replaces tabs in input source code with a computed number of spaces, according to the TAB_SIZE configuration option.

The bug appears if your input source code has multibyte characters (for example, accented French letters inside C/C++ string literals) followed by tabs on the same line.

Because of this bug, the number of spaces inserted to replace these tabs is not computed correctly, resulting in misaligned code in the generated output.

This makes me think that column positions are probably computed by counting bytes rather than characters when processing source code lines, which is not appropriate if these lines include multibyte characters, as is the case for every character outside the 0x20 - 0x7f ASCII range in a UTF-8 string.

This is not a very important bug, but it is a little irritating all the same, and I think it should be quite easy to fix.

Gingko
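
To illustrate the suspected cause described in this report, here is a minimal sketch of tab expansion with the column counted either per byte or per character. The function name `expandTabs` and its `countBytes` flag are hypothetical and are not taken from the Doxygen sources; the sketch assumes well-formed UTF-8 input and a tab size greater than zero.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT Doxygen's code: expands each tab to the next
// multiple of tabSize (assumed > 0). With countBytes == true every byte
// advances the column, which is the suspected faulty behaviour. Otherwise
// UTF-8 continuation bytes (bit pattern 10xxxxxx) do not advance the column,
// so each character counts once.
std::string expandTabs(const std::string &line, std::size_t tabSize, bool countBytes)
{
    std::string out;
    std::size_t col = 0;
    for (unsigned char c : line) {
        if (c == '\t') {
            std::size_t spaces = tabSize - (col % tabSize);
            out.append(spaces, ' ');
            col += spaces;
        } else {
            out.push_back(static_cast<char>(c));
            if (countBytes || (c & 0xC0) != 0x80) {
                ++col;
            }
        }
    }
    return out;
}
```

With a tab size of 4, a line consisting of "é" (two bytes in UTF-8), a tab and "x" gets two spaces when counting bytes but three when counting characters, which is the kind of one-column misalignment reported here.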

On 2008-12-16 18:59:01 +0000, Dimitri van Heesch wrote:

You are correct about the byte counting. For my convenience, can you attach to this bug report a self-contained example (source + config file in a zip) that lets me reproduce the problem?

On 2008-12-16 19:40:43 +0000, Gingko wrote:

Created attachment 124820
Sample file (zipped) for bug # 564462

Ok. This is the sample that you asked for.

Content :
sample.cpp
Doxyfile

Best regards,

Gingko

On 2009-05-30 15:04:27 +0000, Tobias Mueller wrote:

Reopening as the requested information has been provided.

On 2013-03-17 20:15:36 +0000, albert wrote:

Created attachment 239073
PATCH: count multi-byte characters in source code output correctly

The problem was indeed the byte counting: the special characters are converted to UTF-8 characters in util.cpp and are printed correctly, but because each character consists of multiple bytes, these bytes were counted separately. Corrected the output for HTML, man, RTF and XML. The output for LaTeX / PDF already looks correct.

On 2013-03-18 19:21:25 +0000, albert wrote:

Created attachment 239184
PATCH: extend to all defined UTF-8 characters

This patch extends the previous patch. In the previous patch only the UTF-8 "characters" starting with a byte as set by Doxygen were supported. With this patch all currently valid UTF-8 characters are supported (valid UTF-8 "characters" taken from http://en.wikipedia.org/wiki/UTF-8).

On 2013-03-18 19:31:05 +0000, Dimitri van Heesch wrote:

Hi Albert,

Can you make a patch with only the last set of changes?

Note that the function nextUtf8CharPosition() in util.cpp contains a somewhat more compact way to find the next character in a UTF-8 byte stream.
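
For readers unfamiliar with the idea, finding the next character in a UTF-8 byte stream can be done by inspecting only the lead byte. The following is a rough sketch of that general technique, not the actual nextUtf8CharPosition() from util.cpp; the name `nextCharPos` and its signature are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, NOT the function from Doxygen's util.cpp: returns the
// index of the first byte after the UTF-8 character that starts at pos.
// The sequence length is taken from the lead byte. A stray continuation byte
// or an unknown lead byte advances by one byte, and the result is clamped to
// the end of the string if the sequence is truncated.
std::size_t nextCharPos(const std::string &s, std::size_t pos)
{
    if (pos >= s.size()) {
        return s.size();
    }
    unsigned char c = static_cast<unsigned char>(s[pos]);
    std::size_t len = 1;                      // 0xxxxxxx (ASCII) or fallback
    if ((c & 0xE0) == 0xC0) {
        len = 2;                              // 110xxxxx
    } else if ((c & 0xF0) == 0xE0) {
        len = 3;                              // 1110xxxx
    } else if ((c & 0xF8) == 0xF0) {
        len = 4;                              // 11110xxx
    }
    std::size_t next = pos + len;
    return next < s.size() ? next : s.size();
}
```

Stepping through a line with this kind of function and incrementing the column once per step counts characters rather than bytes.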

On 2013-03-18 20:39:57 +0000, albert wrote:

Created attachment 239194
PATCH: count multi-byte characters in source code output correctly, based on comment 6 from Dimitri

This is a single patch combining both changes (making both earlier patches obsolete). It also incorporates the remark regarding nextUtf8CharPosition: I created an analogous function in util.cpp for this.

On 2013-03-20 19:09:01 +0000, Dimitri van Heesch wrote:

Thanks, I'll include the patch in the next subversion update.

On 2013-05-19 12:36:12 +0000, Dimitri van Heesch wrote:

This bug was previously marked ASSIGNED, which means it should be fixed in doxygen version 1.8.4. Please verify if this is indeed the case. Reopen the bug if you think it is not fixed and please include any additional information that you think can be relevant.

doxygen closed this Jul 2, 2018
