Tabs (in input) and multibytes characters (Origin: bugzilla #564462) #3198

doxygen · 2018-07-02T01:37:03Z

status RESOLVED severity minor in component general for ---
Reported in version 1.5.7.1 on platform Other
Assigned to: Dimitri van Heesch

Original attachment names and IDs:

Doxygen_bug.zip (ID 124820)
patch_564462.res (ID 239073)
bug_564462_2.res (ID 239184)
bug_564462_3.res (ID 239194)

On 2008-12-14 10:40:22 +0000, Gingko wrote:

Hello,

I want to report a bug in the Doxygen feature that replaces tabs in input source code with a computed number of spaces, according to the TAB_SIZE configuration option.

The bug appears if, in your input source code, you have multibyte characters (for example French letters with accents inside C/C++ character strings) followed by tabs on the same line.

Because of this bug, the number of spaces inserted to replace these tabs is not computed correctly, resulting in misaligned code in the generated output.

This makes me think that column positions are probably computed by counting bytes rather than characters when processing source code lines, which is not appropriate if these lines include multibyte characters, as is the case for every character outside the 0x20 - 0x7f ASCII code range in a UTF-8 string.

This is not a very important bug, but it is still a little irritating, and I think it should be quite easy to fix.
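
To illustrate the suspected cause, here is a minimal sketch of tab expansion with both counting strategies. It is not Doxygen's code: the function name expandTabs and the countBytes switch are hypothetical, and the only assumption is that the input line is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not Doxygen code: replaces each tab with spaces up to
// the next multiple of tabSize. With countBytes == true every byte advances
// the column (the behaviour suspected in this report). Otherwise UTF-8
// continuation bytes (bit pattern 10xxxxxx) are skipped, so each character
// advances the column exactly once.
std::string expandTabs(const std::string &line, int tabSize, bool countBytes)
{
  assert(tabSize > 0);
  std::string out;
  int col = 0;
  for (unsigned char c : line)
  {
    if (c == '\t')
    {
      int spaces = tabSize - (col % tabSize);
      out.append(static_cast<std::size_t>(spaces), ' ');
      col += spaces;
    }
    else
    {
      out.push_back(static_cast<char>(c));
      bool continuation = (c & 0xC0) == 0x80;
      if (countBytes || !continuation) col++;
    }
  }
  return out;
}
```

With TAB_SIZE 4 and a line made of two "é" characters (two bytes each) followed by a tab, byte counting sees column 4 and inserts four spaces, while character counting sees column 2 and inserts the two spaces needed to reach the tab stop.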

Gingko

On 2008-12-16 18:59:01 +0000, Dimitri van Heesch wrote:

You are correct about the byte counting. For my convenience: can you attach a self-contained example (source + config file in a zip) to this bug report so that I can reproduce the problem?

On 2008-12-16 19:40:43 +0000, Gingko wrote:

Created attachment 124820
Sample file (zipped) for bug # 564462

Ok. This is the sample that you asked for.

Content :
sample.cpp
Doxyfile

Best regards,

Gingko

On 2009-05-30 15:04:27 +0000, Tobias Mueller wrote:

Reopening as the requested information has been provided.

On 2013-03-17 20:15:36 +0000, albert wrote:

Created attachment 239073
PATCH: count multi-byte characters in source code output correctly

The problem was indeed the byte counting: the special characters are converted to UTF-8 in util.cpp and are printed correctly, but because each character consists of multiple bytes, those bytes were counted separately. I corrected the output for HTML, man, RTF and XML. The output for LaTeX / PDF already looks correct.

On 2013-03-18 19:21:25 +0000, albert wrote:

Created attachment 239184
PATCH: extend to all defined UTF-8 characters

This patch extends the previous patch. In the previous patch only the UTF-8 "characters" starting with a byte as set by Doxygen were supported. With this patch all currently valid UTF-8 characters are supported (valid UTF-8 "characters" taken from http://en.wikipedia.org/wiki/UTF-8).

On 2013-03-18 19:31:05 +0000, Dimitri van Heesch wrote:

Hi Albert,

Can you make a patch with only the last set of changes?

Note that the function nextUtf8CharPosition() in util.cpp contains a somewhat more compact way to find the next character in a UTF-8 byte stream.
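
For readers without the source at hand, the general idea of stepping through a UTF-8 byte stream by its lead bytes can be sketched as follows. This is not the actual nextUtf8CharPosition() from util.cpp: the names nextCharPosSketch and countChars are hypothetical, and the sketch assumes it is called on a lead byte and does not validate the continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the actual Doxygen function: returns the index of
// the byte that follows the UTF-8 character starting at pos. The length is
// derived from the high bits of the lead byte.
std::size_t nextCharPosSketch(const std::string &s, std::size_t pos)
{
  assert(pos < s.size());
  unsigned char lead = static_cast<unsigned char>(s[pos]);
  std::size_t len = 1;                       // ASCII, or a stray byte
  if      ((lead & 0xE0) == 0xC0) len = 2;   // 110xxxxx
  else if ((lead & 0xF0) == 0xE0) len = 3;   // 1110xxxx
  else if ((lead & 0xF8) == 0xF0) len = 4;   // 11110xxx
  std::size_t next = pos + len;
  return next < s.size() ? next : s.size();  // clamp on truncated input
}

// Counts characters by hopping from one lead byte to the next.
std::size_t countChars(const std::string &s)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.size(); i = nextCharPosSketch(s, i)) n++;
  return n;
}
```

Counting columns with a helper of this kind advances once per character instead of once per byte, which is what tab expansion needs.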

On 2013-03-18 20:39:57 +0000, albert wrote:

Created attachment 239194
PATCH: count multi-byte characters in source code output correctly, based on comment 6 from Dimitri

This is a single patch combining both changes (making the two previous patches obsolete). I also incorporated the remark regarding nextUtf8CharPosition and created an analogous function in util.cpp for this.

On 2013-03-20 19:09:01 +0000, Dimitri van Heesch wrote:

Thanks, I'll include the patch in the next subversion update.

On 2013-05-19 12:36:12 +0000, Dimitri van Heesch wrote:

This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.4. Please verify if this is indeed the case. Reopen the
bug if you think it is not fixed and please include any additional information
that you think can be relevant.

doxygen closed this as completed Jul 2, 2018
