Enables using unicode (non ASCII) to name output files (Origin: bugzilla #705220) #5254

Closed
doxygen opened this Issue Jul 2, 2018 · 0 comments

doxygen commented Jul 2, 2018

status RESOLVED severity enhancement in component general for ---
Reported in version 1.8.6-GIT on platform Other
Assigned to: Dimitri van Heesch

Original attachment names and IDs:

On 2013-08-01 00:49:15 +0000, Suzumizaki-Kimitaka wrote:

Created attachment 250573
fix to enable to make the file named with non ASCII(Unicode) characters

As you know, Unicode can be used in filenames,
even for files served over the internet (for example, the non-English versions of Wikipedia).

With this patch, doxygen can use Unicode (not only ASCII) characters directly in filenames, instead of escaping each byte in the _xHH_xHH_xHH format.

I added a LIMIT_FNAME_WITH_ASCII option, which can be set to NO (0) to enable this feature (of course the default is YES, for compatibility).

Note: For Windows, another fix is also required. See Bug 705217.

Regards,
Suzumizaki-Kimitaka
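To illustrate the _xHH format mentioned above, here is a minimal sketch of byte-wise escaping. The function name `escapeNonAscii` is hypothetical and not part of doxygen; doxygen's real escaping routine also rewrites a number of ASCII characters, which this sketch leaves untouched.

```cpp
#include <string>

// Hypothetical helper for illustration only (not doxygen's API).
// Every byte >= 0x80 is written as "_x" followed by two uppercase hex digits;
// ASCII bytes are copied unchanged.
std::string escapeNonAscii(const std::string &name)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char uc : name)
    {
        if (uc < 0x80)
        {
            out += static_cast<char>(uc);
        }
        else
        {
            out += "_x";
            out += hex[uc >> 4];   // high nibble
            out += hex[uc & 0xF];  // low nibble
        }
    }
    return out;
}
```

For example, U+3042 is encoded in UTF-8 as the bytes E3 81 82, so `escapeNonAscii("\xE3\x81\x82" "_class")` yields `_xE3_x81_x82_class`. With the patch enabled, the three bytes would be kept as they are.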

On 2013-12-27 06:00:33 +0000, Suzumizaki-Kimitaka wrote:

Created attachment 264920
Updated patch for 1.8.6 release

Here is an updated patch to catch up with the 1.8.6 release.

Regards,
Suzumizaki-Kimitaka

On 2013-12-30 20:27:37 +0000, Dimitri van Heesch wrote:

Hi Suzumizaki,

Thanks for the update.

I've reworked your patch a bit (result should be the same).
I've named the option 'ALLOW_UNICODE_NAMES' and used the code below.

With no 'continue' and 'break' statements and a slightly different way to count the bytes (based on http://www.opensource.apple.com/source/tidy/tidy-2.2/tidy/src/utf8.c):

char ids[5];
const unsigned char uc = (unsigned char)c;
bool doEscape = TRUE;
if (allowUnicodeNames && uc <= 0xf7)
{
  const char* pt = p; // p already points at the byte following c
  ids[ 0 ] = c;
  int l = 0;
  if ((uc&0xE0)==0xC0)
  {
    l=2; // 110x.xxxx: 2 byte character
  }
  if ((uc&0xF0)==0xE0)
  {
    l=3; // 1110.xxxx: 3 byte character
  }
  if ((uc&0xF8)==0xF0)
  {
    l=4; // 1111.0xxx: 4 byte character
  }
  doEscape = l==0; // ASCII or stray continuation byte: fall through to escaping
  for (int m=1; m<l && !doEscape; ++m)
  {
    unsigned char ct = (unsigned char)*pt;
    if (ct==0 || (ct&0xC0)!=0x80) // not a 10xx.xxxx continuation byte: invalid unicode character
    {
      doEscape=TRUE;
    }
    else
    {
      ids[ m ] = *pt++;
    }
  }
  if ( !doEscape ) // got a valid unicode character
  {
    ids[ l ] = 0;
    growBuf.addStr( ids );
    p += l - 1; // skip the continuation bytes that were just copied
  }
}
if (doEscape) // not a valid unicode char or escaping needed
{
  static char map[] = "0123456789ABCDEF";
  unsigned char id = (unsigned char)c;
  ids[0]='_';
  ids[1]='x';
  ids[2]=map[id>>4];
  ids[3]=map[id&0xF];
  ids[4]=0;
  growBuf.addStr(ids);
}

Let me know if this works for you too.
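The fragment above depends on its surrounding loop (`c`, `p`, `growBuf`). As a rough, self-contained sketch of the same idea, the version below uses `std::string` in place of doxygen's buffer. The function name `escapeName` and its signature are assumptions made for illustration, and, unlike the real routine, it passes all ASCII bytes through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone adaptation (not doxygen's actual function).
// ASCII bytes are copied as-is. When allowUnicodeNames is true, a lead byte
// followed by the right number of 10xx.xxxx continuation bytes is copied
// verbatim; any other byte >= 0x80 is escaped as _xHH.
std::string escapeName(const std::string &name, bool allowUnicodeNames)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    const std::size_t n = name.size();
    for (std::size_t i = 0; i < n; ++i)
    {
        const unsigned char uc = static_cast<unsigned char>(name[i]);
        std::size_t len = 0;                         // 0 means "escape this byte"
        if (uc < 0x80)                len = 1;       // plain ASCII
        else if (allowUnicodeNames)
        {
            if      ((uc & 0xE0) == 0xC0) len = 2;   // 110x.xxxx
            else if ((uc & 0xF0) == 0xE0) len = 3;   // 1110.xxxx
            else if ((uc & 0xF8) == 0xF0) len = 4;   // 1111.0xxx
        }
        // The sequence must fit in the input and consist of continuation bytes.
        bool valid = len != 0 && i + len <= n;
        for (std::size_t m = 1; valid && m < len; ++m)
        {
            const unsigned char ct = static_cast<unsigned char>(name[i + m]);
            valid = (ct & 0xC0) == 0x80;
        }
        if (valid)
        {
            out.append(name, i, len);                // copy the whole character
            i += len - 1;                            // skip its continuation bytes
        }
        else
        {
            out += "_x";
            out += hex[uc >> 4];
            out += hex[uc & 0xF];
        }
    }
    return out;
}
```

With this sketch, `escapeName("\xE3\x81\x82", true)` returns the three bytes unchanged, while `escapeName("\xE3\x81\x82", false)` returns `_xE3_x81_x82`. A truncated sequence such as `"\xE3\x81"` is escaped byte by byte even when Unicode names are allowed.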

On 2014-01-03 04:18:33 +0000, Suzumizaki-Kimitaka wrote:

Hi Dimitri, happy new year!

The code in git seems to work correctly. The only difference is how it handles invalid UTF-8, especially a C0 or C1 lead byte. But that's no problem for me.

Thank you!
Suzumizaki-Kimitaka
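For context on the C0/C1 remark: in UTF-8, the lead bytes 0xC0 and 0xC1 can only begin overlong two-byte encodings of code points that fit in one byte, so a strict decoder rejects them. The committed code accepts any 110x.xxxx lead byte. A stricter lead-byte check could look like the sketch below; the function name `utf8SequenceLength` is hypothetical, and this is not what doxygen does. It inspects the lead byte only, so full validation would still need range checks on the following bytes.

```cpp
// Hypothetical helper: expected length of a UTF-8 sequence given its lead
// byte, or 0 if the byte cannot start a well-formed sequence.
int utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;  // ASCII
    if (lead < 0xC2) return 0;  // 0x80..0xBF continuation bytes; 0xC0/0xC1 overlong leads
    if (lead < 0xE0) return 2;  // 0xC2..0xDF
    if (lead < 0xF0) return 3;  // 0xE0..0xEF
    if (lead < 0xF5) return 4;  // 0xF0..0xF4
    return 0;                   // 0xF5..0xFF never appear in UTF-8
}
```

Under this check, a 0xC0 or 0xC1 byte would be escaped as _xC0 or _xC1 rather than copied through together with its continuation byte.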

On 2014-04-21 10:09:36 +0000, Dimitri van Heesch wrote:

This bug was previously marked ASSIGNED, which means it should be fixed in
doxygen version 1.8.7. Please verify if this is indeed the case. Reopen the
bug if you think it is not fixed and please include any additional information
that you think can be relevant (preferably in the form of a self-contained example).

@doxygen doxygen closed this Jul 2, 2018
