Table width and number of characters #18

huashan · 2013-03-06T12:42:47Z

I see in helper.R that pander uses nchar() to determine column width from the number of characters in the string. That is not suitable for CJK characters; I'd suggest using nchar(x, type='type') to handle them.


daroczig · 2013-03-06T12:58:30Z

Thank you @huashan for reporting this issue. But please give me a hand with the solution as I have no experience with CJK characters.

The type argument of nchar can only be one of c("bytes", "chars", "width"). Did you mean chars there instead of type? But AFAIK that's the default.

Could you please also post an example of the problem here so that I can test it?

huashan · 2013-03-06T14:55:36Z

The default for type is chars.
I simply replaced all the nchar() calls with nchar(x, type='bytes') in helper.R and the results are what I expected.

daroczig · 2013-03-06T14:58:56Z

Right, so there is no point in replacing nchar(x) with nchar(x, type='chars').
But for CJK, would you need bytes instead? Or am I missing the point?

huashan · 2013-03-06T16:06:15Z

You need to use bytes or width to deal with CJK characters.

daroczig · 2013-03-06T16:40:53Z

Cool, thanks a lot for making this clear to me.
I will dig deeper into this within a few days at the latest and will definitely update the package. Hopefully today, but we will see.

daroczig · 2013-03-06T20:06:48Z

Just did some testing (sorry, I have no idea what the character below is, but it looks cool):

> nchar('乂')
[1] 1
> nchar('乂', 'byte')
[1] 3
> nchar('乂', 'width')
[1] 2

So I decided to use width, which falls back to chars if needed. Anyway, please see the before/after test below and verify:

Before:

> pander(data.frame(x=1:3, y=c('xxx','乂乂乂', 'yyy')))

-------
 x   y 
--- ---
 1  xxx

 2  乂乂乂

 3  yyy
-------

After:

> pander(data.frame(x=1:3, y=c('xxx','乂乂乂', 'yyy')))

----------
 x    y   
--- ------
 1   xxx  

 2  乂乂乂

 3   yyy  
----------

Thanks again for reporting this issue and I would love to hear some feedback if this works or if there would be any need for more tweaks.
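The difference between the two tables comes down to which length is used when padding a cell. Here is a minimal sketch of that idea, not the actual code in helper.R: `pad_to_width` is a hypothetical helper made up for illustration, and the character is written as the escape `\u4e42` (the 乂 used above) so the snippet does not depend on the encoding of the source file.

```r
# Hypothetical helper (not part of pander): centre a string within a given
# number of terminal columns, measuring the display width rather than the
# number of characters.
pad_to_width <- function(x, width) {
  w <- nchar(x, type = "width")        # a CJK character counts as 2 columns
  missing <- pmax(width - w, 0)        # never pad by a negative amount
  left <- missing %/% 2
  right <- missing - left
  paste0(strrep(" ", left), x, strrep(" ", right))
}

cjk <- "\u4e42"                        # the character used in the tests above
cells <- c("xxx", paste(rep(cjk, 3), collapse = ""), "yyy")

# Every cell has 3 characters, but the CJK cell is wider on screen.
n_chars <- nchar(cells, type = "chars")
n_width <- nchar(cells, type = "width")

# Pad all cells to the widest display width so the column lines up.
padded <- pad_to_width(cells, max(n_width))
print(padded)
```

Padding to `max(nchar(cells))` instead would treat all three cells as equally wide, which is what produces the misaligned "Before" table.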

huashan · 2013-03-07T06:23:11Z

Thanks Daroczig!

Another related issue is strwrap() and CJK characters. strwrap() wraps strings at the occurrence of whitespace; with CJK characters, however, there is no whitespace between words. My solution is to treat each CJK character as if it were followed by whitespace and then split the string at the specified width parameter. In this case, the width parameter for this hacked strwrap() has to be set to half of the expected width when dealing with CJK characters, which is a little bit cumbersome. Or we have to treat each CJK character as two and then split at the first whitespace.


daroczig · 2013-03-08T00:04:21Z

Thanks a lot!

First I tried to fix this issue the way you described as "a little bit cumbersome", since it was easier to implement. In the end I was not pleased with this method: a cell may mix Latin and CJK characters, and then the text could be split at any character, even in the middle of a Latin word, not just at whitespace. That is not good.

So I worked on the second option too: currently the script checks the real width of each word and splits the text at whitespace based on nchar(..., type = 'width'). Please verify that it works.

Demo:

> library(pander)
> x <- data.frame(x='1乂2乂 12345678 1234567 1234 12 1 1 1 1 123112 3乂4乂5乂6')
> pander(x)
----------------------------
             x              
----------------------------
1乂2乂 12345678 1234567 1234
12 1 1 1 1 123112 3乂4乂5乂6
----------------------------
> panderOptions('table.split.cells',10)
> pander(x)

----------
    x     
----------
  1乂2乂  
 12345678 
 1234567  
1234 12 1 
  1 1 1   
  123112  
3乂4乂5乂6
----------
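For readers who want to experiment with the approach described above (measure each word with `nchar(..., type = 'width')` and break only at whitespace), here is a rough sketch. It is an assumption-laden illustration, not the code from the commit: `wrap_by_width` is a hypothetical name, the greedy line-filling rule is my own choice, and a single word wider than the limit is simply left on its own line.

```r
# Hypothetical sketch (not pander's implementation): greedy wrapping that
# breaks only at spaces and measures lines by display width.
wrap_by_width <- function(text, width) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]        # drop empties from repeated spaces
  lines <- character(0)
  current <- ""
  for (w in words) {
    candidate <- if (nzchar(current)) paste(current, w) else w
    if (nzchar(current) && nchar(candidate, type = "width") > width) {
      lines <- c(lines, current)       # line is full: start a new one
      current <- w
    } else {
      current <- candidate             # word still fits on the current line
    }
  }
  if (nzchar(current)) lines <- c(lines, current)
  lines
}

cjk <- "\u4e42"                        # the character used in the demo
text <- paste0("1", cjk, "2", cjk,
               " 12345678 1234567 1234 12 1 1 1 1 123112 ",
               "3", cjk, "4", cjk, "5", cjk, "6")

wrapped <- wrap_by_width(text, 10)
print(wrapped)
```

With a limit of 10 this greedy rule yields the same seven lines as the second table in the demo. The last word has 7 characters but a display width of 10, so it only just fits on a line of its own.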

huashan · 2013-03-27T09:54:04Z

panderOptions('table.split.cells', 4)
x <- data.frame(x='1乂2乂 1234 1234 1234 12 1 1 1 1 123 3乂4乂5乂6')
pander(x)
+------------+
| x |
+============+
| 1乂2乂 |
| 1234 |
| 1234 |
| 1234 |
| 12 1 |
| 1 1 |
| 1 |
| 123 |
| 3乂4乂5乂6 |
+------------+

CJK characters can be split anywhere, so the last few lines would be expected to be:
| 3乂4乂|
|5乂6 |
+------------+

or:
| 3乂4|
|乂5乂|
| 6 |
+------------+

daroczig · 2013-03-27T11:14:46Z

Hm, that's a feature not a bug based on the last commit :)

But joking apart, in your last comment you wrote: "Or we have to treat each CJK character as two and then split at the first whitespace." So I implemented that, as it seems pretty hard to check whether a CJK or any other Unicode character is present, and other characters would probably not allow breaks between them.

So @huashan please verify if handling CJK chars as double and breaking those only on white-space would work, or we need some more magic here.

daroczig closed this as completed in 681315c Mar 6, 2013

daroczig added a commit that referenced this issue Mar 8, 2013

fix strwrap for CJK chars - further fix #18

37f9b33

daroczig added a commit that referenced this issue Mar 8, 2013

dealing with CJK chars on a character not cell basis only breaking on…

adc8f27

… whitespace - fix #18

daroczig mentioned this issue Dec 22, 2014

incorrect column width with Chinese characters #144

Closed

billdenney mentioned this issue Dec 9, 2023

Incorrect table cell width with Unicode soft-hyphen character #368

Open
