Date::Language::* encodings need to be standardized [rt.cpan.org #113419] #19

Open
atoomic opened this issue Jan 15, 2020 · 0 comments
atoomic commented Jan 15, 2020

Migrated from rt.cpan.org#113419 (status was 'new')

Requestors:

From mbethke@cpan.org on 2016-03-29 03:59:30:

While it's a neat thing to be able to use localized period names transparently by just plugging a new module into Date::Language, the approach fails completely if plugins don't agree on character encoding:

$ perl -CO -MDate::Language -e'for(qw/ German Greek Chinese /){$t=Date::Language->new($_); print $t->time2str("%B ", 28*86400*$_) for 1..12; print "\n"}'
Januar Februar März April Mai Juni Juli August September Oktober November Dezember 
�ανο�α�ίο� Φεβ�ο�α�ίο� �α��ίο� ���ιλί� �α�ο� �ο�νίο� �ο�λίο� ��γο���ο� Σε��εμ�ο� �κ��β�ίο� �οεμβ�ίο� �εκεμβ�ο� 
�� �� �� �� �� �� �� �� �� �� ��� ��� 

German and Greek work fine: German uses Latin-1 strings that upgrade transparently, while Greek is written as "\x{03..}" escapes and so is already a character string with the UTF8 flag set. Chinese is in UTF-8 directly but without "use utf8", so it returns UTF-8 as a byte string, which -CO then encodes a second time.
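
To make the three cases concrete, here is a minimal sketch using only core Encode. The string values are illustrative stand-ins for what the plugins store, not copies of their source, and `encode('UTF-8', ...)` is used as an approximation of what the -CO output layer does to each string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three ways a plugin might hold a month name (illustrative values only).
my $latin1  = "M\xE4rz";                   # Latin-1 byte string, no UTF8 flag
my $escaped = "\x{399}\x{3b1}\x{3bd}";     # code points via \x{...}: a character string
my $raw     = "\xE4\xB8\x80\xE6\x9C\x88";  # UTF-8 octets, as read from a file without "use utf8"

# Approximate the output layer: every string is treated as characters and encoded.
my $out_latin1  = encode('UTF-8', $latin1);   # each Latin-1 byte becomes one character: fine
my $out_escaped = encode('UTF-8', $escaped);  # already characters: fine
my $out_raw     = encode('UTF-8', $raw);      # each octet is encoded again: mojibake

# Decoding the octets first turns them into two characters that encode correctly.
my $fixed     = decode('UTF-8', $raw);
my $out_fixed = encode('UTF-8', $fixed);

printf "raw: %d octets in, %d octets out (double-encoded)\n",
    length($raw), length($out_raw);
printf "fixed: %d characters, round-trips: %s\n",
    length($fixed), ($out_fixed eq $raw ? "yes" : "no");
```

The Chinese case is the only one where the module hands back octets, which is why it is the only one that breaks once an encoding layer is on the output handle.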

$ perl -CO -MDevel::Peek -MDate::Language -e'for(qw/ German Greek Chinese /){print STDERR "$_\n";Dump($t=Date::Language->new($_)->time2str("%B", 0))}'
German
SV = PVMG(0x85a470) at 0x826058
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x8614a0 "Januar"\0
Greek
SV = PVMG(0x85a470) at 0x826058
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x82b4b0 "\316\231\316\261\316\275\316\277\317\205\316\261\317\201\316\257\316\277\317\205"\0 [UTF8 "\x{399}\x{3b1}\x{3bd}\x{3bf}\x{3c5}\x{3b1}\x{3c1}\x{3af}\x{3bf}\x{3c5}"]
Chinese
SV = PVMG(0x85a470) at 0x826058
  FLAGS = (POK,pPOK)
  PV = 0x82b4b0 "\344\270\200\346\234\210"\0
[boring lines deleted]

As I see it, that's a pretty hard one to fix without causing incompatibilities, unless you want to do it the PHP way and add *_utf8 versions of everything (Ick!).
Perhaps a new constructor option would do, so you could say
   Date::Language->new('Chinese', encoding => 'utf8');
although from the way the constructor works this doesn't seem straightforward either.
In any case, language plugins should not return anything but UTF-8 text in 2016, and they should probably all use the utf8 pragma explicitly so the text is readable in a regular editor, unlike D::L::Greek.
I might contribute a Lao and possibly a Thai module if I don't have to hack my own decoding logic :)
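
Until something like that exists, a caller-side workaround could look like the sketch below. `to_characters` is a hypothetical helper, not part of Date::Language, and it relies on a heuristic assumption: a string without the UTF8 flag is taken to be UTF-8 octets if it decodes strictly, and Latin-1 otherwise. A Latin-1 string that happens to be valid UTF-8 would be misread, so this is a stopgap rather than a real fix.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: normalize whatever a language plugin returns
# into a Perl character string.
sub to_characters {
    my ($text) = @_;

    # Already flagged as characters (the Greek case): nothing to do.
    return $text if utf8::is_utf8($text);

    # Try a strict UTF-8 decode (the Chinese case); croaks on invalid input.
    my $chars = eval { decode('UTF-8', $text, FB_CROAK | LEAVE_SRC) };
    return $chars if defined $chars;

    # Assumption: anything else is Latin-1 (the German case).
    return decode('ISO-8859-1', $text);
}

# Usage, assuming Date::Language is installed:
#   binmode STDOUT, ':encoding(UTF-8)';
#   my $lang = Date::Language->new('Chinese');
#   print to_characters($lang->time2str('%B', 0)), "\n";
```

Doing the same thing inside each plugin, or in one place in Date::Language itself, would avoid the guesswork, since every module knows which encoding its own source uses.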
